
Nvidia GPU

GPU consumption is measured in GPU-seconds. Note that a GPU-second is one second during which one GPU is allocated to your job, regardless of actual utilization. This is distinct from burstable CPU or memory, where consumption is based on actual usage. GPU-seconds are not included in the Resource Packages and always incur extra charges.

At the moment, Puzl supports NVIDIA A100 (40GB) GPUs only. The number of GPUs you can request is defined by each Resource Package.

To leverage GPUs for your pipeline jobs in Puzl, you need to specify the KUBERNETES_GPU_REQUEST variable in your .gitlab-ci.yml:

variables:
  KUBERNETES_GPU_REQUEST: 2

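For context, a minimal complete job using this variable might look like the sketch below. The image name and the nvidia-smi check are illustrative assumptions, not Puzl requirements:

```yaml
# Hypothetical job requesting two GPUs.
# Image and script are illustrative examples.
train:
  image: nvidia/cuda:12.2.0-base-ubuntu22.04
  variables:
    KUBERNETES_GPU_REQUEST: 2
  script:
    # Should list the two GPUs allocated to this job.
    - nvidia-smi -L
```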
Puzl allows you to distribute GPU resources across multiple containers within a single pipeline job. By default, every container in a job will have access to all the requested GPUs. However, you can restrict or specify which GPUs are visible to each container using the NVIDIA_VISIBLE_DEVICES variable.

For example, in the scenario below:

variables:
  KUBERNETES_GPU_REQUEST: 3
  NVIDIA_VISIBLE_DEVICES: "0,1"
services:
  - name: nvidia/gpu-inference
    variables:
      NVIDIA_VISIBLE_DEVICES: "2"

The main build container will have access to the GPUs with indexes 0 and 1, while the nvidia/gpu-inference service container will only have access to the GPU with index 2.

There are a few things to remember when distributing GPUs across containers:

  1. If you want to ensure a container does not have access to any GPUs, set the NVIDIA_VISIBLE_DEVICES variable to an empty string "" for that container.

  2. The total number of GPU indexes listed in the NVIDIA_VISIBLE_DEVICES variables across all containers should match the number of GPUs requested in KUBERNETES_GPU_REQUEST.
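Putting both rules together, here is a hedged sketch extending the earlier example with an additional GPU-free helper service (the redis service is a hypothetical addition, not part of the original scenario):

```yaml
variables:
  KUBERNETES_GPU_REQUEST: 3
  NVIDIA_VISIBLE_DEVICES: "0,1"    # main container: GPUs 0 and 1
services:
  - name: nvidia/gpu-inference     # service from the example above
    variables:
      NVIDIA_VISIBLE_DEVICES: "2"  # dedicated GPU 2
  - name: redis:latest             # hypothetical helper service
    variables:
      NVIDIA_VISIBLE_DEVICES: ""   # rule 1: no GPU access
# Rule 2: 2 (main) + 1 (inference) + 0 (redis) = 3 = KUBERNETES_GPU_REQUEST
```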