How to Manage GPU Resources with Docker Compose

Published: 2021-10-11 10:26:58  Source: 億速云  Views: 329  Author: iii  Category: Programming Languages

This article explains how to manage GPU resources with Docker Compose. The approach is simple to apply and practical.

As AI development becomes mainstream, containerization makes it possible to move an environment between machines seamlessly and greatly reduces the cost of setting it up. For a long time, however, configuring CUDA inside a container and running TensorFlow there was fairly troublesome, so this article shows how to set it up and use it.

Reference documentation:
Enabling GPU access with Compose
Runtime options with Memory, CPUs, and GPUs
The Compose Specification
The Compose Specification - Deployment support
The Compose Specification - Build support

Using GPU Resources in Compose

If the host running the Docker services has a GPU and the matching prerequisites are correctly installed and configured, the GPU devices can be declared and configured in Compose.

# Prerequisite package to install
$ apt-get install nvidia-container-runtime

Older Docker versions (< 19.03)

# use the nvidia runtime
$ docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

Newer Docker versions (>= 19.03)

# with --gpus
$ docker run -it --rm --gpus all ubuntu nvidia-smi

# use a specific device by UUID
$ docker run -it --rm --gpus \
    device=GPU-3a23c669-1f69-c64e-cf85-44e9b07e7a2a \
    ubuntu nvidia-smi

# use specific GPUs by index
$ docker run -it --rm --gpus '"device=0,2"' ubuntu nvidia-smi

# set NVIDIA capabilities
$ docker run --gpus 'all,capabilities=utility' --rm ubuntu nvidia-smi
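
The quoting in the --gpus examples above is easy to get wrong when the command is assembled by a script. The following is a minimal Python sketch, not part of Docker itself: gpus_value and docker_run_argv are hypothetical helper names, and the sketch assumes the flag value is parsed as comma-separated fields, which is why a field containing a comma is wrapped in double quotes.

```python
import shlex


def gpus_value(devices=None, capabilities=None):
    """Build the value passed to `docker run --gpus` (hypothetical helper)."""

    def field(text):
        # A field that itself contains a comma must be wrapped in double
        # quotes, otherwise it would be split into separate fields.
        return f'"{text}"' if "," in text else text

    if devices:
        parts = [field("device=" + ",".join(str(d) for d in devices))]
    else:
        # No explicit device list: request every GPU on the host.
        parts = ["all"]
    if capabilities:
        parts.append(field("capabilities=" + ",".join(capabilities)))
    return ",".join(parts)


def docker_run_argv(image, command, devices=None, capabilities=None):
    """Return an argv list suitable for subprocess.run (no shell involved)."""
    return ["docker", "run", "--rm", "--gpus",
            gpus_value(devices, capabilities), image, *command]


if __name__ == "__main__":
    # Print the shell-quoted command instead of running it, so the sketch
    # works on machines without Docker or a GPU.
    print(shlex.join(docker_run_argv("ubuntu", ["nvidia-smi"], devices=[0, 2])))
```

Passing the list to subprocess.run avoids a second layer of shell quoting; the double quotes around device=0,2 are part of the argument itself.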

With the older Compose file format (v2.3), a service can only use GPUs through the runtime parameter. This gives the container access to the GPUs via the NVIDIA runtime, but it does not allow control over specific properties of the GPU devices.

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all

Compose v1.28.0 and later support the Compose Specification file format, which provides properties for finer-grained control over GPU resources, so the requirements can be stated precisely when the service starts. Let's go through them.

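capabilities - required

Specifies the capabilities the device must support, as a list of strings; several capabilities can be listed. This field is required. These are device capabilities such as gpu, not the Linux kernel capabilities described in man 7 capabilities.

deploy:
  resources:
    reservations:
      devices:
        - capabilities: ["gpu"]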

count

Specifies the number of GPUs to reserve. The value is an integer. Use either count or device_ids, not both.

deploy:
  resources:
    reservations:
      devices:
        - capabilities: ["gpu"]
          count: 2

device_ids

Specifies the IDs of the GPU devices to use, either as indexes or as UUIDs. Use either device_ids or count, not both.

deploy:
  resources:
    reservations:
      devices:
        - capabilities: ["gpu"]
          device_ids: ["0", "3"]

deploy:
  resources:
    reservations:
      devices:
        - capabilities: ["gpu"]
          device_ids: ["GPU-f123d1c9-26bb-df9b-1c23-4a731f61d8c7"]

driver

Specifies the driver for the GPU device.

deploy:
  resources:
    reservations:
      devices:
        - capabilities: ["nvidia-compute"]
          driver: nvidia

options

Specifies driver-specific options as key-value pairs.

deploy:
  resources:
    reservations:
      devices:
        - capabilities: ["gpu"]
          driver: gpuvendor
          options:
            virtualization: false
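
The two constraints stated in the field descriptions above (capabilities is required, and count and device_ids are mutually exclusive) can be checked before a file is deployed. Below is a minimal Python sketch under that assumption; check_device_reservation is a hypothetical helper, not something Compose provides, and it operates on one devices entry that has already been loaded into a dict.

```python
def check_device_reservation(device):
    """Return a list of problems found in one `devices` entry.

    Hypothetical helper: it only encodes the two rules described in this
    article and is not a full Compose Specification validator.
    """
    problems = []

    capabilities = device.get("capabilities")
    if not isinstance(capabilities, list) or not capabilities:
        problems.append("capabilities is required and must be a non-empty list")

    if "count" in device and "device_ids" in device:
        problems.append("count and device_ids are mutually exclusive")

    return problems


if __name__ == "__main__":
    valid = {"driver": "nvidia", "count": 1, "capabilities": ["gpu"]}
    invalid = {"count": 2, "device_ids": ["0", "3"]}
    print(check_device_reservation(valid))
    print(check_device_reservation(invalid))
```

An empty list means the entry passes both checks; the second example breaks both rules and so returns two messages.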

With the fields covered, here is a simple example file in which the cuda container service uses one GPU device; running it produces the output shown further below.

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
      resources:
        limits:
          cpus: "0.50"
          memory: 50M
        reservations:
          cpus: "0.25"
          memory: 20M
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu, utility]
      update_config:
        parallelism: 2
        delay: 10s
        order: stop-first

Note that with count: 2, the output below would show information for two GPUs. If neither count nor device_ids is set, all GPUs on the host are used by default.

# Run in the foreground
$ docker-compose up
Creating network "gpu_default" with the default driver
Creating gpu_test_1 ... done
Attaching to gpu_test_1
test_1  | +-----------------------------------------------------------------------------+
test_1  | | NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.1     |
test_1  | |-------------------------------+----------------------+----------------------+
test_1  | | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
test_1  | | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
test_1  | |                               |                      |               MIG M. |
test_1  | |===============================+======================+======================|
test_1  | |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
test_1  | | N/A   23C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
test_1  | |                               |                      |                  N/A |
test_1  | +-------------------------------+----------------------+----------------------+
test_1  |
test_1  | +-----------------------------------------------------------------------------+
test_1  | | Processes:                                                                  |
test_1  | |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
test_1  | |        ID   ID                                                   Usage      |
test_1  | |=============================================================================|
test_1  | |  No running processes found                                                 |
test_1  | +-----------------------------------------------------------------------------+
gpu_test_1 exited with code 0

Likewise, when count or device_ids is set, programs inside the container can use multiple GPUs. The following deployment file can be used to verify this.

services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "3"]
              capabilities: [gpu]

The output is shown below; both GPUs are available to the container.

# Run in the foreground
$ docker-compose up
...
Created TensorFlow device (/device:GPU:0 with 13970 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1b.0, compute capability: 7.5)
...
Created TensorFlow device (/device:GPU:1 with 13970 MB memory) -> physical GPU (device: 1, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
...
gpu_test_1 exited with code 0
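
The command in the compose file above relies on TensorFlow's startup log to show the devices. As an alternative, a short script can list the GPUs explicitly. The following is a minimal sketch, assuming TensorFlow 2.x inside the container; visible_gpus is a hypothetical helper name, and it returns an empty list when TensorFlow is not installed so that it can also be run outside the container.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see (hypothetical helper)."""
    try:
        import tensorflow as tf
    except ImportError:
        # Assumption for this sketch: treat a missing TensorFlow install
        # the same as "no GPUs visible" instead of raising.
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"{len(gpus)} GPU(s) visible")
    for name in gpus:
        print(name)
```

With device_ids set to two devices, as in the file above, the script would be expected to list two entries when run inside the container.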
