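Huawei Cloud UCS - Creating a GPU Application: Verifying GPU Virtualization Isolation
Time: 2025-02-12 15:05:17
Verifying GPU Virtualization Isolation
After the workload is created, you can verify that GPU virtualization isolates its resources.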
- Log in to the container and check the total GPU memory allocated to it.
kubectl exec -it gpu-app -- nvidia-smi
Expected output:
Wed Apr 12 07:54:59 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:21:01.0 Off |                    0 |
| N/A   27C    P0    37W / 300W |   4792MiB /  5000MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
The expected output shows that the container is allocated 5000 MiB of GPU memory in total and is currently using 4792 MiB.
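The same check can be scripted. The following is a minimal sketch, assuming the nvidia-smi binary in the container accepts the --query-gpu option and reports the same virtualized values as the table above, and that the container sees a single GPU. The function name check_gpu_mem_total and the expected value of 5000 MiB are illustrative and not part of UCS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the total GPU memory reported by nvidia-smi
# with the amount the container is expected to see (in MiB).
#   $1: expected total in MiB, e.g. 5000
#   $2: one CSV line "memory.total, memory.used", e.g. "5000, 4792"
check_gpu_mem_total() {
  local expected="$1" line="$2" total used
  # Split the CSV line on the comma and strip the padding spaces.
  total="$(printf '%s' "$line" | cut -d',' -f1 | tr -d ' ')"
  used="$(printf '%s' "$line" | cut -d',' -f2 | tr -d ' ')"
  if [ "$total" -eq "$expected" ]; then
    echo "OK: total=${total}MiB used=${used}MiB"
  else
    echo "MISMATCH: total=${total}MiB expected=${expected}MiB"
    return 1
  fi
}

# Example usage (assumes the gpu-app Pod from this page exposes one GPU):
#   line="$(kubectl exec gpu-app -- nvidia-smi \
#     --query-gpu=memory.total,memory.used --format=csv,noheader,nounits)"
#   check_gpu_mem_total 5000 "$line"
```

A non-zero exit status from the function means the container sees a different total than expected, which makes the check easy to use in a CI step or a post-deployment script.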
- Check GPU memory isolation on the node where the Pod runs (run this command on the node).
export PATH=$PATH:/usr/local/nvidia/bin;nvidia-smi
Expected output:
Wed Apr 12 09:31:10 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:21:01.0 Off |                    0 |
| N/A   27C    P0    37W / 300W |   4837MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    760445      C   python                           4835MiB |
+-----------------------------------------------------------------------------+
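The expected output shows that the GPU node has 16160 MiB of GPU memory in total, of which 4837 MiB is in use, almost all of it (4835 MiB) by the python process of the example Pod.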
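To compare the container view with the node view, the Memory-Usage field can be extracted from each saved nvidia-smi output. The following is a minimal sketch, assuming the default table layout shown above and a single GPU. The function names and the file names in the usage comment are hypothetical, and "container total smaller than node total" is only a rough indicator of isolation, not an official UCS check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read default nvidia-smi output on stdin and print
# "<used> <total>" (both in MiB) for the first GPU listed.
parse_mem_usage() {
  grep -oE '[0-9]+MiB */ *[0-9]+MiB' \
    | head -n 1 \
    | sed -E 's/MiB//g; s| */ *| |'
}

# Hypothetical helper: succeed when the total memory visible in the container
# is smaller than the total memory of the physical GPU on the node.
#   $1: "<used> <total>" from the container
#   $2: "<used> <total>" from the node
is_memory_isolated() {
  local c_total n_total
  c_total="${1##* }"   # keep the text after the last space, i.e. the total
  n_total="${2##* }"
  [ "$c_total" -lt "$n_total" ]
}

# Example usage (container.txt and node.txt are assumed to hold the two
# nvidia-smi outputs captured in the steps above):
#   c="$(parse_mem_usage < container.txt)"
#   n="$(parse_mem_usage < node.txt)"
#   is_memory_isolated "$c" "$n" && echo "container sees a reduced GPU memory size"
```

With the values on this page, the container reports a total of 5000 MiB and the node reports 16160 MiB, so the comparison succeeds.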
support.huaweicloud.com/usermanual-ucs/ucs_01_0298.html