Priority-based Scheduling
AI Performance-based Scheduling
Scheduling policies are configured based on the characteristics and resource usage of AI tasks to increase cluster throughput and improve service performance.
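The core idea of priority-based scheduling can be sketched as a priority queue: higher-priority tasks are dispatched first, and tasks with equal priority keep their submission order. This is a toy model only, not the CCE or Volcano scheduler. It ignores resource fit, preemption, queues, and the AI performance-based policies mentioned above, and the task names and integer priorities are hypothetical.

```python
import heapq
from itertools import count


class PriorityTaskQueue:
    """Toy queue: highest priority first, FIFO among equal priorities."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = count()  # submission order, used as a tie-breaker

    def submit(self, name: str, priority: int) -> None:
        # heapq is a min-heap, so store -priority to pop the largest priority first.
        heapq.heappush(self._heap, (-priority, next(self._seq), name))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def schedule_order(tasks: list[tuple[str, int]]) -> list[str]:
    """Return task names in the order this toy scheduler would dispatch them."""
    queue = PriorityTaskQueue()
    for name, priority in tasks:
        queue.submit(name, priority)
    return [queue.pop() for _ in range(len(queue))]


print(schedule_order([("train-a", 1), ("infer-b", 10), ("train-c", 1), ("infer-d", 10)]))
# ['infer-b', 'infer-d', 'train-a', 'train-c']
```

The sequence counter guarantees that two entries never compare equal on the first two tuple fields, so the task names themselves are never compared.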
Supported GPU Drivers
The list of supported GPU drivers applies only to CCE AI Suite (NVIDIA GPU) v1.2.28 or later. To use the latest GPU driver, upgrade CCE AI Suite (NVIDIA GPU) to the latest version.
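Checking whether an installed add-on meets a minimum version such as v1.2.28 requires comparing versions numerically, not as strings. The sketch below assumes plain dot-separated numeric versions with an optional leading "v" and no pre-release suffixes; the helper names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.2.28' or 'v1.2.28' into (1, 2, 28) for numeric comparison."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


def driver_list_applies(addon_version: str, minimum: str = "1.2.28") -> bool:
    """True if the add-on version is at or above the minimum version."""
    return parse_version(addon_version) >= parse_version(minimum)


print(driver_list_applies("v1.2.28"))  # True
print(driver_list_applies("1.2.3"))    # False
print("1.2.3" >= "1.2.28")             # True: plain string comparison gets this wrong
```

The same comparison works for the other minimum versions in this section, for example driver_list_applies(version, minimum="2.7.40").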
Suite (NVIDIA GPU) Exceptions
Nodes' System Parameters
Residual Package Version Data
Node Commands
Node Swap
NGINX Ingress Controller
Upgrade of Cloud Native Cluster Monitoring
containerd Pod Restart Risks
Key CCE AI Suite (NVIDIA GPU) Parameters
GPU or NPU Pod Rebuild Risks
ELB
Excellent Trusted AI Cloud - Cloud Native AI Capability Maturity
CAICT has confirmed that the Huawei Cloud cloud native solution passed the L4 tests of the "2024 AI Cloud Native Capability Maturity Model" (Q/KXY ACN001) in areas such as heterogeneous resource management, orchestration and scheduling
The CCE AI Suite (NVIDIA GPU) add-on, version 2.7.40 or later, is built on NVIDIA DCGM, providing advanced GPU monitoring functionalities.
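DCGM-Exporter publishes GPU metrics in the Prometheus text exposition format. The sketch below extracts one metric per GPU from such text. DCGM_FI_DEV_GPU_UTIL (GPU utilization) and DCGM_FI_DEV_FB_USED (framebuffer memory used) are DCGM field names that DCGM-Exporter commonly exposes, but the enabled fields and attached labels depend on the exporter's configuration, so the sample text is made up. This is a simplified parser, not a full Prometheus one: it skips unlabeled series and does not handle escaped characters in label values.

```python
import re

# Made-up DCGM-Exporter-style output; real labels and values will differ.
SAMPLE = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-aaaa"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-bbbb"} 12
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-aaaa"} 30512
"""

# name{labels} value [optional timestamp]
_LINE = re.compile(
    r"^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)(?:\s+-?\d+)?$"
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def gpu_metric(text: str, metric: str) -> dict[str, float]:
    """Map the 'gpu' label to the value of one metric, ignoring comments and other metrics."""
    result: dict[str, float] = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = _LINE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(_LABEL.findall(match.group("labels")))
        result[labels.get("gpu", "?")] = float(match.group("value"))
    return result


print(gpu_metric(SAMPLE, "DCGM_FI_DEV_GPU_UTIL"))  # {'0': 87.0, '1': 12.0}
```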
The CCE AI Suite (NVIDIA GPU) add-on has been installed, with the selected driver matching the GPU model on the node. For details, see CCE AI Suite (NVIDIA GPU).
The command for obtaining a driver varies depending on the CCE AI Suite (NVIDIA GPU) version.
Suite (NVIDIA GPU): 2.0.5 or later
Step 1: Enable GPU Virtualization
Both CCE AI Suite (NVIDIA GPU) and Volcano Scheduler must be installed in the cluster.
The CCE AI Suite (Ascend NPU) add-on has been installed. For details, see CCE AI Suite (Ascend NPU).
Creating a Workload with Full NPU Dispatch Enabled
You can create a workload with full NPU dispatch enabled using the console or kubectl.
Suite (NVIDIA GPU) Parameters
Check whether the configuration of CCE AI Suite (NVIDIA GPU) in the cluster has been intrusively modified.
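The text does not describe how this check is performed. One generic way to detect intrusive changes is to diff the current configuration against a saved baseline. The sketch below does that for nested dictionaries; the keys in the example are hypothetical and do not reflect the add-on's real configuration schema.

```python
def modified_keys(baseline: dict, current: dict, prefix: str = "") -> list[str]:
    """List dotted key paths that were added, removed, or changed relative to the baseline."""
    changes: list[str] = []
    for key in sorted(set(baseline) | set(current)):
        path = f"{prefix}{key}"
        if key not in current:
            changes.append(f"removed: {path}")
        elif key not in baseline:
            changes.append(f"added: {path}")
        elif isinstance(baseline[key], dict) and isinstance(current[key], dict):
            # Recurse into nested sections so the report points at the exact key.
            changes.extend(modified_keys(baseline[key], current[key], prefix=f"{path}."))
        elif baseline[key] != current[key]:
            changes.append(f"changed: {path}")
    return changes


# Hypothetical configuration snapshots, for illustration only.
baseline = {"plugin_image": "plugin:1.0", "resources": {"cpu": "500m", "memory": "512Mi"}}
current = {"plugin_image": "plugin:1.0-custom", "resources": {"cpu": "500m"}, "extra_env": "X=1"}
print(modified_keys(baseline, current))
# ['added: extra_env', 'changed: plugin_image', 'removed: resources.memory']
```

An empty list means the configuration matches the baseline.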
Enabling DCGM-Exporter Using CCE AI Suite (NVIDIA GPU)
Enable the core component DCGM-Exporter through the CCE AI Suite (NVIDIA GPU) add-on.
Log in to the CCE console and click the cluster name to access the cluster console.
The CCE AI Suite (Ascend NPU) add-on has been installed in the cluster and its version is 2.1.55 or later. For details, see CCE AI Suite (Ascend NPU). The Cloud Native Cluster Monitoring add-on has been installed in the cluster and its version is 3.12.1 or later.
Ascend-accelerated nodes (powered by HiSilicon Ascend 310 AI processors) apply to scenarios such as image recognition, video processing, inference computing, and machine learning. The docker baseSize is configurable. Namespace affinity scheduling is supported.
Cloud Native Heterogeneous Computing Add-ons

Add-on Name | Description
CCE AI Suite (NVIDIA GPU) | This add-on supports and manages GPUs in containers. Only NVIDIA drivers are supported.
CCE AI Suite (Ascend NPU) | This add-on supports and manages NPUs in containers.
What Can I Do If Certain Alarms Are Displayed in the GPU Node Events After the CCE AI Suite (NVIDIA GPU) Add-on Is Upgraded?
The CCE AI Suite (NVIDIA GPU) add-on has been installed in the cluster, and the add-on metrics API is working properly.
Fault Locating
The CCE AI Suite (NVIDIA GPU) add-on has an outdated driver version. Downloading and installing a new driver rectifies the fault.
No GPU requirement was specified in the workload.
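A workload only receives a GPU if its containers request one. The sketch below checks a pod spec, represented as a Python dictionary, for the nvidia.com/gpu resource. That name is the one advertised by the standard NVIDIA device plugin; confirm it against the node's allocatable resources in your cluster. The helper name and example specs are hypothetical, and init containers are not checked.

```python
GPU_RESOURCE = "nvidia.com/gpu"  # resource name advertised by the NVIDIA device plugin


def requests_gpu(pod_spec: dict) -> bool:
    """True if any regular container sets a positive GPU limit or request."""
    for container in pod_spec.get("containers") or []:
        resources = container.get("resources") or {}
        for section in ("limits", "requests"):
            # Quantities may be ints or strings such as "1" when loaded from YAML.
            if int((resources.get(section) or {}).get(GPU_RESOURCE, 0)) > 0:
                return True
    return False


cpu_only = {"containers": [{"name": "app", "resources": {"limits": {"cpu": "1"}}}]}
gpu_pod = {"containers": [{"name": "train", "resources": {"limits": {GPU_RESOURCE: 1}}}]}
print(requests_gpu(cpu_only))  # False
print(requests_gpu(gpu_pod))   # True
```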