Search_HUAWEI CLOUD

Add-ons - Cloud Container Engine

Add-ons Overview; Scheduling and Elasticity Add-ons; Cloud Native Observability Add-ons; Cloud Native AI Add-ons; Container Network Add-ons; Container Storage Add-ons; Container Security Add-ons; Other Add-ons; Add-on Upgrade Checks

Help > Cloud Container Engine > User Guide
Troubleshooting for Pre-upgrade Check Exceptions - Cloud Container Engine

Suite (NVIDIA GPU) Exceptions; Nodes' System Parameters; Residual Package Version Data; Node Commands; Node Swap; NGINX Ingress Controller; Upgrade of Cloud Native Cluster Monitoring; containerd Pod Restart Risks; Key CCE AI Suite (NVIDIA GPU) Parameters; GPU or NPU Pod Rebuild Risks; ELB

Help > Cloud Container Engine > User Guide > Clusters > Upgrading a Cluster
Scheduling Overview - Cloud Container Engine

Priority-based Scheduling
AI performance-based scheduling: Scheduling policies are configured based on the nature and resource usage of AI tasks to increase the throughput of cluster services and improve service performance.

Help > Cloud Container Engine > User Guide > Scheduling
Preparing Virtualized GPU Resources - Cloud Container Engine

CUDA version: CUDA 12.2.0 to 12.8.0
Runtime: containerd
Add-on: The following add-ons must be installed in the cluster:
Volcano Scheduler: 1.10.5 or later
CCE AI Suite (NVIDIA GPU): 2.0.5 or later
Step 1: Enable GPU Virtualization
Both CCE AI Suite (NVIDIA GPU) and Volcano Scheduler

Help > Cloud Container Engine > User Guide > Scheduling > GPU Scheduling > GPU Virtualization
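
The version requirements quoted in the result above can be checked mechanically before GPU virtualization is enabled. The following is a minimal sketch, not part of the CCE documentation: the function names, the dictionary layout, and the way installed versions are supplied are all assumptions made for illustration, and only the version numbers and the containerd requirement come from the snippet.

```python
# Hypothetical helper: only the version thresholds below come from the
# search snippet; everything else is an illustrative assumption.

def parse_version(text):
    """Turn a dotted version such as '1.10.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

# Minimum add-on versions quoted in the snippet.
MIN_ADDON_VERSIONS = {
    "Volcano Scheduler": "1.10.5",
    "CCE AI Suite (NVIDIA GPU)": "2.0.5",
}
# Supported CUDA range quoted in the snippet (inclusive on both ends).
CUDA_RANGE = ("12.2.0", "12.8.0")

def check_prerequisites(installed_addons, cuda_version, runtime):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, minimum in MIN_ADDON_VERSIONS.items():
        found = installed_addons.get(name)
        if found is None:
            problems.append(f"{name} is not installed")
        elif parse_version(found) < parse_version(minimum):
            problems.append(f"{name} {found} is older than {minimum}")
    low, high = CUDA_RANGE
    if not parse_version(low) <= parse_version(cuda_version) <= parse_version(high):
        problems.append(f"CUDA {cuda_version} is outside {low} to {high}")
    if runtime != "containerd":
        problems.append(f"runtime {runtime} is not containerd")
    return problems
```

Comparing versions as integer tuples rather than strings matters here: as strings, "1.9.0" would sort after "1.10.5", which would wrongly accept an older Volcano Scheduler.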
Certificates - Cloud Container Engine

Excellent Trusted AI Cloud - Cloud Native AI Capability Maturity CAICT has confirmed that Huawei cloud native solution passed the L4 tests in "2024 AI Cloud Native Capability Maturity Model" (Q/KXY ACN001) in areas such as heterogeneous resource management, orchestration and scheduling

Help > Cloud Container Engine > Service Overview > Security
Comprehensive Monitoring of DCGM Metrics - Cloud Container Engine

The CCE AI Suite (NVIDIA GPU) add-on, version 2.7.40 or later, is built on NVIDIA DCGM, providing advanced GPU monitoring functionalities.

Help > Cloud Container Engine > User Guide > Scheduling > GPU Scheduling > GPU Monitoring
Recommended GPU Driver Versions for CCE - Cloud Container Engine

Supported GPU Drivers: The list of supported GPU drivers applies only to CCE AI Suite (NVIDIA GPU) v1.2.28 or later. To use the latest GPU driver, upgrade your CCE AI Suite (NVIDIA GPU) to the latest version.

Help > Cloud Container Engine > User Guide > Scheduling > GPU Scheduling > GPU Driver Version
Default GPU Scheduling in Kubernetes - Cloud Container Engine

The CCE AI Suite (NVIDIA GPU) add-on has been installed, with the selected driver matching the GPU model on the node. For details, see CCE AI Suite (NVIDIA GPU).

Help > Cloud Container Engine > User Guide > Scheduling > GPU Scheduling
Overview - Cloud Container Engine

Prerequisites
Item: Supported Version
Cluster version: v1.23.8-r0, v1.25.3-r0, or later
OS: Huawei Cloud EulerOS 2.0 with the kernel version of 5.10 or later
GPU type: Tesla T4 and Tesla V100
Driver version: 535.216.03, 535.54.03, 510.47.03 (EOL), and 470.57.02 (EOL)
NOTE: CCE AI Suite

Help > Cloud Container Engine > User Guide > Scheduling > GPU Scheduling > GPU Virtualization
Upgrading the Driver Version of a GPU Node Using a Node Pool - Cloud Container Engine

The command for obtaining a driver varies depending on the CCE AI Suite (NVIDIA GPU) version.

Help > Cloud Container Engine > User Guide (ME-Abu Dhabi Region) > Scheduling > GPU Scheduling > GPU Driver Version
Upgrading the Driver Version of a GPU Node Using a Node Pool - Cloud Container Engine

The command for obtaining a driver varies depending on the CCE AI Suite (NVIDIA GPU) version.

Help > Cloud Container Engine > User Guide > Scheduling > GPU Scheduling > GPU Driver Version
Complete NPU Allocation - Cloud Container Engine

The CCE AI Suite (Ascend NPU) add-on has been installed. For details, see CCE AI Suite (Ascend NPU).
Creating a Workload with Complete NPU Allocation Enabled: You can create a workload with complete NPU allocation enabled using the console or kubectl.

Help > Cloud Container Engine > User Guide > Scheduling > NPU Scheduling
Pre-upgrade Check - Cloud Container Engine

Suite (NVIDIA GPU) Parameters Check whether the configuration of CCE AI Suite (NVIDIA GPU) in a cluster has been intrusively modified.

Help > Cloud Container Engine > User Guide > Clusters > Upgrading a Cluster > Troubleshooting for Pre-upgrade Check Exceptions
Monitoring GPU Metrics Using DCGM-Exporter - Cloud Container Engine

Enabling DCGM-Exporter Using CCE AI Suite (NVIDIA GPU): Enable the core component DCGM-Exporter using the CCE AI Suite (NVIDIA GPU) add-on. Log in to the CCE console and click the cluster name to access the cluster console.

Help > Cloud Container Engine > Best Practices > Monitoring
Comprehensive Monitoring of NPU Metrics - Cloud Container Engine

The CCE AI Suite (Ascend NPU) add-on has been installed in the cluster and its version is 2.1.55 or later. For details, see CCE AI Suite (Ascend NPU). The Cloud Native Cluster Monitoring add-on has been installed in the cluster and its version is 3.12.1 or later.

Help > Cloud Container Engine > User Guide > Scheduling > NPU Scheduling > NPU Monitoring
Overview - Cloud Container Engine

Cloud Native Heterogeneous Computing Add-ons
Add-on Name: Description
CCE AI Suite (NVIDIA GPU): This add-on supports and manages GPUs in containers. Only NVIDIA drivers are supported.
CCE AI Suite (Ascend NPU): This add-on supports and manages NPUs in containers.

Help > Cloud Container Engine > User Guide > Add-ons
Kubernetes 1.13 (EOM) Release Notes - Cloud Container Engine

Ascend-accelerated nodes (powered by HiSilicon Ascend 310 AI processors) apply to scenarios such as image recognition, video processing, inference computing, and machine learning. The docker baseSize is configurable. Namespace affinity scheduling is supported.

Help > Cloud Container Engine > User Guide > Clusters > Cluster Version Release Notes > Kubernetes Version Release Notes
Kubernetes 1.13 (EOM) Release Notes - Cloud Container Engine

Ascend-accelerated nodes (powered by HiSilicon Ascend 310 AI processors) apply to scenarios such as image recognition, video processing, inference computing, and machine learning. The docker baseSize is configurable. Namespace affinity scheduling is supported.

Help > Cloud Container Engine > Product Bulletin > Product Release Notes > Cluster Versions > Kubernetes Version Release Notes
Configuring Workload Scaling Based on GPU Monitoring Metrics - Cloud Container Engine

The CCE AI Suite (NVIDIA GPU) add-on has been installed in the cluster, and the add-on metrics API is working properly.

Help > Cloud Container Engine > User Guide > Scheduling > GPU Scheduling > GPU Auto Scaling
Node Running - Cloud Container Engine

What Can I Do If Certain Alarms Are Displayed in the GPU Node Events After the CCE AI Suite (NVIDIA GPU) Add-on Is Upgraded? Parent Topic: Node

Help > Cloud Container Engine > FAQs > Node

Total results: 107
