Cluster version: v1.23.8-r0, v1.25.3-r0, or later
OS: Huawei Cloud EulerOS 2.0
GPU type: Tesla T4 and Tesla V100
Driver version: 535.216.03, 470.57.02, 510.47.03, and 535.54.03
Runtime: containerd
Add-ons: The following add-ons must be installed in the cluster: Volcano Scheduler 1.10.5 or later; CCE AI Suite (NVIDIA GPU)
Enabling AI Performance-based Scheduling
In AI and big data collaborative scheduling scenarios, Volcano's Dominant Resource Fairness (DRF) and gang scheduling can be used to improve training performance and resource utilization.
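The sketch below is a minimal Volcano Job showing how gang scheduling is expressed for a distributed training workload; the job name, queue, image, and replica counts are placeholders, and DRF is enabled as a plugin in the volcano-scheduler configuration rather than in the workload manifest itself.

    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    metadata:
      name: tf-training-demo            # hypothetical job name
    spec:
      schedulerName: volcano            # schedule with the Volcano Scheduler add-on
      queue: default
      minAvailable: 4                   # gang scheduling: all 4 workers start together or not at all
      tasks:
        - replicas: 4
          name: worker
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: trainer
                  image: tensorflow/tensorflow:2.15.0   # hypothetical training image
                  resources:
                    requests:
                      cpu: "4"
                      memory: 8Gi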
CCE AI Suite (NVIDIA GPU) (v2.1.8, v2.7.5 or later), Volcano Scheduler (v1.10.5 or later), and CCE Cluster Autoscaler (v1.27.150, v1.28.78, v1.29.41, or later) have been installed in the cluster.
Table 1 lists CCE AI Suite (NVIDIA GPU) exception events and isolation results.
Possible Cause
For even scheduling on virtual GPUs, the cluster version must be compatible with the CCE AI Suite (NVIDIA GPU) add-on version.
When the CCE AI Suite (Ascend NPU) add-on reported information, only the chip logic IDs were updated, while the mapping between the chip logic IDs and NPU IDs remained unchanged.
Volcano provides end users with computing frameworks from multiple domains such as AI, big data, gene sequencing, and rendering. It also offers job scheduling, job management, and queue management for computing applications. Kubernetes typically uses its default scheduler to schedule workloads.
The CCE AI Suite (Ascend NPU) add-on of v2.1.15 or later has been installed in the cluster. For details, see CCE AI Suite (Ascend NPU). An NPU driver has been installed on the NPU nodes, and the driver version is 23.0.1 or later. If an earlier driver is installed, uninstall the original NPU driver before installing a supported version.
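As a rough illustration of how an NPU is consumed once the add-on and driver are in place, the pod below requests a single Ascend chip; the pod name, image, and the extended resource name are assumptions and should be checked against the CCE AI Suite (Ascend NPU) documentation for your chip type.

    apiVersion: v1
    kind: Pod
    metadata:
      name: npu-demo                          # hypothetical name
    spec:
      containers:
        - name: infer
          image: ascend-infer:latest          # hypothetical inference image
          resources:
            requests:
              huawei.com/ascend-310: 1        # assumed extended resource name; varies by Ascend chip type
            limits:
              huawei.com/ascend-310: 1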
Offline jobs: Such jobs run for a short time, have high computing requirements, and can tolerate high latency, such as AI and big data services.
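As a hedged sketch of how an offline job might be marked for hybrid online/offline scheduling, the pod below carries a QoS-level annotation; the annotation key and value are assumptions drawn from Volcano-based colocation setups, not a confirmed CCE interface, and the name and image are placeholders.

    apiVersion: v1
    kind: Pod
    metadata:
      name: offline-batch-demo                # hypothetical name
      annotations:
        volcano.sh/qos-level: "-1"            # assumed annotation marking this as an offline (low-priority) job
    spec:
      schedulerName: volcano
      containers:
        - name: batch
          image: spark-batch:latest           # hypothetical batch-processing image
          resources:
            requests:
              cpu: "2"
              memory: 4Gi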
Notes and Constraints
To support Kubernetes' default GPU scheduling on GPU nodes, the CCE AI Suite (NVIDIA GPU) add-on must be v2.0.10 or later, and the Volcano Scheduler add-on must be v1.10.5 or later.
Example of Shared GPU Scheduling
Use kubectl to access the cluster.
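As a minimal sketch of a workload that consumes a GPU through Kubernetes' default GPU scheduling, the pod below requests one whole GPU; the pod name and image are placeholders. Shared (virtualized) GPU scheduling uses add-on-specific resource names for sliced GPU memory and compute, so check the add-on documentation for the exact resource keys before adapting this example.

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-demo                          # hypothetical name
    spec:
      containers:
        - name: cuda
          image: nvidia/cuda:12.2.0-base-ubuntu22.04   # hypothetical CUDA image
          command: ["sleep", "infinity"]
          resources:
            limits:
              nvidia.com/gpu: 1               # one dedicated GPU via default GPU scheduling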
The CCE AI Suite (NVIDIA GPU) add-on has been installed in the cluster, and the add-on version is 2.0.10 or later. At least one NVIDIA GPU node is available in the cluster.
Therefore, the VPC network model applies to scenarios that have high requirements on performance, such as AI computing and big data computing.
(Optional) GPU Quota: Configurable only when the cluster contains GPU nodes and the CCE AI Suite (NVIDIA GPU) add-on has been installed.
Do not use: No GPU will be used.
GPU card: The GPU is dedicated for the container.
Add-ons: CCE AI Suite (Ascend NPU), CCE AI Suite (NVIDIA GPU), Cloud Native Cluster Monitoring, Cloud Native Log Collection, and Grafana. Monitoring Center: the Cloud Native Cluster Monitoring add-on of 3.12.0 or later must be installed in the cluster.
AI computing performance is 3 to 5 times higher when NUMA-based BMSs and high-speed InfiniBand network cards are used.
Highly Available and Secure
HA: CCE supports three control plane nodes on the cluster management plane. These nodes run in different AZs to ensure cluster HA.