Symptom
On EulerOS 2.9, manually installing a GPU driver without the CCE AI Suite (NVIDIA GPU) add-on fails with an error similar to the following:
ERROR: Unable to find the kernel source tree for the currently running kernel.
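Before installing a driver by hand, you can check whether the kernel source tree the installer needs is present. The sketch below assumes the common Linux layout in which the running kernel's headers are reachable at /lib/modules/<release>/build or /lib/modules/<release>/source; the helper names are hypothetical. If neither directory exists, the kernel development package matching `uname -r` is typically not installed.

```python
import os
from pathlib import Path
from typing import List, Optional


def kernel_source_candidates(release: str, modules_root: str = "/lib/modules") -> List[Path]:
    """Directories commonly checked for the running kernel's source tree (assumed layout)."""
    base = Path(modules_root) / release
    return [base / "build", base / "source"]


def find_kernel_source(release: Optional[str] = None, modules_root: str = "/lib/modules") -> Optional[Path]:
    """Return the first existing kernel source directory, or None if none exists."""
    if release is None:
        # os.uname() is POSIX-only; .release matches the output of `uname -r`.
        release = os.uname().release
    for candidate in kernel_source_candidates(release, modules_root):
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    found = find_kernel_source()
    if found is None:
        print("Kernel source tree not found; a manual driver install would likely fail.")
    else:
        print(f"Kernel source tree found at {found}")
```

The check only inspects the filesystem; it does not install anything, so it is safe to run on a node before deciding whether to use the add-on instead.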
CCE AI Suite (NVIDIA GPU) has been installed in the cluster. For details, see CCE AI Suite (NVIDIA GPU). The add-on version must meet the following requirements: If the cluster version is 1.27 or earlier, the add-on version must be 2.1.41 or later.
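The version rule above can be expressed as a small check. This is a sketch: the function names are hypothetical, and because this excerpt only states the requirement for clusters of v1.27 or earlier, the function returns None (not covered) for later clusters rather than guessing a rule.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn strings such as 'v1.27' or '2.1.41' into integer tuples for comparison."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


def gpu_addon_meets_requirement(cluster_version: str, addon_version: str) -> Optional[bool]:
    """Apply the documented rule: clusters of v1.27 or earlier need add-on 2.1.41 or later.

    Only major.minor of the cluster version is compared. Returns None for
    clusters later than v1.27, which this excerpt does not cover.
    """
    if parse_version(cluster_version)[:2] <= (1, 27):
        return parse_version(addon_version) >= (2, 1, 41)
    return None


print(gpu_addon_meets_requirement("v1.27", "2.1.41"))
print(gpu_addon_meets_requirement("v1.25", "2.1.40"))
```

Comparing integer tuples avoids the string-comparison trap in which "2.1.9" would sort after "2.1.41".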
History

Table 6 Release history

| Add-on Version | Supported Cluster Version | New Feature | Community Version |
|---|---|---|---|
| 1.2.3 | v1.27, v1.28, v1.29, v1.30, v1.31, v1.32 | Clusters of v1.32 are supported. | v1.2.2 |
| 1.2.2 | v1.27, v1.28, v1.29, v1.30, v1.31 | The Kuberay add-on is now available. | v1.2.2 |
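To pick an add-on version for a given cluster, the rows of Table 6 can be treated as a lookup table. The sketch below transcribes only the two rows shown; the names `RELEASES` and `addon_versions_for_cluster` are hypothetical.

```python
from typing import Dict, List, Tuple

# Add-on version -> supported cluster versions, transcribed from Table 6.
RELEASES: Dict[str, Tuple[str, ...]] = {
    "1.2.3": ("v1.27", "v1.28", "v1.29", "v1.30", "v1.31", "v1.32"),
    "1.2.2": ("v1.27", "v1.28", "v1.29", "v1.30", "v1.31"),
}


def addon_versions_for_cluster(cluster_version: str) -> List[str]:
    """Return add-on versions that list the cluster version, oldest first."""
    matches = [v for v, clusters in RELEASES.items() if cluster_version in clusters]
    # Sort numerically so that, for example, 1.2.10 would follow 1.2.3.
    return sorted(matches, key=lambda v: tuple(int(p) for p in v.split(".")))


print(addon_versions_for_cluster("v1.32"))
print(addon_versions_for_cluster("v1.27"))
```

For a v1.32 cluster only 1.2.3 qualifies, matching the "Clusters of v1.32 are supported" entry.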
The command for obtaining a driver varies depending on the CCE AI Suite (NVIDIA GPU) add-on version.
Queue Resource Management (capacity plugin)
In a Kubernetes cluster, multiple teams or services (such as AI training, big data analysis, and online services) need to share compute resources, and their jobs have different resource requirements and priorities.
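The queue-sharing idea can be illustrated with a simplified model. This sketch assumes each queue has a "deserved" share it is entitled to under contention and a "capability" ceiling it may never exceed; those names mirror fields in Volcano's Queue spec, but the logic below is a hypothetical single-resource illustration, not the capacity plugin's actual algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Queue:
    name: str
    deserved: float    # share the queue is entitled to when the cluster is busy
    capability: float  # hard upper bound the queue may never exceed
    allocated: float = 0.0


def can_admit(queue: Queue, request: float, cluster_idle: float) -> bool:
    """Admit a job if the queue stays within its capability and idle capacity exists."""
    return queue.allocated + request <= queue.capability and request <= cluster_idle


def reclaim_candidates(queues: List[Queue]) -> List[str]:
    """Queues borrowing beyond their deserved share may have resources reclaimed."""
    return [q.name for q in queues if q.allocated > q.deserved]


training = Queue("ai-training", deserved=40, capability=80, allocated=60)
online = Queue("online-service", deserved=40, capability=60, allocated=10)
print(can_admit(online, request=20, cluster_idle=30))    # True: 30 <= 60 and 20 <= 30
print(can_admit(training, request=30, cluster_idle=30))  # False: 90 exceeds capability 80
print(reclaim_candidates([training, online]))            # ['ai-training']
```

Separating the guaranteed share from the ceiling is what lets an idle team's resources be lent out and later reclaimed when that team submits work.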
Support for NPU resource scheduling
To use this capability, install the CCE AI Suite (Ascend NPU) add-on.
CCE has released a new version of the CCE AI Suite (NVIDIA GPU) add-on to fix this vulnerability. For details, see CCE AI Suite (NVIDIA GPU) Release History.
CCE has released a new version of the CCE AI Suite (NVIDIA GPU) add-on to fix these vulnerabilities. Upgrade the add-on to the fixed version. For details, see CCE AI Suite (NVIDIA GPU) Release History.
Add-on Instance Parameters: CoreDNS, CCE Container Storage (Everest), CCE Node Problem Detector, Kubernetes Dashboard, CCE Cluster Autoscaler, NGINX Ingress Controller, Kubernetes Metrics Server, CCE Advanced HPA, CCE Cloud Bursting Engine for CCI, CCE AI Suite (NVIDIA GPU), CCE AI Suite (Ascend
Volcano is a versatile, scalable, reliable platform for running big data and AI jobs. It supports a wide range of computing frameworks, including those for AI, big data, gene sequencing, and rendering tasks.
Supported GPU Drivers
The list of supported GPU drivers applies only to CCE AI Suite (NVIDIA GPU) v1.2.28 or later. To use the latest GPU driver, upgrade CCE AI Suite (NVIDIA GPU) to the latest version.
Problem Detector Release History, Kubernetes Dashboard Release History, CCE Cluster Autoscaler Release History, NGINX Ingress Controller Release History, Kubernetes Metrics Server Release History, CCE Advanced HPA Release History, CCE Cloud Bursting Engine for CCI Release History, CCE AI
You have installed CCE AI Suite (NVIDIA GPU) or CCE AI Suite (Ascend NPU) in the cluster. For details, see CCE AI Suite (NVIDIA GPU) and CCE AI Suite (Ascend NPU). The NPU driver version must be later than 23.0.
Deploying and Using Kubeflow in a CCE Cluster: Deploying Kubeflow, Training a TensorFlow Model, Using Kubeflow and Volcano to Train an AI Model
When installing the CCE AI Suite (NVIDIA GPU) add-on of v2.7.2 or later, you can configure GPU virtualization for node pools.
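The two version thresholds quoted in this section for CCE AI Suite (NVIDIA GPU), v1.2.28 or later for the supported GPU driver list and v2.7.2 or later for node pool GPU virtualization, can be checked together. The feature labels and function names below are hypothetical.

```python
from typing import Dict, List, Tuple

# Minimum CCE AI Suite (NVIDIA GPU) versions quoted in this section.
FEATURE_MIN_VERSION: Dict[str, Tuple[int, ...]] = {
    "supported_gpu_driver_list": (1, 2, 28),
    "node_pool_gpu_virtualization": (2, 7, 2),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn strings such as 'v2.7.2' into integer tuples for comparison."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


def available_features(addon_version: str) -> List[str]:
    """Return the features whose minimum add-on version is met, sorted by name."""
    current = parse_version(addon_version)
    return sorted(name for name, minimum in FEATURE_MIN_VERSION.items() if current >= minimum)


print(available_features("2.1.41"))  # ['supported_gpu_driver_list']
print(available_features("v2.7.2"))  # ['node_pool_gpu_virtualization', 'supported_gpu_driver_list']
```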
Add-ons Overview, Scheduling and Elasticity Add-ons, Cloud Native Observability Add-ons, Cloud Native AI Add-ons, Container Network Add-ons, Container Storage Add-ons, Container Security Add-ons, Other Add-ons
Volcano Scheduling: Volcano Scheduling Overview, Scheduling Workloads, Resource Usage-based Scheduling, Priority-based Scheduling, AI Performance-based Scheduling, Queue Scheduling, NUMA Affinity Scheduling, Application Scaling Priority Policies
What Can I Do If a Pod Cannot Be Started After the CCE AI Suite (Ascend NPU) Add-on Is Upgraded from 1.x.x to 2.x.x?
How Can I Drain a GPU Node After Upgrading or Rolling Back the CCE AI Suite (NVIDIA GPU) Add-on?
Why Am I Unable to Install an NVIDIA Driver on EulerOS 2.9?