Cloud Container Engine (CCE) - Workload Exception: Pod Scheduling Failure: Troubleshooting Approach

Time: 2024-05-20 10:01:17

Troubleshooting Approach

Identify the cause of the failure from the event message reported for the pod, as shown in Table 1.

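Table 1 Pod scheduling failures

| Event message | Cause and solution |
| --- | --- |
| `no nodes available to schedule pods.` | No nodes are available in the cluster. Check item 1: Whether the cluster has no available nodes |
| `0/2 nodes are available: 2 Insufficient cpu.` or `0/2 nodes are available: 2 Insufficient memory.` | Node resources (CPU, memory) are insufficient. Check item 2: Whether node resources (CPU, memory, and so on) are sufficient |
| `0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) didn't match pod affinity rules, 1 node(s) didn't match pod affinity/anti-affinity.` | The node and pod affinity configurations are mutually exclusive, so no node satisfies the pod's requirements. Check item 3: Check the workload's affinity configuration |
| `0/2 nodes are available: 2 node(s) had volume node affinity conflict.` | The EVS disk volume mounted to the pod is not in the same availability zone as the node. Check item 4: Whether the mounted volume and the node are in the same availability zone |
| `0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.` | The node has taints that the pod does not tolerate, so the pod cannot be scheduled to it. Check item 5: Check the pod's taint tolerations |
| `0/7 nodes are available: 7 Insufficient ephemeral-storage.` | The node's ephemeral storage is insufficient. Check item 6: Check ephemeral volume usage |
| `0/1 nodes are available: 1 everest driver not found at node` | The everest-csi-driver on the node is not in the Running state. Check item 7: Check whether the everest add-on is working properly |
| `Failed to create pod sandbox: ... Create more free space in thin pool or use dm.min_free_space option to change behavior` | The node's thin pool space is insufficient. Check item 8: Check whether the node's thin pool space is sufficient |
| `0/1 nodes are available: 1 Too many pods.` | The number of pods scheduled to the node exceeds the upper limit. Check item 9: Check whether too many pods are scheduled to the node |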
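The lookup in Table 1 can be sketched as a small script that matches an event message against the key phrases in the table and returns the relevant check item numbers. This is only an illustrative sketch: the names `PATTERNS` and `classify_event` are hypothetical and not part of CCE or Kubernetes, and the matching assumes the event text contains the same phrases as the table. Because one scheduling event can list several reasons across different nodes, the function returns every matching check item rather than just the first.

```python
# Hypothetical helper (not a CCE or Kubernetes API): maps a scheduling
# event message to the check item numbers listed in Table 1.
# Each entry pairs a key phrase from the table with its check item number.
PATTERNS = [
    ("no nodes available to schedule pods", 1),
    ("Insufficient cpu", 2),
    ("Insufficient memory", 2),
    ("didn't match node selector", 3),
    ("didn't match pod affinity", 3),  # covers "rules" and "/anti-affinity"
    ("volume node affinity conflict", 4),
    ("had taints that the pod didn't tolerate", 5),
    ("Insufficient ephemeral-storage", 6),
    ("everest driver not found at node", 7),
    ("Create more free space in thin pool", 8),
    ("Too many pods", 9),
]


def classify_event(message):
    """Return the sorted, de-duplicated check item numbers for a message.

    Matching is a case-insensitive substring test, so a message that lists
    several reasons yields several check items. An empty list means the
    message matches nothing in Table 1.
    """
    lowered = message.lower()
    return sorted({item for phrase, item in PATTERNS if phrase.lower() in lowered})


if __name__ == "__main__":
    sample = (
        "0/3 nodes are available: 1 Insufficient memory, "
        "2 node(s) had taints that the pod didn't tolerate."
    )
    for item in classify_event(sample):
        print(f"See check item {item}")
```

In practice the message would come from the events shown for the pending pod; how those events are collected is outside the scope of this sketch.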

support.huaweicloud.com/cce_faq/cce_faq_00098.html