Deploying a Scalable HPC Cluster with Slurm
Version: 2.0.0
Last Updated: April 2024
Built By: Huawei Cloud
Time Required for Deployment: About 40 minutes
Time Required for Uninstallation: About 10 minutes

Solution Overview
This solution helps you quickly set up a scalable HPC environment on Huawei Cloud based on the open-source Slurm scheduler and Huawei's open-source Gearbox program. Slurm runs in "configless" mode on the cloud servers that act as compute nodes. Gearbox interconnects with Huawei Cloud Auto Scaling and Cloud Eye: it monitors the job status of the Slurm cluster and scales cloud servers in or out in real time. Newly created cloud servers are automatically registered with and added to the cluster, and cloud servers to be removed are automatically deregistered from the cluster before they are destroyed.
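
For illustration, the following minimal Java sketch shows the kind of signal a Gearbox-style monitor can derive from the Slurm job queue: it counts pending and running jobs with squeue and combines them into a single workload value. The class name and the weighting are assumptions made for this example, not the actual Gearbox implementation.

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Illustrative sketch only: derives a simple workload value from the Slurm
// job queue, similar in spirit to the custom metric Gearbox reports to Cloud Eye.
public class WorkloadProbe {

    // Run a command and count the lines it prints to standard output.
    private static int countLines(String... command) throws Exception {
        Process p = new ProcessBuilder(command).start();
        int lines = 0;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            while (r.readLine() != null) {
                lines++;
            }
        }
        p.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Pending (PD) and running (R) jobs, one job ID per line, no header.
        int pending = countLines("squeue", "-h", "-t", "PD", "-o", "%i");
        int running = countLines("squeue", "-h", "-t", "R", "-o", "%i");

        // A simple indicator: jobs waiting for resources dominate the value, so an
        // Auto Scaling alarm policy can scale out while the value stays high.
        int workload = pending * 2 + running;

        // A Gearbox-style monitor would report a value like this to Cloud Eye;
        // here it is only printed.
        System.out.println("pending=" + pending + " running=" + running + " workload=" + workload);
    }
}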
Solution Architecture
This solution helps you quickly set up a scalable HPC environment on Huawei Cloud.

Solution Description
- Create two Linux Elastic Cloud Servers (ECSs), install the open-source Slurm software, install the Gearbox program on the scheduling node, and configure the Java environment.
- Create one Elastic IP (EIP) for internal and external communication.
- Create security groups and configure rules that control access to the ECSs and secure the ECS environment.
- Use Image Management Service (IMS) to prepare the initialization environment for compute nodes created during auto scaling.
- Use Auto Scaling (AS) to create and configure an auto scaling group and to define scaling policies that automatically scale cluster resources in or out.
- Use Cloud Eye (CES) for resource monitoring. The Gearbox program monitors the job status, calculates a workload value, and reports it to Cloud Eye as a custom metric (see the reporting sketch after this list).
- Use Scalable File Service (SFS) to mount SFS file systems to the ECSs to provide shared file storage for the cluster.
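
As a rough sketch of the metric-reporting step mentioned above, the following Java code posts a workload value to Cloud Eye as a custom metric. It assumes the Cloud Eye "add monitoring data" endpoint (POST /V1.0/{project_id}/metric-data) with token authentication; the endpoint, project ID, token, namespace, and dimension values are placeholders to replace, and this is not the actual Gearbox code.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative sketch: report one custom metric sample to Cloud Eye.
public class CloudEyeReporter {

    // Placeholders: replace with your region endpoint, project ID, and a valid IAM token.
    private static final String ENDPOINT = "https://ces.cn-north-4.myhuaweicloud.com";
    private static final String PROJECT_ID = "<project_id>";
    private static final String IAM_TOKEN = "<X-Auth-Token>";

    public static void main(String[] args) throws Exception {
        int workload = 42; // value computed from the Slurm job queue (see the earlier sketch)

        // One metric sample; namespace and dimension names are example values only.
        String body = "[{"
                + "\"metric\":{"
                + "\"namespace\":\"SLURM.CLUSTER\","
                + "\"metric_name\":\"workload\","
                + "\"dimensions\":[{\"name\":\"autoscaling_group\",\"value\":\"slurm-compute\"}]"
                + "},"
                + "\"ttl\":172800,"
                + "\"collect_time\":" + System.currentTimeMillis() + ","
                + "\"value\":" + workload + ","
                + "\"unit\":\"count\""
                + "}]";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT + "/V1.0/" + PROJECT_ID + "/metric-data"))
                .header("Content-Type", "application/json")
                .header("X-Auth-Token", IAM_TOKEN)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}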