To avoid repeated loading, the platform allows the model package to be loaded from the local storage space of the node in the resource pool and keeps the loaded files valid even when the service is stopped or restarted (using the hash value to ensure data consistency).
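As a rough illustration of how such a hash check keeps a cached copy consistent, consider the following Python sketch (the helper names and the SHA-256 choice are assumptions for illustration; the platform's actual mechanism is internal):

import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Compute the SHA-256 digest of a file, reading it in chunks so that
    # large model packages do not need to fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def cache_is_valid(local_copy: Path, expected_hash: str) -> bool:
    # Reuse the locally cached model package only if its digest still
    # matches the expected value; otherwise it must be downloaded again.
    return local_copy.exists() and sha256_of(local_copy) == expected_hash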
Server Model: Only Ascend Snt9b and Ascend Snt9b23 are supported.
Type: Select Single node or Integrated rack, or search for a specific node by keyword.
Diagnosis Item: Select Parameter Plane Network Diagnosis, Ascend Device Diagnosis, or both.
Server Model: Only Ascend Snt9b and Ascend Snt9b23 are supported.
Type: Select Single node or Integrated rack, or search for a specific node by keyword.
Test Case: Select any of the following pressure test cases.
Server Model: Only Ascend Snt9b and Ascend Snt9b23 are supported.
Type: Select Single node or Integrated rack, or search by keyword.
Select the target node to be upgraded in the node list (batch selection is supported) and click OK.
Why Can I Leave the IP Address of the Master Node Blank for DDP?
The --init_method parameter defined by parser.add_argument('--init_method', default=None, help='tcp_port') carries the IP address and port number of the master node, which the platform fills in automatically.
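For reference, a minimal sketch of how a training script might consume the injected value (assuming PyTorch torch.distributed with the NCCL backend; reading WORLD_SIZE and RANK from environment variables is an assumption here, not documented platform behavior):

import argparse
import os
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument('--init_method', default=None, help='tcp_port')
args, _ = parser.parse_known_args()

# The platform injects a value such as tcp://<master-ip>:<port>, so the
# job code never hard-codes the master address itself.
dist.init_process_group(backend='nccl',
                        init_method=args.init_method,
                        world_size=int(os.environ.get('WORLD_SIZE', '1')),  # assumption
                        rank=int(os.environ.get('RANK', '0')))              # assumption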
import torch
import torch.backends.cudnn as cudnn
import torchvision.models as models
import horovod.torch as hvd

torch.cuda.set_device(hvd.local_rank())
cudnn.benchmark = True

# Set up a standard model.
model = getattr(models, args.model)()

# By default, Adasum does not need the learning rate scaled up.
lr_scaler = hvd.size() if not args.use_adasum else 1

if args.cuda:
    # Move the model to the GPU.
    model.cuda()
Select the Synchronization for Existing Nodes (labels and taints) or Synchronization for Existing Nodes (labels) check box to apply the changes to existing nodes as well. The updated resource tag information in the node pool is then synchronized to its nodes.
Table 3 Parameters for resource configurations
Server: Server name, which can contain 1 to 64 characters, including letters, digits, hyphens (-), and underscores (_).
CAUTION: The server name recorded in the order will not be changed.
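If you want to validate a name before submitting it, the stated naming rule maps to a simple check; a minimal sketch (the pattern merely encodes the rule above and is not a platform API):

import re

# 1 to 64 characters: letters, digits, hyphens, and underscores.
SERVER_NAME = re.compile(r'[A-Za-z0-9_-]{1,64}')

def is_valid_server_name(name: str) -> bool:
    return bool(SERVER_NAME.fullmatch(name))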
Boot Command: /home/ma-user/miniconda3/bin/python ${MA_JOB_DIR}/demo-code/pytorch-verification.py, where demo-code (customizable) is the last-level directory of the OBS path.
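The verification script itself can be minimal; a sketch of what pytorch-verification.py might contain (illustrative content, any script that exercises torch will do):

import torch

# Confirm that torch imports and basic tensor operations work.
x = torch.randn(5, 3)
print(x)

# Fall back to the CPU if no CUDA-visible device is available.
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
y = torch.randn(5, 3).to(device)
print('Tensor created on', device)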
3 Preparing an Image Server
Obtain a Linux x86_64 server running Ubuntu 18.04.
Software Versions Required by Different Models
A resource pool for elastic clusters can use either Elastic Bare Metal Servers (BMSs) or Elastic Cloud Servers (ECSs) as nodes. Each node model has its own operating system (OS) and compatible CCE cluster versions.
Figure 14 RoCE test result (receive end)
Figure 15 RoCE test result (server)
If a RoCE bandwidth test has already been started for a NIC, the following error message is displayed when the task is started again.
npu_opt_media_snr_lane0 (NPU Optical Module Channel 0 Optical SNR): The signal-to-noise ratio (SNR) on the media (optical) side of channel 0 in the NPU optical module. Unit: dB. Value: natural number. Dimensions: instance_id, npu. Supported versions: telescope 2.7.5.9 or later.
npu_opt_media_snr_lane1 (NPU Optical Module Channel 1 Optical SNR): The signal-to-noise ratio (SNR) on the media (optical) side of channel 1 in the NPU optical module. Unit: dB. Value: natural number. Dimensions: instance_id, npu. Supported versions: telescope 2.7.5.9 or later.
Changing or Resetting the Lite Server OS
Scenario
You can change or reset the OS of a Lite Server node if a BMS is used. Change the OS in either of the following ways:
(Recommended) Change or reset the OS on the server page of the ModelArts console.
Change the OS on the BMS console.
In the navigation pane, choose Model Training > Training Jobs. In the job list, click Export to export the details of training jobs within a specified time range as an Excel file. A maximum of 200 rows can be exported.
Figure 2 Creating a custom image for a model (scenario 2)
Scenario 3: The preset image does not meet the software environment requirements, and you need to import a model package. The new image is larger than 35 GB and must be created on a server such as an ECS.
Model parallelism uses AllReduce communication, while MoE expert parallelism uses all-to-all communication. Both require high network bandwidth between processing units (PUs).
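To make the two communication patterns concrete, here is a minimal torch.distributed sketch (it assumes the process group is already initialized, for example with NCCL, and that the chunks exchanged are equally sized; shapes and names are illustrative):

from typing import List
import torch
import torch.distributed as dist

def gradient_allreduce(grad: torch.Tensor) -> torch.Tensor:
    # AllReduce: every rank contributes its tensor and receives the sum,
    # the pattern used for model-parallel gradient synchronization.
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)
    return grad

def expert_exchange(chunks: List[torch.Tensor]) -> List[torch.Tensor]:
    # all-to-all: rank i sends chunk j to rank j and receives chunk i from
    # every rank, the pattern used to route tokens to MoE experts.
    received = [torch.empty_like(t) for t in chunks]
    dist.all_to_all(received, chunks)
    return received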
For the 300IDuo model, set is_300_iduo to True.
Multiple GPUs on one server work together to speed up training using data parallelism.
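A minimal single-node data-parallel sketch in PyTorch (the model and batch are placeholders; torch.nn.DataParallel is shown for brevity, though DistributedDataParallel is generally preferred in practice):

import torch
import torch.nn as nn

model = nn.Linear(128, 10)  # placeholder model
if torch.cuda.device_count() > 1:
    # Replicate the model across all visible GPUs; each replica
    # processes a slice of every input batch.
    model = nn.DataParallel(model)
model = model.to('cuda' if torch.cuda.is_available() else 'cpu')

batch = torch.randn(64, 128).to(next(model.parameters()).device)
logits = model(batch)   # scatter, parallel forward, gather
print(logits.shape)     # torch.Size([64, 10])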