Using OBS+SFS Turbo for Storage Acceleration in AI Scenarios
Sections: Solution Overview; Resource Planning and Costs; Process; Implementation Procedure; FAQs
When storage read/write bandwidth no longer meets AI training needs, for example when checkpoint saves and loads take longer or slow dataset loading holds up training, you can scale up the file system's performance to reduce data loading time.
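As a rough way to judge whether bandwidth is the bottleneck, the time to move a checkpoint or dataset is approximately its size divided by the sustained bandwidth. The sketch below only illustrates that arithmetic; the function name, the 100 GB checkpoint, and the two bandwidth levels are assumptions for illustration, and real throughput also depends on file sizes, concurrency, and the file system type.

```python
def estimate_transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Return the idealized time to read or write size_gb at a sustained bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


# Hypothetical example: a 100 GB checkpoint at two assumed bandwidth levels.
for bandwidth in (2.0, 8.0):
    seconds = estimate_transfer_seconds(100.0, bandwidth)
    print(f"{bandwidth} GB/s -> {seconds:.1f} s")
```

If the estimated time is a large share of each training step or checkpoint interval, higher file system bandwidth is likely to help.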
Managing SFS Turbo+OBS Storage Interworking
Overview: In scenarios such as AI training and inference, high-performance data preprocessing, EDA, rendering, and simulation, you can use SFS Turbo file systems to speed up access to your data in OBS buckets.
Configuring SFS Turbo and OBS Interworking
SFS Turbo HPC file systems can access objects stored in OBS buckets seamlessly. You can specify an SFS Turbo interworking directory and associate it with an OBS bucket. Log in to the SFS console. In the left navigation pane, choose SFS Turbo
Configuring the SFS Turbo Data Eviction Policy
After an OBS bucket is added as the storage backend of an SFS Turbo HPC file system, you are advised to configure a cold data eviction duration. Once configured, SFS Turbo will automatically delete files that have not been accessed within
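The selection rule behind a cold data eviction duration can be modeled in a few lines. This is a minimal sketch of the idea only, not SFS Turbo's implementation: the `CachedFile` type, the `select_cold_files` helper, and the use of hours as the unit are all hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedFile:
    path: str
    last_access: float  # last access time, in seconds since the epoch


def select_cold_files(files, eviction_hours, now=None):
    """Return the paths of files not accessed within the eviction duration."""
    now = time.time() if now is None else now
    cutoff = now - eviction_hours * 3600
    return [f.path for f in files if f.last_access < cutoff]


# Hypothetical example with a fixed clock so the result is reproducible.
now = 1_000_000.0
files = [
    CachedFile("ckpt/old.pt", last_access=now - 48 * 3600),   # idle for 48 hours
    CachedFile("data/hot.bin", last_access=now - 1 * 3600),   # idle for 1 hour
]
print(select_cold_files(files, eviction_hours=24, now=now))
```

A shorter duration frees file system capacity sooner, at the cost of evicting files that a later epoch may need to read again.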
Configuring Auto Data Export from SFS Turbo to OBS
After auto export is configured, checkpoint files periodically written to the SFS Turbo file system during training will be automatically exported to the OBS bucket for long-term storage. The asynchronous auto export does not interrupt
Uploading Data to OBS and Preloading the Data to SFS Turbo
Uploading Data to OBS
An OBS bucket has been created by referring to Creating a Bucket. obsutil has been installed by referring to Downloading and Installing obsutil.
Visit the ImageNet official website at http://image-net.org
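Once the dataset is downloaded, the upload can be scripted around obsutil's `cp` command. The sketch below only builds and prints the command rather than running it. The local directory, bucket name, and object prefix are placeholders, and the helper `build_obsutil_upload` is hypothetical. The `-r` (recursive) and `-f` (force, no prompt) options are assumed from obsutil's documented `cp` usage and should be checked against the installed version.

```python
import shlex


def build_obsutil_upload(local_dir, bucket, prefix=""):
    """Build an obsutil command that recursively uploads local_dir to a bucket."""
    # Drop the trailing slash when no prefix is given: obs://bucket
    target = f"obs://{bucket}/{prefix}".rstrip("/")
    return ["obsutil", "cp", local_dir, target, "-r", "-f"]


# Placeholder names; replace them with your own directory, bucket, and prefix.
cmd = build_obsutil_upload("./imagenet", "my-training-bucket", "datasets/imagenet")
print(shlex.join(cmd))
```

To run the upload, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where obsutil is installed and configured with credentials.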
Configuring Network Passthrough Between ModelArts and SFS Turbo
Creating an Agency to Authorize ModelArts to Use SFS Turbo
Log in to the IAM console as the IAM administrator. In the navigation pane on the left, choose Permissions > Policies/Roles. Configure a custom policy for calling
Parent topic: Using OBS+SFS Turbo for Storage Acceleration in AI Scenarios
Creating a Training Job
Create a ModelArts training job based on the SFS Turbo shared file storage. Log in to the ModelArts console. In the navigation pane, choose Training Management > Training Jobs. Click Create Training Job in the upper right corner. On the displayed page, set
Training: Uploading Data to OBS and Preloading the Data to SFS Turbo; Creating a Training Job
Parent topic: Implementation Procedure
Implementation Procedure: Creating Resources; Basic Configurations; Training; Routine O&M
Basic Configurations: Configuring Network Passthrough Between ModelArts and SFS Turbo; Configuring SFS Turbo and OBS Interworking; Configuring Auto Data Export from SFS Turbo to OBS; Configuring the SFS Turbo Data Eviction Policy
Together, the compute, storage, and network performance of the AI infrastructure ensure the balanced development of AI compute power.
Creating Resources
This best practice uses a VPC, an SFS Turbo HPC file system, an OBS bucket, and a ModelArts resource pool. For optimal acceleration performance, you are advised to place the SFS Turbo HPC file system and the ModelArts resource pool in the same region and AZ
Performance-Enhanced. Maximum bandwidth: 2 GB/s; maximum IOPS: 100,000. Enhanced bandwidth, IOPS, and capacity. Low latency, high IOPS, high bandwidth, and tenant exclusive. Workloads dealing with massive small files, and latency-sensitive and bandwidth-demanding workloads, such as image rendering, AI
100,000. Latency: 1 to 3 ms; maximum capacity: 320 TB. Enhanced bandwidth, IOPS, and capacity. Low latency, high IOPS, high bandwidth, and tenant exclusive. Workloads dealing with massive small files, and latency-sensitive and bandwidth-demanding workloads, such as image rendering, AI