云服务器内容精选

  • 基于ModelArts预置镜像提交训练作业 指定命令行options参数提交训练作业 ma-cli ma-job submit --code-dir obs://your-bucket/mnist/code/ \ --boot-file main.py \ --framework-type PyTorch \ --working-dir /home/ma-user/modelarts/user-job-dir/code \ --framework-version pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64 \ --data-url obs://your-bucket/mnist/dataset/MNIST/ \ --log-url obs://your-bucket/mnist/logs/ \ --train-instance-type modelarts.vm.cpu.8u \ --train-instance-count 1 \ -q 使用预置镜像的train.yaml样例:
  • 基于 自定义镜像 创建训练作业 指定命令行options参数提交训练作业 ma-cli ma-job submit --image-url atelier/pytorch_1_8:pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64-20220926104358-041ba2e \ --code-dir obs://your-bucket/mnist/code/ \ --user-command "export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH && cd /home/ma-user/modelarts/user-job-dir/code && /home/ma-user/anaconda3/envs/PyTorch-1.8/bin/python main.py" \ --data-url obs://your-bucket/mnist/dataset/MNIST/ \ --log-url obs://your-bucket/mnist/logs/ \ --train-instance-type modelarts.vm.cpu.8u \ --train-instance-count 1 \ -q 使用自定义镜像的train.yaml样例: # .ma/train.yaml样例(自定义镜像)image-url: atelier/pytorch_1_8:pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64-20220926104358-041ba2euser-command: export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH && cd /home/ma-user/modelarts/user-job-dir/code && /home/ma-user/anaconda3/envs/PyTorch-1.8/bin/python main.pytrain-instance-type: modelarts.vm.cpu.8utrain-instance-count: 1data-url: obs://your-bucket/mnist/dataset/MNIST/code-dir: obs://your-bucket/mnist/code/log-url: obs://your-bucket/mnist/logs/##[Optional] Uncomment to set uid when use custom image modeuid: 1000##[Optional] Uncomment to upload output file/dir to OBS from training platformoutput: - name: output_dir obs_path: obs://your-bucket/mnist/output1/##[Optional] Uncomment to download input file/dir from OBS to training platforminput: - name: data_url obs_path: obs://your-bucket/mnist/dataset/MNIST/##[Optional] Uncomment pass hyperparametersparameters: - epoch: 10 - learning_rate: 0.01 - pretrained:##[Optional] Uncomment to use dedicated poolpool_id: pool_xxxx##[Optional] Uncomment to use volumes attached to the training jobvolumes: - efs: local_path: /xx/yy/zz read_only: false nfs_server_path: xxx.xxx.xxx.xxx:/
  • 示例 基于yaml文件提交训练作业 ma-cli ma-job submit ./train-job.yaml 基于命令行和预置镜像pytorch1.8-cuda10.2-cudnn7-ubuntu18.04提交训练作业。 ma-cli ma-job submit --code-dir obs://automation-use-only/Original/TrainJob/TrainJob-v2/pytorch1.8.0_cuda10.2/code/ \ --boot-file test-pytorch.py \ --framework-type PyTorch \ --working-dir /home/ma-user/modelarts/user-job-dir/code \ --framework-version pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64 \ --data-url obs://automation-use-only/Original/TrainJob/TrainJob-v2/pytorch1.8.0_cuda10.2/data/ \ --log-url obs://automation-use-only/Original/TrainJob/TrainJob-v2/pytorch1.8.0_cuda10.2/data/logs/ \ --train-instance-type modelarts.vm.cpu.8u \ --train-instance-count 1 \