云服务器内容精选

  • 支持数据简介 MindSpeed-LLM框架常用数据集格式: alpaca格式 sharegpt格式 moss格式 本教程使用到的训练数据集样例是Alpaca数据集。您也可以自行准备数据集。 Alpaca数据集下载链接如下: 预训练(MindSpeed-LLM):train-00000-of-00001-a09b74b3ef9c3b56.parquet,数据大小:24M左右。 微调:alpaca_gpt4_data.json,数据大小:43.6 MB。
  • 模型最小卡数配置 不同模型推荐的训练参数和计算规格要求如表1所示, 目前仅提供微调(SFT)及训练(PT)阶段卡数配置。一般snt9B规格为单节点8卡,Snt9B23规格为单机8卡=16*DIE,其中1*DIE等效于Snt9B中的1卡,Snt9B23规格实际训练过程中设置并行策略时2*DIE为最小单位。 * 表格中“-”代表不支持,规格与卡数中的 4*Ascend表示4卡在Snt9B中表示4卡,Snt9B23表示4*DIE,以此类推。 表1 模型最小卡数配置 支持模型参数量 训练策略类型 序列长度SEQ_LEN MindSpeed-LLM规格卡数/DIE Snt9B Snt9B23 llama3.1-8b full 4096/8192 4*Ascend lora 4*Ascend llama3.1-70b full 4096 32*Ascend lora 16*Ascend full 8192 64*Ascend lora 16*Ascend llama3.2-1b full/lora 4096/8192 1*Ascend 2*Ascend llama3.2-3b full 4096/8192 2*Ascend lora 1*Ascend 2*Ascend qwen2-0.5b full/lora 4096/8192 1*Ascend 2*Ascend qwen2-1.5b full/lora 4096/8192 1*Ascend 2*Ascend qwen2-7b full 4096 4*Ascend lora 2*Ascend full 8192 8*Ascend lora 2*Ascend qwen2-72b full 4096 32*Ascend lora 16*Ascend full 8192 64*Ascend lora 16*Ascend qwen2.5-0.5b full/lora 4096/8192 1*Ascend 2*Ascend qwen2.5-7b full 4096 2*Ascend lora 2*Ascend full 8192 2*Ascend lora 2*Ascend qwen2.5-14b full 4096 8*Ascend lora 4*Ascend full 8192 8*Ascend lora 8*Ascend qwen2.5-32b full 4096 16*Ascend lora 16*Ascend full 8192 16*Ascend lora 16*Ascend qwen2.5-72b full 4096 32*Ascend lora 16*Ascend full 8192 64*Ascend lora 16*Ascend glm4-9b full 4096/8192 8*Ascend lora 4096/8192 2*Ascend mixtral-8x7b full 4096/8192 16*Ascend DeepSeek-V3/R1 full 4096 512*Ascend lora 64*Ascend 1. 当mindspeed-llm上开启分布式优化器并行时,优化器参数会在集群所有机器上切分共享,因此最优配置会和卡数相关; 2. 当前benchmark是综合考虑了最小可运行卡数和最优性能平衡情况下测试出的配置,实际情况中可以根据集群规模大小和性能取舍进行参数调整; 父主题: 训练脚本说明参考
  • 模型推荐的参数与NPU卡数设置 不同模型推荐的训练参数和计算规格要求如表2所示。规格与节点数中的1*节点 & 4*Ascend表示单机4卡,以此类推。 表2 不同模型推荐的参数与NPU卡数设置 序号 支持模型 支持模型参数量 训练策略类型 文本序列长度(SEQ_LEN) 并行参数设置 micro batch size (MBS) 规格与节点数 1 llama2 llama2-7b full 4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 2 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=4 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=2 PP(pipeline model parallel size)=4 2 1*节点 & 8*Ascend 2 llama2-13b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 4 1*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 4 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend 3 llama2-70b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 1 4*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 2 4*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 1 8*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 1 4*节点 & 8*Ascend 4 llama3 llama3-8b full 4096 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 4 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend 5 llama3-70b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 1 4*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 2 4*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 1 8*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 1 4*节点 & 8*Ascend 6 Qwen qwen-7b full 4096 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 4 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend 7 qwen-14b full 4096 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=2 2 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=2 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=2 2 1*节点 & 8*Ascend 8 qwen-72b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 1 4*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 2 4*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 1 8*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 1 4*节点 & 8*Ascend 9 Qwen1.5 qwen1.5-7b full 4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 2 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1 1*节点 & 8*Ascend 10 qwen1.5-14b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 4 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 4 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend 11 qwen1.5-32b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=2 2 2*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=2 4 2*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=2 1 2*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=2 2 2*节点 & 8*Ascend 12 qwen1.5-72b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 1 4*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 2 4*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 1 8*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 1 4*节点 & 8*Ascend 13 Yi yi-6b full 4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 2 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=2 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1 1*节点 & 8*Ascend 14 yi-34b full 4096 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=4 1 2*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=4 2 2*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 1 4*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 2 4*节点 & 8*Ascend 15 ChatGLMv3 glm3-6b full 4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=2 1 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=2 2 1*节点 & 4*Ascend full 8192 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=2 1 1*节点 & 4*Ascend 16 Baichuan2 baichuan2-7b full 4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 2 full 8192 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1 lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1 17 baichuan2-13b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 4 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=2 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 1 2*节点 & 8*Ascend 18 Qwen2 qwen2-0.5b full 4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 2 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 2 1*节点 & 4*Ascend full 8192 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 1 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 1 1*节点 & 4*Ascend 19 qwen2-1.5b full 4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 2 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 2 1*节点 & 4*Ascend full 8192 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 1 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 1 1*节点 & 4*Ascend 20 qwen2-7b full 4096 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=2 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=2 2 1*节点 & 8*Ascend 21 qwen2-72b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 1 4*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 2 4*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 1 8*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 1 8*节点 & 8*Ascend 22 GLMv4 glm4-9b full 4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=2 1 1*节点 & 4*Ascend full 8192 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=2 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=2 PP(pipeline model parallel size)=1 1 1*节点 & 4*Ascend 23 mistral mistral-7b full 4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 2 1*节点 & 8*Ascend 24 mixtral mixtral-8x7b full 4096 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=8 1 2*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=8 1 2*节点 & 8*Ascend 25 llama3.1 llama3.1-8b full 4096 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 4 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend 26 llama3.1-70b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 1 4*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=2 4 2*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 1 8*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=2 2 2*节点 & 8*Ascend 27 Qwen2.5 qwen2.5-0.5b full 4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 1 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 2 1*节点 & 4*Ascend full 8192 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 1 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 1 1*节点 & 4*Ascend 28 qwen2.5-7b full 4096 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 4 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=2 1 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=2 2 1*节点 & 8*Ascend 29 qwen2.5-14b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 4 1*节点 & 8*Ascend lora TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 4 1*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 2 1*节点 & 8*Ascend 30 qwen2.5-32b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=2 2 2*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=2 4 2*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=2 1 2*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=2 2 2*节点 & 8*Ascend 31 qwen2.5-72b full 4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 1 4*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 4 4*节点 & 8*Ascend full 8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 1 8*节点 & 8*Ascend lora TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 2 4*节点 & 8*Ascend 32 llama3.2 llama3.2-1b full 4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 2 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 2 1*节点 & 4*Ascend full 8192 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 1 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 1 1*节点 & 4*Ascend 33 llama3.2-3b full 4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=2 2 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 2 1*节点 & 4*Ascend full 8192 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=2 1 1*节点 & 4*Ascend lora TP(tensor model parallel size)=1 PP(pipeline model parallel size)=1 1 1*节点 & 4*Ascend
  • 步骤一:资源下载 Python依赖包下载:进入 scripts/install.sh 文件中,找到需要安装的pip文件,如下列所示。直接下载pip文件,注意:下载要求的版本。 pip install numpy==1.22.0 \ transformers_stream_generator==0.0.5 \ ... 代码下载:访问 scripts/install.sh 文件中,找到需要git clone的文件,如下列所示。运行git clone命令,并git checkout切换到指定的版本。注意:针对Megatron-LM下载完成后,需要将megatron文件夹复制至ModelLink中。 git clone https://gitee.com/ascend/ModelLink.git cd ModelLink git checkout 8f50777 cd .. git clone https://gitee.com/lmzwhu/Megatron-LM.git cd Megatron-LM git checkout -f core_r0.6.0 cp -r megatron ../ModelLink/ cd .. git clone https://gitee.com/ascend/MindSpeed.git cd MindSpeed git checkout 4ea42a23 cd .. 完整的源码目录结构如下: |——AscendCloud-LLM |──llm_train # 模型训练代码包 |──AscendSpeed # 基于AscendSpeed的训练代码 |──ascendcloud_patch/ # 针对昇腾云平台适配的功能补丁包 |──scripts/ # 训练需要的启动脚本 |——src/ # 启动命令行封装脚本,在install.sh里面自动构建 |──Megatron-LM/ # 适配昇腾的Megatron-LM训练框架 |──MindSpeed/ # MindSpeed昇腾大模型加速库 |──ModelLink/ # ModelLink端到端的大语言模型方案 |——megatron/ # 注意:该文件夹从Megatron-LM中复制得到 |——...
  • 步骤二:资源安装 将资源上传至机器中,确保容器能够访问,并进入已创建的容器。 Python依赖包本地安装:进入pip文件所在的路径,并运行安装命令。如下列所示。 pip install numpy pip install transformers_stream_generator ... 代码安装:访问 scripts/install.sh 文件,在最后执行的命令中需要分别进入ModelLink、MindSpeed、AscendSpeed目录,并运行以下命令。其中${INSTALL_DIR}为AscendSpeed所在路径。 cd ${INSTALL_DIR}/ModelLink pip install -e . cd ${INSTALL_DIR}/MindSpeed pip3 install -e . cd ${INSTALL_DIR} pip install -e .
  • 模型推荐的参数与NPU卡数设置 不同模型推荐的训练参数和计算规格要求如表2所示。规格与节点数中的1*节点 & 4*Ascend表示单机4卡,以此类推。 表2 不同模型推荐的参数与NPU卡数设置 序号 支持模型 支持模型参数量 文本序列长度 并行参数设置 规格与节点数 1 llama2 llama2-7b SEQ_LEN=4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1*节点 & 4*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=4 1*节点 & 8*Ascend 2 llama2-13b SEQ_LEN=4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 1*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 1*节点 & 8*Ascend 3 llama2-70b SEQ_LEN=4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 4*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 8*节点 & 8*Ascend 4 llama3 llama3-8b SEQ_LEN=4096 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1*节点 & 4*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1*节点 & 4*Ascend 5 llama3-70b SEQ_LEN=4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 4*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 8*节点 & 8*Ascend 6 Qwen qwen-7b SEQ_LEN=4096 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1*节点 & 4*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1*节点 & 4*Ascend 7 qwen-14b SEQ_LEN=4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 1*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 1*节点 & 8*Ascend 8 qwen-72b SEQ_LEN=4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 4*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 8*节点 & 8*Ascend 9 Qwen1.5 qwen1.5-7b SEQ_LEN=4096 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1*节点 & 4*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1*节点 & 4*Ascend 10 qwen1.5-14b SEQ_LEN=4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 1*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 1*节点 & 8*Ascend 11 qwen1.5-32b SEQ_LEN=4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=2 2*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 4*节点 & 8*Ascend 12 qwen1.5-72b SEQ_LEN=4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 4*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 8*节点 & 8*Ascend 13 Yi yi-6b SEQ_LEN=4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1*节点 & 4*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=4 1*节点 & 8*Ascend 14 yi-34b SEQ_LEN=4096 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=4 2*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 4*节点 & 8*Ascend 15 ChatGLMv3 glm3-6b SEQ_LEN=4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1*节点 & 4*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=4 1*节点 & 8*Ascend 16 Baichuan2 baichuan2-13b SEQ_LEN=4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 1*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=1 1*节点 & 8*Ascend 17 Qwen2 qwen2-0.5b SEQ_LEN=4096 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=1 1*节点 & 2*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=1 1*节点 & 2*Ascend 18 qwen2-1.5b SEQ_LEN=4096 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=1 1*节点 & 2*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=1 1*节点 & 2*Ascend 19 qwen2-7b SEQ_LEN=4096 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1*节点 & 4*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=4 PP(pipeline model parallel size)=1 1*节点 & 4*Ascend 20 qwen2-72b SEQ_LEN=4096 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=4 4*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=8 PP(pipeline model parallel size)=8 8*节点 & 8*Ascend 21 GLMv4 glm4-9b SEQ_LEN=4096 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=4 1*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=4 1*节点 & 8*Ascend 22 mistral mistral-7b SEQ_LEN=4096 TP(tensor model parallel size)=1 PP(pipeline model parallel size)=4 1*节点 & 8*Ascend 23 mixtral mixtral-8x7b SEQ_LEN=4096 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=8 2*节点 & 8*Ascend SEQ_LEN=8192 TP(tensor model parallel size)=2 PP(pipeline model parallel size)=8 2*节点 & 8*Ascend
  • ChatGLMv3-6B 在训练开始前,针对ChatGLMv3-6B模型中的tokenizer文件,需要修改代码。修改文件chatglm3-6b/tokenization_chatglm.py 。 文件最后几处代码中需要修改,具体位置可根据上下文代码信息进行查找,修改后如图2所示。 图2 修改ChatGLMv3-6B tokenizer文件 图3 修改ChatGLMv3-6B tokenizer文件
  • Yi模型 在使用Yi模型的chat版本时,由于transformer 4.38版本的bug,导致在读取tokenizer文件时,加载的vocab_size出现类似如下尺寸不匹配的问题。 RuntimeError: Error(s) in loading state_dict for VocabParallelEmbedding: size mismatch for weight: copying a param with shape torch.Size([64000, 4096]) from checkpoint, the shape in current model is torch.Size([63992, 4096]). 需要在训练开始前,修改llm_train/AscendFactory/yi/3_training.sh文件,并添加--tokenizer-not-use-fast参数。修改后如图1所示。 图1 修改Yi 模型3_training.sh文件
  • Yi模型 在使用Yi模型的chat版本时,由于transformer 4.38版本的bug,导致在读取tokenizer文件时,加载的vocab_size出现类似如下尺寸不匹配的问题。 RuntimeError: Error(s) in loading state_dict for VocabParallelEmbedding: size mismatch for weight: copying a param with shape torch.Size([64000, 4096]) from checkpoint, the shape in current model is torch.Size([63992, 4096]). 需要在训练开始前,修改llm_train/AscendFactory/yi/3_training.sh文件,并添加--tokenizer-not-use-fast参数。修改后如图1所示。 图1 修改Yi模型3_training.sh文件
  • ChatGLMv3-6B 在训练开始前,针对ChatGLMv3-6B模型中的tokenizer文件,需要修改代码。修改文件chatglm3-6b/tokenization_chatglm.py 。 文件最后几处代码中需要修改,具体位置可根据上下文代码信息进行查找,修改后如图2所示。 图2 修改ChatGLMv3-6B tokenizer文件 图3 修改ChatGLMv3-6B tokenizer文件