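Command Examples

This section uses Windows as an example to show how to use eihealth-toolkit. The commands work essentially the same way on Linux and macOS.

Run the health get job -s command to obtain a job template. For details about the template and how to use it, see Obtaining a Job Template.

Obtain job details, displayed in template format.
health get job 000c6057-cc6c-11ed-bbec-fa163ef30f89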
job:
id: 000c6057-cc6c-11ed-bbec-fa163ef30f89
name: job-7402
description: ""
priority: 0
timeout: 1440
output_dir: /job-7402-de91a3e0-076c-4327-a41c-8e88c7aec6ae
workflow_id: f1af14bb-cc69-11ed-bbec-fa163ef30f89
io_acc_id: ""
node_labels: []
tasks:
- task_name: task-1-test-echo
inputs: []
resources:
cpu: 0.1C
memory: 0.1G
gpu: "0"
tool_type: workflow
tool_id: f1af14bb-cc69-11ed-bbec-fa163ef30f89
labels: []

Obtain job details, displayed in JSON format.
health get job f17a3542-3f7c-11eb-868a-fa163e3ddba1 --detail
{
"jobs": [{
"id": "2",
"name": "zx-1030-mkdir",
"description": "Test file creation",
"priority": 0,
"timeout": 1440,
"output_dir": "",
"status": "SUCCEEDED",
"create_time": "2021-01-20T03:38:14Z",
"finish_time": "2021-01-20T03:43:23Z",
"tool_info": {
"tool_id": "",
"tool_name": "",
"tool_version": "",
"tool_type": ""
},
"tasks": [{
"task_name": "task0",
"display_name": "",
"output_dir": "",
"whole_output_dir": "",
"resources": {
"cpu": "0.1C",
"memory": "0.1G",
"gpu_type": "",
"gpu": "0"
},
"inputs": [{
"name": "in-dir",
"values": [
"ei_eihealth_x00356764_02:/zx-1030/"
]
},
{
"name": "in-str",
"values": [
"mkdir1030"
]
}
],
"app_info": {
"app_id": "2",
"app_name": "zx-1030-mkdir",
"app_version": "1.0.0",
"app_src_project_name": "",
"app_labels": [],
"app_summary": "",
"app_description": "",
"app_image": "ei_eihealth_x00356764_02/modelarts-base-cpu-py3:custom-2.0.2",
"app_commands": [
"mkdir ${in-dir}${in-str}"
],
"app_input_parameters": [{
"name": "in-dir",
"pattern": "",
"type": "DIRECTORY",
"required": true,
"description": ""
},
{
"name": "in-str",
"pattern": "",
"type": "STRING",
"required": true,
"description": ""
}
],
"app_output_parameters": []
}
}],
"task_runtime_info": [{
"task_name": "task0",
"status": "SUCCEEDED",
"create_time": "2021-01-20 11:38:22",
"finish_time": "2021-01-20 11:43:22",
"run_time": "5m0s"
}],
"dag": {
"task0": {}
},
"io_acc_expected_usage": 10,
"io_acc_info": {
"id": "35673038-d57b-4dab-942a-72cf3e11e7df",
"type": "IO_PERFORMANCE_BANDWIDTH",
"space": 500,
"free_space": 500.0
}
}],
"count": 1
}

Obtain the job list.
health get job    # With no options, the first 100 records are returned by default.
job_id                                job_name             tool_name          tool_version  tool_type  status     user_name       create_time          finish_time          labels
4b682e15-ab92-11ee-a057-fa163ef319da  cli-demo-job         cp-test            2.0.0         workflow   PENDING    wwx-test-admin  2024-01-05 14:18:51  --
e7e55c6e-aaf6-11ee-a057-fa163ef319da  cli-demo-job-import  cli-demo-workflow  4.0.0         workflow   FAILED     wwx-test-admin  2024-01-04 19:46:32  2024-01-04 19:47:50
aee9e91a-aaf6-11ee-a057-fa163ef319da  job-6685             cli-demo-workflow  4.0.0         workflow   FAILED     wwx-test-admin  2024-01-04 19:44:56  2024-01-04 19:45:50
58a8f13b-aaf3-11ee-a057-fa163ef319da  job                  cp-test            2.0.0         workflow   FAILED     wwx-test-admin  2024-01-04 19:21:03  2024-01-04 19:23:54
35ff73b3-aaf3-11ee-a057-fa163ef319da  job                  cp-test            2.0.0         workflow   SUCCEEDED  wwx-test-admin  2024-01-04 19:20:05  2024-01-04 19:24:52
24b72eee-aaf3-11ee-a057-fa163ef319da  job                  cp-test            2.0.0         workflow   SUCCEEDED  wwx-test-admin  2024-01-04 19:19:36  2024-01-04 19:25:10
4ccef1fb-aaf2-11ee-a057-fa163ef319da  job                  cp-test            2.0.0         workflow   SUCCEEDED  wwx-test-admin  2024-01-04 19:13:34  2024-01-04 19:17:34
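
The list output above is plain whitespace-separated text, so a script that consumes it has to account for the space inside each timestamp and for the "--" placeholder shown for unfinished jobs. The following Python sketch shows one possible way to split a row into named fields. It is not part of eihealth-toolkit: parse_job_row and COLUMNS are hypothetical names, and the sketch assumes that the first seven columns never contain spaces and that the layout matches the sample output.

```python
from typing import Dict

# Column names as printed in the header row of the sample output.
COLUMNS = ["job_id", "job_name", "tool_name", "tool_version", "tool_type",
           "status", "user_name", "create_time", "finish_time", "labels"]


def parse_job_row(line: str) -> Dict[str, str]:
    """Split one data row of the job list into a dict keyed by column name."""
    tokens = line.split()
    if len(tokens) < 10:
        raise ValueError(f"unexpected row: {line!r}")
    # Assumption: the first seven columns are single tokens without spaces.
    row = dict(zip(COLUMNS[:7], tokens[:7]))
    rest = tokens[7:]
    # create_time is printed as "YYYY-MM-DD HH:MM:SS", that is, two tokens.
    row["create_time"] = " ".join(rest[:2])
    rest = rest[2:]
    # finish_time is "--" for unfinished jobs, otherwise two tokens as well.
    if rest and rest[0] == "--":
        row["finish_time"] = ""
        rest = rest[1:]
    else:
        row["finish_time"] = " ".join(rest[:2])
        rest = rest[2:]
    # Whatever remains is the (possibly empty) labels column.
    row["labels"] = " ".join(rest)
    return row


sample = ("4b682e15-ab92-11ee-a057-fa163ef319da cli-demo-job cp-test 2.0.0 "
          "workflow PENDING wwx-test-admin 2024-01-05 14:18:51 --")
print(parse_job_row(sample)["status"])  # PENDING
```

For anything beyond quick inspection, the JSON output of --detail is likely a more robust source than the text table, because it does not depend on column layout.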
health get job -j cli-demo-job
job_id                                job_name      tool_name  tool_version  tool_type  status     user_name       create_time          finish_time          labels
70f1baa8-ab96-11ee-a057-fa163ef319da  cli-demo-job  cp-test    2.0.0         workflow   SUCCEEDED  wwx-test-admin  2024-01-05 14:48:32  2024-01-05 14:55:13
6c6098f0-ab96-11ee-a057-fa163ef319da  cli-demo-job  cp-test    2.0.0         workflow   SUCCEEDED  wwx-test-admin  2024-01-05 14:48:24  2024-01-05 14:54:25
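
The -l and -o options used in the next commands behave like a conventional limit and offset: -o records are skipped, then up to -l records are returned. The Python sketch below only illustrates that arithmetic. record_range is a hypothetical helper, and the defaults (100 and 0) are taken from the examples in this section rather than from the tool's source.

```python
from typing import Tuple

DEFAULT_LIMIT = 100   # used when -l is omitted, per the examples in this section
DEFAULT_OFFSET = 0    # used when -o is omitted, per the examples in this section


def record_range(limit: int = DEFAULT_LIMIT,
                 offset: int = DEFAULT_OFFSET) -> Tuple[int, int]:
    """Return the 1-based (first, last) record positions selected by -l and -o."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    # Skip `offset` records, then take `limit` records.
    return offset + 1, offset + limit


print(record_range(limit=3))             # (1, 3)
print(record_range(offset=10))           # (11, 110)
print(record_range(limit=10, offset=3))  # (4, 13)
```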
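health get job -l 3    # Same as: health get job -l 3 -o 0
# Lists basic information about the jobs in the current project.
# Returns 3 records, that is, records 1 to 3.
health get job -o 10    # Same as: health get job -l 100 -o 10
# Lists basic information about the jobs in the current project.
# Returns 100 records, that is, records 11 to 110.
health get job -l 10 -o 3
# Lists basic information about the jobs in the current project.
# Skips the first 3 records and returns 10 records starting from the 4th, that is, records 4 to 13.

Obtain job events.
health get job 550e8400-e29b-41d4-a716-446655440000 --event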
------------------------------------------------------------------------------------------------------------------------
Executor associated successfully
2024-01-05 14:18:51
------------------------------------------------------------------------------------------------------------------------
Executing create, 1 subtask in total
2024-01-05 14:18:51
------------------------------------------------------------------------------------------------------------------------
Executing create, 1 subtask in total
2024-01-05 14:18:51
------------------------------------------------------------------------------------------------------------------------
Kubernetes Job object task-3-two-cp-0-bd5e1f7dac10005f created successfully.
2024-01-05 14:18:51
------------------------------------------------------------------------------------------------------------------------
Waiting for task task-3-two-cp-0-bd5e1f7dac10005f to complete
2024-01-05 14:18:56
------------------------------------------------------------------------------------------------------------------------
Element (task-3-two-cp-0): retry 1 of (create); current exception: Failed to wait the Job(task-3-two-cp-0-bd5e1f7dac10005f) has desiredReplicas: the pod list of job:task-3-two-cp-0-bd5e1f7dac10005f is empty .
2024-01-05 14:18:51
------------------------------------------------------------------------------------------------------------------------
Kubernetes Job object task-2-cp-dir-0-bd5e1f7dac10005f created successfully.
2024-01-05 14:18:56
------------------------------------------------------------------------------------------------------------------------

Obtain the events of a specific task in a job.
health get job 550e8400-e29b-41d4-a716-446655440000 --event --task task-lmx-job-1
Task event list:
Status            Times  Type    Details                                              First Report Time    Last Report Time
SuccessfulCreate  1      Normal  Created pod: task-1-rename-0-1b840133ac100049-hkppv  2022-05-24 18:04:55  2022-05-24 18:04:55
JobIsComplete     1      Normal  Pod exits with success, the job is complete          2022-05-24 18:07:09  2022-05-24 18:07:09
Task instances list:
Name                                    Status     PodIP        Node            RestartCount  Request/Limit(CPU)  Request/Limit(Memory)  CreateTime
task-1-rename-0-1b840133ac100049-hkppv  Succeeded  172.16.1.20  192.168.125.40  0             /                   /                      2022-05-24T10:04:55Z

Obtain the events of one instance of a concurrent task.
health get job c5b3d272-f398-11ec-845a-fa163ef3fac0 --task task-1-test-bingfasmial;1 --event
Task event list:
Status            Times  Type    Details                                                        First Report Time    Last Report Time
SuccessfulCreate  1      Normal  Created pod: task-1-test-bingfasmial-1-59620029ac100038-jkdpt  2022-06-24 16:37:20  2022-06-24 16:37:20
JobIsComplete     1      Normal  Pod exits with success, the job is complete                    2022-06-24 16:37:23  2022-06-24 16:37:23
Task instances list:
Name                                              PodIP        Node            RestartCount  Request/Limit(CPU)  Request/Limit(Memory)  CreateTime
task-1-test-bingfasmial-1-59620029ac100038-jkdpt  172.16.3.37  192.168.54.255  0             1/1                 1G/1G                  2022-06-24 16:37:20

Obtain the log of a specific task in a job.
health get job 550e8400-e29b-41d4-a716-446655440000 --log ./test/demo.log --task task-xxx-job-1
download the log of task task-lmx-job-1 successfully!

Obtain the job list, filtered by status, user, creation time, finish time, labels, job name, and workflow name.
health get job --status Failed --user-name ei_eihealth --create-from-time "2022-12-15 00:40:11" --create-to-time "2022-12-17 00:40:11" --finish-from-time "2022-12-14 17:05:09" --finish-to-time "2022-12-19 23:04:07" --labels "label1,lab_el-A" --job-name h-err-1 --workflow-name herr --limit 1 --offset 1
job_id                                job_name     tool_name  tool_version  tool_type  status  user_name                 create_time          finish_time          labels
8a6078d9-c307-11ed-a824-fa163e504fdd  job-4127-01  new-01     wewe          workflow   FAILED  ei_eihealth_h00541446_01  2023-03-15 16:01:07  2023-03-15 16:02:51  label1,lab_el-A
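
Several of the filter values above contain spaces or commas and therefore need quoting on the command line. When the command is launched from a script, one way to avoid shell quoting altogether is to pass the arguments as a list. The Python sketch below only assembles such a list. build_job_query is a hypothetical helper, the option names are the ones shown in the command above, and actually running the result (for example with subprocess.run) assumes that the health executable is on the PATH.

```python
from typing import Dict, List


def build_job_query(filters: Dict[str, str],
                    limit: int = 100,
                    offset: int = 0) -> List[str]:
    """Assemble an argument list for `health get job` with the given filters.

    Keys in `filters` are long option names without the leading dashes,
    for example "status" or "create-from-time".
    """
    argv = ["health", "get", "job"]
    for option, value in filters.items():
        # Each value is one list element, so embedded spaces need no quotes.
        argv += [f"--{option}", value]
    argv += ["--limit", str(limit), "--offset", str(offset)]
    return argv


argv = build_job_query(
    {
        "status": "Failed",
        "create-from-time": "2022-12-15 00:40:11",
        "create-to-time": "2022-12-17 00:40:11",
        "labels": "label1,lab_el-A",
    },
    limit=1,
    offset=1,
)
print(argv)
# To execute (requires eihealth-toolkit to be installed):
#   import subprocess; subprocess.run(argv, check=True)
```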