Managing Model Services
The search large model plugin is deeply integrated with the Kibana command-line interface (CLI) and supports full lifecycle management of model services, including updating, monitoring, and scaling. As listed below, you can manage model services by issuing standard CLI commands for core operations such as update and delete. A scripted example that issues the same requests over HTTP follows the operation list.
Update a model service
- API command: POST _inference/model_service/{service_name}/update
- Request example: update the embedding model service.
  POST _inference/model_service/pangu_vector/update
  {
    "description": "Search large model - semantic vectorization model (updated)",
    "service_config": {
      "semantic_vector": {
        "service_urls": ["http://{endpoint}/app/search/v1/vector"],
        "timeout_ms": 60000
      }
    }
  }
- Response example: returns the updated model service information.
  {
    "service_name" : "pangu_vector",
    "service_type" : "remote",
    "description" : "Search large model - semantic vectorization model (updated)",
    "create_time" : 1747966388508,
    "service_config" : {
      "semantic_vector" : {
        "embedding_type" : "query2doc",
        "service_urls" : ["http://{endpoint}/app/search/v1/vector"],
        "method" : "POST",
        "timeout_ms" : 60000,
        "max_conn" : 200,
        "security" : false,
        "dimension" : "768",
        "algorithm" : "GRAPH",
        "metric" : "inner_product"
      }
    }
  }

Check model service connectivity
- API command: GET _inference/model_service/{service_name}/check
- Request example: check the connectivity of the embedding model service.
  GET _inference/model_service/pangu_vector/check
- Response example:
  { "acknowledged" : true }

View a model service
- API command: GET _inference/model_service/{service_name}
- Request example: view the configuration of the embedding model service.
  GET _inference/model_service/pangu_vector
- Response example: returns the model service information.
  {
    "count" : 1,
    "model_service_configs" : [
      {
        "service_name" : "pangu_vector",
        "service_type" : "remote",
        "description" : "Search large model - semantic vectorization model",
        "create_time" : 1747966388508,
        "service_config" : {
          "semantic_vector" : {
            "embedding_type" : "query2doc",
            "service_urls" : ["http://{endpoint}/app/search/v1/vector"],
            "method" : "POST",
            "timeout_ms" : 60000,
            "max_conn" : 200,
            "security" : false,
            "dimension" : "768",
            "algorithm" : "GRAPH",
            "metric" : "inner_product"
          }
        }
      }
    ]
  }

Delete a model service configuration (after deletion, indexes can no longer use the model service)
- API command: DELETE _inference/model_service/{service_name}
- Request example: delete the embedding model service configuration.
  DELETE _inference/model_service/pangu_vector
- Response example:
  { "acknowledged" : true }

Set the maximum number of model services that can be created (value range: 1 to 1000; default: 100)
- API command:
  PUT _cluster/settings
  {
    "transient": {
      "pg_search.inference.max_inference_model_service": 100
    }
  }
- Request example: set the maximum number of model services to 10.
  PUT _cluster/settings
  {
    "transient": {
      "pg_search.inference.max_inference_model_service": 10
    }
  }
- Response example:
  {
    "acknowledged" : true,
    "persistent" : { },
    "transient" : {
      "pg_search" : {
        "inference" : {
          "max_inference_model_service" : "10"
        }
      }
    }
  }