Full-Text Search - Huawei Cloud

Data Warehouse Service GaussDB(DWS) - Configuration Example: Procedure

Procedure

1. Create a text search configuration named ts_conf by copying the predefined text search configuration english.

    CREATE TEXT SEARCH CONFIGURATION ts_conf ( COPY = pg_catalog.english );
    CREATE TEXT SEARCH CONFIGURATION

2. Create a Synonym dictionary. Assume that the synonym dictionary definition file pg_dict.syn contains the following:

    postgres pg
    pgsql pg
    postgresql pg

   Run the following statement to create the Synonym dictionary:

    CREATE TEXT SEARCH DICTIONARY pg_dict (
        TEMPLATE = synonym,
        SYNONYMS = pg_dict,
        FILEPATH = 'obs://bucket01/obs.xxx.xxx.com accesskey=xxxxx secretkey=xxxxx region=cn-north-1'
    );

3. Create an Ispell dictionary named english_ispell (the dictionary definition files come from an open-source dictionary).

    CREATE TEXT SEARCH DICTIONARY english_ispell (
        TEMPLATE = ispell,
        DictFile = english,
        AffFile = english,
        StopWords = english,
        FILEPATH = 'obs://bucket01/obs.xxx.xxx.com accesskey=xxxxx secretkey=xxxxx region=cn-north-1'
    );

4. Modify the text search configuration ts_conf to change the dictionary list for certain token types. For details about token types, see Parsers.

    ALTER TEXT SEARCH CONFIGURATION ts_conf
        ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                          word, hword, hword_part
        WITH pg_dict, english_ispell, english_stem;

5. In the text search configuration, choose not to index or search certain token types.

    ALTER TEXT SEARCH CONFIGURATION ts_conf
        DROP MAPPING FOR email, url, url_path, sfloat, float;

6. Use the text search debugging function ts_debug() to test the configuration ts_conf that you created.

    SELECT * FROM ts_debug('ts_conf', '
    PostgreSQL, the highly scalable, SQL compliant, open source object-relational
    database management system, is now undergoing beta testing of the next
    version of our software.
    ');

7. You can set the current session to use ts_conf as the default text search configuration. This setting takes effect only in the current session.

    \dF+ ts_conf
          Text search configuration "public.ts_conf"
    Parser: "pg_catalog.default"
          Token      |            Dictionaries
    -----------------+-------------------------------------
     asciihword      | pg_dict,english_ispell,english_stem
     asciiword       | pg_dict,english_ispell,english_stem
     file            | simple
     host            | simple
     hword           | pg_dict,english_ispell,english_stem
     hword_asciipart | pg_dict,english_ispell,english_stem
     hword_numpart   | simple
     hword_part      | pg_dict,english_ispell,english_stem
     int             | simple
     numhword        | simple
     numword         | simple
     uint            | simple
     version         | simple
     word            | pg_dict,english_ispell,english_stem

    SET default_text_search_config = 'public.ts_conf';
    SET
    SHOW default_text_search_config;
     default_text_search_config
    ----------------------------
     public.ts_conf
    (1 row)
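The dictionary list given in the ALTER MAPPING step is ordered. In PostgreSQL text search, from which this syntax derives, a token is handed to each dictionary in turn until one recognizes it. The following Python sketch illustrates that behavior with toy stand-ins for pg_dict, english_ispell, and english_stem. The stop-word list, the Ispell entries, and the suffix rule are invented for illustration and are not the real dictionaries.

```python
from typing import Callable, List, Optional

# A "dictionary" returns a list of lexemes if it recognizes the token,
# an empty list if the token is a stop word, or None if it does not know it.
Dictionary = Callable[[str], Optional[List[str]]]

SYNONYMS = {"postgres": "pg", "pgsql": "pg", "postgresql": "pg"}  # from pg_dict.syn
STOP_WORDS = {"the", "is", "of", "our", "now"}                    # hypothetical stop list
ISPELL_WORDS = {"databases": "database", "testing": "test"}       # hypothetical Ispell entries


def pg_dict(token: str) -> Optional[List[str]]:
    """Synonym dictionary: replace a known word with its synonym."""
    return [SYNONYMS[token]] if token in SYNONYMS else None


def english_ispell(token: str) -> Optional[List[str]]:
    """Toy Ispell dictionary: drop stop words, normalize known words."""
    if token in STOP_WORDS:
        return []
    return [ISPELL_WORDS[token]] if token in ISPELL_WORDS else None


def english_stem(token: str) -> Optional[List[str]]:
    """Toy stemmer: recognizes every token and strips a trailing 's'."""
    if token.endswith("s") and len(token) > 3:
        return [token[:-1]]
    return [token]


def normalize(token: str, chain: List[Dictionary]) -> List[str]:
    """Pass the lowercased token down the chain until a dictionary recognizes it."""
    for dictionary in chain:
        result = dictionary(token.lower())
        if result is not None:
            return result
    return []


# Same order as in: WITH pg_dict, english_ispell, english_stem
chain = [pg_dict, english_ispell, english_stem]
```

Because the stemmer in this sketch accepts every token, it belongs at the end of the list. A dictionary placed after it would never be consulted.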


Cloud Database GaussDB - Configuration Example: Procedure

Procedure

1. Create a text search configuration named ts_conf by copying the predefined text search configuration english.

    gaussdb=# CREATE TEXT SEARCH CONFIGURATION ts_conf ( COPY = pg_catalog.english );
    CREATE TEXT SEARCH CONFIGURATION

2. Create a Synonym dictionary. Assume that the synonym dictionary definition file gs_dict.syn contains the following:

    gaussdb gs
    gauss gs

   Run the following statement to create the Synonym dictionary:

    gaussdb=# CREATE TEXT SEARCH DICTIONARY gs_dict (
        TEMPLATE = synonym,
        SYNONYMS = gs_dict,
        FILEPATH = 'file:///home/dicts'
    );

3. Create an Ispell dictionary named english_ispell (the dictionary definition files come from an open-source dictionary).

    gaussdb=# CREATE TEXT SEARCH DICTIONARY english_ispell (
        TEMPLATE = ispell,
        DictFile = english,
        AffFile = english,
        StopWords = english,
        FILEPATH = 'file:///home/dicts'
    );

4. Modify the text search configuration ts_conf to change the dictionary list for certain token types. For details about token types, see Parsers.

    gaussdb=# ALTER TEXT SEARCH CONFIGURATION ts_conf
        ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                          word, hword, hword_part
        WITH gs_dict, english_ispell, english_stem;

5. In the text search configuration, choose not to index or search certain token types.

    gaussdb=# ALTER TEXT SEARCH CONFIGURATION ts_conf
        DROP MAPPING FOR email, url, url_path, sfloat, float;

6. Use the text search debugging function ts_debug() to test the configuration ts_conf that you created.

    gaussdb=# SELECT * FROM ts_debug('ts_conf', '
    GaussDB, the highly scalable, SQL compliant, open source object-relational
    database management system, is now undergoing beta testing of the next
    version of our software.
    ');

7. You can set the current session to use ts_conf as the default text search configuration. This setting takes effect only in the current session.

    gaussdb=# \dF+ ts_conf
          Text search configuration "public.ts_conf"
    Parser: "pg_catalog.default"
          Token      |            Dictionaries
    -----------------+-------------------------------------
     asciihword      | gs_dict,english_ispell,english_stem
     asciiword       | gs_dict,english_ispell,english_stem
     file            | simple
     host            | simple
     hword           | gs_dict,english_ispell,english_stem
     hword_asciipart | gs_dict,english_ispell,english_stem
     hword_numpart   | simple
     hword_part      | gs_dict,english_ispell,english_stem
     int             | simple
     numhword        | simple
     numword         | simple
     uint            | simple
     version         | simple
     word            | gs_dict,english_ispell,english_stem

    gaussdb=# SET default_text_search_config = 'public.ts_conf';
    SET
    gaussdb=# SHOW default_text_search_config;
     default_text_search_config
    ----------------------------
     public.ts_conf
    (1 row)
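The synonym definition file shown in this procedure holds one pair per line: a word followed by its replacement, separated by whitespace. As a rough illustration of that format, the sketch below parses such text into a lookup table. The function name, the lowercasing, the skipping of blank lines, and the strict two-field check are assumptions of this sketch, not documented GaussDB behavior.

```python
from typing import Dict


def parse_synonym_file(text: str) -> Dict[str, str]:
    """Parse 'word synonym' lines into a mapping (hypothetical helper)."""
    mapping: Dict[str, str] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 'word synonym', got {line!r}")
        word, synonym = fields
        # Assumption: matching is case-insensitive, so keys are lowercased.
        mapping[word.lower()] = synonym.lower()
    return mapping


# Contents of gs_dict.syn as given in the procedure.
GS_DICT_SYN = "gaussdb gs\ngauss gs\n"
synonyms = parse_synonym_file(GS_DICT_SYN)
```

With this table, both gaussdb and gauss map to gs, so a search for either word would match documents that contain the other.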


Cloud Database GaussDB - Configuration Example: Procedure

Procedure

1. Create a text search configuration named ts_conf by copying the predefined text search configuration english.

    postgres=# CREATE TEXT SEARCH CONFIGURATION ts_conf ( COPY = pg_catalog.english );
    CREATE TEXT SEARCH CONFIGURATION

2. Create a Synonym dictionary. Assume that the synonym dictionary definition file pg_dict.syn contains the following:

    postgres pg
    pgsql pg
    postgresql pg

   Run the following statement to create the Synonym dictionary:

    postgres=# CREATE TEXT SEARCH DICTIONARY pg_dict (
        TEMPLATE = synonym,
        SYNONYMS = pg_dict,
        FILEPATH = 'file:///home/dicts'
    );

3. Create an Ispell dictionary named english_ispell (the dictionary definition files come from an open-source dictionary).

    postgres=# CREATE TEXT SEARCH DICTIONARY english_ispell (
        TEMPLATE = ispell,
        DictFile = english,
        AffFile = english,
        StopWords = english,
        FILEPATH = 'file:///home/dicts'
    );

4. Modify the text search configuration ts_conf to change the dictionary list for certain token types. For details about token types, see Parsers.

    postgres=# ALTER TEXT SEARCH CONFIGURATION ts_conf
        ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                          word, hword, hword_part
        WITH pg_dict, english_ispell, english_stem;

5. In the text search configuration, choose not to index or search certain token types.

    postgres=# ALTER TEXT SEARCH CONFIGURATION ts_conf
        DROP MAPPING FOR email, url, url_path, sfloat, float;

6. Use the text search debugging function ts_debug() to test the configuration ts_conf that you created.

    postgres=# SELECT * FROM ts_debug('ts_conf', '
    PostgreSQL, the highly scalable, SQL compliant, open source object-relational
    database management system, is now undergoing beta testing of the next
    version of our software.
    ');

7. You can set the current session to use ts_conf as the default text search configuration. This setting takes effect only in the current session.

    postgres=# \dF+ ts_conf
          Text search configuration "public.ts_conf"
    Parser: "pg_catalog.default"
          Token      |            Dictionaries
    -----------------+-------------------------------------
     asciihword      | pg_dict,english_ispell,english_stem
     asciiword       | pg_dict,english_ispell,english_stem
     file            | simple
     host            | simple
     hword           | pg_dict,english_ispell,english_stem
     hword_asciipart | pg_dict,english_ispell,english_stem
     hword_numpart   | simple
     hword_part      | pg_dict,english_ispell,english_stem
     int             | simple
     numhword        | simple
     numword         | simple
     uint            | simple
     version         | simple
     word            | pg_dict,english_ispell,english_stem

    postgres=# SET default_text_search_config = 'public.ts_conf';
    SET
    postgres=# SHOW default_text_search_config;
     default_text_search_config
    ----------------------------
     public.ts_conf
    (1 row)
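The effect of the ALTER MAPPING and DROP MAPPING steps can be pictured as edits to a table from token type to dictionary list, which is what \dF+ prints. The Python sketch below models that table. The starting mapping is a guess made for illustration (the source does not list the contents of pg_catalog.english), the helper names are hypothetical, and the sample tokens are labeled by hand rather than produced by a real parser.

```python
from typing import Dict, List, Tuple

Mapping = Dict[str, List[str]]

WORD_TYPES = ["asciiword", "asciihword", "hword_asciipart", "word", "hword", "hword_part"]
OTHER_TYPES = ["file", "host", "hword_numpart", "int", "numhword", "numword",
               "uint", "version", "email", "url", "url_path", "sfloat", "float"]


def initial_mapping() -> Mapping:
    """Assumed starting point after COPY = pg_catalog.english (illustrative only)."""
    mapping: Mapping = {t: ["english_stem"] for t in WORD_TYPES}
    mapping.update({t: ["simple"] for t in OTHER_TYPES})
    return mapping


def alter_mapping(mapping: Mapping, token_types: List[str], dictionaries: List[str]) -> None:
    """Model of ALTER MAPPING FOR ... WITH ...: replace the dictionary list."""
    for token_type in token_types:
        mapping[token_type] = list(dictionaries)


def drop_mapping(mapping: Mapping, token_types: List[str]) -> None:
    """Model of DROP MAPPING FOR ...: the token type no longer has any dictionary."""
    for token_type in token_types:
        del mapping[token_type]


def indexed_tokens(mapping: Mapping, tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only tokens whose type still has a mapping; the rest are not indexed."""
    return [(token_type, text) for token_type, text in tokens if token_type in mapping]


ts_conf = initial_mapping()
alter_mapping(ts_conf, WORD_TYPES, ["pg_dict", "english_ispell", "english_stem"])
drop_mapping(ts_conf, ["email", "url", "url_path", "sfloat", "float"])

# Hand-labeled sample tokens (hypothetical input).
sample = [("asciiword", "PostgreSQL"), ("email", "admin@example.com"), ("float", "3.14")]
kept = indexed_tokens(ts_conf, sample)
```

After these two calls the model holds 14 token types, the same rows that the \dF+ output lists, and the email and float tokens in the sample are filtered out.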




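CloudTable Service - HBase Elasticsearch Schema Description

HBase Elasticsearch Schema Description

HBase stores the Elasticsearch schema definition in the table's METADATA.

Table 1 Schema definition (field name, value description, required or not)

hbase.index.es.enabled
    Whether to create a full-text index in Elasticsearch for this HBase table. true means the index is created. The default is false.
    Required: Yes

hbase.index.es.endpoint
    Access address of the Cloud Search Service cluster (Elasticsearch engine), for example, 'ip1:port,ip2:port'.
    Required: Yes

hbase.index.es.indexname
    Name of the Elasticsearch index that corresponds to the HBase table. The name must be lowercase.
    Required: Yes

hbase.index.es.shards
    Number of shards of the index in Elasticsearch. The default is 5. The value is an integer greater than or equal to 1.
    Required: No

hbase.index.es.replicas
    Number of replicas of the index in Elasticsearch. The default is 1. The value is an integer greater than or equal to 0.
    Required: No

hbase.index.es.schema
    Field mapping between HBase and Elasticsearch, given as a string in JSON array format. Each element contains the following fields:
    name: field name in Elasticsearch.
    type: field type in Elasticsearch.
    hbaseQualifier: source HBase qualifier.
    analyzer: analyzer for a field of the text type. "ik_smart" is generally used for Chinese. The default is the "Standard" analyzer, which supports English.
    Example:
    '[
    {"name":"contentCh","type":"text","hbaseQualifier":"cf1:contentCh","analyzer":"ik_smart"},
    {"name":"contentEng","type":"text","hbaseQualifier":"cf2:contentEng"},
    {"name":"id","type":"long","hbaseQualifier":"cf1:id"}
    ]'
    Required: Yes

HBase-Elasticsearch full-text search currently supports the data types {"text", "long", "integer", "short", "byte", "double", "float", "boolean"}. These are the values allowed for type in the schema. text is the text type in Elasticsearch. Full-text search generally means searching data of the text type, but exact-match search on the basic data types is also supported.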
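As a rough illustration of the schema definition above, the following Python sketch assembles the METADATA entries and checks them against the rules stated in the table. The function names are hypothetical, and three points are assumptions of this sketch rather than documented behavior: that name, type, and hbaseQualifier are all mandatory in each element, that analyzer is valid only on text fields, and that every METADATA value is passed as a string.

```python
import json
from typing import Dict, List

# Types listed in the source as supported by HBase-Elasticsearch full-text search.
SUPPORTED_TYPES = {"text", "long", "integer", "short", "byte", "double", "float", "boolean"}


def build_es_schema(fields: List[Dict[str, str]]) -> str:
    """Validate the field mapping and serialize it as a JSON array string."""
    for field in fields:
        for key in ("name", "type", "hbaseQualifier"):
            if key not in field:
                raise ValueError(f"field {field!r} is missing {key!r}")
        if field["type"] not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported type {field['type']!r}")
        if "analyzer" in field and field["type"] != "text":
            raise ValueError("analyzer applies to text fields only")
    return json.dumps(fields, separators=(",", ":"))


def build_table_metadata(endpoint: str, index_name: str, fields: List[Dict[str, str]],
                         shards: int = 5, replicas: int = 1) -> Dict[str, str]:
    """Assemble the hbase.index.es.* entries described in Table 1."""
    if index_name != index_name.lower():
        raise ValueError("index name must be lowercase")
    if shards < 1 or replicas < 0:
        raise ValueError("shards must be >= 1 and replicas must be >= 0")
    return {
        "hbase.index.es.enabled": "true",
        "hbase.index.es.endpoint": endpoint,
        "hbase.index.es.indexname": index_name,
        "hbase.index.es.shards": str(shards),
        "hbase.index.es.replicas": str(replicas),
        "hbase.index.es.schema": build_es_schema(fields),
    }


# Field mapping taken from the example in the table; the index name is made up.
example_fields = [
    {"name": "contentCh", "type": "text", "hbaseQualifier": "cf1:contentCh", "analyzer": "ik_smart"},
    {"name": "contentEng", "type": "text", "hbaseQualifier": "cf2:contentEng"},
    {"name": "id", "type": "long", "hbaseQualifier": "cf1:id"},
]
metadata = build_table_metadata("ip1:port,ip2:port", "demo_index", example_fields)
```

The resulting dictionary would still have to be attached to the table through whatever HBase client call the application uses, which is outside the scope of this sketch.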


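CloudTable Service - Full-Text Search Overview: Principle

Principle

As a big data storage service, CloudTable stores user data as bytes and provides efficient random key-value (KV) queries. On top of this, users can define a custom schema based on their service requirements to specify the data types of some fields (usually text), which extends CloudTable with full-text search. CloudTable decouples compute from storage. Its data storage is easy to scale out and low in cost, which makes it suitable as the primary storage system for massive source data of any data type, while Cloud Search Service (Elasticsearch) keeps lightweight index data to support keyword search, as shown in the following figure.

Figure 1 Schematic diagram

If a user enables full-text indexing for some fields when creating an HBase table, HBase automatically synchronizes the full-text index data to Cloud Search Service when data is written. In addition, scan, the native HBase data read interface, supports common full-text search capabilities on top of its KV read capability. For complex, advanced search requirements, users can first call the Elasticsearch API and then call the CloudTable read API, combining the two to implement the service logic.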
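The two-step pattern described above (query the index for matching row keys, then read the full rows from the primary store) can be sketched with in-memory stand-ins. Nothing below calls a real CloudTable, HBase, or Elasticsearch API. The dictionaries, the function names, and the whitespace tokenizer are all hypothetical, and in the real service the index synchronization happens automatically on write.

```python
from typing import Dict, List, Set

# In-memory stand-ins: the "HBase table" holds full rows, the "index" holds
# only term -> row key postings, mirroring the lightweight index data.
hbase_table: Dict[str, Dict[str, str]] = {}
es_index: Dict[str, Set[str]] = {}


def put(row_key: str, columns: Dict[str, str], indexed_qualifiers: List[str]) -> None:
    """Write a row and update the index for the qualifiers that have indexing enabled."""
    hbase_table[row_key] = dict(columns)
    for qualifier in indexed_qualifiers:
        for term in columns.get(qualifier, "").lower().split():
            es_index.setdefault(term.strip(".,"), set()).add(row_key)


def search_row_keys(keyword: str) -> List[str]:
    """Step 1: ask the index which rows contain the keyword."""
    return sorted(es_index.get(keyword.lower(), set()))


def search_rows(keyword: str) -> List[Dict[str, str]]:
    """Step 2: fetch the full rows from the primary store by row key."""
    return [hbase_table[row_key] for row_key in search_row_keys(keyword)]


put("row1", {"cf1:title": "Distributed database service", "cf1:id": "1"}, ["cf1:title"])
put("row2", {"cf1:title": "Object storage service", "cf1:id": "2"}, ["cf1:title"])
```

Only cf1:title is indexed in this example, so a keyword lookup on a value stored in cf1:id returns nothing, while the full row, including cf1:id, is still returned when the title matches.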


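CloudTable Service - Full-Text Search Overview: Use Cases

Use Cases

Users have massive volumes of service data and need HBase, as an online big data storage system, to provide its most basic capability: efficient, high-concurrency, low-latency KV queries. At the same time, the data has many kinds of fields in large numbers, reflecting diverse services. For example, in a single row of a table, some text fields need keyword-based full-text search, some fields are secondary indexes, and some fields are used for tag bitmap indexes. In this scenario, it is appropriate to enable Elasticsearch full-text search in CloudTable while retaining the ability to extend to other services. For example:

- Search websites: store massive users' search terms, user environment information, and basic information in real time, extract user information by product keyword, and immediately resell that information to third-party e-commerce platforms.
- Intelligent medical record systems in smart hospitals: store patients' medical visit information, including basic patient information, current physical condition, the doctor's current professional information, descriptions of the illness, descriptions of the diagnosis, and medication taken. Based on keywords such as current epidemics, prohibited drugs, or technical breakthroughs, the hospital information platform counts or looks up patients who visited in the past, in order to follow up with them or contact them for a second diagnosis using new technology, among other innovative and considerate services.
- Government intelligent public opinion governance systems: store, at massive scale, the public statements of users on mainstream media platforms together with user information, repost counts, and other data. The system retrieves current trending events. If an event is a rumor, the system automatically reminds users of the truth of the event, the social impact data of what they posted or reposted, and the relevant legal provisions and similar cases. This intelligent feedback mechanism deters users who spread rumors and encourages a healthy climate of public discussion.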


