数据仓库服务 GAUSSDB(DWS)-GaussDB(DWS)查询时结果不一致的常见场景和解决方法:使用string_agg

时间:2024-04-30 17:23:48

使用string_agg

场景使用string_agg查询表employee,出现查询结果不一致。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
SELECT * FROM employee;
 empno | ename  |   job   | mgr  |      hiredate       |  sal  | comm | deptno 
-------+--------+---------+------+---------------------+-------+------+--------
  7654 | MARTIN | SALEMAN | 7698 | 2022-11-08 00:00:00 | 12000 | 1400 |     30
  7566 | JONES  | MANAGER | 7839 | 2022-11-08 00:00:00 | 32000 |    0 |     20
  7499 | ALLEN  | SALEMAN | 7698 | 2022-11-08 00:00:00 | 16000 |  300 |     30
(3 rows)

SELECT count(*) FROM (select deptno, string_agg(ename, ',') from employee group by deptno) t1, (select deptno, string_agg(ename, ',') from employee group by deptno) t2 where t1.string_agg = t2.string_agg;
 count 
-------
     2
(1 row)

SELECT count(*) FROM (select deptno, string_agg(ename, ',') from employee group by deptno) t1, (select deptno, string_agg(ename, ',') from employee group by deptno) t2 where t1.string_agg = t2.string_agg;
 count 
-------
     1
(1 row)

原因分析:

String_agg函数的作用是将组内的数据合并成一行,但是如果某用户的用法是string_agg(ename, ',') ,结果集就是不稳定的,因为没有指定组合的顺序。例如,上述语句中,对于select deptno, string_agg(ename, ',') from employee group by deptno;

输出结果既可以是:

1
30 | ALLEN,MARTIN

也可能是:

1
30 |MARTIN,ALLEN

两个结果都是合理的,因此上述关联场景下,有可能出现t1这个subquery中的结果和t2这个subquery中的结果对于deptno=30时的输出结果不一致。

解决方法:

String_agg中增加order by排序,保证按顺序拼接。

1
SELECT count(*) FROM (select deptno, string_agg(ename, ',' order by ename desc) from employee group by deptno) t1 ,(select deptno, string_agg(ename, ',' order by ename desc) from employee group by deptno) t2 where t1.string_agg = t2.string_agg;
support.huaweicloud.com/dws_faq/dws_03_2107.html