数据仓库服务 GAUSSDB(DWS)-DWS查询时结果不一致的常见场景和解决方法:使用string_agg
使用string_agg
场景:使用string_agg查询表employee,出现查询结果不一致。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
SELECT * FROM employee; empno | ename | job | mgr | hiredate | sal | comm | deptno -------+--------+---------+------+---------------------+-------+------+-------- 7654 | MARTIN | SALEMAN | 7698 | 2022-11-08 00:00:00 | 12000 | 1400 | 30 7566 | JONES | MANAGER | 7839 | 2022-11-08 00:00:00 | 32000 | 0 | 20 7499 | ALLEN | SALEMAN | 7698 | 2022-11-08 00:00:00 | 16000 | 300 | 30 (3 rows) SELECT count(*) FROM (select deptno, string_agg(ename, ',') from employee group by deptno) t1, (select deptno, string_agg(ename, ',') from employee group by deptno) t2 where t1.string_agg = t2.string_agg; count ------- 2 (1 row) SELECT count(*) FROM (select deptno, string_agg(ename, ',') from employee group by deptno) t1, (select deptno, string_agg(ename, ',') from employee group by deptno) t2 where t1.string_agg = t2.string_agg; count ------- 1 (1 row) |
原因分析:
String_agg函数的作用是将组内的数据合并成一行,但是如果某用户的用法是string_agg(ename, ',') ,结果集就是不稳定的,因为没有指定组合的顺序。例如,上述语句中,对于select deptno, string_agg(ename, ',') from employee group by deptno;
输出结果既可以是:
1 |
30 | ALLEN,MARTIN |
也可能是:
1 |
30 |MARTIN,ALLEN |
两个结果都是合理的,因此上述关联场景下,有可能出现t1这个subquery中的结果和t2这个subquery中的结果对于deptno=30时的输出结果不一致。
解决方法:
String_agg中增加order by排序,保证按顺序拼接。
1 |
SELECT count(*) FROM (select deptno, string_agg(ename, ',' order by ename desc) from employee group by deptno) t1 ,(select deptno, string_agg(ename, ',' order by ename desc) from employee group by deptno) t2 where t1.string_agg = t2.string_agg; |