数据湖探索 DLI-窗口:OVER WINDOW
OVER WINDOW
Over Window与Group Window区别在于Over window每一行都会输出一条记录。
语法格式
1 2 3 4 5 6 7 8 9 10 11 |
SELECT agg1(attr1) OVER ( [PARTITION BY partition_name] ORDER BY proctime|rowtime ROWS BETWEEN (UNBOUNDED|rowCOUNT) PRECEDING AND CURRENT ROW FROM TABLENAME SELECT agg1(attr1) OVER ( [PARTITION BY partition_name] ORDER BY proctime|rowtime RANGE BETWEEN (UNBOUNDED|timeInterval) PRECEDING AND CURRENT ROW FROM TABLENAME |
语法说明
参数 |
参数说明 |
---|---|
PARTITION BY |
指定分组的主键,每个分组各自进行计算。 |
ORDER BY |
指定数据按processing time或event time作为时间戳。 |
ROWS |
个数窗口。 |
RANGE |
时间窗口。 |
注意事项
- 所有的聚合必须定义到同一个窗口中,即相同的分区、排序和区间。
- 当前仅支持 PRECEDING (无界或有界) 到 CURRENT ROW 范围内的窗口、FOLLOWING 所描述的区间并未支持。
- ORDER BY 必须指定于单个的时间属性。
示例
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
// 计算从规则启动到目前为止的计数及总和(in proctime) insert into temp SELECT name, count(amount) OVER (PARTITION BY name ORDER BY proctime RANGE UNBOUNDED preceding) as cnt1, sum(amount) OVER (PARTITION BY name ORDER BY proctime RANGE UNBOUNDED preceding) as cnt2 FROM Orders; // 计算最近四条记录的计数及总和(in proctime) insert into temp SELECT name, count(amount) OVER (PARTITION BY name ORDER BY proctime ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) as cnt1, sum(amount) OVER (PARTITION BY name ORDER BY proctime ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) as cnt2 FROM Orders; // 计算最近60s的计数及总和(in eventtime),基于事件时间处理,事件时间为Orders中的timeattr字段。 insert into temp SELECT name, count(amount) OVER (PARTITION BY name ORDER BY timeattr RANGE BETWEEN INTERVAL '60' SECOND PRECEDING AND CURRENT ROW) as cnt1, sum(amount) OVER (PARTITION BY name ORDER BY timeattr RANGE BETWEEN INTERVAL '60' SECOND PRECEDING AND CURRENT ROW) as cnt2 FROM Orders; |