MapReduce Service (MRS) - Connecting FlinkServer Jobs to Hudi Tables: Notes on Using FlinkSQL Lookup Join with Hudi

Time: 2025-06-10 14:43:26

Notes on Using FlinkSQL Lookup Join with Hudi

This section applies to MRS 3.5.0 and later versions.

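  • Use the lookup.join.cache.ttl parameter to control how often the dimension table data is reloaded. The default value is 60min.
  • Hudi dimension table data is loaded into the Flink TaskManager heap, so Hudi tables with more than 100,000 rows are not recommended as dimension tables.
  • Rows added to or updated in the dimension table take part in the computation only after the next reload cycle has loaded them.
SQL example: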
CREATE TABLE hudimor(
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts INT,
  `p` VARCHAR(20),
  PRIMARY KEY (uuid) NOT ENFORCED
) PARTITIONED BY (`p`) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://hacluster/tmp/hudimor',
  'table.type' = 'MERGE_ON_READ',
  'hoodie.datasource.write.recordkey.field' = 'uuid',
  'write.precombine.field' = 'ts',
  'lookup.join.cache.ttl' = '60min'
);
-- Probe-side stream; the proctime column drives the lookup join.
CREATE TABLE datagen(uuid VARCHAR(20), proctime AS PROCTIME()) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '1'
);
CREATE TABLE blackhole (
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts INT,
  `p` VARCHAR(20)
) WITH ('connector' = 'blackhole');
-- Enrich each datagen row with the Hudi dimension row cached at processing time.
INSERT INTO
  blackhole
SELECT
  t1.uuid AS uuid,
  t2.name AS name,
  t2.age AS age,
  t2.ts AS ts,
  t2.p AS p
FROM
  datagen AS t1
  LEFT JOIN hudimor FOR SYSTEM_TIME AS OF t1.proctime AS t2 ON t1.uuid = t2.uuid;
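If dimension table updates need to become visible sooner than the default 60min, the reload period can be shortened in the table definition. The following is a minimal sketch only: the table name hudimor_short_ttl and the 10min value are hypothetical and chosen for illustration, while the schema, path, and other options are copied from the example above. A shorter period means the whole dimension table is reloaded into the TaskManager heap more often, so it is best suited to small tables.

```sql
-- Hypothetical variant of the hudimor dimension table with a shorter reload period.
-- The table name and the 10min value are illustrative assumptions.
CREATE TABLE hudimor_short_ttl(
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts INT,
  `p` VARCHAR(20),
  PRIMARY KEY (uuid) NOT ENFORCED
) PARTITIONED BY (`p`) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://hacluster/tmp/hudimor',
  'table.type' = 'MERGE_ON_READ',
  'hoodie.datasource.write.recordkey.field' = 'uuid',
  'write.precombine.field' = 'ts',
  -- Reload the cached dimension data every 10 minutes instead of the default 60min.
  'lookup.join.cache.ttl' = '10min'
);
```

With this definition, a new or updated row in the Hudi table would be picked up by the join within roughly one 10-minute reload cycle rather than one 60-minute cycle.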
support.huaweicloud.com/cmpntguide-lts-mrs/mrs_01_24180.html