Huawei Cloud User Manual

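MapReduce Service (MRS) - Development Approach for the HBase Data Read/Write Sample Program: Development Approach

Based on the preceding service scenario, the functions to be developed are broken down as listed in Table 2.

Table 2 Functions to be developed in HBase (No., step, code implementation)

1. Create a table based on the information in Table 1. See "Creating an HBase Table".
2. Import user data. See "Inserting Data into an HBase Table".
3. Add the "education information" column family and add each user's education background, professional title, and similar information to the user information. See "Modifying an HBase Table".
4. Query the user name and address by user ID. See "Reading HBase Table Data Using the Get API".
5. Query by user name. See "Reading HBase Table Data Using a Filter".
6. To improve query performance, create or delete a secondary index. See "Creating an HBase Secondary Index" and "Querying HBase Table Data Based on a Secondary Index".
7. When a user closes the account, delete that user's data from the user information table. See "Deleting HBase Table Data".
8. After service A ends, delete the user information table. See "Deleting an HBase Table".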

MapReduce Service (MRS)
MapReduce Service (MRS) - Introduction to the Flink Client CLI: Notes

If yarn-session.sh uses -z to configure a specific ZooKeeper namespace, then flink run must use -yid to specify the application ID and -yz to specify the ZooKeeper namespace. The namespace must be identical in both commands.

Example:

    bin/yarn-session.sh -z YARN101
    bin/flink run -yid application_****_**** -yz YARN101 examples/streaming/WindowJoin.jar
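The consistency rule above can be made mechanical by deriving both command lines from one namespace value. The following is a minimal sketch, not part of the MRS samples; the class name FlinkRunArgs and its build method are hypothetical helpers, and the code only assembles the argument list without launching anything.

```java
import java.util.List;

final class FlinkRunArgs {
    /**
     * Builds the argument list for "flink run" so that -yz repeats the
     * namespace that was passed to yarn-session.sh with -z.
     */
    static List<String> build(String applicationId, String zkNamespace, String jarPath) {
        if (zkNamespace == null || zkNamespace.isEmpty()) {
            throw new IllegalArgumentException("ZooKeeper namespace must not be empty");
        }
        return List.of("bin/flink", "run", "-yid", applicationId, "-yz", zkNamespace, jarPath);
    }

    public static void main(String[] args) {
        // A single variable feeds both commands, so the namespaces cannot drift apart.
        String namespace = "YARN101";
        System.out.println("bin/yarn-session.sh -z " + namespace);
        System.out.println(String.join(" ",
                build("application_****_****", namespace, "examples/streaming/WindowJoin.jar")));
    }
}
```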

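MapReduce Service (MRS) - How Do I Handle Insufficient ZooKeeper Directory Permissions When a Newly Created Flink User Submits a Job: Answer

1. Check whether the permission of the /flink_base directory in ZooKeeper is 'world,'anyone: cdrwa. If it is not, change the permission of /flink_base to 'world,'anyone: cdrwa and then continue with step 2. If it is, go directly to step 2.
2. In the Flink configuration file, "high-availability.zookeeper.client.acl" defaults to "creator", which means that only the creator of a directory has permission on it. Because an existing user has already used the /flink_base/flink directory on ZooKeeper, the newly created user cannot access that directory. The new user can resolve the problem as follows:
   a. Open the client configuration file "conf/flink-conf.yaml".
   b. Change the ZooKeeper directory set by the "high-availability.zookeeper.path.root" option, for example to /flink2.
   c. Resubmit the job.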
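The configuration change in step 2 can be sketched as a small text rewrite. This is a minimal illustration under stated assumptions: it treats flink-conf.yaml as flat "key: value" lines, the class FlinkConfRootPath is a hypothetical helper, and it works on lines in memory rather than editing a real file.

```java
import java.util.ArrayList;
import java.util.List;

final class FlinkConfRootPath {
    static final String KEY = "high-availability.zookeeper.path.root";

    /** Returns a copy of the lines with the root path entry set to newRoot. */
    static List<String> withRootPath(List<String> lines, String newRoot) {
        List<String> out = new ArrayList<>();
        boolean replaced = false;
        for (String line : lines) {
            // Commented-out entries start with '#' and are left untouched.
            if (line.trim().startsWith(KEY + ":")) {
                out.add(KEY + ": " + newRoot);
                replaced = true;
            } else {
                out.add(line);
            }
        }
        if (!replaced) {
            // The option was absent, so append it.
            out.add(KEY + ": " + newRoot);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> conf = List.of(
                "high-availability.zookeeper.client.acl: creator",
                KEY + ": /flink");
        withRootPath(conf, "/flink2").forEach(System.out::println);
    }
}
```

Giving each user a separate root path sidesteps the "creator" ACL, because the new user creates and therefore owns the new directory.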

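MapReduce Service (MRS) - Development Approach for the Flink HBase Sample Program: Development Approach

Writing to HBase:

1. Specify the parent directory of the "hbase-site.xml" file through a parameter so that the Flink Sink can obtain an HBase Connection.
2. Use the Connection to check whether the table exists, and create the table if it does not.
3. Convert the received data into Put objects and write them to HBase.

Reading from HBase:

1. Specify the parent directory of the "hbase-site.xml" file through a parameter so that the Flink Source can obtain an HBase Connection.
2. Use the Connection to check whether the table exists. If it does not, the job fails, and the table must be created through HBase Shell or by an upstream job.
3. Read the data from HBase, convert each Result into a Row object, and send it to the downstream operator.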

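MapReduce Service (MRS) - Configuring Log Output for HBase Applications: Code Sample

The following is a sample configuration. Each explanation is placed on its own comment line, because text that follows a value on the same line would be read as part of the value.

    # HBase client log output. console: output to the console; RFA: output to the log file.
    hbase.root.logger=INFO,console,RFA
    # HBase client security-related log output. console: output to the console; RFAS: output to the log file.
    hbase.security.logger=DEBUG,console,RFAS
    # Log path. Change it to the actual path; the directory must be writable.
    hbase.log.dir=/var/log/Bigdata/hbase/client/
    # Log file name.
    hbase.log.file=hbase-client.log
    # Log level. Change it to DEBUG if more detailed logs are needed to locate a problem.
    # The process must be restarted for the change to take effect.
    hbase.log.level=INFO
    # Maximum number of log files to keep.
    hbase.log.maxbackupindex=20
    # Security audit appender
    # Audit log file name.
    hbase.security.log.file=hbase-client-audit.log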
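To see how such entries are read, the sketch below loads two lines with java.util.Properties and splits the logger value into its level and appender names. It assumes the file follows standard Java properties rules; the class HBaseLogSettings is a hypothetical helper and does not configure any logging framework. The second case shows that a trailing "//" remark on the same line becomes part of the value, which is why remarks belong on separate "#" lines.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class HBaseLogSettings {
    /** Parses properties-format text held in memory. */
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    /** Splits a value such as "INFO,console,RFA" into trimmed parts. */
    static List<String> loggerParts(String value) {
        List<String> parts = new ArrayList<>();
        for (String part : value.split(",")) {
            parts.add(part.trim());
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        Properties ok = load("# root logger\nhbase.root.logger=INFO,console,RFA\n");
        // First part is the level, the rest are appender names.
        System.out.println(loggerParts(ok.getProperty("hbase.root.logger"))); // [INFO, console, RFA]

        Properties bad = load("hbase.log.level=INFO //log level\n");
        // The remark is not stripped: the value is "INFO //log level".
        System.out.println("[" + bad.getProperty("hbase.log.level") + "]");
    }
}
```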

MapReduce Service (MRS) - How Do I Handle the GLIBC Version Error Reported When a Flink Job Uses RocksDB as the State Backend: Question

When a Flink job is configured with RocksDB as the State Backend, the following error is reported at run time:

    Caused by: java.lang.UnsatisfiedLinkError: /srv/BigData/hadoop/data1/nm/usercache/***/appcache/application_****/rocksdb-lib-****/librocksdbjni-linux64.so: /lib64/libpthread.so.0: version `GLIBC_2.12` not found (required by /srv/BigData/hadoop/***/librocksdbjni-linux64.so)
        at java.lang.ClassLoader$NativeLibrary.load(Native Method)
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1965)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1890)
        at java.lang.Runtime.load0(Runtime.java:795)
        at java.lang.System.load(System.java:1062)
        at org.rocksdb.NativeLibraryLoader.loadLibraryFromJar(NativeLibraryLoader.java:78)
        at org.rocksdb.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:56)
        at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.ensureRocksDBIsLoaded(RocksDBStateBackend.java:734)
        ... 11 more

MapReduce Service (MRS) - Deleting an HBase Table: Code Sample

The following code snippet is in the dropTable method of the "HBaseSample" class in the com.huawei.bigdata.hbase.examples package.

    public void dropTable() {
        LOG.info("Entering dropTable.");
        Admin admin = null;
        try {
            admin = conn.getAdmin();
            if (admin.tableExists(tableName)) {
                // Disable the table before deleting it.
                admin.disableTable(tableName);
                // Delete table.
                admin.deleteTable(tableName); // Note [1]
            }
            LOG.info("Drop table successfully.");
        } catch (IOException e) {
            LOG.error("Drop table failed ", e);
        } finally {
            if (admin != null) {
                try {
                    // Close the Admin object.
                    admin.close();
                } catch (IOException e) {
                    LOG.error("Close admin failed ", e);
                }
            }
        }
        LOG.info("Exiting dropTable.");
    }

MapReduce Service (MRS) - Authentication for Accessing HBase ThriftServer: Sample Code

Code authentication

The following code is in the "TestMain" class of the "com.huawei.bigdata.hbase.examples" package in the "hbase-thrift-example" sample project.

    private static void init() throws IOException {
        // Default load from conf directory
        conf = HBaseConfiguration.create();
        // In Windows environment
        String userdir = TestMain.class.getClassLoader().getResource("conf").getPath() + File.separator; // [1]
        // In Linux environment
        // String userdir = System.getProperty("user.dir") + File.separator + "conf" + File.separator;
        conf.addResource(new Path(userdir + "core-site.xml"), false);
        conf.addResource(new Path(userdir + "hdfs-site.xml"), false);
        conf.addResource(new Path(userdir + "hbase-site.xml"), false);
    }

[1] userdir is the path of the conf directory under the compiled resource path. The core-site.xml, hdfs-site.xml, and hbase-site.xml files used to initialize the configuration, together with the user credential files used for security authentication, must be placed in the "src/main/resources/conf" directory.

Security login

Change "userName" to the actual user name, for example "developuser".

    private static void login() throws IOException {
        if (User.isHBaseSecurityEnabled(conf)) {
            userName = "developuser";
            // In Windows environment
            String userdir = TestMain.class.getClassLoader().getResource("conf").getPath() + File.separator;
            // In Linux environment
            // String userdir = System.getProperty("user.dir") + File.separator + "conf" + File.separator;
            userKeytabFile = userdir + "user.keytab";
            krb5File = userdir + "krb5.conf";
            /*
             * If you need to connect to ZooKeeper, provide the JAAS information for it.
             * You can do so as follows:
             * System.setProperty("java.security.auth.login.config", confDirPath + "jaas.conf");
             * but the demo approach below is more helpful.
             * Note: if this process connects to more than one ZooKeeper cluster,
             * the demo may not be suitable. Contact us for more help.
             */
            LoginUtil.setJaasConf(ZOOKEEPER_DEFAULT_LOGIN_CONTEXT_NAME, userName, userKeytabFile);
            LoginUtil.login(userName, userKeytabFile, krb5File, conf);
        }
    }

Connecting to a ThriftServer instance

    try {
        test = new ThriftSample();
        test.test("10.120.16.170", THRIFT_PORT, conf); // [2]
    } catch (TException | IOException e) {
        LOG.error("Test thrift error", e);
    }

[2] The first argument of test.test() is the IP address of the node that hosts the ThriftServer instance to be accessed. Change it to match the actual cluster, and add the node IP address to the hosts file of the local machine that runs the sample code. "THRIFT_PORT" is the value of the ThriftServer instance configuration parameter "hbase.regionserver.thrift.port".

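MapReduce Service (MRS) - How Do I Handle the Runtime Error Reported When FlinkKafkaProducer010 Is Constructed with a Non-static KafkaPartitioner Class Object: Question

After the Flink kernel is upgraded to 1.3.0, an error is reported at run time when the FlinkKafkaProducer010 constructor is called with an object of a non-static KafkaPartitioner class as a parameter. The error message is as follows:

    org.apache.flink.api.common.InvalidProgramException: The implementation of the FlinkKafkaPartitioner is not serializable. The object probably contains or references non serializable fields.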
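One common way for a non-static class to end up "not serializable" is the implicit reference that every inner class instance holds to its enclosing object. The sketch below reproduces that effect with plain Java serialization only. It is an illustration under stated assumptions: the Partitioner interface is a hypothetical stand-in, not the Flink or Kafka type, and the source does not state that this is the exact cause in the Flink case.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class PartitionerSerializationDemo {
    // Hypothetical stand-in for a serializable partitioner contract.
    interface Partitioner extends Serializable {
        int partition(int key, int numPartitions);
    }

    // Field of the enclosing class, which itself is NOT Serializable.
    private final int salt = 7;

    // Non-static inner class: it reads the outer field, so each instance
    // carries a reference to the enclosing PartitionerSerializationDemo.
    class InnerPartitioner implements Partitioner {
        public int partition(int key, int numPartitions) {
            return Math.floorMod(key + salt, numPartitions);
        }
    }

    // Static nested class: no reference to any enclosing instance.
    static class StaticPartitioner implements Partitioner {
        public int partition(int key, int numPartitions) {
            return Math.floorMod(key, numPartitions);
        }
    }

    Partitioner newInner() {
        return new InnerPartitioner();
    }

    /** Returns true if Java serialization accepts the object graph. */
    static boolean isSerializable(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PartitionerSerializationDemo outer = new PartitionerSerializationDemo();
        System.out.println(isSerializable(outer.newInner()));        // false
        System.out.println(isSerializable(new StaticPartitioner())); // true
    }
}
```

The inner instance fails because serialization follows its hidden reference to the enclosing object, which does not implement Serializable; the static nested class has no such reference.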

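MapReduce Service (MRS) - Development Approach for the FlinkServer REST API Sample Program: Data Planning

1. Prepare the user authentication files: log in to Manager, download the user credentials, and obtain the "user.keytab" and "krb5.conf" files.
2. Prepare the information for the tenant to be created, for example "tenantId" set to "92", "tenantName" set to "test92", and "remark" set to "test tenant remark1".
3. If the sample program runs on Windows, add the host names and IP addresses of all nodes where FlinkServer is located to "C:\Windows\System32\drivers\etc\hosts".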

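MapReduce Service (MRS) - Authentication for Accessing HBase ThriftServer: Scenario

HBase works together with Thrift to provide HBase services to external applications. ThriftServer instances can optionally be deployed when the HBase service is installed. The user that the ThriftServer system uses to access HBase has read, write, execute, create, and admin permissions on all HBase namespaces and tables. Accessing the ThriftServer service also requires Kerberos authentication. HBase implements two Thrift server services; "hbase-thrift-example" here is the invocation implementation for the ThriftServer instance service.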

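MapReduce Service (MRS) - Introduction to the Flink Scala API: Stream Splitting

Table 8 APIs for stream splitting

API: def split(selector: OutputSelector[T]): SplitStream[T]
Description: Takes an OutputSelector whose overridden select method decides how elements are split (that is, how they are tagged), and builds a SplitStream. Each element is tagged with a string, which serves as the selection criterion. Once elements are tagged, a new stream can be created by selecting the elements that carry a given tag.

API: def select(outputNames: String*): DataStream[T]
Description: Selects one or more streams from a SplitStream. outputNames is the sequence of string tags that the split method assigned to the elements.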
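The tag-then-select idea can be sketched with plain Java collections. This is only an illustration of the semantics described above: it does not use the Flink API, the class SplitSelectSketch and its Tagged type are hypothetical, and a finite list stands in for an unbounded stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class SplitSelectSketch {
    /** An element together with the string tags assigned to it. */
    static final class Tagged<T> {
        final T value;
        final Set<String> names;

        Tagged(T value, Set<String> names) {
            this.value = value;
            this.names = names;
        }
    }

    /** Tags every element using the selector, like overriding select in an OutputSelector. */
    static <T> List<Tagged<T>> split(List<T> elements, Function<T, Set<String>> selector) {
        List<Tagged<T>> tagged = new ArrayList<>();
        for (T element : elements) {
            tagged.add(new Tagged<>(element, selector.apply(element)));
        }
        return tagged;
    }

    /** Keeps, in original order, each element that carries at least one requested tag. */
    static <T> List<T> select(List<Tagged<T>> tagged, String... outputNames) {
        Set<String> wanted = new HashSet<>(Arrays.asList(outputNames));
        List<T> out = new ArrayList<>();
        for (Tagged<T> t : tagged) {
            for (String name : t.names) {
                if (wanted.contains(name)) {
                    out.add(t.value);
                    break; // emit the element once even if several tags match
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tagged<Integer>> tagged =
                split(List.of(1, 2, 3, 4), n -> Set.of(n % 2 == 0 ? "even" : "odd"));
        System.out.println(select(tagged, "even"));        // [2, 4]
        System.out.println(select(tagged, "even", "odd")); // [1, 2, 3, 4]
    }
}
```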

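MapReduce Service (MRS) - Introduction to the Flink Scala API: Join

Table 12 APIs for joining streams

API: def join[T2](otherStream: DataStream[T2]): JoinedStreams[T, T2]
Description: Joins two data streams on a given key within a window. The key of the join operation is specified with the where and equalTo methods, so that records from the two streams that satisfy the equality condition are matched.

API: def coGroup[T2](otherStream: DataStream[T2]): CoGroupedStreams[T, T2]
Description: Co-groups two data streams on a given key within a window. The key of the coGroup operation is specified with the where and equalTo methods, so that the two streams are partitioned by the equality condition.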

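MapReduce Service (MRS) - Introduction to the Flink Scala API: Common Flink Interfaces

Flink mainly uses the following classes:

- StreamExecutionEnvironment: the foundation of Flink stream processing; it provides the execution environment of a program.
- DataStream: the special class Flink uses to represent streaming data in a program. It can be thought of as an immutable collection that may contain duplicates, with an unbounded number of elements.
- KeyedStream: the stream produced by applying the keyBy grouping operation to a DataStream; records are grouped by the specified key.
- WindowedStream: the stream produced by applying a window function to a KeyedStream. It sets the window type and defines the window trigger condition, after which operations run on the data in the window.
- AllWindowedStream: the stream produced by applying a window function to a DataStream. It sets the window type and defines the window trigger condition, after which operations run on the data in the window.
- ConnectedStreams: connects two DataStreams while keeping the original data type of each, after which map or flatMap operations are applied.
- JoinedStreams: performs an equi-join on data within a window. The join operation is a special case of the coGroup operation.
- CoGroupedStreams: performs a coGroup operation on data within a window and can implement all join types for streams.

Figure 1 Conversions among Flink stream types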

MapReduce Service (MRS) - Introduction to the Flink Scala API: Setting the Event Time Attribute

Table 6 APIs for setting the event time attribute

API: def assignTimestampsAndWatermarks(assigner: AssignerWithPeriodicWatermarks[T]): DataStream[T]
API: def assignTimestampsAndWatermarks(assigner: AssignerWithPunctuatedWatermarks[T]): DataStream[T]
Description: For event-time windows to trigger their window computation correctly, timestamps must be extracted from the records.

MapReduce Service (MRS) - Sample Program for Accessing the FlinkServer REST API as a Proxy User (Java): Code Sample

The following complete example uses the tenant user "test92" with tenant ID "92" and accesses the API through a proxy on behalf of "flinkserveradmin", a user with FlinkServer administrator permissions.

    public class TestCreateTenants {
        public static void main(String[] args) {
            ParameterTool paraTool = ParameterTool.fromArgs(args);
            final String hostName = paraTool.get("hostName");   // Host name; update the hosts file accordingly
            final String keytab = paraTool.get("keytab");       // Path of user.keytab
            final String krb5 = paraTool.get("krb5");           // Path of krb5.conf
            final String principal = paraTool.get("principal"); // Authentication user

            System.setProperty("java.security.krb5.conf", krb5);
            String url = "https://" + hostName + ":28943/flink/v1/tenants";
            String jsonstr = "{"
                    + "\n\t \"tenantId\":\"92\","
                    + "\n\t \"tenantName\":\"test92\","
                    + "\n\t \"remark\":\"test tenant remark1\","
                    + "\n\t \"updateUser\":\"test_updateUser1\","
                    + "\n\t \"createUser\":\"test_createUser1\""
                    + "\n}";

            try {
                // First log in as the FlinkServer administrator.
                LoginClient.getInstance().setConfigure(url, principal, keytab, "");
                LoginClient.getInstance().login();

                // Call the proxy user API to obtain the token of the common user.
                String proxyUrl = "https://" + hostName + ":28943/flink/v1/proxyUserLogin";
                String result = HttpClientUtil.doPost(proxyUrl,
                        "{\n" + "\t\"realUser\": \"flinkserveradmin\"\n" + "}", "utf-8", true);
                Gson gson = new Gson();
                JsonObject jsonObject = gson.fromJson(result, JsonObject.class);
                String token = jsonObject.get("result").toString();
                token = "hadoop_auth=" + token;

                System.out.println(HttpClientUtil.doPost(url, jsonstr, "utf-8", true, token));
            } catch (Exception e) {
                System.out.println(e);
            }
        }
    }

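MapReduce Service (MRS) - Operating Namespaces Through the REST Interface: Function Overview

Using the REST service, pass in a URL composed of the corresponding host and port together with the specified namespace, and use the HTTPS protocol to create, query, or delete the namespace, or to list the tables in it.

HBase tables are stored in the format "namespace:table name". If no namespace is specified when a table is created, the table is stored in "default". The "hbase" namespace is reserved for system tables; do not create service tables in it or read and write data there.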
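The URL composition and naming rule described above can be sketched as follows. This is a minimal illustration, assuming the standard Apache HBase REST paths /namespaces/<namespace> and /namespaces/<namespace>/tables; the class NamespaceRestUrls, the host name, and the port are hypothetical and must be replaced with the values of the actual REST server. The code only builds strings and sends no request.

```java
final class NamespaceRestUrls {
    /** URL used to create (POST), query (GET), or delete (DELETE) a namespace. */
    static String namespaceUrl(String host, int port, String namespace) {
        return "https://" + host + ":" + port + "/namespaces/" + namespace;
    }

    /** URL used to list (GET) the tables in a namespace. */
    static String tablesUrl(String host, int port, String namespace) {
        return namespaceUrl(host, port, namespace) + "/tables";
    }

    /** "namespace:table"; falls back to "default" when no namespace is given. */
    static String qualifiedTableName(String namespace, String table) {
        String ns = (namespace == null || namespace.isEmpty()) ? "default" : namespace;
        return ns + ":" + table;
    }

    /** The "hbase" namespace holds system tables and should not be used for service data. */
    static boolean isSystemNamespace(String namespace) {
        return "hbase".equals(namespace);
    }

    public static void main(String[] args) {
        // Hypothetical host and port, for illustration only.
        System.out.println(namespaceUrl("rest-host", 21309, "testNs"));
        System.out.println(tablesUrl("rest-host", 21309, "testNs"));
        System.out.println(qualifiedTableName("", "user_info")); // default:user_info
    }
}
```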

MapReduce Service (MRS) - Reading HBase Table Data Using a Filter: Notes

Secondary indexes currently do not support objects of the SubstringComparator class as the comparator of a Filter. For example, the following usage is currently not supported:

    Scan scan = new Scan();
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filterList.addFilter(new SingleColumnValueFilter(
            Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier),
            CompareOperator.EQUAL, new SubstringComparator(substring)));
    scan.setFilter(filterList);

MapReduce Service (MRS) - Sample Program for HBase Access to Multiple ZooKeepers: Code Sample

The following code snippet is in the "TestZKSample" class of the "hbase-zk-example\src\main\java\com\huawei\hadoop\hbase\example" package. Pay attention mainly to the "login" and "connectApacheZK" methods.

    private static void login(String keytabFile, String principal) throws IOException {
        conf = HBaseConfiguration.create();
        // In Windows environment
        String confDirPath = TestZKSample.class.getClassLoader().getResource("").getPath() + File.separator; // [1]
        // In Linux environment
        // String confDirPath = System.getProperty("user.dir") + File.separator + "conf" + File.separator;

        // Set zoo.cfg for hbase to connect to fi zookeeper.
        conf.set("hbase.client.zookeeper.config.path", confDirPath + "zoo.cfg");

        if (User.isHBaseSecurityEnabled(conf)) {
            // jaas.conf file, it is included in the client package file
            System.setProperty("java.security.auth.login.config", confDirPath + "jaas.conf");
            // set the kerberos server info, point to the kerberos client
            System.setProperty("java.security.krb5.conf", confDirPath + "krb5.conf");
            // set the keytab file name
            conf.set("username.client.keytab.file", confDirPath + keytabFile);
            // set the user's principal
            try {
                conf.set("username.client.kerberos.principal", principal);
                User.login(conf, "username.client.keytab.file", "username.client.kerberos.principal",
                        InetAddress.getLocalHost().getCanonicalHostName());
            } catch (IOException e) {
                throw new IOException("Login failed.", e);
            }
        }
    }

    private void connectApacheZK() throws IOException, org.apache.zookeeper.KeeperException {
        try {
            // Create apache zookeeper connection.
            ZooKeeper digestZk = new ZooKeeper("127.0.0.1:2181", 60000, null);
            LOG.info("digest directory：{}", digestZk.getChildren("/", null));
            LOG.info("Successfully connect to apache zookeeper.");
        } catch (InterruptedException e) {
            LOG.error("Found error when connect apache zookeeper ", e);
        }
    }

MapReduce Service (MRS) - Reading HBase Table Data Using a Filter: Code Sample

The following code snippet is in the testSingleColumnValueFilter method of the "HBaseSample" class in the com.huawei.bigdata.hbase.examples package.

    public void testSingleColumnValueFilter() {
        LOG.info("Entering testSingleColumnValueFilter.");
        Table table = null;
        ResultScanner rScanner = null;
        try {
            table = conn.getTable(tableName);
            Scan scan = new Scan();
            scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
            // Set the filter criteria.
            SingleColumnValueFilter filter = new SingleColumnValueFilter(
                    Bytes.toBytes("info"), Bytes.toBytes("name"),
                    CompareOperator.EQUAL, Bytes.toBytes("Xu Bing"));
            scan.setFilter(filter);
            // Submit a scan request.
            rScanner = table.getScanner(scan);
            // Print query results.
            for (Result r = rScanner.next(); r != null; r = rScanner.next()) {
                for (Cell cell : r.rawCells()) {
                    LOG.info("{}:{},{},{}",
                            Bytes.toString(CellUtil.cloneRow(cell)),
                            Bytes.toString(CellUtil.cloneFamily(cell)),
                            Bytes.toString(CellUtil.cloneQualifier(cell)),
                            Bytes.toString(CellUtil.cloneValue(cell)));
                }
            }
            LOG.info("Single column value filter successfully.");
        } catch (IOException e) {
            LOG.error("Single column value filter failed ", e);
        } finally {
            if (rScanner != null) {
                // Close the scanner object.
                rScanner.close();
            }
            if (table != null) {
                try {
                    // Close the HTable object.
                    table.close();
                } catch (IOException e) {
                    LOG.error("Close table failed ", e);
                }
            }
        }
        LOG.info("Exiting testSingleColumnValueFilter.");
    }

MapReduce Service (MRS) - Creating an HBase Secondary Index: Notes

Note [1]: Creating a composite index.

HBase supports creating a secondary index on multiple fields, for example on the columns name and age.

    HIndexSpecification iSpecUnite = new HIndexSpecification(indexName);
    iSpecUnite.addIndexColumn(new HColumnDescriptor("info"), "name", ValueType.String);
    iSpecUnite.addIndexColumn(new HColumnDescriptor("info"), "age", ValueType.String);

MapReduce Service (MRS) - PyFlink Sample Program Code Description: Code Sample for Submitting a Flink SQL Job to Yarn Through the Python API

The main logic of pyflink-sql.py is listed below as a demonstration. Before submitting, make sure that "file_path" is the path of the SQL file to run; a full path is recommended. For the complete code, see "pyflink-sql.py" in "flink-examples/pyflink-example/pyflink-sql".

    import logging
    import sys
    import os
    from pyflink.table import (EnvironmentSettings, TableEnvironment)


    def read_sql(file_path):
        if not os.path.isfile(file_path):
            raise TypeError(file_path + " does not exist")
        all_the_text = open(file_path).read()
        return all_the_text


    def exec_sql():
        # Change the SQL path before submitting
        file_path = "datagen2kafka.sql"
        sql = read_sql(file_path)
        t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
        statement_set = t_env.create_statement_set()
        sqlArr = sql.split(";")
        for sqlStr in sqlArr:
            sqlStr = sqlStr.strip()
            if sqlStr.lower().startswith("create"):
                print("---------create---------------")
                print(sqlStr)
                t_env.execute_sql(sqlStr)
            if sqlStr.lower().startswith("insert"):
                print("---------insert---------------")
                print(sqlStr)
                statement_set.add_insert_sql(sqlStr)
        statement_set.execute()


    if __name__ == '__main__':
        logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
        exec_sql()

Table 2 Parameters for submitting a SQL job using Python

Parameter: file_path
Description: Path of the "datagen2kafka.sql" file; a full path is recommended. Obtain the file from the sample code and upload it to the specified directory on the client.
Note: when the job is to be submitted in yarn-application mode, replace the path as follows:
    file_path = os.getcwd() + "/../../../../yarnship/datagen2kafka.sql"
Example: file_path = /<client installation directory>/Flink/flink/datagen2kafka.sql

SQL example:

    create table kafka_sink (
      uuid varchar(20),
      name varchar(10),
      age int,
      ts timestamp(3),
      p varchar(20)
    ) with (
      'connector' = 'kafka',
      'topic' = 'input2',
      'properties.bootstrap.servers' = '<service IP address of the Kafka Broker instance>:<Kafka port>',
      'properties.group.id' = 'testGroup2',
      'scan.startup.mode' = 'latest-offset',
      'format' = 'json'
    );
    create TABLE datagen_source (
      uuid varchar(20),
      name varchar(10),
      age int,
      ts timestamp(3),
      p varchar(20)
    ) WITH (
      'connector' = 'datagen',
      'rows-per-second' = '1'
    );
    INSERT INTO kafka_sink
    SELECT *
    FROM datagen_source;
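The statement handling in exec_sql (split the script on ";", run "create" statements immediately, collect "insert" statements for one combined execution) can be illustrated on its own, without PyFlink. The following sketch restates only that splitting and classification logic in Java; the class SqlScriptSplitter is a hypothetical helper, and like the sample it assumes that no ";" appears inside a string literal or comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SqlScriptSplitter {
    /** Splits a script on ';', trims each statement, and drops empty ones. */
    static List<String> statements(String script) {
        List<String> out = new ArrayList<>();
        for (String part : script.split(";")) {
            String stmt = part.trim();
            if (!stmt.isEmpty()) {
                out.add(stmt);
            }
        }
        return out;
    }

    /** Returns "create", "insert", or "other" based on the leading keyword. */
    static String kind(String statement) {
        String lower = statement.toLowerCase(Locale.ROOT);
        if (lower.startsWith("create")) {
            return "create";
        }
        if (lower.startsWith("insert")) {
            return "insert";
        }
        return "other";
    }

    public static void main(String[] args) {
        String script = "create table a (x int);\ncreate TABLE b (x int);\nINSERT INTO a SELECT * FROM b;\n";
        for (String stmt : statements(script)) {
            System.out.println(kind(stmt) + ": " + stmt);
        }
    }
}
```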

MapReduce Service (MRS) - PyFlink Sample Program Code Description: Code Sample for Submitting a Flink Kafka Read/Write Job to Yarn Through the Python API

The main logic of pyflink-kafka.py is listed below as a demonstration. Before submitting, make sure that "file_path" is the path of the SQL file to run; a full path is recommended. For the complete code, see "pyflink-kafka.py" in "flink-examples/pyflink-example/pyflink-kafka".

    import os
    import logging
    import sys
    from pyflink.common import JsonRowDeserializationSchema, JsonRowSerializationSchema
    from pyflink.common.typeinfo import Types
    from pyflink.datastream.connectors import FlinkKafkaProducer, FlinkKafkaConsumer
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import TableEnvironment, EnvironmentSettings


    def read_sql(file_path):
        if not os.path.isfile(file_path):
            raise TypeError(file_path + " does not exist")
        all_the_text = open(file_path).read()
        return all_the_text


    def exec_sql():
        # Change the SQL path before submitting
        # file_path = "/opt/client/Flink/flink/insertData2kafka.sql"
        # file_path = os.getcwd() + "/../../../../yarnship/insertData2kafka.sql"
        # file_path = "/opt/client/Flink/flink/conf/ssl/insertData2kafka.sql"
        file_path = "insertData2kafka.sql"
        sql = read_sql(file_path)
        t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
        statement_set = t_env.create_statement_set()
        sqlArr = sql.split(";")
        for sqlStr in sqlArr:
            sqlStr = sqlStr.strip()
            if sqlStr.lower().startswith("create"):
                print("---------create---------------")
                print(sqlStr)
                t_env.execute_sql(sqlStr)
            if sqlStr.lower().startswith("insert"):
                print("---------insert---------------")
                print(sqlStr)
                statement_set.add_insert_sql(sqlStr)
        statement_set.execute()


    def read_write_kafka():
        # find kafka connector jars
        env = StreamExecutionEnvironment.get_execution_environment()
        env.set_parallelism(1)
        specific_jars = "file:///opt/client/Flink/flink/lib/flink-connector-kafka-xxx.jar"
        # specific_jars = "file://" + os.getcwd() + "/../../../../yarnship/flink-connector-kafka-xxx.jar"
        # specific_jars = "file:///opt/client/Flink/flink/conf/ssl/flink-connector-kafka-xxx.jar"
        # the sql connector for kafka is used here as it's a fat jar and could avoid dependency issues
        env.add_jars(specific_jars)
        kafka_properties = {'bootstrap.servers': '192.168.20.162:21005', 'group.id': 'test_group'}
        deserialization_schema = JsonRowDeserializationSchema.builder() \
            .type_info(type_info=Types.ROW([Types.INT(), Types.STRING()])).build()
        kafka_consumer = FlinkKafkaConsumer(
            topics='test_source_topic',
            deserialization_schema=deserialization_schema,
            properties=kafka_properties)
        print("---------read ---------------")
        ds = env.add_source(kafka_consumer)
        serialization_schema = JsonRowSerializationSchema.builder().with_type_info(
            type_info=Types.ROW([Types.INT(), Types.STRING()])).build()
        kafka_producer = FlinkKafkaProducer(
            topic='test_sink_topic',
            serialization_schema=serialization_schema,
            producer_config=kafka_properties)
        print("--------write------------------")
        ds.add_sink(kafka_producer)
        env.execute("pyflink kafka test")


    if __name__ == '__main__':
        logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
        print("------------------insert data to kafka----------------")
        exec_sql()
        print("------------------read_write_kafka----------------")
        read_write_kafka()

Table 1 Parameters for submitting a common job using Python

Parameter: bootstrap.servers
Description: Service IP address and port of the Kafka Broker instance.
Example: 192.168.12.25:21005

Parameter: specific_jars
Description: Path of the "<client installation directory>/Flink/flink/lib/flink-connector-kafka-*.jar" package; a full path is recommended.
Note: when the job is to be submitted in yarn-application mode, replace the path as follows, using the actual JAR version number:
    specific_jars="file://"+os.getcwd()+"/../../../../yarnship/flink-connector-kafka-1.15.0-h0.cbu.mrs.330.r13.jar"
Example: specific_jars = file:///<client installation directory>/Flink/flink/lib/flink-connector-kafka-1.15.0-h0.cbu.mrs.330.r13.jar

Parameter: file_path
Description: Path of the "insertData2kafka.sql" file; a full path is recommended. Obtain the file from the sample code and upload it to the specified directory on the client.
Note: when the job is to be submitted in yarn-application mode, replace the path as follows:
    file_path = os.getcwd() + "/../../../../yarnship/insertData2kafka.sql"
Example: file_path = /<client installation directory>/Flink/flink/insertData2kafka.sql

SQL example:

    create table kafka_sink_table (
      age int,
      name varchar(10)
    ) with (
      'connector' = 'kafka',
      'topic' = 'test_source_topic', -- Kafka topic to write to; must match the topic in the Python file above
      'properties.bootstrap.servers' = '<service IP address of the Kafka Broker instance>:<Kafka port>',
      'properties.group.id' = 'test_group',
      'format' = 'json'
    );
    create TABLE datagen_source_table (
      age int,
      name varchar(10)
    ) WITH (
      'connector' = 'datagen',
      'rows-per-second' = '1'
    );
    INSERT INTO kafka_sink_table
    SELECT *
    FROM datagen_source_table;

MapReduce Service (MRS) - Inserting Data into a Phoenix Table: Code Sample

The following code snippet is in the testPut method of the "PhoenixSample" class in the com.huawei.bigdata.hbase.examples package.

    /**
     * Put data
     */
    public void testPut() {
        LOG.info("Entering testPut.");
        String url = "jdbc:phoenix:" + conf.get("hbase.zookeeper.quorum");
        // Insert
        String upsertSQL =
                "UPSERT INTO TEST VALUES(1,'John','100000', TO_DATE('1980-01-01','yyyy-MM-dd'))";
        // props: connection properties, assumed to be defined elsewhere in the class.
        try (Connection conn = DriverManager.getConnection(url, props);
                Statement stat = conn.createStatement()) {
            // Execute Update SQL
            stat.executeUpdate(upsertSQL);
            conn.commit();
            LOG.info("Put successfully.");
        } catch (Exception e) {
            LOG.error("Put failed.", e);
        }
        LOG.info("Exiting testPut.");
    }

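MapReduce Service (MRS) - What Are the Application Scenarios of BulkLoad and Put: Answer

bulkload generates HFiles directly by starting a MapReduce job and then registers the HFiles with HBase. Misusing bulkload therefore consumes extra cluster memory and CPU for the MapReduce job, and it may also produce a large number of very small HFiles that frequently trigger compaction, causing query speed to drop sharply. Misusing put makes data loading slow, and when the memory allocated to a RegionServer is insufficient, it can cause the RegionServer to run out of memory and the process to exit.

The scenarios that suit bulkload and put are as follows.

bulkload is suitable when:

- A large amount of data is loaded into HBase in one go.
- The reliability requirement for loading data into HBase is not high, and WAL files do not need to be generated.
- Loading a large amount of data with put has become slow, and queries have become slow as well.
- The size of each newly generated HFile is close to the HDFS block size.

put is suitable when:

- The amount of data loaded into a single Region each time is less than half of the HDFS block size.
- Data must be loaded in real time.
- Loading the data does not cause user query speed to drop sharply.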
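The criteria above can be condensed into a rough decision helper. The following is only a sketch under stated assumptions, not an official rule: the class LoadMethodChooser is hypothetical, it considers just three of the listed criteria (real-time need, WAL need, and per-Region load size relative to the HDFS block size), and treating everything at or above half a block as a bulkload candidate is a simplification.

```java
final class LoadMethodChooser {
    enum Method { PUT, BULKLOAD }

    /**
     * Picks a loading method from a simplified reading of the criteria:
     * real-time or WAL-backed loads use put; small per-Region loads use put;
     * everything else is treated as a bulkload candidate (assumption).
     */
    static Method choose(long bytesPerRegionPerLoad, long hdfsBlockSizeBytes,
                         boolean realTime, boolean walRequired) {
        if (realTime || walRequired) {
            return Method.PUT;
        }
        if (bytesPerRegionPerLoad < hdfsBlockSizeBytes / 2) {
            return Method.PUT;
        }
        return Method.BULKLOAD;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // example HDFS block size of 128 MB
        System.out.println(choose(10L * 1024 * 1024, blockSize, false, false));  // PUT
        System.out.println(choose(120L * 1024 * 1024, blockSize, false, false)); // BULKLOAD
        System.out.println(choose(120L * 1024 * 1024, blockSize, true, false));  // PUT
    }
}
```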

MapReduce Service (MRS) - Creating an HBase Client Connection: Code Sample

The following code snippet is an example of logging in, creating a Connection, and creating a table. It is in the HBaseSample method (the constructor) of the "HBaseSample" class in the com.huawei.bigdata.hbase.examples package.

    private TableName tableName = null;
    private Connection conn = null;

    public HBaseSample(Configuration conf) throws IOException {
        this.tableName = TableName.valueOf("hbase_sample_table");
        this.conn = ConnectionFactory.createConnection(conf);
    }

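MapReduce Service (MRS) - Introduction to the HBase Application Development Process

This document mainly covers HBase application development based on the Java API. Each phase of the development process is described in Figure 1 and Table 1.

Figure 1 HBase application development process

Table 1 Description of the HBase application development process (phase, description, reference)

1. Understand the basic concepts. Before developing an application, understand the basic concepts of HBase, understand the scenario requirements, and design the tables. Reference: "Common HBase Concepts".
2. Prepare the development and runtime environment. Java is currently the recommended language for developing HBase applications, and IntelliJ IDEA can be used as the tool. The HBase runtime environment is the HBase client; install and configure the client according to the guide. Reference: "Preparing the HBase Application Development and Runtime Environment".
3. Prepare the project. HBase provides sample programs for different scenarios; you can import a sample project to study the programs. Reference: "Importing and Configuring the HBase Sample Project".
4. Prepare security authentication. If you use a security cluster, security authentication is required. Reference: "Configuring Security Authentication for HBase Applications".
5. Develop the project based on the scenario. Java sample projects are provided that cover the whole process from creating a table and writing data to deleting the table. Reference: "Developing an HBase Application".
6. Compile and run the program. Guides you through compiling the developed program and submitting it for running. Reference: "Commissioning an HBase Application".
7. View the program running results. The results are written to the path you specify. You can also check how the application is running through the UI.

Parent topic: HBase Development Guide (Security Mode)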

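MapReduce Service (MRS) - Creating an HBase Client Connection: Function Description

HBase creates a Connection object through the ConnectionFactory.createConnection(configuration) method. The argument is the Configuration created in the previous step.

A Connection encapsulates the underlying connections to the actual servers and the connection to ZooKeeper. It is instantiated through the ConnectionFactory class. Creating a Connection is a heavyweight operation, and a Connection is thread-safe, so multiple client threads can share one Connection.

In typical usage, a client program shares a single Connection, and each thread obtains its own Admin or Table instance and then calls the operations that the Admin or Table object provides. Caching or pooling Table and Admin instances is not recommended. The lifecycle of a Connection is managed by the caller, who releases its resources by calling close().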

MapReduce Service (MRS) - FlinkServer REST API Sample Program (Java): Code Sample

For the complete code, see com.huawei.bigdata.flink.examples.TestCreateTenants.

    public class TestCreateTenants {
        public static void main(String[] args) {
            ParameterTool paraTool = ParameterTool.fromArgs(args);
            final String hostName = paraTool.get("hostName");   // Host name; update the hosts file accordingly
            final String keytab = paraTool.get("keytab");       // Path of user.keytab
            final String krb5 = paraTool.get("krb5");           // Path of krb5.conf
            final String principal = paraTool.get("principal"); // Authentication user name

            System.setProperty("java.security.krb5.conf", krb5);
            String url = "https://" + hostName + ":28943/flink/v1/tenants";
            String jsonstr = "{"
                    + "\n\t \"tenantId\":\"92\","
                    + "\n\t \"tenantName\":\"test92\","
                    + "\n\t \"remark\":\"test tenant remark1\","
                    + "\n\t \"updateUser\":\"test_updateUser1\","
                    + "\n\t \"createUser\":\"test_createUser1\""
                    + "\n}";

            try {
                LoginClient.getInstance().setConfigure(url, principal, keytab, "");
                LoginClient.getInstance().login();
                System.out.println(HttpClientUtil.doPost(url, jsonstr, "utf-8", true));
            } catch (Exception e) {
                System.out.println(e);
            }
        }
    }

MapReduce Service (MRS) - Initializing the HBase Configuration: Code Sample

The following code snippet is in the init method of the "TestMain" class in the com.huawei.bigdata.hbase.examples package.

    private static void init() throws IOException {
        // Default load from conf directory
        conf = HBaseConfiguration.create();
        // In Windows environment
        String userdir = TestMain.class.getClassLoader().getResource("conf").getPath() + File.separator;
        // In Linux environment
        // String userdir = System.getProperty("user.dir") + File.separator + "conf" + File.separator;
        conf.addResource(new Path(userdir + "core-site.xml"), false);
        conf.addResource(new Path(userdir + "hdfs-site.xml"), false);
        conf.addResource(new Path(userdir + "hbase-site.xml"), false);
    }
