Background: Kudu is nice to use and I wanted to work with it directly from Hive, which apparently is only supported starting with Hive 4.0.0; the other motivation was simply to test. There are few posts online about this kind of upgrade; some people have upgraded by compiling the Hive 4.0.0 source themselves, but I don't know Java.
This upgrade is purely for testing and research, so the simple steps are covered briefly; please bear with any rough edges.
Steps:
- Download apache-hive-4.0.0-alpha-2-bin.tar.gz. If you have the energy you can download the source and build it yourself. One side note: when building yourself, you cannot change the Hadoop, HBase, Kafka, or ZooKeeper versions; that would require custom development work.
- Extract it.
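A download-and-extract sketch; the archive URL below is my assumption, so verify it against the official Apache Hive download page:
wget https://archive.apache.org/dist/hive/hive-4.0.0-alpha-2/apache-hive-4.0.0-alpha-2-bin.tar.gz
tar -xzf apache-hive-4.0.0-alpha-2-bin.tar.gz -C /root/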
- Stop the Hive services and back up the metastore database.
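A minimal backup sketch, assuming a MySQL metastore database named hive and a root account (adjust the database name and credentials to your cluster):
mysqldump -u root -p hive > hive_metastore_backup_$(date +%F).sql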
The following operations must be performed on every server related to Hive.
- On every server that uses Hive (service or gateway), create a new directory named lib400 under /opt/cloudera/parcels/CDH/lib/hive.
- Copy everything under apache-hive-4.0.0-alpha-2-bin/lib into /opt/cloudera/parcels/CDH/lib/hive/lib400 (a consolidated sketch of these steps follows the commands below).
- Go to /opt/cloudera/parcels/CDH/lib/hive/lib400 and replace the bundled ZooKeeper jar with the CDH one:
rm -f zookeeper*
ln -s ../../jars/zookeeper-3.4.5-cdh6.2.1.jar zookeeper.jar
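Putting the lib400 steps together (assuming the tarball was unpacked under /root, as in the SOURCE paths later in this post):
mkdir /opt/cloudera/parcels/CDH/lib/hive/lib400
cp /root/apache-hive-4.0.0-alpha-2-bin/lib/* /opt/cloudera/parcels/CDH/lib/hive/lib400/
cd /opt/cloudera/parcels/CDH/lib/hive/lib400
rm -f zookeeper*
ln -s ../../jars/zookeeper-3.4.5-cdh6.2.1.jar zookeeper.jar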
- Modify the following files under apache-hive-4.0.0-alpha-2-bin/bin:
beeline:
. "$bin"/hive --service beeline -u "jdbc:hive2://hiveservice2:10000" user passwd "$@"
Here, name is the user of the Hive metastore database; for example with MySQL, name is a user that can access the Hive metastore database in MySQL, and passwd likewise is that user's password.
hive:
HIVE_LIB=${HIVE_HOME}/lib400
- mv /opt/cloudera/parcels/CDH/lib/hive/bin /opt/cloudera/parcels/CDH/lib/hive/binbak
- Copy apache-hive-4.0.0-alpha-2-bin/bin to /opt/cloudera/parcels/CDH/lib/hive/, for example:
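Again assuming the tarball was unpacked under /root:
cp -r /root/apache-hive-4.0.0-alpha-2-bin/bin /opt/cloudera/parcels/CDH/lib/hive/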
- Modify /opt/cloudera/parcels/CDH/bin/hive:
exec $LIB_DIR/hive/bin/hive --service beeline -u "jdbc:hive2://hiveservice2:10000" name passwd "$@"
- Upgrade the metastore database:
Modify apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-2.1.0-to-2.2.0.mysql.sql so that its SOURCE statements use absolute paths:
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/037-HIVE-14496.mysql.sql;
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/038-HIVE-10562.mysql.sql;
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/039-HIVE-12274.mysql.sql;
Likewise for apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-2.2.0-to-2.3.0.mysql.sql:
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/040-HIVE-16399.mysql.sql;
Then, from inside the database (the MySQL client), run the upgrade scripts in order (a sketch of connecting and running them follows this list):
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-2.1.0-to-2.2.0.mysql.sql
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-2.2.0-to-2.3.0.mysql.sql
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-2.3.0-to-3.0.0.mysql.sql
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-3.0.0-to-3.1.0.mysql.sql
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-3.1.0-to-3.2.0.mysql.sql
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-3.2.0-to-4.0.0-alpha-1.mysql.sql
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-4.0.0-alpha-1-to-4.0.0-alpha-2.mysql.sql
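A sketch of running the upgrade from the MySQL client (the metastore database name hive and the root account are assumptions; substitute your own):
mysql -u root -p
mysql> USE hive;
mysql> SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-2.1.0-to-2.2.0.mysql.sql
Then run the remaining upgrade scripts listed above in the same way, in order, finishing with upgrade-4.0.0-alpha-1-to-4.0.0-alpha-2.mysql.sql.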
At this point the basic upgrade is done. However, starting Hive throws an error:
Caused by: java.lang.IllegalArgumentException: hive configuration hive.query.redaction.rules does not exists.
- One more place needs to be modified:
/opt/cloudera/parcels/CDH/lib/hive/conf/hive-env.sh
Comment out the export HIVE_OPTS line.
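Roughly what the commented-out line looks like (the actual flags inside HIVE_OPTS are generated by Cloudera Manager and vary per cluster; the values below are placeholders, not the literal content):
# export HIVE_OPTS="${HIVE_OPTS} -hiveconf hive.query.redaction.rules=... -hiveconf hive.exec.query.redactor.hooks=..."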
Afterwards, executing SQL logs a WARN:
WARN : WARNING! Query command could not be redacted.java.lang.RuntimeException: Error loading hooks(hive.exec.query.redactor.hooks): java.lang.ClassNotFoundException: org.cloudera.hadoop.hive.ql.hooks.QueryRedactor
I have not found a fix for this, but it does not affect queries.
Every time the configuration is updated, Hive is restarted, or the client configuration is redeployed, you have to make this change again and restart once more, which is very annoying. I have not yet found the source from which CDH pushes these configuration files; once found, it should be enough to modify that source directly.
- Restart Hive.
Test it:
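A minimal smoke test, assuming the wrapper modifications above (so that running hive connects through Beeline automatically):
hive
0: jdbc:hive2://hiveservice2:10000> show databases;
0: jdbc:hive2://hiveservice2:10000> select version();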
At this point, the upgrade is complete. Give the blogger a chicken drumstick!
Notes:
- When upgrading Hive to 4.0, do not do it like a 2.x or 3.x upgrade where you only copy the Hive jars; it will not start.
- The changes to the hive and beeline executables are there so you do not have to type credentials every time.
- I do not recommend upgrading a production cluster; if you want to anyway, back everything up first.
- Replace the jar under Spark's /opt/cloudera/parcels/CDH/lib/spark/hive/ with hive-exec-4.0.0-alpha-2.jar (a sketch follows after these notes).
- After the upgrade, the hive command line is Beeline, so you will need to learn the Beeline commands; hive -e, hive -f and so on still work the same, provided you make the same modifications as I did.
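For the Spark jar swap mentioned above, one possible approach (the existing jar name and the backup location are assumptions; check the directory before removing anything):
mkdir -p /root/spark-hive-jar-bak
mv /opt/cloudera/parcels/CDH/lib/spark/hive/hive-exec-*.jar /root/spark-hive-jar-bak/
cp /root/apache-hive-4.0.0-alpha-2-bin/lib/hive-exec-4.0.0-alpha-2.jar /opt/cloudera/parcels/CDH/lib/spark/hive/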
As for Hive Kudu integration, I decided to give up; Impala works well enough. [wry smile]
Half an hour later, I came across an article:
Apache Hive 中文手册 - Hive Kudu 整合 | Docs4dev. It turns out I had misunderstood: I thought Hive could create Kudu tables directly, but it is actually a mapping relationship.
- Create a Kudu table in Impala:
CREATE TABLE kudu_table (foo INT, bar STRING, baz DOUBLE, PRIMARY KEY (foo))
PARTITION BY HASH PARTITIONS 4
STORED AS kudu
TBLPROPERTIES (
"kudu.master_addresses"="172.10.62.131:7051"
);
- Find the corresponding Kudu table name:
[root@172-10-62-132 hive]# kudu table list 172.10.62.131
impala::pg_service_profile.kudu_table
[root@172-10-62-132 hive]#
- Create the mapping table for the Kudu table in Hive (note: the table name must either be different, or the table must live in another database, otherwise you get a "table already exists" error):
CREATE EXTERNAL TABLE default.kudu_table (foo INT, bar STRING, baz DOUBLE)
STORED BY 'org.apache.hadoop.hive.kudu.KuduStorageHandler'
TBLPROPERTIES (
"kudu.table_name"="impala::pg_service_profile.kudu_table",
"kudu.master_addresses"="172.10.62.131:7051"
);
- Insert data into the Kudu table from Impala:
INSERT INTO TABLE kudu_table
VALUES (1, 'test 1', 1.1), (2, 'test 2', 2.2);
- Query the data from Hive:
INFO : Compiling command(queryId=hive_20230105155029_f5343a3d-cf67-41fc-ac65-921b669088b8): select * from kudu_table
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:kudu_table.foo, type:int, comment:null), FieldSchema(name:kudu_table.bar, type:string, comment:null), FieldSchema(name:kudu_table.baz, type:double, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20230105155029_f5343a3d-cf67-41fc-ac65-921b669088b8); Time taken: 0.768 seconds
INFO : Operation QUERY obtained 2 locks
INFO : Executing command(queryId=hive_20230105155029_f5343a3d-cf67-41fc-ac65-921b669088b8): select * from kudu_table
INFO : Completed executing command(queryId=hive_20230105155029_f5343a3d-cf67-41fc-ac65-921b669088b8); Time taken: 0.001 seconds
+-----------------+-----------------+-----------------+
| kudu_table.foo | kudu_table.bar | kudu_table.baz |
+-----------------+-----------------+-----------------+
| 1 | test 1 | 1.1 |
| 2 | test 2 | 2.2 |
+-----------------+-----------------+-----------------+
2 rows selected (1.943 seconds)
0: jdbc:hive2://172.10.62.132:10000>
Excellent! Give the blogger a beef shank!
Another half hour later, it occurred to me: if Impala can insert data and Hive can read it, then data inserted from Hive should also be visible in Impala. The answer is yes, but note that although the Hive table declares no primary key, the Kudu primary key is still enforced. Here is the experiment:
- Insert data from Hive:
INSERT INTO TABLE kudu_table
VALUES (3, 'test 3', 1.3), (4, 'test 4', 2.4);
Log:
INFO : Compiling command(queryId=hive_20230105160307_05a7066a-165d-4c69-a9d1-7752e5cf9561): INSERT INTO TABLE kudu_table
VALUES (3, 'test 3', 1.3), (4, 'test 4', 2.4)
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:int, comment:null), FieldSchema(name:_col1, type:string, comment:null), FieldSchema(name:_col2, type:double, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20230105160307_05a7066a-165d-4c69-a9d1-7752e5cf9561); Time taken: 0.377 seconds
INFO : Operation QUERY obtained 4 locks
INFO : Executing command(queryId=hive_20230105160307_05a7066a-165d-4c69-a9d1-7752e5cf9561): INSERT INTO TABLE kudu_table
VALUES (3, 'test 3', 1.3), (4, 'test 4', 2.4)
INFO : Query ID = hive_20230105160307_05a7066a-165d-4c69-a9d1-7752e5cf9561
INFO : Total jobs = 1
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Starting task [Stage-1:DDL] in serial mode
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-2:MAPRED] in serial mode
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_1670221063369_0023
INFO : Executing with tokens: []
INFO : The url to track the job: http://172-10-62-131.lightspeed.moblal.sbcglobal.net:8088/proxy/application_1670221063369_0023/
INFO : Starting Job = job_1670221063369_0023, Tracking URL = http://172-10-62-131.lightspeed.moblal.sbcglobal.net:8088/proxy/application_1670221063369_0023/
INFO : Kill Command = /opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/bin/mapred job -kill job_1670221063369_0023
INFO : Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
INFO : 2023-01-05 16:03:25,992 Stage-2 map = 0%, reduce = 0%
INFO : 2023-01-05 16:03:43,633 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 13.12 sec
INFO : MapReduce Total cumulative CPU time: 13 seconds 120 msec
INFO : Ended Job = job_1670221063369_0023
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-2: Map: 1 Cumulative CPU: 13.12 sec HDFS Read: 6657 HDFS Write: 0 HDFS EC Read: 0 SUCCESS
INFO : Total MapReduce CPU Time Spent: 13 seconds 120 msec
INFO : Completed executing command(queryId=hive_20230105160307_05a7066a-165d-4c69-a9d1-7752e5cf9561); Time taken: 37.397 seconds
- Check the data in Impala.
- Insert the same data again from Hive:
INSERT INTO TABLE kudu_table
VALUES (3, 'test 3', 1.3), (4, 'test 4', 2.4);
This does not throw an error and also completes successfully.
- Check in Hive and in Impala.
Hive query result:
INFO : Compiling command(queryId=hive_20230105160356_e01a3666-0527-4002-9288-2910f4462fdb): select * from kudu_table
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:kudu_table.foo, type:int, comment:null), FieldSchema(name:kudu_table.bar, type:string, comment:null), FieldSchema(name:kudu_table.baz, type:double, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20230105160356_e01a3666-0527-4002-9288-2910f4462fdb); Time taken: 0.236 seconds
INFO : Operation QUERY obtained 2 locks
INFO : Executing command(queryId=hive_20230105160356_e01a3666-0527-4002-9288-2910f4462fdb): select * from kudu_table
INFO : Completed executing command(queryId=hive_20230105160356_e01a3666-0527-4002-9288-2910f4462fdb); Time taken: 0.002 seconds
+-----------------+-----------------+-----------------+
| kudu_table.foo | kudu_table.bar | kudu_table.baz |
+-----------------+-----------------+-----------------+
| 4 | test 4 | 2.4 |
| 1 | test 1 | 1.1 |
| 2 | test 2 | 2.2 |
| 3 | test 3 | 1.3 |
+-----------------+-----------------+-----------------+
4 rows selected (0.734 seconds)
0: jdbc:hive2://172.10.62.132:10000>
impala查询结果:
[172-10-62-100.lightspeed.moblal.sbcglobal.net:21000] pg_service_profile> select * from kudu_table;
Query: select * from kudu_table
Query submitted at: 2023-01-05 16:36:13 (Coordinator: http://172-10-62-100.lightspeed.moblal.sbcglobal.net:25000)
Query progress can be monitored at: http://172-10-62-100.lightspeed.moblal.sbcglobal.net:25000/query_plan?query_id=ab46790bb782b28d:829fe7200000000
+-----+--------+-----+
| foo | bar | baz |
+-----+--------+-----+
| 3 | test 3 | 1.3 |
| 1 | test 1 | 1.1 |
| 2 | test 2 | 2.2 |
| 4 | test 4 | 2.4 |
+-----+--------+-----+
Fetched 4 row(s) in 0.21s
[172-10-62-100.lightspeed.moblal.sbcglobal.net:21000] pg_service_profile>
This is easy to understand: Hive is just the tool here; the data actually lives in Kudu, so the primary key is enforced by Kudu no matter which engine writes.
An elephant leg, coming right up!