Background: Kudu is great, and I wanted to work with it directly from Hive, which apparently is only supported in Hive 4.0.0; the other reason was pure curiosity and testing. There are few posts online about this kind of upgrade. Some people have upgraded by compiling the Hive 4.0.0 source themselves, but I don't know Java.

This upgrade was for testing and research, so the simpler steps are covered briefly; please bear with any shortcomings.

Steps:

  1. Download apache-hive-4.0.0-alpha-2-bin.tar.gz. If you have the energy, you can download the source and build it yourself. A side note: even when building from source, you cannot simply swap the Hadoop, HBase, Kafka, or ZooKeeper versions; that requires custom development.
  2. Extract the archive.
  3. Stop the Hive services and back up the metastore database.
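A minimal shell sketch of steps 1-3 (the download mirror URL, the /root working directory, and the metastore database name "hive" with a root MySQL user are all assumptions; adjust to your environment):

cd /root
# step 1: download the binary release (mirror path may differ)
wget https://archive.apache.org/dist/hive/hive-4.0.0-alpha-2/apache-hive-4.0.0-alpha-2-bin.tar.gz
# step 2: extract
tar -xzf apache-hive-4.0.0-alpha-2-bin.tar.gz
# step 3: stop the Hive services in Cloudera Manager, then back up the metastore database
mysqldump -u root -p --databases hive > hive_metastore_backup.sql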

The following operations must be performed on every server that has anything to do with Hive.

  4. On every server that uses Hive (service or gateway), create a directory named lib400 under /opt/cloudera/parcels/CDH/lib/hive.
  5. Copy everything under apache-hive-4.0.0-alpha-2-bin/lib into /opt/cloudera/parcels/CDH/lib/hive/lib400.
  6. In /opt/cloudera/parcels/CDH/lib/hive/lib400, replace the bundled ZooKeeper jar with a symlink to the CDH one:
rm -f zookeeper*
ln -s ../../jars/zookeeper-3.4.5-cdh6.2.1.jar zookeeper.jar
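A minimal, self-contained sketch of steps 4-6 on one server (assumes the tarball was extracted under /root, matching the paths used later; the last two lines simply repeat step 6):

mkdir /opt/cloudera/parcels/CDH/lib/hive/lib400
cp -r /root/apache-hive-4.0.0-alpha-2-bin/lib/* /opt/cloudera/parcels/CDH/lib/hive/lib400/
cd /opt/cloudera/parcels/CDH/lib/hive/lib400
rm -f zookeeper*
ln -s ../../jars/zookeeper-3.4.5-cdh6.2.1.jar zookeeper.jar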
  7. Modify the following files under apache-hive-4.0.0-alpha-2-bin/bin.

beeline:

. "$bin"/hive --service beeline -u "jdbc:hive2://hiveservice2:10000" user passwd "$@"

Here, user / name is the user of the Hive metastore database; with MySQL, for example, it is the user that can access the hive metastore database in MySQL, and passwd is its password.

hive:

HIVE_LIB=${HIVE_HOME}/lib400
  8. mv /opt/cloudera/parcels/CDH/lib/hive/bin /opt/cloudera/parcels/CDH/lib/hive/binbak
  9. Copy apache-hive-4.0.0-alpha-2-bin/bin to /opt/cloudera/parcels/CDH/lib/hive/.
  10. Modify /opt/cloudera/parcels/CDH/bin/hive:
exec $LIB_DIR/hive/bin/hive --service beeline -u "jdbc:hive2://hiveservice2:10000" name passwd "$@"
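A minimal sketch of steps 8-10 (again assuming the tarball sits under /root; the wrapper script is edited by hand so its exec line matches the one above):

mv /opt/cloudera/parcels/CDH/lib/hive/bin /opt/cloudera/parcels/CDH/lib/hive/binbak
cp -r /root/apache-hive-4.0.0-alpha-2-bin/bin /opt/cloudera/parcels/CDH/lib/hive/
vi /opt/cloudera/parcels/CDH/bin/hive   # adjust the exec line as shown above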
  11. Upgrade the metastore database.

Edit apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-2.1.0-to-2.2.0.mysql.sql so that its SOURCE statements use absolute paths:

SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/037-HIVE-14496.mysql.sql;
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/038-HIVE-10562.mysql.sql;
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/039-HIVE-12274.mysql.sql;

Do the same in apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-2.2.0-to-2.3.0.mysql.sql:

SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/040-HIVE-16399.mysql.sql;

Then run the following in the database:

SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-2.1.0-to-2.2.0.mysql.sql
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-2.2.0-to-2.3.0.mysql.sql
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-2.3.0-to-3.0.0.mysql.sql
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-3.0.0-to-3.1.0.mysql.sql
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-3.1.0-to-3.2.0.mysql.sql
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-3.2.0-to-4.0.0-alpha-1.mysql.sql
SOURCE /root/apache-hive-4.0.0-alpha-2-bin/scripts/metastore/upgrade/mysql/upgrade-4.0.0-alpha-1-to-4.0.0-alpha-2.mysql.sql
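A minimal sketch of how to run them (the metastore database name "hive" and the root MySQL user are assumptions; use your own credentials):

# connect to the metastore database, then paste the SOURCE statements above in order
mysql -u root -p hive
# optional check afterwards: SELECT SCHEMA_VERSION FROM VERSION; should now show 4.0.0-alpha-2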

At this point the basic upgrade is done. However, starting Hive reports an error:

Caused by: java.lang.IllegalArgumentException: hive configuration hive.query.redaction.rules does not exists.

  12. One more place still needs to be modified:

/opt/cloudera/parcels/CDH/lib/hive/conf/hive-env.sh

Comment out the export HIVE_OPTS line.
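A minimal sketch of the change (the exact value of HIVE_OPTS in the generated file will differ; the point is only to prefix the existing line with #):

# in /opt/cloudera/parcels/CDH/lib/hive/conf/hive-env.sh, the export line becomes:
# export HIVE_OPTS="${HIVE_OPTS} ..."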

After that, executing SQL will log a WARN:

WARN  : WARNING! Query command could not be redacted.java.lang.RuntimeException: Error loading hooks(hive.exec.query.redactor.hooks): java.lang.ClassNotFoundException: org.cloudera.hadoop.hive.ql.hooks.QueryRedactor

I have not found a fix for this, but it does not affect queries.

Every time the configuration is updated, Hive is restarted, and the client configuration is redeployed, this change has to be made again and Hive restarted once more, which is very annoying. I have not yet found the source from which CDH pushes out these configuration files; once found, modifying it directly should be enough.

  13. Restart Hive.

A quick test:




[screenshots of the test query results]


With that, the upgrade is complete. Throw the blogger a chicken leg!

Notes:

  1. When upgrading Hive to 4.0, do not just copy the Hive jars the way you would for a 2.x or 3.x upgrade, or Hive will not start.
  2. The changes to the hive and beeline executables are there to avoid entering credentials every time.
  3. Upgrading a production cluster in place is not recommended; if you want to anyway, back everything up first.
  4. Replace the jar under /opt/cloudera/parcels/CDH/lib/spark/hive/ with hive-exec-4.0.0-alpha-2.jar (see the sketch after this list).
  5. After the upgrade, the hive command line is beeline, so you will need to learn the beeline commands; hive -e, hive -f and so on still work the same, provided you made the same modifications as above.
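A minimal sketch of note 4 (the name of the jar currently sitting in spark/hive/ is not shown above, so it is matched by a wildcard here; check the directory before moving anything):

cd /opt/cloudera/parcels/CDH/lib/spark/hive
mv hive-exec-*.jar /tmp/      # set the existing hive-exec jar aside
cp /root/apache-hive-4.0.0-alpha-2-bin/lib/hive-exec-4.0.0-alpha-2.jar .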

As for Hive-Kudu integration, I decided to give it up; Impala works well enough. [wry smile]

Half an hour later, I came across an article:

Apache Hive 中文手册 - Hive Kudu 整合 | Docs4dev. It turned out I had misunderstood: I thought Hive could create Kudu tables directly, but it is really a mapping relationship.

  1. Create a Kudu table in Impala:
CREATE TABLE kudu_table (foo INT, bar STRING, baz DOUBLE, PRIMARY KEY (foo))
PARTITION BY HASH PARTITIONS 4
STORED AS kudu
TBLPROPERTIES (
  "kudu.master_addresses"="172.10.62.131:7051"
);
  2. Find the name of the corresponding Kudu table:
[root@172-10-62-132 hive]# kudu table list 172.10.62.131
impala::pg_service_profile.kudu_table
[root@172-10-62-132 hive]#
  3. In Hive, create the mapping table for the Kudu table (note: the table name must either differ or live in another database, otherwise you will get a table-already-exists error):
CREATE EXTERNAL TABLE default.kudu_table (foo INT, bar STRING, baz DOUBLE)
STORED BY 'org.apache.hadoop.hive.kudu.KuduStorageHandler'
TBLPROPERTIES (
  "kudu.table_name"="impala::pg_service_profile.kudu_table", 
  "kudu.master_addresses"="172.10.62.131:7051"
);
  4. Insert data into the Kudu table from Impala:
INSERT INTO TABLE kudu_table
VALUES (1, 'test 1', 1.1), (2, 'test 2', 2.2);
  5. Query the data from Hive:
INFO  : Compiling command(queryId=hive_20230105155029_f5343a3d-cf67-41fc-ac65-921b669088b8): select * from kudu_table
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:kudu_table.foo, type:int, comment:null), FieldSchema(name:kudu_table.bar, type:string, comment:null), FieldSchema(name:kudu_table.baz, type:double, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230105155029_f5343a3d-cf67-41fc-ac65-921b669088b8); Time taken: 0.768 seconds
INFO  : Operation QUERY obtained 2 locks
INFO  : Executing command(queryId=hive_20230105155029_f5343a3d-cf67-41fc-ac65-921b669088b8): select * from kudu_table
INFO  : Completed executing command(queryId=hive_20230105155029_f5343a3d-cf67-41fc-ac65-921b669088b8); Time taken: 0.001 seconds
+-----------------+-----------------+-----------------+
| kudu_table.foo  | kudu_table.bar  | kudu_table.baz  |
+-----------------+-----------------+-----------------+
| 1               | test 1          | 1.1             |
| 2               | test 2          | 2.2             |
+-----------------+-----------------+-----------------+
2 rows selected (1.943 seconds)
0: jdbc:hive2://172.10.62.132:10000>

Excellent! Throw the blogger a beef leg!

Another half hour later, a thought struck me: Impala can insert data and Hive can read it, so can data inserted from Hive also be seen in Impala? The answer is yes. Note, however, that even though the Hive table definition declares no primary key, the Kudu primary key is still enforced. Here is the experiment:

  1. Insert data from Hive:
INSERT INTO TABLE kudu_table
VALUES (3, 'test 3', 1.3), (4, 'test 4', 2.4);

Log:
INFO  : Compiling command(queryId=hive_20230105160307_05a7066a-165d-4c69-a9d1-7752e5cf9561): INSERT INTO TABLE kudu_table
VALUES (3, 'test 3', 1.3), (4, 'test 4', 2.4)
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:int, comment:null), FieldSchema(name:_col1, type:string, comment:null), FieldSchema(name:_col2, type:double, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230105160307_05a7066a-165d-4c69-a9d1-7752e5cf9561); Time taken: 0.377 seconds
INFO  : Operation QUERY obtained 4 locks
INFO  : Executing command(queryId=hive_20230105160307_05a7066a-165d-4c69-a9d1-7752e5cf9561): INSERT INTO TABLE kudu_table
VALUES (3, 'test 3', 1.3), (4, 'test 4', 2.4)
INFO  : Query ID = hive_20230105160307_05a7066a-165d-4c69-a9d1-7752e5cf9561
INFO  : Total jobs = 1
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Starting task [Stage-1:DDL] in serial mode
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-2:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1670221063369_0023
INFO  : Executing with tokens: []
INFO  : The url to track the job: http://172-10-62-131.lightspeed.moblal.sbcglobal.net:8088/proxy/application_1670221063369_0023/
INFO  : Starting Job = job_1670221063369_0023, Tracking URL = http://172-10-62-131.lightspeed.moblal.sbcglobal.net:8088/proxy/application_1670221063369_0023/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/bin/mapred job  -kill job_1670221063369_0023
INFO  : Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
INFO  : 2023-01-05 16:03:25,992 Stage-2 map = 0%,  reduce = 0%
INFO  : 2023-01-05 16:03:43,633 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 13.12 sec
INFO  : MapReduce Total cumulative CPU time: 13 seconds 120 msec
INFO  : Ended Job = job_1670221063369_0023
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-2: Map: 1   Cumulative CPU: 13.12 sec   HDFS Read: 6657 HDFS Write: 0 HDFS EC Read: 0 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 13 seconds 120 msec
INFO  : Completed executing command(queryId=hive_20230105160307_05a7066a-165d-4c69-a9d1-7752e5cf9561); Time taken: 37.397 seconds
  2. Check the data in Impala:


[screenshot of the Impala query results]


  3. Insert the same data again from Hive:
INSERT INTO TABLE kudu_table
VALUES (3, 'test 3', 1.3), (4, 'test 4', 2.4);

This does not raise an error; it executes successfully as well.
  4. Check from both Hive and Impala.
Hive query result:
INFO  : Compiling command(queryId=hive_20230105160356_e01a3666-0527-4002-9288-2910f4462fdb): select * from kudu_table
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:kudu_table.foo, type:int, comment:null), FieldSchema(name:kudu_table.bar, type:string, comment:null), FieldSchema(name:kudu_table.baz, type:double, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230105160356_e01a3666-0527-4002-9288-2910f4462fdb); Time taken: 0.236 seconds
INFO  : Operation QUERY obtained 2 locks
INFO  : Executing command(queryId=hive_20230105160356_e01a3666-0527-4002-9288-2910f4462fdb): select * from kudu_table
INFO  : Completed executing command(queryId=hive_20230105160356_e01a3666-0527-4002-9288-2910f4462fdb); Time taken: 0.002 seconds
+-----------------+-----------------+-----------------+
| kudu_table.foo  | kudu_table.bar  | kudu_table.baz  |
+-----------------+-----------------+-----------------+
| 4               | test 4          | 2.4             |
| 1               | test 1          | 1.1             |
| 2               | test 2          | 2.2             |
| 3               | test 3          | 1.3             |
+-----------------+-----------------+-----------------+
4 rows selected (0.734 seconds)
0: jdbc:hive2://172.10.62.132:10000>

Impala query result:
[172-10-62-100.lightspeed.moblal.sbcglobal.net:21000] pg_service_profile> select * from kudu_table;
Query: select * from kudu_table
Query submitted at: 2023-01-05 16:36:13 (Coordinator: http://172-10-62-100.lightspeed.moblal.sbcglobal.net:25000)
Query progress can be monitored at: http://172-10-62-100.lightspeed.moblal.sbcglobal.net:25000/query_plan?query_id=ab46790bb782b28d:829fe7200000000
+-----+--------+-----+
| foo | bar    | baz |
+-----+--------+-----+
| 3   | test 3 | 1.3 |
| 1   | test 1 | 1.1 |
| 2   | test 2 | 2.2 |
| 4   | test 4 | 2.4 |
+-----+--------+-----+
Fetched 4 row(s) in 0.21s
[172-10-62-100.lightspeed.moblal.sbcglobal.net:21000] pg_service_profile>

This is easy to understand: Hive is only the tool; the data itself lives in Kudu.

An elephant leg for the blogger, then.