(1) First, on cluster 1, download the files to the local disk with hadoop fs -get.
(2) Then upload the data to cluster 2's HDFS with hadoop fs -put.
(3) On cluster 2, create a table with exactly the same DDL.
(4) Add the partitions to the identically named Hive table on cluster 2; the data is then picked up by the table automatically.
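The four steps above can be sketched in Python. The HDFS path and local staging directory below are hypothetical examples, and the actual hadoop fs calls are commented out since each must run on its own cluster:

```python
import subprocess

# Hypothetical HDFS warehouse path and local staging directory.
src = "/user/hive/warehouse/analysis.db/events_parquet_mtmy"
local = "./events_parquet_mtmy"

get_cmd = ["hadoop", "fs", "-get", src, local]  # step (1), run on cluster 1
put_cmd = ["hadoop", "fs", "-put", local, src]  # step (2), run on cluster 2
                                                # (after shipping `local` over, e.g. with scp)
# subprocess.check_call(get_cmd)
# subprocess.check_call(put_cmd)
print(" ".join(get_cmd))
print(" ".join(put_cmd))
```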

Adding the partitions can be scripted:

  • First, on cluster 1, run hive -e "show partitions table_name" > partitions.txt from the shell,
    which produces content like the following:
day=17987/event=$AppClick
day=17987/event=$AppEnd
day=17987/event=$AppStart
day=17987/event=$AppStartPassively
day=17987/event=$AppViewScreen
day=17987/event=$SignUp
day=17987/event=$WebClick

Here day is an int column and event is a string column.
Then run the following script:

import os

# Each line of partitions.txt looks like: day=17987/event=$AppClick
with open("partitions.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        day_part, event_part = line.split("/")    # 'day=17987' and 'event=$AppClick'
        key, value = event_part.split("=", 1)     # 'event' and '$AppClick'
        # day is an int, so it stays unquoted; the event value is a string and gets quotes
        partition = day_part + ", " + key + "=\"" + value + "\""
        # single outer quotes keep the shell from expanding the $ in the event value
        command = "hive -e 'alter table analysis.events_parquet_mtmy add partition(" + partition + ");'"
        os.system(command)

Run it on cluster 2 and you are done.
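When a table has many partitions, launching a new hive process per line is slow. Hive's ALTER TABLE ... ADD statement accepts several PARTITION specs at once, and IF NOT EXISTS makes reruns safe, so a batched variant is a reasonable sketch. The sample lines below stand in for the real partitions.txt, and the hive call is commented out:

```python
import subprocess

def partition_spec(line):
    # 'day=17987/event=$AppClick' -> 'partition (day=17987, event="$AppClick")'
    day_part, event_part = line.strip().split("/")
    key, value = event_part.split("=", 1)
    return 'partition (%s, %s="%s")' % (day_part, key, value)

# In practice, read the lines from partitions.txt:
#   with open("partitions.txt") as f:
#       lines = [l for l in f if l.strip()]
lines = ["day=17987/event=$AppClick", "day=17987/event=$AppEnd"]  # sample input

sql = ("alter table analysis.events_parquet_mtmy add if not exists "
       + " ".join(partition_spec(l) for l in lines) + ";")
# subprocess.check_call(["hive", "-e", sql])  # uncomment on cluster 2
print(sql)
```

Passing the statement to hive as an argument list (rather than through os.system) also sidesteps shell quoting of the $ in the event values.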