(1) First, on cluster 1, download the files to the local disk with hadoop fs -get.
(2) Then upload the data to cluster 2's HDFS with hadoop fs -put.
(3) On cluster 2, create a table with exactly the same DDL.
(4) Add the partitions to the identically named Hive table on cluster 2; the data is then picked up by the table automatically.
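The four steps above can be sketched in Python. The HDFS path and local staging directory below are hypothetical examples, and the actual hadoop fs calls are commented out since each must run on its own cluster:

```python
import subprocess

# Hypothetical HDFS warehouse path and local staging directory.
src = "/user/hive/warehouse/analysis.db/events_parquet_mtmy"
local = "./events_parquet_mtmy"

get_cmd = ["hadoop", "fs", "-get", src, local]  # step (1), run on cluster 1
put_cmd = ["hadoop", "fs", "-put", local, src]  # step (2), run on cluster 2
                                                # (after shipping `local` over, e.g. with scp)
# subprocess.check_call(get_cmd)
# subprocess.check_call(put_cmd)
print(" ".join(get_cmd))
print(" ".join(put_cmd))
```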

Adding the partitions can be scripted:

  • First, on cluster 1, run hive -e "show partitions table_name" > partitions.txt from the shell,
    which produces content like the following:
day=17987/event=$AppClick
day=17987/event=$AppEnd
day=17987/event=$AppStart
day=17987/event=$AppStartPassively
day=17987/event=$AppViewScreen
day=17987/event=$SignUp
day=17987/event=$WebClick

Here day is an int column and event is a string column.
Then run the following script:

import os

# Each line of partitions.txt looks like: day=17987/event=$AppClick
with open("partitions.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        day_part, event_part = line.split("/")    # 'day=17987' and 'event=$AppClick'
        key, value = event_part.split("=", 1)     # 'event' and '$AppClick'
        # day is an int, so it stays unquoted; the event value is a string and gets quotes
        partition = day_part + ", " + key + "=\"" + value + "\""
        # single outer quotes keep the shell from expanding the $ in the event value
        command = "hive -e 'alter table analysis.events_parquet_mtmy add partition(" + partition + ");'"
        os.system(command)

Run it on cluster 2 and you are done.
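When a table has many partitions, launching a new hive process per line is slow. Hive's ALTER TABLE ... ADD statement accepts several PARTITION specs at once, and IF NOT EXISTS makes reruns safe, so a batched variant is a reasonable sketch. The sample lines below stand in for the real partitions.txt, and the hive call is commented out:

```python
import subprocess

def partition_spec(line):
    # 'day=17987/event=$AppClick' -> 'partition (day=17987, event="$AppClick")'
    day_part, event_part = line.strip().split("/")
    key, value = event_part.split("=", 1)
    return 'partition (%s, %s="%s")' % (day_part, key, value)

# In practice, read the lines from partitions.txt:
#   with open("partitions.txt") as f:
#       lines = [l for l in f if l.strip()]
lines = ["day=17987/event=$AppClick", "day=17987/event=$AppEnd"]  # sample input

sql = ("alter table analysis.events_parquet_mtmy add if not exists "
       + " ".join(partition_spec(l) for l in lines) + ";")
# subprocess.check_call(["hive", "-e", sql])  # uncomment on cluster 2
print(sql)
```

Passing the statement to hive as an argument list (rather than through os.system) also sidesteps shell quoting of the $ in the event values.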