This article implements the following goals:

   1. A table created in Hive is also created and stored in HBase.

   2. Data inserted into the Hive table is synchronized to the corresponding HBase table.

   3. Changes to column-family values in HBase show up in the corresponding Hive table.

   4. Multi-column and multi-column-family mapping (example: 3 Hive columns mapped to 2 HBase column families).


Integrating Hive with HBase

 1. Create a table that HBase recognizes:

CREATE TABLE hbase_table_1(key int, value string)    
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
hbase.table.name sets the name of the table in HBase.
hbase.columns.mapping maps the Hive columns, in declaration order, to HBase column families and qualifiers.
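The pairing rule is worth spelling out: the mapping string has one comma-separated entry per Hive column, ":key" marks the column that becomes the HBase row key, and "family:qualifier" entries become regular cells. A minimal Python sketch of the idea (this is illustrative, not Hive's actual parser):

```python
# Illustrative sketch of how "hbase.columns.mapping" pairs Hive columns
# with HBase targets. ":key" marks the row-key column; everything else
# is a "family:qualifier" cell. Not Hive's real parsing code.

def map_columns(hive_columns, mapping):
    entries = mapping.split(",")
    # Hive expects exactly one mapping entry per Hive column.
    assert len(entries) == len(hive_columns), "one entry per Hive column"
    result = {}
    for col, entry in zip(hive_columns, entries):
        if entry == ":key":
            result[col] = ("<row key>", None)
        else:
            family, qualifier = entry.split(":", 1)
            result[col] = (family, qualifier)
    return result

print(map_columns(["key", "value"], ":key,cf1:val"))
# {'key': ('<row key>', None), 'value': ('cf1', 'val')}
```

So for hbase_table_1, the Hive column key becomes the HBase row key and value is stored under cf1:val.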


If the CREATE TABLE fails with:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:312)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)

Fix: Hive could not reach HBase's ZooKeeper quorum (see https://my.oschina.net/houxm/blog/639748).

Add the following to hive-site.xml:
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop01,hadoop02,hadoop03</value>
</property>
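Before digging into Hive logs, it can help to confirm the quorum hosts are even reachable. A small sketch (hadoop01..hadoop03 are this cluster's hosts from the config above; 2181 is ZooKeeper's default client port, so adjust both for your environment):

```python
# Quick TCP reachability check for the ZooKeeper quorum. This only
# verifies the port accepts connections, not that ZooKeeper is healthy.
import socket

def check_quorum(hosts, port=2181, timeout=2.0):
    """Return {host: True/False} for TCP reachability of each ZK node."""
    status = {}
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                status[host] = True
        except OSError:  # covers DNS failure, refusal, and timeout
            status[host] = False
    return status

if __name__ == "__main__":
    print(check_quorum(["hadoop01", "hadoop02", "hadoop03"]))
```

If any host reports False, fix connectivity (or the quorum value) before retrying the CREATE TABLE.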

Tables now visible in HBase:


hbase(main):001:0> list
TABLE
xyz
t1
2 row(s) in 0.7750 seconds

=> ["xyz", "t1"]

HBase has automatically created the table xyz.



2. Load data with SQL

i. Prepare the data

a) Create the Hive table:

hive> create table ccc(foo int,bar string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;


vi kv1.txt

1	xiaoming
2	cainiao
3	zhangsan

The fields are separated by a tab; the file lives under /home/hadoop/.


load data local inpath '/home/hadoop/kv1.txt' overwrite into table ccc;
hive> select * from ccc;
OK
1 xiaoming
2 cainiao
3 zhangsan
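The delimiter in kv1.txt must match the table's "fields terminated by '\t'" clause, or every row loads as NULLs. A quick Python sketch of the round trip (file contents as above; the splitting here just mimics how Hive reads delimited text):

```python
# Build the kv1.txt contents tab-delimited, then split each line the
# way Hive reads a '\t'-delimited text table.
rows = [(1, "xiaoming"), (2, "cainiao"), (3, "zhangsan")]
data = "".join(f"{k}\t{v}\n" for k, v in rows)

parsed = [line.split("\t") for line in data.splitlines()]
print(parsed)  # [['1', 'xiaoming'], ['2', 'cainiao'], ['3', 'zhangsan']]
```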


Load into hbase_table_1 with SQL:

hive> insert overwrite table hbase_table_1 select * from ccc where foo=1;

Query ID = hadoop_20170208120143_2d3a5cec-46d8-47ef-baeb-aa8ebaf5bc0e
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1486525566658_0001, Tracking URL = http://hadoop06:8088/proxy/application_1486525566658_0001/
Kill Command = /home/hadoop/cloud/hadoop-2.7.3/bin/hadoop job  -kill job_1486525566658_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-02-08 12:08:26,134 Stage-1 map = 0%,  reduce = 0%
2017-02-08 12:09:26,916 Stage-1 map = 0%,  reduce = 0%
2017-02-08 12:10:27,696 Stage-1 map = 0%,  reduce = 0%
2017-02-08 12:11:28,559 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 2.8 sec
2017-02-08 12:11:40,152 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.22 sec
MapReduce Total cumulative CPU time: 4 seconds 810 msec
Ended Job = job_1486525566658_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 4.81 sec   HDFS Read: 10855 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 810 msec
OK

Check the data in Hive; the row just inserted is displayed:

hive> select * from hbase_table_1;
OK
1 xiaoming


Log in to HBase and check the loaded data:


hbase(main):002:0> scan 'xyz';
ROW COLUMN+CELL
1 column=cf1:val, timestamp=1486527116866, value=xiaoming


Add data from the HBase shell:

hbase(main):004:0> put 'xyz','100','cf1:val','www.gongchang.com';
hbase(main):005:0> put 'xyz','200','cf1:val','hello,word!';
hbase(main):006:0> scan 'xyz';
ROW COLUMN+CELL
1 column=cf1:val, timestamp=1486527116866, value=xiaoming
100 column=cf1:val, timestamp=1486527821853, value=www.gongchang.com
200 column=cf1:val, timestamp=1486527821904, value=hello,word!
3 row(s) in 0.0270 seconds


Check the data in Hive:

hive> select * from hbase_table_1;
OK
1 xiaoming
100 www.gongchang.com
200 hello,word!

The rows just inserted from the HBase side are already visible in Hive.
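This two-way visibility follows from the storage model: the Hive table holds no data of its own, and every Hive query re-scans the same underlying HBase table. A toy model of the xyz example (a dict stands in for the HBase table; none of this is real Hive or HBase code):

```python
# Toy model: one shared table, two writers. Hive's view is just a scan
# of the mapped cell (cf1:val), so puts from the HBase shell appear in
# Hive query results immediately.
xyz = {}  # rowkey -> {"family:qualifier": value}

def put(table, row, col, value):
    """Like the HBase shell's `put`."""
    table.setdefault(row, {})[col] = value

def hive_select(table):
    """Hive reading cf1:val as the `value` column, in rowkey order."""
    return sorted((row, cells.get("cf1:val")) for row, cells in table.items())

put(xyz, "1", "cf1:val", "xiaoming")             # written via Hive's INSERT
put(xyz, "100", "cf1:val", "www.gongchang.com")  # written via hbase shell
put(xyz, "200", "cf1:val", "hello,word!")
print(hive_select(xyz))
# [('1', 'xiaoming'), ('100', 'www.gongchang.com'), ('200', 'hello,word!')]
```

The sorted order also mirrors HBase's lexicographic rowkey ordering, which is why "100" and "200" sort after "1" rather than numerically.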


Hive access to an existing HBase table

Prepare the data in HBase:

hbase(main):009:0> create 'student','info';
0 row(s) in 1.7530 seconds
hbase(main):014:0> put "student",'1','info:name','tom';
hbase(main):015:0* put "student",'2','info:name','lily';
hbase(main):016:0* put "student",'3','info:name','wwn';
hbase(main):018:0> scan 'student';
ROW COLUMN+CELL
1 column=info:name, timestamp=1486528334315, value=tom
2 column=info:name, timestamp=1486528334331, value=lily
3 column=info:name, timestamp=1486528334404, value=wwn
3 row(s) in 0.0380 seconds


Create an external table in Hive that maps onto the existing student table:

CREATE EXTERNAL TABLE hbase_table_3(key int, value string)    
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "info:name")
TBLPROPERTIES("hbase.table.name" = "student");


hive> select * from hbase_table_3;
OK
1 tom
2 lily
3 wwn
Time taken: 0.439 seconds, Fetched: 3 row(s)

As shown above, Hive can now query the data that already existed in HBase.


Note: if the value of the mapped info:name column changes in HBase, Hive query results change accordingly; updates to columns or column families that are not in the mapping do not show up in Hive query results.
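That visibility rule can be seen in miniature with the same toy dict model (the extra:score family here is a made-up example of an unmapped family, not something in the real student table):

```python
# Toy model: Hive only sees the cells its column mapping names.
# "extra:score" is a hypothetical unmapped family for illustration.
student = {
    "1": {"info:name": "tom", "extra:score": "99"},
    "2": {"info:name": "lily"},
}

def hive_view(table, mapped="info:name"):
    """Hive's view: only the mapped qualifier is visible per row."""
    return {row: cells.get(mapped) for row, cells in table.items()}

print(hive_view(student))  # {'1': 'tom', '2': 'lily'} -- extra:score is invisible
```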


3. Multiple Columns and Families

1. Create the table

Create the Hive table:

CREATE TABLE hbase_table_add1(key int, value1 string, value2 int, value3 int)    
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:col1,info:col2,city:nu")
TBLPROPERTIES("hbase.table.name" = "student_info");


The parameter hive.cli.print.header controls whether the CLI prints column headers:


hive> set hive.cli.print.header=true;


hive> select * from hbase_table_3;
OK
hbase_table_3.key hbase_table_3.value
1 tom
2 lily
3 wwn


Run the insert from Hive:

hive> insert overwrite table hbase_table_add1 select key,value,key+1,value from hbase_table_3;

Query ID = hadoop_20170208150313_7aba27e4-8c94-4432-b08a-e14af6e10b17
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1486525566658_0002, Tracking URL = http://hadoop06:8088/proxy/application_1486525566658_0002/
Kill Command = /home/hadoop/cloud/hadoop-2.7.3/bin/hadoop job  -kill job_1486525566658_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-02-08 15:09:28,333 Stage-1 map = 0%,  reduce = 0%
2017-02-08 15:10:29,086 Stage-1 map = 0%,  reduce = 0%
2017-02-08 15:11:29,331 Stage-1 map = 0%,  reduce = 0%
2017-02-08 15:12:17,755 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.09 sec
MapReduce Total cumulative CPU time: 14 seconds 90 msec
Ended Job = job_1486525566658_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 14.09 sec   HDFS Read: 5226 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 14 seconds 90 msec
OK
_col0   _col1   _col2   _col3
Time taken: 571.258 seconds

hive> select * from hbase_table_add1;
hbase_table_add1.key    hbase_table_add1.value1 hbase_table_add1.value2 hbase_table_add1.value3
1 tom 2 NULL
2 lily 3 NULL
3 wwn 4 NULL
Time taken: 0.392 seconds, Fetched: 3 row(s)

Note that value3 is NULL: the query selected the string column value into the int column value3, and the implicit cast of a name like "tom" to int yields NULL. The second insert below fixes this by selecting key+100 instead.


hive> insert overwrite table hbase_table_add1 select key,value,key+1,key+100 from hbase_table_3;

Query ID = hadoop_20170208151919_f0513cd5-b742-479c-96ec-56f3c4b47d5e
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1486525566658_0003, Tracking URL = http://hadoop06:8088/proxy/application_1486525566658_0003/
Kill Command = /home/hadoop/cloud/hadoop-2.7.3/bin/hadoop job  -kill job_1486525566658_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-02-08 15:22:57,975 Stage-1 map = 0%,  reduce = 0%
2017-02-08 15:23:58,599 Stage-1 map = 0%,  reduce = 0%
2017-02-08 15:24:59,518 Stage-1 map = 0%,  reduce = 0%
2017-02-08 15:25:22,711 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.46 sec
MapReduce Total cumulative CPU time: 4 seconds 460 msec
Ended Job = job_1486525566658_0003
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 4.46 sec   HDFS Read: 5345 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 460 msec
OK
key     value   c2      c3
Time taken: 392.229 seconds

hive> select * from hbase_table_add1;
OK
hbase_table_add1.key hbase_table_add1.value1 hbase_table_add1.value2 hbase_table_add1.value3
1 tom 2 101
2 lily 3 102
3 wwn 4 103
Time taken: 0.591 seconds, Fetched: 3 row(s)




Log in to HBase and check:

hbase(main):001:0> list
TABLE
student
student_info
xyz
3 row(s) in 0.9700 seconds

=> ["student", "student_info", "xyz"]

hbase(main):003:0> scan 'student_info';
ROW COLUMN+CELL
1 column=city:nu, timestamp=1486538721232, value=101
1 column=info:col1, timestamp=1486538721232, value=tom
1 column=info:col2, timestamp=1486538721232, value=2
2 column=city:nu, timestamp=1486538721232, value=102
2 column=info:col1, timestamp=1486538721232, value=lily
2 column=info:col2, timestamp=1486538721232, value=3
3 column=city:nu, timestamp=1486538721232, value=103
3 column=info:col1, timestamp=1486538721232, value=wwn
3 column=info:col2, timestamp=1486538721232, value=4
3 row(s) in 30.5480 seconds
put "student_info",'4','info:col1','haha';


Check in Hive again; the new row appears, with NULL for the cells that were not written:
hive> select * from hbase_table_add1;
OK
hbase_table_add1.key hbase_table_add1.value1 hbase_table_add1.value2 hbase_table_add1.value3
1 tom 2 101
2 lily 3 102
3 wwn 4 103
4 haha NULL NULL
Time taken: 0.443 seconds, Fetched: 4 row(s)




Here there are three Hive value columns (value1, value2, value3) and two HBase column families (info, city).

Two of the Hive columns (value1 and value2) map to one family (info, as qualifiers col1 and col2), while the remaining column (value3) maps to the qualifier nu in the city family.

This shows how many Hive columns can be stored in a small, fixed set of HBase column families.
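The write path for one row can be sketched the same way as before (illustrative only, not Hive's actual code): each Hive row of hbase_table_add1 becomes one HBase row whose cells are spread across the two families according to the mapping.

```python
# Illustrative: how one Hive row of hbase_table_add1 is spread over the
# info and city families, following ":key,info:col1,info:col2,city:nu".
# Not Hive's actual write path; values are stringified as HBase stores bytes.
MAPPING = [":key", "info:col1", "info:col2", "city:nu"]

def to_hbase_row(hive_row):
    rowkey, cells = None, {}
    for value, target in zip(hive_row, MAPPING):
        if target == ":key":
            rowkey = str(value)
        else:
            cells[target] = str(value)
    return rowkey, cells

print(to_hbase_row((1, "tom", 2, 101)))
# ('1', {'info:col1': 'tom', 'info:col2': '2', 'city:nu': '101'})
```

The output matches the scan of student_info above: row "1" carries info:col1=tom, info:col2=2, and city:nu=101.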