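1.  How Hive integrates with HBase
Hive and HBase are integrated through the public APIs that each exposes. The communication between them relies mainly on the utility jar lib/hive-hbase-handler-0.13.0.jar in the Hive installation, which handles the exchange between HBase and Hive.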


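The communication between Hive and HBase is illustrated in the following figure:

[Figure: Hive accessing HBase table data through the handler jar]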



2.  Installing Hive


This assumes Hive is already installed. What remains is to sort out the related jar packages.


(1) Jar packages


# Remove the ZooKeeper jar from $HIVE_HOME/lib
rm -rf $HIVE_HOME/lib/zookeeper*


# Copy the ZooKeeper jar used in the production environment into $HIVE_HOME/lib
cp $ZOOKEEPER_HOME/zookeeper-3.4.6.jar $HIVE_HOME/lib
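
The jar swap above can be guarded so that the old jar is only removed when the replacement actually exists. The following is a minimal sketch under stated assumptions: the function name swap_zk_jar and its two directory arguments are hypothetical, and the jar file name is the one used above.

```shell
# Hypothetical helper: replace the ZooKeeper jar bundled with Hive
# with the one shipped by the ZooKeeper installation in use.
swap_zk_jar() {
  zk_home="$1"
  hive_home="$2"
  new_jar="$zk_home/zookeeper-3.4.6.jar"

  # Refuse to delete anything if the replacement jar is missing.
  if [ ! -f "$new_jar" ]; then
    echo "missing $new_jar" >&2
    return 1
  fi

  # Remove the bundled jar(s), then copy the replacement in.
  rm -f "$hive_home"/lib/zookeeper*.jar
  cp "$new_jar" "$hive_home/lib/"
}

# Usage on the Hive host (both variables must be set):
#   swap_zk_jar "$ZOOKEEPER_HOME" "$HIVE_HOME"
```

Passing the directories as arguments keeps the helper from touching /lib by accident when $HIVE_HOME is unset.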



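3. Create the HBase table and load data into it


4. Create a Hive table mapped to the HBase table


5. Access the HBase table from Hive



(1) Write a MapReduce job that reads each line of data and saves it to HBase


(2) Let Hive operate on the data in the HBase table


(3) Use Hive to run statistics on the HBase table data and analyze visitor behavior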




3. Viewing the data in HBase


3.1 Full table scan


scan  'UserVisitInfo'


3.2 Lookup by rowkey


hbase(main):012:0> get 'UserVisitInfo','20150706_3037487029517069460000'
COLUMN CELL
info:FirstAccessUrl timestamp=1443000064923, value=/m/subject/100000000000009_0.html
info:browser timestamp=1443000064923, value=Safari
info:browserVersion timestamp=1443000064923, value=533.1
info:firstAccessTime timestamp=1443000064923, value=20150706000104
info:operateSystem timestamp=1443000064923, value=linux
info:recentAccessTime timestamp=1443000065001, value=20150706030107
info:recentAccessUrl timestamp=1443000065001, value=/m/
info:screenColor timestamp=1443000064923, value=24
info:screenSize timestamp=1443000064923, value=480x854
info:siteType timestamp=1443000064923, value=0
info:userFlag timestamp=1443000064923, value=3037487029517069460000
info:userProvince timestamp=1443000064923, value=999
info:userVisitId timestamp=1443000064923, value=20150706_3037487029517069460000
info:visitCount timestamp=1443000065001, value=2
info:visitDay timestamp=1443000064923, value=20150706
info:visitFlag timestamp=1443000064923, value=3037487029517069460000
info:visitHour timestamp=1443000064923, value=0
info:visitIp timestamp=1443000064923, value=10.139.198.176
info:visitKeepTime timestamp=1443000065001, value=10803


 


4. Analyzing the HBase table data with Hive


4.1 Create the HBase table and load data into it


 UserVisitInfo
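
For reference, creating this table from the command line could look like the sketch below. It assumes a single column family named info (as seen in the get output earlier) and writes one sample cell purely for illustration; the real data is loaded by the MapReduce job. The file path and the HBASE_CMDS variable are hypothetical.

```shell
# Sketch: write HBase shell commands to a file, then run them if hbase is installed.
HBASE_CMDS="${TMPDIR:-/tmp}/create_user_visit_info.txt"   # hypothetical path

cat > "$HBASE_CMDS" <<'EOF'
create 'UserVisitInfo', 'info'
put 'UserVisitInfo', '20150706_3037487029517069460000', 'info:visitDay', '20150706'
scan 'UserVisitInfo', {LIMIT => 1}
exit
EOF

# Execute only where the hbase launcher is on PATH.
if command -v hbase >/dev/null 2>&1; then
  hbase shell "$HBASE_CMDS"
else
  echo "hbase not found; commands written to $HBASE_CMDS"
fi
```

The trailing exit makes the HBase shell quit after running the file instead of dropping into the interactive prompt.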


4.2 Create a Hive table mapped to the HBase table


(1) Create the table


-- External Hive table over the existing HBase table UserVisitInfo.
-- The first Hive column maps to the HBase rowkey (:key); the rest map, in order, to info:* columns.
CREATE EXTERNAL TABLE User_Visit_Info
(
userVisitId string,
FirstAccessUrl string,
browserVersion string,
firstAccessTime string,
operateSystem string,
recentAccessTime string,
recentAccessUrl string,
screenColor string,
screenSize string,
siteType string,
userFlag string,
userProvince string,
visitCount string,
visitDay string,
visitFlag string,
visitHour string,
visitIp string,
visitKeepTime string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
-- Keep the mapping on one line: whitespace between entries may be read as part of a column name.
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:FirstAccessUrl,info:browserVersion,info:firstAccessTime,info:operateSystem,info:recentAccessTime,info:recentAccessUrl,info:screenColor,info:screenSize,info:siteType,info:userFlag,info:userProvince,info:visitCount,info:visitDay,info:visitFlag,info:visitHour,info:visitIp,info:visitKeepTime")
TBLPROPERTIES ("hbase.table.name" = "UserVisitInfo");


4.3 Run statistics with Hive
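
As an illustration of the kind of analysis this enables, the sketch below writes two example queries against User_Visit_Info to a file and runs them with hive -f when Hive is available. The queries, the file path, and the HQL_FILE variable are assumptions chosen for illustration, not the original author's statistics. Because every mapped column is declared as string, visitKeepTime is cast before averaging; in the sample row shown earlier, its value matches the gap in seconds between firstAccessTime and recentAccessTime.

```shell
# Sketch: write example HiveQL statistics to a file, then run them if hive is installed.
HQL_FILE="${TMPDIR:-/tmp}/user_visit_stats.hql"   # hypothetical path

cat > "$HQL_FILE" <<'EOF'
-- Number of visit records per day
SELECT visitDay, COUNT(*) AS visits
FROM User_Visit_Info
GROUP BY visitDay;

-- Visit records and average visit duration per province code
SELECT userProvince,
       COUNT(*) AS visits,
       AVG(CAST(visitKeepTime AS BIGINT)) AS avg_keep_time
FROM User_Visit_Info
GROUP BY userProvince;
EOF

# Execute only where the hive launcher is on PATH.
if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL_FILE"
else
  echo "hive not found; queries written to $HQL_FILE"
fi
```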