Hive can be thought of as a user-friendly layer wrapped around Hadoop and HDFS, exposed through a variety of interfaces, including the command-line shell, a Web UI, and JDBC/ODBC (all covered in this article). Installing Hive therefore depends on having Hadoop in place. The sections below walk through downloading, installing, configuring, and using Hive.

I. Hive installation and configuration

1. Prerequisites

       Hadoop is already installed on Ubuntu (in my case hadoop 2.4.0 on Ubuntu 13.10).

2. Download the Hive package

       The latest Hive release at the time of writing is 0.13.1, available from the Apache mirrors, for example:

       http://apache.fayea.com/apache-mirror/hive/

       Two packages are provided at that link:

       apache-hive-0.13.1-bin.tar.gz: the binary release, already compiled; download, extract, and it is ready to use.

       apache-hive-0.13.1-src.tar.gz: the source release, which must be built with Maven before it can be used (see the sketch below).
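       If you do go the source route, the build is driven by Maven. The invocation below is only a sketch; the hadoop-2 and dist profile names are my assumption for 0.13.x, so check the README in the source tree if it fails.

tar -zxvf apache-hive-0.13.1-src.tar.gz
cd apache-hive-0.13.1-src
# Build against Hadoop 2 and assemble a binary tarball (skip the test suite to save time)
mvn clean package -DskipTests -Phadoop-2,dist
# If the dist profile is active, the binary tarball should appear under packaging/target/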

       Download the binary release and extract it into the same directory as your Hadoop installation (any location works, but since Hive is tied to Hadoop it is convenient to keep them together; mine is under /opt, so replace /opt in the paths below with wherever you put Hive).

        Extract: tar -zxvf apache-hive-0.13.1-bin.tar.gz -C /opt

        Rename it: mv /opt/apache-hive-0.13.1-bin /opt/hive-0.13.1

3. Configure the environment variables in /etc/profile or /root/.bashrc

        export HIVE_HOME=/opt/hive-0.13.1

        export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/conf

        source /etc/profile    # make the new settings take effect
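        A quick sanity check that the settings took effect (run in a new shell, or after sourcing the profile):

echo $HIVE_HOME    # should print /opt/hive-0.13.1
which hive         # should resolve to /opt/hive-0.13.1/bin/hive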

4. Configure Hive

Hive's configuration files live under $HIVE_HOME/conf, which ships with four default templates:

          hive-default.xml.template                  the default configuration template
          hive-env.sh.template                       template for hive-env.sh
          hive-exec-log4j.properties.template        default log configuration for execution
          hive-log4j.properties.template             default log configuration
        Hive will run without any changes; by default the metastore lives in an embedded Derby database. Since most people are not familiar with Derby, we switch the metastore to MySQL, and we also want to change where data and logs are stored, so some configuration of our own is needed. Here is how.
        (1) Create the configuration files by copying the default templates and editing them; user-defined settings override the defaults.

cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml
        cp $HIVE_HOME/conf/hive-env.sh.template $HIVE_HOME/conf/hive-env.sh
        cp $HIVE_HOME/conf/hive-exec-log4j.properties.template $HIVE_HOME/conf/hive-exec-log4j.properties
        cp $HIVE_HOME/conf/hive-log4j.properties.template $HIVE_HOME/conf/hive-log4j.properties


        (2) Edit hive-env.sh

vi $HIVE_HOME/conf/hive-env.sh 
        export HADOOP_HOME=/opt/hadoop-2.4.0
        export HIVE_CONF_DIR=/opt/hive-0.13.1/conf


        (3) Edit hive-log4j.properties

mkdir $HIVE_HOME/logs
         vi $HIVE_HOME/conf/hive-log4j.properties
         hive.log.dir=/opt/hive-0.13.1/logs


        (4) Edit hive-site.xml

vi $HIVE_HOME/conf/hive-site.xml
      <configuration>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/hive/warehouse</value>
      </property>
      <property>
        <name>hive.exec.scratchdir</name>
        <value>/hive/scratchdir</value>
      </property>
      <property>
        <name>hive.querylog.location</name>
        <value>/opt/hive-0.13.1/logs</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
      </property>
      <property>
        <name>hive.aux.jars.path</name>
        <value>file:///opt/hive/lib/hive-hbase-handler-0.13.1.jar,file:///opt/hive/lib/hbase-client-0.98.2-hadoop2.jar,file:///opt/hive/lib/hbase-common-0.98.2-hadoop2.jar,file:///opt/hive/lib/hbase-common-0.98.2-hadoop2-tests.jar,file:///opt/hive/lib/hbase-protocol-0.98.2-hadoop2.jar,file:///opt/hive/lib/hbase-server-0.98.2-hadoop2.jar,file:///opt/hive/lib/htrace-core-2.04.jar,file:///opt/hive/lib/zookeeper-3.4.6.jar,file:///opt/hive/lib/protobuf-java-2.5.0.jar,file:///opt/hive/lib/guava-12.0.1.jar</value>
      </property>
      </configuration>


        The important settings are:
         hive.metastore.warehouse.dir: the HDFS directory where Hive stores table data; default: /user/hive/warehouse
         hive.exec.scratchdir: Hive's temporary (scratch) directory; default: /tmp/hive-${user.name}
         javax.jdo.option.ConnectionURL: the JDBC connection string for the metastore database
         javax.jdo.option.ConnectionDriverName: the class name of the JDBC driver
         hive.aux.jars.path: the extra jars needed when integrating with HBase; required for the HBase integration below
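         Since the warehouse and scratch directories live on HDFS, it helps to create them up front with permissive access. The paths below match the hive-site.xml values above (step 1 of section II also creates the warehouse directory):

hadoop fs -mkdir -p /hive/warehouse
hadoop fs -mkdir -p /hive/scratchdir
hadoop fs -chmod -R 777 /hive/warehouse /hive/scratchdir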
5. Copy the HBase libraries into Hive's lib directory

cp $HBASE_HOME/lib/hbase-* $HIVE_HOME/lib/
          cp $HBASE_HOME/lib/htrace-core-2.04.jar $HIVE_HOME/lib/
          cp $HBASE_HOME/lib/zookeeper-3.4.6.jar $HIVE_HOME/lib/
          cp $HBASE_HOME/lib/protobuf-java-2.5.0.jar $HIVE_HOME/lib/
          cp $HBASE_HOME/lib/guava-12.0.1.jar $HIVE_HOME/lib/


6. Add the MySQL JDBC driver jar
         By default Hive stores its metadata in Derby and ships with the Derby database and driver, but since we are switching to MySQL we also need the MySQL JDBC driver.
         Download it from: http://dev.mysql.com/downloads/connector/j/5.0.html
         Extract it: tar -zxvf mysql-connector-java-5.1.32.tar.gz
         Copy the driver into $HIVE_HOME/lib: cp mysql-connector-java-5.1.32-bin.jar $HIVE_HOME/lib
 7. Install and configure MySQL
          If MySQL is not installed yet, install it first:
          Install mysql: sudo apt-get install mysql-server
          When prompted during installation, use root with password 123456, matching the hive-site.xml settings above.
          After installation only the root account exists. Create an account for Hive as follows:
          (1) Create the user
           CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
          (2) Grant privileges
           GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
          (3) Flush the privilege tables
           flush privileges;
           In addition, so that remote clients can also reach MySQL, edit /etc/mysql/my.cnf and comment out the bind-address line; that setting restricts connections to the local host.
           After the change, restart MySQL: sudo /etc/init.d/mysql restart
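           Note that the hive-site.xml above connects as root/123456; if you would rather use the hive account just created, update javax.jdo.option.ConnectionUserName and ConnectionPassword accordingly. Either way, a quick connectivity check over TCP (a sketch; adjust the credentials to whichever account you configured):

mysql -h 127.0.0.1 -u hive -phive -e "SHOW DATABASES;"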
           That completes the Hive configuration.

II. The Hive shell

1. Create test data and the warehouse directory

 

vi /opt/hive-0.13.1/testdata001.dat
luffy,20
zero,21
hadoop fs -mkdir -p /hive/warehouse



           2. Test Hive from the shell

root@Ubuntu-Kylin:/opt/hive# hive
14/08/28 21:26:13 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead

Logging initialized using configuration in file:/opt/hive/conf/hive-log4j.properties
hive> show databases;
OK
default
test1
test2
test3
Time taken: 1.375 seconds, Fetched: 4 row(s)
hive> create database test4;
OK
Time taken: 0.946 seconds
hive> show databases;
OK
default
test1
test2
test3
test4
Time taken: 0.043 seconds, Fetched: 5 row(s)
hive> use test4;
OK
Time taken: 0.047 seconds
hive> create table testtable (name string,age int) row format delimited fields terminated by ',' stored as textfile;
OK
Time taken: 0.813 seconds
hive> show tables;
OK
testtable
Time taken: 0.056 seconds, Fetched: 1 row(s)
hive> load data local inpath '/opt/hive/testdata001.dat' overwrite into table testtable;
Copying data from file:/opt/hive/testdata001.dat
Copying file: file:/opt/hive/testdata001.dat
Loading data to table test4.testtable
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://localhost:9000/hive/warehouse/test4.db/testtable
Table test4.testtable stats: [numFiles=1, numRows=0, totalSize=17, rawDataSize=0]
OK
Time taken: 2.532 seconds
hive> select * from testtable;
OK
luffy    20
zero    21
Time taken: 1.061 seconds, Fetched: 2 row(s)
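
As an optional extra check (not part of the run above), an aggregation launches a MapReduce job, unlike the plain SELECT *, which is served as a simple fetch. A sketch:

hive -e "use test4; select age, count(*) from testtable group by age;"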

           At this point the Hive shell test passes.
           3. Hive to HBase (importing Hive table data into HBase)

hive> use test4;
OK
Time taken: 0.043 seconds
hive> create table hive2hbase_1(key string,value int) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "hive2hbase_1");
OK
Time taken: 6.56 seconds
hive> show tables;
OK
hive2hbase_1
testtable
Time taken: 0.049 seconds, Fetched: 2 row(s)
  Importing the data from testtable into hive2hbase_1 automatically synchronizes it to HBase.
  
 
  hive> insert overwrite table hive2hbase_1 select * from testtable;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1409199368386_0002, Tracking URL = http://Ubuntu-Kylin:8088/proxy/application_1409199368386_0002/
Kill Command = /opt/hadoop-2.4.0/bin/hadoop job  -kill job_1409199368386_0002
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2014-08-28 22:05:00,025 Stage-0 map = 0%,  reduce = 0%
2014-08-28 22:05:34,630 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 6.74 sec
MapReduce Total cumulative CPU time: 6 seconds 740 msec
Ended Job = job_1409199368386_0002
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 8.11 sec   HDFS Read: 242 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 110 msec
OK
Time taken: 89.487 seconds
hive> select * from hive2hbase_1;
OK
luffy	20
zero	21
Time taken: 0.79 seconds, Fetched: 2 row(s)

4. Connect to HBase with its shell and check that the data written from Hive is there

root@Ubuntu-Kylin:/opt/hive# hbase shell
2014-08-28 22:34:47,023 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-08-28 22:34:47,240 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-08-28 22:34:47,322 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-08-28 22:34:47,395 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-08-28 22:34:47,464 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.2-hadoop2, r1591526, Wed Apr 30 20:17:33 PDT 2014

hbase(main):001:0> list
TABLE                                                                           
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-0.98.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
hbase2hive                                                                      
hive2hbase                                                                      
hive2hbase_1                                                                    
student                                                                         
teacher                                                                         
test                                                                            
6 row(s) in 5.5730 seconds

=> ["hbase2hive", "hive2hbase", "hive2hbase_1", "student", "teacher", "test"]
hbase(main):002:0> scan 'hive2hbase_1'
ROW                   COLUMN+CELL                                               
 luffy                column=cf1:val, timestamp=1409234737169, value=20         
 zero                 column=cf1:val, timestamp=1409234737169, value=21         
2 row(s) in 0.8880 seconds
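
Optionally, you can also fetch a single row by its key, here non-interactively from the shell (luffy is one of the row keys loaded above):

echo "get 'hive2hbase_1', 'luffy'" | hbase shell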

At this point the Hive-to-HBase path works.
           5. HBase to Hive (exposing HBase table data in Hive)

1) Create the table hbase2hive_1 in HBase
 
  hbase(main):003:0> create 'hbase2hive_1','name','age'
0 row(s) in 1.1060 seconds

=> Hbase::Table - hbase2hive_1
hbase(main):004:0> put 'hbase2hive_1','lucy','age','19'
0 row(s) in 0.2000 seconds

hbase(main):005:0> put 'hbase2hive_1','lazi','age','20'
0 row(s) in 0.0160 seconds

hbase(main):006:0> scan 'hbase2hive_1'
ROW                   COLUMN+CELL                                               
 lazi                 column=age:, timestamp=1409236970365, value=20            
 lucy                 column=age:, timestamp=1409236934766, value=19            
2 row(s) in 0.0430 seconds

2) In Hive, create an external table that maps to the HBase table

root@Ubuntu-Kylin:/opt/hive# hive
14/08/28 22:45:12 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead

Logging initialized using configuration in file:/opt/hive/conf/hive-log4j.properties
hive> show databases;
OK
default
test1
test2
test4
Time taken: 1.495 seconds, Fetched: 4 row(s)
hive> use test4;
OK
Time taken: 0.048 seconds
hive> show tables;
OK
hive2hbase_1
testtable
Time taken: 0.059 seconds, Fetched: 2 row(s)
hive> create external table hbase2hive_1(key string,value map<string,string>) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" ="age:") TBLPROPERTIES ("hbase.table.name" = "hbase2hive_1");
OK
Time taken: 0.385 seconds
hive> select * from hbase2hive_1;
OK
lazi    {"":"20"}
lucy    {"":"19"}
Time taken: 0.661 seconds, Fetched: 2 row(s)
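
The empty map keys in the result come from mapping a bare column family (age:). If the HBase table had explicit column qualifiers, each qualifier could be mapped to an ordinary Hive column instead of a MAP. A hypothetical sketch (the HBase table some_table with family info and qualifier age is assumed for illustration, not part of this article's setup):

hive -e "use test4;
create external table hbase2hive_2(key string, age int)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ('hbase.columns.mapping' = ':key,info:age')
tblproperties ('hbase.table.name' = 'some_table');"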

At this point the HBase-to-Hive path works as well.

III. The Hive Web UI

           To use Hive's web interface (HWI), add the following to hive-site.xml:
<property>
            <name>hive.hwi.war.file</name>
            <value>lib/hive-hwi-0.13.1.war</value>
            <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}. </description>
            </property>

            <property>
            <name>hive.hwi.listen.host</name>
            <value>0.0.0.0</value>
            <description>This is the host address the Hive Web Interface will listen on</description>
            </property>

            <property>
            <name>hive.hwi.listen.port</name>
            <value>9999</value>
            <description>This is the port the Hive Web Interface will listen on</description>
            </property>
           Note that the Hive 0.13.1 release does not include the hive-hwi-0.13.1.war file. You can find one online; my approach was to take the hwi/web/ directory from the source tree, zip its contents, and rename the archive to .war:
wget http://apache.fayea.com/apache-mirror/hive/hive-0.13.1/apache-hive-0.13.1-src.tar.gz
tar -zxvf apache-hive-0.13.1-src.tar.gz
cd apache-hive-0.13.1-src
cd hwi/web
zip -r hive-hwi-0.13.1.zip ./*
mv hive-hwi-0.13.1.zip hive-hwi-0.13.1.war    # rename the extension to .war
mv hive-hwi-0.13.1.war $HIVE_HOME/lib
      
With the configuration in place, start the service:
root@Ubuntu-Kylin:/opt/hive# hive --service hwi
14/08/29 00:11:07 INFO hwi.HWIServer: HWI is starting up
14/08/29 00:11:10 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
14/08/29 00:11:10 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
14/08/29 00:11:10 INFO mortbay.log: jetty-6.1.26
14/08/29 00:11:10 INFO mortbay.log: Extract /opt/hive/lib/hive-hwi-0.13.1.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.0.13.1.war__hwi__.xvnhjk/webapp
14/08/29 00:11:11 INFO mortbay.log: Started SocketConnector@0.0.0.0:9999
   
You can now reach Hive from a browser at http://localhost:9999/hwi; the interface looks like this:

[Screenshot: the Hive Web Interface in the browser]


          As you can see, the web interface brings the user closer to the system: you can create sessions and run queries right from the browser.
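
          If no browser is handy on the server, a quick reachability check from the terminal (optional; uses the host/port configured above):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9999/hwi/    # 200 means the servlet is up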


IV. The Hive JDBC interface

          Eclipse setup
          Create a new Java project in Eclipse (mine is named HiveJdbcClient). Right-click the project, choose Build Path -> Configure Build Path -> Libraries, and add all jars under $HIVE_HOME/lib plus hadoop-common-2.4.0.jar to the project.
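          If you prefer to skip Eclipse, the client shown further below can also be compiled and run from the command line. This is a sketch; the wildcard classpaths assume the jar layout of Hadoop 2.4.0 and Hive 0.13.1 used in this article:

# compile (HiveJdbcClient.java sits in first/hive/ to match its package declaration)
javac -cp "$HIVE_HOME/lib/*:$HADOOP_HOME/share/hadoop/common/*" first/hive/HiveJdbcClient.java
# run
java -cp ".:$HIVE_HOME/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*" first.hive.HiveJdbcClient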
          To run the Hive client from Eclipse, HiveServer must be listening for connections; start it in a terminal:

root@Ubuntu-Kylin:/opt/hive# hive --service hiveserver
Starting Hive Thrift Server

  Below is a sample JDBC client written in Java:

package first.hive;

import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.Statement;

public class HiveJdbcClient {

	public static void main(String[] args) throws SQLException{
		// TODO Auto-generated method stub
		try{
			//register the Hive JDBC driver (HiveServer1)
			Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
		}catch (ClassNotFoundException e){
			e.printStackTrace();
			System.exit(1);
		}
		
		//open a connection (HiveServer1 listens on port 10000 by default)
		Connection con=DriverManager.getConnection("jdbc:hive://127.0.0.1:10000/test2","","");
		
		//a Statement is used to run HiveQL statements
		Statement stmt=con.createStatement();
		
		//HiveQL test statements
		String tablename="u1_data";
		
		//drop the table
		stmt.executeQuery("drop table "+tablename);
		
		//create the table
		ResultSet res=stmt.executeQuery("create table "+tablename+"(userid int,movieid int,rating int,city string,viewTime string)"+"row format delimited fields terminated by '\t' stored as textfile");
		
		//show tables
		String sql="show tables";
		System.out.println("Running: "+sql+":");
		res=stmt.executeQuery(sql);
		if(res.next()){
			System.out.println(res.getString(1));
		}
		
		//describe the table
		sql="describe "+tablename;
		System.out.println("Running: "+sql+":");
		res=stmt.executeQuery(sql);
		while(res.next()){
			System.out.println(res.getString(1)+"\t"+res.getString(2));
		}
		
		//load data
		String filepath="/home/dashengong/workspace/u1_data.dat";
		sql= "load data local inpath '"+filepath+"' overwrite into table "+tablename;
		System.out.println("Running: "+sql+":");
		res=stmt.executeQuery(sql);
		
		//select query: fetch the first 5 rows
		sql="select * from "+tablename+" limit 5";
		System.out.println("Running: "+sql+":");
		res=stmt.executeQuery(sql);
		while(res.next()){
			System.out.println(String.valueOf(res.getString(3)+"\t"+res.getString(4)));
		}
		
		//count the rows
		sql="select count(*) from "+tablename;
		System.out.println("Running: "+sql+":");
		res=stmt.executeQuery(sql);
		while(res.next()){
			System.out.println(res.getString(1));
		}
	}
}
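
For reference, here is a hypothetical u1_data.dat consistent with the console output below: two tab-separated rows with the fields userid, movieid, rating, city, viewTime. Only the rating and city values appear in the sample output; the other fields are made up for illustration.

# write two sample rows into the file the client loads
printf '1\t101\t90\tbeijing\t2014-08-28\n2\t102\t85\tchengdu\t2014-08-28\n' > /home/dashengong/workspace/u1_data.dat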


          

Run the project against Hadoop: success!


Terminal output:


OK

OK
OK
OK
Copying data from file:/home/dashengong/workspace/u1_data.dat
Copying file: file:/home/dashengong/workspace/u1_data.dat
Loading data to table default.u1_data
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://localhost:9000/hive/warehouse/u1_data
Table default.u1_data stats: [numFiles=1, numRows=0, totalSize=52, rawDataSize=0]
OK
OK
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1409199368386_0003, Tracking URL = http://Ubuntu-Kylin:8088/proxy/application_1409199368386_0003/
Kill Command = /opt/hadoop-2.4.0/bin/hadoop job  -kill job_1409199368386_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-08-29 00:38:12,026 Stage-1 map = 0%,  reduce = 0%
2014-08-29 00:38:48,126 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.95 sec
2014-08-29 00:39:17,752 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.47 sec
MapReduce Total cumulative CPU time: 6 seconds 470 msec
Ended Job = job_1409199368386_0003
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 6.47 sec   HDFS Read: 262 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 470 msec
OK

Eclipse console output:


log4j:WARN No appenders could be found for logger (org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe).


log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Running: show tables:
u1_data
Running: describe u1_data:
userid              	int                 
movieid             	int                 
rating              	int                 
city                	string              
viewtime            	string              
Running: load data local inpath '/home/dashengong/workspace/u1_data.dat' overwrite into table u1_data:
Running: select * from u1_data limit 5:
90	beijing
85	chengdu
Running: select count(*) from u1_data:
2

Checking HDFS:



[Screenshot: the u1_data table directory in HDFS]


           At this point, running Hive from Eclipse works.




Note: when running all of the tests above, make sure Hadoop, HBase, and MySQL are all up and running! Finally, if you spot any mistakes, corrections are welcome.