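To explore the mystery and greatness of Hive, we are setting off on the road to learning it. Whether the tool is good or bad can wait; first, let's install Hive.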

We use MySQL to store Hive's metadata (the Metastore), so MySQL is installed first. The installation and configuration steps are as follows:

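The whole procedure has 7 parts:

1. Install MySQL

2. Install Hive

3. Configure the Hive metadata (Metastore) in MySQL

4. Hadoop cluster configuration

5. Hive data warehouse location

6. Display settings for query results (optional, to taste)

7. Hive log file configuration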

----------------------------------OK, here begins the long walkthrough-------------------------------------------------

》》》》》》>1.  Install MySQL

step 1: Check whether MySQL is already installed; if it is, uninstall the existing MySQL

# yum list installed | grep mysql

# yum -y remove mysql-libs.x86_64

step 2: Download the repository package and install

# rpm -Uvh http://repo.mysql.com/mysql-community-release-el6-5.noarch.rpm

(the URL above is the download address of the repository package)

#yum install mysql-community-server -y

Or

# yum -y install mysql mysql-server mysql-devel

# wget http://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm

# rpm -ivh mysql-community-release-el7-5.noarch.rpm

# yum -y install mysql-community-server

step 3: Start MySQL

#service mysqld start

step 4: Set the root login password

# mysqladmin -uroot password 'rootroot'

step 5: Log in to MySQL

# mysql -uroot -prootroot

step 6: After logging in, grant access so that the other machines in the cluster can connect

mysql> grant all privileges on *.* to root@'%' identified by 'rootroot';

mysql>

Type exit; at the mysql> prompt to leave.

MySQL is now installed.

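》》》》》》> 2.  Install Hive

step 1: Upload the downloaded Hive package apache-hive-1.2.1-bin.tar.gz from the local machine to the virtual machine (hadoop011)

Press Alt+p to open the sftp window

sftp> put G:/Hive/apache-hive-1.2.1-bin.tar.gz

Then go to the home directory (# cd ~), find the file, and move the package to the target path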

# mv apache-hive-1.2.1-bin.tar.gz /opt/soft

step 2: Extract the package into /opt/app/

# tar -zxvf /opt/soft/apache-hive-1.2.1-bin.tar.gz -C /opt/app/

Go to /opt/app/ and rename the extracted directory

# cd /opt/app

#mv apache-hive-1.2.1-bin apache-hive-1.2.1
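
The extract and rename steps above can also be done in one pass. This is only a minimal sketch, assuming GNU tar (the default on CentOS) and an archive with a single top-level directory; the function name extract_hive is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: unpack a tarball straight into the final directory name,
# dropping the archive's top-level folder (e.g. apache-hive-1.2.1-bin/).
# extract_hive is a hypothetical helper, not part of Hive.
extract_hive() {
  local tarball="$1" dest="$2"
  mkdir -p "$dest"
  # --strip-components=1 removes the leading path element of every entry
  tar -zxf "$tarball" -C "$dest" --strip-components=1
}

# Example (paths from this walkthrough):
# extract_hive /opt/soft/apache-hive-1.2.1-bin.tar.gz /opt/app/apache-hive-1.2.1
```

The result should match what the tar and mv commands above produce, without the intermediate apache-hive-1.2.1-bin directory.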

step 3:  Configure hive-env.sh 

Go to /opt/app/apache-hive-1.2.1/conf, find the file hive-env.sh.template, and rename it

[root@hadoop011 conf]# mv hive-env.sh.template hive-env.sh

[root@hadoop011 conf]# vim hive-env.sh

Open hive-env.sh and add the HADOOP_HOME and HIVE_CONF_DIR paths

export HADOOP_HOME=/opt/app/hadoop-2.7.2

export HIVE_CONF_DIR=/opt/app/apache-hive-1.2.1/conf

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI/HWI etc.) is available via the environment
# variable SERVICE
# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
#   if [ -z "$DEBUG" ]; then
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
#   else
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
#   fi
# fi
# The heap size of the jvm stared by hive shell script can be controlled via:
# export HADOOP_HEAPSIZE=1024
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB).  Larger heap size would also be
# appropriate for hive server (hwi etc).
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/opt/app/hadoop-2.7.2
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/app/apache-hive-1.2.1/conf
# Folder containing extra ibraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=
"hive-env.sh" 54L, 2407C written

At this point Hive can already be started from /opt/app/apache-hive-1.2.1/bin, but the overall configuration is not finished yet, so let's continue.
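
The hive-env.sh edit from step 3 can also be scripted instead of done in vim. A minimal sketch, assuming bash and the paths used in this walkthrough; add_export is a hypothetical helper name, not something Hive ships.

```shell
#!/usr/bin/env bash
# Sketch: append an export line to hive-env.sh without opening an editor.
# add_export is a hypothetical helper; it skips a line that is already present,
# so running it twice does not duplicate the entry.
add_export() {
  local file="$1" name="$2" value="$3"
  local line="export ${name}=${value}"
  # -x: match the whole line, -F: fixed string, -q: no output
  grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Example (paths from this walkthrough):
# conf=/opt/app/apache-hive-1.2.1/conf
# add_export "$conf/hive-env.sh" HADOOP_HOME /opt/app/hadoop-2.7.2
# add_export "$conf/hive-env.sh" HIVE_CONF_DIR "$conf"
```

Appending at the end of the file is enough here, because the template contains only comments and commented-out examples.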

》》》》》》> 3. Configure the Hive metadata (Metastore) in MySQL

step 1: Upload the downloaded JDBC driver to the virtual machine

Alt+p:

sftp> put G:/Hive/mysql-connector-java-5.1.37-bin.jar

Uploading mysql-connector-java-5.1.37-bin.jar to /root/mysql-connector-java-5.1.37-bin.jar

  100% 962KB    962KB/s 00:00:00    

G:/Hive/mysql-connector-java-5.1.37-bin.jar: 985603 bytes transferred in 0 seconds (962 KB/s)

sftp>

Move (or copy) mysql-connector-java-5.1.37-bin.jar to /opt/app/apache-hive-1.2.1/lib/

[root@hadoop011 ~]# mv mysql-connector-java-5.1.37-bin.jar /opt/app/apache-hive-1.2.1/lib/

step 2: Point the Metastore at MySQL

Create hive-site.xml under /opt/app/apache-hive-1.2.1/conf/

[root@hadoop011 ~]# cd /opt/app/apache-hive-1.2.1/conf

[root@hadoop011 conf]# touch hive-site.xml

Put the configuration from (https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin) into hive-site.xml

[root@hadoop011 conf]# vim hive-site.xml

Configuration:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
	  <name>javax.jdo.option.ConnectionURL</name>
	  <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
	  <description>JDBC connect string for a JDBC metastore</description>
	</property>

	<property>
	  <name>javax.jdo.option.ConnectionDriverName</name>
	  <value>com.mysql.jdbc.Driver</value>
	  <description>Driver class name for a JDBC metastore</description>
	</property>

	<property>
	  <name>javax.jdo.option.ConnectionUserName</name>
	  <value>root</value>
	  <description>username to use against metastore database</description>
	</property>

	<property>
	  <name>javax.jdo.option.ConnectionPassword</name>
	  <value>000000</value>
	  <description>password to use against metastore database</description>
	</property>
</configuration>

Two places in this content need to be changed:

a.   My MySQL runs on hadoop015, so the host in the connection URL changes

<value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>

==> <value>jdbc:mysql://hadoop015:3306/metastore?createDatabaseIfNotExist=true</value>

b.  The MySQL password was set to 'rootroot', so it must be changed here as well

<property>

        <name>javax.jdo.option.ConnectionPassword</name>

        <value>rootroot</value>

        <description>password to use against metastore database</description>

</property>
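
A typo in the host name or password here only shows up later as a connection error, so it can help to read the values back from the file. A minimal sketch, assuming the layout shown above, with each <name> and its <value> on separate lines; get_prop is a hypothetical helper and not a real XML parser.

```shell
#!/usr/bin/env bash
# Sketch: print the <value> that follows a given <name> in hive-site.xml.
# get_prop is a hypothetical helper. It assumes <name> and <value> sit on
# separate lines, as in the configuration above.
get_prop() {
  local file="$1" name="$2"
  awk -v name="$name" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")      # drop everything up to and including <value>
      sub(/<\/value>.*/, "")    # drop </value> and anything after it
      print
      exit
    }
  ' "$file"
}

# Example (path from this walkthrough):
# get_prop /opt/app/apache-hive-1.2.1/conf/hive-site.xml javax.jdo.option.ConnectionURL
```

The printed password should be the same one used with mysqladmin and in the grant statement.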

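step 3: Configuration is done; reboot and check

#reboot

#service mysqld start

#mysql -uroot -prootroot

mysql> show databases;

The metastore database should now be listed, which means the configuration works. Because the URL uses createDatabaseIfNotExist=true, the database is created the first time Hive connects to MySQL, so if it is missing, start Hive once and check again.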
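
The same check can be done without an interactive session. A minimal sketch, assuming the mysql client is on the PATH; has_database is a hypothetical helper that reads database names from standard input.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the given database name appears in a list read from stdin.
# has_database is a hypothetical helper, not a MySQL command.
has_database() {
  # -x: whole-line match, -F: fixed string, -q: no output
  grep -qxF -- "$1"
}

# Example: -N suppresses the column header, -e runs a single statement.
# mysql -uroot -p -N -e 'show databases;' | has_database metastore && echo "metastore found"
```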

Of course, you can also start the cluster and Hive at this point.

Think that's the end? No, no, no. There is more to configure.

》》》》》》> 4. Hadoop cluster configuration

step 1: Start the cluster

[root@hadoop011 ~]# start-dfs.sh

[root@hadoop012 ~]# start-yarn.sh

step 2: Create the directory /user/hive/warehouse on HDFS (you may choose your own path) as the Hive data warehouse

[root@hadoop011 conf]# hadoop fs -mkdir -p /user/hive/warehouse

step 3: Change the permissions

[root@hadoop011 conf]# hadoop fs -chmod g+w /user/hive/warehouse
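
Hive's Getting Started guide also creates /tmp on HDFS and makes it group-writable alongside the warehouse directory. A minimal sketch that wraps the four commands; the run and prepare_hive_dirs helpers and the DRY_RUN switch are my own additions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: create the HDFS directories Hive expects and make them group-writable.
# run and prepare_hive_dirs are hypothetical helpers; DRY_RUN=1 prints the
# commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

prepare_hive_dirs() {
  local warehouse="${1:-/user/hive/warehouse}"
  run hadoop fs -mkdir -p /tmp
  run hadoop fs -mkdir -p "$warehouse"
  run hadoop fs -chmod g+w /tmp
  run hadoop fs -chmod g+w "$warehouse"
}

# Example: preview first, then run for real once HDFS is up.
# DRY_RUN=1 prepare_hive_dirs
# prepare_hive_dirs /user/hive/warehouse
```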

 

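》》》》》》>5. Data warehouse location

To change the location of the default database's warehouse, copy the following property from hive-default.xml.template (in /opt/app/apache-hive-1.2.1/conf) into hive-site.xml and set its value.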

<property>

<name>hive.metastore.warehouse.dir</name>

<value>/user/hive/warehouse</value>

<description>location of default database for the warehouse</description>

</property>

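》》》》》》>6. Display settings for query results

To show the current database in the prompt and the column headers of query results, add the following properties to hive-site.xml: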

<property>
	<name>hive.cli.print.header</name>
	<value>true</value>
</property>

<property>
	<name>hive.cli.print.current.db</name>
	<value>true</value>
</property>

》》》》》》> 7. Hive log file configuration

Look at the logging configuration:

#cd /opt/app/apache-hive-1.2.1/conf 

You will find the template hive-log4j.properties.template; rename it to hive-log4j.properties

[root@hadoop011 conf]# mv hive-log4j.properties.template hive-log4j.properties

[root@hadoop011 conf]# pwd

/opt/app/apache-hive-1.2.1/conf

Open the file and change where the log is stored

[root@hadoop011 conf]# vim hive-log4j.properties

Contents of the file:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Define some default values that can be overridden by system properties
hive.log.threshold=ALL
hive.root.logger=INFO,DRFA
hive.log.dir=/opt/app/apache-hive-1.2.1/logs
hive.log.file=hive.log

Original value: hive.log.dir=${java.io.tmpdir}/${user.name}

New value: hive.log.dir=/opt/app/apache-hive-1.2.1/logs
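
The log directory change can be applied without an editor as well. A minimal sketch, assuming the property sits at the start of a line as in the template; set_log_dir is a hypothetical helper, and creating the directory up front is my own precaution.

```shell
#!/usr/bin/env bash
# Sketch: rewrite hive.log.dir in hive-log4j.properties and create the directory.
# set_log_dir is a hypothetical helper. The directory path must not contain
# '|' or '&', because it is substituted into a sed expression.
set_log_dir() {
  local file="$1" dir="$2"
  mkdir -p "$dir"
  # Write to a temporary file first so the original is replaced only on success
  sed "s|^hive\.log\.dir=.*|hive.log.dir=${dir}|" "$file" > "${file}.tmp" \
    && mv "${file}.tmp" "$file"
}

# Example (paths from this walkthrough):
# set_log_dir /opt/app/apache-hive-1.2.1/conf/hive-log4j.properties /opt/app/apache-hive-1.2.1/logs
```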

---------------------------------

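OK, that covers the basic configuration. Other settings, such as tuning parameters, can be added as needed.

For my current learning, I think this is enough.

So, last of all, start Hive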

# reboot

[root@hadoop011 ~]# start-dfs.sh

[root@hadoop012 ~]# start-yarn.sh

Go to the directory /opt/app/apache-hive-1.2.1/bin

[root@hadoop011 bin]# ./hive

------------ Let's start the learning journey together --------------------------------