1. Hive background review
Hive is Hadoop's data warehouse. It keeps its metadata in a third-party database and its actual data on HDFS, so Hive cannot run without Hadoop and that database. We installed Hadoop 2.7.3 earlier, so installing a third-party database is all that is left to satisfy Hive's dependencies; here I use MySQL 5.5.46 and Hive 1.2.1.
Hive maps structured data files to database tables (the metadata lives in MySQL), stores the data files on HDFS, and runs queries by parsing SQL statements into MapReduce jobs for distributed execution.
The Hive website describes the installation prerequisites as follows:
Hive installation has these requirements:
- Java 1.7 (preferred). Note: Hive versions 1.2 onward require Java 1.7 or newer. Hive versions 0.14 to 1.1 work with Java 1.6, but prefer 1.7. Users are strongly advised to start moving to Java 1.8 (see HIVE-8607).
- Hadoop 2.x (preferred), 1.x (not supported by Hive 2.0.0 onward). Hive versions up to 0.13 also supported Hadoop 0.20.x, 0.23.x.
- Hive is commonly used in production Linux and Windows environment. Mac is a commonly used development environment. The instructions in this document are applicable to Linux and Mac. Using it on Windows would require slightly different steps.
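Before installing, it is worth confirming these prerequisites on every node. The snippet below is a minimal sketch (the `version_ge` helper and the sample version string are mine, not from any Hive tooling) for comparing the locally reported Java version against the 1.7 minimum:

```shell
#!/usr/bin/env bash
# Return success (0) if dotted version $1 is >= version $2.
version_ge() {
  # sort -V orders version strings; if $2 sorts first (or equal), then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: check a version string against the Hive 1.2 minimum.
# On a real node, obtain it with: java -version 2>&1 | awk -F'"' '/version/{print $2}'
java_version="1.8.0_121"
if version_ge "$java_version" "1.7"; then
  echo "Java $java_version meets the 1.7 minimum"
else
  echo "Java $java_version is too old for Hive 1.2" >&2
fi
```

The same helper works for the Hadoop version check (e.g. `version_ge 2.7.3 2.0`).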
2. Download links
hive:
http://apache.fayea.com/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz
mysql:
https://downloads.mysql.com/archives/get/file/MySQL-server-5.5.46-1.el7.x86_64.rpm
https://downloads.mysql.com/archives/get/file/MySQL-client-5.5.46-1.el7.x86_64.rpm
3. Installation steps
Step 1: Remove the MySQL libraries preinstalled with the OS
-- The following commands must all be run as root
#yum remove mysql-libs
Step 2: Install the MySQL client
[root@master hadoop]# rpm -ivh MySQL-client-5.5.46-1.el7.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:MySQL-client-5.5.46-1.el7 ################################# [100%]
-- Install the MySQL client on slave01 as well
[root@slave01 hadoop]# rpm -ivh MySQL-client-5.5.46-1.el7.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:MySQL-client-5.5.46-1.el7 ################################# [100%]
Step 3: Install the MySQL server
[root@master hadoop]# rpm -ivh MySQL-server-5.5.46-1.el7.x86_64.rpm
warning: MySQL-server-5.5.46-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:MySQL-server-5.5.46-1.el7 ################################# [100%]
170404 1:49:51 [Note] /usr/sbin/mysqld (mysqld 5.5.46) starting as process 33406 ...
170404 1:49:51 [Note] /usr/sbin/mysqld (mysqld 5.5.46) starting as process 33413 ...
-- After a successful install MySQL prints a notice: start the server, then set a password for MySQL's root user
PLEASE REMEMBER TO SET A PASSWORD FOR THE MySQL root USER !
To do so, start the server, then issue the following commands:
-- In case I did not know how to set root's password, MySQL generously provides examples:
/usr/bin/mysqladmin -u root password 'new-password'
/usr/bin/mysqladmin -u root -h master password 'new-password'
-- And if you do not care for the method above, MySQL also offers an all-in-one alternative:
Alternatively you can run:
/usr/bin/mysql_secure_installation
-- It even thoughtfully recommends removing the test database and anonymous users, a necessary hardening step on production servers
which will also give you the option of removing the test
databases and anonymous user created by default. This is
strongly recommended for production servers.
See the manual for more instructions.
Please report any problems at http://bugs.mysql.com/
Step 4: Start the MySQL service
[root@master hadoop]# service mysql start
Starting MySQL... SUCCESS!
Step 5: Set the root password; from this step on, root OS privileges are no longer needed
[hadoop@master ~]$ /usr/bin/mysqladmin -u root password 'mysql123'
-- a password simple enough for my unreliable memory
Step 6: Log in to MySQL as root to verify
[hadoop@master ~]$ mysql -uroot -pmysql123
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.5.46 MySQL Community Server (GPL)
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
4 rows in set (0.02 sec)
Step 7: Create the Hive metastore database and grant the hive user access
-- Create the database hivemeta
mysql> create database hivemeta;
Query OK, 1 row affected (0.05 sec)
-- When an older MySQL version backs the Hive metastore, the database character set must be latin1, otherwise Hive will throw errors at runtime
mysql> alter database hivemeta character set latin1;
Query OK, 1 row affected (0.00 sec)
-- Create the MySQL login user hive
mysql> create user 'hive'@'%' identified by 'mysql123';
Query OK, 0 rows affected (0.05 sec)
-- Grant privileges
mysql> grant all privileges on *.* to 'hive'@'%' identified by 'mysql123' with grant option;
Query OK, 0 rows affected (0.05 sec)
-- The next point cost me two evenings of debugging, because I had not understood MySQL's login authorization mechanism:
grant all privileges on *.* to 'hive'@'localhost' identified by 'mysql123' with grant option;
grant all privileges on *.* to 'hive'@'master' identified by 'mysql123' with grant option;
You would expect the % wildcard to grant access from every host, but per the MySQL documentation % does not cover localhost or host names such as master: logins via a host name must be granted separately.
In effect, % only matches clients identified by IP address.
Verification:
mysql -h localhost -u hive -p
mysql -h master -u hive -p
Without those extra grants, even a mysql command-line login by host name fails; Hive connects through JDBC, so if the mysql client cannot log in, Hive's connection is certain to fail too.
-- Flush privileges so the changes take effect
mysql> flush privileges;
Query OK, 0 rows affected (0.03 sec)
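Because %, localhost, and the host name each need their own grant, it is easy to miss one. The sketch below (script and variable names are illustrative, not from MySQL) just prints the full set of GRANT statements used above so they can be reviewed before being fed to the mysql client:

```shell
#!/usr/bin/env bash
# Print one GRANT statement per host pattern the hive user must log in from,
# followed by a flush, so the whole batch can be reviewed and then piped to mysql.
user="hive"
pass="mysql123"
for host in '%' localhost master; do
  printf "grant all privileges on *.* to '%s'@'%s' identified by '%s' with grant option;\n" \
    "$user" "$host" "$pass"
done
printf "flush privileges;\n"
```

After checking the output, it can be piped in, e.g. `bash make_grants.sh | mysql -uroot -p`.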
-- Verify the hive user's grants: there should be three rows, for %, localhost, and master
mysql> select user,host from mysql.user;
+------+-----------+
| user | host |
+------+-----------+
| hive | % |
| root | 127.0.0.1 |
| root | ::1 |
| | localhost |
| hive | localhost |
| root | localhost |
| | master |
| hive | master |
| root | master |
+------+-----------+
9 rows in set (0.10 sec)
Step 8: Configure the Hive environment variables across the cluster
vim /home/hadoop/.bashrc
export JAVA_HOME=/usr/java/jdk1.8.0_121
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HADOOP_USER_NAME=hadoop
export HIVE_HOME=/home/hadoop/bigdata/hive
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$PATH
-- Load the environment file
source /home/hadoop/.bashrc
-- Copy the environment file to the same location on slave01 and slave02
[hadoop@master ~]$ scp .bashrc hadoop@slave01:/home/hadoop/
.bashrc 100% 477 0.5KB/s 00:00
[hadoop@master ~]$ scp .bashrc hadoop@slave02:/home/hadoop/
.bashrc 100% 477 0.5KB/s 00:00
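Each of those variables must point at a real directory on every node, or later steps fail in confusing ways. A small sanity-check sketch (the `check_home` helper is hypothetical, not part of Hadoop or Hive):

```shell
#!/usr/bin/env bash
# Verify that a *_HOME-style variable is non-empty and names an existing directory.
check_home() {
  local name="$1" dir="$2"
  if [ -n "$dir" ] && [ -d "$dir" ]; then
    echo "$name ok: $dir"
  else
    echo "$name missing or not a directory: '$dir'" >&2
    return 1
  fi
}

# Demonstration with a directory that always exists; on a real node check
# JAVA_HOME, HADOOP_HOME, and HIVE_HOME after sourcing .bashrc, e.g.:
#   check_home JAVA_HOME "$JAVA_HOME"
check_home TMP_EXAMPLE "/tmp"
```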
Step 9: Unpack and install the Hive distribution
cd /home/hadoop/bigdata/
tar -zxf apache-hive-1.2.1-bin.tar.gz
mv apache-hive-1.2.1-bin hive
cd /home/hadoop/bigdata/hive/conf
-- Create Hive's runtime configuration files from the templates, ready to be filled in next
cp hive-default.xml.template hive-site.xml
cp hive-env.sh.template hive-env.sh
cp hive-log4j.properties.template hive-log4j.properties
Step 10: Copy the MySQL JDBC driver JAR into /home/hadoop/bigdata/hive/lib
[hadoop@master lib]$ ls -la mysql-connector-java-5.1.7-bin.jar
-rw-rw-r-- 1 hadoop hadoop 709922 Apr 5 06:12 mysql-connector-java-5.1.7-bin.jar
-- The MySQL installation does not ship this JAR by default, so it has to be downloaded separately from Oracle's site; without it Hive cannot reach MySQL and will not start at all.
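A quick way to confirm the driver is in place before starting Hive is to test for the jar by glob; the `has_mysql_driver` helper below is an illustrative sketch, not a Hive tool:

```shell
#!/usr/bin/env bash
# Check a Hive lib directory for a mysql-connector JDBC jar.
has_mysql_driver() {
  local libdir="$1"
  # ls exits non-zero when the glob matches no file, so its status is the answer.
  ls "$libdir"/mysql-connector-java-*.jar >/dev/null 2>&1
}

libdir="${HIVE_HOME:-/home/hadoop/bigdata/hive}/lib"
if has_mysql_driver "$libdir"; then
  echo "driver jar found in $libdir"
else
  echo "no mysql-connector jar in $libdir; download it before starting Hive" >&2
fi
```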
Step 11: Edit Hive's configuration files
-- Configure the Hive runtime environment
vim /home/hadoop/bigdata/hive/conf/hive-env.sh
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HIVE_CONF_DIR=/home/hadoop/bigdata/hive/conf
-- Configure the log path for Hive's log4j framework
vim /home/hadoop/bigdata/hive/conf/hive-log4j.properties
hive.log.threshold=ALL
hive.root.logger=INFO,DRFA
hive.log.dir=/home/hadoop/bigdata/hive/log
hive.log.file=hive.log
-- Edit Hive's main configuration file
vim /home/hadoop/bigdata/hive/conf/hive-site.xml
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://master:9000/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
-- Hive uses this directory to store the plans for a query's map/reduce stages and their intermediate output; this one points at HDFS.
<property>
<name>hive.exec.scratchdir</name>
<value>hdfs://master:9000/hive/scratchdir</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/home/hadoop/bigdata/hive/logs</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hivemeta?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mysql123</value>
<description>password to use against metastore database</description>
</property>
-- When Hive runs in local mode, this directory holds its temporary files
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/hadoop/bigdata/hive/logs</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/home/hadoop/bigdata/hive/logs</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
Step 12: Copy the Hive installation to slave01 so slave01 can run Hive too
scp -r /home/hadoop/bigdata/hive hadoop@slave01:/home/hadoop/bigdata/
Step 13: Start the metastore service on master
[hadoop@master ~]$ hive --service metastore
Starting Hive Metastore Server
-- In another terminal, verify that the metastore service started
[hadoop@master lib]$ ps -ef | grep hive
hadoop 29327 27579 0 04:17 pts/1 00:00:24 /usr/java/jdk1.8.0_121/bin/java -Xmx256m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/bigdata/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/bigdata/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/bigdata/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /home/hadoop/bigdata/hive/lib/hive-service-1.2.1.jar org.apache.hadoop.hive.metastore.HiveMetaStore
-- Once it checks out, the service can be started in the background with:
nohup hive --service metastore &
Step 14: Start the HiveServer2 service on slave01
[hadoop@slave01 ~]$ hive --service hiveserver2
-- In another terminal, verify that HiveServer2 started
[hadoop@slave01 bin]$ ps -ef | grep hive
hadoop 18552 16412 2 07:08 pts/2 00:00:07 /usr/java/jdk1.8.0_121/bin/java -Xmx256m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/bigdata/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/bigdata/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/bigdata/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /home/hadoop/bigdata/hive/lib/hive-service-1.2.1.jar org.apache.hive.service.server.HiveServer2
-- Likewise, once it checks out, start it in the background with:
nohup hive --service hiveserver2 &
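The two `ps -ef | grep hive` verifications above can be wrapped in a small helper that looks for the service's main class in the process list (the function is illustrative; the class names come from the ps output above):

```shell
#!/usr/bin/env bash
# Given `ps` output on stdin, report whether a Hive service main class appears in it.
hive_service_running() {
  # $1: the service's main class, e.g. HiveMetaStore or HiveServer2
  grep -q "$1" && echo "running" || echo "not running"
}

# Example with a canned ps line; a real check would use:
#   ps -ef | hive_service_running HiveServer2
echo "hadoop 18552 ... org.apache.hive.service.server.HiveServer2" \
  | hive_service_running HiveServer2   # prints "running"
```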
Step 15: Verify from the Hive client on master or slave01
hive
hive> create database test;
OK
Time taken: 0.392 seconds
hive> show databases;
OK
default
test
Time taken: 0.023 seconds, Fetched: 2 row(s)
4. Troubleshooting
-- Logging in with the host name master fails even on master itself; again the MySQL grant problem
[hadoop@master ~]$ mysql -h master -u hive -pmysql123
ERROR 1045 (28000): Access denied for user 'hive'@'master' (using password: YES)
-- Hive failed to start because it could not reach HDFS: the call to master:9000 from slave01 was refused, meaning the NameNode was down or unreachable; make sure HDFS is running before starting Hive
Exception in thread "main" java.lang.RuntimeException: java.net.ConnectException: Call From slave01/192.168.10.167 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.ConnectException: Call From slave01/192.168.10.167 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
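Before digging into Hive for an error like this, confirm that the NameNode port is actually reachable from the failing host. A sketch using bash's /dev/tcp pseudo-device (the helper is mine; the host and port mirror the trace above):

```shell
#!/usr/bin/env bash
# Report whether host:port accepts TCP connections, via bash's /dev/tcp redirection.
check_port() {
  local host="$1" port="$2"
  if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# On slave01 this should print "open" once HDFS is up:
#   check_port master 9000
check_port 127.0.0.1 9000
```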
-- Hive failed to start because hive.exec.local.scratchdir and hive.downloaded.resources.dir were not set in hive-site.xml; see Step 11
Logging initialized using configuration in file:/home/hadoop/bigdata/hive/conf/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:563)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 8 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
... 11 more
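This error comes from `${system:java.io.tmpdir}`-style placeholders left unresolved when hive-site.xml is copied from the template. Besides setting the properties explicitly as in Step 11, a common workaround is to substitute a concrete local path for the placeholders; the sketch below is one way to do that (the helper name and paths are examples):

```shell
#!/usr/bin/env bash
# Replace the template's unresolved ${system:...} placeholders with concrete values.
fix_placeholders() {
  local file="$1" tmpdir="$2"
  sed -i \
    -e "s|\${system:java.io.tmpdir}|$tmpdir|g" \
    -e "s|\${system:user.name}|$(whoami)|g" \
    "$file"
}

# Example usage (back up hive-site.xml first):
#   fix_placeholders /home/hadoop/bigdata/hive/conf/hive-site.xml /home/hadoop/bigdata/hive/tmp
```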
5. Summary
Hive's installation is tightly coupled to its third-party dependencies, MySQL above all; without a clear picture of MySQL's grant mechanism you can waste a lot of time. First unpack, install, and configure Hive on the master node, and do not forget to download the MySQL JDBC driver and drop it into Hive's lib directory. Before starting any Hive service, check that the MySQL grants are complete: test from the client that you can log in by host name and open the hivemeta database. Only then start the metastore service on master. Avoid the background start method at first, so that any exception is printed straight to the terminal where it is easy to diagnose. Every step of the install rewards a clear understanding of its principles and details; otherwise the pitfalls are everywhere.