Using Hadoop together with Hive can greatly speed up big-data processing. Our production servers generate a large volume of logs every day, and the engineers need to analyze these logs to extract user telemetry and other data. Doing the analysis with traditional scripts took far too long, so we built a Hadoop cluster (10 machines) with Hive on top, plus a few crontab scripts to drive it. Now the previous day's application data is ready every morning when the team arrives, with no manual intervention. The full cluster setup process follows:

Preface:

To keep this post short, only three machines are used here.

OS: CentOS


Setup process:

1. Preparation: three machines in total

10.1.0.31        namenode (master)
10.1.0.32        datanode   (slave)
10.1.0.33        datanode   (slave)


2. Edit hosts and hostname

On every machine (10.1.0.31 / node0 shown as the example):
sudo vim /etc/hosts
Remove the existing entries and add:
127.0.0.1 localhost
10.1.0.31 node0
10.1.0.32 node1
10.1.0.33 node2
sudo vim /etc/hostname
Change the contents to:
node0
sudo reboot


3. Create the hadoop user

On every machine:
sudo useradd -m -s /bin/bash hadoop


4. Set up passwordless SSH between the machines

On the namenode:
ssh-keygen -t rsa
Then copy the public key to each datanode.
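The key generation and distribution can be sketched as follows; node1/node2 are the datanode hostnames from /etc/hosts above, and this assumes the hadoop user already exists on every machine:

```shell
# Run on the namenode as the hadoop user.
ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"   # empty passphrase, so cron jobs can log in unattended

# Append the public key to ~/.ssh/authorized_keys on each datanode
# (you will be prompted for the hadoop password once per host).
for host in node1 node2; do
    ssh-copy-id "hadoop@$host"
done

ssh node1 hostname   # should log in and print "node1" without asking for a password
```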


5. Install the JDK and configure the Java environment

On every machine (the namenode shown as the example):
chmod 755 jdk-6u37-linux-x64.bin
./jdk-6u37-linux-x64.bin
ln -s jdk1.6.0_37/ java
vim .bashrc
Add:
export JAVA_HOME=/home/hadoop/java
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
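After editing .bashrc, a quick sanity check confirms the new environment is picked up (paths assume the layout above):

```shell
source ~/.bashrc
echo "$JAVA_HOME"   # expect /home/hadoop/java
which java          # expect /home/hadoop/java/bin/java
java -version       # expect java version "1.6.0_37"
```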


6. Install Hadoop

On every machine (the namenode shown as the example):
Copy the Hadoop tarball to /home/hadoop/, then:
tar -xvf hadoop-1.0.4.tar.gz
ln -s hadoop-1.0.4/ hadoop

   The Hadoop configuration files follow:

The configuration is identical on all machines:
cd hadoop/conf
---------------
vim core-site.xml
         <property>
                 <name>fs.default.name</name>
                 <value>hdfs://10.1.0.31/</value>
         </property>
         <property>
                 <name>hadoop.tmp.dir</name>
                 <value>/home/hadoop/tmp</value> <!-- point tmp at a persistent directory so data is not lost on reboot -->
         </property>
----------------
vim hadoop-env.sh
Add:
export JAVA_HOME=/home/hadoop/java
----------------
vim hdfs-site.xml
<property>
  <name>dfs.replication</name> <!-- number of replicas; must not exceed the number of datanodes -->
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>1024</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/data/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/data/dfs/data</value>
</property>
------------------
vim mapred-site.xml
<property>
          <name>mapred.job.tracker</name>
          <value>10.1.0.31:9001</value>
</property>
------------------
vim masters
10.1.0.31
------------------
vim slaves
10.1.0.32
10.1.0.33
------------------
Create tmp on the namenode:
cd ~ && mkdir tmp
Copy the java, hadoop, and tmp directories to each datanode.
------------------
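The copy step can be done with scp in one loop; this sketch assumes passwordless SSH from step 4 is working and that node1/node2 are the datanodes:

```shell
# Run on the namenode as the hadoop user, from /home/hadoop.
for host in node1 node2; do
    scp -r jdk1.6.0_37 hadoop-1.0.4 tmp "hadoop@$host:/home/hadoop/"
    # scp -r copies the symlink targets, so recreate the java/hadoop links remotely
    ssh "$host" 'cd /home/hadoop && ln -sf jdk1.6.0_37 java && ln -sf hadoop-1.0.4 hadoop'
done
```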


7. Start Hadoop

On the namenode:
./hadoop/bin/hadoop namenode -format
Check that the format succeeded, then run:
./hadoop/bin/start-all.sh
Watch the logs for errors while Hadoop starts:
On the namenode:
tail -f hadoop/logs/hadoop-hadoop-jobtracker-node0.log
On a datanode:
tail -f hadoop/logs/hadoop-hadoop-datanode-node1.log
If there are no exceptions, Hadoop started successfully.
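Besides tailing the logs, the running Java daemons and the HDFS report give a quick health check (jps ships with the JDK):

```shell
# On the namenode: expect NameNode, SecondaryNameNode and JobTracker
jps
jps | grep -c -E 'NameNode|SecondaryNameNode|JobTracker'   # 3 when all are up

# On each datanode: expect DataNode and TaskTracker
jps

# Cluster-wide view: number of live datanodes and remaining capacity
./hadoop/bin/hadoop dfsadmin -report
```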


8. Install Hive (on the namenode)

Copy the Hive tarball to /home/hadoop, then:
tar -xvf hive-0.10.0-bin.tar.gz
ln -s hive-0.10.0-bin/ hive

9. Configure Hive

cd hive/conf
vim hive-site.xml
<!-- Hive Execution Parameters -->
<configuration>
<property>
  <name>hive.cli.print.header</name> <!-- print column headers in query output -->
  <value>true</value>
  <description>Whether to print the names of the columns in query output.</description>
</property>
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
  <description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://111.111.111.111:3306/hivedb?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=latin1</value> <!-- remote metastore database; username and password are set below -->
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.stats.dbconnectionstring</name>
  <value>jdbc:mysql://111.111.111.111:3306/hive_stats?useUnicode=true&amp;characterEncoding=utf8&amp;user=hive&amp;password=hivepass&amp;createDatabaseIfNotExist=true</value>
  <description>The default connection string for the database that stores temporary hive statistics.</description>
</property>
<property>
  <name>hive.stats.dbclass</name>
  <value>jdbc:mysql</value>
  <description>The default database that stores temporary hive statistics.</description>
</property>
<property>
  <name>hive.stats.jdbcdriver</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>The JDBC driver for the database that stores temporary hive statistics.</description>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://127.0.0.1:9083</value>
</property>
</configuration>
vim hive-env.sh
Add:
export HADOOP_HOME=$HOME/hadoop

10. Start Hive

Start the Hive server and the metastore:
~/hive/bin/hive --service hiveserver &> hiveserver.log &
~/hive/bin/hive --service metastore &> metastore.log &
Check with netstat -tulnp that both are listening:
port 10000 (hiveserver)
port 9083 (metastore)
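A quick way to confirm both daemons are listening, plus a smoke test that round-trips through the thrift metastore (this assumes the MySQL JDBC driver JAR has already been placed in hive/lib, which the MySQL-backed metastore needs):

```shell
netstat -tulnp | grep -E ':(10000|9083)\b'   # hiveserver on 10000, metastore on 9083

# Smoke test: list tables via the metastore
~/hive/bin/hive -e 'show tables;'
```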


This completes the Hadoop + Hive platform setup.
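The nightly automation mentioned at the top can be sketched as a crontab entry; the script name and paths here are hypothetical placeholders for your own analysis scripts:

```shell
# crontab -e for the hadoop user: at 02:00 load yesterday's logs into HDFS
# and run the Hive queries; daily_report.sh is a hypothetical wrapper script.
0 2 * * * /home/hadoop/bin/daily_report.sh >> /home/hadoop/logs/daily_report.log 2>&1
```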