Hadoop and Hive together can dramatically speed up large-scale data processing. Our production servers generate a large volume of logs every day, and engineers need to analyze them to extract user telemetry and other data; doing that with traditional scripts takes far too long. So we built a Hadoop cluster (10 machines), paired it with Hive, and added a few crontab scripts on top. Now the previous day's application data is ready every morning when the team arrives, with no manual intervention. The full cluster setup process is below.
A note before we start:
To keep this writeup short, only three machines are used here.
OS: CentOS
Setup process:
1. Preparation: three machines in total

10.1.0.31  namenode (master)
10.1.0.32  datanode (slave)
10.1.0.33  datanode (slave)

2. Edit hosts and hostname
On all machines (10.1.0.31 / node0 shown as the example):

sudo vim /etc/hosts

Remove the existing entries and add:

127.0.0.1 localhost
10.1.0.31 node0
10.1.0.32 node1
10.1.0.33 node2

sudo vim /etc/hostname

Change the contents to:

node0

Then reboot:

sudo reboot
3. Create the hadoop user

On all machines:

sudo useradd -m -s /bin/bash hadoop
4. Set up passwordless SSH between the machines

On the namenode:

ssh-keygen -t rsa

Then copy the public key to every datanode, as shown below.
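For example, with ssh-copy-id (part of the openssh-clients package on CentOS):

# push the public key to every node, including the namenode itself,
# since start-all.sh also logs in to the local machine over SSH
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@node0
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@node1
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@node2
# verify: these should log in without a password prompt
ssh hadoop@node1 hostname
ssh hadoop@node2 hostname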
5. Install the JDK and configure the Java environment

On all machines (the namenode shown as the example):

chmod 755 jdk-6u37-linux-x64.bin
./jdk-6u37-linux-x64.bin
ln -s jdk1.6.0_37/ java
vim .bashrc

Add:

export JAVA_HOME=/home/hadoop/java
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
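A quick check that the environment took effect:

source ~/.bashrc
java -version      # should report 1.6.0_37
echo $JAVA_HOME    # should print /home/hadoop/java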
6. Install Hadoop

On all machines (the namenode shown as the example), copy the Hadoop tarball to /home/hadoop/ and run:

tar -xvf hadoop-1.0.4.tar.gz
ln -s hadoop-1.0.4/ hadoop
The Hadoop configuration files are below. The configuration is identical on all machines; each <property> block goes inside the <configuration> element of its file.

cd hadoop/conf
---------------
vim core-site.xml

<property>
  <name>fs.default.name</name>
  <value>hdfs://10.1.0.31/</value>
</property>
<property>
  <!-- point the temp dir at a persistent location so data survives a reboot -->
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
</property>
----------------
vim hadoop-env.sh

Add:

export JAVA_HOME=/home/hadoop/java
----------------
vim hdfs-site.xml

<property>
  <!-- replication factor; must not exceed the number of datanodes -->
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications
  can be specified when the file is created. The default is used if
  replication is not specified at create time.</description>
</property>
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>1024</value>
  <description>Reserved space in bytes per volume. Always leave this much
  space free for non-DFS use.</description>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/data/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/data/dfs/data</value>
</property>
------------------
vim mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>10.1.0.31:9001</value>
</property>
------------------
vim masters

10.1.0.31
------------------
vim slaves

10.1.0.32
10.1.0.33
------------------

Create the tmp directory on the namenode:

cd ~ && mkdir tmp

Then copy java, hadoop, and tmp to each datanode, as sketched below.
------------------
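A minimal sketch of that copy step, assuming the passwordless SSH from step 4 is in place:

for host in node1 node2; do
    # copy the real directories, then recreate the symlinks remotely
    scp -r ~/jdk1.6.0_37 ~/hadoop-1.0.4 ~/tmp hadoop@$host:/home/hadoop/
    ssh hadoop@$host 'ln -s jdk1.6.0_37/ java; ln -s hadoop-1.0.4/ hadoop'
done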
7. Start Hadoop

On the namenode:

./hadoop/bin/hadoop namenode -format

Check that the format succeeded, then start Hadoop:

./hadoop/bin/start-all.sh

Watch for errors as it comes up. On the namenode:

tail -f hadoop/logs/hadoop-hadoop-jobtracker-node0.log

On a datanode:

tail -f hadoop/logs/hadoop-hadoop-datanode-node1.log

If neither log shows exceptions, Hadoop started successfully.
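Beyond tailing the logs, jps (shipped with the JDK) lists the running daemons, and a trivial HDFS operation confirms the filesystem works:

# on the namenode: expect NameNode, SecondaryNameNode, and JobTracker
jps
# on a datanode: expect DataNode and TaskTracker
ssh node1 /home/hadoop/java/bin/jps
# smoke test: create and list an HDFS directory
./hadoop/bin/hadoop fs -mkdir /smoke
./hadoop/bin/hadoop fs -ls /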
8. Install Hive (on the namenode)

Copy the Hive tarball to /home/hadoop and run:

tar -xvf hive-0.10.0-bin.tar.gz
ln -s hive-0.10.0-bin/ hive
9. Configure Hive

cd hive/conf
vim hive-site.xml

<configuration>
  <!-- Hive Execution Parameters -->
  <property>
    <!-- print column headers in query results -->
    <name>hive.cli.print.header</name>
    <value>true</value>
    <description>Whether to print the names of the columns in query output.</description>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
    <description>Controls whether to connect to a remote metastore server or open a new metastore server in the Hive client JVM.</description>
  </property>
  <property>
    <!-- remote metastore database; username/password are set below.
         Note that & must be escaped as &amp; inside XML values. -->
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://111.111.111.111:3306/hivedb?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=latin1</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepass</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.stats.dbconnectionstring</name>
    <value>jdbc:mysql://111.111.111.111:3306/hive_stats?useUnicode=true&amp;characterEncoding=latin1&amp;user=hive&amp;password=hivepass&amp;createDatabaseIfNotExist=true</value>
    <description>The default connection string for the database that stores temporary hive statistics.</description>
  </property>
  <property>
    <name>hive.stats.dbclass</name>
    <value>jdbc:mysql</value>
    <description>The default database that stores temporary hive statistics.</description>
  </property>
  <property>
    <name>hive.stats.jdbcdriver</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>The JDBC driver for the database that stores temporary hive statistics.</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://127.0.0.1:9083</value>
  </property>
</configuration>

vim hive-env.sh

Add:

export HADOOP_HOME=$HOME/hadoop
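Two prerequisites are implied by this configuration: Hive needs the MySQL JDBC driver on its classpath, and the hive account must exist on the MySQL server. A minimal sketch of both, assuming a connector jar downloaded separately (the version here is only an example) and root access to MySQL:

# put the MySQL JDBC driver on Hive's classpath (jar version is an example)
cp mysql-connector-java-5.1.22-bin.jar ~/hive/lib/

# on the MySQL host, create the metastore account used in hive-site.xml
mysql -h 111.111.111.111 -u root -p <<'SQL'
CREATE USER 'hive'@'%' IDENTIFIED BY 'hivepass';
GRANT ALL PRIVILEGES ON hivedb.* TO 'hive'@'%';
GRANT ALL PRIVILEGES ON hive_stats.* TO 'hive'@'%';
FLUSH PRIVILEGES;
SQL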
10. Start Hive

Start the Hive server and the metastore:

~/hive/bin/hive --service hiveserver &> hiveserver.log &
~/hive/bin/hive --service metastore &> metastore.log &

Confirm both are listening with netstat -tulnp:

port 10000 (hiveserver)
port 9083 (metastore)
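A quick end-to-end check from the namenode confirms the CLI, the metastore, and HDFS are wired together (the table name here is arbitrary):

~/hive/bin/hive -e 'CREATE TABLE smoke_test (id INT); SHOW TABLES;'
# the table metadata should now appear in the MySQL hivedb database; clean up:
~/hive/bin/hive -e 'DROP TABLE smoke_test;'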
That completes the Hadoop + Hive platform setup.
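Finally, for the daily automation mentioned in the introduction, a single crontab entry on the namenode is enough to drive it. The script path and schedule below are hypothetical placeholders:

# added via crontab -e as the hadoop user: at 02:00 daily, load
# yesterday's logs into Hive and run the report queries
0 2 * * * /home/hadoop/scripts/daily_report.sh >> /home/hadoop/logs/daily_report.log 2>&1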