// Hadoop installation and configuration --- coco


# by coco
# 2014-07-25


This document walks through the Hadoop installation and configuration process. If anything is unclear, contact QQ: 120890945.
The environment consists of three virtual machines:
192.168.8.96   db96
192.168.8.98   db98
192.168.8.99   db99
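The start scripts used later resolve these hostnames and log in to the slaves over SSH, so every node needs the hostname mappings and db96 needs passwordless SSH to all three machines. A minimal sketch, assuming root is used throughout as in the transcripts below:

# on every node: map the hostnames
cat >> /etc/hosts <<EOF
192.168.8.96   db96
192.168.8.98   db98
192.168.8.99   db99
EOF

# on db96: generate a key pair and push it to each node
ssh-keygen -t rsa
ssh-copy-id root@db96
ssh-copy-id root@db98
ssh-copy-id root@db99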


1. Hive is built on top of Hadoop, so we configure the Hadoop environment first.
Configure the JDK environment.


Download the JDK tarball and extract it to the target directory; here it is extracted under /usr/java.
[root@db96 java]# /usr/java/latest/bin/java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
[root@db96 java]# echo 'export JAVA_HOME=/usr/java/latest'>/etc/profile.d/java.sh
[root@db96 java]# echo 'export PATH=$PATH:$JAVA_HOME/bin'>>/etc/profile.d/java.sh


If you do not want the change to affect every user on the system, an alternative is to add the PATH and JAVA_HOME definitions to the user's $HOME/.bashrc file instead:
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$JAVA_HOME/bin
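After editing, run source ~/.bashrc (or open a new shell) for the change to take effect.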


2. Install and configure Hadoop (db96 as master; db98 and db99 as slaves)


[root@db96 ~]# wget http://www.us.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
[root@db96 ~]# tar -zxvf hadoop-2.2.0.tar.gz -C /usr/local/
[root@db96 ~]# cd /usr/local/
[root@db96 local]# ln -s /usr/local/hadoop-2.2.0/ hadoop
[root@db96 local]# echo 'export HADOOP_HOME=/usr/local/hadoop'>/etc/profile.d/hadoop.sh
[root@db96 local]# echo 'export PATH=$PATH:$HADOOP_HOME/bin'>>/etc/profile.d/hadoop.sh
[root@db96 local]# vim /etc/profile   // edit profile and append the following at the end:
export JAVA_HOME=/usr/java/latest
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$PATH 
Save and exit, then run source /etc/profile so the environment variables take effect immediately.
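A quick sanity check that the variables are in place (expected values shown as comments, assuming the paths above):

[root@db96 local]# echo $JAVA_HOME      # /usr/java/latest
[root@db96 local]# echo $HADOOP_HOME    # /usr/local/hadoop
[root@db96 local]# hadoop version       # should report Hadoop 2.2.0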


Before configuring, create the following directories on the host's local filesystem (the configuration files below place them under /data/hadoop):
/data/hadoop/dfs/name
/data/hadoop/dfs/data
/data/hadoop/temp
[root@db96 hadoop]# mkdir -p /data/hadoop/dfs/name
[root@db96 hadoop]# mkdir -p /data/hadoop/dfs/data
[root@db96 hadoop]# mkdir -p /data/hadoop/temp


Seven configuration files are involved, all under /usr/local/hadoop/etc/hadoop/ (the vim commands below are run from that directory):
/usr/local/hadoop/etc/hadoop/hadoop-env.sh
/usr/local/hadoop/etc/hadoop/yarn-env.sh
/usr/local/hadoop/etc/hadoop/slaves
/usr/local/hadoop/etc/hadoop/core-site.xml
/usr/local/hadoop/etc/hadoop/hdfs-site.xml
/usr/local/hadoop/etc/hadoop/mapred-site.xml
/usr/local/hadoop/etc/hadoop/yarn-site.xml


Configuration file 1: hadoop-env.sh
[root@db96 hadoop]# vim hadoop-env.sh   // set the JAVA_HOME path
export JAVA_HOME=/usr/java/latest


Configuration file 2: yarn-env.sh (set the JAVA_HOME value)
[root@db96 hadoop]# vim yarn-env.sh  // set the JAVA_HOME path
export JAVA_HOME=/usr/java/latest


Configuration file 3: slaves (this file lists all slave nodes)
[root@db96 hadoop]# vi slaves     // add the following:
db98
db99


Configuration file 4: core-site.xml (set the following properties)
[root@db96 hadoop]# vim core-site.xml  
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://db96:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/data/hadoop/temp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.groups</name>
        <value>*</value>
    </property>
</configuration>


Configuration file 5: hdfs-site.xml
[root@db96 hadoop]# vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>db96:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/data/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/data/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>


Configuration file 6: mapred-site.xml
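Note: the Hadoop 2.2.0 tarball ships only mapred-site.xml.template; if mapred-site.xml is absent, copy the template first:

[root@db96 hadoop]# cp mapred-site.xml.template mapred-site.xml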
[root@db96 hadoop]# vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>db96:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>db96:19888</value>
    </property>
</configuration>


Configuration file 7: yarn-site.xml
[root@db96 hadoop]# vim yarn-site.xml 
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>db96:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>db96:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>db96:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>db96:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>db96:8088</value>
    </property>
</configuration>


3. Copy to the other nodes (db98, db99).
[root@db96 hadoop]# scp -r /usr/local/hadoop-2.2.0 db98:/usr/local/
[root@db96 hadoop]# scp -r /usr/local/hadoop-2.2.0 db99:/usr/local/
Log in to db98 and db99 and create the symlink on each:
[root@db98 local]# ln -s /usr/local/hadoop-2.2.0/ hadoop
[root@db99 local]# ln -s /usr/local/hadoop-2.2.0/ hadoop
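The scp above copies only the Hadoop tree; the data directories and environment variables must also be set up on the slaves (and the JDK must already be installed under /usr/java there as well). A minimal sketch, run from db96:

[root@db96 ~]# for h in db98 db99; do
    ssh $h 'mkdir -p /data/hadoop/dfs/name /data/hadoop/dfs/data /data/hadoop/temp'
    scp /etc/profile.d/java.sh /etc/profile.d/hadoop.sh $h:/etc/profile.d/
done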


4. Start and verify
4.1 Start Hadoop
Format HDFS:  /usr/local/hadoop/bin/hdfs namenode -format   // "Exiting with status 0" indicates the format succeeded.
[root@db96 hadoop]# /usr/local/hadoop/bin/hdfs namenode -format
...............
14/07/17 00:03:37 INFO namenode.FSImage: Image file /data/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 196 bytes saved in 0 seconds.
14/07/17 00:03:37 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/07/17 00:03:37 INFO util.ExitUtil: Exiting with status 0
14/07/17 00:03:37 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at db96/192.168.8.96
************************************************************/


Start HDFS: /usr/local/hadoop/sbin/start-dfs.sh
[root@db96 hadoop]# /usr/local/hadoop/sbin/start-dfs.sh 
14/07/17 00:10:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [db96]
db96: starting namenode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-root-namenode-db96.out
db99: starting datanode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-root-datanode-db99.out
db98: starting datanode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-root-datanode-db98.out
Starting secondary namenodes [db96]
db96: starting secondarynamenode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-root-secondarynamenode-db96.out
14/07/17 00:10:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


At this point, the processes running on db96 are:
[root@db96 hadoop]# jps
3048 SecondaryNameNode
2864 NameNode
3207 Jps
The processes on db98 and db99 are:
[root@db98 local]# jps
1476 DataNode
1554 Jps


Start YARN:  /usr/local/hadoop/sbin/start-yarn.sh
[root@db96 hadoop]# /usr/local/hadoop/sbin/start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-root-resourcemanager-db96.out
db98: starting nodemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-root-nodemanager-db98.out
db99: starting nodemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-root-nodemanager-db99.out


At this point db96 is running: NameNode, SecondaryNameNode, ResourceManager.
db98 and db99 are running: DataNode, NodeManager.
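mapred-site.xml above points the job history service at db96:10020 and db96:19888, but start-yarn.sh does not launch it. It can be started separately with the bundled script:

[root@db96 hadoop]# /usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver

After that, a JobHistoryServer process should also show up in jps on db96.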


Check the cluster status: /usr/local/hadoop/bin/hdfs dfsadmin -report
[root@db96 local]# /usr/local/hadoop/bin/hdfs dfsadmin -report 
14/07/17 00:45:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 67632857088 (62.99 GB)
Present Capacity: 63816904704 (59.43 GB)
DFS Remaining: 63816855552 (59.43 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0


-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)


Live datanodes:
Name: 192.168.8.99:50010 (db99)
Hostname: db99
Decommission Status : Normal
Configured Capacity: 33816428544 (31.49 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 1907978240 (1.78 GB)
DFS Remaining: 31908425728 (29.72 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.36%
Last contact: Thu Jul 17 00:45:10 CST 2014




Name: 192.168.8.98:50010 (db98)
Hostname: db98
Decommission Status : Normal
Configured Capacity: 33816428544 (31.49 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 1907974144 (1.78 GB)
DFS Remaining: 31908429824 (29.72 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.36%
Last contact: Thu Jul 17 00:45:10 CST 2014


View the file and block composition: /usr/local/hadoop/bin/hdfs fsck / -files -blocks
[root@db96 local]# /usr/local/hadoop/bin/hdfs fsck / -files -blocks    
14/07/17 01:01:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://db96:50070
FSCK started by root (auth:SIMPLE) from /192.168.8.96 for path / at Thu Jul 17 01:01:31 CST 2014
/ <dir>
Status: HEALTHY
 Total size:    0 B
 Total dirs:    1
 Total files:   0
 Total symlinks:                0
 Total blocks (validated):      0
 Minimally replicated blocks:   0
 Over-replicated blocks:        0
 Under-replicated blocks:       0
 Mis-replicated blocks:         0
 Default replication factor:    2
 Average block replication:     0.0
 Corrupt blocks:                0
 Missing replicas:              0
 Number of data-nodes:          2
 Number of racks:               1
FSCK ended at Thu Jul 17 01:01:31 CST 2014 in 2 milliseconds




The filesystem under path '/' is HEALTHY


View HDFS:  http://192.168.8.96:50070
View the RM: http://192.168.8.96:8088


First, create a directory on HDFS:
[root@db96 ~]# /usr/local/hadoop/bin/hdfs dfs -mkdir /input
Test basic usage: upload a file, download a file, and view the uploaded file's contents:


[root@db96 ~]# cat wwn.txt 
# This is a text txt
# by coco
# 2014-07-18
[root@db96 ~]# hdfs dfs -mkdir /test
[root@db96 ~]# hdfs dfs -put wwn.txt /test
[root@db96 ~]# hdfs dfs -cat /test/wwn.txt
[root@db96 ~]# hdfs dfs -get /test/wwn.txt /tmp
[root@db96 hadoop]# hdfs dfs -rm /test/wwn.txt
[root@db96 tmp]# ll
total 6924
-rw-r--r-- 1 root root      70 Jul 18 11:50 wwn.txt
[root@db96 ~]# hadoop dfs -ls /test           
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


Found 2 items
-rw-r--r--   2 root supergroup    6970105 2014-07-18 11:44 /test/gc_comweight.txt
-rw-r--r--   2 root supergroup         59 2014-07-18 14:56 /test/hello.txt
At this point our HDFS filesystem is working normally.
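To exercise YARN and MapReduce end to end, the example jar bundled with the 2.2.0 tarball can be run against the /test directory (a quick smoke test; /test-out is an arbitrary output path and must not already exist):

[root@db96 ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /test /test-out
[root@db96 ~]# hdfs dfs -cat /test-out/part-r-00000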