I referenced many posts online and, combined with my own practice, recorded the steps one by one; I hope it helps!

Hadoop cluster installation notes

The hostnames and IPs of the 3 machines are as follows:
andy1    192.168.224.144   master   namenode
andy2    192.168.224.145   slave    datanode
andy3    192.168.224.143   slave    datanode
 
 
Edit the hosts file: [the goal is to use hostnames instead of IP addresses; if every command uses IP addresses, this step is not needed]
/etc/hosts on the namenode (andy1):
 
192.168.224.144 andy1 # Added by NetworkManager
127.0.0.1 localhost.localdomain 
::1 andy1 localhost6.localdomain6 localhost6
127.0.1.1 ubuntu
 
192.168.224.145 andy2  //
192.168.224.143 andy3  // these two lines are newly added; the point of editing this file is to map IP addresses to hostnames
 
[
192.168.224.144 andy1
192.168.224.145 andy2
192.168.224.143 andy3 
 
]
 
On andy2 and andy3, likewise add the IP-to-hostname entries for the other two hosts.
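A quick sanity check (my own addition, assuming the hosts entries above are in place on all three machines): each host should now be reachable by name, e.g. from andy1:
$ ping -c 1 andy2
$ ping -c 1 andy3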
 
 
 
 
I. Install SSH
1   $ sudo apt-get install openssh-server
   Then confirm that the SSH server is running:
    $ ps -e | grep ssh
   If you only see ssh-agent, the server has not started; in that case run
   $ /etc/init.d/ssh start
   Seeing sshd in the list means the server is running.
 
 
 
2  First log in to every machine (including the namenode) as user andy, create a .ssh directory under /home/andy/, and set its permissions to drwxr-xr-x with: chmod 755 .ssh
 
The commands:  sudo mkdir /home/andy/.ssh
               sudo chmod 755 .ssh   (sudo chmod 755 /home/andy/.ssh)
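One caveat (my own note, not in the original steps): because the directory was created with sudo, .ssh may end up owned by root, and sshd will then ignore the keys inside it; if that happens, hand it back to andy:
$ sudo chown andy:andy /home/andy/.ssh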
 
On the namenode run:  ssh-keygen -t rsa
 
The output looks like:
Generating public/private rsa key pair.
Enter file in which to save the key (/home/andy/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/andy/.ssh/id_rsa.
Your public key has been saved in /home/andy/.ssh/id_rsa.pub.
The key fingerprint is:
7a:a3:d2:43:d9:07:9c:93:d0:14:56:24:f9:e4:87:b7 andy@andy1
The key's randomart image is:
+--[ RSA 2048]----+
|       o==o      |
|      ..o..      |
|       o * .     |
|        * + o    |
|       oSo o .   |
|      o.. . E    |
|     o. o.       |
|    . oo .       |
|     ...         |
+-----------------+
 
 
Just press Enter at the three prompts (key file location and the two passphrase prompts).
 
[Could we consider generating a public/private key pair on every machine?]
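A sketch of that idea (my own addition, not something I actually did here): each machine could run ssh-keygen itself and append its own public key to the authorized_keys on the namenode, e.g. on andy2:
$ ssh-keygen -t rsa
$ cat /home/andy/.ssh/id_rsa.pub | ssh andy1 'cat >> /home/andy/.ssh/authorized_keys'
and then redistribute authorized_keys as in step 3 below.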
 
 
3
Then copy the contents of id_rsa.pub to every machine (including this one).
On the namenode run:
$ cp /home/andy/.ssh/id_rsa.pub  /home/andy/.ssh/authorized_keys 
$ scp /home/andy/.ssh/authorized_keys  andy2:/home/andy/.ssh/
$ scp /home/andy/.ssh/authorized_keys  andy3:/home/andy/.ssh/
 
[scp /home/andy/.ssh/authorized_keys  192.168.224.145:/home/andy/.ssh/]
 
Log in to each machine as andy and set the permissions on /home/andy/.ssh/authorized_keys.
The commands:
$ cd /home/andy/.ssh
$ chmod 644 authorized_keys
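To confirm passwordless login actually works (my own check, not in the original notes), each of these should print the remote hostname without asking for a password:
$ ssh andy2 hostname
$ ssh andy3 hostname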
 
 
 
II. Install the JDK. After downloading,
copy the file into /usr/java, where the java directory was created earlier with mkdir /usr/java.
 
Run as root.
[For reasons I don't understand, the following commands only worked after first switching into /usr/java; without that, the installed JDK files were nowhere to be seen (no idea where they went).]
sudo chmod u+x /usr/java/jdk-6u25-linux-i586.bin
sudo /usr/java/jdk-6u25-linux-i586.bin
 
/etc/profile  (configure the JDK environment variables): append the following to the end of the file
 
export JAVA_HOME=/usr/java/jdk1.6.0_25
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
 
source /etc/profile
This step matters; without it, running java shows:
The program 'java' can be found in the following packages:
 * gcj-4.4-jre-headless
 * gcj-4.5-jre-headless
 * openjdk-6-jre-headless
Try: apt-get install <selected package>
 
Run java -version to test whether the JDK installed successfully!
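Two more quick checks (my own addition): the shell should now resolve java to the Sun JDK rather than a distro package:
$ which java        # expected to point somewhere under /usr/java/jdk1.6.0_25/bin
$ echo $JAVA_HOME   # expected to print /usr/java/jdk1.6.0_25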
 
Install the JDK on andy2 and andy3 the same way. The reason it cannot simply be copied over is probably permission restrictions under /usr.
 
 
III. Install Hadoop; install it under /home/andy.
 
Copy hadoop-0.20.2.tar.gz into the andy home directory.
Run: sudo tar zxvf hadoop-0.20.2.tar.gz     // this unpacks hadoop
Then run: sudo mv hadoop-0.20.2 hadoop      // this simply renames the extracted directory to hadoop
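A note from hindsight (my own addition): since the archive was unpacked with sudo, the hadoop directory may end up owned by root, which can break the later mkdir and log writing done as user andy; if so:
$ sudo chown -R andy:andy /home/andy/hadoop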
 
 
Edit /home/andy/hadoop/conf/hadoop-env.sh on the master, i.e. andy1, to contain:
 
export JAVA_HOME=/usr/java/jdk1.6.0_25
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH 
export HADOOP_HEAPSIZE=200  
export HADOOP_HOME=/home/andy/hadoop   
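One thing these notes don't show, but which the start-all.sh output further down implies: conf/masters and conf/slaves on andy1 tell the scripts which hosts run the secondary namenode and the datanodes/tasktrackers. Presumably they look like this:
conf/masters:
    andy1
conf/slaves:
    andy2
    andy3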
 
 
On every machine, run bin/hadoop under the hadoop directory to check it.
The output:
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME <src>* <dest> create a hadoop archive
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
 
 
 
 
Switch to the hadoop root directory: cd /home/andy/hadoop
mkdir tmp
mkdir hdfs
mkdir hdfs/name   (should NOT be created, otherwise the later hadoop format will fail!)
mkdir hdfs/data
 
Switch to the conf directory:
Modify the following three files as shown:
core-site.xml
 
 
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://andy1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/andy/hadoop/tmp</value>
    </property>
</configuration>
 
hdfs-site.xml
 
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/andy/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/andy/hadoop/hdfs/data</value>
    </property>
</configuration>
 
 
mapred-site.xml
 
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>andy1:9001</value>
    </property>
</configuration>
For the other nodes, just copy it over with
scp -r /home/andy/hadoop andy2:/home/andy/hadoop
scp -r /home/andy/hadoop andy3:/home/andy/hadoop
and that's all that's needed.
 
Since all of hadoop's configuration lives inside the hadoop directory, once it has been copied over everything is in place.
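A quick check (my own addition): confirm the configuration really arrived on the slaves, e.g.:
$ ssh andy2 ls /home/andy/hadoop/conf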
 
A quick test:
On andy1 (the master), run bin/start-all.sh
The result:
starting namenode, logging to /home/andy/hadoop/logs/hadoop-andy-namenode-andy1.out
andy2: starting datanode, logging to /home/andy/hadoop/logs/hadoop-andy-datanode-andy2.out
andy3: starting datanode, logging to /home/andy/hadoop/logs/hadoop-andy-datanode-andy3.out
andy1: starting secondarynamenode, logging to /home/andy/hadoop/logs/hadoop-andy-secondarynamenode-andy1.out
starting jobtracker, logging to /home/andy/hadoop/logs/hadoop-andy-jobtracker-andy1.out
andy2: starting tasktracker, logging to /home/andy/hadoop/logs/hadoop-andy-tasktracker-andy2.out
andy3: starting tasktracker, logging to /home/andy/hadoop/logs/hadoop-andy-tasktracker-andy3.out
 
 
Testing
Format the distributed filesystem:
1  bin/hadoop namenode -format
2  Start hadoop:
   bin/start-all.sh
3  Check the started daemons with jps
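With this layout (matching the start-all.sh output above), jps should roughly show:
on andy1:          NameNode, SecondaryNameNode, JobTracker  (plus Jps)
on andy2 / andy3:  DataNode, TaskTracker                    (plus Jps)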
 
Create an input directory in HDFS and copy all the files under conf into it.
The commands:
 bin/hadoop fs -mkdir input
 bin/hadoop fs -copyFromLocal conf/* input
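The notes end here; a typical next step (my own sketch, not part of the original notes, assuming the examples jar that ships with the 0.20.2 release) would be to run the bundled wordcount job on that input and look at the result:
 bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
 bin/hadoop fs -cat output/*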