When I started the Hadoop cluster I had set up, the slave node could never connect to the master, and every attempt to use the cluster failed with the error org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-master/192.168.1.130:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS). In other words, the cluster had never actually been set up successfully. I chased down many possible causes and read a lot of material, and in the end the problem turned out to be the hosts configuration on the master and slave machines, i.e. the master's hosts file was wrong (a sketch of the /etc/hosts entries both nodes should contain follows the environment listing below). This post summarizes the pitfalls I hit while solving the problem over the last few days, in the hope that it helps others who are learning Hadoop.

The environment of the Hadoop cluster is as follows:
    Master node:
        OS: CentOS Linux release 7.3.1611 (Core)
        Hostname: hadoop-master
        IP: 192.168.1.130
        hadoop: hadoop2.8.4
        java: 1.7.0
        ssh: OpenSSH_7.4p1, OpenSSL 1.0.2k-fips
    Slave node:
        OS: Ubuntu 15.04
        Hostname: hadoop-slave1
        IP: 192.168.1.128
        hadoop: hadoop2.8.4
        java: 1.7.0
        ssh: OpenSSH_6.7p1 Ubuntu-5ubuntu1, OpenSSL 1.0.1f
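
Since the error above shows the client retrying hadoop-master/192.168.1.130:9000, both machines must resolve each other's hostname to the correct LAN address. A minimal sketch of the /etc/hosts entries that should be present on both nodes, using the hostnames and IPs listed above:

192.168.1.130   hadoop-master
192.168.1.128   hadoop-slave1

A common pitfall is the hostname also being mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts, which can make the NameNode bind only to the loopback interface; such an entry should be removed.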

1. Master and slave machines cannot communicate

       Communication between the master and the slaves is something a Hadoop cluster cannot work without, and the main thing to get right is the ssh configuration: the nodes have to authenticate each other with public keys (publickey authentication) so that no password needs to be typed. If passwordless public key login is not set up between the nodes, every start of the Hadoop cluster prompts for the slave node's password, which is tedious, so the master and slaves should be configured to authenticate with public keys. One way to set this up on both sides is given below.
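
For context on why this matters: in Hadoop 2.x the start scripts (start-dfs.sh, start-yarn.sh) log into every worker host over ssh, and the worker hosts are listed in the slaves file in the Hadoop configuration directory. With the cluster above, that file on the master would simply contain the slave's hostname (a sketch; $HADOOP_HOME here stands for the Hadoop installation directory):

bash-4.2$ cat $HADOOP_HOME/etc/hadoop/slaves
hadoop-slave1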

1.1 Setting up passwordless public key login between master and slave

       First, generate an SSH key pair:

bash-4.2$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:vjntK50PYjcoDpyIXY3N+SaSxAGrN3kz766xVpcJJtk hadoop@hadoop-master
The key's randomart image is:
+---[RSA 2048]----+
|  .              |
|   o             |
|  . .o           |
| . oo*E.         |
|. + Oo=.So       |
| + B *.o+.       |
|. o B.+.Bo+.     |
|    .B =o=+o     |
|   .oo+ o+oo.    |
+----[SHA256]-----+

      Then copy the public key to the slave node. The first time, you still have to enter the slave account's password; after that, logins are passwordless:

bash-4.2$ ssh-copy-id -i id_rsa.pub hadoop@hadoop-slave1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@hadoop-slave1's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop@hadoop-slave1'"
and check to make sure that only the key(s) you wanted were added.
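
If ssh-copy-id is not available, the same result can be achieved by appending the public key to the remote authorized_keys by hand, for example (a sketch, run from the master as the hadoop user):

bash-4.2$ cat ~/.ssh/id_rsa.pub | ssh hadoop@hadoop-slave1 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'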

     Next, append the public key to the local machine's own authorized keys as well, so that the machine can ssh to itself without a password. After doing this, be sure to set the permissions of authorized_keys:

bash-4.2$ cat id_rsa.pub >> authorized_keys 
bash-4.2$ chmod 644 authorized_keys
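
Note that sshd (with the default StrictModes yes) also refuses public key login if the home directory or ~/.ssh is writable by group or others, so it is worth checking those permissions too; a typical setup (600 instead of 644 for authorized_keys also works, as long as nobody else can write to the file):

bash-4.2$ chmod 700 ~/.ssh
bash-4.2$ chmod 644 ~/.ssh/authorized_keys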

     Finally, verify that the key works; if it does, logging in no longer asks for a password:

bash-4.2$ ssh hadoop@hadoop-slave1
Welcome to Ubuntu 15.04 (GNU/Linux 3.19.0-15-generic x86_64)

 * Documentation:  https://help.ubuntu.com/
Your Ubuntu release is not supported anymore.
For upgrade information, please visit:
http://www.ubuntu.com/releaseendoflife

New release '15.10' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Mon Dec  3 12:39:40 2018 from localhost
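
To confirm non-interactively that no password prompt is involved (handy before starting the cluster from a script), ssh's batch mode can be used; it fails with Permission denied instead of prompting:

bash-4.2$ ssh -o BatchMode=yes hadoop@hadoop-slave1 hostname

If the key is set up correctly, this simply prints the slave's hostname.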

    Quite a few things can still go wrong during setup and leave public key authentication not working; here are a few fixes for the problems I ran into.

1.2 Common public key authentication problems

1.2.1 Public key authentication not enabled in ssh_config or sshd_config

    Public key authentication needs to be enabled for ssh:

bash-4.2$ vim /etc/ssh/ssh_config
Host *
#       GSSAPIAuthentication no
#       GSSAPIAuthentication yes
# If this option is set to yes then remote X11 clients will have full access
# to the original X11 display. As virtually no X11 client supports the untrusted
# mode correctly we set this to yes.
#       ForwardX11Trusted yes
# Send locale-related environment variables
#       SendEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
#       SendEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
#       SendEnv LC_IDENTIFICATION LC_ALL LANGUAGE
#       SendEnv XMODIFIERS
    SendEnv LANG LC_*
    HashKnownHosts yes
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials no

       In the OpenSSH shipped with CentOS 7 and later, ssh_config does not enable GSSAPIAuthentication by default, so it has to be turned on as above, whereas on Ubuntu it is enabled by default.

bash-4.2$ sudo vim /etc/ssh/sshd_config
PubkeyAuthentication yes
AllowUsers hadoop root
# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys
AuthorizedKeysFile      .ssh/authorized_keys

       PubkeyAuthentication must be enabled, and if AllowUsers is set it must include the user that will log in. Finally, AuthorizedKeysFile sets the path of the file holding authorized public keys; the value above is the default location, so only change it if the keys are stored somewhere else.
After editing, remember to restart the ssh service so the configuration takes effect. Restarting on CentOS:

[root@hadoop-master .ssh]$ systemctl restart sshd
bash-4.2$ service sshd restart

Restarting (or reloading the configuration) on Ubuntu:
hadoop@hadoop-slave1:/Library/hadoop/hadoop284/logs$ /etc/init.d/ssh restart
hadoop@hadoop-slave1:/root$ /etc/init.d/ssh reload
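
Before restarting, the edited sshd_config can be checked for syntax errors with sshd's test mode (the path to the sshd binary may differ between distributions):

bash-4.2$ sudo /usr/sbin/sshd -t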

1.2.2 Permission problems with authorized_keys

       If ssh login still asks for a password after the settings above, the problem is probably with the permissions or the SELinux security context of .ssh and authorized_keys; check them as follows.

[root@hadoop-master .ssh]$ ls -lsZ /home/hadoop/.ssh
total 16
-rw-r--r--. hadoop hadoop unconfined_u:object_r:ssh_home_t:s0 authorized_keys
-rw-------. hadoop hadoop unconfined_u:object_r:ssh_home_t:s0 id_rsa
-rw-r--r--. hadoop hadoop unconfined_u:object_r:ssh_home_t:s0 id_rsa.pub
-rw-r--r--. hadoop hadoop unconfined_u:object_r:ssh_home_t:s0 known_hosts
[root@hadoop-master .ssh]$ ls -lsZ /home/hadoop/.ssh/authorized_keys
-rwx------. hadoop hadoop unconfined_u:object_r:var_t:s0 /home/hadoop/.ssh/authorized_keys

    In the listings above, the SELinux type of authorized_keys appears as var_t instead of the expected ssh_home_t carried by the other files in .ssh; sshd will not read a key file with the wrong context, so it needs to be fixed:

[root@hadoop-master .ssh]$ semanage fcontext -a -t ssh_home_t /home/hadoop/.ssh/authorized_keys
[root@hadoop-master .ssh]$ restorecon -r -vv /home/hadoop/.ssh
[root@hadoop-master .ssh]$ ls -laZ /home/hadoop/.ssh/authorized_keys
-rwx------. hadoop hadoop unconfined_u:object_r:ssh_home_t:s0 /home/hadoop/.ssh/authorized_keys

   After that, ssh login works normally and the problem is completely solved!
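
If a key is still rejected after all of the above, the quickest way to see why is verbose output on the client together with the ssh daemon's log on the server (/var/log/secure on CentOS, /var/log/auth.log on Ubuntu); the verbose output shows which keys are offered and why they are refused:

bash-4.2$ ssh -vvv hadoop@hadoop-slave1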