Problem 1]

# jps
 27851 Jps
 18198 -- process information unavailable
# kill -9 18198


-bash: kill: (18198) - No such process

The "process information unavailable" entry cannot be killed, because the process no longer exists.

Solution: go into /tmp on the Linux machine (cd /tmp) and delete the directory named hsperfdata_{username}. Run jps again and the stale entry is gone.
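The cleanup can be scripted. A minimal sketch (`cleanup_hsperfdata` is a hypothetical helper; it assumes, as in the log above, that the stale PID's process is already dead, and only then wipes the hsperfdata directory jps reads from):

```shell
# jps reads per-JVM stats files from /tmp/hsperfdata_<username>;
# a stale file left behind by a dead JVM makes jps print
# "process information unavailable" for a PID that no longer exists.
cleanup_hsperfdata() {
  pid="$1"
  dir="/tmp/hsperfdata_$(whoami)"
  # only wipe the directory if no live process owns that PID
  if ! ps -p "$pid" > /dev/null 2>&1; then
    rm -rf "$dir"
  fi
}
```

Run it with the stale PID (18198 in the log above); afterwards jps should list only live JVMs.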

Problem 2]

The Hadoop datanode and resourcemanager fail to start (or shut down automatically right after starting), and the logs report UnknownHostException.

Datanode log:

2014-09-22 14:03:02,935 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = java.net.UnknownHostException: master: master

Solution: my /etc/hosts file was missing a mapping between the hostname (master) and its IP. Edit it (vi /etc/hosts) and add a line (IP first, then hostname):

192.168.80.4    master

Another possible cause (I ran into so many problems that I no longer remember whether this was the fix for this exact one) is a change to core-site.xml; in that case, create the hadoop_tmp directory yourself beforehand.
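The hosts edit can be made idempotent so re-running it never duplicates the entry. A sketch using the IP and hostname from this note (`add_host_entry` is a hypothetical helper):

```shell
# append "IP hostname" to a hosts file only if the name isn't mapped yet
add_host_entry() {
  ip="$1"; name="$2"; hosts_file="${3:-/etc/hosts}"
  grep -qw "$name" "$hosts_file" || echo "$ip    $name" >> "$hosts_file"
}

# usage on the cluster (values from the note above):
#   add_host_entry 192.168.80.4 master
```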



Problem 3] WARN net.DNS: Unable to determine address of the host-falling back to "localhost" address java.net.UnknownHostException: slave1: slave1


Solution: this error is a hosts-file problem. Having localhost in /etc/hosts is not enough; the current hostname (slave1) must be added as well. Test with hostname -f: if it returns the current hostname, everything is fine.


Problem 4]


Exception:


2014-03-13 11:26:30,788 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1257313099-10.10.208.38-1394679083528 (storage id DS-743638901-127.0.0.1-50010-1394616048958) service to Linux-hadoop-38/10.10.208.38:9000
java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop/tmp/dfs/data: namenode clusterID = CID-8e201022-6faa-440a-b61c-290e4ccfb006; datanode clusterID = clustername
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:916)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:887)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:309)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
    at java.lang.Thread.run(Thread.java:662)


Solution: 1. dfs.namenode.name.dir is configured in hdfs-site.xml. On the master, that directory contains a current folder with a VERSION file whose content is:

#Thu Mar 13 10:51:23 CST 2014
namespaceID=1615021223
clusterID=CID-8e201022-6faa-440a-b61c-290e4ccfb006
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1257313099-10.10.208.38-1394679083528
layoutVersion=-40
2. hadoop.tmp.dir is configured in core-site.xml. On the slave, that directory contains a dfs/data/current folder, which also holds a VERSION file:
#Wed Mar 12 17:23:04 CST 2014
storageID=DS-414973036-10.10.208.54-50010-1394616184818
clusterID=clustername
cTime=0
storageType=DATA_NODE
layoutVersion=-40
3. The cause is plain to see: the two clusterID values don't match. Delete the wrong value on the slave, restart, and it works.
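An alternative to wiping the slave's data directory is to copy the namenode's clusterID into the datanode's VERSION file. A sketch, assuming the two VERSION file locations described in steps 1 and 2 (`sync_cluster_id` is a hypothetical helper):

```shell
# overwrite the clusterID line in a datanode VERSION file with the
# clusterID found in the namenode's VERSION file
sync_cluster_id() {
  nn_version="$1"   # e.g. <dfs.namenode.name.dir>/current/VERSION on the master
  dn_version="$2"   # e.g. <hadoop.tmp.dir>/dfs/data/current/VERSION on the slave
  cid=$(grep '^clusterID=' "$nn_version" | cut -d= -f2)
  sed -i "s/^clusterID=.*/clusterID=$cid/" "$dn_version"
}
```

This keeps the blocks already stored on the datanode, whereas deleting the data directory discards them.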


Problem 5]

When starting Hadoop, the following output appears (the screen is simply flooded with ssh errors):

[root@hd-m1 /]# ./hadoop/hadoop-2.6.0/sbin/start-all.sh 
 This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
 15/01/23 20:23:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 Starting namenodes on [Java HotSpot(TM) Client VM warning: You have loaded library /hadoop/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
 It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
 hd-m1]
 sed: -e expression #1, char 6: unknown option to `s'
 -c: Unknown cipher type 'cd'
 hd-m1: starting namenode, logging to /hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-hd-m1.out
 HotSpot(TM): ssh: Could not resolve hostname HotSpot(TM): Temporary failure in name resolution
 Java: ssh: Could not resolve hostname Java: Temporary failure in name resolution
 Client: ssh: Could not resolve hostname Client: Temporary failure in name resolution
 VM: ssh: Could not resolve hostname VM: Temporary failure in name resolution
 warning:: ssh: Could not resolve hostname warning:: Temporary failure in name resolution
 You: ssh: Could not resolve hostname You: Temporary failure in name resolution
 ... (one such "Could not resolve hostname" line for every word of the VM warning) ...

Solution:

The output above means the environment variables are not set correctly. Add the following lines to ~/.bash_profile or /etc/profile:

# vi /etc/profile  (or vi ~/.bash_profile)
     export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
     export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Then run source so the changes take effect:

  # source /etc/profile  (or source ~/.bash_profile)
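The two exports can also be appended automatically, and made idempotent so repeated runs don't duplicate them. A sketch (`append_native_exports` is a hypothetical helper; pass the profile you actually use):

```shell
append_native_exports() {
  profile="$1"   # ~/.bash_profile or /etc/profile
  # skip if the exports are already there, so reruns don't duplicate them
  if ! grep -q 'HADOOP_COMMON_LIB_NATIVE_DIR' "$profile" 2>/dev/null; then
    cat >> "$profile" <<'EOF'
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
EOF
  fi
}
```

The heredoc is quoted ('EOF') on purpose, so $HADOOP_HOME is written literally and expanded only when the profile is sourced.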

 

Problem 6]

The datanode usually fails to start for one of the following reasons:

1. The configuration files on the master were modified.

2. hadoop namenode -format was run multiple times (a bad habit).

The typical error looks like this:

java.io.IOException: Cannot lock storage /usr/hadoop/tmp/dfs/name. The directory is already locked.

Or:

[root@hadoop current]# hadoop-daemon.sh start datanode
 starting datanode, logging to /usr/local/hadoop1.1/libexec/../logs/hadoop-root-datanode-hadoop.out
 [root@hadoop ~]# jps

jps shows that no datanode started.

In this situation, first try running the following commands on the dead node:

bin/hadoop-daemon.sh start datanode
 bin/hadoop-daemon.sh start jobtracker

If that still doesn't help, then congratulations: you've hit the same situation I did.

The correct fix is to go to each of your slaves and look under .../usr/hadoop/tmp/dfs/ (ls there shows a directory named data).

Delete that data folder, then start Hadoop again from that directory:

start-all.sh

Check jps again and the datanode will now appear.
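The per-slave cleanup can be wrapped in a small function. A sketch, assuming hadoop.tmp.dir points at /usr/hadoop/tmp as in the error above (`wipe_datanode_storage` is hypothetical; note this discards the block replicas stored on that node):

```shell
# remove a datanode's storage directory so it re-registers cleanly
# on the next start; this deletes that node's block replicas.
wipe_datanode_storage() {
  data_dir="$1"   # e.g. /usr/hadoop/tmp/dfs/data on each slave
  if [ -d "$data_dir" ]; then
    rm -rf "$data_dir"
  fi
}
```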

Then open

http://210.41.166.61:50070 (replace with your master's IP)

to see how many live nodes there are, and

http://210.41.166.61:50030/

to see the node count it reports.

OK, problem solved.