[Problem 1]
# jps
27851 Jps
18198 -- process information unavailable
The process cannot be killed either:
# kill -9 18198
-bash: kill: (18198) - No such process
Solution: go to /tmp on the Linux machine (cd /tmp) and delete the folder named hsperfdata_{username}. Run jps again and the stale entry is gone.
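A more cautious variant of the cleanup above, as a rough sketch: instead of deleting the whole hsperfdata_{username} folder, it removes only the entries whose PID no longer exists, so JVMs that are still running stay visible to jps. The function name prune_stale_hsperfdata is made up for this example, and the /proc check assumes Linux.

```shell
# Sketch (bash, Linux): drop hsperfdata entries for PIDs that no longer exist.
# Usage: prune_stale_hsperfdata <tmp-root> <username>
prune_stale_hsperfdata() {
  local root="$1" user="$2" dir f pid
  dir="$root/hsperfdata_$user"
  [ -d "$dir" ] || return 0            # nothing to clean up
  for f in "$dir"/*; do
    [ -e "$f" ] || continue            # empty directory: the glob did not match
    pid="${f##*/}"                     # the file name is the JVM's PID
    case "$pid" in
      ''|*[!0-9]*) continue ;;         # skip anything that is not a plain PID
    esac
    if [ ! -d "/proc/$pid" ]; then     # no such process any more
      rm -f -- "$f"
      echo "removed stale entry $pid"
    fi
  done
}
```

Typical use would be prune_stale_hsperfdata /tmp "$USER", followed by jps to confirm the dead entry has disappeared.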
[Problem 2]
Hadoop's DataNode and ResourceManager will not start (or shut down by themselves right after starting), and the log reports UnknownHostException.
DataNode log:
2014-09-22 14:03:02,935 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
STARTUP_MSG: Starting DataNode
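STARTUP_MSG: host = java.net.UnknownHostException: master: master
Solution: it turned out that my /etc/hosts file had no mapping between master (the host name) and its IP. Open it with vi /etc/hosts and add one line, IP address first, then host name:
192.168.80.4 master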
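To double-check the entry, a small sketch like the one below can scan a hosts file for a given host name. It only accepts the name when it appears after the first field, which is where hosts(5) expects names (the IP comes first). The function name hosts_has_entry is invented for this example.

```shell
# Sketch: succeed if some non-comment line of <hosts-file> maps an IP to <hostname>.
# Usage: hosts_has_entry <hosts-file> <hostname>
hosts_has_entry() {
  awk -v h="$2" '
    { sub(/#.*/, "") }                 # strip comments
    NF >= 2 {                          # field 1 is the IP, the rest are names
      for (i = 2; i <= NF; i++) if ($i == h) found = 1
    }
    END { exit found ? 0 : 1 }
  ' "$1"
}
```

For example, hosts_has_entry /etc/hosts master && echo "mapping present". A line written as "master 192.168.80.4" is not accepted, because the name sits in the IP position.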
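It may also have been a side effect of changes to core-site.xml (I ran into so many problems that I no longer remember whether this fix belongs to this one); in that case, create the hadoop_tmp directory yourself before starting.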
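[Problem 3] WARN net.DNS: Unable to determine address of the host-falling back to "localhost" address java.net.UnknownHostException: slave1: slave1
Solution: this error is a hosts problem. Having only localhost in /etc/hosts is not enough; the current host name, slave1, has to be added as well. To test, run hostname -f: if it returns the current host name, the setup is OK.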
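Alongside hostname -f, the lookup can be checked with getent, which goes through the same resolver configuration that /etc/hosts feeds into. This is only a sketch, assuming a glibc-based Linux where getent is available; check_hostname_resolves is a made-up name.

```shell
# Sketch: report whether a host name resolves (defaults to this machine's name).
# Usage: check_hostname_resolves [hostname]
check_hostname_resolves() {
  local name="${1:-$(hostname)}"
  if getent hosts "$name" >/dev/null; then
    echo "ok: $name resolves"
  else
    echo "missing: add '<ip> $name' to /etc/hosts" >&2
    return 1
  fi
}
```

Running check_hostname_resolves slave1 on the slave should print the ok line once the hosts entry is in place.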
[Problem 4]
Exception:
2014-03-13 11:26:30,788 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1257313099-10.10.208.38-1394679083528 (storage id DS-743638901-127.0.0.1-50010-1394616048958) service to Linux-hadoop-38/10.10.208.38:9000
java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop/tmp/dfs/data: namenode clusterID = CID-8e201022-6faa-440a-b61c-290e4ccfb006; datanode clusterID = clustername
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:916)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:887)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:309)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
at java.lang.Thread.run(Thread.java:662)
Solution: 1. dfs.namenode.name.dir is configured in hdfs-site.xml. On the master, the configured directory contains a current folder with a VERSION file in it, whose content is:
#Thu Mar 13 10:51:23 CST 2014
namespaceID=1615021223
clusterID=CID-8e201022-6faa-440a-b61c-290e4ccfb006
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1257313099-10.10.208.38-1394679083528
layoutVersion=-40
2. hadoop.tmp.dir is configured in core-site.xml. On the slave, the configured directory contains a dfs/data/current directory, which also holds a VERSION file, with this content:
#Wed Mar 12 17:23:04 CST 2014
storageID=DS-414973036-10.10.208.54-50010-1394616184818
clusterID=clustername
cTime=0
storageType=DATA_NODE
layoutVersion=-40
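3. The cause is now obvious: the two clusterID values differ (CID-8e201022-6faa-440a-b61c-290e4ccfb006 on the NameNode, clustername on the DataNode). Delete the wrong content on the slave, restart, and the problem is gone.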
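The comparison of the two VERSION files can be scripted. The sketch below reads the clusterID line from each file and reports whether they match; cluster_id and same_cluster are names invented for this example, and the paths in the usage line are placeholders for wherever dfs.namenode.name.dir and hadoop.tmp.dir point on your machines.

```shell
# Sketch: extract and compare the clusterID of two Hadoop VERSION files.
cluster_id() {
  # print the value of the clusterID= line, if there is one
  sed -n 's/^clusterID=//p' "$1"
}

same_cluster() {
  local a b
  a=$(cluster_id "$1")
  b=$(cluster_id "$2")
  # an empty ID counts as a mismatch, so a missing line is not reported as equal
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage would look like same_cluster /path/to/name/current/VERSION /path/to/dfs/data/current/VERSION || echo "clusterID mismatch", after copying one of the files so both are on the same host.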
[Problem 5]
When starting Hadoop, the following output appears (the screen is flooded with ssh messages, which is painful):
[root@hd-m1 /]# ./hadoop/hadoop-2.6.0/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
15/01/23 20:23:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [Java HotSpot(TM) Client VM warning: You have loaded library /hadoop/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
hd-m1]
sed: -e expression #1, char 6: unknown option to `s'
-c: Unknown cipher type 'cd'
hd-m1: starting namenode, logging to /hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-hd-m1.out
HotSpot(TM): ssh: Could not resolve hostname HotSpot(TM): Temporary failure in name resolution
Java: ssh: Could not resolve hostname Java: Temporary failure in name resolution
Client: ssh: Could not resolve hostname Client: Temporary failure in name resolution
You: ssh: Could not resolve hostname You: Temporary failure in name resolution
warning:: ssh: Could not resolve hostname warning:: Temporary failure in name resolution
VM: ssh: Could not resolve hostname VM: Temporary failure in name resolution
have: ssh: Could not resolve hostname have: Temporary failure in name resolution
library: ssh: Could not resolve hostname library: Temporary failure in name resolution
loaded: ssh: Could not resolve hostname loaded: Temporary failure in name resolution
might: ssh: Could not resolve hostname might: Temporary failure in name resolution
which: ssh: Could not resolve hostname which: Temporary failure in name resolution
have: ssh: Could not resolve hostname have: Temporary failure in name resolution
disabled: ssh: Could not resolve hostname disabled: Temporary failure in name resolution
stack: ssh: Could not resolve hostname stack: Temporary failure in name resolution
guard.: ssh: Could not resolve hostname guard.: Temporary failure in name resolution
VM: ssh: Could not resolve hostname VM: Temporary failure in name resolution
The: ssh: Could not resolve hostname The: Temporary failure in name resolution
try: ssh: Could not resolve hostname try: Temporary failure in name resolution
will: ssh: Could not resolve hostname will: Temporary failure in name resolution
to: ssh: Could not resolve hostname to: Temporary failure in name resolution
fix: ssh: Could not resolve hostname fix: Temporary failure in name resolution
the: ssh: Could not resolve hostname the: Temporary failure in name resolution
stack: ssh: Could not resolve hostname stack: Temporary failure in name resolution
guard: ssh: Could not resolve hostname guard: Temporary failure in name resolution
It's: ssh: Could not resolve hostname It's: Temporary failure in name resolution
now.: ssh: Could not resolve hostname now.: Temporary failure in name resolution
recommended: ssh: Could not resolve hostname recommended: Temporary failure in name resolution
highly: ssh: Could not resolve hostname highly: Temporary failure in name resolution
that: ssh: Could not resolve hostname that: Temporary failure in name resolution
you: ssh: Could not resolve hostname you: Temporary failure in name resolution
with: ssh: Could not resolve hostname with: Temporary failure in name resolution
'execstack: ssh: Could not resolve hostname 'execstack: Temporary failure in name resolution
the: ssh: Could not resolve hostname the: Temporary failure in name resolution
library: ssh: Could not resolve hostname library: Temporary failure in name resolution
fix: ssh: Could not resolve hostname fix: Temporary failure in name resolution
<libfile>',: ssh: Could not resolve hostname <libfile>',: Temporary failure in name resolution
or: ssh: Could not resolve hostname or: Temporary failure in name resolution
link: ssh: Could not resolve hostname link: Temporary failure in name resolution
it: ssh: Could not resolve hostname it: Temporary failure in name resolution
'-z: ssh: Could not resolve hostname '-z: Temporary failure in name resolution
with: ssh: Could not resolve hostname with: Temporary failure in name resolution
noexecstack'.: ssh: Could not resolve hostname noexecstack'.: Temporary failure in name resolution
...
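Solution:
This is mainly caused by environment variables that are not set correctly. Add the following lines to ~/.bash_profile or /etc/profile and the problem goes away.
# vi /etc/profile    (or: vi ~/.bash_profile)
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
Then reload the file with source so that the settings take effect:
# source /etc/profile    (or: source ~/.bash_profile)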
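Why one JVM warning turns into dozens of ssh errors can be seen in the "Starting namenodes on [...]" line of the log: the warning text ended up inside the host list, and every whitespace-separated word was then used as a host name. The sketch below imitates that effect with a stand-in function; fake_host_lookup and list_targets are hypothetical names, not part of Hadoop's scripts, and the real start scripts are not reproduced here.

```shell
# Sketch: a warning mixed into a host list gets split into one "host" per word.
fake_host_lookup() {
  # stand-in for a lookup whose output is polluted by a JVM warning
  echo "Java HotSpot(TM) Client VM warning: You have loaded library"
  echo "hd-m1"
}

list_targets() {
  local word
  # unquoted command substitution: the shell splits the output on whitespace
  for word in $(fake_host_lookup); do
    echo "would ssh to: $word"
  done
}
```

Calling list_targets prints one "would ssh to" line per word, and only the last one (hd-m1) is a real host. Once the warning no longer appears, the list contains just the host name.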
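[Problem 6]
The DataNode typically fails to start in one of the following situations:
1. The configuration files on the master have been modified.
2. hadoop namenode -format has been run several times, which is a bad habit.
The error usually looks like this: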
java.io.IOException: Cannot lock storage /usr/hadoop/tmp/dfs/name. The directory is already locked.
Or like this:
[root@hadoop current]# hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/hadoop1.1/libexec/../logs/hadoop-root-datanode-hadoop.out
[root@hadoop ~]# jps
jps shows that no DataNode has started.
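Checking the jps output by eye gets tedious across several slaves. A small filter such as the one below can do it; it reads jps-style output on standard input and succeeds only when the named daemon is listed. has_daemon is a name invented for this sketch.

```shell
# Sketch: succeed if jps-style input ("<pid> <name>" per line) lists the daemon.
# Usage: jps | has_daemon DataNode
has_daemon() {
  awk -v n="$1" '
    $2 == n { found = 1 }              # second column is the main class name
    END { exit found ? 0 : 1 }
  '
}
```

For example: jps | has_daemon DataNode || echo "DataNode is not running".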
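In this situation, first try the following.
On the dead node, run:
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start jobtracker
If that still does not work, congratulations: you have hit the same situation I did.
The correct fix is to go to each slave, find .../usr/hadoop/tmp/dfs/ and list it with ls.
It will show: data
Delete this data folder, then start Hadoop directly from the same directory: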
start-all.sh
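Then run jps again,
and DataNode should now be listed.
Next, open
http://210.41.166.61:50070 (replace 210.41.166.61 with your master's IP)
and check how many live nodes it reports, then
http://210.41.166.61:50030/ (again with your master's IP)
and check the node count shown there.
OK, problem solved.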
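Deleting the data folder throws away every block stored on that slave, so a gentler variation is to rename it instead, which can be undone if something else turns out to be wrong. Note that this is a swap for the delete step described above, not what the original fix did. The function name set_aside_datanode_dir is made up, and the sketch assumes the DataNode on that machine is stopped.

```shell
# Sketch: move <dfs-dir>/data out of the way instead of deleting it.
# Usage: set_aside_datanode_dir <dfs-dir>     (e.g. the tmp/dfs directory on a slave)
set_aside_datanode_dir() {
  local dfs_dir="$1" backup
  if [ ! -d "$dfs_dir/data" ]; then
    echo "no data directory under $dfs_dir" >&2
    return 1
  fi
  backup="$dfs_dir/data.bak.$(date +%s)"   # timestamped so the backup name stays identifiable
  mv -- "$dfs_dir/data" "$backup" && echo "$backup"
}
```

The function prints the backup path. Once the cluster is healthy again and the live-node count looks right, the backup can be removed by hand.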