HMaster node dies for no apparent reason

1. Error message:

2018-09-05 18:40:58,483 FATAL [main-EventThread] master.HMaster: Master server abort: loaded coprocessors are: []
2018-09-05 18:40:58,483 FATAL [main-EventThread] master.HMaster: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase master:60000-0x400000393770000 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:692)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:624)
at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)

2. Detailed log

2018-09-05 18:40:58,468 INFO  [main-SendThread(server6:2181)] zookeeper.ClientCnxn: Opening socket connection to server server6/192.168.211.6:2181. Will not attempt to authenticate using SASL (unknown error)
2018-09-05 18:40:58,468 INFO [main-SendThread(server6:2181)] zookeeper.ClientCnxn: Socket connection established to server6/192.168.211.6:2181, initiating session
2018-09-05 18:40:58,473 WARN [main-SendThread(server6:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x400000393770000 has expired
2018-09-05 18:40:58,473 INFO [main-SendThread(server6:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x400000393770000 has expired, closing socket connection
2018-09-05 18:40:58,483 FATAL [main-EventThread] master.HMaster: Master server abort: loaded coprocessors are: []
2018-09-05 18:40:58,483 FATAL [main-EventThread] master.HMaster: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase master:60000-0x400000393770000 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:692)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:624)
at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
2018-09-05 18:40:58,520 INFO [main-EventThread] regionserver.HRegionServer: STOPPED: Stopped by main-EventThread
2018-09-05 18:40:58,528 INFO [master/server4/192.168.211.4:60000] regionserver.HRegionServer: Stopping infoServer
2018-09-05 18:40:58,849 INFO [server4:60000.activeMasterManager-SendThread(server4:2181)] zookeeper.ClientCnxn: Opening socket connection to server server4/192.168.211.4:2181. Will not attempt to authenticate using SASL (unknown error)
2018-09-05 18:40:58,849 INFO [server4:60000.activeMasterManager-SendThread(server4:2181)] zookeeper.ClientCnxn: Socket connection established to server4/192.168.211.4:2181, initiating session
2018-09-05 18:40:58,850 INFO [server4:60000.activeMasterManager-SendThread(server4:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x400000393770003, likely server has closed socket, closing socket connection and attempting reconnect
2018-09-05 18:40:58,851 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x400000393770000
2018-09-05 18:41:03,261 INFO [server4,60000,1535980776013_splitLogManager__ChoreService_1] hbase.ScheduledChore: Chore: SplitLogManager Timeout Monitor missed its start time
2018-09-05 18:41:03,262 INFO [server4,60000,1535980776013_splitLogManager__ChoreService_1] hbase.ScheduledChore: Chore: SplitLogManager Timeout Monitor was stopped
2018-09-05 18:41:03,273 INFO [server4:60000.activeMasterManager-SendThread(server6:2181)] zookeeper.ClientCnxn: Opening socket connection to server server6/192.168.211.6:2181. Will not attempt to authenticate using SASL (unknown error)
2018-09-05 18:41:03,689 INFO [master/server4/192.168.211.4:60000-SendThread(server6:2181)] zookeeper.ClientCnxn: Opening socket connection to server server6/192.168.211.6:2181. Will not attempt to authenticate using SASL (unknown error)
2018-09-05 18:41:04,329 INFO [server4:60000.activeMasterManager-SendThread(server6:2181)] zookeeper.ClientCnxn: Socket connection established to server6/192.168.211.6:2181, initiating session
2018-09-05 18:41:04,329 INFO [master/server4/192.168.211.4:60000-SendThread(server6:2181)] zookeeper.ClientCnxn: Socket connection established to server6/192.168.211.6:2181, initiating session
2018-09-05 18:41:04,335 WARN [master/server4/192.168.211.4:60000-SendThread(server6:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x5000003d3f10000 has expired
2018-09-05 18:41:04,335 INFO [master/server4/192.168.211.4:60000-SendThread(server6:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x5000003d3f10000 has expired, closing socket connection
2018-09-05 18:41:04,335 WARN [master/server4/192.168.211.4:60000-EventThread] client.ConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:692)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:624)
at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
2018-09-05 18:41:04,335 INFO [master/server4/192.168.211.4:60000-EventThread] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x5000003d3f10000
2018-09-05 18:41:04,335 INFO [master/server4/192.168.211.4:60000-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x5000003d3f10000
2018-09-05 18:41:04,336 WARN [server4:60000.activeMasterManager-SendThread(server6:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x5000003d3f10001 has expired
2018-09-05 18:41:04,336 INFO [server4:60000.activeMasterManager-SendThread(server6:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x5000003d3f10001 has expired, closing socket connection
2018-09-05 18:41:04,336 WARN [server4:60000.activeMasterManager-EventThread] client.ConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:692)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:624)
at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
2018-09-05 18:41:05,015 INFO [server4:60000.activeMasterManager-EventThread] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x5000003d3f10001
2018-09-05 18:41:05,015 INFO [server4:60000.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x5000003d3f10001
2018-09-05 18:41:05,706 INFO [server4:60000.activeMasterManager-SendThread(server5:2181)] zookeeper.ClientCnxn: Opening socket connection to server server5/192.168.211.5:2181. Will not attempt to authenticate using SASL (unknown error)
2018-09-05 18:41:05,706 INFO [server4:60000.activeMasterManager-SendThread(server5:2181)] zookeeper.ClientCnxn: Socket connection established to server5/192.168.211.5:2181, initiating session
2018-09-05 18:41:09,357 WARN [server4:60000.activeMasterManager-SendThread(server5:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x400000393770003 has expired
2018-09-05 18:41:09,357 INFO [server4:60000.activeMasterManager-SendThread(server5:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x400000393770003 has expired, closing socket connection
2018-09-05 18:41:09,357 INFO [server4:60000.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x400000393770003
2018-09-05 18:41:15,059 WARN [server4,60000,1535980776013_ChoreService_2] zookeeper.ZKUtil: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase Unable to list children of znode /hbase/replication/peers
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/peers
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1532)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:319)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:462)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:490)
at org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl.getAllPeerIds(ReplicationPeersZKImpl.java:392)
at org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleaner.getUnDeletedQueues(ReplicationZKNodeCleaner.java:80)
at org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleanerChore.chore(ReplicationZKNodeCleanerChore.java:48)
at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:189)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:16,601 ERROR [server4,60000,1535980776013_ChoreService_2] zookeeper.ZooKeeperWatcher: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/peers
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1532)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:319)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:462)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:490)
at org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl.getAllPeerIds(ReplicationPeersZKImpl.java:392)
at org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleaner.getUnDeletedQueues(ReplicationZKNodeCleaner.java:80)
at org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleanerChore.chore(ReplicationZKNodeCleanerChore.java:48)
at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:189)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:16,601 ERROR [server4,60000,1535980776013_ChoreService_2] hbase.ScheduledChore: Caught error
java.lang.NullPointerException
at java.util.HashSet.<init>(HashSet.java:119)
at org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleaner.getUnDeletedQueues(ReplicationZKNodeCleaner.java:80)
at org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleanerChore.chore(ReplicationZKNodeCleanerChore.java:48)
at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:189)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:16,601 INFO [server4,60000,1535980776013_ChoreService_2] hbase.ScheduledChore: Chore: server4,60000,1535980776013-DoMetricsChore missed its start time
2018-09-05 18:41:16,601 INFO [server4,60000,1535980776013_ChoreService_2] hbase.ScheduledChore: Chore: server4,60000,1535980776013-DoMetricsChore was stopped
2018-09-05 18:41:15,577 WARN [server4,60000,1535980776013_ChoreService_1] master.HMaster: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase Unable to list backup servers
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/backup-masters
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1532)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:319)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:519)
at org.apache.hadoop.hbase.master.HMaster.getClusterStatusWithoutCoprocessor(HMaster.java:2422)
at org.apache.hadoop.hbase.master.balancer.ClusterStatusChore.chore(ClusterStatusChore.java:49)
at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:189)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:18,871 INFO [master/server4/192.168.211.4:60000] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2018-09-05 18:41:21,347 INFO [master/server4/192.168.211.4:60000] procedure2.ProcedureExecutor: Stopping the procedure executor
2018-09-05 18:41:22,576 INFO [master/server4/192.168.211.4:60000] wal.WALProcedureStore: Stopping the WAL Procedure Store
2018-09-05 18:41:23,348 INFO [master/server4/192.168.211.4:60000] regionserver.HRegionServer: stopping server server4,60000,1535980776013
2018-09-05 18:41:23,349 INFO [master/server4/192.168.211.4:60000] regionserver.HRegionServer: stopping server server4,60000,1535980776013; all regions closed.
2018-09-05 18:41:23,349 INFO [master/server4/192.168.211.4:60000] hbase.ChoreService: Chore service for: server4,60000,1535980776013 had [[ScheduledChore: Name: server4,60000,1535980776013-ClusterStatusChore Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: server4,60000,1535980776013-BalancerChore Period: 300000 Unit: MILLISECONDS], [ScheduledChore: Name: server4,60000,1535980776013-RegionNormalizerChore Period: 1800000 Unit: MILLISECONDS], [ScheduledChore: Name: CatalogJanitor-server4:60000 Period: 300000 Unit: MILLISECONDS], [ScheduledChore: Name: HFileCleaner Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: CompactedHFilesCleaner Period: 120000 Unit: MILLISECONDS], [ScheduledChore: Name: LogsCleaner Period: 60000 Unit: MILLISECONDS]] on shutdown
2018-09-05 18:41:23,409 WARN [master/server4/192.168.211.4:60000] zookeeper.ZKUtil: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase Unable to get data of znode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:397)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:629)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:269)
at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1286)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1146)
at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:23,410 ERROR [master/server4/192.168.211.4:60000] zookeeper.ZooKeeperWatcher: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:397)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:629)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:269)
at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1286)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1146)
at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:23,410 ERROR [master/server4/192.168.211.4:60000] master.ActiveMasterManager: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase Error deleting our own master address node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:397)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:629)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:269)
at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1286)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1146)
at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:23,924 INFO [master/server4/192.168.211.4:60000] hbase.ChoreService: Chore service for: server4,60000,1535980776013_splitLogManager_ had [] on shutdown
2018-09-05 18:41:24,133 INFO [master/server4/192.168.211.4:60000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2018-09-05 18:41:24,416 INFO [master/server4/192.168.211.4:60000] ipc.RpcServer: Stopping server on 60000
2018-09-05 18:41:24,481 INFO [RpcServer.listener,port=60000] ipc.RpcServer: RpcServer.listener,port=60000: stopping
2018-09-05 18:41:24,639 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2018-09-05 18:41:24,639 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2018-09-05 18:41:27,497 WARN [master/server4/192.168.211.4:60000] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/server4,60000,1535980776013
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:182)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1250)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1239)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1504)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1154)
at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:27,694 INFO [master/server4/192.168.211.4:60000] regionserver.HRegionServer: stopping server server4,60000,1535980776013; zookeeper connection closed.
2018-09-05 18:41:27,694 INFO [master/server4/192.168.211.4:60000] regionserver.HRegionServer: master/server4/192.168.211.4:60000 exiting

3. Points related to the problem

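  • A long GC pause causes the ZooKeeper session to time out => check the GC log for a pause of this kind just before the time of the failure (the session is reported expired at 2018-09-05 18:40:58)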
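
The GC log check above can be sketched as a small script. This is only a rough sketch under assumptions: it assumes the HMaster JVM writes a JDK 8 style GC log with -XX:+PrintGCDetails and -XX:+PrintGCDateStamps, so each collection line starts with a date stamp and ends with a "real=N.NN secs" field. The sample lines, the 10 second threshold, and the 120 second window are made up for illustration and do not come from the cluster in this post.

```python
import re
from datetime import datetime, timedelta

# "real=N.NN secs" is the wall-clock time field printed with -XX:+PrintGCDetails (JDK 8).
REAL_SECS = re.compile(r"real=(\d+(?:\.\d+)?) secs")
# Leading date stamp printed with -XX:+PrintGCDateStamps.
DATE_STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def long_pauses(lines, incident, window_s=120, threshold_s=10.0):
    """Return (timestamp, pause_seconds) for long pauses in the window before incident."""
    hits = []
    for line in lines:
        stamp = DATE_STAMP.search(line)
        real = REAL_SECS.search(line)
        if not (stamp and real):
            continue  # not a GC line in the assumed format
        when = datetime.strptime(stamp.group(1), "%Y-%m-%dT%H:%M:%S")
        pause = float(real.group(1))
        in_window = timedelta(0) <= incident - when <= timedelta(seconds=window_s)
        if pause >= threshold_s and in_window:
            hits.append((when, pause))
    return hits


# Hypothetical GC log lines, invented for illustration only.
SAMPLE = [
    "2018-09-05T18:39:10.120+0800: 1.0: [GC (Allocation Failure) ... [Times: user=0.10 sys=0.01, real=0.03 secs]",
    "2018-09-05T18:40:12.001+0800: 2.0: [Full GC (Ergonomics) ... [Times: user=40.12 sys=0.50, real=45.80 secs]",
]

# Time at which the HMaster log above reports the expired session.
INCIDENT = datetime(2018, 9, 5, 18, 40, 58)

for when, pause in long_pauses(SAMPLE, INCIDENT):
    print(when, pause)
```

To use it on a real log, pass the lines of the GC log file instead of SAMPLE and set the threshold close to the ZooKeeper session timeout that is actually in effect.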

4. Solution

5. Reference articles

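  • The person who asked the question in this article ran into the same error
  • In the vast majority of cases some operation causes the ZK session to expire; the session is not genuinely too slow. Simply changing the timeout option in hbase-site.xml therefore does not solve the problem well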
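
A related point to check if the timeout is changed anyway: the ZooKeeper server bounds the session timeout a client asks for, by default between 2 x tickTime and 20 x tickTime, unless minSessionTimeout or maxSessionTimeout are set in zoo.cfg. The sketch below only illustrates that clamping. The function name is hypothetical, and the values used (tickTime of 2000 ms, a requested timeout of 90000 ms) are common defaults assumed here, not values read from the cluster in this post.

```python
def negotiated_timeout_ms(requested_ms, tick_time_ms=2000, min_ms=None, max_ms=None):
    """Clamp a requested session timeout the way a ZooKeeper server bounds it.

    min_ms / max_ms stand for minSessionTimeout / maxSessionTimeout in zoo.cfg;
    when they are not set, the bounds default to 2x and 20x tickTime.
    """
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lower, min(requested_ms, upper))


# Assumed values for illustration: the client asks for 90 s, the server tickTime is 2 s.
print(negotiated_timeout_ms(90000))                # 40000
print(negotiated_timeout_ms(90000, max_ms=120000)) # 90000
```

Under these assumed values, raising the timeout in hbase-site.xml alone changes nothing beyond 40 seconds, and a GC pause longer than the effective timeout still expires the session.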

6. Summary

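  • This GC problem is also the biggest weakness of ZooKeeper as a distributed lock: a long pause expires the session, and the ephemeral node behind the lock is removed while the paused process still believes it holds the lock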