0x00 Overview

Developers reported that SkyWalking had suddenly stopped working. The logs showed the following:

2020-10-19 14:25:32,712 - org.apache.skywalking.apm.collector.cache.caffeine.service.ServiceNameCacheCaffeineService -82494825 [grpc-default-executor-149] ERROR [] - No shard available for [get [service_name][type][-87]: routing [null]]
org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [service_name][type][-87]: routing [null]]
at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.perform(TransportSingleShardAction.java:209) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.start(TransportSingleShardAction.java:186) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:95) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:59) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.TransportAction.doExecute(TransportAction.java:146) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:170) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:142) ~[elasticsearch-5.5.0.jar:5.5.0]

Error keyword: No shard available for


0x01 Disk full

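A search showed that this kind of error is usually caused by a full disk. Running df -h showed that usage of the data directory /data was normal, but /app, where es7 is installed, was completely full.

Further investigation showed that es7 had accumulated too many log files in its installation directory, which filled up /app. After some of the expired logs were deleted, the cluster returned to normal.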
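
The cleanup step above can be sketched in Python. This is a minimal sketch, not the exact procedure used in this incident: the path /app/elasticsearch/logs, the 7-day retention period, and the function names find_old_logs and delete_files are all assumptions for illustration. It defaults to a dry run so that nothing is deleted until the candidate list has been reviewed.

```python
import time
from pathlib import Path


def find_old_logs(log_dir, max_age_days, now=None):
    """Return log files under log_dir last modified more than max_age_days ago."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = []
    # "*.log*" also matches rotated or compressed files such as es.log.1 or es.log.gz
    for path in root.rglob("*.log*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            old.append(path)
    return sorted(old)


def delete_files(paths, dry_run=True):
    """Return the total size in bytes of paths; delete them only if dry_run is False."""
    freed = 0
    for path in paths:
        freed += path.stat().st_size
        if not dry_run:
            path.unlink()
    return freed


if __name__ == "__main__":
    # Hypothetical location of the ES log directory; adjust to the real install path.
    candidates = find_old_logs("/app/elasticsearch/logs", max_age_days=7)
    reclaimable = delete_files(candidates, dry_run=True)
    print(f"{len(candidates)} old log files, {reclaimable} bytes reclaimable")
```

Once the dry-run output looks right, calling delete_files(candidates, dry_run=False) removes the files.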


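0x02 Summary

Besides the ES data directory, the ES runtime logs also need enough free disk space. If either of the two locations fills up, ES will start reporting errors caused by insufficient disk space.

Note that when ES is handling a heavy workload, it also produces a large volume of its own logs, so clean up the disk regularly or set up monitoring.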
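
As a starting point for the monitoring suggested above, the following minimal sketch checks several paths against a usage threshold using Python's standard shutil.disk_usage. The 85% threshold and the function names are assumptions; /data and /app are the two mount points from this incident. The percentage may differ slightly from what df -h prints, because df accounts for reserved blocks differently.

```python
import shutil


def disk_usage_percent(path):
    """Return used space on the filesystem containing path, as a percentage."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def paths_over_threshold(paths, threshold_percent=85.0, usage_fn=disk_usage_percent):
    """Return {path: percent} for every path at or above threshold_percent."""
    over = {}
    for path in paths:
        percent = usage_fn(path)
        if percent >= threshold_percent:
            over[path] = round(percent, 1)
    return over


if __name__ == "__main__":
    # Check both the data directory and the install/log directory,
    # since either one filling up can break the cluster.
    for path, percent in paths_over_threshold(["/data", "/app"]).items():
        print(f"WARNING: {path} is {percent}% full")
```

Run from cron or a similar scheduler, a script like this can feed an alerting channel. The usage_fn parameter exists only so the check can be exercised without a real full disk.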