Recommendation: keep any single shard under 40 GB of data.
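
A quick way to check how current shard sizes compare with that recommendation is the cat shards API sorted by store size; a minimal sketch, assuming the same {ip:port} placeholder used in the commands below and an Elasticsearch version that supports the cat API's h/s parameters (5.1+):

# List shards sorted by on-disk size, largest first
curl -XGET '{ip:port}/_cat/shards?v&h=index,shard,prirep,state,store,node&s=store:desc'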


Red

Find the shards whose state is UNASSIGNED, locate the node each one belongs to, and check whether that node is down. If it is down, restart the node and then watch the cluster and shard status recover.

curl -XGET {ip:port}/_cat/shards?v

index shard prirep state docs store ip node
tsf-oss-scalable-log@2021-07-21 2 r STARTED 1 9.9kb 172.16.16.42 node-172.16.16.42
tsf-oss-scalable-log@2021-07-21 2 p STARTED 1 9.9kb 172.16.16.33 node-172.16.16.33
tsf-oss-scalable-log@2021-07-21 1 r STARTED 1 9.9kb 172.16.16.33 node-172.16.16.33
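
The sample output above only shows STARTED shards; to pull out just the unassigned ones, and to ask the cluster why they are unassigned, something like the following can be used (the allocation explain API exists from Elasticsearch 5.x on; {ip:port} is the same placeholder as above):

# Show only the unassigned shards
curl -XGET '{ip:port}/_cat/shards?v' | grep UNASSIGNED

# With no request body, this explains the first unassigned shard the cluster finds
curl -XGET '{ip:port}/_cluster/allocation/explain?pretty'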

If none of the nodes are down, the shard has to be allocated manually:

curl -XPOST '{ip:port}/_cluster/reroute?retry_failed=true' -H 'Content-Type: application/json' -d '
{
  "commands": [
    {
      "allocate_stale_primary": { # allocate_empty_primary allocates an empty primary and wipes the shard's data; allocate_stale_primary reuses the stale copy already on the node
        "index": "${index_name}", # the index the shard belongs to
        "shard": 1, # the shard number
        "node": "${node_name}", # the node to allocate the shard to
        "accept_data_loss": true # confirm that possible data loss is acceptable
      }
    }
  ]
}'
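
Before picking the node for allocate_stale_primary, it helps to confirm which nodes still hold an on-disk copy of the shard. A sketch using the shard stores API, with ${index_name} and {ip:port} the same placeholders as above:

# List the nodes that hold copies of each shard of the index, including stale/unallocated ones
curl -XGET '{ip:port}/${index_name}/_shard_stores?status=all&pretty'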

Yellow

A yellow cluster means some replica shards are unassigned. You can check the counts of shards in each state with:

curl -XGET {ip:port}/_cluster/health
{
"cluster_name" : "tsf_es_cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 170,
"active_shards" : 340,
"relocating_shards" : 0, # 正在进行搬迁的shard数
"initializing_shards" : 0, # 正在初始化的shard数
"unassigned_shards" : 0, # 未分配的shard数

"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
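
To narrow down which indices the unassigned replicas belong to, the health API can also report per index, and the cat indices API can filter by health; a sketch assuming the same {ip:port} placeholder:

# Cluster health broken down per index
curl -XGET '{ip:port}/_cluster/health?level=indices&pretty'
# Or list only the indices that are currently yellow
curl -XGET '{ip:port}/_cat/indices?v&health=yellow'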

If there are many unassigned shards, you can temporarily raise the cluster's recovery speed to make recovery faster, and set it back to the original values once the cluster turns green.

// Raise the recovery speed
curl -XPUT '{ip:port}/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": 56,
    "indices.recovery.max_bytes_per_sec": "400mb"
  }
}'
// Set it back to the original values (indices.recovery.max_bytes_per_sec defaults to 40mb per second)
curl -XPUT '{ip:port}/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": 8,
    "indices.recovery.max_bytes_per_sec": "40mb"
  }
}'
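
To confirm what the cluster is actually using at any point (and what the defaults are), the settings can be read back; a sketch with the same assumed {ip:port} placeholder:

# Show persistent/transient settings together with defaults, as flat keys
curl -XGET '{ip:port}/_cluster/settings?include_defaults=true&flat_settings=true&pretty'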

Problem 1: one index is very large, and allocating its replicas takes so long that the whole cluster stays yellow.

curl --location --request GET 'http://localhost:9201/_cat/shards?v' |grep -i init
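
To watch how far those initializing replicas have progressed, the recovery API can be polled; a sketch assuming the same node address and an Elasticsearch version that supports active_only:

# Show only recoveries that are still running, with per-shard progress
curl -XGET 'http://localhost:9201/_cat/recovery?v&active_only=true&h=index,shard,stage,source_node,target_node,bytes_percent'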


Fix: set the index's replica count to 0 so that replicas are no longer allocated.

curl -XPUT '<vip>:<vport>/<INDEX_NAME>/_settings' -H 'Content-Type: application/json' -d '{ "number_of_replicas": 0 }'
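
Once the large index has finished recovering and the cluster is green again, the replica count can be restored the same way (1 below is only an example; use whatever value the index originally had):

curl -XPUT '<vip>:<vport>/<INDEX_NAME>/_settings' -H 'Content-Type: application/json' -d '{ "number_of_replicas": 1 }'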