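1. Background

The cluster originally had three nodes, but an incident caused two of them to lose the disks on which es was installed. Once the disks were restored, I rebuilt the cluster by copying the entire es directory from the healthy node to the two restored nodes and then editing their configuration files.

The configuration file on the healthy node looks like this: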

$ egrep -v "^#|^$" elasticsearch.yml
cluster.name: elasticsearch
node.name: "node 14.69"
bootstrap.mlockall: true
network.host: 192.168.14.69
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.timeout: 60s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.14.40","192.168.14.177","192.168.14.69"]

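After copying, I edited the configuration files on the two restored nodes, but through an oversight I forgot to change the node.name parameter, so once the cluster came up every node had the node.name "node 14.69". When new data arrived afterwards, unassigned shards appeared: the head plugin page showed a row of unassigned shards at the very top, and the cluster status changed to red.

How did this happen? Here is the analysis:

A newly created index defaults to 5 shards with 1 replica. The main purpose of replica shards is failover: if the node holding a primary shard goes down, one of its replica shards is promoted to primary.

A replica shard cannot be placed on the same node as its primary shard. Because of the configuration mistake, the cluster was recognized as having only one node, so the replica shards could not be allocated to any other node and all of them ended up unassigned.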
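
As a quick check for this kind of mistake, here is a minimal sketch (not part of the original recovery) that reports node names shared by more than one node. It assumes the input is in the two-column "ip name" form that `_cat/nodes?h=ip,name` is expected to print; the function name find_duplicate_node_names is hypothetical.

```shell
# find_duplicate_node_names: read "ip name" lines on stdin and print every
# node name that appears on more than one line, with the IPs that use it.
# Assumption: input looks like the output of `_cat/nodes?h=ip,name`.
find_duplicate_node_names() {
  awk '
    NF < 2 { next }                      # skip blank or incomplete lines
    {
      # Node names may contain spaces ("node 14.69"), so rebuild the name
      # from every field after the IP.
      name = $2
      for (i = 3; i <= NF; i++) name = name " " $i
      count[name]++
      if (name in ips) { ips[name] = ips[name] "," $1 } else { ips[name] = $1 }
    }
    END {
      for (name in count)
        if (count[name] > 1) printf "%s used by %s\n", name, ips[name]
    }
  '
}
```

It could be fed from a live node with something like: curl -s 'http://192.168.14.69:9200/_cat/nodes?h=ip,name' | find_duplicate_node_names. No output means every node name is unique.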

2. Solution:

Check the cluster health

$ curl -XGET http://192.168.14.69:9200/_cluster/health\?pretty
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 56,
  "active_shards" : 112,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 52,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}
As you can see, the cluster status is red and there are 52 unassigned shards.
First, set the number of replicas to 0
$ curl -XPUT "http://192.168.14.69:9200/_settings" -d'        
{
  "number_of_replicas" : 0
}'

Check the cluster health again

$ curl -XGET http://192.168.14.69:9200/_cluster/health\?pretty
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 56,
  "active_shards" : 56,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 26,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}

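The number of unassigned shards is now 26, half of what it was, which shows that the shards that disappeared were the replica shards. The 26 that remain are primary shards, which is why the status is still red.

List all shards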

$ curl http://192.168.14.40:9200/_cat/shards
voicereprot      2 p UNASSIGNED                                          
voicereprot      0 p UNASSIGNED                                          
voicereprot      3 p STARTED     15301   5.8mb 192.168.14.40  node 14.40  
voicereprot      1 p STARTED     15461   5.8mb 192.168.14.177 node 14.177 
voicereprot      5 p STARTED     14540   5.3mb 192.168.14.40  node 14.40  
voicereprot      4 p STARTED     16375   6.2mb 192.168.14.69  node 14.69   
spipe            2 p UNASSIGNED                                          
spipe            0 p STARTED         0    144b 192.168.14.69  node 14.69   
spipe            3 p STARTED         0    144b 192.168.14.177 node 14.177

...........(remaining output omitted)
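
To pick the unassigned primaries out of a long listing like this, a small filter helps. This is a minimal sketch that assumes the column order shown above (index, shard, p/r, state); the function name list_unassigned_primaries is hypothetical.

```shell
# list_unassigned_primaries: read `_cat/shards` output on stdin and print
# "index shard" for every primary shard whose state is UNASSIGNED.
# Assumption: columns are index, shard number, p/r flag, state, as above.
list_unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }'
}
```

For example: curl -s http://192.168.14.40:9200/_cat/shards | list_unassigned_primaries > unassigned.txt, where unassigned.txt is just an illustrative file name.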

View information about all nodes

$ curl http://192.168.14.40:9200/_nodes/process?pretty
{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "mc3rloswRgqUJ5VkL4nxBw" : {
      "name" : "node 14.177",
      "transport_address" : "inet[/192.168.14.177:9300]",
      "host" : "SZB-L0019761",
      "ip" : "192.168.14.177",
      "version" : "1.7.3",
      "build" : "05d4530",
      "http_address" : "inet[/192.168.14.177:9200]",
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 21695,
        "max_file_descriptors" : 65536,
        "mlockall" : false
      }
    },
    "bICASdrQSe2ddNLhHw0Vyw" : {
      "name" : "node 14.40",
      "transport_address" : "inet[/192.168.14.40:9300]",
      "host" : "DEV-L0003234",
      "ip" : "192.168.14.40",
      "version" : "1.7.3",
      "build" : "05d4530",
      "http_address" : "inet[/192.168.14.40:9200]",
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 12999,
        "max_file_descriptors" : 8192,
        "mlockall" : false
      }
    },
    "kKzGBiXXTICg6f0UrT9_BA" : {
      "name" : "node 14.69",
      "transport_address" : "inet[/192.168.14.69:9300]",
      "host" : "DEV-L0000155",
      "ip" : "192.168.14.69",
      "version" : "1.7.3",
      "build" : "05d4530",
      "http_address" : "inet[/192.168.14.69:9200]",
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 29986,
        "max_file_descriptors" : 65535,
        "mlockall" : false
      }
    }
  }
}

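In the node information, the key that opens each entry (mc3rloswRgqUJ5VkL4nxBw, bICASdrQSe2ddNLhHw0Vyw, and kKzGBiXXTICg6f0UrT9_BA) is that node's unique identifier.

Allocate the unassigned shards one at a time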

$ curl -XPOST '192.168.14.40:9200/_cluster/reroute' -d '{
        "commands" : [ {
              "allocate" : {
                  "index" : "voicereprot",
                  "shard" : 0,
                  "node" : "bICASdrQSe2ddNLhHw0Vyw",
                  "allow_primary" : true
              }
            }
        ]
    }'

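For each subsequent shard, only four values need to change: the node IP in the URL, index (the index name), shard (the shard number), and node (the node's unique identifier). Be aware that allow_primary: true can result in data loss, because a primary allocated this way on a node that holds no copy of the shard starts out empty.

Here every shard is allocated to a single node, 14.40.

Check the cluster health again; one shard has now been allocated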

$ curl http://192.168.14.40:9200/_cluster/health\?pretty
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 57,
  "active_shards" : 57,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 25,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}
Allocate the remaining shards in the same way
$ curl -XPOST '192.168.14.40:9200/_cluster/reroute' -d '{
        "commands" : [ {
              "allocate" : {
                  "index" : "voicereprot",
                  "shard" : 2,
                  "node" : "bICASdrQSe2ddNLhHw0Vyw",
                  "allow_primary" : true
              }
            }
        ]
    }'

...............(omitted)

Once all the commands have run, every unassigned shard has been allocated

$ curl http://192.168.14.40:9200/_cat/shards
voicereprot      4 p STARTED  16375   6.2mb 192.168.14.69  node 14.69   
voicereprot      0 p STARTED      0    144b 192.168.14.177 node 14.177 
voicereprot      3 p STARTED  15301   5.8mb 192.168.14.40  node 14.40  
voicereprot      1 p STARTED  15461   5.8mb 192.168.14.177 node 14.177 
voicereprot      5 p STARTED  14540   5.3mb 192.168.14.40  node 14.40 
......(remaining output omitted)
$ curl http://192.168.14.40:9200/_cluster/health\?pretty
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 82,
  "active_shards" : 82,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}

Finally, restore the number of replicas to 1

$ curl -XPUT "http://192.168.14.69:9200/_settings" -d'        
{
  "number_of_replicas" : 1
}'

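The allocation process can be put into a script: treat the index name and shard number as variables, filter their values into a file, and have the script loop over that file and process each index/shard pair.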
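
A rough sketch of such a script follows. It is an illustration under assumptions, not the script I actually ran: the variable names ES_HOST, TARGET_NODE, and DRY_RUN and both function names are hypothetical, the input is expected as one "index shard" pair per line, and the request body mirrors the allocate command used above on version 1.7.3 (later major versions replaced this command). By default it only prints the curl commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical defaults taken from the examples in this post.
ES_HOST="${ES_HOST:-192.168.14.40:9200}"
TARGET_NODE="${TARGET_NODE:-bICASdrQSe2ddNLhHw0Vyw}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = send them

# build_allocate_payload INDEX SHARD NODE
# Print the reroute request body for one shard on a single line.
build_allocate_payload() {
  printf '{"commands":[{"allocate":{"index":"%s","shard":%s,"node":"%s","allow_primary":true}}]}' \
    "$1" "$2" "$3"
}

# reroute_all: read "index shard" pairs from stdin and handle each one.
reroute_all() {
  while read -r index shard; do
    [ -n "$index" ] || continue          # skip blank lines
    payload=$(build_allocate_payload "$index" "$shard" "$TARGET_NODE")
    if [ "$DRY_RUN" = "1" ]; then
      printf '%s\n' "curl -XPOST 'http://$ES_HOST/_cluster/reroute' -d '$payload'"
    else
      # allow_primary can lose data; only run this after reviewing the list.
      curl -s -XPOST "http://$ES_HOST/_cluster/reroute" -d "$payload" </dev/null
      echo
    fi
  done
}
```

With the pairs saved in a file (say unassigned.txt, an illustrative name), reroute_all < unassigned.txt prints the commands, and setting DRY_RUN=0 before the call sends them.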