1. Installing the repository-hdfs plugin

(1) Download the repository-hdfs plugin package from the official Elasticsearch site
(the version matching elasticsearch-5.4.0 is repository-hdfs-5.4.0).


Download page:

https://www.elastic.co/guide/en/elasticsearch/plugins/5.4/repository-hdfs.html


(2) Copy the zip file onto the cluster, cd into the Elasticsearch directory, and run the install:

sudo bin/elasticsearch-plugin install file:///home/huangyan/repository-hdfs-5.4.0.zip
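
To confirm the plugin was installed (a quick check, not part of the original steps), list the installed plugins; repository-hdfs should appear in the output:

bin/elasticsearch-plugin list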

2. Creating a repository on the source cluster

Create the repository on the source cluster:

curl -XPUT 'http://host:9200/_snapshot/my_hdfs_repository?pretty' -d '{
    "type": "hdfs",
    "settings": {
        "uri": "hdfs://host:8020",
        "path": "elasticsearch/repositories/my_hdfs_repository",
        "conf.dfs.client.read.shortcircuit": "false"        
    }
}'

If conf.dfs.client.read.shortcircuit is set to true, HDFS needs some extra configuration. Setting it to true cuts down the number of network round trips and speeds up reads, but if you do not want the extra hassle, leaving it at false is recommended; a sketch of that extra configuration follows.
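
For reference, the extra HDFS-side configuration for short-circuit reads looks roughly like the following in hdfs-site.xml on the DataNodes (a minimal sketch; /var/lib/hadoop-hdfs/dn_socket is an assumed socket path and must exist locally with appropriate ownership):

<property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
</property>
<property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>  <!-- assumed example path -->
</property>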


View the repository that was just created:

curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository?pretty'
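
You can also ask Elasticsearch to verify that all nodes can access the repository (an optional check, not in the original steps):

curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/_verify?pretty'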


 

Delete the repository:
curl -XDELETE 'http://10.45.*:9200/_snapshot/my_hdfs_repository?pretty'

 

3. Backing up an index

Here we back up the history_data_index-00002 index:

curl -XPUT 'http://10.45.157.*:9200/_snapshot/my_hdfs_repository/snapshot_2?wait_for_completion=false&pretty' -d '{
  "indices": "history_data_index-00002",
  "ignore_unavailable": true,
  "include_global_state": false
}'

Parameter explanation:

wait_for_completion=true blocks until the backup finishes.

wait_for_completion=false returns immediately and runs the backup in the background; the following API shows the backup's progress:

curl -XGET '10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_2/_status?pretty'

"ignore_unavailable": true忽略有问题的shard

"include_global_state": false快照里不放入集群global信息

Note:

If running the above command throws a "could not read repository data from index blob" exception, it is a Java security permissions problem.


The fix is to change the configuration as follows:

(1) Edit the plugin-security.policy file and add the following permissions:

  permission javax.security.auth.AuthPermission "getSubject";
  permission javax.security.auth.AuthPermission "doAs";
  permission javax.security.auth.AuthPermission "modifyPrivateCredentials";
  permission java.lang.RuntimePermission "accessDeclaredMembers";
  permission java.lang.RuntimePermission "getClassLoader";
  permission java.lang.RuntimePermission "shutdownHooks";
  permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
  permission java.security.AllPermission;
  permission java.util.PropertyPermission "*", "read,write";
  permission javax.security.auth.PrivateCredentialPermission "org.apache.hadoop.security.Credentials * \"*\"", "read";

 


(2) You also need to manually edit /usr/elk/elasticsearch/config/jvm.options once and add the following line to it:

-Djava.security.policy=/usr/elk/elasticsearch/plugins/repository-hdfs/plugin-security.policy

 


(3) Restart Elasticsearch and run the index backup above again; it should now succeed.
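
If Elasticsearch was started from the tarball under /usr/elk/elasticsearch (an assumption; adapt this if you run it under systemd or another service manager), the restart looks roughly like:

ps -ef | grep elasticsearch                      # find the ES process id
kill <pid>                                       # stop it gracefully
/usr/elk/elasticsearch/bin/elasticsearch -d      # start it again as a daemon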


View a single snapshot's details:
curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_3?pretty'
View all snapshots in the repository:
curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository/_all?pretty'

Delete a snapshot:
curl -XDELETE 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_1_restore?pretty'

4. Restoring a snapshot

curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_2/_restore?pretty' -d '{
  "indices": "history_data_index-00002",
   "index_settings": {
    "index.number_of_replicas": 1
  },
  "ignore_index_settings": [
    "index.refresh_interval"
  ]
}'

When restoring a snapshot, the number of primary shards cannot be changed (the only way to change the shard count is to reindex), but the number of replicas can be re-specified via index.number_of_replicas.

If the cluster already contains an index with the same name as the one being restored, the restored index can be renamed with the rename_pattern and rename_replacement parameters. The command below restores the person_list_data_index_yinchuan index under the name restored_index_yinchuan:

curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_3/_restore?pretty' -d '{
  "indices": "person_list_data_index_yinchuan",
  "ignore_unavailable": "true",
  "include_global_state": false,
  "rename_pattern": "person_list_data_index_(.+)",
  "rename_replacement": "restored_index_$1"
}'


Check the recovery status:
curl -XGET 'http://10.45.*:9200/_recovery/'
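
To narrow this down to just the restored index, the recovery API can also be scoped to a single index (optional):
curl -XGET 'http://10.45.*:9200/history_data_index-00002/_recovery?pretty'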

To restore the snapshot on a different cluster, first register a repository on the target cluster, pointing at the same HDFS path:

curl -XPUT 'http://target-host:9200/_snapshot/my_hdfs_repository?pretty' -d '{
    "type": "hdfs",
    "settings": {
        "uri": "hdfs://source-host:8020",
        "path": "/user/master/elasticsearch/repositories/my_hdfs_repository",
        "conf.dfs.client.read.shortcircuit": "false"
    }
}'

Then run the restore:

curl -XPOST 'http://目标host:9200/_snapshot/my_hdfs_repository/snapshot_2/_restore?pretty' -d '{
  "indices": "history_data_index-00002",
   "index_settings": {
    "index.number_of_replicas": 1
  },
  "ignore_index_settings": [
    "index.refresh_interval"
  ]
}'

If the snapshot was created against an index alias, simply restore the whole snapshot:
curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_4/_restore?pretty'

5. Additional notes

Replacing JARs:
Some JARs under /usr/elk/elasticsearch/plugins/repository-hdfs must be swapped to match the HDFS version. For example, the plugin here ships 2.7.1 JARs while my HDFS is 2.6.0, so they need to be replaced with the 2.6.0 versions, which are available under /usr/cdh/phoenix/lib. The JARs to replace are: hadoop-annotations-2.7.1.jar, hadoop-auth-2.7.1.jar, hadoop-client-2.7.1.jar, hadoop-common-2.7.1.jar, hadoop-hdfs-2.7.1.jar.
You also need to replace htrace-core-3.1.0-incubating.jar with htrace-core4-4.0.1-incubating.jar, otherwise Elasticsearch will not restart successfully. A sketch of the swap follows.
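
A rough sketch of the swap (the 2.6.0 file names below are assumed to follow plain Apache naming; under CDH they often carry a -cdh suffix, so check the actual names in /usr/cdh/phoenix/lib first):

cd /usr/elk/elasticsearch/plugins/repository-hdfs
# move the bundled 2.7.1 JARs and the old htrace JAR out of the way
mkdir -p /tmp/repository-hdfs-jars-backup
mv hadoop-annotations-2.7.1.jar hadoop-auth-2.7.1.jar hadoop-client-2.7.1.jar \
   hadoop-common-2.7.1.jar hadoop-hdfs-2.7.1.jar htrace-core-3.1.0-incubating.jar \
   /tmp/repository-hdfs-jars-backup/
# copy in the 2.6.0 versions shipped with the Phoenix installation
cp /usr/cdh/phoenix/lib/hadoop-annotations-2.6.0.jar \
   /usr/cdh/phoenix/lib/hadoop-auth-2.6.0.jar \
   /usr/cdh/phoenix/lib/hadoop-client-2.6.0.jar \
   /usr/cdh/phoenix/lib/hadoop-common-2.6.0.jar \
   /usr/cdh/phoenix/lib/hadoop-hdfs-2.6.0.jar .

The new htrace-core4 JAR is copied in with the commands below.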

List all available JARs:
cd /opt/cloudera/parcels/CDH/jars/
ls
Copy htrace-core4-4.0.1-incubating.jar into /usr/elk/elasticsearch/plugins/repository-hdfs/:
cp htrace-core4-4.0.1-incubating.jar /usr/elk/elasticsearch/plugins/repository-hdfs/

Inspecting the HDFS paths:
List the subdirectories under the root directory: sudo -u hdfs hadoop fs -ls /
List the subdirectories under /user: sudo -u hdfs hadoop fs -ls /user
When creating the repository, if path is set to "path": "elasticsearch/repositories/my_hdfs_repository",
then the data is actually stored under /user/elasticsearch/elasticsearch/repositories/my_hdfs_repository.
List the snapshots in the repository: sudo -u hdfs hadoop fs -ls /user/elasticsearch/elasticsearch/repositories/my_hdfs_repository

6. Test results

1. Backing up 532,391 documents, 1.52 GB (3.03 GB), took 208,541 ms, about three and a half minutes.

    Restoring the 532,391 documents took about 6.5 s.

2. Backing up 1,578,227 documents, 9.09 GB (18.1 GB), took 1,510,737 ms, about 25 minutes.

    Restoring the 1,578,227 documents took about 105 s.

Overall, snapshot backup is not particularly fast, so migrating indices with reindex is recommended instead; note, however, that Elasticsearch 5.4.0 does not support cross-cluster reindex. A minimal same-cluster reindex is sketched below.
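
For reference, a same-cluster reindex looks like the following (a sketch; history_data_index-00002-copy is a hypothetical destination index that you would create beforehand with the desired number of shards):

curl -XPOST 'http://10.45.*:9200/_reindex?pretty' -d '{
  "source": { "index": "history_data_index-00002" },
  "dest":   { "index": "history_data_index-00002-copy" }
}'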