1. Installing repository-hdfs
(1) Download the repository-hdfs plugin package from the Elasticsearch site (Elasticsearch 5.4.0 pairs with repository-hdfs-5.4.0).
Download page:
https://www.elastic.co/guide/en/elasticsearch/plugins/5.4/repository-hdfs.html
(2) Copy the archive onto the cluster, change into the Elasticsearch directory, and run the install:
sudo bin/elasticsearch-plugin install file:///home/huangyan/repository-hdfs-5.4.0.zip
2. Creating a repository on the source cluster
Create the repository:
curl -XPUT 'http://host:9200/_snapshot/my_hdfs_repository?pretty' -d '{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://host:8020",
    "path": "elasticsearch/repositories/my_hdfs_repository",
    "conf.dfs.client.read.shortcircuit": "false"
  }
}'
If conf.dfs.client.read.shortcircuit is set to true, HDFS needs some extra configuration; enabling it cuts down on round trips and speeds reads up, but if you would rather not bother, leaving it set to false is recommended.
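If you do want the speedup, enabling short-circuit reads roughly amounts to adding properties like the following to hdfs-site.xml on the datanodes. This is only a sketch: the socket path is an example, the directory must exist and be owned by the HDFS user, and short-circuit reads also require the native libhadoop library.

```xml
<!-- Sketch: enable HDFS short-circuit local reads (hdfs-site.xml).
     dfs.domain.socket.path is an example value. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```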
Check the repository that was just created:
curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository?pretty'
Delete the repository:
curl -XDELETE 'http://10.45.*:9200/_snapshot/my_hdfs_repository?pretty'
3. Backing up an index
Back up the history_data_index-00002 index:
curl -XPUT 'http://10.45.157.*:9200/_snapshot/my_hdfs_repository/snapshot_2?wait_for_completion=false&pretty' -d '{
  "indices": "history_data_index-00002",
  "ignore_unavailable": true,
  "include_global_state": false
}'
Parameter notes:
wait_for_completion=true blocks until the snapshot finishes.
wait_for_completion=false returns immediately and the snapshot runs in the background; its progress can be checked with:
curl -XGET '10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_2/_status?pretty'
"ignore_unavailable": true skips any requested indices that are unavailable
"include_global_state": false keeps the cluster's global state out of the snapshot
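With wait_for_completion=false, a script can poll the _status endpoint and extract the state field. A minimal sketch, with a sample (abridged) response inlined; in practice the JSON would come from the curl _status call above:

```shell
# Sketch: pull the snapshot state out of a _status response.
# The sample response below is inlined for illustration only.
response='{"snapshots":[{"snapshot":"snapshot_2","state":"SUCCESS"}]}'
state=$(echo "$response" | grep -o '"state" *: *"[A-Z_]*"' | head -1 | cut -d'"' -f4)
echo "$state"   # prints: SUCCESS
```

A state of IN_PROGRESS means the snapshot is still running; SUCCESS or PARTIAL means it has finished.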
Note:
If the command above fails with a could not read repository data from index blob exception, it is a Java security-permission problem. Adjust the configuration as follows:
(1) Edit the plugin-security.policy file and add:
permission javax.security.auth.AuthPermission "getSubject";
permission javax.security.auth.AuthPermission "doAs";
permission javax.security.auth.AuthPermission "modifyPrivateCredentials";
permission java.lang.RuntimePermission "accessDeclaredMembers";
permission java.lang.RuntimePermission "getClassLoader";
permission java.lang.RuntimePermission "shutdownHooks";
permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
permission java.security.AllPermission;
permission java.util.PropertyPermission "*", "read,write";
permission javax.security.auth.PrivateCredentialPermission "org.apache.hadoop.security.Credentials * \"*\"", "read";
(2) Also edit /usr/elk/elasticsearch/config/jvm.options by hand, adding the following line:
-Djava.security.policy=/usr/elk/elasticsearch/plugins/repository-hdfs/plugin-security.policy
(3) Restart Elasticsearch; re-running the index backup above should now succeed.
Check one snapshot:
curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_3?pretty'
List all snapshots:
curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository/_all?pretty'
Delete a snapshot:
curl -XDELETE 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_1_restore?pretty'
4. Restoring a snapshot
curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_2/_restore?pretty' -d '{
  "indices": "history_data_index-00002",
  "index_settings": {
    "index.number_of_replicas": 1
  },
  "ignore_index_settings": [
    "index.refresh_interval"
  ]
}'
The number of shards cannot be changed when restoring a snapshot (changing the shard count requires a re-index), but the number of replicas can be overridden via index.number_of_replicas.
If the cluster already contains an index with the same name as the one being restored, the "rename_pattern" and "rename_replacement" parameters can rename the index on restore; the command below restores person_list_data_index_yinchuan under the name restored_index_yinchuan:
curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_3/_restore?pretty' -d '{
  "indices": "person_list_data_index_yinchuan",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "person_list_data_index_(.+)",
  "rename_replacement": "restored_index_$1"
}'
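The rename is an ordinary regex substitution, so a pattern/replacement pair can be previewed locally before running the restore. A small sketch with sed (note that Elasticsearch itself uses Java regex syntax, where the replacement references capture groups as $1; sed -E uses \1):

```shell
# Preview what rename_pattern/rename_replacement will produce.
# Elasticsearch's $1 corresponds to sed's \1 here.
echo "person_list_data_index_yinchuan" \
  | sed -E 's/person_list_data_index_(.+)/restored_index_\1/'
# prints: restored_index_yinchuan
```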
Check the recovery status:
curl -XGET 'http://10.45.*:9200/_recovery/'
To restore a snapshot on a different cluster, first register the repository on the target cluster:
curl -XPUT 'http://target-host:9200/_snapshot/my_hdfs_repository?pretty' -d '{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://source-host:8020",
    "path": "/user/master/elasticsearch/repositories/my_hdfs_repository",
    "conf.dfs.client.read.shortcircuit": "false"
  }
}'
Then restore:
curl -XPOST 'http://target-host:9200/_snapshot/my_hdfs_repository/snapshot_2/_restore?pretty' -d '{
  "indices": "history_data_index-00002",
  "index_settings": {
    "index.number_of_replicas": 1
  },
  "ignore_index_settings": [
    "index.refresh_interval"
  ]
}'
If the snapshot was created against an index alias, simply restore everything:
curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_4/_restore?pretty'
5. Additional notes
Swapping jars:
Some jars under /usr/elk/elasticsearch/plugins/repository-hdfs must be replaced with versions matching HDFS; for example, the plugin here ships 2.7.1 jars while the cluster runs Hadoop 2.6.0, so they have to be swapped for 2.6.0 versions.
The 2.6.0 versions are available under /usr/cdh/phoenix/lib; the jars to replace are hadoop-annotations-2.7.1.jar, hadoop-auth-2.7.1.jar, hadoop-client-2.7.1.jar, hadoop-common-2.7.1.jar, and hadoop-hdfs-2.7.1.jar.
htrace-core-3.1.0-incubating.jar also has to be replaced with htrace-core4-4.0.1-incubating.jar before Elasticsearch will restart successfully.
List all available jars:
cd /opt/cloudera/parcels/CDH/jars/
ls
Copy htrace-core4-4.0.1-incubating.jar into /usr/elk/elasticsearch/plugins/repository-hdfs/:
cp htrace-core4-4.0.1-incubating.jar /usr/elk/elasticsearch/plugins/repository-hdfs/
Inspecting paths on HDFS:
List the directories under the root: sudo -u hdfs hadoop fs -ls /
List the directories under /user: sudo -u hdfs hadoop fs -ls /user
When the repository is created with "path": "elasticsearch/repositories/my_hdfs_repository" (a relative path), the data is stored under /user/elasticsearch/elasticsearch/repositories/my_hdfs_repository.
List the snapshots in the repository: sudo -u hdfs hadoop fs -ls /user/elasticsearch/elasticsearch/repositories/my_hdfs_repository
6. Test results
1. Backing up 532,391 documents at 1.52 GB (3.03 GB) took 208,541 ms, roughly three and a half minutes;
restoring the 532,391 documents took about 6.5 s.
2. Backing up 1,578,227 documents at 9.09 GB (18.1 GB) took 1,510,737 ms, roughly 25 minutes;
restoring the 1,578,227 documents took about 105 s.
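For a rough sense of scale, the backup throughput implied by the two runs above can be computed directly (assuming the first quoted size in each run is what was transferred and taking 1 GB = 1024 MB):

```shell
# Rough backup throughput from the two runs above
# (assumption: the first quoted size is the amount transferred).
awk 'BEGIN { printf "%.1f MB/s\n", 1.52 * 1024 / 208.541 }'   # prints: 7.5 MB/s
awk 'BEGIN { printf "%.1f MB/s\n", 9.09 * 1024 / 1510.737 }'  # prints: 6.2 MB/s
```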
Overall, snapshot backup is not particularly fast, so migrating the index with reindex directly is recommended instead; note, however, that Elasticsearch 5.4.0 does not support cross-cluster reindex.