Disaster Recovery for Elasticsearch 7.2 Using HDFS
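Contents
- Disaster Recovery for Elasticsearch 7.2 Using HDFS
- Introduction
- Snapshot version compatibility
- Backing up the cluster
- HDFS file system
- Software download
- JDK environment
- Configure system environment variables
- Hadoop configuration
- Set JAVA_HOME
- Configure the core component file
- Configure the file system
- Configure mapred
- Configure yarn-site.xml
- Format the file system
- Start HDFS
- Access
- Installing the ES plugin
- Plugin download
- Plugin installation
- Creating the repository
- Creating a snapshot
- Restoring a snapshot
- Backup and restore timings
- Snapshot details for this case
- Snapshot restore details
- Common problems
- Starting HDFS
- Problem 1
- Problem 2
- Creating the repository
- Problem 1
- References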
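Introduction
Elasticsearch replicas provide high availability: they let you tolerate the occasional loss of a node without interrupting service. Replicas do not, however, protect against catastrophic failure. For that you need a real backup of the cluster, a complete copy to fall back on when something does go wrong.
This walkthrough simulates an Elasticsearch 7.2 cluster and backs it up with the snapshot API.
An HDFS distributed file system serves as the snapshot repository.
Snapshot version compatibility
Backing up the cluster
HDFS file system
Software download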
hadoop-3.3.0.tar.gz
JDK environment
Hadoop is written in Java and needs a JVM to run.
jdk-8u161-linux-x64.tar.gz
Configure system environment variables
#JAVA
export JAVA_HOME=/home/hadoop/jdk1.8.0_161
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
#hadoop
export HADOOP_HOME=/home/hadoop/hadoop-3.3.0
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Hadoop configuration
The configuration files live under hadoop-3.3.0/etc/hadoop.
Set JAVA_HOME
hadoop-env.sh
export JAVA_HOME=/home/hadoop/jdk1.8.0_161
Configure the core component file
In core-site.xml, add the following between <configuration> and </configuration>:
<property>
<name>fs.defaultFS</name>
<value>hdfs://172.16.176.103:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data</value>
</property>
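Configure the file system
In hdfs-site.xml, add the following between <configuration> and </configuration>:
<!-- NameNode metadata directory -->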
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/namenode</value>
</property>
<!--datanode-->
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/datanode</value>
</property>
<!-- replication factor; HDFS defaults to 3, and 1 suits a single DataNode -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
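<!-- disable permission checking so the es user can write (dfs.permissions is the older name of dfs.permissions.enabled) -->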
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
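One way to double-check a hand-edited file is to parse it and list the properties it defines. The sketch below is not part of the original setup; it uses only Python's standard library, and the SAMPLE string simply repeats two of the properties above wrapped in the <configuration> element.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Sample input: two properties copied from the hdfs-site.xml fragment above.
SAMPLE = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in read_hadoop_properties(SAMPLE).items():
        print(f"{name} = {value}")
```

For a file on disk, reading its text and passing it to read_hadoop_properties is enough; a malformed file raises xml.etree.ElementTree.ParseError, which is usually the quickest hint that a tag was left unclosed.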
Configure mapred
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Configure yarn-site.xml
yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>elasticsearch01</value>
</property>
Format the file system
hdfs namenode -format
Start HDFS
start-dfs.sh
$ start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [host103]
Starting datanodes
Starting secondary namenodes [host103]
Access
http://localhost:9870/
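Besides the web UI, it can help to confirm that the NameNode RPC address from fs.defaultFS is reachable from the Elasticsearch hosts before registering a repository. The following is a minimal sketch using only the standard library; the function names are made up for illustration, and the address is the one configured in core-site.xml above.

```python
import socket
from urllib.parse import urlparse


def hdfs_endpoint(uri):
    """Split an hdfs://host:port URI into (host, port)."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs" or parsed.hostname is None or parsed.port is None:
        raise ValueError(f"expected hdfs://host:port, got {uri!r}")
    return parsed.hostname, parsed.port


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = hdfs_endpoint("hdfs://172.16.176.103:9000")
    print(f"NameNode RPC endpoint: {host}:{port}")
    # On an Elasticsearch node, port_is_open(host, port) reports reachability.
```

A successful TCP connection only shows that something is listening on the port; it does not prove that HDFS accepts writes from the es user.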
Installing the ES plugin
Every node in the cluster must have the HDFS plugin installed, and ES must be restarted after installation.
Plugin download
The plugin version must match the ES version.
repository-hdfs-7.2.0.zip
Plugin installation
Download the package in advance and install it offline.
Install it on each node of the cluster in turn.
sudo bin/elasticsearch-plugin install file:///path/to/plugin.zip
$ ./elasticsearch-plugin install file:///home/es/repository-hdfs-7.2.0.zip
-> Downloading file:///home/es/repository-hdfs-7.2.0.zip
[=================================================] 100%
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: plugin requires additional permissions @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission accessClassInPackage.sun.security.krb5
* java.lang.RuntimePermission accessDeclaredMembers
* java.lang.RuntimePermission getClassLoader
* java.lang.RuntimePermission loadLibrary.jaas
* java.lang.RuntimePermission loadLibrary.jaas_nt
* java.lang.RuntimePermission loadLibrary.jaas_unix
* java.lang.RuntimePermission setContextClassLoader
* java.lang.RuntimePermission shutdownHooks
* java.lang.reflect.ReflectPermission suppressAccessChecks
* java.net.SocketPermission * connect,resolve
* java.net.SocketPermission localhost:0 listen,resolve
* java.security.SecurityPermission insertProvider.SaslPlainServer
* java.security.SecurityPermission putProviderProperty.SaslPlainServer
* java.util.PropertyPermission * read,write
* javax.security.auth.AuthPermission doAs
* javax.security.auth.AuthPermission getSubject
* javax.security.auth.AuthPermission modifyPrincipals
* javax.security.auth.AuthPermission modifyPrivateCredentials
* javax.security.auth.AuthPermission modifyPublicCredentials
* javax.security.auth.PrivateCredentialPermission javax.security.auth.kerberos.KerberosTicket * "*" read
* javax.security.auth.PrivateCredentialPermission javax.security.auth.kerberos.KeyTab * "*" read
* javax.security.auth.PrivateCredentialPermission org.apache.hadoop.security.Credentials * "*" read
* javax.security.auth.kerberos.ServicePermission * initiate
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.
Continue with installation? [y/N]y
-> Installed repository-hdfs
$
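Because every node needs the plugin, a quick check after the restarts is to compare the output of GET _cat/plugins against the list of nodes. The sketch below assumes the default three-column text output (node name, component, version); apart from node-104, which appears later in this walkthrough, the node names in the usage example are hypothetical.

```python
def nodes_missing_plugin(cat_plugins_text, expected_nodes, plugin="repository-hdfs"):
    """Return the sorted expected nodes that do not list `plugin`.

    cat_plugins_text holds whitespace-separated rows of the form
    "node-name component version", one row per installed plugin.
    """
    have = set()
    for line in cat_plugins_text.splitlines():
        cols = line.split()
        if len(cols) >= 2 and cols[1] == plugin:
            have.add(cols[0])
    return sorted(set(expected_nodes) - have)


if __name__ == "__main__":
    # Hypothetical output in which the third node was skipped.
    sample = "node-103 repository-hdfs 7.2.0\nnode-104 repository-hdfs 7.2.0\n"
    print(nodes_missing_plugin(sample, ["node-103", "node-104", "node-105"]))
```

Any node the function returns still needs the install step and a restart.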
Creating the repository
- Create
In the request below, type selects the repository type and uri is the HDFS access URI.
PUT _snapshot/my_hdfs_repository
{
"type": "hdfs",
"settings": {
"uri": "hdfs://172.16.176.103:9000/",
"path": "/data",
"conf.dfs.client.read.shortcircuit": "false"
}
}
- View
GET /_snapshot
{
"my_hdfs_repository" : {
"type" : "hdfs",
"settings" : {
"path" : "/data",
"uri" : "hdfs://172.16.176.103:9000/",
"conf" : {
"dfs" : {
"client" : {
"read" : {
"shortcircuit" : "false"
}
}
}
}
}
}
}
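The same registration can be scripted outside the console. The following is a minimal sketch with Python's standard library; it only builds the request, and the Elasticsearch address http://localhost:9200 is an assumption that should be replaced with a real node address. The function names are illustrative.

```python
import json
import urllib.request


def hdfs_repository_body(uri, path):
    """Build the JSON body for PUT _snapshot/<name> with an HDFS repository."""
    return {
        "type": "hdfs",
        "settings": {
            "uri": uri,
            "path": path,
            "conf.dfs.client.read.shortcircuit": "false",
        },
    }


def build_put_request(es_url, repo_name, body):
    """Prepare (but do not send) the HTTP request that registers the repository."""
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    body = hdfs_repository_body("hdfs://172.16.176.103:9000/", "/data")
    # Assumed Elasticsearch address; adjust to one of the cluster nodes.
    req = build_put_request("http://localhost:9200", "my_hdfs_repository", body)
    print(req.get_method(), req.full_url)
    print(json.dumps(body, indent=2))
    # To send it for real: urllib.request.urlopen(req)
```

Keeping the body in a function makes it easy to register the same repository on a second cluster that will be used for restores.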
Creating a snapshot
- Create a snapshot
The request returns immediately without waiting for the snapshot to finish.
PUT _snapshot/my_hdfs_repository/snapshot_i_xfjbblxt_cxfw_xfj_d12
{
"indices": "i_xfjbblxt_cxfw_xfj_d12"
}
- Check the current snapshot status
GET _snapshot/my_hdfs_repository/snapshot_i_xfjbblxt_cxfw_xfj_d12
{
"snapshots" : [
{
"snapshot" : "snapshot_i_xfjbblxt_cxfw_xfj_d12",
"uuid" : "-BS9XjxvS1Sp6wW_bT02lA",
"version_id" : 7020099,
"version" : "7.2.0",
"indices" : [
"i_xfjbblxt_cxfw_xfj_d12"
],
"include_global_state" : true,
"state" : "IN_PROGRESS", -- snapshot still running
"start_time" : "2020-10-12T14:04:49.425Z", -- start time
"start_time_in_millis" : 1602511489425,
"end_time" : "1970-01-01T00:00:00.000Z",
"end_time_in_millis" : 0,
"duration_in_millis" : -1602511489425,
"failures" : [ ],
"shards" : {
"total" : 0,
"failed" : 0,
"successful" : 0
}
}
]
}
- Completed state
{
"snapshots" : [
{
"snapshot" : "snapshot_i_xfjbblxt_cxfw_xfj_d12", -- snapshot name
"uuid" : "-BS9XjxvS1Sp6wW_bT02lA",
"version_id" : 7020099,
"version" : "7.2.0",
"indices" : [
"i_xfjbblxt_cxfw_xfj_d12" -- index
],
"include_global_state" : true,
"state" : "SUCCESS", -- snapshot succeeded
"start_time" : "2020-10-12T14:04:49.425Z", -- start time
"start_time_in_millis" : 1602511489425, -- start timestamp
"end_time" : "2020-10-12T14:24:33.942Z", -- end time
"end_time_in_millis" : 1602512673942, -- end timestamp
"duration_in_millis" : 1184517, -- duration in milliseconds
"failures" : [ ],
"shards" : {
"total" : 5, -- total shards
"failed" : 0,
"successful" : 5 -- successful shards
}
}
]
}
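The two responses above differ mainly in state and in the end-time fields. While a snapshot is running, end_time_in_millis is 0, which is why duration_in_millis comes out negative. A small sketch of how a script might summarise one entry of the snapshots array follows; the function name is illustrative, and the input values are the ones shown above.

```python
def snapshot_summary(info):
    """Summarise one entry of the "snapshots" array from GET _snapshot/<repo>/<name>."""
    state = info["state"]
    if state == "IN_PROGRESS":
        # end_time_in_millis is 0 while the snapshot runs, so the reported
        # duration_in_millis (end minus start) is negative and is ignored here.
        return f"{info['snapshot']}: {state}"
    minutes = (info["end_time_in_millis"] - info["start_time_in_millis"]) / 60000
    shards = info["shards"]
    return (f"{info['snapshot']}: {state} in {minutes:.2f} min, "
            f"{shards['successful']}/{shards['total']} shards")


if __name__ == "__main__":
    finished = {
        "snapshot": "snapshot_i_xfjbblxt_cxfw_xfj_d12",
        "state": "SUCCESS",
        "start_time_in_millis": 1602511489425,
        "end_time_in_millis": 1602512673942,
        "shards": {"total": 5, "failed": 0, "successful": 5},
    }
    print(snapshot_summary(finished))
```

Polling the status request until state is no longer IN_PROGRESS and then printing this summary gives a simple completion check for a scheduled backup.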
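Restoring a snapshot
To restore a snapshot into the original index, first close or delete that index, then run the restore.
- Restore the snapshot
In the request below, indices names the index stored in the snapshot, rename_pattern is the regular expression matched against that name, and rename_replacement is the name given to the restored index.
POST _snapshot/my_hdfs_repository/snapshot_i_xfjbblxt_cxfw_xfj_d12/_restore
{
"indices": "i_xfjbblxt_cxfw_xfj_d12",
"rename_pattern": "i_xfjbblxt_cxfw_xfj_d12",
"rename_replacement": "restored_i_xfjbblxt_cxfw_xfj_d12"
}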
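- Check status (the output has the shape returned by the index recovery API, for example GET restored_i_xfjbblxt_cxfw_xfj_d12/_recovery)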
{
"restored_i_xfjbblxt_cxfw_xfj_d12" : {
"shards" : [
{
"id" : 4,
"type" : "SNAPSHOT",
"stage" : "INDEX",
"primary" : true,
"start_time_in_millis" : 1602571287856,
"total_time_in_millis" : 1249147,
"source" : {
"repository" : "my_hdfs_repository",
"snapshot" : "snapshot_i_xfjbblxt_cxfw_xfj_d12",
"version" : "7.2.0",
"index" : "i_xfjbblxt_cxfw_xfj_d12",
"restoreUUID" : "KM1EaKsAQkO4OxB0PwKe0Q"
},
"target" : {
"id" : "DWvUrfqQRxGLIWm6SQmunA",
"host" : "172.16.176.104",
"transport_address" : "172.16.176.104:9300",
"ip" : "172.16.176.104",
"name" : "node-104"
},
"index" : {
"size" : {
"total_in_bytes" : 8312825377,
"reused_in_bytes" : 0,
"recovered_in_bytes" : 6781859331,
"percent" : "81.6%"
},
"files" : {
"total" : 104,
"reused" : 0,
"recovered" : 86,
"percent" : "82.7%"
},
"total_time_in_millis" : 1249039,
"source_throttle_time_in_millis" : 0,
"target_throttle_time_in_millis" : 0
},
"translog" : {
"recovered" : 0,
"total" : 0,
"percent" : "100.0%",
"total_on_start" : 0,
"total_time_in_millis" : 0
},
"verify_index" : {
"check_index_time_in_millis" : 0,
"total_time_in_millis" : 0
}
},
-- remainder omitted
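Backup and restore timings
Snapshot details for this case
First snapshot
Nodes | Primary shards | Replicas | Documents | Index size | Snapshot size | Snapshot duration |
3 | 5 | 1 | 5149535 | 77.4gb | 40gb | 19.74195 min |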
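Snapshot restore details
Shards are restored in parallel.
Shard | Restore duration | Bytes recovered |
0 (primary) | 27.42 min | 7.75G |
1 (primary) | 27.14 min | 7.72G |
2 (primary) | 27.45 min | 7.75G |
3 (primary) | 25.89 min | 7.74G |
4 (primary) | 25.5 min | 7.74G |
0 (replica) | 18.65 min | 7.75G |
1 (replica) | 10.3 min | 7.72G |
2 (replica) | 17.21 min | 7.75G |
3 (replica) | 10.6 min | 7.74G |
4 (replica) | 18.32 min | 7.74G |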
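To compare these timings with other environments, it helps to turn them into an average throughput per shard. The sketch below uses the byte count reported for primary shard 4 in the recovery output earlier (total_in_bytes 8312825377) and its restore time from the table; treating the table's G as GiB is an assumption, though it agrees with that byte count.

```python
def bytes_to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


def restore_throughput_mib_s(total_bytes, minutes):
    """Average restore throughput in MiB/s for one shard."""
    return total_bytes / 2**20 / (minutes * 60)


if __name__ == "__main__":
    shard_bytes = 8312825377   # total_in_bytes of primary shard 4
    shard_minutes = 25.5       # restore time of primary shard 4
    print(f"{bytes_to_gib(shard_bytes):.2f} GiB")
    print(f"{restore_throughput_mib_s(shard_bytes, shard_minutes):.2f} MiB/s")
```

Since the shards restore in parallel, the cluster-wide rate is roughly the sum of the per-shard rates while all of them are active.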
Common problems
Starting HDFS
Problem 1
$ start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [host103]
Last login: Sun Oct 11 22:32:11 CST 2020 from 172.16.176.46 on pts/1
host103: ERROR: JAVA_HOME is not set and could not be found.
Starting datanodes
Last login: Sun Oct 11 22:32:23 CST 2020 on pts/1
localhost: ERROR: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [host103]
Last login: Sun Oct 11 22:32:24 CST 2020 on pts/1
host103: ERROR: JAVA_HOME is not set and could not be found.
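- Solution
Configure the Java environment variables. start-dfs.sh launches each daemon over SSH, which may not load the login profile, so JAVA_HOME should also be set in hadoop-env.sh, as in the hadoop-env.sh step above.
export JAVA_HOME=/home/hadoop/jdk1.8.0_161
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH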
Problem 2
$ start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [host103]
host103: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Starting datanodes
localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Starting secondary namenodes [host103]
host103: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
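- Solution
start-dfs.sh logs in to each host over SSH, so the hadoop user needs passwordless SSH to host103. Run the following as the hadoop user: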
[hadoop@host103 ~]$ ssh-copy-id hadoop@host103
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@host103's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@host103'"
and check to make sure that only the key(s) you wanted were added.
Creating the repository
Problem 1
- Create
PUT _snapshot/my_hdfs_repository
{
"type": "hdfs",
"settings": {
"uri": "hdfs://172.16.176.103:9000/",
"path": "/",
"conf.dfs.client.read.shortcircuit": "false"
}
}
- Error
"error": {
"root_cause": [
{
"type": "repository_exception",
"reason": "[my_hdfs_repository] cannot create blob store"
}
],
"type": "repository_exception",
"reason": "[my_hdfs_repository] cannot create blob store",
"caused_by": {
"type": "unchecked_i_o_exception",
"reason": "Cannot create HDFS repository for uri [hdfs://172.16.176.103:9000/]",
"caused_by": {
"type": "access_control_exception",
"reason": "Permission denied: user=es, access=WRITE, inode=\"/\":hadoop:supergroup:drwxr-xr-x\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:336)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:360)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:239)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1909)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1893)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1852)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3407)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1161)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:739)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:532)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2952)\n",
- Solution
Add the following to hdfs-site.xml, then restart HDFS so the NameNode picks it up:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
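The reason field of the error already says why the request was refused: the es user asked for WRITE on "/", which belongs to hadoop:supergroup with mode drwxr-xr-x. The sketch below pulls those parts out of the message and shows which permission bits were applied. It is illustrative only, and it assumes the user is not a member of the owning group, because group membership is not visible in the message.

```python
import re

DENIED = re.compile(
    r'user=(?P<user>\w+), access=(?P<access>\w+), '
    r'inode="(?P<path>[^"]*)":(?P<owner>\w+):(?P<group>\w+):(?P<mode>[d-][rwxt-]{9})'
)


def explain_denial(reason):
    """Explain an HDFS "Permission denied" reason using the owner/other bits."""
    m = DENIED.search(reason)
    if m is None:
        raise ValueError("not an HDFS permission-denied message")
    mode = m["mode"][1:]  # drop the leading file-type character
    if m["user"] == m["owner"]:
        who, bits = "owner", mode[0:3]
    else:
        # Assumption: the user is not in the owning group, so the
        # "other" bits are the ones that apply.
        who, bits = "other", mode[6:9]
    return (f"{m['user']} is checked as {who} on {m['path']} "
            f"with bits {bits}; {m['access']} denied")


if __name__ == "__main__":
    reason = ('Permission denied: user=es, access=WRITE, '
              'inode="/":hadoop:supergroup:drwxr-xr-x')
    print(explain_denial(reason))
```

The "other" bits r-x contain no w, which is why creating the repository path failed until permission checking was disabled.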
References
- HDFS plugin
https://www.elastic.co/guide/en/elasticsearch/plugins/7.2/repository-hdfs.html
- HDFS SingleCluster
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html