Using HDFS as a Disaster Recovery Solution for Elasticsearch 7.2

Contents

  • Using HDFS as a Disaster Recovery Solution for Elasticsearch 7.2
  • Introduction
  • Snapshot version compatibility
  • Backing up the cluster
  • HDFS file system
  • Software download
  • JDK environment
  • Configuring system environment variables
  • Hadoop configuration
  • Configuring JAVA_HOME
  • Configuring the core component file
  • Configuring the file system
  • Configuring mapred
  • Configuring yarn-site.xml
  • Formatting the file system
  • Starting HDFS
  • Access
  • ES plugin installation
  • Plugin download
  • Plugin installation
  • Creating the repository
  • Creating a snapshot
  • Restoring a snapshot
  • Backup and restore timings
  • Snapshot details for this case
  • Snapshot restore details for this case
  • Common problems
  • Starting HDFS
  • Problem 1
  • Problem 2
  • Creating the repository
  • Problem 1
  • References

Introduction

Elasticsearch replicas provide high availability: they let you tolerate the occasional loss of a node without any interruption of service. Replicas do not, however, protect you from catastrophic failure. For that you need a true backup of the cluster, a complete copy to fall back on when something really does go wrong.

This case study simulates an Elasticsearch 7.2 cluster and backs it up with the snapshot API.

HDFS, a distributed file system, serves as the example snapshot repository.

Snapshot version compatibility

Snapshots are not portable across arbitrary versions: a snapshot can be restored into the same or a newer version of Elasticsearch, at most one major version ahead (6.x snapshots into a 7.x cluster, for example), and never into an older version.

Backing up the cluster

HDFS file system

Software download

Download address

hadoop-3.3.0.tar.gz

JDK environment

Hadoop is written in Java, so a JVM is required at runtime.

jdk-8u161-linux-x64.tar.gz

Configuring system environment variables

#JAVA
export JAVA_HOME=/home/hadoop/jdk1.8.0_161
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
#hadoop
export HADOOP_HOME=/home/hadoop/hadoop-3.3.0
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
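
To pick up the variables in the current shell and confirm both toolchains resolve (a quick check, assuming the exports above were appended to the hadoop user's ~/.bash_profile):

$ source ~/.bash_profile
$ java -version        # should report 1.8.0_161
$ hadoop version       # should report Hadoop 3.3.0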

Hadoop configuration

All of the following configuration files live in the hadoop-3.3.0/etc/hadoop directory.

Configuring JAVA_HOME

hadoop-env.sh

export JAVA_HOME=/home/hadoop/jdk1.8.0_161
Configuring the core component file

In core-site.xml, add the following between the <configuration> and </configuration> tags:

<property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.16.176.103:9000</value>
</property>
<property>
        <name>hadoop.tmp.dir</name>
        <value>/data</value>
</property>
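
For reference, the complete file then looks roughly like this (the XML declaration and the configuration wrapper are already present in the stock file):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.16.176.103:9000</value>
</property>
<property>
        <name>hadoop.tmp.dir</name>
        <value>/data</value>
</property>
</configuration>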
Configuring the file system

In hdfs-site.xml, likewise add between the <configuration> and </configuration> tags:

<!-- namenode metadata directory -->
<property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/namenode</value>
</property>
<!-- datanode block storage directory -->
<property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/datanode</value>
</property>
<!-- replication factor; HDFS defaults to 3, 1 is enough for this single-node setup -->
<property>
        <name>dfs.replication</name>
        <value>1</value>
</property>
<!-- disable permission checking so Elasticsearch can write to the repository path -->
<property>
        <name>dfs.permissions</name>
        <value>false</value>
</property>
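
A side note: in Hadoop 3.x, dfs.permissions survives only as a deprecated alias that is still honored through Hadoop's deprecation mapping; the canonical form of the last block would be:

<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>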
Configuring mapred

mapred-site.xml (the snapshot repository itself only needs HDFS; MapReduce and YARN are configured here simply as part of the stock single-node setup):

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Configuring yarn-site.xml

yarn-site.xml

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>elasticsearch01</value>
</property>

Formatting the file system

Run the format exactly once. Re-formatting assigns the NameNode a new cluster ID, and DataNodes initialized under the old ID will refuse to start.

hdfs namenode -format

Starting HDFS

start-dfs.sh

$ start-dfs.sh 
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [host103]
Starting datanodes
Starting secondary namenodes [host103]

Access

http://localhost:9870/ (the NameNode web UI; Hadoop 3.x moved it from port 50070 to 9870)
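
Before wiring up Elasticsearch, a quick sanity check that HDFS accepts writes does no harm, and you can pre-create the /data path used by the snapshot repository later in this case (a sketch; adjust the path to your own repository settings):

$ hdfs dfs -mkdir -p /data
$ hdfs dfs -ls /
$ hdfs dfsadmin -report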

ES plugin installation

The repository-hdfs plugin must be installed on every node in the cluster, and Elasticsearch must be restarted after installation.

Plugin download

The plugin version must match the Elasticsearch version exactly.

Download address

repository-hdfs-7.2.0.zip

Plugin installation

Download the package ahead of time and install offline.

Install on each node of the cluster in turn.

sudo bin/elasticsearch-plugin install file:///path/to/plugin.zip

$ ./elasticsearch-plugin install file:///home/es/repository-hdfs-7.2.0.zip 
-> Downloading file:///home/es/repository-hdfs-7.2.0.zip
[=================================================] 100%   
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission accessClassInPackage.sun.security.krb5
* java.lang.RuntimePermission accessDeclaredMembers
* java.lang.RuntimePermission getClassLoader
* java.lang.RuntimePermission loadLibrary.jaas
* java.lang.RuntimePermission loadLibrary.jaas_nt
* java.lang.RuntimePermission loadLibrary.jaas_unix
* java.lang.RuntimePermission setContextClassLoader
* java.lang.RuntimePermission shutdownHooks
* java.lang.reflect.ReflectPermission suppressAccessChecks
* java.net.SocketPermission * connect,resolve
* java.net.SocketPermission localhost:0 listen,resolve
* java.security.SecurityPermission insertProvider.SaslPlainServer
* java.security.SecurityPermission putProviderProperty.SaslPlainServer
* java.util.PropertyPermission * read,write
* javax.security.auth.AuthPermission doAs
* javax.security.auth.AuthPermission getSubject
* javax.security.auth.AuthPermission modifyPrincipals
* javax.security.auth.AuthPermission modifyPrivateCredentials
* javax.security.auth.AuthPermission modifyPublicCredentials
* javax.security.auth.PrivateCredentialPermission javax.security.auth.kerberos.KerberosTicket * "*" read
* javax.security.auth.PrivateCredentialPermission javax.security.auth.kerberos.KeyTab * "*" read
* javax.security.auth.PrivateCredentialPermission org.apache.hadoop.security.Credentials * "*" read
* javax.security.auth.kerberos.ServicePermission * initiate
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
-> Installed repository-hdfs
$
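
Once every node has been restarted, it is worth confirming the plugin really is present cluster-wide; every node should list repository-hdfs at 7.2.0:

GET _cat/plugins?v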

Creating the repository

  • Create
PUT _snapshot/my_hdfs_repository
{
  "type": "hdfs",	--类型
  "settings": {
    "uri": "hdfs://172.16.176.103:9000/",	--hdfs访问url
    "path": "/data",
    "conf.dfs.client.read.shortcircuit": "false"
  }
}
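Registration verifies the repository on every node by default, and the check can be re-run at any time; all data and master nodes should appear in the response:

POST _snapshot/my_hdfs_repository/_verify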
  • View
GET /_snapshot
{
  "my_hdfs_repository" : {
    "type" : "hdfs",
    "settings" : {
      "path" : "/data",
      "uri" : "hdfs://172.16.176.103:9000/",
      "conf" : {
        "dfs" : {
          "client" : {
            "read" : {
              "shortcircuit" : "false"
            }
          }
        }
      }
    }
  }
}

Creating a snapshot

  • Create a snapshot

By default the request does not wait for the snapshot to complete and returns immediately (wait_for_completion defaults to false).

PUT _snapshot/my_hdfs_repository/snapshot_i_xfjbblxt_cxfw_xfj_d12
{
    "indices": "i_xfjbblxt_cxfw_xfj_d12"
}
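If a blocking call is more convenient, for example in a backup script, the same request can wait for completion instead:

PUT _snapshot/my_hdfs_repository/snapshot_i_xfjbblxt_cxfw_xfj_d12?wait_for_completion=true
{
    "indices": "i_xfjbblxt_cxfw_xfj_d12"
}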
  • Check the snapshot's current state
GET _snapshot/my_hdfs_repository/snapshot_i_xfjbblxt_cxfw_xfj_d12
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_i_xfjbblxt_cxfw_xfj_d12",
      "uuid" : "-BS9XjxvS1Sp6wW_bT02lA",
      "version_id" : 7020099,
      "version" : "7.2.0",
      "indices" : [
        "i_xfjbblxt_cxfw_xfj_d12"
      ],
      "include_global_state" : true,
      "state" : "IN_PROGRESS",	--正在做快照中
      "start_time" : "2020-10-12T14:04:49.425Z",	--开始时间
      "start_time_in_millis" : 1602511489425,
      "end_time" : "1970-01-01T00:00:00.000Z",
      "end_time_in_millis" : 0,
      "duration_in_millis" : -1602511489425,
      "failures" : [ ],
      "shards" : {
        "total" : 0,
        "failed" : 0,
        "successful" : 0
      }
    }
  ]
}
  • Completed state
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_i_xfjbblxt_cxfw_xfj_d12",	--快照名称
      "uuid" : "-BS9XjxvS1Sp6wW_bT02lA",
      "version_id" : 7020099,
      "version" : "7.2.0",
      "indices" : [
        "i_xfjbblxt_cxfw_xfj_d12"	--索引
      ],
      "include_global_state" : true,
      "state" : "SUCCESS",	--快照成功
      "start_time" : "2020-10-12T14:04:49.425Z",	--开始时间
      "start_time_in_millis" : 1602511489425,	--开始时间戳
      "end_time" : "2020-10-12T14:24:33.942Z",	--结束时间
      "end_time_in_millis" : 1602512673942,	--结束时间戳
      "duration_in_millis" : 1184517,	--耗时(毫秒)
      "failures" : [ ],
      "shards" : {
        "total" : 5,	--总分片
        "failed" : 0,
        "successful" : 5	--成功分片
      }
    }
  ]
}
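
While a snapshot is running, the status API reports per-shard progress (files and bytes copied so far), which is more detailed than the snapshot info shown above:

GET _snapshot/my_hdfs_repository/snapshot_i_xfjbblxt_cxfw_xfj_d12/_status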

Restoring a snapshot

If you restore into the original index, the existing index must first be closed or deleted; only then can the snapshot be restored.

  • Restore the snapshot
POST _snapshot/my_hdfs_repository/snapshot_i_xfjbblxt_cxfw_xfj_d12/_restore
{
  "indices": "i_xfjbblxt_cxfw_xfj_d12",	-- index name inside the snapshot
  "rename_pattern": "i_xfjbblxt_cxfw_xfj_d12",	-- pattern matched against the index names
  "rename_replacement": "restored_i_xfjbblxt_cxfw_xfj_d12"	-- new name for the matched indices
}
  • Check status
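
The output below has the shape of the index recovery API's response, so a request along these lines should reproduce it:

GET restored_i_xfjbblxt_cxfw_xfj_d12/_recovery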
{
  "restored_i_xfjbblxt_cxfw_xfj_d12" : {
    "shards" : [
      {
        "id" : 4,
        "type" : "SNAPSHOT",
        "stage" : "INDEX",
        "primary" : true,
        "start_time_in_millis" : 1602571287856,
        "total_time_in_millis" : 1249147,
        "source" : {
          "repository" : "my_hdfs_repository",
          "snapshot" : "snapshot_i_xfjbblxt_cxfw_xfj_d12",
          "version" : "7.2.0",
          "index" : "i_xfjbblxt_cxfw_xfj_d12",
          "restoreUUID" : "KM1EaKsAQkO4OxB0PwKe0Q"
        },
        "target" : {
          "id" : "DWvUrfqQRxGLIWm6SQmunA",
          "host" : "172.16.176.104",
          "transport_address" : "172.16.176.104:9300",
          "ip" : "172.16.176.104",
          "name" : "node-104"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 8312825377,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 6781859331,
            "percent" : "81.6%"
          },
          "files" : {
            "total" : 104,
            "reused" : 0,
            "recovered" : 86,
            "percent" : "82.7%"
          },
          "total_time_in_millis" : 1249039,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 0
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      },
      -- remaining shards omitted

Backup and restore timings

Snapshot details for this case

First snapshot:

| Nodes | Primary shards | Replicas | Doc count | Index size | Snapshot size | Snapshot duration |
| --- | --- | --- | --- | --- | --- | --- |
| 3 | 5 | 1 | 5,149,535 | 77.4gb | 40gb | 19.74 min (1,184,517 ms) |

A snapshot copies only the primary shards, which is why the 40gb snapshot is roughly half of the 77.4gb index total (primaries plus one full set of replicas).

Snapshot restore details for this case

Shards restore in parallel. The primaries come out of the snapshot; the replicas are then rebuilt from the restored primaries via peer recovery, which is why their timings differ from the primaries'.

| Shard | Restore time | Bytes recovered |
| --- | --- | --- |
| 0 (primary) | 27.42 min | 7.75G |
| 1 (primary) | 27.14 min | 7.72G |
| 2 (primary) | 27.45 min | 7.75G |
| 3 (primary) | 25.89 min | 7.74G |
| 4 (primary) | 25.5 min | 7.74G |
| 0 (replica) | 18.65 min | 7.75G |
| 1 (replica) | 10.3 min | 7.72G |
| 2 (replica) | 17.21 min | 7.75G |
| 3 (replica) | 10.6 min | 7.74G |
| 4 (replica) | 18.32 min | 7.74G |

Common problems

Starting HDFS

Problem 1

$ start-dfs.sh 
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [host103]
Last login: Sun Oct 11 22:32:11 CST 2020 from 172.16.176.46 on pts/1
host103: ERROR: JAVA_HOME is not set and could not be found.
Starting datanodes
Last login: Sun Oct 11 22:32:23 CST 2020 on pts/1
localhost: ERROR: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [host103]
Last login: Sun Oct 11 22:32:24 CST 2020 on pts/1
host103: ERROR: JAVA_HOME is not set and could not be found.
  • Solution

Configure the Java environment variables. Note that start-dfs.sh launches the daemons over SSH, and non-interactive SSH sessions do not source the login profile, so setting JAVA_HOME in hadoop-env.sh (as shown earlier) is what reliably reaches the daemon processes.

export JAVA_HOME=/home/hadoop/jdk1.8.0_161
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

Problem 2

$ start-dfs.sh 
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [host103]
host103: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Starting datanodes
localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Starting secondary namenodes [host103]
host103: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
  • Solution

Set up passwordless SSH as the hadoop user (generate a key pair with ssh-keygen first if ~/.ssh/id_rsa.pub does not exist yet):

[hadoop@host103 ~]$ ssh-copy-id hadoop@host103
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@host103's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop@host103'"
and check to make sure that only the key(s) you wanted were added.

Creating the repository

Problem 1

  • Create
PUT _snapshot/my_hdfs_repository
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://172.16.176.103:9000/",
    "path": "/",
    "conf.dfs.client.read.shortcircuit": "false"
  }
}
  • Error
"error": {
    "root_cause": [
      {
        "type": "repository_exception",
        "reason": "[my_hdfs_repository] cannot create blob store"
      }
    ],
    "type": "repository_exception",
    "reason": "[my_hdfs_repository] cannot create blob store",
    "caused_by": {
      "type": "unchecked_i_o_exception",
      "reason": "Cannot create HDFS repository for uri [hdfs://172.16.176.103:9000/]",
      "caused_by": {
        "type": "access_control_exception",
        "reason": "Permission denied: user=es, access=WRITE, inode=\"/\":hadoop:supergroup:drwxr-xr-x\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:336)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:360)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:239)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1909)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1893)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1852)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3407)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1161)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:739)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:532)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2952)\n",
  • Solution

Add the following to hdfs-site.xml, then restart HDFS:

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
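
Disabling permission checking cluster-wide is the blunt fix. A narrower alternative, sketched here and not tested in this case, is to keep permissions on and instead hand the repository path over to the user Elasticsearch runs as (user=es in the error above):

$ hdfs dfs -mkdir -p /data
$ hdfs dfs -chown -R es /data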

References

  • HDFS plugin

https://www.elastic.co/guide/en/elasticsearch/plugins/7.2/repository-hdfs.html

  • HDFS SingleCluster

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html