文章目录

  • Hadoop系列文章目录
  • 一、介绍
  • 1、快照介绍
  • 2、快照作用
  • 3、HDFS快照功能
  • 二、使用及验证
  • 1、快照功能开启
  • 2、快照功能禁用
  • 3、快照操作相关命令
  • 4、HDFS Snapshot使用
  • 1)、开启指定目录快照功能
  • 2)、创建快照
  • 3)、Web页面浏览快照
  • 4)、重命名快照
  • 5)、列出HDFS集群所有开启快照功能的目录
  • 6)、快照间差异比较
  • 7)、比较快照差异
  • 8)、删除快照
  • 9)、删除开启快照功能的目录


一、介绍

1、快照介绍

快照(Snapshot)是数据存储的某一时刻的状态记录;与备份不同,备份(Backup)则是数据存储的某一个时刻的副本。
HDFS Snapshot快照是整个文件系统或某个目录在某个时刻的镜像。
该镜像并不会随着源目录的改变而进行动态的更新。

2、快照作用

  • 数据恢复
    对重要目录进行创建snapshot的操作,当用户误操作时,可以通过snapshot来进行相关的恢复操作
  • 数据备份
    使用snapshot来进行整个集群或者某些目录、文件的备份
    管理员以某个时刻的snapshot作为备份的起始结点,然后通过比较不同备份之间差异性,来进行增量备份
  • 数据测试
    可以临时的为用户针对要操作的数据来创建一个snapshot,然后让用户在对应的snapshot上进行相关的实验和测试

3、HDFS快照功能

  • HDFS快照不是数据的简单拷贝,只做差异的记录
  • 对于大多不变的数据,你所看到的数据其实是当前物理路径所指的内容,而发生变更的inode数据才会被快照额外拷贝,也就是所说的差异拷贝
  • inode指索引节点,用来存放文件及目录的基本信息,包含时间、名称、拥有者、所在组等
  • HDFS快照不会复制datanode中的块,只记录了块列表和文件大小
  • HDFS快照不会对常规HDFS操作产生不利影响,修改记录按逆时针顺序进行,因此可以直接访问当前数据。通过从当前数据中减去修改来计算快照数据

二、使用及验证

1、快照功能开启

HDFS中可以针对整个文件系统或者某个目录创建快照,但是创建快照的前提是相应的目录开启快照的功能。如果针对没有启动快照功能的目录创建快照则会报错。

#启用快照功能:
hdfs dfsadmin -allowSnapshot /testsnapshot

[alanchan@server1 ~]$ hadoop fs -ls /testsnapshot
[alanchan@server1 ~]$ hadoop fs -createSnapshot  /testsnapshot test_snapshot
createSnapshot: Directory is not a snapshottable directory: /testsnapshot
[alanchan@server1 ~]$ hdfs dfsadmin -allowSnapshot /testsnapshot
Allowing snapshot on /testsnapshot succeeded
[alanchan@server1 ~]$ hadoop fs -createSnapshot  /testsnapshot test_snapshot
Created snapshot /testsnapshot/.snapshot/test_snapshot

2、快照功能禁用

HDFS中可以针对已经开启快照功能的目录进行禁用。禁用的前提是该目录的所有快照已经被删除。

#禁用快照功能:
hdfs dfsadmin -disallowSnapshot  /testsnapshot

3、快照操作相关命令

createSnapshot 创建快照
deleteSnapshot 删除快照
renameSnapshot 重命名快照
lsSnapshottableDir 列出可以快照目录列表
snapshotDiff 获取快照差异报告

[alanchan@server1 ~]$ hadoop dfs
WARNING: Use of this script to execute dfs is deprecated.
WARNING: Attempting to execute replacement "hdfs dfs" instead.

Usage: hadoop fs [generic options]
...
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
...
command [genericOptions] [commandOptions]

[alanchan@server1 ~]# hdfs lsSnapshottableDir
[alanchan@server1 ~]# hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>

4、HDFS Snapshot使用

1)、开启指定目录快照功能

[alanchan@server1 ~]$ hdfs dfsadmin -allowSnapshot /testsnapshot
Allowing snapshot on /testsnapshot succeeded

2)、创建快照

#系统自动生成快照名称
hdfs dfs -createSnapshot /testsnapshot
#指定名称创建快照
hdfs dfs -createSnapshot /testsnapshot test_snapshot

[alanchan@server1 ~]$ hadoop fs -createSnapshot  /testsnapshot test_snapshot
Created snapshot /testsnapshot/.snapshot/test_snapshot

13、HDFS Snapshot快照_hadoop

3)、Web页面浏览快照

13、HDFS Snapshot快照_大数据_02

4)、重命名快照

#重命名快照
hdfs dfs -renameSnapshot /testsnapshot test_snapshot rename_test_snapshot 

[alanchan@server1 ~]$ hdfs dfs -renameSnapshot /testsnapshot test_snapshot rename_test_snapshot 
[alanchan@server1 ~]$ hadoop fs -ls  /testsnapshot/.snapshot
Found 1 items
drwxr-xr-x   - alanchan supergroup          0 2022-09-13 09:38 /testsnapshot/.snapshot/rename_test_snapshot

5)、列出HDFS集群所有开启快照功能的目录

#列出HDFS集群所有开启快照功能的目录
hdfs lsSnapshottableDir

[alanchan@server1 ~]$ hdfs lsSnapshottableDir
drwxr-xr-x 0 alanchan supergroup 0 2022-09-13 09:38 1 65536 /testsnapshot

6)、快照间差异比较

准备快照

hdfs dfs -createSnapshot /testsnapshot snapshot1

#1、在当前文件夹下创建一个快照snapshot1
[alanchan@server1 ~]$ hdfs dfs -createSnapshot /testsnapshot snapshot1
Created snapshot /testsnapshot/.snapshot/snapshot1
#2、在当前文件夹下上传一个文件,并再创建一个快照snapshot2
[alanchan@server1 ~]$ cd /usr/local/bigdata/hadoop-3.1.4
[alanchan@server1 hadoop-3.1.4]$ echo 12345 >> 1.txt
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -put 1.txt /testsnapshot
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -ls /testsnapshot
Found 1 items
-rw-r--r--   3 alanchan supergroup          6 2022-09-13 10:05 /testsnapshot/1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs dfs -createSnapshot /testsnapshot snapshot2
Created snapshot /testsnapshot/.snapshot/snapshot2
#3、修改文件内容后,再创建一个快照snapshot3
[alanchan@server1 hadoop-3.1.4]$ echo 54321 >> 2.txt
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -appendToFile 2.txt /testsnapshot/1.txt
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -cat /testsnapshot/1.txt
12345
54321
[alanchan@server1 hadoop-3.1.4]$ hdfs dfs -createSnapshot /testsnapshot snapshot3
Created snapshot /testsnapshot/.snapshot/snapshot3
#4、上传一个文件,再创建一个快照snapshot4
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -put README.txt /testsnapshot
[alanchan@server1 hadoop-3.1.4]$ hdfs dfs -createSnapshot /testsnapshot snapshot4
Created snapshot /testsnapshot/.snapshot/snapshot4
#5、删除一个文件后,再创建一个快照snapshot5
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -rm /testsnapshot/README.txt
2022-09-13 10:09:31,863 INFO fs.TrashPolicyDefault: Moved: 'hdfs://HadoopHAcluster/testsnapshot/README.txt' to trash at: hdfs://HadoopHAcluster/user/alanchan/.Trash/Current/testsnapshot/README.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs dfs -createSnapshot /testsnapshot snapshot5
Created snapshot /testsnapshot/.snapshot/snapshot5
[alanchan@server1 hadoop-3.1.4]$

13、HDFS Snapshot快照_bigdata_03

7)、比较快照差异

+ The file/directory has been created.
-  The file/directory has been deleted.
M  The file/directory has been modified.
R  The file/directory has been renamed.
#比较指定目录两个版本快照之间的差异
hdfs snapshotDiff /testsnapshot snapshot1 snapshot2

[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot1 snapshot2
Difference between snapshot snapshot1 and snapshot snapshot2 under directory /testsnapshot:
M       .
+       ./1.txt

[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot1 snapshot3
Difference between snapshot snapshot1 and snapshot snapshot3 under directory /testsnapshot:
M       .
+       ./1.txt

[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot1 snapshot4
Difference between snapshot snapshot1 and snapshot snapshot4 under directory /testsnapshot:
M       .
+       ./1.txt
+       ./README.txt

[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot1 snapshot5
Difference between snapshot snapshot1 and snapshot snapshot5 under directory /testsnapshot:
M       .
+       ./1.txt

[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot2 snapshot3
Difference between snapshot snapshot2 and snapshot snapshot3 under directory /testsnapshot:
M       ./1.txt

[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot2 snapshot4
Difference between snapshot snapshot2 and snapshot snapshot4 under directory /testsnapshot:
M       .
+       ./README.txt
M       ./1.txt

[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot2 snapshot5
Difference between snapshot snapshot2 and snapshot snapshot5 under directory /testsnapshot:
M       ./1.txt

[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot3 snapshot4
Difference between snapshot snapshot3 and snapshot snapshot4 under directory /testsnapshot:
M       .
+       ./README.txt

[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot3 snapshot5
Difference between snapshot snapshot3 and snapshot snapshot5 under directory /testsnapshot:

[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot4 snapshot5
Difference between snapshot snapshot4 and snapshot snapshot5 under directory /testsnapshot:
M       .
-       ./README.txt

[alanchan@server1 hadoop-3.1.4]$

8)、删除快照

hdfs dfs -deleteSnapshot /testsnapshot rename_test_snapshot

9)、删除开启快照功能的目录

hadoop fs -rm -r /allenwoon
拥有快照的目录不允许被删除,某种程度上也保护了文件安全

[alanchan@server1 hadoop-3.1.4]$ hadoop fs -rm -r /testsnapshot
rm: Failed to move to trash: hdfs://HadoopHAcluster/testsnapshot: The directory /testsnapshot cannot be deleted since /testsnapshot is snapshottable and already has snapshots