http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html
Overview
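HDFS snapshots are read-only, point-in-time copies of the file system. A snapshot can be taken of a subtree of the file system or of the entire file system. Snapshots are commonly used for data backup, protection against user errors, and disaster recovery.
HDFS snapshots are implemented efficiently:
Snapshot creation is "instantaneous": excluding the inode lookup time, the cost is O(1).
Additional memory is used only when modifications are made relative to a snapshot: memory usage is O(M), where M is the number of modified files or directories.
Blocks on the DataNodes are not copied: a snapshot records only the block list and the file size.
Snapshots do not adversely affect regular HDFS operations: modifications are recorded in reverse chronological order, so the current data can be read directly. Snapshot data is computed by subtracting the modifications from the current data.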
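The last two points can be illustrated with a toy model. This is a simplification under stated assumptions, not HDFS code: it tracks only the child names of a single directory (no inodes, blocks, or file contents), and the class `ToyDir` and its methods are hypothetical. Live data is kept as-is, each snapshot records only the changes made after it, and a snapshot view is rebuilt by undoing those changes newest-first.

```python
class ToyDir:
    """Hypothetical model: one directory's child names plus per-snapshot change logs."""

    def __init__(self, children):
        self.current = set(children)  # live data, always read directly
        self.diffs = []  # (snapshot_name, created_after, deleted_after), oldest first

    def snapshot(self, name):
        # Nothing is copied: just open an empty change record (cf. O(1) creation).
        self.diffs.append((name, set(), set()))

    def create(self, child):
        self.current.add(child)
        if self.diffs:
            _, created, deleted = self.diffs[-1]
            if child in deleted:
                deleted.discard(child)  # deleted, then re-created since the last snapshot
            else:
                created.add(child)

    def delete(self, child):
        self.current.discard(child)
        if self.diffs:
            _, created, deleted = self.diffs[-1]
            if child in created:
                created.discard(child)  # created and deleted since the last snapshot
            else:
                deleted.add(child)

    def read_snapshot(self, name):
        # Start from the current data and undo the recorded changes, newest first.
        view = set(self.current)
        for snap, created, deleted in reversed(self.diffs):
            view -= created
            view |= deleted
            if snap == name:
                return view
        raise KeyError(name)
```

Taking a snapshot only appends an empty change record, and memory grows only with the number of changed names, which loosely mirrors the O(1) creation cost and the O(M) memory usage described above.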
Snapshottable Directories
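Snapshots can only be created on directories that have been set as snapshottable. A snapshottable directory can hold up to 65,536 simultaneous snapshots. Administrators can make any directory snapshottable. If a snapshottable directory contains snapshots, it can be neither deleted nor renamed until all of those snapshots have been deleted.
Nested snapshottable directories are not allowed: a directory cannot be made snapshottable if any of its ancestors or descendants is already snapshottable.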
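The nesting rule amounts to an ancestor/descendant check on paths. A minimal sketch, assuming normalized absolute paths; `can_allow_snapshot` is a hypothetical helper, not an HDFS API (the real check happens inside the NameNode), and re-allowing the same directory is not treated as nesting here.

```python
import posixpath


def _is_strict_ancestor(a: str, b: str) -> bool:
    # "/" is an ancestor of every other absolute path; "/snap" is not an ancestor of "/snapshots".
    return a != b and (a == "/" or b.startswith(a + "/"))


def can_allow_snapshot(path: str, snapshottable_dirs: list) -> bool:
    """Return False if `path` would nest with an existing snapshottable directory."""
    p = posixpath.normpath(path)
    for d in map(posixpath.normpath, snapshottable_dirs):
        if _is_strict_ancestor(d, p) or _is_strict_ancestor(p, d):
            return False
    return True
```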
Snapshot Paths
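Once a directory is snapshottable and has snapshots, they are reached through the path component ".snapshot" under that directory (a reserved name rather than an ordinary subdirectory). Suppose /foo is snapshottable, bar is a file or directory in /foo, and you have created a snapshot s0 of /foo. Then /foo/.snapshot/s0/bar refers to the snapshot copy of bar.
The usual API and CLI commands work with ".snapshot" paths. Some examples:
List all snapshots of a snapshottable directory: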
hdfs dfs -ls /foo/.snapshot
List all files in snapshot s0:
hdfs dfs -ls /foo/.snapshot/s0
Copy a file out of snapshot s0:
hdfs dfs -cp /foo/.snapshot/s0/bar /tmp
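The mapping from a live path to its snapshot copy is purely positional, so it can be sketched as a small path helper. `snapshot_path` is a hypothetical function for illustration, not part of any Hadoop API; the paths it returns are the kind passed to `hdfs dfs -ls` and `hdfs dfs -cp` in the examples above.

```python
import posixpath


def snapshot_path(snapshottable_dir: str, snapshot_name: str, live_path: str) -> str:
    """Map a live path under `snapshottable_dir` to its copy in `snapshot_name`."""
    root = posixpath.normpath(snapshottable_dir)
    rel = posixpath.relpath(posixpath.normpath(live_path), root)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{live_path} is not under {snapshottable_dir}")
    base = posixpath.join(root, ".snapshot", snapshot_name)
    # The snapshottable directory itself maps to <dir>/.snapshot/<name>.
    return base if rel == "." else posixpath.join(base, rel)
```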
Snapshot Operations
The following two operations (allowing and disallowing snapshots) require superuser privilege.
Allow Snapshots
Allow snapshots to be created on a directory. If the operation succeeds, the directory becomes snapshottable.
hdfs dfsadmin -allowSnapshot <path>
[root@gc2 oracle]# hdfs dfsadmin -allowSnapshot /snap
Allowing snaphot on /snap succeeded
Disallow Snapshots
Disallow snapshots on a directory. All snapshots of the directory must be deleted before snapshots can be disallowed.
hdfs dfsadmin -disallowSnapshot <path>
[root@gc2 oracle]# hdfs dfsadmin -disallowSnapshot /snap
Disallowing snaphot on /snap succeeded
Create Snapshots
Create a snapshot of a snapshottable directory. This requires owner privilege on that directory.
hdfs dfs -createSnapshot <path> [<snapshotName>]
If snapshotName is omitted, a default name is generated from the current timestamp in the format "'s'yyyyMMdd-HHmmss.SSS", for example "s20130412-151029.033".
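Because the default name is just a formatted timestamp, it can be reproduced with Python's `datetime`. A minimal sketch; Java's `SSS` means milliseconds whereas Python stores microseconds, hence the division by 1000. The helper names are hypothetical, and the time zone used when the name is generated is not modeled.

```python
from datetime import datetime


def default_snapshot_name(ts: datetime) -> str:
    """Format `ts` like the Java pattern "'s'yyyyMMdd-HHmmss.SSS"."""
    # Convert the microsecond field to three millisecond digits.
    return ts.strftime("s%Y%m%d-%H%M%S.") + f"{ts.microsecond // 1000:03d}"


def parse_default_snapshot_name(name: str) -> datetime:
    """Inverse of default_snapshot_name; %f right-pads "379" to 379000 microseconds."""
    return datetime.strptime(name, "s%Y%m%d-%H%M%S.%f")
```

Parsing a name back into a timestamp can be handy, for instance, when pruning auto-named snapshots by age.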
[root@gc2 oracle]# hdfs dfs -createSnapshot /snap
Created snapshot /snap/.snapshot/s20150726-120414.379
[root@gc2 oracle]# hdfs dfs -createSnapshot /snap s0
Created snapshot /snap/.snapshot/s0
[root@gc2 oracle]# hdfs dfs -ls -R /snap/.snapshot
drwxr-xr-x - root supergroup 0 2015-07-26 12:04 /snap/.snapshot/s0
-rw-r--r-- 1 root supergroup 831 2015-07-26 11:56 /snap/.snapshot/s0/hehe.ora
-rw-r--r-- 1 root supergroup 72 2015-07-26 11:55 /snap/.snapshot/s0/sum.sh
-rw-r--r-- 1 root supergroup 754 2015-07-26 11:56 /snap/.snapshot/s0/test.sh
drwxr-xr-x - root supergroup 0 2015-07-26 12:04 /snap/.snapshot/s20150726-120414.379
-rw-r--r-- 1 root supergroup 831 2015-07-26 11:56 /snap/.snapshot/s20150726-120414.379/hehe.ora
-rw-r--r-- 1 root supergroup 72 2015-07-26 11:55 /snap/.snapshot/s20150726-120414.379/sum.sh
-rw-r--r-- 1 root supergroup 754 2015-07-26 11:56 /snap/.snapshot/s20150726-120414.379/test.sh
Delete Snapshots
hdfs dfs -deleteSnapshot <path> <snapshotName>
[root@gc2 oracle]# hdfs dfs -deleteSnapshot /snap s0
[root@gc2 oracle]# hdfs dfs -ls -R /snap/.snapshot
drwxr-xr-x - root supergroup 0 2015-07-26 12:04 /snap/.snapshot/s20150726-120414.379
-rw-r--r-- 1 root supergroup 831 2015-07-26 11:56 /snap/.snapshot/s20150726-120414.379/hehe.ora
-rw-r--r-- 1 root supergroup 72 2015-07-26 11:55 /snap/.snapshot/s20150726-120414.379/sum.sh
-rw-r--r-- 1 root supergroup 754 2015-07-26 11:56 /snap/.snapshot/s20150726-120414.379/test.sh
Rename Snapshots
hdfs dfs -renameSnapshot <path> <oldName> <newName>
[root@gc2 oracle]# hdfs dfs -createSnapshot /snap s0
Created snapshot /snap/.snapshot/s0
[root@gc2 oracle]# hdfs dfs -renameSnapshot /snap s0 s1
[root@gc2 oracle]# hadoop fs -ls -R /snap/.snapshot
drwxr-xr-x - root supergroup 0 2015-07-26 12:10 /snap/.snapshot/s1
-rw-r--r-- 1 root supergroup 831 2015-07-26 11:56 /snap/.snapshot/s1/hehe.ora
-rw-r--r-- 1 root supergroup 72 2015-07-26 11:55 /snap/.snapshot/s1/sum.sh
-rw-r--r-- 1 root supergroup 754 2015-07-26 11:56 /snap/.snapshot/s1/test.sh
drwxr-xr-x - root supergroup 0 2015-07-26 12:04 /snap/.snapshot/s20150726-120414.379
-rw-r--r-- 1 root supergroup 831 2015-07-26 11:56 /snap/.snapshot/s20150726-120414.379/hehe.ora
-rw-r--r-- 1 root supergroup 72 2015-07-26 11:55 /snap/.snapshot/s20150726-120414.379/sum.sh
-rw-r--r-- 1 root supergroup 754 2015-07-26 11:56 /snap/.snapshot/s20150726-120414.379/test.sh
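The create, delete, and rename steps above, together with the rules stated earlier (at most 65,536 snapshots per directory, no disallowing while snapshots exist), can be captured in a small in-memory model. This is a hypothetical sketch: `SnapshotTable` is not an HDFS class, it assumes snapshot names must be unique within a directory, and it raises Python exceptions where HDFS would report its own errors.

```python
from datetime import datetime
from typing import Optional

MAX_SNAPSHOTS = 65536  # per-directory limit quoted in the text


class SnapshotTable:
    """Hypothetical model of the snapshots of one snapshottable directory."""

    def __init__(self, path: str):
        self.path = path
        self.names = []  # snapshot names, kept in creation order

    def create(self, name: Optional[str] = None) -> str:
        if len(self.names) >= MAX_SNAPSHOTS:
            raise RuntimeError("snapshot limit reached for " + self.path)
        if name is None:
            # Default name 's'yyyyMMdd-HHmmss.SSS from the local clock.
            now = datetime.now()
            name = now.strftime("s%Y%m%d-%H%M%S.") + f"{now.microsecond // 1000:03d}"
        if name in self.names:
            raise FileExistsError(name)  # assumed: names are unique per directory
        self.names.append(name)
        return f"{self.path}/.snapshot/{name}"

    def delete(self, name: str) -> None:
        self.names.remove(name)  # ValueError if there is no such snapshot

    def rename(self, old: str, new: str) -> None:
        if new in self.names:
            raise FileExistsError(new)
        self.names[self.names.index(old)] = new  # ValueError if `old` is missing

    def can_disallow(self) -> bool:
        # Snapshots can only be disallowed once every snapshot has been deleted.
        return not self.names
```

Replaying the session above (create with a default name, create s0, delete s0, create s0 again, rename it to s1) leaves two snapshots, matching the final listing.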
Get Snapshottable Directory Listing
List all snapshottable directories on which the current user has permission to take snapshots.
hdfs lsSnapshottableDir
[root@gc2 oracle]# hdfs lsSnapshottableDir
drwxr-xr-x 0 root supergroup 0 2015-07-26 12:10 2 65536 /snap
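The columns of this output are not labeled. Reading them as permission, replication, owner, group, length, modification time, snapshot count, snapshot quota, and path is an assumption here, but it fits the example: /snap holds two snapshots (s1 and s20150726-120414.379) and 65536 matches the per-directory limit. A minimal parser under that assumption (the names are hypothetical):

```python
from typing import NamedTuple


class SnapshottableDir(NamedTuple):
    permission: str
    replication: int
    owner: str
    group: str
    length: int
    modified: str        # "yyyy-MM-dd HH:mm"
    snapshot_count: int  # assumed: snapshots currently in the directory
    snapshot_quota: int  # assumed: maximum snapshots allowed
    path: str


def parse_ls_snapshottable_line(line: str) -> SnapshottableDir:
    # maxsplit=9 keeps a path that contains spaces in one piece.
    f = line.split(maxsplit=9)
    return SnapshottableDir(f[0], int(f[1]), f[2], f[3], int(f[4]),
                            f"{f[5]} {f[6]}", int(f[7]), int(f[8]), f[9])
```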
Get Snapshots Difference Report
Report the differences between two snapshots. This requires read access to all files and directories in both snapshots.
hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
Results:
+ The file/directory has been created.
- The file/directory has been deleted.
M The file/directory has been modified.
R The file/directory has been renamed.
Before trying this, delete the file /snap/sum.sh and then create snapshot s2 of /snap:
[root@gc2 oracle]# hadoop fs -rm /snap/sum.sh
15/07/26 12:16:20 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://localhost:9000/snap/sum.sh' to trash at: hdfs://localhost:9000/user/root/.Trash/Current
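The two log lines above show that the file was not permanently deleted: it was moved to the trash, where it is kept for 1440 minutes (24 hours).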
[root@gc2 oracle]# hdfs dfs -createSnapshot /snap s2
Created snapshot /snap/.snapshot/s2
[root@gc2 oracle]# hdfs snapshotDiff /snap s1 s2
Difference between snapshot s1 and snapshot s2 under directory /snap:
M .
- ./sum.sh
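s1 contains sum.sh, which s2 no longer has. An easy way to read the report is as the changes that turn s1 into s2: ./sum.sh was deleted (-), and the snapshot root itself (.) is marked as modified (M) because its contents changed.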
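To post-process a report like the one above, for example in a backup script, each line can be classified by its leading marker. A minimal sketch; the function and type names are hypothetical, and the "old -> new" layout assumed for R (rename) entries does not appear in this example, so check it against real output before relying on it.

```python
from typing import List, NamedTuple, Optional

LABELS = {"+": "created", "-": "deleted", "M": "modified", "R": "renamed"}


class DiffEntry(NamedTuple):
    change: str
    path: str
    target: Optional[str] = None  # new path, only for renames


def parse_snapshot_diff(report: str) -> List[DiffEntry]:
    entries = []
    for line in report.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0] not in LABELS:
            continue  # header line ("Difference between ...") or blank line
        marker, rest = parts
        target = None
        if marker == "R" and " -> " in rest:
            # Assumed rename layout: "R <old> -> <new>".
            rest, target = (s.strip() for s in rest.split(" -> ", 1))
        entries.append(DiffEntry(LABELS[marker], rest.strip(), target))
    return entries
```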
Snapshot information can also be viewed in the NameNode web UI:
http://192.168.255.169:50070/dfshealth.html#tab-snapshot