Overview

HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system. Some common use cases of snapshots are data backup, protection against user errors and disaster recovery.

The implementation of HDFS Snapshots is efficient:

  • Snapshot creation is instantaneous: the cost is O(1) excluding the inode lookup time.
  • Additional memory is used only when modifications are made relative to a snapshot: memory usage is O(M), where M is the number of modified files/directories.
  • Blocks in datanodes are not copied: the snapshot files record the block list and the file size. There is no data copying.
  • Snapshots do not adversely affect regular HDFS operations: modifications are recorded in reverse chronological order so that the current data can be accessed directly. The snapshot data is computed by subtracting the modifications from the current data.
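
The bookkeeping behind the last two points can be illustrated with a toy model. The sketch below is not HDFS code: it assumes a flat namespace, a single snapshot, and a hypothetical function name (snapshot_model). It records the earlier state of a file only the first time that file changes after the snapshot, then derives the snapshot view by undoing those recorded changes against the current data.

```shell
# Toy model of one snapshot over a flat namespace (illustration only).
# Input on stdin, one operation per line:
#   put <name> <value>   create or overwrite a file
#   del <name>           delete a file
#   snap                 take the single snapshot
# Output: current view, snapshot view, and the number of diff entries.
snapshot_model() {
  awk '
    # The first time a file changes after the snapshot, remember what
    # the snapshot saw for it: "=<value>" or "absent".
    function remember(name) {
      if (!snapped || (name in saved)) return
      saved[name] = (name in cur) ? ("=" cur[name]) : "absent"
      ndiff++
    }
    $1 == "snap" { snapped = 1; next }   # O(1): nothing is copied
    $1 == "put"  { remember($2); cur[$2] = $3; next }
    $1 == "del"  { remember($2); delete cur[$2]; next }
    END {
      for (n in cur) print "current " n "=" cur[n]
      # Snapshot view = current data with the recorded changes undone.
      for (n in cur) if (!(n in saved)) print "snapshot " n "=" cur[n]
      for (n in saved) if (saved[n] != "absent") print "snapshot " n "=" substr(saved[n], 2)
      print "diff-entries " (ndiff + 0)
    }
  ' | sort
}
```

Feeding it the three lines "put a v1", "snap", "put a v2" reports a=v2 as the current value, a=v1 as the snapshot value, and one diff entry. Files that never change after the snapshot add nothing to the diff, which is the intuition behind the O(M) memory claim.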

Snapshottable Directories

Snapshots can be taken on any directory once the directory has been set as snapshottable. A snapshottable directory is able to accommodate 65,536 simultaneous snapshots. There is no limit on the number of snapshottable directories. Administrators may set any directory to be snapshottable. If there are snapshots in a snapshottable directory, the directory can be neither deleted nor renamed before all the snapshots are deleted.

Nested snapshottable directories are currently not allowed. In other words, a directory cannot be set to snapshottable if one of its ancestors/descendants is a snapshottable directory.
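
The nesting rule amounts to a prefix check on path components. The following is a minimal sketch, assuming the caller already knows the list of existing snapshottable directories; the function name can_allow_snapshot is hypothetical, and the NameNode performs its own check regardless. Identical paths are not treated as nesting here.

```shell
# Succeeds (exit status 0) if the candidate directory $1 is neither an
# ancestor nor a descendant of any existing snapshottable directory
# given in the remaining arguments.
can_allow_snapshot() {
  candidate=${1%/}; shift
  for existing in "$@"; do
    existing=${existing%/}
    [ "$candidate" = "$existing" ] && continue
    # candidate lies below an existing snapshottable directory
    case "$candidate/" in "$existing"/*) return 1 ;; esac
    # an existing snapshottable directory lies below the candidate
    case "$existing/" in "$candidate"/*) return 1 ;; esac
  done
  return 0
}
```

Comparing with a trailing slash keeps sibling names that share a prefix apart, so /database is not mistaken for a descendant of /data.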

Snapshot Paths

For a snapshottable directory, the path component ".snapshot" is used for accessing its snapshots. Suppose /foo is a snapshottable directory, /foo/bar is a file/directory in /foo, and /foo has a snapshot s0. Then, the path

/foo/.snapshot/s0/bar

refers to the snapshot copy of /foo/bar. The usual API and CLI can work with the ".snapshot" paths. The following are some examples.

  • Listing all the snapshots under a snapshottable directory:

hdfs dfs -ls /foo/.snapshot

  • Listing the files in snapshot s0:

hdfs dfs -ls /foo/.snapshot/s0

  • Copying a file from snapshot s0:

hdfs dfs -cp -ptopax /foo/.snapshot/s0/bar /tmp

Note that this example uses the preserve option to preserve timestamps, ownership, permission, ACLs and XAttrs.
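
Scripts that work with several snapshots often need to build ".snapshot" paths repeatedly. A small helper along these lines may be convenient; the function name snapshot_path is hypothetical and the helper only concatenates strings, it does not check that the snapshot exists.

```shell
# Print the path of <relative> as seen in snapshot <name> of <dir>.
# Usage: snapshot_path <dir> <name> [<relative>]
snapshot_path() {
  dir=${1%/}      # drop a trailing slash, so "/" becomes ""
  name=$2
  rel=${3#/}      # drop a leading slash from the relative part
  printf '%s/.snapshot/%s%s\n' "$dir" "$name" "${rel:+/$rel}"
}
```

With the example above, snapshot_path /foo s0 bar prints /foo/.snapshot/s0/bar, which can then be passed to the usual commands, e.g. hdfs dfs -ls "$(snapshot_path /foo s0)".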

Upgrading to a version of HDFS with snapshots

The HDFS snapshot feature introduces a new reserved path name used to interact with snapshots: .snapshot. When upgrading from an older version of HDFS, existing paths named .snapshot need to first be renamed or deleted to avoid conflicting with the reserved path. See the upgrade section in the HDFS user guide for more information.
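
Before upgrading, it can help to scan a list of existing paths for the reserved name. The filter below is a sketch under the assumption that you can produce such a list, one path per line, from a recursive listing on your current version; how that list is obtained is left open, and the function name find_reserved_names is hypothetical.

```shell
# Read paths on stdin, one per line, and print those that contain a
# path component named exactly ".snapshot".
find_reserved_names() {
  grep -E '(^|/)\.snapshot(/|$)'
}
```

The pattern matches whole components only, so names such as .snapshots or x.snapshot are not flagged.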

Snapshot Operations

Administrator Operations

The operations described in this section require superuser privilege.

Allow Snapshots

Allow snapshots of a directory to be created. If the operation completes successfully, the directory becomes snapshottable.

  • Command:

hdfs dfsadmin -allowSnapshot <path>

  • Arguments:

path

The path of the snapshottable directory.

See also the corresponding Java API void allowSnapshot(Path path) in HdfsAdmin.

Disallow Snapshots

Disallow snapshots of a directory from being created. All snapshots of the directory must be deleted before disallowing snapshots.

  • Command:

hdfs dfsadmin -disallowSnapshot <path>

  • Arguments:

path

The path of the snapshottable directory.

See also the corresponding Java API void disallowSnapshot(Path path) in HdfsAdmin.
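
Because every snapshot has to be deleted first, disallowing snapshots is usually a short sequence of commands. The sketch below only prints that sequence for review instead of running it. The function name plan_disallow is hypothetical, the snapshot names must be supplied by the caller, and paths containing spaces or shell metacharacters would need extra quoting.

```shell
# Print (do not run) the commands needed to make <dir> non-snapshottable,
# given the names of the snapshots it currently holds.
# Usage: plan_disallow <dir> [<snapshotName>...]
plan_disallow() {
  dir=$1; shift
  for snap in "$@"; do
    printf 'hdfs dfs -deleteSnapshot %s %s\n' "$dir" "$snap"
  done
  printf 'hdfs dfsadmin -disallowSnapshot %s\n' "$dir"
}
```

For example, plan_disallow /foo s0 s1 prints two -deleteSnapshot commands followed by the -disallowSnapshot command.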

User Operations

This section describes user operations. Note that the HDFS superuser can perform all of these operations without satisfying the permission requirements of the individual operations.

Create Snapshots

Create a snapshot of a snapshottable directory. This operation requires owner privilege of the snapshottable directory.

  • Command:

hdfs dfs -createSnapshot <path> [<snapshotName>]

  • Arguments:

path

The path of the snapshottable directory.

snapshotName

The snapshot name, which is an optional argument. When it is omitted, a default name is generated using a timestamp with the format "'s'yyyyMMdd-HHmmss.SSS", e.g. "s20130412-151029.033".

See also the corresponding Java API Path createSnapshot(Path path) and Path createSnapshot(Path path, String snapshotName) in FileSystem. The snapshot path is returned in these methods.
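
A script that treats automatically named snapshots differently from manually named ones (for instance, when pruning) needs to recognize the default name format. The check below is a sketch based only on the format string quoted above. The function name is hypothetical, and the check tests the shape of the name, not whether the digits form a valid date.

```shell
# Succeeds if the snapshot name $1 has the shape of a default name:
# "s" + yyyyMMdd + "-" + HHmmss + "." + SSS
is_default_snapshot_name() {
  printf '%s\n' "$1" | grep -Eq '^s[0-9]{8}-[0-9]{6}\.[0-9]{3}$'
}
```

The documented example s20130412-151029.033 passes this check, while a manually chosen name such as nightly does not.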

Delete Snapshots

Delete a snapshot from a snapshottable directory. This operation requires owner privilege of the snapshottable directory.

  • Command:

hdfs dfs -deleteSnapshot <path> <snapshotName>

  • Arguments:

path

The path of the snapshottable directory.

snapshotName

The snapshot name.

See also the corresponding Java API void deleteSnapshot(Path path, String snapshotName) in FileSystem.

Rename Snapshots

Rename a snapshot. This operation requires owner privilege of the snapshottable directory.

  • Command:

hdfs dfs -renameSnapshot <path> <oldName> <newName>

  • Arguments:

path

The path of the snapshottable directory.

oldName

The old snapshot name.

newName

The new snapshot name.

See also the corresponding Java API void renameSnapshot(Path path, String oldName, String newName) in FileSystem.

Get Snapshottable Directory Listing

Get all the snapshottable directories where the current user has permission to take snapshots.

  • Command:

hdfs lsSnapshottableDir

  • Arguments: none

See also the corresponding Java API SnapshottableDirectoryStatus[] getSnapshottableDirectoryListing() in DistributedFileSystem.

Get Snapshots Difference Report

Get the differences between two snapshots. This operation requires read access privilege for all files/directories in both snapshots.

  • Command:

hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>

  • Arguments:

path

The path of the snapshottable directory.

fromSnapshot

The name of the starting snapshot.

toSnapshot

The name of the ending snapshot.

  • Results:

+    The file/directory has been created.
-    The file/directory has been deleted.
M    The file/directory has been modified.
R    The file/directory has been renamed.

A RENAME entry indicates that a file/directory has been renamed but is still under the same snapshottable directory. A file/directory is reported as deleted if it was renamed to a location outside the snapshottable directory. A file/directory renamed from outside the snapshottable directory is reported as newly created.

The snapshot difference report does not guarantee the same operation sequence. For example, if we rename the directory "/foo" to "/foo2", and then append new data to the file "/foo2/bar", the difference report will be:

R. /foo -> /foo2

M. /foo/bar

I.e., the changes to the files/directories under a renamed directory are reported using the original path before the rename ("/foo/bar" in the above example).
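
When post-processing a report, it can be useful to translate such entries to the paths that exist after the rename. The sketch below assumes input lines shaped like the two-line example above (a type marker, a path, and for renames "->" followed by the new path) and paths without spaces. Real tool output may be laid out differently, and the function name current_paths is hypothetical.

```shell
# Read difference-report lines on stdin and print them with the paths of
# non-rename entries rewritten to their location after any reported
# directory rename. Entries are buffered, so line order does not matter.
current_paths() {
  awk '
    { type[NR] = substr($1, 1, 1); path[NR] = $2 }
    type[NR] == "R" { from[++nren] = $2; to[nren] = $4 }
    END {
      for (i = 1; i <= NR; i++) {
        if (type[i] == "R") { print "R " path[i] " -> " to_of(i); continue }
        p = path[i]
        for (r = 1; r <= nren; r++) {
          # Rewrite only when the entry lies below the renamed directory.
          if (index(p, from[r] "/") == 1) {
            p = to[r] substr(p, length(from[r]) + 1)
            break
          }
        }
        print type[i] " " p
      }
    }
    # Look up the rename target recorded for input line i.
    function to_of(i,   r) {
      for (r = 1; r <= nren; r++) if (from[r] == path[i]) return to[r]
      return ""
    }
  '
}
```

Applied to the example above, the modification entry is printed with the path /foo2/bar, while the rename entry itself is printed unchanged.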

See also the corresponding Java API SnapshotDiffReport getSnapshotDiffReport(Path path, String fromSnapshot, String toSnapshot) in DistributedFileSystem.
