I. Enabling the Trash Mechanism


        To enable Trash, add or modify two properties in core-site.xml: fs.trash.interval and fs.trash.checkpoint.interval.


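Note: fs.trash.checkpoint.interval is the interval between checkpoint creations, in minutes. It should be less than or equal to fs.trash.interval. The default is 0, in which case the value of fs.trash.interval is used.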
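The fallback rule in the note above can be sketched in a few lines of Java. This is a minimal illustration, not the Hadoop implementation: the class and method names are hypothetical, and treating a checkpoint interval larger than fs.trash.interval the same way as 0 is an assumption of this sketch.

```java
class TrashIntervals {
    static final long MSECS_PER_MINUTE = 60_000L;

    /**
     * Returns the checkpoint interval that would actually be used, in milliseconds.
     * Both arguments are in minutes, like the values in core-site.xml.
     */
    static long effectiveCheckpointIntervalMs(long trashIntervalMin, long checkpointIntervalMin) {
        long deletionMs = trashIntervalMin * MSECS_PER_MINUTE;
        long checkpointMs = checkpointIntervalMin * MSECS_PER_MINUTE;
        // 0 means "not set": fall back to fs.trash.interval.
        // Assumption of this sketch: a value above fs.trash.interval also falls back.
        if (checkpointMs == 0 || checkpointMs > deletionMs) {
            return deletionMs;
        }
        return checkpointMs;
    }

    public static void main(String[] args) {
        // fs.trash.interval = 4320 (3 days), fs.trash.checkpoint.interval = 0
        System.out.println(effectiveCheckpointIntervalMs(4320, 0));
        // fs.trash.interval = 4320, fs.trash.checkpoint.interval = 60
        System.out.println(effectiveCheckpointIntervalMs(4320, 60));
    }
}
```

With the values used later in this post (4320 and 0), checkpoints would be created once every three days.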


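Note: a trash checkpoint is simply a directory under the user's trash that stores all files and directories deleted before the checkpoint was created. It sits at the same level as Current: /user/deploy/.Trash/{timestamp_of_checkpoint_creation}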

 Figure: directories under .Trash

II. Using the Trash


-- 1. Delete HDFS data
    hadoop fs -rm -r -f /user/hive/external/dwd/dwd_test

-- 2. Restore a file deleted by mistake
    hadoop fs -mv /user/deploy/.Trash/Current/user/hive/external/dwd/dwd_test  /user/hive/external/dwd/dwd_test

-- 3. Force-delete data, bypassing the trash
    hadoop fs -rm -r -f -skipTrash /user/hive/external/dwd/dwd_test

-- 4. Manually delete a file from the trash
    hadoop fs -rm -r -f /user/deploy/.Trash/Current/user/hive/external/dwd/dwd_test

-- 5. Empty the HDFS trash
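    hadoop fs -expunge  (without -immediate, hdfs dfs -expunge deletes only checkpoints that have already expired and creates a new checkpoint; checkpoints that have not expired are kept)
        1. What the official Hadoop documentation says about the command: creating and deleting checkpoints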
        Usage: hadoop fs -expunge [-immediate] [-fs <path>]
        Permanently delete files in checkpoints older than the retention threshold from trash directory, and create new checkpoint. 
        -- In short: files in checkpoints older than the threshold are removed for good, and a new checkpoint is created.
        When checkpoint is created, recently deleted files in trash are moved under the checkpoint. Files in checkpoints older than fs.trash.interval will be permanently deleted on the next invocation of -expunge command.
        -- In short: creating a checkpoint moves recently deleted data into it, and expired checkpoints are removed for good the next time expunge runs.
        If the file system supports the feature, users can configure to create and delete checkpoints periodically by the parameter stored as fs.trash.checkpoint.interval (in core-site.xml). This value should be smaller or equal to fs.trash.interval.
        -- In short: where the file system supports it, fs.trash.checkpoint.interval (in core-site.xml) schedules periodic checkpoint creation and deletion, and it should not exceed fs.trash.interval.
        If the -immediate option is passed, all files in the trash for the current user are immediately deleted, ignoring the fs.trash.interval setting.
        If the -fs option is passed, the supplied filesystem will be expunged, rather than the default filesystem and checkpoint is created.
        For example
            hadoop fs -expunge --immediate -fs s3a://landsat-pds/

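        2. In practice, running the command first deletes the checkpoints that have reached their expiry time, then creates a new checkpoint and moves the recently deleted data into it.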

Figure: running hadoop fs -expunge manually first deletes the expired checkpoints, then creates a new checkpoint
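The delete and restore commands above rely on a simple path convention: a deleted file keeps its full original path underneath the trash directory. The sketch below illustrates that mapping in plain Java. The class and method names are hypothetical, and it assumes the user's home directory is /user/<user>, as in the examples in this post.

```java
class TrashPaths {
    /** Where a deleted absolute path lands: /user/<user>/.Trash/Current + original path. */
    static String toTrashPath(String user, String originalPath) {
        return "/user/" + user + "/.Trash/Current" + originalPath;
    }

    /**
     * Inverse mapping, useful when restoring: strips /user/<user>/.Trash/<dir>,
     * where <dir> is either Current or a checkpoint directory.
     */
    static String toOriginalPath(String user, String trashPath) {
        String trashRoot = "/user/" + user + "/.Trash/";
        if (!trashPath.startsWith(trashRoot)) {
            throw new IllegalArgumentException("Not inside the trash of " + user + ": " + trashPath);
        }
        // Skip the first path component after .Trash/ (Current or a checkpoint name).
        int slash = trashPath.indexOf('/', trashRoot.length());
        if (slash < 0) {
            throw new IllegalArgumentException("No original path below: " + trashPath);
        }
        return trashPath.substring(slash);
    }

    public static void main(String[] args) {
        String trashed = toTrashPath("deploy", "/user/hive/external/dwd/dwd_test");
        System.out.println(trashed);
        System.out.println(toOriginalPath("deploy", trashed));
    }
}
```

The second method also accepts a checkpoint directory in place of Current, which is where the data ends up once a checkpoint has been created.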

III. How the Trash Works: Source Code

1.1 Initialization

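        When the NameNode starts, it launches a background emptier daemon thread that periodically cleans the trash of every user on the HDFS cluster (the schedule starts over whenever the NameNode restarts). The period is fs.trash.checkpoint.interval.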


private void startTrashEmptier(final Configuration conf) throws IOException {
    long trashInterval =
        conf.getLong(FS_TRASH_INTERVAL_KEY, FS_TRASH_INTERVAL_DEFAULT);
    if (trashInterval == 0) {
      return;                                   // trash disabled
    } else if (trashInterval < 0) {
      throw new IOException("Cannot start trash emptier with negative interval."
          + " Set " + FS_TRASH_INTERVAL_KEY + " to a positive value.");
    }

    // This may be called from the transitionToActive code path, in which
    // case the current user is the administrator, not the NN. The trash
    // emptier needs to run as the NN. See HDFS-3972.
    FileSystem fs = SecurityUtil.doAsLoginUser(
        new PrivilegedExceptionAction<FileSystem>() {
          public FileSystem run() throws IOException {
            return FileSystem.get(conf);
          }
        });
    this.emptier = new Thread(new Trash(fs, conf).getEmptier(), "Trash Emptier");
    this.emptier.setDaemon(true);
    this.emptier.start();
}



public Trash(FileSystem fs, Configuration conf) throws IOException {
    trashPolicy = TrashPolicy.getInstance(conf, fs, fs.getHomeDirectory());
}



/** Return the current user's home directory in this filesystem.
   * The default implementation returns "/user/$USER/".
   */
  public Path getHomeDirectory() {
    return this.makeQualified(
        new Path("/user/"+System.getProperty("user.name")));
  }



public static TrashPolicy getInstance(Configuration conf, FileSystem fs, Path home) {
    Class<? extends TrashPolicy> trashClass = conf.getClass(
        "fs.trash.classname", TrashPolicyDefault.class, TrashPolicy.class);
    TrashPolicy trash = ReflectionUtils.newInstance(trashClass, conf);
    trash.initialize(conf, fs, home); // initialize TrashPolicy
    return trash;
}

2.2 Starting the Scheduled Thread



    public void run() {
      if (emptierInterval == 0)
        return;                                   // trash disabled
      long now = Time.now();
      long end;
      while (true) {
        end = ceiling(now, emptierInterval);
        try {                                     // sleep for interval
          Thread.sleep(end - now);
        } catch (InterruptedException e) {
          break;                                  // exit on interrupt
        }

        try {
          now = Time.now();
          if (now >= end) {

            FileStatus[] homes = null;
            try {
              homes = fs.listStatus(homesParent);         // list all home dirs
            } catch (IOException e) {
              LOG.warn("Trash can't list homes: "+e+" Sleeping.");
              continue;
            }

            for (FileStatus home : homes) {         // dump each trash
              if (!home.isDirectory())
                continue;
              try {
                TrashPolicyDefault trash = new TrashPolicyDefault(
                    fs, home.getPath(), conf);
                trash.deleteCheckpoint();       // delete expired checkpoints
                trash.createCheckpoint();       // create a new checkpoint
              } catch (IOException e) {
                LOG.warn("Trash caught: "+e+". Skipping "+home.getPath()+".");
              }
            }
          }
        } catch (Exception e) {
          LOG.warn("RuntimeException during Trash.Emptier.run(): ", e);
        }
      }
      try {
        fs.close();
      } catch(IOException e) {
        LOG.warn("Trash cannot close FileSystem: ", e);
      }
    }
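The run() loop sleeps until ceiling(now, emptierInterval), but the excerpt does not show that helper. The sketch below is one plausible reading, assuming ceiling rounds the current time up to the next multiple of the interval (counted in milliseconds since the epoch). The class name and helper bodies are written for illustration and are not copied from the excerpt.

```java
class EmptierSchedule {
    /** Rounds time down to a multiple of interval (both in milliseconds). */
    static long floor(long time, long interval) {
        return (time / interval) * interval;
    }

    /** Next multiple of interval strictly after floor(time, interval). */
    static long ceiling(long time, long interval) {
        return floor(time, interval) + interval;
    }

    /** How long the emptier would sleep before its next pass. */
    static long sleepMillis(long now, long interval) {
        return ceiling(now, interval) - now;
    }

    public static void main(String[] args) {
        long interval = 60_000L;           // hypothetical 1-minute interval
        long now = 125_000L;               // hypothetical current time in ms
        System.out.println(ceiling(now, interval));
        System.out.println(sleepMillis(now, interval));
    }
}
```

Under this reading, a time that already sits exactly on a multiple of the interval still waits one full interval, so the thread never sleeps for zero milliseconds.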

2.3 Deleting Expired Trash Data

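        deleteCheckpoint() examines the first-level subdirectories of /user/${user.name}/.Trash/ for every user. Each directory whose name follows the format yyMMddHHmmss is converted to a time value, time (Current and names that cannot be parsed are skipped). If now - deletionInterval > time, where deletionInterval = ${fs.trash.interval}, the directory is deleted. The default cleanup is therefore coarse-grained: it only looks at the first-level subdirectories of /user/${user.name}/.Trash/.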

public void deleteCheckpoint() throws IOException {
    FileStatus[] dirs = null;
    try {
      dirs = fs.listStatus(trash);            // scan trash sub-directories
    } catch (FileNotFoundException fnfe) {
      return;
    }

    long now = Time.now();
    for (int i = 0; i < dirs.length; i++) {
      Path path = dirs[i].getPath();
      String dir = path.toUri().getPath();
      String name = path.getName();
      if (name.equals(CURRENT.getName()))         // skip current
        continue;

      long time;
      try {
        time = getTimeFromCheckpoint(name);    // convert the directory name to a time
      } catch (ParseException e) {
        LOG.warn("Unexpected item in trash: "+dir+". Ignoring.");
        continue;
      }

      if ((now - deletionInterval) > time) {
        if (fs.delete(path, true)) {             // delete the directory
          LOG.info("Deleted trash checkpoint: "+dir);
        } else {
          LOG.warn("Couldn't delete checkpoint: "+dir+" Ignoring.");
        }
      }
    }
}
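The expiry test itself can be tried out without a cluster. The following is a minimal sketch of the same comparison using java.time instead of the Hadoop classes. The class and method names are hypothetical, and checkpoint names carrying a "-N" suffix are not handled here.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class CheckpointExpiry {
    // Same pattern as the checkpoint directory names described above.
    static final DateTimeFormatter CHECKPOINT = DateTimeFormatter.ofPattern("yyMMddHHmmss");

    /** True if the directory is a checkpoint older than the deletion interval. */
    static boolean isExpired(String dirName, LocalDateTime now, Duration deletionInterval) {
        if (dirName.equals("Current")) {
            return false;                       // skip Current
        }
        LocalDateTime created;
        try {
            created = LocalDateTime.parse(dirName, CHECKPOINT);
        } catch (DateTimeParseException e) {
            return false;                       // unexpected item: ignore it
        }
        // Mirrors (now - deletionInterval) > time: strictly older than the cutoff.
        return now.minus(deletionInterval).isAfter(created);
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2018, 11, 27, 8, 0, 10); // illustrative wake-up time
        Duration interval = Duration.ofMinutes(4320);                 // fs.trash.interval = 3 days
        System.out.println(isExpired("181124080000", now, interval));
        System.out.println(isExpired("181124080033", now, interval));
    }
}
```

Because the comparison is strict, a checkpoint created only a few seconds after the cutoff survives until the next pass of the emptier.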

2.4 Creating a Checkpoint


public void createCheckpoint() throws IOException {
    if (!fs.exists(current))                     // no trash, no checkpoint
      return;

    Path checkpointBase;
    synchronized (CHECKPOINT) {
      checkpointBase = new Path(trash, CHECKPOINT.format(new Date()));
    }
    Path checkpoint = checkpointBase;

    int attempt = 0;
    while (true) {
      try {
        fs.rename(current, checkpoint, Rename.NONE);    // rename Current to the checkpoint directory
        break;
      } catch (FileAlreadyExistsException e) {
        if (++attempt > 1000) {
          throw new IOException("Failed to checkpoint trash: "+checkpoint);
        }
        checkpoint = checkpointBase.suffix("-" + attempt);
      }
    }

    LOG.info("Created trash checkpoint: "+checkpoint.toUri().getPath());
}
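createCheckpoint() retries with a "-N" suffix when a directory with the timestamp name already exists. The sketch below imitates that naming logic against an in-memory set of existing names instead of a file system. The class and method names are hypothetical, and the set stands in for the FileAlreadyExistsException check.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class CheckpointNaming {
    /** Picks the first free name among base, base-1, base-2, ... (at most 1000 retries). */
    static String pickCheckpointName(String base, Set<String> existing) {
        String candidate = base;
        int attempt = 0;
        while (existing.contains(candidate)) {   // stands in for FileAlreadyExistsException
            if (++attempt > 1000) {
                throw new IllegalStateException("Failed to checkpoint trash: " + candidate);
            }
            candidate = base + "-" + attempt;
        }
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList("181127080000", "181127080000-1"));
        System.out.println(pickCheckpointName("181127080000", existing));
        System.out.println(pickCheckpointName("181127090000", existing));
    }
}
```

A collision of this kind can only happen when two checkpoints are created within the same second, since the directory name has one-second resolution.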



fs.trash.interval = 4320 // 3 days
fs.trash.checkpoint.interval = 0 // not customized

At 2018-11-27 08:00:00 the emptier thread wakes up and first runs deleteCheckpoint(). Ideally the condition ((now - deletionInterval) > time) should hold.
deletionInterval: 4320 minutes
time: 181124080000 => condition met, so the 181124080000 directory is deleted


deletionInterval: 4320 minutes
time: 181124080033 => condition not met, so the directory is skipped and createCheckpoint() runs


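        Users can also run the hadoop shell command manually to clean up expired checkpoints and create a new one. This does the same work as a single pass of the emptier thread.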

hdfs dfs -expunge 
hadoop fs -expunge


protected void processArguments(LinkedList<PathData> args)
    throws IOException {
      Trash trash = new Trash(getConf());
      trash.expunge();       // delete expired checkpoints
      trash.checkpoint();    // create a new checkpoint
}


/** Delete old checkpoint(s). */
  public void expunge() throws IOException {