The FSImage and edits log files hold the NameNode's metadata: they persist the structure of the HDFS namespace. FSImage is kept on disk as an ordinary file and records the cluster's directory tree together with each file's list of blocks. For performance reasons, FSImage is not updated in real time to reflect the current state of the namespace; instead, every file and directory operation is appended to the edits log. To keep restart downtime small, a separate node (the SecondaryNameNode) talks to the NameNode over IPC and periodically merges the edits log into FSImage, so that on the next restart the NameNode can rebuild the in-memory namespace from FSImage plus a short edits log in much less time. This feels somewhat similar to Oracle's dynamic checkpointing.
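As a quick way to see these two files on disk, here is a minimal sketch that simply lists the NameNode's storage directory. It assumes the classic Hadoop 1.x layout, where ${dfs.name.dir}/current holds VERSION, fsimage, edits and fstime; the path used below is a placeholder.

import java.io.File;

public class ListNameDir {
  public static void main(String[] args) {
    // Placeholder: point this at ${dfs.name.dir}/current on your NameNode
    File current = new File("/data/hadoop/dfs/name/current");
    File[] files = current.listFiles();
    if (files == null) {
      System.err.println("not a directory: " + current);
      return;
    }
    for (File f : files) {
      System.out.println(f.getName() + "\t" + f.length() + " bytes");
    }
  }
}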

FSImage, then, stores the file names in the cluster together with each file's series of blocks. What does its on-disk structure actually look like? We can get a good idea by reading through org.apache.hadoop.hdfs.server.namenode.FSImage.

Hadoop 1.2.1


boolean loadFSImage(File curFile) throws IOException {
  FSNamesystem fsNamesys = FSNamesystem.getFSNamesystem();
  FSDirectory fsDir = fsNamesys.dir;

  //
  // Load in bits
  //
  boolean needToSave = true;
  DataInputStream in = new DataInputStream(new BufferedInputStream(
                            new FileInputStream(curFile)));
  try {
    // read image version: first appeared in version -1
    int imgVersion = in.readInt();
    // read namespaceID: first appeared in version -2
    this.namespaceID = in.readInt();

    // read number of files/directories; a long or an int depending on the layout version
    long numFiles;
    if (imgVersion <= -16) {
      numFiles = in.readLong();
    } else {
      numFiles = in.readInt();
    }

    this.layoutVersion = imgVersion;
    // read in the last generation stamp
    if (imgVersion <= -12) {
      long genstamp = in.readLong();
      fsNamesys.setGenerationStamp(genstamp);
    }

    needToSave = (imgVersion != FSConstants.LAYOUT_VERSION);

    // read file info
    short replication = FSNamesystem.getFSNamesystem().getDefaultReplication();

    LOG.info("Number of files = " + numFiles);

    String path;
    String parentPath = "";
    INodeDirectory parentINode = fsDir.rootDir;
    // rebuild the directory tree, one inode record at a time
    for (long i = 0; i < numFiles; i++) {
      long modificationTime = 0;
      long atime = 0;
      long blockSize = 0;
      path = readString(in);            // path of this file or directory
      replication = in.readShort();     // replication factor, default 3, configurable (0 for directories)
      replication = FSEditLog.adjustReplication(replication);
      modificationTime = in.readLong(); // mtime of the file
      if (imgVersion <= -17) {
        atime = in.readLong();          // atime
      }
      if (imgVersion <= -8) {
        blockSize = in.readLong();      // block size (0 for directories)
      }
      int numBlocks = in.readInt();     // number of blocks in the file (-1 for directories in current layouts, 0 in older ones)
      Block blocks[] = null;

      // for older versions, a blocklist of size 0
      // indicates a directory.
      if ((-9 <= imgVersion && numBlocks > 0) ||
          (imgVersion < -9 && numBlocks >= 0)) {
        blocks = new Block[numBlocks];
        for (int j = 0; j < numBlocks; j++) {
          blocks[j] = new Block();
          if (-14 < imgVersion) {
            blocks[j].set(in.readLong(), in.readLong(),
                          Block.GRANDFATHER_GENERATION_STAMP);
          } else {
            blocks[j].readFields(in);
          }
        }
      }
      // Older versions of HDFS do not store the block size in the inode.
      // If the file has more than one block, use the size of the
      // first block as the blocksize. Otherwise use the default block size.
      //
      if (-8 <= imgVersion && blockSize == 0) {
        if (numBlocks > 1) {
          blockSize = blocks[0].getNumBytes();
        } else {
          long first = ((numBlocks == 1) ? blocks[0].getNumBytes(): 0);
          blockSize = Math.max(fsNamesys.getDefaultBlockSize(), first);
        }
      }

      // get quota only when the node is a directory
      long nsQuota = -1L;
      if (imgVersion <= -16 && blocks == null) {
        nsQuota = in.readLong();        // namespace quota
      }
      long dsQuota = -1L;
      if (imgVersion <= -18 && blocks == null) {
        dsQuota = in.readLong();        // disk-space quota
      }

      PermissionStatus permissions = fsNamesys.getUpgradePermission();
      if (imgVersion <= -11) {
        permissions = PermissionStatus.read(in);
      }
      if (path.length() == 0) { // it is the root
        // update the root's attributes
        if (nsQuota != -1 || dsQuota != -1) {
          fsDir.rootDir.setQuota(nsQuota, dsQuota);
        }
        fsDir.rootDir.setModificationTime(modificationTime);
        fsDir.rootDir.setPermissionStatus(permissions);
        continue;
      }
      // check if the new inode belongs to the same parent
      if(!isParent(path, parentPath)) {
        parentINode = null;
        parentPath = getParent(path);
      }
      // add new inode
      parentINode = fsDir.addToParent(path, parentINode, permissions,
                                      blocks, replication, modificationTime,
                                      atime, nsQuota, dsQuota, blockSize);
    }

    // load datanode info
    this.loadDatanodes(imgVersion, in);

    // load Files Under Construction
    this.loadFilesUnderConstruction(imgVersion, in, fsNamesys);

    this.loadSecretManagerState(imgVersion, in, fsNamesys);

  } finally {
    in.close();
  }

  return needToSave;
}

// Block.set() (from org.apache.hadoop.hdfs.protocol.Block), invoked above
// when reading pre--14 images:
public void set(long blkid, long len, long genStamp) {
  this.blockId = blkid;
  this.numBytes = len;
  this.generationStamp = genStamp;
}
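Putting the header reads above together, the following standalone sketch decodes just the fixed header of an fsimage file, with no dependency on the Hadoop classes. The input path is a placeholder for a copy of ${dfs.name.dir}/current/fsimage, and the field order assumes a layout version of -16 or lower, as produced by Hadoop 1.2.1.

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class FsImageHeader {
  public static void main(String[] args) throws IOException {
    // Placeholder path: point this at a copy of ${dfs.name.dir}/current/fsimage
    DataInputStream in = new DataInputStream(new BufferedInputStream(
        new FileInputStream("/tmp/fsimage")));
    try {
      int imgVersion = in.readInt();   // layout version (a negative number)
      int namespaceID = in.readInt();  // namespace identifier
      long numFiles = in.readLong();   // file/directory count (a long since version -16)
      long genStamp = in.readLong();   // generation stamp (present since version -12)
      System.out.println("layoutVersion = " + imgVersion);
      System.out.println("namespaceID   = " + namespaceID);
      System.out.println("numFiles      = " + numFiles);
      System.out.println("genStamp      = " + genStamp);
    } finally {
      in.close();
    }
  }
}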


imgVersion(int): layout version of this image
namespaceID(int): namespace identifier, assigned when the NameNode formats the filesystem
numFiles(long): total number of files and directories in the filesystem
genStamp(long): generation stamp of the image
path(String): path of this file or directory
replication(short): replication factor (0 for directories)
mtime(long): modification time
atime(long): access time
blocksize(long): block size; always 0 for directories
numBlocks(int): number of blocks in the file; always -1 for directories
if (numBlocks > 0) {
  blockid(long): id of a block belonging to this file
  numBytes(long): size of that block
  genStamp(long): generation stamp of that block
}
nsQuota(long): namespace quota; -1 if no quota is set
dsQuota(long): disk-space quota; -1 if no quota is set
...
(other fields follow: datanode info, files under construction, delegation-token state)

Remark: the actual code contains a number of branches to handle different FSImage layout versions, as seen above. A decoding sketch for a single inode record follows.
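To make the record layout above concrete, here is a hedged sketch that decodes one inode record immediately after the header, assuming a modern layout (imgVersion <= -18) so that all the optional fields are present. The 2-byte-length string encoding mirrors Hadoop's legacy org.apache.hadoop.io.UTF8 as used by readString(); treat that as an assumption. Permission parsing is omitted, so the sketch stops after a single record.

import java.io.DataInputStream;
import java.io.IOException;

public class InodeRecord {
  // Decode one inode record, following the field list above.
  // Assumes imgVersion <= -18 (all optional fields present).
  static void readOneInode(DataInputStream in) throws IOException {
    short len = in.readShort();            // legacy UTF8: 2-byte length prefix (assumption)
    byte[] pathBytes = new byte[len];
    in.readFully(pathBytes);
    String path = new String(pathBytes, "UTF-8");  // empty string for the root
    short replication = in.readShort();    // 0 for directories
    long mtime = in.readLong();
    long atime = in.readLong();            // present since version -17
    long blockSize = in.readLong();        // 0 for directories
    int numBlocks = in.readInt();          // -1 for directories
    for (int j = 0; j < numBlocks; j++) {  // loop body skipped when numBlocks <= 0
      long blockId = in.readLong();
      long numBytes = in.readLong();
      long blkGenStamp = in.readLong();    // per-block generation stamp
    }
    long nsQuota = -1L, dsQuota = -1L;
    if (numBlocks == -1) {                 // only directories carry quota fields
      nsQuota = in.readLong();
      dsQuota = in.readLong();
    }
    // A PermissionStatus (user, group, mode) follows since version -11;
    // its wire format is not decoded here, so parsing stops after one record.
    System.out.println(path + " replication=" + replication
        + " numBlocks=" + numBlocks + " nsQuota=" + nsQuota + " dsQuota=" + dsQuota);
  }
}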

I could not find an official description of the FSImage file structure, so the layout above is inferred from the source code. Once the structure of FSImage and the edits log is understood, it should be possible to analyze HDFS offline, without going through the Hadoop client or its API: for example, reading FSImage directly to obtain the full file listing of HDFS, or even locating a file's blocks on the relevant datanodes.