Embedded Linux 2020-03-18

The following article is from 王小二的Android站, by 王小二的Android站

A First Look at the f2fs Storage Structure


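Preface

The first step in learning a file system is to understand how it lays out its data on the device, so let's start with a brief look at that.

The F2FS space layout diagram and description are taken from 《F2FS技术拆解》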
https://mp.weixin.qq.com/s/k1ibtWF_TRQi8wbqUGjMrg


F2FS Space Layout

The entire storage space is divided into six areas:

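  • Superblock (SB): contains the basic partition information and the F2FS parameters that are fixed when the partition is formatted and cannot be changed afterwards.

  • Checkpoint (CP): stores the file system state, the bitmaps of the valid NAT/SIT sets (explained below), the orphan inode list (a file that is deleted while it is still referenced cannot be freed immediately, so it is recorded in this list and freed at the next mount), and the owner information of the currently active segments. As in other log-structured file systems, an F2FS checkpoint is a consistent set of file system state at a given point in time, which can be used to recover data after a crash or power loss. Each of the two F2FS checkpoints occupies one Segment. Unlike those other file systems, F2FS decides whether a checkpoint is valid from the version information in its first and last blocks.

  • Segment Information Table (SIT): contains, for every segment in the Main Area (explained below), the number of valid blocks and a bitmap marking which blocks are valid. The SIT is mainly used during cleaning to select the segments to move and to identify the valid data inside them.

  • Node Address Table (NAT): used to locate the addresses of all node blocks in the Main Area (inode nodes, direct index nodes, and indirect index nodes). In other words, the NAT stores the actual location of each inode or index node.

  • Segment Summary Area (SSA): the owner information (that is, the reverse index) of all blocks in the Main Area, consisting of the parent inode number and the offset within it. SSA entries can be used to look up the parent index node number of a valid block before the block is moved.

  • Main Area: made up of 4 KB blocks, each allocated to store either data (file or directory contents) or an index (an inode or a data block index). A fixed number of consecutive blocks form a Segment, and Segments in turn form Sections and Zones (as described in the source article). A Segment stores either data or indexes, so Segments can be classified as data segments and node segments.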

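Seeing is believing, so let's run the experiment ourselves.

1. Create a block device

On the sdcard, create a 100 MB file named f2fs_device. bs=1M is used here so that the file is exactly 100 MiB (104857600 bytes), which matches the 0x06400000 end offset seen in the dump later; with GNU dd, bs=1MB would mean 1,000,000 bytes per block.

dd if=/dev/zero of=/sdcard/f2fs_device bs=1M count=100

Format the file f2fs_device as an f2fs file system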

make_f2fs /sdcard/f2fs_device

Bind f2fs_device to a loop device to create a virtual block device. If it reports that the device is busy, replace 13 with another number.

losetup /dev/block/loop13 /sdcard/f2fs_device

Create a directory named f2fs_root_dir

mkdir /sdcard/f2fs_root_dir

Mount loop13 on the f2fs_root_dir directory

mount -t f2fs /dev/block/loop13 /sdcard/f2fs_root_dir
2. Fill in data

Create a file 1.txt in the directory and write hello world into it

T1_PRO:/sdcard/f2fs_root_dir # touch 1.txt
T1_PRO:/sdcard/f2fs_root_dir # echo "hello world" > 1.txt
3. Analysis with hexdump

Pull the file f2fs_device. Be careful not to pull the f2fs_root_dir path.

adb pull /sdcard/f2fs_device

Analyze the raw data of the block device directly with hexdump

hexdump -C f2fs_device

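3.1 Analyzing the start of the dump

First you will see a pile of hexadecimal numbers like the ones below. This is the raw data of the block device.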

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
#super block 1 start
00000400  10 20 f5 f2 01 00 0b 00  09 00 00 00 03 00 00 00  |. ..............|
00000410  0c 00 00 00 09 00 00 00  01 00 00 00 01 00 00 00  |................|
00000420  00 00 00 00 00 64 00 00  00 00 00 00 2a 00 00 00  |.....d......*...|
00000430  31 00 00 00 02 00 00 00  02 00 00 00 02 00 00 00  |1...............|
00000440  01 00 00 00 2a 00 00 00  00 02 00 00 00 02 00 00  |....*...........|
00000450  00 06 00 00 00 0a 00 00  00 0e 00 00 00 10 00 00  |................|
00000460  03 00 00 00 01 00 00 00  02 00 00 00 b6 71 aa 9d  |.............q..|
00000470  8e e2 40 64 81 df 52 71  22 74 8d ac 00 00 00 00  |..@d..Rq"t......|
00000480  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000870  00 00 00 00 00 00 00 00  00 00 00 00 22 00 00 00  |............"...|
00000880  6a 70 67 00 00 00 00 00  67 69 66 00 00 00 00 00  |jpg.....gif.....|
00000890  70 6e 67 00 00 00 00 00  61 76 69 00 00 00 00 00  |png.....avi.....|
000008a0  64 69 76 78 00 00 00 00  6d 34 61 00 00 00 00 00  |divx....m4a.....|
000008b0  6d 34 76 00 00 00 00 00  6d 34 70 00 00 00 00 00  |m4v.....m4p.....|
000008c0  6d 70 34 00 00 00 00 00  6d 70 33 00 00 00 00 00  |mp4.....mp3.....|
000008d0  33 67 70 00 00 00 00 00  77 6d 76 00 00 00 00 00  |3gp.....wmv.....|
000008e0  77 6d 61 00 00 00 00 00  6d 70 65 67 00 00 00 00  |wma.....mpeg....|
000008f0  6d 6b 76 00 00 00 00 00  6d 6f 76 00 00 00 00 00  |mkv.....mov.....|
00000900  61 73 78 00 00 00 00 00  61 73 66 00 00 00 00 00  |asx.....asf.....|
00000910  77 6d 78 00 00 00 00 00  73 76 69 00 00 00 00 00  |wmx.....svi.....|
00000920  77 76 78 00 00 00 00 00  77 76 00 00 00 00 00 00  |wvx.....wv......|
00000930  77 6d 00 00 00 00 00 00  6d 70 67 00 00 00 00 00  |wm......mpg.....|
00000940  6d 70 65 00 00 00 00 00  72 6d 00 00 00 00 00 00  |mpe.....rm......|
00000950  6f 67 67 00 00 00 00 00  6f 70 75 73 00 00 00 00  |ogg.....opus....|
00000960  66 6c 61 63 00 00 00 00  6a 70 65 67 00 00 00 00  |flac....jpeg....|
00000970  76 69 64 65 6f 00 00 00  61 70 6b 00 00 00 00 00  |video...apk.....|
00000980  73 6f 00 00 00 00 00 00  65 78 65 00 00 00 00 00  |so......exe.....|
00000990  64 62 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |db..............|
000009a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
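... (a large amount of data omitted)
3.1.1 Superblock (SB)

The superblock (SB) starts at 0x00000400, that is, at the 1 KB mark rather than at offset 0.
There are two identical copies of the superblock (SB), an f2fs design choice that guards against data corruption. The two structures are 4 KB apart (0x400 and 0x1400).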

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000400  10 20 f5 f2 01 00 0b 00  09 00 00 00 03 00 00 00  |. ..............|//SB1
*
00001400  10 20 f5 f2 01 00 0b 00  09 00 00 00 03 00 00 00  |. ..............|//SB2
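
To cross-check the dump, the first 108 bytes of the superblock can be decoded with Python's struct module. This is only a minimal sketch: the field order is my reading of struct f2fs_super_block in the kernel's f2fs header and should be verified against your own kernel version, the hex string is copied from offsets 0x400 to 0x46b of the dump above, and the names SB_HEX, SB_FIELDS and SB_FORMAT are made up for this example.

```python
import struct

# Bytes 0x400..0x46b of the image, copied from the hexdump above.
SB_HEX = """
10 20 f5 f2 01 00 0b 00 09 00 00 00 03 00 00 00
0c 00 00 00 09 00 00 00 01 00 00 00 01 00 00 00
00 00 00 00 00 64 00 00 00 00 00 00 2a 00 00 00
31 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
01 00 00 00 2a 00 00 00 00 02 00 00 00 02 00 00
00 06 00 00 00 0a 00 00 00 0e 00 00 00 10 00 00
03 00 00 00 01 00 00 00 02 00 00 00
"""

# Assumed field order (little-endian, packed), mirroring the kernel struct.
SB_FIELDS = [
    "magic", "major_ver", "minor_ver",
    "log_sectorsize", "log_sectors_per_block", "log_blocksize",
    "log_blocks_per_seg", "segs_per_sec", "secs_per_zone", "checksum_offset",
    "block_count",
    "section_count", "segment_count", "segment_count_ckpt",
    "segment_count_sit", "segment_count_nat", "segment_count_ssa",
    "segment_count_main",
    "segment0_blkaddr", "cp_blkaddr", "sit_blkaddr", "nat_blkaddr",
    "ssa_blkaddr", "main_blkaddr",
    "root_ino", "node_ino", "meta_ino",
]
SB_FORMAT = "<IHH7IQ16I"  # 108 bytes in total

raw = bytes.fromhex("".join(SB_HEX.split()))
sb = dict(zip(SB_FIELDS, struct.unpack(SB_FORMAT, raw)))

block_size = 1 << sb["log_blocksize"]                     # bytes per block
segment_size = block_size << sb["log_blocks_per_seg"]     # bytes per segment

print(f"magic        = {sb['magic']:#x}")
print(f"block size   = {block_size} bytes")
print(f"segment size = {segment_size} bytes")
print(f"total size   = {sb['block_count'] * block_size} bytes")

# Byte offset at which each area starts on the device.
for area in ("cp", "sit", "nat", "ssa", "main"):
    byte_offset = sb[area + "_blkaddr"] * block_size
    print(f"{area.upper():4} starts at {byte_offset:#010x}")
```

If the assumed layout is right, the CP start decodes to 0x00200000 and the Main Area start to 0x01000000, which lines up with the addresses seen elsewhere in the dump.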
3.1.2 Checkpoint (CP)

The checkpoint (CP) starts at 0x00200000, that is, at 2 MB. A Segment is 2 MB, and the checkpoint (CP) is segment-aligned.

*
00200000  3b 51 5b 23 00 00 00 00  00 28 00 00 00 00 00 00  |;Q[#.....(......|#CP
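
The 16 bytes on the line above can be decoded the same way. A small sketch, assuming that struct f2fs_checkpoint begins with two little-endian 64-bit fields, checkpoint_ver and user_block_count (my reading of the kernel header, worth double-checking), and that cp_blkaddr is the 0x200 seen in the superblock dump:

```python
import struct

# First 16 bytes at 0x00200000, copied from the dump line above.
cp_head = bytes.fromhex("3b 51 5b 23 00 00 00 00 00 28 00 00 00 00 00 00")

# Assumed layout: __le64 checkpoint_ver, then __le64 user_block_count.
checkpoint_ver, user_block_count = struct.unpack("<QQ", cp_head)

# cp_blkaddr (0x200 blocks) times the 4 KB block size gives the byte offset.
cp_byte_offset = 0x200 * 4096

print(f"checkpoint_ver   = {checkpoint_ver:#x}")
print(f"user_block_count = {user_block_count}")
print(f"CP byte offset   = {cp_byte_offset:#010x}")
```

The version decoded here (0x235b513b) is a little lower than the CKPT version that dump.f2fs reports later in this article (235b513e). That would be consistent with the other checkpoint pack being the current one, but this is my inference from the dumps, not something the tools state directly.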

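3.2 The file 1.txt

The index (inode) for the whole 1.txt file runs from address 0x01201000 to 0x01202000, which is 0x1000 bytes = 4 KB.

A question to keep in mind: why is the file content "hello world" stored in the inode block instead of in a data block?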

01201000  b6 81 00 0b 00 00 00 00  00 00 00 00 01 00 00 00  |................|
01201010  0c 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
01201020  4f 26 70 5e 00 00 00 00  57 26 70 5e 00 00 00 00  |O&p^....W&p^....|
01201030  57 26 70 5e 00 00 00 00  a2 f4 2a 02 9e 08 d3 1d  |W&p^......*.....|
01201040  9e 08 d3 1d af 1e c5 19  00 00 00 00 00 00 00 00  |................|
01201050  00 00 00 00 03 00 00 00  05 00 00 00 31 2e 74 78  |............1.tx|
01201060  74 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |t...............|
01201070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01201160  00 00 00 00 00 00 00 00  00 00 00 00 68 65 6c 6c  |............hell|
01201170  6f 20 77 6f 72 6c 64 0a  00 00 00 00 00 00 00 00  |o world.........|
01201180  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01201f00  00 00 00 00 00 00 00 00  00 00 00 00 11 20 f5 f2  |............. ..|
01201f10  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
01201f20  00 00 00 00 06 07 18 00  73 65 6c 69 6e 75 78 75  |........selinuxu|
01201f30  3a 6f 62 6a 65 63 74 5f  72 3a 75 6e 6c 61 62 65  |:object_r:unlabe|
01201f40  6c 65 64 3a 73 30 00 00  00 00 00 00 00 00 00 00  |led:s0..........|
01201f50  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01201fe0  00 00 00 00 00 00 00 00  04 00 00 00 04 00 00 00  |................|
01201ff0  01 00 00 00 3a 51 5b 23  21 79 00 61 02 12 00 00  |....:Q[#!y.a....|
01202000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
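
The header of this inode block can be decoded in a few lines of Python. This is a sketch under the assumption that the first 92 bytes of struct f2fs_inode are laid out as in the kernel header I am familiar with (i_mode, i_advise, i_inline, and so on, packed and little-endian, with i_name starting at byte 92). The hex string is copied from 0x01201000 to 0x0120106f of the dump above.

```python
import struct
from datetime import datetime, timezone

# Bytes 0x01201000..0x0120106f, copied from the hexdump above.
INODE_HEX = """
b6 81 00 0b 00 00 00 00 00 00 00 00 01 00 00 00
0c 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
4f 26 70 5e 00 00 00 00 57 26 70 5e 00 00 00 00
57 26 70 5e 00 00 00 00 a2 f4 2a 02 9e 08 d3 1d
9e 08 d3 1d af 1e c5 19 00 00 00 00 00 00 00 00
00 00 00 00 03 00 00 00 05 00 00 00 31 2e 74 78
74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""
raw = bytes.fromhex("".join(INODE_HEX.split()))

# Assumed layout of the first 92 bytes of struct f2fs_inode.
(i_mode, i_advise, i_inline, i_uid, i_gid, i_links, i_size, i_blocks,
 i_atime, i_ctime, i_mtime, i_atime_nsec, i_ctime_nsec, i_mtime_nsec,
 i_generation, i_current_depth, i_xattr_nid, i_flags, i_pino,
 i_namelen) = struct.unpack_from("<HBB3I5Q9I", raw, 0)

# The file name follows the fixed header (assumed offset: byte 92).
i_name = raw[92:92 + i_namelen].decode("ascii")
atime = datetime.fromtimestamp(i_atime, tz=timezone.utc)

print(f"i_mode   = {i_mode:#o}")
print(f"i_inline = {i_inline:#x}")
print(f"i_size   = {i_size}")
print(f"i_pino   = {i_pino}")
print(f"i_name   = {i_name}")
print(f"i_atime  = {atime.isoformat()}")
```

The decoded values can be compared field by field with the dump.f2fs -i output in section 4.3.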

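3.3 The last line of the dump

On the last line, the highest address is 0x06400000 = 100 MB, which is exactly the size of the block device we created.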

*
06400000
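
A quick sanity check of that size, as a sketch. The sector count is the "total sectors" value that dump.f2fs prints later in this article, and the block count 0x6400 is what I read as block_count in the superblock dump (an assumption about the field position):

```python
MIB = 1024 * 1024

size_from_hexdump = 0x06400000      # final offset printed by hexdump
size_from_sectors = 204800 * 512    # total sectors times sector size
size_from_blocks = 0x6400 * 4096    # assumed block_count times 4 KB blocks

print(size_from_hexdump, size_from_sectors, size_from_blocks)
print(size_from_hexdump // MIB, "MiB")
```

All three ways of counting give the same number of bytes, so the "100 MB" here is in binary units (MiB).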
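4. Analysis with dump.f2fs

Besides analyzing the storage structure of an f2fs block device purely with hexdump, we can also inspect it with the dump.f2fs tool. Note that dump.f2fs is disabled by default in the Android source tree; I will write a later article on how to enable it for Android.

4.1 dump.f2fs usage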

Usage: dump.f2fs [options] device
[options]:
 -d debug level [default:0]
 -i inode no (hex)
 -n [NAT dump nid from #1~#2 (decimal), for all 0~-1]
 -s [SIT dump segno from #1~#2 (decimal), for all 0~-1]
 -S sparse_mode
 -a [SSA dump segno from #1~#2 (decimal), for all 0~-1]
 -b blk_addr (in 4KB)
 -V print the version number and exit

4.2 dump.f2fs -n

Trigger a dump of the node address table

1|T1_PRO:/sdcard $ dump.f2fs -n 0~-1 f2fs_device
   Info: No support kernel version!
Info: Segments per section = 1
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 204800 (100 MB)
Info: MKFS version
 "4.14.117+ #1 SMP PREEMPT Mon Mar 9 21:37:48 CST 2020"
Info: FSCK version
 from "4.14.117+ #1 SMP PREEMPT Mon Mar 9 21:37:48 CST 2020"
   to "4.14.117+ #1 SMP PREEMPT Mon Mar 9 21:37:48 CST 2020"
Info: superblock features = 0 :
Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
Info: total FS sectors = 204800 (100 MB)
Info: CKPT version = 235b513e
Info: checkpoint state = c5 :  nat_bits crc compacted_summary unmount

Done.

Look at the generated file dump_nat

T1_PRO:/sdcard $ cat dump_nat
nid:    3   ino:    3   offset:    0    blkaddr:      4098  pack:2
nid:    4   ino:    4   offset:    0    blkaddr:      4609  pack:2

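Note that nid 4 has blkaddr 4609, which is 0x1201 in hexadecimal. Does that look familiar? Look at the address 01201000 on the first line of the 1.txt inode dump in section 3.2. So blkaddr 4609 is the location on the storage device of the data structure whose nid is 4: the byte address is blkaddr * 4 KB (4609 * 4096 = 0x1201000). Recall from the beginning that the Main Area is made up of 4 KB blocks, which matches exactly.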

01201000  b6 81 00 0b 00 00 00 00  00 00 00 00 01 00 00 00  |................|
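
The same conversion for both NAT entries, as a small sketch. MAIN_BLKADDR and BLOCKS_PER_SEG are values I read from the superblock dump (0x1000 and 1 << 9) and are assumptions about which superblock fields those bytes are; the helper names are made up for this example.

```python
BLOCK_SIZE = 4096        # the Main Area is made up of 4 KB blocks
MAIN_BLKADDR = 0x1000    # assumed: main_blkaddr from the superblock dump
BLOCKS_PER_SEG = 512     # assumed: 1 << log_blocks_per_seg (9)


def blkaddr_to_offset(blkaddr):
    """Byte offset of a block inside the image file."""
    return blkaddr * BLOCK_SIZE


def locate_in_main(blkaddr):
    """(segment index, block index in segment) relative to the Main Area."""
    return divmod(blkaddr - MAIN_BLKADDR, BLOCKS_PER_SEG)


# (nid, blkaddr) pairs taken from the dump_nat file above.
for nid, blkaddr in [(3, 4098), (4, 4609)]:
    seg, blk = locate_in_main(blkaddr)
    print(f"nid {nid}: offset {blkaddr_to_offset(blkaddr):#010x}, "
          f"main segment {seg}, block {blk}")
```

Under these assumptions the root directory inode (nid 3) and the 1.txt inode (nid 4) sit in different segments of the Main Area.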
4.3 dump.f2fs -i

Dump the inode structure for a given inode number

T1_PRO:/sdcard $ dump.f2fs -i 4 f2fs_device
   Info: No support kernel version!
Info: Segments per section = 1
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 204800 (100 MB)
Info: MKFS version
 "4.14.117+ #1 SMP PREEMPT Mon Mar 9 21:37:48 CST 2020"
Info: FSCK version
 from "4.14.117+ #1 SMP PREEMPT Mon Mar 9 21:37:48 CST 2020"
   to "4.14.117+ #1 SMP PREEMPT Mon Mar 9 21:37:48 CST 2020"
Info: superblock features = 0 :
Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
Info: total FS sectors = 204800 (100 MB)
Info: CKPT version = 235b513e
[print_node_info: 275] Node ID [0x4:4] is inode
i_mode                              [0x    81b6 : 33206]
i_advise                            [0x       0 : 0]
i_uid                               [0x       0 : 0]
i_gid                               [0x       0 : 0]
i_links                             [0x       1 : 1]
i_size                              [0x       c : 12]
i_blocks                            [0x       1 : 1]
i_atime                             [0x5e70264f : 1584408143]
i_atime_nsec                        [0x 22af4a2 : 36369570]
i_ctime                             [0x5e702657 : 1584408151]
i_ctime_nsec                        [0x1dd3089e : 500369566]
i_mtime                             [0x5e702657 : 1584408151]
i_mtime_nsec                        [0x1dd3089e : 500369566]
i_generation                        [0x19c51eaf : 432348847]
i_current_depth                     [0x       0 : 0]
i_xattr_nid                         [0x       0 : 0]
i_flags                             [0x       0 : 0]
i_inline                            [0x       b : 11]
i_pino                              [0x       3 : 3]
i_dir_level                         [0x       0 : 0]
i_namelen                           [0x       5 : 5]
i_name                              [1.txt]
i_ext: fofs:0 blkaddr:0 len:0
i_addr[ofs]                         [0x       0 : 0]
i_addr[ofs + 1]                     [0x6c6c6568 : 1819043176]
i_addr[ofs + 2]                     [0x6f77206f : 1870078063]
i_addr[ofs + 3]                     [0x a646c72 : 174353522]
i_addr[0x3] points data block       [0xa646c72]
i_nid[0]                            [0x       0 : 0]
i_nid[1]                            [0x       0 : 0]
i_nid[2]                            [0x       0 : 0]
i_nid[3]                            [0x       0 : 0]
i_nid[4]                            [0x       0 : 0]

xattr: e_name_index:6 e_name:selinux e_name_len:7 e_value_size:24 e_value:
753A6F626A6563745F723A756E6C6162656C65643A733000

Do you want to dump this file into ./lost_found/? [Y/N] y
Info: checkpoint state = c5 :  nat_bits crc compacted_summary unmount

Done.

If you answer y, 1.txt is dumped into ./lost_found/.
It looks like this command could later be used to regenerate files directly from the raw data.

Do you want to dump this file into ./lost_found/? [Y/N] y
T1_PRO:/sdcard $ cat ./lost_found/1.txt
hello world
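
The xattr line in the dump.f2fs -i output prints e_value as a hex string. A short sketch that turns it back into readable text (the hex string is copied from that output; treating the trailing zero byte as a C string terminator is my assumption):

```python
# e_value as printed by dump.f2fs -i for the selinux xattr.
e_value_hex = "753A6F626A6563745F723A756E6C6162656C65643A733000"

value = bytes.fromhex(e_value_hex)
# Drop the trailing NUL before decoding (assumed to be a string terminator).
label = value.rstrip(b"\x00").decode("ascii")

print(len(value), "bytes:", label)
```

The length agrees with e_value_size:24, and the text is the same label that shows up in the ASCII column of the hexdump around 0x01201f28.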

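Look at the inode's i_addr: don't the numbers look familiar? They are in fact hello world. Why does i_addr store hello world directly instead of pointing to a data block? Because F2FS supports inline data (data stored directly in the inode) for small files of up to about 3.4 KB, which saves storage space and improves performance in Android's many-small-files workloads.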

i_addr[ofs + 1]                     [0x6c6c6568 : 1819043176]
i_addr[ofs + 2]                     [0x6f77206f : 1870078063]
i_addr[ofs + 3]                     [0x a646c72 : 174353522]
01201160  00 00 00 00 00 00 00 00  00 00 00 00 68 65 6c 6c  |............hell|
01201170  6f 20 77 6f 72 6c 64 0a  00 00 00 00 00 00 00 00  |o world.........|
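
To make the link between the i_addr values and the file content explicit, here is a small sketch that packs the three words back into bytes and decodes the i_inline flag byte. The flag bit values are taken from my reading of the kernel's f2fs header (F2FS_INLINE_XATTR and friends) and should be treated as assumptions; the names INLINE_FLAGS and set_flags are made up for this example.

```python
import struct

# i_addr[ofs + 1..3] as printed by dump.f2fs -i.
words = [0x6C6C6568, 0x6F77206F, 0x0A646C72]
i_size = 12      # from the i_size line of the same output
i_inline = 0x0B  # from the i_inline line of the same output

# The words are stored little-endian on disk, so repack them that way.
inline_bytes = struct.pack("<3I", *words)
content = inline_bytes[:i_size]

# Assumed flag bits of i_inline (per the kernel f2fs header).
INLINE_FLAGS = {
    0x01: "inline_xattr",
    0x02: "inline_data",
    0x04: "inline_dentry",
    0x08: "data_exist",
}
set_flags = [name for bit, name in INLINE_FLAGS.items() if i_inline & bit]

print(content)
print(set_flags)
```

If the flag values are right, i_inline = 0xb says that this inode carries both inline data and an inline xattr, which fits what the hexdump shows inside the same 4 KB block.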
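5. Summary

Learning this much about the f2fs storage structure is far from enough. Why study the storage structure first when studying a file system? Because much of the file system code is written around the storage structure. In my view, a file system is essentially the translator and manager of the raw data on a block device.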