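There are many Linux distributions, and Linux is used more widely than ever, which also means more systems are exposed to the risk of failure. Using RHEL6 as the example, this article walks through several Linux disaster recovery techniques so that a failed system can be brought back safely.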

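Before looking at the recovery methods, let's review the MBR (Master Boot Record), the first sector of a hard disk. It has three parts: the boot loader, the partition table, and the valid-sector signature. Of the 512 bytes in this sector, the boot loader occupies the first 446 bytes. The partition table occupies the next 64 bytes and records which partitions the disk has and how large each one is. The signature occupies the last 2 bytes (0x55 0xAA). The figure below shows the layout: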

Figure 1. MBR
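The 446 + 64 + 2 byte layout described above can be checked by hand with dd. The following is a minimal sketch, assuming a scratch file (`mbr.img`, a name made up for this example) that stands in for a real boot sector; it never touches a disk.

```shell
# Sketch: slice a 512-byte MBR image into its three regions.
# mbr.img is a scratch file built here, not a real disk.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
img="$work/mbr.img"

# Fake sector: 510 zero bytes followed by the 0x55 0xAA signature.
dd if=/dev/zero of="$img" bs=510 count=1 2>/dev/null
printf '\125\252' >> "$img"    # octal escapes for 0x55 0xAA

# Bytes 0-445: boot loader; 446-509: partition table; 510-511: signature.
dd if="$img" of="$work/bootcode.bin" bs=1 count=446 2>/dev/null
dd if="$img" of="$work/ptable.bin" bs=1 skip=446 count=64 2>/dev/null
dd if="$img" of="$work/sig.bin" bs=1 skip=510 count=2 2>/dev/null

boot_size=$(wc -c < "$work/bootcode.bin" | tr -d ' ')
pt_size=$(wc -c < "$work/ptable.bin" | tr -d ' ')
signature=$(od -An -tx1 "$work/sig.bin" | tr -d ' \n')
echo "boot loader: $boot_size bytes, partition table: $pt_size bytes, signature: 0x$signature"
```

The same `skip=` offsets are what the backup and restore commands later in this article rely on.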

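Corrupted Disk Partition Table

On a production Linux server, a virus or an unexpected power loss can destroy the disk partition table. Restoring it normally requires a backup taken beforehand; here we use an external USB device to hold a backup of the host disk's partition table.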

After attaching the USB device to the host, list the current disk devices:

  [root@FCoE ~]# fdisk -l
  Disk /dev/sda: 43.0 GB, 42991616000 bytes
  255 heads, 63 sectors/track, 5226 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0x00032735
     Device Boot      Start         End      Blocks   Id  System
  /dev/sda1   *           1          17      131072   83  Linux
  Partition 1 does not end on cylinder boundary.
  /dev/sda2              17         147     1048576   82  Linux swap / Solaris
  Partition 2 does not end on cylinder boundary.
  /dev/sda3             147        5227    40803328   83  Linux
  Disk /dev/sdb: 2147 MB, 2147483648 bytes
  255 heads, 63 sectors/track, 261 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0x00000000
  Disk /dev/sdb doesn't contain a valid partition table

Now create a new partition on the sdb device:

  [root@FCoE ~]# fdisk /dev/sdb
  Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
  Building a new DOS disklabel with disk identifier 0xcdd48395.
  Changes will remain in memory only, until you decide to write them.
  After that, of course, the previous content won't be recoverable.
  Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
  WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
  switch off the mode (command 'c') and change display units to
  sectors (command 'u').
  Command (m for help): n
  Command action
     e   extended
     p   primary partition (1-4)
  p
  Partition number (1-4): 1
  First cylinder (1-261, default 1):
  Using default value 1
  Last cylinder, +cylinders or +size{K,M,G} (1-261, default 261):
  Using default value 261
  Command (m for help): p
  Disk /dev/sdb: 2147 MB, 2147483648 bytes
  255 heads, 63 sectors/track, 261 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0xcdd48395
     Device Boot      Start         End      Blocks   Id  System
  /dev/sdb1               1         261     2096451   83  Linux
  Command (m for help): w
  The partition table has been altered!
  Calling ioctl() to re-read partition table.
  Syncing disks.

Create a file system on the new partition sdb1:

  [root@FCoE ~]# mkfs.ext3 /dev/sdb1
  mke2fs 1.41.12 (17-May-2010)
  Filesystem label=
  OS type: Linux
  Block size=4096 (log=2)
  Fragment size=4096 (log=2)
  Stride=0 blocks, Stripe width=0 blocks
  131072 inodes, 524112 blocks
  26205 blocks (5.00%) reserved for the super user
  First data block=0
  Maximum filesystem blocks=536870912
  16 block groups
  32768 blocks per group, 32768 fragments per group
  8192 inodes per group
  Superblock backups stored on blocks:
          32768, 98304, 163840, 229376, 294912
  Writing inode tables: done
  Creating journal (8192 blocks): done
  Writing superblocks and filesystem accounting information: done
  This filesystem will be automatically checked every 24 mounts or
  180 days, whichever comes first. Use tune2fs -c or -i to override.

Mount the new file system:

  [root@FCoE ~]# mount /dev/sdb1 /mnt/

The usual way to back up the partition table is to back up the whole MBR, that is, the first 512-byte sector of the disk:

  [root@FCoE ~]# dd if=/dev/sda of=/mnt/sda.mbr bs=512 count=1
  1+0 records in
  1+0 records out
  512 bytes (512 B) copied, 0.000777948 s, 658 kB/s
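Before relying on a backup like this, it can be useful to look inside it. The sketch below decodes the four 16-byte partition entries of a saved MBR with od. It assumes the classic MBR entry layout (boot flag in byte 0, type in byte 4, little-endian start LBA in bytes 8-11, sector count in bytes 12-15). The image it reads is a scratch file with one made-up entry; `put_byte` and `dump_ptable` are helper names invented for this example, and the start sector 2048 is illustrative rather than taken from the host above.

```shell
# Sketch: decode the four 16-byte partition entries of a saved MBR.
# The image is a scratch file with one made-up entry, not a real disk.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
img="$work/sda.mbr"

# put_byte OFFSET OCTAL: overwrite one byte of the image in place.
put_byte() {
  printf "\\$2" | dd of="$img" bs=1 seek="$1" conv=notrunc 2>/dev/null
}

dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null
put_byte 446 200   # entry 1, byte 0: 0x80 = active (bootable)
put_byte 450 203   # entry 1, byte 4: type 0x83 = Linux
put_byte 455 010   # entry 1, bytes 8-11: start LBA 2048 (little-endian)
put_byte 460 004   # entry 1, bytes 12-15: 262144 sectors (little-endian)
put_byte 510 125   # signature byte 0x55
put_byte 511 252   # signature byte 0xAA

dump_ptable() {
  file=$1
  for i in 0 1 2 3; do
    off=$((446 + 16 * i))
    # The 16 entry bytes as unsigned decimals, word-split into $1..$16.
    set -- $(od -An -tu1 -j "$off" -N 16 "$file")
    [ "$5" -ne 0 ] || continue    # type 0x00 marks an unused slot
    if [ "$1" -eq 128 ]; then active=yes; else active=no; fi
    start=$(( $9 + ${10} * 256 + ${11} * 65536 + ${12} * 16777216 ))
    count=$(( ${13} + ${14} * 256 + ${15} * 65536 + ${16} * 16777216 ))
    printf 'entry %d: active=%s type=0x%02x start=%d sectors=%d\n' \
      "$((i + 1))" "$active" "$5" "$start" "$count"
  done
}

out=$(dump_ptable "$img")
echo "$out"
```

On a real system the function would be pointed at the backup itself, for example `dump_ptable /mnt/sda.mbr`, and the result compared with the output of `fdisk -l`.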

Now zero out the partition table (bytes 446 to 509 of the first sector) to simulate a corrupted partition table:

  [root@FCoE ~]# dd if=/dev/zero of=/dev/sda bs=1 count=64 skip=446 seek=446
  64+0 records in
  64+0 records out
  64 bytes (64 B) copied, 0.00160668 s, 39.8 kB/s

Listing the partitions again shows that disk sda no longer contains any:

  [root@FCoE ~]# fdisk -l
  Disk /dev/sda: 43.0 GB, 42991616000 bytes
  255 heads, 63 sectors/track, 5226 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0x00032735
     Device Boot      Start         End      Blocks   Id  System
  Disk /dev/sdb: 2147 MB, 2147483648 bytes
  255 heads, 63 sectors/track, 261 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0xcdd48395
     Device Boot      Start         End      Blocks   Id  System
  /dev/sdb1               1         261     2096451   83  Linux

Once the partition table is lost, the next boot leaves GRUB unable to find its configuration file, and it drops into command-line mode:

Figure 2. Partition table lost

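Next, insert the RHEL6 installation disc and attach the USB device that holds the backup. Reboot the host with the CD-ROM as the first boot device, and choose "Rescue installed system" from the boot menu.

Figure 3. Choosing rescue mode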

Follow the prompts until you are offered a shell, and select it.

Figure 4. Choosing a shell

Listing the disks confirms that sda contains no partitions.

  bash-4.1# fdisk -l
  Disk /dev/sda: 43.0 GB, 42991616000 bytes
  255 heads, 63 sectors/track, 5226 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0x00032735
     Device Boot      Start         End      Blocks   Id  System
  Disk /dev/sdb: 2147 MB, 2147483648 bytes
  255 heads, 63 sectors/track, 261 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0xcdd48395
     Device Boot      Start         End      Blocks   Id  System
  /dev/sdb1               1         261     2096451   83  Linux

Now restore the partition table. Create a mount point and mount the USB device that holds the backup; the listing above shows it as /dev/sdb.

  bash-4.1# mkdir /usb
  bash-4.1# mount /dev/sdb1 /usb
  bash-4.1# ls /usb
  lost+found  sda.mbr

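Use the sda.mbr backup to restore the partition table of sda. Only the 64 table bytes at offset 446 are copied, so the boot loader currently on the disk is left alone:

  bash-4.1# dd if=/usb/sda.mbr of=/dev/sda bs=1 count=64 skip=446 seek=446
  64+0 records in
  64+0 records out
  64 bytes (64 B) copied, 0.038358 s, 4.6 kB/s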
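The restore above depends on `skip=` (the offset in the input) and `seek=` (the offset in the output) both being 446. The sketch below rehearses the same copy on scratch files so the effect can be inspected safely. The file names and the marker letters are assumptions made for this example. `conv=notrunc` is added because the target here is a regular file, which dd would otherwise truncate; a block device such as /dev/sda is not truncated.

```shell
# Sketch: copy only bytes 446-509 from an MBR backup into a disk image.
# Both files are scratch stand-ins filled with marker letters.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
backup="$work/sda.mbr"
disk="$work/disk.img"

# Backup: 512 bytes of 'B'. Disk: 4096 bytes of 'D' with a zeroed table.
dd if=/dev/zero bs=512 count=1 2>/dev/null | tr '\000' 'B' > "$backup"
dd if=/dev/zero bs=512 count=8 2>/dev/null | tr '\000' 'D' > "$disk"
dd if=/dev/zero of="$disk" bs=1 count=64 seek=446 conv=notrunc 2>/dev/null

# skip= offsets the read in the backup, seek= offsets the write in the target.
dd if="$backup" of="$disk" bs=1 count=64 skip=446 seek=446 conv=notrunc 2>/dev/null

# Check: the table region matches the backup, everything else is still 'D'.
dd if="$disk" bs=1 skip=446 count=64 2>/dev/null > "$work/disk.pt"
dd if="$backup" bs=1 skip=446 count=64 2>/dev/null > "$work/backup.pt"
if cmp -s "$work/disk.pt" "$work/backup.pt"; then restored=yes; else restored=no; fi
size=$(wc -c < "$disk" | tr -d ' ')
untouched=$(tr -cd 'D' < "$disk" | wc -c | tr -d ' ')
echo "restored=$restored size=$size untouched=$untouched"
```

Restoring only the table rather than all 512 bytes is a deliberate choice: it leaves whatever boot loader is currently on the disk in place.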

List the disks once more:

  bash-4.1# fdisk -l
  Disk /dev/sda: 43.0 GB, 42991616000 bytes
  255 heads, 63 sectors/track, 5226 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0x00032735
     Device Boot      Start         End      Blocks   Id  System
  /dev/sda1   *           1          17      131072   83  Linux
  Partition 1 does not end on cylinder boundary.
  /dev/sda2              17         147     1048576   82  Linux swap / Solaris
  Partition 2 does not end on cylinder boundary.
  /dev/sda3             147        5227    40803328   83  Linux
  Disk /dev/sdb: 2147 MB, 2147483648 bytes
  255 heads, 63 sectors/track, 261 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0xcdd48395
     Device Boot      Start         End      Blocks   Id  System
  /dev/sdb1               1         261     2096451   83  Linux

The partition table of sda is restored, and the system boots normally after a restart.

Corrupted GRUB

In the same way, zeroing the boot loader area (the first 446 bytes) simulates a destroyed GRUB:

  [root@FCoE grub]# dd if=/dev/zero of=/dev/sda bs=446 count=1
  1+0 records in
  1+0 records out
  446 bytes (446 B) copied, 0.0017583 s, 254 kB/s

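After a reboot the system cannot find GRUB and hangs at "Booting from Hard Disk …".

Boot from the installation disc, enter rescue mode, and reinstall GRUB. Here root (hd0,0) selects the first partition of the first disk, which holds /boot, and setup (hd0) writes the boot loader to that disk's MBR:

  bash-4.1# chroot /mnt/sysimage
  sh-4.1# grub
  grub> root (hd0,0)
  grub> setup (hd0)
  grub> quit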

Figure 5. Restoring GRUB

After the host is rebooted, the system boots normally.

Missing Kernel File

If the kernel file is missing, the next boot reports that the file cannot be found.

Figure 6. Kernel missing

Boot from the installation disc into rescue mode and check /boot: there is no kernel (vmlinuz) file.

  bash-4.1# chroot /mnt/sysimage
  sh-4.1# ls /boot
  config-2.6.32-71.el6.x86_64         lost+found
  efi                                 symvers-2.6.32-71.el6.x86_64.gz
  grub                                System.map-2.6.32-71.el6.x86_64
  initramfs-2.6.32-71.el6.x86_64.img

Mount the installation disc and force a reinstall of the kernel package from it:

  sh-4.1# mount -o loop /dev/sr0 /media
  sh-4.1# cd /media/Server/Packages
  sh-4.1# rpm -ivh --force kernel-2.6.32-71.el6.x86_64.rpm
  warning: kernel-2.6.32-71.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, \
  key ID fd431d51: NOKEY
  Preparing...                ########################################### [100%]
     1:kernel                 ########################################### [100%]

A new kernel file, vmlinuz-2.6.32-71.el6.x86_64, now exists under /boot:

  sh-4.1# ls /boot
  config-2.6.32-71.el6.x86_64         lost+found
  efi                                 symvers-2.6.32-71.el6.x86_64.gz
  grub                                System.map-2.6.32-71.el6.x86_64
  initramfs-2.6.32-71.el6.x86_64.img  vmlinuz-2.6.32-71.el6.x86_64

After the host is rebooted, the system boots normally.

Missing initramfs Image

If the initramfs image file is missing, the host boots to a black screen.

Figure 7. Image missing

Boot from the installation disc into rescue mode and check /boot: there is no initramfs image file.

  bash-4.1# chroot /mnt/sysimage
  sh-4.1# ls /boot
  config-2.6.32-71.el6.x86_64  symvers-2.6.32-71.el6.x86_64.gz
  efi                          System.map-2.6.32-71.el6.x86_64
  grub                         vmlinuz-2.6.32-71.el6.x86_64
  lost+found

Regenerate the image file initramfs-2.6.32-71.el6.x86_64.img.

  sh-4.1# cd /boot
  sh-4.1# mkinit
  sh-4.1# ls
  config-2.6.32-71.el6.x86_64         lost+found
  efi                                 symvers-2.6.32-71.el6.x86_64.gz
  grub                                System.map-2.6.32-71.el6.x86_64
  initramfs-2.6.32-71.el6.x86_64.img  vmlinuz-2.6.32-71.el6.x86_64
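The `mkinit` command in the listing is cut short, and the exact invocation the author used is not recoverable from it. On RHEL 6 the initramfs is built by dracut (mkinitrd is a wrapper around it), so one plausible approach is to call dracut with an image path and a kernel version. The sketch below is an assumption-laden illustration, not the author's command: `missing_initramfs_cmds` is a helper name invented here, and it only prints the dracut command for each kernel that lacks an image rather than running it. It is exercised against a scratch directory.

```shell
# Sketch: print (do not run) a dracut command for every kernel in a
# boot directory that has no matching initramfs image.
missing_initramfs_cmds() {
  bootdir=$1
  for k in "$bootdir"/vmlinuz-*; do
    [ -e "$k" ] || continue          # glob matched nothing
    kver=${k##*/vmlinuz-}            # strip path and "vmlinuz-" prefix
    img="$bootdir/initramfs-$kver.img"
    if [ ! -e "$img" ]; then
      # dracut usage: dracut <image> <kernel version>
      printf 'dracut %s %s\n' "$img" "$kver"
    fi
  done
}

# Scratch directory standing in for /boot, with a kernel but no image.
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
: > "$scratch/vmlinuz-2.6.32-71.el6.x86_64"

cmd=$(missing_initramfs_cmds "$scratch")
echo "$cmd"
```

Inside the chroot one would call it as `missing_initramfs_cmds /boot`, review the printed command, and then run it by hand.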

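After the host is rebooted, the system boots normally.

Damaged /boot Partition

When the /boot partition is damaged, the first step is usually to try repairing its file system. If the file system cannot be repaired, the methods described above can be applied in turn: re-create the /boot partition, reinstall the kernel and the initramfs image, install GRUB, and finally edit the boot menu by hand so that the system boots normally again. The steps are as follows.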

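Create the Partition

In the more serious case the /boot partition is destroyed completely, and the boot fails with a message that no boot device can be found.

Figure 8. Boot partition damaged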

Boot from the installation disc into rescue mode and list the partitions: /dev/sda1 no longer exists.

  bash-4.1# fdisk -l
  Disk /dev/sda: 43.0 GB, 42991616000 bytes
  255 heads, 63 sectors/track, 5226 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0x00000000
     Device Boot      Start         End      Blocks   Id  System
  /dev/sda2              17         147     1048576   82  Linux swap / Solaris
  Partition 2 does not end on cylinder boundary.
  /dev/sda3             147        5227    40803328   83  Linux
  Disk /dev/sdb: 2147 MB, 2147483648 bytes
  255 heads, 63 sectors/track, 261 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0xcdd48395
     Device Boot      Start         End      Blocks   Id  System
  /dev/sdb1               1         261     2096451   83  Linux

Create a new partition and mark it bootable (fdisk command a).

  bash-4.1# fdisk /dev/sda
  WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
  switch off the mode (command 'c') and change display units to
  sectors (command 'u').
  Command (m for help): n
  Command action
     e   extended
     p   primary partition (1-4)
  p
  Partition number (1-4): 1
  First cylinder (1-5226, default 1):
  Using default value 1
  Last cylinder, +cylinders or +size{K,M,G} (1-16, default 16):
  Using default value 16
  Command (m for help): a
  Partition number (1-4): 1
  Command (m for help): p
  Disk /dev/sda: 43.0 GB, 42991616000 bytes
  255 heads, 63 sectors/track, 5226 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0x00000000
     Device Boot      Start         End      Blocks   Id  System
  /dev/sda1   *           1          16      128488+  83  Linux
  /dev/sda2              17         147     1048576   82  Linux swap / Solaris
  Partition 2 does not end on cylinder boundary.
  /dev/sda3             147        5227    40803328   83  Linux
  Command (m for help): w
  The partition table has been altered!

Reboot the host so that the new partition table is picked up, enter rescue mode again, and create a file system on the new partition.

  bash-4.1# mkfs.ext4 /dev/sda1
  Filesystem label=
  OS type: Linux
  Block size=1024 (log=0)
  Fragment size=1024 (log=0)
  Stride=0 blocks, Stripe width=0 blocks
  32128 inodes, 128488 blocks
  6424 blocks (5.00%) reserved for the super user
  First data block=1
  Maximum filesystem blocks=67371008
  16 block groups
  8192 blocks per group, 8192 fragments per group
  2008 inodes per group
  Superblock backups stored on blocks:
          8193, 24577, 40961, 57345, 73729
  Writing inode tables: done
  Creating journal (4096 blocks): done
  Writing superblocks and filesystem accounting information: done
  This filesystem will be automatically checked every 38 mounts or
  180 days, whichever comes first. Use tune2fs -c or -i to override.

Install the Kernel and Image Files

Mount the new partition on /boot, then install the kernel and image files using the method described earlier.

  bash-4.1# chroot /mnt/sysimage
  sh-4.1# mount /dev/sda1 /boot
  sh-4.1# mount -o loop /dev/sr0 /media
  sh-4.1# cd /media/Server/Packages
  sh-4.1# rpm -ivh --force kernel-2.6.32-71.el6.x86_64.rpm
  warning: kernel-2.6.32-71.el6.x86_64.rpm: \
  Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
  Preparing...                ########################################### [100%]
     1:kernel                 ########################################### [100%]

Install GRUB

Install GRUB on disk sda.

  sh-4.1# grub-install /dev/sda
  Installation finished. No error reported.
  This is the contents of the device map /boot/grub/device.map.
  Check if this is correct or not. If any of the lines is incorrect,
  fix it and re-run the script `grub-install'.
  (fd0)   /dev/fd0
  (hd0)   /dev/sda
  (hd1)   /dev/sdb

Edit the Boot Menu

Because a new partition was created, its UUID has changed. The blkid command lists the UUID of each partition.

  bash-4.1# blkid
  /dev/loop0: TYPE="squashfs"
  /dev/sda2: UUID="7b1e0fac-ff06-492c-848d-497e2a38c54e" TYPE="swap"
  /dev/sda3: UUID="ef89764e-04ff-4f26-ae82-dcab267ecc66" TYPE="ext4"
  /dev/sdb1: UUID="2b824352-df2a-44c6-a547-838d46f526fa" SEC_TYPE="ext2" TYPE="ext3"
  /dev/loop1: LABEL="RHEL_6.0 x86_64 Disc 1" TYPE="iso9660"
  /dev/sda1: UUID="cec964af-1618-48ff-ac33-4ef71b9d3265" TYPE="ext4"

In the output above, sda3 is the root partition. Edit /boot/grub/grub.conf so that the root= kernel argument carries its UUID; the entry reads as follows.

  title Red Hat Enterprise Linux 6
          root (hd0,0)
          kernel /vmlinuz-2.6.32-71.el6.x86_64 \
          root=UUID=ef89764e-04ff-4f26-ae82-dcab267ecc66 rhgb quiet
          initrd /initramfs-2.6.32-71.el6.x86_64.img

Update /etc/fstab

Likewise, the /boot entry in /etc/fstab must be updated with the new UUID of the /boot partition. The file then reads as follows.

  #
  # /etc/fstab
  # Created by anaconda on Sun Mar 18 04:35:07 2012
  #
  # Accessible filesystems, by reference, are maintained under '/dev/disk'
  # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
  #
  UUID=ef89764e-04ff-4f26-ae82-dcab267ecc66 /         ext4    defaults        1 1
  UUID=cec964af-1618-48ff-ac33-4ef71b9d3265 /boot     ext4    defaults        1 2
  UUID=7b1e0fac-ff06-492c-848d-497e2a38c54e swap      swap    defaults        0 0
  tmpfs                                     /dev/shm  tmpfs   defaults        0 0
  devpts                                    /dev/pts  devpts  gid=5,mode=620  0 0
  sysfs                                     /sys      sysfs   defaults        0 0
  proc                                      /proc     proc    defaults        0 0
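Editing the UUID by hand in a rescue shell is easy to get wrong, so the change can also be scripted. The sketch below is one possible approach, not the method used in this article: `set_fstab_uuid` is a helper name invented here, the all-zero "old" UUID is a placeholder, and the function is run against a scratch copy rather than the real /etc/fstab.

```shell
# Sketch: set the UUID on the fstab line whose mount point matches exactly.
# Note: assigning $1 makes awk rebuild that line with single spaces.
set_fstab_uuid() {
  fstab=$1
  mountpoint=$2
  uuid=$3
  awk -v mp="$mountpoint" -v id="$uuid" '
    $1 ~ /^UUID=/ && $2 == mp { $1 = "UUID=" id }
    { print }
  ' "$fstab" > "$fstab.new" && mv "$fstab.new" "$fstab"
}

# Scratch fstab: the /boot line still carries a stale (placeholder) UUID.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
fstab="$work/fstab"
printf '%s\n' \
  'UUID=ef89764e-04ff-4f26-ae82-dcab267ecc66 / ext4 defaults 1 1' \
  'UUID=00000000-0000-0000-0000-000000000000 /boot ext4 defaults 1 2' \
  > "$fstab"

set_fstab_uuid "$fstab" /boot cec964af-1618-48ff-ac33-4ef71b9d3265

boot_line=$(grep ' /boot ' "$fstab")
root_line=$(grep ' / ' "$fstab")
echo "$boot_line"
```

On the real system the new value could be read with `blkid -s UUID -o value /dev/sda1` instead of being typed in, and it is worth keeping a copy of the original file before modifying it.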

The recovery steps are now complete. After the host is rebooted, the GRUB menu shows the system entry we configured.

Figure 9. GRUB menu

The /boot partition is now restored and the system boots normally.

Figure 10. System startup

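Summary

This article described common Linux disaster recovery techniques, together with the order in which the recovery steps should be applied after a serious failure, so that a Linux system can be restored safely when disaster strikes.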