Preface
1. Boot Failure
The system would not boot. The kernel panicked during startup:
```
Uncompressing Linux Ok, booting the kernel.
audit(1297269214.612:0): initialized
ide2: I/O resource 0x3F6-0x3F6 not free.
ide2: ports already in use, skipping probe
Red Hat nash version 4.1.18 starting
File descriptor 3 left open
  Reading all physical volumes. This may take a while
  /dev/hda: open failed: No medium found
  Found volume group "VolGroup_ID_17253" using metadata type lvm2
File descriptor 3 left open
  8 logical volume(s) in volume group "VolGroup_ID_17253" now active
File descriptor 3 left open
VFS: Can't find ext3 filesystem on dev dm-0.
mount: error 22 mounting ext3
mount: error 2 mounting none
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!
```
```
find . -name "*.php" -exec rm -f {} \;
```
2. Preparing the Repair Environment
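My friend was a little panicked at that point. He believed his own mistake had brought the server down and was not sure what to do. I was busy that day and planned to come back later to help him. As the following steps show, he went on to perform a number of dangerous operations. I will point them out one by one, and anyone facing a similar situation should be careful to avoid them.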
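He should have worked all of this out at home the night before. Instead, he clearly took a shortcut and practiced directly on the broken system. If he had not been lucky, and if the damage had not been relatively minor, this string of mistakes could well have caused irreversible loss.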
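Windows users may feel lost in front of a bare command line, but a pretty interface often comes at a price. With some older LiveCDs, loading the graphical desktop and its assorted drivers and programs mistakenly wrote to the hard disk. Some people complained that simply booting the LiveCD caused secondary damage to the data on the disk, which left no hope of recovery. Recent LiveCD releases may no longer have this problem, but to be safe you should minimize any chance of writing to the disk. All of the repair work will be done on the command line anyway, so there is no reason to risk starting the graphical interface.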
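3. Confirming the Problem
He had prepared the discs he needed, but he had also done things he should not have done, which left me speechless. I suspected that a logical error had merely corrupted the superblock, which should not be a serious problem. Even so, his actions made me doubt whether the repair would succeed. This friend ignored my instructions completely and kept doing dangerous things he considered harmless. Sigh, the outlook was bleak.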
```
[root@localhost liveuser]# mount | grep LogVol
/dev/mapper/VolGroup_ID_17253-LogVol4 on /media/0edef924-567f-45fc-9609-51722cad6e9e type ext3 (rw,nosuid,nodev,uhelper=udisks)
/dev/mapper/VolGroup_ID_17253-LogVol7 on /media/ee0c40c6-d9d1-4a81-9806-9991621db1dd type ext3 (rw,nosuid,nodev,uhelper=udisks)
/dev/mapper/VolGroup_ID_17253-LogVolHome on /media/f524534e-3d24-4a22-b475-9e4b7dac0d35 type ext3 (rw,nosuid,nodev,uhelper=udisks)
/dev/mapper/VolGroup_ID_17253-LogVol6 on /media/12953c57-baba-4358-baeb-cdd17d6513a2 type ext3 (rw,nosuid,nodev,uhelper=udisks)
```
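The `uhelper=udisks` option in the output above shows that these logical volumes were mounted through udisks, typically by the desktop's automounter, and that every one of them is read-write (`rw`). This is exactly the kind of disk write warned about earlier. Here is a minimal sketch of a hypothetical shell function that lists the device-mapper volumes currently mounted read-write. It assumes the classic `DEV on DIR type FS (OPTS)` output format of `mount` and mount points that contain no spaces:

```shell
#!/bin/sh
# Hypothetical helper: read `mount` output on stdin and print the mount
# point of every /dev/mapper device that is mounted read-write.
# Assumes the "DEV on DIR type FS (OPTS)" format and no spaces in paths.
rw_mapper_mounts() {
    awk '$1 ~ /^\/dev\/mapper\// && $2 == "on" && $4 == "type" {
        opts = $6
        gsub(/[()]/, "", opts)          # "(rw,nosuid)" -> "rw,nosuid"
        n = split(opts, o, ",")
        for (i = 1; i <= n; i++)
            if (o[i] == "rw") { print $3; break }
    }'
}
```

One way to use it is `mount | rw_mapper_mounts | while read -r dir; do umount "$dir"; done`. If you still need to read from a volume, use `mount -o remount,ro "$dir"` in the loop instead.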
```
[root@localhost liveuser]# lvscan
  ACTIVE '/dev/VolGroup_ID_17253/LogVol3' [10.00 GiB] inherit
  ACTIVE '/dev/VolGroup_ID_17253/LogVol4' [1.06 GiB] inherit
  ACTIVE '/dev/VolGroup_ID_17253/LogVol7' [53.56 GiB] inherit
  ACTIVE '/dev/VolGroup_ID_17253/LogVol6' [5.38 GiB] inherit
  ACTIVE '/dev/VolGroup_ID_17253/LogVol1' [2.00 GiB] inherit
  ACTIVE '/dev/VolGroup_ID_17253/LogVol0' [2.00 GiB] inherit
  ACTIVE '/dev/VolGroup_ID_17253/LogVol2' [64.00 MiB] inherit
  ACTIVE '/dev/VolGroup_ID_17253/LogVolHome' [29.44 GiB] inherit
```
```
[root@localhost liveuser]# mkdir /media/myroot
[root@localhost liveuser]# mount -t ext3 /dev/mapper/VolGroup_ID_17253-LogVol3 /media/myroot
mount: wrong fs type, bad option, bad superblock on /dev/mapper/VolGroup_ID_17253-LogVol3,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
```
```
[root@localhost liveuser]# dmesg | tail -n 20
[ 343.583694] EXT3-fs (dm-3): mounted filesystem with ordered data mode
[ 343.585926] SELinux: initialized (dev dm-3, type ext3), uses xattr
[ 346.179128] EXT3-fs: barriers not enabled
[ 346.183702] kjournald starting. Commit interval 5 seconds
[ 346.189688] EXT3-fs (dm-4): using internal journal
[ 346.189694] EXT3-fs (dm-4): mounted filesystem with ordered data mode
[ 346.193216] SELinux: initialized (dev dm-4, type ext3), uses xattr
[ 348.911539] EXT3-fs: barriers not enabled
[ 348.918113] kjournald starting. Commit interval 5 seconds
[ 348.918151] EXT3-fs (dm-9): warning: mounting fs with errors, running e2fsck is recommended
[ 348.922722] EXT3-fs (dm-9): using internal journal
[ 348.922728] EXT3-fs (dm-9): mounted filesystem with ordered data mode
[ 348.922738] SELinux: initialized (dev dm-9, type ext3), uses xattr
[ 350.225535] EXT3-fs: barriers not enabled
[ 350.230730] kjournald starting. Commit interval 5 seconds
[ 350.236075] EXT3-fs (dm-5): using internal journal
[ 350.236081] EXT3-fs (dm-5): mounted filesystem with ordered data mode
[ 350.241386] SELinux: initialized (dev dm-5, type ext3), uses xattr
[ 1957.796112] EXT3-fs (dm-2): error: can't find ext3 filesystem on dev dm-2.
[ 2688.247855] EXT3-fs (dm-2): error: can't find ext3 filesystem on dev dm-2.
```
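dmesg identifies the failing volume by its kernel name, dm-2, while the commands above use LVM names. The boot log earlier complained about dm-0 instead. dm-N numbers are assigned as volumes are activated, so the same logical volume can receive a different number under the LiveCD than on the installed system. Below is a minimal sketch of mapping dm-N to LVM names through sysfs. It assumes a kernel that exposes `/sys/block/dm-*/dm/name`. `SYSFS_ROOT` is a hypothetical override added only so the function can be tried against a fake directory tree:

```shell
#!/bin/sh
# Hypothetical helper: print "dm-N -> LVM name" for each device-mapper
# device, reading /sys/block/dm-*/dm/name. SYSFS_ROOT (default /sys) is
# an assumption that exists only to make the function testable.
dm_names() {
    local root=${SYSFS_ROOT:-/sys} d
    for d in "$root"/block/dm-*; do
        [ -r "$d/dm/name" ] || continue    # skip if not a dm device
        printf '%s -> %s\n' "$(basename "$d")" "$(cat "$d/dm/name")"
    done
}
```

`dmsetup ls` gives similar information. It prints each mapped device's name together with its major and minor numbers.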
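4. Imaging the Damaged Disk
Running fsck writes to the disk, so we need to take an image backup of the disk first. If fsck's repairs then make things worse, we can still restore the original state.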
```
/dev/sdb1 on /media/BACKUP type fuseblk (rw,nosuid,nodev,allow_other,blksize=4096,default_permissions)
```
```
[root@localhost ~]# dd if=/dev/VolGroup_ID_17253/LogVol3 | gzip > /media/BACKUP/server_root_image.gz
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 666.429 s, 16.1 MB/s
```
```
[root@localhost ~]# ls -l /media/BACKUP/*.gz
-rwxrwxrwx. 1 liveuser liveuser 5943229016 Feb 10 17:29 /media/BACKUP/server_root_image.gz
```
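An image is only useful if it can be restored. The following minimal sketch rehearses the same `dd | gzip` pipeline, plus the reverse `gunzip | dd` restore, on a 4 MiB scratch file instead of the real LV. All file names are hypothetical stand-ins for the logical volume and the compressed image:

```shell
#!/bin/sh
# Rehearse the dd | gzip image backup and the matching restore on a
# scratch file. All names are hypothetical stand-ins.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
src="$work/fake_lv.img"         # stands in for the logical volume
img="$work/fake_lv.img.gz"      # stands in for the compressed image
restored="$work/restored.img"   # stands in for the restore target

# A 4 MiB stand-in "volume" filled with random bytes.
dd if=/dev/urandom of="$src" bs=1024 count=4096 2>/dev/null

# Backup: the same pipeline as above.
dd if="$src" 2>/dev/null | gzip > "$img"

# Check that the gzip stream itself is intact.
gzip -t "$img"

# Restore: decompress and write the data back with dd.
gunzip -c "$img" | dd of="$restored" 2>/dev/null

# Byte-for-byte comparison of the original and the restored data.
cmp "$src" "$restored" && echo "round trip OK"
```

On the real system, the restore direction would write to the logical volume and overwrite it entirely, so check the `of=` target twice before running it.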
5. Repair
First repair attempt:
```
[root@localhost liveuser]# fsck.ext3 -B 1024 /dev/mapper/VolGroup_ID_17253-LogVol3
e2fsck 1.41.12 (17-May-2010)
fsck.ext3: Superblock invalid, trying backup blocks
fsck.ext3: Bad magic number in super-block while trying to open /dev/mapper/VolGroup_ID_17253-LogVol3
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
```
```
[root@localhost liveuser]# dumpe2fs /dev/VolGroup_ID_17253/LogVol7 | grep -i superblock
dumpe2fs 1.41.12 (17-May-2010)
  Primary superblock at 0, Group descriptors at 1-4
  Backup superblock at 32768, Group descriptors at 32769-32772
  Backup superblock at 98304, Group descriptors at 98305-98308
  Backup superblock at 163840, Group descriptors at 163841-163844
  Backup superblock at 229376, Group descriptors at 229377-229380
  Backup superblock at 294912, Group descriptors at 294913-294916
  Backup superblock at 819200, Group descriptors at 819201-819204
  Backup superblock at 884736, Group descriptors at 884737-884740
  Backup superblock at 1605632, Group descriptors at 1605633-1605636
  Backup superblock at 2654208, Group descriptors at 2654209-2654212
  Backup superblock at 4096000, Group descriptors at 4096001-4096004
  Backup superblock at 7962624, Group descriptors at 7962625-7962628
  Backup superblock at 11239424, Group descriptors at 11239425-11239428
```
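The command above reads LogVol7, a healthy volume, to see where backup superblocks normally live. On an ext2/ext3 filesystem with the `sparse_super` feature, backup copies are kept in block group 1 and in every group whose number is a power of 3, 5 or 7. With 4 KiB blocks and 32768 blocks per group, group g starts at block g × 32768, and applying that rule reproduces the LogVol7 list exactly.

The sketch below is a hypothetical helper, not part of e2fsprogs, that computes those locations. It assumes `sparse_super` and a known block count and group size. LogVol3's later fsck summary reports 2621440 blocks, which is 80 groups, so only the first eight locations exist on it. Borrowing offsets from LogVol7 is therefore only valid if both volumes were created with the same block size and group size, which is an assumption here.

`mke2fs -n` is the documented way to print these locations without writing anything, provided the original mke2fs options are repeated. Be careful, though: forgetting `-n` would reformat the volume.

```shell
#!/bin/sh
# Hypothetical helper: list backup superblock locations of an ext2/ext3
# filesystem with sparse_super. Backups live in block group 1 and in
# every group whose number is a power of 3, 5 or 7.
# Usage: backup_superblocks TOTAL_BLOCKS BLOCKS_PER_GROUP [FIRST_DATA_BLOCK]
# FIRST_DATA_BLOCK is 0 for 4 KiB blocks and 1 for 1 KiB blocks.

# Succeeds if group number $1 holds a backup superblock.
is_sparse_group() {
    local n base
    [ "$1" -eq 1 ] && return 0
    for base in 3 5 7; do
        n=$1
        while [ $(( n % base )) -eq 0 ]; do
            n=$(( n / base ))
        done
        [ "$n" -eq 1 ] && return 0
    done
    return 1
}

backup_superblocks() {
    local total=$1 per_group=$2 first=${3:-0} groups g out=""
    # Number of block groups, rounded up.
    groups=$(( (total - first + per_group - 1) / per_group ))
    g=1
    while [ "$g" -lt "$groups" ]; do
        if is_sparse_group "$g"; then
            out="$out $(( g * per_group + first ))"
        fi
        g=$(( g + 1 ))
    done
    echo "${out# }"
}
```

For a filesystem with 1 KiB blocks (8192 blocks per group, first data block 1), `backup_superblocks 104420 8192 1` places the first backup at block 8193. That appears to be where the generic `e2fsck -b 8193 <device>` hint in the error message comes from.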
```
[root@localhost liveuser]# fsck.ext3 -B 1024 -b 32768 /dev/mapper/VolGroup_ID_17253-LogVol3
e2fsck 1.41.12 (17-May-2010)
fsck.ext3: Bad magic number in super-block while trying to open /dev/mapper/VolGroup_ID_17253-LogVol3
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
```
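A likely reason this attempt failed: `-B 1024` forces e2fsck to assume 1 KiB blocks, so `-b 32768` points at byte 32768 × 1024 = 32 MiB. LogVol3 uses 4 KiB blocks (10 GiB / 4096 = 2621440, matching the block count its fsck summary reports later). Its group-1 backup therefore sits at byte 32768 × 4096 = 128 MiB, and at 32 MiB e2fsck finds ordinary data instead of a superblock. Without `-B`, e2fsck tries the possible block sizes itself.

The hypothetical, read-only helper below checks for the ext2/3/4 magic value 0xEF53, which is stored little-endian 56 bytes into the superblock, at a given block and block size. It is one way to screen candidate locations before passing them to e2fsck:

```shell
#!/bin/sh
# Hypothetical helper: succeed if an ext2/3/4 superblock appears to start
# at block BLOCK when the block size is BSIZE bytes. s_magic sits 56 bytes
# into the superblock and holds 0xEF53 little-endian (bytes 53 ef).
# Only reads from DEVICE. The primary superblock is at BLOCK=1 BSIZE=1024.
# Usage: has_ext_magic DEVICE BLOCK BSIZE
has_ext_magic() {
    local dev=$1 block=$2 bsize=$3 magic
    magic=$(dd if="$dev" bs=1 skip=$(( block * bsize + 56 )) count=2 2>/dev/null |
            od -An -tx1 | tr -d ' \n')
    [ "$magic" = "53ef" ]
}
```

For example, `has_ext_magic /dev/VolGroup_ID_17253/LogVol3 98304 4096 && echo found` only reads from the device. Finding the magic number does not prove the rest of that backup superblock is intact. It only rules out obviously wrong offsets.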
```
[root@localhost liveuser]# fsck.ext3 -b 98304 /dev/VolGroup_ID_17253/LogVol3
e2fsck 1.41.12 (17-May-2010)
Superblock needs_recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal anyway.
/dev/VolGroup_ID_17253/LogVol3: recovering journal
Adding dirhash hint to filesystem.
Pass 1: Checking inodes, blocks, and sizes
Inode 81, i_blocks is 8, should be 0. Fix<y>?
```
```
[1]+  Stopped                 fsck.ext3 -b 98304 /dev/VolGroup_ID_17253/LogVol3
```
```
[root@localhost liveuser]# fsck.ext3 -y -b 98304 /dev/VolGroup_ID_17253/LogVol3
...
Free blocks count wrong for group #78 (32254, counted=5049).
Fix? yes
Free blocks count wrong for group #79 (32254, counted=4724).
Fix? yes
Free blocks count wrong (2566343, counted=1869026).
Fix? yes
Free inodes count wrong for group #0 (16373, counted=16288).
Fix? yes
...
/dev/VolGroup_ID_17253/LogVol3: ***** FILE SYSTEM WAS MODIFIED *****
/dev/VolGroup_ID_17253/LogVol3: 229199/1310720 files (1.6% non-contiguous), 752414/2621440 blocks
```
```
[root@localhost /]# fsck.ext3 /dev/VolGroup_ID_17253/LogVolHome
e2fsck 1.41.12 (17-May-2010)
/dev/VolGroup_ID_17253/LogVolHome contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup_ID_17253/LogVolHome: 2301889/3859072 files (9.9% non-contiguous), 2554717/7716864 blocks
```
Next, try mounting LogVol3 again to see whether the problem is gone:
```
[root@localhost /]# mkdir /media/myroot
[root@localhost /]# mount -t ext3 /dev/VolGroup_ID_17253/LogVol3 /media/myroot
```
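6. Backing Up Important Data
If we reboot and everything works, the system will start all of its services and begin serving requests, and from that moment the data starts to change. Booting the server before we know whether it is healthy, and without a backup, is dangerous. So we back up the important data first.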
```
[root@localhost /]# cd /media/myroot
[root@localhost myroot]# tar -czvf /media/BACKUP/www.tgz www
[root@localhost myroot]# tar -czvf /media/BACKUP/server_lampp.tgz opt/lampp
[root@localhost myroot]# tar -czvf /media/BACKUP/server_mysql.tgz opt/lampp/var/mysql
...
```
```
[root@localhost myroot]# ls -l /media/BACKUP
total 6287380
-rwxrwxrwx. 1 liveuser liveuser   92690926 Feb 10 19:35 server_lampp.tgz
-rwxrwxrwx. 1 liveuser liveuser   28670158 Feb 10 19:30 server_mysql.tgz
-rwxrwxrwx. 1 liveuser liveuser 5943229016 Feb 10 17:29 server_root_image.gz
-rwxrwxrwx. 1 liveuser liveuser  373677732 Feb 10 19:34 www.tgz
...
```
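Before rebooting, it is worth confirming that the archives can actually be read back. A truncated or corrupted archive would otherwise only be discovered when it is needed. Here is a minimal sketch, assuming a `tar` with `-z` support. `verify_backups` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: confirm that every .tgz archive in a backup
# directory is an intact gzip stream that tar can list.
# Prints one OK/BAD line per archive and fails if any archive is bad.
verify_backups() {
    local dir=$1 f bad=0
    for f in "$dir"/*.tgz; do
        [ -e "$f" ] || continue          # no archives: glob left unexpanded
        if gzip -t "$f" 2>/dev/null && tar -tzf "$f" >/dev/null 2>&1; then
            echo "OK   $f"
        else
            echo "BAD  $f"
            bad=$(( bad + 1 ))
        fi
    done
    [ "$bad" -eq 0 ]
}
```

Running `verify_backups /media/BACKUP` checks the `.tgz` files. The compressed disk image can be checked separately by running `gzip -t` on its `.gz` file.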
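7. Rebooting
The partitions had been repaired, fsck reported no problems on any of them, and everything that needed backing up had been backed up. Time to try rebooting the system. Fingers crossed...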
```
Remounting root filesystem in read-write mode:
Setting up Logical Volume Management:
Checking filesystems
/boot: clean, 38/26208 files, 15754/104420 blocks
/dev/VolGroup_ID_17253/LogVol4: clean, 22/139392 files, 12950/278528 blocks
/dev/VolGroup_ID_17253/LogVol7: clean, 132904/7028736 files, 882601/14041088 blocks
/dev/VolGroup_ID_17253/LogVol6: clean, 22314/704512 files, 168941/1409024 blocks
/dev/VolGroup_ID_17253/LogVolHome contains a file system with errors, check forced.
/dev/VolGroup_ID_17253/LogVolHome: Inode 1340876 is too big.
/dev/VolGroup_ID_17253/LogVolHome: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)
                                                           [FAILED]
*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
*** Warning -- SELinux is active
*** Disabling security enforcement for system recovery.
*** Run 'setenforce 1' to reenable.
Give root password for maintenance
(or type Control-D to continue):
```
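8. Afterword
One reason my friend was so nervous about this failure was that the system had no backup. The most recent one was several months old, so if the disk could not be recovered, all the work of those months would have been lost. Backing up on Linux is actually quite easy. The simplest approach is to use crontab to periodically compress the important data and then either transfer it to another server or copy it to a different physical disk. Unfortunately, few people at his company are familiar with Linux, so nobody had set this up.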
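The crontab approach described above can be as small as the following sketch. The script name, paths, schedule and seven-archive retention are all assumptions for illustration. DEST should be on a different physical disk, or be copied to another server afterwards:

```shell
#!/bin/sh
# Hypothetical nightly backup script, e.g. /usr/local/sbin/nightly_backup.sh.
# Example crontab entry (02:30 every day); both paths are assumptions:
#   30 2 * * * /usr/local/sbin/nightly_backup.sh /opt/lampp /mnt/backup_disk
set -eu

# backup_dir SRC DEST [KEEP]: write DEST/<name>-<timestamp>.tgz of SRC and
# keep only the newest KEEP archives (default 7) for that source.
backup_dir() {
    local src=$1 dest=$2 keep=${3:-7} name stamp old
    name=$(basename "$src")
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest"
    tar -czf "$dest/$name-$stamp.tgz" -C "$(dirname "$src")" "$name"
    # List this source's archives newest first and delete the surplus.
    ls -1t "$dest/$name"-*.tgz | tail -n +$(( keep + 1 )) |
    while read -r old; do
        rm -f "$old"
    done
}

# Run only when called with arguments, so the file can also be sourced.
if [ "$#" -ge 2 ]; then
    backup_dir "$@"
fi
```

If the data includes the data directory of a running MySQL server (such as `opt/lampp/var/mysql` here), copying the files while MySQL is running can produce an inconsistent backup. Dumping the databases with `mysqldump` first is the usual remedy.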