smartctl
hdparm
lshw
fdisk
badblock
软raid
mount /dev/md0 /opt
[root@localhost root]# cp /usr/share/doc/raidtools-1.00.3/raid*.conf.* /etc
[root@localhost root]# ls -l /etc/ |grep raid
[root@localhost root]# vi /etc/raid0.conf.sample
mkraid /dev/md0
mkfs.ext3 /dev/md0
lsraid -A -a /dev/md0
[root@localhost root]# more /proc/mdstat
不使用的时候请直接删除/etc/raidtab文件. # rm /etc/raidtab
有时想知道服务器上有几块磁盘,如果没有做raid,则可以简单使用fdisk -l就可以看到。但是做了raid呢,这样就看不出来了。那么如何查看服务器上做了raid?
windows:RAID卡厂商都有RAID安装程序与驱动的。在配置完RAID后,进WINDOWS系统,下载相应的RAID安装程序并安装。比如 LSI 1064E 在官网上就可以下载到。 或者HD tune可以查看基本的raid信息
linux:分软与硬
软件raid:只能通过Linux系统本身来查看cat /proc/mdstat,可以看到raid级别,状态等信息。
硬件raid:最佳的办法是通过已安装的raid厂商的管理工具来查看,有cmdline,也有图形界面。如Adaptec公司的硬件卡就可以通过下面的命令进行查看:
# /usr/dpt/raidutil -L all可以看到非常详细的信息。
当然更多情况是没有安装相应的管理工具,只能依靠Linux本身,一般有两种方式:
# dmesg |grep -i raid
# cat /proc/scsi/scsi
显示的信息差不多,raid的厂商,型号,级别,但无法查看各块硬盘的信息。
[root@coreserv log]# cat /proc/scsi/scsi
Attached devices:
Host: scsi6 Channel: 02 Id: 00 Lun: 00
Vendor: IBM Model: ServeRAID M1015 Rev: 2.13
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi7 Channel: 00 Id: 00 Lun: 00
Vendor: IBM SATA Model: DEVICE 81Y3672 Rev: SA81
Type: CD-ROM ANSI SCSI revision: 00
# fdisk -l
Disk /dev/sda: 145.9 GB, 145999527936 bytes
255 heads, 63 sectors/track, 17750 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 83 Linux
/dev/sda2 14 17750 142472452+ 8e Linux LVM
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: SEAGATE Model: ST3146356SS Rev: HS09
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: SEAGATE Model: ST3146356SS Rev: HS09
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi0 Channel: 01 Id: 00 Lun: 00
Vendor: Dell Model: VIRTUAL DISK Rev: 1028
Type: Direct-Access ANSI SCSI revision: 05
通过以上信息可以看出,该服务器有两块磁盘。品牌是希捷的,磁盘代号为 ST3146356SS,如果你熟悉细节磁盘的代号命名规则,你会轻易判定该磁盘大小为146G 。再根据fdisk 得出的结果可以判定,该服务器是拿两块146G的硬盘做的raid1.
不同的文件系统(xfs,reiserfs,ext3)都有自己的检测和修复工具。检测之前可以先使用dmesg命令查看有没有硬件I/O故障的日志,如果有,先用fsck看看是不是文件系统有问题,如果不是则可以使用下面介绍硬盘检测和优化方法来修复它。 grep "error" /va/log/messages*
--------------------------------------------------------------------------------------------------------------
使用SMART检测硬盘
SMART是一种磁盘自我分析检测技术,早在90年代末就基本得到了普及每一块硬盘(包括IDE、SCSI),在运行的时候都会将自身的若干参数记录下来,这些参数包括型号、容量、温度、密度、扇区、寻道时间、传输、误码率等。硬盘运行了几千小时后,很多内在的物理参数都会发生变化,某一参数超过报警阈值,则说明硬盘接近损坏,此时硬盘依然在工作,如果用户不理睬这个报警继续使用,那么硬盘将变得非常不可靠,随时可能故障。
启用SMART
SMART是和主板BIOS上相应功能配合的,要使用SMART,必须先进入到主板BIOS设置里边启动相关设置。一般从Pentium2级别起的主板,都支持SMART,BIOS启动以后,就是操作系统级别的事情了(Windows没有内置SMART相关工具,需要安装第三方工具软件),好在Linux上很早就有了SMART支持了,如果把Linux装在VMware等虚拟机上,在系统启动时候可以看到有个服务启动报错:smartd。这个服务器就是smart的daemon进程(因为vmware虚拟机的硬盘不支持SMART,所以报错)。smartd是一个守护进程(一个帮助程序),它能监视拥有自我监视,分析和汇报技术(Self-Monitoring, Analysis, and Reporting Technology - SMART)的硬盘。SMART体系使得硬盘能监视并汇报自己的运行状况.它的一个重要特性是能够预测失败,使得系统管理员能避免数据丢失。
[root@coreserv log]# rpm -qf /usr/sbin/smartctl
smartmontools-5.42-2.el6.x86_64
[root@coreserv log]# rpm -ql smartmontools
/etc/rc.d/init.d/smartd
/etc/smartd.conf
/etc/sysconfig/smartmontools
/usr/sbin/smartctl
/usr/sbin/smartd
/usr/sbin/update-smart-drivedb
[root@localhost ~]# smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
这是一个固态盘
[root@localhost ~]# smartctl -i /dev/sda
smartctl 5.43 2016-09-28 r4347 [x86_64-linux-2.6.32-431.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: Kingstek 120GB
Serial Number: AA000000000000001053
LU WWN Device Id: 0 000000 000000000
Firmware Version: 20150818
User Capacity: 120,034,123,776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ACS-2 (revision not indicated)
Local Time is: Tue Jan 8 09:26:49 2019 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
----------------------------------------------------------------------------------------------------------------------------------
使用badblocks检测硬盘坏块
badblocks命令可以检查磁盘装置中损坏的区块。执行该指令时须指定所要检查的磁盘装置,及此装置的磁盘区块数。
badblocks -s//显示进度 -v//显示执行详细情况 /dev/sda1
# badblocks -s -v /dev/sda
正在检查从 0 到 244198583的块
Checking for bad blocks (read-only test): ^C0.10% done, 0:04 elapsed
Interrupted at block 272896
$badblocks -s//显示进度 -w//以写去检测 -v//显示执行详细情况 /dev/sda2
# badblocks -w -s -v /dev/sda1
Checking for bad blocks in read-write mode
From block 0 to 25607577
Testing with pattern 0xaa: ^C0.73% done, 0:03 elapsed
注意,不能以写的方式检测已经挂载的硬盘
----------------------------------------------------------------------------------------------------------------------------
使用hdparm测试
yum install hdparm
测试硬盘读写速度
# hdparm -Tt /dev/sda
可以查看转速,型号
[root@kvm2 ~]# hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media
Model Number: ST1000DM003-1ER162
Serial Number: Z4YBD720
Firmware Revision: CC45
Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
[root@kvm2 ~]# hdparm -i /dev/sda
/dev/sda:
Model=ST1000DM003-1ER162, FwRev=CC45, SerialNo=Z4YBD720
---------------------------------------------------------------------------------------------------------------------
下载安装
下载地址:ftp://download2.boulder.ibm.com/ecc/sar/CMA/XSA/ibm_utl_sraidmr_megacli-8.00.48_linux_32-64.zip
或https://docs.broadcom.com/docs-and-downloads/raid-controllers/raid-controllers-common-files/8-07-06_MegaCLI.zip
在线下载:
wget ftp://download2.boulder.ibm.com/ecc/sar/CMA/XSA/ibm_utl_sraidmr_megacli-8.00.48_linux_32-64.zip
磁硬盘阵列后如何检测和监控硬盘健康状况?
MegaCli使用手册
wget https://docs.broadcom.com/docs-and-downloads/raid-controllers/raid-controllers-common-files/8-07-06_MegaCLI.zip
unzip -d me 8-07-06_MegaCLI.zip
cd linux
rpm -ivh MegaCli-8.07.06-1.noarch.rpm
cd /opt/MegaRAID/MegaCli/
./MegaCli64 -adpcount
./MegaCli64 -AdpAllInfo -aALL
[root@kvm1 MegaCli]# ./MegaCli64 -adpcount
[root@kvm1 MegaCli]# ./MegaCli64 -AdpAllInfo -aALL
[root@kvm1 MegaCli]# ./MegaCli64 -LdPdInfo -aALL
[root@kvm1 MegaCli]# ./MegaCli64 -LDInfo -Lall -aALL
[root@kvm1 MegaCli]# ./MegaCli64 -AdpBbuCmd -aALL
命令行具体使用
[root@kvm1 MegaCli]# ./MegaCli64 -AdpAllInfo -aALL
Adapter #0
==============================================================================
Versions
================
Product Name : ServeRAID M5210
Serial No : SV61224052
FW Package Build: 24.9.0-0029
Mfg. Data
================
Mfg. Date : 03/18/16
Rework Date : 00/00/00
Revision No : 04E
Battery FRU : N/A
Image Versions in Flash:
================
BIOS Version : 6.25.03.3_4.17.08.00_0x060E0301
FW Version : 4.290.00-4923
NVDATA Version : 3.1507.00-0011
Ctrl-R Version : 5.10-0710
Preboot CLI Version: 01.07-05:#%0000
Boot Block Version : 3.07.00.00-0002
Pending Images in Flash
================
None
PCI Info
================
Controller Id : 0000
Vendor Id : 1000
Device Id : 005d
SubVendorId : 1014
SubDeviceId : 0454
Host Interface : PCIE
ChipRevision : C0
Link Speed : 0
Number of Frontend Port: 0
Device Interface : PCIE
Number of Backend Port: 8
Port : Address
0 50000397081bdd32
1 50000397081b3932
2 5000c50096e01591
3 50000397a8430476
4 50000397a8430306
5 0000000000000000
6 0000000000000000
7 0000000000000000
HW Configuration
================
SAS Address : 500605b00ba2c280
BBU : Absent
Alarm : Absent
NVRAM : Present
Serial Debugger : Present
Memory : Present
Flash : Present
Memory Size : 1024MB
TPM : Absent
On board Expander: Absent
Upgrade Key : Present
Temperature sensor for ROC : Present
Temperature sensor for controller : Absent
ROC temperature : 58 degree Celsius
Settings
================
Current Time : 8:40:57 1/7, 2019
Predictive Fail Poll Interval : 300sec
Interrupt Throttle Active Count : 16
Interrupt Throttle Completion : 50us
Rebuild Rate : 30%
PR Rate : 30%
BGI Rate : 30%
Check Consistency Rate : 30%
Reconstruction Rate : 30%
Cache Flush Interval : 4s
Max Drives to Spinup at One Time : 2
Delay Among Spinup Groups : 12s
Physical Drive Coercion Mode : 1GB
Cluster Mode : Disabled
Alarm : Disabled
Auto Rebuild : Enabled
Battery Warning : Disabled
Ecc Bucket Size : 15
Ecc Bucket Leak Rate : 1440 Minutes
Restore HotSpare on Insertion : Disabled
Expose Enclosure Devices : Enabled
Maintain PD Fail History : Enabled
Host Request Reordering : Enabled
Auto Detect BackPlane Enabled : SGPIO/i2c SEP
Load Balance Mode : Auto
Use FDE Only : Yes
Security Key Assigned : No
Security Key Failed : No
Security Key Not Backedup : No
Default LD PowerSave Policy : Controller Defined
Maximum number of direct attached drives to spin up in 1 min : 10
Auto Enhanced Import : Yes
Any Offline VD Cache Preserved : No
Allow Boot with Preserved Cache : No
Disable Online Controller Reset : No
PFK in NVRAM : No
Use disk activity for locate : No
POST delay : 90 seconds
BIOS Error Handling : Stop On Errors
Current Boot Mode :Normal
Capabilities
================
RAID Level Supported : RAID0, RAID1, RAID5, RAID00, RAID10, RAID50, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span
Supported Drives : SAS, SATA
Allowed Mixing:
Mix in Enclosure Allowed
Status
================
ECC Bucket Count : 0
Limitations
================
Max Arms Per VD : 32
Max Spans Per VD : 8
Max Arrays : 128
Max Number of VDs : 64
Max Parallel Commands : 928
Max SGE Count : 60
Max Data Transfer Size : 8192 sectors
Max Strips PerIO : 42
Max LD per array : 64
Min Strip Size : 64 KB
Max Strip Size : 1.0 MB
Max Configurable CacheCade Size: 0 GB
Current Size of CacheCade : 0 GB
Current Size of FW Cache : 831 MB
Device Present
================
Virtual Drives : 3
Degraded : 0
Offline : 0
Physical Devices : 6
Disks : 5
Critical Disks : 0
Failed Disks : 0
Supported Adapter Operations
================
Rebuild Rate : Yes
CC Rate : Yes
BGI Rate : Yes
Reconstruct Rate : Yes
Patrol Read Rate : Yes
Alarm Control : No
Cluster Support : No
BBU : Yes
Spanning : Yes
Dedicated Hot Spare : Yes
Revertible Hot Spares : Yes
Foreign Config Import : Yes
Self Diagnostic : Yes
Allow Mixed Redundancy on Array : No
Global Hot Spares : Yes
Deny SCSI Passthrough : No
Deny SMP Passthrough : No
Deny STP Passthrough : No
Support Security : Yes
Snapshot Enabled : No
Support the OCE without adding drives : Yes
Support PFK : Yes
Support PI : Yes
Support Boot Time PFK Change : Yes
Disable Online PFK Change : Yes
Support LDPI Type1 : No
Support LDPI Type2 : No
Support LDPI Type3 : No
PFK TrailTime Remaining : 0 days 0 hours
Support Shield State : Yes
Block SSD Write Disk Cache Change: Yes
Support Online FW Update : Yes
Supported VD Operations
================
Read Policy : Yes
Write Policy : Yes
IO Policy : Yes
Access Policy : Yes
Disk Cache Policy : Yes
Reconstruction : Yes
Deny Locate : No
Deny CC : No
Allow Ctrl Encryption: No
Enable LDBBM : No
Support Breakmirror : Yes
Power Savings : No
Supported PD Operations
================
Force Online : Yes
Force Offline : Yes
Force Rebuild : Yes
Deny Force Failed : No
Deny Force Good/Bad : No
Deny Missing Replace : No
Deny Clear : No
Deny Locate : No
Support Temperature : Yes
Disable Copyback : No
Enable JBOD : No
Enable Copyback on SMART : Yes
Enable Copyback to SSD on SMART Error : Yes
Enable SSD Patrol Read : No
PR Correct Unconfigured Areas : Yes
Error Counters
================
Memory Correctable Errors : 0
Memory Uncorrectable Errors : 0
Cluster Information
================
Cluster Permitted : No
Cluster Active : No
Default Settings
================
Phy Polarity : 0
Phy PolaritySplit : 0
Background Rate : 30
Strip Size : 256kB
Flush Time : 4 seconds
Write Policy : WB
Read Policy : Adaptive
Cache When BBU Bad : Disabled
Cached IO : No
SMART Mode : Mode 6
Alarm Disable : No
Coercion Mode : 1GB
ZCR Config : Unknown
Dirty LED Shows Drive Activity : No
BIOS Continue on Error : 0
Spin Down Mode : None
Allowed Device Type : SAS/SATA Mix
Allow Mix in Enclosure : Yes
Allow HDD SAS/SATA Mix in VD : No
Allow SSD SAS/SATA Mix in VD : No
Allow HDD/SSD Mix in VD : No
Allow SATA in Cluster : No
Max Chained Enclosures : 16
Disable Ctrl-R : Yes
Enable Web BIOS : No
Direct PD Mapping : No
BIOS Enumerate VDs : Yes
Restore Hot Spare on Insertion : No
Expose Enclosure Devices : Yes
Maintain PD Fail History : Yes
Disable Puncturing : Yes
Zero Based Enclosure Enumeration : No
PreBoot CLI Enabled : No
LED Show Drive Activity : Yes
Cluster Disable : Yes
SAS Disable : No
Auto Detect BackPlane Enable : SGPIO/i2c SEP
Use FDE Only : Yes
Enable Led Header : No
Delay during POST : 0
EnableCrashDump : Yes
Disable Online Controller Reset : No
EnableLDBBM : No
Un-Certified Hard Disk Drives : Allow
Treat Single span R1E as R10 : No
Max LD per array : 64
Power Saving option : All power saving options are disabled
Default spin down time in minutes: 30
Enable JBOD : No
TTY Log In Flash : No
Auto Enhanced Import : Yes
BreakMirror RAID Support : Yes
Disable Join Mirror : No
Enable Shield State : Yes
Time taken to detect CME : 60s
Exit Code: 0x00