Contents

  • 1. Example symptoms
  • 2. hostbyte and driverbyte
  • 3. Hardware faults on the FC link
  • 4. Source code analysis

1. Example symptoms

  • 1.hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 16 08:06:53 localhost kernel: sd 11:0:0:0: [sdh] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 16 08:06:53 localhost kernel: sd 11:0:0:0: [sdh] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
  • 2.hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   37.404796] sd 0:0:0:0: [sda] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 37.404806] blk_update_request: I/O error, dev sda, sector 0
  • 3.hostbyte=DID_ERROR driverbyte=DRIVER_OK
Dec  6 18:12:13 localhost kernel: sd 20:0:0:0: [sdb] FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Dec 6 18:12:13 localhost kernel: blk_update_request: I/O error, dev sdb, sector 512
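When scanning many log lines like the ones above, it can help to pull out the hostbyte and driverbyte tokens programmatically. The following is a minimal sketch, not a kernel or library API: `extract_field` is a hypothetical helper name, and it assumes the field value ends at the first space or end of line, as in the samples shown.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration only).
 * Finds `key` (e.g. "hostbyte=") in `line` and copies the token that
 * follows it, up to the next space, newline, or end of string, into `out`.
 * Returns 1 if a non-empty token was copied, 0 otherwise.
 */
int extract_field(const char *line, const char *key, char *out, size_t outlen)
{
    const char *p;
    size_t n = 0;

    if (line == NULL || key == NULL || out == NULL || outlen == 0)
        return 0;

    p = strstr(line, key);
    if (p == NULL) {
        out[0] = '\0';
        return 0;
    }
    p += strlen(key); /* skip past "hostbyte=" or "driverbyte=" */

    /* Copy until a separator, leaving room for the terminating NUL. */
    while (p[n] != '\0' && p[n] != ' ' && p[n] != '\n' && n + 1 < outlen) {
        out[n] = p[n];
        n++;
    }
    out[n] = '\0';
    return n > 0;
}
```

Called with the key "hostbyte=" on the second sample line of case 1, this sketch copies DID_BAD_TARGET into the output buffer; with "driverbyte=" it copies DRIVER_OK.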

2. hostbyte and driverbyte

  • linux/include/scsi/scsi.h
/*
* Use these to separate status msg and our bytes
*
* These are set by:
*
* status byte = set from target device
* msg_byte = return status from host adapter itself.
* host_byte = set by low-level driver to indicate status.
* driver_byte = set by mid-level.
*/
#define status_byte(result) (((result) >> 1) & 0x7f)
#define msg_byte(result) (((result) >> 8) & 0xff)
#define host_byte(result) (((result) >> 16) & 0xff)
#define driver_byte(result) (((result) >> 24) & 0xff)
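To make the bit layout concrete, here is a small user-space sketch that applies the four macros above to a 32-bit result word. The macros are copied from the excerpt; `print_result_fields` is a hypothetical helper added only for illustration and does not exist in the kernel.

```c
#include <stdio.h>

/* Macros as quoted above from include/scsi/scsi.h. */
#define status_byte(result) (((result) >> 1) & 0x7f)
#define msg_byte(result) (((result) >> 8) & 0xff)
#define host_byte(result) (((result) >> 16) & 0xff)
#define driver_byte(result) (((result) >> 24) & 0xff)

/* Hypothetical helper: print each field of a SCSI result word in hex. */
void print_result_fields(unsigned int result)
{
    printf("result=0x%08x status=0x%02x msg=0x%02x host=0x%02x driver=0x%02x\n",
           result,
           status_byte(result),  /* bits 1..7 of the lowest byte */
           msg_byte(result),     /* bits 8..15 */
           host_byte(result),    /* bits 16..23 */
           driver_byte(result)); /* bits 24..31 */
}
```

For a result of 0x00040000, host_byte yields 0x04 and the other three fields are zero. Note that status_byte shifts right by one bit, so a raw low byte of 0x02 is reported as 0x01.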
  • The hostbyte code values have the following meanings:
/*
 * Host byte codes
 */

#define DID_OK 0x00 /* NO error */
#define DID_NO_CONNECT 0x01 /* Couldn't connect before timeout period */
#define DID_BUS_BUSY 0x02 /* BUS stayed busy through time out period */
#define DID_TIME_OUT 0x03 /* TIMED OUT for other reason */
#define DID_BAD_TARGET 0x04 /* BAD target. */
#define DID_ABORT 0x05 /* Told to abort for some other reason */
#define DID_PARITY 0x06 /* Parity error */
#define DID_ERROR 0x07 /* Internal error */
#define DID_RESET 0x08 /* Reset by somebody. */
#define DID_BAD_INTR 0x09 /* Got an interrupt we weren't expecting. */
#define DID_PASSTHROUGH 0x0a /* Force command past mid-layer */
#define DID_SOFT_ERROR 0x0b /* The low level driver just wish a retry */
#define DID_IMM_RETRY 0x0c /* Retry without decrementing retry count */
#define DID_REQUEUE 0x0d /* Requeue command (no immediate retry) also
 * without decrementing the retry count */
#define DID_TRANSPORT_DISRUPTED 0x0e /* Transport error disrupted execution
 * and the driver blocked the port to
 * recover the link. Transport class will
 * retry or fail IO */
#define DID_TRANSPORT_FAILFAST 0x0f /* Transport class fastfailed the io */
#define DID_TARGET_FAILURE 0x10 /* Permanent target failure, do not retry on
 * other paths */
#define DID_NEXUS_FAILURE 0x11 /* Permanent nexus failure, retry on other
 * paths might yield different results */
#define DID_ALLOC_FAILURE 0x12 /* Space allocation on the device failed */
#define DID_MEDIUM_ERROR 0x13 /* Medium error */
  • hostbyte
    [Figure: meaning of hostbyte values in Linux kernel I/O error messages]

  • driverbyte
    [Figure: meaning of driverbyte values in Linux kernel I/O error messages]
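The host byte codes listed in this section can be turned back into their symbolic names with a simple table lookup. The sketch below is a user-space illustration under stated assumptions: `host_byte_names` and `host_byte_name` are hypothetical names, not kernel symbols, and the table covers only the values 0x00 through 0x13 shown in the list above.

```c
#include <stddef.h>

/* Names indexed by host byte value, in the order of the DID_* list above. */
static const char *const host_byte_names[] = {
    "DID_OK",                  /* 0x00 */
    "DID_NO_CONNECT",          /* 0x01 */
    "DID_BUS_BUSY",            /* 0x02 */
    "DID_TIME_OUT",            /* 0x03 */
    "DID_BAD_TARGET",          /* 0x04 */
    "DID_ABORT",               /* 0x05 */
    "DID_PARITY",              /* 0x06 */
    "DID_ERROR",               /* 0x07 */
    "DID_RESET",               /* 0x08 */
    "DID_BAD_INTR",            /* 0x09 */
    "DID_PASSTHROUGH",         /* 0x0a */
    "DID_SOFT_ERROR",          /* 0x0b */
    "DID_IMM_RETRY",           /* 0x0c */
    "DID_REQUEUE",             /* 0x0d */
    "DID_TRANSPORT_DISRUPTED", /* 0x0e */
    "DID_TRANSPORT_FAILFAST",  /* 0x0f */
    "DID_TARGET_FAILURE",      /* 0x10 */
    "DID_NEXUS_FAILURE",       /* 0x11 */
    "DID_ALLOC_FAILURE",       /* 0x12 */
    "DID_MEDIUM_ERROR",        /* 0x13 */
};

/*
 * Hypothetical helper: extract bits 16..23 of a SCSI result word and
 * return the matching DID_* name, or "UNKNOWN" for values outside the table.
 */
const char *host_byte_name(unsigned int result)
{
    unsigned int hb = (result >> 16) & 0xff;
    size_t count = sizeof(host_byte_names) / sizeof(host_byte_names[0]);

    return hb < count ? host_byte_names[hb] : "UNKNOWN";
}
```

For example, a result of 0x70000 has host byte 0x07, so this lookup returns DID_ERROR, and 0x00040000 returns DID_BAD_TARGET.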

3. Hardware faults on the FC link

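  • A faulty optical module, fiber cable, or HBA card. Common symptoms:
    a. Bit errors on the link: replace the optical modules and the fiber cable along the entire link.
    b. The storage array reports an optical module alarm: replace the optical module.
    c. The host log reports errors. The most common is 0x70000, an internal HBA error (host byte 0x07, DID_ERROR).
    Example: UP_done:C0P1L2, r=70000 , MPP_SELECTION_TIMEOUT, sk=0, ASC/ASCQ=0/0, SN:92288176
  • Common HBA error return codes are as follows: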

4. Source code analysis