1) Environment
OS: Red Hat Enterprise Linux 4.6 x86
Cluster: RHCS, 2 nodes
Multipath software: EMC PowerPath 5.1 for Linux
Storage: EMC AX4-5, EMC CX300
The AX4-5 presents one LUN to the hosts; the CX300 presents two LUNs to the hosts.
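2) Problem description
After the LUN mappings were configured on the arrays, the two nodes were rebooted one after the other. Both nodes detected the mapped LUNs. Running fdisk -l to check the device names under which the OS sees the LUNs showed that the two nodes had assigned different names: node1 saw emcpowera, emcpowerc and emcpowerd, while node2 saw emcpowera, emcpowerb and emcpowerc. Judging by the sizes of the LUNs, the devices correspond as follows:
node1 emcpowera = node2 emcpowera
node1 emcpowerc = node2 emcpowerb
node1 emcpowerd = node2 emcpowerc
The two nodes are to form a cluster, and configuring shared storage in the cluster requires both nodes to see each LUN under the same device name.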
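Matching devices by size works for three LUNs but is easy to get wrong. A minimal sketch of a more reliable comparison is given here, assuming the "Pseudo name=" and "Logical device ID=" lines that powermt display dev=all prints (as in the sample output quoted in the reference section). The function names pp_map and pp_diff are hypothetical helpers, not part of PowerPath.

```shell
# Print "<logical device ID> <pseudo name>" for every device found in
# `powermt display dev=all` output read from stdin, sorted by ID.
pp_map() {
    awk '
        /^Pseudo name=/       { sub(/^Pseudo name=/, ""); name = $1 }
        /^Logical device ID=/ { sub(/^Logical device ID=/, ""); print $1, name }
    ' | sort -k1,1
}

# Compare two saved outputs (one file per node) and print every LUN
# whose pseudo name differs between them.
pp_diff() {
    a=$(mktemp)
    b=$(mktemp)
    pp_map < "$1" > "$a"
    pp_map < "$2" > "$b"
    join "$a" "$b" | awk '$2 != $3 { print $1 ": " $2 " vs " $3 }'
    rm -f "$a" "$b"
}
```

On each node the output would first be saved with something like powermt display dev=all > node1.txt, then the two files compared with pp_diff node1.txt node2.txt.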
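3) Analysis and troubleshooting
The device names on node2 are correct; node1 is the one with the problem, and it was not clear why emcpowerb had gone missing there.
On node1, run powermt display dev=all
Running emcpadm getfreepseudos -n 5 showed that emcpowerb on node1 was not in the list of free names.
The business system was about to go live, so there was no time for deeper analysis. Two approaches came to mind: first, remove the paths detected on node1 and reboot to see whether that fixes the problem; second, manually rename the devices on node2 to match node1.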
Approach 1:
powermt remove dev=all   # remove the currently detected paths
powermt config           # rediscover the paths
powermt display dev=all
reboot
The problem persisted; this did not fix it.
Approach 2:
On node2:
emcpadm getfreepseudos   # shows that emcpowerd is free
emcpadm rename -s emcpowerc -t emcpowerd
emcpadm rename -s emcpowerb -t emcpowerc
powermt save
reboot
Both nodes now see emcpowera, emcpowerc and emcpowerd, and the problem is solved.
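The order of the two renames matters: emcpowerc has to be moved to emcpowerd before emcpowerb can take the name emcpowerc. A small sketch of that ordering rule follows. It only prints commands and does not run them. The function name plan_renames is hypothetical, the emcpadm rename -s/-t syntax is the one shown in the reference section, and the sketch assumes single-letter names that are all being shifted upward.

```shell
# Read "<source> <target>" pairs on stdin and print the rename commands
# in descending order of source name, so that each target name has been
# vacated before it is reused. Assumes every rename shifts a name upward.
plan_renames() {
    sort -r -k1,1 | while read -r src dst; do
        # Skip blank lines and devices that already have the right name.
        [ "$src" = "$dst" ] && continue
        echo "emcpadm rename -s $src -t $dst"
    done
}
```

For the case above, printf 'emcpowerb emcpowerc\nemcpowerc emcpowerd\n' | plan_renames prints the emcpowerc rename first and the emcpowerb rename second, which is the order used on node2.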
 
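4) Conclusion
An emcpowerb device had existed on node1 during earlier testing. After that device was removed, the PowerPath configuration database was not updated, so emcpowerb still appeared to be in use.
Afterwards I looked up related articles and found that the problem can be tackled by forcibly deleting the PowerPath configuration files. The steps are as follows:
Stop the PowerPath service
# /etc/init.d/PowerPath stop
Back up the current configuration files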
# cp /etc/powermt.custom /etc/powermt.custom.old_config
# cp /etc/emcp_devicesDB.dat /etc/emcp_devicesDB.dat.old_config
# cp /etc/emcp_devicesDB.idx /etc/emcp_devicesDB.idx.old_config
Delete the PowerPath configuration files
# rm /etc/powermt.custom /etc/emcp_devicesDB.dat /etc/emcp_devicesDB.idx
Restart the PowerPath service
# /etc/init.d/PowerPath start
Save the PowerPath configuration
# powermt save
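Since the delete step is destructive, it can help to wrap the backup in a function that fails loudly when a copy does not succeed. The following is a sketch only: the three file names and the .old_config suffix come from the steps above, while the function name backup_pp_config and its optional directory argument are hypothetical and exist so the function can be tried on a scratch directory first.

```shell
# Copy each PowerPath configuration file found in directory $1
# (default /etc) to <name>.old_config. Files that do not exist are
# reported on stderr; a failed copy aborts with a non-zero status.
backup_pp_config() {
    dir=${1:-/etc}
    for f in powermt.custom emcp_devicesDB.dat emcp_devicesDB.idx; do
        if [ -f "$dir/$f" ]; then
            cp -p "$dir/$f" "$dir/$f.old_config" || return 1
        else
            echo "missing: $dir/$f" >&2
        fi
    done
}
```

Running backup_pp_config with no argument works on /etc; the rm step would follow only if the function returns success.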
5) References
root cause 1
In some cases, during installation of PowerPath and device reconfiguration, a server may skip a few "emcpower" devices due to devices that were removed.  PowerPath keeps track of devices and makes sure that the emcpower device names remain the same regardless of the underlying Linux /dev/sd# device.
Fix: steps for PowerPath 4.x
1) Make sure all I/O is stopped and all of the file systems to the array are unmounted.
2) Stop PowerPath
# /etc/init.d/PowerPath stop
3) Make a backup copy of the current PowerPath custom file just in case
# cp /etc/powermt.custom /etc/powermt.custom.old_config
4) Make a backup copy of the current PowerPath config dat file...just in case
# cp /etc/emcp_devicesDB.dat /etc/emcp_devicesDB.dat.old_config
5) Make a backup copy of the current PowerPath config idx file...just in case
# cp /etc/emcp_devicesDB.idx /etc/emcp_devicesDB.idx.old_config
6) Remove the old config files
# rm /etc/powermt.custom /etc/emcp_devicesDB.dat /etc/emcp_devicesDB.idx
7) Remove the /etc/emc/archive directory.
# rm -r /etc/emc/archive
8) Start PowerPath
# /etc/init.d/PowerPath start
9) Save the new configuration
# powermt save
In some cases with PowerPath 4.x this process will clean up the PowerPath devices, but they still will not be discovered in Bus-Target-LUN order, so synchronizing emcpower device numbers between two cluster nodes this way may not work.  In this case it is recommended that you present the devices to the node one at a time, in the order you want them to appear.
root cause 2
Devices were not added to the nodes in the same order.
Fix: steps for PowerPath 4.x
Use the emcpadm command to rename the emcpower pseudo devices so that the names match on both nodes.
1) Use the command below to determine the emcpower devices that are already in use
# emcpadm getused
PowerPath pseudo device names in use:
        Pseudo Device Name      Major# Minor#
                emcpowera         232      0
                emcpowerb         232     16
                emcpowerc         232     32
                emcpowerd         232     48
                emcpowere         232     64
                emcpowerg         232     96
2) Use the command below to determine the emcpower devices that are available
# emcpadm getfree -n 5 -b emcpowera
PowerPath pseudo device names not in use:
        Pseudo Device Name      Major# Minor#
                emcpowerf         232     80
                emcpowerh         232    112
                emcpoweri         232    128
                emcpowerj         232    144
                emcpowerk         232    160
3) Use the command below to rename a device
# emcpadm rename -s emcpowerg -t emcpowerf
4) The "emcpadm getused" command can now be used again to check the devices after the rename
# emcpadm getused
PowerPath pseudo device names in use:
        Pseudo Device Name      Major# Minor#
                emcpowera         232      0
                emcpowerb         232     16
                emcpowerc         232     32
                emcpowerd         232     48
                emcpowere         232     64
                emcpowerf         232     80
5) Note: to make sure that the actual volumes match between the two cluster nodes, run the "powermt display dev=all" command on each node in the cluster and compare the output.
# powermt display dev=all
Pseudo name=emcpowerf
CLARiiON ID=WRE00021500573 [Linux103]
Logical device ID=6006016022470A0084D8358B528BD911 [LUN 10]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==============================================================================
---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats -
## HW Path                 I/O Paths    Interf.   Mode    State  Q-IOs errors
==============================================================================
  2 lpfc                      sdg        SP A0     active  alive      0      0
  3 lpfc                      sdm        SP B0     active  alive      0      0
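A skipped name such as the missing emcpowerf in the first "emcpadm getused" listing can be spotted mechanically. The sketch below assumes the listing format shown in step 1 (device name in the first column) and single-letter names only. The function name pp_gaps is hypothetical.

```shell
# Read `emcpadm getused` output on stdin and print every single-letter
# emcpower name that is skipped below the highest name in use.
pp_gaps() {
    awk '
        $1 ~ /^emcpower[a-z]$/ {
            c = substr($1, 9, 1)      # the letter after "emcpower"
            used[c] = 1
            if (c > last) last = c    # track the highest letter seen
        }
        END {
            letters = "abcdefghijklmnopqrstuvwxyz"
            for (i = 1; i <= 26; i++) {
                c = substr(letters, i, 1)
                if (c > last) break
                if (!(c in used)) print "emcpower" c
            }
        }
    '
}
```

It would be used as emcpadm getused | pp_gaps on each node. An empty result means the names in use are contiguous from emcpowera upward.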