Lab topology:

[Figure: lab topology diagram]


Symptoms:

1. When every PC's cross-subnet next hop points to the 3750X, all networks can reach each other and nothing abnormal happens.

2. When every PC's cross-subnet next hop points to the N3K VIP, the following problems appear:

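(1) When PC1's cross-subnet next hop points to the N3K VIP, no other machine that uses the N3K VIP as its next hop can reach other subnets. (This can be triggered by any such machine; in the lab it happened to be PC1. As long as PC1's cross-subnet next hop did not point to the N3K, the other machines in the same subnet, even other VMs on the same physical host, could reach other subnets, though with some packet loss.)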

(2) Cross-subnet traffic occasionally drops packets, and data transfers reach only a few hundred K/s.

(3) Machines outside VLAN 200 whose cross-subnet next hop points to the 3750X cannot reach the 192.168.253.0 subnet.


Troubleshooting:

1. Checked routing and HSRP on the N3K: both were normal.

2. With PC1's cross-subnet next hop pointing to the N3K VIP, sent 30 pings from PC1 to PC2 while running tcpdump on both PC1 and PC2:

(1) The capture on PC1 shows 30 packets sent to PC2 and nothing received from PC2.

(2) The capture on PC2, shown below, contains only 4 packets from PC1, and PC2 replied to each of them.

[Figure: tcpdump capture on PC2]


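(3) In the end I contacted Cisco TAC. Their packet capture on the standby N3K showed that traffic passing through the N3K was not being forwarded by the forwarding ASIC but was being punted to the CPU. After checking the relevant documentation, their answer was that this is a bug in this NX-OS release. The version I was running was 6.0(2)A6(3).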
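The PC1/PC2 comparison in (1) and (2) boils down to matching ICMP sequence numbers across two captures: requests that never arrive point to loss on the forward path, and replies that were sent but never came back point to loss on the return path. The sketch below assumes the captures were saved as plain tcpdump -n text output at default verbosity (lines like "IP a.b.c.d > e.f.g.h: ICMP echo request, id N, seq M, length 64"); the function names and the IP addresses are hypothetical, and the sample data is synthetic, not the real lab capture.

```python
import re

# Matches default-verbosity tcpdump -n lines for ICMP echo traffic, e.g.
# "12:00:01.000000 IP 10.0.1.10 > 10.0.2.20: ICMP echo request, id 1, seq 1, length 64"
ICMP_RE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): ICMP echo (?P<kind>request|reply), "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)


def echo_seqs(lines, kind, src, dst):
    """Return the set of echo sequence numbers of `kind` sent from src to dst."""
    seqs = set()
    for line in lines:
        m = ICMP_RE.search(line)
        if m and m["kind"] == kind and m["src"] == src and m["dst"] == dst:
            seqs.add(int(m["seq"]))
    return seqs


def compare_captures(sender_lines, receiver_lines, sender_ip, receiver_ip):
    """Compare captures taken on the pinging host and on the pinged host."""
    sent = echo_seqs(sender_lines, "request", sender_ip, receiver_ip)
    arrived = echo_seqs(receiver_lines, "request", sender_ip, receiver_ip)
    replied = echo_seqs(receiver_lines, "reply", receiver_ip, sender_ip)
    got_back = echo_seqs(sender_lines, "reply", receiver_ip, sender_ip)
    return {
        "lost_forward": sorted(sent - arrived),            # requests that never reached the target
        "replied_but_lost_back": sorted(replied - got_back),  # replies that never made it back
    }


if __name__ == "__main__":
    pc1, pc2 = "10.0.1.10", "10.0.2.20"  # hypothetical addresses
    req = "12:00:{:02d}.000000 IP {} > {}: ICMP echo request, id 1, seq {}, length 64"
    rep = "12:00:{:02d}.000100 IP {} > {}: ICMP echo reply, id 1, seq {}, length 64"
    pc1_cap = [req.format(s, pc1, pc2, s) for s in range(1, 31)]
    # Synthetic pattern: only seq 1-4 reach PC2, and PC2 answers each of them.
    pc2_cap = [req.format(s, pc1, pc2, s) for s in range(1, 5)]
    pc2_cap += [rep.format(s, pc2, pc1, s) for s in range(1, 5)]
    result = compare_captures(pc1_cap, pc2_cap, pc1, pc2)
    print(len(result["lost_forward"]), result["replied_but_lost_back"])  # 26 [1, 2, 3, 4]
```

On data shaped like this, the script reports loss in both directions, which is why the captures pointed at the device in the middle (the N3K) rather than at either PC.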

Solution:

Bounce (shut / no shut) all SVI interfaces, then run the 'no autostate' command on each SVI.
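With many SVIs, it helps to generate the workaround rather than type it by hand. The sketch below only builds the NX-OS command list (interface vlan, shutdown, no shutdown, no autostate) for a given set of VLANs; the function name and the VLAN IDs in the example are hypothetical, and pushing the commands to the switch (by paste, console, or an automation tool) is left out.

```python
def svi_workaround_commands(vlans):
    """Build NX-OS commands that bounce each SVI and then disable autostate on it."""
    cmds = ["configure terminal"]
    for vlan in vlans:
        cmds += [
            f"interface vlan {vlan}",
            "shutdown",      # take the SVI down...
            "no shutdown",   # ...and back up; per CSCup65482 this stops the incorrect CPU punting
            "no autostate",  # keep the SVI up even if every L2 port in the VLAN goes down
        ]
    cmds.append("end")
    return cmds


if __name__ == "__main__":
    for line in svi_workaround_commands([200, 201]):  # hypothetical VLAN IDs
        print(line)
```

Because each SVI goes down briefly, it is safer to apply these commands from the console or an out-of-band management session than over one of the SVIs being bounced.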


Bug link: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCup65482/?reffering_site=dumpcr

Original bug text:

Nexus3500: Traffic incorrectly punted to CPU matching copp-s-l3mtufail

CSCup65482


Description

Symptom:
Traffic flowing through the switch may get punted to the CPU, matching the 'copp-s-l3mtufail' class in the CoPP.

Conditions:
Nexus 3500 switch running one of the affected releases
AND
This issue is triggered after SVI(s) flap or going down.
On SVI flap, due to this bug, MTU value is getting misprogrammed.

Workaround:
Shut / no shut of VLAN SVIs stop the traffic incorrectly sent to CPU.

Further Problem Description:
SVI flaps when all the Layer2 interfaces in that vlan goes down.
Configure 'no autostate' on SVI(s) to avoid the issue from happening.
Command Reference:
http://www.cisco.com/en/US/docs/switches/datacenter/nexus3000/sw/interfaces/503_U5_1/b_Cisco_n3k_Interfaces_Configuration_Guide_503_u5_1_chapter_010.html#task_FC25C8615CC443F28DA237782DD9B0A0
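To make the bug description more concrete, here is a deliberately simplified model, not Cisco's actual ASIC logic: the hardware compares each routed packet against the egress MTU programmed for the SVI, switches it if it fits, and otherwise punts it to the CPU, where it matches the copp-s-l3mtufail CoPP class. The bad MTU value (64) and the packet-size mix are hypothetical; the bug record only says the MTU is misprogrammed after an SVI flap.

```python
CORRECT_MTU = 1500       # default SVI MTU
MISPROGRAMMED_MTU = 64   # hypothetical bad value left in hardware after an SVI flap


def forwarding_path(packet_len, programmed_mtu):
    """Simplified model: packets longer than the programmed egress MTU fail the
    hardware MTU check and are punted to the CPU instead of being switched."""
    if packet_len <= programmed_mtu:
        return "hardware"
    return "cpu (copp-s-l3mtufail)"


def punted_share(packet_lens, programmed_mtu):
    """Fraction of packets that the model sends to the CPU."""
    punted = sum(1 for n in packet_lens if forwarding_path(n, programmed_mtu) != "hardware")
    return punted / len(packet_lens)


if __name__ == "__main__":
    # Hypothetical mix: 84-byte IP packets from a default Linux ping
    # (56-byte payload + 8-byte ICMP header + 20-byte IP header) and 1500-byte bulk data.
    sizes = [84] * 30 + [1500] * 70
    print(punted_share(sizes, CORRECT_MTU))        # 0.0
    print(punted_share(sizes, MISPROGRAMMED_MTU))  # 1.0
```

In this model a wrong MTU moves essentially all transit traffic onto the CPU path, and since CoPP rate-limits what reaches the CPU, that is consistent with the symptoms above: most pings lost and throughput stuck at a few hundred K/s.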

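(4) Retested after applying the workaround: cross-subnet transfers now exceed 40M/s, and the strange failure when PC1's cross-subnet next hop points to the N3K VIP is gone.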

(5) Retested: PC3 and PC4 still could not reach each other. Ran tcpdump on both PC3 and PC4, then sent 10 pings from PC3 to PC4.

Result: PC4 received all 10 packets and replied to them, but PC3 received nothing back from PC4.

[Figure: tcpdump captures on PC3 and PC4]

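Since PC4's next hop is the 3750X, I logged in to the 3750X and enabled debugging, which showed that ICMP packets destined for PC3 were being redirected to 172.16.101.178.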

[Figure: 3750X debug output showing ICMP packets for PC3 redirected to 172.16.101.178]

The 3750X routing table contained a route for 192.168.253.0 with next hop 172.16.101.178. After that route was deleted, the network returned to normal.
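The stale route won because routers pick the most specific matching prefix (longest prefix match). The sketch below reproduces that lookup with Python's standard ipaddress module. Several details are assumptions: the /24 length of the 192.168.253.0 route, the default route and its next hop 192.0.2.1 (a documentation address standing in for whatever the correct path was), and PC3's address, which is hypothetical and assumed to sit in 192.168.253.0/24.

```python
import ipaddress


def routes_from(pairs):
    """Turn (prefix string, next hop) pairs into (IPv4Network, next hop) tuples."""
    return [(ipaddress.ip_network(prefix), nh) for prefix, nh in pairs]


def lookup(routes, dst):
    """Return the next hop of the most specific route containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, nh) for net, nh in routes if addr in net]
    if not matches:
        return None
    _, nh = max(matches, key=lambda m: m[0].prefixlen)  # longest prefix wins
    return nh


if __name__ == "__main__":
    table = routes_from([
        ("0.0.0.0/0", "192.0.2.1"),              # hypothetical correct path
        ("192.168.253.0/24", "172.16.101.178"),  # stale route found on the 3750X (/24 assumed)
    ])
    pc3 = "192.168.253.10"                       # hypothetical PC3 address
    print(lookup(table, pc3))  # 172.16.101.178
    table = [r for r in table if r[1] != "172.16.101.178"]  # delete the stale route
    print(lookup(table, pc3))  # 192.0.2.1
```

As long as the more specific stale route exists, traffic for PC3 follows it regardless of any broader route; removing it lets the lookup fall back to the intended path.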