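Cisco 4500 Series Switch CPU Overload

Original work. Reposting is permitted, provided the original source, the author information and this notice are credited in the form of a hyperlink; otherwise legal liability will be pursued. http://unihan.blog.51cto.com/317747/141233

 

A Brief Look at CPU Overload on Cisco 4500 Series Switches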

 

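I: Causes of CPU overload:

There are many possible causes of CPU overload on a Cisco 4500 series switch. The most common is an excess of abnormal packets on the network: the core switch's CPU is kept busy handling and forwarding them until it runs overloaded. At our company, the following three situations have produced excessive abnormal packets:

1: Viruses (…, excessive DHCP protocol packets);

2: A Layer 2 network cabled into a loop (broadcast storm);

3: Improper use of test software (such software can continuously send broadcast or multicast packets);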
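Causes 2 and 3 both involve broadcast or multicast frames. As a rough aid when reading packet dumps, the hypothetical helper below sorts a destination MAC address into broadcast, IPv4 multicast (the 01-00-5E range), other group addresses, or unicast. It is a sketch of standard Ethernet addressing rules, not something the switch itself runs; the function name is an assumption for illustration.

```python
def classify_dst_mac(mac: str) -> str:
    """Classify a destination MAC such as '01-00-5E-00-00-01' (dash or colon separated)."""
    octets = [int(part, 16) for part in mac.replace(":", "-").split("-")]
    if len(octets) != 6:
        raise ValueError(f"not a 6-octet MAC address: {mac!r}")
    if all(o == 0xFF for o in octets):
        return "broadcast"        # FF-FF-FF-FF-FF-FF, flooded to the whole VLAN
    if octets[:3] == [0x01, 0x00, 0x5E] and octets[3] < 0x80:
        return "ipv4-multicast"   # 01-00-5E-00-00-00 .. 01-00-5E-7F-FF-FF
    if octets[0] & 0x01:
        return "other-multicast"  # I/G bit set: some other group address
    return "unicast"


if __name__ == "__main__":
    for mac in ("FF-FF-FF-FF-FF-FF", "01-00-5E-00-00-01", "00-E0-4C-B1-7F-4D"):
        print(mac, classify_dst_mac(mac))
```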

 

II: Commands useful during troubleshooting:

1: show processes cpu

2: show platform health

3: show platform cpu packet statistics

4: debug platform packet all receive buffer

5: show platform cpu packet buffered

6: show interfaces | include L2 |line |broadcast

7: show interfaces <interface> counters

8: show interface | include line |\/sec

9: (global configuration mode; source and destination must use the same session number)
   monitor session 1 source cpu
   monitor session 1 destination interface <interface>
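Commands 6 and 8 filter `show interfaces` down to each interface's status line plus its counter or rate lines. A sketch of ranking interfaces by input packet rate from that filtered output, assuming the usual IOS line formats (`<name> is up, line protocol is up` and `5 minute input rate N bits/sec, M packets/sec`); the function name and the sample text are hypothetical, not captured from this incident.

```python
import re

# Interface header, e.g. "GigabitEthernet4/47 is up, line protocol is up (connected)"
HEADER = re.compile(r"^(\S+) is .*line protocol is")
# Rate line, e.g. "  5 minute input rate 9000000 bits/sec, 850 packets/sec"
INPUT_RATE = re.compile(r"input rate (\d+) bits/sec, (\d+) packets/sec")


def rank_by_input_pps(output: str) -> list[tuple[str, int]]:
    """Return (interface, input packets/sec) pairs, busiest first."""
    rates: dict[str, int] = {}
    current = None
    for line in output.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        rate = INPUT_RATE.search(line)
        if rate and current is not None:
            rates[current] = int(rate.group(2))
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample in the shape of "show interface | include line |\/sec"
SAMPLE = """\
GigabitEthernet4/47 is up, line protocol is up (connected)
  5 minute input rate 9000000 bits/sec, 850 packets/sec
  5 minute output rate 1000 bits/sec, 2 packets/sec
GigabitEthernet4/48 is up, line protocol is up (connected)
  5 minute input rate 20000 bits/sec, 30 packets/sec
  5 minute output rate 1000 bits/sec, 2 packets/sec
"""

if __name__ == "__main__":
    print(rank_by_input_pps(SAMPLE))
```

The interface at the top of the list is the first candidate to inspect with `show interfaces <interface> counters`.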

 

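III: General remedies:

Each of the causes above calls for slightly different handling in practice. When viruses or test software generate abnormal packets, we need to find the source and block it; when the network is looped, we need to identify the port, or ports, forming the loop and deal with them.

Recently, at Plant 5, improper use of test software overloaded the CPU of the core 4500. Using this case, here is a rough outline of how it was handled: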

 

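1: After receiving the CPU overload alarm for the core 4500, log in and run show processes cpu to check each process's utilization. The Cat4k Mgmt processes, which are core processes, are using far too much CPU (Cat4k Mgmt LoPri alone is at 74.56% over five seconds).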

F5-4506-DOWN# show processes cpu
CPU utilization for five seconds: 87%/1%; one minute: 85%; five minutes: 85%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
   1     3643880  17411824        209  0.00%  0.00%  0.00%   0 Chunk Manager
   2       10624   1740948          6  0.00%  0.00%  0.00%   0 Load Meter
   3           0         1          0  0.00%  0.00%  0.00%   0 Deferred Events
   4           0         1          0  0.00%  0.00%  0.00%   0 CEF IPC Backgrou
   5    16913420   1579962      10704  0.00%  0.20%  0.18%   0 Check heaps
   6          28       143        195  0.00%  0.00%  0.00%   0 Pool Manager
   7           0         2          0  0.00%  0.00%  0.00%   0 Timers
  -------------   Output suppressed--------------------------
  33      316292   8740292         36  0.00%  0.00%  0.00%   0 Per-Second Jobs
  34     4661080    277884      16773  0.00%  0.06%  0.05%   0 Per-minute Jobs
  35   868816112 1758382511       494  6.85%  7.29%  7.30%   0 Cat4k Mgmt HiPri
  36   721412156 357038431       2020 74.56% 68.72% 68.57%   0 Cat4k Mgmt LoPri
  37      212616  10593419         20  0.00%  0.00%  0.00%   0 Galios Reschedul
  38           8        69        115  0.00%  0.00%  0.00%   0 IOS ACL Helper
  39           0         2          0  0.00%  0.00%  0.00%   0 NAM Manager
  ----------------  Output suppressed--------------------------
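Reading `show processes cpu` by eye works for one switch; the sketch below does the same two checks in Python. It picks the process with the highest 5-second utilization, and it verifies that uSecs matches Runtime(ms) × 1000 / Invoked (average microseconds per invocation), which is a cheap way to spot columns that were fused or shifted when output was copied. For example, PID 35 only parses as Runtime 868816112 and Invoked 1758382511, which reproduce its printed uSecs of 494. The regex and function names are assumptions fitted to the rows above, not a Cisco-supplied parser.

```python
import re

# One process row: PID, Runtime(ms), Invoked, uSecs, 5Sec%, 1Min%, 5Min%, TTY, name
ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<runtime>\d+)\s+(?P<invoked>\d+)\s+(?P<usecs>\d+)\s+"
    r"(?P<sec5>[\d.]+)%\s+(?P<min1>[\d.]+)%\s+(?P<min5>[\d.]+)%\s+\d+\s+(?P<name>.+?)\s*$"
)


def parse_process_rows(output: str) -> list[dict]:
    rows = []
    for line in output.splitlines():
        m = ROW.match(line)
        if m:  # header and "Output suppressed" lines do not match
            rows.append({
                "pid": int(m["pid"]),
                "name": m["name"],
                "runtime_ms": int(m["runtime"]),
                "invoked": int(m["invoked"]),
                "usecs": int(m["usecs"]),
                "sec5": float(m["sec5"]),
            })
    return rows


def busiest(rows: list[dict]) -> dict:
    """Process with the highest 5-second CPU percentage."""
    return max(rows, key=lambda r: r["sec5"])


def usecs_consistent(row: dict, tolerance: int = 1) -> bool:
    """uSecs should be about Runtime(ms) * 1000 / Invoked (microseconds per call)."""
    if row["invoked"] == 0:
        return row["usecs"] == 0
    expected = row["runtime_ms"] * 1000 // row["invoked"]
    return abs(expected - row["usecs"]) <= tolerance


SAMPLE = """\
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
  34     4661080    277884      16773  0.00%  0.06%  0.05%   0 Per-minute Jobs
  35   868816112 1758382511       494  6.85%  7.29%  7.30%   0 Cat4k Mgmt HiPri
  36   721412156 357038431       2020 74.56% 68.72% 68.57%   0 Cat4k Mgmt LoPri
"""

if __name__ == "__main__":
    rows = parse_process_rows(SAMPLE)
    top = busiest(rows)
    print(top["name"], top["sec5"])
    print(all(usecs_consistent(r) for r in rows))
```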

               

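2: Run show platform health to see utilization for the platform's individual processes. K2CpuMan Review is the heaviest, running at 91.31% against a target of 30%; this process is invoked when packets are forwarded through the CPU. At this point the picture becomes clearer: a large volume of packets is causing the trouble.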

 

F5-4506-DOWN#show platform health
                      %CPU   %CPU    RunTimeMax   Priority  Average %CPU  Total
                     Target Actual Target Actual   Fg   Bg 5Sec Min Hour  CPU
Lj-poll                1.00   0.01      2      0  100  500    0   0    0  13:45
GalChassisVp-review    3.00   0.20     10     16  100  500    0   0    0  88:44
S2w-JobEventSchedule  10.00   0.57     10      7  100  500    1   0    0  404:22
Stub-JobEventSchedul  10.00   0.00     10      0  100  500    0   0    0  0:00
StatValueMan Update    1.00   0.09      1      0  100  500    0   0    0  91:33
Pim-review             0.10   0.00      1      0  100  500    0   0    0  4:46
Ebm-host-review        1.00   0.00      8      4  100  500    0   0    0  14:01
Ebm-port-review        0.10   0.00      1      0  100  500    0   0    0  0:20
Protocol-aging-revie   0.20   0.00      2      0  100  500    0   0    0  0:01
Acl-Flattener          1.00   0.00     10      5  100  500    0   0    0  0:04
KxAclPathMan create/   1.00   0.00     10      5  100  500    0   0    0  0:21
KxAclPathMan update    2.00   0.00     10      6  100  500    0   0    0  0:05
KxAclPathMan reprogr   1.00   0.00      2      1  100  500    0   0    0  0:00
TagMan-InformMtegRev   1.00   0.00      5      0  100  500    0   0    0  0:00
TagMan-RecreateMtegR   1.00   0.00     10     14  100  500    0   0    0  0:18
K2CpuMan Review       30.00  91.31     30     92  100  500  128 119   84  13039:02
K2AccelPacketMan: Tx  10.00   2.30     20      0  100  500    2   2    2  1345:30
K2AccelPacketMan: Au   0.10   0.00      0      0  100  500    0   0    0  0:00
    --------------   Output suppressed--------------------------
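The reasoning in this step amounts to finding the platform process whose Actual %CPU is furthest above its Target %CPU. A sketch of that comparison over `show platform health` rows, assuming the process name occupies everything before the first decimal column; the function name and the `factor` parameter are illustrative assumptions.

```python
import re

# Name, then "%CPU Target" and "%CPU Actual" as the first two decimal columns
ROW = re.compile(r"^(?P<name>\S.*?)\s+(?P<target>\d+\.\d+)\s+(?P<actual>\d+\.\d+)\s")


def over_target(output: str, factor: float = 1.0) -> list[tuple[str, float, float]]:
    """Rows whose actual %CPU exceeds factor * target %CPU, worst overshoot first."""
    hits = []
    for line in output.splitlines():
        m = ROW.match(line)  # indented header lines never match (^\S)
        if m:
            target, actual = float(m["target"]), float(m["actual"])
            if actual > target * factor:
                hits.append((m["name"], target, actual))
    return sorted(hits, key=lambda h: h[2] - h[1], reverse=True)


SAMPLE = """\
                      %CPU   %CPU    RunTimeMax   Priority  Average %CPU  Total
                     Target Actual Target Actual   Fg   Bg 5Sec Min Hour  CPU
GalChassisVp-review    3.00   0.20     10     16  100  500    0   0    0  88:44
K2CpuMan Review       30.00  91.31     30     92  100  500  128 119   84  13039:02
K2AccelPacketMan: Tx  10.00   2.30     20      0  100  500    2   2    2  1345:30
"""

if __name__ == "__main__":
    print(over_target(SAMPLE))
```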

 

3: show platform cpu packet statistics shows a relatively large number of packets in the L2/L3Control queue waiting for CPU processing.

    

F5-4506-DOWN# sho platform cpu packet statistics
Packets Dropped In Hardware By CPU Subport (txQueueNotAvail)
  -------------   Output suppressed--------------------------
Packets Received by Packet Queue

Queue                  Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Esmp
L2/L3Control
Host Learning                  9303858         0         0         0          0
L3 Fwd High                       1535         0         0         0          0
L3 Fwd Medium                    19512         0         0         0          0
L3 Fwd Low                     3953395         0         0         0          0
L2 Fwd High                          7         0         0         0          0
 -------------   Output suppressed--------------------------
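The Total column holds cumulative counts, so a large Total alone does not show that a queue is busy right now. One way to check, sketched below, is to capture the command twice a known number of seconds apart and divide the difference by the interval. The parser skips rows that carry no counters (such as the bare Esmp and L2/L3Control lines); the function names are assumptions and the second snapshot in the sample is invented for illustration.

```python
import re

# Queue name followed by Total and four average columns
ROW = re.compile(r"^(?P<queue>\S.*?)\s+(?P<total>\d+)(?:\s+\d+){4}\s*$")


def parse_totals(output: str) -> dict[str, int]:
    totals = {}
    for line in output.splitlines():
        m = ROW.match(line)
        if m:
            totals[m["queue"]] = int(m["total"])
    return totals


def queue_rates(before: dict[str, int], after: dict[str, int],
                interval_s: float) -> dict[str, float]:
    """Packets per second per queue between two snapshots, highest first."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {q: (after[q] - before[q]) / interval_s for q in after if q in before}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))


FIRST = """\
Queue                  Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Esmp
Host Learning                  9303858         0         0         0          0
L3 Fwd Low                     3953395         0         0         0          0
"""

# Hypothetical second capture taken 10 seconds later
SECOND = """\
Host Learning                  9304858         0         0         0          0
L3 Fwd Low                     4003395         0         0         0          0
"""

if __name__ == "__main__":
    print(queue_rates(parse_totals(FIRST), parse_totals(SECOND), 10))
```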

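4: … IP, MAC and other related information. Then trace downstream, switch by switch, to locate the source of the abnormal packets; finally, block the source and watch the CPU utilization. In this case, the abnormal-packet information was captured with commands built into the Cisco device itself: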

F5-4506-DOWN# debug platform packet all receive buffer
platform packet debugging is on
F5-4506-DOWN# sho platform cpu packet buffered
Total Received Packets Buffered: 1024
-------------------------------------
Index 0:
100 days 18:19:59:900721 - RxVlan: 517, RxPort: Gi4/47
Priority: Normal, Tag: Dot1Q Tag, Event: Input Acl, Flags: 0x40, Size: 1362
 Type/Len 0x0800
Ip: ver:4 len:20 tos:0 totLen:1344 id:62005 fragOffset:0 ttl:1 proto:udp
    src: 192.168.1.100 dst: 224.0.0.1 firstFragment lastFragment
Remaining data:
 0: 0x4  0x9C 0x4  0xD2 0x5  0x2C 0x58 0x82 0x47 0x0
10: 0x45 0x1E 0x8A 0xDD 0xC2 0x72 0xA5 0xAA 0x1F 0xD4
20: 0x29 0x41 0x1C 0x2  0x2B 0x1A 0x8  0x1F 0x3E 0x0

Index 1:
100 days 18:19:59:901497 - RxVlan: 517, RxPort: Gi4/47
Priority: Normal, Tag: Dot1Q Tag, Event: Input Acl, Flags: 0x40, Size: 1362
Eth: Src 00-E0-4C-B1-7F-4D Dst 01-00-5E-00-00-01 Type/Len 0x0800
Ip: ver:4 len:20 tos:0 totLen:1344 id:62006 fragOffset:0 ttl:1 proto:udp
    src: 192.168.1.100 dst: 224.0.0.1 firstFragment lastFragment
Remaining data:
 0: 0x4  0x9C 0x4  0xD2 0x5  0x2C 0xB3 0x1A 0x47 0x0
10: 0x45 0x15 0xC7 0xD8 0x4F 0x2E 0x11 0x72 0x4E 0xF8
20: 0x43 0xA  0x29 0x23 0x48 0x20 0xFD 0xA0 0x3  0xFF

Index 2:
100 days 18:19:59:902274 - RxVlan: 517, RxPort: Gi4/47
Priority: Normal, Tag: Dot1Q Tag, Event: Input Acl, Flags: 0x40, Size: 1362
Eth: Src 00-E0-4C-B1-7F-4D Dst 01-00-5E-00-00-01 Type/Len 0x0800
Ip: ver:4 len:20 tos:0 totLen:1344 id:62007 fragOffset:0 ttl:1 proto:udp
    src: 192.168.1.100 dst: 224.0.0.1 firstFragment lastFragment
Remaining data:
 0: 0x4  0x9C 0x4  0xD2 0x5  0x2C 0x71 0x68 0x47 0x0
10: 0x45 0x1C 0xB7 0xF9 0x7D 0xBA 0x9F 0x2F 0xBA 0xEB
20: 0x26 0xC2 0xEA 0xA3 0x7E 0x5D 0x0  0x58 0x8  0x0
-------------   Output suppressed--------------------------
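With 1024 buffered packets, counting by hand is impractical. The sketch below splits the buffered output into one record per Index entry, then counts the most frequent (receive VLAN, receive port, source IP, destination IP) combination, which is the conclusion drawn by eye in the next paragraph. Source MACs are tallied separately because an entry may lack its Eth: line in copied output. The regular expressions and function names are assumptions fitted to the format shown above; other IOS releases may print these lines differently.

```python
import re
from collections import Counter

INDEX = re.compile(r"^Index \d+:")
RX = re.compile(r"RxVlan: (\d+), RxPort: (\S+)")
ETH = re.compile(r"Eth: Src (\S+) Dst (\S+)")
IP = re.compile(r"^src: (\S+) dst: (\S+)")


def parse_buffered(output: str) -> list[dict]:
    """Split 'show platform cpu packet buffered' text into one dict per Index entry."""
    packets: list[dict] = []
    for raw in output.splitlines():
        line = raw.strip()
        if INDEX.match(line):
            packets.append({})          # start a new packet record
        elif packets:
            pkt = packets[-1]
            if m := RX.search(line):
                pkt["vlan"], pkt["port"] = int(m[1]), m[2]
            elif m := ETH.search(line):
                pkt["src_mac"], pkt["dst_mac"] = m[1], m[2]
            elif m := IP.match(line):
                pkt["src_ip"], pkt["dst_ip"] = m[1], m[2]
    return packets


def top_flows(packets: list[dict], n: int = 3):
    """Most common (vlan, port, src_ip, dst_ip) tuples with their counts."""
    flows = Counter(
        (p.get("vlan"), p.get("port"), p.get("src_ip"), p.get("dst_ip")) for p in packets
    )
    return flows.most_common(n)


def src_macs(packets: list[dict]) -> Counter:
    return Counter(p["src_mac"] for p in packets if "src_mac" in p)


SAMPLE = """\
Index 0:
100 days 18:19:59:900721 - RxVlan: 517, RxPort: Gi4/47
 Type/Len 0x0800
    src: 192.168.1.100 dst: 224.0.0.1 firstFragment lastFragment
Index 1:
100 days 18:19:59:901497 - RxVlan: 517, RxPort: Gi4/47
Eth: Src 00-E0-4C-B1-7F-4D Dst 01-00-5E-00-00-01 Type/Len 0x0800
    src: 192.168.1.100 dst: 224.0.0.1 firstFragment lastFragment
Index 2:
100 days 18:19:59:902274 - RxVlan: 517, RxPort: Gi4/47
Eth: Src 00-E0-4C-B1-7F-4D Dst 01-00-5E-00-00-01 Type/Len 0x0800
    src: 192.168.1.100 dst: 224.0.0.1 firstFragment lastFragment
"""

if __name__ == "__main__":
    pkts = parse_buffered(SAMPLE)
    print(top_flows(pkts))
    print(src_macs(pkts))
```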

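From the output above we can conclude that a large volume of multicast packets (UDP to 224.0.0.1 with TTL 1) is entering the 4506-DOWN from VLAN 517 through port Gi4/47, with source IP 192.168.1.100 and source MAC 00-E0-4C-B1-7F-4D. From here, keep tracing downstream to the network access point of this source, confirm on site, and isolate it. If necessary, the traffic can be dropped directly on the 4506-UP.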
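Tracing the host downstream means looking the source MAC up in each access switch's MAC address table, and IOS expects Cisco's dotted notation rather than the dash-separated form printed in the packet dump. A minimal conversion helper is sketched below. The lookup command it builds is an assumption to check against the switch in question: newer IOS releases use `show mac address-table address <mac>`, while older ones use `show mac-address-table address <mac>`.

```python
import re


def to_cisco_mac(mac: str) -> str:
    """Convert '00-E0-4C-B1-7F-4D' (or colon-separated / plain hex) to '00e0.4cb1.7f4d'."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def lookup_command(mac: str) -> str:
    # Assumed newer-IOS syntax; older releases use "show mac-address-table address".
    return f"show mac address-table address {to_cisco_mac(mac)}"


if __name__ == "__main__":
    print(lookup_command("00-E0-4C-B1-7F-4D"))
```

Running the printed command on each switch in turn, and following the port it reports, leads hop by hop to the access port where the host is connected.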

 

 

This article is from the blog "michael xiao'blog"; please be sure to retain this source: http://unihan.blog.51cto.com/317747/141233