We have an old Huawei S5328 switch, and recently I noticed that SNMP polling sometimes failed to collect its traffic statistics.

After logging in to the switch, I found that CPU usage was rather high:

<Huawei>display cpu-usage
CPU Usage Stat. Cycle: 60 (Second)
CPU Usage            : 49% Max: 100%
CPU Usage Stat. Time : 2023-12-20  10:13:51 
CPU utilization for five seconds: 49%: one minute: 48%: five minutes: 44%
Max CPU Usage Stat. Time : 2023-12-20 09:11:46.

TaskName             CPU  Runtime(CPU Tick High/Tick Low)  Task Explanation
BOX                   0%         0/  3100a8       BOX Output                   
_TIL                  0%         0/       0       Infinite loop event task     
_EXC                  0%         0/       0       Exception Agent Task         
VIDL                 51%         0/7a3391c0       DOPRA IDLE                   
TICK                  0%         0/ 1774618                                    
bcmRX                 4%         0/ adb0c8a       bcmRX                        
CLKI                  0%         0/       0       CLKI                         
RTMR                  0%         0/  993d8a       RTMR                         
DEV                   0%         0/       0       DEV  Device                  
FCAT                  0%         0/   177d5       FCAT FECD task for catch
                                                  packet                       
DELM                  0%         0/    b3a6       DELMAC FOR STP               
FTS                  12%         0/1f0a127d       FTS                          
IPCQ                  0%         0/  b527cd       IPCQIPC task for single queue

Checking the log, I found CPU overload alarms:

Dec 20 2023 13:24:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[6]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=fib-hit, ExceededPacketCount=016039)
Dec 20 2023 13:24:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[7]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=052)
Dec 20 2023 13:17:37+08:00 JWK-5328 %%01VOSCPU/4/CPU_USAGE_RESUME(l)[8]:CPU utilization recovered to the normal range.
Dec 20 2023 13:15:35+08:00 JWK-5328 %%01VOSCPU/4/CPU_USAGE_HIGH(l)[9]:The CPU is overloaded(CpuUsage=100%, Threshold=95%), and the tasks with top three CPU occupancy are:
FTS  total      : 48%
SOCK  total      : 23%
bcmRX  total      : 9%
Dec 20 2023 13:14:44+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[10]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=fib-hit, ExceededPacketCount=022947)
Dec 20 2023 13:14:44+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[11]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0828)
Dec 20 2023 13:04:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[12]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=03931)
Dec 20 2023 13:04:42+08:00 JWK-5328 %%01INFO/4/SUPPRESS_LOG(l)[13]:Last message repeated 2 times.(InfoID=1082200067, ModuleName=SNMP, InfoAlias=SNMP_FAIL)
Dec 20 2023 13:04:31+08:00 JWK-5328 %%01SNMP/4/SNMP_FAIL(l)[14]:Failed to login through SNMP. (Ip=185.180.143.136, Times=5, Reason=the community was incorrect)
Dec 20 2023 12:54:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[15]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0638)
Dec 20 2023 12:44:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[16]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0519)
Dec 20 2023 12:34:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[17]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0971)
Dec 20 2023 12:24:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[18]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=01331)
Dec 20 2023 12:14:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[19]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0151)
Dec 20 2023 12:04:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[20]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=083)

The task using the most CPU was FTS (12% in the snapshot above, and 48% when the CPU_USAGE_HIGH alarm fired).
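Picking the busiest task out of `display cpu-usage` output can be scripted instead of read by eye. The sketch below is only an illustration, not a Huawei tool: it assumes the column layout shown above (task name at the start of the line, then an integer percentage), it assumes VIDL ("DOPRA IDLE") is idle time and should be ignored, and the helper name `busiest_tasks` is hypothetical.

```python
import re

# The first "display cpu-usage" task table from above, pasted as-is
# (trailing spaces trimmed).
SAMPLE = """\
TaskName             CPU  Runtime(CPU Tick High/Tick Low)  Task Explanation
BOX                   0%         0/  3100a8       BOX Output
_TIL                  0%         0/       0       Infinite loop event task
_EXC                  0%         0/       0       Exception Agent Task
VIDL                 51%         0/7a3391c0       DOPRA IDLE
TICK                  0%         0/ 1774618
bcmRX                 4%         0/ adb0c8a       bcmRX
CLKI                  0%         0/       0       CLKI
RTMR                  0%         0/  993d8a       RTMR
DEV                   0%         0/       0       DEV  Device
FCAT                  0%         0/   177d5       FCAT FECD task for catch
                                                  packet
DELM                  0%         0/    b3a6       DELMAC FOR STP
FTS                  12%         0/1f0a127d       FTS
IPCQ                  0%         0/  b527cd       IPCQIPC task for single queue
"""

# A task row: task name at column 0, whitespace, then an integer percentage.
# The header row and wrapped continuation lines (which start with spaces) never match.
ROW = re.compile(r"^(\S+)\s+(\d+)%")

# Assumption: VIDL ("DOPRA IDLE") is idle time, not real work.
IDLE_TASKS = {"VIDL"}


def busiest_tasks(text: str, top: int = 3) -> list[tuple[str, int]]:
    """Return up to `top` non-idle (task, cpu_percent) pairs, highest first."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m and m.group(1) not in IDLE_TASKS:
            rows.append((m.group(1), int(m.group(2))))
    # sort() is stable, so tasks with equal CPU keep their table order.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    for name, pct in busiest_tasks(SAMPLE):
        print(f"{name:<8}{pct:>3}%")
```

Run against the first capture, it ranks FTS (12%) first and bcmRX (4%) second, matching the reading above.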

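In addition, the log contained many CPCAR_DROP_MPU alarms with Protocol=ttl-expired, meaning the rate of TTL-expired packets sent to the CPU exceeded the CPCAR limit: the CPU was handling too many packets whose TTL had run out (a device has to process such packets in software, for example to generate ICMP Time Exceeded messages).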
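Tallying the CPCAR_DROP_MPU alarms per protocol makes the ttl-expired pattern easier to see. A minimal sketch, assuming the log format shown above and assuming ExceededPacketCount is a decimal count whose leading 0 is just padding (this has not been checked against Huawei documentation); `drops_by_protocol` is a hypothetical helper name.

```python
import re
from collections import Counter

# Log lines copied from above; the SNMP_FAIL line is kept to show it is skipped.
LOG = """\
Dec 20 2023 13:24:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[6]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=fib-hit, ExceededPacketCount=016039)
Dec 20 2023 13:24:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[7]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=052)
Dec 20 2023 13:14:44+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[10]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=fib-hit, ExceededPacketCount=022947)
Dec 20 2023 13:14:44+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[11]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0828)
Dec 20 2023 13:04:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[12]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=03931)
Dec 20 2023 13:04:31+08:00 JWK-5328 %%01SNMP/4/SNMP_FAIL(l)[14]:Failed to login through SNMP. (Ip=185.180.143.136, Times=5, Reason=the community was incorrect)
Dec 20 2023 12:54:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[15]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0638)
Dec 20 2023 12:44:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[16]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0519)
Dec 20 2023 12:34:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[17]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0971)
Dec 20 2023 12:24:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[18]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=01331)
Dec 20 2023 12:14:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[19]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0151)
Dec 20 2023 12:04:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[20]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=083)
"""

# Capture the protocol and ExceededPacketCount of each CPCAR drop alarm.
# "." does not cross newlines, so each match stays within one log line.
DROP = re.compile(r"CPCAR_DROP_MPU.*?Protocol=([\w-]+), ExceededPacketCount=(\d+)")


def drops_by_protocol(text: str) -> tuple[Counter, Counter]:
    """Return (packets over the limit, number of alarms), both keyed by protocol."""
    packets, alarms = Counter(), Counter()
    for m in DROP.finditer(text):
        proto = m.group(1)
        # Assumption: decimal count with a padding zero; int("052") == 52.
        packets[proto] += int(m.group(2))
        alarms[proto] += 1
    return packets, alarms


if __name__ == "__main__":
    packets, alarms = drops_by_protocol(LOG)
    for proto, total in packets.most_common():
        print(f"{proto:<12} alarms={alarms[proto]:<3} packets_over_limit={total}")
```

Under that reading of the counter, this excerpt contains nine ttl-expired alarms (8504 packets over the limit in total) and two fib-hit alarms (38986 packets).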

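On the upstream router I made the routes more specific: I deleted the large IP block that had been pointed at the downstream switch and replaced it with routes covering only the address segments actually in use. Checking again afterwards, CPU usage had come down, and FTS was using only 2%: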

<Huawei>display cpu-usage
CPU Usage Stat. Cycle: 60 (Second)
CPU Usage            : 16% Max: 100%
CPU Usage Stat. Time : 2023-12-20  13:36:02 
CPU utilization for five seconds: 16%: one minute: 39%: five minutes: 46%
Max CPU Usage Stat. Time : 2023-12-20 09:11:46.

TaskName             CPU  Runtime(CPU Tick High/Tick Low)  Task Explanation
BOX                   0%         0/  1ee2cc       BOX Output                   
_TIL                  0%         0/       0       Infinite loop event task     
_EXC                  0%         0/       0       Exception Agent Task         
VIDL                 84%         0/c945e100       DOPRA IDLE                   
TICK                  0%         0/  c2345c                                    
bcmRX                 0%         0/ 29dbe30       bcmRX                        
CLKI                  0%         0/       0       CLKI                         
RTMR                  0%         0/  3d575d       RTMR                         
DEV                   0%         0/       0       DEV  Device                  
FCAT                  0%         0/    d482       FCAT FECD task for catch
                                                  packet                       
DELM                  0%         0/    66fa       DELMAC FOR STP               
FTS                   2%         0/ 5da1963       FTS                          
IPCQ                  0%         0/  6d1610       IPCQIPC task for single queue
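The route refinement described above amounts to keeping routes for the in-use segments and dropping the rest of the block. A sketch with Python's standard `ipaddress` module; everything concrete here is hypothetical: 10.20.0.0/23 stands in for the real block, the two in-use subnets are made up, `<downstream-next-hop>` is a placeholder, and `refine`/`route_commands` are illustrative helper names.

```python
import ipaddress


def refine(block: str, in_use: list[str]):
    """Split `block` into in-use subnets (keep routing) and the unused remainder."""
    big = ipaddress.ip_network(block)
    # Merge overlapping or adjacent in-use prefixes first.
    used = list(ipaddress.collapse_addresses(ipaddress.ip_network(n) for n in in_use))
    for net in used:
        if not net.subnet_of(big):
            raise ValueError(f"{net} is not inside {big}")
    unused = [big]
    for net in used:
        remaining = []
        for piece in unused:
            if net.subnet_of(piece):
                # address_exclude() yields the CIDR blocks covering piece minus net.
                remaining.extend(piece.address_exclude(net))
            else:
                remaining.append(piece)
        unused = remaining
    return used, sorted(unused)


def route_commands(nets, next_hop: str) -> list[str]:
    """Render Huawei-style static routes: ip route-static <address> <mask> <next hop>."""
    return [f"ip route-static {n.network_address} {n.netmask} {next_hop}" for n in nets]


if __name__ == "__main__":
    # Hypothetical example: a /23 was routed downstream, only two small subnets are in use.
    used, unused = refine("10.20.0.0/23", ["10.20.0.0/25", "10.20.1.0/26"])
    print("\n".join(route_commands(used, "<downstream-next-hop>")))
    print("no longer routed downstream:", ", ".join(map(str, unused)))
```

With these made-up prefixes, only 10.20.0.0/25 and 10.20.1.0/26 keep a route to the downstream switch; the remaining 320 addresses (10.20.0.128/25, 10.20.1.64/26, 10.20.1.128/25) are no longer sent there.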

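Summary: the cause was that the upstream Layer 3 device routed IP segments that were not in use to the downstream Layer 3 device. Scanning, probing, and other malicious traffic from the Internet aimed at those addresses reached the downstream device, which had no more specific route for them and sent them back to the upstream device via its default route. The packets looped between the two devices until their TTL expired, which drove up the CPU load. The fix was to make the routing table more specific.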
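The loop can be reproduced with a toy model of the two routing tables. This is a simplified sketch, not a model of the real devices: the device names, the 10.20.0.0/23 block (standing in for the range routed downstream), the in-use 10.20.0.0/25 segment, and the `lookup`/`forward` helpers are all hypothetical. At each hop it applies longest-prefix match and decrements the TTL once per forwarding device.

```python
import ipaddress

# Hypothetical routing tables before the fix: (prefix, next hop).
TABLES = {
    "upstream": [
        ("10.20.0.0/23", "downstream"),  # whole block pointed at the S5328
        ("0.0.0.0/0", "internet"),
    ],
    "downstream": [
        ("10.20.0.0/25", "local"),       # the only segment actually in use
        ("0.0.0.0/0", "upstream"),       # default route back up
    ],
}


def lookup(table, dst: str):
    """Longest-prefix match: next hop of the most specific route containing dst."""
    addr = ipaddress.ip_address(dst)
    best_len, best_hop = -1, None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop


def forward(tables, device: str, dst: str, ttl: int):
    """Follow one packet until it is delivered, dropped, leaves, or its TTL expires.

    Returns (outcome, hops), where hops counts device-to-device transfers.
    """
    hops = 0
    while True:
        next_hop = lookup(tables[device], dst)
        if next_hop == "local":
            return f"delivered at {device}", hops
        if next_hop is None or next_hop == "Null0":
            return f"dropped at {device}", hops
        ttl -= 1  # every forwarding device decrements the TTL
        if ttl <= 0:
            # TTL ran out here: this device has to punt the packet to its CPU.
            return f"ttl expired at {device}", hops
        if next_hop not in tables:
            return f"forwarded from {device} to {next_hop}", hops
        device = next_hop
        hops += 1


if __name__ == "__main__":
    print(forward(TABLES, "upstream", "10.20.1.77", ttl=64))  # unused address
    print(forward(TABLES, "upstream", "10.20.0.5", ttl=64))   # in-use address
```

In this model a probe with TTL 64 to an unused address bounces between the two devices 63 times before its TTL runs out on the downstream switch (with an odd TTL it would run out on the upstream router instead), while a packet for the in-use segment is delivered after one hop. Every such probe ends up as a ttl-expired packet for one of the CPUs.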

Second edit, 2023-12-30:

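On further thought, a more reasonable approach is to make the change on the downstream device (the old S5328): point the large IP block that the upstream router sends to it at a black-hole route, for example ip route-static X.X.X.0 255.255.254.0 Null0. Because routes are selected by longest-prefix match, more specific routes for the segments actually in use (such as their directly connected subnets) still take precedence, while packets for unused addresses in the block are discarded instead of being sent back upstream over the default route.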
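To check that the Null0 route only catches the unused part of the block, here is a longest-prefix-match lookup over a hypothetical downstream table. The addresses are placeholders: 10.20.0.0 with mask 255.255.254.0 stands in for X.X.X.0, 10.20.0.0/25 is an assumed directly connected in-use subnet, and `lookup` is an illustrative helper, not a device command.

```python
import ipaddress

# Hypothetical downstream (S5328) routing table after the fix: (prefix, next hop).
# ipaddress accepts the dotted-mask form used in the Huawei command directly.
DOWNSTREAM = [
    ("10.20.0.0/25", "connected"),               # assumed in-use segment
    ("10.20.0.0/255.255.254.0", "Null0"),        # like: ip route-static X.X.X.0 255.255.254.0 Null0
    ("0.0.0.0/0", "upstream"),                   # default route back to the upstream router
]


def lookup(table, dst: str):
    """Longest-prefix match: next hop of the most specific route containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    best_len, best_hop = -1, None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop


if __name__ == "__main__":
    for dst in ("10.20.0.10", "10.20.0.200", "10.20.1.77", "203.0.113.9"):
        print(f"{dst:<12} -> {lookup(DOWNSTREAM, dst)}")
```

10.20.0.10 still resolves to the connected subnet, unused addresses such as 10.20.0.200 and 10.20.1.77 now resolve to Null0 instead of falling through to the default route, and addresses outside the block still go upstream.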