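We have an old Huawei S5328 switch. Recently I noticed that SNMP polling sometimes failed to collect its traffic statistics.
When I logged in to the switch, CPU usage was somewhat high: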
<Huawei>display cpu-usage
CPU Usage Stat. Cycle: 60 (Second)
CPU Usage : 49% Max: 100%
CPU Usage Stat. Time : 2023-12-20 10:13:51
CPU utilization for five seconds: 49%: one minute: 48%: five minutes: 44%
Max CPU Usage Stat. Time : 2023-12-20 09:11:46.
TaskName CPU Runtime(CPU Tick High/Tick Low) Task Explanation
BOX 0% 0/ 3100a8 BOX Output
_TIL 0% 0/ 0 Infinite loop event task
_EXC 0% 0/ 0 Exception Agent Task
VIDL 51% 0/7a3391c0 DOPRA IDLE
TICK 0% 0/ 1774618
bcmRX 4% 0/ adb0c8a bcmRX
CLKI 0% 0/ 0 CLKI
RTMR 0% 0/ 993d8a RTMR
DEV 0% 0/ 0 DEV Device
FCAT 0% 0/ 177d5 FCAT FECD task for catch packet
DELM 0% 0/ b3a6 DELMAC FOR STP
FTS 12% 0/1f0a127d FTS
IPCQ 0% 0/ b527cd IPCQIPC task for single queue
The log contained CPU overload alarms:
Dec 20 2023 13:24:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[6]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=fib-hit, ExceededPacketCount=016039)
Dec 20 2023 13:24:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[7]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=052)
Dec 20 2023 13:17:37+08:00 JWK-5328 %%01VOSCPU/4/CPU_USAGE_RESUME(l)[8]:CPU utilization recovered to the normal range.
Dec 20 2023 13:15:35+08:00 JWK-5328 %%01VOSCPU/4/CPU_USAGE_HIGH(l)[9]:The CPU is overloaded(CpuUsage=100%, Threshold=95%), and the tasks with top three CPU occupancy are:
FTS total : 48%
SOCK total : 23%
bcmRX total : 9%
Dec 20 2023 13:14:44+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[10]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=fib-hit, ExceededPacketCount=022947)
Dec 20 2023 13:14:44+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[11]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0828)
Dec 20 2023 13:04:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[12]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=03931)
Dec 20 2023 13:04:42+08:00 JWK-5328 %%01INFO/4/SUPPRESS_LOG(l)[13]:Last message repeated 2 times.(InfoID=1082200067, ModuleName=SNMP, InfoAlias=SNMP_FAIL)
Dec 20 2023 13:04:31+08:00 JWK-5328 %%01SNMP/4/SNMP_FAIL(l)[14]:Failed to login through SNMP. (Ip=185.180.143.136, Times=5, Reason=the community was incorrect)
Dec 20 2023 12:54:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[15]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0638)
Dec 20 2023 12:44:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[16]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0519)
Dec 20 2023 12:34:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[17]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0971)
Dec 20 2023 12:24:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[18]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=01331)
Dec 20 2023 12:14:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[19]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0151)
Dec 20 2023 12:04:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[20]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=083)
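To see which protocol dominates, the CPCAR_DROP_MPU alarms can be tallied per protocol. Below is a minimal Python sketch. It assumes the log has been exported as plain-text lines in exactly the format shown above. The function name `tally_cpcar_drops` is hypothetical, and the leading zero in counters such as `016039` is assumed to be padding rather than an octal prefix.

```python
import re
from collections import Counter

# Matches the CPCAR_DROP_MPU alarm format seen in the log excerpt.
CPCAR_RE = re.compile(
    r"CPCAR_DROP_MPU.*\(Protocol=(?P<proto>[\w-]+), "
    r"ExceededPacketCount=(?P<count>\d+)\)"
)


def tally_cpcar_drops(lines):
    """Return (packet totals, alarm counts) per protocol for CPCAR_DROP_MPU alarms.

    ExceededPacketCount is parsed as decimal; the leading zero is treated
    as padding (an assumption about the log format).
    """
    totals = Counter()
    events = Counter()
    for line in lines:
        m = CPCAR_RE.search(line)
        if m:
            totals[m.group("proto")] += int(m.group("count"))
            events[m.group("proto")] += 1
    return totals, events


if __name__ == "__main__":
    sample = [
        "Dec 20 2023 13:24:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[6]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=fib-hit, ExceededPacketCount=016039)",
        "Dec 20 2023 13:24:42+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[7]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=052)",
        "Dec 20 2023 13:14:44+08:00 JWK-5328 %%01DEFD/4/CPCAR_DROP_MPU(l)[11]:Rate of packets to cpu exceeded the CPCAR limit on the MPU. (Protocol=ttl-expired, ExceededPacketCount=0828)",
        "Dec 20 2023 13:17:37+08:00 JWK-5328 %%01VOSCPU/4/CPU_USAGE_RESUME(l)[8]:CPU utilization recovered to the normal range.",
    ]
    totals, events = tally_cpcar_drops(sample)
    for proto in sorted(totals):
        # prints: protocol, number of alarms, summed ExceededPacketCount
        print(proto, events[proto], totals[proto])
```

In the full excerpt above, 9 of the 11 CPCAR_DROP_MPU alarms are for ttl-expired. The remaining two are for fib-hit.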
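The top CPU consumer was FTS: 12% in the display cpu-usage snapshot, and 48% in the CPU_USAGE_HIGH alarm.
The log also showed many Protocol=ttl-expired alarms. These mean that packets whose TTL had expired were being sent to the CPU faster than the CPCAR limit allows, so the CPU was handling far too many TTL-expired packets.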
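Why does a routing loop produce TTL-expired packets at all? Each Layer 3 hop decrements the IP TTL. A packet whose TTL runs out is not forwarded; it is handed to the CPU instead, which is where the CPCAR limit applies. The Python sketch below models one packet bouncing between two routers under that simplified rule. The router names and the arriving TTL values are illustrative and are not taken from this network.

```python
def simulate_loop(initial_ttl, first_hop="upstream"):
    """Bounce one packet between two routers until its TTL runs out.

    Simplified model: each router decrements the TTL before forwarding.
    When the decremented TTL reaches 0, the packet is not forwarded and is
    punted to that router's CPU as a TTL-expired packet.
    Returns (forwarding_passes, router_that_expired_it).
    """
    routers = ("upstream", "downstream")
    idx = routers.index(first_hop)
    ttl = initial_ttl
    passes = 0
    while True:
        ttl -= 1
        if ttl <= 0:
            return passes, routers[idx]
        passes += 1   # forwarded across the link to the other router
        idx = 1 - idx


if __name__ == "__main__":
    for ttl in (64, 128, 255):
        print(ttl, simulate_loop(ttl))
    # 64 (63, 'downstream')
    # 128 (127, 'downstream')
    # 255 (254, 'upstream')
```

Take a packet that arrives with a TTL of 64. It crosses the link between the two devices 63 times before it is discarded. Whether it expires on the upstream or the downstream device depends only on the parity of the arriving TTL. Every stray packet therefore costs dozens of forwarding passes, plus one punt to a CPU.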
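On the upstream router, I made the routes more specific. I deleted the large IP block that had been routed to the downstream switch and replaced it with routes covering only the address ranges actually in use. When I checked CPU usage again, it had dropped, and FTS was down to 2%: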
<Huawei>display cpu-usage
CPU Usage Stat. Cycle: 60 (Second)
CPU Usage : 16% Max: 100%
CPU Usage Stat. Time : 2023-12-20 13:36:02
CPU utilization for five seconds: 16%: one minute: 39%: five minutes: 46%
Max CPU Usage Stat. Time : 2023-12-20 09:11:46.
TaskName CPU Runtime(CPU Tick High/Tick Low) Task Explanation
BOX 0% 0/ 1ee2cc BOX Output
_TIL 0% 0/ 0 Infinite loop event task
_EXC 0% 0/ 0 Exception Agent Task
VIDL 84% 0/c945e100 DOPRA IDLE
TICK 0% 0/ c2345c
bcmRX 0% 0/ 29dbe30 bcmRX
CLKI 0% 0/ 0 CLKI
RTMR 0% 0/ 3d575d RTMR
DEV 0% 0/ 0 DEV Device
FCAT 0% 0/ d482 FCAT FECD task for catch packet
DELM 0% 0/ 66fa DELMAC FOR STP
FTS 2% 0/ 5da1963 FTS
IPCQ 0% 0/ 6d1610 IPCQIPC task for single queue
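Summary: the upstream Layer 3 device routed IP ranges that were not in use to the downstream Layer 3 device. Scans, probes and other malicious traffic from the internet aimed at those addresses reached the downstream device. It had no more specific route for them, so it sent them back to the upstream device via its default route. The packets looped between the two devices until their TTL expired, and this drove up the CPU load. The fix was to make the routing table more specific.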
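The loop described in the summary follows from longest-prefix matching on the two devices. Below is a minimal Python sketch using the standard `ipaddress` module. It uses hypothetical addressing: 10.10.0.0/23 stands in for the block the upstream device routes to the S5328, and only 10.10.0.0/25 of it is in use. The table layout and the names `lookup` and `trace` are illustrative, not a device API.

```python
import ipaddress

# Hypothetical routing tables: (destination network, next hop).
TABLES = {
    "upstream": [
        (ipaddress.ip_network("10.10.0.0/23"), "downstream"),  # whole block
        (ipaddress.ip_network("0.0.0.0/0"), "isp"),
    ],
    "downstream": [
        (ipaddress.ip_network("10.10.0.0/25"), "local"),     # connected, in use
        (ipaddress.ip_network("0.0.0.0/0"), "upstream"),     # default route
    ],
}


def lookup(table, dst):
    """Longest-prefix match: next hop of the most specific matching route."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def trace(tables, start, dst, max_hops=16):
    """Follow next hops from `start`; return (path, looped).

    A next hop that is not a modeled router ("local", "isp", None) ends the
    trace. Revisiting a router means the same lookup repeats forever.
    """
    path = [start]
    node = start
    for _ in range(max_hops):
        hop = lookup(tables[node], dst)
        path.append(hop)
        if hop not in tables:
            return path, False
        if hop in path[:-1]:
            return path, True
        node = hop
    return path, True  # hop budget exhausted; treat as a loop


if __name__ == "__main__":
    for dst in ("10.10.0.10", "10.10.1.50"):
        print(dst, trace(TABLES, "upstream", dst))
    # 10.10.0.10 (['upstream', 'downstream', 'local'], False)
    # 10.10.1.50 (['upstream', 'downstream', 'upstream'], True)
```

An in-use address ends at the connected subnet. An unused address inside the same block goes back to the upstream device, which is exactly the ping-pong that exhausts the TTL.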
-------- Second edit, 2023-12-30
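On further thought, a more reasonable approach is to make the change on the downstream device (the old S5328). Point the large address block that the upstream router sends to it at a blackhole route: ip route-static X.X.X.0 255.255.254.0 Null0. Traffic for unused addresses is then dropped on the S5328 instead of being sent back upstream via the default route.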
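The blackhole route works because of the same longest-prefix rule. On the S5328, the /23 pointing at Null0 is more specific than the default route, and the connected subnet that is actually in use is more specific still. Below is a minimal Python sketch of the downstream table after the change. Again, 10.10.0.0/23 is a hypothetical stand-in for X.X.X.0 255.255.254.0; Python's `ipaddress` accepts the dotted-netmask form directly.

```python
import ipaddress

# Hypothetical downstream (S5328) routing table after adding the blackhole route.
DOWNSTREAM = [
    (ipaddress.ip_network("10.10.0.0/25"), "local"),                 # connected, in use
    (ipaddress.ip_network("10.10.0.0/255.255.254.0"), "NULL0"),      # blackhole (/23)
    (ipaddress.ip_network("0.0.0.0/0"), "upstream"),                 # default route
]


def lookup(table, dst):
    """Longest-prefix match: next hop of the most specific matching route."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    for dst in ("10.10.0.10", "10.10.1.50", "8.8.8.8"):
        print(dst, "->", lookup(DOWNSTREAM, dst))
    # 10.10.0.10 -> local     (/25 beats /23)
    # 10.10.1.50 -> NULL0     (/23 beats the default route)
    # 8.8.8.8 -> upstream     (only the default route matches)
```

Any subnet inside the block that is in use must keep its own more specific route; connected interfaces provide one automatically. Without such a route, that subnet would be blackholed as well.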