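Topic:
Route updates rejected because of a misconfigured subnet mask on an eBGP interface
Key concepts: BGP third-party next hop, BGP route updates
According to the usual study and reference material on networking, BGP peers establish a session and exchange route updates without ever checking the peer's subnet mask.
This article walks through a real case in an MPLS VPN where a single misconfigured subnet mask causes an eBGP peer to reject route updates.
Network topology:
While working on this topic, let's also review the basic deployment of an MPLS VPN.
Deploy the VRF: R1, R4, and R5 use the same configuration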
R1(config)#ip vrf CTO
R1(config-vrf)#rd 1:1
R1(config-vrf)#route-target 6:6
Configure basic IP addressing
The full configuration is omitted; only the VRF interface configuration is shown here.
R1(config-if)#ip vrf forwarding CTO
R1(config-if)#ip address 10.1.1.13 255.255.255.252
R1(config-if)#no shutdown
R4(config)#interface e0/0
R4(config-if)#ip vrf forwarding CTO
R4(config-if)#ip address 10.1.1.17 255.255.255.0 // Note: this mask is deliberately misconfigured; the other PE-CE links use 255.255.255.252 //
R4(config-if)#no shutdown
R5(config)#interface e0/0
R5(config-if)#ip vrf forwarding CTO
R5(config-if)#ip address 10.1.1.21 255.255.255.252
R5(config-if)#no shutdown
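// Remember: never skip checking and verifying the configuration //
A basic ping and the trusty show ip interface brief are both effective checks. On the PE devices, remember that the ping has to be issued inside the VRF: ping vrf CTO X.X.X.X.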
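A ping across the R4-R7 link does not expose the wrong mask, because each end still sees the other as directly connected. The following is a minimal Python sketch of a stricter check: both ends of a link should derive the same network from their address and mask. R7's mask is an assumption (its interface configuration is not shown in this article), and `same_subnet` is a hypothetical helper, not an IOS feature.

```python
import ipaddress

# R4's address and mask are taken from the configuration above.
# R7's /30 mask is an assumption: the article only shows its address, 10.1.1.18.
r4 = ipaddress.ip_interface("10.1.1.17/255.255.255.0")
r7 = ipaddress.ip_interface("10.1.1.18/255.255.255.252")


def same_subnet(a, b):
    """Return True when both ends of a link derive the same network."""
    return a.network == b.network


# The two ends disagree about which subnet the link belongs to.
print(r4.network, r7.network, same_subnet(r4, r7))

# Each end still considers the other directly connected, so ping succeeds.
print(r7.ip in r4.network, r4.ip in r7.network)
```

Under these assumptions R4 derives 10.1.1.0/24 while R7 derives 10.1.1.16/30, so the comparison fails even though reachability looks healthy.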
Configure the IGP in the MPLS core (configuration omitted)
Verification is essential
R3#show ip ospf neighbor
Neighbor ID Pri State Dead Time Address Interface
10.1.255.5 0 FULL/ - 00:00:31 10.1.1.9 Ethernet0/3
10.1.255.4 0 FULL/ - 00:00:31 10.1.1.5 Ethernet0/2
10.1.255.1 0 FULL/ - 00:00:30 10.1.1.1 Ethernet0/0
R3#show ip route ospf | begin Gateway
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 10 subnets, 2 masks
O 10.1.255.1/32 [110/11] via 10.1.1.1, 00:01:03, Ethernet0/0
O 10.1.255.4/32 [110/11] via 10.1.1.5, 00:00:53, Ethernet0/2
O 10.1.255.5/32 [110/11] via 10.1.1.9, 00:00:42, Ethernet0/3
Configure iBGP in AS 65078
Since this is only a lab, R7 and R8 peer with each other over their directly connected interfaces.
R7#show run | s r b
router bgp 65078
network 10.7.1.0 mask 255.255.255.0
neighbor 10.1.1.26 remote-as 65078
neighbor 10.1.1.26 next-hop-self
R8#show run | s router bgp
router bgp 65078
network 10.8.1.0 mask 255.255.255.0
neighbor 10.1.1.25 remote-as 65078
neighbor 10.1.1.25 next-hop-self
Verification:
R7#show ip bgp
BGP table version is 3, local router ID is 10.7.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 10.7.1.0/24 0.0.0.0 0 32768 i
*>i 10.8.1.0/24 10.1.1.26 0 100 0 i
R8#show ip bgp
BGP table version is 3, local router ID is 10.8.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*>i 10.7.1.0/24 10.1.1.25 0 100 0 i
*> 10.8.1.0/24 0.0.0.0 0 32768 i
Continue deploying the MPLS core: complete the internal BGP configuration
R3#show run | s r b
router bgp 65001
bgp log-neighbor-changes
bgp listen range 10.1.255.0/24 peer-group iBGP
no bgp default ipv4-unicast
neighbor iBGP peer-group
neighbor iBGP remote-as 65001
neighbor iBGP update-source Loopback0
!
address-family ipv4
exit-address-family
!
address-family vpnv4
neighbor iBGP activate
neighbor iBGP send-community extended
neighbor iBGP route-reflector-client
exit-address-family
R1, R4, R5
router bgp 65001
bgp log-neighbor-changes
no bgp default ipv4-unicast
neighbor 10.1.255.3 remote-as 65001
neighbor 10.1.255.3 update-source Loopback0
!
address-family ipv4
exit-address-family
!
address-family vpnv4
neighbor 10.1.255.3 activate
neighbor 10.1.255.3 send-community extended
exit-address-family
Verification:
R3#show bgp vpnv4 unicast all summary
BGP router identifier 10.1.255.3, local AS number 65001
BGP table version is 1, main routing table version 1
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
*10.1.255.1 4 65001 5 5 1 0 0 00:01:19 0
*10.1.255.4 4 65001 2 2 1 0 0 00:00:31 0
*10.1.255.5 4 65001 2 2 1 0 0 00:00:28 0
* Dynamically created based on a listen range command
Dynamically created neighbors: 3, Subnet ranges: 1
BGP peergroup iBGP listen range group members:
10.1.255.0/24
Total dynamically created neighbors: 3/(100 max), Subnet ranges: 1
Configure the MPLS label protocol: LDP
R3(config)#interface range e0/0,e0/2-3
R3(config-if-range)#mpls ip
R1(config)#interface e0/0
R1(config-if)#mpls ip
R4(config)#interface e0/2
R4(config-if)#mpls ip
R5(config)#interface e0/3
R5(config-if)#mpls ip
Watch the LDP neighbors come up:
R3#
*Sep 6 08:47:50.047: %LDP-5-NBRCHG: LDP Neighbor 10.1.255.1:0 (1) is UP
R3#
*Sep 6 08:48:21.644: %LDP-5-NBRCHG: LDP Neighbor 10.1.255.4:0 (2) is UP
R3#
*Sep 6 08:48:39.094: %LDP-5-NBRCHG: LDP Neighbor 10.1.255.5:0 (3) is UP
Configure eBGP between the PEs and CEs
R6(config)#router bgp 65006
R6(config-router)#network 10.6.1.0 mask 255.255.255.0
R6(config-router)#neighbor 10.1.1.13 remote-as 65001
R7(config)#router bgp 65078
R7(config-router)#neighbor 10.1.1.17 remote-as 65001
R8(config)#router bgp 65078
R8(config-router)#neighbor 10.1.1.21 remote-as 65001
R1(config)#router bgp 65001
R1(config-router)#address-family ipv4 vrf CTO
R1(config-router-af)#neighbor 10.1.1.14 remote-as 65006
R1(config-router-af)#
*Sep 6 08:54:22.981: %BGP-5-ADJCHANGE: neighbor 10.1.1.14 vpn vrf CTO Up // R1-R6 peering established //
R4(config)#router bgp 65001
R4(config-router)#address-family ipv4 vrf CTO
R4(config-router-af)#neighbor 10.1.1.18 remote-as 65078
R4(config-router-af)#
*Sep 6 08:55:31.655: %BGP-5-ADJCHANGE: neighbor 10.1.1.18 vpn vrf CTO Up // R4-R7 peering established //
R5(config)#router bgp 65001
R5(config-router)#address-family ipv4 vrf CTO
R5(config-router-af)#neighbor 10.1.1.22 remote-as 65078
R5(config-router-af)#
*Sep 6 08:56:40.336: %BGP-5-ADJCHANGE: neighbor 10.1.1.22 vpn vrf CTO Up // R5-R8 peering established //
At this point, a basic MPLS VPN is fully deployed.
Now verify it:
R6#show ip route bgp | begin Gateway
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 6 subnets, 3 masks
B 10.7.1.0/24 [20/0] via 10.1.1.13, 00:02:54
B 10.8.1.0/24 [20/0] via 10.1.1.13, 00:02:54
R8#show ip route bgp | begin Gateway
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 8 subnets, 3 masks
B 10.6.1.0/24 [20/0] via 10.1.1.21, 00:03:54
B 10.7.1.0/24 [200/0] via 10.1.1.25, 00:22:09
R7#show ip route bgp | begin Gateway
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 8 subnets, 3 masks
B 10.6.1.0/24 [200/0] via 10.1.1.26, 00:04:26
B 10.8.1.0/24 [200/0] via 10.1.1.26, 00:22:53
// On R7, the route to AS 65006's prefix has R8 as its next hop instead of R4, so something is clearly wrong //
R7#show ip bgp
BGP table version is 4, local router ID is 10.7.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*>i 10.6.1.0/24 10.1.1.26 0 100 0 65001 65006 i
*> 10.7.1.0/24 0.0.0.0 0 32768 i
*>i 10.8.1.0/24 10.1.1.26 0 100 0 i
// R7 has not learned any route from R4 //
R4#show bgp vpnv4 unicast vrf CTO neighbors 10.1.1.18 advertised-routes
BGP table version is 5, local router ID is 10.1.255.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 1:1 (default for vrf CTO)
*>i 10.6.1.0/24 10.1.255.1 0 100 0 65006 i
Total number of prefixes 1
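// R4 does advertise the BGP prefix 10.6.1.0/24 to R7, so the problem appears to be on R7 (this guess turned out to be wrong) //
In fact, no inbound route filtering policy is deployed on R7 at all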
R7#show ip protocols | section bgp
Routing Protocol is "bgp 65078"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
IGP synchronization is disabled
Automatic route summarization is disabled
Neighbor(s):
Address FiltIn FiltOut DistIn DistOut Weight RouteMap
10.1.1.17
10.1.1.26
Maximum path: 1
Routing Information Sources:
Gateway Distance Last Update
10.1.1.26 200 00:13:58
Distance: external 20 internal 200 local 200
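After some thought, the fault was narrowed down to the update message itself
To make the root cause easier to see, I captured the packets.
At the same time, enable debugging on R7 and watch the incoming updates: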
R7#debug ip bgp updates in
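R7#clear ip bgp * soft in // On R7, force R4 to resend its route updates //
The capture shows clearly that, in the prefix R4 advertises to R7, the next-hop attribute is set to 10.1.1.6 rather than the address of R4's own e0/0 interface, 10.1.1.17.
Now look at the debug log on R7
The log states that the update from 10.1.1.17 (R4) carries a next hop of 10.1.1.6, which is neither in a local subnet nor directly connected to a local interface, so the update is rejected.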
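The receiving-side check that the log describes can be sketched in Python as follows. This is only an illustration of the rule, not the IOS implementation. The interface addresses of R7 are taken from this article (10.7.1.1 from its router ID), but the masks are assumptions, and `accept_next_hop` is a hypothetical helper.

```python
import ipaddress

# R7's connected interfaces. The masks are assumptions, because R7's interface
# configuration is not shown in the article.
r7_interfaces = [
    ipaddress.ip_interface("10.1.1.18/30"),  # toward R4
    ipaddress.ip_interface("10.1.1.25/30"),  # toward R8
    ipaddress.ip_interface("10.7.1.1/24"),   # local network (from the router ID)
]


def accept_next_hop(next_hop, interfaces):
    """Accept a single-hop eBGP next hop only if it lies on a connected
    subnet and is not one of the receiver's own addresses."""
    nh = ipaddress.ip_address(next_hop)
    if any(nh == iface.ip for iface in interfaces):
        return False
    return any(nh in iface.network for iface in interfaces)


# Next hop seen in the capture: not on any connected subnet, so it is rejected.
print(accept_next_hop("10.1.1.6", r7_interfaces))

# R4's own e0/0 address would have been accepted.
print(accept_next_hop("10.1.1.17", r7_interfaces))
```

With these assumed masks, 10.1.1.6 falls outside every connected subnet of R7, while 10.1.1.17 sits inside 10.1.1.16/30.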
So why would R4 do something so absurd?
This brings us back to the third-party next hop mentioned at the beginning.
R4#show ip cef vrf CTO 10.6.1.0
10.6.1.0/24
nexthop 10.1.1.6 Ethernet0/2 label 16 21
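// According to the forwarding table, R4's next hop toward 10.6.1.0/24 is 10.1.1.6, which is R3's E0/2 interface //
R4's e0/0 interface has a 24-bit mask. Under the automatic third-party next-hop optimization, R4 therefore believes that 10.1.1.6 is in the same subnet as its e0/0 address 10.1.1.17, and the prefix it advertises carries 10.1.1.6 as the next hop.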
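The sender-side decision described here can be sketched in Python. This models the behavior as this article explains it and is not a statement of how IOS is implemented internally. `advertised_next_hop` is a hypothetical helper, and its `next_hop_self` flag stands in for the `neighbor ... next-hop-self` option.

```python
import ipaddress


def advertised_next_hop(peering_interface, resolved_next_hop, next_hop_self=False):
    """Pick the NEXT_HOP to send to an eBGP peer.

    If the route's resolved next hop appears to share a subnet with the
    peering interface, keep it (third-party next hop); otherwise advertise
    the peering interface's own address.
    """
    iface = ipaddress.ip_interface(peering_interface)
    nh = ipaddress.ip_address(resolved_next_hop)
    if not next_hop_self and nh in iface.network:
        return str(nh)
    return str(iface.ip)


# Wrong /24 mask: 10.1.1.6 looks like a neighbor on the PE-CE segment.
print(advertised_next_hop("10.1.1.17/24", "10.1.1.6"))

# Correct /30 mask: 10.1.1.6 is off-segment, so R4 advertises itself.
print(advertised_next_hop("10.1.1.17/30", "10.1.1.6"))

# Wrong mask, but the next hop is forced to the local address.
print(advertised_next_hop("10.1.1.17/24", "10.1.1.6", next_hop_self=True))
```

Only the first call returns 10.1.1.6. Correcting the mask or forcing the local address both yield 10.1.1.17.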
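So how can we verify whether this explanation is correct?
On R4, configure next-hop-self for the eBGP neighbor R7, which forces the next hop to 10.1.1.17, and observe the result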
R4
router bgp 65001
address-family ipv4 vrf CTO
neighbor 10.1.1.18 next-hop-self
Verify the idea:
R7#cle ip bgp * soft
R7#show ip bgp
BGP table version is 5, local router ID is 10.7.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 10.6.1.0/24 10.1.1.17 0 65001 65006 i
* i 10.1.1.26 0 100 0 65001 65006 i
*> 10.7.1.0/24 0.0.0.0 0 32768 i
*>i 10.8.1.0/24 10.1.1.26 0 100 0 i
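// There it is: the route is now learned from R4 //
Why is this kind of fault hard to track down? If R4's e0/0 interface were in the global routing table, the address could not have been configured at all, because 10.1.1.0/24 would overlap the subnet already in use on the core-facing interface. The situation can only arise because the interface sits in a VRF.
Of course, the proper fix is simply to correct the interface mask.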
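The overlap that would have blocked this address in the global table can be reproduced with a short Python sketch. R4's core-facing address 10.1.1.5 and its Loopback0 address 10.1.255.4 appear in the outputs above; the /30 and /32 masks are assumptions, and `overlapping` is a hypothetical helper that only imitates the idea of the check.

```python
import ipaddress

# Interfaces in R4's global table. The addresses come from the OSPF output
# above; the masks are assumptions, since the configuration was omitted.
r4_global = {
    "Ethernet0/2": "10.1.1.5/30",
    "Loopback0": "10.1.255.4/32",
}


def overlapping(new_iface, existing):
    """Return the names of existing interfaces whose subnet overlaps new_iface."""
    new_net = ipaddress.ip_interface(new_iface).network
    return [
        name
        for name, addr in existing.items()
        if new_net.overlaps(ipaddress.ip_interface(addr).network)
    ]


# The mistyped /24 collides with the core-facing link.
print(overlapping("10.1.1.17/24", r4_global))

# The intended /30 does not collide with anything.
print(overlapping("10.1.1.17/30", r4_global))
```

Inside a VRF the interface is compared against a different routing table, so no such collision is reported and the wrong mask goes unnoticed.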