H3C Large Layer 2 Networks: Analysis and Optimization of STP-Related Problems
2021-09-20
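Network Topology and Description
The network topology is as follows: [topology diagram]
Problem Description
As the topology shows, the site is one large Layer 2 network running entirely on default configuration, with every device in MSTP instance 0. The diagram shows only the core and aggregation switches. Below the aggregation layer there are many more access switches, which in turn connect a large number of APs. The customer frequently sees APs go offline because keepalive packets are lost.
Analysis
The records on the switches show congestion drops. The counters were cleared the previous day and had grown again by the next day, as shown below: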
[H3C]display qos queue-statistics interface <interface> outbound
Interface: GigabitEthernet1/7/0/2
Direction: outbound
Forwarded: 11698 packets, 2778275 bytes
Dropped: 611 packets, 604469 bytes
Queue 0
Forwarded: 0 packets, 0 bytes, 0 pps, 0 bps
Dropped: 0 packets, 0 bytes
Current queue length: 0 packets
Queue 1
Forwarded: 0 packets, 0 bytes, 0 pps, 0 bps
Dropped: 0 packets, 0 bytes
Current queue length: 0 packets
Queue 2
Forwarded: 0 packets, 0 bytes, 0 pps, 0 bps
Dropped: 611 packets, 604469 bytes
Current queue length: 0 packets
Queue 3
Forwarded: 0 packets, 0 bytes, 0 pps, 0 bps
Dropped: 0 packets, 0 bytes
Current queue length: 0 packets
Queue 4
Forwarded: 0 packets, 0 bytes, 0 pps, 0 bps
Dropped: 0 packets, 0 bytes
Current queue length: 0 packets
Queue 5
Forwarded: 0 packets, 0 bytes, 0 pps, 0 bps
Dropped: 0 packets, 0 bytes
Current queue length: 0 packets
Queue 6
Forwarded: 0 packets, 0 bytes, 0 pps, 0 bps
Dropped: 0 packets, 0 bytes
Current queue length: 0 packets
Queue 7
Forwarded: 11698 packets, 2778275 bytes, 0 pps, 0 bps
Dropped: 0 packets, 0 bytes
Current queue length: 0 packets
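Queue 2 carries data packets; Queue 6 or Queue 7 carries protocol packets.
The congestion drop counts are roughly the same on every interface. The device logs also contain TC (topology change) entries, as shown below: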
%Aug 29 10:16:17:104 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
%Aug 29 10:16:14:291 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
%Aug 29 10:10:35:111 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
%Aug 29 10:10:32:292 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
%Aug 29 10:07:59:115 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
%Aug 29 10:07:56:226 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
%Aug 29 10:07:47:116 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
%Aug 29 10:07:44:296 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
%Aug 29 10:04:24:123 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
%Aug 29 10:04:21:298 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
%Aug 29 09:58:50:135 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
%Aug 29 09:58:47:330 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
%Aug 29 09:55:16:141 2019 W-B1F-IT-AP-1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/25 was notified a topology change.
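From this we can conclude that the congestion drops are caused by TC. Each TC flushes the forwarding table entries, so a large number of packets are flooded until the entries are relearned, and that flooding causes the congestion drops.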
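Solution
In this network, we found that almost all TC packets came from the same aggregation device. STP can therefore be disabled on that device's uplink interface to the core, which prevents the TC BPDUs from flooding the rest of the network.
In addition, the following optimizations apply to a large Layer 2 network like this one:
1. STP optimization: configure the ports that connect to end devices as edge ports and enable BPDU protection. This prevents unnecessary TC packets and the network flapping they cause. The large STP domain can also be split up. Generally, the closer a link is to the core, the less likely it is to form a loop, so STP can be disabled on the ports that interconnect aggregation and core. This stops STP packets, and TC packets in particular, from spreading and destabilizing the network. Each aggregation device can also be designated as the root bridge of its own domain, so that access devices added in a later expansion cannot preempt the root and trigger STP reconvergence.
2. Enable port isolation, typically on the core's ports that connect to the aggregation devices. Once configured, Layer 2 traffic between the aggregation devices is isolated, which prevents Layer 2 broadcast packets from flooding the network. If some Layer 2 communication between them is still required, enable local proxy ARP on the device where port isolation is enabled. With local-proxy-arp enable configured, packets are forwarded at Layer 3 on the device and are not restricted by port isolation, so the hosts can still reach each other.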
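A minimal sketch of this change, entered on the aggregation switch that is generating the TCs. The interface name is a placeholder and does not come from the case; substitute the actual uplink to the core. Lines starting with # are annotations, not commands.

```
# Hypothetical uplink interface; replace with the real port facing the core.
<H3C> system-view
[H3C] interface Ten-GigabitEthernet1/0/1
# Remove this port from spanning tree so it no longer sends or processes BPDUs.
[H3C-Ten-GigabitEthernet1/0/1] undo stp enable
[H3C-Ten-GigabitEthernet1/0/1] quit
```

A port with STP disabled no longer takes part in spanning tree calculation, so it is worth confirming beforehand that no redundant path runs through this link.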
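The optimizations above could look roughly like the following in Comware V7 syntax. This is a sketch under assumptions: the device names (Access, Agg, Core), the interface names, isolation group 1, and VLAN-interface 10 are placeholders that do not come from the case. Command forms also differ between software versions (some releases use stp edged-port enable, and some use port-isolate enable without a group), so check the command reference for the device in use. Lines starting with # are annotations.

```
# Access switch: ports facing APs or other end devices.
# BPDU protection shuts down an edge port that unexpectedly receives a BPDU.
[Access] stp bpdu-protection
[Access] interface GigabitEthernet1/0/1
[Access-GigabitEthernet1/0/1] stp edged-port
[Access-GigabitEthernet1/0/1] quit

# Aggregation switch: make it the root bridge of its own STP domain
# so that newly added access switches cannot take over the root role.
[Agg] stp instance 0 root primary

# Core switch: put the aggregation-facing ports into one isolation group.
[Core] port-isolate group 1
[Core] interface Ten-GigabitEthernet1/0/1
[Core-Ten-GigabitEthernet1/0/1] port-isolate enable group 1
[Core-Ten-GigabitEthernet1/0/1] quit
[Core] interface Ten-GigabitEthernet1/0/2
[Core-Ten-GigabitEthernet1/0/2] port-isolate enable group 1
[Core-Ten-GigabitEthernet1/0/2] quit

# Core switch: allow selected hosts behind isolated ports to reach each
# other through Layer 3 forwarding on the gateway VLAN interface.
[Core] interface Vlan-interface 10
[Core-Vlan-interface10] local-proxy-arp enable
[Core-Vlan-interface10] quit
```

Ports in the same isolation group cannot exchange Layer 2 traffic with each other, which is why every aggregation-facing port on the core is placed in the same group here.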
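Troubleshooting methods and related commands for STP problems:
1. display stp interface <interface>: shows the current STP state of the interface. Many Layer 2 connectivity failures turn out to be ports blocked by STP.
2. display stp tc: shows the TC records on the device. Receive is the number of TC packets the interface has received, and Send is the number of TC packets the interface has sent. To find the source of the TCs, check the records in the logbuffer: a notify entry (for example, STP_NOTIFIED_TC) means the interface received a TC packet, and a detect entry means the interface itself generated the TC. With this method you can trace the TC source device by device.
3. display stp history: shows the STP state changes of each interface on the device. From these records you can work out how the topology has changed over time.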
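Put together, a trace on a single device might proceed as in the sketch below. The interface is the one that appears in the logs above; the include filter is only an assumption about which log text to match and may need adjusting. Lines starting with # are annotations.

```
# 1. Check whether the suspect port is blocked by STP.
<H3C> display stp interface Ten-GigabitEthernet1/0/25

# 2. See which ports are receiving or sending TCs, and whether the
#    counters keep rising between two runs of the command.
<H3C> display stp tc

# 3. Look for TC-related log entries to tell received TCs (notify)
#    from locally generated ones (detect).
<H3C> display logbuffer | include TC

# 4. Review the port state changes to see what triggered the TCs.
<H3C> display stp history
```

If a port only ever logs notify entries, repeat the same steps on the neighbor connected to that port. The device whose port logs detect entries is where the topology change originates.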