H3C APs Going Offline and Failing to Obtain Addresses: A Troubleshooting Case
2021-09-20
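Network Topology and Description
The AC is attached in bypass mode to the core switch, which acts as both the gateway and the DHCP server.
The APs register over Layer 3: the core switch delivers DHCP Option 43, and each AP then reaches the AC over Layer 3 to register and come online.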
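Problem Description
It was reported that after a version upgrade, APs began going offline across the site. The number of online APs dropped gradually from 1300+ to 900+ and was still falling.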
Total number of APs: 1347
Total number of connected APs: 704
Total number of connected manual APs: 704
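To put a number on the outage, the summary counters above can be turned into an offline count and percentage. The following is a minimal sketch that only parses the three lines shown here; the function name `parse_ap_summary` is hypothetical, and the regular expression assumes the "Total number of ...: N" layout seen in this output.

```python
import re

# The three summary lines quoted in this case.
SUMMARY = """\
Total number of APs: 1347
Total number of connected APs: 704
Total number of connected manual APs: 704
"""

def parse_ap_summary(text):
    """Map each 'Total number of <label>' line to its integer value."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*Total number of (.+?):\s*(\d+)\s*$", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts

counts = parse_ap_summary(SUMMARY)
total = counts["APs"]
connected = counts["connected APs"]
offline = total - connected
offline_pct = 100.0 * offline / total
print(f"{offline} of {total} APs offline ({offline_pct:.1f}%)")
# prints: 643 of 1347 APs offline (47.7%)
```

Rerunning this against successive snapshots would show whether the offline share is still growing.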
Output of display logbuffer on the AC:
CWS/4/CWS_AP_DOWN: CAPWAP tunnel to AP 17D-2F-211 went down. Reason: Neighbor dead timer expired.
APMGR/6/APMGR_AP_OFFLINE: AP 17D-2F-211 went offline. State changed to Idle.
CWS/4/CWS_AP_DOWN: CAPWAP tunnel to AP MBWY-3FB-314 went down. Reason: Failed to retransmit message.
APMGR/6/APMGR_AP_OFFLINE: AP MBWY-3FB-314 went offline. State changed to Idle.
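With hundreds of APs dropping, it helps to tally the tunnel-down reasons rather than read the log line by line. The sketch below does this for the four lines quoted above; the function name `tally_down_reasons` is hypothetical, and the pattern assumes the CWS_AP_DOWN message layout shown in this excerpt (other software versions may word it differently).

```python
import re
from collections import Counter

# The log excerpt quoted in this case.
LOG = """\
CWS/4/CWS_AP_DOWN: CAPWAP tunnel to AP 17D-2F-211 went down. Reason: Neighbor dead timer expired.
APMGR/6/APMGR_AP_OFFLINE: AP 17D-2F-211 went offline. State changed to Idle.
CWS/4/CWS_AP_DOWN: CAPWAP tunnel to AP MBWY-3FB-314 went down. Reason: Failed to retransmit message.
APMGR/6/APMGR_AP_OFFLINE: AP MBWY-3FB-314 went offline. State changed to Idle.
"""

# Capture the AP name and the reason text, dropping the trailing period.
DOWN_RE = re.compile(
    r"CWS_AP_DOWN: CAPWAP tunnel to AP (\S+) went down\. Reason: (.+?)\.?\s*$"
)

def tally_down_reasons(text):
    """Count tunnel-down reasons and list the affected APs per reason."""
    reasons = Counter()
    aps_by_reason = {}
    for line in text.splitlines():
        m = DOWN_RE.search(line)
        if m:
            ap, reason = m.groups()
            reasons[reason] += 1
            aps_by_reason.setdefault(reason, []).append(ap)
    return reasons, aps_by_reason

reasons, aps_by_reason = tally_down_reasons(LOG)
for reason, count in reasons.most_common():
    print(f"{count:4d}  {reason}  {aps_by_reason[reason]}")
```

Both reasons seen here are timeouts (missed keepalives and exhausted retransmissions), which is consistent with looking at the path between AP and AC before the AC itself.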
Suspected cause: a loop, packet loss, or network flapping on the wired path between the APs and the AC.
Analysis
Check the AP tunnel-down records with display wlan ap statistics tunnel-down-record:
Neighbor dead timer expire (the majority of records)
Failed to retransmit message
Processed join request in Run state (the AP came back online)
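Check the wired links and whether the APs obtain addresses.
A test AP was power-cycled with debug dhcp server packet enabled on the core switch (the DHCP server).
The DHCP server replied with an Offer, but no subsequent Request arrived from the AP. The next step was therefore to check the downstream devices for packet loss.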
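The reasoning in this step is that a normal DHCP exchange runs Discover, Offer, Request, ACK, so the first message missing at the server indicates where to look. A minimal sketch of that check follows. The input is a hand-written list of (client, message type) pairs, not real debug output; the MAC address, the repeated Discover/Offer pair, and the function names are all illustrative assumptions.

```python
# Expected order of a DHCP address acquisition, as seen at the server.
HANDSHAKE = ["DISCOVER", "OFFER", "REQUEST", "ACK"]

def stalled_step(seen):
    """Return the first handshake message absent from `seen`, or None if complete."""
    for step in HANDSHAKE:
        if step not in seen:
            return step
    return None

def find_stalls(observed):
    """Group observed (client, message) pairs by client and report each stall point."""
    seen_by_client = {}
    for client, message in observed:
        seen_by_client.setdefault(client, set()).add(message)
    return {client: stalled_step(seen) for client, seen in seen_by_client.items()}

# Hypothetical observations for one test AP: the server sends an Offer,
# but no Request ever comes back, so the AP retries its Discover.
observed = [
    ("aaaa-bbbb-cccc", "DISCOVER"),
    ("aaaa-bbbb-cccc", "OFFER"),
    ("aaaa-bbbb-cccc", "DISCOVER"),
    ("aaaa-bbbb-cccc", "OFFER"),
]

for client, step in find_stalls(observed).items():
    print(client, "stalled before", step)
```

A stall before REQUEST, as in this case, means either the Offer is not reaching the AP or the AP's Request is lost on the way up, so the devices between the AP and the server are the place to check.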
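Packet loss was found on the access switch. DHCP snooping was found to be enabled on the interfaces of both the aggregation and the access devices.
DHCP snooping inspects DHCP-Request and DHCP-ACK packets to build its snooping entries. This consumed the access switch's processing capacity, so address acquisition failed and the switch did not forward the DHCP clients' packets.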
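Solution
Further checks confirmed that the PoE switches have weak processing performance: with DHCP snooping enabled, DHCP packets sent up to the CPU were dropped, so neither the APs nor the client terminals could obtain addresses.
DHCP snooping was disabled on the access and aggregation switches to reduce their processing load. After that, address acquisition returned to normal, the APs came back online, and wireless service recovered.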