Friday 26 June 2015

The Double Wireshark

I've been trying to diagnose a problem with multicast forwarding over MPLS with MLDP: packets are going missing somewhere between the PE routers, including the PIM Hellos, resulting in PIM neighbours occasionally timing out (even when no traffic is flowing across the MDT).

This morning, I did some packet captures over a quiet MDT (i.e. one where there was no traffic being forwarded, other than the PIM Hellos).  The backbone links are 10G and busy most of the time, which makes looking for a single missing packet tricky, so I changed OSPF costs and HSRP tracking object statuses to move as much traffic as possible away from them.  That made it possible to sniff the links with a reasonable degree of confidence that there was no packet drop.

I started at the PE router to check it was actually sending the PIM Hellos on its uplink, which it was (including at times when another PE router didn't receive them).  I then moved on to one of our core (P) routers, which is also the MDT root: I needed to see whether the packets were arriving on the ingress port and leaving on the egress port, or going missing in between.

To do this, I attached two Ethernet interfaces to my MacBook Air (one Thunderbolt and one USB).  Although traffic levels were only a few megabits per second, I used the USB one for ingress and the Thunderbolt one for egress (as the latter is arguably more critical), and ran two Wiresharks:


The core router was then set to mirror the ingress and egress ports to different monitoring destinations, Gi5/2 and Gi6/2 (the copper ports on the Supervisor cards), which were attached to the MacBook Air:


The missing packets were evident by comparing the captures:


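The comparison itself can be sketched in a few lines of Python.  This is only an illustration of the idea, not what I actually ran: it assumes each captured packet has already been reduced to a key that passes through the router unchanged (the source address and IP ID pairs below are made-up values, and the function name is hypothetical).

```python
from collections import Counter


def missing_packets(ingress, egress):
    """Return the ingress packet keys that have no matching egress copy.

    Each argument is a list of hashable keys, one per captured packet,
    for example (source address, IP ID) tuples exported from each capture.
    """
    remaining = Counter(egress)       # egress copies not yet matched
    missing = []
    for key in ingress:
        if remaining[key] > 0:
            remaining[key] -= 1       # pair this ingress packet with one egress copy
        else:
            missing.append(key)       # seen on ingress, never seen on egress
    return missing


# Made-up example: the second packet never leaves the router.
ingress = [("10.0.0.1", 101), ("10.0.0.1", 102), ("10.0.0.1", 103)]
egress = [("10.0.0.1", 101), ("10.0.0.1", 103)]
print(missing_packets(ingress, egress))
```

Using a counter rather than a set means a key that legitimately appears twice on ingress must also appear twice on egress to count as forwarded.
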
These corresponded to the destination PE router showing that it had missed a PIM Hello and that the neighbour timeout had not reset (and debug ip pim vrf ... hello with terminal monitor would show it had gone missing).
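
A missed Hello can also be spotted from a single capture by looking for oversized gaps between arrivals.  The sketch below is a rough illustration only: it assumes the standard 30 second PIM Hello period (the interval on this network may differ), and the function name, tolerance and timestamps are hypothetical.

```python
def hello_gaps(arrival_times, hello_interval=30.0, tolerance=0.5):
    """Return (previous, next, estimated_missed) for each oversized gap.

    arrival_times is a sorted list of Hello arrival times in seconds.
    hello_interval is an assumed value (30 s is the standard PIM default).
    A gap counts as oversized when it exceeds the interval by more than
    the given fractional tolerance.
    """
    gaps = []
    for prev, nxt in zip(arrival_times, arrival_times[1:]):
        delta = nxt - prev
        if delta > hello_interval * (1 + tolerance):
            # Estimate how many Hellos should have arrived inside the gap.
            missed = round(delta / hello_interval) - 1
            gaps.append((prev, nxt, missed))
    return gaps


# Made-up timestamps: one Hello is missing between 60.1 s and 120.3 s.
times = [0.0, 29.8, 60.1, 120.3, 150.0]
print(hello_gaps(times))
```

A single missed Hello should not by itself expire a neighbour with the default hold time of 3.5 times the Hello period, so a timeout suggests several consecutive losses.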

I've emailed the captures and report to the support partner / Cisco.