Monday 16 September 2013

DHCP Option 82 update - Cisco and HP

Following the work on DHCP with Option 82 and Cisco switches and routers, I thought I'd summarise things and try getting it to work with HP ProCurve switches, too (by "ProCurve" I mean the "traditional" HP switches and not the ex-H3C ones running Comware - we don't have any of the latter).

The agent remote-id and circuit-id fields are left as vendor-specific and, of course, differ between switch platforms.  Detecting the different formats is a little hit-and-miss, but seems to be possible between HP and Cisco, at least.  When I get time later, I may look at Extreme XOS stuff as we use that too (although only in our data centres, where we don't have DHCP in use, except to set up servers initially).

HP DHCP Snooping and Option 82


HP has three modes for the remote-id, set with the global dhcp-snooping option 82 remote-id command:
  • mac (the default) - just the 6 bytes of the switch's base MAC address
  • subnet-ip - the switch's IP address on the VLAN with the client (I have no idea what happens if there isn't one set)
  • mgmt-ip - the management IP address (the IP address set on the management VLAN)
The mgmt-ip option looks the most useful.  There is no leading byte to indicate which of these options has been selected.

There seems to be no way to control the format of the circuit-id: on the switches I've tested it on (a 2610-24-PWR and a 5412), it's two bytes - the first is zero and I assume would increase with slot numbers on a chassis-based switch (although the AP I have on the 5412 is in slot A and it's still 0); the second is the port number.

Cisco and HP Option 82 information compared


The remote-id is as follows:

Format           Byte 0  Byte 1        Bytes 2 onwards
Cisco (default)  0       6 (= length)  Switch base MAC address
Cisco hostname   1       Length        Hostname or explicit string
HP mac           Bytes 0-5: switch base MAC address
HP subnet-ip     Bytes 0-3: client VLAN IP address (?)
HP mgmt-ip       Bytes 0-3: management VLAN IP address

The logic to parse this field seems best as:
  1. If byte 0 is 1, it's probably a Cisco hostname (it's unlikely to be an IP address "1.q.r.s" or a multicast MAC address), so print the string starting at byte 2
  2. If it's 4 bytes long, assume it's an HP IP address, so print it as an IPv4 address
  3. Else assume it's an HP MAC address, so print it as a colon-separated hex string (which, if it's not a MAC address, we can still translate)
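As a cross-check of the three rules above, here is a minimal Python sketch of the same heuristic.  It assumes you already have the raw remote-id bytes (for example, pulled out of a packet capture); the function name describe_remote_id and its return labels are my own invention, not anything from ISC or the switch vendors.

```python
import ipaddress


def describe_remote_id(data: bytes) -> tuple[str, str]:
    """Guess the format of an Option 82 remote-id using the three rules above.

    Returns (guessed format, printable value).
    """
    # Rule 1: leading byte 1 -> probably a Cisco hostname; byte 1 is its length
    if len(data) >= 2 and data[0] == 1:
        length = data[1]
        return "cisco-hostname", data[2:2 + length].decode("ascii", errors="replace")
    # Rule 2: exactly 4 bytes -> probably an HP subnet-ip/mgmt-ip address
    if len(data) == 4:
        return "hp-ip", str(ipaddress.IPv4Address(data))
    # Rule 3: anything else -> colon-separated hex (usually an HP MAC address)
    return "hex", ":".join(f"{b:02x}" for b in data)
```

For example, describe_remote_id(bytes([172, 30, 64, 114])) gives ("hp-ip", "172.30.64.114"), matching the HP log line further down.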
The circuit-id is one of:

Format               Byte 0  Byte 1        Bytes 2-3             Byte 4         Byte 5
Cisco vlan-mod-port  0       4 (= length)  VLAN ID (big endian)  Module (slot)  Port
Cisco string         1       Length        Port and VLAN string (byte 2 onwards)
HP                   Slot    Port

Depending on which format the remote-id was detected as, it seems best to print this as:

  • Cisco - parse the 2-byte VLAN ID and print the port as "module/port" in decimal
  • HP - print the data as hyphen-separated decimals (it could be separated by slashes, but this is just to differentiate)
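The printing rules above boil down to a single test on the remote-id, since both HP remote-id formats print the circuit-id the same way.  A rough Python sketch, assuming raw bytes for both fields and a well-formed 6-byte vlan-mod-port circuit-id in the Cisco case (there's no length checking, just as in the dhcpd.conf version below); format_circuit_id is a hypothetical name:

```python
def format_circuit_id(remote_id: bytes, circuit_id: bytes) -> str:
    """Print the circuit-id according to what the remote-id looked like.

    A remote-id starting with byte 1 is taken to be a Cisco hostname, so the
    circuit-id is assumed to be vlan-mod-port; anything else is assumed to be
    HP and printed as hyphen-separated decimals.
    """
    if remote_id[:1] == b"\x01":
        # Cisco vlan-mod-port: bytes 2-3 VLAN (big endian), byte 4 module, byte 5 port
        vlan = int.from_bytes(circuit_id[2:4], "big")
        return f"port {circuit_id[4]}/{circuit_id[5]} VLAN {vlan}"
    # HP: slot and port bytes, just to differentiate from the Cisco form
    return "port " + "-".join(str(b) for b in circuit_id)
```

With the data behind the two example log lines below, this should produce "port 0-23" for the HP and "port 2/37 VLAN 3008" for the Cisco.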

Cisco configuration


The following is what I've used on the Cisco switches; the lines with a leading "!" are defaults:

ip dhcp snooping vlan ...
!ip dhcp snooping information option
ip dhcp snooping information option format remote-id hostname
!no ip dhcp snooping information option allow-untrusted
ip dhcp snooping database ...
ip dhcp snooping database write-delay 900
ip dhcp snooping
!
interface <uplink>
 ip dhcp snooping trust

On the Cisco routers, I've issued the following, to allow Option 82 to be set by a downstream switch:

ip dhcp relay information trust-all

HP configuration


The following is what I've used on HP switches; the lines with a leading "!" are defaults:

dhcp-snooping vlan ...
dhcp-snooping option 82 remote-id mgmt-ip
!dhcp-snooping option 82 untrusted-policy drop
!dhcp-snooping option 82
dhcp-snooping database file ...
dhcp-snooping
!
interface <uplink>
 dhcp-snooping trust

Printing the agent details in ISC DHCP


Adapting the code posted before to cope with these variations:

# if the agent (Option 82) details are present, attempt to read information
# from them; this is tricky because different vendors and configurations can
# return information in conflicting formats and it's difficult to work out
# what format the information is in, so we make some assumptions

if exists agent.remote-id {
  if substring(option agent.remote-id, 0, 1) = 1 {
    # the first byte of the remote ID is 1 - that's unlikely to be an IP
    # address and, if it were a MAC address, it would be multicast, so it is
    # probably a hostname from a Cisco switch
    log(
      info,
      concat(
        "agent information ",
        binary-to-ascii(10, 8, ".", leased-address),
        " to ",
        binary-to-ascii(16, 8, ":", substring(hardware, 1, 6)),
        " on ",
        substring(option agent.remote-id, 2, extract-int(substring(option agent.remote-id, 1, 1), 8)),
        " port ",
        binary-to-ascii(10, 8, "", substring(option agent.circuit-id, 4, 1)),
        "/",
        binary-to-ascii(10, 8, "", substring(option agent.circuit-id, 5, 1)),
        " VLAN ",
        binary-to-ascii(10, 16, "", substring(option agent.circuit-id, 2, 2))));

  } elsif substring(option agent.remote-id, 4, 2) = "" {
    # if there's nothing from byte 4 onwards, the remote ID is at most 4
    # bytes long, so we probably have an IP address from an HP
    log(
      info,
      concat(
        "agent information ",
        binary-to-ascii(10, 8, ".", leased-address),
        " to ",
        binary-to-ascii(16, 8, ":", substring(hardware, 1, 6)),
        " on ",
        binary-to-ascii(10, 8, ".", option agent.remote-id),
        " port ",
        binary-to-ascii(10, 8, "-", option agent.circuit-id)));

  } else {
    # otherwise, we probably have an HP MAC address, so just print the
    # remote ID as colon-separated hex; this doesn't hurt for anything else,
    # anyway, as we can always translate it
    log(
      info,
      concat(
        "agent information ",
        binary-to-ascii(10, 8, ".", leased-address),
        " to ",
        binary-to-ascii(16, 8, ":", substring(hardware, 1, 6)),
        " on ",
        binary-to-ascii(16, 8, ":", option agent.remote-id),
        " port ",
        binary-to-ascii(10, 8, "-", option agent.circuit-id)));
  }
}

This gives log output for an HP:

Sep 16 22:56:52 janganmun dhcpd: agent information 172.30.162.130 to 6c:f3:7f:c0:54:cf on 172.30.64.114 port 0-23

... the switch has management IP address 172.30.64.114 and the port is A23 (on a 5412).

Or, for a Cisco:

Sep 16 23:08:23 janganmun dhcpd: agent information 172.30.140.9 to 24:de:c6:c6:51:40 on sw-ucs-rnb-n3 port 2/37 VLAN 3008

... switch sw-ucs-rnb-n3, port Gi2/0/37 on VLAN 3008.

Saturday 14 September 2013

DHCP Option 82, Cisco switches and routers and the ISC DHCP server

A bit of "fun" with IPv6 multicast this week, which I'll come on to in another post...

We're moving into a new building starting on Monday, and our new network makes heavy use of pooled DHCP (dynamically allocating private addresses), with the option of fixed private or public addresses for devices needing fixed registrations and/or access from outside the University network.  Dynamic pooled allocations don't need to be registered, making it easier to bring odd machines onto the network and saving the fixed registrations for things that really need them.

Note: I did a follow-up article about this, to cover HP switches as well.  The details of the Cisco data are below, though.

DHCP Option 82, Cisco switches and routers


When using dynamic DHCP to unknown clients, you generally need a way to track the switch ports that hosts attach to so they can be located in the event of misbehaviour.  I think the long term way to do this is something similar to what we've done with our wireless system: using MAC-based authentication to trigger RADIUS accounting messages (or periodic MAC address table scraping).  However, for the short term, I was wondering if I could get DHCP Option 82 to help me out.

DHCP Option 82 is intended for things such as metro Ethernet or cable modems, whereby a switch providing an edge port inserts it into a DHCPREQUEST packet to indicate the switch and port where the request was received, thus allowing the location of the host to be logged by the DHCP server.  It all looked promising but I'd never got it to work - however, a bit of work today sorted that out.
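It helps to know that the Option 82 payload is itself a list of sub-options, each a code byte, a length byte and the value; RFC 3046 defines code 1 as the Agent Circuit ID and code 2 as the Agent Remote ID, which ISC dhcpd exposes as agent.circuit-id and agent.remote-id.  Here's a minimal Python sketch of splitting them apart, assuming you have the raw Option 82 value (without the option code and overall length bytes); split_agent_suboptions is a made-up name:

```python
def split_agent_suboptions(option82: bytes) -> dict[int, bytes]:
    """Split the value of DHCP Option 82 into its sub-options (RFC 3046).

    Each sub-option is: code (1 byte), length (1 byte), value (length bytes).
    """
    subopts: dict[int, bytes] = {}
    i = 0
    while i < len(option82):
        if i + 2 > len(option82):
            raise ValueError("truncated sub-option header")
        code, length = option82[i], option82[i + 1]
        value = option82[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated sub-option value")
        subopts[code] = value
        i += 2 + length
    return subopts
```

So an HP-style payload of bytes([1, 2, 0, 23, 2, 4, 172, 30, 64, 114]) splits into a 2-byte circuit-id (code 1) and a 4-byte remote-id (code 2).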

The problem I had previously was that, when Option 82 information was inserted into the DHCP packet, the DHCP server never saw the DHCPREQUEST.  This turned out to be the Cisco router: by default, it discards packets received with Option 82 present but with the GIADDR (Gateway IP Address) field blank (0.0.0.0).  The GIADDR field is set by the relay agent (typically the router) to its address on the interface where the request was received, allowing the server to know which network the client is connected to.

In our situation, the edge switch sets Option 82 and leaves GIADDR blank; the router then relays the packet and fills in GIADDR.  I would imagine this is how most people would use it, so it's odd that Cisco's default configuration doesn't allow it.
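To make the relay behaviour described above a bit more concrete, here's a very simplified Python model of it.  It is not Cisco's implementation: DhcpRequest and relay are hypothetical names, and the trusted flag stands in for the ip dhcp relay information trusted / trust-all configuration.

```python
from dataclasses import dataclass, replace
from ipaddress import IPv4Address
from typing import Optional

BLANK = IPv4Address("0.0.0.0")


@dataclass(frozen=True)
class DhcpRequest:
    giaddr: IPv4Address   # GIADDR as received by the router
    has_option82: bool    # True if an edge switch inserted Option 82


def relay(req: DhcpRequest, interface_ip: IPv4Address, trusted: bool) -> Optional[DhcpRequest]:
    """Return the request as it would be relayed, or None if it is dropped."""
    # Option 82 present but GIADDR blank: discarded unless the interface
    # trusts relay information
    if req.has_option82 and req.giaddr == BLANK and not trusted:
        return None
    # The relay fills in GIADDR with its own address on the receiving interface
    if req.giaddr == BLANK:
        return replace(req, giaddr=interface_ip)
    return req
```

With trusted=False, a request from a snooping edge switch (Option 82 set, GIADDR blank) comes back as None, which is the "server never sees it" symptom.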

Anyway, to fix this problem, the router interface must be set to trust the Option 82 information, even when GIADDR is blank:

interface Vlan3008
 ip dhcp relay information trusted
 ip helper-address 172.28.208.86
 ip helper-address 172.28.208.87
 ...

... or, for all VLANs, you can use the global configuration command:

ip dhcp relay information trust-all

... then you can configure your edge switch with DHCP Snooping (and ARP Inspection, if required).

Which brings me on to part two of this - how to parse the Cisco Option 82 information in the ISC DHCP server and log it.

Cisco formats for Option 82 data


The edge switch command ip dhcp snooping information option format remote-id hostname above makes Option 82 carry the switch hostname rather than its MAC address as the remote-id - I think that's more useful.

Cisco switches default to a binary-encoded vlan-mod-port for the circuit-id, although this can be overridden on a per-interface basis with something like:

interface GigabitEthernet1/0/2
 ip dhcp snooping vlan 3008 information option format-type circuit-id string Gi1/0/2:3008

... the second format is much nicer as you can just print it, but you have to manually set it on a per-port and per-VLAN basis, which is very tedious.  I don't like tedious things, so parsing the vlan-mod-port format is desirable.  There is a document called DHCP Option 82 Configurable Circuit ID and Remote ID on Cisco's website, but that just explains the ASCII variants, so I thought I'd document the default (binary) types too.

Remote ID


There are two formats for this, depending on whether it's the default (switch MAC address) or specified hostname:

Byte 0  Byte 1        Bytes 2 onwards
0       6 (= length)  Switch MAC address
1       Length        Hostname
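Decoding this layout is straightforward in a general-purpose language.  A minimal Python sketch, assuming you have the raw remote-id sub-option value (parse_cisco_remote_id is a hypothetical name):

```python
def parse_cisco_remote_id(data: bytes) -> str:
    """Decode a Cisco remote-id using the type/length layout in the table above."""
    if len(data) < 2:
        raise ValueError("remote-id too short")
    kind, length = data[0], data[1]
    value = data[2:2 + length]
    if len(value) != length:
        raise ValueError("remote-id length byte exceeds data")
    if kind == 0:
        # default: switch MAC address (length 6), printed as colon-separated hex
        return ":".join(f"{b:02x}" for b in value)
    if kind == 1:
        # hostname (or explicit string)
        return value.decode("ascii", errors="replace")
    raise ValueError(f"unknown remote-id type {kind}")
```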

Circuit ID


This also has two formats, depending on whether the vlan-mod-port option is used, or something custom:

Byte 0  Byte 1        Bytes 2-3             Byte 4         Byte 5
0       4 (= length)  VLAN ID (big endian)  Module (slot)  Port ID
1       Length        Port and VLAN circuit-id string (byte 2 onwards)
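Again as a sketch in Python, assuming the raw circuit-id sub-option value is available (parse_cisco_circuit_id is a hypothetical name).  The VLAN ID is two bytes, big endian, so 0x0b 0xc0 is VLAN 3008:

```python
def parse_cisco_circuit_id(data: bytes) -> str:
    """Decode a Cisco circuit-id using the layout in the table above."""
    if len(data) < 2:
        raise ValueError("circuit-id too short")
    kind, length = data[0], data[1]
    value = data[2:2 + length]
    if len(value) != length:
        raise ValueError("circuit-id length byte exceeds data")
    if kind == 0 and length == 4:
        # vlan-mod-port: VLAN ID (2 bytes, big endian), module, port
        vlan = int.from_bytes(value[0:2], "big")
        module, port = value[2], value[3]
        return f"port {module}/{port} VLAN {vlan}"
    if kind == 1:
        # custom string, as set with "... format-type circuit-id string ..."
        return value.decode("ascii", errors="replace")
    raise ValueError(f"unrecognised circuit-id type {kind}")
```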

Parsing Option 82 from Cisco devices in the ISC DHCP server


The following code, placed at the top level of the dhcpd.conf file, will log messages such as dhcpd: agent information 172.30.140.4 to 24:de:c6:c6:51:92 on sw-ucs-rnb-s4 port 1/2 VLAN 3008 if the agent information is present and the remote ID is in ASCII hostname format:

if ((exists agent.remote-id) and
   (substring(option agent.remote-id, 0, 1) = 1)) {

  log(
    info,
    concat(
      "agent information ",
      binary-to-ascii(10, 8, ".", leased-address),
      " to ",
      binary-to-ascii(16, 8, ":", substring(hardware, 1, 6)),
      " on ",
      substring(option agent.remote-id, 2, extract-int(substring(option agent.remote-id, 1, 1), 8)),
      " port ",
      binary-to-ascii(10, 8, "", substring(option agent.circuit-id, 4, 1)),
      "/",
      binary-to-ascii(10, 8, "", substring(option agent.circuit-id, 5, 1)),
      " VLAN ",
      binary-to-ascii(10, 16, "", substring(option agent.circuit-id, 2, 2))));
}

The ISC server's expression syntax is fairly limited and makes coping with alternatives very difficult - for example, to handle a remote ID that is a MAC address as well as one that is a hostname, you would need a second, complete and suitably adapted copy of the statement above.

Tuesday 3 September 2013

My GNS3 environment

I was asked for my GNS3 environment by our Cisco SE the other day.  I thought I'd post a current picture:
I'm getting ready to enable IPv6 multicast, so expect some stuff about that soon.

Wednesday 28 August 2013

Asus Eee PC 4G netbook running Ubuntu 12.04 LTS

I got given an Asus Eee PC 4G (with a rather fetching lime green lid) by the legend that is Bruce Beckles in the UCS - I thought it might make a handy netbook to use for accessing serial ports and quick poking about.  The thing I hadn't realised about it was that it had a custom 4GB SSD and that made using it as a "desktop" machine (i.e. with X and so on) a little more tricky as Ubuntu 12.04 needs >4GB of hard disk for a Desktop install.

I tried various distributions, including Debian, thinking it would be more lightweight.  Well, it might have been, but it was quite tedious to set up, and I had to work around the bootloader setting a resolution that the 800x480 WVGA screen doesn't support, by specifying a screen mode at the GRUB prompt.  Not so user friendly.

I settled on Ubuntu Server 12.04 (Precise Pangolin) LTS, as that is fairly minimal, but then had to get the desktop running, along with wireless.  That was a bit of fun and games - not getting things basically running, but sorting out things like an X logon environment, wireless support, etc.

Packages I ended up installing included:
  • xfce4 - lightweight minimal X desktop and window manager
  • chromium
  • putty - as a serial terminal
  • traceroute - amazing you get mtr but not traceroute by default
  • xfce4-terminal - a Terminal program for xfce4
  • lxdm - X Display Manager (to give a logon box): problems with this (below)
  • acpi - query the battery status and to make the power button turn the computer off
  • camorama - to support the built-in webcam (pointless, but there you go)
  • wicd - lightweight wireless connection manager (much like the GNOME Network Manager but without all the heavyweight GUI junk that goes with it); it sits nicely in the icon bar at the top of the xfce4 display - I was manually editing wpa_supplicant.conf, but this is much easier and will automatically connect to available wireless networks, plus it brings up the ethernet interface with DHCP without holding up the boot sequence (unlike editing /etc/network/interfaces manually)
  • xfce4-clipman - multiple clipboard manager for X: it has the handy option to synchronise the X clipboard and the selection to allow pasting between PuTTY and a Terminal; sits in the icon bar at the top
  • leafpad - for jotting notes
  • xfonts-terminus + console-terminus - I spent a lot of time searching for a font that would look OK and not antialias horribly on the tiny screen and this ended up being the best I found: this one doesn't need antialiasing
  • apt-file - to find out what package installed a particular file
I installed lxdm to get an X-based logon box (I could run startx, but that still lets people break into the ...) and this caused me a bit of fun - it seems to have some problems with the default Ubuntu install package: it installs a pile of different desktop sessions in /usr/share/xsessions which don't work, as the required window managers and other programs aren't present.  I made a directory under there called DISABLED and moved all the *.desktop files into it, so that broken sessions can't be selected.
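If you'd rather script the xsessions tidy-up than do it by hand, here's a small Python sketch of the same step: it creates DISABLED and moves every *.desktop file into it.  The function name disable_sessions is hypothetical, and you'd need to run it as root against /usr/share/xsessions; moving the files back restores the sessions.

```python
from pathlib import Path


def disable_sessions(xsessions: Path) -> list[str]:
    """Move every *.desktop session file into a DISABLED subdirectory.

    Returns the names of the files moved.
    """
    disabled = xsessions / "DISABLED"
    disabled.mkdir(exist_ok=True)
    moved = []
    # glob() only matches the top level, so files already in DISABLED are left alone
    for entry in sorted(xsessions.glob("*.desktop")):
        entry.rename(disabled / entry.name)
        moved.append(entry.name)
    return moved
```

Usage would be along the lines of disable_sessions(Path("/usr/share/xsessions")).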

I also had a spare 1GB SODIMM to swap for the supplied 512MB one, so doubling the memory.  This enabled me to disable swap on the SSD and get more caching, which will probably prolong its life.  The system uses under 250MB to boot and get X running, so there's plenty of free RAM.

The end result of all this is an install of 1.9GB on the internal SSD, which leaves plenty of space.

Sunday 25 August 2013

PIM-SM and multiple edge routers - suboptimal forwarding

I had a problem reported by Martyn Johnson in the University Computer Laboratory (CL) regarding multicast (running PIM-SM).  He had a host on a subnet (by "subnet", I mean a VLAN or other layer 2 domain, which might of course have multiple unicast subnets) which had subscribed to a multicast group; the subnet had redundant routers (using HSRP, I guess, but maybe VRRP) and traffic was being forwarded onto the network by the less optimal of the two:



The multicast source - server "S" (top left; 1.3.0.2 in my example, sending to group 239.0.0.1) - was actually out on the internet, reached across the University backbone, and was learnt about via MSDP, but the principle is the same: the traffic was coming in via R1 (which was also the RP), over to R2 and then being forwarded on to the member/receiver - host "M" (bottom left):

R1#show ip mroute 239.0.0.1
...

(*, 239.0.0.1), 00:08:34/stopped, RP 1.9.0.1, flags: S
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:07:44/00:02:38

(1.3.0.2, 239.0.0.1), 00:00:23/00:03:27, flags: T
  Incoming interface: Ethernet1/2, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:00:23/00:03:06

And R2:

R2#show ip mroute 239.0.0.1

(*, 239.0.0.1), 00:08:38/stopped, RP 1.9.0.1, flags: SJC
  Incoming interface: Ethernet1/0, RPF nbr 1.1.0.1
  Outgoing interface list:
    Ethernet1/1, Forward/Sparse, 00:08:38/00:02:22

(1.3.0.2, 239.0.0.1), 00:00:27/00:02:51, flags: JT
  Incoming interface: Ethernet1/0, RPF nbr 1.1.0.1
  Outgoing interface list:
    Ethernet1/1, Forward/Sparse, 00:00:27/00:02:32

The optimal path would be directly from R1 onto the subnet: I would have expected R1 to detect R2 sending traffic out onto the subnet and send a PIM Assert, then win that to be the forwarder for that source on that subnet.  However, that didn't happen.

The PIM DR

I had to remind myself how all this worked and then, of course, realised I didn't understand it as well as I thought I did.

First up, I got a bit muddled with IGMP but, after a bit of re-reading, answered one of my long-standing queries: the IGMP Querier has nothing to do with the router that actually forwards the multicast traffic (at least for IGMPv2 onwards - IGMPv1 has no mechanism to elect a Querier, so the choice is left up to the multicast routing protocol).  IGMPv2 elects the router with the lowest IP address on the subnet as Querier, and there is no way to influence this.

Then there's PIM-SM, which elects a DR on a subnet that will, by default, be responsible for forwarding traffic onto that subnet - this defaults to the PIM router with the highest IP address.  The reason the two elections differ is to distribute the work of supporting multicast on a subnet, but it's important to remember that the PIM DR is the one that actually forwards the traffic for directly-connected members; the IGMP Querier is just responsible for periodically soliciting group membership reports from receivers.  You can force a particular router to be the PIM DR by setting its DR priority (ip pim dr-priority ...).

You can check these both with show ip igmp interface ... | inc router - or you can just look at the PIM stuff with show ip pim interface.
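The two elections can be summarised in a few lines of Python.  Per RFC 2236, the IGMPv2 Querier is the router with the lowest IP address; per the PIM-SM specification, the DR is the router with the highest DR priority, with ties broken by the highest IP address.  This is a simplified sketch that assumes every PIM router on the subnet advertises a DR priority (ip pim dr-priority, which defaults to 1 on Cisco); the function names are mine.  In the checks, 1.2.0.253 for R1 is the RPF neighbour R3 shows later, and 1.2.0.254 for R2 is a made-up address.

```python
from ipaddress import IPv4Address


def igmp_querier(routers: dict[str, IPv4Address]) -> str:
    """IGMPv2: the router with the lowest IP address on the subnet is Querier."""
    return min(routers, key=lambda name: routers[name])


def pim_dr(routers: dict[str, tuple[int, IPv4Address]]) -> str:
    """PIM-SM DR: highest (DR priority, IP address) wins."""
    return max(routers, key=lambda name: routers[name])
```

With equal priorities, R1 would be Querier and R2 the DR, which is the split I guessed at for Martyn's subnet; raising R1's DR priority would make it the DR instead.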

The DR is also responsible for forwarding received multicast traffic from the directly-connected subnet onto the routed backbone network, but that can change when the Shortest Path Tree (SPT) is built later.

So, in Martyn's case I guessed (but haven't verified yet) that the IGMP Querier would be R1 and the PIM DR would be R2.

So, when M sends an IGMP Membership Report (colloquially an "IGMP Join"), R2 is the one which responds by joining the shared tree for the group and forwarding traffic onto the subnet.  Once traffic starts flowing, R2 then joins the SPT for each source, which it did.

Here, it's important to remember that R2 will still be the DR serving M's subnet: even if the SPT gives a shorter path to the source, it is still built (at the receiving end) from the DR:

  • the shared tree starts at the DR for the source's subnet and goes up to the RP, then down to the DRs for the members' subnets
  • the SPT starts at the receiver's subnet's DR and uses the RPF table to connect it back to a PIM router connected to the source's subnet, which may or may not be the DR for that subnet, depending on how the IGP has converged

(When investigating this on the University backbone network, this caused me some confusion because our OSPF is set with maximum-paths 1 and static/connected routes are redistributed as E1s, so they are "sticky" - it can be the case that the active route for a particular source leads to a different router than the DR on its subnet, meaning the SPT will not come from the same router that the shared tree does.  This is particularly confusing because changes in the order in which routes are advertised can cause the SPTs to differ, just as with unicast routes.)

I thought a PIM Assert would step in here, resulting in R1 taking over duties for the receiver's subnet, but it didn't.

Trying PIM-DM

Switching things over to PIM-DM on M's subnet and the link between R1 and R2 DID solve the problem: here, the traffic flows out of both R1 and R2 onto the subnet and the routers detect that traffic is arriving on that interface which doesn't match the Reverse Path Forwarding check for S.  We get an Assert and R1 wins:

R1#show ip mroute 239.0.0.1 1.3.0.2
...
(1.3.0.2, 239.0.0.1), 00:00:33/00:02:50, flags: T
  Incoming interface: Ethernet1/2, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Dense, 00:00:09/00:00:00
    Ethernet1/1, Forward/Dense, 00:00:33/00:00:00, A

And R2 has pruned the interface, having lost the Assert:

R2#show ip mroute 239.0.0.1
...
(1.3.0.2, 239.0.0.1), 00:00:32/00:02:30, flags: PJT
  Incoming interface: Ethernet1/0, RPF nbr 1.1.0.1
  Outgoing interface list:
    Ethernet1/1, Prune/Dense, 00:00:32/00:02:27

So maybe PIM-DM would work better than PIM-SM here!

Introducing a downstream PIM-SM router

The examples in Cisco's Developing IP Multicast Networks, Volume I don't cover the situation we have here, which I think is why it isn't explained there.  They do, however, cover the following situation (in the PIM-DM section), with R3 added to the client subnet and a member, M2, added downstream of that:


With no source, once everything has converged (with R3 joining the shared tree for the group), the multicast forwarding tables will look as follows for 239.0.0.1 on R1:

R1#show ip mroute 239.0.0.1
...
(*, 239.0.0.1), 00:05:43/00:02:54, RP 1.9.0.1, flags: SJC
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:04:17/00:03:10
    Ethernet1/1, Forward/Sparse, 00:05:43/00:02:56

And R2:

R2#show ip mroute 239.0.0.1
...
(*, 239.0.0.1), 00:11:04/00:02:59, RP 1.9.0.1, flags: SC
  Incoming interface: Ethernet1/0, RPF nbr 1.1.0.1
  Outgoing interface list:
    Ethernet1/1, Forward/Sparse, 00:11:04/00:02:59

And R3 is how you would expect:

R3#show ip mroute 239.0.0.1
...
(*, 239.0.0.1), 00:28:37/00:02:29, RP 1.9.0.1, flags: SC
  Incoming interface: Ethernet1/0, RPF nbr 1.2.0.253
  Outgoing interface list:
    Ethernet1/1, Forward/Sparse, 00:28:37/00:02:29

Note that both R1 and R2 have added their link to the subnet with M and R3 (Ethernet1/1 on each) into their outgoing interface list, but for two different reasons:

  • R2 has added it because M has joined the group (via IGMP) and it is the DR for that subnet, as before, and
  • R1 because R3 has sent a join (via PIM-SM) for the shared tree towards the RP (which would be best reached directly across the shared network)
This duplication of forwarding onto the shared subnet is not resolved at this point: only when traffic is actually forwarded and detected by one (or both) of the routers which are forwarding is a PIM Assert generated and an election takes place to determine the winning forwarder for a particular (S, G).
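When the Assert does happen, the routers compare assert metrics.  As a sketch of the comparison in the PIM-SM specification (RFC 7761): an assert from the SPT beats one from the shared tree, then the lower metric preference (administrative distance on Cisco) wins, then the lower route metric, and finally the higher IP address.  The class and function names are mine, and the preference/metric values in the checks are hypothetical (0/0 for R1, directly connected to S, and typical OSPF values for R2).

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class AssertMetric:
    rpt_bit: bool          # True if asserting from the shared (RP) tree
    preference: int        # metric preference (administrative distance)
    metric: int            # route metric towards the source (or RP)
    address: IPv4Address   # sender's address on the shared subnet

    def sort_key(self) -> tuple[bool, int, int, int]:
        # Lower is better for the first three fields; the higher address wins
        # a tie, so negate it to keep "lowest key wins" throughout.
        return (self.rpt_bit, self.preference, self.metric, -int(self.address))


def assert_winner(candidates: dict[str, AssertMetric]) -> str:
    """Return the name of the router that wins the Assert."""
    return min(candidates, key=lambda name: candidates[name].sort_key())
```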

So, when S starts to send traffic to the group, R1 and R2 will forward traffic onto the shared subnet and an Assert will be triggered, which R1 will win for (1.3.0.2, 239.0.0.1):

R1#show ip mroute 239.0.0.1
...
(*, 239.0.0.1), 00:03:54/00:02:36, RP 1.9.0.1, flags: SJC
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:03:51/00:02:36
    Ethernet1/1, Forward/Sparse, 00:03:54/00:02:33

(1.3.0.2, 239.0.0.1), 00:01:24/00:03:21, flags: T
  Incoming interface: Ethernet1/2, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:00:53/00:02:36
    Ethernet1/1, Forward/Sparse, 00:01:24/00:03:03, A

And R2 loses and prunes (1.3.0.2, 239.0.0.1) to that subnet from its forwarding table:

R2#show ip mroute 239.0.0.1
...
(*, 239.0.0.1), 00:05:28/stopped, RP 1.9.0.1, flags: SJC
  Incoming interface: Ethernet1/0, RPF nbr 1.1.0.1
  Outgoing interface list:
    Ethernet1/1, Forward/Sparse, 00:05:28/00:02:25

(1.3.0.2, 239.0.0.1), 00:03:01/00:02:57, flags: PJT
  Incoming interface: Ethernet1/0, RPF nbr 1.1.0.1
  Outgoing interface list: Null

So here, it seems that having a downstream router running PIM-SM helps the traffic flow to optimise.

Hello and why...

I'm starting this blog as an experiment, to jot down little snippets of information I come across in my job (or elsewhere) that I want to be able to find again...

I often spend a lot of time investigating and (if I'm lucky) solving problems but they then don't result in anything being written up, but just an explanation in the tea room or a corridor, or maybe an email.  This is an attempt to write some scratch notes about them to go back later and remind myself why.

Let's see how it works out.