24 January 2018

Cisco ACI APIC and Spine/Leaf Upgrade Process

I'm just getting started with ACI in general. Here's the general process to upgrade the APICs/spine/leaf.

First check out the Cisco document here. In case the link moves/etc the document is "Cisco APIC Management, Installation, Upgrade and Downgrade Guide". This is essentially an abbreviated version of that document.

Of particular importance in that document is the section on "Supported Upgrade Paths for APIC Controller and Switch Software" and the associated downgrade section. Make sure you can jump from where you are to where you want to go. If it ain't listed, it ain't supported... so prepare for headaches.

The basic process is:

  1. Get files from Cisco onto an HTTP/SCP server and then upload them to the APIC
  2. Get APICs upgraded
  3. Wait for things to stabilise.
  4. Get Leaf/Spines upgraded
  5. Wait for things to stabilise.

This whole process took a few hours to complete... but I was lucky to have a fast internet connection to download the files with. I did the above using the (wimpy) GUI methods, but the linked document lists ways to do the same using REST/CLI/Console/etc.

Getting the files from Cisco, to an intermediate HTTP/SCP box and onto the APICs
I couldn't believe it when I downloaded them but the files are gigantic. There are basically two main bits of software to get: APIC and ACI switch software. Thankfully Cisco puts the matching APIC and ACI versions under the same sub-heading/version, "Application Policy Infrastructure Controller (APIC)", on their download site. Basically if you click on 3.0 it has the APIC version (3.0.1 in my case) and the leaf/spine version (13.0.1) in the same section. In my case (going from 2.2 to 3.0) the files were:
  • aci-apic-dk9.3.0.1k.iso - For APICs 
  • aci-n9000-dk9.13.0.1k.bin - For ACI Leaf/Spine
Grab them from Cisco per normal process. Upload them to an HTTP/SCP server.
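A quick sanity check that the two downloads belong together can be sketched in Python. This is only a sketch: the file name pattern is inferred from the two names above, and the rule that the switch major version is the APIC major version plus 10 (3.0 to 13.0) is an assumption generalised from this one example.

```python
import re

# Pattern assumed from the two file names quoted in this post.
IMAGE_RE = re.compile(r"^aci-(apic|n9000)-dk9\.(\d+)\.(\d+)\.(\w+)\.(iso|bin)$")


def parse_image(filename):
    """Return (kind, major, minor, patch) for an ACI image file name."""
    m = IMAGE_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognised image name: {filename}")
    kind, major, minor, patch, _ext = m.groups()
    return kind, int(major), int(minor), patch


def images_match(apic_file, switch_file):
    """True if the switch image looks like the counterpart of the APIC image."""
    a_kind, a_major, a_minor, a_patch = parse_image(apic_file)
    s_kind, s_major, s_minor, s_patch = parse_image(switch_file)
    if a_kind != "apic" or s_kind != "n9000":
        return False
    # Assumption: switch major = APIC major + 10 (3.0 -> 13.0), as in this post.
    return (s_major, s_minor, s_patch) == (a_major + 10, a_minor, a_patch)


print(images_match("aci-apic-dk9.3.0.1k.iso", "aci-n9000-dk9.13.0.1k.bin"))
```

The supported upgrade paths in the Cisco guide remain the authority; this only catches grabbing a mismatched pair of files.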

In APIC, create a "Download Task" (Admin > Firmware > Download Tasks) pointing to each file individually. Once the task is created the file will be downloaded to the APIC. You can see the status under the "Operational" tab of this page.
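The linked guide also covers doing this over REST. Here is a minimal sketch of what the equivalent download task might look like as a REST payload, built but not sent. The class names `firmwareRepoP` and `firmwareOSource`, their attributes, and the target URL mentioned in the comment are as I recall them from the guide's REST section, so treat them as assumptions to verify; the task name and the 192.0.2.10 server address are made up.

```python
import xml.etree.ElementTree as ET


def build_download_task(name, url, proto="http"):
    """Build the XML body for a firmware download task (not sent anywhere)."""
    # Assumed object model: a firmwareOSource child under firmwareRepoP.
    root = ET.Element("firmwareRepoP")
    ET.SubElement(
        root,
        "firmwareOSource",
        {"name": name, "proto": proto, "url": url},
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical task name and HTTP server address.
payload = build_download_task(
    "apic-3.0.1k",
    "http://192.0.2.10/aci-apic-dk9.3.0.1k.iso",
)
# Assumption: this would be POSTed to https://<apic>/api/node/mo/uni/fabric.xml
# after authenticating; check the guide before relying on it.
print(payload)
```

One payload per file, same as the GUI method.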

It looks like you can upload files directly onto the APIC from the GUI as well now (I didn't try that here though). This looks to be done through "Firmware Repository" under Admin > Firmware > Firmware Repository and clicking the "Upload Firmware to APIC" action.

Upgrading the APICs
In Admin > Firmware > Controller Firmware you'll have an action to "Upgrade Controller". Select the version/schedule/etc and off you go. The screen will update the Upgrade Progress status bar for each APIC. The system will do one APIC at a time automatically, so just sit back and let it do its thing.... which brings me to...


Waiting for things to Stabilise
Just note that during this waiting time APICs will reload. This is non-disruptive as APICs aren't involved in production traffic but are only used to push policy to nodes/etc. This was a good 10-30min process for me. Had to reload the APIC browser session after they rebooted as well.

The APICs all appeared in the "Controller Firmware" screen as being "Upgraded Successfully"

Upgrading the Leaf/Spines
Similar to the APICs, except that you are going to be potentially impacting production traffic if things go bad. Basically under "Firmware Groups" in Admin > Firmware > Fabric Node Firmware > Firmware Groups create a group of AllNodes and select the ACI version you want to go to. 

Before worrying about doing all at the same time... just keep in mind the next bit is to create Maintenance Groups, whereby you dictate which switches to upgrade at the same time. This is done under "Maintenance Groups" in Admin > Firmware > Fabric Node Firmware > Maintenance Groups. Make a primary and a secondary maintenance group of nodes.

You kick off the upgrade by clicking the "Upgrade Now" action of the primary Maintenance Group... then you wait patiently for things to come back and do the same for the secondary group.

Based on the link (I've not tested though):

  • Up to 20 nodes are upgraded at the same time
  • Only one member of a vPC pair is ever upgraded at a time (nice!)
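Even so, it makes sense to plan the primary and secondary groups so that vPC peers never share a group. A rough sketch of that planning step in Python follows; the node IDs and pairings are hypothetical, and it assumes each node belongs to at most one vPC pair.

```python
def split_maintenance_groups(nodes, vpc_pairs):
    """Split node IDs into (primary, secondary) so no vPC pair shares a group."""
    primary, secondary = set(), set()
    paired = set()
    for a, b in vpc_pairs:
        # One peer per group, so a pair is never rebooted together.
        primary.add(a)
        secondary.add(b)
        paired.update((a, b))
    # Alternate the remaining nodes (e.g. spines) between the two groups.
    for i, node in enumerate(sorted(set(nodes) - paired)):
        (primary if i % 2 == 0 else secondary).add(node)
    return sorted(primary), sorted(secondary)


# Hypothetical fabric: four leaves in two vPC pairs, two spines.
nodes = [101, 102, 103, 104, 201, 202]
vpc_pairs = [(101, 102), (103, 104)]
primary, secondary = split_maintenance_groups(nodes, vpc_pairs)
print("primary:  ", primary)
print("secondary:", secondary)
```

Splitting the spines across the two groups as well means at least one spine stays up while each group reloads.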

Waiting for things to Stabilise
Up to 12 minutes is the guide's estimate of how long it will take... be patient. The nodes will come back and things will be good (hopefully). During the upgrade process nodes will reboot, and production traffic should experience only minor disruption provided everything is dual-homed.

Obviously take this all with a grain of salt... I am not an ACI expert but wanted to write some notes to summarise the wordy Cisco process. Some of my colleagues have screwed this up in the past and managed to get things going again (albeit onsite) using some of the other methods (i.e. CLI/etc).

Good luck! Hope this helps...

12 January 2017

VPN over Satellite - Why it doesn't always work at full bandwidth

I came across an interesting situation recently whereby a customer had to move datacentres and had to reconnect their Satellite-connected remote sites to a different datacentre using different VPN technologies. Through the move we discovered an interesting situation regarding Satellite and VPN that I thought warranted a quick post.

Some background about Satellite communications...
Satellite communications have inherent limitations in regard to round trip time (RTT) due to the distance packets travel in order to get from source to destination (i.e. propagation delay). The RTT of Satellite links is typically above the 500ms mark. Consider what this delay does to network communications. UDP is connectionless, so there is no handshake to wait for; at most, the application on top of it pays the 500ms back-n-forth for whatever setup exchange it does itself. After that the performance across these high-RTT links may not be so bad. Data will quite happily flow until it's finished, as UDP doesn't care if bits are lost along the way; it just sends one packet after another until it has sent everything.

On the other hand consider TCP. It has to complete a handshake and pays the 500ms delay for each back-n-forth of the setup. However, once the handshake is completed the data stream behaves nothing like the UDP stream! It chugs!

TCP is a connection-oriented protocol and it will only send the next chunk of data (a window's worth) once the last chunk it sent has been acknowledged as received successfully. Over short-RTT links this isn't such a big deal: we lose a packet, we simply send it again and wait for the other end to acknowledge. In standard networks, because the RTT is low, the penalty for doing this is negligible... and life goes on. Over Satellite links the delay associated with a lost chunk of data in a TCP stream is very detrimental due to the 500ms back-n-forth required to resend the chunk that was lost.
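To put rough numbers on it: a single TCP stream can have at most one window of unacknowledged data in flight per round trip, so its throughput is capped at about window / RTT no matter how fat the link is. A small sketch, assuming a classic 65535-byte window (no window scaling) and ignoring loss entirely:

```python
def max_tcp_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on one TCP stream: one window of data per round trip."""
    return window_bytes * 8 / rtt_seconds


# Assumed window: 65535 bytes, the largest without TCP window scaling.
for rtt in (0.010, 0.500):
    mbps = max_tcp_throughput_bps(65535, rtt) / 1e6
    print(f"RTT {rtt * 1000:.0f} ms -> at most {mbps:.2f} Mbit/s")
```

That works out to roughly 52 Mbit/s at 10ms but only about 1 Mbit/s at 500ms, before any retransmissions make things worse.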

To get around the inherent limitations of TCP over Satellite (and make it workable), network boffins came up with some fancy optimization. Satellite providers these days typically provide some sort of TCP/UDP header optimization for their services that buffers packets and tricks the PCs at either end of the Satellite link into thinking the network is more responsive than it is, or that data can be sent quicker than the typical RTT back-n-forth would dictate under normal circumstances. This works great provided the optimization technology supports the protocols you're using.... (*ominous fore-shadowing*).

Now what about VPNs? Typical IPsec VPNs over Satellite behave pretty similarly to the vanilla TCP situation described above (i.e. poorly) except (you guessed it) the Satellite modems can't optimize the TCP/UDP packet headers tunnelled across the VPN. This is because the TCP packet headers are encapsulated and encrypted inside ESP packets (i.e. the modem can't see that it's TCP, let alone optimize it). The same is true for IP tunnels (i.e. using GRE)!
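A toy illustration of what the modem gets to look at: only the protocol number in the outer IP header. The protocol numbers are the standard IANA ones; the idea that the modem's acceleration only triggers on outer TCP is an assumption for the sake of the example, since real modems vary.

```python
# Standard IANA IP protocol numbers.
IP_PROTOCOLS = {6: "TCP", 17: "UDP", 47: "GRE", 50: "ESP"}


def modem_view(outer_protocol):
    """Describe a packet as the modem sees it: by outer IP protocol only."""
    name = IP_PROTOCOLS.get(outer_protocol, f"protocol {outer_protocol}")
    # Assumption: TCP acceleration only recognises TCP in the outer header.
    can_accelerate = outer_protocol == 6
    return name, can_accelerate


# The same TCP session, sent three different ways.
for label, proto in (("plain", 6), ("inside IPsec", 50), ("inside GRE", 47)):
    name, ok = modem_view(proto)
    print(f"{label:13} -> modem sees {name}, accelerated: {ok}")
```

The inner session is TCP in all three cases; the modem simply never finds out in the last two.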

Overall this results in a lot of frustration and stock-standard VPNs connecting without issue but not forwarding traffic at speeds that non-VPN traffic operates at.

How to configure VPNs over Satellite links in a way that can be optimized...
So in the wall of text above, we noted that UDP is connectionless, that Satellite modems do some form of spoofing to trick TCP/UDP into working fast despite the poor RTT, and that VPNs hide the underlying TCP/UDP from the modems, preventing this optimization! Phew!

A standard VPN uses ESP packets for the tunnel... what if we made it use UDP as the transport instead? On a Cisco router this would be configured using the following (note this is the default in newer software):
crypto isakmp nat-traversal 20
The above command points to the fact that NAT traversal suffers from similar issues.

Hope it helps someone!

ASA - Which serial number to lodge PAKs against

ASAs sometimes show one serial number in the "show version" output and a completely different one in the "show inventory" output. If you are tasked with applying a license to your ASA you will probably ask "which serial number do I use?" The simple answer is: use the "show inventory" serial number.

But what if I've already lodged my license against the wrong serial number? Simply email/contact the licensing team (licensing@cisco.com) and they'll sort it out for you.