Saturday, September 02, 2017

Cisco VSS recovery from a failed router

I recently had a client with a Cisco Virtual Switch System (VSS) pair in which one of the devices failed. I won't go into the details of that failure; instead, I'll cover how to recover from a failed VSS switch and get the replacement device operational.

Background info:
Cisco VSS Link:  Cisco VSS
My history with VSS: I hated it for years. During my 16 years at Cisco, it was very common among the engineers I worked with to say "Friends don't let friends run VSS". On several occasions, when I was under a time crunch, I deliberately removed it because of the extra complexity it added to setup and troubleshooting. In the last few years (shortly after it became available on the Cisco 4500 platform) I changed my mind. Cisco fixed most of the issues and made setup much easier, so I'm OK with it now.

VSS takes two L3 switches and combines their control planes into one. In basic terms, the supervisor in one switch is active, and the supervisor in the other switch is a standby that takes over if the active one fails. All ports in both switches are operational regardless of which supervisor is currently active. There are about 1000 caveats and conditions that make everything I just wrote far too simplistic or wrong, but it serves our purposes here.

Implications of losing a switch:
You just lost the config that was specific to the old switch, and you will need to recreate it.
1.  VSS Link configuration - This is the config for the specific interfaces that connect the two devices. You might still have this config in the old device; by the time I got to this one it was gone. I suspect that was due to a reboot or other tinkering with the config, as the client had attempted the repair at least twice.
2.  You need a switch number for the new switch that is different from the operational switch's number (use the number of the switch that failed), and a virtual domain number that matches the operational switch.
3.  You need to initialize the new switch back into the pair.

Procedure:
This is actually very easy. Boot up the new device and connect to its console port.

1.  Make sure the software versions and the configuration register (config-register) settings are the same on both switches.
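If you want to script the comparison in step 1, here is a minimal Python sketch. It is not a Cisco tool: `extract_version_info` and `versions_match` are hypothetical helpers that take the `show version` text from each switch, and they assume the usual "Version ..." and "Configuration register is 0x..." lines, with the first "Version" token being the IOS release.

```python
import re


def extract_version_info(show_version: str) -> tuple:
    """Return (software version, configuration register) from 'show version' text.

    Assumption: the first 'Version <x>' token is the IOS release, and the
    register appears as 'Configuration register is 0x....'. Missing fields -> None.
    """
    version = re.search(r"Version ([^\s,]+)", show_version)
    register = re.search(r"Configuration register is (0x[0-9A-Fa-f]+)", show_version)
    return (version.group(1) if version else None,
            register.group(1) if register else None)


def versions_match(active_output: str, new_output: str) -> bool:
    """True only if both fields were found and are identical on both switches."""
    active = extract_version_info(active_output)
    new = extract_version_info(new_output)
    return None not in active and active == new
```

Paste the output from the running switch and from the new switch; anything other than `True` means you should stop and fix the mismatch before going further.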

2.  Take a look at the running configuration of the operational switch and find this information near the top.
switch virtual domain 200
 switch mode virtual
 switch 2

Domain Number:
The domain number "200" needs to match on both devices.

Switch number:
Notice that the current operational switch is number 2, or the "B" switch in the VSS pair, so the config you create for the new switch will use switch 1. The two numbers must be different, and the replacement takes the number of the switch that failed: if switch 1 dies, the replacement is switch 1; if switch 2 dies, the replacement is switch 2. You might also find some priority information. I'll leave that out, as it's not important to what we are doing here; configure priority however you need to.

Make your own config for the new switch.
switch virtual domain 200
 switch mode virtual
 switch 1
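The numbering rule above is simple enough to express in a few lines. This is a hypothetical Python helper (the function names are mine, not anything in IOS) that assumes the pair uses switch numbers 1 and 2 and builds the domain block for the replacement switch from the surviving switch's number.

```python
def replacement_switch_number(surviving_switch: int) -> int:
    """Assumes a VSS pair uses switch numbers 1 and 2; the replacement takes
    the number the surviving switch is NOT using (the failed switch's number)."""
    if surviving_switch not in (1, 2):
        raise ValueError("VSS switch numbers are 1 or 2")
    return 3 - surviving_switch


def domain_stanza(domain: int, surviving_switch: int) -> str:
    """Build the 'switch virtual domain' block for the replacement switch.
    The domain number must match the surviving switch."""
    return "\n".join([
        f"switch virtual domain {domain}",
        " switch mode virtual",
        f" switch {replacement_switch_number(surviving_switch)}",
    ])
```

For this example, `domain_stanza(200, 2)` produces the same three lines shown above.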

3.  Find the port-channel you need to use. You will find something like this in the old config. You are looking for the port-channel that ran from the switch you are replacing to the switch that is still running. In this case, switch A is being replaced.
!
interface Port-channel1
 description 6840A Po1-->6840B Po2 (VSS LINK)
 no ip address
 no platform qos channel-consistency
!
interface Port-channel2
 description 6840B Po2-->6840A Po1 (VSS LINK)
 no ip address
 no platform qos channel-consistency
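If the old config is long, a small script can pick out the VSL port-channels for you. This minimal Python sketch assumes the description convention used here, where each VSL port-channel description starts with the local hostname and ends in "(VSS LINK)"; the function names are hypothetical.

```python
import re


def vsl_port_channels(config: str) -> dict:
    """Map port-channel number -> description for every Port-channel whose
    description contains '(VSS LINK)' (this post's naming convention)."""
    result = {}
    # Split into 'interface ...' blocks (lookahead split needs Python 3.7+).
    for block in re.split(r"^(?=interface )", config, flags=re.MULTILINE):
        header = re.match(r"interface Port-channel(\d+)", block)
        desc = re.search(r"^ description (.+)$", block, flags=re.MULTILINE)
        if header and desc and "(VSS LINK)" in desc.group(1):
            result[int(header.group(1))] = desc.group(1)
    return result


def port_channel_for(config: str, hostname: str) -> int:
    """Return the VSL port-channel whose description starts at 'hostname'
    (assumption: descriptions begin with the local switch's name)."""
    for number, desc in vsl_port_channels(config).items():
        if desc.startswith(hostname + " "):
            return number
    raise LookupError(f"no VSS LINK port-channel described as starting at {hostname}")
```

With the two port-channels above, asking for "6840A" returns 1, which is the one to recreate.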

So I need to recreate Port-channel 1 on the new switch. We also need to add the switch virtual link command to it; its argument is the switch number, so it is 1 here. With that added, the new switch config now looks like this:
switch virtual domain 200
 switch mode virtual
 switch 1
!
interface Port-channel1
 description 6840A Po1-->6840B Po2 (VSS LINK)
 no ip address
 no platform qos channel-consistency
 switch virtual link 1
!
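The port-channel block can also be generated. This is a hypothetical Python helper, assuming (as in this example) that the argument to switch virtual link is the new switch's own number and that the other lines are copied from the old config.

```python
def vsl_port_channel_stanza(pc_number: int, switch_number: int, description: str) -> str:
    """Build the VSL port-channel block for the replacement switch.
    Assumption: 'switch virtual link' takes the number of the switch that
    owns this port-channel (1 in this example)."""
    return "\n".join([
        f"interface Port-channel{pc_number}",
        f" description {description}",
        " no ip address",
        " no platform qos channel-consistency",
        f" switch virtual link {switch_number}",
    ])
```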

4.  Find the interfaces that will form the VSS link. In this case there are two, and there will probably be no configuration on them in the new device. Not to worry, we will just create it. Note that we are working on a standalone switch right now, so interface names have only two numbers (slot/port). Once it becomes part of the VSS pair, the switch number is added first (switch/slot/port). We also match the channel-group number to the port-channel number from above.
!
interface TenGigabitEthernet1/20
 description 6840A TE1/20-->Po1-->6840B Te1/20
 no switchport
 no ip address
 no shutdown
 channel-group 1 mode on
!
interface TenGigabitEthernet1/30
 description 6840A TE1/30-->Po1-->6840B Te1/30
 no switchport
 no ip address
 no shutdown
 channel-group 1 mode on
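The same goes for the member ports. This sketch assumes TenGigabitEthernet ports in slot/port form and that each port connects to the same port number on the peer, as this example's descriptions do; the helper name is hypothetical.

```python
def vsl_member_stanzas(ports: list, pc_number: int, local: str, peer: str) -> str:
    """Build one block per VSL member port (slot/port naming, before conversion).
    Each member joins the VSL with 'channel-group N mode on', N = port-channel number.
    Assumption: the peer uses the same port number on its side."""
    blocks = []
    for port in ports:
        blocks.append("\n".join([
            f"interface TenGigabitEthernet{port}",
            f" description {local} Te{port}-->Po{pc_number}-->{peer} Te{port}",
            " no switchport",
            " no ip address",
            " no shutdown",
            f" channel-group {pc_number} mode on",
        ]))
    # Separate blocks with '!' the way IOS shows them.
    return "\n!\n".join(blocks)
```

Calling it with `["1/20", "1/30"]`, port-channel 1, "6840A" and "6840B" yields the two blocks above.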

5. Make sure the interfaces are up before you try to do the conversion.
Router(config)#do show int ten 1/20
TenGigabitEthernet1/20 is up, line protocol is up (connected)
Router(config)#do show int ten 1/30
TenGigabitEthernet1/30 is up, line protocol is up (connected)
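When there are more links to check, a sketch like this can report which interfaces are up/up. It is a hypothetical helper and assumes the standard first line of show interface output ("X is up, line protocol is up").

```python
import re


def interfaces_up(show_output: str) -> dict:
    """Map interface name -> True if its status line reads
    'is up, line protocol is up'; anything else maps to False."""
    status = {}
    pattern = r"^(\S+) is ([\w ]+), line protocol is (\w+)"
    for m in re.finditer(pattern, show_output, re.MULTILINE):
        status[m.group(1)] = (m.group(2) == "up" and m.group(3) == "up")
    return status
```

Only proceed with the conversion when every VSL member maps to `True`.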

6.  Initiate the conversion of the new switch into the VSS pair.
Router#switch convert mode virtual

This command will convert all interface names
to naming convention "interface-type switch-number/slot/port",
save the running config to startup-config and
reload the switch.

NOTE: Make sure to configure one or more dual-active detection methods
once the conversion is complete and the switches have come up in VSS mode.

Do you want to proceed? [yes/no]: yes
Converting interface names
Building configuration...
[OK]
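To illustrate what the conversion does to interface names, here is a tiny hypothetical Python function that puts the switch number in front of a slot/port name. It only handles simple "type slot/port" names and is not how IOS performs the conversion internally.

```python
import re


def to_virtual_name(interface: str, switch_number: int) -> str:
    """Turn 'TypeSlot/Port' into 'TypeSwitch/Slot/Port' (illustration only;
    names without a slot/port pair, such as Port-channel1, are rejected)."""
    m = re.fullmatch(r"([A-Za-z-]+)(\d+/\d+)", interface)
    if not m:
        raise ValueError(f"not a slot/port interface name: {interface}")
    return f"{m.group(1)}{switch_number}/{m.group(2)}"
```

On switch 1, TenGigabitEthernet1/20 becomes TenGigabitEthernet1/1/20.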

7. Once the switch has reloaded, verify that VSS is running.
6840A#show switch virtual
Switch mode                  : Virtual Switch
Virtual switch domain number : 200
Local switch number          : 2
Local switch operational role: Virtual Switch Active
Peer switch number           : 1
Peer switch operational role : Virtual Switch Standby
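A scripted version of this check could parse the "field : value" lines. This sketch is hypothetical and assumes the output format shown above; it treats the pair as healthy when it is in Virtual Switch mode, in the expected domain, with distinct switch numbers, and one switch active while the other is standby.

```python
def parse_show_switch_virtual(output: str) -> dict:
    """Turn the 'field : value' lines of 'show switch virtual' into a dict."""
    fields = {}
    for line in output.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def vss_healthy(output: str, expected_domain: int) -> bool:
    """Assumed health criteria: virtual mode, expected domain, distinct
    switch numbers, and exactly one Active plus one Standby role."""
    f = parse_show_switch_virtual(output)
    roles = {f.get("Local switch operational role"), f.get("Peer switch operational role")}
    return (f.get("Switch mode") == "Virtual Switch"
            and f.get("Virtual switch domain number") == str(expected_domain)
            and f.get("Local switch number") != f.get("Peer switch number")
            and roles == {"Virtual Switch Active", "Virtual Switch Standby"})
```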

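8. For reference, here is the complete config that was pasted into the new switch. The last line, switch convert mode virtual, is the privileged EXEC command from step 6, so exit configuration mode (end) before entering it.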
switch virtual domain 200
 switch mode virtual
 switch 1
!
interface Port-channel1
 description 6840A Po1-->6840B Po2 (VSS LINK)
 no ip address
 no platform qos channel-consistency
 switch virtual link 1
!
interface TenGigabitEthernet1/20
 description 6840A TE1/20-->Po1-->6840B Te1/20
 no switchport
 no ip address
 no shutdown
 channel-group 1 mode on
!
interface TenGigabitEthernet1/30
 description 6840A TE1/30-->Po1-->6840B Te1/30
 no switchport
 no ip address
 no shutdown
 channel-group 1 mode on
!
switch convert mode virtual
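Before pasting, you might sanity-check a config like this one. The Python sketch below is hypothetical and assumes the layout used in this post: it checks that switch virtual link matches the switch number and that every TenGigabitEthernet member uses channel-group N mode on, with N equal to the VSL port-channel number.

```python
import re


def check_replacement_config(config: str) -> list:
    """Return a list of problems found in a replacement-switch VSS config
    (empty list = no problems found by these particular checks)."""
    problems = []
    switch = re.search(r"^ switch (\d+)$", config, re.MULTILINE)
    # Split into 'interface ...' blocks (lookahead split needs Python 3.7+).
    blocks = re.split(r"^(?=interface )", config, flags=re.MULTILINE)
    vsl_pc, link = None, None
    for block in blocks:
        pc = re.match(r"interface Port-channel(\d+)", block)
        vl = re.search(r"^ switch virtual link (\d+)$", block, re.MULTILINE)
        if pc and vl:
            vsl_pc, link = pc.group(1), vl.group(1)
    if not switch or link != switch.group(1):
        problems.append("switch virtual link number does not match the switch number")
    # Assumption: VSL members are TenGigabitEthernet ports, as in this example.
    members = [b for b in blocks if b.startswith("interface TenGigabitEthernet")]
    if not members:
        problems.append("no VSL member interfaces found")
    for b in members:
        cg = re.search(r"^ channel-group (\d+) mode (\w+)$", b, re.MULTILINE)
        if not cg or cg.group(1) != vsl_pc or cg.group(2) != "on":
            problems.append(f"{b.splitlines()[0]}: expected 'channel-group {vsl_pc} mode on'")
    return problems
```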

Relevance:  
This information should be relevant on the following platforms:  6800, 4500, 6500, 6840