First thing I ask for is the user/pass he wants us to use on the new ADSL circuit. He obliges my request instantly. We then configure the router, ship it to site and think nothing more of it. This week he installed the router and informed me that "it's not working". Skeptical at first, I try to SSH into the Cisco router on its public IP and no go. Immediate thought is "something's not configured on the router or it's an ISP issue".
We work towards ruling out the configuration first. The configuration file is shared amongst the team... no issues flagged. The IOS in use was good. Nothing weird or out of place... At this point it was time to bite the bullet and call the ISP... Telstra. They do their testing and note the following: "I see the ADSL authenticating and disconnecting every 60 seconds or so on the phone line you've quoted. The site isn't losing ADSL Sync/Carrier detect and the line looks great".
Now we're in firefighting mode as everyone else is adamant it's installed correctly and the line is authenticating. We send our guys onsite to do some troubleshooting. We get logs similar to below (from http://www.cisco.com/en/US/tech/tk175/tk15/technologies_configuration_example09186a008071a7c2.shtml#l1b):
Router#debug ppp negotiation
PPP protocol negotiation debugging is on
Router#
2w3d: Vi1 PPP: No remote authentication for call-out
2w3d: Vi1 PPP: Phase is ESTABLISHING
2w3d: Vi1 LCP: O CONFREQ [Open] id 146 len 10
2w3d: Vi1 LCP: MagicNumber 0x8CCF0E1E (0x05068CCF0E1E)
2w3d: Vi1 LCP: O CONFACK [Open] id 102 Len 15
2w3d: Vi1 LCP: AuthProto CHAP (0x0305C22305)
2w3d: Vi1 LCP: MagicNumber 0xD945AD0A (0x0506D945AD0A)
2w3d: Di1 IPCP: Remove route to 20.20.2.1
2w3d: Vi1 LCP: I CONFACK [ACKsent] id 146 Len 10
2w3d: Vi1 LCP: MagicNumber 0x8CCF0E1E (0x05068CCF0E1E)
2w3d: Vi1 LCP: State is Open
2w3d: Vi1 PPP: Phase is AUTHENTICATING, by the peer
2w3d: Vi1 CHAP: I CHALLENGE id 79 Len 33 from "6400-2-NRP-2"
2w3d: Vi1 CHAP: O RESPONSE id 79 Len 28 from "John"
2w3d: Vi1 CHAP: I SUCCESS id 79 Len 4
2w3d: Vi1 PPP: Phase is UP
2w3d: Vi1 IPCP: O CONFREQ [Closed] id 7 Len 10
2w3d: Vi1 IPCP: Address 0.0.0.0 (0x030600000000)
2w3d: Vi1 IPCP: I CONFREQ [REQsent] id 4 Len 10
2w3d: Vi1 IPCP: Address 20.20.2.1 (0x030614140201)
2w3d: Vi1 IPCP: O CONFACK [REQsent] id 4 Len 10
2w3d: Vi1 IPCP: Address 20.20.2.1 (0x030614140201)
2w3d: Vi1 IPCP: I CONFNAK [ACKsent] id 7 Len 10
2w3d: Vi1 IPCP: Address 40.1.1.2 (0x030628010102)
2w3d: Vi1 IPCP: O CONFREQ [ACKsent] id 8 Len 10
2w3d: Vi1 IPCP: Address 40.1.1.2 (0x030628010102)
2w3d: Vi1 IPCP: I CONFACK [ACKsent] id 8 Len 10
2w3d: Vi1 IPCP: Address 40.1.1.2 (0x030628010102)
2w3d: Vi1 IPCP: State is Open
2w3d: Di1 IPCP: Install negotiated IP interface address 40.1.1.2
2w3d: Di1 IPCP: Install route to 20.20.2.1
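The milestones worth looking for in output like this are LCP opening, CHAP succeeding, and IPCP opening with an address and a route installed. A rough sketch of checking them mechanically follows. The function name and the summary keys are made up for illustration, and it only matches the exact strings that appear in the sample output, so treat it as a starting point rather than a parser for every IOS release.

```python
import re

def summarize_ppp_debug(log_text):
    """Pull the key milestones out of `debug ppp negotiation` output.

    Hypothetical helper: it only looks for the literal strings seen in
    the sample log, nothing more.
    """
    summary = {
        "lcp_open": False,
        "chap_success": False,
        "ipcp_open": False,
        "local_ip": None,   # address installed on the dialer interface
        "peer_ip": None,    # next-hop the router installs a route to
    }
    for line in log_text.splitlines():
        if "LCP: State is Open" in line:
            summary["lcp_open"] = True
        elif "CHAP: I SUCCESS" in line:
            summary["chap_success"] = True
        elif "IPCP: State is Open" in line:
            summary["ipcp_open"] = True
        else:
            m = re.search(r"Install negotiated IP interface address (\S+)", line)
            if m:
                summary["local_ip"] = m.group(1)
            m = re.search(r"Install route to (\S+)", line)
            if m:
                summary["peer_ip"] = m.group(1)
    return summary

# A few lines lifted from the debug output
SAMPLE = """\
2w3d: Vi1 LCP: State is Open
2w3d: Vi1 CHAP: I SUCCESS id 79 Len 4
2w3d: Vi1 IPCP: State is Open
2w3d: Di1 IPCP: Install negotiated IP interface address 40.1.1.2
2w3d: Di1 IPCP: Install route to 20.20.2.1
"""

print(summarize_ppp_debug(SAMPLE))
```

Every field coming back populated only says the PPP session negotiated cleanly. It says nothing about whether the ISP is actually routing traffic for that address.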
The debug output above is a completely normal connection on the Cisco router. We're not getting anything weird from the router. It gets its public IP correctly from the ISP... everything looks good. I ask the onsite tech to do a "show users" and get the PPP next-hop IP... that's ok. "show ip route"... that's ok as well, everything is pointing in the right direction.
"Let's try ping the next-hop from the site router"... works ok. "Let's try ping something I know responds to pings on the internet"... this fails. Ok. Check the routes again (still good). Do a traceroute from the router to a public IP... it goes to the next-hop IP then dies. "Has to be ISP routing problem then," I decide.
Back to the ISP: "Everything looks normal. I can see it's been connected for weeks now". The Telstra tech sounded absolutely confident... but he does go through and rebuild the circuit from scratch regardless (a very nice thing to do to rule out sticky configuration on their DSLAM gear, and kudos to the guy for the effort).
It is at this point where things started to click. "The ADSL was only connected to the router a few days ago...how could it be active for weeks?" I forgot the one golden rule to never forget... don't trust information from customers.
I ask the ISP to do some more snooping around... and "Bam!" There it was... the customer had already allocated the user/pass we were authenticating with on the new circuit to another ADSL circuit onsite. Telstra finally found the second circuit using the same credentials. The customer had not kept track of the ADSL authentication details used throughout his network and had effectively allocated the same service details (ADSL user/pass) twice.
Things to take away from this:
- If an ADSL account (user/pass) is already in use on another circuit you will:
  - Be able to authenticate (depends on the ISP and their RADIUS setup)
  - Be able to get the public IP address you would normally get (static public IP in this case)
  - Be unable to inject a static framed route into the ISP network to advertise the public IP availability via the L2 ATM link. The ISP network will already have an active route with the same IP/MASK.
- The above will result in a "normal" looking router connection but no routability from/to the public world
- If a customer provides you information... you should double-check it. :)
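The double-check itself can be as simple as looking for the same login against more than one circuit before anything ships. A minimal sketch, assuming you keep (or can get the customer to produce) a list of circuit/username pairs. The circuit names, usernames and the function name below are all hypothetical.

```python
from collections import defaultdict

def find_duplicate_logins(circuits):
    """Return {username: [circuit ids]} for logins assigned to more than one circuit.

    `circuits` is an iterable of (circuit_id, username) pairs.
    """
    by_user = defaultdict(list)
    for circuit_id, username in circuits:
        # Strip stray whitespace only. Whether usernames are case-sensitive
        # depends on the ISP's setup, so case is left alone here.
        by_user[username.strip()].append(circuit_id)
    return {user: ids for user, ids in by_user.items() if len(ids) > 1}

# Hypothetical inventory of ADSL services for one customer
inventory = [
    ("site-a-adsl-1", "john@example.net"),
    ("site-a-adsl-2", "john@example.net"),
    ("site-b-adsl-1", "mary@example.net"),
]

print(find_duplicate_logins(inventory))
```

Anything that comes back from this is worth a phone call to the customer (or the ISP) before the router goes out the door.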