nanog mailing list archives

Re: CenturyLink RCA?


From: Aaron1 <aaron1 () gvtc com>
Date: Mon, 31 Dec 2018 12:49:38 -0600

Yeah, could have been one of those gone-from-bad-to-worse things like Dave mentioned... the initial problem and course 
of action perhaps led to a worse problem.

I’ve had DWDM issues that have taken down multiple locations far apart from each other due to how the transport guys 
hauled stuff.

A few years back I had about 15 routers all reboot suddenly... they were all far apart from each other. It turned out 
that one of the dual BGP sessions to the RR cluster flapped, and all 15 routers crashed and rebooted.

But ~50 hours of downtime!?

Aaron

On Dec 31, 2018, at 11:41 AM, Dave Temkin <dave () temk in> wrote:

On Mon, Dec 31, 2018 at 11:33 AM Naslund, Steve <SNaslund () medline com> wrote:

They shouldn’t need OOB to operate existing lambdas, just to configure new ones.  One possibility is that the 
management interface also handles master timing, which would be a really bad idea but is possible (it should be 
redundant, and it should be able to free run for a reasonable amount of time).  The main issue exposed is that the 
management interface is obviously critical and is not redundant enough.  That is, if we believe the OOB explanation in 
the first place (which, by the way, is obviously not OOB, since it wiped out the in-band network when it failed).

 

Steven Naslund

Chicago IL

 

 
A theory, and only a theory, is that in order to troubleshoot a much smaller problem (OOB/etc.), they decided to 
deploy an optical configuration change that, with multiple nodes inaccessible, ended up causing a 
significant inconsistency in their optical network, wreaking havoc on all sorts of other systems. With the OOB 
network already in chaos, card reseats were required to stabilize things on that network, and then they could rebuild 
the optical network from a fully reachable state.

Again, only a theory.

-Dave

 
 

This seems entirely plausible, given that DWDM amplifiers and lasers are a complex analog system and need OOB to 
align.

--

Eric



