nanog mailing list archives
Re: Famous operational issues
From: Sabri Berisha <sabri () cluecentral net>
Date: Fri, 19 Feb 2021 12:15:53 -0800 (PST)
----- On Feb 19, 2021, at 3:07 AM, Daniel Karrenberg dfk () ripe net wrote:

Hi,
Lessons: HW/SW mono-cultures are dangerous. Input testing is good practice at all levels of software. Operational co-ordination is key in times of crisis.
Well... Here is a very similar, fairly recent one, although in this case the opposite is true: running one software train would have prevented an outage. Some members on this list (hi, Brian!) will recognize the story.

Group XX within $company decided to deploy EVPN. All of the backbone was running a single $vendor, but on different software trains. It turns out that between an early draft, implemented in version X, and the RFC, implemented in version Y, the NLRI format changed in a way that was not backwards compatible. Version X was in use on virtually all DC egress boxes; version Y was in use on the route reflectors. The moment the first EVPN NLRI was advertised, the entire backbone melted down.

A dept-wide alert was issued (at night), with people trying to log on to the VPN. Oh wait, the VPN requires a yubikey, which requires the corp network to access the interwebs, which is not accessible due to said issue. And, despite me complaining since the day of hire, no out-of-band network.

I didn't stay much longer after that.

Thanks,
Sabri