nanog mailing list archives

RE: Peering/Transit eBGP sessions -pet or cattle?


From: <adamv0025 () netconsultings com>
Date: Mon, 10 Feb 2020 16:42:31 -0000


Baldur Norddahl
Sent: Monday, February 10, 2020 3:06 PM

No matter how much money you put into your peering router, the session 
will be no more stable than whatever the peer did to their end.

Agreed, that's a fair point.

Plus at some point you will need to reboot due to software upgrade or 
other reasons.

There are ways of draining traffic for planned maintenance.
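
One such way is the GRACEFUL_SHUTDOWN well-known community (65535:0, RFC 8326): the router about to be taken down tags its routes, and the receiving side drops LOCAL_PREF to 0 so traffic moves to an alternate path before the session closes. Below is a minimal sketch of the receiver-side logic only. The Route class, the helper names and the 203.0.113.0/24 prefix are made up for illustration, and the path selection is reduced to LOCAL_PREF alone.

```python
from dataclasses import dataclass, field

# Well-known community defined in RFC 8326.
GRACEFUL_SHUTDOWN = (65535, 0)


@dataclass
class Route:
    """Hypothetical, heavily simplified BGP route (illustration only)."""
    prefix: str
    next_hop: str
    local_pref: int = 100
    communities: set = field(default_factory=set)


def apply_gshut_policy(route):
    """Receiver-side policy: depreference routes tagged GRACEFUL_SHUTDOWN."""
    if GRACEFUL_SHUTDOWN in route.communities:
        route.local_pref = 0
    return route


def best_path(routes):
    """Assumption: highest LOCAL_PREF wins; real BGP has more tie-breakers."""
    return max(routes, key=lambda r: r.local_pref)


# R1 is about to be rebooted and tags its routes; R2 stays up.
via_r1 = Route("203.0.113.0/24", "R1", communities={GRACEFUL_SHUTDOWN})
via_r2 = Route("203.0.113.0/24", "R2")

chosen = best_path([apply_gshut_policy(r) for r in (via_r1, via_r2)])
print(chosen.next_hop)
```

The point of the sketch is the ordering: the path via R1 is made least preferred while it is still usable, so the switch to R2 happens without a blackhole.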

If you care at all, you should be doing redundancy by having multiple 
locations, multiple routers. You can then save the money spent on each 
router, because a router failure will not cause any change on what the 
internet sees through BGP.

I think a router failure will change what the Internet sees, as you rightly outlined below:

Also transits are way more important than peers. Losing a transit 
will cause massive route changes around the globe and it will take a 
few minutes to stabilize. Losing a peer usually just means the peer 
switches to the transit route that they already had available.

Agreed, and I suppose the question is whether folks tend to minimize these impacts by all means possible or just 
accept them as a necessary evil that will eventually happen.

Peers are not equal. You may want to ensure redundancy to your biggest 
peers, while the small fish will be fine without.

To be explicit: Router R1 has connections to transits T1 and T2. 
Router R2 also has connections to the same transits T1 and T2. When 
router R1 goes down, only small internal changes at T1 and T2 happen. 
Nobody notices and the recovery is sub-second.

Good point again.
Though if I had only T1 on R1 and only T2 on R2, then convergence wouldn't happen inside each Transit but instead between 
T1 and T2, which would add to the convergence time. 
So, thinking about it, it seems the optimal design pattern in a distributed (horizontally scaled out) edge would be to 
pair up, i.e. have at least two edge nodes per Transit (or Peer for that matter), in order to allow for potentially 
faster intra-Transit convergence rather than arguably slower inter-Transit convergence.
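
To make the comparison concrete, here is a rough sketch that classifies, for a given edge router failure, whether each affected Transit can reconverge internally (another of our routers is still attached to it) or whether the change has to propagate between Transits. The function name and the dict layout are mine, purely for illustration, and it says nothing about actual convergence times.

```python
def convergence_scope(topology, failed_router):
    """Classify where reconvergence happens when one edge router fails.

    topology: dict mapping router name -> set of Transits it connects to.
    Returns a dict mapping each Transit on the failed router to
    'intra-transit' (another router still attaches to that Transit) or
    'inter-transit' (the Transit loses its direct session to us entirely).
    """
    scope = {}
    for transit in topology[failed_router]:
        survivors = [
            router
            for router, transits in topology.items()
            if router != failed_router and transit in transits
        ]
        scope[transit] = "intra-transit" if survivors else "inter-transit"
    return scope


# Paired design: both routers connect to both Transits.
paired = {"R1": {"T1", "T2"}, "R2": {"T1", "T2"}}
# Split design: one Transit per router.
split = {"R1": {"T1"}, "R2": {"T2"}}

for name, topo in (("paired", paired), ("split", split)):
    result = convergence_scope(topo, "R1")
    print(name, sorted(result.items()))
```

In the paired case every Transit on R1 still has a session via R2, so the change stays inside T1 and T2. In the split case T1 has no direct path left and has to reach us via T2.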
 
adam



