nanog mailing list archives

Re: Peering/Transit eBGP sessions -pet or cattle?


From: Baldur Norddahl <baldur.norddahl () gmail com>
Date: Mon, 10 Feb 2020 16:06:14 +0100

No matter how much money you put into your peering router, the session will
be no more stable than whatever the peer did on their end. Plus, at some
point you will need to reboot for a software upgrade or other reasons. If
you care at all, you should be doing redundancy by having multiple
locations and multiple routers. You can then save the money spent on each
router, because a router failure will not cause any change in what the
internet sees through BGP.

Also, transits are way more important than peers. Losing a transit will
cause massive route changes around the globe and it will take a few
minutes to stabilize. Losing a peer usually just means the peer switches
to the transit route that they already had available.

Peers are not equal. You may want to ensure redundancy to your biggest
peers, while the small fish will be fine without.

To be explicit: Router R1 has connections to transits T1 and T2. Router R2
also has connections to the same transits T1 and T2. When router R1 goes
down, only small internal changes at T1 and T2 happen. Nobody notices and
the recovery is sub-second.

Peers are less important: R1 has a connection to internet exchange IE1 and
R2 to a different internet exchange IE2. When R1 goes down, the small peers
at IE1 are lost but will quickly reroute through transit. Large peers may be
present at both internet exchanges and so will instantly switch the traffic
to IE2.
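
The layout above can be sketched as a small model. This is only an
illustration: the peer names and the helper functions are made up for the
example, and it checks nothing more than which neighbors are still attached
to a live router. It says nothing about convergence time.

```python
# Topology from the example: which neighbors each router connects to.
LINKS = {
    "R1": {"T1", "T2", "IE1"},
    "R2": {"T1", "T2", "IE2"},
}

TRANSITS = {"T1", "T2"}

# Hypothetical peers and the exchanges where they are present.
PEERS = {
    "big_peer": {"IE1", "IE2"},
    "small_peer": {"IE1"},
}


def reachable_neighbors(failed=frozenset()):
    """Return the neighbors still attached to at least one live router."""
    live = set()
    for router, neighbors in LINKS.items():
        if router not in failed:
            live |= neighbors
    return live


def peer_path(peer, failed=frozenset()):
    """Return how traffic to a peer flows once the given routers are down."""
    live = reachable_neighbors(failed)
    if PEERS[peer] & live:
        return "direct"       # a peering session at some exchange survives
    if TRANSITS & live:
        return "transit"      # fall back to the transit route
    return "unreachable"


if __name__ == "__main__":
    for peer in PEERS:
        print(peer, peer_path(peer, failed={"R1"}))
```

With R1 failed, both transits remain attached through R2, the large peer
stays on a direct session via IE2, and the small peer falls back to transit.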

Regards,

Baldur



On Mon, Feb 10, 2020 at 1:38 PM <adamv0025 () netconsultings com> wrote:

Hi,



Would like to take a poll on whether you folks tend to treat your
transit/peering connections (BGP sessions in particular) as pets or rather
as cattle.

And I appreciate the answer could differ for transit vs peering
connections.

However, I’d like to ask this question through a lens of redundant vs
non-redundant Internet edge devices.

To explain,

   1. The “pet” case:

Would you rather try to reduce the failure rate of your transit/peering
connections by using a resilient control plane (REs/RSPs/RPs) or even by
designing these as link bundles over separate cards and optical modules?

Is this on the basis that no matter how hard you try on your end (i.e.
distribute your traffic across a multitude of transit and peering
connections, or use BFD or even BGP-PIC Edge to shuffle things around fast),
any disruption to the eBGP session itself will still hurt you in some way
(i.e. at least a partial outage for some proportion of the traffic for a
not insignificant period of time) until things converge in the direction
from the Internet back to you?



   2. The “cattle” case:

Or would you instead rely on small-ish non-redundant HW at your internet
edge rather than trying to enhance MTBF with big chassis full of redundant
HW?

Is this because the MTBF figure for a particular transit/peering eBGP
session ultimately boils down to the MTBF of the single card or even the
single optical module hosting the link (and as for creating bundles over
separate cards, you can never be quite sure what the setup looks like on
the other end of that connection)?

Or is it because the effect of a smaller/non-resilient border edge device
failure is not that bad in your particular (maybe horizontally scaled)
setup?
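
To put rough numbers on the MTBF point, here is a minimal sketch. It assumes
independent components with constant failure rates, so that the failure
rates of everything the session depends on simply add up. The MTBF figures
are hypothetical and chosen only for illustration, as is the function name.

```python
def series_mtbf(component_mtbf_hours):
    """Combined MTBF of components that must all work (series model).

    Assumes independent components with constant failure rates, so the
    individual failure rates (1 / MTBF) add up.
    """
    return 1.0 / sum(1.0 / m for m in component_mtbf_hours)


# Hypothetical MTBF figures in hours, for illustration only.
LINE_CARD = 200_000
OPTIC = 400_000

# A single-link session needs a card and an optic on each end.
session_path = [LINE_CARD, OPTIC, OPTIC, LINE_CARD]

if __name__ == "__main__":
    print(round(series_mtbf(session_path)), "hours")
```

Under this model a redundant control plane does not enter the sum at all,
and the two components on the far end count just as much as your own.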



Would appreciate any pointers.

Thank you



adam



