nanog mailing list archives
Re: A survey on BGP MRAI timer values in practice
From: "Jakob Heitz (jheitz) via NANOG" <nanog () nanog org>
Date: Wed, 9 Jun 2021 17:03:46 +0000
In Cisco, MRAI is "advertisement-interval". MRAI helps to reduce route update multiplication in highly redundant networks. OTOH, it can increase the time it takes to re-advertise a complete internet table in some router implementations. Update multiplication due to redundant network connections causes some receivers of the multiple updates to become slow peers.

Here's an experiment: Do something to cause a BGP route refresh, like the equivalent of "clear bgp soft out". It will not change any routes. It just resends everything that was already sent. See how long it takes with MRAI=0. Then set MRAI to about half of that value and do the refresh again. If it takes substantially longer to complete the refresh, stick with MRAI=0. If there is no significant difference, use MRAI of 1 or 2 seconds.

Regards,
Jakob.

-----Original Message-----
Date: Wed, 9 Jun 2021 08:53:19 +0300
From: Saku Ytti <saku () ytti fi>

On Wed, 9 Jun 2021 at 01:18, Adam Thompson <athompson () merlin mb ca> wrote:

If your work results in actionable recommendations such as "don't use BGP out-delay timers to mitigate XYZ in circumstance LMNO, do ABC instead", that's fantastic. Please keep us advised, and do post aggregated survey results here once you close the survey.

What is actionable? What is the goal? The question as OP presented it contains some assumptions:
a) better convergence is needed
b) MRAI is an important part of the solution space

Neither is provable. We already know how to make DFZ convergence really fast (or at least orders of magnitude faster than it is). That information exists, but it isn't deployed because customers are not asking for it, so providers are not aware that there is room for improvement. Things don't optimise to be as good as they can be, things optimise to be as bad as the market allows them to be. And the market accepts the DFZ convergence.

If you do decide to optimise for DFZ convergence without commercial pressure, you risk lower availability, because you'll be using configuration less tested by other customers, and everyone knows how terrible the quality of every NOS is. Everyone finds novel bugs in the same damn protocols we've run for 20+ years. It's like running Windows and Linux and regularly finding out that listing files in a directory breaks your service, year after year after year.

For those who are interested in better convergence:
- change your interface down reporting to 0 (there may be a delay before interface down is reported to the system, so that optical protection works without causing an outage)
- use 'add-path' or at least 'best-external' in iBGP, so that you always have a backup eBGP route immediately available once the best is invalidated (normally there is a lot of delay in finding the next best once you lose your best eBGP)
- tie your route validity to IGP, so you can invalidate your BGP the moment IGP disappears
- ensure IGP converges fast (another topic)
- set MRAI to 0
- use PIC edge
- ensure your BGP NLRI can be as large as MTU allows
- ensure your convergence isn't bottlenecked by a slow peer in the group
- ensure you are not dropping received TCP packets on the punt path
- ensure your fast external fallover works (eBGP down on int down); this is quite easy to break
- then ensure everyone else in the DFZ does the same thing

But from a business POV, don't do any of this: you will have more bugs and lower availability, and your customers will be less happy.
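
As a rough illustration of the 'add-path' / 'best-external' and "tie your route validity to IGP" items above, here is a toy Python sketch, not any vendor's implementation. It assumes a simplified model where paths are ranked by local preference only, and the names `Path`, `PrefixEntry`, `pe1` and `pe2` are hypothetical. The point it tries to show is that when a backup path is already held, failover is a local lookup instead of a wait for a new advertisement.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    next_hop: str    # hypothetical next-hop label, e.g. an egress PE
    local_pref: int  # higher wins in this toy model


@dataclass
class PrefixEntry:
    """All known paths for one prefix, kept sorted best-first."""
    paths: list = field(default_factory=list)

    def add(self, path: Path) -> None:
        self.paths.append(path)
        # Assumption: rank by local_pref only. Real BGP best-path
        # selection has many more tie-breaking steps.
        self.paths.sort(key=lambda p: p.local_pref, reverse=True)

    def best(self, reachable: set):
        """Return the first path whose next hop the IGP still reaches."""
        for path in self.paths:
            if path.next_hop in reachable:
                return path
        return None


entry = PrefixEntry()
entry.add(Path("pe1", 200))
entry.add(Path("pe2", 100))  # backup path, already known locally

reachable = {"pe1", "pe2"}
primary = entry.best(reachable)

reachable.discard("pe1")     # IGP reports pe1 unreachable
backup = entry.best(reachable)

print(primary.next_hop, "->", backup.next_hop)
```

Without the second path in the table, `best()` would return `None` after the failure, and the prefix would stay unreachable until another router advertised an alternative.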

I *am* specifically interested in the answer to "Have you ever had to adjust BGP out-delay with any of your peers, and why?" It would be great if we could derive that answer from the survey results, but anecdotal replies here would also be helpful. All you larger(-than-me) network operators out there: when would I need to use out-delay? Why? What does it accomplish?

Good luck in reformulating your survey to get better engagement,
-Adam

*Adam Thompson*
Consultant, Infrastructure Services
100 - 135 Innovation Drive
Winnipeg, MB, R3T 6A8
(204) 977-6824 or 1-800-430-6404 (MB only)
athompson () merlin mb ca
www.merlin.mb.ca

------------------------------
*From:* NANOG <nanog-bounces+athompson=merlin.mb.ca () nanog org> on behalf of Saku Ytti <saku () ytti fi>
*Sent:* June 8, 2021 01:06
*To:* shahrooz () cs umass edu <shahrooz () cs umass edu>
*Cc:* nanog list <nanog () nanog org>; Arun Venkataramani <arun () cs umass edu>
*Subject:* Re: A survey on BGP MRAI timer values in practice

On Mon, 7 Jun 2021 at 19:32, <shahrooz () cs umass edu> wrote:

We often read that the Internet (i.e. BGP) has a long convergence delay. But why is it so slow? And can we (researchers) do anything about it?

Create business incentives to improve it. This is a non-technical problem, we've long had technical tools to make it fast, there just isn't incentive to make it fast. Customers are not asking operators for better convergence speeds.

Please help us out to find out by answering our short anonymous survey (<10 minutes).

Can you tell me what you have done so far? What are the default MRAI values for each AFI/SAFI for IOS, IOS-XR, Junos, SROS, VRP and EOS? Then people responding don't have to check what their NOS does, they can refer to your table and tell the default value, since this is what 99% will be using.

Now your survey has built-in selection bias: the people who answer it are people who know what it is, who are concerned about it and who have changed it. This is not a representative group, and you will start your work with very bad data.

-- ++ytti
-- ++ytti
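
The experiment Jakob describes at the top of this message can be written down as a small decision rule. The Python sketch below is only one possible reading of it: the function names `trial_mrai` and `recommend_mrai` are hypothetical, and the 20% `tolerance` is an assumed stand-in for "substantially longer", which the message does not quantify. The timings themselves would come from measuring a route refresh on your own router.

```python
def trial_mrai(baseline_s: float) -> float:
    """MRAI to try in the second run: about half the MRAI=0 refresh time."""
    return baseline_s / 2


def recommend_mrai(baseline_s: float, with_mrai_s: float,
                   tolerance: float = 0.2) -> str:
    """Compare two measured refresh durations, in seconds.

    baseline_s:  refresh time measured with MRAI=0
    with_mrai_s: refresh time measured with MRAI=trial_mrai(baseline_s)
    tolerance:   assumed threshold for "substantially longer" (20% here)
    """
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    slowdown = (with_mrai_s - baseline_s) / baseline_s
    if slowdown > tolerance:
        return "keep MRAI=0"
    return "use MRAI of 1 or 2 seconds"


# Made-up measurements, for illustration only.
print(trial_mrai(60.0))            # MRAI value to configure for run 2
print(recommend_mrai(60.0, 95.0))  # refresh got much slower
print(recommend_mrai(60.0, 62.0))  # no significant difference
```

Picking a different `tolerance` changes where the rule flips, so treat the threshold as something to tune against your own measurements.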
Current thread:
- A survey on BGP MRAI timer values in practice shahrooz (Jun 07)
- Re: A survey on BGP MRAI timer values in practice Saku Ytti (Jun 07)
- Re: A survey on BGP MRAI timer values in practice Adam Thompson (Jun 08)
- Re: A survey on BGP MRAI timer values in practice Saku Ytti (Jun 08)
- Re: A survey on BGP MRAI timer values in practice Randy Bush (Jun 09)
- Re: A survey on BGP MRAI timer values in practice Saku Ytti (Jun 09)
- Re: A survey on BGP MRAI timer values in practice Mark Tinka (Jun 10)
- Re: A survey on BGP MRAI timer values in practice Adam Thompson (Jun 10)
- Re: A survey on BGP MRAI timer values in practice Adam Thompson (Jun 08)
- Re: A survey on BGP MRAI timer values in practice Saku Ytti (Jun 07)
- <Possible follow-ups>
- Re: A survey on BGP MRAI timer values in practice Jakob Heitz (jheitz) via NANOG (Jun 09)