nanog mailing list archives

Re: Networks ignoring prepends?


From: Steve Gibbard <scg () gibbard org>
Date: Mon, 22 Jan 2024 10:17:54 -0800

To expand on what others have said here, I find it helpful to think of BGP as a policy enforcement protocol, rather 
than as a distance vector routing protocol.  

To that end, there’s a generally expected hierarchy of routes, and then a lot of individuality between networks.  
Having done traffic engineering for some global CDNs, I've found that you can do a fair amount of inbound traffic control 
by letting an understanding of how most other providers think about this guide your transit and peering policies.  The 
remaining portion generally needs to be solved through discussions, negotiations, or commercial arrangements with the 
sending party or their upstreams.

For the general rules, local-preference trumps everything else.  The number of AS path hops comes after 
local-preference.  Other things being equal, networks usually like to hand off traffic to a short AS path, and at the 
closest point to its origination (there are valid performance reasons for this), but local-preference policies will 
override both of those.
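
As a rough illustration of that ordering, here is a minimal Python sketch of just the first two comparison steps. It is 
not a full BGP decision process: the Route class, the best_route function, the local-preference values, and the AS 
numbers (taken from the documentation range) are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One candidate path to the same prefix, as seen by the sending network."""
    name: str
    local_pref: int   # assigned by the receiving network's own import policy
    as_path: tuple    # AS numbers as received, prepends included


def best_route(routes):
    # Higher local-preference wins outright; AS path length is only
    # consulted when local-preference is tied.
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


# Hypothetical values: 64496 is the origin, which prepends itself five extra
# times on the backup path.
short_via_transit = Route("short path via transit", 100, (64500, 64496))
prepended_via_customer = Route(
    "prepended path via customer", 300, (64501,) + (64496,) * 6
)

print(best_route([short_via_transit, prepended_via_customer]).name)
```

With these assumed values the seven-hop prepended path is selected, because it was learned from a customer and so 
carries the higher local-preference.  The prepends would only matter if both routes landed in the same tier.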

Local-preferences usually have three default tiers — customer, peering, and transit.  In other words, get paid, hand 
off for free, and pay.  There are often some additional peers that can be selected for traffic engineering reasons, 
either internally or by customers using BGP communities.  BUT, those BGP communities don’t transit to other ASes, so 
even if you manage to signal one hop upstream, you may still find your upstream provider announcing your routes to 
those who have different ideas.
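
To make the one-hop limit concrete, here is a small Python sketch under stated assumptions.  The three tier values, the 
"64500:80" action community, and the rule that a provider strips its own action communities on export are all 
hypothetical; real providers document their own values and behavior.

```python
# Assumed default tiers: get paid, hand off for free, pay.
TIER_LOCAL_PREF = {"customer": 300, "peer": 200, "transit": 100}

# Hypothetical action community that provider AS64500 offers its customers.
LOWER_PREF_COMMUNITY = "64500:80"


def import_local_pref(relationship, communities=frozenset()):
    """Local-preference a network assigns when it learns a route."""
    pref = TIER_LOCAL_PREF[relationship]
    if relationship == "customer" and LOWER_PREF_COMMUNITY in communities:
        pref = 80  # assumed: drops the route below the transit tier
    return pref


def export_communities(communities):
    """Assumed export policy of AS64500: strip its own action communities."""
    return frozenset(c for c in communities if not c.startswith("64500:"))


tagged = frozenset({LOWER_PREF_COMMUNITY})

# One hop up, the provider you tagged honors the request.
at_provider = import_local_pref("customer", tagged)

# The provider's own upstream learns the route from its customer (AS64500)
# without the tag, so it applies its default customer tier.
at_next_as = import_local_pref("customer", export_communities(tagged))

print(at_provider, at_next_as)  # 80 300
```

In this sketch the network two hops away never sees the request, and still treats the route as a preferred customer 
route.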

One example of this from the early days of anycasted DNS root servers involved k.root-servers.net installing a node in 
Delhi, which pulled 60% of its traffic from North America.  This was 
clearly non-optimal.  They had attempted to get routing diversity by getting transit from different providers in 
different parts of the world, but their Delhi node was, if I recall correctly, a customer of a customer of a customer 
of Level3.  Oops.

So, what do you do about this?

If you’re a global network operator, you probably attempt to maintain consistent peering/transit relationships across 
sites.  That way, AS paths and local-preferences should be fairly even, and you can let nearest exit routing do its 
thing.

If you have a smaller network, but have multiple interconnection locations that are far enough apart to make a 
performance difference, make the same transit and peering relationships at each one.  Make exceptions only for peers 
(not transit providers) whose customers or services only exist in one of the areas, and make sure they don’t announce 
your routes to their upstreams.  That way you won’t trombone traffic.

If you’ve done all that, and traffic is still coming in the wrong place, then you start talking to people.  “Hey, I’m 
buying transit from you in both Asia and the Western US, and all my traffic from asian-country-x is coming into San 
Jose.  Why?”  “Well, they only have a 100 Mb/s interconnection to us in Asia.  We have to traffic engineer around it.”  
And then you have to figure out how to convince some national telco to want to talk to you more than they want to talk 
to your transit provider.

I think in your case, I would be asking why you have a 5,000 mile, five-prepend loop to get to a provider ten miles 
away.  It suggests that your network is doing things 5,000 miles away that are inconsistent with what you're doing 
locally, or that you have upstreams who aren’t interconnecting locally or aren’t maintaining sufficient capacity or 
sufficient political relationships on those paths.  All of those would predictably have this result.  The solution is 
likely to take a look at your transit relationships, ask your transit providers about their transit relationships, and 
either supplement or switch to a set of transit providers who can provide the routing you want.

-Steve



On Jan 22, 2024, at 4:49 AM, William Herrin <bill () herrin us> wrote:

Howdy,

Does anyone have suggestions for dealing with networks who ignore my
BGP route prepends?

I have a primary ingress with no prepends and then several distant
backups with multiple prepends of my own AS number. My intention, of
course, is that folks take the short path to me whenever it's
reachable.

A few years ago, Comcast decided it would prefer the 5000 mile,
five-prepend loop to the short 10 mile path. I was able to cure that
with a community telling my ISP along that path to not advertise my
route to Comcast. Today it's Centurylink. Same story; they'd rather
send the packets 5000 miles to the other coast and back than 10 miles
across town. I know they have the correct route because when I
withdraw the distant ones entirely, they see and use it. But this time
it's not just one path; they prefer any other path except the one I
want them to use. And Centurylink is not a peer of those ISPs, so
there doesn't appear to be any community I can use to tell them not to
use the route.

I hate to litter the table with a batch of more-specifics that only
originate from the short, preferred link but I'm at a loss as to what
else to do.

Advice would be most welcome.

Regards,
Bill Herrin

-- 
William Herrin
bill () herrin us
https://bill.herrin.us/



