nanog mailing list archives

Re: [Nanog] Cogent Router dropping packets


From: Joe Greco <jgreco () ns sol net>
Date: Mon, 21 Apr 2008 10:41:21 -0500 (CDT)

On Sat, Apr 19, 2008 at 7:26 PM, manolo <mhernand1 () comcast net> wrote:
Some things just never change at cogent.. fought them for months way
 back when to get me off their infamous 2 bgp peer setup after many an
 outage due to this setup, they finally put us on a single bgp session
 but it took forever. Let's just say cogent didn't last long at the
 company I worked for.

Could you provide additional details on the failure mode that resulted
from this "two tiered" configuration?  How did moving to a
"conventional" configuration with a single directly-connected neighbor
solve things?

For those unfamiliar, Cogent has a system where you set up an EBGP peering
with the Cogent router you're connected to, for the purposes of announcing
your routes into Cogent.  However, these are typically smaller,
aggregation-class routers that do not handle full tables, so you don't
receive routes from that router.  To get a full table FROM Cogent, you need
to set up an EBGP multihop session with their nearest full-table router.  I
believe they actually do all their BGP connections in that manner.

This probably makes a lot of sense from an engineering point of view, and
could be construed as a BGP competence test.  On the other hand, it does
have the potential to make things more complex in the event of a failure.
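
As a rough illustration of why the two-session arrangement adds failure
cases, here is a minimal Python sketch.  It is a toy model, not Cogent's
actual configuration: the labels "peerA" (directly connected session used
for announcements) and "peerB" (multihop session delivering the full table)
and the helper reachability() are illustrative assumptions.  It also
assumes the customer has no static default route, and it ignores the fact
that the multihop session rides over the same physical link as the direct
one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpSession:
    name: str                  # illustrative label only
    multihop: bool             # True if the neighbor is not directly connected
    announces: bool            # customer sends its prefixes over this session
    receives_full_table: bool  # customer learns the full table over this session


# The two sessions described above (hypothetical names).
PEER_A = BgpSession("peerA", multihop=False, announces=True, receives_full_table=False)
PEER_B = BgpSession("peerB", multihop=True, announces=False, receives_full_table=True)


def reachability(up_sessions):
    """Return (inbound_ok, outbound_ok) for the sessions currently established.

    inbound_ok:  the customer's prefixes are still being announced.
    outbound_ok: the customer still holds routes learned from the provider.
    """
    inbound_ok = any(s.announces for s in up_sessions)
    outbound_ok = any(s.receives_full_table for s in up_sessions)
    return inbound_ok, outbound_ok


if __name__ == "__main__":
    scenarios = {
        "both up": [PEER_A, PEER_B],
        "only peerA up": [PEER_A],
        "only peerB up": [PEER_B],
        "both down": [],
    }
    for label, up in scenarios.items():
        print(label, reachability(up))
```

With a single conventional session both directions fail together, which
is easy to spot.  In this model each session can fail on its own and take
out only one direction, which is the kind of partial failure that is
harder to identify.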

I'm not aware of any flaws with such a design that would cause "many an
outage," and connections that we've managed for customers with Cogent
suggest that it works well.  However, if there are problems within the
local Cogent node, I could easily see situations where hard-to-identify
problems could result.  That would seem to me to be an equipment, capacity,
or possibly a configuration issue, but not something which discredits the
overall strategy.  Given that they're providing inexpensive bandwidth, it
isn't likely that they'll be sticking large routers everywhere for the
customers who want a full table and a simpler BGP configuration.

There are many things that you can realistically criticize Cogent for, but
I'm not sure the peerA/peerB thing should be one of them.  It is certainly
more complex, but seems to serve a purpose.

What steps were taken during your postmortem and subsequent lab
simulations to verify that the outages were not with the customer-side
implementation, or perhaps a simple typographical error?

Here in H-town, we are deploying a metro/BLEC network composed of
thousands of small L3 boxes not carrying full tables (Cisco 3560 and
similar), and would like very much to learn from these major
architectural mistakes, so that we can avoid similar outage scenarios.
 Any information you could provide would be excellent.

Interesting :-)

  You get what you pay for....

Not passing any judgment on quality, Cogent is more towards the middle
of the road for price, these days, on larger commits.

Or in places like Ashburn.  I've been wondering what their future strategy
will be.

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
