nanog mailing list archives

Re: Someone Please Help Me Understand


From: Faisal Imtiaz <faisal () snappytelecom net>
Date: Mon, 4 Apr 2016 13:37:55 +0000 (GMT)

Eric,
There is no simple, cut-and-dried way of troubleshooting such a situation, other than looking at the problem from 
several different angles.
It also helps to be able to run comparative tests from another nearby network.

It is also not uncommon to have to shut down a peer, rather than prepend, when troubleshooting.
One has to keep in mind that many IP transit networks set local pref on customer routes. Local pref is evaluated 
before AS path length, so it effectively nullifies (ignores) the AS prepends.
Each provider is different. HE does not have a published set of communities, and thus effectively does not allow its 
customers to do any significant traffic engineering (anyone from HE, if I am wrong, please feel free to correct me).
Level3 by default overrides any AS prepends with local pref, but does allow its customers to use communities to 
override those settings.
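
To illustrate why a prepend can be ignored, here is a minimal sketch of only the first two steps of BGP best-path selection (highest local pref, then shortest AS path). The `Route` class, the local pref values 200 and 100, and the private-use ASNs are assumptions made up for the example; real routers apply more tie-breakers and each provider's actual policy will differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Route:
    via: str                   # hypothetical label for the session the route was learned on
    local_pref: int            # assigned by the receiving network's own policy
    as_path: Tuple[int, ...]   # AS path as received, including any prepends


def best_route(routes: List[Route]) -> Route:
    """Apply only the first two BGP tie-breakers:
    highest local preference wins, then shortest AS path."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


# Private-use ASNs stand in for real networks.
CUSTOMER_ASN = 64512
PEER_ASN = 64513

# The customer prepends its own ASN three extra times on the direct session.
direct = Route("customer session", local_pref=200, as_path=(CUSTOMER_ASN,) * 4)
# The same prefix also arrives through a peer with a shorter AS path.
indirect = Route("peer session", local_pref=100, as_path=(PEER_ASN, CUSTOMER_ASN))

# Local pref is compared first, so the longer (prepended) path still wins.
print(best_route([direct, indirect]).via)

# Only when local pref is equal does the prepend change the outcome.
direct_equal = Route("customer session", local_pref=100, as_path=(CUSTOMER_ASN,) * 4)
print(best_route([direct_equal, indirect]).via)
```

In this sketch the prepended route is still selected while the provider assigns it a higher local pref. That is why a community that lowers the local pref (where the provider offers one), or shutting down the session, is often the only lever that works.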

I am not trying to publicly shame or air dirty laundry, I am just trying to
understand the situation more.  CDNs bring a whole new level I have yet to
comprehend with multicast DNS and GeoIP responses...


Understood, I have been there, so I can relate. NANOG is a great place to learn, even when asking dumb questions. Folks 
here have been very supportive in explaining, and although every now and then one sees a sarcastic reply, overall I 
cannot say I have ever had anyone treat me in a condescending manner.

My humble suggestion is that you start with the simple stuff first, i.e. BGP traffic engineering, before trying to wrap 
your head around multicast DNS and GeoIP responses. I often find the answer to complex issues to be in the simple 
stuff, which often gets overlooked!

:)


Faisal Imtiaz
Snappy Internet & Telecom
7266 SW 48 Street
Miami, FL 33155
Tel: 305 663 5518 x 232

Help-desk: (305)663-5518 Option 2 or Email: Support () Snappytelecom net

----- Original Message -----
From: "Eric Rogers" <ecrogers () precisionds com>
To: "Faisal Imtiaz" <faisal () snappytelecom net>
Cc: "nanog list" <nanog () nanog org>
Sent: Monday, April 4, 2016 8:46:41 AM
Subject: RE: Someone Please Help Me Understand

Thanks Faisal,

I appreciate the time you took and the detail you provided.  I did try
prepending our HE connection thinking it was an issue via HE, and we started
going out Level3, and it also went to Dallas with nearly the same packet loss.
I don't know what the return path is/was, but through another provider, it
also showed major packet loss.  That leads me to believe that FB is/was having
issues in Dallas.  Maybe on their peering port?  I have since found out they
don't peer through the route servers, but only directly through the exchanges
(direct peering relationship).  I have since submitted a peering request to FB
and also submitted a request to their NOC to look at the packet loss and why we
are getting Dallas IPs.  I have not received a response to either.

I can use the community strings to manipulate our announcement of our routes,
but won't DNS tell the browser what IP to ultimately get the data?
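
One way to look at the DNS side of this question is to check which addresses the resolver actually hands back for the content hostname. The sketch below uses Python's standard `socket.getaddrinfo`; the function name `resolve_addresses` and the choice of port 443 are assumptions for illustration only. CDNs commonly pick the answer based on the resolver's address (or an EDNS Client Subnet hint), so the result can differ depending on which resolver is asked.

```python
import socket
import sys
from typing import List


def resolve_addresses(hostname: str) -> List[str]:
    """Return the unique IP addresses the local resolver hands back for hostname."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Usage: python resolve.py <hostname> [<hostname> ...]
    for name in sys.argv[1:]:
        print(name, resolve_addresses(name))
```

Running it against the same hostname from hosts that use different resolvers, and comparing the answers, would show whether the choice of Dallas is being made at the DNS step rather than by BGP.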

I am not trying to publicly shame or air dirty laundry, I am just trying to
understand the situation more.  CDNs bring a whole new level I have yet to
comprehend with multicast DNS and GeoIP responses...

Eric Rogers
PDS Connect
www.pdsconnect.me
(317) 831-3000 x200


-----Original Message-----
From: Faisal Imtiaz [mailto:faisal () snappytelecom net]
Sent: Sunday, April 3, 2016 8:27 PM
To: Eric Rogers
Cc: nanog list
Subject: Re: Someone Please Help Me Understand

Hi Eric,

With this type of connectivity you have to pay attention to Traffic
Engineering...

And when I say, traffic engineering, I mean both ways.. how you are sending
traffic to them along with how they are sending traffic to you... (sometimes a
bit more challenging to do).

I will give you two specific examples, just to illustrate the point...

We are located on the East Coast. We have IP transit to the Cogent network, via
one intermediary ASN.
We also have IP Transit with GTT and Hibernia networks.
We also have direct peering on multiple Peering Fabrics.

1st Case..
We have our outbound traffic engineered to prefer direct routes, e.g. when
sending traffic to Cogent, we send it out via the intermediary ASN to Cogent.
However, when traffic is coming back from Cogent, they see our prefixes via the
intermediary ASN as well as via Hibernia Networks. Since Hibernia Networks is a
lower ASN, they prefer that route.
So, one can say, no big deal, except Hibernia Networks connects to Cogent on
the West Coast! So our return traffic is going from the East Coast to the West
Coast and then back to the East Coast.
So one can easily say... Houston, we have a problem!

2nd Case..
We are peered with some networks at Telx TIE, via one of our (intermediary)
ASNs. So while we can send traffic over to that network via our ASN, that
network sees our prefixes via our (intermediary) ASN as well as via Hibernia.
Hibernia being a lower ASN, they send traffic back to us via them.

In both cases we use communities to take corrective action....

Moral of the story is: just because you have multiple peers, and peer with
folks on the Peering Fabric, the default configuration of BGP will not
AUTOMAGICALLY optimize the paths in your favor.

And thus the condition you describe will be the result...

Faisal Imtiaz
Snappy Internet & Telecom
7266 SW 48 Street
Miami, FL 33155
Tel: 305 663 5518 x 232

Help-desk: (305)663-5518 Option 2 or Email: Support () Snappytelecom net

----- Original Message -----
From: "Eric Rogers" <ecrogers () precisionds com>
To: "nanog list" <nanog () nanog org>
Sent: Saturday, April 2, 2016 1:54:40 PM
Subject: Someone Please Help Me Understand

Ok, I'm trying to learn, so bear with me.



We are an ISP in Indianapolis that has full routes from 3 different
providers, HE.Net in Columbus, OH being one.  We are also peered with 2
peering exchanges, including EquinixIX in Chicago.  The problem is that
Instagram and Facebook (same company, I know) seem very slow for our
customers.



This is where I need a way to troubleshoot/understand more.  I did a
traceroute to the IP that is serving the pictures, and it resolves to
the FBCDN servers in Dallas.  It shows packet loss once it hits Dallas,
and pings are in the 1xx ms range.



Tracing route to instagram-p3-shv-01-dfw1.fbcdn.net [31.13.66.52]
over a maximum of 30 hops:

 1     4 ms     3 ms     4 ms  10.7.0.1
 2    20 ms    43 ms    42 ms  inmtvlobs-rtr-01.dynamic.pdsconnect.me [192.69.57.1]
 3    25 ms    47 ms    29 ms  inmtvlmwt-rtr-01.infrastructure.pdsconnect.me [192.69.48.162]
 4    46 ms    32 ms    58 ms  inindyhen-core1.infrastructure.pdsconnect.me [192.69.48.193]
 5    36 ms    53 ms    51 ms  ge2-4.core1.cmh1.he.net [184.105.32.1]
 6    47 ms    41 ms    75 ms  10ge1-2.core1.chi1.he.net [184.105.222.165]
 7    57 ms    57 ms    53 ms  100ge14-1.core2.chi1.he.net [184.105.81.97]
 8    57 ms    73 ms    84 ms  100ge12-1.core1.mci3.he.net [184.105.81.209]
 9    75 ms    73 ms   102 ms  10ge15-6.core1.dal1.he.net [184.105.222.10]
10    93 ms   103 ms    92 ms  eqix-da1.facebook.com [206.223.118.176]
11   102 ms   101 ms     *     psw01c.dfw1.tfbnw.net [173.252.65.196]
12    92 ms    97 ms   105 ms  msw1aq.01.dfw1.tfbnw.net [204.15.21.89]
13   110 ms     *       98 ms  instagram-p3-shv-01-dfw1.fbcdn.net [31.13.66.52]
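
A trace like the one above can be summarized per hop with a short script. The following is a rough sketch that assumes Windows `tracert` output with each hop on a single line; the names `Hop`, `HOP_RE` and `parse_tracert` are made up for the example. Note that three probes per hop is far too few to measure loss reliably, and a lone `*` on an intermediate router may simply be ICMP rate limiting rather than real loss.

```python
import re
from typing import List, NamedTuple, Optional


class Hop(NamedTuple):
    number: int
    rtts_ms: List[Optional[int]]   # None marks a probe that timed out ("*")
    host: str


# One hop line: hop number, three probes ("NN ms", "<1 ms" or "*"), then the host.
HOP_RE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+ ms|\*)\s+){3})(.+?)\s*$")


def parse_tracert(text: str) -> List[Hop]:
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # skip the header and blank lines
        probes = re.findall(r"<?(\d+) ms|\*", m.group(2))
        rtts = [int(p) if p else None for p in probes]
        hops.append(Hop(int(m.group(1)), rtts, m.group(3)))
    return hops


# The last four hops of the trace in this message.
SAMPLE = """\
10    93 ms   103 ms    92 ms  eqix-da1.facebook.com [206.223.118.176]
11   102 ms   101 ms     *     psw01c.dfw1.tfbnw.net [173.252.65.196]
12    92 ms    97 ms   105 ms  msw1aq.01.dfw1.tfbnw.net [204.15.21.89]
13   110 ms     *       98 ms  instagram-p3-shv-01-dfw1.fbcdn.net [31.13.66.52]
"""

for hop in parse_tracert(SAMPLE):
    answered = [r for r in hop.rtts_ms if r is not None]
    lost = len(hop.rtts_ms) - len(answered)
    avg = f"avg {sum(answered) / len(answered):.0f} ms" if answered else "no reply"
    print(hop.number, hop.host, f"lost {lost}/3", avg)
```

For a real loss figure, a longer run of pings against the final hop (or a tool that repeats the trace many times) gives a much better sample than a single trace.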



Since I am peered with the route servers in EquinixIX Chicago,
shouldn't the data be coming from there, or at least hit their
routers?  In my trace, it shows HE to Chicago, then to Dallas.  How
does FB decide what IP the content gets displayed from, and is there
anything I can do as a provider?  If it is DNS, I can obviously clear
the cache to see if it gets new IPs.  If I'm not getting FB peering
IPs in Chicago, do I need to peer directly?  Should I get Facebook involved?



Eric Rogers

PDS Connect

(317) 831-3000 x200

