nanog mailing list archives

Re: "Simple" Multi-Homing ? (was Re: CIDR Report)


From: Chris Williams <chris.williams () third-rail net>
Date: Mon, 16 May 1988 14:49:16 -0400


Those are examples of contingencies that should be covered in your
service level agreement.  The SLA should hold the provider responsible
for any loss of income, recovery costs, or whatever, should they be the
ones to screw up.  Assuming you trust the provider to honour the SLA
you've worked out with them, then your level of risk is mitigated even if
it's not done exactly in the way you might prefer under ideal
circumstances -- we are talking about (hopefully exceptional) events
here, after all!  If you don't trust your provider to honour their
agreements then I'd humbly suggest you find one you can trust!  ;-)

Most SLAs I've seen, at least for smaller customers, are of the type "if
we're down for a day, you get a free week", which means in general your
maximum remedy for an outage is the cost of a T1 for a month. I think it
is pretty plausible that a company which only needed a T1 of bandwidth
could lose a lot more than $1500 worth of business if they were down for
a day or two.
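
To put rough numbers on that gap, here is a back-of-the-envelope sketch in Python. The $1500 monthly T1 price and the "free week per day down, capped at one month" rule come from the paragraph above; the 30-day month used for proration and the daily revenue figure are purely hypothetical assumptions, so treat the output as an illustration rather than a real SLA calculation.

```python
# Back-of-the-envelope comparison of SLA credit vs. business lost.
# MONTHLY_T1_COST and the "free week per day down" rule are from the post;
# DAYS_PER_MONTH and the revenue figure used below are assumptions.

MONTHLY_T1_COST = 1500.0        # dollars, figure quoted in the post
DAYS_PER_MONTH = 30             # assumption, for simple proration
CREDIT_DAYS_PER_DAY_DOWN = 7    # "down for a day, you get a free week"


def sla_credit(days_down):
    """Credit in dollars: one free week per day down, capped at one month."""
    credit_days = min(days_down * CREDIT_DAYS_PER_DAY_DOWN, DAYS_PER_MONTH)
    return MONTHLY_T1_COST * credit_days / DAYS_PER_MONTH


def uncovered_loss(days_down, revenue_per_day):
    """Business lost during the outage that the credit does not cover."""
    return max(days_down * revenue_per_day - sla_credit(days_down), 0.0)


if __name__ == "__main__":
    for days in (1, 2, 5):
        # 2000.0 dollars/day of lost business is a made-up illustration.
        print(days, sla_credit(days), uncovered_loss(days, 2000.0))
```

However the assumed revenue is chosen, the credit stops growing once it reaches a month of service, while the loss keeps growing with every day down.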

All but the last are also examples where basic link-level redundancy
will help to avoid total outages.  You don't need an ASN and full BGP
route peering just to remain connected when your T1 goes down!  Please
let's solve the right problem here!

All three were examples of miscommunication causing someone at the
provider to intentionally suspend or terminate service. It would hardly
matter how many links you had to the provider when they chose to shut
you down.

If such action is specified in your contract then you've accepted that
risk and you should mitigate it appropriately (e.g. by regularly testing
and securing your servers!).  I'd hope that if you did have redundant
routing then your other provider would also cut you off for the same
reason and at approximately the same time!

The situation I was trying to highlight was one where such an incident
occurs, and the customer quickly and appropriately responds, but one of
their providers overreacts and at some point during the process suspends
service. It is really not about who is right, but about the fact that
any given provider is run by a small group of humans, and that any given
group of humans is to some degree unpredictable. If you only have one
provider, it only takes one human mishandling a situation to take you
offline.

I would hope that most reasonable providers would _not_ cut off a
customer immediately if they were found to be a source of misbehavior,
but first ask them politely to fix the problem (with the exception, of
course, of immediately blocking any traffic that was actively
interfering with someone else's operation). If you have discovered a way
to make a machine guaranteed and perfectly secure, I might reconsider
this position. ;P

I think there's another alternative that's being missed here too that'll
satisfy the majority of needs of quite a few people, if not most.  It
should be trivial to obtain only the minimum necessary address space
from both providers and truly multi-home the servers requiring
redundancy!  For outgoing connections you simply flip the default route
on each server as necessary (perhaps using automated tools) and for
incoming connections you just put multiple A RRs in your DNS for each
service requiring redundancy.  Load balancing opportunities spring to
mind here too!
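
As a rough illustration of the incoming side of this idea, the Python sketch below shows what a cooperative client would have to do when a name has one A record per provider: resolve all addresses and fall through to the next one if the first does not answer. The function names are hypothetical, and whether any particular client really retries the second address is up to that client, so this is a sketch of the assumption the scheme relies on, not a guarantee.

```python
import socket


def resolve_ipv4(hostname, port):
    """Return the distinct IPv4 addresses the resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        address = info[4][0]  # sockaddr is (ip, port) for AF_INET
        if address not in addresses:
            addresses.append(address)
    return addresses


def connect_first_working(addresses, port, timeout_s=3.0,
                          connect=socket.create_connection):
    """Try each address in turn; return (address, socket) for the first success."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout_s)
        except OSError as error:
            last_error = error  # remember the failure and try the next address
    raise OSError("no address accepted the connection") from last_error
```

The `connect` parameter exists only so the retry logic can be exercised without a network; in normal use the default `socket.create_connection` is what opens the connection.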

Although I agree that this is a possible solution, I think at some point
it would become awfully hard to manage -- also, it only addresses a
subset of the situations requiring multihoming.

Do you know of any software to help implement this type of solution? I
can imagine how to script up the default-route swapping pretty easily on
a Unix box, but AFAIK it would likely require a reboot under NT, and I'm
not sure how you would go about automating it even then.
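
For the Unix case, a minimal sketch of the default-route swapping might look like the Python below. It assumes a Linux box with iproute2 and the iputils `ping`, and the two gateway addresses are hypothetical placeholders for each provider's next hop. Pinging the gateway only proves the local link is alive, not that the provider has working upstream connectivity, so a real health check would need to probe something further out.

```python
import subprocess

# Hypothetical next-hop addresses, one per provider (documentation ranges).
GATEWAYS = ["192.0.2.1", "198.51.100.1"]


def gateway_is_up(gateway, timeout_s=2):
    """Return True if a single ping to the gateway succeeds (Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), gateway],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def pick_gateway(gateways, is_up):
    """Return the first gateway that passes the health check, or None."""
    for gateway in gateways:
        if is_up(gateway):
            return gateway
    return None


def route_command(gateway):
    """Build the iproute2 command that points the default route at gateway."""
    return ["ip", "route", "replace", "default", "via", gateway]


def failover_once(gateways=GATEWAYS, is_up=gateway_is_up, dry_run=True):
    """Check providers in order and (optionally) install the default route."""
    gateway = pick_gateway(gateways, is_up)
    if gateway is None:
        return None  # no provider looks alive; leave the route alone
    command = route_command(gateway)
    if not dry_run:
        subprocess.run(command, check=True)  # needs root privileges
    return command
```

Run from cron or a small loop, `failover_once(dry_run=False)` would prefer the first provider whenever it answers and fall back to the second otherwise. With the default `dry_run=True` it only returns the command it would have run.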

Maybe a good way to go about it would be to set up a box to do reverse
NAT for incoming connections to either set of server IPs, and then
round-robin between IP spaces for outgoing connections? I think this
could be set up with IPF under *BSD/Linux; I'm not familiar enough with
NAT under IOS to know how hard it would be to do with a Cisco router.
This would have the advantage of simplifying the server configurations,
and there should really be something in the way of a firewall/filter in
front of them anyhow.
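
The outgoing half of that idea can be modelled in a few lines of Python. This is only a sketch of the selection logic, not an IPF or IOS configuration: the class name, the provider labels, and the addresses are all hypothetical, and a real NAT box would make this choice per connection inside the packet filter.

```python
import itertools

# Hypothetical provider-assigned addresses (documentation ranges).
POOLS = {
    "provider_a": "192.0.2.10",
    "provider_b": "198.51.100.10",
}


class RoundRobinSource:
    """Hand out source addresses in turn, skipping providers marked down."""

    def __init__(self, pools):
        self._pools = dict(pools)
        self._order = itertools.cycle(list(self._pools))
        self._down = set()

    def mark_down(self, provider):
        self._down.add(provider)

    def mark_up(self, provider):
        self._down.discard(provider)

    def next_source(self):
        """Return (provider, address) for the next outgoing connection."""
        # Look at each provider at most once per call so we never spin forever.
        for _ in range(len(self._pools)):
            provider = next(self._order)
            if provider not in self._down:
                return provider, self._pools[provider]
        raise RuntimeError("no provider is currently usable")
```

With both providers up, successive calls to `next_source()` alternate between the two address spaces; after `mark_down("provider_b")` every new connection is sourced from provider A until `mark_up` is called.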

The only real disadvantage I can see of this solution is that the
load-balancing is not topology-sensitive -- on the other hand, if you
weren't going to receive full views anyway, or if both providers end up
connecting to the tier-1 backbones in the same place, this is a moot
point, and you are actually better off with round-robin load-balancing.


