nanog mailing list archives

Re: Polling Bandwidth as an Aggregate


From: Leo Bicknell <bicknell () ufp org>
Date: Fri, 20 Jan 2012 07:32:20 -0800

In a message written on Fri, Jan 20, 2012 at 12:16:14AM -0600, Jimmy Hess wrote:
> Except Cacti/RRDTOOL is really just a great visualization tool, while you
> can build stacks, it is not something that accurately meters data for
> billing purposes.  The right kind of tool to use would be a netflow or
> network tap-based billing tool, that actually meters/samples specific
> datapoints at a specific interval and applies the billing business logic
> for reporting based on sampled data points, instead of smoothed averages
> of approximations.

To suggest Netflow is more accurate than rrdtool seems rather strange
to me.  It can be as accurate, but not in the way most people deploy
it.

RRDTool pulls the SNMP counters from an interface and records them to a
file.  With no aggregation, and assuming your device has accurate SNMP,
this should be 100% accurate.  While you are right that the defaults for
RRDTOOL aggregate data (after a day, week, and month, approximately),
those aggregates can be disabled, keeping the raw data.  I know several
ISPs that keep the raw data and use it for billing with these tools.

Netflow often suffers right at the source.  If you want to bill off
netflow data, 1:1 netflow is almost required, while most ISPs do sampled
Netflow at 1:100 or 1:1000.  Those sampling levels produce more
inaccuracy than RRDTool's aggregation function.  What's more, once the
data reaches the Netflow collector, every collector aggregates it as
well, just like RRDTool.  Again, you can disable much of it with careful
configuration.
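
To put a rough number on the sampling point, here is a back-of-the-envelope
sketch.  It assumes each packet is sampled independently with probability
1/N, which real routers may not do (many sample deterministically), and it
looks only at packet counts.  The function name is mine, not from any
Netflow tool.

```python
import math

def sampling_relative_error(true_packets, sample_rate_n):
    """Relative standard error of a packet count estimated from 1:N sampling.

    Assumes independent random sampling: the number of sampled packets is
    Binomial(true_packets, 1/N), and the estimate is N times that number.
    """
    p = 1.0 / sample_rate_n
    return math.sqrt((1.0 - p) / (true_packets * p))

# One million packets in a billing interval (an illustrative figure).
for n in (1, 100, 1000):
    err = sampling_relative_error(1_000_000, n)
    print(f"1:{n} sampling -> about {err:.2%} relative error")
```

Under those assumptions 1:1 gives no sampling error at all, and the error
grows roughly with the square root of N for a fixed amount of traffic.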

But let's compare apples to apples.  Let's consider RRDTool configured
to not aggregate with 1:1 netflow configured to not aggregate.  RRDTool
polls a monotonically increasing counter.  Should a poll be missed, no
data is lost about the total number of bytes transferred.  Thus you can
bill by the number of bytes transferred with 100% accuracy, even with
missed polls.  If you bill by the bit-rate, you can interpolate a single
missing data point with high accuracy as well.
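
A minimal sketch of why a missed poll costs nothing for byte billing.  It
assumes 64-bit counters (ifHCInOctets style) that wrap at most once between
two readings and never reset; the function names and sample values are
made up for illustration.

```python
COUNTER64_MOD = 2**64  # assumption: 64-bit octet counters

def bytes_between(first, last, modulus=COUNTER64_MOD):
    """Bytes transferred between two counter readings, allowing one wrap."""
    return (last - first) % modulus

def fill_missed_poll(t_prev, c_prev, t_next, c_next, t_missed,
                     modulus=COUNTER64_MOD):
    """Linearly interpolate the counter value at the time of a missed poll."""
    delta = bytes_between(c_prev, c_next, modulus)
    frac = (t_missed - t_prev) / (t_next - t_prev)
    return (c_prev + round(delta * frac)) % modulus

# Polls every 300 seconds; the poll at t=600 never arrived.
polls = [(0, 1_000), (300, 4_000), (900, 13_000)]

# Total bytes depend only on the first and last readings.
total = bytes_between(polls[0][1], polls[-1][1])

# For rate billing, estimate the missing reading at t=600.
estimate = fill_missed_poll(300, 4_000, 900, 13_000, 600)
print(total, estimate)
```

The total comes out the same whether or not the t=600 reading exists; only
the per-interval rate around the gap is an estimate.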

Netflow is a continuous stream of UDP across the network.  If a UDP
packet is lost between the router and the collector there is no way to
reconstruct that data, and it is lost forever.  Thus any network event
that drops export packets means you won't have the data to bill your
customer, and you're pretty much stuck always underbilling them with the
data actually collected.

> If data is not gathered using a mechanism that communicates timestamp to
> the poller, datapoints will still be imprecise, SNMP would be an example
> --  the cacti application may assume the SNMP response is current data, but
> possibly on the actual hardware, the internal MIB on the device was
> actually updated 10 seconds ago, which means there will be small spikes
> in traffic rate graphs that do not represent actual spikes in traffic.
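
The stale-counter effect described above can be worked through with a
small sketch.  The numbers are invented: a steady 1000 bytes/sec flow,
polls 300 seconds apart, and one poll where the device's counter was last
updated 10 seconds before the poller asked.  It also assumes,
hypothetically, that the device could report when the counter was last
updated; counter wrap is ignored.

```python
def rates(samples):
    """Bytes/sec between consecutive (timestamp, counter) samples."""
    return [(c1 - c0) / (t1 - t0)
            for (t0, c0), (t1, c1) in zip(samples, samples[1:])]

# Counter values as returned by the device.  At the second poll the value
# reflects t=290, not t=300, because the MIB was 10 seconds stale.
counters = [0, 290_000, 600_000]

poller_times = [0, 300, 600]  # when the poller asked
device_times = [0, 290, 600]  # when the counter was actually updated

print(rates(list(zip(poller_times, counters))))  # dip, then spike
print(rates(list(zip(device_times, counters))))  # flat
```

With poller timestamps the graph shows a dip followed by a spike around
the stale reading; with device timestamps the rate is flat.  The total
byte count is identical either way.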

Most of the large ISPs I know of moved away from both of the solutions
above to proprietary, custom solutions.  They SNMP poll the counters and
store that data in a database with high-resolution counters, forever,
never aggregated.  The necessary perl/python/ruby code to do that and
stick it in mysql or postgres is only a few pages long and easy to
audit.
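
As a rough idea of the shape of such code, here is a sketch, not anyone's
production system.  It uses Python's built-in sqlite3 in place of mysql or
postgres so it runs standalone, leaves out the SNMP polling itself (readings
are passed in), and assumes 64-bit counters that wrap at most once per poll
interval and never reset.  The table, function, device, and interface names
are all hypothetical.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open the sample store and create the table if needed."""
    db = sqlite3.connect(path)
    # Counters are stored as TEXT because unsigned 64-bit values can
    # exceed SQLite's signed 64-bit INTEGER range.
    db.execute("""
        CREATE TABLE IF NOT EXISTS counter_samples (
            device     TEXT    NOT NULL,
            interface  TEXT    NOT NULL,
            polled_at  INTEGER NOT NULL,
            in_octets  TEXT    NOT NULL,
            out_octets TEXT    NOT NULL,
            PRIMARY KEY (device, interface, polled_at)
        )""")
    return db

def record(db, device, interface, polled_at, in_octets, out_octets):
    """Store one raw counter reading; nothing is ever aggregated."""
    db.execute("INSERT INTO counter_samples VALUES (?, ?, ?, ?, ?)",
               (device, interface, polled_at, str(in_octets), str(out_octets)))
    db.commit()

def billed_octets(db, device, interface, start, end, modulus=2**64):
    """Sum per-interval counter deltas between start and end (inclusive)."""
    rows = db.execute(
        "SELECT in_octets, out_octets FROM counter_samples"
        " WHERE device = ? AND interface = ? AND polled_at BETWEEN ? AND ?"
        " ORDER BY polled_at",
        (device, interface, start, end)).fetchall()
    total_in = total_out = 0
    for (prev_in, prev_out), (cur_in, cur_out) in zip(rows, rows[1:]):
        total_in += (int(cur_in) - int(prev_in)) % modulus
        total_out += (int(cur_out) - int(prev_out)) % modulus
    return total_in, total_out

db = open_store()
record(db, "r1", "ge-0/0/0", 0, 2**64 - 100, 0)  # just before a wrap
record(db, "r1", "ge-0/0/0", 300, 50, 1_000)
record(db, "r1", "ge-0/0/0", 600, 250, 4_000)
print(billed_octets(db, "r1", "ge-0/0/0", 0, 600))
```

A real version would also need to notice counter resets after a reboot,
which this sketch would misread as a wrap.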

-- 
       Leo Bicknell - bicknell () ufp org - CCIE 3440
        PGP keys at http://www.ufp.org/~bicknell/
