IDS mailing list archives

RE: TippingPoint Releases Open Source Code for First Intrusion Prevention Test Tool, Tomahawk

From: "Brian Smith" <bsmith () tippingpoint com>
Date: Tue, 9 Nov 2004 20:59:10 -0600
I'm the author of Tomahawk; I appreciate all the interest in this.
I'd like to jump in and clarify what the tool does and what I think it's
good for (and not so good for).

As most people have figured out by now, tomahawk is a high performance,
inline version of tcpreplay.  What it does is pretty simple: it loads
one or more tcpdump packet captures (I'll call these pcaps) and
replays them through an IPS.  To do this, it uses a PC with 3 NICs,
one for management and two for data that connect to the IPS, like this:

           +---------+        +-----------+       
           |         |        |           |
           |         A <----> eth0        |
           |   IPS   |        |    PC  eth2 <---> mgmt
           |         B <----> eth1        |
           |         |        |           |
           +---------+        +-----------+       

The replay is "interface aware"; that is, packets are played through the
IPS in an order that's consistent with what the IPS would have seen had
it been on the network at the time the trace was captured.  For example,
suppose that eth0 and eth1 are the data interfaces.  A TCP three way
handshake will be replayed by sending the SYN out eth0, the SYN-ACK out
eth1, and an ACK out eth0.  Tomahawk will wait for the SYN to arrive at
eth1 before sending the SYN-ACK, just as in a real client/server
communication.  If a packet is dropped by the IPS, tomahawk will
retransmit the packet, up to a set number of times (timeout and retry
count are controlled via the command line).
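
As a rough illustration of the replay loop described above (a toy model,
not tomahawk's actual code), the sketch below sends each packet out its
interface and waits for it to appear on the far side, retrying a bounded
number of times.  The `forwards` callback is a hypothetical stand-in for
the IPS, and `max_retries` is an assumed name for the retry count.

```python
from typing import Callable, List, Tuple

# Each packet is (outgoing interface, label).
Packet = Tuple[str, str]


def replay(packets: List[Packet],
           forwards: Callable[[Packet, int], bool],
           max_retries: int = 3) -> str:
    """Replay packets in order, waiting for each to cross the device.

    forwards(packet, attempt) is a hypothetical stand-in for the IPS: it
    returns True if the packet showed up on the opposite interface.
    """
    for packet in packets:
        for attempt in range(1 + max_retries):
            if forwards(packet, attempt):
                break  # packet arrived on the other side; send the next one
        else:
            return "timed out"  # retries exhausted for this packet
    return "completed"


# TCP three-way handshake: SYN out eth0, SYN-ACK out eth1, ACK out eth0.
handshake = [("eth0", "SYN"), ("eth1", "SYN-ACK"), ("eth0", "ACK")]


def pass_all(pkt, attempt):
    return True


def drop_first_try(pkt, attempt):
    return attempt > 0  # first transmission is lost, retransmit succeeds


def block_synack(pkt, attempt):
    return pkt[1] != "SYN-ACK"  # device always drops the SYN-ACK


print(replay(handshake, pass_all))        # completed
print(replay(handshake, drop_first_try))  # completed
print(replay(handshake, block_synack))    # timed out
```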

When the replay is finished, either because all the packets made it
through or because the number of retransmissions was exceeded, tomahawk
reports whether the replay completed or timed out.

If a single trace was replayed using tomahawk, replay performance would
be almost completely dominated by latency through the IPS.  For example,
if the IPS latency were 10 ms, then you would get 100 packets/sec through
the IPS.  In fact, you'll get somewhat better performance than this, since
tomahawk transmits windows of data.  For example, if there are 3 packets
to send out eth0, tomahawk will send all three at once.  Nonetheless,
performance will be dominated by the latency.
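
The arithmetic behind that estimate can be written out as a short sketch.
It assumes a simplified model in which one window of packets crosses the
IPS per latency interval; the function name and the `window` parameter
are illustrative, not part of tomahawk.

```python
def packets_per_second(latency_ms: float, window: int = 1) -> float:
    """Packets/sec for one trace when each window waits one IPS latency."""
    return 1000.0 * window / latency_ms


print(packets_per_second(10))            # 100.0  (one packet at a time)
print(packets_per_second(10, window=3))  # 300.0  (three packets per window)
```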

To ramp up the bandwidth, tomahawk can replay multiple copies of the same
pcap in parallel.  Each copy is given its own block of IP addresses.
This allows considerable parallelism, since each copy can do its own
windowing.  The code is fairly well optimized for this type of operation.
In practice, this allows a single PC to replay about 300 Mbps through an
IPS and to simulate a good size chunk of a class B network.
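
One way to picture "its own block of IP addresses" is the sketch below.
The mapping scheme, the 10.0.0.0 base, and the block size of 256 are all
assumptions made for illustration; the post does not say how tomahawk
actually assigns addresses.

```python
import ipaddress


def remap_hosts(hosts, copy_index, base="10.0.0.0", block_size=256):
    """Map each host in a trace into the address block owned by one copy.

    Hypothetical scheme: copy N owns block_size consecutive addresses
    starting at base + N * block_size.
    """
    if len(hosts) > block_size - 2:
        raise ValueError("trace has more hosts than one block can hold")
    start = int(ipaddress.IPv4Address(base)) + copy_index * block_size
    return {host: str(ipaddress.IPv4Address(start + offset))
            for offset, host in enumerate(hosts, start=1)}


trace_hosts = ["192.168.1.10", "192.168.1.20"]
print(remap_hosts(trace_hosts, 0))  # hosts land in 10.0.0.x
print(remap_hosts(trace_hosts, 1))  # hosts land in 10.0.1.x
```

With these assumed sizes, 256 copies would span 65,536 addresses, which
is the size of a class B network.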

You can create higher loads by aggregating traffic from several
tomahawk servers through a gigabit switch before going through the
IPS.  This allows you to create a gigabit test bed for a few
thousand dollars.

If you model the IPS as a FIFO queueing device, you can see that
the network performance reported by tomahawk is roughly the maximum
bandwidth the IPS can sustain at zero loss.  This is because the
bandwidth transmitted by tomahawk is exactly the same as the
bandwidth received from the IPS.

For example, suppose the device under test has a simple FIFO queue
and can process 500 Mbps of traffic.  Suppose further that the
tomahawk test jig can generate an aggregate of 1000 Mbps of traffic.
If tomahawk attempts to replay more than 500 Mbps of traffic through
the device, the queue on the device will begin to fill.  This will
automatically cause tomahawk to back off, almost instantaneously,
because it will stop transmitting packets while waiting for the
packets in the queue to be received.
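
The back-off behavior can be seen in a toy discrete-time model of the
FIFO device.  This is a sketch under stated assumptions, not a model of
any real IPS: rates are in arbitrary units per tick, and `window` is an
assumed cap on how many packets the sender leaves outstanding.

```python
def simulate(offered_per_tick, service_per_tick, window, ticks):
    """Return delivered units per tick through a FIFO device.

    The sender stops transmitting once `window` units are still queued
    in the device, so nothing is ever dropped.
    """
    queued = 0     # units sitting in the device's FIFO queue
    delivered = 0  # units that made it out the far side
    for _ in range(ticks):
        # Sender transmits only while it has room in its window.
        sent = min(offered_per_tick, window - queued)
        queued += sent
        # Device forwards up to its capacity each tick.
        out = min(service_per_tick, queued)
        queued -= out
        delivered += out
    return delivered / ticks


# Sender can offer 10 units/tick, device can only process 5.
print(simulate(10, 5, window=20, ticks=1000))  # 5.0
# Sender offers less than the device can process.
print(simulate(3, 5, window=20, ticks=1000))   # 3.0
```

In the first case the measured rate settles at the device's capacity
rather than the sender's, which is the sense in which the reported
bandwidth approximates the maximum the device sustains at zero loss.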

So that's what it does.  Now, what's it good for?

The most obvious use is to play an attack pcap using tomahawk and verify
that the IPS will block the attack.  If tomahawk reports that the replay
completed, the attack made it through (i.e., all the packets made it
through), regardless of what the alert log states.  As others have
noted, as a pure coverage test, this test is only as good as the attack
pcaps used, and these are not easy to find.  But one thing it is
good for is repeatability testing.

Repeatability testing checks if the IPS is deterministic. If you take
a sample set of attacks, an IPS will block some and miss some.  For
example, given a set of 20 attacks, an IPS may block 18 out of 20.
You can't infer much about the attack coverage of the IPS from that,
because the sample size is too small (sort of like doing an exit poll
of 20 people).  But if you replay those attacks a thousand times, you'd
better see 18000 blocks and 2000 completes.  Otherwise, the IPS is
only blocking attacks some of the time.
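
A repeatability check along these lines can be sketched as follows.  The
result format (a list of "blocked"/"completed" outcomes per attack) and
the attack names are assumptions for illustration; tomahawk's own output
format is not described here.

```python
from collections import Counter


def summarize(results):
    """Tally outcomes and list attacks whose outcome varied between runs."""
    totals = Counter()
    flaky = []
    for attack, outcomes in results.items():
        totals.update(outcomes)
        if len(set(outcomes)) > 1:
            flaky.append(attack)  # same attack, different results
    return totals, sorted(flaky)


RUNS = 1000
# 18 attacks always blocked, 2 always missed: deterministic behavior.
deterministic = {f"attack{i:02d}": ["blocked"] * RUNS for i in range(18)}
deterministic.update(
    {f"attack{i:02d}": ["completed"] * RUNS for i in range(18, 20)})
totals, flaky = summarize(deterministic)
print(totals["blocked"], totals["completed"], flaky)  # 18000 2000 []

# One attack leaks through once in 1000 runs.
leaky = dict(deterministic)
leaky["attack00"] = ["blocked"] * (RUNS - 1) + ["completed"]
totals, flaky = summarize(leaky)
print(totals["blocked"], totals["completed"], flaky)  # 17999 2001 ['attack00']
```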

In real deployments, lack of repeatability can show up as leakage.
In a worm storm, an IPS may get barraged with hundreds or thousands
of attacks per second.  If just one of them leaks through, the worm can
spread to the network on the far side of the IPS.

Another useful thing you can do with tomahawk is check whether the IPS
will block legitimate traffic.  In this test, you take a sample of
clean traffic off your network and replay it through the IPS using tomahawk.
If the IPS is blocking legitimate traffic, the pcap will time out.
There are several details behind this procedure -- after all, the trace
may contain genuine attacks, and these must be removed.  I'm happy to
discuss these details in another thread, but this note is getting pretty
long and I want to describe another class of tests that use tomahawk:
performance testing.

In order to accurately predict the network performance of an IPS,
it is critical that a realistic protocol mix be used.  The reasons
are simple.  When an IPS inspects traffic, different code paths
are executed depending on the content.  For example, HTTP traffic
uses a different code path than DNS.  One invokes TCP reassembly
and the HTTP decoder, while the other uses the UDP parser and DNS
decoder.  These different code paths can have very different
performance characteristics.

For instance, suppose that a hypothetical IPS can process 1 Gbps
of HTTP traffic, but only 10 Mbps of DNS traffic.  If you do all
your lab tests with HTTP traffic, the IPS will look great in the lab.
When you install it in the network with significant DNS traffic,
the IPS will crater the network.
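
To put a number on that hypothetical, the sketch below estimates
throughput for a mixed workload.  It assumes a deliberately simple model
in which the device processes traffic serially, so the time per bit is
the byte-weighted sum of the per-protocol costs; the 90/10 split is an
invented example, not a measurement.

```python
def effective_mbps(mix):
    """Estimate throughput for a protocol mix.

    mix is a list of (fraction_of_bytes, max_mbps_for_that_protocol).
    Assumes serial processing, so rates combine as a weighted harmonic mean.
    """
    total = sum(fraction for fraction, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return 1.0 / sum(fraction / rate for fraction, rate in mix)


# HTTP only: the device looks like a 1 Gbps box.
print(round(effective_mbps([(1.0, 1000.0)])))                   # 1000
# 90% HTTP at 1 Gbps, 10% DNS at 10 Mbps (assumed split).
print(round(effective_mbps([(0.9, 1000.0), (0.1, 10.0)]), 1))   # 91.7
```

Under this model, even a small share of the slow protocol dominates the
result.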

Most IPSs have hundreds, or even thousands, of such code paths.
The performance of an IPS in a given network will depend on the
exact protocol mix present in that network.  Tomahawk can be used
to replicate the exact protocol mix of a given network by replaying
a packet trace captured on that network.

Another example: tomahawk can be used to test how many connections
per second (CPS) an IPS can do.  Here's how: I created a little script
to open and close a TCP connection to a server, 1000 times.  I captured
a packet trace of this traffic.  The trace has 6000 packets -- each
TCP session has 6 packets, three for setup and three for teardown.
If you replay this pcap using tomahawk with 250 copies in parallel,
you can time how long it takes to open and close 250K TCP sessions.
With 3 generic PCs, I can replay 750,000 connections in about 8 seconds
over a crossover cable -- about 93000 connections per second.
If you replace the crossover with an IPS, you can test the performance
of the IPS.
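
The numbers in this example work out as follows.  The helper and its
parameter names are just for illustration; the figures are the ones
quoted above.

```python
def cps_summary(connections_per_pcap, packets_per_connection,
                copies_per_pc, pcs, elapsed_s):
    """Return (total connections, connections/sec, packets/sec)."""
    total = connections_per_pcap * copies_per_pc * pcs
    cps = total / elapsed_s
    pps = total * packets_per_connection / elapsed_s
    return total, cps, pps


total, cps, pps = cps_summary(
    connections_per_pcap=1000,  # script opens/closes 1000 connections
    packets_per_connection=6,   # 3 packets for setup, 3 for teardown
    copies_per_pc=250,          # parallel copies of the pcap per PC
    pcs=3,
    elapsed_s=8,
)
print(total)  # 750000
print(cps)    # 93750.0
print(pps)    # 562500.0
```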

You can also use tomahawk to create background traffic for other tests.
For example, you can replay traffic from the target network at 500 Mbps
then check the latency through the IPS (using a smartbits or equivalent
if you have one, ping if you don't).  Or you can check the performance
of a given workload (e.g., timing how long it takes to copy a large file
from an NFS or SMB server).  A command line parameter to tomahawk
will limit the replay rate of the pcap, allowing you to set the level
of background traffic.

Once you have a test jig like this, you can combine tests.  For example,
you can use one tomahawk server to send attacks at an IPS, and measure
what effect blocking has on the network performance.  Or you can check
repeatability under load.  Or manageability while blocking under load.
And so on.

As a historical note, I developed the code about 2.5 years ago as part
of our quality assurance program, to predict the performance of our IPS
in real-world environments.  We've been using it ever since, so it's
pretty stable.  We decided to release it because we think that it fills
a need.

If you want to learn more about the tool, try it out.  Like tcpreplay,
or any tool, it's not a panacea for all testing problems.  If you find
it useful, let me know.  If you think it can be improved, please
post a legitimate criticism or, better yet, improve on it by improving
the code or posting a better tool.

The IPS industry is just starting to see mainstream acceptance.  As
a community, we need to start defining tools and benchmark tests that
we can use to do objective, apples to apples comparisons of the pros
and cons of different products.  Tomahawk is just a start in that
direction, an attempt to get the ball rolling.  I hope we can have
a productive discussion on this topic, not just bashing and suspicion,
so that other vendors will be encouraged to publish open source versions
of their tools.

    Brian
