IDS mailing list archives

Re: Intrusion Detection Evaluation Datasets


From: "\"Zow\" Terry Brugger" <zow () acm org>
Date: Thu, 12 Mar 2009 08:40:04 -0700

Stefano,

>> An overwhelming majority of network-based IDSs use only spatial
>> information present in packet headers.
>
> "spatial" information? if you mean "IP addresses", then

I took "spatial" information to mean connection or packet header data
-- more than just IP addresses, but lacking the unstructured data
portions.

> 1) your statement is definitely not true and

Actually, I think it is: the majority of unique NIDSs that I am
familiar with were built to use the KDD Cup '99 dataset. I pray none
of those systems are actually used in production anywhere. Let's face
it, only a handful of signature-based network intrusion detectors were
ever built. After Marty released Snort to the community, there really
hasn't been a need to build another. Sure, a couple have been built so
that they wouldn't be "encumbered" by the open source license, but
there really haven't been any major changes to signature-based
detection in the past decade (just thousands of tweaks). Most anomaly
or machine-learning-based detectors will only work with structured
data, so they limit themselves to the header portions of the packets
or to connection records.
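
To make "structured data" concrete, here is a minimal Python sketch of
the kind of header-only feature record such a detector might consume.
It is an illustration only, not taken from any particular NIDS: the
function names, the chosen fields, and the hand-built sample packet are
all assumptions, and it handles only plain IPv4/TCP.

```python
import socket
import struct


def build_sample_packet(payload: bytes) -> bytes:
    """Hypothetical helper: build a raw IPv4/TCP SYN packet (checksums left at 0)."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                      # version 4, header length 5 words (20 bytes)
        0,                         # type of service
        20 + 20 + len(payload),    # total length: IP + TCP + payload
        0, 0,                      # identification, flags/fragment offset
        64,                        # time to live
        6,                         # protocol number 6 = TCP
        0,                         # header checksum (not computed here)
        socket.inet_aton("10.0.0.1"),
        socket.inet_aton("10.0.0.2"),
    )
    tcp_header = struct.pack(
        "!HHIIBBHHH",
        12345, 80,                 # source and destination ports
        0, 0,                      # sequence and acknowledgement numbers
        5 << 4,                    # data offset: 5 words (20 bytes)
        0x02,                      # flags: SYN
        8192, 0, 0,                # window, checksum, urgent pointer
    )
    return ip_header + tcp_header + payload


def header_features(packet: bytes) -> dict:
    """Extract fixed-position header fields; the payload bytes are never read."""
    ip_len = (packet[0] & 0x0F) * 4                  # IP header length in bytes
    (total_len,) = struct.unpack("!H", packet[2:4])
    src_port, dst_port = struct.unpack("!HH", packet[ip_len:ip_len + 4])
    tcp_len = (packet[ip_len + 12] >> 4) * 4         # TCP header length in bytes
    return {
        "src_ip": socket.inet_ntoa(packet[12:16]),
        "dst_ip": socket.inet_ntoa(packet[16:20]),
        "protocol": packet[9],
        "src_port": src_port,
        "dst_port": dst_port,
        "tcp_flags": packet[ip_len + 13],
        "payload_len": total_len - ip_len - tcp_len,  # size only, not content
    }


if __name__ == "__main__":
    print(header_features(build_sample_packet(b"GET / HTTP/1.0\r\n\r\n")))
```

Note that two packets with the same headers and payload length but
different payload contents produce identical records here, which is
exactly the limitation being discussed.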

> 2) such IDSs "work" only because of the artifacts in the evaluation datasets

We can't really say that conclusively. At this point we can only say
that any successes demonstrated by those systems have been due to flaws
in the evaluation datasets. For lack of good evaluation datasets, we
have no idea how those systems might perform in real world
environments. More importantly, for any system which requires training
data we must question how portable it is across different networks;
should it require unique training data for a given network, is it
feasible that such training data will ever be available?

I see a lot of people saying (correctly) that advanced (non-signature
based) NIDS can't be researched until we have good evaluation
datasets, and I see a lot of people ignoring them and doing it anyway.
Is anyone (else) actually working on fixing the data problem?

Cheers,
Terry


