Wireshark mailing list archives

Re: [Wireshark-core] Hash Tables and Algorithmic Complexity Attacks


From: Bálint Réczey <balint () balintreczey hu>
Date: Tue, 22 Apr 2014 13:52:32 +0200

[Bringing the discussion to -dev with Evan's permission]

2014-04-22 10:15 GMT+02:00 Anders Broman <anders.broman () ericsson com>:


-----Original Message-----
From: wireshark-core-bounces () wireshark org [mailto:wireshark-core-bounces () wireshark org] On Behalf Of Evan Huus
Sent: den 22 april 2014 05:36
To: Wireshark core developers
Subject: Re: [Wireshark-core] Hash Tables and Algorithmic Complexity Attacks

On Mon, Apr 21, 2014 at 6:28 PM, Balint Reczey <rbalint () gmail com> wrote:

On Apr 22, 2014 12:11 AM, Evan Huus <eapache () gmail com> wrote:

On Mon, Apr 21, 2014 at 3:59 PM, Bálint Réczey <balint () balintreczey hu> wrote:
Hi Evan,

2014-04-21 18:21 GMT+02:00 Evan Huus <eapache () gmail com>:
(sending to -core because of potential security implications)

I was recently investigating implementing a wmem-based hash table,
since many of the current users of the wmem-tree do not actually
need the predecessor-lookup feature which is the only advantage it
provides over a hash table (whereas a hash table is otherwise much faster).

I ended up wandering down the rabbit-hole of hash collisions,
algorithmic complexity attacks, and universal hashing ([1], [2],
[3] and more).

Then I read the Glib hash table documentation: [4]. A quick grep
through the Wireshark source reveals numerous locations where we
use packet data in hash tables with insecure hash functions. As
such, I believe that Wireshark is vulnerable to a whole class of
denial-of-service attacks in this area.

Has this come up before? Am I overlooking something clever we're doing?
It is true that Wireshark is vulnerable to hash collision attacks,
and I think it did not come up before because we had enough
vulnerabilities of other simpler classes. ;-)

Makes sense :)

I think we could create wrappers to provide a random seed to the
standard GLib hash functions, which is the standard way of handling
such vulnerabilities.
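For illustration, a seeded wrapper could look roughly like this (a sketch with hypothetical names, written in plain C rather than GLib types so it stands alone; the loop mirrors g_str_hash's DJB-style recurrence h = h * 33 + c, but starts the accumulator from a per-process random seed instead of the fixed constant 5381):

```c
#include <stdint.h>
#include <stdlib.h>

/* Per-process seed, drawn once at startup.  With the seed folded into
 * the initial accumulator, an attacker cannot precompute colliding
 * keys offline. */
static uint32_t hash_seed;

static void
seeded_hash_init(unsigned entropy)
{
    srand(entropy);
    hash_seed = (uint32_t)rand();  /* real code would use a stronger source */
}

static uint32_t
seeded_str_hash(const char *p)
{
    uint32_t h = 5381u ^ hash_seed;  /* same loop as g_str_hash, seeded start */

    for (; *p != '\0'; p++)
        h = (h << 5) + h + (uint32_t)(unsigned char)*p;
    return h;
}
```

With hash_seed = 0 this reduces to the ordinary unseeded hash; any nonzero seed shifts every hash value, so precomputed collision sets for the unseeded function no longer apply.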

Unfortunately (based on my reading) that will not be sufficient.
There are two vectors for an attack via collisions: the mapping from
key to guint, and the mapping from guint to bucket index. The first
(key->guint) is user-settable so we can provide a random seed. The
second (guint->bucket) is not user-controllable. From the glib
documentation:

"The modulus is taken with the hash table size (a prime number) to
find the 'bucket' to place each key into."
and then a few paragraphs down
"Even cryptographic hashes are very easy to find collisions for when
the remainder is taken modulo a somewhat predictable prime number."

So basically, it doesn't matter how strong we make the initial
mapping, because the secondary bucket mapping is predictable anyway.
Fortunately there are easy and efficient ways to make the secondary
mapping unpredictable (it's actually simpler than the initial
mapping), so I guess that a good secure wmem map implementation is
actually fairly important to have.
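To make the second vector concrete: once the bucket count is known, any two hash values that differ by a multiple of it land in the same bucket, no matter how strong the hash that produced them was (toy sketch; N_BUCKETS is an arbitrary example prime standing in for GLib's internal table size):

```c
/* Toy illustration of the guint->bucket vector.  The hash values fed
 * in here could come from a cryptographic hash: if the attacker can
 * guess the modulus, they can still aim all keys at one bucket. */
#define N_BUCKETS 1009u

static unsigned
bucket_of(unsigned hash_value)
{
    return hash_value % N_BUCKETS;
}
```

Any hash values of the form h, h + N_BUCKETS, h + 2 * N_BUCKETS, ... collide into the same bucket, which is exactly the "somewhat predictable prime number" problem the GLib documentation warns about.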
Luckily GHashTable resizes itself, increasing the number of buckets when the collision rate is high, so this attack
can't be performed effectively.
The described problem is valid only for hash tables with a fixed bucket count.

Regarding the wmem hash tables, I think wrapping C++ STL hash tables in C, with wmem-based custom allocators, would do
the job.

Unfortunately this doesn't work; the free-all operation which wmem provides doesn't play nice with C++ destructors 
(besides which we are still pure-C for libwireshark for the time being).
We lost being C only with the Qt UI. :-)
It can be implemented in several ways that work. One is registering each
hash table with the proper wmem pool and calling its destructor when
the pool is freed - this way we don't have to play with a custom
allocator.
Another technique would be not calling destructors and just freeing all
of the allocated objects connected to the table. This is not nice, but
would work as well IMO, since we would not hold references to the
free()-d objects.
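The first approach might be sketched like this (hypothetical names throughout - a minimal pool with a cleanup list, not the real wmem API): each object enqueues a cleanup function when it is created, and freeing the pool runs the cleanups before releasing memory.

```c
#include <stdlib.h>

/* Minimal "register a destructor with the pool" pattern (illustrative
 * only).  Cleanups run newest-first when the pool is freed. */
typedef void (*cleanup_fn)(void *obj);

struct cleanup_entry {
    cleanup_fn            fn;
    void                 *obj;
    struct cleanup_entry *next;
};

struct pool {
    struct cleanup_entry *cleanups;
};

static void
pool_register_cleanup(struct pool *p, cleanup_fn fn, void *obj)
{
    struct cleanup_entry *e = malloc(sizeof *e);
    e->fn = fn;
    e->obj = obj;
    e->next = p->cleanups;   /* LIFO: newest registration runs first */
    p->cleanups = e;
}

static void
pool_free_all(struct pool *p)
{
    while (p->cleanups != NULL) {
        struct cleanup_entry *e = p->cleanups;
        p->cleanups = e->next;
        e->fn(e->obj);       /* the "destructor" call */
        free(e);
    }
}

/* Example cleanup that just counts invocations. */
static int destroyed_count = 0;
static void count_destroy(void *obj) { (void)obj; destroyed_count++; }
```

A C++ hash table wrapped this way would register a cleanup that invokes its destructor, so the wmem free-all semantics are preserved without a custom allocator.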


I have whipped up a very basic wmem-map implementation at [1]. It uses universal multiply-shift hashing [2] for
bucket placement, and provides a wmem_secure_hash function for replacing g_str_hash and friends, based on work done
by the Perl community [3] in hardening their hash tables. To the best of my knowledge the resulting hash map is
secure against complexity attacks.

It still needs cleanup (comments, tests, etc). I would also like to replace one tree somewhere and run benchmarks to 
make sure it's actually faster. Suggestions and concerns are welcome.
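For reference, the multiply-shift bucket placement described in [2] reduces to something like this sketch (hypothetical names, not the code under review; it assumes the table capacity is a power of two, 2^m with 1 <= m <= 31, and a random odd 32-bit multiplier chosen per table):

```c
#include <stdint.h>

/* Multiply-shift bucket placement: the multiplication wraps mod 2^32
 * and the top m bits select the bucket.  'a' must be odd and secret;
 * without knowing it, an attacker cannot predict bucket indices, and
 * there is no division or modulus on the fast path. */
static uint32_t
bucket_index(uint32_t hash, uint32_t a /* random, odd */, unsigned m)
{
    return (a * hash) >> (32 - m);
}
```

With m = 4 (16 buckets), every result is below 16 by construction, and changing the secret multiplier reshuffles which hashes collide.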
It would be nice to see how it compares to other implementations:
http://incise.org/hash-table-benchmarks.html

I prefer using existing building blocks instead of inventing our own.

Cheers,
Balint


[1] https://code.wireshark.org/review/1272
[2] https://en.wikipedia.org/wiki/Universal_hashing#Avoiding_modular_arithmetic
[3] http://blog.booking.com/hardening-perls-hash-function.html


Would it be good to have an initial hash table size in the API, so a table could be preallocated to a custom size for
the cases where we know the size will be larger than the standard default?
I think it would not be good. It would save a constant number of resizes
(an O(1) gain) but we would reserve more memory for each hash table
initially.

Cheers,
Balint

Regards
Anders


Cheers,
Balint


AFAIK GHashTable does not employ randomization of the hash function seed:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655044

Cheers,
Balint


Evan

P.S. I believe it's still perfectly OK to use hash tables with
non-packet data such as static value_strings or as in [5].

[1] http://lwn.net/Articles/474912/
[2] http://perldoc.perl.org/perlsec.html#Algorithmic-Complexity-Attacks
[3] https://en.wikipedia.org/wiki/Universal_hashing
[4] https://developer.gnome.org/glib/unstable/glib-Hash-Tables.html#GHashFunc
[5] https://code.wireshark.org/review/gitweb?p=wireshark.git;a=commit;h=5983cda769fa6615dda5fc7b8f87819d40f0a8d5

___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev () wireshark org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request () wireshark org?subject=unsubscribe
