Information Security News mailing list archives

The perils of Googling


From: InfoSec News <isn () c4i org>
Date: Wed, 10 Mar 2004 07:20:41 -0600 (CST)

http://www.theregister.co.uk/content/55/36142.html

[It's worthwhile to visit the story on the Register's website for links 
to Google on various search terms.  - WK]


By Scott Granneman
SecurityFocus
Posted: 10/03/2004 

Google is in many ways the most dangerous website on the Internet for
thousands of individuals and organisations, writes SecurityFocus
columnist Scott Granneman. Most computer users still have no idea
that they may be revealing far more to the world than they would want.

I'm not putting down Google. Far from it: it's a great search engine,
and I use it all the time. I couldn't do my many jobs without Google,
so I've spent some time learning how to maximize its value, how to
find exactly what I want, how to plumb its depths to find just the
right nugget of information that I need. In the same way that Google
can be used for good, though, it can also be used by malevolent
individuals to root out vulnerabilities, discover passwords and other
sensitive data, and in general find out way more about systems than
they need to know. And, of course, Google's not the only game in town
- but it is certainly the biggest, the most widely-used, and in many
ways the easiest to use.


Throwing back the curtain

Most people just head to Google, type in the words they're looking
for, and hit Google Search. Some more knowledgeable folks know to put
quotation marks around phrases, put a "+" in front of required words
or a "-" in front of words that should not appear, or even use Boolean
search terms like AND, OR, and NOT. Greater Google aficionados know
about Google's Advanced Search page, where you can get really
specific.

The page that Google provides for its Advanced Search is nice, and
it's certainly easy and full of necessary tips, but if you really want
to master all the tricks that Google offers the dedicated searcher,
you need to learn at least some of what is detailed on the Google
Advanced Search Operators page. For instance, let's say you just type
the word "budget" into a Google search box, without the quotation
marks. You're going to get over 11,000,000 hits, so many that it would
take a tremendously long time to find anything troublesome from a
security perspective.

Now try that same search, but include the search operator "filetype"  
along with it. Using the filetype operator, you can specify the kind
of file you're looking for. Google's Advanced Search page lists
several common formats, including Microsoft Word, Microsoft Excel, and
Adobe Acrobat PDF, but you can actually search for far more than those.
Let's change our search from just "budget" to "budget filetype:xls"  
(again without the quotes; in fact, just ignore the quotation marks
unless I mention otherwise) and see what we get.
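Since these operators are all just text prepended to a query, building them up programmatically is trivial. Here is a minimal sketch of that idea in Python; the `google_query` helper is my own illustration, not anything Google provides:

```python
# Sketch: compose a Google query string from plain terms plus
# advanced-search operators such as filetype: and site:.
# (The helper name and interface are illustrative only.)
from urllib.parse import urlencode


def google_query(terms, **operators):
    """Join plain search terms with operator:value pairs."""
    parts = list(terms)
    parts += [f"{op}:{val}" for op, val in operators.items()]
    return " ".join(parts)


query = google_query(["budget"], filetype="xls")
print(query)  # budget filetype:xls

# The same string, URL-encoded as it would appear in a search URL:
print("https://www.google.com/search?" + urlencode({"q": query}))
```

The point is simply that an operator is nothing special: "budget filetype:xls" is an ordinary query string, which is exactly why these searches are so easy to script.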


63,000 hits and counting

Hmmm ... now we're down to 63,000 hits. Still an overwhelming number,
but if you start looking through the first couple of pages, you'll
notice items that would interest an attacker looking for information
he shouldn't have. Let's add another operator into the
mix.

The "site" operator allows you to narrow down your results to a
particular subdomain, a second-level domain, or even a top-level
domain. For instance, if you wanted to find out what Google has
indexed at SecurityFocus on the topic of password cracking, try this
search: "site:www.securityfocus.com password cracking", which gives
you 449 results. I often use this trick even when a site provides its
own search engine, as Google's index is often far better than the
search that many sites include.

Let's try our search, but stick to the .edu top-level domain, so we're
looking for "budget filetype:xls site:edu". 15,200 hits. Not bad.  
Things are starting to look very interesting.

Let's introduce another tool into your toolbox: the ability to look
only on pages that use a certain word or words in their title by
incorporating the "intitle" operator into your search. At
SecurityFocus, this query would narrow our results list down to only
five, an incredible tightening of our search:  
"site:www.securityfocus.com intitle:password cracking" (note that
"password" is the only word that must be in the title; "cracking"  
should appear on the page as a search term, but not in the title,
since I didn't place "intitle:" prior to it).
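That scoping rule - intitle applies only to the word immediately after the colon - is worth internalising. As a rough illustration of what intitle actually tests, here is a small Python sketch (my own code, of course, not Google's matcher) that checks whether a word appears in a page's <title> element rather than anywhere in the body:

```python
# Sketch of the semantics of intitle: the word must appear in the
# page's <title> element, not merely somewhere in the body text.
from html.parser import HTMLParser


class TitleGrabber(HTMLParser):
    """Collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def matches_intitle(html_page, word):
    parser = TitleGrabber()
    parser.feed(html_page)
    return word.lower() in parser.title.lower()


page = ("<html><head><title>Password recovery tools</title></head>"
        "<body>cracking tips</body></html>")
print(matches_intitle(page, "password"))  # True: in the title
print(matches_intitle(page, "cracking"))  # False: only in the body
```

In the SecurityFocus query above, "password" is tested against titles the way matches_intitle does, while "cracking" is matched against the whole page.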


Enter the bad guys

Bad guys know about the "intitle" operator, but they know something
else that makes it even more powerful. Often Web servers are left
configured to list the contents of directories if there is no default
Web page in those directories; on top of that, those directories often
contain lots of stuff that the website owners don't actually want to
be on the Web. That makes such directory lists prime targets for
snoopers. The title of these directory listings almost always starts
with "Index of", so let's try a new query that I guarantee will
generate results that should make you sit up and worry:  
"intitle:"index of" site:edu password". 2,940 results, and many, if
not most, would be completely useless to a potential attacker. Many,
however, would yield passwords in plain text, while others could be
cracked using common tools like Crack and John the Ripper.

There are other operators, but these should be enough to make the
picture clear. Once you start to think about it, the potentially
troublesome words and phrases that can be searched for and leveraged
should begin to multiply in your mind: passwd. htpasswd. accounts.  
users.pwd. web_store.cgi. finances. admin. secret. fpadmin.htm. credit
card. ssn. And so on. Heck, even "robots.txt" would be useful: after
all, if someone doesn't want search engines to find the stuff listed
in robots.txt, that stuff could very well be worth a look. Remember,
robots.txt just indicates that the website doesn't want search engines
to index the files and folders listed in robots.txt; nothing
inherently stops users from accessing that content once they know it
exists.
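A quick sketch makes the point concrete: robots.txt is a world-readable file, and its Disallow lines amount to a list of the paths the site owner would rather you not find. This is ordinary Python with a made-up sample file, just to illustrate:

```python
# Sketch: robots.txt is public, and its Disallow lines are effectively
# a map of what the site owner wants kept out of search engines.
# (Sample robots.txt content invented for illustration.)
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /finances/
Disallow: /web_store.cgi
"""


def disallowed_paths(text):
    """Extract every non-empty Disallow path from a robots.txt body."""
    paths = []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(disallowed_paths(robots_txt))
# ['/admin/', '/finances/', '/web_store.cgi']

# A polite crawler honours the rules...
rp = RobotFileParser()
rp.parse(robots_txt.splitlines())
print(rp.can_fetch("*", "/admin/secrets.txt"))  # False
# ...but nothing stops a browser from requesting /admin/ directly.
```

The RobotFileParser check shows exactly what robots.txt buys you: well-behaved search engines say no, and everyone else is free to take the file as a menu.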


Sensitive information

A couple of websites have even sprung up dedicated to listing words
and phrases that reveal sensitive information and vulnerabilities. My
favorite of these, Googledorks, is a treasure trove of ideas for the
budding attacker. As a protective countermeasure, all security pros
should visit this site and try out some of the suggestions on the
sites they oversee or for which they consult. With a little elbow
grease, some Perl, and the Google Web API, you could write scripts
that would automate the process and generate some nice reports that
you could show to your clients. Of course, so could the bad guys...  
except I don't think your clients will ever see those reports, just
the end results.
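The automation itself is almost embarrassingly simple. The column mentions Perl and the Google Web API; as a language-neutral sketch of the same idea, the following Python generates a checklist of dork-query URLs scoped to a site you are authorised to audit (the dork list and helper are illustrative, and no network calls are made):

```python
# Sketch: generate Google search URLs for a list of "dork" queries,
# each scoped with site: to a domain you are authorised to audit.
# (DORKS and audit_urls are illustrative names, not a real tool.)
from urllib.parse import urlencode

DORKS = [
    'intitle:"index of" password',
    "filetype:xls budget",
    "_vti_pvt password intitle:index.of",
]


def audit_urls(site, dorks=DORKS):
    """Return one search URL per dork, restricted to the given site."""
    return [
        "https://www.google.com/search?" + urlencode({"q": f"site:{site} {d}"})
        for d in dorks
    ]


for url in audit_urls("example.edu"):
    print(url)
```

Feed the output into whatever report format your clients like; an attacker running the same loop just reads the results directly.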

Even the Google cache can aid in exposing holes in systems. Couple the
operators outlined above with Google's cache, which can provide you
with a look at files that have changed or been removed, and attackers
have an incredibly powerful tool at their disposal.


Responses

As I said at the beginning of this column, the fact that it is
actually quite easy to find dangerous information using just a search
engine and some intelligent guesses is not exactly news to people who
think about security professionally. But I'm afraid that there are
many uneducated folks putting content onto Web servers that they think
is hidden from the world, when in reality it is anything but.

We have two seemingly opposite problems at work here: simplicity and
complexity. On the one hand, it has become very easy for non-technical
users to post content onto Web servers, sometimes without realizing
that they're in fact placing that content on a Web server. It has even
become easier to Web-enable databases, which has led in one case to
the exposure of a database containing the records of a medical
college's patients (and by the way, the search terms discussed in that
article are still very much active at Google, one year later).

Even when people do understand that their content is about to go onto
the Web, many do not fully think through what they're about to post.  
They don't examine that content in light of a few simple questions:  
How could this information be used against me? Or my organisation? And
should this even go on the Web in the first place?

Well, of course ordinary users don't think to ask these questions!  
They're just interested in getting their content out there, and most
of the time are just pleased as punch that they could publish on the
Web in the first place. Critically examining that content for security
vulnerabilities is not something they've been trained to do.


Points of failure

On the other side of the coin we have complexity. For all the ease
that has come about in the past several years, no matter how simple it
has become for Bob in Marketing to publish the company's public sales
figures online, the fact remains that we're dealing with complex
systems that have many, many points of potential failure. That
knowledge scares the hell out of the people who live security, while
Bob goes blithely on successfully publishing the company's public
sales figures ... and accidentally publishing the spreadsheet
containing the company's top customers, complete with contact info,
sales figures, and notes about who the salespeople think are good for
a few thousand more this year.

For instance, FrontPage is touted by Microsoft as an extremely
simple-to-use Web publishing solution that enables users to "move
files easily between local and remote locations and publish in both
directions". Unfortunately for those average Joes who buy into the
hype, FrontPage is still a very complicated program that can easily
expose passwords and other sensitive data if it is not administered
correctly. Don't believe me? Just search Google for "_vti_pvt password
intitle:index.of" and take a look at what you find.

FrontPage is not the only offender, but it is certainly an easy one to
find in abundance on our favourite search engine. Now think about all
the other programs out there that people are using every day. Personal
Web servers that come with operating systems. Turnkey shopping cart
software. Web-enabled Access databases. The list goes on and on. Take
a moment and start to think about the organisations you oversee. See
the list of potential problems tumble off into infinity. Oy.

Sure, it's possible for the folks creating Web content to tell Google
and other search engines not to index that content. O'Reilly's website
has a marvellous short piece titled "Removing Your Materials From
Google" that should be required reading for anyone who even thinks
about putting anything on or even near a Web server. Of course, as I
mentioned above, relying on robots.txt to protect sensitive content is
a bit like putting a sign up saying "Please ignore the expensive
jewels hidden inside this shack". But at least it will get folks
thinking.


Understand the threat

And really, that's what it comes down to: we have to get folks
thinking. Sure, those of us responsible for security can try to shut
everything down and turn everything off that could pose a threat - and
we should, within reason. But those pesky users are going to do their
job: use the systems we provide them, and some we don't provide. We
need to help them understand the threats that any Web-enabled
technology can pose.

Print out this column and hand it out. Show them how easy it is to
find sensitive content online. Talk to them about appropriate and
inappropriate content. Try to get them on your side so they trust you
and come to you with requests for help beforehand instead of coming to
you after the fact, when it's too late and the toothpaste is out of
the tube. Finally, realise that humans have an innate need to
communicate and will seize on any tool to do so, and if that means
talking to your users and setting up a wiki or bulletin board or other
collaborative tool, then do so.

Google and other search tools have made the world available to us all,
if we just know what to ask for. It's our job as security pros to help
make the folks we work and interact with aware of that fact, in all of
its far-reaching ramifications.

-=-

Scott Granneman is a senior consultant for Bryan Consulting Inc. in
St. Louis. He specializes in Internet Services and developing Web
applications for corporate, educational, and institutional clients.



-
ISN is currently hosted by Attrition.org

To unsubscribe email majordomo () attrition org with 'unsubscribe isn'
in the BODY of the mail.

