Politech mailing list archives

Google's SafeSearch is overzealous, blocks innocuous domains [fs]


From: Declan McCullagh <declan () well com>
Date: Fri, 23 Apr 2004 09:22:52 -0400



http://news.com.com/2100-1032_3-5198125.html?tag=nefd.lede

Google's chastity belt too tight
Last modified: April 23, 2004, 4:00 AM PDT
By Declan McCullagh
Staff Writer, CNET News.com

PartsExpress.com proudly touts itself as the Net's No. 1 source for audio, video and speaker components--but online shoppers who rely on an optional feature in the Google search engine to block porn sites would never know it.

By an accident of spelling, the domain name of the Ohio electronics retailer includes an unfortunate string of letters, "sex," which is enough to block the Web site from Google's filtered results.

PartsExpress.com is not alone. A CNET News.com investigation shows that Google's SafeSearch filter technology incorrectly blocks many innocuous Web sites based solely on strings of letters such as "sex," "girls" or "porn" embedded in their domain names.

Google's SafeSearch flaws are more than academic--they can have serious consequences for the innocent Web site operators they block. Google is the most widely used search engine on the Web, and failure to appear in its listings can have a direct impact on sales for some companies, particularly smaller enterprises with limited marketing budgets.

Research company WebSideStory reported last month that Google claimed an all-time high in search referrals, 41 percent of the United States total, and the search giant's market share is steadily expanding.

"Traffic from Google can make or break a business," said Maria Medina, whose family-run clothing business at ALittleGirlsBoutique.com doesn't pass the SafeSearch censor. "Here I am, a mom of four children, creating an at-home business that sells little girl dresses and accessories, in order to spend more time with my children, and I have been filtered out as not being family friendly. Ridiculous."

Matt Cutts, the Google engineer who designed SafeSearch four years ago, said his algorithm looks for a "relatively small" number of trigger words in a Web page's address. If one of those words appears, the SafeSearch algorithm puts the address on a block list and does not take the next step of evaluating the content of the site. "We try to find the best trade-off of precision, recall and safety," Cutts said. "People who opt in to SafeSearch are mostly OK with us being on the conservative side."
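
The behavior Cutts describes can be illustrated with a minimal sketch. This is not Google's code: the trigger list below holds only the three strings named in this article (the real list is not public), and the function name is_blocked is hypothetical. It assumes a plain substring check on the hostname, with no look at page content, which is enough to reproduce the false positives reported here.

```python
from urllib.parse import urlparse

# Hypothetical trigger list: only the three strings the article names.
# The actual SafeSearch list is not public.
TRIGGER_WORDS = ("sex", "girls", "porn")


def is_blocked(url: str) -> bool:
    """Return True if any trigger word appears anywhere in the hostname.

    Mirrors the described behavior: a match on the address alone blocks
    the site, and the page content is never evaluated.
    """
    # urlparse lowercases the hostname; it is None if the URL has no scheme.
    host = urlparse(url).hostname or ""
    return any(word in host for word in TRIGGER_WORDS)


if __name__ == "__main__":
    for url in (
        "http://www.PartsExpress.com/",
        "http://www.ALittleGirlsBoutique.com/",
        "http://www.example.com/",
    ):
        print(url, "blocked" if is_blocked(url) else "allowed")
```

Under these assumptions, PartsExpress.com is blocked because "partsexpress" contains "sex," and ALittleGirlsBoutique.com is blocked because it contains "girls." A substring test cannot distinguish an accident of spelling from an intentional term.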

Cutts would not disclose how many Web searches are done with SafeSearch enabled, saying only that it's a small percentage of the millions of queries handled by Google each day. But the sloppy filter stands out as a rare black eye for a company that prides itself on superior search technology and boasts on its payroll one of the world's highest concentrations of computer science doctoral degrees. Google claims SafeSearch "uses advanced proprietary technology that checks keywords and phrases" and filters out only Web pages "containing pornography and explicit sexual content."

"That's not very bright," said Karen Schneider, a librarian who runs the Librarians' Index to the Internet and has made a study of filtering software. SafeSearch is "certainly evocative of the very primitive CyberSitter-type tools of the mid-1990s--not a tool of fairly sophisticated development."

[...remainder snipped...]
_______________________________________________
Politech mailing list
Archived at http://www.politechbot.com/
Moderated by Declan McCullagh (http://www.mccullagh.org/)

