Interesting People mailing list archives
Re Chronicle of Higher Education: Google and the Misinformed Public
From: "Dave Farber" <farber () gmail com>
Date: Wed, 18 Jan 2017 18:13:16 -0500
Begin forwarded message:
From: Chuck McManis <chuck.mcmanis () gmail com>
Date: January 18, 2017 at 5:34:59 PM EST
To: Dave Farber <dave () farber net>
Cc: ip <ip () listbox com>
Subject: Re: [IP] Re Chronicle of Higher Education: Google and the Misinformed Public

It sounds like there's a market opportunity here for a search engine that explicitly provides context for search results: credibility, fact checking, bias (not as a value judgement), research articles vs. journalism reporting on them, etc. Could also incorporate some form of crowd sourcing, etc.

Fortunately this experiment has been done at least once. The search engine Blekko was founded by Rich Skrenta and his friends from Topix. I joined as VP of operations (and later operations and engineering) in 2010, about six months before the product officially launched in November of 2010. The search engine was predicated on the idea that much of the content of the web was useless, and that the original search engine mission, finding pages you would not normally find, had flipped over to finding useful pages amid all the fluff. That concept was implemented as a curated index built from web sites that had been identified by humans (content editors) and generally validated by search traffic (click-throughs).

What had been observed was that Google had created an attractive nuisance in the early 2000s: a page with AdSense ads on it that landed on the first page of results for a given topic could deliver thousands of dollars a month in passive income, and the same page could provide additional income with affiliate links (sending referral cash to the referral network). Blekko defined such pages, which were simply placeholders to drive advertising traffic, as web "spam," in much the same vein as email spam. By 2015 Google had an estimated index size of 10 trillion documents; by analyzing over 200 billion pages, Blekko estimated that there were no more than 100 billion web documents with actual content.
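Blekko's actual curation pipeline isn't public, but the idea described above, admitting a page only when a human editor has approved its site and search traffic validates the choice, can be sketched roughly like this (the site list, threshold, and function names are invented for illustration):

```python
from urllib.parse import urlparse

# Hypothetical illustration of a curated index: a page qualifies only if
# its host was approved by a human editor AND its click-through rate on
# search results validates that approval. Values below are made up.
CURATED_SITES = {"example.edu", "example.org"}  # editor-approved hosts
MIN_CTR = 0.02                                  # assumed validation threshold

def in_curated_index(url, impressions, clicks):
    """Return True if the page belongs in the curated index."""
    host = urlparse(url).netloc
    if host not in CURATED_SITES:
        return False                            # never seen by an editor
    ctr = clicks / impressions if impressions else 0.0
    return ctr >= MIN_CTR                       # traffic confirms the curation

# usage
assert in_curated_index("https://example.edu/paper", 1000, 50)
assert not in_curated_index("https://adfarm.example.com/page", 1000, 500)
```

The point of the two-part test is the one Chuck makes: human judgment picks the candidate sites, and real user behavior (click-throughs) keeps the list honest.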
In that same year Doug Smith presented a paper he and I had authored at ICSC, and Doug and I had further refined the research to use training data from known-good web pages to begin automating the growth of the index with high-quality documents.

As a product idea it resonated strongly with anyone who used search as part of their job. Blekko was beloved by reference librarians around the world, and had a tremendous following with attorneys and journalists who used it for research, and with students trying to research term papers. As a business model it was less successful. Specifically, the only search 'intent' that makes money is commercial intent. Blekko was unable to pursue subscription access to the index, and because we did not index all content, the engine would do poorly on topics that were not curated, or "long tail" topics[1].

The company had a "3 card monte" game built into the interface: it would show results from Blekko, Bing, and Google (the only three US-based indexes available; today we're back to only two). The user was asked to pick the column with the "best" results, and the votes showed a consistent pattern across contested searches, long-tail searches, and topic searches. A "contested" search was one where a lot of people were attempting to game the algorithm (search for "no fee credit card" some time to see a good example of a contested search); on those Blekko consistently 'won' because our index wasn't influenced by these folks, and so we only returned good pages. On "long tail" searches Google generally won; it has a really, really big index. And on topic searches we would typically tie with Bing (and beat Google), or win outright if the topic was one of our curated topics.

In March of 2015 IBM bought the assets of Blekko and made it part of the Watson Group, where its crawler continues to go out and collect web pages, but now does so in service of building data sets for Watson rather than a search engine.
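The "3 card monte" game implies the columns were shuffled so voters couldn't be swayed by brand. The mechanics of that kind of blind comparison are simple to sketch; everything here (names, column labels, result format) is a stand-in, not Blekko's actual code:

```python
import random

# Hypothetical sketch of a blind "pick the best column" test in the
# spirit of Blekko's "3 card monte" game.
ENGINES = ["blekko", "bing", "google"]

def make_blind_round(results_by_engine):
    """Shuffle engines into anonymous columns A/B/C so the voter
    can't be biased by brand. Returns the columns to display and a
    server-side key mapping column label -> engine."""
    order = list(results_by_engine)
    random.shuffle(order)
    columns = {label: results_by_engine[eng] for label, eng in zip("ABC", order)}
    key = dict(zip("ABC", order))
    return columns, key

def record_vote(key, chosen_label, tally):
    """Credit the engine hidden behind the column the user picked."""
    engine = key[chosen_label]
    tally[engine] = tally.get(engine, 0) + 1

# usage: one simulated round
results = {e: [f"{e}-result-{i}" for i in range(3)] for e in ENGINES}
columns, key = make_blind_round(results)
tally = {}
record_vote(key, "A", tally)   # user judged column A best
assert sum(tally.values()) == 1
```

Aggregated over many rounds and bucketed by query type (contested, long-tail, topic), tallies like this are what would surface the pattern Chuck describes.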
What I learned was that it costs about $7.5M/year, grossed up, to operate a 5-billion-page index with enough bandwidth to serve on the order of 10M queries/day, and you can't make a profit with that unless you build up your own ad network (which Blekko never did). (At 10M queries/day, roughly 3.65B queries a year, $7.5M works out to about $0.002 per query.) People love quality search results, but they aren't actually willing to pay any money for them (or, conversely, they are willing to put up with the queries that return nothing but junk on Google for the sake of the ones that work well). Digital advertising networks are filled with people who make boiler-room sellers of sub-prime mortgages look like angels.

--Chuck

[1] A "long tail" topic is one for which only a handful of web pages exist and which is not referenced widely on the existing web.

On Wed, Jan 18, 2017 at 11:33 AM, Dave Farber <farber () gmail com> wrote:

Begin forwarded message:
From: Thomas Leavitt <thomleavitt () gmail com>
Date: January 18, 2017 at 2:26:30 PM EST
To: Dave Farber <dave () farber net>
Subject: Re: [IP] Chronicle of Higher Education: Google and the Misinformed Public

Dave,

It sounds like there's a market opportunity here for a search engine that explicitly provides context for search results: credibility, fact checking, bias (not as a value judgement), research articles vs. journalism reporting on them, etc. Could also incorporate some form of crowd sourcing, etc. Would be an interesting technical challenge to make this applicable across a broad range of searches, and of course there's the business case (or lack thereof) and going up against Google. On the other hand, it seems like there's a real need for genuine innovation in the space, and some obvious candidates that would likely be interested in executing a buyout of a successful implementation before the company goes to market.
Regards,
Thomas Leavitt

On Jan 17, 2017 10:13 AM, "Dave Farber" <farber () gmail com> wrote:

Begin forwarded message:
From: Lauren Weinstein <lauren () vortex com>
Date: January 17, 2017 at 11:20:06 AM EST
To: nnsquad () nnsquad org
Subject: [ NNSquad ] Chronicle of Higher Education: Google and the Misinformed Public

Chronicle of Higher Education: Google and the Misinformed Public
http://www.chronicle.com/article/Googlethe-Misinformed/238868

Digital media platforms like Google and Facebook may disavow responsibility for the results of their algorithms, but they can have tremendous -- and disturbing -- social effects. Racist and sexist bias, misinformation, and profiling are frequently unnoticed byproducts of those algorithms. And unlike public institutions (like the library), Google and Facebook have no transparent curation process by which the public can judge the credibility or legitimacy of the information they propagate. That misinformation can be debilitating for a democracy -- and in some instances deadly for its citizens.

- - -

--Lauren--
REPORT Fake News Here! - https://factsquad.com
CRUSHING the Internet Liars - https://vortex.com/crush-net-liars