Interesting People mailing list archives

VERY GOOD READING query re Google actions


From: Dave Farber <dave () farber net>
Date: Tue, 17 Nov 2009 14:42:02 -0500





Begin forwarded message:

From: Eric Glover <eric_ip () ericglover com>
Date: November 17, 2009 2:01:43 PM EST
To: dave () farber net
Cc: ip <ip () v2 listbox com>
Subject: Re: [IP] query re Google actions


As an expert in web search who worked for two search engines I would like to point out a few key points:

Before I write them - I do not work for Google and never have, and I do not claim any inside knowledge of what Google actually does. I know what other engines do and what Google also appears to do from experiments and observation.

First - it is a very difficult matter to determine exactly why one page ranks higher than another for a given query without getting the raw features and actual algorithm from Google. Google's top-level ranking algorithm is not public knowledge, so we can only infer, as Mr Farance has done.

Second - there are many other factors which can influence the ranking of a particular page for a given query, and there could easily be some other feature (not using meta-keywords) which can cause the same effects (I list some below).

Specifically: Web graph, click-data and user behaviors, and various domain-trust features are likely other factors which can significantly affect ranking.

It is my belief (as an expert who has worked for two commercial search engines in the past in a variety of roles) that the complex ranking function of Google as well as other engines is not a simple 'linear function'. Specifically, depending on one feature other features can be positive or negative. For example, a page which 'looks like a homepage' might consider in-link and title features differently than a page which looks like a 'reference page'. A page on a 'trusted domain', or linked from a 'trusted domain' might consider meta-keywords (not commenting on the particular pages in the disclosure), but otherwise they would be ignored or considered 'negative'. Maybe a page which was recently "discovered" might only consider the first 100 words of the document for ranking purposes, etc...
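To make the 'not a simple linear function' point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the feature names, the weights, and the gating rules are invented for illustration and are not taken from Google or any other engine. It only shows how one feature can change the sign or weight of another.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # All fields are hypothetical features invented for this illustration.
    text_score: float          # text-match score in [0, 1]
    inlink_score: float        # in-link score in [0, 1]
    meta_keyword_match: bool   # query term appears in the meta-keywords
    looks_like_homepage: bool
    on_trusted_domain: bool


def score(page: Page) -> float:
    """Toy non-linear scoring: some features gate how others are used."""
    # The page type changes how much in-links count relative to text.
    inlink_weight = 0.6 if page.looks_like_homepage else 0.2
    total = (1.0 - inlink_weight) * page.text_score \
        + inlink_weight * page.inlink_score

    # The same meta-keyword match helps on a trusted domain and hurts elsewhere.
    if page.meta_keyword_match:
        total += 0.1 if page.on_trusted_domain else -0.1
    return total
```

With this toy function, two pages with identical text and in-link scores and the same meta-keywords end up on opposite sides of a page that has no meta-keywords at all, depending only on the 'trusted domain' flag.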

A search engine might not use a feature directly but might still use it indirectly. Take Pagerank, a feature which I believe is not used at all directly for run-time relevance calculation by any serious commercial search engine. Say some researcher does an offline experiment that calculates Pagerank and from that builds a list of 'trusted domains'. The 'is trusted domain' flag is then a feature which is used by the ranking to decide what other features to consider or how to use them.
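A rough sketch of that kind of offline step, in Python, might look like the following. The link graph, the cut-off (top_k), and the function names are assumptions made up for this example. The Pagerank computation is the textbook power iteration, not any engine's production code. The point is only that the run-time ranker would see a yes/no 'trusted' flag, never the Pagerank value itself.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration Pagerank over a domain-level link graph.

    links: dict mapping each domain to the list of domains it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def trusted_domains(links, top_k=1):
    """Offline step: keep only the top_k domains by Pagerank (hypothetical rule)."""
    rank = pagerank(links)
    return set(sorted(rank, key=rank.get, reverse=True)[:top_k])


# Hypothetical link graph for illustration only.
example_links = {
    "a.example": ["hub.example"],
    "b.example": ["hub.example"],
    "c.example": ["hub.example"],
    "hub.example": ["a.example"],
}
trusted = trusted_domains(example_links)
is_trusted = "hub.example" in trusted  # the only thing the ranker would see
```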

Specific confounding features:
#1: Web graph - it is important to note who/what currently (and previously) linked into those pages - what concepts were discussed on those pages. Although academic papers talk about in-bound anchortext, it is likely Google and other engines consider much more than simply 'words in links' (I have personally published papers about using windows of words near anchortext).

Also, I suspect if you have a company who knows what a Meta-keyword is and does a "campaign" to optimize, they are also asking "friends" to link to them with the desired keywords, as well as optimizing their own site to add those words on inlinks.

#2: Click logs and search behavior patterns. It is well known that search engines consider user behavior to aid in ranking. Let's say you have page A and page B - maybe for a given query page B is clicked more often - then the engine might (over time) raise the ranking of page B - even though 'text-based attributes' might favor page A over B. Likewise, let's say you have a user who enters query q1 - they do a search, then they do a search for q2 - the engine might make connections between q1 and q2 and the pages the users clicked on - so a word not on a page may still cause a page to rank.

#3: Search behavior and site popularity - although I do not know the extent to which this is used by Google - an engine which has user action data from toolbars or other logs might leverage those to boost "popular sites" - so if a site is viewed often then it might rank higher. So if you do a 'campaign' you buy ads and do other things to create lots of "views" - these views may indirectly translate to higher ranking.
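The click-log point in #2 can be sketched in a few lines of Python. This is a toy model under stated assumptions: the blending formula, the click_weight value, and the session format are all invented here, and real engines presumably do something far more elaborate. The first function shows clicks overriding a text-based preference for page A. The second shows how a page clicked after q2 can become associated with the earlier query q1.

```python
from collections import Counter, defaultdict


def click_adjusted_order(text_scores, clicks, click_weight=0.5):
    """Blend text scores with click share for one query (hypothetical formula).

    text_scores: {page: text-based score}
    clicks: {page: number of clicks observed for this query}
    """
    total_clicks = sum(clicks.values()) or 1  # avoid dividing by zero
    combined = {
        page: score + click_weight * clicks.get(page, 0) / total_clicks
        for page, score in text_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


def associate_reformulations(sessions):
    """Credit a page clicked after q2 to the preceding query q1 as well.

    sessions: list of sessions, each a time-ordered list of
    (query, clicked_page_or_None) pairs.
    """
    associations = defaultdict(Counter)
    for session in sessions:
        for (q1, _), (q2, page) in zip(session, session[1:]):
            if page is not None and q1 != q2:
                associations[q1][page] += 1
    return associations
```

For example, with text scores of 0.6 for A and 0.5 for B, and B receiving 90 of 100 clicks, click_adjusted_order puts B ahead of A. With no click data at all, A stays first.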

Given the political nature of this site, it might have gotten more views, more in-links (with appropriate keywords on or near the in-bound links), more clicks, or higher 'trusted' scores (or lower). It is not possible without direct knowledge of the ranking function to determine the effect of meta-keywords with certainty. HOWEVER - this does not mean you can't do experiments to show that meta-keywords are (or are not) likely used by search engines in limited circumstances. It means that reaching a correct conclusion is quite difficult. Past experience shows meta-keywords, when used directly, are noisy at best and detrimental at worst.

As an expert I can say that often you have "features" which are used indirectly since they are noisy, and meta-keywords is a "noisy feature" in that spammers have been known to (ab)use it, but it can help in selected cases. I suspect IF it is used indirectly then all stories could be consistent. I also know that engines use lots of "indirect data" from other sources which can complicate the ability to determine if one particular feature had any effect.

As a simple proof of this - remember the "Google Bombs" http://en.wikipedia.org/wiki/Google_bomb - where users caused sites to rank for specific query terms (even though the words did not appear on those pages). In addition - it is entirely possible that some other source created extra tags - say an algorithm scanning Wikipedia (http://en.wikipedia.org/wiki/Chai_Ling) might automatically associate "Jenzabar" with the pages in question as "strong tags".
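The mechanism behind a "Google Bomb" can be illustrated with a very small Python sketch. It assumes, purely for illustration, an index built only from the anchor text of in-bound links. The function names and the example URLs are made up, and no real engine is this simple. It shows how a page can be returned for a term that never appears in its own text.

```python
from collections import defaultdict


def build_anchor_index(links):
    """Index target pages by the words used in links pointing at them.

    links: iterable of (source_url, target_url, anchor_text) tuples.
    """
    index = defaultdict(lambda: defaultdict(int))
    for _source, target, anchor_text in links:
        for term in anchor_text.lower().split():
            index[term][target] += 1
    return index


def pages_for(index, term):
    """Return pages linked with this term, most frequently linked first."""
    hits = index.get(term.lower(), {})
    return sorted(hits, key=hits.get, reverse=True)


# Hypothetical links for illustration only.
example_links = [
    ("http://a.example/", "http://target.example/", "some phrase"),
    ("http://b.example/", "http://target.example/", "Some Phrase"),
    ("http://c.example/", "http://other.example/", "phrase"),
]
example_index = build_anchor_index(example_links)
```

Here pages_for(example_index, "phrase") lists http://target.example/ first, regardless of what words that page itself contains.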

Hope this helps.

-Eric

Dave Farber wrote:
Begin forwarded message:
From: Paul Levy <plevy () citizen org>
Date: November 17, 2009 7:50:27 AM EST
To: David Farber <dave () farber net>
Subject: Question

In the trademark case where I am defending the documentarists Long Bow Group against a trademark lawsuit by Jenzabar, we have received an "expert report" from an individual named Frank Farance, who claims a long pedigree of involvement in standards and specification development organizations. Mr. Farance insists that Google takes keyword meta tags into account in computing search rankings (even though Google itself has announced that its ranking algorithm does NOT support keyword meta tags). http://pubcit.typepad.com/clpblog/2009/11/jenzabar-expert-witness-claims-that-google-still-uses-keyword-meta-tags.html

I am curious whether others on the list have any comments on his report.
Paul Alan Levy
Public Citizen Litigation Group
1600 - 20th Street, N.W.
Washington, D.C. 20009
(202) 588-1000
http://www.citizen.org/litigation




-------------------------------------------
Archives: https://www.listbox.com/member/archive/247/=now
RSS Feed: https://www.listbox.com/member/archive/rss/247/
Powered by Listbox: http://www.listbox.com
