Interesting People mailing list archives

VERY GOOD READING query re Google actions

From: Dave Farber <dave () farber net>
Date: Tue, 17 Nov 2009 14:42:02 -0500





Begin forwarded message:

From: Eric Glover <eric_ip () ericglover com>
Date: November 17, 2009 2:01:43 PM EST
To: dave () farber net
Cc: ip <ip () v2 listbox com>
Subject: Re: [IP] query re Google actions

As an expert in web search who worked for two search engines, I would like to make a few key points:
Before I write them: I do not work for Google and never have, and I do not claim to have any inside knowledge of what Google actually does for certain, merely what other engines do and what Google also appears to do from experiments and observation.
First - it is a very difficult matter to determine exactly why one page ranks higher than another for a given query without getting the raw features and actual algorithm from Google. Google's top-level ranking algorithm is not public knowledge, so we can only infer, as Mr Farance has done.
Second - there are many other factors which can influence the ranking of a particular page for a given query, and there could easily be some other feature (not using meta-keywords) which can cause the same effects (I list some below).
Specifically: Web graph, click-data and user behaviors, and various domain-trust features are likely other factors which can significantly affect ranking.
It is my belief (as an expert who has worked for two commercial search engines in the past in a variety of roles) that the complex ranking function of Google, as well as of other engines, is not a simple 'linear function'. Specifically, depending on the value of one feature, other features can count as positive or negative. For example, a page which 'looks like a homepage' might have its in-link and title features considered differently than a page which looks like a 'reference page'. For a page on a 'trusted domain', or linked from a 'trusted domain', the ranking might consider meta-keywords (not commenting on the particular pages in the disclosure), but otherwise they would be ignored or considered 'negative'. Maybe for a page which was recently "discovered" only the first 100 words of the document might be considered for ranking purposes, etc.
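A minimal Python sketch of such a non-linear function follows. The feature names and weights are invented for illustration and are not taken from any real engine; the only point is that one feature (domain trust) decides whether another (meta-keyword match) counts for or against a page.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # All three fields are hypothetical features chosen for this sketch.
    text_match: float          # query/body text match in [0, 1]
    meta_keyword_match: float  # query/meta-keyword match in [0, 1]
    trusted_domain: bool       # page is on, or linked from, a trusted domain


def score(page: Page) -> float:
    """Toy non-linear scorer: one feature decides how another is used."""
    s = page.text_match
    if page.trusted_domain:
        # On a trusted domain, meta-keywords count as a small positive signal.
        s += 0.2 * page.meta_keyword_match
    else:
        # Elsewhere the same feature is treated as a likely spam signal.
        s -= 0.1 * page.meta_keyword_match
    return s
```

Two pages with identical text and identical meta-keywords get different scores here, so an outside observer comparing only the keywords could not tell what role they played.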
A search engine might not directly use a feature - say PageRank, a feature which I believe is not used at all directly for run-time relevance calculation by any serious commercial search engine - and yet that feature might be 'indirectly used'. Say some researcher does an offline experiment that calculates PageRank and from that builds a list of 'trusted domains'; 'is trusted domain' is then a feature which is used by the ranking to decide what other features to consider or how to use them.
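The offline/run-time split described above can be sketched as follows. This is an assumption-laden illustration, not any engine's actual pipeline: the function names, the threshold, and the use of a textbook power-iteration PageRank over a domain-level link graph are all hypothetical choices.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict {node: [outlinks]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


def build_trusted_domains(links, threshold):
    """Offline step: keep only a yes/no 'trusted' list, not the scores."""
    return {d for d, r in pagerank(links).items() if r >= threshold}


def is_trusted_domain(domain, trusted):
    """Run-time feature: a set lookup; PageRank itself is never consulted."""
    return domain in trusted
```

At query time only `is_trusted_domain` would run, so PageRank influences the ranking without ever appearing in the run-time relevance calculation.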
Specific confounding features:
#1: Web graph - it is important to note who/what currently (and previously) linked into those pages, and what concepts were discussed on those pages. Although academic papers talk about in-bound anchortext, it is likely Google and other engines consider much more than simply 'words in links' (I have personally published papers about using windows of words near anchortext).
Also, I suspect if you have a company who knows what a Meta-keyword is and does a "campaign" to optimize, they are also asking "friends" to link to them with the desired keywords, as well as optimizing their own site to add those words on inlinks.
#2: Click logs and search behavior patterns. It is well known that search engines consider user behavior to aid in ranking. Let's say you have page A and page B - maybe for a given query page B is clicked more often - then the engine might (over time) change the ranking of page B, even though 'text-based attributes' might favor page A over B. Likewise, let's say you have a user who enters query q1 - they do a search, then they do a search for q2 - the engine might make connections between q1 and q2 and the pages the users interacted with for those clicks, so a word not on a page may still cause a page to rank.
#3: Search behavior and site popularity - although I do not know the extent to which this is used by Google - an engine which has user action data from toolbars or other logs might leverage those to boost "popular sites", so if a site is viewed often then it might rank higher. So if you do a 'campaign' you buy ads and do other things to create lots of "views" - these views may indirectly translate to higher ranking.
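As an illustration of the page A / page B case in #2, here is a minimal Python sketch of click-log reranking. The `ClickModel` class, the blending formula, and the `click_weight` value are assumptions made up for this example; real engines presumably use far more elaborate models.

```python
from collections import defaultdict


class ClickModel:
    """Toy model that blends a text score with an observed click-through rate."""

    def __init__(self, click_weight=0.5):
        self.click_weight = click_weight     # hypothetical blending weight
        self.impressions = defaultdict(int)  # (query, page) -> times shown
        self.clicks = defaultdict(int)       # (query, page) -> times clicked

    def record(self, query, page, clicked):
        """Log one impression of a page for a query, and whether it was clicked."""
        self.impressions[(query, page)] += 1
        if clicked:
            self.clicks[(query, page)] += 1

    def ctr(self, query, page):
        """Click-through rate for this (query, page), or 0.0 if never shown."""
        shown = self.impressions[(query, page)]
        return self.clicks[(query, page)] / shown if shown else 0.0

    def rerank(self, query, text_scores):
        """Order pages by a weighted mix of text score and click-through rate."""
        def blended(page):
            w = self.click_weight
            return (1 - w) * text_scores[page] + w * self.ctr(query, page)
        return sorted(text_scores, key=blended, reverse=True)
```

With no click data, the page with the better text score ranks first; once the logs show that users prefer the other page for that query, the order can flip without any change to either page's content.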
Given the political nature of this site, it might have gotten more views, more in-links (with appropriate keywords on or near the in-bound links), more clicks, or higher 'trusted' scores (or lower). It is not possible without direct knowledge of the ranking function to determine the effect of meta-keywords with certainty. HOWEVER - this does not mean you can't do experiments to show that meta-keywords are (or are not) likely used by search engines in limited circumstances, but to conclude this correctly is quite difficult. Past experience shows Meta-keywords, when used directly, are noisy at best, detrimental at worst.
As an expert I can say that often you have "features" which are used indirectly since they are noisy, and meta-keywords is a "noisy feature" in that spammers have been known to (ab)use it, but it can help in selected cases. I suspect IF it is used indirectly then all stories could be consistent. I also know that engines use lots of "indirect data" from other sources, which can complicate the ability to determine if one particular feature had any effect.
As a simple proof of this - remember the "Google Bombs" (http://en.wikipedia.org/wiki/Google_bomb), where users caused sites to rank for specific query terms (even though the words did not appear on those pages). In addition - it is entirely possible that some other source created extra tags - say an algorithm scanning Wikipedia (http://en.wikipedia.org/wiki/Chai_Ling) might automatically associate "Jenzabar" with the pages in question as "strong tags".
Hope this helps.

-Eric

Dave Farber wrote:
Begin forwarded message:
*From:* Paul Levy <plevy () citizen org>
*Date:* November 17, 2009 7:50:27 AM EST
*To:* David Farber <dave () farber net>
*Subject:* *Question*
In the trademark case where I am defending the documentarists Long Bow Group against a trademark lawsuit by Jenzabar, we have received an "expert report" from an individual named Frank Farance, who claims a long pedigree of involvement in standards and specification development organizations. Mr. Farance insists that Google takes keyword meta tags into account in computing search rankings (even though Google itself has announced that its ranking algorithm does NOT support keyword meta tags). http://pubcit.typepad.com/clpblog/2009/11/jenzabar-expert-witness-claims-that-google-still-uses-keyword-meta-tags.html
I am curious whether others on the list have any comments on his report.
Paul Alan Levy
Public Citizen Litigation Group
1600 - 20th Street, N.W.
Washington, D.C. 20009
(202) 588-1000
http://www.citizen.org/litigation




-------------------------------------------
Archives: https://www.listbox.com/member/archive/247/=now
RSS Feed: https://www.listbox.com/member/archive/rss/247/
Powered by Listbox: http://www.listbox.com
