SEO UK

SEO UK - free SEO (search engine optimisation) advice, articles and tools for UK companies - www.seo4uk.com
google click monitoring

Is your every click being watched by the search engines?

by Paul Ireland, Tuesday 27 May 2008 (updated 8 July 2008, 26 Feb 2010)

UPDATE 26 Feb 2010 - It appears that recent google updates have obfuscated their search results code (made it far more complex and difficult to decipher) beyond all recognition compared with how it looked when this article was first written. The code was complex even then, but it is now on a whole different level of complexity and obfuscation, with more of it generated on the fly in the browser using JavaScript, AJAX (Asynchronous JavaScript And XML), document.write, and DOM element creation. Sneaky click monitoring was going on before and is still going on with the new code. It is just more difficult to find where and how, but even at a quick glance the tell-tale monitoring signs of "mousedown" and "new image" JavaScript code are there.
Why this extra level of obfuscation? Perhaps to make web scraping results much more difficult, perhaps to make other UI features like fades easier to add dynamically, perhaps something to do with Chrome (there is a special chrome.js which might be loaded for chrome browsers), perhaps to offload some processing power from the server to the client browser, perhaps to hide underlying intent further, perhaps to separate code from results content (it now appears that the HTML code sent to the browser does not contain any results, so the results must be loaded dynamically by the code in the browser)?
At present the following article refers to google search before this recent change in code, mainly because it is easier to follow and understand. If you really want to have a look at google's new code, although a health warning should accompany it, and don't expect me to explain it, then have a look at this recent search page code.


   Google Big Brother watching your mouse clicks
Google is tracking and monitoring what users click on in the natural results of google search pages, and in my opinion they are doing it in quite a secretive way, which tends to go against their "Don't be evil" motto and corporate code of conduct.

I had a revelation this weekend. I got something so wrong about Google that when it was eventually pointed out to me, not only did it make me rethink the technical issue in question, but it also made me re-evaluate my overall opinion of Google, as well as create this site so I could talk about it, spread the word, and cover other issues in this vein.

It all started with an SEO discussion on a UK business forum, late into a wet UK May bank holiday. The discussion can be found here (my forum name is awebapart.com). The forum thread started off as normal, with somebody seeing their google rankings jumping up and down, possibly due to the latest Google Dewey update, but the discussion also went off on a tangent about Google click monitoring.

Before this discussion my belief was that Google was not this big brother company monitoring everything it could about its website visitors. Out of the big 3 search engines, Google, Yahoo, and Microsoft/MSN Live, I believed that only Yahoo was monitoring user clicks on natural results, and this was plainly visible if you looked at the status bar of your browser when moving the mouse over the Yahoo search results links, which revealed the web address as some complex address starting at the Yahoo site (so your click went back to Yahoo first, to be logged, before redirecting you to your chosen site).

At the time, I quite confidently pointed out that google's natural result links were direct links to the external website and therefore no logging of clicks was taking place. My confidence, or false sense of security, didn't come from having analysed the code in a google search results page in detail. At the time I didn't think I had to. To me it seemed obvious at a glance what was going on, and there was no way that Google, the "Don't be evil" company, was going to do anything underhand or non-transparent and hide the fact that it was logging clicks. It just didn't make sense from a technical or PR point of view (i.e. wouldn't there be a public outcry if it was discovered that Google was doing this and trying to cover it up?).

Anyway, the discussion continued and it turned out I was so wrong on this issue. Another forum member pointed out the Javascript code where the logging is performed, and after looking into this I was left with a very different opinion not just on this issue, but on Google as a whole.

Google does log your clicks on the natural results. It does this by calling a JavaScript function just before the new link (external website) is loaded. The function is called from the onmousedown event handler of the result link's A tag:

<a href="http://www.awebapart.com/home/sitebuilder_features" class="l"
onmousedown="return clk(this.href,'','','res','3','')">sitebuilder features</a>

and the clk function creates a fake image load, asking for an image with a complex url which includes the monitoring information, a common method for logging:

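// Called from each result link's onmousedown handler.
window.clk = function(b, c, d, e, f, g) {
  if (document.images) {
    // Use encodeURIComponent where available, otherwise escape.
    var a = encodeURIComponent || escape;
    // Requesting a throwaway image sends the click details back to the server.
    (new Image).src = "/url?sa=T" + (c ? "&oi=" + a(c) : "") + (d ? "&cad=" + a(d) : "") + "&ct=" + a(e) + "&cd=" + a(f) + (b ? "&url=" + a(b.replace(/#.*/, "")).replace(/\+/g, "%2B") : "") + "&ei=my87SI2LFoyg1gb6r_3EDQ" + g;
  }
  return true; // returning true leaves the link's normal behaviour untouched
};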
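
To make the string building easier to follow, here is a minimal sketch that rebuilds the same logging URL but returns it as a string instead of requesting it through an Image object, so it can be run outside a browser (for example in Node). The function name buildClickLogUrl and the trackingId parameter are my own and purely illustrative. In Google's code the ei value is hard-coded into the page.

```javascript
// Illustrative only: mirrors the URL construction in the clk function above.
// buildClickLogUrl and trackingId are hypothetical names, not Google's.
function buildClickLogUrl(href, oi, cad, ct, cd, extra, trackingId) {
  const enc = encodeURIComponent;
  // Strip any #fragment from the clicked link, then percent-encode it.
  // The "+" replacement only matters for the older escape() fallback,
  // because encodeURIComponent already encodes "+" as %2B.
  const target = href
    ? "&url=" + enc(href.replace(/#.*/, "")).replace(/\+/g, "%2B")
    : "";
  return "/url?sa=T" +
    (oi ? "&oi=" + enc(oi) : "") +
    (cad ? "&cad=" + enc(cad) : "") +
    "&ct=" + enc(ct) +
    "&cd=" + enc(cd) +
    target +
    "&ei=" + trackingId +
    extra;
}

// Same arguments as the example link above: clk(this.href,'','','res','3','')
const exampleLogUrl = buildClickLogUrl(
  "http://www.awebapart.com/home/sitebuilder_features",
  "", "", "res", "3", "", "my87SI2LFoyg1gb6r_3EDQ");
console.log(exampleLogUrl);
```

Empty arguments (oi and cad here) are simply left out of the URL, which is why the real logging URLs are often shorter than the function suggests.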

Dummy Google results page

To see this in action I have created a dummy Google results page based on a real Google results page, and replaced the actual logging code with a JavaScript alert (message box) call, so that you can see the logging URL string.

The logging URL looks something like:

/url?sa=T&ct=res&cd=2&url=http%3A%2F%2Fwww.resulturlclicked.com&ei=my87SI2LFoyg1gb6r_3EDQ

where cd=2 is the position on the page at which the result appears (here, the second result), and
ei=my87SI2LFoyg1gb6r_3EDQ is some user session based tracking id
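
Going the other way, the fields can be pulled back out of a logging URL with the standard URL class. This is only a sketch of how anyone receiving such a request could read it. The function name and the field names in the returned object are my own labels, and what Google actually does with the values on its servers is not something I can see.

```javascript
// Illustrative only: reads the parameters described above out of a logging URL.
// parseClickLogUrl and the returned field names are hypothetical.
function parseClickLogUrl(loggingPath) {
  // The base host is only there so that URL can parse a relative path.
  const params = new URL(loggingPath, "http://www.google.com").searchParams;
  return {
    clickType: params.get("ct"),        // "res" for a natural result
    position: Number(params.get("cd")), // position of the result on the page
    target: params.get("url"),          // the decoded link that was clicked
    trackingId: params.get("ei"),       // the session based tracking id
  };
}

const parsedClick = parseClickLogUrl(
  "/url?sa=T&ct=res&cd=2&url=http%3A%2F%2Fwww.resulturlclicked.com&ei=my87SI2LFoyg1gb6r_3EDQ");
console.log(parsedClick);
```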

Whilst this information alone may not seem much, don't forget that when the logging URL is requested, Google can also log the time the link was clicked, and can match this information up with the user's original search query and the results page they were on, either via the user session id or via the Referer HTTP header (HTTP_REFERER) sent with the request.
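
As a rough illustration of how much the referring address alone gives away, the sketch below reads the search query and results page number out of a search page URL of the kind a browser sends in the Referer header. It assumes the query is in the q parameter and the result offset is in the start parameter, with 10 results per page. The function name is hypothetical, and this is not a claim about how Google's own logging is written.

```javascript
// Illustrative only: recover the query and page number from a search page URL.
// Assumes q holds the query and start holds the result offset (10 per page).
function describeSearchFromReferer(referer, resultsPerPage = 10) {
  const params = new URL(referer).searchParams;
  // A missing start parameter means the first page of results.
  const start = parseInt(params.get("start") || "0", 10);
  return {
    query: params.get("q"), // "+" in the query string is decoded to a space
    page: Math.floor(start / resultsPerPage) + 1,
  };
}

const searchInfo = describeSearchFromReferer(
  "http://www.google.co.uk/search?q=pet+shop&start=10");
console.log(searchInfo.query, searchInfo.page);
```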

Since Google compresses its pages (stripping out white space, code indents, new lines, etc), and in turn obfuscates them, it is difficult to see what is going on under the hood. Therefore the dummy page has been reformatted and its code beautified to make it easier for you to see what is going on - just click view source.

Since the logging code is executed on a mousedown, you will notice that it not only logs when you perform a normal click, but also when you right click on a link to either "Copy Link Location" or open it in a new tab or window. Interestingly, no logging is performed if the user avoids the mouse and navigates to the links using the Tab key and the Return key.

Thanks to Duane on the forum for pointing this out to me (and the error of my ways).

How long has google been doing this?

It is difficult to tell, but my best guess is since November 2003. My guess is based on forum information, googling, and looking at past google results pages (old google results pages which have been saved as part of other websites - although this last piece of research cannot be entirely accurate since website owners may have altered the google results code before saving the page to their websites).

Why isn't it more widely known?

It isn't something that is easy to see when you view source, it isn't something that is widely reported, it isn't something that you can investigate easily unless you know what to google for, and it isn't something that is easy to track back in time since sites like www.archive.org (the wayback machine) do not archive google results pages. It is also a topic which was discussed on the webmasterworld.com private forum, which requires a paid subscription to join, hence its limited exposure. It is also a topic which doesn't surprise some people: misinformed novices who don't understand how search engines rank sites sometimes assume that this is exactly how it is done, by monitoring which sites are clicked on the most (in a kind of Alexa traffic monitoring / UK top 40 singles chart kind of way).

What has google got to say about it?

Almost hidden away (before July 2008), 4 clicks away from the standard google search page, you will find their privacy policy (from the search page click on "About Google" at the bottom, then click on "Privacy Policy" at the bottom, then click on their main "Google Privacy Policy" in the middle, then click on "Privacy FAQ" on the left). In item 5 you will see the following:

5. What information does Google receive if I click on a link displayed on Google?

When you click on a link displayed on Google, the fact that you clicked on the link may be sent to Google. In this way, Google is able to record information about how you use our site and services.

We use this information to improve the quality of our services and for other business purposes. For example, Google can use this information to determine how often users are satisfied with the first result of a query and how often they proceed to later results. Similarly, Google can use this information to determine how many times an advertisement is clicked in order to calculate how much the advertiser should be charged.


This statement talks generally about clicking on links, and since it is a general policy it applies to other services, not just the search service, which can make you think that it doesn't apply in this case. It does not make clear the difference between clicks on google AdWords adverts (clicks we naturally assume are logged, otherwise it wouldn't be "pay per click" for the advertisers) and clicks on google natural results.

(Update: in June 2008 Ask publicly named and shamed itself and Google about this practice of hiding privacy pages a few clicks away from the home page. Ask took the lead and announced that it would place privacy links on its home page. Very soon afterwards Google followed suit.)

Why is google doing this?

This is speculation, but currently I think google is doing it because it can, for internal quality assurance, and because it might use this information in the future - a bit like supermarkets monitoring people's purchases with loyalty cards. I have heard some people speculate that google might use this information as one of the 200 or so factors it uses for ranking, but I have yet to see any evidence of this. If it were so, the Pet Shop Boys pop site wouldn't appear on the first page of google when people search for pet shop. I also think it would be open to click fraud / manipulation.

I very much doubt that google would seriously use this information as part of some realtime ranking factor, because if it did, it would be so open to spam, fraud, and manipulation. I'm not talking about hiring cheap labour to provide fake interested user stats, I'm talking about automated programs which can simulate user clicks with user randomness and variation programmed in (this technology has been around for years in the form of automated test tools, desktop tools simulating one or several users, and server tools which can simulate hundreds or thousands of users). It is a lot easier to write these kinds of programs to simulate users searching and browsing than it is to write automated programs to add spam links to websites for link building purposes (spamming forums, guestbooks, blogs, cms's, feedback/review areas). The latter, though much more difficult, has already been done, and the former has been partially done in the form of rank checkers and automated testing techniques. Tie this in with viruses and zombie pcs and that would be the spammers' weapon of choice for creating thousands/millions of fake users to manipulate the new, even more flawed, system. Even if google have tested the water with this new system, surely there are sensible guys at google who would have realised by now that the system is flawed and would have removed it, or at least those parts which are flawed. If not, and they go ahead with this, they will be in for an embarrassing and rude awakening.

There is some SEO community talk/speculation about google going a stage further and monitoring, and using for ranking purposes, what happens after you click (either by onsite google analytics, user google toolbar monitoring, onsite adsense monitoring, further clicks on the google search results etc). Some SEO people are saying that this information could be used to determine how useful a site is, and for example if the user bounces back to google this means the site wasn't useful. I personally do not agree with this. I can give a few examples where even if google was capturing this information for serps purposes, it would be meaningless anyway:

1. What constitutes a useful site and what constitutes a useless site? If google can monitor what search results get clicked on, and monitor by other means (google analytics, google toolbar) whether you visit a number of pages within that site, what meaning can we get out of how many pages on the site were viewed? A useless site might have one page view (you know straight away the site is useless), or several page views (you browse the site looking for what you want, can't find it, give up and then realise it is useless). Likewise a very useful site could have exactly the same pattern: one page view because exactly what you are looking for is on that one page, or several page views because it is interesting. A very useful site might even have no pages viewed and no clicks from google results, because the answer the user is looking for is in the google results description.

2. When you click on a few results, how could google work out that one is more appropriate than another? I've heard some people mention time: if you click on result 1, then 5 seconds later you click on result 2, then 5 seconds later you click on result 3, that could tell google that results 1 and 2 were no good and result 3 was better? But what about people who multi-task, and want to compare different sites (e.g. prices), and simply open up all 3 results in different tabs on their browser, or in 3 different windows for later viewing? In a windowed multi-tasking environment, just because you start clicking on other web pages it does not mean that you are finished with the earlier ones.
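
The timing problem in point 2 can be shown with a small sketch. It implements the naive rule some people suggest (a click quickly followed by another click means the first result was rejected) and feeds it two made-up click logs: one from a user who really did bounce off results 1 and 2, and one from a user who opened all three results in tabs to compare them. The function name, the data and the 10 second threshold are all hypothetical. I have no evidence that google applies any such rule.

```javascript
// Illustrative only: a naive "quick next click means rejected" heuristic.
// A click followed by another click within maxGapSeconds is treated as a
// rejected result. The last click in a log is never marked as rejected.
function naiveRejectedResults(clicks, maxGapSeconds) {
  const rejected = [];
  for (let i = 0; i + 1 < clicks.length; i++) {
    const gap = clicks[i + 1].atSeconds - clicks[i].atSeconds;
    if (gap <= maxGapSeconds) rejected.push(clicks[i].position);
  }
  return rejected;
}

// User A: looked at result 1, bounced back, tried 2, bounced back, settled on 3.
const bouncingUser = [
  { position: 1, atSeconds: 0 },
  { position: 2, atSeconds: 5 },
  { position: 3, atSeconds: 10 },
];
// User B: opened results 1, 2 and 3 in new tabs to compare prices later.
const tabOpeningUser = [
  { position: 1, atSeconds: 0 },
  { position: 2, atSeconds: 5 },
  { position: 3, atSeconds: 10 },
];

const verdictA = naiveRejectedResults(bouncingUser, 10);
const verdictB = naiveRejectedResults(tabOpeningUser, 10);
console.log(verdictA, verdictB);
```

Both logs are identical, so the heuristic has to give the same verdict for both users, even though only one of them actually rejected anything.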

And again if google was affecting serps in this way, it would again be open to click fraud and manipulation.

The words in this article are copyright seo4uk.com 2008 - you do not have permission to reproduce, or cut or paste, these words into any other website, forum or any other electronic or printed media.
