Web Analytics: Log Files vs. Cookie-Based Tracking

A few days ago I sent this off to Jason Dowdell in response to his post about the recent cookie tracking scare/frenzy. By the time I finished the email I began to think it might make a decent post…

Hi Jason,

I just recently came across your blog (although I’ve known of it for a little while) and specifically your post on using cookies vs. log files vs. flash for tracking. (http://www.marketingshift.com/2005/03/cookies-vs-flash-for-client-side.cfm#comments)

IMO (and probably most everyone else’s), the deletion of cookies and the possible need for an alternative means of accurate, deep visitor tracking has got to be one of, if not the, most important SEM and internet marketing issues of the next year or two.

Personally I’m quite fond of a third-party cookie-based tracking service called HitsLink, which has grown a great deal in popularity over the past year. For about the past two years I’ve run my own private SEO firm and used it as my main means of tracking and statistical analysis. As most of my clients are small and simply need basic traffic and campaign/conversion stats, it has proved a great value.

Here’s where you may actually be interested though 🙂 I recently began working for a large firm as their in-house SEO specialist. My first task in the role has been to improve the stats package we’ve been using, which until a few weeks ago was an older version of WebTrends. It gave us virtually no conversion stats (only traffic), and I highly questioned its accuracy, especially regarding search engine referral numbers (particularly for certain key phrases). I believe, however, that this can be a common issue with log file analytics in general, not just WebTrends.

It’s still early, but the traffic reported by the cookie-based method is literally about 50-60% of what WebTrends was reporting from the log files alone. I suppose the good news is that our conversion rate is higher than we thought!
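To make the conversion-rate point concrete, here’s the arithmetic as a quick Python sketch. The visitor and conversion counts below are made-up placeholders, not our actual figures:

```python
# Hypothetical numbers to illustrate the effect on conversion rate.
conversions = 300          # conversions are counted the same under both methods
log_visitors = 20000       # unique visitors per the log files
cookie_visitors = 11000    # ~55% of the log figure, per the cookie-based tool

print(f"Log-based rate:    {conversions / log_visitors:.2%}")     # 1.50%
print(f"Cookie-based rate: {conversions / cookie_visitors:.2%}")  # 2.73%
```

Same number of conversions, roughly half the visitors: the rate nearly doubles.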

I expected a difference, but almost half? In trying to determine the reason for this I came to the same initial conclusion that Eric Peterson from Jupiter Research noted in your comments section: spiders, bots, and the like. These certainly account for a large portion of the discrepancy, but no matter how I crunch the numbers I can’t get crawlers to justify more than 10-12% of the total visitors reported by the log files. Surely some real visitors have JS or cookies disabled and are therefore missed by the cookie method, but again I don’t see that accounting for more than 5%, even by aggressive estimates.
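For anyone who wants to do the same number-crunching, this is roughly what I mean. A minimal sketch, assuming a combined-format Apache access log; the log path is a placeholder and the crawler list is deliberately short (a real analysis needs a much longer one):

```python
import re

# Hypothetical path; point this at your server's access log (combined format).
LOG_PATH = "access.log"

# A deliberately short list of crawler signatures; real logs need many more.
BOT_SIGNATURES = ("googlebot", "slurp", "msnbot", "crawler", "spider")

# In combined log format the user-agent is the last quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

total = bots = 0
with open(LOG_PATH) as log:
    for line in log:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        total += 1
        if any(sig in match.group(1).lower() for sig in BOT_SIGNATURES):
            bots += 1

if total:
    print(f"{bots} of {total} hits ({bots / total:.1%}) came from known crawlers")
```

However generous I make that signature list, I can’t get the crawler share anywhere near the 40-50% gap.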

Anyhow, I wish I could wrap this up with some sort of revelation, but really the main point I thought might interest you is that we didn’t see a 25% difference between the two methods, but something much closer to a full 50% difference.

I believe the cookie method is closer to the actual “truth,” as I did a mini-study using our Google AdWords campaigns. Our cookie-based tool shows X visitors over a given time period from our AdWords campaigns (we tag the landing URLs with a ?source= variable), and the numbers reported by Google in their interface (the clicks they say they have sent us, and are thus billing us for) are consistently within about 5% of that. So in this case I’m using Google’s measure as the “truth” (which admittedly may be flawed), but the logic is that they are counting clicks going out while we are counting them coming in: two different methods, both reporting similar figures. Until I see reason otherwise, I believe our cookie method is more accurate than the log files, and it happens to report just over 50% of the unique visitors we had previously thought were on the site.
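The comparison itself is trivial once you have both counts exported for the same date range. A sketch with placeholder numbers (not our actual data):

```python
# Hypothetical counts for one campaign over the same date range.
adwords_clicks = 4200   # clicks Google reports sending (and billing us for)
tagged_visits = 4050    # visits the cookie tool attributes to the ?source= tag

gap = abs(adwords_clicks - tagged_visits) / adwords_clicks
print(f"Discrepancy between the two counts: {gap:.1%}")  # 3.6%, within our ~5%
```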

Keep up the good blogging! I’ve just added your blog to My Yahoo for tracking and such.

Kind Regards,

Jon Payne

PS – I may recycle this comment on my blog at www.jonpayne.net. Writing this email got the juices flowing…