Monday April 11, 2005
False Positives
Browser Share Comparison

Tim Bray and a few others started publishing their weblogs' browser statistics soon after Firefox started taking off. Overall, the numbers showed that IE's market share was going down. Slowly.

Having spent a bit of time perusing logs over the years, I was skeptical. Not that IE was losing market share, but that it seemed to have so much. Looking through my weblog's logs, I realized something that I didn't think anyone was accounting for: referrer spam.

Every weblog (and its ISP) suffers under a deluge of comment and referrer spam traffic. These are automated attempts to post ads or links in website comments or referrer lists, presumably to dupe you into clicking them or to boost the spam site's Google juice. The spam robots masquerade as browsers visiting a website. They cycle through a pile of fake IP addresses and report themselves as popular browser types to avoid early detection and rejection.

The difference between a spammer and a normal visitor is that the spammer is essentially one-way traffic. Spam robots don't really download the page's HTML, much less request the images contained within. It's all about throughput: hit the server, leave a referrer, then repeat and repeat.

Guess what the most popular browser type for referrer spam is? Internet Explorer.

Armed with this knowledge I decided to test Tim's log script against my own weblog after filtering out the spammers.

As I mentioned, refspam doesn't (currently) request images. I put a little bug graphic in the template for this weblog. Websites use these "bugs" all the time to track users and gather statistics. The file size is small and they are usually transparent. Only by examining a page's HTML would you spot one. Everyone who visits using a browser gets images from a site, which means they get the bug. To make sure it isn't cached, I append a random number to its URL as the page is rendered.

My web bug is a little 1x1 pixel gif that looks like:
<img src="/d.gif?bugz8945712" width=1 height=1>
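For anyone who wants to try the same trick, here is a minimal sketch of how a template could emit such a tag with a fresh random number on every render. It is written in Python purely for illustration and is not the code this weblog runs; the function name `web_bug_tag` and its parameters are made up for the example.

```python
import random


def web_bug_tag(path="/d.gif", prefix="bugz"):
    """Return an <img> tag for a 1x1 web bug with a random query string.

    The random number makes every URL unique, so a browser or proxy
    cache cannot answer the request and it always reaches the server log.
    """
    token = random.randint(0, 9_999_999)  # new value on each page render
    return f'<img src="{path}?{prefix}{token}" width=1 height=1>'


# Hypothetical usage: call once per page render and drop the result
# into the page template.
print(web_bug_tag())
```

The server ignores the query string when serving a static file, so the same tiny GIF comes back no matter what number is appended.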

I've had a bug in this weblog for a month, but only this weekend did I remember it and decide to see what it had uncovered. I filtered April's logs using two patterns: one to extract the web bug traffic and one for traffic sans bug. The output was sent through Tim's Perl script. Here are the percentages for each. You can also click the image at the top of this page to see the results graphed.
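The two-pattern filtering can be sketched roughly as follows. This is an illustration in Python, not the exact patterns I used; it assumes Apache-style access log lines in which the request appears as a quoted `"GET ..."` field, and the name `split_log` and the sample lines are invented for the example.

```python
import re

# Assumption: the bug is always requested as /d.gif?bugz<digits>,
# matching the <img> tag shown earlier.
BUG_PATTERN = re.compile(r'"GET /d\.gif\?bugz\d+')


def split_log(lines):
    """Split access-log lines into (web bug requests, everything else)."""
    bug, other = [], []
    for line in lines:
        (bug if BUG_PATTERN.search(line) else other).append(line)
    return bug, other


# Made-up sample lines, using documentation-only IP addresses.
sample = [
    '192.0.2.1 - - [01/Apr/2005:10:00:00 -0700] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.1 - - [01/Apr/2005:10:00:01 -0700] "GET /d.gif?bugz8945712 HTTP/1.1" '
    '200 43 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]
bug_lines, other_lines = split_log(sample)
print(len(bug_lines), len(other_lines))
```

Each of the two resulting sets of lines can then be fed separately to whatever script computes the browser percentages.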

Log Traffic

Date   IE     Mozilla  Safari  Opera  Lynx
4/1    86.4    7.7      2.1    3.8    0.0
4/2    88.2    6.5      2.5    2.9    0.0
4/3    90.6    4.0      3.3    2.1    0.0
4/4    88.2    7.4      2.6    1.8    0.0
4/5    81.4    7.7      7.1    3.8    0.0
4/6    87.5    7.2      2.2    3.1    0.0
4/7    76.8   14.4      6.3    2.4    0.0
4/8    85.9    9.6      1.3    3.2    0.0
4/9    73.8   14.5      8.6    3.1    0.0

Web Bug Traffic

Date   IE     Mozilla  Safari  Opera  Lynx
4/1    20.9   52.5     25.9    0.7    0.0
4/2    35.1   23.7     41.2    0.0    0.0
4/3    39.2   24.8     35.2    0.8    0.0
4/4    49.4   29.1     21.5    0.0    0.0
4/5    24.6   14.2     55.2    6.0    0.0
4/6    41.4   14.9     40.2    3.4    0.0
4/7    42.1   27.1     29.3    1.4    0.0
4/8    43.0   34.2     22.8    0.0    0.0
4/9    18.6   20.6     60.8    0.0    0.0
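For readers without Tim's script handy, percentages like these could be computed with something along the following lines. This is my own rough Python sketch, not Tim's Perl script, and it will not necessarily classify browsers the same way. It assumes the user agent is the last quoted field on each log line; the names `browser_family` and `browser_share` are made up for the example.

```python
import re
from collections import Counter

# Assumption: combined-format log lines, where the final quoted
# field is the user-agent string.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

# Order matters: Opera of this era often includes "MSIE" in its user
# agent, and Safari includes "like Gecko", so check those two first.
FAMILIES = [
    ("Opera", "Opera"),
    ("Safari", "Safari"),
    ("Lynx", "Lynx"),
    ("Gecko", "Mozilla"),
    ("MSIE", "IE"),
]


def browser_family(user_agent):
    """Map a user-agent string to a browser family, or None if unknown."""
    for marker, family in FAMILIES:
        if marker in user_agent:
            return family
    return None


def browser_share(lines):
    """Return {family: percent of recognized hits} for the given log lines."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        family = browser_family(match.group(1)) if match else None
        if family:
            counts[family] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: round(100.0 * n / total, 1) for family, n in counts.items()}
```

Running `browser_share` once on the web bug lines and once on the remaining lines for each day gives two rows comparable to the tables above.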

I was pretty surprised. Granted, the filtered numbers are heavily swayed towards Safari, but that's to be expected since most of my family and friends have Macs. Still, filtering really knocked the IE traffic down. To verify, I visually scanned the log referrers. Tons of spam, all of it with an IE browser type. In case you are curious about the Opera numbers, I found that some of the refspam is starting to use Opera.

Results will surely vary from one weblog to another. I'm curious whether commercial websites have similar problems or whether this is predominantly a weblog scourge. It does help to illustrate what a heavy burden refspam puts on a site.


Pedraum • 2005-04-11 07:07pm

Very interesting. I would imagine that the phenomenon is most stark on sites that allow comments/trackbacks, but the bad guys are pinging everyone all the time so they're mucking with true browser numbers overall; the question is how much.

I don't get the spikiness in the filtered graph?

jerry • 2005-04-12 06:50am

I think the spikiness can be attributed to my traffic levels and readership. Since I only measured the weblog, the number of visitors per day is relatively small and the user profile varies from day to day.

Karen • 2005-04-12 09:12am

"Everyone who visits using a browser gets images from a site, which means they get the bug."

Oh no, does that mean you gave me cooties?!

Seriously, that is interesting. Blog spam or spam blogs - don't know which is worse.