31 May 2006: Technical Report on TrackBack

We have released a technical report about our TrackBack research and the effectiveness of the Validator in eliminating spam as of May 2006.

Abstract: The TrackBack protocol, conceived as a way to automatically link together web sites which reference one another, has become a new vector for spammers wishing to divert web surfers to their sites. A site which supports TrackBack allows any entity to inject arbitrary HTML code, plus the URL of the sender, into its pages; an attacker need only follow the TrackBack protocol to exploit the system and leverage such a site in a link farm. Current approaches to combating TrackBack spam are limited to content-based filters (of the sort currently used against email and weblog comment spam). In this paper, we propose a way to identify TrackBack spam by considering the relationship between the sender’s URL and the site under attack. In particular, we observe that, for spam TrackBacks, the page at the given URL does not link to the page to which the TrackBack was sent. We have developed software for weblog authors that rejects TrackBacks from sources lacking this reciprocal link. Data collected from our users demonstrates that this test is 100% accurate at identifying and separating spam from legitimate TrackBacks.
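
For readers who want to see the reciprocal-link test in concrete terms, here is a minimal sketch in Python. It is not the Validator's own code: the function names (links_to, validate_trackback), the URL normalization, and the choice to reject unreachable senders are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _normalize(url):
    # Assumption: drop the fragment and any trailing slash so that
    # trivially different spellings of one permalink compare equal.
    return urldefrag(url).url.rstrip("/")


def links_to(html, sender_url, target_url):
    """Return True if the HTML served at sender_url links to target_url."""
    collector = _LinkCollector()
    collector.feed(html)
    target = _normalize(target_url)
    # Resolve relative hrefs against the sender's URL before comparing.
    return any(
        _normalize(urljoin(sender_url, href)) == target
        for href in collector.hrefs
    )


def validate_trackback(sender_url, target_url, timeout=10):
    """Fetch the sender's page and apply the reciprocal-link test."""
    try:
        with urlopen(sender_url, timeout=timeout) as response:
            html = response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # Assumption: an unreachable or malformed sender URL is rejected.
        return False
    return links_to(html, sender_url, target_url)
```

A weblog would call validate_trackback with the URL supplied in the TrackBack and the permalink of the post being pinged, and accept the TrackBack only when the result is True.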

21 May 2006: Bugfix release: 0.7.1.

Version 0.7.1 is now available for download. It fixes a bug on WordPress 2.x blogs: wp_posts.comment_count wasn’t correctly updated when rejecting spam TrackBacks, causing the blog’s front page to show one too many comments (if the current theme displays the number of comments per post).

17 May 2006: Escalation!

An exciting development today, coincidentally hot on the heels of our 0.7 release: The existence of the Validator (and other tools now using the same technique) has forced spammers to change their tactics.

Well, it took them half a year to figure it out, but tonight it happened: I received a spam pingback (spingback?) from a spam blog, and the Validator let it through clean. Which it should have, because indeed, the splog sent its pingback the way any pingback is sent: via a post that contained a valid permalink to my targeted blog posting, obviously obtained via an automated scraping program.

So, what’s happened here? In order to successfully submit a spam TrackBack, a spammer has to:

  1. Set up a real blog (or blog-like website).
  2. Create a stable URL (like a blog post).
  3. Link from this post to your site.
  4. Send you a TrackBack from the stable blog post URL.

We knew this would happen (assuming the writer means TrackBack rather than Pingback) and consider this a victory: The spammer is now giving you PageRank, but more importantly, his website looks just like a blog. It is, effectively, a real blog. Who’s to say he’s a spammer and not just another blogger whose content doesn’t particularly impress you?

At this point, we’ve moved into a more philosophical area of spam prevention. I’ll still argue that this is a victory, however. Consider this: What if we “defeated” email spam to the point that the only “spam” you ever got in your INBOX was personal notes, hand-written by advertisers, custom-tailored to your interests? Is that even spam anymore, or just email you’re not as interested in? I argue we’ve now made this exact leap with TrackBack spam.

17 May 2006: New version 0.7.

We’ve just released version 0.7 of the Validator; this is a strongly recommended upgrade for all our current users. We have improved the reliability and robustness of almost all aspects of the plugin, including spam classification, administration, and data reporting. Go grab version 0.7 and be free of TrackBack spam!