As I’m sure you’ve guessed by the inactivity here (including a period where many of the blog’s posts weren’t even showing up), this project has gone dormant. Many people are still using the Validator (even with WordPress 2.7) with positive results. Here’s the spam sparkline for this blog, for example:
If you’re using the Validator and it works, great! If you’ve had problems or identified bugs, please let us know by leaving a comment on this post. We’ll try to round up outstanding issues and see if we can’t get them addressed at some point.
Just spotted in the wild: a spammer linking to the blogs he sends TrackBacks to, but using CSS tricks to hide all the links.
(The Validator wasn’t able to automatically classify this as spam, of course.)
We have released a technical report about our TrackBack research and the effectiveness of the Validator in eliminating spam as of May 2006.
Abstract: The TrackBack protocol, conceived as a way to automatically link together web sites which reference one another, has become a new vector for spammers wishing to divert web surfers to their sites. A site which supports TrackBack allows any entity to inject arbitrary HTML code, plus the URL of the sender, into its pages; an attacker need only follow the TrackBack protocol to exploit the system and leverage such a site in a link farm. Current approaches to combating TrackBack spam are limited to content-based filters (of the sort currently used against email and weblog comment spam). In this paper, we propose a way to identify TrackBack spam by considering the relationship between the sender’s URL and the site under attack. In particular, we observe that, for spam TrackBacks, the page at the given URL does not link to the page to which the TrackBack was sent. We have developed software for weblog authors that rejects TrackBacks from sources lacking this reciprocal link. Data collected from our users demonstrates that this test is 100% accurate in separating spam from legitimate TrackBacks.
Version 0.7.1 is now available for download. It fixes a bug on WordPress 2.x blogs: wp_post.comment_count wasn’t correctly updated when rejecting spam TrackBacks, causing the blog’s front page to show one too many comments (if the current theme features a display of the number of comments per post).
An exciting development today, coincidentally hot on the heels of our 0.7 release: The existence of the Validator (and other tools now using the same technique) has forced spammers to change their tactics.
Well, it took them half a year to figure it out, but tonight it happened: I received a spam pingback (spingback?) from a spam blog, and the Validator let it through clean. Which it should have, because indeed, the splog sent its pingback the way any pingback is sent: Via a post that contained a valid permalink to my targeted blog posting, obviously obtained via an automated scraping program.
So, what’s happened here? In order to successfully submit a spam TrackBack, a spammer has to:
- Set up a real blog (or blog-like website).
- Create a stable URL (like a blog post).
- Link from this post to your site.
- Send you a TrackBack from the stable blog post URL.
We knew this would happen (assuming the writer means TrackBack instead of Pingback) and consider this a victory: The spammer is now giving you PageRank, but more importantly, his website looks just like a blog. It is, effectively, a real blog. Who’s to say he’s a spammer and not just another blogger out there (the contents of whose blog you’re not particularly impressed by)?
At this point, we’ve moved into a more philosophical area of spam prevention. I’ll still argue that this is a victory, however. Consider this: What if we “defeated” email spam to the point that the only “spam” you ever got in your INBOX was personal notes, hand-written by advertisers, custom-tailored to your interests? Is that even spam anymore, or just email you’re not as interested in? I argue we’ve now made this exact leap with TrackBack spam.
We’ve just released version 0.7 of the Validator; this is a strongly recommended upgrade for all our current users.
We have improved the reliability and robustness of almost all aspects of the plugin, including spam classification, administration, and data reporting. Go grab version 0.7 and be free of TrackBack spam!
It appears that a vulnerability has been found in Movable Type allowing Trackback spammers free rein to sneak links in without rel="nofollow". (I haven’t yet found details of the exact attack being used.)
Here’s a slightly edited version of the message I sent to wp-hackers today:
The first public version (v0.5) of the WP Trackback Validator is now available from the following URL:
The idea behind the Validator, which is under development by students in the Rice University Computer Security Lab, is simple: Trackback URLs that point to pages that don’t link back to your blog are bogus. It’s an easy test to perform, and one that no current Trackback spammer is bothering to try to defeat; since we’ve started using this plugin on our personal WP blogs, our Trackback spam rate has dropped to zero.
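For readers who want to see the shape of the reverse-link test, here is a minimal sketch in Python. It is not the plugin’s code (the Validator itself is a WordPress plugin); the function names `links_back` and `is_legitimate_trackback`, the URL normalization rules, and the choice to require a link to the exact post rather than to any page on the blog are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def normalize(url):
    """Loose comparison key (an assumption): ignore scheme, host case,
    fragment, and a trailing slash on the path."""
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)


def links_back(source_html, source_url, target_url):
    """True if the sender's page contains a link to the targeted post."""
    collector = LinkCollector()
    collector.feed(source_html)
    wanted = normalize(target_url)
    # Resolve relative hrefs against the sender's URL before comparing.
    return any(
        normalize(urljoin(source_url, href)) == wanted
        for href in collector.hrefs
    )


def is_legitimate_trackback(source_url, target_url, timeout=10):
    """Fetch the sender's page and apply the reverse-link test.
    A page that cannot be fetched is treated as failing the test."""
    try:
        with urlopen(source_url, timeout=timeout) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return False
    return links_back(html, source_url, target_url)
```

Keeping the fetch separate from `links_back` means the check itself can be tried on saved HTML without touching the network.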
This test is already present in some other anti-spam plugins, typically included among a hodgepodge of other content-based schemes and rules. If you’re looking for something lightweight that does one job extremely well, please check out the Validator.
The point of the project, in addition to helping to combat Trackback spam, is to collect data. We’re interested in the kinds of spams people get, from which sources, at what rate, etc. We’d like to see if, once everyone starts applying the simple reverse-link check, the spammers step up their assault. In order to help us, the Validator distribution comes with a small shell script which will send us a profile of the spam you’ve caught recently.
So, in short, to save Trackback from an untimely death, try out the Trackback Validator plugin, and send us back some data. In the meantime, enjoy spam-free Trackbacks on your WordPress site.
Nice writeup of the current trends in spam blogs and RSS content theft.
“Understanding Blog and Ping”:
The last six months has seen a massive rise in content theft blogs and spam
Already some in the SEO industry are saying that Blog and Ping is dead due to the massive increase in users, content theft sites and spam blogs. If you’re getting any benefit out of Blog and Ping now, you won’t be for much longer because already some search engines are talking about excluding your sites.