24 May 2005: How to get spammed, part 1.
What remains unclear to me is exactly how trackback spammers find their marks. There’s some evidence that Web spiders look for Movable Type trackback URLs at conventional locations (in particular, the amount of spam can be reduced by obfuscating this URL; see the previous post on best practices).
Presumably there are also more sophisticated spiders which examine pages for the magic bit of RDF/XML that identifies the correct Trackback location; obviously, obfuscating the URL doesn’t help against these. But how do the spiders find weblogs in the first place? It’s likely that they follow links from other weblogs that they already know about. This would perhaps help account for the power-law distribution in spam: the more popular your site, the more inbound links it has, and the more visible it is to attackers.
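To make the "magic bit of RDF" concrete, here is a minimal sketch of how any client, well-behaved or not, might pull a Trackback ping URL out of a page. It assumes the page embeds the usual autodiscovery snippet with a trackback:ping attribute; the sample page, the example.com URLs and the function name find_trackback_urls are all made up for illustration, and a real spider may well work differently.

```python
import re

# Assumption: the page carries an RDF block (often wrapped in an HTML comment)
# whose rdf:Description element has a trackback:ping attribute.
RDF_BLOCK = re.compile(r"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)
PING_ATTR = re.compile(r'trackback:ping="([^"]+)"')


def find_trackback_urls(html):
    """Return every trackback:ping URL found inside RDF blocks in the page."""
    urls = []
    for block in RDF_BLOCK.findall(html):
        urls.extend(PING_ATTR.findall(block))
    return urls


# Hypothetical page, for illustration only.
SAMPLE_PAGE = """
<html><body>
<h1>An entry</h1>
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description
    rdf:about="http://example.com/archives/000042.html"
    dc:identifier="http://example.com/archives/000042.html"
    dc:title="An entry"
    trackback:ping="http://example.com/mt-tb.cgi/42" />
</rdf:RDF>
-->
</body></html>
"""

if __name__ == "__main__":
    for url in find_trackback_urls(SAMPLE_PAGE):
        print(url)
```

Note that nothing here depends on where the ping URL lives, which is why moving it to an unconventional location does nothing against this kind of discovery.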
Here’s an interesting corollary, gleaned from the WordPress 1.5 announcement:
WordPress 1.5 aims to bring the joy back to comments. […] if you forget about your blog for a little while you won’t come back to find your domain a nest of spam (which begets more spam) […]
The quote is part of a section touting the new whitelisting features in version 1.5, meant to keep the comments flowing from known-good writers. But why would spam beget more spam?
What Matt seems to be saying is that, much like email spammers who use various techniques (Web bugs in spam emails, for example) to determine that an email address is “live”, weblog comment and Trackback spammers may rely on successfully posted spam links to identify vulnerable sites. Even if none of your readers click on the spam links, Google sure will; and once a spammer sees your website in his referrer logs, he knows your site is ripe for further exploitation.
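For a sense of how little effort this takes, here is a rough sketch of tallying referring hosts from a web server access log. It assumes the log is in Apache's combined format, where the referrer is the second-to-last quoted field; the sample lines, the victim-blog.example host and the function name referring_hosts are invented for illustration, and I have no idea what tooling spammers actually use.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: Apache "combined" log format:
# host ident user [time] "request" status size "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def referring_hosts(lines):
    """Count how often each external host shows up as a referrer."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the assumed format
        referrer = match.group("referrer")
        if referrer in ("", "-"):
            continue  # no referrer was sent with this request
        host = urlsplit(referrer).hostname
        if host:
            counts[host] += 1
    return counts


# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [24/May/2005:10:15:32 +0000] "GET /pills.html HTTP/1.1" '
    '200 5120 "http://victim-blog.example/archives/000123.html" "Mozilla/5.0"',
    '192.0.2.11 - - [24/May/2005:10:16:01 +0000] "GET / HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0"',
    '192.0.2.12 - - [24/May/2005:10:17:45 +0000] "GET /pills.html HTTP/1.1" '
    '200 5120 "http://victim-blog.example/archives/000123.html#comments" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for host, hits in referring_hosts(SAMPLE_LOG).most_common():
        print(host, hits)
```

Any host that turns up in such a tally is a site where a spam link was posted, survived, and got followed, which is about as good a "live" signal as a Web bug is for email.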
It would be interesting to observe the increase in comment spam associated with a few deliberate clicks to spam websites (using your weblog URL as the referrer). Is it possible to go from zero to spam in this way?
Let’s find out!