Escalation!
An exciting development today, coincidentally hot on the heels of our 0.7 release: The existence of the Validator (and other tools now using the same technique) has forced spammers to change their tactics.
Well, it took them half a year to figure it out, but tonight it happened: I received a spam pingback (spingback?) from a spam blog, and the Validator let it through clean. Which it should have, because the splog sent its pingback the way any pingback is sent: via a post that contained a valid permalink to my targeted blog posting, obviously obtained by an automated scraping program.
So, what’s happened here? In order to successfully submit a spam TrackBack, a spammer has to:
- Set up a real blog (or blog-like website).
- Create a stable URL (like a blog post).
- Link from this post to your site.
- Send you a TrackBack from the stable blog post URL.
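The four steps above are exactly what the Validator's check forces on a spammer. A minimal sketch of that check, assuming the TrackBack's claimed source URL has already been extracted: fetch the source page and accept the ping only if it contains an `<a href>` pointing at our permalink. The names `LinkCollector`, `links_to`, `fetch_page` and `validate_trackback` are hypothetical and this is not the Validator's actual (PHP) code; it uses only Python's standard library.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute href target of every <a> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs against the page URL, drop any #fragment,
                # and ignore a trailing slash so /post and /post/ compare equal.
                absolute, _ = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute.rstrip("/"))


def links_to(source_html, source_url, target_url):
    """True if source_html (served from source_url) links to target_url."""
    collector = LinkCollector(source_url)
    collector.feed(source_html)
    target, _ = urldefrag(target_url)
    return target.rstrip("/") in collector.links


def fetch_page(url, timeout=10):
    """Download a page and decode it using the charset the server declares."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def validate_trackback(source_url, target_url, fetch=fetch_page):
    """Accept a TrackBack only if the claimed source page really links to us."""
    try:
        html = fetch(source_url)
    except (OSError, ValueError, LookupError):
        # Unreachable host, malformed URL, or unknown charset: treat as spam.
        return False
    return links_to(html, source_url, target_url)
```

In a live setting one would call `validate_trackback(ping_url, permalink)` with the default `fetch_page`; passing a stub `fetch` keeps the logic testable offline. As the post says, a splog that really links to you passes this check.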
We knew this would happen (assuming the writer means TrackBack instead of Pingback) and consider this a victory: The spammer is now giving you PageRank, but more importantly, his website looks just like a blog. It is, effectively, a real blog. Who’s to say he’s a spammer and not just another blogger out there (the contents of whose blog you’re not particularly impressed by)?
At this point, we’ve moved into a more philosophical area of spam prevention. I’ll still argue that this is a victory, however. Consider this: What if we “defeated” email spam to the point that the only “spam” you ever got in your INBOX was personal notes, hand-written by advertisers, custom-tailored to your interests? Is that even spam anymore, or just email you’re not as interested in? I argue we’ve now made this exact leap with TrackBack spam.
Well, I installed TrackBack Validator, but I’m still getting trackback spam.
I admit I only have a foggy notion of what these terms mean. I assume trackback is some mechanism that enables your blog to show an alert when it’s referenced by some other blog.
I went to one of the trackback spammer’s pages, listed in the trackback notice:
New trackback on your post # “We live what you read about”
Website: Contract Free Phone Sony PSP (IP: 69.31.32.5, 69-31-32-5.quantum-tech.com)
URI: http://www.itsamobilephone.co.uk/contract-free-phone-psp-sony.php
Excerpt:
Contract Free Phone Sony PSP
Contract Free Phone Sony PSP
I guess this message is saying that the website’s linking to one of my entries. But when I go to
http://www.itsamobilephone.co.uk/contract-free-phone-psp-sony.php
There’s no mention of criticalbeatdowns.com. So what’s going on?
Fucking spammers
One of the frustrating things about the WordPress plugin API is that we can’t suppress the notification email you receive when a TrackBack arrives, even if the Validator decides it’s spam and blocks it. You need to check the post in question (in this case, “We live what you read about”) to see if the TB actually appeared on the site.
If the spam TrackBack is actually appearing, well, the Validator is broken. If the TB isn’t there, it was caught by the Validator, and WordPress was just over-eager in notifying you about the TB.
You seem to be assuming that the spammers won’t delete their blog entry once they’ve passed validation. What’s to prevent the spammers from establishing a “temporary” blog entry with a seemingly valid trackback in it, passing through the TB Validator, and then removing the temporary entry before moving on to their next spam target? I don’t think anything would prevent this, would it?
So, the claim that you pushed the spammer into creating a real blog that subsequently gives you PageRank is likely false. You’ve only required the spammer to program some additional automation into his TB spam-bot.
Don’t get me wrong: I applaud your effort and have recently installed TB Val on my own blog to see how it works. It just seems like you were a little quick to declare victory against the spammers in this post. The cat-and-mouse game will surely continue… we just need to make it as painfully inconvenient as possible for spammers.
Keep it up!
Steve: You’re right, nothing prevents a spammer from creating a temporary blog with the inbound links necessary to defeat the Validator. We observe (a) that we can go back and check TB links from time to time, clearing out spam (or other dead links); (b) almost all spammers choose not to bother with this kind of stuff (because there are other, more vulnerable blogs out there, making the Validator kind of like The Club™ for blog spam).
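Point (a), going back and checking TB links from time to time, amounts to rerunning the same link check later. A minimal sketch, assuming each accepted TrackBack is stored with its claimed source URL and the permalink it targeted; `StoredTrackback` and `sweep` are hypothetical names, and the link check is passed in as a callable so any fetch-and-scan implementation can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StoredTrackback:
    source_url: str  # the url field the ping claimed as its source
    target_url: str  # our permalink that the ping was sent to


def sweep(trackbacks: List[StoredTrackback],
          still_links: Callable[[str, str], bool]) -> Tuple[list, list]:
    """Re-run the link check on stored TrackBacks; return (keep, drop)."""
    keep, drop = [], []
    for tb in trackbacks:
        if still_links(tb.source_url, tb.target_url):
            keep.append(tb)
        else:
            # Link gone (entry deleted, page swapped, or site dead): remove it.
            drop.append(tb)
    return keep, drop
```

A sweep like this only catches the temporary-entry trick if it runs after the spammer has removed the link; a splog that leaves its link in place passes every time.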
[...] Making spammers enter text manually into your blog comment system is indeed very hard to defend against, but it also means a victory for bloggers. Quoting from WordPress Trackback Validator Plugin: The existence of the Validator (and other tools now using the same technique) has forced spammers to change their tactics. [...]
My blog software is custom and written in ASP.NET and C#. I wrote my own trackback system and it uses the same principles you outline. The basic premise is that the standard trackback system works on the assumption that the sender of the trackback request is automatically trusted. I considered that the major failing of the specification, so I added a simple method of authentication to the process. When the trackback is sent, the sender’s referring URL is quoted. Simply make an HTTP request to that page, and if it contains a link to your page, accept the trackback. I wrote three short guides on how to implement it:
Part One is here:
http://www.junto.co.uk/Diary/2005/06/f6d6ba34-73fe-401b-ba7d-85a879df4f61.aspx
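Under the TrackBack specification, a ping is a form-encoded HTTP POST carrying `title`, `excerpt`, `url` and `blog_name`, and the receiver replies with a small XML document in which `<error>0</error>` means accepted and `<error>1</error>` plus a `<message>` means rejected. A minimal sketch of where the referrer check described above slots into that exchange; `parse_ping`, `trackback_response` and `handle_ping` are hypothetical names, and the actual fetch-and-scan is passed in as a callable rather than implemented here.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

XML_HEADER = '<?xml version="1.0" encoding="utf-8"?>\n'


def parse_ping(body):
    """Decode a form-encoded TrackBack ping (title, excerpt, url, blog_name)."""
    fields = parse_qs(body, keep_blank_values=True)
    # parse_qs returns lists; a ping sends each field once, so keep the first value.
    return {name: values[0] for name, values in fields.items()}


def trackback_response(error_message=None):
    """Build the XML reply a TrackBack client expects: error 0 = accepted."""
    if error_message is None:
        return XML_HEADER + "<response>\n<error>0</error>\n</response>\n"
    return (XML_HEADER + "<response>\n<error>1</error>\n"
            f"<message>{escape(error_message)}</message>\n</response>\n")


def handle_ping(body, target_url, source_links_to_target):
    """Validate one ping; source_links_to_target(source, target) does the fetch."""
    ping = parse_ping(body)
    source = ping.get("url", "")
    if not source:
        return trackback_response("Missing required url parameter")
    if not source_links_to_target(source, target_url):
        return trackback_response("Source page does not link to this post")
    # Accepted: a real handler would store ping["url"], ping.get("title"), etc. here.
    return trackback_response()
```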
I have also logged the majority of trackback attempts, which currently stand at 3800 failed attempts. I log the IP address and browser, plus the content of the trackback.
I find that the majority of spam is contained in the excerpt. They know that the link in the core URL field will be marked rel="nofollow" and be useless to them for PageRank, but the excerpt text is generally not checked by the trackback system in WordPress. Most of this spam is aimed at improving the promoted site’s Google PageRank, and for the most part their technique must be working. As with all spam, people wouldn’t do it if it weren’t beneficial to them.
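Since the excerpt is where the payload tends to sit, one could also screen it before publishing. A minimal sketch of one possible heuristic, not something WordPress or the Validator does: count URL-like strings in the excerpt and flag it when the count exceeds a threshold. The regex, the `max_links=0` default and the function names are all assumptions for illustration.

```python
import re

# Assumed pattern: http(s) URLs and bare "www." hostnames, up to the next whitespace.
_URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def excerpt_link_count(excerpt):
    """Number of URL-like strings in a TrackBack excerpt."""
    return len(_URL_RE.findall(excerpt))


def excerpt_looks_spammy(excerpt, max_links=0):
    """Flag an excerpt carrying more than max_links URLs (threshold is a guess)."""
    return excerpt_link_count(excerpt) > max_links
```

The excerpt quoted earlier in this thread ("Contract Free Phone Sony PSP") contains no URL, so this heuristic would not catch that particular ping; it only addresses the link-stuffed excerpts described here.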
Unfortunately I haven’t logged each spam attempt in a database, but I have sent myself an email with the data, and you are welcome to copies of all 3800 if you want. If it can help your research and you can correct the problems that the big blogging engines have created, then I’m right behind you. If Six Apart had thought about trackback properly, they would have made a referrer check part of the specification. I saw it as an inherent weakness the first time I read the proposal. It is surely obvious that you must check the validity of such a request and can’t just accept it on good faith.
Sadly I think that for the most part trackback is already dead and Six Apart are mostly to blame for that. They launched a good idea in an incomplete state and then ignored it. They haven’t updated the specification since August 2004.
The future for blogging backlinks is to use a blog tracker such as Google Blog Search or Technorati. Let the professionals deal with the spammers and you don’t have to worry about it. An API example for Google Blog Search is here if people are interested:
http://particletree.com/features/replacing-trackback-with-blog-search/
It is sad that trackback has suffered so badly. Blog users don’t understand it and unfortunately spammers do.
I announced a new version of my system via Technorati (Pingoat) last week. Since then the volume of trackback attempts has increased dramatically. I have a feeling that trackback spammers monitor such services to track technology changes in the anti-spam blogosphere. They have been hammering my system ever since.
Steve, above, points out what I feel is the final nail in the coffin for trackback. I also see that the flaw with the system we have both developed independently is that spammers will simply create a link to the correct page, send the trackback, wait for us to check and pass it, and then replace their page with the content they are trying to spam people with. Sadly, not only do you then have spam on your website, but by association you are also placing your website in Google’s black books.
Currently the spammer’s software doesn’t support this kind of process (see Steve’s flaw), otherwise we would be seeing more successful trackbacks by the spammers. The development of such software is not exactly hard and will be undertaken by the spammers once the requirement is there. The more people you get to use your plug-in, the greater the probability the spammers will redevelop their software to match.
I plan to move to Blog Search asap.