Debugging spammer mechanics

Submitted by Frederic Marand on

I have long received a high volume of comment spam on this blog, which is why comments have always been pre-moderated. Usually, there is not much to say about it. Not so with one of the spam messages posted today, which unwittingly provided an unexpected insight into the mechanisms currently used by spammers.

Selected excerpt

The actual spam text is attached for the interested reader (I hope the guys at Akismet and Mollom already know about this!), so here is just its beginning:

{
{I have|I've} been {surfing|browsing} online more than {three|3|2|4} hours today, yet I never found any interesting article like
yours. {It's|It is} pretty worth enough for me. {In my opinion|Personally|In my view}, if all {webmasters|site owners|website owners|web owners}
and bloggers made good content as you did, the {internet|net|web} will be {much
more|a lot more} useful than ever before.|
I {couldn't|could not} {resist|refrain from} commenting.
{Very well|Perfectly|Well|Exceptionally well} written!|
{I will|I'll} {right away|immediately} {take hold of|grab|clutch|grasp|seize|snatch}
your {rss|rss feed} as I {can not|can't} {in finding|find|to find}
your {email|e-mail} subscription {link|hyperlink} or {newsletter|e-newsletter} service.
Do {you have|you've} any? {Please|Kindly} {allow|permit|let} me {realize|recognize|understand|recognise|know} {so that|in order that} I {may just|may|could} subscribe.
Thanks.|
[...snip...]

The interesting bits

The interesting thing, of course, is the use of a syntax reminiscent of a regular expression, with multiple levels of alternatives: did you notice the two levels of opening braces? Each pair of braces encloses a set of alternatives separated by "|", and the outer level enables a number of entirely distinct message bodies, apparently to increase the variability of the spam text in order to foil similarity-based anti-spam protection services.
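To make the mechanism concrete, here is a minimal sketch of how such a template could be expanded. This is only my guess at the general idea, not the spammers' actual tool: the function name `spin` and the approach of repeatedly resolving the innermost brace groups are assumptions of mine. This kind of template is often called "spintax".

```python
import random
import re

# Matches an innermost group: braces whose content holds no other brace.
INNERMOST = re.compile(r"\{([^{}]*)\}")


def spin(template, rng=random):
    """Return one random variant of a nested {a|b|c} template.

    Hypothetical helper: each pass replaces every innermost group with
    one of its alternatives, so outer groups become innermost in turn.
    """
    while True:
        template, replaced = INNERMOST.subn(
            lambda match: rng.choice(match.group(1).split("|")),
            template,
        )
        if replaced == 0:
            # No complete group left: the text is fully expanded.
            return template


if __name__ == "__main__":
    sample = "{{I have|I've} been {surfing|browsing} online.|{Very well|Well} written!}"
    for _ in range(3):
        print(spin(sample))
```

Each call picks one alternative per group, so a single template like the one above can yield a large number of superficially different comments. An unmatched brace, like the one left open in the truncated excerpt, is simply left in place by this sketch.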

I wonder whether any of you, actual readers, have similarly interesting spam insights to share?