Latest sites

Quick news

  • 2014-03-27: MongoDB Watchdog module ported to Drupal 8 at the Szeged Dev Days.
  • 2014-01-26: My post on the Symfony web profiler in Silex selected in Week of Symfony. w00t !
  • 2013-10-18: My first commit went into MongoDB today. And, guess what ? It's in JavaScript
  • 2013-09-20 to 29: Working on Drupal 8 EntityAPI at the extended code sprints during and around DrupalCon Prague
  • 2012-08-19: Working on Drupal 8 EntityAPI at Drupalcon Munich
  • 2012-06-15: Working on Drupal 8 EntityAPI at DrupalDevDays Barcelona
  • 2012-03-23: Working on the future Drupal Document Oriented Storage at DrupalCon Denver. D8 or later ? Bets are on Later

Debugging spammer mechanics

I've long been receiving quite high volumes of comment spam on this blog, which is why comments have always been pre-moderated. And, of course, there is usually not much to think of it. Not so with one of the spam messages posted today, which unwittingly provided an unexpected insight into the current mechanisms uses by spammers.

Selected excerpt

The actual spam text is attached for the interested reader (I hope the guys at Akismet and Mollom already know about this !), so here is just its beginning :

{
{I have|I've} been {surfing|browsing} online more than {three|3|2|4} hours today, yet I never found any interesting article like
yours. {It's|It is} pretty worth enough for me. {In my opinion|Personally|In my view}, if all {webmasters|site owners|website owners|web owners}
and bloggers made good content as you did, the {internet|net|web} will be {much
more|a lot more} useful than ever before.|
I {couldn't|could not} {resist|refrain from} commenting.
{Very well|Perfectly|Well|Exceptionally well} written!|
{I will|I'll} {right away|immediately} {take hold of|grab|clutch|grasp|seize|snatch}
your {rss|rss feed} as I {can not|can't} {in finding|find|to find}
your {email|e-mail} subscription {link|hyperlink} or {newsletter|e-newsletter} service.
Do {you have|you've} any? {Please|Kindly} {allow|permit|let} me {realize|recognize|understand|recognise|know} {so that|in order that} I {may just|may|could} subscribe.
Thanks.|
[...snip...]

The interesting bits

The interesting thing, of course, is the use of a syntax reminiscent of a regular expression with multiple levels of alternatives : did you notice the two levels of opening braces ? These enable a number of entirely distinct message bodies, apparently to increase variability of the spam text in order to foils similarity-based anti-spam protection services.

I wonder if one of you, actual readers, will have similarly interesting spam insights to provide ?

AttachmentSize
spam.txt15.16 KB