Debugging spammer mechanics

Submitted by Frederic Marand on Sun, 2014-07-20 09:40

spam

I've long been receiving quite high volumes of comment spam on this blog, which is why comments have always been pre-moderated. Usually, there is not much to learn from it. Not so with one of the spam messages posted today, which unwittingly provided an unexpected insight into the mechanisms currently used by spammers.

Selected excerpt

The actual spam text is attached for the interested reader (I hope the guys at Akismet and Mollom already know about this!), so here is just its beginning:

{
{I have|I've} been {surfing|browsing} online more than {three|3|2|4} hours today, yet I never found any interesting article like 
yours. {It's|It is} pretty worth enough for me. {In my opinion|Personally|In my view}, if all {webmasters|site owners|website owners|web owners} 
and bloggers made good content as you did, the {internet|net|web} will be {much 
more|a lot more} useful than ever before.|
I {couldn't|could not} {resist|refrain from} commenting.
{Very well|Perfectly|Well|Exceptionally well} written!|
{I will|I'll} {right away|immediately} {take hold of|grab|clutch|grasp|seize|snatch} 
your {rss|rss feed} as I {can not|can't} {in finding|find|to find} 
your {email|e-mail} subscription {link|hyperlink} or {newsletter|e-newsletter} service.
Do {you have|you've} any? {Please|Kindly} {allow|permit|let} me {realize|recognize|understand|recognise|know} {so that|in order that} I {may just|may|could} subscribe.
Thanks.|
[...snip...]

The interesting bits

The interesting thing, of course, is the use of a syntax reminiscent of a regular expression with multiple levels of alternatives: did you notice the two levels of opening braces? These allow a single template to generate a number of entirely distinct message bodies, apparently to increase the variability of the spam text in order to foil similarity-based anti-spam protection services.
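
To make the mechanism concrete, here is a minimal Python sketch of how such a template could be expanded. This nested-alternatives format is commonly called "spintax". The sketch assumes the syntax is nothing more than nested `{...}` groups with `|`-separated branches, which is all the excerpt shows; the names `parse`, `spin` and `count_variants` are hypothetical and not taken from any actual spamming tool.

```python
import random


def parse(template):
    """Parse a brace/pipe template into a nested structure.

    A sequence is a list of items. Each item is either a literal string
    or a choice: a list of alternative sequences, one per branch.
    Unclosed braces are tolerated, since the excerpt itself is truncated.
    """
    pos = 0

    def parse_sequence(depth):
        nonlocal pos
        items, literal = [], []
        while pos < len(template):
            ch = template[pos]
            if ch == "{":
                pos += 1
                if literal:
                    items.append("".join(literal))
                    literal = []
                items.append(parse_choice(depth + 1))
            elif ch in "|}" and depth > 0:
                break  # end of the current branch
            else:
                literal.append(ch)
                pos += 1
        if literal:
            items.append("".join(literal))
        return items

    def parse_choice(depth):
        nonlocal pos
        branches = [parse_sequence(depth)]
        while pos < len(template) and template[pos] == "|":
            pos += 1
            branches.append(parse_sequence(depth))
        if pos < len(template) and template[pos] == "}":
            pos += 1
        return branches

    return parse_sequence(0)


def spin(sequence, rng=random):
    """Produce one text by picking a random branch at every choice."""
    return "".join(
        item if isinstance(item, str) else spin(rng.choice(item), rng)
        for item in sequence
    )


def count_variants(sequence):
    """Count the distinct choice paths through the template."""
    total = 1
    for item in sequence:
        if not isinstance(item, str):
            total *= sum(count_variants(branch) for branch in item)
    return total


# First line of the excerpt, without the outer opening brace.
tree = parse("{I have|I've} been {surfing|browsing} online "
             "more than {three|3|2|4} hours today")
print(count_variants(tree))  # 16 paths: 2 * 2 * 4
print(spin(tree))            # one randomly chosen variant
```

Even this single sentence yields 16 variants. Since the counts multiply along a message and add up across the outer alternatives, a full template like the attached one could plausibly produce a very large number of texts, which would explain why naive similarity checks struggle with it.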

I wonder whether any of you, actual readers, have similarly interesting spam insights to share?

spam.txt
