Force valid HTML with valid_node module
When contributors without HTML skills enter content on a Drupal site, the result is often invalid HTML tag soup. Even seasoned coders make the occasional HTML input error, which can be a problem until someone fixes the post.
So I figured I'd force valid HTML from user input, and here is the proof-of-concept valid_node module: it forces any node to be saved as an XHTML fragment.
XHTML fragments
Since the page as a whole is built by Drupal, the data input by the user is never a valid XHTML document on its own: it always lacks the XML prolog, head section, and body element. However, valid node content should typically be valid XHTML once wrapped in these elements.
This is exactly what the module does: wrap the node body in a prolog and epilog, validate the whole, and remove the processed prolog and epilog, leaving us with the "fragment".
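The wrap-and-strip step can be sketched as follows. This is a minimal illustration in Python rather than the module's PHP; the prolog and epilog text and the names `wrap_fragment` and `strip_wrapper` are assumptions made for the example, not taken from the module source. Stripping is done by line count, which only works if the validator returns the wrapper on a known number of lines.

```python
# Hypothetical wrapper text: the real module's prolog and epilog may differ.
PROLOG = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '<head><title>fragment</title></head>\n'
    '<body>\n'
)
EPILOG = '\n</body>\n</html>\n'


def wrap_fragment(body: str) -> str:
    """Turn a node body into a complete document that a validator can process."""
    return PROLOG + body + EPILOG


def strip_wrapper(document: str, trim_top: int, trim_bottom: int) -> str:
    """Drop the given number of lines from the top and bottom of a document."""
    lines = document.splitlines()
    end = len(lines) - trim_bottom  # explicit end index, so trim_bottom=0 works
    return "\n".join(lines[trim_top:end])
```

With this particular wrapper, the prolog occupies six lines and the epilog two, so `strip_wrapper(wrap_fragment(body), 6, 2)` gives back the body when nothing in between has changed the line layout.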
For some interesting but largely ignored background on the issues involved with the concept, I suggest you look at the W3C XML Fragment Interchange specification.
Let's do IT
OK, now what? Parsing HTML on the fly within a Drupal module is typically out of the question, because it is too much work. Parsing the fragment with the XML DOM PHP extension is not a good idea either: it will reject any invalid construct instead of correcting it. So?
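To illustrate why strict XML parsing does not help here, the sketch below uses Python's standard `xml.dom.minidom` as a stand-in for the PHP DOM extension. That substitution and the helper name `is_well_formed` are assumptions for illustration only. The point is that malformed markup raises an error instead of being repaired.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        minidom.parseString("<body>" + fragment + "</body>")
        return True
    except ExpatError:
        # The parser only reports the problem; it makes no attempt to fix it.
        return False
```

A fragment such as `<p>unclosed <b>bold</p>` is simply rejected, which tells us the input is bad but leaves us without a corrected version to save.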
Enter tidy. Read about it, but in short, it does just what we are looking for: it corrects invalid markup instead of rejecting it. It is even available as a PECL extension, but since that is not part of the Drupal requirements, we can't really rely on it. However, it is also available as a command-line binary, easy to install on either Windows or Linux/UNIX, so the module uses that.
The module settings allow you to define the actual command used for tidying (possibly setting the path, adding options...), and the number of lines to trim from the top and bottom of the post to remove the prolog and epilog and regenerate the fragment. Some tidy options could be added to the default -asxml to forcefully remove unwanted attributes.
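A rough sketch of how such settings might drive the external command, again in Python rather than the module's PHP. The function name `tidy_fragment`, its parameters, and the way the command is invoked are illustrative assumptions; only the `-asxml` option comes from the text above. The wrapped document is piped to the command's standard input, and the configured number of lines is trimmed from its output.

```python
import shlex
import subprocess


def tidy_fragment(document: str,
                  command: str = "tidy -asxml",  # setting: command line to run
                  trim_top: int = 0,             # setting: lines to drop from the top
                  trim_bottom: int = 0) -> str:  # setting: lines to drop from the bottom
    """Pipe a wrapped document through an external command and trim its output."""
    result = subprocess.run(
        shlex.split(command),
        input=document,
        capture_output=True,
        text=True,
        check=False,  # tidy signals warnings through a non-zero exit status
    )
    lines = result.stdout.splitlines()
    end = len(lines) - trim_bottom  # explicit end index, so trim_bottom=0 works
    return "\n".join(lines[trim_top:end])
```

Keeping the command as a plain string setting means a site administrator can point it at a full path or append options without touching code. The sketch does not handle a failed run, where the command may produce no output at all; a real module would need to keep the original body in that case.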
The module is available from my sandbox on drupal.org.
Note that it is only a "proof of concept", and will not, for instance, validate dynamically generated content like PHP nodes.
UPDATE 2006-10-02: valid_node or htmltidy module?
I just noticed the HTMLtidy module on drupal.org. Although the goals and results are not strictly identical, they have much in common.
The main differences seem to be:
- HTMLtidy is older and has evolved longer, so it has more features
- HTMLtidy has more settings
In both cases, the mechanism is to use hook_nodeapi(), wrap a fragment, tidy the result, and strip the wrapping. Check the sources for more details.




