Multisite and dynamic configuration items in Drupal: help from Apache

Submitted by Frederic Marand on Sat, 2008-01-12 15:07

Having recently merged a number of separately configured sites into a single consolidated multisite, I found myself with URLs like:

http://www.site1.com/sites/www.site1.com/files/somefile.png

...on all except the main site of the multisite configuration.

Although img_relocator allows me to just type things like:

<img src="somefile.png" />

... and still have these URLs generated automagically, the long URLs remain visible to any user who actually looks at them.

Enter mod_rewrite

Abandon all hope, ye who enter here

With these sites running on Apache 2.x, mod_rewrite offers a simple way to rewrite the URLs of files: in the Apache site definition file for the vhost serving the site, we can use:

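<VirtualHost *>
  [...]
  RewriteEngine on

  # In vhost context, %{REQUEST_FILENAME} still holds the URL path
  # Serve /files/... from this sub-site's own files directory
  RewriteCond %{REQUEST_FILENAME} ^/files
  RewriteRule ^(.*)$ /sites/www.site1.com$1 [L]
  [...]
</VirtualHost>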

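Once the files directory setting of the sub-site is changed to plain files, all of a sudden the image links look like http://www.site1.com/files/somefile.png.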

All well and good. Or is it?

system.module and color.module

Actually, not everything works so well: both system.module and color.module generate directories and files on the fly, respectively files/css/someuniqueid and files/color/someotherid, and place them in the designated files directory for the sub-site.

But we have overridden the sub-site's files directory so that it appears under the root of the virtual host instead of where it actually is. In other words, the files directory in the sub-site's configuration is set to files instead of sites/www.site1.com/files. As a result, both of these modules generate their files in the files directory of the main site, which is only logical.

At this point, you'd probably want to return to the default files setting, or tweak the modules to force them to resolve the files directory relative to the sub-site directory. Neither is necessary: all it takes is one additional condition telling Apache not to rewrite these URLs:

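<VirtualHost *>
  [...]
  RewriteEngine on

  # Serve /files/... from this sub-site's own files directory...
  RewriteCond %{REQUEST_FILENAME} ^/files
  # ...except the CSS and color files generated by system.module and color.module
  RewriteCond %{REQUEST_FILENAME} !files/(color|css)
  RewriteRule ^(.*)$ /sites/www.site1.com$1 [L]
  [...]
</VirtualHost>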

Notice the additional RewriteCond? This is how we tell Apache not to apply the RewriteRule if the path matches files/color or files/css. We needn't anchor this pattern. Consecutive RewriteCond lines are ANDed, so it is only evaluated once the first condition has matched, and that condition is itself anchored.
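You can check by hand which request paths these two conditions let through with a small shell function that replays the same regular expressions with grep -E. This is only a sketch: rewrite_path is a hypothetical helper that never talks to Apache. It assumes the sub-site directory is /sites/www.site1.com, so adjust SITE_PREFIX to your own RewriteRule target.

```shell
#!/bin/sh
# Hypothetical helper: replays the two RewriteCond patterns and the
# RewriteRule from the vhost above with grep -E. It does not talk to Apache.
# Assumption: the sub-site's files live under /sites/www.site1.com.
SITE_PREFIX="/sites/www.site1.com"

rewrite_path() {
  path=$1
  # Condition 1: ^/files (anchored, so the path must start with /files)
  # Condition 2: !files/(color|css) (negated, so generated CSS/color files are skipped)
  if printf '%s\n' "$path" | grep -Eq '^/files' &&
     ! printf '%s\n' "$path" | grep -Eq 'files/(color|css)'; then
    # RewriteRule ^(.*)$ <prefix>$1 [L]: prepend the sub-site directory
    printf '%s%s\n' "$SITE_PREFIX" "$path"
  else
    # Left alone: Apache serves it from the main site's files/ directory
    printf '%s\n' "$path"
  fi
}
```

For instance, rewrite_path /files/somefile.png prints /sites/www.site1.com/files/somefile.png, while rewrite_path /files/css/abc.css comes back unchanged.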

Debugging mod_rewrite behaviour

Here be dragons

Now, not having worked much with mod_rewrite (I only touched the main Apache code itself, years ago), I often find it difficult to be sure exactly how the server processed the rewriting rules. Raising LogLevel, even to debug, only shows the most obvious failures, like infinite redirection loops. It does not tell you why they occurred, or why an anchored pattern failed to match, for instance.

Luckily, mod_rewrite offers the RewriteLog directive, even in Apache 1.x. It is a boon, but it must be handled carefully: raising RewriteLogLevel produces enormous volumes of debug output and slows your server to a crawl. The lines to use are:

RewriteLog "/var/log/apache2/rewrite_log"
RewriteLogLevel 9

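... so I keep them in the vhost file commented out (#RewriteLog, #RewriteLogLevel). I found it useful to toggle them with two small shell scripts, each taking the name of the site file in /etc/apache2/sites-available: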

a2enlog

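#!/bin/sh
# a2enlog: uncomment the RewriteLog and RewriteLogLevel lines of a site
# Usage: a2enlog site1
CONFIG="/etc/apache2/sites-available/$1"
[ -f "$CONFIG" ] || { echo "no such site: $1" >&2; exit 1; }

# "#RewriteLog" also matches "#RewriteLogLevel", so both lines are enabled
sed 's/^\([[:space:]]*\)#RewriteLog/\1RewriteLog/' < "$CONFIG" > "$CONFIG.new"
mv "$CONFIG" "$CONFIG.old"
mv "$CONFIG.new" "$CONFIG"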

a2dislog

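#!/bin/sh
# a2dislog: comment out the RewriteLog and RewriteLogLevel lines of a site
# Usage: a2dislog site1
CONFIG="/etc/apache2/sites-available/$1"
[ -f "$CONFIG" ] || { echo "no such site: $1" >&2; exit 1; }

# Only uncommented lines match, so running it twice won't yield "##RewriteLog"
sed 's/^\([[:space:]]*\)RewriteLog/\1#RewriteLog/' < "$CONFIG" > "$CONFIG.new"
mv "$CONFIG" "$CONFIG.old"
mv "$CONFIG.new" "$CONFIG"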

Using the log

Now, given the volume of data written to this log, it is best to wrap a single request between these two scripts, like this:

a2enlog site1 ; apache2ctl restart ; wget http://www.site1.com/some/url ; a2dislog site1 ; apache2ctl restart

Unless the site is heavily loaded (in which case you probably shouldn't be experimenting on it while it is live), this will result in only your single request being logged. That makes the log much easier to follow, especially if you cut it vertically to remove the per-line prefix (client, date, server and request IDs), which in this case is just noise. A simple command like 1G!Gcut -c76-500 in vi does this wonderfully, provided you adjust the starting column to the length of your site name.
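The one-liner and the vi filter can be wrapped into two shell functions, sketched below. Both function names are hypothetical. The sketch assumes the a2enlog/a2dislog scripts above are on the PATH and that the vhost holds commented-out RewriteLog lines.

trace_request always switches logging back off, even when the fetch fails. strip_rewrite_prefix removes everything up to the "(level) " marker with sed instead of a fixed column, so it does not need adjusting to the site name. That choice assumes no parenthesis appears before that marker, which holds for the usual client/date/vhost prefix.

```shell
#!/bin/sh
# Hypothetical helpers; assumes a2enlog/a2dislog are on the PATH and the
# vhost holds commented-out RewriteLog lines.

# Log exactly one request: enable the rewrite log, fetch the URL, then
# always disable the log again, even if the fetch failed.
trace_request() {
  site=$1
  url=$2
  a2enlog "$site" && apache2ctl restart || return 1
  wget -q -O /dev/null "$url"   # discard the page; only the log matters
  status=$?
  a2dislog "$site"
  apache2ctl restart
  return "$status"
}

# Drop the per-line prefix (client, date, [vhost/sid#...][rid#...]) up to
# and including the "(level) " marker, whatever the length of the site name.
strip_rewrite_prefix() {
  sed 's/^[^(]*([0-9]) //' "$@"
}
```

A typical session would then be trace_request site1 http://www.site1.com/some/url followed by strip_rewrite_prefix /var/log/apache2/rewrite_log | less.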

I ran into the same issue, and instead of using mod_rewrite to map sites/example.com/files/ to files/, I always used a simple Alias.

Are you using a single VirtualHost for all your sites?

As a side note, I wish Drupal were smart enough to generate URLs in short format directly, based on some configuration option.

Actually, the server has about twenty sites on half as many vhosts, plus about 1000 rewrite rules to serve old URLs from previous generations of the sites, so I tend to reach for mod_rewrite automatically. But you're right: mod_alias could handle this, possibly even at a lower resource cost, though that would have to be benchmarked.

I have a neat setup where the sites folder says what Drupal version you're running, so:

Drupal/5 -> Drupal/5.6
Drupal/5.6
Drupal/5.6/sites -> Drupal/sites/5
Drupal/sites
Drupal/sites/5
Drupal/6 -> Drupal/6.0-rc2
Drupal/6.0-rc2
Drupal/6.0-rc2/sites -> Drupal/sites/6
Drupal/sites/6

In the end it makes upgrading the system VERY easy (extract the files, rename the directory to the version number, update the symlinks, done). Symlinks probably require fewer resources, as they're part of the file system instead of Apache.
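The "extract, rename, update symlinks" upgrade above can be sketched as one shell function. upgrade_drupal is a hypothetical name, and the sketch makes four assumptions: the tarball unpacks to drupal-<version>/, the argument BASE is the directory called Drupal in the listing, the shared sites/<major> directory already exists, and the new version directory does not. It uses relative links (../sites/5) rather than the absolute-looking paths in the listing, so the tree can be moved as a whole.

```shell
#!/bin/sh
# Hypothetical helper sketching the Drupal/<major> -> Drupal/<version> upgrade.
# Usage: upgrade_drupal /path/to/Drupal 5.7 drupal-5.7.tar.gz
upgrade_drupal() {
  base=$1 version=$2 tarball=$3
  major=${version%%.*}             # "5.7" -> "5", "6.0-rc2" -> "6"
  if [ -e "$base/$version" ]; then
    echo "$base/$version already exists" >&2
    return 1
  fi
  # 1. Extract, assuming the archive unpacks to drupal-<version>/
  tar -xzf "$tarball" -C "$base" || return 1
  # 2. Rename the directory to the bare version number
  mv "$base/drupal-$version" "$base/$version" || return 1
  # 3. Replace the bundled sites/ with a link to the shared sites/<major>
  rm -rf "$base/$version/sites"
  ln -s "../sites/$major" "$base/$version/sites"
  # 4. Repoint Drupal/<major> at the new release (-n replaces the old link
  #    instead of creating a new link inside the directory it points to)
  ln -sfn "$version" "$base/$major"
}
```

Rolling back is then a matter of pointing Drupal/<major> back at the previous version directory, which the function leaves in place.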

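Enabling the private download method (with clean URLs on) will get rid of URLs like http://www.site1.com/sites/www.site1.com/files/somefile.png and use something like http://www.site1.com/system/files/somefile.png instead, although files are then served through Drupal rather than directly by Apache.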

Hi Rob,

Interesting layout you have, too. I've lost track of the number of solutions various Drupallers have chosen to manage their multisites: even in the handbook pages on d.o., several layouts are described; maybe at some point a consensus on best practices should emerge.

I actually use symlinks too, but like this:

drupal/mainsite : actual install
drupal/site1/subsite1 -> d5/main/sites/site1/subsite1
...
drupal/sitex/www : actual site not in the multisite
...

Since the main goal is to use only a single version for production sites, these are defined as name-based virtual hosts and symlinked to the production multisite, while sites on alternate versions (6.x, 7.x) are just defined on their normal URLs. Switching one of them to production, once its files are copied, only involves removing its directory and symlinking.

Interesting post you did about desktop apps, BTW, although all the declarative stuff seems a bit "not in the spirit" of PHP coding, even though it obviously fits perfectly with C#.