At long last, I've moved the OSInet sites from shared hosting to a VDS over the 4-day Xmas period, and overall it went well.
However, with new users (dare I say bots?) eager to refresh the site after a few hours offline, I soon had to withstand about 20 hits/sec, which proved a bit too much for this small VDS. So I enlisted the help of APC, just like the drupal.org sysops do, and like I do on the group's intranet servers, and things went much faster.
Except that, after a few minutes, I entered the true realm of winter holidays: pure white screens, without any text on them. Also known as the White Screen Of Death (WSOD) to the Drupal community.
The apc.php dashboard supplied with APC gave some very interesting insight into what was happening: it helped me tune APC, and even Apache, by lowering most of the default parameters to fit such a smallish server, until I managed to reduce the white screens. The most interesting point, if you track this, appears when the cache fills up: at some point, it empties the entries older than the TTL duration, ready for reallocation... and one can find the site with 0 items in cache, and still a full cache on the stats!
I haven't looked into the APC code yet, but I noticed this appeared to be the point at which WSODs manifested themselves, even though the server still had available RAM. Pending further examination, I suspect this means the entry lifetime set in apc.ttl has expired, but the garbage collection defined by apc.gc_ttl hasn't run yet. This one seems to be fixed at 3600, whatever it is set to in php.ini, although its reference describes it as being settable there. Looks like mucho debug ahead!
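For readers who want to try the same thing, here is a minimal sketch of how the two directives discussed above would appear in php.ini. The values are purely illustrative assumptions, not the ones running on this server, and the comments only paraphrase my understanding of the APC reference.

```ini
; Illustrative values only: assumptions for the sake of the example.

; Seconds an entry may sit idle in its slot before it can be evicted
; when another entry needs the slot. The documented default is 0.
apc.ttl = 3600

; Same idea for the user cache (hypothetical value).
apc.user_ttl = 3600

; Seconds an entry may stay on the garbage-collection list.
; 3600 is the documented default, which matches the value I keep seeing.
apc.gc_ttl = 3600
```

Both are system-level settings, so with PHP running as an Apache module, Apache has to be restarted before any change in php.ini is picked up.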
Restarting Apache clears the problem, luckily, as we've learnt from the day-to-day operation of drupal.org, but I hope to find a better solution.