Substantially improved ... but not perfect. The quiz I have put up at
http://www.passenger.chat/23875 is a particularly stiff test of the web and database elements, and to some extent I have put it up to "push" the system and provide evidence if it breaks further.
It has run well over the weekend ... and I suspect that's not entirely a coincidence. I have:
* Banned a couple of aggressive crawlers
* Retired some older heavy database URLs that crawlers loved (%)
* Ceased back bus service calls which were hanging processes (%)
* Widened SQL door to allow more parallel connections
* Cleaned out accumulated database logs including heavy context setters
Actions marked (%) could usefully have slimmer/better alternatives put in place - the change as it stands is a bit of a sticking plaster. I have also enhanced a couple of logs, hoping to learn more about current running and to help with future analysis.
Still a "watching brief". Our databases back up from time to time, and at certain times bookkeeping goes on in the background, which may affect access. We rather crudely lock tables while we back them up, and hold requests until the backup copy is completed. If the holding pen is full of excited requests ("Flying Scotsman spotted at Goonbarrow" or "First day of hourly Oxford to Fawley via Trowbridge and Eastleigh service - pictures") we may have an issue.
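
For anyone curious how the holding pen behaves, here is a rough sketch in Python. It is a toy model only, not the code the forum runs: the class name `HoldingPen`, the `capacity` figure and the method names are all made up for illustration. It shows requests being served directly in normal running, parked while a table is locked for backup, and turned away once the pen is full.

```python
from collections import deque


class HoldingPen:
    """Toy model (hypothetical) of requests held while a table is locked."""

    def __init__(self, capacity):
        self.capacity = capacity   # assumed limit on how many requests can wait
        self.backing_up = False    # True while the table is locked for backup
        self.waiting = deque()     # requests parked until the backup completes
        self.served = []           # requests that reached the database
        self.rejected = []         # requests turned away because the pen was full

    def start_backup(self):
        # Lock the table: from now on requests are held rather than served.
        self.backing_up = True

    def request(self, name):
        if not self.backing_up:
            self.served.append(name)
        elif len(self.waiting) < self.capacity:
            self.waiting.append(name)
        else:
            # The pen is full: this is the "we may have an issue" case.
            self.rejected.append(name)

    def finish_backup(self):
        # Unlock the table and release held requests in arrival order.
        self.backing_up = False
        while self.waiting:
            self.served.append(self.waiting.popleft())


if __name__ == "__main__":
    pen = HoldingPen(capacity=2)
    pen.request("ordinary page view")
    pen.start_backup()
    for name in ("Flying Scotsman post", "Oxford to Fawley pictures", "one too many"):
        pen.request(name)
    pen.finish_backup()
    print("served:", pen.served)
    print("rejected:", pen.rejected)
```

In this model a bigger `capacity` means fewer rejections but a bigger rush at the database when the lock comes off, which is the trade-off to watch.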