grahame
« on: September 01, 2014, 11:14:39 »
Overnight, we had some database issues and the forum wasn't available for about an hour this morning while I fixed them - sorry for the outage. I'll be posting / testing a lot during the day as I investigate what happened; please excuse any further slight blips.
I had to back up / restore data, and I think I got the synchronisation right - in particular, to my knowledge no recent messages were lost. It appears that the damaged section dated from around Christmas 2013 to February 2014 - please let me know if you spot any errors in my stitching things back together from around that date!
Coffee Shop Admin, Chair of Melksham Rail User Group, TravelWatch SouthWest Board Member

Chris from Nailsea
« Reply #1 on: September 01, 2014, 14:50:05 »
Thanks for your efforts on this one, on behalf of all Coffee Shop forum members, grahame.
William Huskisson MP was the first person to be killed by a train while crossing the tracks, in 1830. Many more have died in the same way since then. Don't take a chance: stop, look, listen.
"Level crossings are safe, unless they are used in an unsafe manner." Discuss.

Worcester_Passenger
« Reply #2 on: September 01, 2014, 15:01:30 »
Quote from: Chris from Nailsea
Thanks for your efforts on this one, on behalf of all Coffee Shop forum members, grahame.

Agree completely! And I loved the system-down message about cleaning the coffee machine.
grahame
« Reply #3 on: September 01, 2014, 16:12:26 »
OK ... I'm going to declare it fully up and running again.
There appears to have been a small piece of database corruption that occurred around 24 hours ago. It wasn't noticed at that point, except that the snapshot backups that happen through the day were dropping out part-way through and so weren't completing. By the time it came to my attention, quite a few posts had been made after the last complete backup, and indeed good backups were being overwritten by faulty ones. Fortunately, we have enough redundancy in the system to recover things: I was able to prod the database into skipping over the corrupt area and giving me a backup that included current material, after which I was able to completely remove and reload all messages from 2008 up to the very latest one, posted at around 9 a.m. today.
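The failure mode described above, where half-finished snapshot backups overwrote good ones, is commonly guarded against by only promoting a new dump once it is known to be complete. A minimal sketch, assuming a MySQL-style dump whose final line is a "-- Dump completed" comment (mysqldump writes one by default; the post doesn't say which database or backup tool the forum uses). `COMPLETION_MARKER`, `dump_is_complete` and `promote_backup` are hypothetical names:

```python
import os
import tempfile

# Assumption: the dump tool writes this comment as its last line when it
# finishes (mysqldump does by default); a dump that dropped out won't have it.
COMPLETION_MARKER = "-- Dump completed"


def dump_is_complete(path):
    """Return True if the last non-blank line of the dump starts with the marker."""
    last = ""
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.strip():
                last = line.strip()
    return last.startswith(COMPLETION_MARKER)


def promote_backup(new_dump, kept_backup):
    """Replace kept_backup with new_dump only if new_dump finished.

    Returns True if promoted. A truncated dump is deleted and the previous
    good backup is left untouched, so faulty backups never overwrite good ones.
    """
    if not dump_is_complete(new_dump):
        os.remove(new_dump)
        return False
    os.replace(new_dump, kept_backup)  # atomic when both paths are on one filesystem
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        kept = os.path.join(d, "forum.sql")
        new = os.path.join(d, "forum.sql.new")
        with open(kept, "w") as f:
            f.write("INSERT ...;\n-- Dump completed on 2014-08-31\n")
        with open(new, "w") as f:
            f.write("INSERT ...;\n")  # dropped out part-way through
        print(promote_backup(new, kept))  # False: the good backup is kept
```

Writing to a separate file and renaming only on success means there is never a moment when the last good copy has been half-overwritten.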
I've done a lot of posting / answering / checking for anything unusual today. Please excuse the inane content of some of my posts. The good news is that the backup system is running again now and replacing the faulty backups with good ones. And I've downloaded a set just in case we have other problems - there's always a seed of doubt when something like this happens!
The download is 43 Mbytes compressed (144 Mbytes uncompressed). There are 158,202 posts in 12,392 topics by 1,623 members. There are also 10,093 personal messages stored on the system, and 62 polls with a total of 339 options and 2,287 votes. There have been 613 attempts to log in by 34 locked accounts - those locks being either accounts which have been closed to further posts because the member wishes to leave (their content can't be deleted, as that would destroy continuity), or outright bans where we have had to remove posting rights from a member. Some of those locks may not be current (i.e. the accounts may have been re-opened since). 373 events have been added to our calendar, and there are 44 boards here. There are 1122 lykes registered. We have 395,679 log records about who's read what, so that we can point you to posts you haven't seen ...
SDS
« Reply #4 on: September 01, 2014, 17:00:27 »
Explains the odd (php?) error message I got about being unable to connect to the database.
I do not work for FGW and posts should not be assumed and do not imply they are statements, unless explicitly stated that they are, from any TOC including First Great Western.

grahame
« Reply #5 on: September 11, 2014, 14:19:29 »
Quote from: SDS
Explains the odd (php?) error message I got about being unable to connect to the database.

We have been getting some "failed to connect" database errors over the past couple of weeks, which usually indicates a very busy server. If you go to a shop and see a long queue, you're likely to walk away, so the queueing is to some extent self-regulating; but if you go to a web site you don't have a clue how busy it is, so you'll join the queue anyway.
Here is my "control" on the server ... the sort of loading I expect:
[graph]
And here is what I'm getting at the moment:
[graph]
For future readers, here's a dynamic report:
[graph]
Analysis shows that we're getting up to 8700 hits a day from a single address:
IP: 193.201.224.32
Decimal: 3251232800
Hostname: 193.201.224.32
ISP: PE Tetyana Mysyk
Organization: PE Tetyana Mysyk
and they arrive in batches of up to 350 per minute ... more to follow ...
Edit - final diagram script changed to emphasise the current day's data
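The "Decimal" figure in the address lookup is simply the four octets of the address packed into one 32-bit number: 193 × 2^24 + 201 × 2^16 + 224 × 2^8 + 32 = 3251232800. A short Python sketch of that arithmetic, cross-checked against the standard library's `ipaddress` module (`ip_to_decimal` is a hypothetical helper name):

```python
import ipaddress


def ip_to_decimal(dotted):
    """Pack a dotted-quad IPv4 address into a single 32-bit integer."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {dotted!r}")
    a, b, c, d = octets
    return (a << 24) | (b << 16) | (c << 8) | d


# Cross-check against the standard library's own conversion
assert ip_to_decimal("193.201.224.32") == int(ipaddress.IPv4Address("193.201.224.32"))
print(ip_to_decimal("193.201.224.32"))  # 3251232800
```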
« Last Edit: September 12, 2014, 15:47:59 by grahame »

grahame
« Reply #6 on: September 11, 2014, 15:20:05 »
Quote from: grahame
more to follow ...

Here's a typical series of counts, minute by minute, showing the requests completed in each minute:
0 0 0 0 0 0 263 176 6 0 0 0
Nothing to suggest any form of "personal" attack ... and our server has withstood it quite well. However, I'm going to take measures to cut out those requests and (if my theory is right) we should see the black line on the graph even out as it extends through the rest of the day.
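A minute-by-minute series like the one above can be produced by bucketing one host's log lines by the minute in their timestamp. A minimal sketch, assuming Apache-style access log lines (the post doesn't say what server or log format the forum uses, and Apache's default `%t` timestamp records when a request was received rather than when it completed). `per_minute_counts` and `minute_series` are hypothetical names:

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumption: Apache-style access log lines, e.g.
# 193.201.224.32 - - [11/Sep/2014:13:15:02 +0100] "GET / HTTP/1.1" 200 512
LOG_RE = re.compile(r"^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [^\]]+\]")
MINUTE_FMT = "%d/%b/%Y:%H:%M"  # same shape as the captured minute key (English month names)


def per_minute_counts(lines, host):
    """Count log lines from `host`, keyed by the minute of their timestamp."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group(1) == host:
            counts[m.group(2)] += 1  # key like '11/Sep/2014:13:15'
    return counts


def minute_series(counts, start, minutes):
    """Counts for `minutes` consecutive minutes from `start`, 0 for quiet minutes."""
    return [
        counts.get((start + timedelta(minutes=i)).strftime(MINUTE_FMT), 0)
        for i in range(minutes)
    ]
```

Filling in the zero minutes explicitly is what makes the on/off burst pattern visible; a plain tally of non-empty minutes would hide the gaps.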
grahame
« Reply #7 on: September 11, 2014, 16:47:48 »
Final post in this series, unless problems recur ... Up until I took the log file snapshot at 13:15 (covering from 03:30 this morning), here are the hosts that have made the most requests to our server. In each case the number is the request count:
388 xx.xx.xx.xxx << Our busiest member ;-)
472 msnbot-157-55-39-18.search.msn.com << This series is the Bingbot crawler
480 157.55.39.49
481 148.251.51.240 << Majestic search engine (Country: Germany, Code: DE, Town: Kiez, Region: Mecklenburg-Vorpommern)
591 msnbot-207-46-13-40.search.msn.com
688 msnbot-157-55-39-19.search.msn.com
704 77.68.227.217 << Denmark. Looks like it's crawling for a new search engine - http://www.abonti.com
709 157.55.39.211
993 blexn4.webmeup.com << http://webmeup.com Crawling our site so they can sell a backlink checking service to others (and us) (a bit cheeky - charging for link data we provide for free!)
1057 msnbot-157-55-39-153.search.msn.com
1068 157.55.39.212
1206 msnbot-207-46-13-35.search.msn.com
1359 157.55.39.50
1836 157.55.39.155
2025 157.55.39.202
3636 193.201.224.32 << Our Ukrainian "friends"
MSN / Bing is always busy indexing, and the busiest Google indexer made 278 requests, but in those cases the requests are spread out. With our problem visitor being active for only about 1 minute in 30, their effective request level would have been over 100,000 requests in the period at peak rates, or 50 times that of the busiest Microsoft system. I've just taken a look at the graphs and they appear to be settling down ... I'll just finish off with a log of the number of accesses from 193.201.224.32 up to 03:30 each day in August and September so far; I love playing with figures and studying the profiles of these things!
[code]
ac_20140801:0
ac_20140802:0
ac_20140803:0
ac_20140804:0
ac_20140805:0
ac_20140806:0
ac_20140807:0
ac_20140808:0
ac_20140809:0
ac_20140810:0
ac_20140811:0
ac_20140812:28
ac_20140813:54
ac_20140814:8
ac_20140815:28
ac_20140816:22
ac_20140817:4
ac_20140818:0
ac_20140819:16
ac_20140820:2
ac_20140821:14
ac_20140822:18
ac_20140823:30
ac_20140824:10
ac_20140825:0
ac_20140826:0
ac_20140827:0
ac_20140828:22
ac_20140829:0
ac_20140830:6922
ac_20140831:6021
ac_20140901:8669
ac_20140902:4605
ac_20140903:499
ac_20140904:4857
ac_20140905:7533
ac_20140906:5636
ac_20140907:8144
ac_20140908:6149
ac_20140909:1273
ac_20140910:3397
ac_20140911:6043
[/code]
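The host listing and daily figures above look like the output of tallying access logs by client address. A minimal Python sketch of that kind of tally, plus the arithmetic behind the "over 100,000" and "50 times" figures and a helper that finds where the daily series jumps. It assumes Apache-style log lines with the client host as the first field; `top_hosts`, `parse_daily`, `first_day_over` and `peak_equivalent` are hypothetical names, not the forum's own tooling:

```python
from collections import Counter


def top_hosts(log_lines, n=16):
    """Request count per host for the n busiest hosts, busiest last.

    Assumes the host is the first whitespace-separated field of each line,
    as in Apache common/combined logs.
    """
    counts = Counter(line.split(None, 1)[0] for line in log_lines if line.strip())
    return list(reversed(counts.most_common(n)))


def parse_daily(text):
    """Parse 'ac_YYYYMMDD:count' tokens into (YYYYMMDD, count) pairs."""
    pairs = []
    for token in text.split():
        name, count = token.split(":")
        pairs.append((name[len("ac_"):], int(count)))
    return pairs


def first_day_over(pairs, threshold):
    """First date whose count exceeds threshold, or None if none does."""
    for day, count in pairs:
        if count > threshold:
            return day
    return None


def peak_equivalent(count, one_minute_in):
    """Scale a visitor active 1 minute in `one_minute_in` up to a flat-out rate."""
    return count * one_minute_in
```

For instance, `first_day_over` with a threshold of 1000 picks out 20140830 as the day the traffic jumped from tens of requests to thousands, and `peak_equivalent(3636, 30)` gives 109,080, roughly 54 times the 2,025 requests from 157.55.39.202.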
thetrout
« Reply #8 on: September 11, 2014, 19:21:15 »
Some very brief research suggests that IP is a known forum spammer. Probably some form of botnet. Without wishing to state the obvious, could you block the IP on the server's firewall from all traffic?
The worst crawlers I've come across are the Israeli PicScout servers, owned and managed by the friendly, ethical company Getty Images. PicScout crawler servers have been known to completely crash servers in the past... They also don't like being called out for it...
grahame
« Reply #9 on: September 11, 2014, 19:41:24 »
Quote from: thetrout
Some very brief research suggests that IP is a known forum spammer. Probably some form of botnet. Without wishing to state the obvious, could you block the IP on the server's firewall from all traffic?

Yep, I got that too ... I haven't quite published everything I've got.

Quote from: thetrout
The worst crawlers I've come across are the Israeli PicScout servers, owned and managed by the friendly, ethical company Getty Images. PicScout crawler servers have been known to completely crash servers in the past... They also don't like being called out for it...

Most crawlers are reasonably polite ... Over time I've come across a few that aren't, including one university which, when challenged, said they were researching how many people would challenge them, or block them.
grahame
« Reply #10 on: September 12, 2014, 10:32:34 »
Reviewing the morning after ... and the curves look much cleaner. The host in question continues to make requests but is turned away at the door and told it's forbidden, rather than have us waste time giving it answers.
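Turning a host away "at the door" means rejecting it before any of the expensive work (database queries, page building) starts. The forum's own software stack isn't described here (SDS's error message hints at PHP) and the post doesn't say where the block sits, so this is only an illustration of the principle, written as Python WSGI middleware. `BlockListMiddleware`, `forum_app` and `BLOCKED_ADDRESSES` are hypothetical:

```python
# Hypothetical blocklist; the address is the one identified in this thread.
BLOCKED_ADDRESSES = {"193.201.224.32"}


class BlockListMiddleware:
    """WSGI middleware that refuses blocked clients before the app does any work."""

    def __init__(self, app, blocked):
        self.app = app
        self.blocked = frozenset(blocked)

    def __call__(self, environ, start_response):
        if environ.get("REMOTE_ADDR") in self.blocked:
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]  # the wrapped app (and its database) is never touched
        return self.app(environ, start_response)


def forum_app(environ, start_response):
    """Stand-in for the real, expensive page generator."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"forum page\n"]


application = BlockListMiddleware(forum_app, BLOCKED_ADDRESSES)
```

Because the check is a set lookup on the client address, a refused request costs almost nothing. Blocking at the firewall, as thetrout suggests, would stop it even earlier, before the web server sees it at all.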
In the 24 hours to 03:30 this morning it made 7378 requests, and in the last 7.5 hours we've turned it away another 1848 times, including 339 times within one 4-second period. Now that we're turning it away quickly, we can see just how intense the contacts have been, whereas previously we couldn't be sure, as we look at the completion time of our request handling and not at when the requests arrive.
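A figure like "339 times within one 4-second period" is a sliding-window count: the largest number of events falling inside any window of a given length. A minimal sketch, assuming the refusal times have already been pulled out of the log as sorted times in seconds (the post doesn't say how its own figure was obtained); `max_in_window` is a hypothetical helper:

```python
from collections import deque


def max_in_window(timestamps, window_seconds):
    """Largest number of requests in any window of `window_seconds`.

    `timestamps` are request times in seconds, sorted ascending. Each window
    is the half-open interval (t - window_seconds, t] ending at a request.
    """
    window = deque()
    best = 0
    for t in timestamps:
        window.append(t)
        # Evict requests that fall outside the window ending at t
        while window[0] <= t - window_seconds:
            window.popleft()
        best = max(best, len(window))
    return best
```

Each timestamp enters and leaves the deque once, so a whole day's log is scanned in a single linear pass.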
These days there's always all sorts of "odd" traffic floating around that I keep an eye on ... and our security / ban / odd visitor files bear witness to this. However, there's nothing else at the moment that's remotely as intense as what's been coming from the Ukraine.
grahame
« Reply #11 on: September 14, 2014, 08:42:26 »
Still visiting in blocks of around 350 within a second or two (6340 requests yesterday), but now being turned away, so we no longer do all the work of providing an answer to the detriment of our "real" users and the polite automata. Take a look at the graph upthread, which I have modified slightly to emphasise the current day, and you'll see on the final panel a couple of days of old data still left there, and the current much smoother loadings.
You may note a sharp rise then a much more gradual recovery on the graphs - that's because the graph plots a weighted average over the last "n" minutes. If I were to plot snapshots, the graph would spike much higher and come down again very quickly, to the extent that events would be lost at times, and the whole thing would be much harder to read due to a lack of continuity. It's rather like the weighted averages that we have on the monthly forum stats, where the green curve gives perhaps the best reading - http://www.firstgreatwestern.info/coffeeshop/index.php?topic=10596.0 .
I do have some "heartbeats" in place to send automated alerts when the server is busy (they're snapshot checks that happen every so often) or indeed unreachable, and they'll usually flag up issues which are ongoing when the heartbeat is checked.
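The post doesn't say exactly how the graph's weighted average is computed. Assuming an exponentially weighted moving average (the same kind of damping used for Unix load averages), this sketch shows why a short burst produces a one-step rise followed by a gradual decay rather than a tall, narrow spike. `ewma` and the `alpha` value are illustrative assumptions:

```python
def ewma(samples, alpha):
    """Exponentially weighted moving average of per-minute load samples.

    alpha is in (0, 1]: each new sample gets weight alpha and the previous
    average keeps weight (1 - alpha), so older minutes fade away gradually.
    """
    smoothed = []
    average = 0.0  # assume the series starts from an idle server
    for x in samples:
        average = alpha * x + (1 - alpha) * average
        smoothed.append(average)
    return smoothed


# A one-minute burst like the problem visitor's: 0 0 263 0 0 0
print(ewma([0, 0, 263, 0, 0, 0], alpha=0.5))
# [0.0, 0.0, 131.5, 65.75, 32.875, 16.4375]
```

A smaller `alpha` smooths more heavily, lowering the peak and stretching the recovery; `alpha=1` reproduces the raw snapshots.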
grahame
« Reply #12 on: September 15, 2014, 23:22:20 »
I had better take a further look ... The yellow line is the data from the day before I fixed the last issue; the big red and black spikes in the evening are explainable (major backup procedures) ... but there looks to have been some other traffic that's had a significant effect on performance today. I best go take a look!
grahame
« Reply #13 on: September 17, 2014, 15:26:06 »
Quote from: grahame
I best go take a look!

Sorted - if you look up the thread, today's graph is much cleaner. The problem was a housekeeping script which took an awful lot of CPU, as it hadn't been written for a site this big, and it was being run by the search engines as they indexed the site. I've moved it away from a public URL, and it now runs occasionally at quiet times.
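One way to keep a heavy housekeeping job out of reach of crawlers is to run it from a scheduler (cron, say) out of a non-public location, and to have it refuse to start outside a quiet window as a second safeguard. A minimal sketch of that guard; the 02:00-05:00 window and the names `in_quiet_window` and `run_if_quiet` are assumptions for illustration, as the post only says "quiet times":

```python
from datetime import datetime, time

# Hypothetical quiet window; the post doesn't give the actual hours.
QUIET_START = time(2, 0)
QUIET_END = time(5, 0)


def in_quiet_window(now, start=QUIET_START, end=QUIET_END):
    """True if `now` falls in [start, end), allowing windows that wrap past midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def run_if_quiet(task, now=None):
    """Run `task()` only inside the quiet window; return whether it ran."""
    now = now or datetime.now()
    if not in_quiet_window(now):
        return False
    task()
    return True
```

Passing `now` explicitly keeps the guard testable; in normal use the scheduler would call `run_if_quiet(housekeeping)` and let it read the clock.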
IndustryInsider
« Reply #14 on: September 17, 2014, 15:32:15 »
Quote from: grahame
Sorted - if you look up the thread, today's graph is much cleaner. The problem was a housekeeping script which took an awful lot of CPU, as it hadn't been written for a site this big, and it was being run by the search engines as they indexed the site. I've moved it away from a public URL, and it now runs occasionally at quiet times.

I had a feeling that might be the problem...
To view my GWML Electrification cab video 'before and after' video comparison, as well as other videos of the new layout at Reading and 'before and after' comparisons of the Cotswold Line Redoubling scheme, see: http://www.dailymotion.com/user/IndustryInsider/