Website troubles

SiteNews

Warning: boring website administration story follows. I’ve been having some trouble with one of the pages I maintain here, and I think it’s finally resolved. This is just the story of what I think happened. The more complex pages I have set up have been chugging along fine, while the simplest of them, in8snotes, a weblog powered by the Perl script Blosxom (and some related plugins), was getting hammered by a ‘very hungry’ indexing ‘bot. Yes, I did say a ‘bot was bringing the site down. A ‘bot is a program that crawls the web, indexes the pages it finds and stores the results in a database. Usually sites like Google and Yahoo index the web so they can serve up results for the searches you make, but ‘bots can also be used for more nefarious purposes - like harvesting email addresses for later spamming, or selling the information gleaned to marketing companies.

It’s unfair of me to pin the problem entirely on the ‘bot, since I think I was partially responsible for it. I would tune into my site at some point in the day and there would be a big unfriendly ‘Account suspended’ page greeting me. This was bad for a couple of reasons, not least because it referred folks to the billing department, making me look like a deadbeat who couldn’t make the measly payment for an el-cheapo Linux-based hosted web page.


Each time I would investigate the issue, I’d find a zillion hits from the Omni-Explorer Bot, and although their rather spartan website offered helpful hints on how to block the bot (by offering their IP range and some text to stick in your robots.txt file), these steps didn’t seem to stop the bot. I would tune into my site the following week and be locked out again :(
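For anyone who wants to sanity-check their own blocking rules, here is a rough Python sketch of the two measures mentioned above. It is only an illustration under assumptions: the user-agent token `OmniExplorer_Bot` is my guess at how the ‘bot identifies itself, the address range is a reserved placeholder rather than the ‘bot’s real range, and the function names are made up for this example. Keep in mind that robots.txt is only a request; a crawler has to choose to honor it.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Assumption: the token the bot sends as its user agent. Check your own logs.
BOT_AGENT = "OmniExplorer_Bot"

# Placeholder range reserved for documentation, NOT the bot's real addresses.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

# Deny everything to the one bot, leave everyone else unrestricted.
ROBOTS_TXT = """\
User-agent: OmniExplorer_Bot
Disallow: /

User-agent: *
Disallow:
"""


def robots_allows(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text permits agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def ip_is_blocked(address):
    """Return True if address falls inside any of the blocked networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    page = "http://in8sworld.net/notes/computers"
    print("bot allowed by robots.txt:", robots_allows(BOT_AGENT, page))
    print("other agent allowed:", robots_allows("SomeOtherBot", page))
    print("192.0.2.55 blocked:", ip_is_blocked("192.0.2.55"))
```

The robots.txt check only tells you what a polite crawler would do. The IP check mirrors what a server-side deny rule does, which works whether or not the crawler cooperates.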

Each time I was slammed, it seems the ‘bot was attempting to connect to my Blosxom pages, making several connections every second and pounding the hell out of the server. Any page served up by that server would take several seconds to load. (Logging in through ssh and running ‘top’ would show load values like 45.5, which is a severe workload; you usually see figures under 1.0.)
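If you want to put a number on ‘several connections every second’, something like the following Python sketch would do it. It assumes an Apache-style ‘combined’ access log where the user agent is the last quoted field; the sample lines, the timestamps and the agent string are invented purely for illustration and are not taken from my real logs.

```python
import re
from collections import Counter

# Captures the bracketed timestamp and the last quoted field (the user agent).
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<agent>[^"]*)"\s*$')


def hits_per_second(lines, agent_substring):
    """Count log lines per timestamp (one-second resolution) for one agent."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and agent_substring.lower() in match.group("agent").lower():
            counts[match.group("ts")] += 1
    return counts


# Invented sample lines, only here to show the expected input format.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2000:12:00:00 +0000] "GET /notes/ HTTP/1.1" 200 512 "-" "OmniExplorer_Bot/1.0"',
    '192.0.2.10 - - [01/Jan/2000:12:00:00 +0000] "GET /notes/fun HTTP/1.1" 200 512 "-" "OmniExplorer_Bot/1.0"',
    '192.0.2.11 - - [01/Jan/2000:12:00:00 +0000] "GET /notes/computers HTTP/1.1" 200 512 "-" "OmniExplorer_Bot/1.0"',
    '198.51.100.7 - - [01/Jan/2000:12:00:00 +0000] "GET /notes/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [01/Jan/2000:12:00:01 +0000] "GET /notes/ HTTP/1.1" 200 512 "-" "OmniExplorer_Bot/1.0"',
]

if __name__ == "__main__":
    for timestamp, hits in hits_per_second(SAMPLE_LOG, "omniexplorer").most_common():
        print(timestamp, hits)
```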

As the little text bit on the page states: “Blosxom is a perl script whose functionality here is extended through several plugin scripts in order to dynamically assemble a multitude of separate plain text files stored in a directory tree on the server into a cohesive, dated and RSS enabled weblog.” It’s a fantastic way to organize those little text files that seem to accumulate on your computer. If they are important enough to save in the first place, why not organize them better so you can find stuff?

Sorting this out meant working with my web hosting company, who have been very patient with me. Although they had never heard of Blosxom, they didn’t ban me from using it. I imagine the fact that I have several other paying websites with them helps a little in negotiations, but to be honest, they are the best service I’ve had so far - very responsive. If you would like their name, email me (link at top right of this page). I’ve also contacted (and received responses from) the ‘bot’s creators and maintainers. My last correspondence was hasty and heated, and I threatened them with legal action because they said they would remove me from their database, but their ‘bot subsequently shut me down again. Oh well, if we’re both at fault, perhaps this story on my site will make it up to them in some small way.

Blosxom has a venerable history on the web, and is not known to have any vulnerabilities like this. Finally having a moment to take a look at the page, I was still willing to accept that I may have accidentally changed something in the script to cause a problem. It wasn’t easy to accept, however, since the bulk of the site is ‘mirrored’ on my personal laptop: the remote site is an exact copy of what is running on my machine, so anything that is here is also there, and vice versa. That is why it was hard to believe when I noticed the two were working differently. I have several plugins installed for Blosxom; the links to them are on the site. Breadcrumbs and categorytree were not functioning on the remote site the way they did on my laptop.

I started by disabling the plugins, and then ‘diffing’ them (diff is a Unix command you can use to compare two text files - it prints any differences between them to the screen). They were the same. The Blosxom script (kept separately in my site’s cgi-bin directory) was also the same, except for the settings that have to differ because the two copies run on different servers.
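If you don’t have diff handy, Python’s standard difflib module can do the same kind of comparison. This is just a small sketch of the idea, not what I actually ran; the function name and the `laptop/` and `server/` labels are made up for the example.

```python
import difflib


def diff_text(local, remote, name="plugin"):
    """Return unified-diff lines; an empty list means the two texts match."""
    return list(
        difflib.unified_diff(
            local.splitlines(keepends=True),
            remote.splitlines(keepends=True),
            fromfile="laptop/" + name,
            tofile="server/" + name,
        )
    )


if __name__ == "__main__":
    # Stand-in text; in practice you would read both copies from disk.
    laptop_copy = "first line\nsecond line\n"
    server_copy = "first line\nsecond line changed\n"
    changes = diff_text(laptop_copy, server_copy)
    print("".join(changes) if changes else "no differences")
```

An empty result is the equivalent of diff printing nothing, which is what I got for every plugin.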

Totally baffled as to why they would be acting differently, I remembered that I had an .htaccess file setting (a rewrite rule) on the remote server that is not needed on my laptop. The original note for that setting is actually in my Blosxom log. The rule ‘rewrites’ the URL so that the script in the cgi-bin directory gets executed without its path showing up in the address bar of your browser. This seemed to be related to the strangeness I was seeing on my site: clicking a link on the remote page would work, but then subsequent links would start at that level, building up incorrect links. For example, if you clicked on ‘computers’ on the remote site, the URL would be displayed as http://in8sworld.net/notes/computers, but then if you clicked on ‘fun’, instead of starting over again at http://in8sworld.net/notes, it would start at the computers subdirectory, tack on fun to build http://in8sworld.net/notes/computers/fun (and fail to find anything).

The Blosxom script has a ‘preferred base url’ section which I have never used, and don’t use on my laptop: $url = ""; The $url variable was empty for both the laptop and the remote site. I decided to manually set $url for the remote site to ‘http://in8sworld.net/notes’, and try that. Hooray! Everything started working again. I’m almost positive the empty setting used to work on the remote site, so I’m thinking something must have changed on the server.
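Here is my best guess at why the fix works, written as a tiny Python sketch rather than the real Perl. The big assumption is that when $url is empty the script falls back to whatever address was requested and builds links from there; I have not confirmed that this is exactly what happens on the server, and both function names below are invented for the illustration.

```python
def base_for_request(requested_url, configured_url=""):
    """Pick the base for links: the configured URL if set, otherwise the
    requested address (assumed fallback, not confirmed from the script)."""
    return configured_url or requested_url


def build_link(base_url, category):
    """Join a base URL and a category with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + category.strip("/")


current_page = "http://in8sworld.net/notes/computers"

# Empty setting: the link is built on top of the page you are already on.
broken = build_link(base_for_request(current_page), "fun")

# Fixed setting: the link always starts from the top of the weblog.
fixed = build_link(base_for_request(current_page, "http://in8sworld.net/notes"), "fun")

print(broken)  # http://in8sworld.net/notes/computers/fun
print(fixed)   # http://in8sworld.net/notes/fun
```

Pinning the base URL means the links no longer depend on which page the visitor happens to be viewing.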

In any case, I have the full range of Omni-Explorer’s IPs blocked, a deny rule in my robots.txt file, and a fixed-up Blosxom script. Hopefully you won’t be greeted by any more ‘account suspended’ messages upon visiting here, but if you are - perhaps next time I’ll solicit help from you guys first.
