Complicated Linux Environment #3 - The Busy Website
By Erik Rodriguez
Tags: Apache webserver, Apache Log, NFS, Round Robin DNS, Dell Poweredge 2650,
Netscreen Firewall, Server Cluster, AWstats
This article contains information about my actual experiences with overly-complicated Linux environments. The previous articles in the series cover Complicated Linux Environment #1 and Complicated Linux Environment #2.
As a consultant, I have had the unfortunate experience of dealing with overly-complicated Linux environments. The previous complicated Linux environment I discussed was a website "cluster" running some overly-complicated configurations at the request of an over-zealous CTO. This instance covers load mitigation in an existing web environment. I worked as a consultant on this project on and off in 2008 and 2009.
The third environment was not nearly as complicated as the first two. The real problem was the lack of budget for a site that obviously generated a mountain of cash. It was a popular college sports site that at times received very large amounts of traffic. I am talking upwards of 40,000 visits per day. After a big game or championship, it was common for the site to be flooded with requests (over 200,000) and stop responding.
As a consultant, my task was to eliminate the bottlenecks so the capacity issues would no longer be a problem. The servers were colocated in a data center and had plenty of bandwidth available. During my initial discovery, I found that the firewall in front of these servers was a Juniper Netscreen 5GT. That is not the best choice for a busy site, as the device can only support 2,000 concurrent sessions. Adjusting the session timeout would do no good, as we estimated that during a busy time there were anywhere from 1,500 to 10,000 requests.
Here is where things started to get tricky. Despite the amount of traffic and the size of the organization, they had no budget for new equipment. I ended up hitting eBay and getting a used Juniper Netscreen 25, a much better firewall that allows up to 64,000 sessions. That solved the first problem. The servers in use at the time (mid 2008) were Dell Poweredge 2650s. They were decent servers in their day, but by that time Dell was sporting the 2950s with dual quad-core Xeon CPUs. The server did have 8GB of RAM and was running CentOS.
After replacing the firewall, requests were no longer bottlenecked there, and I ran into the next problem. Apache was simply overloaded with requests. I put in nearly every tweak I could to get more performance out of it, but at times it still could not process all requests and ended up leaving visitors with the famous "error 404, page cannot be displayed." The real way to tackle this problem would be a hardware (or even software) load balancer in front of two or more front-end web servers. The initial configuration was a single webserver with a private connection to a MySQL database server. I suggested purchasing another server and some sort of load balancer. I was approved to purchase an additional webserver but not a load balancer.
I ended up getting another Poweredge 2650 with the same specs as the first one. My load balancer request was denied with the famous "we would like to explore a cheaper option." I set up the second webserver next to the first one and created an A record called updates.domain.com, which they would use to FTP new content. I wanted them to use updates.domain.com because it pointed to the first webserver only. I then set up a cronjob that would sync the contents of the web directory to the second webserver every 5 minutes.
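The sync job described above can be sketched roughly as follows. This is a minimal illustration, not the actual job: the web root path and the second server's host name are hypothetical, and pushing from the first server with rsync's --delete option is an assumption (the article only says the directory was synced every 5 minutes).

```python
import shlex

# Hypothetical values; the article does not give the real path or host name.
WEB_ROOT = "/var/www/html/"
SECOND_SERVER = "web2.example.internal"


def build_sync_command(source_dir, dest_host, dest_dir):
    """Return the rsync argv that mirrors source_dir onto dest_host:dest_dir."""
    # A trailing slash on the source makes rsync copy the directory's
    # contents rather than the directory itself.
    source = source_dir.rstrip("/") + "/"
    # -a preserves permissions and times, -z compresses in transit,
    # --delete removes files on the target that no longer exist at the source
    # (assumed here so the second server stays an exact mirror).
    return ["rsync", "-az", "--delete", source, f"{dest_host}:{dest_dir}"]


def crontab_line(argv, every_minutes=5):
    """Format a crontab entry that runs argv every `every_minutes` minutes."""
    return f"*/{every_minutes} * * * * {shlex.join(argv)}"
```

Calling crontab_line(build_sync_command(WEB_ROOT, SECOND_SERVER, WEB_ROOT)) produces a line that could be pasted into the first server's crontab. Building the command as a list keeps the quoting correct if a path ever contains spaces.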
To attempt a load balance, I added a second A record for the domain name, which caused DNS to hand out the two addresses in round-robin fashion. This would split requests from different people between one server and the other, hopefully cutting requests to the first server in half. In reality, this ended up being about a 60/40 split. Visitors to the second webserver might see content delayed by up to 5 minutes until the rsync ran. Not a big deal. The diagram below illustrates my attempt to load balance using DNS:
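As a rough illustration of what round-robin DNS does, here is a toy model in Python. It is a sketch only: the class and function names are made up for this example, the addresses come from the reserved documentation range, and it assumes every client connects to the first address in the answer it receives.

```python
from collections import Counter, deque


class RoundRobinZone:
    """Toy model of a DNS name with several A records served in rotation."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def resolve(self):
        """Return the current record order, then rotate it for the next query."""
        answer = list(self._records)
        self._records.rotate(-1)  # move the first record to the end
        return answer


def simulate(zone, queries):
    """Count which address each query would connect to (first in the answer)."""
    hits = Counter()
    for _ in range(queries):
        hits[zone.resolve()[0]] += 1
    return hits
```

In this idealized model, simulate(RoundRobinZone(["192.0.2.10", "192.0.2.11"]), 10) sends exactly half of the queries to each address. Real traffic does not split that cleanly, because resolvers cache answers and each visitor generates a different number of requests, which is consistent with the uneven 60/40 result.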
Apache log files
The Apache log files grew quickly on these boxes because of all the requests. They had to be rotated nightly or the drives would fill up and Apache would die. This became even more complicated when the customer required AWStats. I tried to get them to switch to Google Analytics. I got shot down on that idea, and they insisted on keeping AWStats. The problem was that AWStats processes the Apache log file with a Perl script to generate its stats. That usually takes a long time, depending on the size of the log, and a good amount of CPU power. As traffic increased, I had to adjust the time of stat generation because it needed to complete before the log rotation occurred.
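One way to avoid depending on the clock for this ordering is to chain the two jobs so that rotation only starts once stat generation has finished successfully. The following is a minimal sketch of that idea, not what was actually deployed: the run_in_order helper, the AWStats path, the config name, and the use of logrotate are all assumptions made for illustration.

```python
import subprocess


def run_in_order(steps):
    """Run each (name, argv) step in turn and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, argv in steps:
        result = subprocess.run(argv)
        if result.returncode != 0:
            break  # never rotate a log that was not fully processed
        completed.append(name)
    return completed


# Hypothetical commands and paths; adjust to the real installation.
EXAMPLE_STEPS = [
    ("stats", ["perl", "/usr/local/awstats/wwwroot/cgi-bin/awstats.pl",
               "-config=example.com", "-update"]),
    ("rotate", ["logrotate", "--force", "/etc/logrotate.d/httpd"]),
]
```

A single nightly cron entry could then call run_in_order(EXAMPLE_STEPS). However long the stats run takes as traffic grows, rotation cannot begin until it has completed.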
Here is where it became even more batty. Since requests were now split across two different servers, I would have to combine the two log files into a single log and point AWStats at that file. I ended up configuring NFS on the second server. A cronjob was used to copy the Apache log file from the second server to a secondary partition on the first one. Another cronjob was used to fire off a script that combined the logs for processing. So this was basically a 3 step process, but everything had to be timed perfectly. One job could not start before the previous one was complete or it would mess everything up.
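The combining step might look something like the sketch below, which interleaves two access logs by request time so the stats tool sees one chronological file. This is an illustration rather than the original script: it assumes the logs use Apache's common or combined format, that each file is already in time order, and that month names parse under the default C locale. AWStats also ships a logresolvemerge.pl tool for this kind of job; the article does not say whether it was used.

```python
import heapq
import re
from datetime import datetime

# Matches the bracketed timestamp in Apache common/combined log format,
# e.g. [10/Oct/2008:13:55:36 -0400]
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def log_time(line):
    """Parse the request time from one access-log line."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*logs):
    """Lazily merge already-sorted iterables of log lines into time order."""
    return heapq.merge(*logs, key=log_time)


def merge_log_files(paths, out_path):
    """Merge the access logs at `paths` into a single file at `out_path`."""
    files = [open(p, encoding="utf-8", errors="replace") for p in paths]
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            out.writelines(merge_logs(*files))
    finally:
        for f in files:
            f.close()
```

Because heapq.merge reads its inputs lazily, the merge never holds a whole log in memory, which matters when the files are large enough to fill a drive.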
In the end, the site ended up moving to a large cluster used by ESPN for a group of other sports sites.