Inheriting Linux Server Administration Duties

By Erik Rodriguez

This article provides information about inheriting Linux server administration duties. It discusses methodology and things to watch out for in detail.

Introduction

Taking over administration of a Linux server can be challenging. Compared to a Windows environment, the discovery process can be much more difficult, and overlooking certain things can lead to problems or unexpected outages. The following sections provide a thorough list of discovery tactics to ensure nothing is missed when taking over administration of one or more Linux servers.

More often than not, you will be required to take over Linux administration because the last administrator left or was fired. This can go one of two ways: smoothly, if the previous admin knew their stuff and left on good terms, or badly, if they had no clue or left on bad terms. It may even be a mix of both. Whatever the case may be, ask the rest of the staff as many questions as you can. They may have key information that gives you clues on what to look at first. For example, they may tell you that whenever a specific Linux box was down, nobody could get online. This could indicate the server is handing out DHCP leases, performing firewall/NAT functions, or providing DNS. Of course, you should be able to determine these things through other methods, but it's often easiest to gather such clues this way.

We all like to think we are the most brilliant IT minds, but in reality we probably are not. That being said, do not underestimate the value of a checklist. The checklist should include basic information like IP address, hostname, etc. It should also include detailed notes such as procedures for starting and stopping certain daemons, dependencies on other servers, and other details that are critical to normal operation.

Discovery

Here is some basic information that should be included in a discovery:

Server name:

IP address or FQDN:

Distro/version:

Backup status:

Physical location:

Number of network cards:

Network connectivity/port numbers:

Virtual server:

Server make/model:

Server role:

Server name is obviously an important detail. The server name may be used by client machines to map resources such as network drives, databases, and other critical operations.

The server's IP address (or addresses) is important. Multiple IP addresses could indicate a web server (Apache) hosting several sites with IP-based virtual hosts. Multiple IPs and/or network cards may also be required for different VLANs. For those who are not familiar, I have included instructions for how to find the IP settings in Linux.

One of the worst things about inheriting a Linux server is working with a distribution you are not familiar with. While I am primarily a Red Hat/CentOS admin, I have inherited nearly every distribution under the sun, and I have worked on everything from Slackware Linux to FreeBSD (strictly a BSD UNIX rather than a Linux distribution). While FreeBSD was the hardest for me to work with, if you have a fundamental understanding of Linux/UNIX operating systems, it isn't that difficult to get up to speed.
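For the IP settings mentioned above, here is a minimal Python sketch (Python 3.7+). It assumes the iproute2 `ip` command is installed (very old servers may only have `ifconfig`) and simply groups the output of `ip -o addr show` by interface, so several addresses on one interface stand out. The function names are illustrative, not part of any existing tool.

```python
import subprocess
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_ip_addr(output: str) -> Dict[str, List[Tuple[str, str]]]:
    """Group `ip -o addr show` output as {interface: [(family, address/prefix), ...]}."""
    addresses = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        # One-line format: "<index>: <ifname> <family> <address>/<prefix> ..."
        if len(fields) < 4 or fields[2] not in ("inet", "inet6"):
            continue
        ifname = fields[1].split("@")[0]  # drop any "@parent" suffix
        addresses[ifname].append((fields[2], fields[3]))
    return dict(addresses)


def current_addresses() -> Dict[str, List[Tuple[str, str]]]:
    # Assumes iproute2's `ip` command is available on this server.
    result = subprocess.run(["ip", "-o", "addr", "show"],
                            capture_output=True, text=True, check=True)
    return parse_ip_addr(result.stdout)


if __name__ == "__main__":
    try:
        for ifname, addrs in current_addresses().items():
            for family, addr in addrs:
                print(f"{ifname:<12} {family:<6} {addr}")
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run `ip`: {exc}")
```

Record every address it reports in the checklist, even ones that look unused; a secondary address is often the first hint of an extra role.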

One of the toughest things to do is work with a Linux server that has an EOL operating system. End-of-life (EOL) operating systems are those that no longer receive updates or patches from the vendor. This is a common scenario with servers running Fedora Core or SuSE (Novell) Linux. These distributions have very short release cycles, meaning each release is only "officially supported" for a short time, often two years or less. After the vendor drops support and declares the OS EOL, the server becomes a ticking time bomb, because newly discovered security vulnerabilities will never be patched.
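Before you can judge whether a server is EOL, you need its exact distribution and release. A minimal Python sketch, assuming the standard release files: /etc/os-release only exists on newer distributions, so the code falls back to older files such as /etc/redhat-release and /etc/SuSE-release, which hold a single line of text. The file list is an assumption and not exhaustive. (Python 3.10 and later also offer `platform.freedesktop_os_release()`, but inherited servers often run older interpreters.)

```python
import os
from typing import Dict, Sequence

# Checked in order. /etc/os-release only exists on newer distributions,
# so the older single-line release files are kept as fallbacks.
RELEASE_FILES = (
    "/etc/os-release",
    "/etc/redhat-release",     # Red Hat, CentOS, Fedora
    "/etc/SuSE-release",       # older SuSE / openSUSE
    "/etc/slackware-version",
    "/etc/debian_version",     # Debian; holds only the version number
)


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines from /etc/os-release, stripping optional quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        info[key.strip()] = value
    return info


def describe_distro(files: Sequence[str] = RELEASE_FILES) -> str:
    for path in files:
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        if os.path.basename(path) == "os-release":
            info = parse_os_release(text)
            return info.get("PRETTY_NAME") or info.get("NAME", "unknown")
        return f"{path}: {text.strip()}"  # single human-readable line
    return "unknown"


if __name__ == "__main__":
    print(describe_distro())
```

Compare the result against the vendor's published lifecycle to see how far past EOL the server is.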

Aging servers and servers running out of disk space are also ticking time bombs, especially when you are just taking over their administration. It is best to assume there are no backups of the data, so delete data with EXTREME caution. Log files for certain daemons can grow very large depending on how busy the daemons are, so take care when modifying log rotation scripts. Deleting a log file incorrectly can cause problems: a daemon such as Apache keeps writing to the deleted file, so the disk space is not freed until the daemon is restarted or reloaded, and if the log directory itself is removed, Apache will refuse to start. Rotate logs instead of deleting them. Remember that Linux has no built-in undelete feature. Recovery tools exist for some file systems, but they are unreliable, and for this reason backups are even more critical on Linux servers. The information below provides some good ways to determine whether and how backups are being performed.
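The deleted-log problem above is easy to check for. This Python sketch (an illustration, roughly what `lsof +L1` reports) walks /proc and lists files that were deleted while a process still holds them open, along with their size; that space is not returned to the file system until the process closes the file. It assumes a Linux /proc and needs root to see other users' processes; the function names are hypothetical.

```python
import os
from typing import List, Optional, Tuple

DELETED_SUFFIX = " (deleted)"  # appended by the kernel to fd links of unlinked files


def deleted_target(link: str) -> Optional[str]:
    """Return the original path if an fd link points at a deleted file, else None."""
    if link.endswith(DELETED_SUFFIX):
        return link[: -len(DELETED_SUFFIX)]
    return None


def find_deleted_open_files(proc_root: str = "/proc") -> List[Tuple[int, str, int]]:
    """Return (pid, original path, size in bytes) for deleted files still held open."""
    results = []
    try:
        pids = sorted(int(name) for name in os.listdir(proc_root) if name.isdigit())
    except OSError:
        return results  # no Linux-style /proc here
    for pid in pids:
        fd_dir = os.path.join(proc_root, str(pid), "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # other users' processes need root; processes can also exit mid-scan
        for fd in fds:
            fd_path = os.path.join(fd_dir, fd)
            try:
                path = deleted_target(os.readlink(fd_path))
                if path is not None:
                    # stat() follows the /proc link to the still-allocated inode.
                    results.append((pid, path, os.stat(fd_path).st_size))
            except OSError:
                continue
    return results


if __name__ == "__main__":
    for pid, path, size in find_deleted_open_files():
        print(f"pid {pid:>7}  {size:>12} bytes  {path}")
```

A large entry under a log directory usually means someone ran `rm` on a live log; a graceful reload of that daemon releases the space.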

Backups in Linux are not as easy as traditional Windows backups. There is not as much software available, and most of it is much harder to configure. Symantec Backup Exec and R1Soft CDP Server are the most common mainstream backup titles available for Linux. Both products have Linux agents, which include plugins for things like databases and work extremely well. Both of these agents run as services, so you should be able to find them using the following methods:

Look for the agent in the list of running daemons (Backup Exec runs as be.agent or VRTSralus).

Use netstat to find a listening port associated with the backup software.
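Both checks above can also be scripted. This Python sketch reads process names from /proc/<pid>/comm and listening TCP ports from /proc/net/tcp and /proc/net/tcp6 (the same information `netstat -ltn` shows). The keyword list is an assumption: be.agent and VRTSralus come from the text above, and beremote (Backup Exec's Linux agent), cdp (R1Soft), and bacula-fd (Bacula) are common agent process names; adjust it for whatever software you suspect. Backup Exec's agent has traditionally listened on TCP port 10000 (NDMP), which is worth looking for in the port list.

```python
import os
from typing import Iterable, List, Sequence

# Example substrings only (assumptions): be.agent / VRTSralus come from the text above;
# beremote (Backup Exec RALUS), cdp (R1Soft) and bacula (bacula-fd) are common agent names.
BACKUP_KEYWORDS = ("be.agent", "ralus", "beremote", "cdp", "bacula")
TCP_LISTEN = "0A"  # LISTEN state code in /proc/net/tcp


def listening_ports(proc_net_tcp: str) -> List[int]:
    """Listening TCP ports from the text of /proc/net/tcp or /proc/net/tcp6."""
    ports = set()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is a header
        fields = line.split()
        if len(fields) > 3 and fields[3] == TCP_LISTEN:
            ports.add(int(fields[1].rsplit(":", 1)[1], 16))  # port is hexadecimal
    return sorted(ports)


def matching_processes(names: Iterable[str], keywords: Sequence[str] = BACKUP_KEYWORDS) -> List[str]:
    """Process names containing any keyword (case-insensitive)."""
    return sorted({n for n in names if any(k.lower() in n.lower() for k in keywords)})


def running_process_names(proc_root: str = "/proc") -> List[str]:
    try:
        pids = [p for p in os.listdir(proc_root) if p.isdigit()]
    except OSError:
        return []
    names = []
    for pid in pids:
        try:
            with open(os.path.join(proc_root, pid, "comm")) as fh:
                names.append(fh.read().strip())  # comm is truncated to 15 characters
        except OSError:
            continue  # process exited during the scan
    return names


if __name__ == "__main__":
    print("Possible backup agents:", matching_processes(running_process_names()))
    for table in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(table):
            with open(table) as fh:
                print(table, "listening ports:", listening_ports(fh.read()))
```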

If not using mainstream software, Linux servers may be using something like Bacula or even a manual script that zips or tars a group of files and copies them somewhere. The script may be simple, or it may be complicated and move or delete old copies of the data. Whatever the case may be, these backups are almost always scheduled as cron jobs. For those who are not familiar, I have included instructions for how to view and edit cron jobs in Linux.
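Since these jobs almost always live in cron, a quick sweep of the usual crontab locations can surface them. The Python sketch below is one way to do it under stated assumptions: the paths cover the Red Hat (/var/spool/cron/<user>) and Debian (/var/spool/cron/crontabs/<user>) layouts, and the regular expression that flags "backup-like" commands is only a heuristic. Scripts placed in /etc/cron.daily and similar directories run without a crontab line of their own, so read those separately.

```python
import glob
import os
import re
from typing import Dict, List, Sequence

# Red Hat keeps user crontabs in /var/spool/cron/<user>; Debian/Ubuntu use
# /var/spool/cron/crontabs/<user>. System-wide jobs live in /etc/crontab and /etc/cron.d.
CRON_PATTERNS = (
    "/etc/crontab",
    "/etc/cron.d/*",
    "/var/spool/cron/*",
    "/var/spool/cron/crontabs/*",
)
# Heuristic only: words that often appear in backup commands.
BACKUP_HINTS = re.compile(r"\btar\b|rsync|backup|dump|zip|bacula", re.IGNORECASE)


def active_cron_lines(text: str) -> List[str]:
    """Non-blank, non-comment lines (schedules plus settings such as MAILTO=)."""
    stripped = (line.strip() for line in text.splitlines())
    return [line for line in stripped if line and not line.startswith("#")]


def backup_like(lines: Sequence[str]) -> List[str]:
    return [line for line in lines if BACKUP_HINTS.search(line)]


def collect_cron_jobs(patterns: Sequence[str] = CRON_PATTERNS) -> Dict[str, List[str]]:
    jobs = {}
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if not os.path.isfile(path):
                continue  # e.g. the crontabs/ directory matched by /var/spool/cron/*
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    jobs[path] = active_cron_lines(fh.read())
            except OSError as exc:
                jobs[path] = [f"<unreadable: {exc.strerror}; run as root>"]
    return jobs


if __name__ == "__main__":
    for path, lines in collect_cron_jobs().items():
        for line in lines:
            flag = "BACKUP?" if BACKUP_HINTS.search(line) else "       "
            print(f"{flag} {path}: {line}")
```

Run it as root; /var/spool/cron is normally not readable by other users.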

Physical location is important, as you may need to physically reboot the server or attach thumb drives, external drives, and other devices. The server will probably be located within the office; however, it is not uncommon for servers to be located in remote offices or data centers. If the server is located in a data center, make sure you are added to the access list. It is much easier to fix problems on a server in a data center when you already have access. There have been situations where I was sent to a data center and had to wait over an hour for credentials to access the servers; how long this takes depends on the type of data center and its access procedures.

Network connectivity can provide some important clues. For instance, if the server is connected to a managed switch from Cisco, Juniper, or another enterprise vendor, it may be on a special VLAN. Linux can handle 802.1Q VLAN tags itself (through the 8021q kernel module). However, most traditional network administrators assign VLAN membership by switch port. If the server has HBAs or Fibre Channel cards, it may have iSCSI or Fibre Channel volumes mounted. This kind of network storage is generally very large and is used for things like mailstores, databases, or backups. Documenting switch port numbers can be an important part of troubleshooting later, because your network administrator or data center may not have proper documentation. Therefore, take notes, so that if a problem arises you can say something like "my server is plugged into a switch labeled EWR-34, port 31." Most colocation facilities keep track of server information by IP address and/or customer name.
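A few kernel interfaces answer the VLAN and storage questions above without touching the switch. This Python sketch lists network interfaces from /sys/class/net, parses /proc/net/vlan/config (present only when the 8021q module is loaded), and checks /sys/class/fc_host and /sys/class/iscsi_session for Fibre Channel HBAs and iSCSI sessions. Treat it as a starting point: an empty result only means the corresponding driver or feature was not found.

```python
import os
from typing import List, Tuple


def parse_vlan_config(text: str) -> List[Tuple[str, int, str]]:
    """Parse /proc/net/vlan/config into (vlan device, VLAN ID, parent device) tuples."""
    vlans = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Data rows look like "eth0.100 | 100 | eth0"; the header row has only two fields.
        if len(fields) == 3 and fields[1].isdigit():
            vlans.append((fields[0], int(fields[1]), fields[2]))
    return vlans


def list_dir(path: str) -> List[str]:
    try:
        return sorted(os.listdir(path))
    except OSError:
        return []  # missing directory: driver or feature not present


if __name__ == "__main__":
    print("Interfaces:", list_dir("/sys/class/net"))
    try:
        with open("/proc/net/vlan/config") as fh:
            print("VLANs:", parse_vlan_config(fh.read()))
    except OSError:
        print("VLANs: not available (8021q module probably not loaded)")
    print("Fibre Channel HBAs:", list_dir("/sys/class/fc_host"))
    print("iSCSI sessions:", list_dir("/sys/class/iscsi_session"))
```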

Is the Linux server virtual? If so, you are probably in for even more work. Virtual servers running on platforms like VMware or Xen may have virtual disks, virtual switches, and other components that add complexity to your administrative tasks. Chances are, if your Linux server is virtual, there are also other virtual servers on the same host. It is fairly common to have a mix of Windows and Linux servers on the same virtual platform. Unfortunately, because Linux uses otherwise idle memory for its disk cache, a Linux guest will "use" more memory from the virtual platform even when idle, while Windows servers have traditionally touched memory only when it is needed. This can ultimately mean you are able to run fewer Linux guests than Windows guests on the same virtual platform.
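If nobody can tell you whether the server is virtual, the hardware strings the hypervisor presents usually give it away. This Python sketch is a heuristic, not a definitive test: it matches the DMI vendor and product strings in /sys/class/dmi/id against a small, non-exhaustive list of hypervisor names and falls back to the `hypervisor` CPU flag in /proc/cpuinfo. Paravirtualized Xen guests may expose no DMI data at all. On systemd-based distributions, `systemd-detect-virt` does the same job more thoroughly.

```python
from typing import Optional

# Heuristic DMI vendor/product substrings for common hypervisors (not exhaustive).
DMI_HINTS = (
    ("vmware", "VMware"),
    ("xen", "Xen"),
    ("qemu", "KVM/QEMU"),
    ("kvm", "KVM/QEMU"),
    ("virtualbox", "VirtualBox"),
    ("innotek", "VirtualBox"),
    ("microsoft corporation", "Hyper-V"),
)


def guess_hypervisor(sys_vendor: str, product_name: str, flags: str) -> Optional[str]:
    """Best-guess hypervisor name, 'unknown hypervisor', or None for likely bare metal."""
    text = f"{sys_vendor} {product_name}".lower()
    for needle, name in DMI_HINTS:
        if needle in text:
            return name
    # Most hypervisors set the 'hypervisor' CPU flag for x86 guests.
    if "hypervisor" in flags.split():
        return "unknown hypervisor"
    return None


def cpu_flags(cpuinfo: str) -> str:
    """Return the flags list from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo.splitlines():
        if line.startswith("flags"):
            return line.partition(":")[2]
    return ""


def read_text(path: str) -> str:
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    result = guess_hypervisor(
        read_text("/sys/class/dmi/id/sys_vendor"),
        read_text("/sys/class/dmi/id/product_name"),
        cpu_flags(read_text("/proc/cpuinfo")),
    )
    print(result or "no virtualization detected (likely physical)")
```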

Server make and model are good things to know. You may or may not find that information particularly useful day to day, but you may need it for warranty or hardware-related issues should they arise. Make sure you have the make, model number, and serial number documented. Dell refers to "service tags" instead of serial numbers, and you can look up a server or device by service tag to find its original hardware configuration.
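The make, model, and serial number can usually be read without opening the case. This Python sketch reads them from /sys/class/dmi/id (the same SMBIOS data that `dmidecode -s system-serial-number` reports); `product_serial` is normally readable only by root. On Dell servers the system serial number reported here is normally the service tag. The function name is illustrative.

```python
import os
from typing import Dict

DMI_DIR = "/sys/class/dmi/id"
# sys_vendor and product_name are world-readable; product_serial usually requires root.
FIELDS = ("sys_vendor", "product_name", "product_serial")


def hardware_identity(dmi_dir: str = DMI_DIR) -> Dict[str, str]:
    info = {}
    for field in FIELDS:
        try:
            with open(os.path.join(dmi_dir, field)) as fh:
                info[field] = fh.read().strip() or "unknown"
        except PermissionError:
            info[field] = "permission denied (run as root)"
        except OSError:
            info[field] = "unknown"  # no DMI data (e.g. some virtual or non-x86 systems)
    return info


if __name__ == "__main__":
    for field, value in hardware_identity().items():
        print(f"{field:<15} {value}")
```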

Server role can be a general statement. I usually write something like "web server," "file server," or another short description. If the server performs many different roles, describe them in detail in the following sections.

Additional Information

Is this server critical to business operations?
Is the server running any network functions?

Some servers are not deemed critical to business operations. For example, an intranet site where the marketing department views previous versions of its campaigns will not bring business to a grinding halt if it is down for a few hours. In fact, administrators often reboot non-critical servers during normal business hours.

If the server is running any network functions such as DNS or DHCP, it will affect the rest of the organization if it goes offline. Linux servers can also be used as proxy servers or firewalls. There are many open-source packages that turn a standard PC or server into a "firewall appliance." These appliances operate as routers/firewalls but can also act as file, web, and database servers. They are generally found in small businesses. The moral of this story is that when dealing with Linux, leave no stone unturned. It's much easier to spend an extra hour double-checking things than to spend several hours in the middle of the night after the server crashes.
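To check for the network functions above on the server itself, here is a rough Python sketch. It assumes default ports (53 for DNS, 67 for a DHCP server, 3128 for a Squid proxy), treats /proc/sys/net/ipv4/ip_forward = 1 as a sign the box routes traffic (router, firewall, or NAT), and reads bound sockets from /proc/net/tcp and /proc/net/udp. Services on non-default ports will be missed, so confirm anything it reports against the running daemons and firewall rules.

```python
from typing import Iterable, List

TCP_LISTEN = "0A"  # LISTEN state code in /proc/net/tcp
# Default ports only (assumption); services can be moved to other ports.
TCP_ROLES = {53: "DNS (TCP 53)", 3128: "Squid proxy (TCP 3128)"}
UDP_ROLES = {53: "DNS (UDP 53)", 67: "DHCP server (UDP 67)"}


def local_ports(proc_net_text: str, listen_only: bool) -> List[int]:
    """Local ports from /proc/net/tcp or /proc/net/udp text (hex 'address:port' in column 2)."""
    ports = set()
    for line in proc_net_text.splitlines()[1:]:  # skip header
        fields = line.split()
        if len(fields) < 4 or (listen_only and fields[3] != TCP_LISTEN):
            continue
        ports.add(int(fields[1].rsplit(":", 1)[1], 16))
    return sorted(ports)


def network_roles(ip_forward: str, tcp_ports: Iterable[int], udp_ports: Iterable[int]) -> List[str]:
    roles = []
    if ip_forward.strip() == "1":
        roles.append("IP forwarding enabled (router/firewall/NAT)")
    roles += [TCP_ROLES[p] for p in sorted(set(tcp_ports)) if p in TCP_ROLES]
    roles += [UDP_ROLES[p] for p in sorted(set(udp_ports)) if p in UDP_ROLES]
    return roles


def read_text(path: str) -> str:
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(network_roles(
        read_text("/proc/sys/net/ipv4/ip_forward"),
        local_ports(read_text("/proc/net/tcp"), listen_only=True),
        local_ports(read_text("/proc/net/udp"), listen_only=False),
    ))
```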