This article explains how to manage and monitor a network effectively.
What is Network Management?
Network management is the maintenance and operation of a network. While this can mean many things, the main idea behind network management is uptime. The whole purpose of a network is to provide a reliable resource for users to interact with. How a network is managed varies with its size and purpose, and different types of networks require different management plans. The following sections discuss "standard" network management techniques, which may need modification depending on your network.
As a network administrator or engineer, your main concern should be network uptime, with network performance a close second. There are a few things you should be aware of with respect to uptime and performance:
- Node Availability
- Traffic Shaping
Node availability should be a priority, especially when managing a WAN or a LAN with VPN connections. It is usually only a problem on larger networks, where DNS can also cause trouble. Remember: the more complicated the network, the more opportunities for problems. Keep an eye on your "core nodes," which are usually feeding interfaces or "backbone interfaces" that provide connectivity to a portion or department of your network. These should have a monitoring system in place that pings them continuously. This matters because such monitoring can detect anomalies, such as lost replies or rising response times, that warn you of trouble before it turns into downtime.
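A dedicated monitoring tool is the usual way to do this, but the core decision logic is simple enough to sketch. The following assumes a `probe` function that returns a round-trip time in milliseconds, or `None` when a ping times out; in practice it might wrap your system's `ping` command. The `LinkMonitor` class, the three-loss threshold, and the "3x the recent median" latency rule are illustrative choices, not a standard.

```python
from collections import deque
from statistics import median
from typing import Callable, Deque, List, Optional


class LinkMonitor:
    """Tracks ping results for one core node and flags warning signs.

    Hypothetical thresholds: alert after `max_losses` consecutive timeouts,
    or when a reply is `slow_factor` times slower than the recent median.
    """

    def __init__(self, max_losses: int = 3, slow_factor: float = 3.0, window: int = 20):
        self.max_losses = max_losses
        self.slow_factor = slow_factor
        self.history: Deque[float] = deque(maxlen=window)  # recent successful RTTs (ms)
        self.consecutive_losses = 0

    def record(self, rtt_ms: Optional[float]) -> Optional[str]:
        """Record one probe result (None = no reply) and return an alert string, if any."""
        if rtt_ms is None:
            self.consecutive_losses += 1
            if self.consecutive_losses >= self.max_losses:
                return f"DOWN: {self.consecutive_losses} consecutive pings lost"
            return None

        self.consecutive_losses = 0
        alert = None
        # Compare against the median of earlier replies so a single spike does not skew the baseline.
        if len(self.history) >= 5:
            baseline = median(self.history)
            if rtt_ms > self.slow_factor * baseline:
                alert = f"SLOW: {rtt_ms:.1f} ms vs baseline {baseline:.1f} ms"
        self.history.append(rtt_ms)
        return alert


def run_checks(probe: Callable[[], Optional[float]], monitor: LinkMonitor, count: int) -> List[str]:
    """Call `probe` `count` times and collect any alerts the monitor raises."""
    alerts = []
    for _ in range(count):
        alert = monitor.record(probe())
        if alert:
            alerts.append(alert)
    return alerts
```

Alerting on rising latency as well as on outright loss is what lets this kind of monitor catch the anomalies that tend to precede downtime.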
Bottlenecks are a common problem in networks and usually occur when a large amount of traffic is sent to and from a server, router, or switch. This is such a common problem that most servers now come with at least two, sometimes three, network interfaces. Combined with VLANs and network segmentation, multiple network interfaces are an easy way to prevent bottlenecks. However, bottlenecks can come from other sources too. For instance, the maximum line speed of a full T1 is 1.5 Mbps (1.544 Mbps nominal), which works out to roughly 193 kilobytes per second (KBps). True story: someone emailed a 50 MB Excel spreadsheet to the entire office of 55 people. Since everyone on the network had Outlook set to check their inbox every minute, the router was requesting 55 copies of that 50 MB file all at the same time. See the traffic graph below:
As you can see, the highlighted red section shows the receiving side of the T1 line completely maxed out. This creates a bottleneck for any legitimate traffic and for other servers sharing the same T1 line. Needless to say, it was a pain in the ass to let all the users know what happened and how to fix it.
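The arithmetic behind this story is worth spelling out. The sketch below assumes the nominal 1.544 Mbps T1 rate, decimal megabytes (1 MB = 1,000,000 bytes), and no protocol overhead, so real transfers would take somewhat longer.

```python
T1_BPS = 1_544_000  # nominal full T1 line rate, in bits per second


def transfer_seconds(total_bytes: float, link_bps: float) -> float:
    """Ideal time to move total_bytes over a link, ignoring protocol overhead."""
    return total_bytes * 8 / link_bps


attachment_bytes = 50 * 1_000_000  # one 50 MB spreadsheet (decimal MB assumed)
recipients = 55

print(f"T1 throughput: {T1_BPS / 8 / 1000:.0f} KB/s")  # 193 KB/s
one_copy = transfer_seconds(attachment_bytes, T1_BPS)
all_copies = transfer_seconds(attachment_bytes * recipients, T1_BPS)
print(f"One copy: {one_copy / 60:.1f} minutes")  # 4.3 minutes
print(f"{recipients} copies: {all_copies / 3600:.1f} hours")  # 4.0 hours
```

Even under ideal conditions, 55 copies of the attachment would tie up the line for about four hours.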
Traffic shaping is a method used to allocate different amounts of bandwidth for different purposes. It can be used to:
- Dedicate bandwidth to servers
- Cap download/upload rates of clients that may be using P2P software
- Throttle bandwidth used for rsync/backup data transfers
Whether or not you choose to shape traffic, it is a good tool to consider if you are having trouble with bottlenecks caused by end-user activity. You may also want to review your web history logs for bandwidth-hungry applications, such as streaming audio and video, that may be using up your valuable bandwidth. Traffic shaping is an easy way to discourage users from using these services without blocking them completely.
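Routers and firewalls implement shaping internally, and a common building block is the token bucket: tokens (here, bytes) accumulate at the configured rate up to a burst limit, and traffic may only be sent while tokens are available. The following minimal Python sketch illustrates the idea; the `TokenBucket` class and its parameter names are illustrative and do not correspond to any particular router's configuration.

```python
class TokenBucket:
    """Token-bucket limiter: bursts up to `capacity` bytes, sustained `rate` bytes/s."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens (bytes) added per second
        self.capacity = capacity  # maximum burst size, in bytes
        self.tokens = capacity    # start full so an idle link can burst immediately
        self.last = 0.0           # timestamp of the last update, in seconds

    def allow(self, size: int, now: float) -> bool:
        """Return True if a packet of `size` bytes may be sent at time `now`."""
        # Refill tokens for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False  # not enough tokens: hold (shape) or drop (police) this packet
```

A true shaper queues traffic that arrives when the bucket is empty and sends it later, whereas a policer simply drops it; this sketch only makes the allow/deny decision that both rely on.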
Traffic shaping can also be a good idea if you are going to send a large amount of data (FTP, rsync, SMB) and don't want to bog down your network. This is usually only an issue if your bandwidth is in use around the clock, for example, when you are running a web server.
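Many transfer tools can throttle themselves (rsync, for example, has a `--bwlimit` option). Where a tool lacks such an option, the same idea can be sketched in Python: copy in chunks and sleep whenever the transfer gets ahead of the target rate. The `throttled_copy` function and its chunk size are hypothetical; `clock` and `sleep` are parameters only so the logic can be tested without real delays.

```python
import io
import time
from typing import BinaryIO, Callable


def throttled_copy(src: BinaryIO, dst: BinaryIO, max_bytes_per_sec: float,
                   chunk_size: int = 16 * 1024,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Copy src to dst, pausing so the average rate stays at or below max_bytes_per_sec.

    Returns the number of bytes copied. `clock` and `sleep` are injectable for testing.
    """
    start = clock()
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return sent
        dst.write(chunk)
        sent += len(chunk)
        expected = sent / max_bytes_per_sec  # seconds this much data should take at the cap
        elapsed = clock() - start
        if expected > elapsed:
            sleep(expected - elapsed)  # ahead of schedule, so wait


if __name__ == "__main__":
    # Copy 20,000 bytes in memory at a cap of 100,000 bytes/s (takes about 0.2 s).
    source, target = io.BytesIO(b"x" * 20_000), io.BytesIO()
    print(throttled_copy(source, target, max_bytes_per_sec=100_000), "bytes copied")
```

Pacing against the total elapsed time, rather than sleeping a fixed amount per chunk, keeps the average rate on target even when individual writes are slow.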
Watch Your Router
You should be in the habit of checking your router's traffic graphs daily. Doing so will help you gauge the "usual" amount of traffic on your network, which makes spotting something fishy much easier. See the graph below:
As you can see, weeks 50, 51, 52, and 1 were fairly consistent, with a steady pattern of upload traffic (blue lines) and download traffic (green bars). Week 2 shows a large jump in upload traffic. After I noticed this, I did some digging through the routing logs and found that someone on the network was in fact using P2P software. That someone was me ;)
However, even if the system administrator hadn't been the one using P2P software, this would be a red flag. Personally, my instincts would tell me that a workstation had probably been "owned" by a trojan or similar malware and was taking part in a DDoS attack. On the other hand, it could just be someone uploading a 50 MB Excel file; you never know...
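The "does this week look normal?" judgment can be made less subjective by comparing each week against the weeks before it. This is a minimal sketch, assuming you can export weekly traffic totals from your router as (label, value) pairs; the `flag_unusual_weeks` function and the three-standard-deviation threshold are illustrative assumptions, not a standard.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def flag_unusual_weeks(totals: Sequence[Tuple[str, float]], k: float = 3.0,
                       min_history: int = 4) -> List[str]:
    """Return labels of weeks whose traffic exceeds the earlier weeks' mean by more than k std devs.

    `totals` is a chronological list of (week_label, traffic) pairs, e.g. weekly upload volume.
    """
    min_history = max(min_history, 2)  # stdev() needs at least two data points
    flagged = []
    for i in range(min_history, len(totals)):
        # Baseline = every week before week i (flagged weeks included, to keep the sketch simple).
        history = [value for _, value in totals[:i]]
        baseline, spread = mean(history), stdev(history)
        label, value = totals[i]
        if value > baseline + k * spread:
            flagged.append(label)
    return flagged
```

Because flagged weeks stay in the baseline, a sustained increase will eventually start to look normal; excluding flagged weeks from the history is a reasonable refinement.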
Another thing you should keep an eye on is the number of open connections to your network. You'd be surprised how drone-like end users are in an office. As the traffic patterns above show, there isn't much variation. See the graph below:
The graphs above show the active connections to the network, both daily and weekly. By studying the pattern, you can see how busy your end users are. Notice the number and duration of connections each day. The weekly graph shows that everyone had a few days off. The small bumps on the weekends are either people working weekends or system maintenance by people like me ;)
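If your router or firewall can export a log of new connections with timestamps, the daily pattern can be reproduced with a few lines of Python. This minimal sketch assumes one ISO 8601 timestamp per connection (for example `2024-01-01T09:15:00`); the log format and the `connections_per_weekday` function are hypothetical, since every device exports logs differently. It counts connections only, not their duration.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def connections_per_weekday(timestamps: Iterable[str]) -> Dict[str, int]:
    """Count new connections per weekday from ISO 8601 timestamp strings."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    counts = Counter(DAYS[datetime.fromisoformat(ts).weekday()] for ts in timestamps)
    # Include every day, so days with no connections (weekends, holidays) show up as 0.
    return {day: counts[day] for day in DAYS}
```

Run over a week or two of logs, the weekday totals should mirror the shape of the graphs: busy weekdays, quiet weekends, and near-zero days when the office is closed.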
It really is amazing to me how much better a T1 line is than a cable or DSL line. It's no secret that the quality of service (QoS) is better on a fully digital line like a T1, but it comes at a sacrifice: speed. My current home cable connection is 5 Mbps, which is more than three times faster than a T1 in terms of download speed. For those of you who don't know, a full T1 line uploads and downloads at or near 1.5 Mbps. While your cable/DSL company may claim you have a 4, 5, or 6 Mbps connection, that figure refers only to download speed. Upload speeds are usually capped at around 512 to 768 Kbps. Like I said, the QoS is much better on a T1 line. To illustrate my point, see the graphs below:
The first graph shows a router on my corporate LAN. At the time of this writing, it has been running for almost 78 days. The T1 is connected directly to the router, and nothing "freezes" or needs to be restarted unless I have a reason to. The second graph, on the other hand, shows my home cable connection. As you can see, my home router is usually rebooted at least once a week. This is because my connection usually drops at some point and I am forced to power cycle both my modem and router. It's even more fun when it happens in the middle of the night. I guess any mail sent to my Exchange server during that period is lost in cyberspace?
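The speed comparison between the T1 and the cable connection earlier in this section works out as follows. This sketch uses the nominal 1.544 Mbps T1 rate and takes 768 Kbps, the top of the quoted upload range, for cable; real-world throughput will be lower on both links.

```python
# Nominal link rates in kilobits per second (Kbps).
T1 = {"down": 1544, "up": 1544}    # symmetric: same rate in both directions
CABLE = {"down": 5000, "up": 768}  # asymmetric: 5 Mbps down, upload capped at 768 Kbps

for direction in ("down", "up"):
    ratio = CABLE[direction] / T1[direction]
    print(f"{direction:>4}: cable runs at {ratio:.2f}x the T1 rate")
# down: cable runs at 3.24x the T1 rate
#   up: cable runs at 0.50x the T1 rate
```

So while cable downloads are more than three times faster, uploads run at roughly half the T1 rate, which matters for anything hosted on-site, such as a mail server.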
In conclusion, you should now see the importance of properly monitoring your network. Keep a close eye on your routers and servers; these devices are usually directly involved in bottlenecks and network outages. Remember to follow a fairly routine troubleshooting process when evaluating network outages. Check prior traffic graphs and logs for anything out of the ordinary. Asking end users when they last worked on a certain server can help you isolate problems that may only affect one side of the building or a certain department. Never rule out wiring faults; it happens, and cables do go bad. Happy monitoring ;)