Handling IT Emergencies

© 2003 by Nick B. Nicholaou, all rights reserved
President, Ministry Business Services, Inc.
Reprinted from Church Business

Some networks live in a constant state of panic. They go from one emergency to another, sometimes with little time to catch one’s breath. Other networks seem to run without error, needing only to be restarted because of scheduled maintenance. Regardless of where you are in the network stability continuum, emergencies can hit and you need to have a plan… just in case.

Know Your Enemy
George Patton said that his success in World War II was, in part, because he had read a book on war strategy written by his enemy. The principle holds true in most areas of life: victory is more easily gained by understanding the strategic methods of your enemy.

What are the typical causes of system emergencies? In descending order, they are:

User Error. Most system problems are caused by users at all levels simply doing something they shouldn’t have. These fall into two categories:
- Unintentional. This covers the majority in ministry networks. Someone may have accidentally dragged and dropped a folder to a location they can’t find. Another may have sent dozens of print commands of a large report to a printer that was merely out of paper. Someone may have filled the network hard drive by trying to download The Internet!
- Intentional. These are typically disgruntled former team members or hackers who have found their way into your system. Neither is common, but both do happen.

We’ve found that training and good security can help minimize these. In ministries we have a steady flow of users that aren’t seasoned computer users, and so this is a constant challenge.

Hardware Failure. Computers are simply a collection of man-made parts that occasionally fail. Depending on your hardware strategy, this may happen either often or rarely. We’ve found that one of the best ways to minimize this issue is to buy right. Rather than looking for the least expensive computers, or those that are built locally, buying computers engineered for a corporate environment nearly eliminates hardware support issues. At the time of this writing (and for the last 3+ years) we recommend Dell. Our team has negotiated a group discount that you can participate in by calling 877/MBS-DELL and saying that you are part of the MBS agreement. They can even tell you what configurations we’re recommending! And anything they would have paid us for the referral is credited to you in the purchase price.

Software Problems. These are usually caused by:
- Software that isn’t designed to run on a network, or to be used the way your team is using it.
- Software that is purchased before it’s really ready. Many software providers today sell what the industry used to call beta versions. They do so figuring they can release service packs and patches over the Internet. Unfortunately, those who believe they must have the latest version before it’s been fixed pay the price. In effect, they become the software company’s R&D department. We always recommend waiting until the software has been in the marketplace long enough for the manufacturer to release a service pack that fixes those bugs and irritations that have been discovered since the software was released.
- Software that is improperly configured or that has a file that’s become corrupt.

Electrical and Other Outside Causes. These fall outside of the system strategy and design. They don’t happen often, but when they do they can be catastrophic to your network. Because they can and do happen, they must be planned for. This is where we’ll focus most of our attention in this article.

Wisdom says, “Spend budget dollars where they will have the greatest impact.” While we don’t recommend ignoring those items towards the bottom of the list, this does make the case for a training budget. Usually a system’s most neglected “module”, training can improve the proficiency, efficiency, and reliability of a network. And it’s not very expensive… especially when compared to the increased output quality and quantity it produces!

Managing Outside Causes by Preparation
The best way to protect a system is to prepare for quick recovery. Even though the hardware may be covered by a comprehensive warranty, the system will still need to be restored if a hard drive crashes.

Workstations. We recommend setting up workstations so they store all data to the file server’s hard drive(s). This accomplishes a few things:
- It makes backing up all data faster and easier because it’s all at the server. This provides a disaster recovery plan that is comprehensive and easy to work with.
- It makes it possible for users to login to any workstation and still have access to the files on which they depend.
- It makes local hard drives expendable. We recommend using a utility like Symantec’s Ghost to create images of local hard drives during setup. This allows the duplication of similar systems, while also providing a solution to use if a hard drive needs replacing. In 10-15 minutes the system can be fully up and running!
- We use the same strategy anytime something happens to a workstation that can’t be diagnosed in 10-15 minutes. Perhaps a program file has become corrupt or a virus has gotten through your defenses. Using something like Ghost makes workstation support simpler because you can overwrite the hard drive from the Ghost image file you used when setting up the system.
- Protect your desktops from power surges by replacing their surge protectors when you buy new computers. Surge protectors get weaker each time they get hit with a power spike. By replacing them every few years, you’ll provide reasonable protection for your desktop systems. (Notebook computers usually have surge protectors built into their power supply. Check with your manufacturer to be sure.)

Network Servers. We recommend doing a complete unattended backup of the file server, followed by a full compare, Monday through Friday nights. We also recommend:
- Taking a copy of the backup off-site every Monday morning, rotating tapes back on-site each week. This protects your ministry from a catastrophic building or site loss, so that data from within a week of a disaster is always available. (When the World Trade Center tragedy happened, many businesses that weren’t doing this went out of business because they lost all of their data.)
- Testing the backup monthly by restoring folders and files and testing them to ensure they restored correctly. In addition to making certain your backups are viable, this also helps you stay familiar with the restore process in case you need to restore something quickly.
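
One way to make the monthly restore test repeatable is to script the comparison. The sketch below is only an illustration, not a feature of any particular backup product: it assumes you have restored a folder to a scratch location and that the original files have not changed since the backup ran. The function names (file_digest, compare_restore) and the report layout are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_restore(original: Path, restored: Path) -> dict:
    """Compare each file under `original` with its counterpart under `restored`.

    Returns a report listing files missing from the restore, files whose
    contents differ, and a count of files that matched.
    """
    report = {"missing": [], "different": [], "matched": 0}
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        dst = restored / rel
        if not dst.is_file():
            report["missing"].append(rel.as_posix())
        elif file_digest(src) != file_digest(dst):
            report["different"].append(rel.as_posix())
        else:
            report["matched"] += 1
    return report
```

Calling compare_restore(Path("original_folder"), Path("restored_folder")) with your own two paths returns the report. Files that users edited after the backup ran will show up as "different", so this works best on folders that rarely change.
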
Protect your servers from electrical problems with proper Uninterruptible Power Supplies (UPSs). These should be able to keep your servers running long enough for a safe shutdown, and should be able to send a shutdown command to each server.

Cabling & Switches. Document where your cables run. This helps keep them from being accidentally cut by a well-meaning work crew using a backhoe on your property. Also, protect your switches with UPSs so they can continue to run beyond the time necessary for the servers to shut down. This is especially helpful when the power loss was caused by someone turning off an electrical panel breaker.
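
The timing requirement for server and switch UPSs can be checked with simple arithmetic. The sketch below is a rough illustration under stated assumptions: the function names, the 0.85 efficiency factor, and the 5-minute safety margin are all hypothetical placeholders. Real battery runtime does not scale linearly with load, so use your UPS manufacturer’s runtime figures where you have them.

```python
def runtime_minutes(battery_wh: float, load_watts: float, efficiency: float = 0.85) -> float:
    """Rough runtime estimate: usable watt-hours divided by the load.

    The 0.85 efficiency default is an assumed placeholder, not a vendor figure.
    """
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    return battery_wh * efficiency / load_watts * 60


def check_power_plan(server_runtime_min: float,
                     shutdown_delay_min: float,
                     shutdown_duration_min: float,
                     switch_runtime_min: float,
                     margin_min: float = 5.0) -> list:
    """Return a list of warnings; an empty list means the plan looks adequate."""
    # Minutes after the power fails at which the servers should be fully off.
    servers_off_at = shutdown_delay_min + shutdown_duration_min
    warnings = []
    if server_runtime_min < servers_off_at + margin_min:
        warnings.append("server UPS may run out before shutdown completes")
    if switch_runtime_min < servers_off_at + margin_min:
        warnings.append("switch UPS may run out before the servers finish shutting down")
    return warnings
```

Here shutdown_delay_min is how long the UPS waits on battery before telling the servers to shut down, and shutdown_duration_min is how long the shutdown itself takes. Both are values you would measure on your own network.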

The old saying says that an ounce of prevention is worth a pound of cure. By following these steps, you’ll improve the reliability and stability of your system, while also improving your disaster recovery plan. When emergencies hit, your team will bless you for getting them back up more quickly than you otherwise could have. And when the Books are opened, you’ll hear, “Well done!”

June 5, 2003