Wednesday, August 3, 2016

Preventing Data Corruption in the Event of an Extended Power Outage

Executive summary:

Despite advances in computer technology, power outages continue to be a major cause of PC and server downtime. Protecting computer systems with uninterruptible power supply (UPS) hardware is part of a total solution, but power management software is also necessary to prevent data corruption after extended power outages. Various software configurations are discussed, and best practices aimed at ensuring uptime are presented.



Introduction:

An extended power outage, which can strike at any time, can prevent unprotected computers from initiating their required shutdown procedure. PC and server operating systems are not designed to support abrupt losses of power, known as "hard" shutdowns, but rather rely on a set of built-in processes that prepare a computer for shutdown, such as saving memory contents and stopping applications and services. Shutting down in this manner is often referred to as a "graceful" shutdown. Hard shutdowns, on the other hand, can result in lost or corrupted data and a longer time-to-recovery after power returns.

An uninterruptible power supply (UPS) can protect the system from damaging power problems and improve server availability by allowing users to continue working without interruption during a short power outage. An extended power outage is defined as any outage that might outlast the UPS's runtime. During such an outage, a system equipped with UPS shutdown software can communicate with the UPS and perform a graceful, unattended shutdown before the UPS battery is exhausted.
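The timing decision described above can be sketched in a few lines. This is only an illustration, not the logic of any particular UPS software: the function name, its parameters, and the 60-second safety margin are assumptions introduced here.

```python
def should_start_shutdown(on_battery: bool,
                          runtime_remaining_s: float,
                          shutdown_duration_s: float,
                          safety_margin_s: float = 60.0) -> bool:
    """Return True once a graceful shutdown can no longer be postponed.

    All names and the default margin are hypothetical. The idea is that
    the shutdown must begin while the battery can still carry the system
    through the whole shutdown procedure, plus some margin.
    """
    if not on_battery:
        # Utility power is present, so there is nothing to do.
        return False
    return runtime_remaining_s <= shutdown_duration_s + safety_margin_s
```

With a shutdown that takes 120 seconds, a system on battery with 600 seconds of runtime left can keep working, while one with 180 seconds left should begin shutting down.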

Extended power outages have many causes, ranging from a local transformer failure due to lightning to a regional power grid going offline. Steps must be taken to protect computer systems and the data they store from the corrupting effects of a hard shutdown. One cause of potential data corruption in the event of an extended power outage is abnormal termination of applications or the operating system while they are manipulating data. This can affect documents, critical file system structures (such as File Allocation Tables), or dynamic application data, and in many cases can also lead to increased "time-to-recovery" when power returns, as the operating system or application attempts to rebuild corrupted tables and similar structures.

Another cause of concern is the computer's hard drive. While the industry has made progress over the last decade in preventing "head crashes" (where the read/write head of the hard drive could actually damage the surface of the disk if not properly "parked"), another advance in hard drive technology has actually increased the likelihood of data corruption. To achieve high levels of performance, hard disk controllers are often designed to take advantage of caching techniques, which involve temporarily writing information to memory and then writing the data out to the actual disk later. In the event of a power loss, information in the cache is lost, leading to potential file or data corruption.
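To illustrate why cached writes are fragile, the sketch below shows one common way an application can narrow the window in which data sits only in memory. It is a minimal example under stated assumptions: `durable_write` is a hypothetical helper, and `os.fsync` only asks the operating system to flush its own cache to the device. A disk controller's cache may still hold the data, which is exactly the exposure described above, so this complements a UPS-driven graceful shutdown rather than replacing it.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to a temporary file, flush it, then swap it into place.

    Hypothetical helper for illustration. On file systems where rename
    is atomic, a reader sees either the old file or the new one, never
    a half-written mix.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # push Python's buffer to the OS
            os.fsync(tmp.fileno())   # ask the OS to push its cache to the device
        os.replace(tmp_path, path)   # swap the finished file into place
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```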

One does not have to search extensively in business and government publications to see that, despite technological advances, data corruption due to power loss is still a widely recognized problem in the IT industry. This is emphasized in the industry quotes below:

• "Even a moment's disruption can have devastating effects on power sensitive customers such as internet service providers, data centers, wireless telecommunication networks, on-line traders, computer chip manufacturers and medical research centers. For these customers, power disruptions can result in data corruption, burned circuit boards, component damage, file corruption and lost customers."

- U.S. Dept. of Energy Office of Power Technologies, Electrical Power Interruption Cost Estimates for Individual Industries, Sectors, and U.S. Economy, February 2002

• "Failure to boot after a power failure is generally caused by corrupted files or a damaged hard disk - neither of which last known good configuration is capable of repairing."

- MCSE Microsoft® Windows® XP Professional Readiness Review Exam 70-270, Section 70-270.04.03.002, 11/28/2001

• "Total failures, or blackouts, constitute a complete loss of electrical power to the networking or computing equipment…these failures can cause system and network crashes, PC lockups, and corruption or loss of valuable data from servers and work-stations."

- Contingency Planning Management Magazine, Power Protection Basics, March 2002

• "The system and its data can become corrupt as a result of a power failure....a UPS can protect the system if power is lost. A UPS usually provides ...temporary power which may be enough to permit a graceful shutdown."

- National Institute of Standards and Technology, Special Publication 800-34 Contingency Planning Guide for Information Technology Systems, June 2002


Recommended configurations for UPS software:


Configuration 1: Protecting a single computer with a single UPS

In this configuration, each computer is backed up by its own UPS, and the UPS communicates with the computer over a serial or USB cable. UPS software is installed on the computer to provide graceful, unattended shutdown in the event of an extended power outage. In this case the UPS is managed locally by the connected computer. This is the simplest configuration and is widespread for both server and workstation deployments.
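The role of the UPS software in this configuration can be sketched as a simple polling loop. Everything named here is hypothetical: `UpsStatus`, `monitor_ups`, the threshold, and the polling interval are assumptions for illustration, and `read_status` stands in for whatever serial or USB query a real product performs.

```python
import time
from typing import Callable, NamedTuple


class UpsStatus(NamedTuple):
    """Hypothetical snapshot of what the UPS reports over its cable."""
    on_battery: bool
    runtime_remaining_s: float


def monitor_ups(read_status: Callable[[], UpsStatus],
                shut_down: Callable[[], None],
                shutdown_threshold_s: float = 180.0,
                poll_interval_s: float = 5.0,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll the UPS and trigger one graceful shutdown when runtime runs low."""
    while True:
        status = read_status()
        if status.on_battery and status.runtime_remaining_s <= shutdown_threshold_s:
            shut_down()
            return
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Simulated readings: mains power, then an outage with shrinking runtime.
    readings = iter([UpsStatus(False, 900.0),
                     UpsStatus(True, 600.0),
                     UpsStatus(True, 150.0)])
    monitor_ups(lambda: next(readings),
                lambda: print("starting graceful shutdown"),
                sleep=lambda seconds: None)
```

Passing the status reader and the shutdown action in as callables keeps the loop testable without real hardware. In a real deployment they would wrap the vendor's communication layer and the operating system's shutdown command.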


Configuration 2: Protecting two to three computers with a single UPS

In this configuration, several computers are plugged into a larger UPS (typically one rated at 1500 VA or higher). One computer is connected directly to the serial port on the UPS, while the other two are connected to an expansion card installed in the UPS that provides two additional serial ports. In this situation, all three computers have graceful shutdown capability, but management of the UPS is handled by the computer connected directly to the UPS. Note that, because the USB standard addresses communication with a single system only, USB connections cannot be used in this configuration. Although this scheme can be extended to handle up to 24 computers (via daisy-chaining), Schneider Electric does not recommend such an approach due to the additional cabling required.
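When several computers share one battery, the slowest machine determines how early the shutdown has to begin. The sketch below makes that arithmetic explicit. The function name, the 60-second margin, and the example shutdown times are assumptions; the default of three ports reflects the one UPS serial port plus the two expansion-card ports described above.

```python
from typing import Mapping


def shared_shutdown_threshold(shutdown_durations_s: Mapping[str, float],
                              available_ports: int = 3,
                              safety_margin_s: float = 60.0) -> float:
    """Return the remaining-runtime level at which all computers should start shutting down.

    Hypothetical helper: every computer draws on the same battery, so the
    shutdown must begin early enough for the slowest one to finish.
    """
    if not shutdown_durations_s:
        raise ValueError("at least one computer is required")
    if len(shutdown_durations_s) > available_ports:
        raise ValueError("more computers than UPS communication ports")
    return max(shutdown_durations_s.values()) + safety_margin_s
```

For example, with hypothetical shutdown times of 120, 300, and 90 seconds, the computers should begin shutting down once the UPS reports 360 seconds of runtime or less.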
