While we have made significant strides in network and system reliability over the past five years, some specific gaps remain.

There are many ways we could reduce the "time to recovery" in the event of a disaster. For example, we are working to ensure that all the data centers on campus have appropriate line and emergency power and appropriate equipment racks, so that replacement equipment could be housed in an alternate location after a fire. Another way is to enhance our backup system. With additional backup hardware and software licenses, we could protect a broader range of systems and reduce the time needed to rebuild a server's configuration. The backup hardware could be enhanced to include a more sophisticated backup-to-disk solution in one data center and a backup tape library in a third area. That would give us the performance benefits of disk-based backups and the ability to copy those backups to tape. Keeping a copy in a third building would also give us an extra layer of protection.

If we licensed the software and had enough disk space to allow our EVA 6000 disk array to take "snapshots" (tracking changes to the disks and preserving the original blocks as they were at a point in time), upgrades to the email server, NetWare cluster, and so forth would become very low-risk operations. We could take a snapshot of the data volumes before a system upgrade. If the upgrade went badly, it would be very easy to put things back to the way they were when the snapshot was taken.
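
To make the snapshot idea concrete, here is a minimal sketch of copy-on-write tracking in Python. It is an illustration only and does not describe how the EVA 6000 actually implements snapshots; the `Volume` class, its methods, and the block contents are all hypothetical names chosen for this example.

```python
class Volume:
    """Toy block device with one copy-on-write snapshot (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data, one entry per block
        self.saved = None           # block index -> original contents; None = no snapshot

    def take_snapshot(self):
        # Start tracking changes. Nothing is copied at this point.
        self.saved = {}

    def write(self, index, data):
        # Preserve the original block the first time it changes after the snapshot.
        if self.saved is not None and index not in self.saved:
            self.saved[index] = self.blocks[index]
        self.blocks[index] = data

    def rollback(self):
        # Restore every changed block to its contents at snapshot time.
        if self.saved is None:
            raise RuntimeError("no snapshot to roll back to")
        for index, original in self.saved.items():
            self.blocks[index] = original
        self.saved = None


# Hypothetical upgrade: snapshot, change some blocks, then roll back.
volume = Volume(["mail-v1", "config-v1", "index-v1"])
volume.take_snapshot()
volume.write(0, "mail-v2")
volume.write(0, "mail-v3")      # second write to block 0 does not overwrite the saved original
volume.write(2, "index-v2")
changed = sorted(volume.saved)  # indexes of blocks whose originals were preserved
volume.rollback()
```

Only the blocks that change after the snapshot consume extra space, which is why a snapshot needs some spare disk capacity but far less than a full copy of the volume.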

If we were to implement a business continuity solution, it would likely take the form of a second disk array in another building, plus the software needed to periodically replicate data from the production disk array to the standby unit. We could even run systems off that second disk array (likely at a lower performance level) until new hardware arrived. Moving our data onto the replacement array would also be faster, since there would already be a complete copy on the standby unit. This would not replace a backup system. Rather, it would provide us with nearly uninterrupted service.
