April 16, 2014
Why IT Disaster Recovery and IT Operations Management Are So Dependent on Each Other
Whether you have undertaken a full business impact analysis or you are focusing on an inventory of your applications, it is important to understand that IT operations management and IT disaster recovery are very dependent on each other.
They are two very different departments within IT, but each needs the other to succeed. In other words, they have a symbiotic relationship. For example, the more robust and mature an IT operations team is, the better an IT organization is able to run disaster recovery exercises and, ultimately, recover from a disaster.
In order to keep your IT disaster recovery efforts healthy and successful in the long term, it is vital that an IT operations management team has established a foundation of well-defined and adopted processes.
Build a Mature IT Operations Management Team
A mature IT operations management organization should have processes that are aligned with your company’s service management toolset and well understood by the team.
The foundation of people, processes, and tools should support the fully documented IT services that are delivered to the business. (See the IT Operations Management Framework slideshow.)
The information collected and maintained by a mature IT operations management organization should include two key areas:
A service catalog consisting of business and technical services,
A configuration management database (CMDB) containing the underlying infrastructure and applications that are related to each other and to each service.
The service catalog and CMDB should be maintained and kept current through the use of a change management process. In order to build a mature IT operations management team, it is critical that strong and adopted processes are in place.
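As a rough illustration of how the two stores relate, the sketch below models a service catalog and a CMDB as plain Python structures and asks which services rely on a given configuration item. Every service name, CI name, field, and the helper services_affected_by is hypothetical; a real CMDB product will have its own schema and API.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurationItem:
    """One CMDB record: a server, database, or application (hypothetical schema)."""
    name: str
    ci_type: str
    depends_on: list = field(default_factory=list)  # names of other CIs


@dataclass
class Service:
    """One service catalog entry, linked to the CIs that deliver it."""
    name: str
    kind: str  # "business" or "technical"
    ci_names: list = field(default_factory=list)


def services_affected_by(ci_name, services, cmdb):
    """Return names of services that rely, directly or indirectly, on a CI."""
    affected = []
    for service in services:
        # Walk the dependency graph starting from the service's own CIs.
        to_visit = list(service.ci_names)
        seen = set()
        while to_visit:
            current = to_visit.pop()
            if current in seen:
                continue
            seen.add(current)
            to_visit.extend(cmdb[current].depends_on)
        if ci_name in seen:
            affected.append(service.name)
    return affected


# Illustrative data only.
cmdb = {
    "order-app": ConfigurationItem("order-app", "application", ["orders-db"]),
    "orders-db": ConfigurationItem("orders-db", "database", ["sql-host-01"]),
    "sql-host-01": ConfigurationItem("sql-host-01", "server"),
    "wiki-app": ConfigurationItem("wiki-app", "application", ["web-host-02"]),
    "web-host-02": ConfigurationItem("web-host-02", "server"),
}
services = [
    Service("Order Processing", "business", ["order-app"]),
    Service("Internal Wiki", "technical", ["wiki-app"]),
]

print(services_affected_by("sql-host-01", services, cmdb))
# prints ['Order Processing']
```

A lookup like this is what lets a change request against a single server be traced back to the business services it could disrupt.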
Solidify an IT Disaster Recovery Process
IT disaster recovery is the process of resuming business and technical service operations at an alternate site in the event that a production data center becomes damaged or unavailable. The process should be developed to meet two primary requirements: recovery time objective (RTO) and recovery point objective (RPO). These requirements pertain to the applications that deliver the IT services being recovered, and they should be validated by the business-user community.
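Under the usual definitions, RTO is the longest acceptable time without the service, and RPO is the longest acceptable window of lost data, measured back from the moment of the outage. The sketch below shows how the results of a single recovery exercise might be checked against both targets. The function name meets_objectives, its parameters, and the timestamps are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def meets_objectives(outage_start, service_restored, last_recoverable_data, rto, rpo):
    """Compare one recovery exercise against its RTO and RPO targets."""
    downtime = service_restored - outage_start  # time without the service
    data_loss_window = outage_start - last_recoverable_data  # work that is lost
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


# Hypothetical exercise: 3.5 hours of downtime, 15 minutes of lost data.
result = meets_objectives(
    outage_start=datetime(2014, 4, 16, 9, 0),
    service_restored=datetime(2014, 4, 16, 12, 30),
    last_recoverable_data=datetime(2014, 4, 16, 8, 45),
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=10),
)
print(result["rto_met"], result["rpo_met"])
# prints True False
```

In this made-up case the service came back within its RTO, but the last recoverable copy of the data was older than the RPO allows, which is the kind of gap a recovery exercise is meant to expose.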
A mature disaster recovery organization should have implemented and tested a recovery design consisting of specific methods in four main areas, including:
Data center facilities
IT infrastructure provisioning
Network traffic redirection
Understanding the IT services provided to the business is imperative when undertaking an IT disaster recovery strategy initiative. A good disaster recovery strategy relies on detailed configuration item (CI) information collected in a CMDB for these IT services and their supporting applications and infrastructure.
A simple example is a CMDB that includes a set of SQL Server databases with a range of RTO/RPO levels. A good configuration management database will indicate which servers the databases reside on, their size, which applications they support, and their recovery time objective and recovery point objective requirements. The disaster recovery strategist can then identify the optimal infrastructure needed for data replication (or restoration) and application hosting.
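To make the SQL Server example concrete, here is a minimal sketch of how those CI attributes could drive a first-pass choice between replication and restoration. The DatabaseCI fields mirror the attributes listed above, but the decision rule, the backup interval, and the restore throughput are invented planning figures, not vendor guidance or a standard method.

```python
from dataclasses import dataclass


@dataclass
class DatabaseCI:
    """CMDB attributes for one database (hypothetical schema)."""
    name: str
    server: str
    size_gb: float
    applications: tuple
    rto_hours: float
    rpo_hours: float


def suggest_recovery_method(db, backup_interval_hours=24.0, restore_gb_per_hour=200.0):
    """Suggest 'replication' or 'restore from backup' for one database CI.

    Both default values are assumptions chosen for this example.
    """
    estimated_restore_hours = db.size_gb / restore_gb_per_hour
    # A backup can lose up to one full interval of data, so it only fits when
    # the RPO tolerates that; the restore must also finish within the RTO.
    if db.rpo_hours >= backup_interval_hours and estimated_restore_hours <= db.rto_hours:
        return "restore from backup"
    return "replication"


# Illustrative records only.
databases = [
    DatabaseCI("orders_db", "sql-host-01", 800, ("order-entry",), 2, 0.25),
    DatabaseCI("reporting_db", "sql-host-02", 400, ("reports",), 48, 24),
    DatabaseCI("archive_db", "sql-host-03", 6000, ("records",), 24, 24),
]

for db in databases:
    print(db.name, "->", suggest_recovery_method(db))
# prints:
# orders_db -> replication
# reporting_db -> restore from backup
# archive_db -> replication
```

Note that archive_db has a relaxed RPO but still lands on replication, because at the assumed throughput a 6,000 GB restore would take longer than its RTO. This is why size belongs in the CMDB alongside the objectives.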
As is often the case, opportunities for optimizing the production environment can also arise from disaster recovery analysis.
Figure: Detailed samples of configuration item attributes needed for an IT disaster recovery strategy.
This highly detailed configuration item information is used to develop a prioritized list of applications (per the recovery time objective and recovery point objective) to be restored in a disaster situation. Infrastructure dependencies are then examined in-depth to determine the appropriate recovery configuration.
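One way to picture the prioritization step is the sketch below, which ranks applications by their objectives (tightest first) and then places each application's infrastructure dependencies ahead of it. The function recovery_order, the application names, and the dependency map are hypothetical, and a real plan would weigh factors beyond RTO and RPO.

```python
def recovery_order(apps, depends_on):
    """Order items for recovery: tightest RTO/RPO first, dependencies before dependents.

    apps: dict of name -> (rto_hours, rpo_hours)
    depends_on: dict of name -> list of names that must be running first
    """
    prioritized = sorted(apps, key=lambda name: apps[name])
    order = []
    placed = set()

    def place(name, in_progress=()):
        if name in placed:
            return
        if name in in_progress:
            raise ValueError(f"circular dependency involving {name}")
        # Restore everything this item needs before the item itself.
        for dep in depends_on.get(name, []):
            place(dep, in_progress + (name,))
        placed.add(name)
        order.append(name)

    for name in prioritized:
        place(name)
    return order


# Illustrative data only.
apps = {
    "payroll": (24, 4),
    "order-entry": (2, 0.25),
    "intranet": (72, 24),
}
depends_on = {
    "order-entry": ["orders-db"],
    "orders-db": ["sql-host-01"],
    "payroll": ["hr-db"],
    "hr-db": ["sql-host-01"],
}

print(recovery_order(apps, depends_on))
# prints ['sql-host-01', 'orders-db', 'order-entry', 'hr-db', 'payroll', 'intranet']
```

The shared server appears first even though it has no objectives of its own, because the highest-priority application cannot start without it.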
The strategic analysis will likely produce a specification for additional IT infrastructure assets to provide redundancy. In this case, the new assets will be added to the existing CMDB as configuration items along with the IT disaster recovery plan document.
Why Change Management Is Important
To keep all of this information current, the change management process should include procedures for applying changes to the disaster recovery configuration items whenever their production counterparts are affected by a change. This important practice is followed in order to keep the two environments synchronized. Failure to do so would result in the rapid deterioration of the disaster recovery landscape’s effectiveness.
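A simple way to check that the two environments are still synchronized is to compare the tracked attributes of each production configuration item with its disaster recovery counterpart. The sketch below assumes both sets of CIs can be exported as dictionaries; the function find_drift, the attribute names, and the sample values are hypothetical.

```python
def find_drift(production, disaster_recovery, tracked_attributes):
    """List attributes where a DR configuration item no longer matches production."""
    drift = []
    for ci_name, prod_attrs in production.items():
        dr_attrs = disaster_recovery.get(ci_name)
        if dr_attrs is None:
            # The production CI has no counterpart at the recovery site.
            drift.append((ci_name, "missing in DR", None, None))
            continue
        for attr in tracked_attributes:
            if prod_attrs.get(attr) != dr_attrs.get(attr):
                drift.append((ci_name, attr, prod_attrs.get(attr), dr_attrs.get(attr)))
    return drift


# Illustrative data only.
production = {
    "orders-db": {"sql_version": "2012 SP1", "size_gb": 800},
    "order-app": {"release": "4.2"},
}
disaster_recovery = {
    "orders-db": {"sql_version": "2012", "size_gb": 800},
}

for entry in find_drift(production, disaster_recovery, ["sql_version", "size_gb", "release"]):
    print(entry)
# prints:
# ('orders-db', 'sql_version', '2012 SP1', '2012')
# ('order-app', 'missing in DR', None, None)
```

Run as part of change closure or on a schedule, a report like this surfaces cases where a production change was never applied to the recovery environment.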
An IT disaster recovery strategy initiative undertaken without the benefit of a mature IT service catalog and integration between the change management and configuration management processes may ultimately be successful, but at significant cost and risk. Information gathering could be incomplete, labor-intensive, error-prone, and subject to delay. These consequences are only magnified over time, and opportunities for IT service delivery improvements at the enterprise level will likely remain buried.
IT Disaster Recovery & IT Operations Management Must Successfully Collaborate
Once the information is gathered, analyzed, and used for the IT disaster recovery strategy and planning, it can be fed to IT operations management as input for seeding a configuration management database. The information can also be used in defining service composition and creating the service catalog. You could then introduce more comprehensive change and configuration management processes for keeping the newly gathered information up-to-date. These processes will benefit from the selection and deployment of automated tools, such as those that discover and collect current attribute information for IT infrastructure configuration items.
Ultimately, the future of an optimally efficient and reliable IT disaster recovery process depends on the maturity of the IT operations management team. IT operations management supplies the IT disaster recovery strategy process with information from the CMDB and service catalog. It helps keep the IT disaster recovery plan document and asset database current via the change management process. IT disaster recovery strategy work can feed information to an IT operations management organization to help create a configuration management database and service catalog. And change management can keep all of the newly acquired information up-to-date.
With the proper planning and strategy, the relationship between IT operations management and IT disaster recovery strategy is truly one that benefits both teams and the success of service management.