Credit unions face unique challenges with DIY disaster recovery operations, which are becoming increasingly complex due to the proliferation of applications ancillary to the Core. The growing list of ancillary and interdependent applications, disparate data stores, and third-party services is making the recovery process an increasingly daunting task.
Credit union members are increasingly accustomed to instant data access and minimal downtime, and their expectations put added pressure on IT departments to reduce downtime to near zero. IT professionals understand the importance and impact of downtime, yet a 2013 study by Forrester and Disaster Recovery Journal showed that median actual recovery times were 8 hours, up from 3 hours in 2010. 1 This trend parallels the growing complexity of the disaster recovery process.
Many credit unions favor in-house DIY disaster recovery operations, but the decision to take on that burden often comes at a high cost. The burden is amplified by the cost of downtime: a 2013 Aberdeen survey of IT professionals found that the average cost of downtime per hour across companies of all sizes was a staggering $163,674. 2 In addition to the complexity and increasing financial burden, DIY disaster recovery results often fall short of desired outcomes.
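To put the two cited figures side by side, the arithmetic can be sketched in a few lines of Python. This is an illustration only: the hourly cost is an average across companies of all sizes and comes from a different study than the recovery times, so the product is a rough order-of-magnitude figure, not an estimate for any particular credit union. The constant and function names are hypothetical.

```python
# Figures quoted in the text above; combining them is an assumption made
# purely for illustration, since they come from two different studies.
COST_PER_HOUR_USD = 163_674        # Aberdeen: average hourly cost of downtime
MEDIAN_RECOVERY_HOURS_2010 = 3     # DRJ/Forrester: median actual recovery time
MEDIAN_RECOVERY_HOURS_2013 = 8


def downtime_cost(hours, cost_per_hour=COST_PER_HOUR_USD):
    """Return the cost of an outage lasting `hours` at a flat hourly rate."""
    return hours * cost_per_hour


if __name__ == "__main__":
    cost_2010 = downtime_cost(MEDIAN_RECOVERY_HOURS_2010)
    cost_2013 = downtime_cost(MEDIAN_RECOVERY_HOURS_2013)
    print(f"3-hour recovery: ${cost_2010:,}")
    print(f"8-hour recovery: ${cost_2013:,}")
    print(f"Difference:      ${cost_2013 - cost_2010:,}")
```

At the quoted average rate, an 8-hour recovery costs $1,309,392 compared with $491,022 for a 3-hour recovery. A credit union would substitute its own hourly figure.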
One reason DIY disaster recovery results fall short of expectations is that credit union IT staff continually face long lists of projects, which demand a tremendous amount of time and attention. When faced with continual project management, it’s not uncommon for IT to allow backup and recovery objectives to become a lower priority, and the impact on recovery results is bound to be less than ideal.
Virtualization and the recovery of virtual machines have simplified certain aspects of recovery; however, credit union staff may be lulled into a false sense of security. Advances in backup technologies and virtualization are also leading many credit unions to invest heavily in secondary sites or colocation. The CAPEX required to implement secondary sites or colocation is significant, though, and the equipment requires refresh cycles that ultimately become cost prohibitive over time.
Recent enhancements in backup and recovery technologies make recovery seem easy when, in reality, the opposite is true. Technology offers great promise, and certain aspects of recovery are more dependable than in the past, yet there are still problems:
- Credit unions place a heavy emphasis on daily operations and the maintenance of vital functions, but most spend limited time testing recovery procedures and reviewing the logistical necessities of a viable recovery system. Daily maintenance of interdependent systems calls for an entirely different skill set than the expertise and knowledge required to facilitate a functional recovery of those systems. Nor do most take into consideration the expertise required to recover multiple systems with inter-dependencies distributed throughout disparate tiers within the data center. In other words, emphasis and expenditures focus on daily operations and maintenance, and only a small percentage is dedicated to disaster recovery. Fully 23% of organizations admit they never test. 3
- The DIY approach to recovery processes can heighten the risk of improperly vetting the complex inter-dependencies between multiple, if not dozens of, applications that run the credit union. Credit unions often operate with lean IT departments and lack the staffing depth necessary to properly manage the complexities of today’s backup and recovery functions. The development and management of growth initiatives further exacerbate staffing pressures, which result in the neglect of backup and recovery priorities.
- Typical DIY recovery environments often lack the tools and the mindset necessary to discover shadow or forgotten applications that reside within the IT infrastructure. Even applications and inter-dependencies residing in plain sight may be overlooked in disaster recovery implementation.
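The inter-dependencies described in the points above can be written down as a simple map and checked mechanically. The following is a minimal sketch in Python, not a description of any particular product: the application names and the `DEPENDENCIES` map are hypothetical, and a real credit union inventory would be far larger. It derives a recovery order in which each system comes after the systems it relies on, and it flags components that other applications depend on but that were never listed in the inventory.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical inventory: each application maps to the systems it depends on.
# "dns" is deliberately left without an entry of its own to stand in for an
# overlooked component.
DEPENDENCIES = {
    "core": {"database", "active_directory"},
    "online_banking": {"core", "dns"},
    "atm_switch": {"core"},
    "database": {"storage"},
    "active_directory": {"dns"},
    "storage": set(),
}


def undocumented_components(deps):
    """Return names that are depended upon but have no inventory entry."""
    referenced = set()
    for required in deps.values():
        referenced |= set(required)
    return sorted(referenced - set(deps))


def recovery_order(deps):
    """Return an order in which every system follows its dependencies.

    Raises graphlib.CycleError if the map contains a circular dependency.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print("Not in inventory:", undocumented_components(DEPENDENCIES))
    print("Recovery order:  ", recovery_order(DEPENDENCIES))
```

In this made-up inventory, `dns` is flagged because two systems rely on it yet it has no entry of its own, which is the kind of component in plain sight that is easily overlooked. A map like this is only as good as the discovery work behind it, so it would need to be refreshed whenever the environment changes.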
Why DIY Recovery Falls Short
- The majority of business applications are interdependent. Facilitating full recovery of all business processes from end-to-end requires more than simply storing and accessing applications and data. All components, including proper OS selection, current application selection and the most current data set, must be incorporated in the recovery environment. Even recovering the proper applications and data may leave critical components out of the process. Overlooking a seemingly innocuous interdependent component will hinder or prevent end-to-end recovery.
- Credit unions utilizing colocation or secondary sites face the very real challenge of ensuring that properly qualified IT staff will be able to get to the secondary site in a disaster scenario. Many credit unions run with lean IT departments, and it can be impractical or extremely challenging to get properly trained people to the secondary site. Simply getting staff to the secondary site doesn’t ensure recovery; staff members must have the necessary knowledge and skill sets to guarantee a successful end-to-end recovery.
- For the most part, IT managers are proficient at managing integration of technology to ensure all systems work together smoothly and at top efficiency. Recovery demands more: to reliably recover your applications, data and interdependent systems, you must duplicate the precise mix of servers, storage, operating systems, hypervisors, networks and software. You must also manage any and all changes taking place within that environment, and that change management must be continuous. The unavailability of even one component or application can trickle down to impact a wide array of business functions.
- DIY backup and recovery using colocation or a secondary site typically requires duplication of everything in use at the primary site. All servers and storage must be replicated or mirrored at the secondary location and both sites must have adequate bandwidth and networking infrastructures. Licensing requirements will be duplicated, as will most security measures. Staff members will be required to put in extra hours to manage the secondary location or additional staffing will be required. Costs for the recovery infrastructure can be significant and should be expected to increase every 3 to 4 years in conjunction with hardware and software refresh cycles.
- Misalignment of Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs) with Recovery Methods (RMs) is a common and costly occurrence. When it comes to RPOs and RTOs, the concept of “one size fits all” is a dangerous miscalculation. Credit unions practicing this mindset will quickly discover that the incident that caused an initial system failure isn’t the only disaster they will encounter. Misalignment or miscalculation of RPOs and RTOs with an associated RM will quickly short-circuit or prevent end-to-end recovery capabilities.
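The last point above, aligning each application's RPO and RTO with a recovery method, can be illustrated with a short Python sketch. Everything specific here is an assumption made for the example: the method names, the hours each method can achieve, and the relative cost ranking are hypothetical placeholders, not vendor figures. The sketch picks, for each application, the least expensive method that meets both objectives, and returns nothing when no method qualifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecoveryMethod:
    name: str
    rpo_hours: float    # worst-case data loss the method can hold to
    rto_hours: float    # worst-case time to restore service
    relative_cost: int  # 1 = cheapest; illustrative ranking only


# Hypothetical catalog of recovery methods with illustrative capabilities.
METHODS = [
    RecoveryMethod("offsite backup restore", 24, 48, 1),
    RecoveryMethod("snapshot replication", 4, 8, 2),
    RecoveryMethod("continuous replication with standby", 0.25, 1, 3),
]


def select_method(rpo_hours: float, rto_hours: float) -> Optional[RecoveryMethod]:
    """Return the cheapest method meeting both objectives, or None."""
    candidates = [
        m for m in METHODS
        if m.rpo_hours <= rpo_hours and m.rto_hours <= rto_hours
    ]
    return min(candidates, key=lambda m: m.relative_cost, default=None)


if __name__ == "__main__":
    # Hypothetical per-application objectives: (RPO hours, RTO hours).
    objectives = {
        "core": (1, 4),
        "online_banking": (4, 8),
        "document_archive": (24, 72),
    }
    for app, (rpo, rto) in objectives.items():
        method = select_method(rpo, rto)
        print(app, "->", method.name if method else "no method meets objectives")
```

Setting objectives per application avoids the one-size-fits-all trap in both directions: the core is not left on a method that cannot meet its objectives, and a low-priority archive is not placed on the most expensive one.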
Viable Alternatives to DIY Disaster Recovery
- Credit unions are founded on the principles of a cooperative and shared business model. Backup and Recovery as a Service (BRaaS) is the model for cooperative sharing of technical knowledge, IT infrastructure, hardware, software, and resources. Managed Service Providers (MSPs) provide the infrastructure, experience and knowledge for BRaaS at a fraction of the cost of DIY backup and recovery models. BRaaS is also recognized as a readily sustainable business model.
- A primary benefit of BRaaS is the deployment of highly qualified and efficient resources in a shared environment. Distribution of shared resources eliminates the significant outlay of CAPEX, while amortizing OPEX in a simple pay-as-you-grow model. BRaaS leverages cutting-edge infrastructure along with extensive knowledge and experience to deliver true business resilience.
- Core system vendors often focus recovery efforts on the Core system specifically and may be inclined to neglect critical inter-dependencies necessary to provide end-to-end recovery and resiliency. However, a select number of MSPs are proficient at recovering the Core and all ancillary servers with related dependencies. Third-party vendor connections to ATMs, mobile banking, internet banking and the FED are typically provided as well. Qualified MSPs provide true end-to-end business recovery while eliminating the expense and frustrations of DIY endeavors.
What to Look for in a BRaaS Solution and Vendor
Ask if the service provider:
- Performs Backup and Recovery of the Core and all Ancillary Servers
- Provides Connectivity to Third Party Vendors, such as: ATM, Mobile and Internet Banking, FED, etc.
- Performs Infrastructure and Network Discovery Analysis
- Manages Backup and Recovery Resources and Procedures
- Monitors Alerts for Potential Complications or Issues
- Suggests Technical Improvements and Implementation Processes
- Assists With Systems Maintenance to Resolve Backup and Recovery Challenges
- Identifies and Tracks Inter-dependencies and Application Road-maps
- Matches RTOs and RPOs With Appropriate Recovery Methods
- Performs, Manages and Maintains Restore/Recovery Procedures
- Initiates Pre-test Discovery, Meetings and Planning
- Tracks, Monitors and Documents DR Test and Results
- Conducts Post-Test Reviews, Remediation and Gap Analysis
- Initiates Change Discovery and Management
- Maintains and Stores Backup and Recovery Procedural Docs and Configurations
- Provides Timely Reporting of all Vital Backup & Recovery Elements
- Provides Relevant Case Studies and Use Examples
- Provides Credible References and Testimonials
1 DRJ and Forrester BC/DR Market Study: The State of DR Preparedness, March 2014
2 Aberdeen Group, Downtime and Data Loss: How Much Can You Afford? August 2013
3 DRJ and Forrester BC/DR Market Study: The State of DR Preparedness, March 2014