Deciding whether to fail over to a secondary site or to wait it out and fix the problem in-house remains one of the toughest decisions businesses face during an outage, according to Oscar Arean, technical operations manager at disaster recovery service provider Databarracks.
Recently, the New York Stock Exchange (NYSE) was forced to suspend trading for three hours following a major technical glitch. Its decision to halt trading rather than fail over to its Chicago recovery centre has prompted much debate.
Arean says this is a situation many organisations still struggle with when they suffer an outage: “Business continuity (BC) and disaster recovery (DR) plans will specify exactly how long an outage can last before an organisation should invoke failover, but as we all know, during a real-life disaster these timings can slip as you try to fire-fight.
“The point at which to fail over is specific to each organisation, and it will differ depending on the type of disaster being dealt with. You may have one response for storage issues and a completely different one for network-related issues or a natural disaster, and that is fine. But this doesn’t mean you should be making these decisions during an incident; your failover point should be defined well before then.
“Your Crisis Management Team (CMT) will identify the most likely disaster scenarios, and there should be a plan in place for each of them. If the organisation has decided that the maximum outage it can tolerate is four hours, and recovering its systems takes one hour, then it must begin the recovery process by the three-hour mark at the latest. Failing to do so could prove costly for the organisation, both financially and in reputational damage.
“Once these plans are in place, it’s imperative that they are adhered to. An organisation will have worked out how long it can be out of action before it makes more financial sense to invoke DR and move to its recovery site, so when that point is reached, action must be taken. It’s tempting to extend the deadline by an extra hour because your team is close to fixing the issue, but this can easily escalate. Rehearsing practice scenarios with the CMT makes the team more familiar with the plans, which makes the decision easier on the day.”
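Arean's four-hour example reduces to simple arithmetic: the latest point at which to invoke failover is the maximum tolerable outage minus the time recovery takes. The Python sketch below is purely illustrative. The function names `latest_failover_time` and `should_invoke_dr` are hypothetical and are not part of any Databarracks or DR product. The sketch also assumes the recovery duration is known and fixed, which real incidents rarely guarantee.

```python
from datetime import datetime, timedelta


def latest_failover_time(outage_start: datetime,
                         max_tolerable_outage: timedelta,
                         recovery_duration: timedelta) -> datetime:
    """Return the latest moment DR can be invoked while still restoring
    service within the maximum tolerable outage (hypothetical helper)."""
    if recovery_duration > max_tolerable_outage:
        # Even invoking DR immediately would overrun the agreed limit.
        raise ValueError("recovery takes longer than the tolerable outage")
    # Trigger point = outage start + (tolerable outage - recovery time).
    return outage_start + (max_tolerable_outage - recovery_duration)


def should_invoke_dr(now: datetime,
                     outage_start: datetime,
                     max_tolerable_outage: timedelta,
                     recovery_duration: timedelta) -> bool:
    """True once the pre-agreed trigger point has been reached (hypothetical).
    The deadline is fixed in advance and is not extended mid-incident."""
    deadline = latest_failover_time(outage_start, max_tolerable_outage,
                                    recovery_duration)
    return now >= deadline


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)  # illustrative outage start, 09:00
    limit = timedelta(hours=4)          # maximum tolerable outage
    recovery = timedelta(hours=1)       # time needed to recover systems
    print(latest_failover_time(start, limit, recovery))
```

With an outage starting at 09:00, a four-hour limit and a one-hour recovery, the trigger falls at 12:00, three hours in. Fixing this figure before an incident, rather than working it out during one, is the discipline Arean describes: the deadline does not move just because a fix seems close.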
Arean concludes: “It is also worth identifying the scenarios in which failover might be deemed unnecessary, as was the case with the NYSE.
“A lot of organisations will have very comprehensive traditional disaster recovery plans in place but are likely to invoke them only for very significant outages lasting several days. For those organisations, invoking disaster recovery is such a significant task, consuming so much time and resource, that dealing with the IT incident is considered the lesser of two evils, even if it takes days to resolve the issue. Those are the organisations investigating the more flexible alternatives that cloud computing makes available.
“Disaster Recovery as a Service (DRaaS) helps to bridge the gap by offering a more flexible and cost-effective alternative to traditional, cumbersome DR solutions. Organisations that adopt DRaaS find that their DR plans can cover far more incidents and that they are less reluctant to invoke DR.”