£170m lost on the London Stock Market just over a week, and untold damage to the “World’s Favourite Airline”. That’s the cost within the UK to the International Airlines Group, the owner of British Airways, after BA’s recent ‘Power Outage’ incident.
“It wasn’t an IT failure. It’s not to do with our IT or outsourcing our IT. What happened was in effect a power system failure or loss of electrical power at the data centre. And then that was compounded by an uncontrolled return of power that took out the IT system.” Willie Walsh (IAG Supremo) during a telephone interview with The Times.
Willie has since inferred that the outage was caused by the actions of an engineer who disconnected and then reconnected a power supply to the data centre in “an uncontrolled and un-commanded fashion”. Could this then actually have something to do with the IT outsource after all, and did a staff member go rogue, or was it down to poor training and change control…?
For me what this highlights is the need to place greater emphasis on availability and uptime of those systems that support critical parts of a business or organisations services and offering. Along with robust processes and automation where possible to minimise the impact of an unplanned outage.
All businesses should expect their systems to fail. Sometimes it can be a physical failure of the infrastructure supporting the data centre (Power, UPS’s, Generators, Cooling etc.). It can be the power supply itself. Computing, Storage or the Network equipment can fail. Software and systems can suffer an outage. Plus it can also come down ‘Human Error’ or poor maintenance of core systems or infrastructure.
Coping with a Power Failure
Even if you have two power feeds to your building, and even if they’re from two different power sub-stations, and run through two different street routes, those sub-stations are still part of the same regional and national power grid. If the grid fails, so does your power. No way around it, except to make your own. Power Surge’s are handled by monitoring the power across Cabinet PDU’s, Critical PDU’s, UPS’s, Generators & Transformers, while assigning Maximum Load to all cabinets to make sure that we do not overload our customers systems.
Recovering from a Disaster
Recovering from a disaster is something that all organisation plan for, however not all have a Disaster Recovery (DR) Plan as there are some that consider High Availability (HA) to be more than sufficient. However HA only provides a localised system for failover, whereas DR is designed to cope with a site failure.
The challenge with DR for many of our customers is the cost;
- First you need to prioritise which applications workloads you want to failover in the event of a disaster.
- Second you need to purchase and manage infrastructure and licensing for these workloads with continuous replication.
- Third you need a 2nd location.
- Fourth you need a robust DR plan that allows you to recover your workloads at the 2nd location.
- Then lastly (which is considered harder) you’ll need to fail back these services once the primary site has been recovered.
This can be an expensive option, but this is also where things like Cloud DR-as-a-Service can help minimise any expenditure, and the pain associated with owning and managing a DR environment.
Reducing the impact of an outage
Minimising the impact of any form of physical failure should be a priority over recovering from an outage. Workflow Automation can help a business maintain uptime of applications and services. This can be defined as a policy where services can be moved to other systems locally, or all services can be re-provisioned to a DR location or a DR platform in the event of outage caused either by a power issue or human error. Helping a business minimise the risk and the impact of outage.
I’ll let you come to your own conclusions as to whether British Airways should adopt a robust change control, automation or DR policy. Logicalis can assist and provide you with a number of options custom to your particular needs so that you are not the next press headliner.