Disaster recovery and delivering above expectations
The use of technology is critical in the fast-moving retail sector, and having high-availability systems with a reliable disaster recovery plan is vital. Associated Retailers Limited (ARL) is one of Australia’s largest independent buying groups, with over 500 retail stores nationally. High-value transactions take place regularly, meaning extended IT outages could put millions of dollars’ worth of orders and payments at risk.
An unusually wet morning in April 2017 proved a nightmare event for ARL. Just before 8am, as the business was gearing up for a busy day ahead, a torrential downpour of hail and rain caused a ceiling to collapse, sending water pouring into the server room and dousing critical infrastructure.
Four racks housing the firewall, file servers, support for 17 virtual environments, telecoms equipment, and two critical production and development servers went down, despite redundant hardware. Electricity was quickly cut to the site but the damage was done. Essential computing equipment would not refire and storage arrays were deemed unrecoverable.
There is never a good time for an event like this to occur, but only a few days out from the end of the month and with a large number of critical financial and marketing transactions pending for the following day, it was potentially devastating.
“Our suppliers are some of the heaviest hitting brands in the world and we have commitments to pay them on a certain day. We also have to ensure that the goods ordered during catalogue drives by our members are delivered and ready for sale as advertised. These are service level agreements for the business, so this event really put a lot on the line,” said Kevin Simionato, Information Systems Manager for ARL.
As the saying goes, it pays to plan for a rainy day, and in this situation ARL was able to put its pre-existing disaster recovery plan into action, calling on trusted partner Truis to move at speed. The call went out in the morning and within an hour, teams at ARL and Truis began deploying countermeasures to get IT back up and processing vital transactions as quickly as possible.
Truis received a heads-up from Kevin in the morning that the DR plan might need to be enacted, and by 3.55pm that afternoon it was confirmed. Within the hour, expedited delivery of the replacement server was costed and arranged.
The equipment was on site in Richmond at 7.45am the following morning, and with a little phone support from the team at Truis to connect the equipment, Kevin was able to commence data recovery from tape.
“It was outstanding. By the time an IBM engineer arrived with an expansion box at 11am, data restore of the production environment was already 75% complete and key users were back online by 2pm to process the critical payment runs. By 8.30pm, all employees were notified via SMS that desktop operations would resume from the office,” explained Kevin.
The speed, dedication, focus and support shown by Truis throughout the event helped prevent a financial and reputational disaster and further cemented Kevin’s long-standing relationship with the team.
“If anybody ever asks me who I recommend as an IT partner, I tell them Truis. There is a sincerity and an authenticity to their desire to help, support and partner with their customers that I have not seen in other organisations,” said Kevin. “It goes without saying that if I ever moved to another company, Truis would continue to be my IT advisors of choice.”
Lightning often strikes twice and this is true for ARL. An incident earlier in the year saw a construction crane clip nearby power lines, taking business systems down. This precipitated a plan to move production into the cloud and utilise the physical environment in Richmond for Disaster Recovery.
The April flood event has accelerated this agenda significantly, and ARL’s future architecture will be the inverse of its previous one: production and day-to-day systems hosted in a private cloud, with the disaster recovery site on premises. Critical equipment will be hosted in a purpose-built building with 24-hour expert support, monitoring, and fully redundant power, and ARL aims to have finalised large parts of this strategic shift to the cloud by the end of the year.
Powering on with Truis
Kevin is working with Truis on various stages of this initiative, including migration from a current IBM 525 to IBM Power8 servers. Once up and running in the new data centre environment, the IBM Power8 will come with additional peace of mind, courtesy of Computer Merchants’ View.
CM View is a free service provided by Truis that monitors IBM system health and support ticketing. The fully automated system will provide Kevin with complete transparency over the health of the system and the status of any support tickets.
Partnered with Truis and CM View, ARL has its sights set on stability and uptime, ensuring issues are pre-empted and addressed ahead of time, and that its suppliers and members are always taken care of.