Ability to control changes is key to avoiding unplanned outages

IT outages mean lost productivity and revenue, and thus directly impact the company's financial statements. How can the IT team minimize outage incidents?

Indian enterprises of all sizes see digitalization as a tailwind for growing the business. This has naturally elevated the importance of IT within enterprises. Businesses run on information technology today, which also means that business stops when IT goes down due to an outage.

BW Businessworld and ServiceNow organized a roundtable inviting a select few top CIOs and IT leaders from Mumbai and New Delhi to discuss practical approaches to avoiding unplanned outages and avenues for delivering consumer-like experiences.

Sangram Aglave, Contributing Editor, Businessworld, led a free-flowing discussion among the CIOs and top IT leaders through a set of topical questions. Peter Doherty, ServiceNow's Principal Solutions Consultant for Asia Pacific and Japan, shared his experiences in helping organizations bring down outages.

“Availability and security are the two top concerns of all CIOs. It's not either-or; both are necessary. Outages happen whenever either one of them or both go down,” said Edgar Dias, Managing Director, ServiceNow India.

Managing IT Changes

“Ability to control changes is key to reducing unplanned outage incidents and their severity” said George Fanthome, CIO, Vedanta Oil & Gas and Group IT Head, Vedanta Ltd.

Kamal Shah, CIO, L&T Infotech, shared a similar perspective and said “we have to do a better job at managing planned outages to avoid firefighting unplanned outages.”

To really understand what has changed requires an accurate inventory of various IT assets, also called a Configuration management database (CMDB). Vikas Malhotra, Sr. General Manager, Hero Motocorp said “the major challenge is to keep the Enterprise CMDB up-to-date”.

Abhijeet Bhattacharjee, SVP- IT, RBL Bank said “change management should be brought under a single umbrella, a single system of record.”

“Keeping track of changes becomes especially difficult during an outage situation as documenting the changes carries less priority than bringing things back online” said George Fanthome, CIO, Vedanta Oil & Gas and Group IT Head, Vedanta Ltd.

Manish Mamtani, Asia IS&T Infrastructure Programme Manager, Compass Group PLC, said that “every change or introduction of anything that is new should be fit for the purpose.” In effect, he was advocating gatekeeping changes rather than simply accepting requests from the business.

Challenges in performing Root Cause Analysis

Ajit Awasare, Senior GM, Godrej Infotech, said “the challenge today is that all systems are blinking with tens of thousands of alerts and warnings all the time. There is also an operational challenge, since the IT team can only respond to an early warning when systems are operating in green, which is rarely the case.”

George Fanthome, CIO, Vedanta Oil & Gas and Group IT Head, Vedanta Ltd, said “Cybersecurity is increasingly the root cause of outages. Although the IT side of things is much more mature, in Oil & Gas and similar process industries specifically, plant instrumentation systems such as DCS and SCADA are much more vulnerable to cyber threats.”

Vikas Malhotra and Devarajan Ramanathan drew the group's attention to outages caused by third parties such as vendors. Vikas said “a multi-party environment often comes in the way of effective root cause analysis.” Devarajan added that “in a multi-party environment, some vendor systems are beyond control, and the only practical approach in such a case would be to accept the risk and mitigate the vendor-side risk.”

Sachin Jain, CIO, Evalueserve, said “Fool-proofing the system is an investment decision and thus has to yield net positive outcomes.” He further added that “Contracts cannot adjudicate in most cases, as it's very difficult for the vendor and the performing organization to reach agreement on the penalty amount. No vendor is ready for an unlimited liability for the outage.”

Amit Chabbra, SVP, Customer Experience, SBI Cards, said “the key challenge is the predominant operations mindset of ‘fix it and leave it’. IT should deploy user-friendly systems of record to document each incident, problem and change with the associated CMDB items to enable the RCA process.”

Rajneesh Mittal, CTO, Zee Entertainment, said “getting the vendor to do an RCA is also important, for example for faulty network equipment. Most often vendors just send a replacement without sharing the RCA.”

Sanjay Desai, EVP – IT, HDFC Bank, added that “the system should be set up in such a way that incidents that qualify for RCA don’t get closed without a completely certified RCA. A lot of incidents fall through the cracks in a dynamic workplace such as the IT team.”

Problem management

Peter Doherty, Principal Solutions Consultant, ServiceNow, said “The scope of problem management speaks to the infrastructure and operations maturity of an organization. Most companies address problems after fixing the major incidents.”

Satish Doiphode, Senior GM-IT, Reliance, said “It is important to track the workload of a typical IT worker, as only then would we know the allocation of time spent between change, incident and problem management.”

Ajit Awasare, Sr GM, Godrej Infotech, added “the task can flow from one person to the other in no fixed path. You know there are skill gaps in the team when you see a skewed workload across the team. So this data is vital in training and developing the team.”

Businessworld Perspective

The enterprise IT market is progressing into a new domain of data transformation. Data is transforming enterprise IT, like everything else, into a science. CIOs and their teams can drastically improve their visibility into IT systems and operations through a system of record for IT. Such visibility would give IT the ability to gather insights into the real operating characteristics and mechanisms of the underlying IT systems. The problem of outages is common across in-house and outsourced IT setups, and the responsibility for ensuring availability and security remains constant even in a cloud-first world.


