
What Are the Most Common Root Causes of Major Incidents?

According to a recent industry survey, "80% of unplanned outages are due to ill-planned changes made by administrators ('operations staff') or developers in the production environment."

The key reasons identified by the survey are:

[Figure: Key reasons for unplanned outages (survey)]

The following companies illustrate how ongoing failures and performance problems cost businesses both lost revenue and damaged reputations. During an outage, the business effectively shuts its doors, puts out the 'We are closed' sign, and is then left wondering, 'Will our customers be coming back?'

1. Bank of America Online Banking Down Across U.S.

  • Duration: 6 days

  • Impact: Affected 29 million online customers

  • What Happened: The problem was attributed to a "multi-year project" to upgrade its online banking platform.

2. Amazon EC2 Goes Dark In Morning Cloud Outage

  • Duration: 4 days

  • What Happened: The trigger for this event was a network configuration change.

3. Google Suffers First Gmail Outage of 2011

  • Duration: 2 days

  • Impact: 120,000 users affected

  • What Happened: After analyzing the issue, Google Engineering determined that the root cause was a bug inadvertently introduced in a Gmail storage software update.

4. BlackBerry Outages Spread Throughout the World

  • Duration: 24 hours (longer for some users)

  • Impact: Service unavailable worldwide, affecting millions of users

  • What Happened: The outages were caused by a "core switch failure within RIM's infrastructure."

5. Intuit Service Outages Leave Frustrated Customers

  • Duration: 2 days (some users up to 5 days)

  • Impact: Thousands of customers affected

  • What Happened: The problem was caused by a change to their network configuration.

6. Yahoo Mail Suffers Outage

  • Duration: More than 24 hours

  • Impact: Users affected around the globe

  • What Happened: Change and configuration issues.

A recent Gartner study projected that "Through 2015, 80% of outages impacting mission-critical services will be caused by people and process issues, and more than 50% of those outages will be caused by change/configuration/release integration and hand-off issues."

[Figure: Gartner study on service outages]
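The two Gartner percentages can be combined into a single rough figure. This is a back-of-the-envelope calculation that assumes "more than 50% of those outages" applies to the 80% share caused by people and process issues, so the two fractions multiply:

```latex
\text{share of all mission-critical outages from change/configuration/release/hand-off issues} > 0.80 \times 0.50 = 0.40 = 40\%
```

In other words, the projection implies that more than 40% of all outages affecting mission-critical services stem from change, configuration, release integration and hand-off issues.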

What Does a Service Outage Cost?

Resolving unplanned outages is IT's responsibility, but ultimately they are business issues. A thorough evaluation includes calculating how much money you will lose for each hour (or minute, or any other time increment you choose) of downtime.
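As a rough illustration of that calculation, the sketch below estimates downtime cost as lost revenue plus lost employee productivity plus a one-off recovery cost. This cost model, the class and function names, and every input value are hypothetical assumptions for illustration only, not figures from the study cited in this article; substitute your own business data.

```python
from dataclasses import dataclass


@dataclass
class OutageCostInputs:
    """Hypothetical inputs for a rough downtime-cost estimate (all values are assumptions)."""
    revenue_per_hour: float        # revenue normally earned per hour of operation
    revenue_impact: float          # fraction of that revenue lost while the service is down (0..1)
    employees_affected: int        # staff who cannot work normally during the outage
    cost_per_employee_hour: float  # fully loaded hourly cost of one employee
    productivity_loss: float       # fraction of their productivity lost (0..1)
    recovery_cost: float = 0.0     # one-off cost to restore service (overtime, vendors, etc.)


def outage_cost(inputs: OutageCostInputs, hours_down: float) -> float:
    """Estimate the total cost of an outage lasting `hours_down` hours."""
    lost_revenue = inputs.revenue_per_hour * inputs.revenue_impact * hours_down
    lost_productivity = (
        inputs.employees_affected
        * inputs.cost_per_employee_hour
        * inputs.productivity_loss
        * hours_down
    )
    return lost_revenue + lost_productivity + inputs.recovery_cost


def cost_per_hour(inputs: OutageCostInputs, hours_down: float) -> float:
    """Average cost per hour of downtime, spreading the one-off recovery cost over the outage."""
    if hours_down <= 0:
        raise ValueError("hours_down must be positive")
    return outage_cost(inputs, hours_down) / hours_down


# Made-up example inputs, for illustration only.
example = OutageCostInputs(
    revenue_per_hour=50_000, revenue_impact=0.5,
    employees_affected=200, cost_per_employee_hour=40,
    productivity_loss=0.5, recovery_cost=10_000,
)
print(outage_cost(example, hours_down=4))    # 126000.0
print(cost_per_hour(example, hours_down=4))  # 31500.0
```

With these made-up inputs, a 4-hour outage costs 126,000 (100,000 in lost revenue, 16,000 in lost productivity and 10,000 in recovery), which works out to 31,500 per hour of downtime.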

According to a recent study, the average cost of an unplanned service outage is as follows:

[Figure: Average cost of an unplanned service outage]

Consequences of Poor Change Management

  • Lower productivity
  • Passive resistance
  • Active resistance
  • Turnover of valued employees
  • Disinterest in the current or future state
  • Arguments about the need for change
  • Taking sick days or not showing up
  • People finding workarounds
  • Divides created between 'us' and 'them'

Best Practices to Avoid Outages Caused by Poor Change Management

A key contributor to major outages is poor implementation of the change management process. The following best practices and corrective actions are aimed at eliminating manual error:

  • Carry out no change without an approved change request in the system.
  • Clearly define the difference between a change request and a service request.
  • Make every change to the production environment subject to a change request.
  • Train all engineers on key business applications and the impact of their unavailability.
  • Carry out changes only within approved change windows.
  • Validate all critical changes using the doer-and-checker concept (one engineer implements, a second engineer verifies).
  • Allow only L3 engineers to carry out critical changes.
  • Require a detailed implementation plan for every critical change, specifically to prevent manual error.
  • By default, review the following types of change for possible manual error: patch rollouts, file server management, IP address issue/release, and Active Directory (AD) and VLAN administration.
  • Integrate all third-party change management processes with the customer's overall change management process.
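Several of these practices (an approved change request, the change window, L3-only implementation, doer-and-checker validation and a detailed implementation plan for critical changes) are mechanical enough to check automatically before work starts. The following is a minimal sketch of such a pre-implementation gate; the `ChangeRequest` fields, the 22:00-06:00 change window and the `gate_violations` function are hypothetical and not taken from any particular ITSM tool.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Optional


@dataclass
class ChangeRequest:
    """Hypothetical change record; field names are illustrative, not from any ITSM tool."""
    change_id: str
    approved: bool
    critical: bool
    implementer: str
    implementer_level: int      # support tier of the implementer, e.g. 3 for L3
    checker: Optional[str]      # engineer who independently verifies the change
    has_implementation_plan: bool


# Assumed approved change window: 22:00 to 06:00 local time (hypothetical).
WINDOW_START = time(22, 0)
WINDOW_END = time(6, 0)


def in_change_window(now: datetime) -> bool:
    t = now.time()
    # The window wraps past midnight, so it is "after start OR before end".
    return t >= WINDOW_START or t < WINDOW_END


def gate_violations(cr: ChangeRequest, now: datetime) -> List[str]:
    """Return the reasons a change may not proceed; an empty list means it may."""
    problems = []
    if not cr.approved:
        problems.append("no approved change request")
    if not in_change_window(now):
        problems.append("outside the approved change window")
    if cr.critical:
        if cr.implementer_level < 3:
            problems.append("critical change must be carried out by an L3 engineer")
        if not cr.checker or cr.checker == cr.implementer:
            problems.append("critical change needs a separate checker (doer/checker)")
        if not cr.has_implementation_plan:
            problems.append("critical change lacks a detailed implementation plan")
    return problems


# Hypothetical critical change: an L2 engineer who is also listed as their own checker.
cr = ChangeRequest("CHG-001", approved=True, critical=True, implementer="asha",
                   implementer_level=2, checker="asha", has_implementation_plan=True)
print(gate_violations(cr, datetime(2018, 6, 10, 23, 30)))
# ['critical change must be carried out by an L3 engineer',
#  'critical change needs a separate checker (doer/checker)']
```

Returning a list of reasons rather than a single yes/no makes it easy to record in the change ticket exactly why a change was blocked.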

Conclusion

The Value of Change Management to the Business

Reliability and business continuity are essential for the success and survival of any organization. Service and infrastructure changes can harm the business through service disruption and delays in identifying business requirements, but Change Management enables the service provider to add value to the business by:

  • Prioritizing and responding to business and customer change proposals
  • Implementing changes that meet the customers' agreed service requirements while optimizing costs
  • Helping to meet governance, legal, contractual and regulatory requirements
  • Reducing failed changes and therefore service disruption, defects and re-work
  • Delivering change promptly to meet business timescales

By Manish Rathi, 6/10/18

Tags: ITIL, Service Management, Change Management, ITSM outages, poor change management, IT Service
