What Are the Most Common Root Causes of Major Incidents?
According to a recent industry survey, "80% of unplanned outages are due to ill-planned changes made by administrators ('operations staff') or developers in the production environment."
Here are the key reasons:
The outages listed below show that ongoing failures and performance problems cost companies both lost revenue and damaged reputations. During an outage, the business effectively shuts its doors, puts out the 'We are closed' sign, and is left wondering, 'Will our customers be coming back?'
1. Bank of America Online Banking Down Across U.S.
Duration: 6 days
Impact: Affected 29 million online customers
What Happened: The problem was noted as the result of a "multi-year project" to upgrade its online banking platform.
2. Amazon EC2 Goes Dark In Morning Cloud Outage
Duration: 4 days
What Happened: The trigger for this event was a network configuration change.
3. Google Suffers First Gmail Outage of 2011
Duration: 2 days
Impact: 120,000 users affected
What Happened: After analyzing the issue, Google Engineering determined that the root cause was a bug inadvertently introduced in a Gmail storage software update.
4. BlackBerry Outages Spread Throughout the World
Duration: 24 hours (longer for some users)
Impact: Service unavailable worldwide, affecting millions of users
What Happened: Service outages were due to a "core switch failure within RIM's infrastructure."
5. Intuit Service Outages Leave Frustrated Customers
Duration: 2 days (up to 5 days for some users)
Impact: Thousands affected
What Happened: The problem was caused by a change to their network configuration.
6. Yahoo Mail Suffers Outage
Duration: More than 24 hours
Impact: Affected users around the globe
What Happened: Configuration change issues.
A recent Gartner study projected that "Through 2015, 80% of outages impacting mission-critical services will be caused by people and process issues, and more than 50% of those outages will be caused by change/configuration/release integration and hand-off issues."
What Is the Cost of a Service Outage?
IT is responsible for resolving unplanned outages, but at the end of the day they are essentially business issues. Part of a thorough evaluation process is calculating how much money you will lose for each hour (or minute, or another time increment of your choice) of downtime.
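As a rough illustration of that calculation, here is a minimal Python sketch. The cost model (lost revenue plus lost staff productivity plus a one-off recovery cost) and all names such as `OutageCostInputs` and `downtime_cost` are assumptions made for this example, not figures or formulas from any study. Replace the inputs with numbers from your own business.

```python
from dataclasses import dataclass


@dataclass
class OutageCostInputs:
    """Hypothetical inputs for a simple downtime cost estimate."""
    revenue_per_hour: float            # revenue normally earned per hour
    affected_staff: int                # employees unable to work during the outage
    loaded_cost_per_staff_hour: float  # fully loaded hourly cost per employee
    recovery_cost: float = 0.0         # one-off cost (overtime, vendor fees, ...)


def downtime_cost(inputs: OutageCostInputs, duration_hours: float) -> float:
    """Estimate the total cost of an outage lasting duration_hours."""
    if duration_hours < 0:
        raise ValueError("duration_hours must not be negative")
    lost_revenue = inputs.revenue_per_hour * duration_hours
    lost_productivity = (
        inputs.affected_staff * inputs.loaded_cost_per_staff_hour * duration_hours
    )
    return lost_revenue + lost_productivity + inputs.recovery_cost


def cost_per_minute(inputs: OutageCostInputs) -> float:
    """Time-dependent cost per minute, excluding the one-off recovery cost."""
    hourly = inputs.revenue_per_hour + (
        inputs.affected_staff * inputs.loaded_cost_per_staff_hour
    )
    return hourly / 60
```

For example, with made-up inputs of 10,000 in hourly revenue, 50 affected staff at 40 per hour, and a 5,000 recovery cost, `downtime_cost(inputs, 2)` comes to 29,000 and `cost_per_minute(inputs)` to 200. A real model would likely also need terms for penalties, customer churn, and reputational damage, which are much harder to quantify.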
Here is the average cost of an unplanned service outage, according to a recent study.
Consequences of Poor Change Management
Turnover of valued employees
Disinterest in the current or future state
Arguing about the need for change
Taking sick days or not showing up
People finding workarounds
Divides created between 'us' and 'them'
Best Practices to Avoid Outages Due to Poor Change Management
Poor implementation of the change management process is a key contributor to major outages. Here are a few best practices and corrective actions aimed at eliminating manual error:
No change is to be carried out without an approved change request in the system.
Define clearly what counts as a change request and what counts as a service request.
All changes to the production environment should be subject to a change request.
Train all engineers on key business applications and the impact of their unavailability.
All changes are to be carried out only within approved change windows.
All critical changes are to be validated using the doer-and-checker concept.
All critical changes are to be carried out by an L3 engineer only.
All critical changes are to have a detailed implementation plan, especially to prevent manual error.
Changes such as patch rollout, file server management, IP address release/issue, and AD and VLAN administration are to be reviewed by default for possible manual error.
All third-party change management processes must be integrated with the customer's overall change management process.
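Several of the practices above can be enforced automatically before an engineer touches production. The following Python sketch shows one possible pre-implementation gate. The `ChangeRequest` fields and the `gate_violations` function are hypothetical names invented for this example and do not correspond to any particular ITSM tool.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ChangeRequest:
    """Hypothetical change record; field names are illustrative only."""
    change_id: str
    approved: bool
    critical: bool
    window_start: datetime
    window_end: datetime
    doer: str
    doer_level: str = "L1"          # support tier of the implementing engineer
    checker: Optional[str] = None   # second person who validates the change
    implementation_plan: str = ""


def gate_violations(cr: ChangeRequest, now: datetime) -> List[str]:
    """Return the list of rule violations; an empty list means 'go ahead'."""
    violations = []
    if not cr.approved:
        violations.append("change request is not approved")
    if not (cr.window_start <= now <= cr.window_end):
        violations.append("outside the approved change window")
    if cr.critical:
        # Doer and checker must be two different people.
        if not cr.checker or cr.checker == cr.doer:
            violations.append("critical change needs a checker other than the doer")
        if cr.doer_level != "L3":
            violations.append("critical change must be carried out by an L3 engineer")
        if not cr.implementation_plan.strip():
            violations.append("critical change needs a detailed implementation plan")
    return violations
```

A deployment script could call `gate_violations(cr, datetime.now())` and refuse to proceed unless the returned list is empty. This only covers the rules that are easy to check mechanically. Training, clear definitions, and third-party process integration still need human ownership.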
Value of Change Management to the Business
Reliability and business continuity are essential for the success and survival of any organization. Service and infrastructure changes can have a negative impact on the business through service disruption and delays in identifying business requirements, but change management enables the service provider to add value to the business by:
Prioritizing and responding to business and customer change proposals
Implementing changes that meet the customers' agreed service requirements while optimizing costs
Contributing to meeting governance, legal, contractual and regulatory requirements
Reducing failed changes and therefore service disruption, defects and re-work
Delivering change promptly to meet business timescales
By Manish Rathi