- Abhishek Agnihotry
-   For any queries, mail me at
-   agnihotry@gmail.com



                      1/31/2013   1
Scope:

• Problem control, error control and proactive Problem Management are all within the
scope of the Problem Management process. In terms of formal definitions, a
'Problem' is the unknown underlying cause of one or more Incidents or of a Major
Incident, and a 'Known Error' is a Problem that has been successfully diagnosed and
for which a Work-around or fix has been identified.
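
The two definitions above can be expressed as a small data model. This is only a sketch: the `Problem` class, its field names and the sample incident IDs are hypothetical and do not come from any particular ITSM tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Problem:
    """Hypothetical record for the unknown cause behind one or more Incidents."""
    incident_ids: List[str]
    root_cause: Optional[str] = None   # set once diagnosis succeeds
    workaround: Optional[str] = None   # temporary relief, if any
    fix: Optional[str] = None          # permanent resolution, if any

    def is_known_error(self) -> bool:
        # Known Error = diagnosed Problem with a Work-around or fix identified.
        diagnosed = self.root_cause is not None
        has_remedy = self.workaround is not None or self.fix is not None
        return diagnosed and has_remedy


problem = Problem(incident_ids=["INC-1", "INC-2"])
print(problem.is_known_error())   # still an undiagnosed Problem

problem.root_cause = "Expired certificate"
problem.workaround = "Restart service with the backup certificate"
print(problem.is_known_error())   # now qualifies as a Known Error
```

Under this model, a diagnosed Problem with no Work-around or fix is still only a Problem, which matches the wording of the definition.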




Input and Output of Problem Management



Inputs:
• Incident details from Incident Management
• Configuration details from the Configuration Management Database (CMDB)
• Any defined Work-around (from Incident Management).




Outputs:
• Known Errors
• A Request for Change (RFC)
• An updated Problem record (Work-around / Fix)
• A closed Problem record, for a resolved Problem
• Response from Incident matching to Problems and Known Errors
• Management information (MIS)




Root Cause Analysis – Review by Application Owners

1. RCA provided by PM team
2. Sent to the Application owners
3. RCA analysis by application owners
4. Approved?
   • Yes: proceed to Problem and Error Control (slide 5, "Problem & Error Control")
   • No: request to PM team for a rectified RCA based on the App owners'
     recommendations




Problem & Error Control

PM, problem control:
  1. Problem Identification and Recording
  2. Problem Classification
  3. Problem Investigation and Diagnosis
  4. Root Cause Detected

PM, error control:
  1. Identify and Record Error
  2. Assess Errors
  3. Record Error Resolution
  4. Close Error record and Associated Problem(s)

CM:
  • RFC
  • Change successfully Implemented

KEDB Updated


Root Cause Analysis - a REACTIVE method of identifying the cause(s) of event(s)




General principles of RCA:
•   To be effective, RCA must be performed systematically, and the root
    causes identified must be backed up by documented evidence.

•   There may be more than one root cause (RC) for an event or a problem.

•   The purpose of identifying all solutions to a problem is to prevent
    recurrence at the lowest cost and in the simplest way; the simplest or
    lowest-cost approach is preferred.

•   To be effective, the analysis should establish a sequence of events
    or timeline to understand the relationships between contributory
    (causal) factors, root cause(s) and the defined problem or event to
    be prevented in the future.
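
The timeline principle above can be sketched in a few lines: collect the observed events, then order them by time so that contributory factors appear before their effects. The events and timestamps below are invented purely for illustration, and `build_timeline` is a hypothetical helper name.

```python
from datetime import datetime

# Invented sample observations, deliberately listed out of order.
events = [
    ("2013-01-31 10:05", "Users report failed logins"),
    ("2013-01-31 09:40", "Disk usage alert on database server"),
    ("2013-01-31 09:55", "Database stops accepting writes"),
]


def build_timeline(raw_events):
    """Parse the timestamps and return the events in chronological order."""
    parsed = [
        (datetime.strptime(stamp, "%Y-%m-%d %H:%M"), description)
        for stamp, description in raw_events
    ]
    return sorted(parsed, key=lambda event: event[0])


timeline = build_timeline(events)
for when, description in timeline:
    print(when.strftime("%H:%M"), description)
```

Ordering alone does not prove causation, but it rules out candidate causes that happened after the effect they are supposed to explain.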

Root Cause Analysis - evaluation




1st : Is it readable?
     If it is readable, it will be grammatically correct, the sentences will
  make sense, it will be free of internal inconsistencies, terms will be
  defined, it will contain appropriate graphics, and the like.
2nd : Does it contain a complete set of all of the causal relationships?
     If it does contain a "complete set of all of the causal relationships",
  one could (at least):
   ◦ 1. Trace the causal relationships from the harmful outcomes to
      the deepest conditions, behaviors, actions, and inactions.
   ◦ 2. Show that the important attributes of the harmful outcomes
      are completely explained by the deepest
      conditions, behaviors, actions, and inactions.
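
The first check above, tracing from a harmful outcome down to the deepest causes, can be illustrated as a walk over a cause map. This is a minimal sketch: the `causes` mapping is an invented example, and `deepest_causes` is a hypothetical helper, not part of any RCA tool.

```python
# Invented example: each effect maps to the causes recorded for it.
causes = {
    "Service outage": ["Database down", "Failover did not trigger"],
    "Database down": ["Disk full"],
    "Disk full": ["Log rotation disabled"],
    "Failover did not trigger": ["Health check misconfigured"],
}


def deepest_causes(outcome, cause_map):
    """Return the causes reachable from `outcome` that have no recorded cause."""
    found = []
    seen = set()
    stack = [outcome]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against loops in the recorded relationships
        seen.add(node)
        parents = cause_map.get(node, [])
        if not parents and node != outcome:
            found.append(node)  # nothing deeper is recorded for this node
        stack.extend(parents)
    return sorted(found)


print(deepest_causes("Service outage", causes))
```

In this example the trace ends at two separate causes, which fits the principle that an event may have more than one root cause. An outcome with no recorded causes returns an empty list, a sign that the RCA report is incomplete.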




Root Cause Analysis – Level of Causes

• Physical cause – Specific physical item that, if corrected/replaced, would fix
  the problem
• System cause – Possible underlying cause of the physical failure




Levels, from surface to underlying: Problem symptoms -> Physical cause -> System cause


Root Cause Analysis – Barriers




Cognitive laziness – Taking the first sufficient result instead of the
  optimum result
Overconfidence – Pursuing evidence that supports our own belief rather
  than checking whether that belief represents the truth
Recency bias – Assuming the same cause for two recent problem
  symptoms and therefore not performing a more rigorous investigation
Availability bias – Relying on available data rather than collecting /
  gathering more relevant or reliable data
Anchoring bias – Latching on to the first data and what it indicates while
  ignoring possibly conflicting evidence
Confirmation bias – Looking for and accepting only data that confirms
  our preexisting assumption of the cause




Root Cause Analysis – 7 Step problem solving model




  1. Identify the Problem
  2. List possible Root causes
  3. ID most likely Root cause
  4. ID potential solutions
  5. Select and Implement solution
  6. Evaluate effect of solution
  7. Standardize process


Root Cause Analysis – Use 5 Why’s to understand the issue



5 Why’s

• This is the simplest method to find out the Root cause
• Drill deeper into the problem until a Root cause is found

Problem: Car will not start
Why: Dead battery
Why: Bad alternator
Why: Alternator’s belt broken
Why: Belt reached end of life
Why: Recommended maintenance not performed
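
The drill-down in the car example can be sketched as a simple chain walk. The `whys` mapping just restates the slide's example, and `five_whys` is a hypothetical helper name; in a real analysis each answer comes from investigation, not from a lookup table.

```python
# The slide's example: each statement maps to the answer to "Why?".
whys = {
    "Car will not start": "Dead battery",
    "Dead battery": "Bad alternator",
    "Bad alternator": "Alternator's belt broken",
    "Alternator's belt broken": "Belt reached end of life",
    "Belt reached end of life": "Recommended maintenance not performed",
}


def five_whys(problem, answers, max_depth=5):
    """Ask "Why?" repeatedly until no answer is recorded or max_depth is hit."""
    chain = [problem]
    while chain[-1] in answers and len(chain) - 1 < max_depth:
        chain.append(answers[chain[-1]])
    return chain


chain = five_whys("Car will not start", whys)
root_cause = chain[-1]

print("Problem:", chain[0])
for answer in chain[1:]:
    print("Why:", answer)
print("Root cause:", root_cause)
```

The limit of five is a convention rather than a rule: the walk stops earlier if no deeper answer exists, and `max_depth` can be raised when the fifth answer is still a symptom.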





RCA - Root Cause Analysis
