
The limitations of root cause analysis

Learning lessons from projects is not as simple as you may think! Projects are complex adaptive systems linking people, processes and technology – in this environment, useful answers are rarely simple.

Certainly, when things go wrong, stakeholders almost by default want a simple explanation of the problem, which tends to lead to a search for the ‘root cause’. There are numerous techniques to assist in the process, including Ishikawa (fishbone) diagrams that look at cause and effect, and Toyota’s ‘Five Whys’ technique, which asserts that by asking ‘Why?’ five times in succession you can delve into a problem deeply enough to understand the ultimate root cause. The chart below outlines a ‘Five Whys’ analysis of the most common paint defect (‘orange peel’ is an uneven finish that looks like the surface of an orange):
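The chain itself is simple to express. The sketch below (Python, with purely hypothetical answers standing in for the chart’s content) shows the linear structure the technique assumes: each answer becomes the subject of the next ‘why’, ending at a single root cause.

```python
# A 'Five Whys' chain is a linear walk: each answer becomes the
# subject of the next question. The answers below are illustrative
# placeholders, not the content of the original chart.
five_whys = [
    ("Why does the paint show orange peel?",
     "The paint did not flow out evenly before drying."),
    ("Why did the paint not flow out evenly?",
     "The atomised droplets were too large when they hit the panel."),
    ("Why were the droplets too large?",
     "The spray gun's air pressure was set too low."),
    ("Why was the air pressure set too low?",
     "The gun was not checked against the job's specification."),
    ("Why was the gun not checked?",
     "There is no set-up verification step in the painting process."),
]

for depth, (question, answer) in enumerate(five_whys, start=1):
    print(f"Why #{depth}: {question}")
    print(f"  -> {answer}")

# The final answer is treated as the 'root cause'; note the built-in
# assumption that every symptom has exactly one cause at each level.
```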

These are valuable techniques for understanding the root cause of a problem in simple systems (for more on the processes see WP1085, Root Cause Analysis); however, in complex systems a different paradigm applies.

Failures in complex socio-technical systems such as project teams do not have a single root cause, and the assumption that for each specific failure (or success) there is a single unifying event that triggers a chain of other events leading to the outcome is a myth that deserves to be busted! For more on complexity and complex systems see: A Simple View of ‘Complexity’ in Project Management.

Complex system failures typically emerge from a confluence of conditions and occurrences (elements) that are usually associated with the pursuit of success but, in a particular combination, trigger failure instead. Each element is necessary, but the elements are only jointly sufficient to cause the failure when they combine in a specific sequence (the sketch after the list below illustrates this). Therefore, in order to learn from the failure (or success), an approach is needed that considers that:

  • …complex systems involve not only technology but organisational (social, cultural) influences, and those deserve equal (if not more) attention in investigation.
  • …fundamentally surprising results come from behaviours that are emergent. This means they can and do come from components interacting in ways that cannot be predicted.
  • …nonlinear behaviours should be expected. A small change in starting conditions can result in catastrophically large and cascading failures.
  • …human performance and variability are not intrinsically coupled with causes. Terms like ‘situational awareness’ or ‘lack of training’ are blunt concepts that can mask the reasons why it made sense for someone to act the way they did with regard to a contributing cause of a failure.
  • …diversity of components and complexity in a system can augment the resilience of a system, not simply bring about vulnerabilities.
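A toy model makes the ‘necessary but only jointly sufficient’ point concrete. In the sketch below (Python, with hypothetical condition names invented for illustration), no single condition produces the failure: remove any one of them, or change the order in which they occur, and the failure does not emerge.

```python
# Toy model of a complex failure: three hypothetical conditions,
# each necessary, none sufficient on its own. The failure only
# emerges when all three occur in one specific sequence.
REQUIRED_SEQUENCE = ["schedule_pressure", "workaround_adopted", "review_skipped"]

def fails(events):
    """Return True only if every required condition occurs, in order."""
    positions = []
    for condition in REQUIRED_SEQUENCE:
        if condition not in events:
            return False                    # a necessary element is missing
        positions.append(events.index(condition))
    return positions == sorted(positions)   # the sequence also matters

print(fails(["schedule_pressure", "workaround_adopted", "review_skipped"]))  # True
print(fails(["workaround_adopted", "schedule_pressure", "review_skipped"]))  # False: same elements, wrong order
print(fails(["schedule_pressure", "review_skipped"]))                        # False: one element missing
```

Asking which of the three events was ‘the’ root cause has no meaningful answer here; only the combination, in that sequence, is sufficient.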

This is a far more difficult undertaking that recognises complex systems have emergent behaviours, not resultant ones. There are several systemic accident models available, including Hollnagel’s FRAM (Functional Resonance Analysis Method) and Leveson’s STAMP (System-Theoretic Accident Model and Processes), that can help build a practical approach for learning lessons effectively (you can Google these if you are interested…).

In the meantime, the next time you read or hear a report with a singular root cause, alarms should go off, particularly if the root cause is ‘human error’. If there is only a single root cause, someone has not dug deep enough! But beware: the desire for a simple wrong answer is deeply rooted. The tendency to look for singular root causes comes from the tenets of reductionism that are the basis of Newtonian physics, scientific management and project management (for more on this see: The Origins of Modern Project Management).

Certainly, starting with the outcome and working backwards towards an original triggering event along a linear chain feels intuitive, and the process derives a simple answer that validates our innate hindsight and outcome biases (see WP1069 – The innate effect of Bias). However, the requirement for a single answer tends to ignore surrounding circumstances in favour of a cherry-picked list of events, and it focuses too much on individual components and not enough on the interconnectedness of components. Emergent behaviours are driven by those interconnections, and most complex system failures are emergent.

The assumption that each presenting symptom has only one cause that can be defined as the answer to a ‘why?’ is the fundamental weakness of the reductionist approach used in the ‘Five Whys’ chart above. The simple answer to each ‘why’ may not reveal the several jointly sufficient causes that, in combination, explain the symptom. More sophisticated approaches are needed, such as the example below dealing with a business problem:

The complexity of the fifth ‘why’ in the table above can be crafted into a lesson that can be learned and implemented to minimise problems in the future, but it is not simple!
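Even without reproducing the original table, the branching structure it relies on is easy to sketch. In the toy example below (Python, every answer hypothetical and invented for illustration), each ‘why’ can return several jointly sufficient answers, so the analysis fans out into a tree rather than a single chain:

```python
# A branching 'why' analysis: each symptom can have several jointly
# sufficient causes, so the result is a tree, not a chain. All of the
# text below is hypothetical, standing in for the original table.
why_tree = {
    "Customers are leaving": [
        "Orders ship late",
        "Support response is slow",
    ],
    "Orders ship late": [
        "The warehouse is understaffed",
        "Stock records are inaccurate",
    ],
    "Support response is slow": [
        "Tickets are not triaged",
    ],
}

def ask_why(symptom, depth=1):
    """Recursively print every contributing cause, not just one path."""
    for cause in why_tree.get(symptom, []):
        print(f"{'  ' * depth}Why? -> {cause}")
        ask_why(cause, depth + 1)

print("Customers are leaving")
ask_why("Customers are leaving")
```

Each leaf of the tree is a candidate contributing cause; the lesson to be learned sits in the combination of branches, not in any single path.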

The process of gathering ‘lessons learned’ has just got a lot more complex.