
HOW TO: Use Fault Tree Analysis To Predict Business Failures

The world of business isn’t as forgiving as the world of Mario–to ensure that your run through the world of Koopa Troopas goes smoothly, you should try a li’l something called fault tree analysis.

It’s tempting to smash buttons and hope for the best, but more often than not, that just leads to us desperately running into a Goomba and screaming, “BUT THIS IS SUPPOSED TO WOOORKKKKKK!”

Okay, not the most pleasant way to go down.

Plus, in the world of business, we’re usually limited to just one life: repeated mistakes disappoint customers and encourage them to turn elsewhere.

So what can we do to minimize the risk of mistakes and make the most of our scarce resources?

INTRODUCING… *drumroll* THE FAULT TREE ANALYSIS!

Today’s post will talk about:

  • What FTAs are
  • What they’re made up of
  • How you can make one
  • How you can make sense of one.

What’s an FTA?

This handy diagram is basically a mind map that lays out all of the possible ways your business (or launch or pop up shop or any other endeavor) could fail.

Getting a close-up look at all the possible ways you’ll fail in the future is a bit gut-wrenching, but we promise–it’s actually very handy.

By creating a fault tree analysis, you’re basically making a list of all the risk factors rollin’ on up in the future. And once you’ve identified all of those risks, you’ll be able to address them properly. (After all, you can’t prepare for a hurricane if you don’t know it’s coming. But if you are aware, then you can stock up on food, clothes, clean water, and so on.)

For example, say you’ve been playing Mario all weekend and you’ve finally reached the boss level, where a pixelated and rather frightening Bowser awaits.

In the worst-case scenario, where you lose in the final level, what would the causes of failure be?

In this case, the fault tree might look a little bit like this:

[Image: fault tree analysis]

But that’s not all!

A true fault tree analysis is actually a bit more complicated than that.

Where did FTAs come from, Cotton Eye Joe?

This handy diagram was created back in 1962–older than many of us here–by a Mr. H. A. Watson (haw haw haw) of Bell Laboratories, working under a contract with the U.S. Air Force.

Its purpose?

To evaluate the Minuteman I Intercontinental Ballistic Missile (ICBM) Launch Control System for possible errors.

Because of its history as an engineering tool, it’s most often seen in the fields of science and technology–think aerospace, nuclear power, chemical, and other high-hazard industries.

BUT that doesn’t mean you can’t use it! It’s also been used to identify risk factors in social + public service systems, and in software engineering for debugging purposes.

What can FTAs be used for?

  • show compliance with system requirements
  • figure out where your business launch went wrong
  • understand the logic/chain of events that could lead to the top undesired event
  • diagnose problems in your business system
  • minimize and optimize resources
  • design a business or engineering system
  • create diagnostic manuals

What are FTAs made of?

Part 1: The Logic

True fault trees run on this magical type of sorcery called Boolean logic (named after George Boole, the spiffy lookin’ English mathematician who came up with the whole idea).

Boolean systems are made up of a series of statements that are either TRUE or FALSE.

By combining statements into bigger statements, and combining those statements with other statements, we can create entire operating systems and applications and a whole bunch of other stuff-that-makes-our-lives-easier-even-though-we-don’t-know-how-they-work.

We’re going to be diving a little deep into the foundations of computer science here, but stick with me: mastery of the fault tree analysis is an amazing asset to your business.

So, we know that in a Boolean system, we have a lot of statements. And each statement is either true or false. To illustrate: let’s say “the sky is blue” is a true statement. But “Today is Monday” is a false statement.

Next: we can combine statements using three fundamental logic operators, which are

1) AND

2) OR, and

3) NOT.

The FTA method mainly focuses on the first two operators, aka “AND” and “OR”. To understand how these operators work, let’s combine the two statements I mentioned above with the “AND” operator.

The sky is blue (true) AND Today is Monday (false).

The result of the combination is a statement that’s incorrect. Even though the sky is blue, it’s not Monday–so we’re wrong, aka the statement is false.

To put it more simply: if you have statements that are linked with “AND”, then all statements must be “true” for the overall statement to be true. If even one statement is false, then the entire combination is false.

If you have statements linked with “OR”, however, it’s the other way around. As long as at least one statement is true, the entire combination is true.

The only way the combination could be false is if all of the component statements are false.

You can get a clearer look with these charts:

Statement 1 | Statement 2 | Statement 1 AND 2
False       | False       | False
False       | True        | False
True        | False       | False
True        | True        | True

Statement 1 | Statement 2 | Statement 1 OR 2
False       | False       | False
False       | True        | True
True        | False       | True
True        | True        | True
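If you’d rather let a computer fill in the charts, here’s a minimal Python sketch that rebuilds both of them. The helper name `truth_table` is made up for this illustration; it just tries every True/False pairing and records what the operator returns.

```python
from itertools import product


def truth_table(operator):
    """Return rows of (statement_1, statement_2, result) for a two-input operator."""
    return [(a, b, operator(a, b)) for a, b in product([False, True], repeat=2)]


# Python's own `and` / `or` follow the same Boolean rules as the charts.
and_rows = truth_table(lambda a, b: a and b)
or_rows = truth_table(lambda a, b: a or b)

for name, rows in (("AND", and_rows), ("OR", or_rows)):
    print(f"Statement 1  Statement 2  Statement 1 {name} 2")
    for a, b, result in rows:
        print(f"{a!s:<12} {b!s:<12} {result}")
    print()
```

The rows come out in the same order as the charts: False/False first, True/True last.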

 

Within the fault tree analysis, the “AND” and “OR” operators work as gates for specific events. We use different symbols to represent different gates, and different shapes to represent different events.

Basically, events + operators = fault tree analysis.

FTAs are a lot like our society–they have a hierarchy.

At the very top is the undesired outcome–the project failure, the BIG DISASTER, etc. We break these down into intermediate events, which can be further broken down into basic events. And, like I said earlier, they’re all linked with the operator gates.
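One way to picture that hierarchy is as a small tree of objects. The sketch below is only an illustration: the `Event` class, its fields, and the placeholder event names are all assumptions made for this example, not part of any official FTA tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """One box in the tree (hypothetical structure, for illustration only)."""
    name: str
    gate: Optional[str] = None          # "AND" or "OR"; None for bottom-level events
    children: List["Event"] = field(default_factory=list)


# Placeholder events: a top event, one intermediate event, three basic events.
top = Event("Top event (undesired outcome)", gate="OR", children=[
    Event("Intermediate event A", gate="AND", children=[
        Event("Basic event 1"),
        Event("Basic event 2"),
    ]),
    Event("Basic event 3"),
])


def basic_events(event):
    """Collect the names of the bottom-level events underneath `event`."""
    if not event.children:
        return [event.name]
    names = []
    for child in event.children:
        names.extend(basic_events(child))
    return names


def render(event, indent=0):
    """Return the tree as indented text lines, showing each gate in brackets."""
    label = f" [{event.gate}]" if event.gate else ""
    lines = ["  " * indent + event.name + label]
    for child in event.children:
        lines.extend(render(child, indent + 1))
    return lines


print("\n".join(render(top)))
```

Each event with children carries the gate that links those children, which mirrors how the gates sit between levels in the diagram.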

Part 2: The Shapes

Now, if you want to be a true expert on FTAs, you’ll have to use the correct shapes.

Remember those awesome pattern blocks we used to play with back in elementary school?

This is like that, but a little bit trickier. (Ah, what I’d give to be in kindergarten again...) So, let’s learn our shapes!

Event shapes:

The circle stands for a basic event.

You usually use circles at the very bottom of your fault tree analysis, once you’ve really broken down each possible cause of failure as much as you can. It can’t be explained further.

The diamond stands for an undeveloped event.

These types of events could be basic events or intermediate events, but we don’t really know because we don’t have enough information yet. Or, they might occur due to factors outside of our control. These are usually in the same hierarchy level as basic events.

The long-ish rectangle stands for intermediate (and top) events.

These events are logical combinations of the events below them, which can be basic events, undeveloped events, or other intermediate events.

Gate shapes:

The OR Gate is one of two operator gates. Remember how we talked about the rules of OR statements? As long as one statement is true, then the overall statement is true. Let’s say you have three basic events leading up to an OR gate. If any of those events are true, then the logic passes and you can move up to the intermediate event.

Example:

  1. There are three apples.
  2. The apples are in the basket.
  3. The apples are tasty.

These three statements are linked below an OR gate. You can pass through the gate as long as any of the statements are true. If none of the statements are true, you won’t be able to pass through the OR gate.

The AND Gate is the other type of operator gate on an FTA. In order for basic events to pass through an AND gate, they must all be true.

  1. There are three apples.
  2. The apples are in the basket.
  3. The apples are tasty.

If there are three apples, and they’re in the basket, but they’re not tasty, then the combined statement won’t be able to pass through the AND gate.
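Here’s the apple example as a tiny Python sketch. The function names `or_gate` and `and_gate` are invented for this illustration; under the hood they’re just Python’s built-in `any` and `all`.

```python
# Truth values taken from the example: three apples, in the basket, but not tasty.
statements = {
    "There are three apples.": True,
    "The apples are in the basket.": True,
    "The apples are tasty.": False,
}


def or_gate(inputs):
    """Passes if at least one input is true."""
    return any(inputs)


def and_gate(inputs):
    """Passes only if every input is true."""
    return all(inputs)


passes_or = or_gate(statements.values())    # True: two statements are true
passes_and = and_gate(statements.values())  # False: the apples aren't tasty

print(f"OR gate passes:  {passes_or}")
print(f"AND gate passes: {passes_and}")
```

Flip “The apples are tasty.” to `True` and the AND gate passes as well.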

There are some other shapes that you might see on an FTA, but we won’t discuss those for now. Let’s talk about how to actually create an FTA (and after that, we’ll talk about how we can analyze them).

There are five steps to follow when creating a Fault Tree Analysis:

Preparation: Get your favorite drink (like…apple juice?) and a few friends, because if you do this by yourself you’ll probably end up with a migraine.

1. Define the system and determine the top event.

  • What’s the scope of the issue or system that you’re analyzing?
  • What constitutes “failure”?

These are two very important questions you should ask yourself when creating an FTA.

Maybe “failure” = failing to break even. Or maybe “failure” means having stock left over from a pop up sale.

Another question you could ask is, “How will this impact my business?” Hint: try calculating your business valuation to see how much you really risk losing.

2. Define the overall structure.

Break down each event as much as possible.

If “failure” means having stock left over from a pop up sale, your intermediate events could be 1) venue problems AND/OR 2) stock problems AND/OR 3) seller problems.

A great way to determine your first level of intermediate events is to ask, “Where does the problem lie? With me, with the system/program, with the product, or what?”

Another example would be a car crash. If the car crash is the top event, then the first-level intermediate events could be 1) brake problems AND/OR 2) driver problems AND/OR 3) external problems.

3. Explore each branch in successive detail.

Now all you have to do is continue the top-down process until the root cause for each branch is identified (basically, when you can’t decompose further). By this point all of your branches should end with only basic or undeveloped events.

Let’s stick to the leftover stock event. First intermediate event? Venue problems:

  1. Venue was unpopular.
  2. Venue was difficult to locate.
  3. No one knew about the event.

Stock problems:

  1. Stock was of poor quality.
  2. Unappealing packaging.

And lastly, seller problems:

  1. Staff was unfriendly/unhelpful, and actively alienated customers.
  2. Staff was not well-educated about the products.
  3. Staff did not do their job properly.

See how we can break each intermediate event down further? In our second level of intermediate events, we now have eight more events–and there are definitely some we could break down further.
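If you like keeping things in a file rather than on a whiteboard, the breakdown so far can be jotted down as a plain Python dictionary. This is just one possible way to record it; the variable names are made up for this sketch, and it deliberately leaves out the gates, since we haven’t decided which events are linked by AND and which by OR.

```python
top_event = "Stock left over from the pop up sale"

# First-level intermediate events mapped to the events found underneath them.
leftover_stock_tree = {
    "Venue problems": [
        "Venue was unpopular",
        "Venue was difficult to locate",
        "No one knew about the event",
    ],
    "Stock problems": [
        "Stock was of poor quality",
        "Unappealing packaging",
    ],
    "Seller problems": [
        "Staff was unfriendly/unhelpful, and actively alienated customers",
        "Staff was not well-educated about the products",
        "Staff did not do their job properly",
    ],
}

# Count the second-level events across all three branches.
second_level_count = sum(len(causes) for causes in leftover_stock_tree.values())

print(top_event)
for branch, causes in leftover_stock_tree.items():
    print(f"  {branch}")
    for cause in causes:
        print(f"    {cause}")
print(f"Second-level events: {second_level_count}")
```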

4. Analyze and solve the fault tree.

Now that you know all the possible causes of failure, you can analyze the fault tree and create a plan of action to address risk factors.

Here are questions to consider:

  • Which events are most likely to lead to failure?
  • Are there single events that initiate multiple paths to failure?
  • Are there any repeating patterns related to stresses, use, or conditions?

If your staff members didn’t know enough about the products, for example, you could hold a training session to help them become more acquainted. Or you could change up your project and team management techniques.

If you mismanaged your schedule, use Toggl to optimize the time you spend bringing in real results.

Now that you see all of the basic events (aka root causes) in front of you, it’s much easier to address them individually.

There are two main ways of analyzing fault trees.

The first is qualitative analysis, which is done by identifying “Minimal Cut Sets” (MCS). A Cut Set is a group of basic events that, if they all occur, cause the top event to occur.

A Minimal Cut Set (MCS) is a cut set with nothing to spare: take away any one of its events and the remaining events no longer guarantee the top event. A fault tree can have several minimal cut sets, and they don’t all have to be the same size.

Identifying Minimal Cut Sets allows you to see which groups of problems you absolutely must address.
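For small trees you can find the cut sets by hand, but here’s a rough Python sketch of how a program might do it. Everything in it is an assumption made for illustration: the tree format (a basic event is a string, a gate is a `(gate, children)` tuple), the function names, and the placeholder events A, B and C. It treats a cut set as minimal when none of its events can be dropped, and it simply expands every combination, so it’s only suitable for small trees.

```python
from itertools import product


def cut_sets(node):
    """Return all cut sets of `node` as a list of frozensets of basic-event names."""
    if isinstance(node, str):            # a basic event is its own cut set
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":                     # any one child's cut set is enough
        return [cs for sets in child_sets for cs in sets]
    if gate == "AND":                    # need one cut set from every child
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate: {gate}")


def minimal_cut_sets(node):
    """Drop duplicates and any cut set that strictly contains another cut set."""
    unique = set(cut_sets(node))
    minimal = [cs for cs in unique if not any(other < cs for other in unique)]
    return sorted(minimal, key=lambda cs: (len(cs), sorted(cs)))


# Hypothetical tree: top event occurs if A, or (B and C), or (A and B).
tree = ("OR", ["A", ("AND", ["B", "C"]), ("AND", ["A", "B"])])

for cs in minimal_cut_sets(tree):
    print(sorted(cs))
```

In this made-up tree, the set {A, B} gets thrown out because A alone already triggers the top event, leaving {A} and {B, C} as the groups to worry about.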

The second way to analyze fault trees is via quantitative analysis. This is something lots of engineering buffs like–cold, hard numbers are difficult to dispute–but it’s also a bit technical, so feel free to skip over this part if you prefer qualitative analysis.

Pro tip: P(A), or the probability of event A = the number of outcomes in which A happens divided by the total number of possible (equally likely) outcomes.

Say we have three basic events, with probabilities of 0.1, 0.02, and 0.09 respectively.

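If they’re linked by an AND gate, the intermediate event’s probability is calculated by multiplying the individual probabilities, assuming the events are independent of each other: 0.1 × 0.02 × 0.09 = 0.00018. If they’re linked by an OR gate, you can add the probabilities as a quick estimate when they’re small (0.1 + 0.02 + 0.09 = 0.21). The sum slightly overstates the risk, though; for independent events the exact value is 1 − (1 − 0.1) × (1 − 0.02) × (1 − 0.09) ≈ 0.197.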
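To check the arithmetic, here’s a short Python sketch using the three probabilities above. It assumes the basic events are independent of each other, which is an assumption on my part and not something every real-world tree will satisfy. The function names are made up for this example. For the OR gate it prints both the simple sum and the exact value, 1 minus the chance that none of the events happen, which the sum approximates when the probabilities are small.

```python
import math


def and_gate_probability(probabilities):
    """All events must occur: multiply (assumes independent events)."""
    return math.prod(probabilities)


def or_gate_probability(probabilities):
    """At least one event occurs: 1 minus the chance that none occur
    (assumes independent events)."""
    return 1 - math.prod(1 - p for p in probabilities)


basic_events = [0.1, 0.02, 0.09]

p_and = and_gate_probability(basic_events)
p_or_exact = or_gate_probability(basic_events)
p_or_sum = sum(basic_events)  # quick estimate, slightly too high

print(f"AND gate:              {p_and:.5f}")       # 0.00018
print(f"OR gate (exact):       {p_or_exact:.5f}")  # 0.19738
print(f"OR gate (simple sum):  {p_or_sum:.2f}")    # 0.21
```

The gap between 0.197 and 0.21 is small here, but it grows as the individual probabilities get bigger, and a plain sum can even climb past 1.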

5. Perform corrections and make decisions.

This is the last step in the fault tree analysis method. You have your plan of action–all that’s left to do now is implement it.

Technically, the Super Mario fault tree I made at the beginning of this post could be refined further–I could use AND and OR gates and the proper shapes to designate intermediate vs. basic events.

The great thing about FTAs is that they’re versatile–so whether you’re an entrepreneur, an astrophysicist, or a simple dude trying to play some video games, they’ll offer insight into your future risks and obstacles.

If you want to learn more about FTAs and take advantage of their full complexity, make sure to check out this handy slideshow from the University of Central Florida.

Another tool you could use to protect your business from possible risks is the Business Impact Analysis–it works great with FTAs.

Back in 2000, Dan Goldin (the administrator of NASA) stated:

To design systems that work correctly, we often need to understand and correct how they can go wrong.

And he’s right.

Failure and risk analysis will grow even more important as your business scales and grows.

It’s easier to prevent a problem than to deal with it once it’s arrived, full-blown and raging and ready to deal some damage. So here’s a fire flower, Mario–go save the princess.

On March 20, 2018