This is a continuation of yesterday’s blog post on BP’s culture of risk.

The cause of the recent accident on the Deepwater Horizon and the resulting Macondo oil spill is still under investigation, but it appears there was no single failure. Instead, there was a chain of decisions and events like the one described in the previous blog post for the Ixtoc I oil spill. Some details have been revealed by congressional investigators. The Wall Street Journal has reproduced the letter addressed to BP’s chairman from the House Committee on Energy and Commerce. Yesterday’s New York Times has an excellent long article on design weaknesses of blowout preventers.

I won’t speculate about the exact decisions that led to the accident on the Deepwater Horizon rig. I presume that a lot of work went into the design and specification of the equipment, materials, and processes. However, the main contributor to the accident may have been a culture at BP that encouraged engineers to engage in risk creep and to ignore the impact of low-probability, high-cost events, and that rewarded overconfidence. I will discuss these in detail in the next sections.

BP has a reputation for taking on expensive, high-risk engineering projects. It was a participant in the construction of the Trans-Alaska Pipeline, it invests in Russia and Kyrgyzstan, and it was the lead developer of the Thunder Horse PDQ platform, the world’s largest and most expensive offshore platform, which nearly sank after its commissioning in 2005. BP has an explicit strategy of seeking the biggest oil fields in the Gulf of Mexico, even if it means drilling in deep waters far from shore.

Thunder Horse platform. Photo from Wikipedia

Nothing attracts top engineering talent like big challenges and an opportunity to work on high-profile, big-budget projects. BP provided plenty of that with its Gulf Coast projects. Handling the low temperatures and high pressures at the bottom of the gulf while accurately guiding the drill bit at extreme depths is an amazing technical achievement. But such ambitious projects can also lead to cost overruns and schedule slips. When combined with the pressure to meet budgets and deadlines, they can lead to accidents.

Allowing risk creep

Good engineering practice requires that designs outside the known limits (called the design envelope) be done as experiments, preferably in a laboratory setting, preferably by PhDs who have extensive knowledge of the phenomena being studied, and that lots of data be collected so that the design can be standardized and repeated with confidence. That is, you want to get to the point that the design is easy to replicate and if you don’t make any avoidable mistakes, it works. However, this doesn’t appear to be what happened in the evolution of deepwater oil drilling. Instead, engineers built deeper, more complex wells without testing their designs adequately prior to implementation.

There are four factors that lead to risk creep. First, long periods of “safe” operation reinforce the belief that the current practices and designs are sufficient. Guess how many wells have been drilled offshore in the Gulf of Mexico since the Ixtoc I accident in 1979? How about 50, or 200, or even 1,000? Not even close. Try over 20,000. There have been 22 blowouts. But not all wells are the same; the newer wells are deeper, with colder temperatures and higher pressures. Overcoming the belief that long stretches with few accidents mean everything is well understood and under control is really hard, especially as firms compete with each other to meet production targets and minimize costs.

Second, very little time is spent reflecting on past failures. Failures don’t just mean accidents. For every well blowout, there are thousands of near-miss incidents where dangerous unexpected kicks or casing damage occurred. Most engineers consider it a burden to conduct safety reviews, file incident reports, and attend project post-mortems. Time spent doing this is time not spent on new projects. But reviews allow engineers to see trends. They can also help encourage more of the behaviors that led to good results and eliminate those that caused problems.

Third, engineers may believe that extrapolating current designs to new conditions doesn’t require peer review. Nobody likes to have their work reviewed by outsiders. And managers don’t want to spend the time and money to do it. Unless a lot of effort is made, it is hard to get into the practice. Similarly, when time-sensitive decisions must be made, it is easier to forge ahead with the current plan (or a quickly improvised new plan) than to stop and consider alternatives.

Finally, the risk may be growing so slowly that nobody who works in the field day-to-day notices that the process is actually out of control.

Ignoring rare events

In his book, The Black Swan: The Impact of the Improbable, Nassim Nicholas Taleb points out that humans are prone to two deceptions. First, we think that chaotic events have a pattern to them. That is, we believe that the best way to predict the future is to look at the recent past. Second, we underestimate the importance of rare events. In fact, we believe that rare events are not worth planning for since they are too infrequent to care about. Tony Hayward, the CEO of BP, called the Macondo oil spill a one-in-a-million event. (It wasn’t; it is closer to 1 in 1,000.) But even if it were, the enormous consequences mean that there is no excuse for not including it in planning at the top levels of the company.
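As a rough check on the “closer to 1 in 1,000” figure, here is the arithmetic using the counts quoted earlier in this post (over 20,000 wells and 22 blowouts). This is only a sketch: I round “over 20,000” down to 20,000, the variable names are my own, and a historical blowout count per well is not the same thing as the probability of a Macondo-sized spill.

```python
# Counts quoted earlier in this post; "over 20,000" is rounded down to 20,000 (assumption).
wells_drilled = 20_000
blowouts = 22

# Historical frequency of a blowout per well drilled.
rate = blowouts / wells_drilled
wells_per_blowout = wells_drilled / blowouts

# The "one-in-a-million" figure attributed to BP's CEO, for comparison.
claimed_rate = 1 / 1_000_000

print(f"Historical blowout rate: {rate:.4%} per well")
print(f"Roughly 1 blowout per {wells_per_blowout:.0f} wells")
print(f"About {rate / claimed_rate:.0f} times the one-in-a-million figure")
```

The point is not the exact number but the order of magnitude: the historical record is about three orders of magnitude away from one in a million.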

The Black Swan. Image from Amazon

Rewarding overconfidence

As I mentioned earlier, engineers (and many other professionals) are rewarded for being confident in their projections. Managers select projects based on how confident they are about the chance of success. And they are influenced by the confidence of the engineer proposing the project. So everyone learns to speak with more confidence than is safe.

However, overconfidence doesn’t require an external reward. For example, I believe that I am a better-than-average driver. I believe I can navigate icy roads safely and can handle any emergency situation. Everyone believes this. When I first get on an icy road, I drive slowly until several drivers pass me. Then I speed up to match the speed of the other drivers and start passing other cars myself. I know I shouldn’t do this, but I do it anyway. I haven’t been in an accident, so that reinforces my behavior. Similarly, when I get into my car, I don’t explicitly consider the chance that I might kill someone. But I should. And I should be reminded of my fallibilities and the dangers every few minutes, lest my attention wander. I should drive every second as if someone will, not just could, die every time I make a mistake.

Proposals for reducing risk

The solution to oil spills is not to stop drilling offshore on the grounds that the technology is inherently unreliable and unsafe, as some writers recommend. Rather, it is to assume that equipment can fail, that hurricanes will strike, that unexpected rock formations exist, that mistakes in selecting the right mud will be made, and that pressure to meet schedules and budgets exists, and then to design the mitigation for each.

First, engineers need to admit that they are running experiments whenever they are designing and building something that is even slightly beyond the scope of an existing project. Once engineers admit that what they are doing is an experiment, not just following a recipe in a cookbook, they will be more cognizant of the need to consider the risk, examine alternative methods, take care when collecting data, and spend more time analyzing the data after the end of the project. Managers also need to consider each project an experiment and remember that experiments can fail. They must be willing to nurture calculated risk taking. They must also be willing to accept the cost of mitigation (or the cost of the consequences). It appears that BP’s managers failed at this.

Second, engineers need to be more open about their work. In other fields like physical science and medicine, researchers are encouraged to disclose the results of their work and solicit peer review. Engineers rarely publish their findings, for two reasons. First, they are not paid to. Second, nearly all of their work is considered proprietary by management. Even work that would benefit the industry as a whole, like new safety ideas or techniques to protect the environment, is often hidden from competitors. The government needs to encourage or enforce sharing of safety data, require public reporting of near-miss incidents, and set standards for best practices. Currently, the government relies too heavily on industry expertise. To adequately police industry, the government needs to start hiring engineers as regulators, recruiting at top universities, paying competitive salaries, and conducting its own research.

Unfortunately, I don’t have high hopes that government regulators, investors, and managers will learn the correct lessons from the Macondo oil spill. Rather than looking at the systemic causes of accidents, we will ban offshore drilling for a few months to assuage the public. Then regulators will write new rules, like requiring acoustic transducers, to show that they are getting tough and reforming the industry. But they won’t do anything that actually encourages critical thinking or processes that channel engineers to do the right thing. Then, once the public outcry dies down, new technology, risk creep, and overconfidence will return. But it will be invisible until the next accident happens and we are all left wondering again how something awful like that could happen in America.

[Update1: On June 22, a federal judge issued an injunction that struck down the Obama administration’s six-month offshore drilling ban. The Justice Department is preparing an appeal.]

[Update2: I just noticed a really eerie coincidence. In the sixth paragraph, there is a hyperlink to a report that provides the counts of total offshore oil wells and blowouts. The report is dated April 20, 2010, the same day as the Deepwater Horizon accident.]

[Update3: There is a recent AP story that points to some of the same human errors as this blog post.]