This post is an article that was published in The Ergonomist: Newsletter of the Institute of Ergonomics and Human Factors.
Ref: Shorrock, S. and Licu, T. (2014). Moving toward Safety-II: Ensuring things go right. The Ergonomist. March.
In September 2013, EUROCONTROL published a white paper entitled ‘From Safety-I to Safety-II’. The white paper, led by Professor Erik Hollnagel and published by EUROCONTROL’s Network Manager, explains the key differences between, and the implications of, two different ways of thinking about safety. An argument is made that safety management should move from ensuring that “as few things as possible go wrong” (Safety-I) to ensuring that “as many things as possible go right” in daily operations (Safety-II).
The white paper was initially presented in Dublin in September 2013 at EUROCONTROL’s network manager human performance workshop to around 150 people. The ideas have been introduced at several other industry conferences and are attracting a good deal of interest among various communities and industries.
This article provides seven Q&As delivered by Frank Brenner, Director General of EUROCONTROL, at Gdansk International Air & Space Law Conference in November 2013. The hosts of the conference were the Civil Aviation Authority and the Polish Air Navigation Services Agency. At the special invitation of the President of the Civil Aviation Authority Piotr Ołowski, the conference was attended by air and space law experts and aviation organisation representatives from around the world including Nancy Graham, Director of the International Civil Aviation Organization (ICAO) Air Navigation Bureau, Catalin Radu, President of the European Civil Aviation Conference, and Frank Brenner, Director General of EUROCONTROL.
The following summary of the Q&A session aims to give a short introduction to some views and reflections within EUROCONTROL on some of the ideas within the white paper.
Q: Why do we need to start thinking differently about safety now? What has changed?
A: Well, think back just 30 years… In 1983, Apple introduced the Lisa computer. It had a 12-inch black and white screen and 1Mb capacity, and weighed 21 kilos. Around 11,000 were sold. In October this year, Apple introduced the latest iPad, with over a hundred gigabytes of capacity and a weight of less than half a kilo. Over 170 million iPads have been sold.
Thirty years ago, the concept of the internet was just an idea. Now the world is connected via video, voice, instant messaging, and social media – all wirelessly.
The world is much more demanding and complex. The pace of change is increasing all the time. And so it is with air traffic management (ATM). ICAO figures show that annual air traffic exceeded five trillion revenue passenger kilometres in 2012, nearly five times more than in 1983. We have advanced communication, navigation and surveillance technology being introduced, and game-changing controller tools that enable big increases in capacity, cost-efficiency and environmental performance.
The way that we look at safety has not evolved in the same way. A lot of progress has been made by importing methods from the process industries. But change has been slow and gradual, and we need to change the way we approach safety for three key reasons.
First, complexity is overtaking us. Now we have so much interconnectivity that we cannot simply decompose systems. ATM is too complex and interacts in ways that are hard to understand. Also, the ATM system cannot be shut down like an oil rig or a reactor.
Second, we cannot understand safety simply by predicting and understanding component failures and ultra-rare accidents. This approach does not tell us about system behaviour and why the system usually works so well.
Third, ATM is still a very human-centred industry. While we reduce the variability in the performance of technical equipment, people need to remain flexible and to adjust to changes in demand and conditions.
We need to understand safety first by understanding clearly how the ATM system really works, how people really work, and why it all works so well most of the time, not just how it occasionally fails.
Q: But aren’t 80 to 90% of aviation safety occurrences due to human error?
A: We now know that this kind of quote is meaningless. Humans are involved in all aspects of the industry, so in one sense we can say humans are involved somehow in all unwanted and wanted events. When someone says something was caused by human error, it tells us more about how they understand systems and how deeply into the system they are willing to look.
People do not perform like a valve, either working or failing. Think of how we drive. We make constant adjustments to our acceleration, braking, lanes, distance from other cars, indications, etc. We do this depending on the car, the traffic, the road conditions, the rules, when we need to arrive, etc. It is the same in air traffic management. People have to adjust their performance constantly – it has to be variable. And this works brilliantly most of the time. Occasionally, the variability in demand, the work conditions, and performance is not what we expect. This is not simply ‘human error’. If we want to improve the system and enable the growth in capacity and complexity that is to come, we need to understand how the performance of the system varies.
Q: So does this mean we need to stop looking at what goes wrong?
A: No. We still investigate safety occurrences, assess risk, and try to understand safety problems. But this approach will not be enough to grasp complexity. We have to understand how things really work. This is what Safety-II is really about – understanding how the system works. Things are changing fast, and if we do not keep up, our safety analysis will be out of date. So we need to understand how people, procedures and equipment interact, and the variability in these interactions. And we need to understand how this usually goes right. Only this will allow us to anticipate developments proactively and understand how things occasionally go wrong.
Q: But we have hardly any accidents! Isn’t that proof that current safety management is effective?
A: Accidents are at an all-time low, for several reasons. Safety activities have, of course, played a big part. Safety regulation has developed and implemented basic standards, and authorities have ensured compliance. Safety management systems have ensured a level of safety competency. We have made big improvements to safety policy, safety assurance, safety achievement and safety promotion via the European Safety Regulatory Requirement (ESARR) framework. But we cannot simply say that the system is safe because of safety regulation and safety management.
When you go into an operations or equipment room, you will see people who are keeping us safe on a day-to-day basis. They are the ones adjusting in real time to changes in demand and conditions. They make the ATM system work safely day and night. We also need to keep up to date with how the system works, why it works so well, and how this is all changing. Traditional safety management is important, but it cannot stand still while demand and complexity grow.
Q: The new EUROCONTROL network manager white paper is called ‘From Safety-I to Safety-II’. What are the main differences?
A: According to Safety-I, things go wrong due to technical, human and organisational causes – failures and malfunctions. Humans are therefore viewed predominantly as a liability or hazard. So we try to respond when something happens or is viewed as an unacceptable risk. We eliminate causes or improve barriers, or both. This approach is fine as far as it goes, but it does not tell us about safety – why things nearly always go right – or about what to do to keep it that way. It tells us about unsafety, about what occasionally goes wrong, and what to avoid. The purpose is that ‘as few things as possible go wrong’.
Safety-II is about ‘as many things as possible go right’. The reason things go wrong and right is the same – because the performance of the system is variable. It has to be, because demand is variable, and so are resources. But we need to understand the variability that we need and want, and the variability that we do not want, or is impossible to monitor and control.
People are a necessary resource that enables the system to be flexible and resilient. Rather than reacting to failures, we need to continuously anticipate developments and events. We need to understand how things usually go right as a basis for explaining how things occasionally go wrong.
Q: How can Air Navigation Service Providers (ANSPs) start to move toward Safety-II? Is anyone actually putting Safety-II ideas in action?
A: There are several practical ways forward and some ANSPs and other industries are already moving in this direction. They all rely on understanding how things work and why they usually work well. Of course, people are only too happy to talk about this!
Take safety culture as a start. This is of course vitally important, the heart and soul of day-to-day safety, from the CEO to every member of staff, operational and non-operational. EUROCONTROL has now surveyed nearly 90% of air navigation service providers. Tens of thousands of staff have answered questionnaires and thousands have participated in workshops. Increasingly, we are looking not just at problems, but also why safety culture is so positive – what do we need to defend and grow?
Also look at safety incidents. When investigating a safety occurrence, we need to start out by understanding how things normally work, because we know that things usually work the same way regardless of the outcome. Once we understand how the day-to-day work really works, we can not only prevent future occurrences, but also improve the work more generally. That means improving the effectiveness of work, not just safety. So investigation is about helping to understand and improve the system, not just isolated cases.
Next take safety assessment. Here we try to understand when and how the variability of system performance becomes too difficult to monitor and control. We have all heard of cases of large-scale failures of hospitals and other organisations. How will we know about this before it happens?
In team resource management, we can understand the day-to-day adjustments that people make to keep the system flexible: how does performance vary in Operations, and what trade-offs do our people have to make?
Air navigation service providers are also observing how people and systems work. A big part of the safety work for the new Heathrow tower, for instance, was based on weeks of observations in the simulator and in shadowing and live operations. This process is being repeated in many ANSPs and other industries; we are trying to understand how the work really works and why it usually works so well, not just looking out for failures.
Q: How would you summarise Safety-II in a nutshell?
A: Safety-II is about understanding how the system really works, how performance varies over time, and how we can anticipate and respond proactively to developments. The aim is to make sure that as many things as possible go right.
EUROCONTROL (2013) From Safety-I to Safety-II: A White Paper. See http://www.eurocontrol.int/news/safety-focus-what-goes-right.