Systems Thinking for Safety: From A&E to ATC

This article summarises a EUROCONTROL Network Manager White Paper called Systems Thinking for Safety: Ten Principles. The White Paper was a collaboration of EUROCONTROL, the German air navigation service provider Deutsche Flugsicherung (DFS), nine other air navigation service providers and three pilot and controller associations. Its purpose is to encourage a systems thinking approach among all system stakeholders to help make sense of – and improve – system performance. This article was originally published in Safety Letter, a DFS magazine, and can be downloaded here.

Only human

Imagine you are a health professional – a doctor or a nurse – working in a busy accident and emergency (A&E) department. You are highly trained and experienced in your job, and you are motivated to do your best. But you work within a hospital A&E department that faces many challenges. Demand is very high, and it can vary significantly and quickly depending on the time of day, the day of the week, and the time of the year. At the busiest times, there are many patients waiting to receive care. There have been government funding cuts to social services, and it can take days to get an appointment with family doctors/general practitioners, so patients come to A&E when they feel it necessary, often when their sickness has advanced significantly. There are often many older people, with a complicated range of conditions and medications.

You are constrained by a lack of capacity in the hospital. You have to transfer patients to other parts of the hospital, but there are not enough beds, so they have to stay in A&E for longer, leading to a build-up of demand. You only have 10 beds in the A&E department, and occasionally ambulances have to wait outside, as there are no beds available for the patients. Despite this, you have a four-hour target to meet – 95% of patients must be seen, treated, admitted or discharged in under four hours. The pressure is very high. Sometimes, there are not enough nurses or doctors.

You often have to work very long hours with little sleep and few breaks. You feel fatigued too much of the time. You interact with a wide variety of equipment to diagnose, test and monitor, and some of it can be tricky to use. The manufacturers vary and the user interfaces can be confusing, or just subtly different. You have to prescribe a huge range of medicines, and some of these have similar labels, so that two very different doses or two different medicines actually look very similar. You are under time pressure most of the time, but each patient and nurse needs some of your time. Since there are only 10 beds, trade-offs have to be made. For instance, patients are sometimes kept in corridors temporarily. It is a difficult job, but everyone does their best to make things work.

This is not a fictional scenario. It is a scenario faced daily by many hospital A&E departments. It serves here to remind us that we may be highly skilled, experienced and motivated, but we are only human, and we work in a system that has a powerful effect on our possibilities – for good or bad.

The Fundamental Attribution Error

When we take the time to think about work situations in this kind of way, the difficulties and compromises faced by people in highly dynamic and complex systems become clearer. Yet we often don’t think in this way; we are steered by our first impressions or reactions. These can lead us astray when we seek to explain situations, events or occurrences that involve other people. Research has shown that we tend to emphasise internal or personal factors to explain someone else’s behaviour, while ignoring or minimising external, contextual or system factors. When things go wrong in life – such as a tragic major accident, a near miss or an annoying minor occurrence – this bias seems to kick in. We tend to focus automatically on the person at the sharp end. We seem to think that they should really have done better, perhaps been more careful, and then things would have been OK. The phenomenon is known as the ‘fundamental attribution error’, and it explains why we so often blame other people for things over which they had little control, or which were influenced by powerful systemic factors that affect most people.

Part of the reason for this is that individuals are ‘obvious’ – we can see them, and they are relatively unchanging. The context or system, on the other hand, is not so obvious and it changes quickly – it seems to be ‘background’. So when we try to explain events, we focus on what we can see – the person. This is our automatic or gut reaction – it is easy, and takes little effort, to explain away an event as being something to do with an individual. (Additionally, it seems easier to direct our frustration, anger or outrage toward a person.) But considering the situational constraints is a complex activity which requires deliberate inquiry and conscious effort. Indeed, we slip into internal explanations when we are under greater ‘cognitive load’ and lack the energy or motivation to consider situational factors. We may also think, without even realising, that the world is ‘just’ and we all have control over our own lives, so people get what they deserve. Unsurprisingly, when we judge our own behaviour, we are more likely to explain negative outcomes in terms of the context or situation (and positive outcomes in terms of our own disposition) – sometimes known as the self-serving bias.

In organisations, there are a few problems with focusing on the person, especially when things go wrong. One is a problem of fairness. Contrary to our first impression, the context and ‘system’ in which we work have an enormous effect on behaviour. This is why the celebrated management thinker, statistician and quality guru W. Edwards Deming wrote that “most troubles and most possibilities for improvement…belong to the system”, and are the “responsibility of management”. When we dismiss the influence of the system – including the system goals, resources, constraints, incentives, flow of information, etc. – we unfairly blame an individual for a much bigger issue. The second, and perhaps more important, problem is practical. When we focus our attention on the individual, we miss opportunities to address future problems and improve the system. Put another way, “We spend too much of our time ‘fixing’ people who are not broken, and not enough time fixing organization systems that are broken” (Rummler and Brache, 1995). What is surprising about the accident and emergency scenario above is not that there are accidents and incidents. What is surprising is that there are so few. Pitted against a system with such high demand, such limited resources and such difficult constraints, people somehow manage to make it work most of the time. But if we want to make improvements, we cannot simply rely on front-line staff to always compensate in this way. If we want to improve how things work – for safety but also for productivity, efficiency and human wellbeing – we must think about the system and how it works. This relates to the idea of ‘systems thinking’.

It’s the system, stupid!

To make sense of situations, problems and possibilities for improvement, we need to make sense of systems. ‘System’ is a word we sometimes use to describe a technical system, but here we use the word much more generally, especially to refer to systems in which humans are an integral part. An A&E department is a system, and is part of a bigger system (a hospital), which is part of a bigger system (e.g. a private or government health service), and interacts with other systems (e.g. police service, transport system). In ATM/ANS, one might consider the working position or sector, an Ops room or equipment room, a centre, an organisation, the airspace, the ATM/ANS system, the aviation system or the transport system. Systems exist within other systems, and exist within and across organisational boundaries. In practice, the boundaries of a system are where we choose to draw them for a purpose.

A system can also be characterised in terms of its purpose, the components and the patterns of interactions between the components (which produce characteristic behaviour). The purpose or goal of the system is critical, yet we often take this for granted, or recite some ‘official’ purpose (“Safety First!”) without thinking too much about it. In practice, there are several interdependent goals, but we can say that these relate to the customers or system users and their needs. These needs are not simple, however, and often conflict. For instance, as passengers, we need to arrive at our destination safely, but we also feel a need to arrive on time.

A system comprises a number of components. Some components are obvious and visible, for instance people, equipment and tools, buildings, infrastructure, and the like. Other system components are less visible, typically organisational components (such as goals, rosters, competency schemes, incentives, rules, norms) and political and economic components (such as pressures relating to runway occupancy, noise abatement, and performance targets). All of these have a powerful influence on how the system works – on what the people within the system do – and yet because we cannot readily ‘see’ them, we sometimes don’t realise this. These components – both obvious and less obvious – interact in characteristic ways or patterns, within an environment which may be more or less changeable, in more or less predictable ways.

‘Complex systems’

The term ‘complex system’ is often used in aviation (and other industries), and it is important to consider what is meant by this. According to Snowden and Boone (2007), complex systems involve large numbers of interacting elements and are typically highly dynamic and constantly changing with changes in conditions. Their cause-effect relations are non-linear; small changes can produce disproportionately large effects. Effects usually have multiple causes, though causes may not be traceable and are socially constructed. This means that we jointly construct and negotiate an understanding of reality, its significance and meaning, through our models of the social world and through language. In a complex system, the whole is greater than the sum of its parts and system behaviour emerges from a collection of circumstances and interactions. Complex systems also have a history and have evolved irreversibly over time with the environment. They may appear to be ordered and tractable when looking back with hindsight. But in fact, they are increasingly unordered and intractable. It is therefore difficult or impossible to decompose complex systems objectively, to predict with confidence exactly how they will work, or to prescribe in detail what should be done. This state of affairs differs from, say, an aircraft engine, which we might describe as ‘complex’ but is actually ordered, decomposable and predictable (with specialist knowledge). Some therefore term such systems ‘complicated’ instead of complex (though the distinction is not straightforward). While machines are deterministic systems, organisations and their various units are purposeful ‘sociotechnical systems’.

If only everybody just did their job!

Despite these complex interactions and the nature of the wider system, the way that we try to understand and manage sociotechnical system performance (i.e. people and technology) is as a collection of components – a bunch of stuff that will all work together so long as each part does its job. This focus on ‘components’ – a person or a piece of equipment – is common in many standard organisational practices. At an individual level, it includes incident investigations that focus only on the controller’s performance, behavioural safety schemes that observe individual compliance with rules, individual performance reviews/appraisals, individual incentive schemes, individual performance targets, etc. This focus is a management trade-off, since it is easier to focus on individuals than complex system interactions!

You may be wondering, “how is this a problem?” Surely, if everybody does his job, it will be all right! A counter-intuitive truth seems to defy this idea. Deming observed that “It is a mistake to assume that if everybody does his job, it will be all right. The whole system may be in trouble”. That may seem like a curious statement. How can it be so? The famous strategy of industrial action known as ‘work to rule’ gives one answer. In his book ‘Images of Organization’, Gareth Morgan wrote: “Take the case of the old state-owned British Rail. Rather than going on strike to further a claim or address a grievance, a process that is costly to employees because they forfeit their pay, the union acquired a habit of declaring a “work to rule”, whereby employees did exactly what was required by the regulations developed by the railway authorities. The result was that hardly any train left on time. Schedules went haywire, and the whole railway system quickly slowed to a snail’s pace, if not a halt, because normal functioning required that employees find shortcuts or at least streamline procedures.” Fancy that! If everyone does only his or her job, exactly and only as officially prescribed, then the system comes to a grinding halt! As well as this, it is not possible to design exactly the interactions between all system components (people, procedures, equipment), so people – as the only flexible component – have to ‘take up the slack’; they adapt to the needs of the situation (the same situation that we often miss when looking from the outside, with hindsight). People are always needed to fill in the holes in between work-as-prescribed; to adjust and adapt to changing, unforeseen and unforeseeable conditions. These adjustments and adaptations are the reason why systems work.

Yet sometimes, such adjustments and adaptations are defeated by changing system conditions. While we accept the adjustments so long as the system is working, we decry them when the system fails. Going outside of the organisation, this is reinforced in the justice system, which seeks a person to put on trial when accidents occur (e.g. the 2013 train crash at Santiago de Compostela). The unwritten assumption is that if the person had tried harder, paid closer attention and done exactly what was prescribed, then things would have gone well. Ironically, even when the individual and their sharp-end performance have been found to be the ‘primary cause’, major investigations rarely recommend removing the person from the system. The recommendation is increasingly to change the system itself (e.g. the train crash at Santiago de Compostela). This is of course very unfair, but while we cannot yet seem to shake off our simplistic causal attributions, we seem more capable of understanding that possibilities for improvement belong to the system. We seem to know, at some level, that simply replacing the individual with another individual is not a solution.

How to suboptimise a system

A second counter-intuitive truth comes from organisational theorist Russell Ackoff, who wrote that “it is possible to improve the performance of each part or aspect of a system taken separately and simultaneously reduce the performance of the whole” (1999, p. 36). A practical example can be seen when each department or division of an organisation seeks to meet its own goals or improve at the expense of another, for instance creating internal competition for resources. In such cases, the emphasis may be on the function instead of the interactions and flow of work to satisfy customer needs. Ackoff quipped that “Installing a Rolls Royce engine in a Hyundai can make it inoperable”.

Organisations are not cars, obviously. Yet we often manage these complex social systems as if they were complicated machines. As well as making changes at the component level, we also have a habit of:

assuming fixed and universal goals – the purpose of a car engine is fixed. Not so with organisations.
using reductionist methods – we can break an engine down into parts, analyse each and model and test the result, with fairly reliable results. Not so with organisations.
thinking in a linear and short-term way – we can think linearly for simple and complicated machines. Not so with organisations.
judging against arbitrary standards, performance targets, and league-tables – it might make sense to have a league table of engines, and to have standards and performance targets for each part. Not so with organisations.
managing by numbers and outcome data – everything in a car engine can be meaningfully quantified in some way. Not so with organisations.

We also tend to lose sight of the fact that our world is changing at great speed, and accelerating. This means that the ways in which we have responded to date will become less effective. Ackoff (1999) noted that organizations, institutions and societies are increasingly interconnected and interdependent, with changes in communication and transportation, and that our environments have become larger, more complex and less predictable. We must therefore find ways to understand and adapt to this changing environment.

Treating a complex sociotechnical system as if it were a complicated machine, and ignoring the rapidly changing world, can distort the system in several ways. First, it focuses attention on the performance of components (staff, departments, etc.), and not the performance of the system as a whole. We tend to settle for fragmented data that are easy to collect. Second, a mechanical perspective encourages internal competition, gaming, and blaming. Purposeful components (e.g. departments) compete against other components, ‘game the system’ and work against the common purpose. When things go wrong, people retreat into their roles, and components (usually individuals) are blamed. Third, as a consequence, this perspective takes the focus away from the customers/service-users and their needs, which can only be addressed by an end-to-end focus. Fourth, it makes the system more unstable, requiring larger adjustments and reactions to unwanted events rather than continual adjustments to developments.

A systems viewpoint

A systems viewpoint means seeing the system as a purposeful whole – as holistic, and not simply as a collection of parts. We try to “optimise (or at least satisfice) the interactions involved with the integration of human, technical, information, social, political, economic and organisational components” (Wilson, 2014, p. 8). Improving system performance – both safety and productivity – therefore means acting on the system, as opposed to ‘managing the people’ (see Seddon, 2005).

As design and management become more inclusive and participatory, roles change and people span different roles. Managers, for instance, become system designers who create the right conditions for system performance to be as effective as possible. System actors, such as front-line staff, also become system experts, providing crucial information on how the system works, helping to make sense of it, and providing the necessary adjustments.

The ten principles that follow give a summary of some of the key tenets and applications of systems thinking for safety that have been found useful to support practice. The principles are, however, integrative, derived from emerging themes in the systems thinking, systems ergonomics, resilience engineering, social science and safety literature.

The principles concern system effectiveness, but are written in the context of safety to help move toward Safety-II (see EUROCONTROL, 2013; Hollnagel 2014). Safety-II aims to ‘ensure that as many things as possible go right’, with a focus on all outcomes (not just accidents). It takes a proactive approach to safety management, continuously anticipating developments and events. It views the human as a resource necessary for system flexibility and resilience. Such a shift is necessary in the longer term, but there is a transition, and different perspectives and paradigms are needed for different purposes (see Meadows, 2009).

In the White Paper, each principle is described along with some practical advice and questions for reflection that apply to various safety-related activities. A brief summary of each principle follows, with questions for reflection. Each is described in much more detail in the White Paper, Systems Thinking for Safety: Ten Principles, and in a set of Learning Cards. You can download both at SKYbrary, via www.bit.ly/ST4SAFETY.

The Foundation: System Focus

Most problems and most possibilities for improvement belong to the system. Seek to understand the system holistically, and consider interactions between elements of the system.

Principle 1. Field Expert Involvement

The people who do the work are the specialists in their work and are critical for system improvement. To understand work-as-done and improve how things really work, involve those who do the work.

Q. When trying to make sense of situations and systems, who do we need to involve as co-investigators, co-designers, co-decision makers and co-learners? How can we enable better access and interaction between system actors, system experts/designers, system decision makers and system influencers?

Principle 2. Local Rationality

People do things that make sense to them given their goals, understanding of the situation and focus of attention at that time. Work needs to be understood from the local perspectives of those doing the work.

Q. How can we appreciate a person’s situation and world from their point of view, both in terms of the context and their moment-to-moment experience? How can we understand how things made sense to those involved in the context of the flow of work, and the system implications? How can we get different perspectives on events, situations, problems and opportunities, from different field experts?

Principle 3. Just Culture

People usually set out to do their best and achieve a good outcome. Adopt a mindset of openness, trust and fairness. Understand actions in context, and adopt systems language that is non-judgmental and non-blaming.

Q. How can we move toward a mindset of openness, trust and fairness, understanding actions in context using non-judgmental and non-blaming language?

Principle 4. Demand and Pressure

Demands and pressures relating to efficiency and capacity have a fundamental effect on performance. Performance needs to be understood in terms of demand on the system and the resulting pressures.

Q. How can we understand demand and pressure over time from the perspectives of the relevant field experts, and how this affects their expectations and the system’s ability to respond?

Principle 5. Resources and Constraints

Success depends on adequate resources and appropriate constraints. Consider the adequacy of staffing, information, competency, equipment, procedures and other resources, and the appropriateness of rules and other constraints.

Q. How can we make sense of the effects of resources and constraints, on people and the system, including the ability to meet demand, the flow of work and system performance as a whole?

Principle 6. Interactions and Flows

Work progresses in flows of inter-related and interacting activities. Understand system performance in the context of the flows of activities and functions, as well as the interactions that comprise these flows.

Q. How can we map the flows of work from end to end through the system, and the interactions between the human, technical, information, social, political, economic and organisational elements?

Principle 7. Trade-offs

People have to apply trade-offs in order to resolve goal conflicts and to cope with the complexity of the system and the uncertainty of the environment. Consider how people make trade-offs from their point of view and try to understand how they balance efficiency and thoroughness in light of system conditions.

Q. How can we best understand the trade-offs that all system stakeholders make with changes in demands, pressure, resources and constraints?

Principle 8. Performance Variability

Continual adjustments are necessary to cope with variability in demands and conditions. Performance of the same task or activity will vary. Understand the variability of system conditions and behaviour. Identify wanted and unwanted variability in light of the system’s need and tolerance for variability.

Q. How can we get an understanding of performance adjustments and variability in normal operations as well as in unusual situations? How can we detect when the system is drifting into an unwanted state over the longer term?

Principle 9. Emergence

System behaviour in complex systems is often emergent; it cannot be reduced to the behaviour of components and is often not as expected. Consider how systems operate and interact in ways that were not expected or planned for during design and implementation.

Q. How can we ensure that we look at the system more widely to consider the system conditions and interactions, instead of always looking to identify the ‘cause’? How can we get a picture of how our systems operate and interact in ways not expected or planned for during design and implementation, including surprises related to automation in use and how disturbances cascade through the system? How can we make visible the patterns of system behaviour over time, which emerge from the various flows of work?

Principle 10. Equivalence

Success and failure come from the same source – ordinary work. Focus not only on failure, but also how everyday performance varies, and how the system anticipates, recognises and responds to developments and events.

Q. How can we best observe and discuss how ordinary work is actually done? Can we use a safety occurrence as an opportunity to understand how the work works and how the system behaves? How can we best observe, discuss and model ‘normal work’?

The principles do not operate in isolation; they interrelate and interact in different ways, in different situations. For instance, imagine an engineering control and monitoring position. There is variability in the way that alarms are handled, and some important alarms are occasionally missed. This must be understood in the context of the overall ATM/CNS system (Foundation: System Focus). Since success and failure come from the same source – everyday work – it is necessary to understand the system and day-to-day work in a range of conditions over time (Principle 10: Equivalence). This can only be understood with the engineers and technicians who do the work (Principle 1: Field Experts). They will view their work from their own (multiple) perspectives, in light of their experience and knowledge, their goals and their focus of attention, and how they make sense of the work (Principle 2: Local Rationality).

In particular, it is necessary to understand how performance varies over time and in different situations (Principle 8: Performance Variability). For this, we must understand demand over time (e.g. the number, pattern and predictability of alarms) and the pressure that this creates in the system (time pressure; pressure for resources) (Principle 4: Demand and Pressure). Through observation and discussion, it is possible to understand the adequacy of resources (e.g. alarm displays, competency, staffing, procedures), and the effect of constraints and controls (e.g. alarm system design) (Principle 5: Resources and Constraints) on interactions and the end-to-end flow of work (Principle 6: Interactions and Flows) – from demand (alarm) to resolution in the field.

It will likely become apparent that engineers must make trade-offs (Principle 7: Trade-offs) when handling alarms. Under high pressure, with limited resources and particular constraints, performance must adapt. In the case of alarm handling, engineers may need to be more reactive (tactical or opportunistic), trading off thoroughness for efficiency as the focus shifts toward short-term goals.

Through systems methods (see http://bit.ly/1DG1odH), observation, discussion, and data review, it may become apparent that the alarm flooding emerges from particular patterns of interactions and performance variability in the system at the time (Principle 9: Emergence), and cannot be traced to individuals or components. While the alarm floods may be relatively unpredictable, the resources, constraints and demand are system levers that can be pulled to enable the system to be more resilient – anticipating, recognising and responding to developments and events.

A systems perspective, and the ten principles outlined above, encourage a different way of thinking about work, systems, events and situations. Anyone can use the principles in some way, and you may be able to use them in different aspects of your work. We encourage you to do so.

References

Ackoff, R. (1999). Ackoff’s best: His classic writings on management. John Wiley.

Deming, W.E. (2000). Out of the crisis. MIT Press.

EUROCONTROL (2013). From Safety-I to Safety-II (A white paper). EUROCONTROL.

EUROCONTROL (2014). Systems thinking for safety: Ten principles (A white paper). EUROCONTROL.

Hollnagel, E. (2014). Safety-I and Safety-II: The past and future of safety management. Ashgate.

Meadows, D. & Wright, D. (2009). Thinking in systems: A primer. Routledge.

Rummler, G. A. & Brache, A. P. (1995). Improving performance: How to manage the white space in the organization chart. Jossey-Bass Business and Management Series.

Seddon, J. (2005). Freedom from command and control (Second edition). Vanguard.

SKYbrary (2014). Toolkit: Systems Thinking for Safety. http://www.bit.ly/ST4SAFETY

Snowden, D.J. & Boone, M.E. (2007). A leader’s framework for decision making. Harvard Business Review, November, pp. 76–79.

Authors

Steven Shorrock is Project Leader, Safety Development at EUROCONTROL and the European Safety Culture Programme Leader. He has a bachelor's degree in psychology, a master's degree in work design and ergonomics and a PhD in incident analysis and performance prediction in air traffic control. He is a Chartered Ergonomist and Human Factors Specialist, and a Chartered Psychologist, with a background in practice and research in safety-critical industries. Steve is also Adjunct Senior Lecturer at the University of New South Wales, School of Aviation.

Jörg Leonhardt is Head of Human Factors in the Safety Management Department at DFS – Deutsche Flugsicherung – the German Air Navigation Service provider. He holds a master's degree in Human Factors and Aviation Safety from Lund University, Sweden. He co-chairs the EUROCONTROL Safety Human Performance Sub-Group and is the project leader of the DFS-EUROCONTROL “Weak Signals” project.

Tony Licu is Head of the Safety Unit within the Network Manager Directorate of EUROCONTROL. He leads the support of safety management and human factors deployment programmes of EUROCONTROL. He has an extensive ATC operational and engineering background and holds a master's degree in avionics. Tony co-chairs the EUROCONTROL Safety Team and the EUROCONTROL Safety Human Performance Sub-Group.

Christoph Peters spends half his time as an Air Traffic Controller for Düsseldorf Approach and the other half as a Senior Expert in Human Factors for the Corporate Safety Management Department at DFS – Deutsche Flugsicherung. He completed a degree in psychology and is a member of the EUROCONTROL Safety Human Performance Sub-Group and the DFS-EUROCONTROL “Weak Signals” project.

Humanistic Systems

Understanding and Improving Work & Life

Author: stevenshorrock
