Nancy Leveson, Ph.D., professor of aeronautics and astronautics at the Massachusetts Institute of Technology (MIT), has been on a 30-plus-year quest to engineer out hazards that can lead to aircraft accidents. Her Systems Theoretic Accident Model and Processes or, simply, STAMP, eschews the quick-and-dirty, blame-the-pilots approach to post-accident analysis and instead embraces a holistic approach to identifying, analyzing and eliminating a wide range of hazards during aircraft design, development and construction, as well as in everyday operation.
“Hazard analysis can be thought of as investigating an accident before it occurs,” Leveson writes in her 2004 paper “A New Accident Model for Engineering Safer Systems.” “The most effective models will go beyond assigning blame and instead help engineers to learn as much as possible about all the factors involved, including those related to social and organization structures.” She emphasizes that when safety investigators simply sift accident data through traditional filters, they risk biasing the results in favor of only historical events or conditions. But if they choose to look outside those constraints, they may uncover a whole host of causal or contributing factors that previously may have been overlooked.

At its core, STAMP is based on a closed control loop, as illustrated in Figure 1: (1) controller, (2) actuator, (3) controlled process and (4) sensors. In the early days of aviation, pilots were the controllers, with mental models of appropriate processes, such as keeping the aircraft within its design flight envelope. They made hand and foot inputs to actuators that controlled processes. Outside inputs and disturbances also affected the controlled processes. The behaviors of the controlled processes were sensed by eyes, hands, ears and seat of the pants, providing feedback to the pilot/controllers, who then adjusted their inputs to achieve the desired behaviors. If the controlled process went awry, it was relatively easy to trace the problem to an equipment failure or pilot error.
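In software terms, the loop Leveson describes can be sketched as a minimal simulation. The function names, the proportional-control rule and the airspeed example below are illustrative assumptions, not part of STAMP itself; the point is only to show how controller, actuator, controlled process and sensor close the loop.

```python
# Minimal sketch of the closed control loop in Figure 1:
# controller -> actuator -> controlled process -> sensor -> controller.
# The gain, disturbance range and airspeed example are illustrative assumptions.
import random

def controller(setpoint, sensed_value):
    """Pilot or computer: compares sensed state with the process model's goal."""
    error = setpoint - sensed_value
    return 0.5 * error                      # control command (simple proportional rule)

def actuator(command):
    """Converts the command into a physical input, e.g., a thrust change."""
    return command

def controlled_process(state, actuator_input, disturbance):
    """The aircraft: responds to actuator inputs and to outside disturbances."""
    return state + actuator_input + disturbance

def sensor(state):
    """Instruments or seat of the pants: feeds the observed state back to the controller."""
    return state + random.uniform(-0.5, 0.5)    # measurement noise

target_speed_kt, speed_kt = 250.0, 230.0
for step in range(20):
    feedback = sensor(speed_kt)
    command = controller(target_speed_kt, feedback)
    speed_kt = controlled_process(speed_kt, actuator(command),
                                  disturbance=random.uniform(-1.0, 1.0))
print(f"speed after 20 loop iterations: {speed_kt:.1f} kt")
```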

However, Leveson notes that as aircraft performance increased, the simple pilot-actuator-controlled process-sensor closed control loop resulted in higher and higher pilot workload. Accordingly, basic analog computers were introduced to aircraft and tasks were delegated to them with the goal of decreasing pilot workload, as shown in Figure 2.
Early analog computers made possible anti-skid braking, nosewheel steering and automatic redline limiting for powerplants and propellers. With the advent of digital computers, precision control of many more systems became possible. Software engineers became kingpins in aircraft design, able to create computer code with unprecedented, near-failproof capabilities.
Overall aircraft reliability also steadily improved, including structures, systems and powerplants. The percentage of aircraft accidents due to equipment failure plummeted. Conversely, accidents involving pilot error soared.
Leveson asserts that many such accidents were symptomatic of higher-level equipment design and operational control shortcomings. Hardware and software performed exactly as designed. But the original design requirements failed to encompass all possible complex process behaviors.
“We are attempting to build systems that are beyond our ability to intellectually manage. . . .” Leveson writes. It’s increasingly tough for designers to foresee all potential normal, non-normal and emergency system modes or states that can occur during flight operations. No wonder some pilots ask themselves “What’s it doing now?”
Leveson’s holistic approach to system design and accident investigation is becoming more essential because of the accelerated pace of technology development, a factor often overlooked in past accident investigations. Technology advances are particularly challenging because they have outpaced system safety engineering.
Why Won’t It Stop?
Leveson cites two fatal jetliner accidents as being especially emblematic of shortcomings in aircraft design requirements and operational control. First, an Airbus A320-211, operating as Lufthansa Flight 2904, flew from Frankfurt to Warsaw in September 1993. Okecie Tower warned the pilots that Lufthansa Flight 5764, which had just landed, had encountered wind shear on the approach to Runway 11. Following AFM guidance, the pilots of LH2904 increased the selected landing reference speed. The pavement was very wet and surface winds were shifting.
When the Airbus touched down on the runway, the contact was so gentle that the left main landing gear didn’t compress enough to trigger its weight-on-wheels (WOW) squat switch to ground mode until the aircraft was halfway down the runway, 9 sec. after the right main WOW switch activated. Hydroplaning prevented the main wheels from spinning up to 72 kt., the minimum needed to override the WOW switches to ground mode. As a result, ground spoiler deployment and thrust reverser activation were delayed. Also because of hydroplaning, the main wheels didn’t spin up until 4 sec. later, only then allowing the wheel brakes to start functioning. Compounding those problems, the crew touched down long and 20 kt. fast. These were critical factors, considering the wet runway and 13-kt. tailwind.
The thrust reverser lockout, brake-by-wire system and automatic ground spoilers worked perfectly, according to design spec. Unfortunately, though, those systems in part prevented the flight crew from stopping the aircraft before it reached the end of the runway at 72 kt. The jetliner crashed down a shallow embankment, came to rest partially on a perimeter access road and caught fire. The pilot in the right seat and one passenger died as a result and 49 passengers and two crewmembers were injured.
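The deceleration logic described above boils down to a ground/air interlock. The sketch below is a heavily simplified illustration built only from the conditions cited in this account, both squat switches in ground mode or wheel spin-up to at least 72 kt.; it is not the actual Airbus A320 implementation.

```python
# Illustrative ground/air interlock, simplified from the conditions cited in the
# Warsaw accident narrative. NOT the actual Airbus A320 spoiler/reverser/brake logic.

def ground_mode(left_wow, right_wow, wheel_speed_kt):
    """Return True when this simplified logic considers the aircraft on the ground."""
    both_struts_compressed = left_wow and right_wow
    wheels_spun_up = wheel_speed_kt >= 72.0    # spin-up override threshold cited in the text
    return both_struts_compressed or wheels_spun_up

def deceleration_devices(left_wow, right_wow, wheel_speed_kt):
    """Spoilers, reversers and brakes are enabled only in ground mode."""
    on_ground = ground_mode(left_wow, right_wow, wheel_speed_kt)
    return {"ground_spoilers": on_ground,
            "thrust_reversers": on_ground,
            "wheel_brakes": on_ground}

# A gentle, one-gear touchdown on a flooded runway: hydroplaning keeps wheel speed
# low and the left strut stays extended, so nothing deploys.
print(deceleration_devices(left_wow=False, right_wow=True, wheel_speed_kt=40.0))
# -> {'ground_spoilers': False, 'thrust_reversers': False, 'wheel_brakes': False}
```

The hardware and software here do exactly what the logic demands; the hazard lies in a requirement that never anticipated a landing gentle enough, and wet enough, to keep the aircraft "airborne" in the eyes of the interlock.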
A second example: Red Wings Flight 9268, a Tupolev Tu-204, flew a positioning mission from Pardubice Airport, Czech Republic, to Moscow Vnukovo Airport in December 2012. The crew was cleared to land on Runway 19, a 10,039-ft.-long strip. It had snowed at Vnukovo shortly before the aircraft’s arrival. With gusting winds reported, the crew increased approach speed to 124 kt. from 113 kt. to provide more stall margin. The crew actually flew the approach 14-16 kt. faster than that.
The aircraft crossed the threshold of Runway 19 at 50 ft. but at 140 kt. It touched down long, fast and at a soft 1.12g, with a 26-kt. right crosswind. The left main strut compressed, but the right main strut remained partially extended because of the crosswind. As the nosewheel touched down, the crew selected maximum reverse thrust in one continuous movement and initiated maximum auto-braking. Engine rpm increased, but the thrust reversers did not deploy, so the aircraft accelerated. In addition, the air brakes and ground spoilers did not deploy automatically, and the crew did not actuate them manually, further compounding the problem.
The flight engineer shouted, “Reversers! Deploy reversers!” The pilots again selected max reverse. Again, engine rpm increased, but the reversers did not move. And, again, the aircraft accelerated.
The aircraft skidded down the runway for 32 sec. and careened off the end at 116 kt. It plowed through snow and small obstacles, impacted a slope in a steep ravine at 102 kt. and broke into pieces. Five of the eight crewmembers on board were killed.
Accident investigators concluded that the crash was caused by maladjustment of the WOW sensors, causing the thrust reversers to be locked out and preventing automatic deployment of the ground spoilers. But the Russian Interstate Aviation Committee also faulted the pilots for not following AFM procedures and for flying an unstabilized approach. Aboard the Tu-204, pilots are required to first unlock and deploy the reversers at idle rpm, verifying that the reversers are in proper position before they fully actuate them. The accident report did not mention that during the stress of attempting to stop the aircraft on a slippery runway with gusting winds, this two-step actuation process may be easy to overlook.
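The two-step reverser drill cited in the report can be pictured as a small state machine. The sketch below is a hypothetical illustration of that sequence (unlock and deploy at idle, verify position, then apply full reverse), not the Tu-204's actual control logic or procedure wording.

```python
# Hypothetical state-machine sketch of the two-step reverser drill described in the
# accident report: unlock/deploy at idle, verify position, then apply full reverse.
from enum import Enum, auto

class ReverserState(Enum):
    STOWED = auto()
    UNLOCKED_AT_IDLE = auto()
    VERIFIED_DEPLOYED = auto()
    FULL_REVERSE = auto()

def next_state(state, action, reversers_in_position):
    if state is ReverserState.STOWED and action == "unlock_at_idle":
        return ReverserState.UNLOCKED_AT_IDLE
    if state is ReverserState.UNLOCKED_AT_IDLE and action == "verify":
        # Skipping or rushing this check is the step that is easy to overlook under stress.
        return (ReverserState.VERIFIED_DEPLOYED if reversers_in_position
                else ReverserState.UNLOCKED_AT_IDLE)
    if state is ReverserState.VERIFIED_DEPLOYED and action == "apply_full_reverse":
        return ReverserState.FULL_REVERSE
    return state    # any out-of-sequence action has no effect

state = ReverserState.STOWED
for action in ["unlock_at_idle", "verify", "apply_full_reverse"]:
    state = next_state(state, action, reversers_in_position=False)
print(state)   # stuck at UNLOCKED_AT_IDLE: with the reversers locked out, full reverse is never reached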
All too often, Leveson notes, blaming the pilots just results in their being censured, fired or retrained. Alternatively, the quest to eliminate “pilot error” may result in increasing the level of automation or making standard operating procedures more rigid. Neither response guarantees that the same type of accident will not recur, because latent hazards remain hidden.
“In complex systems, human and technical considerations cannot be isolated,” she writes, reflecting specifically on the Warsaw A320 and Moscow Tu-204 accidents. “Human error is a symptom of a system that needs to be redesigned.”
In Front of the Screen, Behind the Screen
Too often, there is an invisible, impermeable wall between pilots and computers. According to Leveson, “Human factors concentrates on the [cockpit display or ATC computer] ‘screen out.’ Hardware and software engineering concentrates on the ‘screen in.’” The two parts of the system are not integrated, often leading to “mode confusion, situational awareness errors, inconsistent behavior, etc.,” Leveson asserts.
She cites the December 1995 American Airlines Boeing 757 controlled flight into terrain near Cali, Colombia, as a prime example of human/computer interface dysfunction. The flight crew was faulted for a breakdown in CRM. But Leveson notes that the waypoint identifier for the ROZO intersection, used to navigate the approach into Cali, was inconsistent between the published approach chart and the FMS navigation database. She also asserts the carrier’s training department did not alert its flight crews to the inconsistencies between the charts and FMS identifiers. She notes Jeppesen-Sanderson furnished both the published approach charts and FMS navigation database updates for the aircraft but did not tell the airlines that different waypoint identifiers were used on charts and in FMSes. And, finally, she points out that no international uniform standards existed for the digital navigation databases used in different avionics manufacturers’ FMS boxes.
Leveson says she did not invent the holistic approach to accident prevention. Rather, she credits the late Jerome Lederer, founder of the Flight Safety Foundation and NASA’s director of safety. A half-century earlier, he wrote, “Systems safety covers the total spectrum of risk management,” encompassing the states of mind of designers and producers; employee/management relations; the relationships between government, associations and operators; human factors; and the approach to safety by top management.
And she writes, “Studies have found that the most important factor in the occurrence of accidents is management commitment to safety and the basic safety culture in the organization or industry.” Top management cannot afford to assume that old accident analysis methods are adequate for today’s complex aircraft operations. Or that ever-increasing technological complexity can be crammed into conventional cockpits without providing pilots with more intuitive displays and controls.
“The problem is complexity,” Leveson says. Historically, engineers have oversimplified system behaviors. They’ve used the analytic reduction method to examine individual components, or pairs of components. Behaviors have been broken down into individual events that occur in a predictable time sequence. This “divide and conquer” approach assumes each component functions autonomously or in concert with another component. It also assumes each component functions the same individually as it does when operating as part of a whole system. According to analytic reduction advocates, you can combine those simple interactions and predict the behavior of the entire system.
But such a simplistic approach, Leveson says, does not work with today’s complex systems. “The whole is greater than the sum of its parts,” she maintains, because individual components, especially ones with complex software, can interact with one another within a system in myriad, unexpected ways.
Greater Than the Sum of Parts
Emergence is a term used to describe synergies that evolve as a result of parts interacting with each other in a system. A single ant in a colony, for instance, is a simple creature that cannot lead the actions of the entire colony by itself. But when groups are guided by strong pheromone trails, when they are working with others to gather food, or when they are compelled to protect the queen ant at the cost of their own lives, individual ants in colonies are able to exploit their environments impressively over long periods of time. The emergent properties of ant colonies thus extend far beyond the limited capabilities and short life spans of individual ants.
J. Doyne Farmer, complex systems scientist and Oxford University math professor, says emergence is not magic, but it feels like magic because individual components in a system, similar to ants in a colony, can self-organize to produce wholly new properties and actions.
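A toy simulation makes the point concrete. In the sketch below, the parameters and the two-path pheromone rule are invented for illustration: no individual agent "decides" anything, yet the colony as a whole converges on a single route.

```python
# Toy illustration of emergence: simple agents choose between two paths in proportion
# to the (squared) pheromone already deposited. No agent has a global plan, yet the
# colony settles on one route. All parameters are invented for illustration.
import random

pheromone = {"path_A": 1.0, "path_B": 1.0}        # start with no preference

for ant in range(2000):
    weight_a = pheromone["path_A"] ** 2           # nonlinear reinforcement amplifies small differences
    weight_b = pheromone["path_B"] ** 2
    choice = "path_A" if random.random() < weight_a / (weight_a + weight_b) else "path_B"
    pheromone[choice] += 1.0                      # each crossing strengthens the chosen trail
    for path in pheromone:
        pheromone[path] *= 0.999                  # slow evaporation

print(pheromone)   # one trail typically ends up far stronger: an emergent, colony-level choice
```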

That is why Leveson’s systems theory approach to investigating potential accidents before they can occur is so relevant. It emphasizes the study of emergent properties, attributes or behaviors that arise from the complex and seemingly unprogrammed interaction of all parts contained within a system, as illustrated in Figure 3.
Both “safety and security are emergent properties,” says Leveson. The most effective way of keeping potential risks in check is by exerting firm, top-level control over the entire system. Preventing aircraft from intruding into minimum separation bubbles, keeping aircraft from operating outside of a safe-flight envelope, requiring components to be replaced or restored well before they can break, and assuring adequate crew rest periods between duty times are examples of top-level control functions.
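One way to read those control functions is as explicit safety constraints enforced on the state of the operation. The sketch below is a generic illustration with made-up thresholds and field names; it simply checks each constraint Leveson lists against a snapshot of the system.

```python
# Generic sketch of top-level safety constraints checked against a snapshot of the
# operation. Thresholds and field names are illustrative assumptions drawn from the
# examples in the text, not from any regulation or actual system.

def check_constraints(state):
    violations = []
    if state["separation_nm"] < 5.0:                                    # minimum separation bubble
        violations.append("loss of separation")
    if not (state["stall_margin_kt"] > 0 and state["speed_kt"] < state["vmo_kt"]):
        violations.append("outside safe-flight envelope")
    if state["hours_since_overhaul"] > state["overhaul_limit_hours"]:   # replace before failure
        violations.append("component past replacement/restoration limit")
    if state["crew_rest_hours"] < 10.0:                                 # assumed rest requirement
        violations.append("inadequate crew rest")
    return violations

snapshot = {"separation_nm": 3.2, "stall_margin_kt": 18, "speed_kt": 250, "vmo_kt": 330,
            "hours_since_overhaul": 4100, "overhaul_limit_hours": 4000, "crew_rest_hours": 8.5}
print(check_constraints(snapshot))
```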

To optimize safety and security, Leveson believes two complementary closed control loops are necessary. Figure 4 depicts a systems operations control loop applicable to pilots, flight operations, FAA field offices, air traffic control organizations, flight training organizations, and maintenance and repair organizations. For simplicity, we’ve chosen to show only a systems operations control loop for an aircraft operator.
In this closed loop, the U.S. Congress is at top-level control, ultimately responsible for the safety and security of civil flight operations in the U.S. The second tier, comprising the FAA, trade associations, unions, user groups and the court system, exercises authority granted by laws passed by the federal legislature to provide rules and regulations, oversight, guidance, operating standards, certification of pilots and operators, and, as a last resort, legal remedies.
The third tier, the company or organization top management level, is perhaps the most vital to operational safety and security, as noted by Leveson. The culture of a flight operation directly depends on the commitment of top management to the highest safety and security end goals.
The company flight operations department, at the fourth tier, is charged with the responsibility and is given sufficient authority to screen personnel for qualification, to train people and to promulgate standard operating procedures and guidance materials to assure safety and security at the cockpit, cabin, service and maintenance level.
Congress ultimately is charged with the safety and security of flight operations, but time lags in the bottom-up feedback channels may result in delays of months or even years before issues are addressed. In addition, filtering may occur at each level in the feedback loop caused by internal organizational politics, strict cost control considerations and personality conflicts.
Yet control actions are only as effective as the feedback provided by the people operating closest to the risks.
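The tiered loop in Figure 4 can be sketched as a chain of control levels, each passing constraints downward while feedback moves upward with delay and filtering. The structure below is a schematic illustration of that idea; the level names come from the text, while the delay and filtering values are assumptions.

```python
# Schematic sketch of the operations control hierarchy in Figure 4: constraints flow
# down, feedback flows up with delay and filtering. Delay and filter values are
# illustrative assumptions; the tier names come from the text.
from dataclasses import dataclass

@dataclass
class ControlLevel:
    name: str
    feedback_delay_months: float   # assumed time for reports to move up to this level
    filter_fraction: float         # assumed share of reports lost to politics, cost pressure, conflict

hierarchy = [
    ControlLevel("U.S. Congress",               feedback_delay_months=12, filter_fraction=0.3),
    ControlLevel("FAA / associations / courts", feedback_delay_months=6,  filter_fraction=0.2),
    ControlLevel("Company top management",      feedback_delay_months=2,  filter_fraction=0.2),
    ControlLevel("Flight operations dept.",     feedback_delay_months=1,  filter_fraction=0.1),
]

def escalate(hazard_reports):
    """Push line-level hazard reports up the hierarchy, accumulating delay and loss."""
    months, surviving = 0.0, float(hazard_reports)
    for level in reversed(hierarchy):          # reports originate at the cockpit level and move up
        surviving *= (1.0 - level.filter_fraction)
        months += level.feedback_delay_months
        print(f"{level.name:32s} sees ~{surviving:5.1f} reports after {months:4.1f} months")

escalate(hazard_reports=100)
```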
Dissecting data from the crash of Comair Flight 5191 at Lexington, Kentucky’s Blue Grass Airport (KLEX) in August 2006 provides examples. About an hour before sunrise, the crew attempted to depart from 3,501-ft.-long Runway 26 instead of Runway 22, a 7,003-ft.-long strip. Departing at a weight of about 49,000 lb., the takeoff roll for the Bombardier CRJ100 was 3,744 ft. The aircraft crashed off the end of Runway 26 at 137 kt., 5 kt. below rotation speed, killing everyone on board but the copilot.
The NTSB concluded the accident primarily was caused by the flight crew violating sterile cockpit procedures, failing to confirm they were on the correct runway and non-compliance with Comair’s standard operating procedures.
But “Blame is the enemy of safety,” says Leveson. “To prevent accidents in the future, [we] need to focus on why it happened, not who to blame.” Hindsight bias often just produces “should have, could have, would have” conclusions that identify and illuminate people’s mistakes but fail to determine why those mistakes were made.
Paul Nelson, a former Comair captain, Air Line Pilots Association safety specialist and mentee of Leveson, conducted a STAMP analysis of the KLEX Comair 5191 accident as part of a master’s program. The analysis, which delves deep into the accident’s root causes, notes that the published airport diagram depicted the layout of ramps, runways and taxiways as the flight crew had seen it on their previous flights into Lexington. But since their last flight to the airport, the taxiways near the ends of Runways 22 and 26 had been partially closed and renamed. No “heads-up” alert of the change in the taxiways was provided by the tower, ATIS or the Comair dispatch release. Comair did not have an internal process for including some airport NOTAMs on dispatch releases. Nelson asserts the crew, at a minimum, needed an amended airport chart that would have shown the changes in taxiway layout.
The flight crew indeed had a flawed mental model of the airport layout, aggravated by flashing low barricade lights on the closed taxiways and the absence of lights where they had previously seen them. The tower did not caution them that the taxi route was different from the one depicted on the airport diagram. The mishap occurred before tablet computers and MFDs with moving map depictions, including airport diagrams, became commonplace. The pilots believed they were taxiing to Runway 22, unaware of the changes in taxiways. Comair didn’t have an SOP requiring the heading bug to be set to runway heading prior to taxi.
It was dark at the airport. The crew probably was fatigued by a schedule change and an 0400 wake-up call. Comair was in bankruptcy and attempting to save costs; it was demanding large cuts in pilot wages. This created financial stress on flight crews, and it was a frequent topic of conversation in cockpits.
The Blue Grass Airport Authority also exercised unsafe control actions. It relied only on FAA guidance for erecting signage during construction. It did not seek out additional FAA help, other than NOTAMs, to find out how to assist pilots in identifying potential hazards. It never asked airport users how to improve situational awareness and it changed the name of Taxiway A5 to Taxiway A but only erected minimal signage.
The FAA also had a role. The hold-short markings for Runways 22 and 26 were standard pairs of solid and dashed yellow lines. The agency did not upgrade signage with white-on-red runway name signs adjacent to the hold-short lines. Such enhancements were required only at airports with more than 1.5 million annual passengers. Furthermore, Lexington Tower was staffed with a single person during periods of low traffic volume. Budget constraints prevented assigning two people to work the tower and ATC coordination functions.
The single tower controller was multitasking that morning. The controller’s possible sleep deprivation could have impaired performance. The Comair crew was issued nonstandard taxi instructions. And the controller did not have time to monitor the position of the CRJ prior to clearing the aircraft for takeoff.
According to Nelson’s analysis, all these factors combined and emerged as a critical safety breakdown that ultimately led to 49 people dying in a fiery, high-speed crash. Congress, Comair, the FAA and the Blue Grass Airport Authority, as well as the flight crew, failed to exercise critical safety controls that could have prevented the accident.
STAMP Tackles Future Accidents . . . Before They Happen
Leveson conducts STAMP workshops at MIT every March to teach participants her holistic STAMP control techniques, such as Causal Analysis using System Theory (CAST), which pertains to accidents, and System Theoretic Process Analysis (STPA), which applies to hazards.

STAMP participants not only dissect accidents caused by inadequate operational safety and security control using CAST, they also examine systems development control loops by applying STPA to identify potential hazards in design, development, testing and manufacturing processes, as shown in Figure 5. As with systems operations control loops, systems development loops start at the top with federal law. The second and third tiers, respectively, consist of regulators and top company management, among other stakeholders. 
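In practice, STPA examines each control action in the structure against a small set of ways it can become unsafe: not provided when needed, provided when it creates a hazard, provided too early, too late or out of sequence, or stopped too soon or applied too long. The sketch below simply enumerates those categories for one control action; the action and context names are illustrative assumptions, not drawn from a real analysis, and the real work of judging each combination is done by engineers, not code.

```python
# Sketch of the STPA step that reviews a control action against the standard
# categories of unsafe control actions. The "deploy ground spoilers" action and
# its contexts are illustrative assumptions.

UCA_CATEGORIES = [
    "not provided when needed",
    "provided when it creates a hazard",
    "provided too early, too late, or out of sequence",
    "stopped too soon or applied too long",
]

def enumerate_ucas(control_action, contexts):
    """Pair one control action with each unsafe-control-action category and context for review."""
    return [f"'{control_action}' {category} [context: {context}]"
            for category in UCA_CATEGORIES
            for context in contexts]

for uca in enumerate_ucas("deploy ground spoilers",
                          contexts=["aircraft on landing rollout", "aircraft still airborne"]):
    print(uca)
```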

Figure 6 shows the relationships between the development and operational control loops. Each has the U.S. Congress at the top level, and each incorporates feedback from end users. Free and open communications are the keys to success in attaining optimum safety and security in both control loops. Feedback from the people closest to hazards, at the human/computer control loop level, is critical to fine-tuning processes. Human and computer component interactions within the controlled process change over time due to fatigue, stress and state of mind, along with wear, age and maintenance. As a result, the process controls that are effective today may need updating tomorrow to attain the same safety and security outcomes.
The public has a “decreasing tolerance for single accidents,” she writes. “The losses from accidents [are] increasing with the cost and potential destructiveness of the systems we build.”
That observation is particularly applicable to the aviation industry. According to Leveson, “Learning from accidents needs to be supplemented with increasing emphasis on preventing the first one.” For safety-conscious aviation department managers, learning from the MIT professor’s annual STAMP seminars is worth consideration. For more information on the March 2019 workshop, visit http://psas.scripts.mit.edu/home/2019-stamp-workshop/