Saturday, 28 August 2010

A Long Chain

Some things you remember all your life, like the news of some totally impossible, totally unexpected event - the death of JFK, the events of 9/11, and, on the 28th of January, 1986, the loss of the space shuttle Challenger.

At the time, images of the shuttle were everywhere. Nickelodeon, one of the brand new cable TV channels, a channel for children, had a Space Shuttle animation running during all its program breaks. The Space Shuttle was shorthand for all of the USA’s competencies in technology, organisational skills and wealth.

I was in one of the engineering labs at Boeing Huntsville when I heard the news. The phone rang and one of my colleagues answered. I can remember fragments of the conversation. Dick Peterson was asking, “But what happened to the shuttle?” He put the phone down and said that there’d been some kind of explosion during the launch and there was no sign of the shuttle. His wife and child had been watching it on live TV and had called to tell him what they’d seen. They wanted to know immediately how such an impossible event could have happened.

In those days, even if we had a few computers on our desks, we had no internet, and I imagine we immediately went in search of a radio, or went home. This disaster, unlike Apollo 13, whose crew had been so memorably rescued, was already over, with the crew apparently killed instantly.

At the time it was assumed that ‘the Shuttle blew up’. In fact it did not. Hot gases from the right-hand solid fuel booster leaked past a rubber O-ring seal. Eventually they burned through the aft part of the large orange hydrogen/oxygen ‘external tank’ to which the shuttle orbiter (the delta-winged spaceplane which goes to orbit and glides back to earth) is attached. A few seconds later the right-hand solid rocket booster broke away at its aft securing point, then the ‘external tank’ broke up. Abruptly, at 48,000 ft, the orbiter was tossed around and subjected to 20g forces that caused it to break up.

What happened to the crew next, during the time it took the crew compartment to come back down, is a matter of conjecture. One wing was torn from the orbiter almost immediately. The intact crew cabin broke free and continued to ascend, to 65,000 ft. After the break-up the crew were in free-fall, and there are indications that at least some of them were still alive. The reserve oxygen tanks show depletion of two and a half minutes of oxygen, the time from the event until the crew cabin hit the ocean with a force of 200g.

Theoretical physicist Richard Feynman was drafted into the Rogers Commission on the Challenger Space Shuttle Disaster. He was an unusual choice, but whoever nominated him showed good judgement. Feynman’s conclusions on system reliability and the gap between the expectations of management and engineering reality are on target and have a wide application.

Feynman was one of a number of American scientists who had worked on the Manhattan Project to develop the atomic bomb. In later life he would recall his time at Los Alamos as a time when things were organised extraordinarily well. His views of NASA were very different. Feynman’s findings led to the three-year halt in shuttle flights, during which NASA went through widespread reorganisation.

The head of the Rogers Commission described him thus: "Feynman is becoming a real pain." Yet Feynman, with no small assistance from another commission member, Air Force General Donald Kutyna, memorably revealed the cause of the disaster on live TV. Feynman also threatened to remove his name from the Rogers Commission's official report unless he was allowed to add his own appendix, which described a number of further risk areas that could be safety issues for subsequent flights.

Despite NASA’s official position, pre-Challenger, that the complete shuttle was extremely safe, a large number of NASA engineers felt that the risk of a launch failure was between 1 in 50 and 1 in 100. This was Challenger’s 10th flight (and the 25th shuttle flight overall). On the day of the launch the very cold conditions were far from ideal.
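To get a feel for what the engineers' figures imply, here is a rough back-of-envelope sketch. It assumes, purely for illustration, that every flight carries the same risk and that flights are independent of one another; neither assumption comes from NASA or from Feynman, and the function name is my own. On those assumptions, a per-flight risk of 1 in 50 gives roughly a 40% chance of at least one loss somewhere in 25 flights, and 1 in 100 gives roughly 22%.

```python
def prob_at_least_one_failure(per_flight_risk: float, flights: int) -> float:
    """Chance of one or more failures in `flights` launches.

    Illustrative assumption: each flight is independent and carries
    the same risk, so the chance of surviving every flight is
    (1 - risk) ** flights.
    """
    return 1.0 - (1.0 - per_flight_risk) ** flights


# The two ends of the range the engineers are said to have quoted,
# applied to the 25 shuttle flights up to and including Challenger's last.
for label, risk in (("1 in 50", 1 / 50), ("1 in 100", 1 / 100)):
    p = prob_at_least_one_failure(risk, 25)
    print(f"{label} per flight: {p:.0%} chance of at least one loss in 25 flights")
```

Seen this way, a loss within the first 25 flights was not an impossible event at all on the engineers' own numbers.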

The issue, as Feynman demonstrated on TV with a cup of iced water, a clamp and some of the rubber O-ring material, was that in very cold conditions, as these were, the rubber became stiff and would not recover its shape fast enough to maintain a seal.

There might be a few who would see Feynman’s on-screen revelation as grandstanding. Maybe a little, yet Feynman was smart enough to fully understand the problem, he had the tenacity to dig right to the bottom of it, and he had the balls to go public with it.

Donald Kutyna, no slouch at digging to the bottom of problems for the Air Force, arguably used Feynman. He told Feynman about how he'd been repairing his car and discovered that some carburettor seals were ineffective due to low temperatures on a cold morning. Feynman latched onto this and related it to the O-ring seals on the shuttle. Maybe Kutyna thought Feynman’s approach would get the point across more effectively than his own.

Kutyna, even before coming to the Rogers Commission, already knew plenty about the Shuttle and NASA’s organisation. He’d been in charge of setting up Vandenberg Air Force Base’s shuttle launch facility. Vandenberg was never actually used for launching the shuttle because of the Challenger disaster. Kutyna, himself not one for mincing words, said that NASA allowing the Challenger launch, under those conditions, was akin to an airline allowing a plane to fly despite evidence that one of its wings was about to fall off.

Feynman and Kutyna made an excellent team, and Feynman went on to add his appendix to the Rogers Commission report, which can be read here.

Feynman’s section opens thus: "It appears that there are enormous differences of opinion as to the probability of a failure with loss of vehicle and of human life. ... The higher figures come from the working engineers, and the very low figures from management..." He then quotes NASA’s official line, "the probability of mission success is necessarily very close to 1.0." "It is not very clear what this phrase means," says Feynman. "Does it mean it is close to 1 or that it ought to be close to 1?" This rhetoric is how NASA management deluded themselves into overlooking the alarms and cautions that their own engineers were raising.

Feynman’s section looks at a few other aspects as well. The shuttle had, by the standards of the 1980s, a large amount of safety-critical software running on it. He puts it at about 250,000 lines of code in its entirety. (By current aircraft standards, no more than one major subsystem.)

Feynman is complimentary about the software production process. First, having described all the very labour-intensive methods of classic safety-critical software testing, he adds, "...there is an independent verification group, that takes an adversary attitude to the software development group, and tests and verifies the software as if it were a customer of the delivered product.... the computer software checking system and attitude is of the highest quality."

However, he adds a word of caution. Although "there appears to be no process of gradually fooling oneself while degrading standards so characteristic of the Solid Rocket Booster ....", he goes on, "To be sure, there have been recent suggestions by management to curtail such elaborate and expensive tests as being unnecessary at this late date in Shuttle history."

Three years later, and after much redesign and reorganisation, the Shuttle flew again.

17 years on, another shuttle, Columbia, was lost in a re-entry accident, on the 113th shuttle flight. But by now no one was pretending that the shuttle was as safe as a commercial airliner, or that fixing one weak link in a long chain had somehow strengthened the entire chain. As Feynman put it, "When playing Russian roulette the fact that the first shot got off safely is little comfort for the next."


  1. The Challenger loss was the 25th shuttle flight not 10th.

  2. Yes, the 25th shuttle flight and the 10th flight of Challenger.