"Can We Talk?"

Computer History Vignettes

By Bob Bemer

A Software Patch Problem

In Mark Halpern's first memoir in the Annals of the History of Computing he mentioned his uneasiness when the word "patch" was deemed no longer necessary to define. I can share his nostalgia.

When I moved to Phoenix from Paris in 1966, one of my first tasks was to see why certain GE software was not running up to advertised speeds -- the COBOL compiler in particular. I got the assistance of Leroy Ellison, and we started with comparison compiler runs. But it seemed to me that there was some gold to be mined in the operational aspects, so I went to the machine room to see COBOL compilations in action.

I noticed that the operator would get a message that required pressing a key. It seemed as though the process was in an infinite loop. So I inquired, and found out that patches seemed to be the problem. Apparently the programmers thought it OK to write the COBOL compiler program and assemble it, but then to reassemble it only rarely. Instead they put in a lot of patches, presumably intending to take them out upon the next assembly. Unfortunately that did not happen too often, and at the time of my investigation there were some 12,000 patches residing on the drum!

Each patch took 64 words (a sector) of space on the drum to define where it went, and 64 words to contain the patch itself. Thus there were 12,000 patches, each using up 128 36-bit words on the drum. This not only put a certain drain on drum capacity, but every time the COBOL compiler was read in to run, it had to overlay each one of these 12,000 patches in sequence before it could actually compile.
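To put rough numbers on that overhead, here is a minimal sketch of the arithmetic in Python. The figures (12,000 patches, 64-word sectors, two sectors per patch) come from the text above; the function and constant names are hypothetical, chosen only for illustration.

```python
# Figures from the text; names are illustrative assumptions.
PATCH_COUNT = 12_000
WORDS_PER_SECTOR = 64      # one drum sector, in 36-bit words
SECTORS_PER_PATCH = 2      # one pointer sector + one sector for the patch itself


def drum_words_used(patches: int = PATCH_COUNT) -> int:
    """Total 36-bit words of drum space occupied by the patches."""
    return patches * SECTORS_PER_PATCH * WORDS_PER_SECTOR


def sectors_read_per_compile(patches: int = PATCH_COUNT) -> int:
    """Sectors that must be read and overlaid before compiling can start."""
    return patches * SECTORS_PER_PATCH


print(drum_words_used())           # 1536000
print(sectors_read_per_compile())  # 24000
```

On these figures the patches occupied about 1.5 million words of drum, and every compilation began with 24,000 sector reads, half of them pointer sectors.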

Now this might not have been too bad, but unfortunately the hardware people did not talk to the software people, and vice versa. The former knew of a design glitch called the "1-word DCW problem". DCW stood for Device Control Word. The hardware glitch occurred when only one word in a sector was used. But surely, hardware reasoned, that would not happen very often, and when it did the operator could call for a rereading which would probably work. But guess the circumstance where there would be only one word in a sector? Right. The pointer for the patch.

It turned out to be VERY important when there were 12,000 patches, but the software people did not know about the problem. They hadn't been told, and thus did not ask the hardware folk to fix it. With 12,000 patches reading in at every compilation there was a high probability of the 1-word DCW glitch acting up.

There was an ABORT button on the console, so in such cases the operator had the option of aborting and going on to another process, or else trying to reread the drum. Naturally, no operator wants to foul up the operation, and so they kept hitting the RETRY button. I timed this, and on average the RETRY button was hit 30 times before a correct read occurred. By my count this happened an average of 30 times per shift. Each time there was also a typewriter message that said, in effect, "We could not read this time. Want to try again?" The typewriter was not buffered, and my timing gave 3 seconds for the message, 1 second to hit the RETRY button.

So we had 4 wasted seconds times 30 retries times 30 occurrences per shift. That equals 3,600 seconds, or 1 hour per 8-hour shift -- absolutely lost! And nobody realized it until I actually walked onto the test floor to observe!
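The lost-time estimate can be checked with a short calculation. This is only a sketch using the averages reported above; the function and constant names are hypothetical.

```python
# Averages reported in the text; names are illustrative assumptions.
MESSAGE_SECONDS = 3        # unbuffered typewriter message
KEYPRESS_SECONDS = 1       # operator hits RETRY
RETRIES_PER_INCIDENT = 30  # RETRY presses before a correct read
INCIDENTS_PER_SHIFT = 30   # times per shift the read failure occurred
SHIFT_HOURS = 8


def lost_seconds_per_shift() -> int:
    """Seconds per shift spent on retry messages and key presses."""
    per_retry = MESSAGE_SECONDS + KEYPRESS_SECONDS
    return per_retry * RETRIES_PER_INCIDENT * INCIDENTS_PER_SHIFT


def lost_fraction_of_shift() -> float:
    """Share of the shift consumed by retries."""
    return lost_seconds_per_shift() / (SHIFT_HOURS * 3600)


print(lost_seconds_per_shift())  # 3600
print(lost_fraction_of_shift())  # 0.125
```

That is 12.5 percent of every shift spent waiting on the typewriter and the RETRY button.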

A Process Transfer Problem

Another gold mine was in the operating system, which is usually pretty busy doing housekeeping when shifting attention to another portion of the system. But software management thought that not much could be gained here.

I didn't believe that, so I asked for a manual for the GE 600 (I had never seen the instruction set before), and then a listing of the actual code. I handled the first on a Thursday, the second on Friday. Over the weekend I pondered.

On Monday morning I had several substantial operational improvements. The most significant concerned the index registers, of which there were seven! When moving from process to process one could not know if the new process needed any or all of those registers. So the contents of all registers had to be saved upon each change, being restored when the relinquishing process was called again.

There in the code were Save Register instructions -- all seven of them. But the operation code set I had studied showed a single instruction that would save the content of all seven registers at once! So a simple modification to use this instruction would save lots of time.

You can see the communication problem now. The hardware people had not bothered to tell software about the new facility, and of course the software people were so busy they had no time to go back and see if there were any beneficial new capabilities.

MORAL-- Deliberately swap one or more people between the hardware and software operations. A lot of eyes will be opened.
