Testing (How Software is Built Today III)
by Gerard Meszaros
Most people assume that testing is something you do after the engineering and construction are done, in order to check that the program runs and does what it should. That’s pretty much the way we used to build software. Some people still do that.
But modern dogma calls for extensive, automatic, built-in tests that are written alongside — or often before — the code. Each object that could possibly do something wrong gets tested in isolation. Each combination of objects that ought to work together gets tested together. All these tests are run all the time, automatically, so when something breaks you find out right away.
Ten years ago, “legacy code” meant “parts of the program written with punched cards, paper tape, stone knives and bearskins.” Since 2004 , it’s meant “parts of the program written before we used tests. Tinderbox 1.0 shipped in 2002, and so our code was still shiny and new when it became “legacy” code.
We’ve had three generations of tests in Tinderbox. The first tests, called Monkeys, were what we now call customer or acceptance tests. They exercised the whole system, focusing on a specific task or feature. The TextLinkMonkey, for example, opened a text window and and lots of links in order to check link editing behavior. The monkeys often (but not always) caught mistakes before they left Eastgate HQ, and they helped clarify what the program should do in situations where that wasn’t already clear. But they were also fragile and slow and it was too easy to overlook subtle problems — not to see, for example, that one of a dozen text links was shorter or bluer than it ought to have been.
Over time, the Monkeys were supplemented and then replaced by proper unit tests. Tinderbox got its own home-made unit test framework around 2005, and this rapidly chased the monkeys into the background. There were about 100 of these tests in Tinderbox 5.
Some of these tests were still too broad. There were lots of tests that depended on Hypertext and HypertextView, and that meant that most of Tinderbox had to be working in order for the test to get down to business. This wasn’t a problem for Tinderbox 5 development, but it was a real obstacle to Tinderbox 6 because so much of Tinderbox 6 has to start fresh. And Xcode now has its own unit testing framework which resembles, but also differs from, the custom-made Tinderbox test system.
One priority right now is to get better testing for smaller objects. For example, each Tinderbox view has a small utility object, a LayoutPolicy, that knows how to arrange notes in that kind of view. The OutlineLayoutPolicy, for example, knows about the geometry of outlines, and TreemapLayoutPolicy knows about the geometry of tree maps.
The old tests tested the view after it had been assembled with the appropriate LayoutPolicy. This has worked fairly well, to be sure, and it’s not that hard to do. But over the years, some of the layout policies had grown shaggy and complicated, and the layout policies had all become too closely coupled with the views. This is annoying because improving a layout policy meant needing to recompile a dozen or more classes, and that slows things down. For Tinderbox Six, coupling became a real nuisance; you couldn’t really build the view classes until the layout policies were working, but you couldn’t know whether the layout policies were working until you had a view class to test them with. For a few days, I was holding a hammer in one hand, a plank in one hand, and needed to reach for a nail…
Fixing these dependencies has been good for the code; the LayoutPolicy classes are now slimmer and easier to understand. But it knocked the development schedule back a week, maybe two, in order to improve a part of the system that was working fine. This was probably necessary to pay down technical debt, but I sure could use a week or two if you have some to spare.