January 28, 2018
MarkBernstein.org
 

Fragile Tests (ultra-wonkish)

An important part of Tinderbox development is a fairly big suite of tests that run all the time. Right now, there are 1,297 tests, and they’re run many times a day. Tests can’t check everything, but they do check lots of things, and they let me know when something comes unstuck. I seldom let a test fail for more than a few minutes.

A fragile test is a test that usually passes but occasionally fails. That’s a very bad thing. Most of the fragile tests are fragile for well-understood reasons; for example, there are a couple of tests for time intervals that fail just before or just after we switch to daylight saving time. Those could be fixed, of course, but it’s just simpler to leave myself a note.

Tests that involve multiple cores at the same time can be fragile because you never know exactly how many cores will be available, or when one of them might be interrupted. Any sign of fragility in an asynchronous test is almost surely a serious failure.

Sometimes, though, a test is fragile without rhyme or reason. Here’s one saga.

Tinderbox has a bunch of kibbitzers that help keep the map view neat. For example, the Top Alignment Kibbitzer keeps an eye out for notes whose tops are nearly aligned; if you move a note so its top is almost but not quite aligned with its neighbor, the Top Alignment Kibbitzer will align it. There are a whole bunch of kibbitzers, and they're not hard to test: you put down a note, you move a second note so the kibbitzer in question should wake up, and you check that the kibbitzer does in fact wake up and does the right thing. You also test that other kibbitzers don't wake up at inappropriate moments and start shoving things around for no good reason.
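To make the shape of such a test concrete, here is a minimal C++ sketch. Nothing in it is Tinderbox's actual code: the Note struct, the topAlignmentSuggestion function, and the tolerance value are all hypothetical stand-ins, and a real kibbitzer surely weighs more than one coordinate.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical stand-in for a note in the map view; not Tinderbox's real type.
struct Note {
    double top;
    double left;
};

// Hypothetical kibbitzer: if the moved note's top is close to, but not exactly
// at, the neighbor's top, suggest the neighbor's top. Otherwise stay asleep.
// The tolerance is an invented value, in map units.
std::optional<double> topAlignmentSuggestion(const Note& moved,
                                             const Note& neighbor,
                                             double tolerance = 0.25) {
    const double gap = std::fabs(moved.top - neighbor.top);
    if (gap == 0.0 || gap > tolerance) {
        return std::nullopt;  // already aligned, or too far off to be a near miss
    }
    return neighbor.top;
}

// Put down a note, move a second note almost into line, and check that the
// kibbitzer wakes up and proposes the neighbor's top.
bool wakesOnNearMiss() {
    const Note neighbor{2.0, 1.0};
    const Note moved{2.1, 5.0};
    const auto suggestion = topAlignmentSuggestion(moved, neighbor);
    return suggestion.has_value() && *suggestion == neighbor.top;
}

// A note that is nowhere near aligned should not get shoved around.
bool sleepsWhenFarAway() {
    const Note neighbor{2.0, 1.0};
    const Note moved{6.0, 5.0};
    return !topAlignmentSuggestion(moved, neighbor).has_value();
}

// Runs both checks the way a test suite might.
void runKibbitzerChecks() {
    assert(wakesOnNearMiss());
    assert(sleepsWhenFarAway());
}
```

The second check matters as much as the first: it is the "don't wake up at inappropriate moments" half of the test.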

We’ve got about 40 of these tests, and one of those tests has been fragile for months. It tests the Top Adornment Spacing Kibbitzer, an advisor that tries to move notes that are near the top of an adornment to a consistent distance from the top edge. This test would fail about once a week, for no apparent reason. Only the top adornment spacing test failed; the test for the Left Adornment Spacing Kibbitzer never gave any trouble. What was wrong?

The problem, it turns out, was a bad C++ copy constructor for LayoutInfo objects — the objects that keep track of layout details in the map view. LayoutInfo has a ton of instance variables — it’s really just a bundle of data and nothing else. One of those details is the size of the text for drawing the title.

Now, the title of an adornment is drawn at the top of the adornment, and it's handy to have the Top Adornment Spacing Kibbitzer try to reserve the top area of the adornment for the title. That’s reasonable! But, somehow or other, the text size got dropped from the copy constructor, and the kibbitzer happens to use the copy constructor when it’s deciding whether to wake up. The test expects the text size to be zero, and usually it actually was zero, but once in a while the uninitialized field would hold a garbage value and the kibbitzer would fail to fire because the adornment was drawing the title in billion-point Helvetica. The baseline is somewhere beneath the basement, and the kibbitzer goes back to sleep because there's no way this can be anything it can fix.
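The failure mode is easy to reproduce in miniature. The sketch below is an illustration under stated assumptions, not the real LayoutInfo: the struct and field names are invented, and the fields carry default member initializers so that the forgotten member comes out as a predictable 0 instead of garbage. In the bug described above there was evidently no such safety net, which is why the copied value was only usually zero.

```cpp
#include <cassert>

// Hypothetical, much-reduced stand-in for LayoutInfo; field names are invented.
struct HandCopiedLayout {
    double top = 0.0;
    double height = 0.0;
    double titleTextSize = 0.0;  // default initializer: a forgotten copy yields 0

    HandCopiedLayout() = default;

    // Hand-written copy constructor that forgets titleTextSize. Because the
    // member is missing from the initializer list, its default initializer
    // applies and the copy silently loses the original's value.
    HandCopiedLayout(const HandCopiedLayout& other)
        : top(other.top), height(other.height) {}
};

// Same data, but with no user-declared copy constructor, so the compiler
// generates one that copies every member, including any added later.
struct DefaultCopiedLayout {
    double top = 0.0;
    double height = 0.0;
    double titleTextSize = 0.0;
};

// Sets a title size, copies the object, and reports what the copy holds.
double titleSizeAfterHandCopy(double size) {
    HandCopiedLayout original;
    original.titleTextSize = size;
    const HandCopiedLayout copy(original);
    return copy.titleTextSize;
}

double titleSizeAfterDefaultCopy(double size) {
    DefaultCopiedLayout original;
    original.titleTextSize = size;
    const DefaultCopiedLayout copy(original);
    return copy.titleTextSize;
}

// The hand-written copy drops the value; the compiler-generated copy keeps it.
void runCopyChecks() {
    assert(titleSizeAfterHandCopy(12.0) == 0.0);
    assert(titleSizeAfterDefaultCopy(12.0) == 12.0);
}
```

For a class that is just a bundle of data, letting the compiler generate the copy constructor removes the chance of leaving a field out, and a test that sets a nonzero text size before copying would catch the omission every time instead of once a week.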

Lessons: