October 7, 2022
MarkBernstein.org
 

Leaks

Here’s an interesting example of software development in practice.

A customer was experiencing intermittent Tinderbox crashes. I asked to see the crash logs and any future crash logs. This question sometimes clears up a problem, because sometimes the crashes stop! But not this time; over a few days, we accumulated a few logs.

The crash logs varied a bit, but all seemed to involve export. “Did it crash while you were exporting?” The user didn’t think so. (That was a head scratcher. It’s not uncommon for users who experience a crash to have no idea what they were doing before the crash, but Export is fairly unusual and sometimes takes a while because you might be building a site with hundreds of pages. You’d probably remember if you were waiting for Tinderbox to finish and instead it quit. Hmmmm.)

A number of the crash logs revealed a crash in TbxProgressBar. That was interesting, because it's not usually a place where Tinderbox has trouble. I studied the code, and there’s a reason for that: as far as I can make out, TbxProgressBar simply cannot crash. I bullet-proofed the code, which was already bullet-proofed. I wrote some tests. I hoped for the best. No luck!

This left some twilight zone possibilities. Was something fouling up the TbxProgressBar object? I remember one pesky bug, ages ago, that was tracked down to a faulty memory chip right where one object tended to wind up. Could I be looking at the wrong version of the TbxProgressBar code? Was this a time-bomb crash, somehow planted by code that ran earlier? (Time bombs used to be really common, back before OS X. I haven't seen one in years, but who knows?)

After far too long, I asked the customer to take a look at Activity Monitor. What was Tinderbox’s memory footprint? The footprint was huge. Now, worrying about Activity Monitor is often pointless: Tinderbox uses a lot of memory because you have lots of memory to use. You have lots of memory and not enough time. This was said to be a complicated document, but nonetheless, the footprint was too big.

Finally, I wired the document up to the profiler, and the answer emerged at once: a memory leak in ExportPathAttribute. This is a seldom-used, read-only attribute that tells you where this page will wind up if it’s exported to disk. For years, each use of ExportPathAttribute has leaked: not much, but a drip. If you were editing a weblog and then exporting to your server, well, you might have wasted some kilobytes, but you wouldn’t notice that at all.
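For readers who want to see the shape of a leak like this, here is a minimal C++ sketch. It is not Tinderbox's code: `PathBuffer`, `leakyExportPath`, `exportPath`, the `liveBuffers` counter, and the `/site/` path are all hypothetical names invented for illustration. The point is only that a read-only accessor can return a perfectly good answer while leaving a small allocation behind on every call.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Counts PathBuffer objects that are currently alive (illustration only).
std::size_t liveBuffers = 0;

// Hypothetical scratch object an accessor might build while computing a path.
struct PathBuffer {
    std::string text;
    explicit PathBuffer(std::string t) : text(std::move(t)) { ++liveBuffers; }
    ~PathBuffer() { --liveBuffers; }
    PathBuffer(const PathBuffer&) = delete;
    PathBuffer& operator=(const PathBuffer&) = delete;
};

// Leaky version: the result is correct, but the buffer is never deleted.
std::string leakyExportPath(const std::string& noteName) {
    PathBuffer* buffer = new PathBuffer("/site/" + noteName + ".html");
    return buffer->text;  // returns a copy; the buffer itself is orphaned
}

// Repaired version: unique_ptr frees the buffer when the function returns.
std::string exportPath(const std::string& noteName) {
    auto buffer = std::make_unique<PathBuffer>("/site/" + noteName + ".html");
    return buffer->text;
}
```

Both functions return the same string, which is why a leak of this kind can survive for years: nothing a caller can observe is wrong until memory runs short.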

In the last year or so, however, a new approach to Tinderbox notes has become popular; people write their notes in Markdown or HTML, and when they read their notes, they use the Preview pane. This Preview-led Tinderbox isn’t what I’d designed, and it sometimes feels like Obsidian-in-Tinderbox or something like that, but in skilled hands it can be pretty cool. And this customer was really skilled!

So, we had a complex Tinderbox document, with lots of actions and lots of agents, that was spending a lot of time in Preview. That meant Tinderbox was responding to changes from rules and agents and running a new Preview every few seconds. Preview was reaching out to rebuild a complex page by assembling lots of individual notes in a big overview. A few kilobytes every 3 seconds is a few megabytes every hour. Leave that cooking for a day or two, and hilarity is bound to ensue.
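As a back-of-the-envelope check on how a slow drip adds up, the following C++ sketch computes the total leaked over a given time. The function name `leakedBytes` and the figures plugged into it (4 KB per preview, one preview every 3 seconds) are assumptions chosen for illustration, not measurements from the document in question.

```cpp
#include <cstdint>

// Total bytes leaked after `seconds`, assuming a fixed leak per preview
// and a fixed interval between previews. Partial intervals are ignored.
constexpr std::uint64_t leakedBytes(std::uint64_t bytesPerPreview,
                                    std::uint64_t secondsPerPreview,
                                    std::uint64_t seconds) {
    return (seconds / secondsPerPreview) * bytesPerPreview;
}
```

With those assumed numbers, `leakedBytes(4096, 3, 3600)` is 4,915,200 bytes (roughly 5 MB in an hour) and `leakedBytes(4096, 3, 86400)` is 117,964,800 bytes (well over 100 MB in a day). If each preview touches the leaking attribute once per assembled note, the per-preview figure, and so the total, would be correspondingly larger.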

Why TbxProgressBar? Because Tinderbox updates the progress bar a lot during preview. Too much, clearly, but again: if preview is fast enough, who cares if it’s updating a hidden progress bar and doing extra work? But that’s where we often were when we wanted to reach for some memory and the system said, “More? You want more?”

And why did we have a leak in the first place? A decade or more back, Tinderbox adopted an optimistic approach to concurrency: agents ran in the background, and most of the time everything was OK. But “most of the time” isn’t really good enough, and perhaps four years ago I started to put this on a sounder basis. That meant taking a lot more care to make sure that we weren't writing a value in one thread at the same time we were reading it in another thread. That process has tests for all the common and tricky attributes like $Text, and most of the attribute classes are designed to handle everything themselves. But, somehow, ExportPathAttribute never got the memo.
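The general idea of not writing a value in one thread while another thread reads it can be sketched in C++ as follows. This is a generic illustration under stated assumptions, not a description of how Tinderbox's attribute classes actually work: the class name `GuardedAttribute` and its `set` and `get` methods are hypothetical.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical attribute value shared between a UI thread and background agents.
class GuardedAttribute {
public:
    // Writers hold the lock, so a reader never sees a half-written string.
    void set(std::string value) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = std::move(value);
    }

    // Readers hold the same lock and receive their own copy. The copy is an
    // ordinary std::string, so it is released automatically by the caller.
    std::string get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;  // mutable so that get() can stay const
    std::string value_;
};
```

Returning a copy by value costs a little time on each read, but it means ownership is never in doubt: there is no shared buffer that some code path might forget to free.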

It simply didn’t matter, until it did.