Mark Bernstein: A Tricky Feature

A Tricky Feature

When you copy some text in Tinderbox 7, you copy the text and its styles but not the text links. Being able to copy and paste text links along with the text seems a simple-enough request, and in fact it will be part of Tinderbox 7.5. One tester recently asked the obvious “what took you so long?” question: why was this difficult? Here’s the story.

The macOS text system is built around a class cluster of attributed strings that represent styled text. Clickable text is marked up with a special attribute that tells the system to call back to the view’s delegate when it’s clicked, passing an object that represents the link destination. This is typically a string or a URL object, but can be other kinds of objects too. Tinderbox annotates text links with a special Tinderbox Link object.

This engages one of the oldest and most vexed implementation issues in hypertext system design: whether links should be represented as embedded in the text or as external references to the text. Embedded links make editing easier; for example, if you insert a character in an HTML file, the ... tags that represent links remain correct. If the links are kept in an external table, you need to update that table whenever the text changes. On the other hand, if you want to know how many links there are on a page, you need to scan the entire HTML file; for the external table, you just look at the table’s size. Good writing on this ancient controversy includes Ted Nelson, Berners-Lee & Cailliau, and a classic paper by Southampton’s Hugh Davis that finally settled the matter.
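To make the bookkeeping cost of an external table concrete, here is a minimal sketch in Python. It is not Tinderbox code: the `Link` record, the `insert_text` helper, and the `note:manual` target are all invented for illustration. It shows the table repair that an external representation needs on every insertion, and why counting links is then trivial.

```python
from dataclasses import dataclass


@dataclass
class Link:
    start: int   # offset of the first linked character
    length: int  # number of linked characters
    target: str  # hypothetical destination identifier


def insert_text(text, links, pos, inserted):
    """Insert `inserted` at `pos` and repair every range in the external table."""
    n = len(inserted)
    for link in links:
        if pos <= link.start:
            # The insertion falls before the link: the whole range slides right.
            link.start += n
        elif pos < link.start + link.length:
            # The insertion falls inside the link: the range grows.
            link.length += n
        # Otherwise the insertion is after the link and nothing changes.
    return text[:pos] + inserted + text[pos:]


text = "See the manual for details."
links = [Link(start=8, length=6, target="note:manual")]

text = insert_text(text, links, 0, ">> ")
link = links[0]
linked = text[link.start:link.start + link.length]  # still covers "manual"
link_count = len(links)  # no scan of the text needed
```

With embedded links the repair loop disappears, because the markup travels with the characters; the price is that `link_count` would require scanning the text.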

Tinderbox represents links externally, but embeds copies of the links in the text in order to obtain mouse-click behavior. After you've edited a note, Tinderbox scans the text and updates the link table. This gets tricky because

people sometimes edit long texts, so scanning the whole text can involve a good deal of work;
some documents have lots of links, and some link table tasks scale with the number of links; and
some people — not I! — type rapidly, imposing a tight performance requirement.

So, there's a bit of tricky lazy evaluation and parallel processing involved here.
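The scan itself can be pictured as a single pass over the attribute runs of the text. The sketch below is only a model under stated assumptions: it represents styled text as a list of `(text, target)` pairs, which is a stand-in for attributed-string runs and not the macOS API, and the function name and targets are hypothetical. It leaves out the lazy evaluation and parallel processing entirely.

```python
def rebuild_link_table(runs):
    """Walk the runs once and return {target: (start, length)}.

    `runs` is a list of (text, target) pairs, where target is None for
    unlinked text. The cost is proportional to the number of runs.
    """
    table = {}
    offset = 0
    for run_text, target in runs:
        if target is not None:
            table[target] = (offset, len(run_text))
        offset += len(run_text)
    return table


runs = [
    ("Read ", None),
    ("Nelson", "note:nelson"),
    (" and ", None),
    ("Davis", "note:davis"),
]
table = rebuild_link_table(runs)
```

Even this toy version touches every run after each edit, which is why long texts, many links, and fast typists together make the real thing tricky.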

The problem is that, while copy/paste understand the common cases where the link is described by a string or a URL, they don’t handle the case where you associate a different kind of object with a link. This fact is not documented, and in fact the documentation is internally inconsistent. By placing myself in the shoes of the author of these macOS classes, I developed several hypotheses to explain the failure and coded around them. For example, the text system might need to know that it could safely make extra copies of the link annotation, which means my special Tinderbox link object would need to conform to NSCopying. This turned out to be a dead end, but it took some time. The upshot is that Tinderbox links are silently stripped before they are copied to the clipboard.

If the system expects an NSString or an NSURL as the link object — some but not all of the documentation suggests that it does — then we might make our own subclass that adds a Tinderbox Link object to the string or NSURL. This would be great — but you are not supposed to subclass NSString or NSURL. (Everyone knows this — I knew it once, too — but that didn't stop me from coding up an implementation before I remembered.)

An alternative approach was to add a custom style attribute that describes the Tinderbox link as a string; when we update links for a text pane, we’d add both the standard NSLinkAttributeName annotation to get mouse click behavior and our custom annotation, too. Copying to the clipboard would strip out our nonstandard link annotations but leave the custom annotation intact; when pasting, we'd recognize the custom annotations and recreate the standard NSLinkAttributeName annotations. Easy and elegant! I was neat, clean, shaved and sober, and I didn't care who knew it.

But I was wrong. Not only does copy: clean out our non-standard NSLinkAttributeName objects, it also ignores our custom annotations. This is a side-effect of the way macOS coerces attributed strings to the pasteboard by converting them to RTF or RTFD; those custom annotations don’t work for RTF.

So, what we do is define an entirely new pasteboard format that contains an XML representation of all the copied text links. Copy does its usual work, then adds the additional data for the links. Paste does its usual work, then grabs this hunk of XML, parses it, validates the results, and uses those results to make new text links. Voila!
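The copy and paste halves might look roughly like the following Python sketch. Everything specific here is an assumption made for illustration: the `links` and `link` element names, the `start`, `length`, and `target` attributes, and the validation rules are invented, and the actual Tinderbox format is not described in this post. The sketch also ignores the pasteboard itself and deals only with the XML payload.

```python
import xml.etree.ElementTree as ET


def links_to_xml(links):
    """Serialize (start, length, target) triples, with offsets relative to the copied text."""
    root = ET.Element("links")
    for start, length, target in links:
        ET.SubElement(root, "link", start=str(start), length=str(length), target=target)
    return ET.tostring(root, encoding="unicode")


def links_from_xml(payload, pasted_length):
    """Parse the payload and keep only links that fit inside the pasted text."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return []  # malformed payload: paste the text without links
    result = []
    for node in root.iter("link"):
        try:
            start = int(node.get("start", ""))
            length = int(node.get("length", ""))
        except ValueError:
            continue  # skip entries with missing or non-numeric offsets
        target = node.get("target")
        if target and start >= 0 and length > 0 and start + length <= pasted_length:
            result.append((start, length, target))
    return result


copied = "the manual"
payload = links_to_xml([(4, 6, "note:manual")])
restored = links_from_xml(payload, len(copied))
```

Validation matters because the payload may come from another document, or from a pasteboard whose text no longer matches the link data; anything that fails the checks is dropped and the text is pasted without that link.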

Well, not quite. Whenever the text is changed, we do a quick check to make sure that you haven't deleted all the text for a text link. If you have, we need to delete the link, too. And — guess what! — adding the new link annotation counts as changing the text. So, we’d create a new link when pasting, and the system would helpfully delete the new link before we finished adding it. Remember that lazy evaluation and parallel processing? That’s where this scan happens.

This was obvious in retrospect, but what I observed was a Twilight Zone bug: the automated tests worked but manual tests crashed. (The automated tests did things faster than the scavenging system, while the manual tests let the scavenging system catch up.)

The whole business was, in short, a classic example of a task that ought to have been easy, that any reasonable customer would assume to be easy. The final implementation is really not very large or complex. Nevertheless, it required an inordinate amount of work. This is usually the mark of bad code, but I really don’t see what would have made this easier.
