Wikipedia’s problems are serious. Some of them are subtle. Some are simply caused by a jury-rigged system of governance, but others are deeply embedded in the Wiki Way and the ideology of crowdsourcing. Let’s look at how hard things can become.
Categories are a good idea that lies at the heart of efforts to build a semantic Web. They’re also very helpful for finding stuff, and finding stuff is one of the core problems of building a reference work that works. It’s easy to imagine that faulty categories in Wikipedia, like the notorious Filipacchi mess in which women writers were being systematically deleted from “American novelists” and relegated to “American women novelists,” are the result of idiots who weren’t paying attention.
But it’s harder than that. Let’s take an example that has caused endless grief, that appears to be intractable, and yet must be faced somehow.
Jew tagging. It’s not unreasonable to mention religion in some biographies. This is obviously true of notable religious leaders and philosophers and for people who are notable because of their religious beliefs. I think it’s also clear for people whose public role was shaped by their religious affinity. Hank Greenberg, Hall of Fame first baseman for the Tigers, was a Jew and that was important to understanding his position in American life. And sometimes it’s simply interesting and fun to know that Cary Grant, Theda Bara, Winona Ryder, Edward G. Robinson and Lauren Bacall were or are Jewish.
Perhaps as a result of this, some Wikipedia editors systematically seek out every mention of a person who might be Jewish in order to mark them as Jews. This is extremely creepy.
Windows into men’s souls. If we are going to label all the Jews, one presumes we will also want to label adherents to other religions. But some religions require belief; how can we know what someone once believed? We’re left to rely on the testimony of “reliable published sources,” but often these sources have no idea, either.
Dirty Data. A misfiled page is a lost page; it’s important to classify correctly. So, all we need to do is to decide who is and isn’t a Jew, a question that has perplexed wise people for millennia. Is the Pope Polish? Is Aix-la-Chapelle French? Is Kipling a 20th-century author? Is Pluto a planet? Borges nailed this one: your categories are bound to be arbitrary and incomplete, and you still won’t know where things belong.
Cui Bono? If we’re building indexes and taxonomies to provide access to knowledge, and we should, we need to make lists. Making a comprehensive list of Jews raises awful echoes. So should we skip the Jews? That raises echoes, too. Should we stop categorizing religions? Then we lose information we need to understand Hank Greenberg and that’s fun to know about Cary Grant, and we have the same sort of headache when we use race, or ethnicity, or language, or nationality.
The crazies. If you watch a page about any famous Jewish person — living, dead, or even a fictitious character — you’ll see the crazies from time to time.
“Seventeen-year-old James Gatz, hailing from rural San Diego, California, where he was born to a poor Jew farming family in 1890, despises the limitations of poverty so much he drops out of Stripclub school.”
Some editors, for example, try to change the first sentence of every possible Democratic politician to read, “____ is a Jewish-American politician”. The crazies often get reverted quickly, though in aggregate this takes an enormous amount of work and distorts the entire Wikipedia administrative process. But putting Rahm Emanuel and George Soros on your watch list for a few months is bound to alarm you about resurgent anti-Semitism.
I suspect that some of the people most interested in tagging all the Jews in Wikipedia dream of a time when those lists might be of service to the police. Others might be interested in establishing whom Israel can exclude, or keeping their rivals outside the pale. Some simply want to recruit for their team. It’s a sordid mess.
Then, there are people with axes to grind, people seeking an edge. Did someone’s parents have a connection to the Irgun? Then they’re “the son of terrorists”. Did someone grow up in occupied Europe? Then they’re “collaborators”. Did someone emigrate from a disputed territory – someplace in the Balkans, for example? Be prepared for endless turf wars between ethnic factions of whom you’ve never heard. Did someone once allude in print to their silly schoolboy prank? Expect a full section of encyclopedic coverage.
Some of this is silliness, but some is consequential. Just walk into the Wikipedia bar and start a nice conversation about whether Turkey is European. Remember to duck.
The pajamas are worse than the crazies. Many are dedicated: they have lots of time and they have a cause and they know their right-wing cause is true. Plenty, like the just-banned Qworty, have years of experience gaming the administrative system. One Qworty can, over time, plant bad categories or remove good categories from hundreds of pages. Many of the pajamas are terribly eager to justify things like Creation Science, things about which they care deeply and which most people find idiotic distractions. So, bit by bit, the tinfoil hats keep planting cruft throughout Wikipedia. And one by one, sensible editors stop trying.
Crowds make things worse. Lots of conventional taxonomies — from natural languages to library classification schemes — use conventions that seem absurd. In Japanese, for example, I understand that different words are used to count different kinds of things. But you count rabbits in the same way as sparrows; rabbits are counted like birds.
The Dewey Decimal System shelves books geographically, allocating space according to the 19th-century canon. This causes little harm because we know it’s just a convention. Ancient Greek Literature gets 880-888 (and modern Greek gets 889), while all of East and Southeast Asia for all of time gets crammed together in 895. It’s silly, but we know it’s just a convention that some guy cooked up one afternoon and just as meaningless as “rabbits are birds.”
But if you start to believe the wisdom of crowds, it’s not a convention any more. And if you’re confident that the sensible people will eventually bring good sense and taste to the table, I’ll remind you again of that little unpleasantness we had during the short 20th century. Crowds can be spectacularly unwise; the great state of Wisconsin, home of Progressivism and land of La Follette, sent us Joe McCarthy.
Wikipedia classifications don’t evolve. A combination of primitive tools, uneducated classifiers, and plentiful ideologues means that it’s very hard to change schemes and, when a change (like “American women novelists”) doesn’t work, it’s very hard to undo.
It takes too long. The world has idiots, ideologues, tinfoil hats and pajamas aplenty. Each of them is eager to plant their harpoon wherever they can. Try to stop them, and you’ll spend all your time at AN/I and ArbCom and Lord Knows Where trying to tell it to the judge. It can take years to chase away even the most egregious offenders, and the true believers can, once banned, be back the next day with a fresh account from a new ISP. Sure, if they’re insane they’ll grind the same axe and be caught, but any intelligent troll will simply find a new harpoon.
The appetite of the tinfoil hats and the crazies to indulge in anti-Semitic claptrap is alarming. Wikipedia’s tolerance for talented trolls can be great, especially if the trolls know how to frame their case.
There’s plenty of good in Wikipedia. But you can’t trust it – not without checking. The good is constantly under attack, and the amount of work required simply to avoid letting an article deteriorate increases with the interest in that article. There are lots of routes to enforce policies, but they all consume time and they all are slanted heavily in favor of practiced hands with plenty of time, which is to say that the pajamas and the crazies are likely to prevail as long as they keep their heads. The ones who don’t are often so consumed with rage over esoterica (the use of the digamma is one fellow’s bête noire) that they’re easily smoked out. Like Qworty, a sane troll should be able to persist for years.
So we’re not going to get a useful classification system from Wikipedia. Not even for access to the Wiki, but certainly not for the semantic Web.
More urgently, the survival of the Open Web might depend on convincing Google and Bing to consider deweighting Wikipedia pages in some circumstances. I’d suggest looking at the edit history; if a page has lots of recent edits, put it in the penalty box and demote its page rank until things calm down. That would limit the incentive for the crazies and the pajamas, and perhaps help cut down the funding I expect they receive from the right-wing noise machine.
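To make the penalty box concrete, here is a rough sketch of the idea in Python. Everything in it is an assumption for illustration: the 30-day window, the threshold of 20 edits, the 0.5 multiplier, and the function names are all made up, and nothing here reflects how Google or Bing actually score pages. It only shows the shape of the rule: count recent edits, and demote the page while the count stays high.

```python
from datetime import datetime, timedelta

# Hypothetical tuning knobs; none of these values come from a real search engine.
WINDOW = timedelta(days=30)   # assumed definition of "recent"
MAX_RECENT_EDITS = 20         # assumed edit count that triggers the penalty box
PENALTY = 0.5                 # assumed multiplier applied while penalized


def recent_edit_count(edit_times, now):
    """Count edits whose timestamps fall within WINDOW before `now`."""
    cutoff = now - WINDOW
    return sum(1 for t in edit_times if cutoff <= t <= now)


def adjusted_score(base_score, edit_times, now):
    """Demote a page's score while its recent edit count exceeds the threshold."""
    if recent_edit_count(edit_times, now) > MAX_RECENT_EDITS:
        return base_score * PENALTY
    return base_score


if __name__ == "__main__":
    now = datetime(2020, 1, 1)  # arbitrary reference date for the demo
    # A page last edited months ago, and a page edited every six hours lately.
    quiet = [now - timedelta(days=90 + i) for i in range(40)]
    contested = [now - timedelta(hours=6 * i) for i in range(40)]
    print(adjusted_score(1.0, quiet, now))      # stays at full weight
    print(adjusted_score(1.0, contested, now))  # demoted while the edit war runs
```

Because the count only looks at a sliding window, the penalty lifts by itself once the edits age out, which is the "until things calm down" part. A real system would also have to avoid punishing pages that are busy for good reasons, such as breaking news.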