Collaborative Work and Openness

I hope you will forgive me for letting this blog start the new year with a rant. But this topic has bothered me for some time now, so I hope you will bear with me.

Wikis are well known today, even though most people only know them from Wikipedia, the biggest wiki there is. Most will know that Wikipedia is community driven and a "collaboratively edited encyclopedia to which you can contribute". Most will agree that this concept is, or at least was, radical at the time. But most will also agree that it (somehow) works.

Wikipedia is not the only online collaborative project; in fact, many other projects work that way. Among those I would count the Japanese and Chinese dictionaries EDICT, CEDICT and HanDeDict, to name a few. There are, though, various degrees to which the collaborative concept is employed or enforced. And one particular implementation has me up in arms: the Chinese-German dictionary HanDeDict.

The HanDeDict team did a very good job in the past, creating a dictionary under a Creative Commons license out of nothing. The license was well chosen; too many other projects try to come up with their own wording, which leads nowhere. A bootstrapping process made sure you would find good words right from the beginning. Their concept of having the online folks help out was pretty forward-looking, considering how many people in some parts of academia view this "Internet" thing.

I am not sure where the project stands today, though. The early discussion board was moved to Google Groups because of spam. The group is moderated and posts seem to go through only seldom; it is practically dead. Large deletions of entries that were copyright infringements added by a single user have left many basic entries missing. Communication to the outside is basically nonexistent, and criticism, from my point of view, is only marginally considered.

In particular I remember adding some special entries whose readings have very peculiar forms. I am no longer sure whether it was ê/ei or n/ng, but I remember adding a bunch of entries for one of these forms, if not all. Today a short SQL query comes up with these sad remains:

sqlite> select * from HanDeDict where Reading like 'ê%' limit 30 offset 0;
sqlite> select * from HanDeDict where Reading like 'ei_' limit 30 offset 0;
誒|诶|ei1|/He! Hey! (u.E.) (Int)/|
誒|诶|ei1|/Hey! Hi! He! (u.E.) (Int)/|
sqlite> select * from HanDeDict where Reading like 'n_' limit 30 offset 0;
sqlite> select * from HanDeDict where Reading like 'ng_' limit 30 offset 0;
嗯|嗯|ng2|/Interjektion: erstaunt fragend - Hä? (u.E.) (Int)/|1
哼|哼|ng5|/(drückt Unzufriedenheit oder Zweifel aus) (u.E.) (Int)/|2
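
For anyone who wants to repeat this check from a script, here is a minimal sketch in Python using the standard sqlite3 module. Only the table name HanDeDict and the column Reading come from the queries above; the other column names (Traditional, Simplified, Translation) are my own assumptions, the trailing column of the output is left out, and the in-memory table merely holds the four rows shown above. In SQLite's LIKE, `_` matches exactly one character and `%` matches any sequence of characters.

```python
import sqlite3

# Rows copied from the query output above. The column names other than
# "Reading" are assumptions made for this sketch.
SAMPLE_ROWS = [
    ("誒", "诶", "ei1", "/He! Hey! (u.E.) (Int)/"),
    ("誒", "诶", "ei1", "/Hey! Hi! He! (u.E.) (Int)/"),
    ("嗯", "嗯", "ng2", "/Interjektion: erstaunt fragend - Hä? (u.E.) (Int)/"),
    ("哼", "哼", "ng5", "/(drückt Unzufriedenheit oder Zweifel aus) (u.E.) (Int)/"),
]


def find_by_reading(conn, pattern, limit=30):
    """Return up to `limit` entries whose Reading matches the LIKE pattern."""
    cursor = conn.execute(
        "SELECT Traditional, Simplified, Reading, Translation "
        "FROM HanDeDict WHERE Reading LIKE ? LIMIT ?",
        (pattern, limit),
    )
    return cursor.fetchall()


# Build a throwaway in-memory database instead of opening the real file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HanDeDict "
    "(Traditional TEXT, Simplified TEXT, Reading TEXT, Translation TEXT)"
)
conn.executemany("INSERT INTO HanDeDict VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

# Count the hits for each of the four patterns used above.
counts = {p: len(find_by_reading(conn, p)) for p in ("ê%", "ei_", "n_", "ng_")}
for pattern, count in counts.items():
    print(pattern, count)
```

Passing the pattern as a bound parameter rather than pasting it into the SQL string keeps the helper safe to reuse with arbitrary input.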

For me it is sad to see my contribution lost, with no way to get it back easily. I took the time to consult two dictionaries to fill in the missing information. And I am particularly sad as I lack the tools to research this removal of content: my past proposal to use a wiki-like editing tool was turned down with the words "the Wikipedia principle does not work for a dictionary". Well, Wikipedia would at least allow me to find out why my contribution was lost.

The fact that contribution to HanDeDict is not open makes it impossible for outsiders to judge and control content, coordinate their own work, or develop a sense of responsibility for the project's content.

I already considered forking this project by moving stuff to a MediaWiki installation, but I have to admit that this task has moved pretty far behind other, more urgent things on my list. I can currently only hope that either the HanDeDict people come back to revive the project or somebody else is willing to fork it. If you decide to, I'll be happy to help.

How does this

How does this process work for EDICT and co.? I can imagine that a moderated system works better for dictionaries than a free-for-all wiki, but the Wikipedia people used to think so too, and it was wrong...

EDICT/CEDICT

I am not familiar with EDICT; it may be that entries are added manually and go through a review there. CEDICT is fairly conservative. A few people have author rights, but anyone can submit entries (I believe mail is the standard way). What I find important, though, is that CEDICT shows diffs, so you can see what was changed. HanDeDict has a permission scheme (anonymous, registered, verified contributor); anonymous users may only change unverified entries, but then they have full access to the unverified part and can therefore also delete such entries. The burden of reviewing then rests on a few people, and that has evidently failed again and again. Wikipedia, after all, does not only mean "everyone edits" but also "everyone sees", by means of good tooling: history, diffs, watchlist. For my part, I like to use the watchlist to improve the quality of my own work. If someone changes my changes, I see what was missing, or can discuss it with the user if in doubt.
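
To illustrate the kind of visibility I mean, here is a minimal sketch in Python of a diff between two snapshots of a dictionary dump. It is not how CEDICT or MediaWiki actually compute their diffs; it simply treats every non-empty line as one entry and compares the two sets. The function names (parse_entries, diff_snapshots) and the two tiny snapshots are made up for the example, with the entry lines borrowed from the query output in the post.

```python
def parse_entries(text):
    """Return the set of non-empty lines in a dump, one entry per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def diff_snapshots(old_text, new_text):
    """Return (removed, added) entries between two snapshots, each sorted."""
    old_entries = parse_entries(old_text)
    new_entries = parse_entries(new_text)
    removed = sorted(old_entries - new_entries)
    added = sorted(new_entries - old_entries)
    return removed, added


# Hypothetical snapshots: one entry disappears, another one is new.
old_snapshot = """
誒|诶|ei1|/He! Hey! (u.E.) (Int)/|
誒|诶|ei1|/Hey! Hi! He! (u.E.) (Int)/|
"""

new_snapshot = """
誒|诶|ei1|/Hey! Hi! He! (u.E.) (Int)/|
嗯|嗯|ng2|/Interjektion: erstaunt fragend - Hä? (u.E.) (Int)/|1
"""

removed, added = diff_snapshots(old_snapshot, new_snapshot)
for entry in removed:
    print("-", entry)
for entry in added:
    print("+", entry)
```

Even a crude listing like this, published alongside every release, would let a contributor see that an entry is gone; a real history would additionally record who removed it and why.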