Archive for November, 2006

Orthography development with WeSay

Wednesday, November 8th, 2006

John Duerksen, who is working with a cluster of Bantu languages, gave us a load of ideas for how WeSay might fit into their process.  I have to write up my notes anyhow, so I’ll do it here in case it sparks discussion.

John helped me think about a question I was asked here in Thailand: “How do you start on a dictionary before you have a developed orthography?”  His answer seems to be:

  1. collect some words in whatever orthography is at hand
  2. use those in the analysis needed to get a working orthography
  3. go back and fix things up
  4. go for broke adding lots of words in the new orthography

As a first step in developing an orthography and promoting literacy in these languages, John has prepared a 1,700-word word list. It comes with Swahili, English, French, and Portuguese glosses, as well as semantic domain tags.

[Feature request: multiple gloss language word lists]

[Feature request: wordlists can contain pre-assigned semantic domains]
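To make these two requests a little more concrete, here is a minimal sketch of what a word-list item carrying several gloss languages and pre-assigned semantic domains might look like. Everything here (`WordListItem`, `prompt_for`, the language codes, the domain label) is hypothetical and for illustration only; it is not how WeSay actually stores word lists.

```python
from dataclasses import dataclass, field


@dataclass
class WordListItem:
    # Hypothetical model: glosses keyed by language code ("sw", "en", "fr", "pt").
    glosses: dict[str, str]
    # Semantic domains assigned in the word list before elicitation starts.
    semantic_domains: list[str] = field(default_factory=list)


def prompt_for(item: WordListItem, preferred: list[str]) -> str:
    """Return the gloss in the first preferred language the item has."""
    for lang in preferred:
        if lang in item.glosses:
            return item.glosses[lang]
    raise KeyError("no gloss in any preferred language")


water = WordListItem(
    {"sw": "maji", "en": "water", "fr": "eau", "pt": "água"},
    semantic_domains=["Water"],  # illustrative domain label
)
print(prompt_for(water, ["pt", "en"]))
```

With one list carrying all four gloss languages, the same word list could serve a Portuguese-speaking team and a French-speaking one just by changing the preferred-language order.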

Native speakers fill in the equivalent of each word, using the Swahili writing system.  The result will have lots of errors and problems, because the Swahili writing system can’t really represent all the sounds of these languages.  But that’s ok; this is part of helping people through the process of discovering why they need their own orthography.  The resulting words are printed onto cards, which are then used in an orthography-development workshop.

For nouns, they collect both the singular and plural forms.  This is important because the associated prefixes bring out orthography issues.  For verbs, they collect the infinitive form.

[Feature request: need a way to collect various forms and mark which features/paradigm slot each form corresponds to.]

[Feature request: may want a way to encourage elicitation of the desired form]
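One possible shape for these two requests is sketched below: each form is stored under the paradigm slot it fills, and a required-slots table per part of speech tells an elicitation task which forms are still missing. The slot names, the `Entry` class, and the Swahili stand-in example (kitabu/vitabu, "book/books") are assumptions for illustration, not part of WeSay.

```python
from dataclasses import dataclass, field

# Hypothetical table: which paradigm slots to elicit for each part of speech.
REQUIRED_SLOTS = {
    "noun": ["singular", "plural"],
    "verb": ["infinitive"],
}


@dataclass
class Entry:
    part_of_speech: str
    # Maps paradigm slot name -> the form the speaker gave for that slot.
    forms: dict[str, str] = field(default_factory=dict)

    def missing_slots(self) -> list[str]:
        """Slots this entry should have, in table order, but does not yet."""
        wanted = REQUIRED_SLOTS.get(self.part_of_speech, [])
        return [slot for slot in wanted if slot not in self.forms]


book = Entry("noun", {"singular": "kitabu"})
print(book.missing_slots())  # the task would now prompt for the plural
```

Keeping the slot label next to each form means the singular/plural prefix pairs stay identifiable later, when they are pulled out for orthography work.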

Later, someone will have to go back through the wordlist and fix the spelling of each word so that it matches the orthography they’ve developed.

[Feature request: support some way of identifying the "spelling status" of each word, and a task for checking/modifying those words which need it.]
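A minimal sketch of the "spelling status" idea, assuming a simple two-state flag per word: words typed in the interim Swahili-based spelling start out provisional, a checking task lists only those, and confirming a word records the corrected spelling. The names (`SpellingStatus`, `Word`, `needs_checking`, `confirm`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SpellingStatus(Enum):
    PROVISIONAL = "provisional"  # typed in an interim (e.g. Swahili-based) spelling
    CONFIRMED = "confirmed"      # checked against the newly developed orthography


@dataclass
class Word:
    form: str
    gloss: str
    status: SpellingStatus = SpellingStatus.PROVISIONAL


def needs_checking(words: list[Word]) -> list[Word]:
    """The work queue for the checking task: words not yet confirmed."""
    return [w for w in words if w.status is SpellingStatus.PROVISIONAL]


def confirm(word: Word, corrected_form: str) -> None:
    """Record the corrected spelling and take the word off the queue."""
    word.form = corrected_form
    word.status = SpellingStatus.CONFIRMED
```

A real tool would probably want more states (for example "disputed"), but even two are enough to drive a "fix up the spellings" task.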

John mentioned a few other things he’d like to see:

[Feature request: Collect simple texts and then feed that into a task for collecting/glossing new words.]

[Feature request: enable spell checking and "add-to-lexicon" in MS Word using the WeSay Lexicon]

[Feature request: Support multiple forms; identify and link to the root entry]

CTC 2006 Presentation

Wednesday, November 8th, 2006

Whew.  I’m getting too old for international conferences, or at least ones that entail a full week and a 12-hour time shift.  I just returned yesterday from North Carolina, and of course my head is brimming with bloggable things.  But first, here’s a link to the presentation I gave.  Aiming to follow the expert’s advice, there are almost no bullet points, so you’ll find the script more informative than the PowerPoint.  I plan to make a Flash movie of the demo portion and post that.

We didn’t get as much interaction with would-be deployers as we had hoped… mainly because most of the attendees were “computer guys”, not language workers.  But we did get some good ideas from someone working in Africa, and some on-the-spot right-to-left script testing. Thanks Beth!