WeSay News

January 26, 2007

Multi-lingual Option Lists

Filed under: Uncategorized — John Hatton @ 9:45 am

Any dictionary program needs to help the user be consistent when entering fields that have a closed set of values, e.g. parts of speech.

FLEx calls these Possibility Lists; Toolbox has Range Sets. WeSay offers something somewhat different, and yet another name (sorry).

Option Lists are multilingual, and they “fall back” to other languages when there isn’t a value in the language you’ve chosen to display. Here, I’ve set it up to show POS in Thai, but so far only noun and verb have Thai values.
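Here is a minimal Python sketch of that fall-back behavior, assuming each option stores its labels in a dictionary keyed by language code. The `Option` class, the `display_name` function, and the codes "th" and "en" are hypothetical illustrations, not WeSay’s actual classes or file format. Showing the key when no listed language has a label is also an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    """A hypothetical option-list entry, e.g. one part of speech."""
    key: str  # human-readable key stored in the lexicon, e.g. "noun"
    names: dict[str, str] = field(default_factory=dict)  # language code -> label


def display_name(option: Option, languages: list[str]) -> str:
    """Return the option's label in the first language (in order) that has one."""
    for lang in languages:
        label = option.names.get(lang)
        if label:
            return label
    # Assumption: if no listed language has a label, show the key itself.
    return option.key


noun = Option("noun", {"en": "noun", "th": "คำนาม"})
adjective = Option("adjective", {"en": "adjective"})

print(display_name(noun, ["th", "en"]))       # → คำนาม (Thai label exists)
print(display_name(adjective, ["th", "en"]))  # → adjective (falls back to English)
```

The order of the `languages` list is what makes the fallback work. Putting Thai first and English second mirrors the screenshot, where noun and verb appear in Thai and everything else in the next available language.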

A big thing we work at is forcing our programmer minds to give up on always having “good data”. For example, what if you send your dictionary to someone else who has a different PartsOfSpeech OptionList installed? Say you had a POS named “blah blah blah” and then opened the lexicon on a machine that didn’t have that POS. What should WeSay do? Show an error box? Not if we can help it. Require that everything referenced in the lexicon live in one large file? That may be hard to share with colleagues. We’ve chosen to use a human-readable key that indexes into a richer set of multilingual data. If that data is missing, we do the best we can: as this screenshot shows, WeSay still displays the key of the offending item, but marks it in red.
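The “do the best we can” rule can be sketched in a few lines of Python. The idea is that a missing key becomes a flag, not an exception. `Resolved`, `resolve`, and the dictionary layout of the option list are hypothetical names chosen for illustration. The `is_known` flag stands in for whatever the UI uses to draw the item in red.

```python
from dataclasses import dataclass


@dataclass
class Resolved:
    text: str       # what the field displays
    is_known: bool  # False means the UI should flag it (e.g. draw it in red)


def resolve(key: str, option_list: dict[str, dict[str, str]],
            languages: list[str]) -> Resolved:
    """Look up a key stored in the lexicon against the locally installed option list."""
    names = option_list.get(key)
    if names is None:
        # The lexicon uses a key this machine's list doesn't have:
        # no error dialog, just show the readable key and flag it.
        return Resolved(key, is_known=False)
    for lang in languages:
        if names.get(lang):
            return Resolved(names[lang], is_known=True)
    # Known option with no label in any requested language: show the key.
    return Resolved(key, is_known=True)


parts_of_speech = {"noun": {"en": "noun"}, "verb": {"en": "verb"}}
print(resolve("verb", parts_of_speech, ["en"]))
# → Resolved(text='verb', is_known=True)
print(resolve("blah blah blah", parts_of_speech, ["en"]))
# → Resolved(text='blah blah blah', is_known=False)
```

Because the stored value is readable, the degraded display still means something to the user, which would not be true if the lexicon stored an opaque GUID.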

Finally, nothing in WeSay yet really knows about parts of speech; POS is just a custom field. So you could add whatever your language needs instead, like classifiers or noun classes.

Part Of Speech is now available in the default template used by new projects.

Some Nitty-Gritty Details for techies

We are trying to make do with “human readable keys” rather than GUIDs. We use the key to look up stuff about the option, like how to show it in Spanish or what to show for an abbreviation of it. If the list is lost, the key is at least readable.

This is also our foray into “custom fields”, which I’ll blog more about some day. The POS field shown above was defined with this XML:

and the list of options is in a file that looks like this:

January 5, 2007

Multiscribe

Filed under: graphite — Eric Albright @ 3:46 pm

One of the goals of WeSay is to support user interfaces in languages whose scripts require complex shaping. Microsoft Windows can render complex scripts using its shaping engine, Uniscribe. However, some languages, such as Burmese, are not yet supported by Uniscribe. SIL’s Graphite technology handles these scripts, but until now, a programmer had to do a lot of difficult, custom work to enable Graphite support in a Windows application.

Without Graphite, Burmese is all a jumble, showing the underlying characters and their basic forms but without any special contextual forms or reordering.

Burmese shaped without Multiscribe

With a Graphite font and a Graphite shaping engine, we get the correct shapes and orders for the various glyphs.

Sample Burmese shaped with Multiscribe

I spent the last month working on a project we have been calling Multiscribe. With Multiscribe, users can get the benefits of the Graphite shaping engine in their existing applications, including Internet Explorer, Firefox, and even Word.

Uniscribe does not yet support Burmese, but with Multiscribe, WeSay will be able to display Burmese properly in its user interface.

Multiscribe works by wrapping Uniscribe so that whenever an application would have called Uniscribe, the Graphite engine gets a chance to do its work. The Graphite engine handles only Graphite fonts; all other fonts are passed on to Uniscribe. Linux already has similar functionality in the form of PangoGraphite.
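Here is a conceptual Python model of that routing decision. It is only a sketch under stated assumptions. The real Multiscribe intercepts Uniscribe calls inside Windows applications, while here `wrap_shaper`, the stub engines, and the font check are hypothetical stand-ins. The font check is a simple set lookup; a real one would have to inspect the font itself, for instance by looking for Graphite’s own font tables.

```python
from typing import Callable

# A shaping function takes (text, font name) and returns shaped output.
# Here "shaped output" is just a string; real engines produce glyphs and positions.
Shaper = Callable[[str, str], str]


def wrap_shaper(is_graphite_font: Callable[[str], bool],
                graphite_shape: Shaper,
                uniscribe_shape: Shaper) -> Shaper:
    """Return a shaper that sends Graphite fonts to Graphite and all others to Uniscribe."""
    def shape(text: str, font: str) -> str:
        if is_graphite_font(font):
            return graphite_shape(text, font)
        return uniscribe_shape(text, font)
    return shape


# Hypothetical stand-ins for the two engines and the font check.
graphite_fonts = {"Padauk"}
shape = wrap_shaper(
    is_graphite_font=lambda font: font in graphite_fonts,
    graphite_shape=lambda text, font: f"graphite({text})",
    uniscribe_shape=lambda text, font: f"uniscribe({text})",
)

print(shape("abc", "Padauk"))  # → graphite(abc)
print(shape("abc", "Arial"))   # → uniscribe(abc)
```

Because the wrapper has the same signature as the function it wraps, callers cannot tell the difference. That is why existing applications can benefit without being modified.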
