Author Archive

Using WeSay from other applications

Wednesday, January 16th, 2008

Recently, we were asked to give users of a translation program a way to use WeSay without leaving the program they’ve been trained on. The native speaker-user will want to:

  • See which words are missing from the dictionary, and add them along with a definition.
  • Jump into the entry screen for a word in WeSay to do more advanced editing.
  • Point to a word and see a list of similar words they might choose instead (thesaurus lookup).

A linguist working with the group will want to:

  • Click on an unfamiliar word and see the full dictionary article for it.

If the language has any affixation, both users will need to be able to:

  • see a list of entries, ordered by how similar their spellings are to the word being investigated.
  • find words based on their inflected/derived forms, not just by the citation form in the dictionary.
  • add variants to the word so that it is clear that this form is covered by the dictionary, and to make it easier to look up next time.
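To make "ordered by how similar their spellings are" concrete, here is a minimal Python sketch. It is not WeSay's algorithm: it assumes a plain list of headwords and uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. The function name `rank_by_spelling` and the sample words are made up for illustration.

```python
from difflib import SequenceMatcher


def rank_by_spelling(query, headwords, limit=5):
    """Return up to `limit` headwords, most similar spelling first.

    Hypothetical helper: SequenceMatcher is only a stand-in measure,
    not the matching that WeSay itself performs.
    """
    def similarity(headword):
        # ratio() is 1.0 for identical strings and approaches 0.0
        # as the two spellings share fewer characters in order.
        return SequenceMatcher(None, query.lower(), headword.lower()).ratio()

    # sorted() is stable, so equally similar headwords keep their
    # original dictionary order.
    return sorted(headwords, key=similarity, reverse=True)[:limit]


if __name__ == "__main__":
    # An inflected form such as "walked" surfaces its citation form first.
    print(rank_by_spelling("walked", ["talk", "walk", "wall", "zebra"]))
```

Even a crude measure like this puts a citation form near the top when the user points at an inflected form, which is the behaviour the bullets above describe. Real morphological lookup would need more than spelling similarity.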

The first round of this work is now available for other application developers to use (the italicized bullet items above will come in some future version).

To help developers add these features to their programs, I’ve built a little sample application so they can see what’s possible and how to do it. Here’s a little crummy video showing it:

[kml_flashembed movie="http://www.wesay.org/downloads/movies/dsDemo/dsDemo.swf" height="464" width="512" scale="noborder"/]

A few technical details for developers

Currently, I’ve implemented support for .net applications to make use of these services. Support for other languages, via XML-RPC, should be easy to add when needed. All a .net application needs to do is reference our Palaso library and use the DictionaryAccessor class. For now, you need to tell it where to find WeSay on the user’s machine, and where the dictionary you’ll be accessing is located.

Here’s some code to show what it takes to add this ability to a .net application:

Getting some HTML of matching entries to show in a WebBrowser control

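// First argument: the LIFT dictionary file; second: the WeSay executable.
DictionaryAccessor dictionary = new DictionaryAccessor(@"c:\docs\noosupu\noosupu.lift", @"c:\program files\wesay\wesapp.exe");
string[] forms;
string[] ids;
// Find entries whose form exactly matches "foobar".
dictionary.GetMatchingEntries(writingSystemIdForWords, "foobar",
       FindMethods.Exact, out ids, out forms);
// Render the matching entries as HTML for display.
string html = dictionary.GetHtmlForEntries(ids);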

 

Adding a new word

dictionary.AddEntry(writingSystemIdForWords, wordBox.Text,
                    writingSystemIdForDefinitions, definitionBox.Text,
                    writingSystemIdForWords, exampleBox.Text);

Ok, so you get the idea that this will be a very easy service to add to your .net program.

A plug here for .net 3’s WCF (Windows Communication Foundation), which made implementing this a very nice experience.

Update:  This has now been rewritten to use cross-language, cross-platform “XML-RPC” messaging.  So programs written in non-.net languages can now participate.
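For readers who haven’t used XML-RPC, here is a minimal Python sketch of what a cross-language call of this kind looks like. Everything in it is an assumption made for illustration: the server is a toy in-memory stand-in rather than WeSay, and the method names GetMatchingEntries and AddEntry are simply borrowed from the C# samples above. WeSay’s actual XML-RPC method names, parameters, and port may differ.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Toy in-memory dictionary standing in for a real service (hypothetical data).
ENTRIES = {"entry-1": "bilum"}


def get_matching_entries(writing_system_id, form):
    """Return the ids of entries whose form matches exactly."""
    return [entry_id for entry_id, f in ENTRIES.items() if f == form]


def add_entry(writing_system_id, form, definition):
    """Store a new form and return its generated id."""
    entry_id = f"entry-{len(ENTRIES) + 1}"
    ENTRIES[entry_id] = form
    return entry_id


def start_server():
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    # The exposed names are illustrative, not WeSay's documented API.
    server.register_function(get_matching_entries, "GetMatchingEntries")
    server.register_function(add_entry, "AddEntry")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    server = start_server()
    host, port = server.server_address
    client = ServerProxy(f"http://{host}:{port}")
    # Any language with an XML-RPC client could make these same two calls.
    new_id = client.AddEntry("en", "firewood", "wood gathered for burning")
    found = client.GetMatchingEntries("en", "firewood")
    server.shutdown()
    server.server_close()
    return new_id, found


if __name__ == "__main__":
    print(demo())
```

The point is only that the caller needs an XML-RPC client and the method names, nothing specific to .net.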

Add Semantic Domains using WeSay

Monday, November 19th, 2007

WeSay now lets the user edit the semantic domains of a sense from the Dictionary Browse and Edit tab, as an alternative to gathering words using the Gather By Semantic Domain task.

To see how this works, let’s add some of the domains that would apply to a Papua New Guinean bilum:

[Screenshot]

To look for a domain, we click in a box and start typing. First, we start typing "crafts" and see a domain matching that word:

[Screenshot]

Next, since bilums are used to transport firewood, we type that in.  A promising domain appears: Fuel.  But is that right? We don’t want to say that bilums are something you burn.  Happily, when we point to the word Fuel, WeSay displays a description confirming that the domain also covers things you use in collecting fuel.

[Screenshot]

Finally, since at least one of my kids has slept in a bilum, we should find a domain for that.  Here we’ll pretend we know the domain number, and just start typing that until we see Bed:

[Screenshot]
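The three lookups in this walk-through (by name, by a word in the description, and by number) can be sketched in a few lines of Python. This is only a guess at the matching behaviour, not WeSay’s code. The domain numbers and descriptions below are placeholder rows invented for the example, not entries from the real semantic domain list.

```python
# Placeholder rows: (number, name, description). The numbers and wording are
# invented for this sketch and do not come from the real domain list.
DOMAINS = [
    ("1.1", "Crafts", "making things by hand, such as weaving and carving"),
    ("1.2", "Fuel", "things you burn, and things used to collect firewood"),
    ("1.3", "Bed", "things people sleep in or on"),
]


def find_domains(typed, domains=DOMAINS):
    """Return "number name" labels for domains that match the typed text.

    A domain matches when its number starts with the text, or when its
    name or description contains the text (case-insensitive).
    """
    needle = typed.strip().lower()
    if not needle:
        return []
    matches = []
    for number, name, description in domains:
        if (number.startswith(needle)
                or needle in name.lower()
                or needle in description.lower()):
            matches.append(f"{number} {name}")
    return matches


if __name__ == "__main__":
    print(find_domains("craft"))     # match on the domain name
    print(find_domains("firewood"))  # match on the description
    print(find_domains("1.3"))       # match on the domain number
```

Searching the description as well as the name is what lets "firewood" lead to Fuel, even though the word itself is not a domain name.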

I had fun finding domains for a few words. I hope you do too.

Simple and Advanced Sorting

Monday, November 12th, 2007

One of the last big features for version 1 of WeSay has been in for a while.  Someone (I won’t mention any names) did a great job on it but didn’t blog about it.  So I’ll see if I can do it justice.

In this screenshot we see the three ways you can now specify sorting:

[Screenshot]

Sort like another language

If the text sorts just like some major language, just select that language in the list and you’re done.

Custom Simple

Many languages based on Latin characters introduce a small number of "special characters" used to represent sounds not covered by A-Z, like a barred i. In these situations, you can specify the rules just like you do in many existing apps, like Toolbox and Lexique Pro. When you choose "custom simple", the rules box is filled with the rules needed to sort English. You can enter vernacular words in the "Test Sort" area:

[Screenshot]

We want the barred-i to sort just after i, so we add it to the rules and click the button:

[Screenshot]
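The effect of adding barred-i after i can be sketched in Python. This is not the Palaso implementation. It assumes a Toolbox-style rule format in which each line is one primary sort position and the characters on a line differ only at a secondary level. The shortened `RULES` text and the function names are made up for the example.

```python
def parse_rules(rules_text):
    """Map each character to a (primary, secondary) weight pair.

    Assumed format: one line per primary position, characters on a line
    separated by spaces and differing only at the secondary level.
    """
    weights = {}
    for primary, line in enumerate(rules_text.strip().splitlines()):
        for secondary, ch in enumerate(line.split()):
            weights[ch] = (primary, secondary)
    return weights


def sort_key(word, weights):
    unknown = (len(weights), 0)  # characters without a rule sort last
    pairs = [weights.get(ch, unknown) for ch in word]
    # All primary weights are compared first; secondary weights
    # (here, case) only break ties between otherwise equal words.
    return ([p for p, _ in pairs], [s for _, s in pairs])


# A shortened, illustrative rule set with barred-i added right after i.
RULES = """
h H
i I
ɨ Ɨ
j J
t T
"""


def custom_sort(words, rules_text=RULES):
    weights = parse_rules(rules_text)
    return sorted(words, key=lambda w: sort_key(w, weights))


if __name__ == "__main__":
    print(custom_sort(["jit", "ɨt", "It", "it", "hit"]))
```

A plain code-point sort would push the barred-i word to the very end, after every unaccented letter. The rule line is what moves it up next to i.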

Normally, these secondary distinctions are enough.  But for some languages, tertiary distinctions are needed. We get these in the simple rules by using parentheses. Consider this list of words:

[Screenshot]

Now, imagine we want the upper-case words to sort together.  We need to add in another level of distinction, so that case can trump the accents.  We do this by adding parentheses around all case pairs, and putting the two sets of e’s on the same line:

[Screenshot]

Eric has written up the details on our wiki.

Custom ICU rules

For languages that need them, WeSay also supports ICU tailorings, which look like this:

& C < č <<< Č < ć <<< Ć   –for Serbian (Latin) or Croatian

Like many features of WeSay, this simple-to-advanced collating actually lives in our "Palaso Library", which is of course open-source and can be included in other programs.  Thus we foresee a day soon when the setup you do in one program (e.g. WeSay) will be trivially usable in other language-development tools.

Happy sorting!