WeSay News

January 21, 2010

Merging LIFT dictionary files

Filed under: Collaboration, FLEx — John Hatton @ 2:18 pm

If you aren’t yet using the new collaboration features of WeSay, you may have multiple versions of your dictionary out there. Here are a few notes on ways to bring them back together.

The simplest case is where the users have been working on completely different sets of words, with no overlap. That is, they each started with completely empty dictionaries, which have never once been merged together. In this specific case, you can merge them by hand. Do that by opening each .lift file in a text editor and copying all the <entry>…</entry> chunks from one file in next to the <entry>…</entry> chunks of the other file. Then open the result in WeSay to make sure you didn’t damage the LIFT file.
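If you would rather not copy and paste by hand, the same no-overlap merge can be scripted. The following is only a minimal sketch, not a WeSay tool: the function names are made up for illustration, and it assumes each entry sits directly under the root element of the .lift file and carries a `guid` or `id` attribute. It refuses to merge if any entry appears in both files, since that is the case where you want the FLEx import instead.

```python
import xml.etree.ElementTree as ET


def entry_key(entry):
    """Identify an entry by its guid, falling back to its id (assumed attributes)."""
    return entry.get("guid") or entry.get("id")


def append_entries(base_root, other_root):
    """Append every <entry> of other_root to base_root.

    Raises ValueError, without changing anything, if an entry is in both trees.
    Returns the number of entries appended.
    """
    seen = {entry_key(e) for e in base_root.findall("entry")}
    incoming = other_root.findall("entry")
    clashes = sorted(
        k for k in map(entry_key, incoming) if k is not None and k in seen
    )
    if clashes:
        raise ValueError(f"entries present in both files: {clashes}")
    for entry in incoming:
        base_root.append(entry)
    return len(incoming)


def merge_lift_files(base_path, other_path, out_path):
    """Hypothetical file-level wrapper: write base + other entries to out_path."""
    base = ET.parse(base_path)
    other = ET.parse(other_path)
    count = append_entries(base.getroot(), other.getroot())
    base.write(out_path, encoding="utf-8", xml_declaration=True)
    return count
```

This only copies entries. Anything else in the second file, such as header information, is left behind, so work on copies of your files and still open the result in WeSay to check it.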

In the more general case, you will want to merge them together using FieldWorks Language Explorer (FLEx).  To do that, follow these steps:

1) Create a new project using FLEx.

2) Import each .lift file into the project, one at a time, until you have a nice combined dictionary.

If getting, installing, and using FLEx seems like too much, you can always write to the WeSay email list and ask someone to do the merge for you.

5 Comments »

  1. [...] If there are multiple copies of the dictionary out there, you need to do that for each one of them.  That is, get the project, remove it from their computer.  You have an extra step in this case, which is to merge the entries together.  Read these instructions on merging LIFT files. [...]

    Pingback by WeSay News » Testing the New WeSay Collaboration Features — January 21, 2010 @ 2:29 pm

  2. I’ve imported several LIFT files from WeSay into FLEx. One question I have is what happens when two people have modified the same entry. For example, one person may have added a grammatical category where the other has left it blank. The second may have put in an example sentence. And they may both have put in slightly different definitions.

    When FLEx imports the LIFT file, does it combine the information from the two versions, or does it just take one and leave the other? What effect do the different import options have?

    Comment by Richard Gravina — January 28, 2010 @ 4:39 pm

  3. Richard,
    thanks for writing. FLEx does attempt to merge the items together, if they came from the same source. That is, I think, if you enter a word “xyz” and I separately enter the same word, it’s not going to merge them. But in the common case where, say, you add a Part Of Speech to xyz, and I add a new sense, it will merge these changes together.

    About the “trust entry modification times” option, the FLEx documentation says it is for speeding up the import:
    “When selected, date and time metadata are compared between the current file and the import file. If the date/time stamps are identical for an entry, then no additional comparison is done.”

    Since WeSay always updates the modified date, this is a safe thing to enable. In contrast, if you ran the LIFT file through a CC table or did a Search/Replace from Notepad, then it would *not* be safe to enable it, because although the data on the entries would be changed, the modified dates would be unchanged.

    Comment by John Hatton — January 29, 2010 @ 5:39 am

  4. I did try to import WeSay LIFT files in FLEx 6.0.1, but ran into import problems. I later heard that these problems were fixed in FLEx 6.0.3. The file is too large for me to download, so I am waiting for someone to hand-carry the installer file to me later in the month. Then I will be able to see if the newer version works better with LIFT files.

    Comment by Jeff Shrum — May 10, 2010 @ 4:08 am

  5. LIFT files from FLEx and WeSay store unique identifiers (GUIDs) on each entry and sense. At this point FLEx will only attempt to merge entries or senses that have identical GUIDs. When a LIFT file is imported, FLEx tries to merge data with matching GUIDs. If a definition or other field is missing in the database, but is present in the LIFT file, it will be added to the database. If any extra items for sequences (e.g., semantic domains, senses, examples) exist in the LIFT file, they will be added to whatever is already present in the database. If a field is present in the database but not in the LIFT file, the field will not be changed in the database. Thus you cannot use a LIFT import to remove anything from the database. If a field has different non-empty content in the database than in the LIFT file, then the import process applies whichever of three settings you chose at the beginning of the import:
    1) The field is not changed in the database, thus skipping the LIFT field.
    2) The field in the database is replaced with the field from the LIFT file.
    3) A duplicate entry or sense will be created so that you can see the contents later and merge them manually.
    If entries or senses in the LIFT file do not have identical GUIDs, then the entire entry or sense is added to the database. Thus, if two users develop lexicons independently, a LIFT import will typically have many duplicate entries or senses since the GUIDs will all be different. Likewise, if a LIFT file is created by some other program such as Paratext, Lexique Pro, or Solid, the LIFT file will likely not have GUIDs, but even if it does, they would be entirely different, so the import would simply import all the data as separate entries with no attempt to merge.
    Another option can speed up import by trusting the modification dates on entries. If the modification dates for an entry are the same in the database and the LIFT file, the entry is automatically skipped without trying to analyze and merge all of the fields in the entry.

    Comment by Ken Zook — June 25, 2010 @ 9:53 pm
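The import rules described in comment 5 can be modeled in a few lines of code. What follows is a rough sketch of those rules as stated in the comment, not FLEx's actual implementation. The dictionary layout, the function names, and the setting names `keep`, `replace`, and `duplicate` are all invented for illustration, and the `duplicate` case is simplified to "leave the database entry alone and keep the LIFT entry as a separate copy".

```python
import copy

SETTINGS = ("keep", "replace", "duplicate")  # hypothetical names for the three options


def merge_entry(db_entry, lift_entry, on_conflict="keep"):
    """Merge one LIFT entry into the database entry with the same GUID.

    Entries are modeled as dicts: plain values are single fields,
    lists are sequences (senses, examples, semantic domains).
    Returns (merged_entry, duplicate_or_None).
    """
    merged = copy.deepcopy(db_entry)
    for field, incoming in lift_entry.items():
        current = merged.get(field)
        if isinstance(incoming, list):
            # Sequences: extra items are added to whatever is already present.
            existing = list(current or [])
            merged[field] = existing + [x for x in incoming if x not in existing]
        elif not incoming:
            continue  # nothing in the LIFT file: database field is untouched
        elif not current:
            merged[field] = incoming  # missing in the database: add it
        elif current != incoming:
            # Different non-empty content: apply the chosen setting.
            if on_conflict == "replace":
                merged[field] = incoming
            elif on_conflict == "duplicate":
                return copy.deepcopy(db_entry), copy.deepcopy(lift_entry)
            # "keep": leave the database field as it is
    return merged, None


def import_lift(db, lift, on_conflict="keep"):
    """db and lift map GUID -> entry. Returns (new_db, duplicates)."""
    if on_conflict not in SETTINGS:
        raise ValueError(f"on_conflict must be one of {SETTINGS}")
    result = copy.deepcopy(db)
    duplicates = []
    for guid, entry in lift.items():
        if guid not in result:
            # No matching GUID: the whole entry is added, no merge attempted.
            result[guid] = copy.deepcopy(entry)
            continue
        result[guid], dup = merge_entry(result[guid], entry, on_conflict)
        if dup is not None:
            duplicates.append(dup)
    return result, duplicates
```

Because the loop only visits fields that exist in the LIFT entry, a field that is present in the database but absent from the LIFT file is never touched, which is why an import cannot remove anything.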
