SOLID
Update
This became a real project, here. Here's a screen shot of the progress as of August 3rd:
The following is just the original proposal for SOLID.
Motivation
It is well understood that users developing dictionaries using programs like Shoebox and Toolbox find it difficult or impossible to keep their use of various codes consistent. This causes them some pain during normal dictionary development, but the pain hits very hard when they try to do something else with their data, such as print out a dictionary or import the data into another program. In effect, their data is an island. It is of limited archival value, and of limited value to the group that actually speaks the language.
Recently, several of us have been talking about going after a useable XML interchange format for dictionary data. (Revision: this has become known as "LIFT"). Indeed, WeSay:Words needs such a format, since it is designed to be a satellite data gathering tool that feeds data to desktop applications.
However, it occurs to us that just having such a standard does not actually give us any data in that standard. The process of getting the average SFM dictionary into that format will be a long and painful one. We know this from experience with bringing data into LinguaLinks and FieldWorks. Whether you are importing into one of these applications or giving your data to a program which converts it into LIFT, the process is almost exactly the same.
Therefore:
- You don't want to do it more than once
- Once your data is consistent, you want your tools to keep it that way
- You want a lot of payoff from this process. The joy of testing out something new (e.g. FieldWorks Language Explorer) is probably not enough of an incentive.
The proposal here is to create a package of standards, processes, and a kind of "brand" around a solution for this problem:
- A standard "map" file which describes the use of \codes in the dictionary. This will include mapping those fields to some common ideas (e.g. fields in LIFT) as well as information about what the writing systems are in each field. This map then serves as a schema which can be used to validate future versions of the standard format dictionary.
- A graphical tool or set of tools which help users both clean up their own data and create a map which describes it. (Think FieldWorks Language Explorer's SFM Import Wizard, but with more resources put into it).
- A validation tool which helps you keep your SFM dictionary in shape. Ideally, such a tool would be built into Toolbox.
- A brand, e.g. "SOLID", that can be used to promote the practice of getting dictionaries up to the standard. SOLID sounds like an acronym, but it's not really (yet). Alternative names are very welcome.
- Consultant: "Oh, so you want to produce a dictionary/try out new program X. Ok, well the first step is for us to get your dictionary up to SOLID standard".
Notice that the proposal is explicitly not to try to create some new SFM dictionary standard. Rather, the standard being proposed here is that you can use whatever SFM scheme you want, so long as it is describable by the SOLID map file.
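To make the idea of a map that doubles as a schema more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the proposal does not define the map file's format, so the `SOLID_MAP` dictionary, its keys (`concept`, `writing_system`, `parent`), the `validate` function, and the sample entries with their made-up headwords are all hypothetical. The markers shown (\lx, \ps, \ge, \de) are just one possible SFM scheme; a different dictionary would have a different map.

```python
import re

# Hypothetical map: one entry per \code used in the dictionary.
# "concept" is a rough LIFT-like idea, "writing_system" labels the field's
# language, and "parent" is a marker that must already have appeared in the
# current record (None for the record marker itself).
SOLID_MAP = {
    "lx": {"concept": "lexical-unit", "writing_system": "vernacular", "parent": None},
    "ps": {"concept": "grammatical-info", "writing_system": "en", "parent": "lx"},
    "ge": {"concept": "gloss", "writing_system": "en", "parent": "ps"},
    "de": {"concept": "definition", "writing_system": "en", "parent": "ps"},
}

# A field line starts with a backslash, a marker name, and optional content.
MARKER_RE = re.compile(r"^\\(\w+)(?:\s+(.*))?$")


def validate(sfm_text, marker_map, record_marker="lx"):
    """Return a list of (line_number, message) problems found in sfm_text."""
    problems = []
    seen = set()  # markers seen so far in the current record
    for lineno, line in enumerate(sfm_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        match = MARKER_RE.match(line)
        if match is None:
            # Not a field line (e.g. wrapped field content); ignored here.
            continue
        marker = match.group(1)
        if marker == record_marker:
            seen = set()  # a new record starts
        if marker not in marker_map:
            problems.append((lineno, "unknown marker \\" + marker))
            continue
        parent = marker_map[marker]["parent"]
        if parent is not None and parent not in seen:
            problems.append(
                (lineno, "\\" + marker + " appears without a preceding \\" + parent)
            )
        seen.add(marker)
    return problems


# Made-up sample data: the second record skips \ps and uses an unmapped \gl.
sample = "\n".join([
    r"\lx aba",
    r"\ps n",
    r"\ge first gloss",
    "",
    r"\lx tiki",
    r"\ge second gloss",
    r"\gl second gloss",
])

problems = validate(sample, SOLID_MAP)
for lineno, message in problems:
    print(lineno, message)
```

On the sample above, the sketch reports the \ge on line 6 (no \ps before it in that record) and the unmapped \gl on line 7. The point of the design is that the checker knows nothing about any particular marker scheme: all of the dictionary-specific knowledge lives in the map, which is what lets users keep whatever SFM scheme they already have.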
SOLID Overview
The big picture here is that you take your existing files, you go through a process, and in the end you have
- clean, consistent files
- the means to keep them that way
- data that is now useable for other purposes.
SOLIDification Process
* CC, Perl, Python, Word, etc.
** Same idea as the FLEx SFM import wizard, but separated from FW and enhanced.
What we have in mind here is very close to what FieldWorks already does, in terms of its inputs and outputs. It would be very good to improve this process, as it is currently more painful than it needs to be.
Speaking of FieldWorks, it will benefit greatly if this process is split out from FieldWorks as a separate tool/process. If the notion of SOLID is adopted, people will not feel that it is FieldWorks' fault that they have to go through a lot of pain to get their data consistent. Rather, they should think of SOLID as being a best practice, just like converting to Unicode.
Uses of a SOLID Package




