    How to sort using a custom sort sequence

    The WeSay Configuration tool offers three options for specifying how a writing system should be sorted.


    Same as a major language

    The easiest option is to sort the writing system the same way as one of the major languages, chosen from a list. The list of languages is provided by the operating system and may vary between operating system versions.
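    The same idea can be tried outside WeSay with Python's standard locale module. This is a minimal sketch, not how WeSay itself works: the helper name sort_like_locale is hypothetical, and which locale names succeed (for example "fr_FR.UTF-8") depends on what is installed on the machine, much as the language list depends on the operating system.

```python
import locale

def sort_like_locale(words, locale_name):
    """Sort words with the operating system's collation for locale_name.

    Hypothetical helper for illustration. Raises locale.Error if the
    locale is not installed, since availability depends on the system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares like strcoll.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore the old setting

# The "C" locale is always available; it orders strings by code point.
print(sort_like_locale(["b", "a", "C"], "C"))
```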

    Simple rules

    For many languages, simple rules can express up to three levels of difference between characters: typically base letter, accent, and case. A sequence of characters written together in a rule, such as ng, is treated as a single collation element.
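    Treating a character sequence as one collation element is commonly done by longest match when a word is split into elements. The sketch below assumes that strategy; it is an illustration, not WeSay's code, and the function name tokenize and the example element set are hypothetical.

```python
def tokenize(text, elements):
    """Split text into collation elements, preferring the longest match.

    elements is a hypothetical set of the (possibly multi-character)
    elements named in a rule, e.g. {"a", "n", "g", "ng"}. A character
    not covered by any element becomes an element of its own.
    """
    longest = max((len(e) for e in elements if e), default=1)
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "ng" wins over "n" followed by "g".
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in elements:
                result.append(candidate)
                i += size
                break
    return result

print(tokenize("ngang", {"a", "n", "g", "ng"}))  # ['ng', 'a', 'ng']
```

    Without the longest-match preference, ng would be split into n and g and sorted as two separate letters.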

    Primary distinctions

    A primary distinction is the strongest difference between collation elements. Dictionaries are usually divided into sections by primary distinctions, which typically correspond to base characters.

    Primary distinctions are written on separate lines: the collation element(s) on one line sort before those on any later line.

    Given the following sort rule:

    a
    A
    b
    B
    e
    E
    t
    T
    

    With this rule, the following strings sort in this order:

    bat
    bet
    Bat
    BAT
    Bet
    BET
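    One way to reproduce this ordering is to give each collation element a weight equal to the number of its line, then compare strings element by element. A minimal sketch, assuming every character is a single collation element; primary_weights and sort_key are hypothetical names, and placing unlisted characters after all listed ones is an assumption rather than documented WeSay behavior.

```python
RULE = """\
a
A
b
B
e
E
t
T
"""

def primary_weights(rule):
    """Give each collation element the index of its (non-blank) line."""
    lines = [line for line in rule.splitlines() if line.strip()]
    weights = {}
    for index, line in enumerate(lines):
        for element in line.split():
            weights[element] = index
    return weights

def sort_key(word, weights):
    # Assumption: one character per collation element; characters the rule
    # does not list get weights larger than any listed line.
    base = max(weights.values(), default=-1) + 1
    return tuple(weights.get(ch, base + ord(ch)) for ch in word)

weights = primary_weights(RULE)
ordered = sorted(["BET", "Bat", "bet", "BAT", "bat", "Bet"],
                 key=lambda word: sort_key(word, weights))
print(ordered)  # ['bat', 'bet', 'Bat', 'BAT', 'Bet', 'BET']
```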
    

    Secondary distinctions

    Secondary distinctions allow collation elements to be treated as similar (by giving them the same primary distinction) while still telling them apart when the strings being compared have no primary difference. A secondary difference is ignored whenever there is a primary difference anywhere in the strings. The difference between an accented character and its base character is usually treated as a secondary difference.

    Secondary distinctions are listed on the same line, usually separated by spaces; an element earlier on the line sorts before one later on the line.

    Given the following sort rule:

    e é
    m
    r R
    s
    u
    

    With this rule, the following strings sort in this order:

    resume
    résumé
    Resume
    Résumé
    resumes
    résumés
    Resumes
    Résumés
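    The secondary level can be sketched by giving each element a pair of weights: its line number (primary) and its position on the line (secondary). Comparing the whole tuple of primary weights before any secondary weight is what makes a secondary difference count only when there is no primary difference. This is a sketch, not WeSay's implementation; two_level_weights and sort_key are hypothetical names, and it assumes one character per element, with unlisted characters sorting after all listed lines.

```python
RULE = """\
e é
m
r R
s
u
"""

def two_level_weights(rule):
    """Map each element to (primary, secondary) weights.

    The line number is the primary weight; the element's position on
    its line is the secondary weight.
    """
    lines = [line for line in rule.splitlines() if line.strip()]
    weights = {}
    for primary, line in enumerate(lines):
        for secondary, element in enumerate(line.split()):
            weights[element] = (primary, secondary)
    return weights

def sort_key(word, weights):
    # Assumption: one character per element; unlisted characters get a
    # primary weight after every listed line.
    base = max((p for p, _ in weights.values()), default=-1) + 1
    pairs = [weights.get(ch, (base + ord(ch), 0)) for ch in word]
    primaries = tuple(p for p, _ in pairs)
    secondaries = tuple(s for _, s in pairs)
    # Every primary weight is compared before any secondary weight.
    return (primaries, secondaries)

weights = two_level_weights(RULE)
words = ["Résumés", "resumes", "Résumé", "résumé",
         "Resumes", "resume", "résumés", "Resume"]
ordered = sorted(words, key=lambda word: sort_key(word, weights))
print(ordered)
# ['resume', 'résumé', 'Resume', 'Résumé', 'resumes', 'résumés', 'Resumes', 'Résumés']
```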
    

    Tertiary distinctions

    Tertiary distinctions add one more level of distinction, working the same way as secondary distinctions. They usually distinguish the case of a character, such as é versus É.

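    Tertiary distinctions are written by enclosing elements in parentheses: elements inside one pair of parentheses differ only at the tertiary level, while separate parenthesized groups on the same line differ at the secondary level.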

    Given the following sort rule:

    (e E) (é É)
    m
    (r R)
    s
    u
    

    With this rule, the following strings sort in this order:

    resume
    Resume
    résumé
    Résumé
    resumes
    Resumes
    résumés
    Résumés
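    Adding the tertiary level gives each element three weights: line number, index of its token on the line, and position inside the parentheses. A minimal sketch, not WeSay's parser; three_level_weights, sort_key and the TOKEN pattern are hypothetical, and it assumes one character per element with unlisted characters sorting after every listed line.

```python
import re

RULE = """\
(e E) (é É)
m
(r R)
s
u
"""

# A token is either a parenthesized tertiary group or a bare element.
TOKEN = re.compile(r"\(([^)]*)\)|(\S+)")

def three_level_weights(rule):
    """Map each element to (primary, secondary, tertiary) weights."""
    lines = [line for line in rule.splitlines() if line.strip()]
    weights = {}
    for primary, line in enumerate(lines):
        # Each token on a line gets the next secondary weight.
        for secondary, match in enumerate(TOKEN.finditer(line)):
            group, bare = match.groups()
            members = group.split() if group is not None else [bare]
            # Position inside the parentheses is the tertiary weight.
            for tertiary, element in enumerate(members):
                weights[element] = (primary, secondary, tertiary)
    return weights

def sort_key(word, weights):
    # Assumption: one character per element; unlisted characters get a
    # primary weight after every listed line.
    base = max((w[0] for w in weights.values()), default=-1) + 1
    triples = [weights.get(ch, (base + ord(ch), 0, 0)) for ch in word]
    # Compare all primaries, then all secondaries, then all tertiaries.
    return tuple(tuple(t[level] for t in triples) for level in range(3))

weights = three_level_weights(RULE)
words = ["Résumés", "résumé", "Resumes", "resume",
         "résumés", "Résumé", "resumes", "Resume"]
ordered = sorted(words, key=lambda word: sort_key(word, weights))
print(ordered)
# ['resume', 'Resume', 'résumé', 'Résumé', 'resumes', 'Resumes', 'résumés', 'Résumés']
```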
    


    Character escaping

    Characters may be escaped by writing \uXXXX, where XXXX is the four-digit hexadecimal Unicode code point. For example, \u00e0 is à and \u00c0 is À:

    (a A) (\u00e0 \u00c0)
    (b B)
    ...
    ...
    (n N)
    (ng Ng NG)
    ...
    
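    Before a rule is interpreted, each escape has to be turned back into the character it names. A minimal sketch, assuming exactly four hexadecimal digits follow \u as described above; decode_escapes is a hypothetical name, not part of WeSay.

```python
import re

# A backslash, the letter u, then exactly four hexadecimal digits.
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")

def decode_escapes(rule):
    """Replace each \\uXXXX escape with the character it names."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), rule)

print(decode_escapes(r"(a A) (\u00e0 \u00c0)"))  # (a A) (à À)
```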

    ICU custom rules

    When more power is needed, ICU tailoring can be used. Instructions for writing ICU rules can be found here. A simple introduction to ICU rules can also be found in section 8.1 of this document.

    This page was last modified 09:59, 24 September 2007.