How to sort using a custom sort sequence
Using the WeSay Configuration tool, there are three options for indicating how a given writing system should be sorted.
Same as a major language
The easiest option is to sort the writing system in the same way as one of the major languages, chosen from a list. The list of languages comes from the operating system and may vary between operating system versions.
Simple rules
For many languages, simple rules can be written with up to three levels of distinction (typically base, accented, and caps). A sequence of characters written without spaces between them (for example, ng) is considered a single collation element.
Primary distinctions
A primary distinction is the strongest difference between collation elements. Dictionaries are usually divided into sections by primary distinctions, which usually correspond to base characters.
Primary distinctions are listed on separate lines, with the collation element(s) on the first line sorting before those on the following lines.
Given the following sort rule, with each character on its own line:
    a
    A
    b
    B
    e
    E
    t
    T
the following strings are sorted in this order:
    bat bet Bat BAT Bet BET
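The primary-only ordering can be illustrated with a minimal Python sketch. It is not WeSay's implementation. It assumes each character of the rule sits on its own line, so that every difference is a primary one, and the names `RULE`, `PRIMARY`, and `primary_key` are hypothetical.

```python
# Hypothetical rule text: one collation element per line, so every
# difference between elements is a primary distinction.
RULE = "a\nA\nb\nB\ne\nE\nt\nT"

# The line number of each element serves as its primary weight.
PRIMARY = {element: rank for rank, element in enumerate(RULE.split("\n"))}


def primary_key(word):
    """Return the list of primary weights for the characters of `word`."""
    return [PRIMARY[ch] for ch in word]


words = ["BET", "Bat", "bet", "BAT", "Bet", "bat"]
ordered = sorted(words, key=primary_key)
print(ordered)  # ['bat', 'bet', 'Bat', 'BAT', 'Bet', 'BET']
```

Because b is listed before B, every word starting with b sorts ahead of every word starting with B, regardless of the letters that follow.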
Secondary distinctions
Secondary distinctions allow collation elements to be considered similar (by giving them the same primary distinction) while still being ordered relative to each other when two strings have no primary difference. A secondary difference is ignored when there is a primary difference anywhere in the strings. The difference between an accented character and its base character is usually considered a secondary difference.
Secondary distinctions are listed within a line (usually separated by a space).
Given the following sort rule:
    e é
    m
    r R
    s
    u
the following strings are sorted in this order:
    resume résumé Resume Résumé resumes résumés Resumes Résumés
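One way to see why a secondary difference only matters when no primary difference exists is to compare all primary weights before any secondary weight. The Python sketch below does that. It is an illustration under stated assumptions, not WeSay's code: the rule is assumed to have one primary group per line, and `RULE`, `WEIGHTS`, and `sort_key` are hypothetical names.

```python
import unicodedata

# Hypothetical rule text: each line is a primary group, and spaces
# separate the secondary variants within that group.
RULE = "e é\nm\nr R\ns\nu"

# Map each element to a (primary, secondary) weight pair.
WEIGHTS = {}
for primary, line in enumerate(unicodedata.normalize("NFC", RULE).split("\n")):
    for secondary, element in enumerate(line.split()):
        WEIGHTS[element] = (primary, secondary)


def sort_key(word):
    """Return all primary weights first, then all secondary weights."""
    word = unicodedata.normalize("NFC", word)
    primaries = [WEIGHTS[ch][0] for ch in word]
    secondaries = [WEIGHTS[ch][1] for ch in word]
    return (primaries, secondaries)


words = ["Résumés", "resumes", "Resume", "résumé",
         "Resumes", "resume", "résumés", "Résumé"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

Since Python compares tuples element by element, the secondary list is consulted only when the primary lists are identical, which mirrors the behaviour described above.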
Tertiary distinctions
Tertiary distinctions allow one more level of distinction in the same manner as secondary distinctions. Tertiary distinctions usually reflect the case of a character, such as the difference between é and É.
Tertiary distinctions are surrounded by parentheses within the secondary distinctions.
Given the following sort rule:
    (e E) (é É)
    m
    (r R)
    s
    u
the following strings are sorted in this order:
    resume Resume résumé Résumé resumes Resumes résumés Résumés
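The three levels can be sketched together in Python as follows. This is a hedged illustration rather than WeSay's parser: it assumes lines separate primary groups, spaces separate secondary groups, and parentheses enclose tertiary variants, and it handles single-character elements only. `parse_rule` and `sort_key` are hypothetical names.

```python
import re
import unicodedata

# Hypothetical rule text following the layout described above.
RULE = "(e E) (é É)\nm\n(r R)\ns\nu"


def parse_rule(rule):
    """Map each element to a (primary, secondary, tertiary) weight triple."""
    weights = {}
    rule = unicodedata.normalize("NFC", rule)
    for primary, line in enumerate(rule.split("\n")):
        # A secondary group is a parenthesised list or a bare element.
        groups = re.findall(r"\(([^)]*)\)|(\S+)", line)
        for secondary, (grouped, bare) in enumerate(groups):
            elements = grouped.split() if grouped else [bare]
            for tertiary, element in enumerate(elements):
                weights[element] = (primary, secondary, tertiary)
    return weights


WEIGHTS = parse_rule(RULE)


def sort_key(word):
    """Compare primary weights, then secondary, then tertiary."""
    word = unicodedata.normalize("NFC", word)
    triples = [WEIGHTS[ch] for ch in word]
    return tuple([t[level] for t in triples] for level in range(3))


words = ["Résumés", "resumes", "Resume", "résumé",
         "Resumes", "resume", "résumés", "Résumé"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

With case demoted to the tertiary level, Resume now sorts directly after resume and ahead of résumé, because the accent is the stronger difference.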
Character escaping
Characters may be escaped by using \uXXXX (where XXXX is the four-digit hexadecimal value of the Unicode code point):
    (a A) (\u00e0 \u00c0)
    (b B)
    ...
    ...
    (n N)
    (ng Ng NG)
    ...
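As a rough Python illustration of the two ideas in this example, the sketch below decodes \uXXXX escapes and splits a word into collation elements so that a multi-character element such as ng is kept together. The longest-match strategy is an assumption made for this sketch, not documented WeSay behaviour, and `unescape` and `tokenize` are hypothetical helpers.

```python
import re


def unescape(text):
    """Replace each backslash-u escape with the character it names."""
    return re.sub(r"\\u([0-9a-fA-F]{4})",
                  lambda m: chr(int(m.group(1), 16)), text)


def tokenize(word, elements):
    """Split `word` into collation elements, preferring the longest match."""
    longest = max(len(e) for e in elements)
    tokens, i = [], 0
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            if word[i:i + size] in elements:
                tokens.append(word[i:i + size])
                i += size
                break
        else:
            raise ValueError(f"no collation element matches {word[i]!r}")
    return tokens


decoded = unescape(r"(a A) (\u00e0 \u00c0)")
print(decoded)  # (a A) (à À)

# Hypothetical subset of the elements defined by the rule above.
elements = {"a", "n", "ng"}
tokens = tokenize("nanga", elements)
print(tokens)  # ['n', 'a', 'ng', 'a']
```

Each token returned by `tokenize` would then be looked up as a unit when building a sort key, so that ng can carry its own primary weight separate from n.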
ICU custom rules
When more power is needed, ICU tailoring can be used. Instructions for writing ICU rules can be found here. A simple introduction to ICU rules can also be found in section 8.1 of this document.
