String.prototype.normalize, case folding and sort keys

# Nebojša Ćirić (12 years ago)

The String.prototype.normalize(form) spec is here. It offers all four normalization forms (NFC, NFD, NFKC and NFKD).
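For readers less familiar with the four forms, here is a minimal sketch of what each does to a small input. The sample strings are chosen only for illustration and are not from the spec text.

```javascript
// U+00E9 is the precomposed "é"; "e" + U+0301 is the decomposed spelling.
const composed = "\u00e9";
const decomposed = "e\u0301";

// Canonical forms: compose or decompose without changing meaning.
const nfc = decomposed.normalize("NFC");
const nfd = composed.normalize("NFD");

// Compatibility forms: additionally replace compatibility characters,
// e.g. the ligature U+FB01 ("ﬁ") becomes the two letters "fi".
const nfkc = "\ufb01".normalize("NFKC");
const nfkd = "\ufb01".normalize("NFKD");

console.log(nfc === composed, nfd === decomposed, nfkc, nfkd);
```

None of these forms changes case, which is why case folding comes up as a separate question.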

We did mention additional CF and CFNKFC forms for case folding, but they were not added to the spec. They case-fold a string in a locale-independent way (see www.unicode.org/faq/casemap_charprop.html#2).

Should we:

  1. Add those two new forms to the spec of String.prototype.normalize(form) method?
  2. Add a new String.prototype.toFoldCase(form) method?
  3. Add Intl.Collator.prototype.sortKey(string) -> string method?

We could do 1 and 3, or 2 and 3, or just 3.

The use case would be: a user inputs M words, and we would like to see whether any of them match N predefined words (say, to trigger an action). With the current Intl.Collator.prototype.compare() we need M×N comparisons. With toFoldCase/sortKey we would need only O(M) lookups in a hash with N keys.
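A rough sketch of the hash-lookup idea, assuming a hypothetical `foldKey` helper that stands in for the proposed toFoldCase/sortKey (neither exists today). It approximates folding with NFKC plus toLowerCase(), which is not full Unicode case folding, so treat it as an illustration of the lookup pattern only.

```javascript
// Hypothetical stand-in for a fold/sort key. NOT full Unicode case folding:
// it only normalizes to NFKC and lower-cases the result.
function foldKey(s) {
  return s.normalize("NFKC").toLowerCase();
}

// Build the hash of N predefined words once.
const predefined = ["Save", "Open", "R\u00e9sum\u00e9"];
const keys = new Set(predefined.map(foldKey));

// Each of the M input words then costs one key computation and one lookup.
function matches(words) {
  return words.filter((w) => keys.has(foldKey(w)));
}

// "SAVE" differs in case; the last word uses combining accents (e + U+0301).
const hits = matches(["SAVE", "close", "re\u0301sume\u0301"]);
console.log(hits.length); // 2
```

The pairwise alternative would call compare() for every (input, predefined) pair, which is where the M×N figure comes from.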

Mihai and I lean towards 3, because it gives the user more control over what to check. For example, it doesn't make sense to ignore case for locales that don't have a case distinction. Or the user may want to preserve accents in the comparison...
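To illustrate the kind of control meant here, the existing collator options already let the caller choose whether accents count. This sketch uses only Intl.Collator.prototype.compare() with the `sensitivity` option; a sortKey method, if added, would presumably honor the same options, but that is an assumption. The "en" locale and sample words are arbitrary.

```javascript
// "base": ignore both case and accent differences.
const ignoreCaseAndAccents = new Intl.Collator("en", { sensitivity: "base" });

// "accent": ignore case, but keep accent differences significant.
const ignoreCaseOnly = new Intl.Collator("en", { sensitivity: "accent" });

const sameUnderBase = ignoreCaseAndAccents.compare("resume", "R\u00e9sum\u00e9") === 0;
const sameCaseOnly = ignoreCaseOnly.compare("resume", "Resume") === 0;
const accentsDiffer = ignoreCaseOnly.compare("resume", "r\u00e9sum\u00e9") !== 0;

console.log(sameUnderBase, sameCaseOnly, accentsDiffer);
```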

# Nebojša Ćirić (12 years ago)

[+mscherer]

# Allen Wirfs-Brock (12 years ago)

Also see String.prototype.toLowerCase.

In my working draft, the paragraph that immediately follows the algorithm has been modified to read:

The result must be derived according to the locale-insensitive case mappings in the Unicode Character Database (this explicitly includes not only the UnicodeData.txt file, but also all locale-insensitive mappings in the SpecialCasings.txt file that accompanies it).

This change is in response to ecmascript#206

Does this sufficiently cover the locale-independent case folding use case?
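As a small illustration of what the locale-insensitive special mappings mean in practice, the sketch below shows two one-to-many upper-case mappings, plus one case where lower-casing alone does not equate two spellings. The sample strings are illustrative and not part of the draft text.

```javascript
// One-to-many mappings of the special-casing kind:
// "ß" upper-cases to "SS", and the ligature U+FB01 upper-cases to "FI".
const upperSharpS = "\u00df".toUpperCase();
const upperLigature = "\ufb01".toUpperCase();

// Lower-casing leaves "ß" as it is, so these two spellings stay distinct
// after toLowerCase(), although Unicode full case folding maps "ß" to "ss".
const a = "STRASSE".toLowerCase();
const b = "stra\u00dfe".toLowerCase();

console.log(upperSharpS, upperLigature, a === b);
```

So matching via toLowerCase() covers many, but not all, of the cases a dedicated fold would.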

# Nebojša Ćirić (12 years ago)

Having sort keys in the collator would allow users to be more flexible in comparing strings, but your* approach is good enough for now.

* toUpperCase spec as it stands