ECMAScript collation question
ICU is always able to compare them as being equal, just by setting the parameter.
Even if the parameter isn't set, ICU uses an FCD sort (see unicode.org/notes/tn5) plus canonical closure, which handles most cases of canonical equivalence. The default is turned on for languages whose normal + auxiliary exemplar sets contain characters that would show a difference even with an FCD + closure sort, and it can be turned on unconditionally if desired (at some cost in performance; 30% sounds high, though).
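As a concrete illustration of the canonical equivalence being discussed (the strings below are my own example, not from the thread; the required result of 0 is what the ECMAScript specification mandates, and ICU-backed implementations deliver it for this case):

```javascript
// Two canonically equivalent encodings of "é".
const precomposed = "\u00E9"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = "e\u0301"; // "e" followed by U+0301 COMBINING ACUTE ACCENT

// They are different code unit sequences...
console.log(precomposed === decomposed); // false
// ...that normalize to the same NFC form...
console.log(precomposed === decomposed.normalize("NFC")); // true
// ...and that a conformant localeCompare reports as equal.
console.log(precomposed.localeCompare(decomposed)); // 0
```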
Mark (plus.google.com/114199149796022210033) — Il meglio è l'inimico del bene ("The best is the enemy of the good")
This is what Markus had to say (he implemented most of the collation for ICU):
"www.unicode.org/reports/tr10/#Avoiding_Normalization
Step 1 of the algorithm: www.unicode.org/reports/tr10/#Step_1 which has a note:
- Conformant implementations may skip this step in certain circumstances; see Section 6.5, Avoiding Normalization (www.unicode.org/reports/tr10/#Avoiding_Normalization) for more information.
See also www.unicode.org/reports/tr10/#Parametic_Tailoring, attribute "normalization"; see the description there (this whole Table 14 will soon move to the LDML spec, leaving only a link in this place)."
So the question is:
- Do we change the i18n API default for normalization to always be true, with some performance penalty?
- Do we update the ES 262 spec with the info Markus passed along (if possible)?
2012/8/30 Mark Davis ☕ <mark at macchiato.com>
OK, so the Unicode conformance question hinges on "must be able to do" versus "must do".
The question for ECMAScript then is whether we should stick with "must do" (the current state of the specifications) or change to "must be able to do".
The changes for "must be able to do" would be:
- In the Language specification, remove the description of String.prototype.localeCompare and require implementations to follow the Internationalization API specification at least for this method or, better, to provide the complete Internationalization API. That way, localeCompare acquires support for the normalization property in options and the -kk- key in the Unicode locale extensions.
- In the Internationalization API specification, make support for the normalization property and the -kk- key mandatory (it's currently optional), but drop the separate requirement that canonically equivalent strings compare as 0.
This would give applications control over the trade-off between performance and full canonical equivalence, and let implementations select the default per locale.
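The two knobs under discussion would surface roughly like this (a sketch: the normalization option and the -kk- key were optional in ECMA-402's first edition, so a given implementation may silently ignore both; the locale choices are illustrative):

```javascript
// Requesting full normalization via the options object
// (ignored by implementations that don't support the option):
const byOption = new Intl.Collator("vi", { normalization: true });

// Requesting it via the -kk- key in a Unicode locale extension
// (likewise ignored where unsupported, but still a valid language tag):
const byExtension = new Intl.Collator("vi-u-kk-true");

// Either way, comparison goes through the same method:
console.log(byOption.compare("a", "b") < 0); // true
console.log(typeof byExtension.compare);     // "function"
```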
But trading off correctness for performance in this way doesn't seem quite right. Especially for search usage, it could mean that you're staring at a Vietnamese or Arabic word in a list and the search function says it's not there because you typed an indistinguishable but different string into the search box.
Thanks, Norbert
I think we could go either way. It depends on the usage mode.
1. The case where performance is crucial is where you are comparing gazillions of strings, such as records in a database.
2. If the number of strings to be compared is relatively small, and/or there is enough overhead anyway, the performance win from turning off full normalization would be lost in the noise.
So if #2 is the expected use case, we could require full normalization.
Mark (plus.google.com/114199149796022210033)
I think #2 is far more common for ECMAScript - typical use would be to re-sort a list of a few dozen or at most a few hundred entries and then redisplay that list. #1 might become more common though as JavaScript use on the server progresses.
So here's an alternative spec approach:
1. Leave the specification of String.prototype.localeCompare as is. That is, if it's not based on Collator, canonically equivalent strings are required to compare as 0.
2. For Collator.prototype.compare, require that canonically equivalent strings compare as 0 unless the client explicitly turns off normalization (i.e., normalization is on by default, independent of locale). Support for the normalization property in options and the kk key would become mandatory.
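Under this alternative, the observable behavior would be roughly the following (a sketch: the default case is what the proposal requires, while the explicit opt-out assumes the optional normalization property is supported):

```javascript
// Default: normalization on, so canonically equivalent strings
// compare as 0 in every locale.
const strict = new Intl.Collator("en");
console.log(strict.compare("\u00E9", "e\u0301")); // 0

// Explicit opt-out for performance-critical bulk comparison
// (hypothetical under this proposal; the result may then be nonzero
// for strings that an FCD-only sort cannot equate):
const fast = new Intl.Collator("en", { normalization: false });
const words = ["r\u00E9sum\u00E9", "resume", "re\u0301sume\u0301"];
words.sort(fast.compare); // compare is bound, so it can be passed directly
```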
Norbert
Support for the normalization property in options and the kk key would become mandatory.
The options that ICU offers are to observe full canonical equivalence:
1. For all locales: kk=true
2. For key locales (where it is necessary), otherwise partial (FCD): kk=<not present>
3. For no locales, always partial (FCD): kk=false
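Expressed as locale tags, these three ICU modes would look something like this (a sketch: the kk key was optional in ECMA-402's first edition, so an implementation may ignore it; the Arabic locale is just an example of one where full equivalence matters):

```javascript
// Full canonical equivalence for all locales:
const full = new Intl.Collator("ar-u-kk-true");
// Locale default: full where necessary, otherwise partial (FCD) --
// this mode is simply the absence of the kk key:
const perLocale = new Intl.Collator("ar");
// Always partial (FCD):
const partial = new Intl.Collator("ar-u-kk-false");

// All three tags are well-formed even where kk is unsupported:
console.log(typeof full.compare);      // "function"
console.log(typeof perLocale.compare); // "function"
console.log(typeof partial.compare);   // "function"
```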
Your proposal looks reasonable, except I'm not sure how someone would use the kk value to get #2.
Mark (plus.google.com/114199149796022210033)
On Sat, Sep 1, 2012 at 4:19 PM, Mark Davis ☕ <mark at macchiato.com> wrote:
Your proposal looks reasonable, except I'm not sure how someone would use the kk value to get #2.
Could we say kk=default?

markus
We could propose to the CLDR group adding <attribute>=default to mean (for
CLDR) the same as missing (at least for kk, if not others).
That would formally work, but it would mean that in an ECMAScript context missing != default, while in other CLDR contexts missing == default.
May work, but any other thoughts?
Mark (plus.google.com/114199149796022210033)
On Sun, Sep 2, 2012 at 12:51 PM, Mark Davis ☕ <mark at macchiato.com> wrote:
We could propose to the CLDR group adding <attribute>=default to mean (for CLDR) the same as missing (at least for kk, if not others).
I don't think that CLDR needs that just because ECMAScript might have it.
markus
The BCP 47 Unicode Locale Extension would need it, and currently that's tangled with CLDR...
Norbert
Seeing that the final draft of the spec is due today, here's a breakdown of possible changes around normalization in Collator:
1. Change the description of Intl.Collator.prototype.compare to say: "The method is required to return 0 when comparing Strings that are considered canonically equivalent by the Unicode standard, unless collator has a [[normalization]] internal property whose value is false."
This is the smallest possible change to the spec that's needed to make its canonical equivalence and normalization requirements consistent, and I've made it.
2. Require support for the normalization property and the kk key.
The way I phrased the spec in #1, this isn't necessary anymore, and we can make this change in the second edition if needed.
3. Add "locale" to the set of acceptable input values for the normalization property of options. Implementations that support the normalization property would use the selected locale's default for the "kk" key. The normalization property of the object returned by resolvedOptions remains a boolean.
This change could be made today or in the second edition. If we make it in the second edition, implementations of the first edition would interpret "locale" as true, because "locale" is truthy. The conformance clause does not allow implementations to add support for this value on their own.
4. Add "locale" to the set of acceptable values of the kk key of BCP 47. The Internationalization API would use this, if the normalization property of options is undefined, to map to the appropriate boolean value.
This can't happen today, and I'm not sure it's really required. Turning off normalization is primarily an optimization and so should be under application control.
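The proposed normalization: "locale" value could resolve along these lines (a hypothetical helper: resolveNormalization is not part of any specification, and the mapping shown is only a sketch of the behavior described above):

```javascript
// Hypothetical resolution of the normalization option to the boolean
// that resolvedOptions() would report.
function resolveNormalization(requested, localeDefault) {
  if (requested === undefined) return localeDefault; // no preference: locale decides
  if (requested === "locale") return localeDefault;  // proposed new value
  return Boolean(requested);                         // explicit true/false
}

console.log(resolveNormalization(undefined, true)); // true
console.log(resolveNormalization("locale", false)); // false
console.log(resolveNormalization(false, true));     // false

// The pitfall noted above: a first-edition implementation that doesn't
// know "locale" would coerce it and get true, because "locale" is truthy.
console.log(Boolean("locale")); // true
```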
Comments?
Norbert
In view of the schedule, I suggest that we make your first, minimal change right now, and plan to correct it along one of the other lines in the next edition.
#1 is much weaker than we want, so we should correct it, but we can do that in edition 2.
Mark (plus.google.com/114199149796022210033)
It was too weak indeed; I added the requirement that normalization is turned on by default.
Norbert
That works (for now).
Mark (plus.google.com/114199149796022210033)
I changed the subject because this question also affects the ECMAScript Language Specification.
Section 15.5.4.9, String.prototype.localeCompare (that), has said since ES3: "the function is required ... and to return 0 when comparing two strings that are considered canonically equivalent by the Unicode standard." ecma-international.org/ecma-262/5.1/#sec-15.5.4.9
I assume this requirement goes back to Unicode Technical Standard #10, Unicode Collation Algorithm, whose conformance clause C1 says (and has said since 1999): "Given a well-formed Unicode Collation Element Table, a conformant implementation shall replicate the same comparisons of strings as those produced by Section 4, Main Algorithm. In particular, a conformant implementation must be able to compare any two canonical-equivalent strings as being equal, for all Unicode characters supported by that implementation." unicode.org/reports/tr10/#Conformance
How can the default behavior of ICU be reconciled with this conformance clause?
I brought up the issue of collation and normalization before, but didn't get much feedback: esdiscuss/2012-June/thread.html#23568
Thanks, Norbert