Internationalization: Additional values in API
I tend to agree with your proposal.
Some caveats below.
Mark
plus.google.com/114199149796022210033
— Il meglio è l’inimico del bene —
On Tue, Jun 26, 2012 at 3:22 PM, Norbert Lindenberg <ecmascript at norbertlindenberg.com> wrote:
The TC 39 meeting on 2012-05-21 decided to allow implementations to recognize property values for which the specification prescribes an Error [1]:
- Conformance
- What about already defined properties? Can we add new, implementation specific values, like v8Identical for collator sensitivity?
- We should throw if we don't recognise the value. You may recognise additional property values.
I'd like to propose a more restricted escape hatch, to be added to the existing allowances for additional objects, properties, and functions in the Conformance clause:
<spec> In the following cases where the specification requires that a RangeError is thrown for unacceptable input values, implementations may define additional acceptable input values for which the RangeError is not thrown:
- The options property localeMatcher in all constructors and supportedLocalesOf methods.
- The options properties usage and sensitivity in the Collator constructor.
- The options properties style, currencyDisplay, minimumIntegerDigits, minimumFractionDigits, maximumFractionDigits, minimumSignificantDigits, and maximumSignificantDigits in the NumberFormat constructor.
For the ones that are integers, it would seem odd to accept other values.
- The options property timeZone in the DateTimeFormat constructor, provided that the additional acceptable input values are case-insensitive matches of Zone or Link identifiers in the IANA time zone database [2] and are canonicalized to Zone identifiers in the casing used in the database for DateTimeFormat.resolvedOptions().timeZone, except that "Etc/GMT" shall be canonicalized to "UTC".
I agree with your reasoning below, but I would rather use the CLDR values in unicode.org/repos/cldr/trunk/common/bcp47/timezone.xml, since they are based on the TZDB but more stable. Either just names or names + aliases.
- The options properties listed in table 3 in the DateTimeFormat constructor.
- The options property formatMatcher in the DateTimeFormat constructor. </spec>
The above prevents additional values in the following cases:
Input values that lead to TypeError exceptions. These are usually not meaningful extension points.
Input values that are boolean. There just aren't additional meaningful boolean values.
Language tags that are not structurally valid. Structural validity is a quite minimal requirement, and BCP 47 itself is very extensible. Allowing additional values in the Internationalization API would only create confusion.
Currency codes that are not well-formed. Here as well, well-formedness is a quite minimal requirement, and ISO 4217 itself allows registration of any actual new currency codes. Allowing additional values in the Internationalization API would only create confusion.
Additional keys and values from Unicode Technical Standard 35, Unicode Locale Data Markup Language [3]. UTS 35 defines several keys and values that we have agreed are not useful for the Internationalization API, so we should be able to screen new ones before they're added.
I'm a bit hesitant about the screening, since it may take a while (looking at history) between updates, unless there is a lighter-weight mechanism.
Replies below.
Norbert
On Jun 26, 2012, at 16:02, Mark Davis ☕ wrote:
On Tue, Jun 26, 2012 at 3:22 PM, Norbert Lindenberg <ecmascript at norbertlindenberg.com> wrote:
- The options properties style, currencyDisplay, minimumIntegerDigits, minimumFractionDigits, maximumFractionDigits, minimumSignificantDigits, and maximumSignificantDigits in the NumberFormat constructor.
For the ones that are integers, it would seem odd to accept other values.
I should clarify that, for these, the acceptable additional values would be higher numbers, similar to the allowance for Number.prototype.toFixed in ecma-international.org/ecma-262/5.1/#sec-15.7.4.5
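To illustrate the kind of extension meant here, the following is a minimal sketch that probes whether an implementation accepts a given value for one of the digit options or throws the RangeError that the specification prescribes. The helper name acceptsDigitOption is hypothetical and not part of the API; which higher values are accepted, if any, depends on the implementation.

```javascript
// Hypothetical helper (not part of the API): reports whether the
// NumberFormat constructor accepts `value` for the digit option `name`.
function acceptsDigitOption(name, value) {
  var options = {};
  options[name] = value;
  try {
    new Intl.NumberFormat("en", options);
    return true;
  } catch (e) {
    if (e instanceof RangeError) {
      return false; // the error the specification prescribes
    }
    throw e; // anything else is unexpected
  }
}

// A value inside the specified range is accepted.
console.log(acceptsDigitOption("maximumFractionDigits", 2));
// A negative value is below the range; the proposal only contemplates
// higher numbers as additional values, so a RangeError is expected.
console.log(acceptsDigitOption("maximumFractionDigits", -1));
// A value above the specified maximum may or may not be accepted,
// depending on whether the implementation extends the range.
console.log(acceptsDigitOption("maximumFractionDigits", 1000));
```

Code written this way keeps working whether or not a particular implementation chooses to extend the range.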
- The options property timeZone in the DateTimeFormat constructor, provided that the additional acceptable input values are case-insensitive matches of Zone or Link identifiers in the IANA time zone database [2] and are canonicalized to Zone identifiers in the casing used in the database for DateTimeFormat.resolvedOptions().timeZone, except that "Etc/GMT" shall be canonicalized to "UTC".
I agree with your reasoning below, but I would rather use the CLDR values in unicode.org/repos/cldr/trunk/common/bcp47/timezone.xml, since they are based on the TZDB but more stable. Either just names or names + aliases.
If the time zone naming schemes of both IANA and CLDR were new proposals, you might be able to convince me that the CLDR scheme is better. However, the IANA time zone names have been around for much longer and are far more widely supported. The ICU API, for example, is based on IANA time zone names, and looking through the time zone related documentation I can't find any support for the CLDR names: userguide.icu-project.org/datetime/timezone, icu-project.org/apiref/icu4c/classTimeZone.html
The TC 39 meeting on 2012-05-21 decided to allow implementations to recognize property values for which the specification prescribes an Error [1]:
I'd like to propose a more restricted escape hatch, to be added to the existing allowances for additional objects, properties, and functions in the Conformance clause:
<spec>
In the following cases where the specification requires that a RangeError is thrown for unacceptable input values, implementations may define additional acceptable input values for which the RangeError is not thrown:
The above prevents additional values in the following cases:
Input values that lead to TypeError exceptions. These are usually not meaningful extension points.
Input values that are boolean. There just aren't additional meaningful boolean values.
Language tags that are not structurally valid. Structural validity is a quite minimal requirement, and BCP 47 itself is very extensible. Allowing additional values in the Internationalization API would only create confusion.
Currency codes that are not well-formed. Here as well, well-formedness is a quite minimal requirement, and ISO 4217 itself allows registration of any actual new currency codes. Allowing additional values in the Internationalization API would only create confusion.
Additional keys and values from Unicode Technical Standard 35, Unicode Locale Data Markup Language [3]. UTS 35 defines several keys and values that we have agreed are not useful for the Internationalization API, so we should be able to screen new ones before they're added.
NaN and +/- Infinity in DateTimeFormat.prototype.format. These just aren't meaningful time values.
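As an illustration of some of the cases in this list, here is a minimal sketch of inputs for which a RangeError would continue to be required under the proposal. The helper thrownErrorName is hypothetical, and the specific inputs ("en_US", "US") are only examples chosen for this sketch.

```javascript
// Hypothetical helper: runs `fn` and returns the name of the error it
// throws, or null if it completes normally.
function thrownErrorName(fn) {
  try {
    fn();
    return null;
  } catch (e) {
    return e.name;
  }
}

// "en_US" is not a structurally valid BCP 47 language tag
// (subtags are separated by hyphens, not underscores).
var badTag = thrownErrorName(function () {
  return Intl.Collator.supportedLocalesOf("en_US");
});

// "US" is not a well-formed currency code (three letters are required).
var badCurrency = thrownErrorName(function () {
  return new Intl.NumberFormat("en", { style: "currency", currency: "US" });
});

// NaN is not a meaningful time value.
var badTime = thrownErrorName(function () {
  return new Intl.DateTimeFormat("en").format(NaN);
});

// Each of these is expected to be "RangeError" in a conforming
// implementation, with no room for implementation-specific extensions.
console.log(badTag, badCurrency, badTime);
```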
The most unusual part of the proposed addition to the Conformance clause is the mini-specification for additional time zone identifiers. In the discussions on DateTimeFormat, we deferred defining support for a larger set of time zones because not all implementations are ready to support them. If we allow implementations to accept additional values, however, it's a pretty safe guess that several implementations will quickly extend the set of supported time zones, because applications need it. It's also a pretty safe guess that they'll build their support around the IANA time zone IDs [2], and that the values would not be prefixed. There may, however, be inconsistencies around case significance and around canonicalization of the names in DateTimeFormat.prototype.resolvedOptions. In this situation, I think it would be better to standardize now which values may optionally be accepted and how they're processed.
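The behavior described for time zone identifiers can be sketched roughly as follows. This is only an illustration of the proposal, not a statement about what any particular implementation does; the helper canonicalTimeZone and the sample identifiers are assumptions made for this sketch.

```javascript
// Hypothetical helper: returns the time zone name that the implementation
// reports for `id` via resolvedOptions(), or null if `id` is rejected.
function canonicalTimeZone(id) {
  try {
    return new Intl.DateTimeFormat("en", { timeZone: id })
      .resolvedOptions().timeZone;
  } catch (e) {
    if (e instanceof RangeError) {
      return null; // the implementation does not accept this value
    }
    throw e;
  }
}

// "UTC" must be supported, and it is matched case-insensitively.
console.log(canonicalTimeZone("utc"));
// IANA names are optional additional values under the proposal; where
// accepted, the result would use the casing of the database.
console.log(canonicalTimeZone("america/los_angeles"));
// Under the proposal, "Etc/GMT" would be reported as "UTC" if accepted.
console.log(canonicalTimeZone("Etc/GMT"));
// A name that is not in the database is expected to be rejected.
console.log(canonicalTimeZone("Not/AZone"));
```

Returning null for rejected values lets an application fall back to "UTC" when it runs on an implementation that has not extended the set of supported time zones.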
Comments?
Norbert
[1] esdiscuss/2012-May/022836
[2] www.iana.org/time-zones
[3] unicode.org/reports/tr35