Internationalization: Strings as locales argument

# Norbert Lindenberg (13 years ago)

When a String value is passed as the locales argument to the Intl constructors Collator, NumberFormat, and DateTimeFormat, or to their supportedLocalesOf methods, it is currently interpreted as a list of its individual characters:

  • "" is interpreted as an empty array, which is acceptable (the constructors will generally fall back to the default locale).
  • "en-US" is interpreted as the array ["e", "n", "-", "U", "S"]. Individual characters cannot be valid language tags, so a RangeError is thrown.

This happens because the CanonicalizeLocaleList abstract operation, following the path beaten by many built-ins in ES5, calls ToObject on the argument, and then finds length and indexed properties in the resulting String object.
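The character-splitting behavior can be reproduced with a few lines of JavaScript. This is a hypothetical model (the function name is made up), assuming ToUint32 is applied to length as in the ES5-style array algorithms. It is not the specification text, and tag validation and canonicalization are left out.

```javascript
// Rough model of the current CanonicalizeLocaleList input handling:
// ToObject, then read "length" and the indexed properties.
// Tag validation and canonicalization are deliberately left out.
function legacyLocaleListSketch(locales) {
  if (locales === undefined) return []; // undefined means "use the default locale"
  if (locales === null) throw new TypeError("Cannot convert null to object"); // ToObject(null) throws
  const obj = Object(locales); // a String value becomes a String object here
  const len = obj.length >>> 0; // ToUint32(length)
  const result = [];
  for (let k = 0; k < len; k++) {
    if (k in obj) result.push(String(obj[k])); // each character becomes a "tag"
  }
  return result;
}
```

With this model, `legacyLocaleListSketch("en-US")` returns `["e", "n", "-", "U", "S"]` and `legacyLocaleListSketch("")` returns `[]`, matching the two cases listed above.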

Using ToObject is an attempt to make (almost) any input usable by the rest of the operation, but from an application point of view it fails: the most natural input, a single locale passed as a string, is turned into a list of characters.

Given that many applications will only have a single locale to pass in, I think it would be better to recognize strings specifically, and treat a String value by wrapping it into an array for further processing. This means inserting the following before the current step 3 of CanonicalizeLocaleList:

(3) If locales is a String value, then (3a) Let locales be a new array created as if by the expression new Array(locales) where Array is the standard built-in constructor with that name and locales is the value of locales.

The result would be that "" is rejected with a RangeError, but "en-US" is processed as ["en-US"].
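A minimal sketch of the proposed order of operations, assuming a deliberately rough regular expression in place of the real BCP 47 validity check (IsStructurallyValidLanguageTag) and skipping canonicalization; the function and constant names are hypothetical. For a String argument, `[locales]` has the same effect as `new Array(locales)`, since the Array constructor only treats a single numeric argument as a length.

```javascript
// Rough stand-in for IsStructurallyValidLanguageTag: accepts "subtag(-subtag)*"
// shapes only. It is NOT the BCP 47 grammar (e.g. it rejects "x-private").
const ROUGH_TAG = /^[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*$/;

function canonicalizeLocaleListSketch(locales) {
  if (locales === undefined) return [];
  // Proposed new step: wrap a String value in a one-element array.
  if (typeof locales === "string") locales = [locales];
  if (locales === null) throw new TypeError("Cannot convert null to object");
  const obj = Object(locales);
  const len = obj.length >>> 0;
  const seen = [];
  for (let k = 0; k < len; k++) {
    if (!(k in obj)) continue;
    const tag = String(obj[k]);
    if (!ROUGH_TAG.test(tag)) {
      throw new RangeError(`Invalid language tag: "${tag}"`);
    }
    // The real operation canonicalizes before removing duplicates; here
    // duplicates are removed by exact string match only.
    if (!seen.includes(tag)) seen.push(tag);
  }
  return seen;
}
```

Here `canonicalizeLocaleListSketch("en-US")` returns `["en-US"]`, while `canonicalizeLocaleListSketch("")` now throws a RangeError instead of silently producing an empty list.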

Any objections?

Thanks, Norbert

# Brendan Eich (13 years ago)

Norbert Lindenberg wrote:

(3) If locales is a String value, then (3a) Let locales be a new array created as if by the expression new Array(locales) where Array is the standard built-in constructor with that name and locales is the value of locales.

The result would be that "" is rejected with a RangeError, but "en-US" is processed as ["en-US"].

Any objections?

No, looks good -- using the standard built-in constructor makes the temporary Array instance unobservable. Clearly the right answer is to special-case a String-type argument before falling back on ToObject. One might question using ToObject at all, though. If you really only expect a string or else an object with indexed elements bounded by .length, you could throw on non-string primitives.
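The stricter alternative suggested here could look roughly like this: keep the String special case, then throw a TypeError for any other primitive instead of passing it through ToObject (which would turn, say, 42 into a Number object with no length, i.e. an empty list). A hypothetical sketch; the function name is invented and tag validation is omitted.

```javascript
// Hypothetical stricter input handling: strings are wrapped, objects are read as
// array-likes, and every other primitive (except undefined) is rejected.
function strictLocaleListInputSketch(locales) {
  if (locales === undefined) return [];
  if (typeof locales === "string") return [locales];
  const isObject =
    (typeof locales === "object" && locales !== null) || typeof locales === "function";
  if (!isObject) {
    const kind = locales === null ? "null" : typeof locales;
    throw new TypeError(`locales must be a string or an array-like object, got ${kind}`);
  }
  const len = locales.length >>> 0;
  const result = [];
  for (let k = 0; k < len; k++) {
    if (k in locales) result.push(String(locales[k]));
  }
  return result;
}
```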

The ToObject uses in clause 15 in ECMA-262 are mostly for processing |this| when passed to a generic Array method, with a lesser cohort in 15.2.* for Object prototype and constructor methods. Not IMHO strong precedent for new locale/locale-list-bearing APIs.

# Phillips, Addison (13 years ago)

The result would be that "" is rejected with a RangeError, but "en-US" is processed as ["en-US"].

Would there be some means of referencing the "root" locale other than using the empty string?

Also, one means of assigning a locale would be to scrape one or another @lang attribute in some HTML or XML content. If that attribute were empty, would RangeError be an expected outcome? Wouldn't it be better to handle the empty string gracefully, since it isn't necessarily an error condition?

Addison

# Norbert Lindenberg (13 years ago)

Empty strings are not valid BCP 47 language tags and would not qualify with or without my proposed change. Without the change, they'd be interpreted as an empty list, so the constructors would eventually fall back to the default locale, while supportedLocalesOf would return an empty array.

UTS 35, section 3.2.2, specifies that the Unicode locale identifier "root" is mapped to the BCP 47 language tag "und". I once proposed that our API should require support for a language-independent locale (to the extent that that's possible); that proposal didn't find approval.
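The "root" to "und" mapping can be illustrated with a hypothetical helper that covers only two conversion steps (the identifier "root" becomes "und", and underscore separators become hyphens); UTS 35 specifies further rules that this sketch ignores.

```javascript
// Hypothetical helper covering only two UTS 35 conversion steps; other rules
// (e.g. for extensions or legacy identifiers) are not handled.
function unicodeLocaleIdToBcp47Sketch(id) {
  if (id.toLowerCase() === "root") return "und"; // language-independent "root" → "und"
  return id.replace(/_/g, "-"); // e.g. "en_US" → "en-US"
}
```

"und" is a structurally valid tag, so the Intl constructors accept it; but since no language-independent locale is required, an implementation may simply resolve it to its default locale.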

In XML and HTML, an empty language tag means "no language information available" or "primary language is unknown" [1, 2]. If language-sensitive operations are needed within such content, someone has to decide which language to assume. Should that be the Internationalization API? Maybe an application would find "und" more appropriate in this situation than the default locale?

Norbert

[1] www.w3.org/TR/2008/REC-xml-20081126/#sec-lang-tag
[2] dev.w3.org/html5/spec/global-attributes.html#the-lang-and-xml:lang

# Phillips, Addison (13 years ago)

The empty string isn't a valid BCP 47 language tag, but it is a valid value for xml:lang and HTML @lang. So my main concern would be to make the "pass through" somewhat seamless. That is, I'm more concerned with that becoming an error condition than I am concerned with what happens with that value.

I agree that the empty value isn't very useful in determining what to do. For what it's worth, I supported a language-independent "root" locale, although there is scant difference between the empty string and using the "und" tag. I tend, personally, to favor the empty string over "und", because the "und" tag makes it look like there is data there (harder to special case). But the problem remains of what behavior to assign to the "no locale available" case and whether that should be normative or implementation defined.

Addison

# Norbert Lindenberg (13 years ago)

Let's phrase this as clear alternatives for what the spec could say:

(1) Don't say anything about empty strings specifically. With the change discussed earlier, where a string is mapped to an array containing the string, empty strings will result in a RangeError because they're not valid BCP 47 language tags.

Pro: This lets application developers decide how the empty string should be handled, possibly using different choices for HTML/XML contexts and other contexts. We can pick the more popular answer in a later edition of the spec after seeing what application developers do.

Con: Application developers have to deal with this case. Some will forget to do so and be surprised by exceptions.

(2) Treat an empty string like undefined. Constructors will use the default locale; supportedLocalesOf will return an empty list.

Pro: Applications don't get exceptions for this case; we're mapping the HTML/XML notion of "we don't know the locale" to our notion of "we don't know the locale".

Con: The default behavior may not be appropriate in all cases. We deviate from BCP 47 also in cases where no HTML/XML is involved.

(3) Require that implementations interpret an empty string as a request for a (mostly) language independent locale.

Pro: This may be what applications want to happen when using an empty string; it may be appropriate in the case where HTML/XML don't provide a lang attribute.

Con: The spec doesn't require support for a language independent locale, and doesn't say what it would look like, so this doesn't really guarantee anything to application developers.
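Under alternative (1) the decision moves into application code. A hypothetical helper (its name and the policy strings are made up for illustration) shows how an application reading lang attributes might choose among behaviors that correspond to the three alternatives.

```javascript
// Hypothetical application-side helper: maps an HTML/XML lang attribute value to
// a locales argument for the Intl constructors. The policy names are invented.
function localesFromLangAttribute(lang, policy) {
  if (lang !== "") return lang; // non-empty values are passed through unchanged
  switch (policy) {
    case "pass-through": // like (1): "" reaches the API and is rejected as an invalid tag
      return lang;
    case "default": // like (2): undefined makes the constructors use the default locale
      return undefined;
    case "und": // closest to (3): request the "undetermined" tag; support is not guaranteed
      return "und";
    default:
      throw new TypeError(`Unknown policy: ${policy}`);
  }
}
```

In a browser, usage might be `new Intl.Collator(localesFromLangAttribute(element.lang, "default"))`, where `element` is assumed to be some DOM element.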

I'm leaning towards (1).

Comments?

Norbert

# Eric Albright (13 years ago)

I like 1 as well.

# Norbert Lindenberg (13 years ago)

With the demise of LocaleList objects, there's no generic way to say "default locale" anymore.

Constructors will use the default locale if you don't specify a locale and request the Lookup algorithm. If you really need to know, you can use that to determine the default locale:

    var format = new Intl.DateTimeFormat([], {localeMatcher: "lookup"});
    var defaultLocale = format.resolvedOptions().locale;

There's no standard way to say "language-independent locale", and there's no requirement to support such a locale.
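The lookup technique above can be wrapped in a function, and supportedLocalesOf can be used to ask whether an implementation claims support for "und". A sketch with hypothetical function names; since support for a language-independent locale is not required, a false result is a legitimate outcome, and the actual result varies between implementations.

```javascript
// Determines the default locale: as described above, an empty locale list with
// the "lookup" matcher makes the constructor fall back to the default locale.
function defaultLocaleSketch() {
  const format = new Intl.DateTimeFormat([], { localeMatcher: "lookup" });
  return format.resolvedOptions().locale;
}

// Reports whether the implementation lists "und" among the supported locales
// for collation. There is no requirement that it does.
function claimsUndSupportSketch() {
  return Intl.Collator.supportedLocalesOf(["und"], { localeMatcher: "lookup" }).length > 0;
}
```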

Norbert