Language Negotiation API

# Zbigniew Braniecki (12 years ago)

Currently, ECMA 402 specifies a pretty nice language negotiation algorithm... and keeps it private.

While working on l10n frameworks, we need to negotiate between at least two parties, the application and the user preferences, in the very same way the I18n API does. If the language negotiation bits were exposed, we could just use them and keep language choices between l10n and i18n in sync.

So, what do we need?

While working on L20n we identified two functions as crucial:

1) CanonicalizeLanguageTag [1]

Because language tags come from developers and users, the ability to canonicalize them is crucial to us. ECMA 402 specifies this function, and all we need is to expose it in the API.

1.1) CanonicalizeLocaleList [2]

That would also be nice to have :)

2) LookupAvailableLocales

This function has an almost identical heuristic to LookupSupportedLocales [3], the single difference being in step d).

Replace:

"If availableLocale is not undefined, then append locale to the end of subset."

with:

"If availableLocale is not undefined, then append availableLocale to the end of subset."

The reason behind this is that localization frameworks need to choose the available locales that most closely match the user preferences. If we used LookupSupportedLocales, we would receive the locales that the user requested, not the ones that are available on the system. As a result, for each of those we'd have to call BestAvailableLocale [4] to get the tag name that we can pull resources for.

With that one change, we actually receive the right set of language tags, which we can then use to provide the best language with fallbacks.

An example implementation of this is the L20n localization framework [5], which copies Mozilla's ECMA 402 code to expose the required functions and uses a custom function called prioritizeLocales to build the final locale fallback chain.
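The proposed change to step d) can be sketched in plain JavaScript. This is a minimal illustration, not the L20n or Mozilla code: the function names are hypothetical, the input tags are assumed to be canonicalized already, the extension-stripping regular expression is a simplification, and skipping duplicates in the result is an added assumption that the proposal does not spell out.

```javascript
// BestAvailableLocale (ECMA-402 9.2.2): truncate the tag at its last "-"
// until it matches an available locale, or give up.
function bestAvailableLocale(availableLocales, locale) {
  let candidate = locale;
  while (true) {
    if (availableLocales.includes(candidate)) return candidate;
    let pos = candidate.lastIndexOf("-");
    if (pos === -1) return undefined;
    // Also drop a single-character subtag (a singleton such as "u" or "x").
    if (pos >= 2 && candidate[pos - 2] === "-") pos -= 2;
    candidate = candidate.substring(0, pos);
  }
}

// Hypothetical LookupAvailableLocales: same loop as LookupSupportedLocales,
// but it collects the matched available locale instead of the requested one.
function lookupAvailableLocales(availableLocales, requestedLocales) {
  const subset = [];
  for (const locale of requestedLocales) {
    // Simplified removal of Unicode extension sequences ("-u-...").
    const noExtensionsLocale = locale.replace(/-u(?:-[0-9a-z]{2,8})+/gi, "");
    const availableLocale = bestAvailableLocale(availableLocales, noExtensionsLocale);
    // Assumption: skip duplicates so the fallback chain has no repeats.
    if (availableLocale !== undefined && !subset.includes(availableLocale)) {
      subset.push(availableLocale);
    }
  }
  return subset;
}

console.log(
  lookupAvailableLocales(["en", "de", "fr"], ["de-AT-u-co-phonebk", "en-US", "pl"])
); // [ 'de', 'en' ]
```

With LookupSupportedLocales the same call would yield the requested tags ("de-AT-u-co-phonebk", "en-US"), which do not name any resource the application actually ships.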

Comments? Feedback? Next steps? :)

Cheers,
g.
-- 

Mozilla (http://www.mozilla.org)

[1] http://ecma-international.org/ecma-402/1.0/index.html#sec-6.2.3
[2] http://ecma-international.org/ecma-402/1.0/index.html#sec-9.2.1
[3] http://ecma-international.org/ecma-402/1.0/index.html#sec-9.2.6
[4] http://ecma-international.org/ecma-402/1.0/index.html#sec-9.2.2
[5] https://github.com/l20n/l20n.js/blob/master/lib/l20n/intl.js#L431

# Andy Earnshaw (12 years ago)

On Thu, Jul 11, 2013 at 11:52 PM, Zbigniew Braniecki <zbraniecki at mozilla.com> wrote:

> 1) CanonicalizeLanguageTag [1]
>
> Because language tags come from developers and users, the ability to canonicalize them is crucial to us. ECMA 402 specifies this function, and all we need is to expose it in the API.

I was thinking the same thing recently, at least for CanonicalizeLanguageTag. I was working with a platform that gave me a language tag in non-canonical form, meaning I had to either canonicalize it or rename my language files to match the same non-canonical form. Exposing it as Intl.canonicalizeLanguageTag(tag) seems like a good idea.

> 1.1) CanonicalizeLocaleList [2]
>
> That would also be nice to have :)

I don't think you could expose CanonicalizeLocaleList directly without altering it to return an array; you'd have to do something similar to step 5 of LookupSupportedLocales. I'm not sure we could change that function in the spec without other abstracts potentially being affected by a tainted Array.prototype, so I guess you'd need to specify a new function. In which case I'm wondering if maybe you'd be better off with Intl.canonicalizeTags(tags), which would cover both CanonicalizeLanguageTag() and CanonicalizeLocaleList().

> 2) LookupAvailableLocales
>
> This function has an almost identical heuristic to LookupSupportedLocales [3], the single difference being in step d).
>
> Replace:
>
> "If availableLocale is not undefined, then append locale to the end of subset."
>
> with:
>
> "If availableLocale is not undefined, then append availableLocale to the end of subset."
>
> The reason behind this is that localization frameworks need to choose the available locales that most closely match the user preferences. If we used LookupSupportedLocales, we would receive the locales that the user requested, not the ones that are available on the system. As a result, for each of those we'd have to call BestAvailableLocale [4] to get the tag name that we can pull resources for.

You can at least work around this for a single locale with Intl.NumberFormat(tag).resolvedOptions().locale. If you're already using the native localisation APIs, this might not be too much of a hindrance. What you're suggesting would need to be a function property of the constructors, e.g. Intl.NumberFormat.availableLocalesOf(). I'm not so sure this approach makes sense, though; wouldn't you still have a problem if your own API provided variant data where the system does not?
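As a rough illustration of that workaround (the helper name is made up, and the resolved value depends entirely on the locale data shipped with the engine), the constructor can be asked which locale it settled on:

```javascript
// Hypothetical helper: ask Intl.NumberFormat which locale it resolved
// the requested tag to. If nothing matches, the engine's default locale
// is used, so the result may be unrelated to the request.
function resolveNumberFormatLocale(tag) {
  return new Intl.NumberFormat(tag).resolvedOptions().locale;
}

// supportedLocalesOf echoes the requested tags that are supported,
// whereas resolvedOptions().locale reports the locale actually in use.
console.log(Intl.NumberFormat.supportedLocalesOf(["de-AT", "en-US"]));
console.log(resolveNumberFormatLocale("de-AT"));
```

This only covers one tag at a time and only the locales the engine itself supports, which is the limitation raised in the question above.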

Sorry g, forgot the Cc :-)


# André Bargull (12 years ago)

> I was thinking the same thing recently, at least for CanonicalizeLanguageTag. I was working with a platform that gave me a language tag in non-canonical form, meaning I had to either canonicalize it or rename my language files to match the same non-canonical form. Exposing it as Intl.canonicalizeLanguageTag(tag) seems like a good idea.

Only exposing CanonicalizeLanguageTag does not seem useful to me without having access to IsStructurallyValidLanguageTag. Most likely a combined IsStructurallyValidLanguageTag + CanonicalizeLanguageTag function is necessary/wanted for most use cases.

> I don't think you could expose CanonicalizeLocaleList directly without altering it to return an array; you'd have to do something similar to step 5 of LookupSupportedLocales. I'm not sure we could change that function in the spec without other abstracts potentially being affected by a tainted Array.prototype, so I guess you'd need to specify a new function. In which case I'm wondering if maybe you'd be better off with Intl.canonicalizeTags(tags), which would cover both CanonicalizeLanguageTag() and CanonicalizeLocaleList().

I don't see why you'd need to change CanonicalizeLocaleList at all. Just let it return the internal list as-is, and then define Intl.canonicalizeLocaleList like so:

Intl.canonicalizeLocaleList(locales):

1. Let canonicalizedLocaleList be the result of CanonicalizeLocaleList(locales).
2. ReturnIfAbrupt(canonicalizedLocaleList).
3. Return CreateArrayFromList(canonicalizedLocaleList).

(ReturnIfAbrupt and CreateArrayFromList are defined in ES6 as internal abstract operations.)

It also needs to be considered whether the duplicate removal in CanonicalizeLocaleList creates any issues for users of a potential Intl.canonicalizeLocaleList or Intl.canonicalizeTags function.
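For readers trying this out today: later editions of ECMA-402 added Intl.getCanonicalLocales, which behaves much like the wrapper outlined in the steps above. A small sketch, assuming an engine that implements it, shows both the plain array result and the duplicate removal in question:

```javascript
// Assumes Intl.getCanonicalLocales is available (it did not exist when
// this thread was written).
const requested = ["EN-us", "de-at", "en-US"];
const canonical = Intl.getCanonicalLocales(requested);

console.log(Array.isArray(canonical)); // true
// "EN-us" and "en-US" canonicalize to the same tag, so only one survives.
console.log(canonical); // [ 'en-US', 'de-AT' ]
```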

- André

# Andy Earnshaw (12 years ago)

On Sat, Jul 13, 2013 at 1:05 PM, André Bargull <andre.bargull at udo.edu> wrote:

> ... Only exposing CanonicalizeLanguageTag does not seem useful to me without having access to IsStructurallyValidLanguageTag. Most likely a combined IsStructurallyValidLanguageTag + CanonicalizeLanguageTag function is necessary/wanted for most use cases.

Hmm. I'm not sure I'd agree it's necessary. IsStructurallyValidLanguageTag makes sense as an abstract function because you need to throw accordingly when an invalid tag is passed to the constructors or methods. However, it's still the developer's responsibility to make sure their tags are valid during the development process. Canonicalisation would still throw an error if the tag is invalid.

> I don't see why you'd need to change CanonicalizeLocaleList at all. Just let it return the internal list as-is, and then define Intl.canonicalizeLocaleList like so:

Lists are internal, they aren't part of the ECMAScript language. It makes no sense to return an internal list to ECMAScript code unless you intend to go the whole hog and specify them with a constructor/prototype.

> It also needs to be considered whether the duplicate removal in CanonicalizeLocaleList creates any issues for users of a potential Intl.canonicalizeLocaleList or Intl.canonicalizeTags function.

Perhaps. Are there any cases you can think of where removing duplicates would be a problem?

Andy

# André Bargull (12 years ago)

On 7/13/2013 8:48 PM, Andy Earnshaw wrote:

> On Sat, Jul 13, 2013 at 1:05 PM, André Bargull <andre.bargull at udo.edu> wrote:
>
>> ... Only exposing CanonicalizeLanguageTag does not seem useful to me without having access to IsStructurallyValidLanguageTag. Most likely a combined IsStructurallyValidLanguageTag + CanonicalizeLanguageTag function is necessary/wanted for most use cases.
>
> Hmm. I'm not sure I'd agree it's necessary. IsStructurallyValidLanguageTag makes sense as an abstract function because you need to throw accordingly when an invalid tag is passed to the constructors or methods. However, it's still the developer's responsibility to make sure their tags are valid during the development process. Canonicalisation would still throw an error if the tag is invalid.

CanonicalizeLanguageTag isn't even defined for non-structurally valid language tags. That's why I meant a combined IsStructurallyValidLanguageTag + CanonicalizeLanguageTag function is more useful than access to the bare CanonicalizeLanguageTag function.

>> I don't see why you'd need to change CanonicalizeLocaleList at all. Just let it return the internal list as-is, and then define Intl.canonicalizeLocaleList like so:
>
> Lists are internal, they aren't part of the ECMAScript language. It makes no sense to return an internal list to ECMAScript code unless you intend to go the whole hog and specify them with a constructor/prototype.

The internal list structure is not returned to user code; instead, a possible Intl.canonicalizeLocaleList function is a simple wrapper around CanonicalizeLocaleList that performs the necessary conversion from list to array. That's exactly the point of the algorithm steps in my previous mail.

>> It also needs to be considered whether the duplicate removal in CanonicalizeLocaleList creates any issues for users of a potential Intl.canonicalizeLocaleList or Intl.canonicalizeTags function.
>
> Perhaps. Are there any cases you think of where removing duplicates would be a problem?

I thought about use cases when a user assumes the i-th element of the output array is the canonicalised value of the i-th element in the input array. I can't tell whether this is a valid use case - I've only implemented ECMA-402, so I know a bit about the spec, but never actually used it in an application...
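To make the positional concern concrete, here is a small sketch of a caller who wants output index i to correspond to input index i. The helper name is hypothetical, and it leans on Intl.getCanonicalLocales from later ECMA-402 editions as a stand-in for the list-level function discussed here:

```javascript
// Hypothetical helper: canonicalize each tag on its own so that positions
// are preserved, at the cost of keeping duplicates.
function canonicalizeEach(tags) {
  return tags.map((tag) => Intl.getCanonicalLocales(tag)[0]);
}

const tags = ["EN-us", "de-at", "en-US"];
console.log(canonicalizeEach(tags));         // [ 'en-US', 'de-AT', 'en-US' ]
console.log(Intl.getCanonicalLocales(tags)); // [ 'en-US', 'de-AT' ]
```

The second result is shorter than the input, so any code that indexes into it by input position would be off after the first duplicate.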

- André

# Norbert Lindenberg (12 years ago)

On Jul 13, 2013, at 12:37 , André Bargull <andre.bargull at udo.edu> wrote:

> CanonicalizeLanguageTag isn't even defined for non-structurally valid language tags. That's why I meant a combined IsStructurallyValidLanguageTag + CanonicalizeLanguageTag function is more useful than access to the bare CanonicalizeLanguageTag function.

Correct. As currently specified, the CanonicalizeLanguageTag abstract operation assumes that its input is a String value that's a structurally valid language tag. An API cannot make such assumptions - it has to be ready to deal with any input, as well as the absence of input. It has to do something like the steps in CanonicalizeLocaleList 8.c.ii-iv before calling the current CanonicalizeLanguageTag.
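A guarded entry point along those lines could look roughly like this. It is a sketch only: the function name is hypothetical, and Intl.getCanonicalLocales (added in later ECMA-402 editions) stands in for the combined IsStructurallyValidLanguageTag and CanonicalizeLanguageTag steps, since the abstract operations themselves are not callable from script.

```javascript
// Hypothetical public wrapper mirroring CanonicalizeLocaleList 8.c.ii-iv.
function canonicalizeLanguageTag(value) {
  const isObject =
    (typeof value === "object" && value !== null) || typeof value === "function";
  // ii. Anything that is neither a String nor an Object is a TypeError.
  if (typeof value !== "string" && !isObject) {
    throw new TypeError("Language tag must be a String or an Object");
  }
  // iii. Convert to a string.
  const tag = String(value);
  // iv. and the canonicalization itself: a structurally invalid tag makes
  // Intl.getCanonicalLocales throw a RangeError.
  return Intl.getCanonicalLocales(tag)[0];
}

console.log(canonicalizeLanguageTag("EN-us")); // en-US
```

Calling it with no argument, or with a number, throws a TypeError; calling it with a malformed tag such as "en_US" throws a RangeError.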

Before we get too much into spec details: Do others believe that exposing API as proposed by Zbigniew would be useful?

Norbert

# Andy Earnshaw (12 years ago)

On Sun, Jul 14, 2013 at 2:07 AM, Norbert Lindenberg <ecmascript at lindenbergsoftware.com> wrote:

>> CanonicalizeLanguageTag isn't even defined for non-structurally valid language tags. That's why I meant a combined IsStructurallyValidLanguageTag + CanonicalizeLanguageTag function is more useful than access to the bare CanonicalizeLanguageTag function.
>
> Correct. As currently specified, the CanonicalizeLanguageTag abstract operation assumes that its input is a String value that's a structurally valid language tag. An API cannot make such assumptions - it has to be ready to deal with any input, as well as the absence of input. It has to do something like the steps in CanonicalizeLocaleList 8.c.ii-iv before calling the current CanonicalizeLanguageTag.

You're both right, it assumes a string and doesn't check validity. That didn't occur to me, it's been a few months since my implementation.

> Before we get too much into spec details: Do others believe that exposing API as proposed by Zbigniew would be useful?

I certainly do, at least for Canonicalize-. I've come across one user agent that returns navigator.language in non-canonical form which presented a small problem for data I had stored with canonical file names. This was a WebKit based Smart TV platform from 2012, so it was fairly recent, there could be other platforms or frameworks that do the same.

As for LookupAvailableLocales, there might be a problem with Zbigniew's vision of it as any tags would be returned without extensions. I'm not sure if this is something that we'd need to worry about, though.

Andy

# Zbigniew Braniecki (12 years ago)

> As for LookupAvailableLocales, there might be a problem with Zbigniew's vision of it as any tags would be returned without extensions. I'm not sure if this is something that we'd need to worry about, though.

No, that's good, because locales will be stored under names without them as well.

Cheers,
zb.

# Andy Earnshaw (12 years ago)

Would you expect to support the same locales as Intl constructors in your library? Can you safely make that assumption?

Canonicalisation makes sense because I would expect a library to canonicalise the tag and then try and load the file containing relevant data whether the native API supports it or not. Forgive me if I'm misunderstanding something, I didn't have a look at your project in great detail.

Andy

# Zbigniew Braniecki (12 years ago)

> Would you expect to support the same locales as Intl constructors in your library?

Yes.

> Can you safely make that assumption?

I'd have to think more about edge cases, but my initial reaction is - yes.

> Canonicalisation makes sense because I would expect a library to canonicalise the tag and then try and load the file containing relevant data whether the native API supports it or not. Forgive me if I'm misunderstanding something, I didn't have a look at your project in great detail.

There's no need to look at my project. All I'm asking is to talk about exposing an API for negotiating between the locales provided by the application and the locales requested by the user, with the result being the list of available locales that the user wants, sorted by user preference.

That enables us to load locale 0, fall back to locale 1, then to locale 2, etc.

The only crucial point here is that we need to operate on the list of available locales, not requested, because we will be selecting from the available ones.
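A minimal sketch of that fallback behaviour, assuming the negotiated chain is already known; the resource store, its keys, and the getString helper are all made up for illustration:

```javascript
// Hypothetical resource store: one flat dictionary per available locale.
const resources = {
  de: { hello: "Hallo" },
  en: { hello: "Hello", bye: "Goodbye" },
};

// Walk the negotiated chain in order of user preference and return the
// first translation found; undefined if no locale in the chain has the key.
function getString(key, localeChain, store) {
  for (const locale of localeChain) {
    const bundle = store[locale];
    if (bundle && Object.prototype.hasOwnProperty.call(bundle, key)) {
      return bundle[key];
    }
  }
  return undefined;
}

const chain = ["de", "en"]; // e.g. the result of negotiating ["de-AT", "en-US"]
console.log(getString("hello", chain, resources)); // Hallo
console.log(getString("bye", chain, resources));   // Goodbye
```

Every entry in the chain has to name a locale the application actually ships, which is why the negotiation needs to return available locales rather than requested ones.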

# Anne van Kesteren (12 years ago)

On Sun, Jul 14, 2013 at 5:20 AM, Andy Earnshaw <andyearnshaw at gmail.com> wrote:

> I certainly do, at least for Canonicalize-. I've come across one user agent that returns `navigator.language` in non-canonical form which presented a small problem for data I had stored with canonical file names. This was a WebKit based Smart TV platform from 2012, so it was fairly recent, there could be other platforms or frameworks that do the same.

FWIW, exposing a new API because another API is broken in a particular implementation is a known anti-pattern. We should fix problems at the source.

# Andy Earnshaw (12 years ago)

On Mon, Jul 15, 2013 at 9:37 PM, Anne van Kesteren <annevk at annevk.nl> wrote:

> On Sun, Jul 14, 2013 at 5:20 AM, Andy Earnshaw <andyearnshaw at gmail.com> wrote:
>
> > I certainly do, at least for Canonicalize-. I've come across one user agent that returns `navigator.language` in non-canonical form which presented a small problem for data I had stored with canonical file names. This was a WebKit based Smart TV platform from 2012, so it was fairly recent, there could be other platforms or frameworks that do the same.
>
> FWIW, exposing a new API because another API is broken in a particular implementation is a known anti-pattern. We should fix problems at the source.

Normally, I would agree. However, I was just using my scenario as an example for where exposing the API would have been useful for me. I can also think of a few other reasons:

 - Language tags can be in extlang form or canonical form. Depending on the source providing the language tag, it's not guaranteed to be the canonical form (extlang form can reinstate extlang subtags that were removed during canonicalisation).
 - The Internationalization API doesn't cover all aspects of its namesake, like translation, or formatting of postal codes or telephone numbers, as a few examples. Developer libraries could augment Intl with this data, so it would make lives easier if we exposed CanonicalizeLanguageTag to be used by such libraries.
 - Canonicalisation has at least a couple of optional steps (like normalising case or ordering variant subtags) so exposing a canonicalizing method would give developers a way to achieve consistency with the Internationalisation API.

navigator.language isn't part of any stable specification, and even the current HTML 5.1 draft doesn't specify that tags should be returned in canonical form. Do you think it would be a good idea to raise an issue for this?

Andy
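
To make the "normalising case" step mentioned above concrete, here is a rough sketch of what a library has to hand-roll when no canonicalization function is exposed. `normalizeTagCase` is a hypothetical name, and the function covers only the BCP 47 casing convention (lowercase everywhere, except that two-letter subtags become uppercase and four-letter subtags become titlecase when they are neither first nor after a singleton). It does not validate the tag, replace deprecated subtags, or handle extlang forms.

```javascript
// Hypothetical helper illustrating only the case-normalisation step.
function normalizeTagCase(tag) {
  let afterSingleton = false;
  return tag
    .split("-")
    .map((subtag, index) => {
      const lower = subtag.toLowerCase();
      let result = lower;
      if (index > 0 && !afterSingleton) {
        if (lower.length === 2) {
          // Region subtag, e.g. "us" becomes "US".
          result = lower.toUpperCase();
        } else if (/^[a-z]{4}$/.test(lower)) {
          // Script subtag, e.g. "hans" becomes "Hans".
          result = lower[0].toUpperCase() + lower.slice(1);
        }
      }
      // Everything after a singleton ("u", "x", ...) stays lowercase.
      if (lower.length === 1) {
        afterSingleton = true;
      }
      return result;
    })
    .join("-");
}
```

For example, `normalizeTagCase("ZH-hans-cn")` gives `"zh-Hans-CN"`. Every library that reimplements this risks diverging from what the Intl constructors do internally, which is the consistency concern raised in the list above.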

# Anne van Kesteren (12 years ago)

On Mon, Jul 15, 2013 at 7:51 PM, Andy Earnshaw <andyearnshaw at gmail.com> wrote:

> navigator.language isn't part of any stable specification, and even the current HTML 5.1 draft doesn't specify that tags should be returned in canonical form. Do you think it would be a good idea to raise an issue for this?

Filed https://www.w3.org/Bugs/Public/show_bug.cgi?id=22681

-- annevankesteren.nl

# Ian Hickson (12 years ago)

On Tue, 16 Jul 2013, Andy Earnshaw wrote:

> navigator.language isn't part of any stable specification

It's part of the HTML standard:

http://whatwg.org/html/#language-preferences

...which is very stable at this point (there's basically no way that part of the spec can change in an incompatible fashion, since it's widely implemented; the only possible changes are those that approach reality more, and those that add features).

> and even the current HTML 5.1 draft doesn't specify that tags should be returned in canonical form. Do you think it would be a good idea to raise an issue for this?

Fixed. (A change that approaches reality more.)

# Zbigniew Braniecki (12 years ago)

Anne van Kesteren <mailto:annevk at annevk.nl> July 15, 2013 1:37 PM

> FWIW, exposing a new API because another API is broken in a particular implementation is a known anti-pattern. We should fix problems at the source.

Good point, but I believe that there are more potential sources of language tags passed to language negotiation, including programmatic composition, data from unknown sources (databases etc.), or even tags entered manually by the user.

Having a function that lets us canonicalize a tag (even just the simplest part of that, normalizing upper/lower case) allows us to use comparison operators (langTag1 == langTag2) or, in the localization case, to build a path to the resource on case-sensitive systems.
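
For readers coming to this thread later: a sketch of the two uses named above, written against `Intl.getCanonicalLocales`, which was added to ECMA 402 after this discussion and so was not available to anyone in the thread. The helper names `sameLanguageTag` and `resourcePath`, and the `locales/<tag>/<file>` layout, are assumptions made for illustration only.

```javascript
// Compare two tags after canonicalization rather than as raw strings.
// Intl.getCanonicalLocales throws a RangeError for structurally invalid tags.
function sameLanguageTag(a, b) {
  const [canonicalA] = Intl.getCanonicalLocales(a);
  const [canonicalB] = Intl.getCanonicalLocales(b);
  return canonicalA === canonicalB;
}

// Hypothetical resource layout: locales/<canonical tag>/<file name>.
// Canonical casing matters when the file system is case sensitive.
function resourcePath(tag, fileName) {
  const [canonical] = Intl.getCanonicalLocales(tag);
  return "locales/" + canonical + "/" + fileName;
}
```

Here `sameLanguageTag("EN-us", "en-US")` is true even though the raw strings differ, and `resourcePath("pt-br", "main.l20n")` yields `"locales/pt-BR/main.l20n"` regardless of how the tag was typed or stored.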
