Quasi-literals and localization

# Norbert Lindenberg (14 years ago)

The quasi-literal proposal discusses at some length text localization [1]. After reading the section and then realizing that most of the discussed functionality is not actually part of the proposal, I'd like to ask that the discussion of localization in this context be removed.

1) The discussion of the msg function is very incomplete with regards to the current state of the art in internationalized message construction. Modern message formatting libraries include support for plural and gender handling, which is not discussed here [2], and offer far more comprehensive number and date formatting than discussed here. The msg function also doesn't integrate with the ECMAScript Internationalization API [3].

2) Quasi-literals are based on the assumption that the pattern strings are normally embedded in the source code. In internationalization, that's called "hard-coded strings" and generally considered a really bad idea. Normally, you want to separate localizable text and data from the source code, so that localization can proceed and languages can be added without changes to the code. I'm aware that some companies are using a localization process that involves generating localized source files with embedded strings. This may be viable for web applications where the code is hosted on a server and sent to the browser for execution each time the application is started. I don't see how it would work for applications that actually run on the server (e.g., within Node.js) and where the server has to provide localized responses in different languages for each request. I don't think it's viable either for applications that are made available through an application store (e.g., those built with PhoneGap or Titanium) and which have to include support for multiple applications with minimal download size.

3) The workaround discussed to support purely dynamic message replacement (which I'd consider the normal path from an internationalization point of view) seems to be duplicating much of the work involved in interpreting quasi-literals, but loses the variable names on the way, and doesn't permit passing through variables that aren't used in the original quasi-literal. The latter is necessary, for example, in the context of gender handling: Sentences may be gender dependent in some localized languages while not in the source language.

Regards,
Norbert

[1] http://wiki.ecmascript.org/doku.php?id=harmony:quasis
[2] https://docs.google.com/present/view?id=ddsrrpj5_68gkkvv6hs
[3] http://wiki.ecmascript.org/doku.php?id=globalization:specification_drafts
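
To make points 1 and 3 above concrete, here is a minimal sketch of gender and plural selection. It is not taken from the quasi-literal proposal or from any library: the catalog shape, the `{placeholder}` syntax, and the function names are all assumptions made for illustration. Only `Intl.PluralRules` and `Intl.NumberFormat` are standard APIs. Note that the caller passes `gender` even though the English pattern never uses it, which is the "pass-through variable" case described in point 3.

```javascript
// Hypothetical message catalog: the shape, keys, and {placeholder}
// syntax are assumptions for this sketch, not part of any proposal.
const catalog = {
  en: {
    online: { other: '{name} is online.' }, // English ignores gender
    deleted: { one: '{count} file deleted.', other: '{count} files deleted.' },
  },
  fr: {
    online: { female: '{name} est connectée.', other: '{name} est connecté.' },
    deleted: { one: '{count} fichier supprimé.', other: '{count} fichiers supprimés.' },
  },
};

// Replace each {key} placeholder with the matching value from args.
function fill(pattern, args) {
  return pattern.replace(/\{(\w+)\}/g, (_, key) => String(args[key]));
}

// Gender selection: `gender` is always supplied by the caller, even
// though the source-language (English) pattern has no use for it.
function formatOnline(locale, { name, gender }) {
  const variants = catalog[locale].online;
  return fill(variants[gender] ?? variants.other, { name });
}

// Plural selection: Intl.PluralRules picks the plural category for the
// locale; the catalog falls back to `other` when a category is missing.
function formatDeleted(locale, count) {
  const variants = catalog[locale].deleted;
  const category = new Intl.PluralRules(locale).select(count);
  const shown = new Intl.NumberFormat(locale).format(count);
  return fill(variants[category] ?? variants.other, { count: shown });
}

console.log(formatOnline('en', { name: 'Alice', gender: 'female' }));
console.log(formatOnline('fr', { name: 'Alice', gender: 'female' }));
console.log(formatDeleted('en', 1));
console.log(formatDeleted('fr', 2));
```

A production message formatter would combine both selectors in one pattern syntax; this sketch keeps them separate only to stay short.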

# Gillam, Richard (14 years ago)

I second Norbert's concerns. I think there's something here, but how all this should work from the standpoint of internationalization isn't fully baked.

--Rich Gillam Lab126

# Mike Samuel (14 years ago)

2012/7/2 Norbert Lindenberg <ecmascript at norbertlindenberg.com>:
> The quasi-literal proposal discusses at some length text localization [1]. After reading the section and then realizing that most of the discussed functionality is not actually part of the proposal, I'd like to ask that the discussion of localization in this context be removed.

The goal of the quasi-literal proposal is not to define any APIs for
localization but only to show that it could benefit a range of
localization proposals.

> 1) The discussion of the msg function is very incomplete with regards to the current state of the art in internationalized message construction. Modern message formatting libraries include support for plural and gender handling, which is not discussed here [2], and offer far more comprehensive number and date formatting than discussed here. The msg function also doesn't integrate with the ECMAScript Internationalization API [3].

Yep.  At the time the Quasi proposal was written, that was very much
in flux.  I chatted with Nebojša Ćirić, Mark Davis and others last
March and they were planning on contributing to another proposal so I
just focused on explaining how a message extraction -> localization ->
message reintegration pipeline could work with messages in quasis, and
showing how the various concerns like l10n and security could compose.


> 2) Quasi-literals are based on the assumption that the pattern strings are normally embedded in the source code. In internationalization, that's called "hard-coded strings" and generally considered a really bad idea. Normally, you want to separate localizable text and data from the source code, so that localization can proceed and languages can be added without changes to the code. I'm aware that some companies are using a localization process that involves generating localized source files with embedded strings. This may be viable for web applications where the code is hosted on a server and sent to the browser for execution each time the application is started. I don't see how it would work for applications that actually run on the server (e.g., within Node.js) and where the server has to provide localized responses in different languages for each request. I don't think it's viable either for applications that are made available through an application store (e.g., those built with PhoneGap or Titanium) and which have to include support for multiple applications with minimal download size.

What about quasis makes dynamically choosing a message bundle based on
the locale of the current request and substituting for the source
language message particularly difficult?


> 3) The workaround discussed to support purely dynamic message replacement (which I'd consider the normal path from an internationalization point of view) seems to be duplicating much of the work involved in interpreting quasi-literals, but loses the variable names on the way, and doesn't permit passing through variables that aren't used in the original quasi-literal. The latter is necessary, for example, in the context of gender handling: Sentences may be gender dependent in some localized languages while not in the source language.

I'm a bit confused.  To which variable name are you referring?
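
The per-request bundle lookup asked about above could look roughly like the following sketch. It uses the tagged template syntax that quasi-literals became in ES2015. The `bundles` table, the `msgFor` helper, and the positional `{0}`, `{1}` key format are hypothetical; the proposal's own `msg` function may differ. The sketch also shows where the question about variable names comes from: the tag function receives substitution values by position only, so a translation can reorder them but cannot refer to them by name.

```javascript
// Hypothetical bundles, keyed by the source pattern with positional
// placeholders. A translation is free to reorder the placeholders.
const bundles = {
  de: {
    '{0} sent {1} a message.': '{1} hat eine Nachricht von {0} erhalten.',
  },
};

// Returns a tag function bound to the locale of the current request.
function msgFor(locale) {
  return function msg(strings, ...values) {
    // Rebuild the lookup key from the static parts of the template:
    // strings[0] + '{0}' + strings[1] + '{1}' + ...
    const key = strings.reduce((acc, part, i) => acc + '{' + (i - 1) + '}' + part);
    // Fall back to the source-language pattern when no translation exists.
    const pattern = (bundles[locale] && bundles[locale][key]) || key;
    return pattern.replace(/\{(\d+)\}/g, (_, i) => String(values[Number(i)]));
  };
}

const sender = 'Alice';
const recipient = 'Bob';
console.log(msgFor('de')`${sender} sent ${recipient} a message.`);
console.log(msgFor('en')`${sender} sent ${recipient} a message.`);
```

One limitation of this sketch: a source message that itself contains literal text such as `{0}` would collide with the key format, so a real implementation would need an escaping rule.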



# Mark Davis ☕ (14 years ago)

I didn't pay enough attention to the whole quasi structure, so I can't
pretend to speak intelligently about that.

We do support a number of different mechanisms for string translation, many
of them extracting strings from source files (including templating source
languages like JSPs and Soy, aka Closure Templates). Message formats are not
trivial, especially when it comes to plural and gender support. In some of
these environments, features like plurals and gender are built into the
syntax of the language (like Soy; see
<http://closure-templates.googlecode.com/svn/trunk/javadoc-complete/com/google/template/soy/msgs/restricted/IcuSyntaxUtils.html>,
although the formatting is very ugly).
In others, the programmer has to call a function to interpret a string
representation of the message format (which could be fetched from an
external source).

That can all work fine, with some provisos; there have to be
straightforward programmatic ways to:

   1. determine which are the messages in the file that need to be
   translated (with some mechanism to skip those that shouldn't be translated,
   like a literal 'http://'.)
   2. determine the structure of all of the embedded message format strings
   and map into a 'lingua franca' structure for translation (we use an XML
   structure).
   3. carry message descriptions along: these are descriptions of the
   entire message, plus meaningful names and descriptions and examples of each
   placeholder value.

I don't know whether the quasi structure makes that easier or harder. I
also haven't seen any examples where someone has taken a reasonably complex
file with quasi messages (containing plurals, gender, dates, times,
numbers, multiple placeholders with different orders, etc), extracted the
messages for translation (including all of #3), translated, and regenerated
the alternate language version of the original file (or a modified version
that uses methods to get the translated strings).

I'm not saying it can't be done, or that it hasn't been done (it could well
have been); just that I personally haven't seen it.

------------------------------
Mark <https://plus.google.com/114199149796022210033>
*— Il meglio è l’inimico del bene —*
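
The three provisos listed in the message above can be sketched with tagged templates, the ES2015 form of quasi-literals. Everything here is an assumption made for illustration: the `msg` helper, the metadata fields, and the shape of the extracted record are hypothetical, and a real extractor would more likely analyze the source statically than record messages at run time. The idea is only that (1) a tag marks which strings are translatable, so an untagged literal such as 'http://' is skipped, (2) the static parts and placeholders are mapped to a neutral structure, and (3) descriptions, names, and examples travel with the message.

```javascript
// Collected records, standing in for a neutral interchange structure.
const extracted = [];

// Hypothetical helper: takes message metadata and returns a tag function.
function msg(meta) {
  const names = meta.placeholders.map((p) => p.name);
  return function tag(strings, ...values) {
    if (names.length !== values.length) {
      throw new Error('placeholder metadata does not match the template');
    }
    // Proviso 2: derive a pattern with named placeholders from the
    // static parts of the template.
    const pattern = strings.reduce((acc, part, i) => acc + '{' + names[i - 1] + '}' + part);
    // Proviso 3: carry the description and placeholder details along.
    if (!extracted.some((entry) => entry.pattern === pattern)) {
      extracted.push({
        pattern,
        description: meta.description,
        placeholders: meta.placeholders,
      });
    }
    // Until a translation is plugged in, return the source-language text.
    return strings.reduce((acc, part, i) => acc + String(values[i - 1]) + part);
  };
}

// Proviso 1: only tagged templates are recorded.
const welcome = msg({
  description: 'Greeting shown after login.',
  placeholders: [
    { name: 'userName', description: 'Display name of the user', example: 'Alice' },
  ],
});

const user = 'Alice';
const scheme = 'http://'; // untagged literal, never extracted
console.log(welcome`Welcome back, ${user}!`);
console.log(JSON.stringify(extracted, null, 2));
```

This does not settle whether quasis make extraction easier or harder for complex messages with plurals, gender, dates, and reordered placeholders; it only shows that the static parts and the placeholder positions are available to a tag function.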
