Quasi-literals and localization

# Norbert Lindenberg (13 years ago)

The quasi-literal proposal discusses at some length text localization [1]. After reading the section and then realizing that most of the discussed functionality is not actually part of the proposal, I'd like to ask that the discussion of localization in this context be removed.

  1. The discussion of the msg function is very incomplete with respect to the current state of the art in internationalized message construction. Modern message formatting libraries include support for plural and gender handling, which is not discussed here [2], and offer far more comprehensive number and date formatting than discussed here. The msg function also doesn't integrate with the ECMAScript Internationalization API [3].

  2. Quasi-literals are based on the assumption that the pattern strings are normally embedded in the source code. In internationalization, that's called "hard-coded strings" and generally considered a really bad idea. Normally, you want to separate localizable text and data from the source code, so that localization can proceed and languages can be added without changes to the code. I'm aware that some companies are using a localization process that involves generating localized source files with embedded strings. This may be viable for web applications where the code is hosted on a server and sent to the browser for execution each time the application is started. I don't see how it would work for applications that actually run on the server (e.g., within Node.js) and where the server has to provide localized responses in different languages for each request. I don't think it's viable either for applications that are made available through an application store (e.g., those built with PhoneGap or Titanium) and which have to include support for multiple applications with minimal download size.

  3. The workaround discussed to support purely dynamic message replacement (which I'd consider the normal path from an internationalization point of view) seems to be duplicating much of the work involved in interpreting quasi-literals, but loses the variable names on the way, and doesn't permit passing through variables that aren't used in the original quasi-literal. The latter is necessary, for example, in the context of gender handling: Sentences may be gender dependent in some localized languages while not in the source language.
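To illustrate the point about lost variable names, here is a minimal sketch using the tagged template syntax that quasi-literals later became in ECMAScript 2015. The names `inspect` and `formatPositional`, and the `{0}`-style placeholder convention, are assumptions made for this illustration and are not part of the proposal.

```javascript
// Hypothetical tag function: records exactly what the runtime hands to a tag.
function inspect(strings, ...values) {
  return { literalParts: Array.from(strings), values };
}

const userName = "Alice";
const fileCount = 3;
const seen = inspect`${userName} uploaded ${fileCount} files.`;

// seen.literalParts is ["", " uploaded ", " files."]
// seen.values is ["Alice", 3]
// Nothing in `seen` records that the first value came from `userName`, so a
// translated pattern can only refer to substitutions by position.

// Assumed convention: translated patterns use {0}, {1}, ... as placeholders.
function formatPositional(pattern, values) {
  return pattern.replace(/\{(\d+)\}/g, (match, index) =>
    Number(index) < values.length ? String(values[Number(index)]) : match);
}

const german = "{0} hat {1} Dateien hochgeladen.";
console.log(formatPositional(german, seen.values));
// prints "Alice hat 3 Dateien hochgeladen."
```

A translation that needs a value the source message never substituted (for example the user's gender) has nothing to refer to here, because only the values that appear in the literal reach the tag function.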

Norbert

[1] harmony:quasis

[2] docs.google.com/present/view?id=ddsrrpj5_68gkkvv6hs

[3] globalization:specification_drafts

# Gillam, Richard (13 years ago)

I second Norbert's concerns. I think there's something here, but how all this should work from the standpoint of internationalization isn't fully baked.

--Rich Gillam Lab126

# Mike Samuel (13 years ago)

2012/7/2 Norbert Lindenberg <ecmascript at norbertlindenberg.com>:

> The quasi-literal proposal discusses at some length text localization [1]. After reading the section and then realizing that most of the discussed functionality is not actually part of the proposal, I'd like to ask that the discussion of localization in this context be removed.

The goal of the quasi-literal proposal is not to define any APIs for localization but only to show that it could benefit a range of localization proposals.

> 1. The discussion of the msg function is very incomplete with respect to the current state of the art in internationalized message construction. Modern message formatting libraries include support for plural and gender handling, which is not discussed here [2], and offer far more comprehensive number and date formatting than discussed here. The msg function also doesn't integrate with the ECMAScript Internationalization API [3].

Yep. At the time the Quasi proposal was written, that was very much in flux. I chatted with Nebojsa Ciric, Mark Davis, and others last March, and they were planning on contributing to another proposal, so I just focused on explaining how a message extraction -> localization -> message reintegration pipeline could work with messages in quasis, and showing how the various concerns like l10n and security could compose.

> 2. Quasi-literals are based on the assumption that the pattern strings are normally embedded in the source code. In internationalization, that's called "hard-coded strings" and generally considered a really bad idea. Normally, you want to separate localizable text and data from the source code, so that localization can proceed and languages can be added without changes to the code. I'm aware that some companies are using a localization process that involves generating localized source files with embedded strings. This may be viable for web applications where the code is hosted on a server and sent to the browser for execution each time the application is started. I don't see how it would work for applications that actually run on the server (e.g., within Node.js) and where the server has to provide localized responses in different languages for each request. I don't think it's viable either for applications that are made available through an application store (e.g., those built with PhoneGap or Titanium) and which have to include support for multiple applications with minimal download size.

What about quasis makes dynamically choosing a message bundle based on the locale of the current request and substituting for the source language message particularly difficult?
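As a rough illustration of what such a per-request substitution could look like, here is a sketch in the tagged template syntax that quasi-literals later became in ECMAScript 2015. The `bundles` table, the `msgFor` factory, and the `{0}`-style lookup key are all hypothetical and chosen only for this example; they are not taken from the proposal.

```javascript
// Hypothetical message bundles, keyed by locale and then by a pattern
// rebuilt from the source-language literal.
const bundles = new Map([
  ["de", new Map([["Hello, {0}!", "Hallo, {0}!"]])],
  ["fr", new Map([["Hello, {0}!", "Bonjour, {0} !"]])],
]);

// Returns a tag function bound to the locale of the current request.
function msgFor(locale) {
  const bundle = bundles.get(locale) || new Map();
  return function msg(strings, ...values) {
    // Rebuild the lookup key from the literal parts, e.g. "Hello, {0}!".
    const key = strings.reduce((acc, part, i) => acc + `{${i - 1}}` + part);
    // Fall back to the source-language pattern when no translation exists.
    const pattern = bundle.has(key) ? bundle.get(key) : key;
    return pattern.replace(/\{(\d+)\}/g, (match, index) =>
      Number(index) < values.length ? String(values[Number(index)]) : match);
  };
}

const recipient = "Norbert";
console.log(msgFor("de")`Hello, ${recipient}!`); // prints "Hallo, Norbert!"
console.log(msgFor("en")`Hello, ${recipient}!`); // prints "Hello, Norbert!"
```

In this sketch the source-language string stays in the code and doubles as the lookup key, so adding a language only means adding a bundle. The substitutions are still positional, which is the limitation raised in item 3.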

> 3. The workaround discussed to support purely dynamic message replacement (which I'd consider the normal path from an internationalization point of view) seems to be duplicating much of the work involved in interpreting quasi-literals, but loses the variable names on the way, and doesn't permit passing through variables that aren't used in the original quasi-literal. The latter is necessary, for example, in the context of gender handling: Sentences may be gender dependent in some localized languages while not in the source language.

I'm a bit confused. To which variable name are you referring?

# Mark Davis ☕ (13 years ago)

I didn't pay enough attention to the whole quasi structure, so I can't pretend to speak intelligently about that.

We do support a number of different mechanisms for string translation, many of them extracting strings from source files (including templating source languages like JSPs and Soy, aka Closure Templates). Message formats are not trivial, especially when it comes to plural and gender support. In some of these environments, features like plurals and gender are built into the syntax of the language (like Soy; see closure-templates.googlecode.com/svn/trunk/javadoc-complete/com/google/template/soy/msgs/restricted/IcuSyntaxUtils.html, although the formatting is very ugly). In others, the programmer has to call a function to interpret a string representation of the message format (which could be fetched from an external source).

That can all work fine, with some provisos; there have to be straightforward programmatic ways to:

  1. determine which are the messages in the file that need to be translated (with some mechanism to skip those that shouldn't be translated, like a literal 'http://'.)
  2. determine the structure of all of the embedded message format strings and map into a 'lingua franca' structure for translation (we use an XML structure).
  3. carry message descriptions along: these are descriptions of the entire message, plus meaningful names and descriptions and examples of each placeholder value.
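The three provisos above could be sketched roughly as follows, using the tagged template syntax that quasi-literals later became in ECMAScript 2015. This is only an illustration under stated assumptions: the `catalog` array, the `msg` tag factory, and the shape of the `meta` object are hypothetical, and a real extraction tool would more likely work statically on the source text than at run time.

```javascript
// Hypothetical catalog that an extraction step might hand to translators.
const catalog = [];

// Hypothetical tag factory. `meta` carries the message description and, per
// placeholder, a meaningful name with a description and example (item 3).
function msg(meta) {
  const names = Object.keys(meta.placeholders);
  return function (strings, ...values) {
    if (names.length !== values.length) {
      throw new Error("placeholder metadata does not match the message");
    }
    // Item 2: map the message into a neutral structure of text parts and
    // named placeholder parts that could later be serialized (e.g. as XML).
    const parts = [];
    strings.forEach((text, i) => {
      if (text !== "") parts.push({ type: "text", value: text });
      if (i < names.length) parts.push({ type: "placeholder", name: names[i] });
    });
    catalog.push({
      description: meta.description,
      placeholders: meta.placeholders,
      parts,
    });
    // At run time this sketch simply produces the source-language text.
    return strings.reduce((out, text, i) => out + String(values[i - 1]) + text);
  };
}

const sender = "Mark";
const greeting = msg({
  description: "Shown after a file transfer completes.",
  placeholders: {
    SENDER: { description: "Display name of the sender", example: "Alice" },
  },
})`${sender} sent you a file.`;

// Item 1: only literals tagged with msg are collected; this one is skipped.
const scheme = "http://";

console.log(greeting); // prints "Mark sent you a file."
console.log(JSON.stringify(catalog[0].parts));
```

Here the tag itself marks which strings are translatable, and the placeholder names come from the metadata rather than from the variable names in the source, since the latter are not available to a tag function.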

I don't know whether the quasi structure makes that easier or harder. I also haven't seen any examples where someone has taken a reasonably complex file with quasi messages (containing plurals, gender, dates, times, numbers, multiple placeholders with different orders, etc), extracted the messages for translation (including all of #3), translated, and regenerated the alternate language version of the original file (or a modified version that uses methods to get the translated strings).

I'm not saying it can't be done, or that it hasn't been done (it could well have been); just that I personally haven't seen it.


Mark plus.google.com/114199149796022210033 — Il meglio è l’inimico del bene —