A proposal to add String.prototype.format

# Shanjian Li (15 years ago)

ECMAScript lacks a method to format strings in a flexible and controllable manner. Most ECMAScript strings are constructed by concatenating a series of substrings. This practice hurts code readability. For localization in particular, it is almost impossible to translate a string that is split into multiple pieces. This problem was identified long ago: Brendan Eich proposed something in 2006 for ECMA 3 (discussion:string_formatting).

Mike Samuel's quasis (strawman:quasis) and Douglas Crockford's string_format (strawman:string_format) each proposed a solution as well. This proposal references those proposals, and borrows many ideas introduced by Python (www.python.org/dev/peps/pep-3101). This proposal also applies lessons learned in localization (l10n) and internationalization (i18n) practice, both in JavaScript and other languages.

strawman:string_format_take_two

Please kindly review the proposal and let me know your feedback.

shanjian

# Mark S. Miller (15 years ago)

[+msamuel]

I don't understand. I see that this proposal references quasis, but I don't see how it subsumes the safety quasis provide against quoting confusions, e.g., that lead to XSS and other injection vulnerabilities. What am I missing?
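For context, quasis later shipped in ECMAScript 2015 as template literals with tag functions. A minimal sketch of the safety property referred to here, assuming a hypothetical `safeHtml` tag (the name and the escaping table are illustrative and not taken from the quasi proposal):

```javascript
// Hypothetical tag function: literal chunks are trusted, substitutions are escaped.
const HTML_ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

function safeHtml(literals, ...substitutions) {
  let out = literals[0];
  substitutions.forEach((value, i) => {
    out += escapeHtml(value) + literals[i + 1];
  });
  return out;
}

const untrusted = '<script>alert("xss")</script>';

// The tag can tell author-written markup apart from interpolated data.
const quasiResult = safeHtml`<p>${untrusted}</p>`;

// Plain string composition has no such distinction and keeps the raw markup.
const concatResult = "<p>" + untrusted + "</p>";
```

A format method that only receives an already-assembled pattern string and a list of values has no equivalent place to hook in context-aware escaping, which seems to be the gap the question points at.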

# Lasse Reichstein (15 years ago)

On Wed, 09 Mar 2011 01:21:09 +0100, Shanjian Li <shanjian at google.com> wrote:

strawman:string_format_take_two

Please kindly review the proposal and let me know your feedback.

Just some nitpicking:

It doesn't specify how to print objects, except for %s, which says that if the argument is not a string, convert it to string using .toString().

The string conversion should probably use the internal ToString function instead (which works for null and undefined too).

For formats expecting numbers, it should convert the argument to a number using ToNumber.

Rounding is specified as "math.round(n - 0.5)" (capital M in Math?).

This leaves it open whether overwriting Math.round should change the behavior of format. It probably shouldn't (i.e., again it would be better to specify in terms of internal, non-user-modifiable functions).

The rounding is equivalent to Math.floor(n) (aka round towards -Infinity), if I'm not mistaken, so why not just use that?

(Personally I would prefer truncation (round towards zero), if conversion to integer is necessary).

Why can octal, binary and hexadecimal forms only be used on integers? Number.prototype.toString with an argument works on fractions too (try Math.PI.toString(13) for laughs :).
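A short illustration of the existing behavior mentioned here, using only Number.prototype.toString with a radix argument (the variable names are illustrative):

```javascript
// Number.prototype.toString accepts any radix from 2 to 36, and handles fractions.
const hex = (255).toString(16);            // "ff"
const binaryHalf = (0.5).toString(2);      // "0.1"
const fractionalHex = (10.5).toString(16); // "a.8"
const base36 = (35).toString(36);          // "z"

// The base-13 expansion of pi suggested above; only its integer part is obvious.
const piBase13 = Math.PI.toString(13);
```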

Why only fixed bases (2,8,10,16)? How about adding an optional base parameter to number display (with x, d, o, b as shorthands for the more standard bases)? Again, Number.prototype.toString means that it's already in the language.

(I know that step 7 says copy the format of other languages, but that seems shortsighted since ECMAScript is not those languages, and only copying functionality from C verbatim seems like tying your shoelaces together before the race).

"Placeholder used in format specifier part can not have format specifier. This prevent the replacement from embedding more than one level."

Should that be "... can not have a placeholder."?

If the placeholder value is not a string, it should be converted to a string. If it is not a valid format, what happens then?

Is the following valid:

"{x} and {1[y]}".format({x:42},{y:37})

I.e., can object property shorthands ({x} instead of {0[x]}) be used if there is more than one argument?

And some arbitrary ideas for extension:

How about a boolean test that checks for falsy-ness of the argument and acts as one of two other formats or literals? E.g.

"{0:s} drew {1:?his|her} gun.".format(person.name, person.isMale)

"Please press return{0:?.|{1}}".format(notCritical, " and run!")

Or allow computed indices?

"{0[{1}][he]} drew {0[{1}][his]} gun.".format({male:{he:"He",his:"his"}, female:{he:"She",his:"her"}}, "female");

# Shanjian Li (15 years ago)

Right. I didn't give much thought to possible XSS and other injection vulnerabilities. I am open to ideas about how this could be misused and whether anything can be done about it.

The purpose of this proposal is to provide a way for developers to conveniently construct a string, and for translators to be able to translate a message. I read through the quasis proposal (and one more time just now), but don't feel it is a good solution for this type of problem.

shanjian

# Shanjian Li (15 years ago)

Comment inline.

On Wed, Mar 9, 2011 at 12:37 AM, Lasse Reichstein <reichsteinatwork at gmail.com> wrote:

On Wed, 09 Mar 2011 01:21:09 +0100, Shanjian Li <shanjian at google.com> wrote:

strawman:string_format_take_two

Please kindly review the proposal and let me know your feedback.

Just some nitpicking:

It doesn't specify how to print objects, except for %s, which says that if the argument is not a string, convert it to string using .toString().

If the format specifier does not apply to the argument given, it should raise an exception. Apart from string conversion, no other conversion will be done.

The string conversion should probably use the internal ToString function instead (which works for null and undefined too).

Agree.

For formats expecting numbers, it should convert the argument to a number using ToNumber.

Probably not. As a string is the thing being constructed, it makes sense to offer a "hidden" string conversion. In my experience using this feature in Python, it is within expectation and offers some convenience. Any further "hidden" conversion should really be avoided.

Rounding is specified as "math.round(n - 0.5)" (capital M in Math?).

Right, thanks.

This leaves it open whether overwriting Math.round should change the behavior of format. It probably shouldn't (i.e., again it would be better to specify in terms of internal, non-user-modifiable functions).

Agree.

The rounding is equivalent to Math.floor(n) (aka round towards -Infinity), if I'm not mistaken, so why not just use that?

In this example, 8 / (3 - 8 / 3), the display will be 23.99999999999999. The internal representation can be slightly more or slightly less than the theoretical value due to floating-point precision, so Math.round might generate less surprising results than Math.floor. Of course, the internal implementation shouldn't rely on either of these two.
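A small check of both claims in this exchange, as a sketch with arbitrarily chosen sample values: that `Math.round(n - 0.5)` agrees with `Math.floor(n)`, and that flooring and rounding disagree on the expression above.

```javascript
// Sample values chosen for illustration, including negatives and exact integers.
const samples = [2.7, 2.5, 2, -2, -2.3];

const roundMinusHalf = samples.map((n) => Math.round(n - 0.5)); // [2, 2, 2, -2, -3]
const floored = samples.map((n) => Math.floor(n));              // [2, 2, 2, -2, -3]
const truncated = samples.map((n) => Math.trunc(n));            // [2, 2, 2, -2, -2]

// Mathematically 24, but slightly less in double precision.
const nearlyTwentyFour = 8 / (3 - 8 / 3);
const flooredValue = Math.floor(nearlyTwentyFour); // 23
const roundedValue = Math.round(nearlyTwentyFour); // 24
```

Truncation differs from flooring only for negative non-integers, which is why the last entry of `truncated` is -2 rather than -3.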

(Personally I would prefer truncation (round towards zero), if conversion to integer is necessary).

Why can octal, binary and hexadecimal forms only be used on integers? Number.prototype.toString with an argument works on fractions too (try Math.PI.toString(13) for laughs :).

Why only fixed bases (2,8,10,16)? How about adding an optional base parameter to number display (with x, d, o, b as shorthands for the more standard bases)? Again, Number.prototype.toString means that it's already in the language. (I know that step 7 says copy the format of other languages, but that seems shortsighted since ECMAScript is not those languages, and only copying functionality from C verbatim seems like tying your shoelaces together before the race).

The answer to both questions depends on how useful that is. If it is only needed on one or a few rare occasions, it is probably not a good idea to complicate the language.

"Placeholder used in format specifier part can not have format specifier. This prevent the replacement from embedding more than one level." Should that be "... can not have a placeholder."?

No. The former prevents any format specifier (including an embedded placeholder). Refer to the Python specification; it does make sense.

If the placeholder value is not a string, it should be converted to a string. If it is not a valid format, what happens then?

Raise exception?

Is the following valid: "{x} and {1[y]}".format({x:42},{y:37}) I.e., can object property shorthands ({x} instead of {0[x]}) be used if there is more than one argument?

Good points. Possible choices:

  1. {x} always refers to the first object given.
  2. {x} only works when there is one and only one object argument.
  3. {x} will be replaced by the first object that has property x, i.e. the following should work too: "{x}, {z} and {1[y]}".format({x:42}, {z:43, y:37})

I prefer 1.
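Option 1 could be sketched roughly as follows. `formatSketch` is a hypothetical standalone helper written for illustration, not the proposal's algorithm; it handles only `{name}`, `{N}` and `{N[name]}` fields and ignores format specifiers entirely.

```javascript
// Hypothetical helper: resolves {x} against the FIRST argument (option 1),
// and {N} or {N[key]} against the argument at position N.
function formatSketch(template, ...args) {
  return template.replace(/\{([^{}]*)\}/g, (match, field) => {
    const indexed = /^(\d+)(?:\[([^\]]+)\])?$/.exec(field);
    let value;
    if (indexed) {
      const arg = args[Number(indexed[1])];
      // "1" selects the argument itself, "1[y]" selects a property of it.
      value = indexed[2] === undefined ? arg : arg[indexed[2]];
    } else {
      // Bare property name: always looked up on the first argument.
      value = args[0][field];
    }
    return String(value);
  });
}

const mixed = formatSketch("{x} and {1[y]}", { x: 42 }, { y: 37 }); // "42 and 37"
```

Under this rule the example from option 3 would not work as written: `{z}` would be looked up on the first object and come back as `undefined`.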

And some arbitrary ideas for extension:

How about a boolean test that checks for falsy-ness of the argument and acts as one of two other formats or literals? E.g. "{0:s} drew {1:?his|her} gun.".format(person.name, person.isMale) "Please press return{0:?.|{1}}".format(notCritical, " and run!")

Interesting. In example 1, the issue is that literals get into the placeholder, which could make things messy.

Or allow computed indices? "{0[{1}][he]} drew {0[{1}][his]} gun.".format({male:{he:"He",his:"his"}, female:{he:"She",his:"her"}}, "female");

Allowing an embedded placeholder inside the field part (not the format specifier part) of a placeholder is something that I would be very cautious about.

shanjian

# P T Withington (15 years ago)

On 2011-03-09, at 13:20, Shanjian Li wrote:

It doesn't specify how to print objects, except for %s, which says that if the argument is not a string, convert it to string using .toString().

If the format specifier does not apply to the argument given, it should raise exceptions. Except string conversion, no other conversion will be done.

Disagree. Since ECMAScript knows the type of the arguments, it does not need the format specifier to tell it the type (as C does). Apparent mismatches should be left open as extensions. For example, the x formatter should simply specify that numeric values should be expressed in base 16, not that the value must be a number. That way, you could pass an Array of numbers to x and see the numbers in base 16.

# Shanjian Li (15 years ago)

It is OK for a given format specifier to apply to multiple types of objects. In your example, 'x' is applied to an array of numbers. But the language interpreter should not do a hidden conversion to make it applicable. For example, "{0:x}".format("48") should throw an exception instead of trying to do a hidden "toNumber()".
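The rule described here, combined with the array extension suggested in the previous message, might look like the sketch below. `applyHexSpecifier` is a hypothetical name, and joining array elements with a comma is an assumption made for illustration.

```javascript
// Hypothetical handler for the "x" specifier: numbers (and arrays of numbers)
// are accepted, anything else throws instead of being coerced with ToNumber.
function applyHexSpecifier(value) {
  if (Array.isArray(value)) {
    return value.map(applyHexSpecifier).join(",");
  }
  if (typeof value !== "number") {
    throw new TypeError("'x' specifier does not apply to " + typeof value);
  }
  return value.toString(16);
}

const fromNumber = applyHexSpecifier(48);       // "30"
const fromArray = applyHexSpecifier([10, 255]); // "a,ff"

let stringRejected = false;
try {
  applyHexSpecifier("48"); // no hidden conversion of the string "48"
} catch (err) {
  stringRejected = err instanceof TypeError;
}
```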

shanjian

# Oliver Hunt (15 years ago)

Implicit function calls within string formatting operations seem like the sort of concept that is likely to lead to security problems on websites and the like.

This isn't a matter of "can the engine do this safely"; it's a question of whether the author expects arbitrary code execution to occur when they do

String.format("Someone we don't trust left this amazing comment: %s", somethingUntrusted)

For instance ES5 killed off implicit function calls in object and array literals (through accessors on the prototype chain) due to the potential for unsafe operations (namely data leakage) to occur in code that looked "safe".
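To make the concern concrete, here is a minimal sketch of an implicit call during string conversion. The object and the `sideEffects` array are made up for illustration, and plain concatenation stands in for the proposed format method.

```javascript
// Records anything the "untrusted" value does while it is being formatted.
const sideEffects = [];

// A value whose string conversion runs caller-invisible code.
const somethingUntrusted = {
  toString() {
    sideEffects.push("toString ran during formatting");
    return "nice post";
  },
};

// The call site looks like pure string building, yet it invokes toString().
const comment = "Someone we don't trust left this amazing comment: " + String(somethingUntrusted);
```

After this runs, `sideEffects` has one entry even though nothing at the call site looks like a function call on the untrusted value.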

# Bob Nystrom (15 years ago)

It doesn't specify how to print objects, except for %s, which says that if the argument is not a string, convert it to string using .toString().

If the format specifier does not apply to the argument given, it should raise exceptions. Except string conversion, no other conversion will be done.

I like your first six points, but the formatting string stuff feels odd to me: neither dynamic nor object-oriented. C needs format specifiers because it doesn't otherwise know how to interpret the bits you give. ES doesn't have that problem.

At the same time, baking a rigid set of formatting instructions into string.format() feels like a poor separation of concerns. string.format()'s job is to compose a string out of smaller pieces. Why should it need to know anything about numbers or dates?

Could we just say that this:

"hi, {0:blah}.".format(someObj);

Is (conceptually) desugared to:

("hi, " + someObj.toFormat("blah") + ".")

So anything after the ":" in an argument (the format string) gets passed to the object itself by way of a call to toFormat() (or some other method name) on it. Then each object can decide what format strings are appropriate for it.

This keeps the responsibilities separate: string.format() does composition, and the composed objects own their own formatting. It's also extensible: you can define your own formatting capabilities for your types and use them with string.format() by defining toFormat(). (By default, I would assume that Object.prototype.toFormat() just calls toString()).

This, I think, helps with locale issues too. Types like Date that care about locale will be able to handle it themselves in their call to toFormat() without string.format() needing to deal with it.
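The desugaring described above can be sketched without touching any built-in prototypes. `formatWithToFormat` and the `Temperature` class are hypothetical names used only for this example; the fallback to `String(value)` stands in for the suggested default `Object.prototype.toFormat()`.

```javascript
// Hypothetical composer: it only splices pieces together and hands the text
// after ":" to the value's own toFormat() method, if it has one.
function formatWithToFormat(template, ...args) {
  return template.replace(/\{(\d+)(?::([^{}]*))?\}/g, (match, index, spec) => {
    const value = args[Number(index)];
    if (value != null && typeof value.toFormat === "function") {
      return value.toFormat(spec === undefined ? "" : spec);
    }
    return String(value); // assumed default: plain string conversion
  });
}

// A user-defined type that decides which format strings it understands.
class Temperature {
  constructor(celsius) {
    this.celsius = celsius;
  }
  toFormat(spec) {
    return spec === "F" ? `${(this.celsius * 9) / 5 + 32} F` : `${this.celsius} C`;
  }
}

const greeting = formatWithToFormat("hi, {0:F}.", new Temperature(100)); // "hi, 212 F."
const plain = formatWithToFormat("hi, {0}.", 42);                        // "hi, 42."
```

The composer never inspects the specifier itself, so new types can add their own formats without any change to it.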

# Shanjian Li (15 years ago)

I like this idea. I thought a lot about how to support locale-specific things like plurals and gender. Your suggestion provides an elegant way to transfer the responsibility to a more appropriate party.

shanjian

# Christian Mayer (15 years ago)

On 09.03.2011 at 19:48, P T Withington wrote:

Disagree. Since ECMAScript knows the type of the arguments, it does not need the format specifier to tell it the type (as C does).

Perhaps not for the data type - but for the representation of that data type. I sent a plea for format specification to the list last weekend.

CU, Christian

# Adam Shannon (15 years ago)

I rather like the idea of having this syntax for string formatting:

"name: {name.first} {name.last}".format(name)

It allows for more complex operations

"name: {person.firstName} \nstart: {myEvent.startTime}".format(myEvent, person)

Also, it doesn't require mundane fixes later on and keeps things simple for the developer. (No needed knowledge or maintenance of things based on position.)
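One way this name-based style could work in practice is sketched below. Since a function cannot see the variable names of its positional arguments, the sketch assumes the values are passed as a single object keyed by name; `formatNamed` and that calling convention are assumptions, not part of the message above.

```javascript
// Hypothetical helper: resolves dotted paths such as {person.firstName}
// against one object of named values. Unknown paths are left untouched.
function formatNamed(template, values) {
  return template.replace(/\{([\w.]+)\}/g, (match, path) => {
    let current = values;
    for (const key of path.split(".")) {
      if (current == null) return match;
      current = current[key];
    }
    return current === undefined ? match : String(current);
  });
}

const person = { firstName: "Ada" };
const myEvent = { startTime: "09:00" };

const summary = formatNamed(
  "name: {person.firstName} \nstart: {myEvent.startTime}",
  { person, myEvent }
);
```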

# Gillam, Richard (15 years ago)

It seems worth mentioning that this functionality sounds an awful lot like what MessageFormat does, and MessageFormat was in the i18n strawman (and I have it in my own i18n implementation). It doesn't seem like we need two different "formatted string" APIs.

--Rich Gillam Lab126

# Shawn Steele (15 years ago)

I would postpone any formatting stuff until the i18n stuff was better understood.

  • Shawn

  blogs.msdn.com/shawnste

# Nebojša Ćirić (15 years ago)

This proposal intentionally avoids i18n issues and focuses on general formatting. One can still use i18n features from our API like so:

var locale = new LocaleInfo(); var df = locale.dateTimeFormat();

"I got married on {0}".format(df.("12/03/2001"));

MessageFormat was focusing on plurals and gender, and we couldn't reach consensus on actual message format and scope, so we postponed it. Also, we aim to avoid adding i18n API objects into core language at this moment.
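The same division of labor can be tried today with `Intl.DateTimeFormat`, which is swapped in here for the thread's `LocaleInfo` API. In this sketch `String.prototype.replace` stands in for the proposed format method, and reading "12/03/2001" as 3 December 2001 is an assumption.

```javascript
// The locale-aware piece formats the date; the string piece only composes.
const df = new Intl.DateTimeFormat("en-US", { timeZone: "UTC" });

// Assumed reading of "12/03/2001": 3 December 2001 (month index 11).
const weddingDay = new Date(Date.UTC(2001, 11, 3));

// Stand-in for "I got married on {0}".format(df.format(...)).
const sentence = "I got married on {0}".replace("{0}", df.format(weddingDay));
```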

On 9 March 2011 at 14:53, Shawn Steele <Shawn.Steele at microsoft.com> wrote:

# Nebojša Ćirić (15 years ago)

That was .format(df.format(...));

On 9 March 2011 at 15:01, Nebojša Ćirić <cira at google.com> wrote:

# Gillam, Richard (15 years ago)

If we're going to ultimately have two APIs that do the same thing-- a generic one and an internationalized one-- can we at least avoid having them have gratuitously different pattern formats and styles of operation?

--Rich Gillam Lab126

On Mar 9, 2011, at 3:01 PM, Nebojša Ćirić wrote:

This proposal intentionally avoids i18n issues and focuses on general formatting. One can still use i18n features from our API like so:

var locale = new LocaleInfo(); var df = locale.dateTimeFormat();

"I got married on {0}".format(df.("12/03/2001"));

MessageFormat was focusing on plurals and gender, and we couldn't reach consensus on actual message format and scope, so we postponed it. Also, we aim to avoid adding i18n API objects into core language at this moment.

On 9 March 2011 at 14:53, Shawn Steele <Shawn.Steele at microsoft.com> wrote:

I would postpone any formatting stuff until the i18n stuff was better understood.

  • Shawn

  blogs.msdn.com/shawnste

# Shanjian Li (15 years ago)

That part is for sure, i.e. we won't have gratuitously different pattern formats and styles of operation. As to how the internationalized one will be designed and implemented, there are two possibilities.

  1. The i18n feature (locale-specific behavior) will be implemented by extending the existing mechanism (like Bob's idea of passing formatting responsibility to the object's toFormat() method).
  2. The i18n feature will be implemented on top of the generic form, using it as a foundation.

shanjian

# Brendan Eich (15 years ago)

One thought, I had to share after laying it on dherman in person:

We sure could use a pure-JS implementation of the proposal, on github, so people can use it. A lot of the arguments here should be settled by beating on the anvil of an open-source, forkable implementation, and may the sharpest fork, er, sword, win.

I'd hate to be on a committee that had to pick the winner prematurely, or really, without user testing and feedback -- and yes, even forking that led to greater user testing and eventual consolidation.

Separately I think TC39 does agree that injection-attack safety is a goal, and for this reason quasis are still on our agenda. Can we combine forces or at least make proposals combine well?

# Mike Samuel (15 years ago)

2011/3/9 Brendan Eich <brendan at mozilla.com>:

One thought, I had to share after laying it on dherman in person: We sure could use a pure-JS implementation of the proposal, on github, so people can use it. A lot of the arguments here should be settled by beating on the anvil of an open-source, forkable implementation, and may the sharpest fork, er, sword, win.

I'd hate to be on a committee that had to pick the winner prematurely, or really, without user testing and feedback -- and yes, even forking that led to greater user testing and eventual consolidation. Separately I think TC39 does agree that injection-attack safety is a goal, and for this reason quasis are still on our agenda. Can we combine forces or at least make proposals combine well?

I would like that. I could've sworn I had an L10N use-case in one of the quasi docs, but I can't find it now.

As far as I can tell, I think, if quasis are available, all of the functionality Shanjian describes becomes easier to implement. Formatting meta data can already be attached as described in the quasi proposal, and message extraction becomes more reliable.

# Mike Samuel (15 years ago)

2011/3/9 Brendan Eich <brendan at mozilla.com>:

One thought, I had to share after laying it on dherman in person: We sure could use a pure-JS implementation of the proposal, on github, so people can use it. A lot of the arguments here should be settled by beating on the anvil of an open-source, forkable implementation, and may the sharpest fork, er, sword, win.

I implemented a chunk of String.format while looking into Brendan's question: whether quasis & String.format are competing or complementary. I think the answer is they are mostly complementary with a small bit of overlap over the interface.

Take a look at a very rough draft of code.google.com/p/js-quasis-libraries-and-repl/source/browse/trunk/js/messageQuasi.js#153 for implementations of the format methods for Number.prototype, Date.prototype, and String.prototype; and I exposed these via a msg quasi in the playground I sent around recently.

I didn't implement String.prototype.format because I think that part does conflict with quasis and is poor security-wise. It opens up all the same problems that dynamic format strings to sprintf do.
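A sketch of the sprintf-style hazard mentioned here: when the pattern itself is data, whoever controls the pattern chooses which properties get read. `naiveFormat` is a hypothetical minimal formatter written for this example, and the `user` object is made up.

```javascript
// Hypothetical minimal formatter supporting {N} and {N[key]} only.
function naiveFormat(template, ...args) {
  return template.replace(/\{(\d+)(?:\[([^\]]+)\])?\}/g, (match, index, key) => {
    const arg = args[Number(index)];
    return String(key === undefined ? arg : arg[key]);
  });
}

const user = { name: "alice", sessionToken: "s3cr3t" };

// Intended use: the developer wrote the pattern.
const intended = naiveFormat("Hello, {0[name]}!", user); // "Hello, alice!"

// Dynamic pattern: the same call site now exposes a different property.
const attackerTemplate = "Hello, {0[sessionToken]}!";
const leaked = naiveFormat(attackerTemplate, user); // "Hello, s3cr3t!"
```

With a quasi, the pattern is a literal in the source text, so this particular substitution of one pattern for another cannot happen at run time.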

I also changed the method name to formatToString from toFormat since the latter suggests to me that its result should be an instance of the nominal type Format, which is not the case.