String.prototype.padLeft / String.prototype.padRight

# Norbert Lindenberg (9 years ago)

The TC39 meeting in September discussed a proposal to add padLeft and padRight methods to String; the relevant section of the meeting notes [1] is attached below. A revised proposal is on the agenda for the upcoming November meeting [2].

Comments on this proposal:

  1. The proposal notes that “string padding functions exist in a majority of websites and frameworks”. Unfortunately, it doesn’t provide information on the use cases for which applications need these functions. Without use cases, it’s hard to tell whether the proposed methods are adequate, overengineered, underpowered, or misdirected.

  2. The discussion about the unit used for the length arguments can’t be resolved without knowledge of the use cases to be addressed. I can imagine use cases where code units are appropriate, others where code points are appropriate, yet others where tailored grapheme clusters [3] are required, and finally some where padding is required based on the width when rendered with a given font. I don’t know which ones are prevalent in applications in general, and the proposal doesn’t say. If the ones involving font rendering are the most common ones, I’d suggest that the API is not appropriate for ECMAScript because ECMAScript doesn’t deal with fonts.

  3. The names of the methods are inappropriate. “Left” and “right” are visual directions, and whether the first or the last characters in a string (or some characters in the middle) are displayed on the left or the right is decided by the Unicode Bidi Algorithm. Padding, as proposed here, is an operation on logical strings, so “padStart” and “padEnd” should be used. This aligns with the existing startsWith and endsWith methods. Aligning with trimLeft/trimRight is misguided, because these names are equally wrong (they had to be kept because browsers had already shipped with these names) [4].

  4. Step 10 in the proposed method specifications (“Let truncatedStringFiller be a new String value consisting of repeated concatenations of filler truncated to length fillLen.”) is broken for the case where fillString contains one or more supplementary characters. I can’t imagine a use case where an unpaired surrogate in the result string is the right outcome.
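Point 4 can be reproduced with padEnd as it eventually shipped in ES2017, which keeps the same code-unit truncation step. A minimal JavaScript sketch, assuming an engine with ES2017 padEnd; the sample strings are arbitrary:

```javascript
// U+1F600 (😀) is a supplementary character: in UTF-16 it is two code
// units, a high surrogate (0xD83D) followed by a low surrogate (0xDE00).
const emoji = "\u{1F600}";
const emojiLength = emoji.length; // 2

// Padding "abc" to length 4 leaves room for exactly one code unit of filler,
// so the filler is cut between the two surrogates.
const padded = "abc".padEnd(4, emoji);
const lastUnit = padded.charCodeAt(3).toString(16); // "d83d"

// A high surrogate that is not followed by a low surrogate is unpaired.
const hasLoneSurrogate = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])/.test(padded); // true

console.log(emojiLength, lastUnit, hasLoneSurrogate);
```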

Norbert

[1] esdiscuss/2015-October/044354
[2] tc39/proposal-string-pad-left-right
[3] www.unicode.org/reports/tr29
[4] esdiscuss/2015-July/thread.html#43667

# Jordan Harband (9 years ago)

Please feel free to file these as issues on the proposal repo (tc39/proposal-string-pad-left-right). However, I'll try to answer briefly here:

  1. The primary use cases in my opinion are monospaced command line output, or preformatted text output in a webpage.

  2. ECMAScript only has two concepts: code units, and code points - it does not handle grapheme clusters, and if it were to add support for this, these proposed methods could be adapted to handle them in the same way as all of the pre-existing string methods. The issue of code units versus code points has been resolved in tc39/proposal-string-pad-left-right#5 to the satisfaction of the designated spec reviewers and the spec editor.

  3. The naming is set in stone - reduceRight sets a clear precedent that "right" is "the length minus one", and has no relationship to the visual direction, and based on trimLeft and trimRight, which are already implemented in basically all browsers. Thus, in ECMAScript, "left" is not considered a visual direction but rather a synonym for "start" and "index 0". The naming is also mentioned here: tc39/proposal-string-pad-left-right#naming

  4. Per your second question and tc39/proposal-string-pad-left-right#5, this is intentionally not a concern of this proposal.

It'd be much more productive and preferred to continue discussion of this specific proposal on the official repository for that proposal (in this case, tc39/proposal-string-pad-left-right).

# Claude Pache (9 years ago)

Here are my typical use cases (found by scanning uses of "str_pad" in my PHP codebase):

  • transferring data through a protocol that uses fixed-length fields;
  • formatting things like date/hours, e.g. "08:00" for "8am";
  • generating filenames of fixed length, so that they sort correctly, e.g. "foo-00042.txt";
  • generating codes of fixed length (e.g. barcodes).

In all those cases, the set of characters is typically limited to ASCII or ISO-8859-1. Moreover, the filler string always consists of one ASCII character (usually " " or "0").
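For illustration, the use cases above expressed with padStart/padEnd as they shipped in ES2017. A minimal sketch; the values (hour, file number, code) are made up, and, as noted above, every filler is a single ASCII character:

```javascript
// Time of day: "8am" written as "08:00" (hour zero-padded to two digits).
const hour = 8;
const time = `${String(hour).padStart(2, "0")}:00`; // "08:00"

// Fixed-length filenames that sort correctly as strings.
const fileName = `foo-${String(42).padStart(5, "0")}.txt`; // "foo-00042.txt"

// A fixed-length numeric code (e.g. for a barcode).
const code = String(123).padStart(8, "0"); // "00000123"

// A fixed-length text field, space-padded at the end.
const field = "ABC".padEnd(8, " "); // "ABC" followed by five spaces

console.log(time, fileName, code, JSON.stringify(field));
```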

# Alexander Jones (9 years ago)

I see about as little use case for this as String.prototype.blink. Date/hours is better solved with zero padding formatting, not just padding out the already stringified number (think negative values -000042). Same applies to filenames for lexicographical sort. Fixed length fields in wire protocols already need to be converted to bytes first before padding, which makes the use of this feature impossible.
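Alexander's negative-number point, sketched in JavaScript. zeroPadInteger is a hypothetical helper written for this example (not a built-in), and it assumes integer input:

```javascript
// Naive approach: stringify first, then pad. The sign ends up in the middle.
const naive = String(-42).padStart(7, "0"); // "0000-42"

// Hypothetical helper (not a built-in): zero-pad the magnitude, then
// re-attach the sign, so the result still reads as a number.
function zeroPadInteger(n, width) {
  const sign = n < 0 ? "-" : "";
  const digits = String(Math.abs(n));
  return sign + digits.padStart(width - sign.length, "0");
}

const signed = zeroPadInteger(-42, 7); // "-000042"
const unsigned = zeroPadInteger(42, 7); // "0000042"

console.log(naive, signed, unsigned);
```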

# Claude Pache (9 years ago)

On 16 Nov 2015, at 09:18, Jordan Harband <ljharb at gmail.com> wrote:

  3. The naming is set in stone - reduceRight sets a clear precedent that "right" is "the length minus one",

True, but .reduceRight is for arrays, for which there is no strong natural notion of "left-to-right" or "right-to-left" separate from index order.

and has no relationship to the visual direction, and based on trimLeft and trimRight, which are already implemented in basically all browsers. Thus, in ECMAScript, "left" is not considered a visual direction but rather a synonym for "start" and "index 0". The naming is also mentioned here: tc39/proposal-string-pad-left-right#naming

When we have two existing competing precedents in the language (left/right vs. starts/ends), why choose to be consistent with the bad one instead of the good one?

A plausible argument for keeping left/right would be that other languages have padLeft/padRight. However, a simple Google search shows that the better-named padStart and padEnd are also in use.

# ecmascript at lindenbergsoftware.com (9 years ago)

On Nov 16, 2015, at 0:18, Jordan Harband <ljharb at gmail.com> wrote:

  1. The primary use cases in my opinion are monospaced command line output, or preformatted text output in a webpage.

These both make assumptions about font rendering: one glyph per Unicode character, each glyph the same width. ECMAScript is used in a wide range of applications where these assumptions do not hold, and baking them into an API causes problems. I think APIs that are primarily intended for, and limited to, these use cases should be part of libraries for these use cases.
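Norbert's objection can be seen with the monospaced-output use case itself: padEnd counts UTF-16 code units, which need not match the number of columns a terminal draws. A small sketch, assuming a terminal that renders a combining accent in the same column as its base letter:

```javascript
// Two labels that render identically as "café": the second uses
// "e" followed by U+0301 COMBINING ACUTE ACCENT.
const precomposed = "caf\u00E9"; // 4 code units
const decomposed = "cafe\u0301"; // 5 code units

// padEnd counts UTF-16 code units, not columns on screen.
const rows = [precomposed, decomposed].map((label) => label.padEnd(6, ".") + "|");
for (const row of rows) console.log(row);

// Both rows have length 7, but under the assumption above the second row
// is drawn one column narrower, so the "|" markers do not line up.
```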

  2. ECMAScript only has two concepts: code units, and code points - it does not handle grapheme clusters, and if it were to add support for this, these proposed methods could be adapted to handle them in the same way as all of the pre-existing string methods. The issue of code units versus code points has been resolved in tc39/proposal-string-pad-left-right#5 to the satisfaction of the designated spec reviewers and the spec editor.

The conclusion on issue 5 apparently is:

  1. Some other programming languages provide functions that are partially broken, so ECMAScript can do the same.

  2. Supporting a correct solution “would greatly complicate the algorithm”.

In the two years that I attended TC39 meetings (2011-2013), such arguments were rarely made, and I don’t think they ever ended the discussion. Careful analysis of use cases and in particular of edge cases was the norm.

  3. The naming is set in stone - reduceRight sets a clear precedent that "right" is "the length minus one", and has no relationship to the visual direction, and based on trimLeft and trimRight, which are already implemented in basically all browsers. Thus, in ECMAScript, "left" is not considered a visual direction but rather a synonym for "start" and "index 0". The naming is also mentioned here: tc39/proposal-string-pad-left-right#naming

A stage 1 proposal isn’t set in stone. As Claude pointed out, reduceRight is not a good precedent because arrays have no visual order. The content of strings is commonly associated with a visual order, and that visual order is often not the same as the logical order.

  4. Per your second question and tc39/proposal-string-pad-left-right#5, this is intentionally not a concern of this proposal.

Per 2) above, it is.

# Claude Pache (9 years ago)

On 16 Nov 2015, at 14:01, Alexander Jones <alex at weej.com> wrote:

I see about as little use case for this as String.prototype.blink. Date/hours is better solved with zero padding formatting, not just padding out the already stringified number (think negative values -000042). Same applies to filenames for lexicographical sort. Fixed length fields in wire protocols already need to be converted to bytes first before padding, which makes the use of this feature impossible.

Sure, in all those cases I could have used sprintf instead of str_pad. However, the equivalent of neither one is natively available in JS.

I could write a tagged template that does the equivalent of sprintf.... And .padLeft^H^H^H^HStart and .padEnd would be nice to have for writing such a template more easily... oh well... :-/

# Mohsen Azimi (9 years ago)

I might be late to this, but please don't use "left" and "right" to refer to the start and end of a string. In right-to-left languages it's confusing. As someone who writes right-to-left, I can say we already have enough of those "left"s and "right"s based on the English writing direction. CSS made this mistake but corrected it in later specs: the original box model (margin and padding) used left and right, but the newer flexbox spec uses start and end. Just because we made a mistake in the past doesn't mean we have to repeat it.

Thanks, Mohsen Azimi

# Jordan Harband (9 years ago)

In the TC39 meeting today, we discussed these concerns and decided to rename the proposal from padLeft/padRight to padStart/padEnd (tc39/proposal-string-pad-start-end/commit/35f1ef676f692bfc1099f9ed7c123bd2146f9294), and correspondingly to investigate providing trimStart/trimEnd (alongside the legacy trimLeft/trimRight). The consensus remained the same around treatment of code units - which is that, like every other string method, they should conform to the native encoding of strings in the language.

As such, the proposal has now been approved for stage 3 in the TC39 process.

Thanks everyone for providing your input!

# Jeremy Darling (9 years ago)

Just as an aside, I think padLeft/padRight should still be added in addition to padStart/padEnd. There are times (token generation, identity padding, etc) where you would want to specify the handedness of the pad operation and ignore the locale. This goes for the trim* methods as well.

# Coroutines (9 years ago)

On Tue, Nov 17, 2015 at 1:13 PM, Jeremy Darling <jeremy.darling at gmail.com> wrote:

Just as an aside, I think padLeft/padRight should still be added in addition to padStart/padEnd. There are times (token generation, identity padding, etc) where you would want to specify the handedness of the pad operation and ignore the locale. This goes for the trim* methods as well.

I disagree with what has been decided. :p

I was in favor of {trim,pad}{Right,Left}(). The reason: Iterators.

As JS moves forward toward people creating iterators to traverse an object, referring to the start and end of something can be ambiguous. There is no confusion with *Left() or *Right().

I had assumed Start/End were chosen because it would be another move to avoid very generic names, like how we have Array::some() instead of Array::any(), and Array::every() instead of Array::all(). Less-generic names allow people to monkey-patch without obvious collisions. I am a fan of adding to String instead of making my own XString.

# Claude Pache (9 years ago)

On 17 Nov 2015, at 22:29, Coroutines <coroutines at gmail.com> wrote:

There is no confusion with *Left() or *Right().

In a text string using a right-to-left script (Arabic, Hebrew, etc.), trimLeft() does confusingly "trim at the right".
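A small JavaScript sketch of Claude's point, using trimStart (added in ES2019, with trimLeft kept as a legacy alias). The Hebrew sample string is arbitrary; which side the spaces appear on depends on the paragraph direction of whatever displays the text:

```javascript
// "  שלום  " (Hebrew "shalom") with two spaces on each side.
const text = "  \u05E9\u05DC\u05D5\u05DD  ";

// trimStart removes whitespace at index 0, the logical start of the string.
// In a right-to-left paragraph that logical start is displayed on the right.
const trimmedStart = text.trimStart(); // "\u05E9\u05DC\u05D5\u05DD  "

// trimEnd removes whitespace at the logical end (displayed on the left in RTL).
const trimmedEnd = text.trimEnd(); // "  \u05E9\u05DC\u05D5\u05DD"

console.log(JSON.stringify(trimmedStart), JSON.stringify(trimmedEnd));
```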

# Coroutines (9 years ago)

In my opinion, if you understand that your script renders right-to-left, you would understand that it is encoded/parsed left-to-right :>

# Coroutines (9 years ago)

On Tue, Nov 17, 2015 at 2:19 PM, Coroutines <coroutines at gmail.com> wrote:

In my opinion, if you understand that your script renders right-to-left, you would understand that it is encoded/parsed left-to-right :>

On Tue, Nov 17, 2015 at 2:13 PM, Claude Pache <claude.pache at gmail.com> wrote:

On 17 Nov 2015, at 22:29, Coroutines <coroutines at gmail.com> wrote:

There is no confusion with *Left() or *Right().

In a text string using a right-to-left script (Arabic, Hebrew, etc.), trimLeft() does confusingly "trim at the right".

Disregard that I top-posted on my reply:

I think I'm just being weird now, you are correct: Silly right-to-left languages make that just as confusing.

# Bergi (9 years ago)

Jeremy Darling wrote:

Just as an aside, I think padLeft/padRight should still be added in addition to padStart/padEnd. There are times (token generation, identity padding, etc) where you would want to specify the handedness of the pad operation and ignore the locale. This goes for the trim* methods as well.

I don't get that argument. Doesn't start/end do exactly that: ignore the locale? It always trims/pads at the start/end of the string, regardless of whether that will be rendered on the left or right of the screen?

Bergi

# Bergi (9 years ago)

Coroutines wrote:

I disagree with what has been decided. :p

I was in favor of {trim,pad}{Right,Left}(). The reason: Iterators.

As JS moves forward toward people creating iterators to traverse an object, referring to the start and end of something can be ambiguous. There is no confusion with *Left() or *Right().

I could not disagree more with that. Iterators don't have a "left" or "right" side, they have a start and an end? The former would be much more confusing. So when you padStart or trimStart an iterator you can be sure it will always put something in front of it or cut something at the beginning, regardless of whether your iterator iterates your array or structure backwards (right-to-left?), from left to right, top-down, bottom-up, or in whatever direction you've drawn/imagined it.

Bergi

# Coroutines (9 years ago)

On Tue, Nov 17, 2015 at 4:54 PM, Bergi <a.d.bergi at web.de> wrote:

I could not disagree more with that. Iterators don't have a "left" or "right" side, they have a start and an end? The former would be much more confusing. So when you padStart or trimStart an iterator you can be sure it will always put something in front of it or cut something at the beginning, regardless of whether your iterator iterates your array or structure backwards (right-to-left?), from left to right, top-down, bottom-up, or in whatever direction you've drawn/imagined it.

I think you are misunderstanding what I wrote.

"Iterators don't have a "left" or "right" side, they have a start and an end?"

This is my point. At that time I felt that left and right made more sense because, as iterators pick up in use, it might be confusing to see other functions elsewhere that create iterators using start/end in their names. I thought that left/right made more sense because it's unambiguous that these are not generating functions, and left/right refers very clearly to the left or right side of the string. Someone later replied saying it would still be confusing with script that is rendered right-to-left but encoded left-to-right.

There is no win ~ it's been decided anyway, so I'm just laying down my pitchfork.

# Norbert Lindenberg (9 years ago)

Thank you for renaming padLeft/padRight to padStart/padEnd.

On the treatment of code units, I was hoping to find more detail in the meeting minutes, but haven’t seen those yet. The native encoding of strings in the language, with the exception of a few parts that we haven’t been able to fix in EcmaScript 2015, is UTF-16, in which some characters take one code unit and others two code units. The current padStart/padEnd proposal doesn’t take that into consideration – it truncates at code unit boundaries rather than code point boundaries, and thus perpetuates problems stemming from obsolete assumptions about Unicode from 1995.
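One possible alternative to code-unit truncation, sketched in JavaScript: keep maxLength in code units, as the shipped methods do, but never split a surrogate pair. padEndWholeCodePoints is a hypothetical helper, not the proposal's or the spec's algorithm; the trade-off is that the result can end up one code unit short of maxLength:

```javascript
// Hypothetical helper: pads like String.prototype.padEnd, but only appends
// whole code points from fillString. maxLength is still counted in UTF-16
// code units, so the result may be one unit shorter than maxLength.
function padEndWholeCodePoints(str, maxLength, fillString = " ") {
  const fillChars = Array.from(fillString); // splits by code point, not code unit
  if (str.length >= maxLength || fillChars.length === 0) return str;

  let result = str;
  for (let i = 0; ; i++) {
    const next = fillChars[i % fillChars.length];
    // Stop rather than cut a surrogate pair in half.
    if (result.length + next.length > maxLength) break;
    result += next;
  }
  return result;
}

const noRoom = padEndWholeCodePoints("abc", 4, "\u{1F600}"); // "abc"
const fits = padEndWholeCodePoints("abc", 5, "\u{1F600}"); // "abc" + U+1F600
console.log(noRoom.length, fits.length);
```

For BMP-only fill strings the result matches the built-in padEnd; only the supplementary-character edge case behaves differently.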

For background on the Unicode problems in pre-2015 EcmaScript see: mathiasbynens.be/notes/javascript-unicode and the proposed solution that got integrated into EcmaScript 2015: norbertlindenberg.com/2012/05/ecmascript-supplementary-characters

Thanks, Norbert

# Jordan Harband (9 years ago)

Closing the loop on this: this proposal is now stage 4 and will be included in ES 2017. tc39/ecma262#581

# Norbert Lindenberg (8 years ago)

Thanks for the update! It seems though that nothing has been done to fix the problem with supplementary characters. If this issue can’t be addressed for ES 2017, how about at least throwing an exception if fillString is longer than 1 code unit, so that a future edition can do something more useful when the use cases for these methods become clearer?
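Norbert's fallback suggestion can be sketched as a userland wrapper. padStartSingleUnit is hypothetical, and the choice of RangeError is an assumption; nothing like this was adopted for ES2017:

```javascript
// Hypothetical wrapper (not part of any standard): behaves like padStart,
// but rejects fill strings longer than one UTF-16 code unit, leaving room
// for a future, code-point-aware definition.
function padStartSingleUnit(str, maxLength, fillString = " ") {
  if (fillString.length > 1) {
    throw new RangeError("fillString must be at most one UTF-16 code unit long");
  }
  return str.padStart(maxLength, fillString);
}

const ok = padStartSingleUnit("42", 5, "0"); // "00042"

let rejected = false;
try {
  padStartSingleUnit("42", 5, "\u{1F600}"); // two code units: rejected
} catch (e) {
  rejected = e instanceof RangeError;
}
console.log(ok, rejected);
```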

Thanks, Norbert

# Jordan Harband (8 years ago)

The time for that kind of change has long passed - and the current use cases for multi-char fill strings work with the current methods. I'll repeat - "The consensus remained the same around treatment of code units - which is that, like every other string method, they should conform to the native encoding of strings in the language." - in other words, there's no "problem" with supplementary characters that needs fixing, at least at the granularity level of "specific API methods".

# Cyril Auburtin (8 years ago)

Like Alexander, I see no general use case for this, and no point in adding it. It's more or less simply ('0'.repeat(n)+value).slice(-n).
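Cyril's one-liner next to padStart, as a rough JavaScript sketch (padViaSlice and padViaPadStart are illustrative names only). The two agree in the common case but differ once the value is longer than n:

```javascript
// Cyril's one-liner: prepend n zeros, then keep the last n characters.
const padViaSlice = (value, n) => ("0".repeat(n) + value).slice(-n);

// The built-in (ES2017) equivalent for the common case.
const padViaPadStart = (value, n) => String(value).padStart(n, "0");

const a = padViaSlice(42, 5); // "00042"
const b = padViaPadStart(42, 5); // "00042"

// When the value is already longer than n, slice(-n) silently drops
// leading characters, while padStart returns the value unchanged.
const truncated = padViaSlice(123456, 5); // "23456"
const intact = padViaPadStart(123456, 5); // "123456"

console.log(a, b, truncated, intact);
```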

If it were really to be added, there could also be other ways, like adding it to the Date API (where it's mostly used) or as a more general String.format method.

2016-06-01 8:29 GMT+02:00 Jordan Harband <ljharb at gmail.com>: