Line Terminators (and others) normalization in template strings

# Rick Waldron (12 years ago)

At the last TC39 face-to-face, the committee agreed to normalize CR, LF and CRLF to LF in template strings. Since then, it's become clear that LINE_SEPARATOR and PARA_SEPARATOR were overlooked. The question to resolve is: should these also be normalized?
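As a rough model of the question (not spec text), the normalization can be sketched as a plain string transform. The function name `normalizeLineTerminators` and the `includeUnicodeSeparators` flag are hypothetical, introduced here only to make the two options concrete.

```javascript
// Hypothetical helper modelling the normalization under discussion.
// It is an illustration, not the specification algorithm.
function normalizeLineTerminators(source, includeUnicodeSeparators = false) {
  // CRLF is listed before a lone CR so the pair collapses to a single LF.
  let result = source.replace(/\r\n|\r/g, "\n");
  if (includeUnicodeSeparators) {
    // The open question: should U+2028 (LINE SEPARATOR) and
    // U+2029 (PARAGRAPH SEPARATOR) be folded into LF as well?
    result = result.replace(/[\u2028\u2029]/g, "\n");
  }
  return result;
}

const sample = "a\r\nb\rc\nd\u2028e\u2029f";

// Only CR, LF and CRLF normalized: the separators stay inside the last line.
console.log(normalizeLineTerminators(sample).split("\n").length); // 4

// Separators normalized too: every piece ends up on its own line.
console.log(normalizeLineTerminators(sample, true).split("\n").length); // 6
```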

# François REMY (12 years ago)

What's the use case for normalizing them, and what normalization would you like to apply?

# Tab Atkins Jr. (12 years ago)

Nothing else in the platform normalizes them. Let's not.

# Brendan Eich (12 years ago)

Tab Atkins Jr. <mailto:jackalmage at gmail.com> September 25, 2013 6:31 PM

Nothing else in the platform normalizes them.

HTML parsing is not in the platform?

# Anne van Kesteren (12 years ago)

On Wed, Sep 25, 2013 at 11:10 PM, Brendan Eich <brendan at mozilla.com> wrote:

HTML parsing is not in the platform?

What Tab meant is that within the platform only CR and CRLF are normalized to LF and no other code points. This is true for HTML as well.

# Tab Atkins Jr. (12 years ago)

On Wed, Sep 25, 2013 at 8:10 PM, Brendan Eich <brendan at mozilla.com> wrote:

Tab Atkins Jr. <mailto:jackalmage at gmail.com> September 25, 2013 6:31 PM

Nothing else in the platform normalizes them.

HTML parsing is not in the platform?

As Anne said, the "them" in my quote was referring to the subject of the email I was responding to - the LINE_SEPARATOR and PARA_SEPARATOR characters. HTML of course normalizes CR/LF/CRLF. (As does CSS.)

# Brendan Eich (12 years ago)

Anne van Kesteren <mailto:annevk at annevk.nl> September 26, 2013 5:41 AM

What Tab meant is that within the platform only CR and CRLF are normalized to LF and no other code points. This is true for HTML as well.

This thread is to sort out the LINE and PARA seps. No one likes 'em. JSON missed them. They seem to me only useful for subtle line-numbering/source-hiding attacks. But ECMA-262 would be inconsistent to turn a blind eye toward them.

Which consistency is greater, HTML or JS standard consistency?
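For readers unfamiliar with the "JSON missed them" remark: the JSON grammar lets U+2028 and U+2029 appear unescaped inside a string, while ECMAScript classifies both as line terminators. A minimal sketch of the JSON side, assuming any engine with a conforming `JSON.parse`:

```javascript
// JSON text containing a raw (unescaped) U+2028 inside a string value.
// The \u2028 escape here is resolved by the JS string literal, so the
// text handed to JSON.parse contains the actual separator character.
const jsonText = '"a\u2028b"';

const parsed = JSON.parse(jsonText);
console.log(parsed.length);                   // 3
console.log(parsed.charCodeAt(1) === 0x2028); // true

// The same holds for U+2029 (PARAGRAPH SEPARATOR).
const parsedPara = JSON.parse('"a\u2029b"');
console.log(parsedPara.charCodeAt(1) === 0x2029); // true
```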

# Anne van Kesteren (12 years ago)

On Thu, Sep 26, 2013 at 2:31 PM, Brendan Eich <brendan at mozilla.com> wrote:

This thread is to sort out the LINE and PARA seps. No one likes 'em. JSON missed them. They seem to me only useful for subtle line-numbering/source-hiding attacks. But ECMA-262 would be inconsistent to turn a blind eye toward them.

Which consistency is greater, HTML or JS standard consistency?

Consistency with the platform, in particular for a construct useful for generating code found elsewhere in the platform, seems more important. In particular, I don't think the idea is for string templates to hold JavaScript source code, but I've seen HTML and CSS come by.

# Erik Arvidsson (12 years ago)

I vote for normalizing them too. It is less surprising and more consistent with JS code.

If someone really needs them they can always escape them.
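A minimal sketch of the escape route, assuming an engine with template literal support. Because the separator is written as `\u2028` rather than typed as a raw character, the source text contains no line terminator at that point, so a normalization rule for raw terminators would not apply to it.

```javascript
// The cooked value contains the real U+2028 character.
const cooked = `a\u2028b`;

// String.raw exposes the raw source text of the same template,
// where the escape is still six ordinary characters.
const raw = String.raw`a\u2028b`;

console.log(cooked.length);                   // 3
console.log(cooked.charCodeAt(1) === 0x2028); // true
console.log(raw);                             // a\u2028b
console.log(raw.length);                      // 8
```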

# Allen Wirfs-Brock (12 years ago)

Two thoughts below:

On Sep 26, 2013, at 11:31 AM, Brendan Eich wrote:

Anne van Kesteren <mailto:annevk at annevk.nl> September 26, 2013 5:41 AM

What Tab meant is that within the platform only CR and CRLF are normalized to LF and no other code points. This is true for HTML as well.

This thread is to sort out the LINE and PARA seps. No one likes 'em. JSON missed them. They seem to me only useful for subtle line-numbering/source-hiding attacks. But ECMA-262 would be inconsistent to turn a blind eye toward them.

Would any code in the world break if ES stopped treating LINE and PARA as lexical line terminators? I know, nobody knows...

Which consistency is greater, HTML or JS standard consistency?

We are presumably normalizing so that the meaning of an ES program is not sensitive to the line termination conventions of the platform used to create the source code of the program, or of any step along the way from creating the source code to the point where it is presented to the ES engine. If no known code editors or OS platforms actually use LINE or PARA as normal line terminators, then it seems reasonable to assume that the author explicitly inserted those characters (and they probably are showing up as line breaks in their editor) into a template string because she really wanted a LINE or PARA character to show up.
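The platform-independence goal described above can be checked with a small sketch. It assumes an engine that implements the CR/LF/CRLF normalization the committee agreed on, and it uses `eval` only to simulate the same source file saved with different line endings.

```javascript
// The same one-line program, "saved" with three line-ending conventions.
// Each source string holds a template literal with a raw line break inside.
const sources = {
  lf: "`first\nsecond`",
  cr: "`first\rsecond`",
  crlf: "`first\r\nsecond`",
};

// Evaluate each variant and collect the resulting string values.
const values = Object.values(sources).map((src) => eval(src));

// With normalization in place, every variant yields the same value.
console.log(values.every((v) => v === "first\nsecond"));
```

The check deliberately leaves LINE and PARA out, since their treatment is the open question in this thread.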

# Mike Samuel (12 years ago)

2013/9/26 Allen Wirfs-Brock <allen at wirfs-brock.com>:

Would any code in the world break if ES stopped treating LINE and PARA as lexical line terminators? I know, nobody knows...

Great question. A simple definition of line terminator would make life harder for those of us who occasionally have to craft injection attacks to try to justify our existence.

If no known code editors or OS platforms actually use LINE or PARA as normal line terminators then it seems reasonable that the author explicitly inserted those characters (and they probably are showing up as line breaks in their editor) into a template string because she really wanted a LINE or PARA character to show up.

BabblePad is one, but I don't think it's widely used.

[+norbert] Maybe the i18n people can answer whether any OS-supported input methods do it.

# Allen Wirfs-Brock (12 years ago)

correction:

normal line terminators then it seems reasonable that the author explicitly inserted those characters (and they probably are showing up as line breaks in their editor) into a template

normal line terminators then it seems reasonable that the author explicitly inserted those characters (and they probably are NOT showing up as line breaks in their editor) into a template

# Erik Arvidsson (12 years ago)

On Thu, Sep 26, 2013 at 11:42 AM, Anne van Kesteren <annevk at annevk.nl> wrote:

important. In particular, I don't think the idea is for string templates to hold JavaScript source code, but I've seen HTML and CSS come by.

I'm not sure what you are basing that on. We've gotten a lot of mileage out of template literals with JS code:

google/traceur-compiler/blob/master/src/codegeneration/ClassTransformer.js#L201

# Waldemar Horwat (12 years ago)

On 09/26/2013 11:31 AM, Brendan Eich wrote:

This thread is to sort out the LINE and PARA seps. No one likes 'em. JSON missed them. They seem to me only useful for subtle line-numbering/source-hiding attacks. But ECMA-262 would be inconsistent to turn a blind eye toward them.

Which consistency is greater, HTML or JS standard consistency?

Within JS, the LINE and PARA separators are treated just like CR, LF, and CRLF. They all break source lines. They're forbidden inside normal string literals. They should be treated the same inside multiline string literals.

My general view on such things is to keep them as simple as possible. Treating LINE and PARA as line terminators in normal strings but not in multiline strings would be yet another bizarre and really obscure thing for people to learn, or get tripped up by if they were never taught this.
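The claim that LINE and PARA break source lines just like LF can be illustrated with single-line comments. This is a small sketch using `eval` on constructed source strings; the variable names are illustrative only. It also hints at the source-hiding concern raised earlier in the thread, since an editor that does not render U+2028 as a line break would show each source as one comment line.

```javascript
// Each source is a single-line comment, then a terminator, then an expression.
const withLf = "// looks like a comment\n 1 + 1";
const withLs = "// looks like a comment\u2028 1 + 1";
const withPs = "// looks like a comment\u2029 1 + 1";

// In all three cases the comment ends at the terminator, so the
// expression after it is evaluated rather than being commented out.
console.log(eval(withLf)); // 2
console.log(eval(withLs)); // 2
console.log(eval(withPs)); // 2
```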