Line Terminators (and others) normalization in template strings

# Rick Waldron (12 years ago)

At the last TC39 face-to-face, the committee agreed to normalize CR, LF and
CRLF to LF in template strings[0]. Since then, it's become clear that
LINE_SEPARATOR and PARA_SEPARATOR were overlooked. The question to resolve
is: should these also be normalized?


Rick



[0]
https://github.com/rwaldron/tc39-notes/blob/master/es6/2013-09/sept-17.md#58-line-terminators-in-template-strings-should-they-be-normalized
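
One way to see what a given engine does with each candidate is to splice the raw character into the source of a template literal and inspect the result. This is a minimal sketch: `templateValueWith` and `codeUnits` are hypothetical helper names, and `new Function` is used only so that the terminator lands in real source text rather than in an escape sequence.

```javascript
// Evaluate a template literal whose source text contains a raw terminator.
// The terminator is spliced into the source itself, not written as an escape.
function templateValueWith(terminator) {
  const source = "return `a" + terminator + "b`;";
  return new Function(source)();
}

// List the code units of a string in hex, to make invisible characters visible.
function codeUnits(value) {
  return Array.from(value, (ch) => ch.charCodeAt(0).toString(16));
}

const candidates = {
  LF: "\n",
  CR: "\r",
  CRLF: "\r\n",
  LINE_SEPARATOR: "\u2028",
  PARA_SEPARATOR: "\u2029",
};

for (const [name, terminator] of Object.entries(candidates)) {
  console.log(name, codeUnits(templateValueWith(terminator)));
}
```

For context that is not stated in this thread: ES2015 as eventually published normalizes CR and CRLF to LF in template literals and leaves U+2028 and U+2029 untouched, so a conforming engine should report the code units 61, a, 62 for the first three candidates and keep the separator for the last two.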

# François REMY (12 years ago)

> At the last TC39 face-to-face, the committee agreed to normalize CR, LF  
> and CRLF to LF in template strings[0]. Since then, it's become clear  
> that LINE_SEPARATOR and PARA_SEPARATOR were overlooked. The question to  
> resolve is: should these also be normalized? 

What's the use case for normalizing them, and what normalization would you like to apply?

# Tab Atkins Jr. (12 years ago)

On Wed, Sep 25, 2013 at 4:05 PM, Rick Waldron <waldron.rick at gmail.com> wrote:
> At the last TC39 face-to-face, the committee agreed to normalize CR, LF and
> CRLF to LF in template strings[0]. Since then, it's become clear that
> LINE_SEPARATOR and PARA_SEPARATOR were overlooked. The question to resolve
> is: should these also be normalized?

Nothing else in the platform normalizes them.  Let's not.

~TJ

# Brendan Eich (12 years ago)

> Tab Atkins Jr. <mailto:jackalmage at gmail.com>
> September 25, 2013 6:31 PM
>
> Nothing else in the platform normalizes them.

HTML parsing is not in the platform?

/be


# Anne van Kesteren (12 years ago)

On Wed, Sep 25, 2013 at 11:10 PM, Brendan Eich <brendan at mozilla.com> wrote:
> HTML parsing is not in the platform?

What Tab meant is that within the platform only CR and CRLF are
normalized to LF and no other code points. This is true for HTML as
well.


-- 
http://annevankesteren.nl/
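
The normalization Anne describes can be sketched as a single replacement. `normalizeNewlines` is a hypothetical name, not an HTML or ECMAScript API; the sketch only illustrates that CR and CRLF collapse to LF while U+2028 passes through unchanged.

```javascript
// Hypothetical helper: normalize only CR and CRLF to LF.
// The regex matches CR optionally followed by LF, so CRLF becomes one LF.
function normalizeNewlines(text) {
  return text.replace(/\r\n?/g, "\n");
}

// Sample input mixing CRLF, a lone CR, and a LINE SEPARATOR (U+2028).
const sample = "one\r\ntwo\rthree\u2028four";
const normalized = normalizeNewlines(sample);

// Split on LF to show which breaks were normalized.
console.log(normalized.split("\n").length);
console.log(normalized.includes("\u2028"));
```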

# Tab Atkins Jr. (12 years ago)

On Wed, Sep 25, 2013 at 8:10 PM, Brendan Eich <brendan at mozilla.com> wrote:
>> Tab Atkins Jr. <mailto:jackalmage at gmail.com>
>> September 25, 2013 6:31 PM
>> Nothing else in the platform normalizes them.
>
> HTML parsing is not in the platform?

As Anne said, the "them" in my quote was referring to the subject of
the email I was responding to - the LINE_SEPARATOR and PARA_SEPARATOR
characters.  HTML of course normalizes CR/LF/CRLF.  (As does CSS.)

~TJ

# Brendan Eich (12 years ago)

> Anne van Kesteren <mailto:annevk at annevk.nl>
> September 26, 2013 5:41 AM
>
> What Tab meant is that within the platform only CR and CRLF are
> normalized to LF and no other code points. This is true for HTML as
> well.

This thread is to sort out the LINE and PARA seps. No one likes 'em. 
JSON missed them. They seem to me only useful for subtle 
line-numbering/source-hiding attacks. But ECMA-262 would be inconsistent 
to turn a blind eye toward them.

Which consistency is greater, HTML or JS standard consistency?

/be
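
Both remarks above can be illustrated in a few lines. This is a sketch under assumptions: `escapeSeparators` is a hypothetical helper, and the comment example is only one simple form of the source-hiding concern, not a description of any known attack.

```javascript
// JSON accepts a raw U+2028 inside a string. The JS escape below produces
// the raw character inside the JSON text before it reaches JSON.parse.
const jsonText = '"a\u2028b"';
const parsed = JSON.parse(jsonText);

// Hypothetical helper: escape both separators so JSON text can be embedded
// in JavaScript source without introducing a raw line terminator.
function escapeSeparators(json) {
  return json.replace(/\u2028/g, "\\u2028").replace(/\u2029/g, "\\u2029");
}
const safeText = escapeSeparators(jsonText);

// In JavaScript source, U+2028 ends a single-line comment, so the code
// after it runs even if an editor shows it on the same line as the comment.
const hidden = new Function(
  "var x = 1; // x = 2 looks commented out?\u2028x = 2; return x;"
)();

console.log(parsed.length, safeText, hidden);
```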

# Anne van Kesteren (12 years ago)

On Thu, Sep 26, 2013 at 2:31 PM, Brendan Eich <brendan at mozilla.com> wrote:
> This thread is to sort out the LINE and PARA seps. No one likes 'em. JSON
> missed them. They seem to me only useful for subtle
> line-numbering/source-hiding attacks. But ECMA-262 would be inconsistent to
> turn a blind eye toward them.
>
> Which consistency is greater, HTML or JS standard consistency?

Consistency with the platform, in particular for a construct useful
for generating code found elsewhere in the platform, seems more
important. In particular, I don't think the idea is for string
templates to hold JavaScript source code, but I've seen HTML and CSS
come by.

-- 
http://annevankesteren.nl/

# Erik Arvidsson (12 years ago)

I vote for normalizing them too. It is less surprising and more
consistent with JS code.

If someone really needs them, they can always escape them.

-- 
erik
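
The escape route mentioned above might look like this. It is a small sketch, not a proposal from the thread: an escape sequence is not a raw line terminator in the source, so it is unaffected by any normalization of raw terminators.

```javascript
// The cooked value contains the actual U+2028 character.
const cooked = `a\u2028b`;

// The raw value keeps the six characters of the escape sequence as written.
const raw = String.raw`a\u2028b`;

console.log(cooked.length, cooked.charCodeAt(1).toString(16));
console.log(raw.length, raw);
```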

# Allen Wirfs-Brock (12 years ago)

Two thoughts below:

On Sep 26, 2013, at 11:31 AM, Brendan Eich wrote:

>> Anne van Kesteren <mailto:annevk at annevk.nl>
>> September 26, 2013 5:41 AM
>> 
>> What Tab meant is that within the platform only CR and CRLF are
>> normalized to LF and no other code points. This is true for HTML as
>> well.
> 
> This thread is to sort out the LINE and PARA seps. No one likes 'em. JSON missed them. They seem to me only useful for subtle line-numbering/source-hiding attacks. But ECMA-262 would be inconsistent to turn a blind eye toward them.

Would any code in the world break if ES stopped treating LINE and PARA as lexical line terminators?  I know, nobody knows...

> 
> Which consistency is greater, HTML or JS standard consistency?

We are presumably normalizing so that the meaning of an ES program is not sensitive to the line termination conventions of the platform used to create the source code of the program or anywhere along the line from creating the source code to the point it is presented to the ES engine.  If no known code editors or OS platforms actually use LINE or PARA as normal line terminators then it seems reasonable that the author explicitly inserted those characters (and they probably are showing up as line breaks in their editor) into a template string because she really wanted a LINE or PARA character to show up.

Allen

# Mike Samuel (12 years ago)

2013/9/26 Allen Wirfs-Brock <allen at wirfs-brock.com>:
> Would any code in the world break if ES stopped treating LINE and PARA as lexical line terminators?  I know, nobody knows...

Great question.  A simple definition of line terminator would make
life harder for those of us who occasionally have to craft injection
attacks to try to justify our existence.

>  If no known code editors or OS platforms actually use LINE or PARA as normal line
> terminators then it seems reasonable that the author explicitly inserted those characters
> (and they probably are showing up as line breaks in their editor) into a template string
> because she really wanted a LINE or PARA character to show up.

BabblePad is one, but I don't think it's widely used.

[+norbert]
Maybe the i18n people can answer whether any OS-supported input methods do it.

# Allen Wirfs-Brock (12 years ago)

correction:
> normal line terminators then it seems reasonable that the author explicitly inserted those characters (and they probably are showing up as line breaks in their editor) into a template

should read:

normal line terminators then it seems reasonable that the author explicitly inserted those characters (and they probably are NOT showing up as line breaks in their editor) into a template


# Erik Arvidsson (12 years ago)

On Thu, Sep 26, 2013 at 11:42 AM, Anne van Kesteren <annevk at annevk.nl> wrote:
> important. In particular, I don't think the idea is for string
> templates to hold JavaScript source code, but I've seen HTML and CSS
> come by.

I'm not sure what you are basing that on. We've gotten a lot of mileage
out of template literals with JS code:

https://github.com/google/traceur-compiler/blob/master/src/codegeneration/ClassTransformer.js#L201

# Waldemar Horwat (12 years ago)

On 09/26/2013 11:31 AM, Brendan Eich wrote:
>> Anne van Kesteren <mailto:annevk at annevk.nl>
>> September 26, 2013 5:41 AM
>>
>> What Tab meant is that within the platform only CR and CRLF are
>> normalized to LF and no other code points. This is true for HTML as
>> well.
>
> This thread is to sort out the LINE and PARA seps. No one likes 'em. JSON missed them. They seem to me only useful for subtle line-numbering/source-hiding attacks. But ECMA-262 would be inconsistent to turn a blind eye toward them.
>
> Which consistency is greater, HTML or JS standard consistency?

Within JS, the LINE and PARA separators are treated just like CR, LF, and CRLF.  They all break source lines.  They're forbidden inside normal string literals.  They should be treated the same inside multiline string literals.

My general view on such things is to keep them as simple as possible.  Treating LINE and PARA as line terminators in normal strings but not in multiline strings would be yet another bizarre and really obscure thing for people to learn, or get tripped up by if they were never taught this.

     Waldemar
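
The claim that all of these characters break source lines can be checked through automatic semicolon insertion: a line terminator after `return` ends the statement. The sketch below uses hypothetical helpers `returnAcross` and `parses`. It deliberately does not test a raw LINE or PARA separator inside an ordinary string literal, since that may depend on which edition of the language an engine implements.

```javascript
// Build "return<separator>2;" and run it. If the separator is a line
// terminator, a semicolon is inserted after `return` and the result is undefined.
function returnAcross(separator) {
  return new Function("return" + separator + "2;")();
}

// Hypothetical helper: does `body` parse as a function body in this engine?
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const results = {
  space: returnAcross(" "),
  LF: returnAcross("\n"),
  CR: returnAcross("\r"),
  CRLF: returnAcross("\r\n"),
  LINE_SEPARATOR: returnAcross("\u2028"),
  PARA_SEPARATOR: returnAcross("\u2029"),
};

// A raw LF inside an ordinary string literal is a syntax error.
const lfInString = parses('return "a\nb";');

console.log(results, lfInString);
```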