Line Terminators (and others) normalization in template strings
What's the use case for normalizing them, and what normalization would you like to apply?
Nothing else in the platform normalizes them. Let's not.
On Wed, Sep 25, 2013 at 11:10 PM, Brendan Eich <brendan at mozilla.com> wrote:
HTML parsing is not in the platform?
What Tab meant is that within the platform only CR and CRLF are normalized to LF and no other code points. This is true for HTML as well.
On Wed, Sep 25, 2013 at 8:10 PM, Brendan Eich <brendan at mozilla.com> wrote:
Tab Atkins Jr. <mailto:jackalmage at gmail.com> September 25, 2013 6:31 PM
Nothing else in the platform normalizes them.
HTML parsing is not in the platform?
As Anne said, the "them" in my quote was referring to the subject of the email I was responding to - the LINE_SEPARATOR and PARA_SEPARATOR characters. HTML of course normalizes CR/LF/CRLF. (As does CSS.)
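As an illustration of the CR/LF/CRLF normalization described here for HTML and CSS, a minimal sketch follows. The function name normalizeNewlines is hypothetical and not taken from any spec; it only mirrors the behavior described in this thread, and it leaves U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) untouched.

```javascript
// Hypothetical helper: collapse CRLF and lone CR to LF.
// U+2028 and U+2029 are deliberately left alone.
function normalizeNewlines(text) {
  return text.replace(/\r\n?/g, "\n");
}

const normalized = normalizeNewlines("a\r\nb\rc\nd");
const untouched = normalizeNewlines("x\u2028y\u2029z");

console.log(JSON.stringify(normalized));
console.log(untouched === "x\u2028y\u2029z");
```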
Anne van Kesteren <mailto:annevk at annevk.nl> September 26, 2013 5:41 AM
What Tab meant is that within the platform only CR and CRLF are normalized to LF and no other code points. This is true for HTML as well.
This thread is to sort out the LINE and PARA seps. No one likes 'em. JSON missed them. They seem to me only useful for subtle line-numbering/source-hiding attacks. But ECMA-262 would be inconsistent to turn a blind eye toward them.
Which consistency is greater, HTML or JS standard consistency?
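To make the "JSON missed them" remark concrete: JSON's grammar accepts a raw, unescaped U+2028 or U+2029 inside a string value. A small check, with variable names chosen only for illustration:

```javascript
// A JSON text whose string value contains a raw (unescaped) U+2028.
// The \u2028 escape below is resolved by the JavaScript string literal,
// so jsonText holds the actual LINE SEPARATOR character.
const jsonText = '{"s":"a\u2028b"}';

// JSON.parse accepts the raw character and returns it unchanged.
const parsed = JSON.parse(jsonText);

console.log(parsed.s.length);
console.log(parsed.s.charCodeAt(1).toString(16));
```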
On Thu, Sep 26, 2013 at 2:31 PM, Brendan Eich <brendan at mozilla.com> wrote:
This thread is to sort out the LINE and PARA seps. No one likes 'em. JSON missed them. They seem to me only useful for subtle line-numbering/source-hiding attacks. But ECMA-262 would be inconsistent to turn a blind eye toward them.
Which consistency is greater, HTML or JS standard consistency?
Consistency with the platform, in particular for a construct useful for generating code found elsewhere in the platform, seems more important. In particular, I don't think the idea is for string templates to hold JavaScript source code, but I've seen HTML and CSS come by.
I vote for normalizing them too. It is less surprising and more consistent with JS code.
If someone really needs them, they can always escape them.
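A short sketch of the "escape them" option: writing the character as a \u2028 escape rather than as a literal code point. The variable names are illustrative. Because the source text then contains only ASCII, any normalization of literal line terminators would presumably not touch it.

```javascript
// Cooked value: the \u2028 escape is interpreted, giving one U+2028 code unit.
const cooked = `a\u2028b`;

// Raw value: String.raw keeps the escape sequence exactly as written.
const raw = String.raw`a\u2028b`;

console.log(cooked.length, cooked.charCodeAt(1).toString(16));
console.log(raw);
```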
Two thoughts below:
On Sep 26, 2013, at 11:31 AM, Brendan Eich wrote:
Anne van Kesteren <mailto:annevk at annevk.nl> September 26, 2013 5:41 AM
What Tab meant is that within the platform only CR and CRLF are normalized to LF and no other code points. This is true for HTML as well.
This thread is to sort out the LINE and PARA seps. No one likes 'em. JSON missed them. They seem to me only useful for subtle line-numbering/source-hiding attacks. But ECMA-262 would be inconsistent to turn a blind eye toward them.
Would any code in the world break if ES stopped treating LINE and PARA as lexical line terminators? I know, nobody knows...
Which consistency is greater, HTML or JS standard consistency?
We are presumably normalizing so that the meaning of an ES program is not sensitive to the line termination conventions of the platform used to create the program's source code, or of anything along the way from creating the source code to the point where it is presented to the ES engine. If no known code editors or OS platforms actually use LINE or PARA as normal line terminators, then it seems reasonable to assume that the author explicitly inserted those characters (and they probably are showing up as line breaks in their editor) into a template string because she really wanted a LINE or PARA character to show up.
2013/9/26 Allen Wirfs-Brock <allen at wirfs-brock.com>:
Would any code in the world break if ES stopped treating LINE and PARA as lexical line terminators? I know, nobody knows...
Great question. A simple definition of line terminator would make life harder for those of us who occasionally have to craft injection attacks to try to justify our existence.
If no known code editors or OS platforms actually use LINE or PARA as normal line terminators then it seems reasonable that the author explicitly inserted those characters (and they probably are showing up as line breaks in their editor) into a template string because she really wanted a LINE or PARA character to show up.
BabblePad is one, but I don't think it's widely used.
[+norbert] Maybe the i18n people can answer whether any OS-supported input methods do it.
correction:
normal line terminators then it seems reasonable that the author explicitly inserted those characters (and they probably are showing up as line breaks in their editor) into a template
should read:
normal line terminators then it seems reasonable that the author explicitly inserted those characters (and they probably are NOT showing up as line breaks in their editor) into a template
On Thu, Sep 26, 2013 at 11:42 AM, Anne van Kesteren <annevk at annevk.nl> wrote:
important. In particular, I don't think the idea is for string templates to hold JavaScript source code, but I've seen HTML and CSS come by.
I'm not sure what you are basing that on. We've gotten a lot of mileage out of template literals with JS code:
google/traceur-compiler/blob/master/src/codegeneration/ClassTransformer.js#L201
On 09/26/2013 11:31 AM, Brendan Eich wrote:
This thread is to sort out the LINE and PARA seps. No one likes 'em. JSON missed them. They seem to me only useful for subtle line-numbering/source-hiding attacks. But ECMA-262 would be inconsistent to turn a blind eye toward them.
Which consistency is greater, HTML or JS standard consistency?
Within JS, the LINE and PARA separators are treated just like CR, LF, and CRLF. They all break source lines. They're forbidden inside normal string literals. They should be treated the same inside multiline string literals.
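The claim that all of these characters break source lines can be checked with a small experiment. The helper name runAfterComment is hypothetical. It places a candidate separator between a // comment and a return statement: if the separator is a line terminator, the comment ends there and the return executes; otherwise the return is swallowed by the comment. The string-literal restriction is not exercised here, since it depends on the language edition an engine implements.

```javascript
// Build a function body where a // comment is followed by `sep`, then a return.
// If `sep` terminates the line, the comment ends and the return statement runs.
function runAfterComment(sep) {
  return new Function("// comment" + sep + "return 42;")();
}

const results = {
  LF: runAfterComment("\n"),
  CR: runAfterComment("\r"),
  LS: runAfterComment("\u2028"), // LINE SEPARATOR
  PS: runAfterComment("\u2029"), // PARAGRAPH SEPARATOR
  space: runAfterComment(" "),   // not a line terminator: the return stays commented out
};

console.log(results);
```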
My general view on such things is to keep them as simple as possible. Treating LINE and PARA as line terminators in normal strings but not in multiline strings would be yet another bizarre and really obscure thing for people to learn, or to get tripped up by if they were never taught it.
At the last TC39 face-to-face, the committee agreed to normalize CR, LF and CRLF to LF in template strings. Since then, it's become clear that LINE_SEPARATOR and PARA_SEPARATOR were overlooked. The question to resolve is: should these also be normalized?
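The agreed CR/LF/CRLF normalization can be observed by evaluating source text that contains real CR characters inside a template literal. This is a sketch under the assumption of an engine that implements template literals as standardized in ES2015; the helper name evaluate is hypothetical. The final line only reports what the running engine does with a literal U+2028, the open question of this thread, and asserts nothing about it.

```javascript
// Evaluate source text via indirect eval so it runs in the global scope.
const evaluate = (source) => (0, eval)(source);

// The JavaScript strings below contain real CR and LF characters between the
// backticks, so the evaluated source holds literal line terminators.
const fromCRLF = evaluate("`a\r\nb`");
const fromCR = evaluate("`a\rb`");
const rawFromCRLF = evaluate("String.raw`a\r\nb`");

console.log(JSON.stringify(fromCRLF), JSON.stringify(fromCR), JSON.stringify(rawFromCRLF));

// Observation only: does this engine leave a literal U+2028 as-is?
const fromLS = evaluate("`a\u2028b`");
const lsPreserved = fromLS === "a\u2028b";
console.log("literal U+2028 preserved:", lsPreserved);
```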