JSON.stringify </script>

# Michał Wadas (9 years ago)

Idea: require implementations to stringify "</script>" as "<\uxxxxscript>".

Benefits: remove XSS vulnerability when injecting JSON as content of <script> tag (quite common antipattern).

Backward compatible: yes, unless binary equality is required and this string is used.
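
To make the failure mode concrete, here is a minimal sketch of the antipattern and of the proposed escape. The names `payload`, `page` and `proposed` are made up for illustration, and `\u002F` is only an assumption about what the placeholder `\uxxxx` stands for (U+002F is the code point of "/").

```javascript
// Hypothetical attacker-controlled value that ends up in server-rendered JSON.
const payload = { comment: "</script><script>alert(1)</script>" };

// The antipattern: JSON pasted straight into a <script> element.
const page = "<script>var data = " + JSON.stringify(payload) + ";</script>";

// An HTML parser ends the script element at the first "</script>",
// even though it sits inside a JavaScript string literal.
const scriptBody = page.slice("<script>".length, page.indexOf("</script>"));
console.log(scriptBody); // var data = {"comment":"

// Assumed form of the proposal: "/" written as \u002F inside the JSON string.
const proposed = '{"comment":"<\\u002Fscript>"}';
console.log(JSON.parse(proposed).comment === "</script>"); // true
console.log(proposed.includes("</script")); // false
```

The escaped form parses back to the same value, which is why the change would be invisible to consumers that do not compare the serialized text byte for byte.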


# Mike Samuel (9 years ago)

I think defining an easy way to produce embeddable JSON is a great idea, but it's not quite that simple.

https://github.com/OWASP/json-sanitizer#output captures some requirements that I came up with for embedding JSON in HTML:

"""
The output is well-formed JSON as defined by RFC 4627. The output satisfies these additional properties:

* The output will not contain the substring (case-insensitively) "</script" so can be embedded inside an HTML script element without further encoding.
* The output will not contain the substring "]]>" so can be embedded inside an XML CDATA section without further encoding.
* The output is a valid Javascript expression, so can be parsed by Javascript's eval builtin (after being wrapped in parentheses) or by JSON.parse. Specifically, the output will not contain any string literals with embedded JS newlines (U+2028 Line separator or U+2029 Paragraph separator).
* The output contains only valid Unicode scalar values (no isolated UTF-16 surrogates) that are allowed in XML unescaped.
"""

These apply equally well to RFC 7159 IIUC. The latter few constraints are required to allow embedding of JSON in HTML in a foreign content context ( www.w3.org/TR/html5/syntax.html#cdata-sections ).

Those rules are sufficient to allow embedding in HTML without breaking token boundaries in the embedding language.

To preserve semantics when embedding in HTML you also need to escape '&'. To prevent exfiltration via external entities in SVG & other XML variants, you should probably also escape '%'.
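
A minimal sketch of a serializer wrapper that aims at the properties above might look like the following. The name `stringifyForEmbedding` and the exact character set are assumptions made for illustration; they are not part of JSON.stringify or of json-sanitizer. The sketch relies on the fact that in JSON.stringify output these characters can only occur inside string literals, where a `\uXXXX` escape denotes the same character. Isolated surrogates are not handled here.

```javascript
// Characters escaped: "<" and ">" (which covers "</script" and "]]>"),
// "&" and "%" (the extra escapes suggested above), and the JS newlines
// U+2028 and U+2029. This set is an assumption for illustration.
const HAZARDS = /[<>&%\u2028\u2029]/g;

function stringifyForEmbedding(value) {
  const json = JSON.stringify(value);
  // JSON.stringify returns undefined for values such as functions.
  if (json === undefined) return undefined;
  return json.replace(
    HAZARDS,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const out = stringifyForEmbedding({ html: "</ScRiPt> ]]> & 100%" });
console.log(out); // {"html":"\u003c/ScRiPt\u003e ]]\u003e \u0026 100\u0025"}
console.log(JSON.parse(out).html); // </ScRiPt> ]]> & 100%
```

Escaping every `<` rather than searching for "</script" avoids the case-insensitivity question entirely, at the cost of slightly longer output.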


# Alexander Jones (9 years ago)

That's awful. As you say, it's an antipattern, no further effort should be spent on this. JSON produced by JavaScript has far more general uses than slapping directly into a script tag unencoded, so no-one else should have to see this. Also, there are many other producers of JSON than JavaScript.

Instead, use XHTML and CDATA (which has a straightforward encoding mechanism that doesn't ruin the parseability of the code or affect it in any way) if you really want to pull stunts like this.

Alex


# Michał Wadas (9 years ago)

Actually, CDATA suffers from the same issue for the string "]]>". Mike Samuel has a very strong point here.

And by saying "it's an antipattern, don't do this" we will not make old vulnerable code go away. And we have a very good way to stop people from shooting themselves in the foot - for free.



# Alexander Jones (9 years ago)

Embedding a JSON literal into HTML involves first encoding to JSON, then encoding that into HTML: two stages which must not be confused. The 'encoding into HTML' part is best done in XHTML with CDATA, and the encoding method is taken care of by whichever XML-generating library you're using. If you hint it to use CDATA for such a text node, or if for any other reason it chooses to use CDATA, rather than merely converting every `<` to `&lt;`, etc., then it will (or should) "escape" `]]>` as `]]]]><![CDATA[>` or whatever equivalent. See https://en.wikipedia.org/wiki/CDATA#Nesting for more info. Crucially, this works for encoding ANY text data into a text node in an XML document, not just JSON.
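
The CDATA splitting trick described above can be sketched in a few lines. The helper names `toCdata` and `fromCdata` are hypothetical, and `fromCdata` is only a simplified decoder for the output of `toCdata`, not a general XML parser.

```javascript
// Wrap arbitrary text in CDATA, splitting every "]]>" across two sections
// so that the terminator never appears inside a section's content.
function toCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

// Simplified decoder for toCdata output only: concatenate the contents
// of the consecutive CDATA sections.
function fromCdata(xml) {
  return xml.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1");
}

const encoded = toCdata("a]]>b");
console.log(encoded); // <![CDATA[a]]]]><![CDATA[>b]]>
console.log(fromCdata(encoded)); // a]]>b
```

Nothing in this step knows about JSON; it treats its input as opaque text, which is the layering argument being made here.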

Having the specified JSON algorithm in ECMAScript deal with concerns of embedding into legacy, non XML-based HTML (oh yes, I totally went there! ;) ) is a classic layer violation, which I would guarantee offends 99 out of 100 experienced programmers' sensibilities. :)

Aside, I'll repeat again that this would be largely ineffective - a lot of JSON that might be dumbly pasted into a text stream of HTML would be generated by implementations other than that specified by ECMAScript.

Hope this clears it up

Alex


# Kris Siegel (9 years ago)

ECMAScript, while highly used in web browsers, should really not care about HTML constructs. That's where WHATWG and W3C come in. I suggest this type of feature should come from one of those groups, not ECMA.


# Mike Samuel (9 years ago)

I agree it's subideal, which is why I work to address problems like this in template systems, but ad-hoc string concatenation happens, and embeddable sub-languages provide defense-in-depth without sacrificing correctness.

CDATA sections solve no problems because they cannot contain any string that has "]]>" as a substring, so you still have to s/\]\]>/]]]]><![CDATA[>/g.


# Alexander Jones (9 years ago)

They do solve the problem. You encode your entire JS *before* pasting it, encoding `]]>` and nothing more, and the XML document's text node contains the unadulterated text, which the JS parser also sees. It's perfect layer isolation. Ye olde HTML can't do that because there is no escaping mechanism for `</script>` that actually allows the JS parser to see the text (code) content unmodified.

Viva la <xhtml:revolución /> ;)


# Mike Samuel (9 years ago)

Without CDATA you have to encode script bodies properly. With CDATA you have to encode script bodies properly. What problem did CDATA solve?


# Alexander Jones (9 years ago)

In XHTML, CDATA allows a 'more' verbatim spelling of text node content. But the end token has to be escaped, as discussed. Despite this escaping, the text node can contain arbitrary strings.

In XHTML, you can achieve the same effect without CDATA, just by escaping XML entities. Again, and crucially, the text node can contain arbitrary strings.

In HTML without CDATA, using HTML entities within the script tag is wrong specifically because they are not interpreted. The text node in the HTML document CANNOT contain arbitrary strings, and there is no further decode step before the JS parser hits your code, so you're forced to take other measures to ensure that `</script>` does not appear in your code. There are a few places this can appear, only one of which is embedded in string literals, so the method of avoiding this is actually sensitive to the context and not practical to specify.

I hope you can appreciate how ridiculous this problem is for HTML - I don't believe CDATA support in HTML 5 can solve this due to forward compatibility - which is why it's an antipattern. Just don't do it, or use XHTML. It's not cool to hate on XML anymore. ;)

Alex


# Simon Pieters (9 years ago)

On Wed, 28 Sep 2016 19:06:31 +0200, Michał Wadas <michalwadas at gmail.com> wrote:

> Idea: require implementations to stringify "</script>" as "<\uxxxxscript>".
>
> Benefits: remove XSS vulnerability when injecting JSON as content of <script> tag (quite common antipattern).
>
> Backward compatible: yes, unless binary equality is required and this string is used.

You would also need to escape "<!--" and "<script" for HTML. See https://html.spec.whatwg.org/multipage/scripting.html#restrictions-for-contents-of-script-elements
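
As a rough illustration, a check for the three sequences mentioned in this thread could look like the sketch below. The name `isSafeInScriptElement` is hypothetical, and the pattern is a simplified, conservative approximation rather than the full grammar in the HTML specification.

```javascript
// Case-insensitive match for "<!--", "<script" and "</script".
// A conservative approximation, not the HTML spec's complete rules.
const SCRIPT_HAZARD = /<!--|<\/?script/i;

function isSafeInScriptElement(text) {
  return !SCRIPT_HAZARD.test(text);
}

console.log(isSafeInScriptElement(JSON.stringify("<!-- <script>"))); // false
console.log(isSafeInScriptElement(JSON.stringify({ n: 1, s: "a < b" }))); // true
```

A check like this can only reject output; producing output that always passes requires escaping inside the serializer itself.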


-- 
Simon Pieters
Opera Software

# Mike Samuel (9 years ago)

On Thu, Sep 29, 2016 at 2:09 AM, Alexander Jones <alex at weej.com> wrote:

> In XHTML, CDATA allows a 'more' verbatim spelling of text node content. But the end token has to be escaped, as discussed. Despite this escaping, the text node can contain arbitrary strings.
>
> In XHTML, you can achieve the same effect without CDATA, just by escaping XML entities. Again, and crucially, the text node can contain arbitrary strings.

So, `<script><![CDATA[...]]></script>` has a complete escaping process, whereas, since CDATA sections were taken out of HTML foreign element content, disallowing

    <svg><script><![CDATA[...]]></script></svg>

HTML does not, so to figure out how to embed

    alert("</script>");
    if (a < /script>/.exec(myString)) ...

you have to do scripting-language-specific analysis.

Is that about right?
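
The difference between JSON and general script text can be shown with a small experiment. This is a sketch under stated assumptions: `escapeLessThan` and `parsesAsScript` are hypothetical helpers, and the script snippet is adapted from the one above with the space after `<` removed so that the literal sequence `</script>` really occurs.

```javascript
// Blind textual rewrite: every "<" becomes the escape \u003c.
const escapeLessThan = (source) => source.replace(/</g, "\\u003c");

// True if `source` parses as a JavaScript function body (it is never run).
function parsesAsScript(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

// JSON: "<" can only occur inside string literals, so the rewrite is lossless.
const json = JSON.stringify({ tag: "</script>" });
console.log(JSON.parse(escapeLessThan(json)).tag === "</script>"); // true

// JavaScript: "<" may also be an operator, here followed by a regex literal.
const js = 'alert("</script>"); if (a</script>/.exec(myString)) {}';
console.log(parsesAsScript(js)); // true
console.log(parsesAsScript(escapeLessThan(js))); // false
```

The same rewrite that is safe for serialized JSON turns valid JavaScript into a syntax error, because `\u003c` is only equivalent to `<` inside a string literal.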

> In HTML without CDATA, using HTML entities within the script tag is wrong specifically because they are not interpreted. The text node in the HTML document CANNOT contain arbitrary strings, and there is no further decode step before the JS parser hits your code, so you're forced to take other measures to ensure that `</script>` does not appear in your code. There are a few places this can appear, only one of which is embedded in string literals, so the method of avoiding this is actually sensitive to the context and not practical to specify.
>
> I hope you can appreciate how ridiculous this problem is for HTML - I don't believe CDATA support in HTML 5 can solve this due to forward compatibility - which is why it's an antipattern. Just don't do it, or use XHTML. It's not cool to hate on XML anymore. ;)

Yes. I've written hardened DOM tree serializers. I appreciate these problems. No-one is hating on XML.

We're talking about JSON serializers. Every JSON serializer produces a subset of the output language. Choices about that sublanguage affect how easy/hard it is to use that serializer with other tools.

That "if everyone wrote software with property P, we would not have problem Q" is a great argument that we should prefer stacks with property P, but does not mean we should not take the prevalence of problem Q into account when designing elements of software stacks. You seem to actually be arguing that we should not do our best to prevent problem Q by other means, but real systems need defense-in-depth.

So I concede your point about CDATA sections but don't see that these arguments about antipatterns and the benefits of XHTML are all that relevant.

# Oriol Bugzilla (9 years ago)

> ECMAScript, while highly used in web browsers, should really not care about HTML constructs. That's where WHATWG and W3C come in. I suggest this type of feature should come from one of those groups, not ECMA.

That applies to escaping things like </script> or ]]>, and I agree. But as Mike Samuel mentioned, JSON strings containing U+2028 or U+2029 are not valid JS expressions. I think it would make sense for JSON.stringify to escape these.
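To make the U+2028/U+2029 point concrete, here is a minimal sketch of post-processing `JSON.stringify` output so that it never contains a raw line or paragraph separator. At the time of this thread those characters act as line terminators in JS source, so a raw one inside a string literal is a syntax error, while the `\u2028` / `\u2029` escapes are accepted by both `JSON.parse` and a JS parser. The helper name `stringifyForJs` is hypothetical and not part of any standard API; whether `JSON.stringify` itself should do this is exactly what is being discussed.

```javascript
// Hypothetical helper (an assumption for illustration, not a standard API).
// Serializes to JSON, then replaces raw U+2028 / U+2029 with \u escapes.
// In JSON.stringify output these characters can only occur inside string
// literals, so the result is still valid JSON for the same value.
function stringifyForJs(value) {
  const json = JSON.stringify(value);
  if (json === undefined) return undefined; // e.g. JSON.stringify(undefined)
  return json
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const sample = { text: "line\u2028separator and paragraph\u2029separator" };
const safe = stringifyForJs(sample);

// The escaped text still parses back to the original value.
const roundTripped = JSON.parse(safe);
```

Because the escapes are ordinary JSON string escapes, consumers that only ever call `JSON.parse` should not be able to tell the difference, apart from the byte-for-byte output.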

# Mike Samuel (9 years ago)

On Thu, Sep 29, 2016 at 8:45 AM, Oriol Bugzilla <oriol-bugzilla at hotmail.com> wrote:

>> ECMAScript, while highly used in web browsers, should really not care about HTML constructs. That's where WHATWG and W3C come in. I suggest this type of feature should come from one of those groups, not ECMA.

> That applies to escaping things like `</script>` or `]]>`, and I agree. But as Mike Samuel mentioned, JSON strings containing U+2028 or U+2029 are not valid JS expressions. I think it would make sense for `JSON.stringify` to escape these.

What is it that you're saying is not in TC-39's bailiwick?

Is it that w3c/whatwg should define what constitutes "embeddable JSON"?

Or is it that if it's worth defining a function that produces embeddable JSON from an EcmaScript object, that w3c/whatwg should include that in some set of EcmaScript APIs that it defines?

If you agree with my earlier claim
"""
We're talking about JSON serializers. Every serializer produces a subset of the output language. Choices about that sublanguage affect how easy/hard it is to use that serializer with other tools.
"""
then it seems that TC-39 might take embeddability into account when crafting the subset of JSON that JSON.stringify produces.

# Alexander Jones (9 years ago)

Maybe we should just make U+2028 and U+2029 valid in JS then? What other productions in JSON are invalid syntax in JS?

# Mike Samuel (9 years ago)

On Thu, Sep 29, 2016 at 9:25 AM, Alexander Jones <alex at weej.com> wrote:

> Maybe we should just make U+2028 and U+2029 valid in JS then? What other productions in JSON are invalid syntax in JS?

I don't think any other productions in JSON are invalid syntax in an Expression context.

JSON places no limit on size of numeric literals, and other languages ban unrepresentably large ones, but IIRC ES does not.

Obviously if you start parsing JSON in a statement context, you run into problems where a JSON object with one or more properties is an invalid BlockStatement and the ExpressionStatement production is not reached because of the negative lookahead.
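The statement-context problem can be checked directly. The sketch below uses `new Function` only to ask the engine whether a piece of source text parses (the function is never called); the helper name `parses` is an assumption made for this illustration.

```javascript
// Hypothetical helper: returns true if `source` parses as a function body.
// new Function compiles the text without running it.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const json = JSON.stringify({ a: 1, b: 2 }); // '{"a":1,"b":2}'

// Statement context: the leading "{" opens a block, so "a" is read as an
// expression statement and the ":" that follows is a syntax error.
const asStatement = parses(json + ";");

// Expression context: wrapping in parentheses forces an object literal.
const asExpression = parses("return (" + json + ");");

// An empty JSON object happens to be a valid (empty) block statement.
const emptyAsStatement = parses("{}");
```

This is why the usual advice for evaluating JSON as JS is to wrap it in parentheses first.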

# Mike Samuel (9 years ago)

On Wed, Sep 28, 2016 at 10:06 AM, Michał Wadas <michalwadas at gmail.com> wrote:

> Idea: require implementations to stringify "</script>" as "<\uxxxxscript>".
>
> Benefits: remove XSS vulnerability when injecting JSON as content of <script> tag (quite common antipattern).
>
> Backward compatible: yes, unless binary equality is required and this string is used.

TLDR; I'm against this.

I've pushed back against a number of threads, so I want to avoid leaving the impression that I support this proposal.

I think this is a bad idea, so let me try to pull together the various threads and address them in one place.

Should EcmaScript or any other standards body define "embeddable JSON"?

No. Standards bodies move slowly. The main argument for this feature is to make it easier to write more secure code, and to transparently make existing code more secure.

Standards bodies move too slowly. Library code can roll-out quickly in response to zero-days or emerging threats, but standards cannot.

For example, client-side templates using mustaches ( goo.gl/eztprF ) are an emerging threat.

There has been a poor history of this, even with JSON. Crock's RFC 4627 said
"""
A JSON text can be safely passed into JavaScript's eval() function (which compiles and executes a string) if all the characters not enclosed in strings are in the set of characters that form JSON tokens. This can be quickly determined in JavaScript with two regular expressions and calls to the test and replace methods.

  var my_JSON_object = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
         text.replace(/"(\\.|[^"\\])*"/g, ''))) &&
     eval('(' + text + ')');

"""
which is not in the latest JSON RFC: it was found to be false in a dozen ways before RFC 7158 (since obsoleted) removed that language.
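As one small illustration of how loose that check is (not necessarily one of the failures Mike has in mind), the character test only looks at which characters occur outside strings, not at how they are arranged. The sketch below wraps the quoted expression in a function, whose name `rfc4627Eval` is an assumption for this example, and feeds it text that is not JSON at all.

```javascript
// The expression quoted from RFC 4627, wrapped in a hypothetical function.
function rfc4627Eval(text) {
  return !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
      text.replace(/"(\\.|[^"\\])*"/g, ''))) &&
    eval('(' + text + ')');
}

// Every character here is in the permitted set, but the text is not JSON.
const text = "[1,,2]";

// The check passes and eval produces a sparse array with a hole at index 1.
const viaEval = rfc4627Eval(text);

// A real JSON parser rejects the same text.
let jsonParseRejects = false;
try {
  JSON.parse(text);
} catch (e) {
  jsonParseRejects = e instanceof SyntaxError;
}
```

This particular input is harmless, but it shows that passing the character test says little about what `eval` will actually be handed.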

The only way to deal with emerging threats is to have a quickly patchable system. Patching serializers causes spurious test failures, the broken-hearts problem:

   assertTrue("I <3 u", serializeHtml("I <3 u"))

I suspect that the best we will ever be able to do re emerging-threats is to allow those who care about security to patch and fix tests and ignore the maintenance cost to unmaintained projects.

Is there any value in embeddable sanitizers?

I think embeddable serializers can provide defense-in-depth against faults in code that composes network messages, which is why I wrote https://github.com/OWASP/json-sanitizer to do just that.
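A library-level embeddable serializer along these lines might look like the sketch below. It is not the json-sanitizer API; `stringifyEmbeddable` is a hypothetical name, and the approach (escaping every `<` and `>` rather than looking for specific substrings) is an assumption chosen for simplicity. It relies on the fact that `<` and `>` can only appear inside string literals in `JSON.stringify` output.

```javascript
// Hypothetical library function, not part of any standard or of json-sanitizer.
// Escapes "<" and ">" so the output cannot contain "</script" (in any case)
// or "]]>", while remaining valid JSON for the same value.
const HTML_SENSITIVE = { "<": "\\u003c", ">": "\\u003e" };

function stringifyEmbeddable(value) {
  const json = JSON.stringify(value);
  if (json === undefined) return undefined;
  return json.replace(/[<>]/g, (ch) => HTML_SENSITIVE[ch]);
}

const payload = { s: "</SCRIPT><script>alert(1)</script> ]]>" };
const embedded = stringifyEmbeddable(payload);

// The escaped text decodes to exactly the original string.
const decoded = JSON.parse(embedded).s;
```

Because this lives in library code rather than in the language, it can be patched quickly if a new embedding hazard turns up, which is the trade-off argued for above.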

Is this backwards compatible?

No. JSON strings are used as keys in persisted tables because we have de-facto defined a canonical subset of JSON.
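The compatibility concern can be sketched with an in-memory `Map` standing in for a persisted table. The `escapeLt` function is a hypothetical stand-in for a "patched" serializer that escapes `<`; nothing here reflects a real storage system.

```javascript
// Hypothetical patched serializer step: escape "<" in JSON output.
const escapeLt = (json) => json.replace(/</g, "\\u003c");

const record = { note: "I <3 u" };

// Keys were written using today's JSON.stringify output.
const table = new Map();
table.set(JSON.stringify(record), "row 1");

// After the serializer changes, the same value produces a different key.
const patchedKey = escapeLt(JSON.stringify(record));
const foundAfterPatch = table.has(patchedKey);

// The two texts still denote the same value; only the bytes differ.
const sameValue = JSON.parse(patchedKey).note === record.note;
```

The lookup fails even though both keys decode to the same value, which is the sense in which a de-facto canonical output has come to be relied upon.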

This kind of thing can be discouraged by randomizing, the way Java is doing with built-in map implementations in Java 9, which also helps avoid broken-hearts problems. Java is a large-API language, so it can provide umpteen variants of x in a way that wouldn't fit well in ES, and providing an alternate API loses a lot of the benefit of the original proposal.

Are embeddable serializers an anti-pattern?

No. The anti-pattern is that trustworthy and untrustworthy content are mixed using naive string concatenation to produce a trusted output.

Even if the real anti-pattern were not endemic within distributed systems, composing trustworthy network messages is hard and embeddable serializers provide useful defense-in-depth for message composing code.

Is XHTML more easily secured than HTML?

Yes. XML is much more easily statically analyzed, and mistaken assumptions in a serializer much more frequently manifest as parse failures, so they fail safe more often. When the embedding language fails safe, the whole is more secure than if you have an embedded language that fails safe inside an embedding language that does not, as is the case with JSON in HTML.

This is why, when I write an HTML sanitizer or hardened DOM serializer, I try to make the output the intersection of HTML & vanilla XML+namespaces. (Incidentally, this prevents use of CDATA sections, so serializers have included JS rewriters.)

At the risk of FUD though, XHTML-specific parsing branches might be simpler but have been much less heavily tested and fuzzed, so it might actually be easier to craft a buffer overflow to take over the renderer for an origin that serves XHTML than one that serves HTML exclusively.

The security of XHTML is not relevant though, because XHTML isn't used.

To anyone who is passionate about the benefits of making HTML more XML-like, I would be happy to help with a proposal to the content-security-policy team or similar body to add a switch that says that the parsing should halt as soon as it is realized that the content is not syntactically valid XML to get the fail-safe benefits of XML.

# Mark S. Miller (9 years ago)

On Thu, Sep 29, 2016 at 9:25 AM, Alexander Jones <alex at weej.com> wrote:

> Maybe we should just make U+2028 and U+2029 valid in JS then? What other productions in JSON are invalid syntax in JS?

IIRC, Doug Crockford, possibly Mike Samuel, and I (and perhaps others) advocated such a change to EcmaScript back during the transition from ES3 to ES3.1/ES5. ES differed enough between platforms in other ways that, some of us felt, it would have been worth the experiment to see if we could get away with it -- without breaking the web. We were not able to convince people to engage in that experiment then. Such an experiment would be much more expensive now, with a much lower probability of success, and with a lower payoff. I don't see it happening.

> On Thursday, 29 September 2016, Mike Samuel <mikesamuel at gmail.com> wrote:
>
>> On Thu, Sep 29, 2016 at 8:45 AM, Oriol Bugzilla <oriol-bugzilla at hotmail.com> wrote:
>>
>>>> ECMAScript, while highly used in web browsers, should really not care about HTML constructs. That's where WHATWG and W3C come in. I suggest this type of feature should come from one of those groups, not ECMA.
>>>
>>> That applies to escaping things like `</script>` or `]]>`, and I agree. But as Mike Samuel mentioned, JSON strings containing U+2028 or U+2029 are not valid JS expressions. I think it would make sense for `JSON.stringify` to escape these.
>>
>> What is it that you're saying is not in TC-39's bailiwick?
>>
>> Is it that w3c/whatwg should define what constitutes "embeddable JSON"?
>>
>> Or is it that if it's worth defining a function that produces embeddable JSON from an EcmaScript object, that w3c/whatwg should include that in some set of EcmaScript APIs that it defines?
>>
>> If you agree with my earlier claim
>> """
>> We're talking about JSON serializers. Every serializer produces a subset of the output language. Choices about that sublanguage affect how easy/hard it is to use that serializer with other tools.
>> """
>> then it seems that TC-39 might take embeddability into account when crafting the subset of JSON that JSON.stringify produces.

I agree that this issue belongs with TC39 much more than it belongs anywhere else. TC39's steering of JS is certainly influenced by how JS gets used in web browsers. When an issue touches both JS and browser specific concerns, it can often be unclear whose "jurisdiction" it belongs in. This one is not unclear. It should be treated as a language issue by TC39.
