Feature Request: Make ECMA262 a superset of JSON

# Richard Gibson (8 years ago)

ECMAScript claims JSON as a subset twice in tc39.github.io/ecma262/#sec-json.parse, but (as has been well-documented) that is not true because JSON strings can contain unescaped U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR while ECMAScript strings cannot. Mark Miller alludes to a pre-ES5 push for allowing them (which ultimately failed) in esdiscuss.org/topic/json-stringify-script#content-17, and posits that a repeat today would also fail. Having never seen a windmill that didn't need slaying, I hereby make the attempt.

Aside from slightly simplifying the spec (by eliminating the need for a production specific to JSON.parse) and retroactively validating its claims, such a change to DoubleStringCharacter and SingleStringCharacter would allow safely embedding arbitrary JSON directly within ECMAScript, a request which has been made before in the context of source concatenation/construction.

User-visible effects from the change would be limited to the absence of SyntaxErrors from code like eval(' "\u2028" ') or its raw-text equivalent.
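As a rough illustration of that user-visible difference, the sketch below feeds the same text (a string literal containing a raw U+2028) to both JSON.parse and eval. The helper name `acceptsAsSource` is hypothetical and only exists for this example. Whether eval accepts the text depends on the engine and on the edition of the specification it implements, so the sketch reports the result instead of assuming one.

```javascript
// U+2028 LINE SEPARATOR, written as an escape so this file contains no raw separator.
const LS = "\u2028";

// JSON text: a string literal whose only character is a raw (unescaped) U+2028.
const jsonText = '"' + LS + '"';

// JSON.parse accepts the raw character inside a string.
const parsedString = JSON.parse(jsonText);

// Hypothetical helper: does this engine accept `text` as an ECMAScript expression?
function acceptsAsSource(text) {
  try {
    eval("(" + text + ")");
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything other than a syntax error is unexpected here
  }
}

console.log("JSON.parse result length:", parsedString.length);
console.log("accepted as ECMAScript source:", acceptsAsSource(jsonText));
```

Under the proposal, the second line would report `true` everywhere; an engine following the grammar as described at the time of this thread would report `false`.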

# Claude Pache (8 years ago)

Besides U+2028 and U+2029, there is also the __proto__ key, which has a special meaning in JS as implemented in browsers. That definitely prevents "safely" embedding arbitrary JSON within JS.

# Raul-Sebastian Mihăilă (8 years ago)

Step 4 of www.ecma-international.org/ecma-262/7.0/index.html#sec-json.parse says that __proto__ shouldn't have special meaning when parsing.

# Raul-Sebastian Mihăilă (8 years ago)

Disregard my reply as it doesn't make sense. :-)

# Richard Gibson (8 years ago)

Allow me to clarify: permitting U+2028 and U+2029 in ECMAScript strings would allow safely embedding arbitrary JSON in the precise sense that such embeddings would always be syntactically valid and evaluatable Literal, ArrayLiteral, or ObjectLiteral expressions. That is already true of "__proto__" keys, even though their evaluation results in web browsers don't exactly match JSON.parse.

# Isiah Meadows (8 years ago)

I'll point out that encoding and evaluating JSON literally can easily become a potential security issue, anyways (consider if a user can encode --></script> in the relevant data). But you might have a point if you consider JSONP.
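To make that concern concrete, here is a minimal sketch of one common mitigation when JSON is written into an inline script: escape `<` and the two separators after serializing. The helper name `jsonForInlineScript` and the sample payload are hypothetical, and this is not presented as a complete defence against script injection.

```javascript
// Hypothetical helper: serialize a value so the result can sit inside an inline <script>.
// In JSON.stringify output, "<", U+2028 and U+2029 can only occur inside strings,
// so replacing them with \u escapes does not change the parsed value.
function jsonForInlineScript(value) {
  return JSON.stringify(value)
    .replace(/</g, "\\u003c")      // defuses "</script>" and "<!--"
    .replace(/\u2028/g, "\\u2028") // LINE SEPARATOR
    .replace(/\u2029/g, "\\u2029"); // PARAGRAPH SEPARATOR
}

// Sample data containing the sequences discussed in this thread.
const payload = { note: "--></script>\u2028\u2029" };
const embedded = jsonForInlineScript(payload);

console.log(embedded);
console.log(JSON.parse(embedded).note === payload.note);
```

The escaped text is still valid JSON and round-trips to the original value, which is why the replacement is done on the serialized output instead of on the data.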

On Sun, Oct 16, 2016, 22:08 Richard Gibson <richard.gibson at gmail.com> wrote:

Allow me to clarify: permitting U+2028 and U+2029 in ECMAScript strings would allow safely embedding arbitrary JSON in the precise sense that such embeddings would always be syntactically valid and evaluatable Literal, ArrayLiteral, or ObjectLiteral expressions. That is already true of "__proto__" keys, even though their evaluation results in web browsers don't exactly match JSON.parse.

By the way, "__proto__" is intentionally different than __proto__ to avoid a gaping security hole for those using objects as dictionaries (untrusted code causing prototypes to change is bad). The string version is no different from any other property, while the identifier version either gets or sets the prototype.

# Raul-Sebastian Mihăilă (8 years ago)

Note that ["__proto__"] is different from __proto__, while "__proto__" is the same as __proto__.
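The distinction can be checked with a short sketch like the one below, assuming an engine that implements the `__proto__` handling for object initializers (as browsers and Node.js do). The variable names and the `hasOwn` helper are made up for this example.

```javascript
const proto = { inherited: true };

// Small helper so the checks below read clearly.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

const unquoted = { __proto__: proto };     // sets the prototype
const quoted = { "__proto__": proto };     // also sets the prototype
const computed = { ["__proto__"]: proto }; // creates an own property instead
const fromJson = JSON.parse('{"__proto__": {"inherited": true}}'); // own property

console.log(Object.getPrototypeOf(unquoted) === proto, hasOwn(unquoted, "__proto__"));
console.log(Object.getPrototypeOf(quoted) === proto, hasOwn(quoted, "__proto__"));
console.log(Object.getPrototypeOf(computed) === proto, hasOwn(computed, "__proto__"));
console.log(Object.getPrototypeOf(fromJson) === Object.prototype, hasOwn(fromJson, "__proto__"));
```

So evaluating JSON text as an object literal and passing it to JSON.parse can produce different objects for a "__proto__" key, even though both forms are syntactically accepted.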

# Isiah Meadows (8 years ago)

Okay. I stand corrected (it was in fact ["__proto__"] that actually evaluates to the property).



# Richard Gibson (8 years ago)

On Tue, Oct 18, 2016 at 8:57 AM, Isiah Meadows <isiahmeadows at gmail.com> wrote:

I'll point out that encoding and evaluating JSON literally can easily become a potential security issue, anyways (consider if a user can encode --></script> in the relevant data). But you might have a point if you consider JSONP.

This feels like it's drifting a bit. I'm asserting that the "alternative definition of DoubleStringCharacter" required for JSON.parse can in fact become the only definition of that production, retroactively validating the currently-false claims in §24.3.1 tc39.github.io/ecma262/#sec-json.parse of JSON being a subset of ECMAScript literals. U+2028 and U+2029 would still be line terminators for purposes of ASI/line numbering/etc., but would be allowed unescaped in string literals.

On Sun, Oct 16, 2016, 22:08 Richard Gibson <richard.gibson at gmail.com> wrote:

Allow me to clarify: permitting U+2028 and U+2029 in ECMAScript strings would allow safely embedding arbitrary JSON in the precise sense that such embeddings would always be syntactically valid and evaluatable Literal, ArrayLiteral, or ObjectLiteral expressions. That is already true of "__proto__" keys, even though their evaluation results in web browsers don't exactly match JSON.parse.

By the way, "__proto__" is intentionally different than __proto__ to avoid a gaping security hole for those using objects as dictionaries (untrusted code causing prototypes to change is bad). The string version is no different from any other property, while the identifier version either gets or sets the prototype.

While irrelevant to the proposed change, that appears to be false (though it would certainly be beneficial if true): the special check for "__proto__" in Step 5 of §B.3.1 tc39.github.io/ecma262/#sec-__proto__-property-names-in-object-initializers uses propKey, which step 1 defines as "the result of evaluating PropertyName", and evaluation of LiteralPropertyName in §12.2.6.7 tc39.github.io/ecma262/#sec-object-initializer-runtime-semantics-evaluation does not preserve a distinction between IdentifierName and StringLiteral forms in its result. I also observe equal treatment of quoted and unquoted "__proto__" object literal properties in SpiderMonkey, V8, and Chakra.

# Isiah Meadows (8 years ago)

Inline.

On Tue, Oct 18, 2016, 12:01 Richard Gibson <richard.gibson at gmail.com> wrote:

On Tue, Oct 18, 2016 at 8:57 AM, Isiah Meadows <isiahmeadows at gmail.com> wrote:

I'll point out that encoding and evaluating JSON literally can easily become a potential security issue, anyways (consider if a user can encode --></script> in the relevant data). But you might have a point if you consider JSONP.

This feels like it's drifting a bit. I'm asserting that the "alternative definition of DoubleStringCharacter" required for JSON.parse can in fact become the only definition of that production, retroactively validating the currently-false claims in §24.3.1 tc39.github.io/ecma262/#sec-json.parse of JSON being a subset of ECMAScript literals. U+2028 and U+2029 would still be line terminators for purposes of ASI/line numbering/etc., but would be allowed unescaped in string literals.

I'll also mention that the JSON website does claim to be "based on" a subset (the JavaScript object literal syntax), but there is a subtle distinction.

I'll admit my original statement was probably not fully on topic, though (I was uncertain whether I should've said it or not).

On Sun, Oct 16, 2016, 22:08 Richard Gibson <richard.gibson at gmail.com> wrote:

Allow me to clarify: permitting U+2028 and U+2029 in ECMAScript strings would allow safely embedding arbitrary JSON in the precise sense that such embeddings would always be syntactically valid and evaluatable Literal, ArrayLiteral, or ObjectLiteral expressions. That is already true of "__proto__" keys, even though their evaluation results in web browsers don't exactly match JSON.parse.

By the way, "__proto__" is intentionally different than __proto__ to avoid a gaping security hole for those using objects as dictionaries (untrusted code causing prototypes to change is bad). The string version is no different from any other property, while the identifier version either gets or sets the prototype.

While irrelevant to the proposed change, that appears to be false (though it would certainly be beneficial if true): the special check for "__proto__" in Step 5 of §B.3.1 tc39.github.io/ecma262/#sec-__proto__-property-names-in-object-initializers uses propKey, which step 1 defines as "the result of evaluating PropertyName", and evaluation of LiteralPropertyName in §12.2.6.7 tc39.github.io/ecma262/#sec-object-initializer-runtime-semantics-evaluation does not preserve a distinction between IdentifierName and StringLiteral forms in its result. I also observe equal treatment of quoted and unquoted "__proto__" object literal properties in SpiderMonkey, V8, and Chakra.

I tested it myself earlier, and took back my initial statement in another email, since I think someone else beat you to it (it's computed vs not).