Feature Request: Make ECMA262 a superset of JSON
Besides U+2028 and U+2029, there is also the __proto__ key, which has a special meaning in JS as implemented in browsers. That definitely prevents "safely" embedding arbitrary JSON within JS.
Step 4 of www.ecma-international.org/ecma-262/7.0/index.html#sec-json.parse says that __proto__ shouldn't have special meaning when parsing.
Disregard my reply as it doesn't make sense. :-)
Allow me to clarify: permitting U+2028 and U+2029 in ECMAScript strings would allow safely embedding arbitrary JSON in the precise sense that such embeddings would always be syntactically valid and evaluatable Literal, ArrayLiteral, or ObjectLiteral expressions. That is already true of "__proto__" keys, even though their evaluation results in web browsers don't exactly match JSON.parse.
I'll point out that encoding and evaluating JSON literally can easily become a potential security issue anyway (consider if a user can encode --></script> in the relevant data). But you might have a point if you consider JSONP.
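To make the </script> concern concrete, here is a minimal sketch of one common mitigation, assuming the JSON is being written into an inline HTML script element. The helper name jsonForInlineScript is hypothetical (not from this thread or any library), and the claim that escaping "<" is sufficient is an assumption that holds only for that inline-script context.

```javascript
// Hypothetical helper: serialize a value for embedding in an inline <script>.
// Assumption: escaping every "<" as \u003c keeps the HTML parser from seeing
// "</script>" or "<!--" inside the serialized data.
function jsonForInlineScript(value) {
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

const payload = { comment: "--></script><script>alert(1)</script>" };
const embedded = jsonForInlineScript(payload);

// The escaped text is still valid JSON and round-trips to the same data.
const roundTripped = JSON.parse(embedded);
console.log(embedded.includes("</script>")); // false
console.log(roundTripped.comment === payload.comment); // true
```

In JSON, "<" can only occur inside string values, so the replacement never touches structural characters.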
On Sun, Oct 16, 2016, 22:08 Richard Gibson <richard.gibson at gmail.com> wrote:
Allow me to clarify: permitting U+2028 and U+2029 in ECMAScript strings would allow safely embedding arbitrary JSON in the precise sense that such embeddings would always be syntactically valid and evaluatable Literal, ArrayLiteral, or ObjectLiteral expressions. That is already true of "__proto__" keys, even though their evaluation results in web browsers don't exactly match JSON.parse.
By the way, "__proto__" is intentionally different than __proto__ to avoid a gaping security hole for those using objects as dictionaries (untrusted code causing prototypes to change is bad). The string version is no different from any other property, while the identifier version either gets or sets the prototype.
Note that ["__proto__"] is different from __proto__, while "__proto__" is the same as __proto__.
Okay. I stand corrected (it was in fact ["__proto__"] that actually evaluates to the property).
Isiah Meadows me at isiahmeadows.com
On Tue, Oct 18, 2016 at 8:57 AM, Isiah Meadows <isiahmeadows at gmail.com> wrote:
I'll point out that encoding and evaluating JSON literally can easily become a potential security issue anyway (consider if a user can encode --></script> in the relevant data). But you might have a point if you consider JSONP.
This feels like it's drifting a bit. I'm asserting that "the alternative definition of DoubleStringCharacter" required for JSON.parse can in fact become the only definition of that production, retroactively validating the currently-false claims in §24.3.1 (tc39.github.io/ecma262/#sec-json.parse) of JSON being a subset of ECMAScript literals. U+2028 and U+2029 would still be line terminators for purposes of ASI, line numbering, etc., but would be allowed unescaped in string literals.
While irrelevant to the proposed change, that appears to be false (though it would certainly be beneficial if true). The special check for "__proto__" in Step 5 of §B.3.1 (tc39.github.io/ecma262/#sec-__proto__-property-names-in-object-initializers) uses propKey, which step 1 defines as "the result of evaluating PropertyName", and evaluation of LiteralPropertyName in §12.2.6.7 (tc39.github.io/ecma262/#sec-object-initializer-runtime-semantics-evaluation) does not preserve a distinction between IdentifierName and StringLiteral forms in its result. I also observe equal treatment of quoted and unquoted "__proto__" object literal properties in SpiderMonkey, V8, and Chakra.
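The distinctions discussed above (unquoted vs. quoted vs. computed keys, and JSON.parse) can be checked with a short sketch. The names hasOwn, marker, and report are illustrative only, and the expected results assume an engine that implements Annex B.3.1, as the engines named above are reported to.

```javascript
// Illustrative check of how "__proto__" is treated in each position.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
const marker = { isMarker: true };

const unquoted = { __proto__: marker };     // IdentifierName form
const quoted = { "__proto__": marker };     // StringLiteral form
const computed = { ["__proto__"]: marker }; // ComputedPropertyName form
const parsed = JSON.parse('{"__proto__": {"isMarker": true}}');

const report = {
  // Both literal forms set the prototype rather than creating a property.
  unquotedSetsPrototype: Object.getPrototypeOf(unquoted) === marker,
  quotedSetsPrototype: Object.getPrototypeOf(quoted) === marker,
  // The computed form and JSON.parse create an ordinary own property.
  computedIsOwnProperty: hasOwn(computed, "__proto__"),
  parsedIsOwnProperty: hasOwn(parsed, "__proto__"),
  parsedPrototypeUntouched: Object.getPrototypeOf(parsed) === Object.prototype,
};
console.log(report); // every flag should be true under the stated assumption
```

This is why evaluating embedded JSON as an object literal can yield a different object than JSON.parse when a "__proto__" key is present, even though the text is syntactically valid either way.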
Inline.
On Tue, Oct 18, 2016, 12:01 Richard Gibson <richard.gibson at gmail.com> wrote:
This feels like it's drifting a bit. I'm asserting that "the alternative definition of DoubleStringCharacter" required for JSON.parse can in fact become the only definition of that production, retroactively validating the currently-false claims in §24.3.1 (tc39.github.io/ecma262/#sec-json.parse) of JSON being a subset of ECMAScript literals. U+2028 and U+2029 would still be line terminators for purposes of ASI, line numbering, etc., but would be allowed unescaped in string literals.
I'll also mention that the JSON website does claim to be "based on" a subset (the JavaScript object literal syntax), but there is a subtle distinction.
I'll admit my original statement was probably not fully on topic, though (I was uncertain whether I should've said it or not).
I tested it myself earlier, and took back my initial statement in another email, since I think someone else beat you to it (it's computed vs not).
ECMAScript claims JSON as a subset twice in tc39.github.io/ecma262/#sec-json.parse, but (as has been well-documented) that is not true because JSON strings can contain unescaped U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR while ECMAScript strings cannot. Mark Miller alludes to a pre-ES5 push for allowing them (which ultimately failed) in esdiscuss.org/topic/json-stringify-script#content-17, and posits that a repeat today would also fail. Having never seen a windmill that didn't need slaying, I hereby make the attempt.
Aside from slightly simplifying the spec (by eliminating the need for a production specific to JSON.parse) and retroactively validating its claims, such a change to DoubleStringCharacter and SingleStringCharacter would allow safely embedding arbitrary JSON directly within ECMAScript, a request which has been made before in the context of source concatenation/construction.
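For context, the workaround that source-concatenating code needs without such a change can be sketched as follows. The helper name jsonForScriptSource is hypothetical, and this is only one way to do it: escape the two code points after serializing so that the result is a valid ECMAScript expression under either grammar.

```javascript
// Hypothetical helper: make JSON text safe to splice into ECMAScript source
// by escaping the two code points that string literals reject under the
// grammar discussed in this thread.
function jsonForScriptSource(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const data = { text: "line\u2028separator and paragraph\u2029separator" };

// Parentheses force the braces to be parsed as an object literal expression.
const source = "(" + jsonForScriptSource(data) + ")";
const evaluated = eval(source);

console.log(/[\u2028\u2029]/.test(source)); // false
console.log(evaluated.text === data.text);  // true
```

With the proposed grammar change, the two replace calls would become unnecessary for syntactic validity.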
User-visible effects from the change would be limited to the absence of SyntaxErrors from code like
eval(' "\u2028" ')
or its raw-text equivalent.
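A small probe along these lines shows which behavior a given engine has. The function name evaluatesWithoutError is made up for illustration, and no particular eval result is asserted here because it depends on whether the engine accepts unescaped U+2028 in string literals.

```javascript
// Illustrative probe: does this engine accept a raw U+2028 inside a string
// literal? Returns true if evaluation succeeds, false on SyntaxError.
function evaluatesWithoutError(source) {
  try {
    (0, eval)(source); // indirect eval, evaluated in the global scope
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

// Source text consisting of: quote, raw U+2028, quote.
const rawLiteral = '"\u2028"';

// JSON.parse accepts this text regardless of the string literal grammar.
const parsedOk = JSON.parse(rawLiteral) === "\u2028";
const evalOk = evaluatesWithoutError(rawLiteral);

console.log({ parsedOk, evalOk });
```

Any engine where evalOk is false but parsedOk is true exhibits exactly the JSON/ECMAScript mismatch described in this post.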