JSON parser grammar

# Oliver Hunt (15 years ago)

So I've been looking at the JSON object grammar and have been talking
to Brendan, and I'm getting somewhat conflicting information.

The grammars on json.org and in the ES5 spec both prohibit leading 0's
on any number, but the various implementations disagree with this.

json2.js (from json.org), IE8, and Chrome all support the standard ES
octal literal lexer -- e.g. JSON.parse("[010]")[0] === 8

SpiderMonkey allows a leading 0 but still interprets it as a decimal
value -- e.g. JSON.parse("[010]")[0] === 10

It seems to me that the spec needs to be corrected to specify what the
behaviour actually is, rather than what we wish it could be.
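The divergence above is easy to exercise. Under the grammar as written, a leading zero makes the whole text invalid, so a conforming parser rejects it outright rather than picking an octal or decimal reading (a sketch; modern engines behave this way):

```javascript
// Per the json.org / ES5 grammar, JSONNumber forbids leading zeros, so
// "[010]" is not valid JSON text at all: a conforming JSON.parse must
// throw a SyntaxError instead of choosing an octal or decimal reading.
function parseResult(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

const leadingZero = parseResult("[010]"); // conforming: SyntaxError
const plain = parseResult("[10]");        // valid JSON: the number 10
```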


# Mark S. Miller (15 years ago)

On Tue, Jun 2, 2009 at 7:06 PM, Oliver Hunt <oliver at apple.com> wrote:

So I've been looking at the JSON object grammar and have been talking to Brendan, and I'm getting somewhat conflicting information.

The grammars on json.org and in the ES5 spec both prohibit leading 0's on any number, but the various implementations disagree with this.

json2.js (from json.org), IE8, and Chrome all support the standard ES octal literal lexer -- e.g. JSON.parse("[010]")[0] === 8

SpiderMonkey allows a leading 0 but still interprets it as a decimal value -- e.g. JSON.parse("[010]")[0] === 10

It seems to me that the spec needs to be corrected to specify what the behaviour actually is, rather than what we wish it could be.

Since octal wasn't an official part of ES3, remains absent from official ES5, and is now explicitly prohibited in ES5/strict, it is good that it is not specified by JSON. I am surprised that json2.js accepts the syntax, and even more surprised that it interprets it as octal. Although the RFC says:

A JSON parser transforms a JSON text into another representation. A JSON parser MUST accept all texts that conform to the JSON grammar. A JSON parser MAY accept non-JSON forms or extensions.

I think the behavior you state of json2.js, IE8, and Chrome should be considered a bug. I hesitate to make the same statement about SpiderMonkey, because their behavior falls within both the letter and spirit of the RFC, while maintaining the subset relationship between JSON and ECMAScript.

I asked Crock and he clarified why json2.js has this bug. json2.js relies on eval to parse the json. For safety it guards this eval with regular expressions. These regular expressions are already too complicated to be confident in their safety, so it wasn't worth adding complexity for a non-safety issue.

As for how json2.js interprets these numbers -- according to eval's interpretation on the underlying platform.

# Oliver Hunt (15 years ago)

On Jun 2, 2009, at 7:26 PM, Mark S. Miller wrote:

Since octal wasn't an official part of ES3, remains absent from
official ES5, and is now explicitly prohibited in ES5/strict, it
is good that it is not specified by JSON. I am surprised that
json2.js accepts the syntax, and even more surprised that it
interprets it as octal. Although the RFC says:

A JSON parser transforms a JSON text into another representation. A
JSON parser MUST accept all texts that conform to the JSON grammar.
A JSON parser MAY accept non-JSON forms or extensions.

I think the behavior you state of json2.js, IE8, and Chrome should
be considered a bug. I hesitate to make the same statement about
SpiderMonkey, because their behavior falls within both the letter
and spirit of the RFC, while maintaining the subset relationship
between JSON and ECMAScript.

I'm not talking about the RFC, I'm talking about the ES5 spec. I
guess it would be in the spirit of the RFC for the ES5 spec to define
a JSON grammar that was more (or less) lax than the RFC, but the
ES5 spec itself should not allow variation between implementations
that would be considered "valid", as historically any place in ES that
has undefined "valid" behaviour has proved to be a compatibility
problem later on. Currently I can make a string containing a JSON
object that will produce different output (or not produce output at
all) across multiple implementations that are all "correct" -- this
seems like something that is just inviting disaster.

The json.org grammar allows the following set of characters in a string

  • Any Unicode character except ", \, or a control character
  • \", \\, \/, \b, \f, \n, \r, \t, or \u four-hex-digits

The ES5 spec is the same, only it defines "control character" as any
character less than 0x20, and drops escaped unicode. I'm inclined to
believe that dropping the unicode escaping is likely a typo-esque
error; the exclusion of control characters seems deliberate, but it
has the effect of disallowing tab characters (among others). My
testing seems to imply that Mozilla allows all control characters in a
JSON string literal, including newlines, so I'd like clarification on
what is actually allowed.
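For comparison, the distinction at issue can be sketched directly: the grammar requires control characters (U+0000 through U+001F) inside a string to be written as escapes, so a conforming parser accepts a tab only in its escaped spelling (modern engines enforce this):

```javascript
// A raw U+0009 inside a JSON string literal falls outside JSONString;
// the two-character escape \t is the only conforming way to encode it.
function accepts(text) {
  try { JSON.parse(text); return true; } catch (e) { return false; }
}

const rawTab = accepts('"a\tb"');  // literal tab character in the text
const escTab = accepts('"a\\tb"'); // backslash followed by the letter t
```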

# Allen Wirfs-Brock (15 years ago)

See inline

On Jun 2, 2009, at 8:59 PM, Oliver Hunt wrote:

I'm not talking about the RFC, I'm talking about the ES5 spec. I guess it would be in the spirit of the RFC for the ES5 spec to define a JSON grammar that was more (or less) lax than the RFC, but the ES5 spec itself should not allow variation between implementations that would be considered "valid", as historically any place in ES that has undefined "valid" behaviour has proved to be a compatibility problem later on.

The intent was for the ES5 JSON grammar to exactly match the JSON RFC grammar. If you think it is different, then you may have found a bug so let's make sure...

The ES5 spec intentionally doesn't include the "A JSON parser MAY accept non-JSON forms or extensions" language from the RFC, but the general extension allowance given in section 16 is probably sufficient to allow a conforming ES5 implementation of JSON.parse to accept non-JSON forms or extensions. See more below...

Currently I can make a string containing a JSON object that will produce different output (or not produce output at all) across multiple implementations that are all "correct" -- this seems like something that is just inviting disaster.

Examples, please? The intent is that applying JSON.parse to a string containing a valid JSON form should produce an equivalent set of objects on all conforming ES5 implementations.

The json.org grammar allows the following set of characters in a string

  • Any Unicode character except ", \, or a control character
  • \", \\, \/, \b, \f, \n, \r, \t, or \u four-hex-digits

The ES5 spec is the same, only it defines "control character" as any character less than 0x20,

The JSON RFC also defines control character in this way: "All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)."

and drops escaped unicode.

No it doesn't (from the grammar in 15.12.1.1):

JSONStringCharacter ::
    JSONSourceCharacter but not double-quote " or backslash \
    \ JSONEscapeSequence

JSONEscapeSequence ::
    JSONEscapeCharacter
    UnicodeEscapeSequence   <------------

I'm inclined to believe that dropping the unicode escaping is likely a typo-esque error; the exclusion of control characters seems deliberate but has the effect of disallowing tab characters (among others).

It identically matches the RFC.

My testing seems to imply that mozilla allows all control characters in a JSON string literal including newlines, so i'd like clarification on what is actually allowed.

Step 2 of 15.12.1 (JSON.parse) seems pretty clear in this regard: 2. Parse JText using the grammars in 15.12.1. Throw a SyntaxError exception if the JText did not conform to the JSON grammar for the goal symbol JSONValue.

A string containing control characters does not conform to JSONString, so a SyntaxError should be thrown.

However, section 16 says: "all operations (...) that are allowed to throw SyntaxError are permitted to exhibit implementation-defined behaviour instead of throwing SyntaxError when they encounter an implementation-defined extension to the program syntax or regular expression pattern or flag syntax."

We can probably debate whether this extension allowance includes or should include JSON.parse. I probably could be convinced that it should not but there seems to be a strong history of tolerance of almost correct inputs by JavaScript implementations so I don't know whether or not we could get consensus on that.

# Robert Sayre (15 years ago)

On Wed, Jun 3, 2009 at 1:27 AM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:

The intent was for the ES5 JSON grammar to exactly match the JSON RFC grammar.  If you think it is different, then you may have found a bug so let's make sure...

It definitely doesn't match, on purpose. For example, the RFC requires JSON strings to represent objects (or arrays) at the root, no primitives allowed.

Examples, please? The intent is that applying JSON.parse to a string containing a valid JSON form should produce an equivalent set of objects on all conforming ES5 implementations.

JSON.parse("[010]")

should be an error, per spec. Nobody follows the spec though...

# Allen Wirfs-Brock (15 years ago)

On Jun 2, 2009, at 10:33 PM, Robert Sayre wrote:

On Wed, Jun 3, 2009 at 1:27 AM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:

The intent was for the ES5 JSON grammar to exactly match the JSON RFC grammar.  If you think it is different, then you may have found a bug so let's make sure...

It definitely doesn't match, on purpose. For example, the RFC requires JSON strings to represent objects (or arrays) at the root, no primitives allowed.

You're right, we did intentionally allow top level primitives.

Examples, please? The intent is that applying JSON.parse to a string containing a valid JSON form should produce an equivalent set of objects on all conforming ES5 implementations.

JSON.parse("[010]")

should be an error, per spec. Nobody follows the spec though...

As I read them, neither the RFC nor the current ES5 JSON grammar recognizes "[010]" as a valid JSON form, so according to the ES5 spec a syntax error should be thrown. If we really want all implementations to accept "010" as a JSONNumber then we should specify it as such. Of course we would have to define what it means (decimal, octal?).
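The JSONNumber production both grammars share is small enough to state as a regular expression (a sketch; the `0|[1-9]` alternation in the integer part is exactly what rules out leading zeros):

```javascript
// RFC 4627 / ES5 number: optional minus; integer part is either "0" or
// a digit 1-9 followed by digits; optional fraction; optional exponent.
// "010" and "09" fail because their integer part starts with 0.
const JSON_NUMBER = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?$/;

const valid = ["10", "0", "0.5", "-1e3"].every((s) => JSON_NUMBER.test(s));
const invalid = ["010", "09", ".5"].every((s) => !JSON_NUMBER.test(s));
```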

My inclination would be to require ES5 implementations to conform exactly to whatever JSON grammar we provide and to throw syntax errors if the input doesn't exactly conform to the grammar (in other words, the section 16 extension allowance doesn't apply to JSON.parse). If an implementation wants to support JSON syntax extensions it could always do so by providing a JSON.parseExtended function (or whatever they want to call it) that uses an implementation-defined grammar.

# Mark S. Miller (15 years ago)

On Tue, Jun 2, 2009 at 10:56 PM, Allen Wirfs-Brock < Allen.Wirfs-Brock at microsoft.com> wrote:

My inclination would be to require ES5 implementations to conform exactly to whatever JSON grammar we provide and to throw syntax errors if the input doesn't exactly conform to the grammar (in other words, the section 16 extension allowance doesn't apply to JSON.parse). If an implementation wants to support JSON syntax extensions it could always do so by providing a JSON.parseExtended function (or whatever they want to call it) that uses an implementation-defined grammar.

+1.

# Oliver Hunt (15 years ago)

On Jun 2, 2009, at 11:09 PM, Rob Sayre wrote:

On 6/3/09 1:56 AM, Allen Wirfs-Brock wrote:

My inclination would be to require ES5 implementations to conform
exactly to whatever JSON grammar we provide and to throw syntax
errors if the input doesn't exactly conform to the grammar (in
other words, the section 16 extension allowance doesn't apply to
JSON.parse). If an implementation wants to support JSON syntax
extensions it could always do so by providing a JSON.parseExtended
function (or whatever they want to call it) that uses an
implementation-defined grammar.

I could live with that. But since the ES5 grammar does not match the
RFC, and no one is shipping a fully conformant JSON.parse
implementation right now, we should consider whether we want to
allow or disallow each case that comes up. Mozilla has a few edge
cases that stem from RFC 4627, Section 4. [1]

1.) leading zeros are parsed as decimal numbers (octal seems like a
bug no matter what, per MarkM)

IE8, V8's JSON implementation, and json2.js at json.org all
interpret 010 as octal (e.g. 8), and 009 as 9

2.) trailing commas in objects and arrays are allowed ({"foo": 42,"bar":42,})

V8's JSON implementation also accepts [1,,,2]

3.) tabs and linebreaks are allowed in JSON strings (but
JSON.stringify produces escape sequences, per spec)

My testing shows that only '\' (excluding actual escape sequences) and
'"' are prohibited -- all other values from 0-0xFFFF are allowed.

# Brendan Eich (15 years ago)

On Jun 3, 2009, at 11:12 AM, Oliver Hunt wrote:

1.) leading zeros are parsed as decimal numbers (octal seems like a
bug no matter what, per MarkM)

IE8, V8's JSON implementation, and json2.js at json.org all
interpret 010 as octal (e.g. 8), and 009 as 9

Those look like bugs ;-).

The "noctal" behavior (0377 is 255 but 0800 is 800), in JS since 1995,
is surely the original bug, but we're stuck with it to some extent. We
shouldn't spread it to JSON.

The ES specs mostly try to ignore noctal and hope it goes away, which
can be a good strategy if there's a better mousetrap leading
developers away from the attractive nuisance. But no one intentionally
uses octal or noctal, AFAICT. Only perhaps by accident, and I know of
no real-world mistakes of this kind (but I can believe they're out
there still).

2.) trailing commas in objects and arrays are allowed ({"foo": 42,"bar":42,})

V8's JSON implementation also accepts [1,,,2]

Good for it! :-)

3.) tabs and linebreaks are allowed in JSON strings (but
JSON.stringify produces escape sequences, per spec)

My testing shows that only '\' (excluding actual escape sequences)
and '"' are prohibited -- all other values from 0-0xFFFF are allowed.

Seems like our bug.

# Oliver Hunt (15 years ago)

On Jun 3, 2009, at 11:18 AM, Rob Sayre wrote:

On 6/3/09 2:12 PM, Oliver Hunt wrote:

1.) leading zeros are parsed as decimal numbers (octal seems like
a bug no matter what, per MarkM)

IE8, V8's JSON implementation, and json2.js at json.org all
interpret 010 as octal (e.g. 8), and 009 as 9

Yes, I understand. Do you see why strict mode makes this behavior
undesirable?

I'm not saying it makes the behaviour desirable; I'm commenting on the
fact that any time implementations have been lax, eventually all
implementations become lax. I for one welcome our octal-free
overlords ;)

2.) trailing commas in objects and arrays are allowed ({"foo": 42,"bar":42,})

V8's JSON implementation also accepts [1,,,2]

What does it produce? An array with holes, or an array with null
members?

An array with holes -- insofar as I can tell, V8's JSON object
exactly matches the result of eval(string), just prohibiting arbitrary
code execution.

# Allen Wirfs-Brock (15 years ago)

See below

Oliver Hunt wrote:

On Jun 2, 2009, at 11:09 PM, Rob Sayre wrote:

On 6/3/09 1:56 AM, Allen Wirfs-Brock wrote: ...

1.) leading zeros are parsed as decimal numbers (octal seems like a bug no matter what, per MarkM)

IE8, V8's JSON implementation, and json2.js at json.org all interpret 010 as octal (e.g. 8), and 009 as 9

I'm not sure how you are testing IE8, but in my tests of IE8, JSON.parse('010') yields a syntax error (as currently specified by ES5) while JSON.parse('10') returns the number 10.

json2.js is probably producing the results you see on IE because internally it uses eval, and IE supports octal literals with the semantics you observed. Are you sure you are actually running the native JSON when you see octal being accepted? Native JSON is only enabled if your page is operating in "IE8 standards" mode.

2.) trailing commas in objects and arrays are allowed ({"foo": 42,"bar":42,})

V8's JSON implementation also accepts [1,,,2]

IE8 syntax errors on both '[1,]' and '[1,,3]', as currently specified by ES5.

3.) tabs and linebreaks are allowed in JSON strings (but JSON.stringify produces escape sequences, per spec)

My testing shows that only '\' (excluding actual escape sequences) and '"' are prohibited -- all other values from 0-0xFFFF are allowed.

IE8 allows all control characters except NUL, LF, and CR to appear unescaped in JSON string literals. This violates/extends the current ES5 spec.

As far as I can tell, this is an unintended extension in IE8 that is a result of reusing some parts of the JavaScript lexer for JSON.

# Erik Arvidsson (15 years ago)

The V8 implementation is a pretty early implementation and I would consider all of the issues raised here to be bugs in it.

V8 actually just compiles the JSON as ordinary JS:

www.google.com/codesearch/p?hl=en#W9JxUuHYyMg/trunk/src/json-delay.js&q=ParseJSONUnfiltered&l=30

I'm CCing Christian Plesner Hansen who wrote the JSON.parse method for V8 as well as v8-users.

# Douglas Crockford (15 years ago)

Allen Wirfs-Brock wrote:

JSON.parse("[010]")

should be an error, per spec. Nobody follows the spec though...

As I read them, neither the RFC nor the current ES5 JSON grammar recognizes "[010]" as a valid JSON form, so according to the ES5 spec a syntax error should be thrown. If we really want all implementations to accept "010" as a JSONNumber then we should specify it as such. Of course we would have to define what it means (decimal, octal?).

My inclination would be to require ES5 implementations to conform exactly to whatever JSON grammar we provide and to throw syntax errors if the input doesn't exactly conform to the grammar (in other words, the section 16 extension allowance doesn't apply to JSON.parse). If an implementation wants to support JSON syntax extensions it could always do so by providing a JSON.parseExtended function (or whatever they want to call it) that uses an implementation-defined grammar.

I agree. It is not helpful to developers to allow weird forms on browser A but not on browser B. What should be allowed is clearly described in the ES5 spec.

# Allen Wirfs-Brock (15 years ago)

I want to bring this discussion around to focus on concrete points that we need to make decisions on.

  1. There is a bug in the ES5 candidate spec, in that it says: JSONSourceCharacter :: SourceCharacter but not U+0000 thru U+001F

This is pretty clearly bogus, as it means that tab and newline characters cannot occur anywhere in JSON source text (not just string literals). I'll probably fix it by simply equating JSONSourceCharacter to SourceCharacter.

  2. Do we want to permit conforming implementations to extend the JSON grammar that they recognize? This probably could be done by extending the syntax error extension allowance in section 16 to include the JSON grammar. If we allow this then most of the observed variation for the current emerging implementations that we have been talking about would probably be acceptable extensions.

My inclination is to say we should disallow such open-ended extensions. As I suggested earlier, an implementation can always provide a non-standard extended parse function if it wants to support an extended grammar.

  3. If we disallow JSON grammar extensions (for JSON.parse), should we extend the existing grammar with some Postel's Law flexibility?

I could accept this for cases where we have some evidence that there are actual JSON encoders in the wild that violate/extend the JSON grammar in the identified manner.

Here are the individual cases that I know of to consider:

a) Allow strings, numbers, Booleans, and null in addition to objects and arrays as top level JSON text.

The ES5 spec already has this, although it isn't in the RFC. I haven't heard any suggestions that we remove it.
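This difference from the RFC is directly observable: with JSONValue as the goal symbol, each primitive form is a complete JSON text on its own (a sketch of what an ES5-conformant parser accepts):

```javascript
// With JSONValue (not just object/array) as the goal symbol, primitive
// forms parse at the top level, unlike under RFC 4627's JSON-text rule.
const str = JSON.parse('"hello"'); // a bare string
const num = JSON.parse("42");      // a bare number
const bool = JSON.parse("true");   // a bare boolean
const nil = JSON.parse("null");    // a bare null
```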

b) Permit leading zeros on numbers either with or without octal implications.

I'm with Brendan on this; I don't think we should let octal constants into JSON. I don't have a deep problem with leading zeroes for decimal constants, but given the historic octal interpretation within JavaScript it is probably safer to throw a syntax error than to simply ignore leading zeros.

Does anyone know of any encoders or uses that actually insert leading 0's?

c) Trailing commas in objects and arrays

Are there encoders that do this or are we just anticipating that there might be manually generated files where this is convenient?

I could go either way on this one, but would prefer some supporting evidence.

d) Holes in arrays, eg [1,,3]

I don't think we should allow it unless we know there are encoders that generate it, or that it was acceptable to legacy eval-based parsers.

e) Allow some/all control characters to appear unescaped in JSON string literals. Which ones?

Might be plausible. Crock, why did you originally forbid them? Are there known encoders that pass through such characters without escaping them?

f) Allow single quotes within JSON text as string delimiters

I'm not really suggesting we allow this, but I'm told that at least one major web site has done this.

Any other possible Postelisms? I have to say that, going through this list, I don't find many of them very compelling.

Votes??
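For reference, a parser that follows the ES5 grammar with no Postel's Law flexibility rejects every one of cases b) through f); a quick sketch exercising them (modern engines behave this way):

```javascript
// Each input below falls outside the ES5 JSON grammar, so a strictly
// conforming JSON.parse throws a SyntaxError on all of them.
const cases = [
  "[010]",        // b) leading zero on a number
  '{"a": 1,}',    // c) trailing comma in an object
  "[1,2,]",       // c) trailing comma in an array
  "[1,,3]",       // d) hole in an array
  '"a\tb"',       // e) raw control character inside a string
  "['single']",   // f) single-quoted string delimiter
];
const rejected = cases.filter((text) => {
  try { JSON.parse(text); return false; } catch (e) { return true; }
});
```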

# Douglas Crockford (15 years ago)
  2. Do we want to permit conforming implementations to extend the JSON grammar that they recognize?

No. An implementation has the license to support other formats (such as an XML object or a SuperJSON object). But the JSON object should recognize only the JSON forms described by ES5. There should be no Chapter 16 squishiness here.

  3. If we disallow JSON grammar extensions (for JSON.parse), should we extend the existing grammar with some Postel's Law flexibility?

a) Allow strings, numbers, Booleans, and null in addition to objects and arrays as top level JSON text.

Yes. This turns out to be very useful.

b) Permit leading zeros on numbers either with or without octal implications.

No. Clearly we don't want octal. Allowing octal-looking forms invites confusion.

Does anyone know of any encoders or uses that actually insert leading 0's?

I do not know of any. If they did exist, they would be in violation of the JSON rules.

c) Trailing commas in objects and arrays

This is a hazard for hand coding, just as with object literals. JSON was intended for machine-to-machine communication, so I prefer to not allow extra commas.

d) Holes in arrays, eg [1,,3]

No holes.

e) Allow some/all control characters to appear unescaped in JSON string literals. Which ones?

No. Keep it simple.

Are there known encoders that pass through such characters without escaping them?

Not that I know of. Again, would be in violation.

f) Allow single quotes within JSON text as string delimiters

No.

# Mark S. Miller (15 years ago)

The JSON RFC, by including the escape clause "A JSON parser MAY accept non-JSON forms or extensions", admits non-validating parsers. The table at code.google.com/p/json-sans-eval gives us some good terminology. The reason we need JSON to be provided by platforms rather than libraries is that we desire JSON parsers that are simultaneously fast, secure, and validating. Unfortunately, the RFC also specifies only (<object> | <array>) as a valid start symbol for parsing.

On Wed, Jun 3, 2009 at 12:59 PM, Douglas Crockford <douglas at crockford.com> wrote:

  2. Do we want to permit conforming implementations to extend the JSON grammar that they recognize?

No. An implementation has the license to support other formats (such as an XML object or a SuperJSON object). But the JSON object should recognize only the JSON forms described by ES5. There should be no Chapter 16 squishiness here.

Crock, is your position that ES5 should specify a validating JSON parse exactly equivalent to the parse specified in the RFC (i.e., waiving the escape clause), but with JSON <value> as the start symbol?

If so, then I agree.

Are there any other differences between the RFC and ES5 besides the start symbol and the RFC's escape clause? If so, can we repair all of them?

Has anyone tested the annoying \u2028 and \u2029 issue on current implementations? I'd guess there are currently bugs here as well.
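For the record on that last point: U+2028 and U+2029 are legal unescaped inside a JSON string but are LineTerminators in ECMAScript source, which is precisely why eval-based parsing trips over them. A conforming JSON.parse takes them raw (a sketch):

```javascript
// U+2028 LINE SEPARATOR is a valid unescaped JSONStringCharacter, yet a
// LineTerminator in JS source text, so eval-based parsers historically
// threw on it while a real JSON parser must accept it.
const text = '"a\u2028b"';        // JSON text with a raw U+2028 inside
const parsed = JSON.parse(text);  // the three-character string a<LS>b
```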

# Douglas Crockford (15 years ago)

Mark S. Miller wrote:

Crock, is your position that ES5 should specify a validating JSON parse exactly equivalent to the parse specified in the RFC (i.e., waiving the escape clause), but with JSON <value> as the start symbol? If so, then I agree.

Yes. Then we are in agreement.

# Robert Sayre (15 years ago)

On Wed, Jun 3, 2009 at 4:59 PM, Douglas Crockford <douglas at crockford.com> wrote:

Mark S. Miller wrote:

Crock, is your position that ES5 should specify a validating JSON parse exactly equivalent to the parse specified in the RFC (i.e., waiving the escape clause), but with JSON <value> as the start symbol? If so, then I agree.

Yes. Then we are in agreement.

OK, so, all such deviations will be considered bugs by implementations that purport to conform. Right?

# Mark S. Miller (15 years ago)

On Wed, Jun 3, 2009 at 2:10 PM, Robert Sayre <sayrer at gmail.com> wrote:

OK, so, all such deviations will be considered bugs by implementations that purport to conform. Right?

Yes.

# Oliver Hunt (15 years ago)

On Jun 3, 2009, at 2:15 PM, Mark S. Miller wrote:

On Wed, Jun 3, 2009 at 2:10 PM, Robert Sayre <sayrer at gmail.com> wrote:

OK, so, all such deviations will be considered bugs by
implementations that purport to conform. Right?

Yes.

Awesome.

# Waldemar Horwat (15 years ago)

Here are my views on this.

  2. Do we want to permit conforming implementations to extend the JSON grammar that they recognize? This probably could be done by extending the syntax error extension allowance in section 16 to include the JSON grammar. If we allow this then most of the observed variation for the current emerging implementations that we have been talking about would probably be acceptable extensions.

Yes, unless we want to have a confusing proliferation of different JSON method names as we add new data types to the language in the future.

  3. If we disallow JSON grammar extensions (for JSON.parse), should we extend the existing grammar with some Postel's Law flexibility?

No, except for things we explicitly discuss and approve here.

Here are the individual cases that I know of to consider:

a) Allow strings, numbers, Booleans, and null in addition to objects and arrays as top level JSON text.

Yes.

b) Permit leading zeros on numbers either with or without octal implications.

No.

c) Trailing commas in objects and arrays

No.

d) Holes in arrays, eg [1,,3]

No.

e) Allow some/all control characters to appear unescaped in JSON string literals. Which ones?

Don't care, as long as all of them (except line terminators) are treated alike.

f) Allow single quotes within JSON text as string delimiters

No.

Waldemar

# Douglas Crockford (15 years ago)

Waldemar Horwat wrote:

Here are my views on this.

  2. Do we want to permit conforming implementations to extend the JSON grammar that they recognize? This probably could be done by extending the syntax error extension allowance in section 16 to include the JSON grammar. If we allow this then most of the observed variation for the current emerging implementations that we have been talking about would probably be acceptable extensions.

JSON is done. JSON will not be revised. Someday it might be replaced and that replacement will have a different name and likely a different model. Chapter 16 should not give a license to fiddle with the JSON grammar.

# Waldemar Horwat (15 years ago)

Douglas Crockford wrote:

Waldemar Horwat wrote:

  2. Do we want to permit conforming implementations to extend the JSON grammar that they recognize? This probably could be done by extending the syntax error extension allowance in section 16 to include the JSON grammar. If we allow this then most of the observed variation for the current emerging implementations that we have been talking about would probably be acceptable extensions.

JSON is done. JSON will not be revised. Someday it might be replaced and that replacement will have a different name and likely a different model. Chapter 16 should not give a license to fiddle with the JSON grammar.

OK, so we need not discuss any new numeric types any further in committee because it would be impossible to round-trip them through JSON. Do we have agreement on that?

Waldemar

# Allen Wirfs-Brock (15 years ago)

Waldemar Horwat wrote:

OK, so we need not discuss any new numeric types any further in committee because it would be impossible to round-trip them through JSON. Do we have agreement on that?

I think that's reality. Languages with multiple numeric types already have to deal with encoding to/from standard JSON format. My understanding is the goal of JSON is simple data interchange rather than supporting the union of all data types available in all languages now or in the future.

# Douglas Crockford (15 years ago)

Waldemar Horwat wrote:

Douglas Crockford wrote:

Waldemar Horwat wrote:

  2. Do we want to permit conforming implementations to extend the JSON grammar that they recognize? This probably could be done by extending the syntax error extension allowance in section 16 to include the JSON grammar. If we allow this then most of the observed variation for the current emerging implementations that we have been talking about would probably be acceptable extensions.

JSON is done. JSON will not be revised. Someday it might be replaced and that replacement will have a different name and likely a different model. Chapter 16 should not give a license to fiddle with the JSON grammar.

OK, so we need not discuss any new numeric types any further in committee because it would be impossible to round-trip them through JSON. Do we have agreement on that?

Not necessarily. What we can agree on is that new numeric types cannot impose changes on the JSON syntax.

# Waldemar Horwat (15 years ago)

Douglas Crockford wrote:

Waldemar Horwat wrote:

OK, so we need not discuss any new numeric types any further in committee because it would be impossible to round-trip them through JSON. Do we have agreement on that?

Not necessarily. What we can agree on is that new numeric types cannot impose changes on the JSON syntax.

Which means that it would be impossible to round-trip them through JSON. Anything that uses the existing syntax already has a set interpretation as an existing ECMAScript object.

Waldemar
# Breton Slivka (15 years ago)

On Thu, Jun 4, 2009 at 11:25 AM, Waldemar Horwat <waldemar at google.com> wrote:

Which means that it would be impossible to round-trip them through JSON.  Anything that uses the existing syntax already has a set interpretation as an existing ECMAScript object.

Sorry to bungle into this conversation, but just out of curiosity, does there exist a specification that requires a JSON number to be interpreted as an ECMAScript Number* ? Would it be violating any spec if a future JavaScript implementation interpreted the numbers in a JSON object as some other number type? The way I see it, it is already a problem that some JSON-supporting languages cannot round-trip some of their native number types. Was type-accurate round-tripping one of the original goals of JSON? Has it become a goal?

*as in the IEEE 64-bit floating point type that javascript uses by default.
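Breton's footnote can be made concrete with a quick sketch (mine, not from the thread): once a parser maps every JSONNumber onto an IEEE-754 double, integers beyond 2^53 stop round-tripping, which is the heart of the numeric-type concern.

```javascript
// JSON's number syntax carries no type information, so a parser that maps
// every JSONNumber to an IEEE-754 double silently loses precision past 2^53.
var big = "9007199254740993"; // 2^53 + 1: exact as an integer, not as a double

var parsed = JSON.parse(big);
console.log(parsed === 9007199254740992); // true: rounds to 2^53

// Round-tripping through stringify does not recover the original digits.
console.log(JSON.stringify(parsed) === "9007199254740992"); // true
```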

# Christian Plesner Hansen (15 years ago)

(Sorry for the repeat, my first attempt bounced because I wasn't subscribed to es-discuss)

The V8 implementation is a pretty early implementation and I would consider all of the issues raised here to be bugs in it.

It is indeed an early implementation but the decision to use the source parser directly is deliberate. Because of the way json2.js works anyone currently using it would get our implementation in the future and I didn't want to change the behavior in subtle ways under people's feet.

It's unclear to me from reading this thread if any other browsers actually implement JSON correctly according to ES5. If none do, how sure are we that these changes won't break people's programs?

As for octal literals I agree that they're not a good idea. However, adding new special cases to avoid them is a cure worse than the disease. It adds to the complexity of the language in an area that is already dangerously complex.

# Mark S. Miller (15 years ago)

On Thu, Jun 4, 2009 at 1:19 AM, Christian Plesner Hansen <christian.plesner.hansen at gmail.com> wrote:

It's unclear to me from reading this thread if any other browsers actually implement JSON correctly according to ES5.

More, it seems clear from reading this thread that none do.

If none do, how sure are we that these changes won't break people's programs?

"Sure" is never an option ;). Our relative confidence comes from our knowledge of the behavior of various popular json emitters. In particular, we know of none that emit octal.

As for octal literals I agree that they're not a good idea.  However, adding new special cases to avoid them is a cure worse than the disease.  It adds to the complexity of the language in an area that is already dangerously complex.

Since octal syntax will currently be interpreted as decimal on some browsers and octal on others, even by json.js (since it delegates to eval), there is no safe way to allow this syntax. By allowing them, one's JSON.parse is no longer validating. It will not complain about data that will be interpreted differently by different browsers.

Recall that we felt we needed to add JSON support to the spec because JS library code cannot provide parsers that are simultaneously fast, safe, and validating. If all you want is safe and fast, json_sans_eval is a fine solution that works on existing browsers.
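As a sketch of what "validating" means here (assuming a `JSON.parse` that conforms to the ES5 grammar; `parsesAsJSON` is a hypothetical helper, not an API from the spec), the ambiguous octal form is rejected outright rather than given either reading:

```javascript
// A validating parser refuses input whose interpretation differs across
// engines, instead of silently picking the decimal or octal reading.
function parsesAsJSON(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(parsesAsJSON("[10]"));  // true: plain decimal
console.log(parsesAsJSON("[010]")); // false: leading zero is a syntax error
```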

# Mark S. Miller (15 years ago)

On Thu, Jun 4, 2009 at 7:42 AM, Mark S. Miller <erights at google.com> wrote:

Since octal syntax will currently be interpreted as decimal on some browsers and octal on others, even by json.js (since it delegates to eval),

Oops. Meant "json2.js".

# Allen Wirfs-Brock (15 years ago)

-----Original Message----- From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Christian Plesner Hansen

(Sorry for the repeat, my first attempt bounced because I wasn't subscribed to es-discuss)

Please join...

The V8 implementation is a pretty early implementation and I would consider all of the issues raised here to be bugs in it.

It is indeed an early implementation but the decision to use the source parser directly is deliberate. Because of the way json2.js works anyone currently using it would get our implementation in the future and I didn't want to change the behavior in subtle ways under people's feet.

It's unclear to me from reading this thread if any other browsers actually implement JSON correctly according to ES5. If none do, how sure are we that these changes won't break people's programs?

The IE8 implementation also started with json2.js as the model and evolved in response to ES5 developments prior to IE8 production release. There are a few variances between the currently shipping IE8 JSON implementation and the current ES5 spec. A couple of these are bugs in our implementation and others are the result of late churn in the E5 specification (some of which we contributed to based upon our IE8 experiences). Almost all the variances deal with API edge cases and unusual situations (for example, a Number wrapper object returned from a stringify replacer function might be rendered as an object rather than a numeric literal). Almost all of the known issues have been found by our internal testing and analysis of the algorithms and haven't shown up in user problem reports. Regardless, we intend to evolve our implementation to strictly conform to the final ES5 spec. as quickly as we can.

In releasing the IE8 JSON support, most of the problem reports we received weren't related to differences between the IE8 implementation and json2.js. We actually discovered that json2.js was not as widely used as we thought. Most of the issues we received involved encoding/decoding differences between our implementation (and json2.js) and other encoders that applications were using. We also encountered issues where people were using the json2 API names but applying them to differing, home-grown encoders/decoders.

So far, I think the implementers that have been actively engaged with ES5 development within TC39 have been doing a pretty good job of collaborating to make sure that all our JSON implementations are highly compatible with each other and with the emerging standard. Please feel free to join in.

As for octal literals I agree that they're not a good idea. However, adding new special cases to avoid them is a cure worse than the disease. It adds to the complexity of the language in an area that is already dangerously complex.

Octal literals have never been part of the JSON format, and support for them by some JSON parsers is likely a side effect of those parsers' use of eval. This sort of implementation accident is exactly the sort of thing we want to make sure doesn't get unnecessarily enshrined in either standards or future implementations. Not supporting JSON octal literals adds no complexity to the ES5 JSON spec. because they are simply not part of the format and are never mentioned. It only adds complexity to implementations if they are trying to reuse their JavaScript lexer (assuming it supports octal constants) to lex JSON text, and there are already enough other differences between the JSON and ECMAScript token sets that it isn't clear that this is a good idea. Regardless, ES5 lexers are already required to not recognize octal numbers when operating in strict mode.

# Christian Plesner Hansen (15 years ago)

"Sure" is never an option ;). Our relative confidence comes from our knowledge of the behavior of various popular json emitters. In particular, we know of none that emit octal.

It takes just one home-made emitter on a popular site somewhere for this to break down. On the other hand if there are already browsers that never interpret numbers as octal and haven't experienced any problems with it then that's pretty close to "sure".

Since octal syntax will currently be interpreted as decimal on some browsers and octal on others, even by json.js (since it delegates to eval), there is no safe way to allow this syntax. By allowing them, one's JSON.parse is no longer validating. It will not complain about data that will be interpreted differently by different browsers.

That sounds reasonable, though I'm unsure if the value of validation justifies making the language more complex. If we're already in a situation where nobody uses octal numbers in their json then validation would seem unnecessary.

Recall that we felt we needed to add JSON support to the spec because JS library code cannot provide parsers that are simultaneously fast, safe, and validating. If all you want is safe and fast, json_sans_eval is a fine solution that works on existing browsers.

I wasn't aware of the validation aspect. Also, I don't know of any truly safe and fast json implementations that work on existing browsers (no offense to json2.js).

# Mark S. Miller (15 years ago)

On Thu, Jun 4, 2009 at 9:02 AM, Christian Plesner Hansen <christian.plesner.hansen at gmail.com> wrote:

I wasn't aware of the validation aspect.  Also, I don't know of any truly safe and fast json implementations that work on existing browsers (no offense to json2.js).

See code.google.com/p/json-sans-eval -- we're happily using it in Caja.

We do not consider json2.js to be safe enough.

# Mark S. Miller (15 years ago)

On Thu, Jun 4, 2009 at 8:07 AM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:

[...] Not supporting JSON octal literals adds no complexity to the ES5 JSON spec. because they are simply not part of the format and are never mentioned. It only adds complexity to implementations if they are trying to reuse their JavaScript lexer (assuming it supports octal constants) to lex JSON text, and there are already enough other differences between the JSON and ECMAScript token sets that it isn't clear that this is a good idea. Regardless, ES5 lexers are already required to not recognize octal numbers when operating in strict mode.

No, it does add some complexity to the spec, differently from the sense in which octal is prohibited in ES5/strict. In ES5/strict code, the literal 010 must be interpreted as the decimal 10. In JSON.parse, 010 must be rejected. In order to mandate this, the standard JSON grammar (as documented both in the RFC and on json.org) does not allow any digits after a leading zero. I support this added complexity for all the reasons discussed. But it is undeniably added complexity.
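The number production Mark describes can be sketched as a regular expression (my own rendering, not spec text). The integer part is either a lone 0 or a nonzero digit followed by more digits, which is exactly what excludes 010 while still allowing 0 and 0.5:

```javascript
// JSONNumber as a regex: optional minus, then (0 | nonzero digit + digits),
// then optional fraction and exponent, with no room for a leading zero.
var JSON_NUMBER = /^-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?$/;

console.log(JSON_NUMBER.test("0"));     // true
console.log(JSON_NUMBER.test("0.5"));   // true
console.log(JSON_NUMBER.test("-12e3")); // true
console.log(JSON_NUMBER.test("010"));   // false: digits after a leading zero
console.log(JSON_NUMBER.test("1."));    // false: JSONFraction needs digits
```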

# Christian Plesner Hansen (15 years ago)

[snip]

In releasing the IE8 JSON support most of the problem reports we received weren't related to differences between the IE8 implementation and json2.js.  We actually discovered that jason2.js was not as widely used as we thought.  Most of the issues we received involved encoding/decoding differences between our implementation (and jason2.js)  and other encoders that applications were using. We also encountered issues where people were using the jason2 API names but applying them to differing, home grown APIs encoders/decoders.

Great, it sounds like there is no reason to expect any problems from conforming to the spec. I have filed the incompatibilities as a bug against our implementation.

Octal literals have never been part of the JSON format and support for them by some JSON parsers is likely a side-effect of those parsers use of eval.  This sort of implementation accident is exactly the sort of thing we want to make sure doesn't get unnecessarily enshrined in either standards or future implementations. Not supporting JSON octal literals adds no complexity to the ES5 JSON spec. because they are simply not part of the format and are never mentioned. It only adds complexity to implementations if they are trying to reuse their JavaScript lexer (assuming it supports octal constants) to lex JSON text and there are already enough other differences between the JSON and ECMAScript token set that it isn't clear that this is a good idea.  Regardless, ES5 lexers are already required to not recognize octal number when operating in strict mode.

What I mean when I say that it increases complexity is that having different interpretations of something that otherwise appears to be the same thing adds to the cognitive overhead of using the language. With 'eval' and 'JSON.parse' you have two functions that behave deceivingly similarly and take similar input. Big obvious differences between the two are easy to deal with, but making subtle differences between them invites confusion, in particular in this case because there are already several different interpretations of number literals in different contexts.

The original post and this current discussion illustrates my point: subtle differences invite confusion.

# Allen Wirfs-Brock (15 years ago)

Conforming consensus...

I want to make sure that I understand the consensus of last week's discussion on this thread so I can update the spec. accordingly. Below are the decision points that I sent out last week. I've annotated them with what I believe was the consensus of the discussion. Let me know if anybody disagrees that these are actually the consensus conclusions.

There are two new items at the end of the list that came up in the thread. It's not clear to me whether there was consensus on the final item.

Are there any other points that need to be captured?

Allen

-----Original Message----- From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Allen Wirfs-Brock Sent: Wednesday, June 03, 2009 12:43 PM To: Rob Sayre; Oliver Hunt Cc: Mark S.Miller; es-discuss at mozilla.org; Douglas Crockford; Robert Sayre Subject: RE: JSON parser grammar

I want to bring this discussion around to focus on concrete points that we need to make decisions on.

  1. There is a bug in the ES5 candidate spec. in that it says that: JSONSourceCharacter :: SourceCharacter but not U+0000 thru U+001F

This is pretty clearly bogus, as it means that tab and newline characters cannot occur anywhere in JSON source text (not just string literals). I'll probably fix it by simply equating JSONSourceCharacter to SourceCharacter.

No apparent disagreement. This is just a simple bug fix.
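With that fix, the behavior one would expect from a conforming parser looks like this sketch: tabs and newlines are legal between tokens (they are JSONWhiteSpace); only their unescaped use inside string literals is ruled out.

```javascript
// Pretty-printed JSON text: the tabs and newlines here sit between tokens,
// so a conforming parser must accept them.
var text = "[\n\t1,\n\t2\n]";
var value = JSON.parse(text);
console.log(value.length === 2 && value[0] === 1 && value[1] === 2); // true
```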

  2. Do we want to permit conforming implementations to extend the JSON grammar that they recognize? This probably could be done by extending the syntax error extension allowance in section 16 to include the JSON grammar. If we allow this then most of the observed variation for the current emerging implementation that we have been talking about would probably be acceptable extensions.

My inclination is to say we should disallow such open-ended extensions. As I suggested earlier, an implementation can always provide a non-standard extended parse function if it wants to support an extended grammar.

We will require strict conformance with no extensions from the ES5 specification.

  3. If we disallow JSON grammar extensions (for JSON.parse) should we extend the existing grammar with some Postel's Law flexibility?

I could accept this for cases where we have some evidence that there are actual JSON encoders in the wild that violate/extend the JSON grammar in the identified manner.

Here are the individual cases that I know of to consider:

a) Allow strings, numbers, Booleans, and null in addition to objects and arrays as top level JSON text.

Yes, we keep this extension in the specification.

The ES5 spec. already has this although it isn't in the RFC. I haven't heard any suggestions that we remove it.

b) Permit leading zeros on numbers either with or without octal implications.

I'm with Brendan on this; I don't think we should let octal constants into JSON. I don't have a deep problem with leading zeroes for decimal constants, but given the historic octal interpretation within JavaScript it is probably safer to raise a syntax error than to simply ignore leading zeros.

Does anyone know of any encoders or uses that actually insert leading 0's?

No: no octal, and no interpretation of a leading 0 as decimal.

c) Trailing commas in objects and arrays

Are there encoders that do this or are we just anticipating that there might be manually generated files where this is convenient?

I could go either way on this one but would prefer some supporting evidence

no

d) Holes in arrays, eg [1,,3]

I don't think we should allow it unless we know there are encoders that generate it and that it was acceptable to legacy eval-based parsers.

no

e) Allow some/all control characters to appear unescaped in JSON string literals. Which ones?

Might be plausible. Crock, why did you originally forbid them? Are there known encoders that pass through such characters without escaping them?

no

f) Allow single quotes within JSON text as string delimiters

I'm not really suggesting we allow this, but I'm told that at least one major web site has done this.

No

Any other possible Postelisms? I have to say, that going through this list I don't find many of them very compelling.

Allow JSON.parse to recognize unquoted property names.

No

In addition to code units in the range 0x0000-0x001f, JSON.stringify inserts escape sequences into string literals for some or all of the following code units: 0x007f-0x009f, 0x00ad, 0x0600-0x0604, 0x070f, 0x17bf, 0x17b5, 0x200c-0x200f, 0x2028-0x202f, 0x2060-0x206f, 0xfeff, 0xfff0-0xffff

Was there any consensus that at least some of these code points should be escaped? If so, which, if not all?

# Mark Miller (15 years ago)

On Mon, Jun 8, 2009 at 11:19 AM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:

Conforming consensus...

I want to make sure that I understand the consensus of last week's discussion on this thread so I can update the spec. accordingly.  Below is the decision points that I sent out last week.  I've annotated it with what I believe was the consensus of the discussion.  Let me know if anybody disagrees that this is actually the consensus conclusions. [...]

I agree with all the decisions above.

In addition to code units in the range 0x0000-0x001f, JSON.stringify inserts escape sequences into string literals for some or all of the following code units: 0x007f-0x009f, 0x00ad, 0x0600-0x0604, 0x070f, 0x17bf, 0x17b5, 0x200c-0x200f, 0x2028-0x202f, 0x2060-0x206f, 0xfeff, 0xfff0-0xffff

Was there any consensus that at least some of these code points should be escaped? If so, which, if not all?

I think it is important that 0x2028 and 0x2029 be escaped. I have no opinion about the others.

# Douglas Crockford (15 years ago)

In addition to code units in the range 0x0000-0x001f, JSON.stringify inserts escape sequences into string literals for some or all of the following code units: 0x007f-0x009f, 0x00ad, 0x0600-0x0604, 0x070f, 0x17bf, 0x17b5, 0x200c-0x200f, 0x2028-0x202f, 0x2060-0x206f, 0xfeff, 0xfff0-0xffff

There is no harm in doing this, and it will improve interoperability with ES3.
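One hedged way to get this interoperability regardless of what a given stringify implementation emits is a post-pass over its output (`toSafeJSON` is a hypothetical name of mine, not an API from the spec): U+2028 and U+2029 are line terminators in ES3 source, so escaping them makes the output safe to embed in a script or eval context without changing the parsed value.

```javascript
// Escape U+2028/U+2029 after stringification; both are legal raw inside JSON
// strings but are line terminators in pre-ES2019 ECMAScript source.
function toSafeJSON(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

var s = toSafeJSON("a\u2028b");
console.log(s === '"a\\u2028b"');          // true: escaped form
console.log(JSON.parse(s) === "a\u2028b"); // true: value unchanged
```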

# Hallvord R. M. Steen (15 years ago)

On Wed, 03 Jun 2009 21:42:46 +0200, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:

a) Allow strings, numbers, Booleans, and null in addition to objects and arrays as top level JSON text. The ES5 spec. already has this although it isn't in the RFC. I haven't heard any suggestions that we remove it.

This may be a stupid question, but..

How can you allow "strings" as top level JSON text and keep the 15.12.2 step 2 requirement that JSON.parse() must throw if the input isn't valid JSON source text? In other words, how is an implementation supposed to know if I'm passing in a string of random content that should be "parsed" into a string or a malformed piece of JSON that should cause an exception?

# Oliver Hunt (15 years ago)

On Jul 3, 2009, at 3:41 PM, Hallvord R. M. Steen wrote:

On Wed, 03 Jun 2009 21:42:46 +0200, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:

a) Allow strings, numbers, Booleans, and null in addition to objects and arrays as top level JSON text. The ES5 spec. already has this although it isn't in the RFC. I haven't heard any suggestions that we remove it.

This may be a stupid question, but..

How can you allow "strings" as top level JSON text and keep the 15.12.2 step 2 requirement that JSON.parse() must throw if the input isn't valid JSON source text? In other words, how is an implementation supposed to know if I'm passing in a string of random content that should be "parsed" into a string or a malformed piece of JSON that should cause an exception?

A piece of text is either a string literal or it is not -- I suspect you're confusing JSON.parse("foo"), where you are passing a string containing the characters f, o, and o, with JSON.parse("\"foo\""), in which the string contains the characters ", f, o, o, and " -- i.e. a string literal.
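Oliver's distinction, written out as code (behavior assumes a conforming ES5+ implementation): the argument to JSON.parse must itself be JSON text, so a top-level string needs its own quotes inside the ECMAScript string literal.

```javascript
// A JSON string literal at the top level parses fine...
console.log(JSON.parse('"foo"') === "foo"); // true

// ...but the bare characters f, o, o are not JSON text at all.
var threw = false;
try {
  JSON.parse("foo");
} catch (e) {
  threw = true;
}
console.log(threw); // true
```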

# Hallvord R. M. Steen (15 years ago)

a) Allow strings, numbers, Booleans, and null in addition to objects and arrays as top level JSON text. The ES5 spec. already has this although it isn't in the RFC. I haven't heard any suggestions that we remove it.

How can you allow "strings" as top level JSON text?

A piece of text is either a string literal or it is not -- I suspect you're confusing JSON.parse("foo"), where you are passing a string containing the characters f, o, and o, with JSON.parse("\"foo\""), in which the string contains the characters ", f, o, o, and " -- i.e. a string literal.

Indeed I was, particularly since IE8's implementation doesn't seem to understand this string-inside-string feature yet so when I tried this earlier I remained confused :-p. Thanks for clarifying.

Another question: The JSON grammar says

JSONNumber :: -(opt) DecimalIntegerLiteral JSONFraction(opt) ExponentPart(opt)

JSONFraction :: . DecimalDigits

This apparently makes numbers like "1." illegal? Should this really throw:

JSON.parse('[1.]') ?

And what about JSON.parse('[1.e10]') ?

Both are of course allowed in normal JavaScript source text.

# Douglas Crockford (15 years ago)

Hallvord R. M. Steen wrote:

Another question: The JSON grammar says

JSONNumber :: -(opt) DecimalIntegerLiteral JSONFraction(opt) ExponentPart(opt)

JSONFraction :: . DecimalDigits

This apparently makes numbers like "1." illegal? Should this really throw:

JSON.parse('[1.]') ?

And what about JSON.parse('[1.e10]') ?

Both are of course allowed in normal JavaScript source text.

There are lots of things accepted by JavaScript that are not part of JSON.
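A short illustration of that gap (assuming a conforming parser; `parsesAsJSON` is a hypothetical helper of mine): forms that ECMAScript's own number lexer accepts remain JSON syntax errors.

```javascript
function parsesAsJSON(text) {
  try { JSON.parse(text); return true; } catch (e) { return false; }
}

console.log(parsesAsJSON("[1.0]"));     // true
console.log(parsesAsJSON("[1.]"));      // false: JSONFraction requires digits
console.log(parsesAsJSON("[1.e10]"));   // false, for the same reason
console.log(eval("[1.]").length === 1); // true: eval happily accepts it
```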

# Allen Wirfs-Brock (15 years ago)

IE8 does correctly process JSON.parse("\"foo\""); I just tried it and it worked fine. There are a few discrepancies between the IE8 JSON implementation and the current ES5 draft. See blogs.msdn.com/jscript/archive/2009/06/23/native-json-support-in-ie8-and-tracking-the-ecmascript-fifth-edition-draft-specification.aspx for details.

# Garrett Smith (14 years ago)

On 6/3/09, Douglas Crockford <douglas at crockford.com> wrote:

Allen Wirfs-Brock wrote:

JSON.parse("[010]")

should be an error, per spec. Nobody follows the spec though...

As I read them, neither the RFC nor the current ES5 JSON grammar recognizes "[010]" as a valid JSON form, so according to the ES5 spec. a syntax error should be thrown. If we really want all implementations to accept "010" as a JSONNumber then we should specify it as such. Of course we have to define what it means (decimal, octal??).

My inclination would be to require ES5 implementations to exactly conform to whatever JSON grammar we provide and to throw syntax errors if the input doesn't exactly conform to the grammar (in other words, to say that the section 16 extension allowance doesn't apply to JSON.parse). If an implementation wants to support JSON syntax extensions it could always do so by providing a JSON.parseExtended function (or whatever they want to call it) that uses an implementation-defined grammar.

I agree. It is not helpful to developers to allow weird forms on browser A but not on browser B. What should be allowed is clearly described in the E5 spec.

That's right.

A wrapper for JSON should be consistent, and if native JSON support isn't, then what does the wrapper do: follow the ES5 spec or follow what the browsers do? What if no browsers follow the spec? Does that mean we can't use native JSON at all?

The wrapper can use capability checks to determine if native JSON is buggy and if it is, use a fallback. I've provided a sample capability test below, for those who don't know what I mean.

Most of the questions on the grammar were answered in this thread; however, the question of U+0009 as a JSONStringCharacter remains. All major browsers allow U+0009 in a JSONString. What should the capability test check? If all major browsers parse ' "\t" ' without error, resulting in a string with the character U+0009, then the feature test can detect that deviation and use the fallback.

JSON.parse accepting U+0009 in strings is now part of the public API and major libraries rely on it. Is it going to be codified?

To summarize, the pressing questions are:

  1. Is the spec going to change to allow U+0009? And if it isn't, why not? and
  2. What should the fallback for JSON.parse use? Should it: (a) go by the letter of the spec and perform a capability test to expect that an error is thrown for JSON.parse(' "\t" '), or (b) go with what seems to be a de facto standard and allow U+0009 as JSONStringCharacter?

There is a need to answer these questions so that code can be written using JSON. It's not just an academic exercise here. I would like to provide an advisable solution as an FAQ entry in the comp.lang.javascript FAQ.

Example of a feature test:

    // Incomplete and untested.
    var IS_JSON_PARSE_SUPPORTED = function() {
        var IS_JSON_PARSE_SUPPORTED = typeof JSON == "object" &&
            typeof JSON.parse == "function" &&
            isJSONParserCompliant();

        function isJSONParserCompliant() {
            // TODO: add more checks for known failings
            // in major implementations.
            return canParseNumbers();
        }

        function canParseNumbers() {
            try {
                JSON.parse("010");
                return false; // error expected.
            } catch (ex) {
            }
            return true;
        }

        return IS_JSON_PARSE_SUPPORTED;
    }();

Garrett

# Douglas Crockford (14 years ago)

On 6/22/2010 5:28 PM, Garrett Smith wrote:

Most of the questions on the grammar were answered in this thread; however, the question of U+0009 as a JSONStringCharacter remains. All major browsers allow U+0009 in a JSONString. What should the capability test check? If all major browsers parse ' "\t" ' without error, resulting in a string with the character U+0009, then the feature test can detect that deviation and use the fallback.

JSON.parse accepting U+0009 in strings is now part of the public API and major libraries rely on it. Is it going to be codified?

To summarize, the pressing questions are:

  1. Is the spec going to change to allow U+0009? And if it isn't, why not? and
  2. What should the fallback for JSON.parse use? Should it: (a) go by the letter of the spec and perform a capability test to expect that an error is thrown for JSON.parse(' "\t" '), or (b) go with what seems to be a de facto standard and allow U+0009 as JSONStringCharacter?

This has already been asked and answered. We are going with a strict interpretation of the JSON standard.
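Under that strict interpretation, the observable behavior is the following (sketch assumes a conforming implementation; `parsesAsJSON` is a hypothetical helper): the two-character escape \t is the only way to spell a tab inside a JSON string, and a raw U+0009 there is a syntax error.

```javascript
function parsesAsJSON(text) {
  try { JSON.parse(text); return true; } catch (e) { return false; }
}

console.log(JSON.parse('"\\t"') === "\t"); // true: escaped tab is fine
console.log(parsesAsJSON('"\t"'));         // false: raw control character
```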

# Oliver Hunt (14 years ago)

On Jun 22, 2010, at 6:06 PM, Douglas Crockford wrote:

On 6/22/2010 5:28 PM, Garrett Smith wrote:

Most of the questions on the grammar were answered in this thread; however, the question of U+0009 as a JSONStringCharacter remains. All major browsers allow U+0009 in a JSONString. What should the capability test check? If all major browsers parse ' "\t" ' without error, resulting in a string with the character U+0009, then the feature test can detect that deviation and use the fallback.

JSON.parse accepting U+0009 in strings is now part of the public API and major libraries rely on it. Is it going to be codified?

To summarize, the pressing questions are:

  1. Is the spec going to change to allow U+0009? And if it isn't, why not? and
  2. What should the fallback for JSON.parse use? Should it: (a) go by the letter of the spec and perform a capability test to expect that an error is thrown for JSON.parse(' "\t" '), or (b) go with what seems to be a de facto standard and allow U+0009 as JSONStringCharacter?

This has already been asked and answered. We are going with a strict interpretation of the JSON standard.

I believe the implication is that the strict JSON spec should be changed. Currently all implementations of "JSON" are incorrect in this respect, which to me implies that the spec is wrong.

# Dean Landolt (14 years ago)

On Tue, Jun 22, 2010 at 9:34 PM, Oliver Hunt <oliver at apple.com> wrote:

On Jun 22, 2010, at 6:06 PM, Douglas Crockford wrote:

On 6/22/2010 5:28 PM, Garrett Smith wrote:

Most of the questions on the grammar were answered in this thread; however, the question of U+0009 as a JSONStringCharacter remains. All major browsers allow U+0009 in a JSONString. What should the capability test check? If all major browsers parse ' "\t" ' without error, resulting in a string with the character U+0009, then the feature test can detect that deviation and use the fallback.

JSON.parse accepting U+0009 in strings is now part of the public API and major libraries rely on it. Is it going to be codified?

To summarize, the pressing questions are:

  1. Is the spec going to change to allow U+0009? And if it isn't, why not? and
  2. What should the fallback for JSON.parse use? Should it: (a) go by the letter of the spec and perform a capability test to expect that an error is thrown for JSON.parse(' "\t" '), or (b) go with what seems to be a de facto standard and allow U+0009 as JSONStringCharacter?

This has already been asked and answered. We are going with a strict interpretation of the JSON standard.

I believe the implication is that the strict JSON spec should be changed. Currently all implementations of "JSON" are incorrect in this respect, which to me implies that the spec is wrong.

But that's the rub -- the JSON spec cannot be changed. It (intentionally) has no version number. ECMA could superset it -- ES-JSON, if you will -- which could specifically allow \t, but this could not strictly be considered JSON, and would break in many JSON parsers in the wild.

Perhaps there's value in ECMA taking on such a task (they're in a unique position to get real traction behind a superset of JSON, and we all have a wishlist of JSON extensions). But it certainly wouldn't be JSON.

# Oliver Hunt (14 years ago)

On Jun 22, 2010, at 7:07 PM, Dean Landolt wrote:

On Tue, Jun 22, 2010 at 9:34 PM, Oliver Hunt <oliver at apple.com> wrote:

But that's the rub -- the JSON spec cannot be changed. It (intentionally) has no version number. ECMA could superset it -- ES-JSON, if you will -- which could specifically allow \t, but this could not strictly be considered JSON, and would break in many JSON parsers in the wild.

Perhaps there's value in ECMA taking on such a task (they're in a unique position to get real traction behind a superset of JSON, and we all have a wishlist of JSON extensions). But it certainly wouldn't be JSON.

I just looked through a few of the json parsers listed on json.org, and of the sample I looked at all accept tab characters. Which parsers don't?

As far as I can tell, all the major browsers accept tabs, as do many other JSON parsers; at a brief inspection it seems that the de facto (vs. actual) JSON spec allows tabs.

# Dean Landolt (14 years ago)

On Tue, Jun 22, 2010 at 10:20 PM, Oliver Hunt <oliver at apple.com> wrote:

On Jun 22, 2010, at 7:07 PM, Dean Landolt wrote:

On Tue, Jun 22, 2010 at 9:34 PM, Oliver Hunt <oliver at apple.com> wrote:

But that's the rub -- the JSON spec cannot be changed. It (intentionally) has no version number. ECMA could superset it -- ES-JSON, if you will -- which could specifically allow \t, but this could not strictly be considered JSON, and would break in many JSON parsers in the wild.

Perhaps there's value in ECMA taking on such a task (they're in a unique position to get real traction behind a superset of JSON, and we all have a wishlist of JSON extensions). But it certainly wouldn't be JSON.

I just looked through a few of the json parsers listed on json.org, and of the sample I looked at all accept tab characters. Which parsers don't?

As far as I can tell, all the major browsers accept tabs, as do many other JSON parsers; at a brief inspection it seems that the de facto (vs. actual) JSON spec allows tabs.

There are countless JSON parsers in the wild -- likely > 1 for almost every obscure language in existence, not counting all the one-offs. Any number of these were written without expecting control characters -- not too long ago I wrote a .NET streaming parser that I'm fairly sure would blow up at the first sight of a \t.

I know many of us in the ES community tend to prefer a Postel's Law approach -- and as long as tabs are always properly stringified it's not a huge interop problem. Still, an argument could be made that with browsers accepting known-bad input (per the JSON spec) it could encourage fragmentation (albeit minor) of the one format that's really delivered on the promise of true interoperability.

# Luke Smith (14 years ago)

On Jun 22, 2010, at 7:20 PM, Oliver Hunt wrote:

On Jun 22, 2010, at 7:07 PM, Dean Landolt wrote:

On Tue, Jun 22, 2010 at 9:34 PM, Oliver Hunt <oliver at apple.com>
wrote:

But that's the rub -- the JSON spec cannot be changed. It
(intentionally) has no version number. ECMA could superset it -- ES- JSON, if you will -- which could specifically allow \t, but this
could not strictly be considered JSON, and would break in many JSON
parsers in the wild.

Perhaps there's value in ECMA taking on such a task (they're in a
unique position to get real traction behind a superset of JSON, and
we all have a wishlist of JSON extensions). But it certainly
wouldn't be JSON.

I just looked through a few of the json parsers listed on json.org,
and of the sample I looked at all accept tab characters. Which
parsers don't?

Browser native implementations are a recent development and are
subject to ES5, not just RFC 4627, so comparing to non-browser libs is
a bit of an apples and oranges comparison (or pears and asian pears
perhaps).

As far as I can tell, all the major browsers accept tabs, as do many other JSON parsers; at a brief inspection it seems that the de facto (vs. actual) JSON spec allows tabs.

The language in the ES5 spec (vs 4627) is pretty clear with respect to conformance, and there are plenty of other violations in every browser implementation that still need to be ironed out, so labeling the status quo as de facto seems premature. It seems to me that each vendor has holes to patch in its implementation, and it just so happens they all have this one.

And regarding js library abstractions, I disagree that there is enough
stability in the JSON implementations to state that major libraries
rely on the parsing of tabs. I think "tolerate" is closer to the
truth. I certainly do not rely on native implementations continuing
not to conform to a spec that, relatively speaking, is fresh out of
the gate. I would rather see the implementations get cleaned up and
follow spec than agree this early to disregard it.

L

# Oliver Hunt (14 years ago)

On Jun 22, 2010, at 8:17 PM, Dean Landolt wrote:

There are countless JSON parsers in the wild -- likely > 1 for almost every obscure language in existence, not counting all the one-offs. Any number of these were written without expecting control characters -- not too long ago I wrote a .NET streaming parser that I'm fairly sure would blow up at the first sight of a \t.

I am not suggesting that browsers should not escape tabs; the issue is whether a conforming JSON implementation is allowed to parse a string that includes tab characters.

# Dean Landolt (14 years ago)

On Tue, Jun 22, 2010 at 11:27 PM, Oliver Hunt <oliver at apple.com> wrote:

On Jun 22, 2010, at 8:17 PM, Dean Landolt wrote:

There are countless JSON parsers in the wild -- likely > 1 for almost every obscure language in existence, not counting all the one-offs. Any number of these were written without expecting control characters -- not too long ago I wrote a .NET streaming parser that I'm fairly sure would blow up at the first sight of a \t.

I am not suggesting that browsers should not escape tabs; the issue is whether a conforming JSON implementation is allowed to parse a string that includes tab characters.

Yes, I know: see my next paragraph.

# Garrett Smith (14 years ago)

On 6/22/10, Luke Smith <lsmith at lucassmith.name> wrote:

On Jun 22, 2010, at 7:20 PM, Oliver Hunt wrote:

On Jun 22, 2010, at 7:07 PM, Dean Landolt wrote:

On Tue, Jun 22, 2010 at 9:34 PM, Oliver Hunt <oliver at apple.com> wrote:

[...]

As far as I can tell, all the major browsers accept tabs, as do many other JSON parsers; at a brief inspection it seems that the de facto (vs. actual) JSON spec allows tabs.

Most but not all. Look: IE9b and BESEN (though not a browser) both throw SyntaxError on U+0009 in JSONString:

alert( JSON.parse(' "\t" ') );

(error message in IE9 says: "Invalid Character".)

And regarding js library abstractions, I disagree that there is enough stability in the JSON implementations to state that major libraries rely on the parsing of tabs. I think "tolerate" is closer to the truth. I certainly do not rely on native implementations continuing not to conform to a spec that, relatively speaking, is fresh out of the gate. I would rather see the implementations get cleaned up and follow spec than agree this early to disregard it.

Prototype and jQuery don't check to make sure the string doesn't contain \0-\x1f. json2.js doesn't and I would not be surprised if other libraries didn't either. Basically, these scripts do minimal feature tests on global JSON (json2.js merely checks for existence). If global JSON exists, it is used, and if it does not exist, then a fallback is used. The result can be guaranteed to be widely inconsistent depending on the browser, version, and in IE, even document mode.

Any application that uses any library that uses unfiltered JSON.parse or json2.js may be expecting TAB to continue to work so changing that will probably break things for some environments. I can see why implementations might want to allow TAB, and most major browsers do, despite the fact that such extensions are explicitly forbidden.

I mean, I can see why Opera would allow \t in JSONString because otherwise, people would be complaining "it doesn't work in Opera."

Having varied behavior depending on the environment and the input supplied to JSON.parse is no good and having a de facto standard that contradicts the spec is no good.
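One way for a library to get consistent behavior regardless of how lenient the host's JSON.parse is would be to pre-validate string tokens itself. A hypothetical sketch (`strictJSONParse` is an illustrative name, not a real API):

```javascript
// Reject raw control characters inside JSON string tokens before
// delegating to the native parser. Tabs *between* tokens are legal
// JSONWhiteSpace, so the check must look only inside string literals.
function strictJSONParse(text, reviver) {
  var stringToken = /"(?:\\.|[^"\\])*"/g; // matches JSON string tokens
  var match;
  while ((match = stringToken.exec(text)) !== null) {
    if (/[\u0000-\u001f]/.test(match[0])) {
      throw new SyntaxError('raw control character in JSON string');
    }
  }
  return JSON.parse(text, reviver);
}
```

With this wrapper, JSON.parse(' "\t" ')-style input fails everywhere, while properly escaped "\t" sequences and tabs used as inter-token whitespace still parse.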

| 15.12 The JSON Object
|
| [...]
|
| Conforming implementations of JSON.parse and JSON.stringify must support
| the exact interchange format described in this specification without any deletions
| or extensions to the format. This differs from RFC 4627 which permits a
| JSON parser to accept non-JSON forms and extensions.

If all major implementations are going to start disallowing U+0009 in JSONString -- and according to what Doug wrote earlier today they are -- then the spec can stay as-is. Otherwise, if major implementations want to allow \t in JSONString, then they should argue for that case here and argue that the spec be changed and if they lose that argument, then they must not continue to violate the spec.

Garrett

# Jorge (14 years ago)

On 23/06/2010, at 06:09, Garrett Smith wrote:

(...) If all major implementations are going to start disallowing U+0009 in JSONString -- and according to what Doug wrote earlier today they are -- then the spec can stay as-is. Otherwise, if major implementations want to allow \t in JSONString, then they should argue for that case here and argue that the spec be changed and if they lose that argument, then they must not continue to violate the spec.

A good reason for not accepting unescaped tab chars (nor any other control chars) inside JSONStrings is that control chars often can't be transmitted (nor even copy-pasted) safely. The web and http is not all there's to JSON.

If anybody, in any library, is depending on not escaping (any of) them properly, they ought to stop doing so.

IOW, IMO, this is ok:

[].forEach.call(a = '{"c":"\t"}', function (c) { console.log([c, c.charCodeAt(0)]) })

["{", 123]
["\"", 34]
["c", 99]
["\"", 34]
[":", 58]
["\"", 34]
["\\", 92]
["t", 116]
["\"", 34]
["}", 125]

JSON.parse(a).c.charCodeAt(0) --> 9

But this isn't: it's a bug (b is not a valid JSON text):

[].forEach.call(b = '{"c":"\t"}', function (c) { console.log([c, c.charCodeAt(0)]) })

["{", 123]
["\"", 34]
["c", 99]
["\"", 34]
[":", 58]
["\"", 34]
[" ", 9]
["\"", 34]
["}", 125]

JSON.parse(b).c.charCodeAt(0) --> 9

(Note how the tab in the wrong one is being displayed as [" ", 9] a bunch of spaces, it's been converted to that -automatically, not by me- somewhere along the copy/pasting/editing of this email...)

OTOH, Crockford/this committee might want for some reason to choose to "be conservative in what you send, liberal in what you accept". Or not.
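On the serializer side Jorge describes, JSON.stringify is unambiguous: a tab is always emitted as the two-character escape \t, so conforming output never contains a raw U+0009 inside a string. A quick check:

```javascript
var json = JSON.stringify({ c: '\t' });
// json is the 10-character text {"c":"\t"} with a literal backslash
// before the t, not a raw tab character
console.log(json.length);                      // 10
console.log(json.charCodeAt(6));               // 92 (backslash), not 9
console.log(JSON.parse(json).c.charCodeAt(0)); // 9: round-trips to a tab
```

So anyone depending on raw tabs in transmitted JSON text is depending on a non-conforming serializer, not on JSON.stringify.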

# Mark S. Miller (14 years ago)

On Tue, Jun 22, 2010 at 8:27 PM, Oliver Hunt <oliver at apple.com> wrote:

On Jun 22, 2010, at 8:17 PM, Dean Landolt wrote:

There are countless JSON parsers in the wild -- likely > 1 for almost every obscure language in existence, not counting all the one-offs. Any number of these were written without expecting control characters -- not too long ago I wrote a .NET streaming parser that I'm fairly sure would blow up at the first sight of a \t.

I am not suggesting that browsers should not escape tabs; the issue is whether a conforming JSON implementation is allowed to parse a string that includes tab characters.

AFAIK, none of the other JSON implementations at json.org claim to be a validating parser. If what you want is a valid but non-validating JSON implementation, code.google.com/p/json-sans-eval is fine as a JS library. It is fast and safe. Much of the point of having ES5 include a JSON implementation is that no one's been able to write in JS a JSON parser that's fast, safe, and validating.

# Allen Wirfs-Brock (14 years ago)


I am not suggesting that browsers should not escape tabs; the issue is whether a conforming JSON implementation is allowed to parse a string that includes tab characters.

This issue was explicitly discussed and decided within TC39 while drafting and reviewing the ES5 specification. The final decision is reflected in the final normative paragraph of 15.12.2 (JSON.parse):

"It is not permitted for a conforming implementation of JSON.parse to extend the JSON grammars. If an implementation wishes to support a modified or extended JSON interchange format it must do so by defining a different parse function."

Given the general tolerance of implementation-specific extensions within the ECMAScript specification, this was not just a casual decision. The decision was made with the knowledge that many existing JSON parsers supported an extended grammar. To the extent that the spec differs from any implementations that predate it, that is intentional, and the expectation is that those implementations would need to change to be in conformance.

Nothing in the spec says that a conforming JSON implementation is not allowed to parse a string that includes (unescaped) tab characters. It just says that the function that does so may not be named JSON.parse. If you want to provide in your conforming implementation support for parsing an extended form of the JSON grammar, feel free. Just call it JSON.parsePostel or anything else other than JSON.parse.
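As a sketch of what Allen describes -- an extended parser under a different name -- a hypothetical `parsePostel` could rewrite raw tabs inside string tokens to the \t escape and then delegate to the strict native parser (the name and approach are illustrative, not part of any spec):

```javascript
// Accept raw tabs inside JSON strings by rewriting them to the \t
// escape before handing the text to the (strict) native JSON.parse.
// Only string tokens are rewritten; tabs between tokens are already
// legal JSON whitespace and need no fixing.
function parsePostel(text, reviver) {
  var fixed = text.replace(/"(?:\\.|[^"\\])*"/g, function (str) {
    return str.replace(/\t/g, '\\t');
  });
  return JSON.parse(fixed, reviver);
}
```

This keeps JSON.parse itself conforming while giving lenient callers an explicit opt-in.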

# Mark S. Miller (14 years ago)

On Tue, Jun 22, 2010 at 8:17 PM, Dean Landolt <dean at deanlandolt.com> wrote:

I know many of us in the ES community tend to prefer a Postel's Law approach -- and as long as tabs are always properly stringified it's not a huge interop problem. Still, an argument could be made that with browsers accepting known-bad input (per the JSON spec) it could encourage fragmentation (albeit minor) of the one format that's really delivered on the promise of true interoperability.

Yes. On the web, as the sorry history of browsers shows too clearly, Eich's law may be the more relevant one:

# Brendan Eich (14 years ago)

On Jun 23, 2010, at 8:17 AM, Mark S. Miller wrote:

On Tue, Jun 22, 2010 at 8:17 PM, Dean Landolt <dean at deanlandolt.com> wrote:

I know many of us in the ES community tend to prefer a Postel's Law approach -- and as long as tabs are always properly stringified it's not a huge interop problem. Still, an argument could be made that with browsers accepting known-bad input (per the JSON spec) it could encourage fragmentation (albeit minor) of the one format that's really delivered on the promise of true interoperability.

Yes. On the web, as the sorry history of browsers shows too clearly, Eich's law may be the more relevant one:

From Dave Herman's blog at calculist.blogspot.com/2010/02/eichs-law.html:

Found this gem in a C++ comment while digging in the SpiderMonkey codebase:

After much testing, it's clear that Postel's advice to protocol designers ("be liberal in what you accept, and conservative in what you send") invites a natural-law repercussion for JS as "protocol": "If you are liberal in what you accept, others will utterly fail to be conservative in what they send."

The comment is unsigned, but it sounds like Brendan.

I still think the Robustness Principle in full [1] is worth upholding. Validation, especially in developer facing modes or tools (error consoles, debuggers), is important.

But tardy validation that annoys users by trying in vain to repeal a de-facto standard is worse than pointless. You don't get a second chance (dherman's blog post cites a comment I wrote about how <!-- comments, treated by JS in browsers as single-line comments, leak from inline script content into out-of-line .js files). And browser vendors will not risk losing market share on quixotic quests.

Nevertheless, if we can ban TAB per the ES5 JSON spec, let's do it. SpiderMonkey currently allows it as noted, but someone will file a bug (if it's not already on file).

I don't know whether TABs are an issue in the field that could make browsers playing the Prisoner's Dilemma defect from that spec. It is possible, since as Ollie notes so many extant JSON implementations allow them. OTOH TABs are rare in my experience. Worth discussing and (what would be much more useful) field testing. IE9 is ahead on that front.

But I hate tab characters, so I'm rooting for the home team here ;-).

/be

[1] en.wikipedia.org/wiki/Robustness_principle

# Mark S. Miller (14 years ago)

On Wed, Jun 23, 2010 at 12:11 PM, Brendan Eich <brendan at mozilla.com> wrote:

On Jun 23, 2010, at 8:17 AM, Mark S. Miller wrote:

On Tue, Jun 22, 2010 at 8:17 PM, Dean Landolt <dean at deanlandolt.com>wrote:

I know many of us in the ES community tend to prefer a Postel's Law approach -- and as long as tabs are always properly stringified it's not a huge interop problem. Still, an argument could be made that with browsers accepting known-bad input (per the JSON spec) it could encourage fragmentation (albeit it minor) of the one format that's really delivered on the promise of true interoperability.

Yes. On the web, as the sorry history of browsers shows too clearly, Eich's law may be the more relevant one:

From Dave Herman's blog at <calculist.blogspot.com/2010/02/eichs-law.html>:

Found this gem in a C++ comment while digging in the SpiderMonkey codebase (mxr.mozilla.org/mozilla-central/source/js/src/jsscan.cpp#1464):

After much testing, it's clear that Postel's advice to protocol designers ("be liberal in what you accept, and conservative in what you send") invites a natural-law repercussion for JS as "protocol":

"If you are liberal in what you accept, others will utterly fail to be conservative in what they send."

The comment is unsigned, but it sounds like Brendan.

I still think the Robustness Principle in full [1] is worth upholding. Validation, especially in developer facing modes or tools (error consoles, debuggers), is important.

But tardy validation that annoys users by trying in vain to repeal a de-facto standard is worse than pointless. You don't get a second chance (dherman's blog post cites a comment I wrote about how <!-- comments, treated by JS in browsers as single-line comments, leak from inline script content into out-of-line .js files). And browser vendors will not risk losing market share on quixotic quests.

Nevertheless, if we can ban TAB per the ES5 JSON spec, let's do it. SpiderMonkey currently allows it as noted, but someone will file a bug (if it's not already on file).

Done: bugzilla.mozilla.org/show_bug.cgi?id=574153, code.google.com/p/v8/issues/detail?id=751, bugs.webkit.org/show_bug.cgi?id=41102