[Json] JSON: remove gap between Ecma-404 and IETF draft

# Allen Wirfs-Brock (13 years ago)

On Nov 13, 2013, at 1:27 PM, Paul Hoffman wrote:

<no hat>

On Nov 13, 2013, at 12:24 PM, Joe Hildebrand (jhildebr) <jhildebr at cisco.com> wrote:

We would also need to change section 8.1 according to the mechanism that was previously proposed:

    00 00 00 xx  UTF-32BE
    00 xx ?? xx  UTF-16BE
    xx 00 00 00  UTF-32LE
    xx 00 xx ??  UTF-16LE
    xx xx ?? ??  UTF-8

in order to account for strings at the top level whose first character has a codepoint greater than 127.

A string at the top level of a JSON text still needs to start with an ASCII " character, so the logic is still fine, I believe.
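As a rough illustration of the table above, the following sketch classifies the first four octets of a JSON text. It assumes that `xx` means a non-zero octet and `??` means any octet, which the thread does not spell out. The function name `detectEncoding` and the `null` result for inputs the table does not cover are hypothetical choices made for this example only.

```javascript
// Sketch of the proposed section 8.1 octet-pattern table.
// Assumption: "xx" = any non-zero octet, "??" = any octet.
function detectEncoding(bytes) {
  if (bytes.length < 4) return null; // the table only describes four leading octets
  const [a, b, c, d] = bytes;
  const nz = (x) => x !== 0;

  if (a === 0 && b === 0 && c === 0 && nz(d)) return "UTF-32BE"; // 00 00 00 xx
  if (a === 0 && nz(b) && nz(d)) return "UTF-16BE";              // 00 xx ?? xx
  if (nz(a) && b === 0 && c === 0 && d === 0) return "UTF-32LE"; // xx 00 00 00
  if (nz(a) && b === 0 && nz(c)) return "UTF-16LE";              // xx 00 xx ??
  if (nz(a) && nz(b)) return "UTF-8";                            // xx xx ?? ??
  return null; // no row of the table matches
}

// A top-level string whose first character is U+00E9 (above 127),
// encoded three ways. The leading ASCII quote is always octet 0x22.
const asUtf8 = Uint8Array.of(0x22, 0xc3, 0xa9, 0x22);
const asUtf16le = Uint8Array.of(0x22, 0x00, 0xe9, 0x00);
const asUtf16be = Uint8Array.of(0x00, 0x22, 0x00, 0xe9);

console.log(detectEncoding(asUtf8));
console.log(detectEncoding(asUtf16le));
console.log(detectEncoding(asUtf16be));
```

Because the leading `"` is ASCII, the non-ASCII character only ever lands in octets that the table checks loosely or not at all, which is the point made in the paragraph above.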

Carsten's point about whitespace is more problematic. Does the ECMA-404 definition of a JSON text allow it to start with one (or more) whitespace characters? The text in that document says:

. . .

A JSON text is a sequence of tokens formed from Unicode code points that conforms to the JSON value grammar. The set of tokens includes six structural tokens, strings, numbers, and three literal name tokens.

. . .

Insignificant whitespace is allowed before or after any token.

. . .

It would be nice if ECMA-404 were clearer on this, given that the racetrack illustrations show everything other than the whitespace. Specifically, it would be good to know whether or not the racetrack for "value" in Section 5 is meant to have optional whitespace at the left and right to match the above text. If TC39 could say for certain on that, it would be useful to the community.

Yes, leading white space is allowed:

"The set of tokens includes the six structural tokens, *strings*, *numbers*, ..." (emphasis added)

"Insignificant whitespace is allowed before or after any token"

The elements matched by the value production are all tokens (or productions that begin and end with a token), so whitespace can occur to the left or right of any value.

Allen
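To make the whitespace reading above concrete, here is a small sketch that uses ECMAScript's `JSON.parse` as a stand-in for an ECMA-404 parser (an assumption; ECMA-404 itself defines only the grammar). The helper name `parses` is hypothetical.

```javascript
// ECMA-404 whitespace is tab (U+0009), line feed (U+000A),
// carriage return (U+000D) and space (U+0020).
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected in this sketch
  }
}

// Whitespace to the left and right of a top-level string or number is accepted.
console.log(parses(' \t\r\n"value"\n '));
console.log(parses(" 42 "));

// U+00A0 (NO-BREAK SPACE) is not one of the four whitespace characters.
console.log(parses("\u00a0 42"));
```

The first two calls succeed and the third fails, which matches the reading that whitespace may surround any token, including a value at the top level.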

# Mark Davis ☕ (13 years ago)

On Wed, Nov 13, 2013 at 3:51 PM, Joe Hildebrand (jhildebr) < jhildebr at cisco.com> wrote:

that all software implementations which receive the un-prefixed text will not generate parse errors."

perhaps:

...all conformant software ...

Mark google.com/+MarkDavis

— Il meglio è l’inimico del bene —

# Allen Wirfs-Brock (13 years ago)

On Nov 13, 2013, at 3:51 PM, Joe Hildebrand (jhildebr) wrote:

On 11/13/13 3:47 PM, "John Cowan" <cowan at mercury.ccil.org> wrote:

It's not clear that 404 disallows it, since 404 is defined in terms of characters, and a BOM is not a character but an out-of-band signal.

However, for example, a conforming implementation of the ECMAScript JSON.parse function would reject any string passed to it that starts with a U+FEFF code point, because the unquoted occurrence of that code point does not conform to the ECMA-262, 5th Edition or ECMA-404 JSON grammar.

In order to be successfully processed, that code point would have to be stripped from the string prior to calling JSON.parse.

Allen
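The stripping step described above can be sketched as follows. The wrapper name `parseJsonStrippingBom` is hypothetical; only `JSON.parse` itself comes from the discussion.

```javascript
// Remove a single leading U+FEFF, if present, before handing the
// string to JSON.parse. Nothing else about the text is changed.
function parseJsonStrippingBom(text) {
  const body = text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
  return JSON.parse(body);
}

const prefixed = '\uFEFF{"a":1}';

// Passing the prefixed string directly is rejected with a SyntaxError.
let rejected = false;
try {
  JSON.parse(prefixed);
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected);

// After stripping, the same text parses normally.
console.log(parseJsonStrippingBom(prefixed).a);
```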

Agree. However, that signal would be a part of the 4627bis octet stream, so a little interop guidance would likely be useful. Something like:

"Some producers of JSON produce JSON-text that starts with a redundant U+FEFF (ZERO WIDTH NO-BREAK SPACE, previously known as BYTE ORDER MARK) with the ostensible purpose of signaling the encoding of the text to follow. Since JSON has other mechanisms to determine encoding, this is not required. Receiving applications MAY safely ignore this initial character without generating an error. Implementations that do not send U+FEFF are interoperable in the sense that all software implementations which receive the un-prefixed text will not generate parse errors."

Isn't it an interoperability issue that many receiving applications do not ignore it?

Allen
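At the octet-stream level, the difference between a receiver that ignores the initial U+FEFF and one that does not can be sketched with the standard `TextDecoder` API. This is only an illustration of the interop concern raised above, not proposed text for 4627bis; the helper name `decodeAndParse` is hypothetical, and the example covers UTF-8 only.

```javascript
// UTF-8 octets for U+FEFF followed by the JSON text "{}".
const withBom = Uint8Array.of(0xef, 0xbb, 0xbf, 0x7b, 0x7d);

// keepBom = false: the decoder drops a leading BOM (the TextDecoder default).
// keepBom = true:  the BOM is passed through as U+FEFF in the decoded string.
function decodeAndParse(bytes, keepBom) {
  const text = new TextDecoder("utf-8", { ignoreBOM: keepBom }).decode(bytes);
  return JSON.parse(text);
}

// A receiver that drops the BOM parses the text without error.
console.log(Object.keys(decodeAndParse(withBom, false)).length);

// A receiver that keeps it hands U+FEFF to the parser, which rejects it.
let failed = false;
try {
  decodeAndParse(withBom, true);
} catch (e) {
  failed = e instanceof SyntaxError;
}
console.log(failed);
```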

# Allen Wirfs-Brock (13 years ago)

On Nov 13, 2013, at 7:15 PM, Paul Hoffman wrote:

The question was specifically about ECMA-404, not ECMA-262. It would be great to hear from TC39 whether or not ECMA-404 allows or disallows it.

ECMA-nnn is the correct designation for Ecma standards. When speaking about the organization, "Ecma" is the usage.

Allen

# Allen Wirfs-Brock (13 years ago)

On Nov 14, 2013, at 7:33 AM, John Cowan wrote:

Per contra, ECMA-404 refers only to text(ual content). The BOM is meaningful when transforming byte sequences into code point sequences, but ECMA-404 deals in the latter only. So it is the furthest thing from surprising that it makes no mention of BOMs, and has nothing to say about their use outside text.

Exactly. ECMA-404 only defines a textual content format. That was intentional.

Allen