ES Discuss - Message History

Allen Wirfs-Brock (2013-12-07T18:05:19.000Z)

On Dec 7, 2013, at 3:55 AM, Nico Williams wrote:

> On Fri, Dec 06, 2013 at 03:00:31PM -0800, Allen Wirfs-Brock wrote:
>> On Dec 6, 2013, at 12:56 PM, Nico Williams wrote:
>>> ...
> 
>> Why shouldn't a schema be allowed to consider the following to be semantically equivalent:
>>      {"unordered-list": [0,1]}
>> and
>>      {"unordered-list": [1,0]}
> 
> A *schema* is so allowed.
> 
> However, if a schema is also to be allowed to treat them as distinct
> then the *meta-schema* must treat them as distinct.  I.e., no matter
> what generic programming language bindings of JSON one uses, the above
> two JSON texts must produce equivalent results when parsed!

"Equivalent" according to what definition?

The most basic form of parsing translator, beyond a simple recognizer that reports valid/invalid, is a translator that produces a parse tree. So let's assume that we create such a parse tree generator using the 4627bis grammar.  The parse trees for the two JSON arrays shown above will be different.  As you correctly state, if they weren't, then no downstream semantics could apply different meanings to them.  So, in what sense are you saying that the result of parsing (in this case the parse trees) must be equivalent?
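
A minimal sketch of this point, using Python's standard `json` module as a stand-in for a generic parse tree generator (an assumption; the thread names no particular implementation). The `schema_equal` function is a hypothetical schema-level rule, not something defined by any of the specifications under discussion:

```python
import json

text_a = '{"unordered-list": [0,1]}'
text_b = '{"unordered-list": [1,0]}'

# Parsing preserves the element order found in each text.
tree_a = json.loads(text_a)
tree_b = json.loads(text_b)

# At the parser level the two results are distinguishable.
parsed_equal = tree_a == tree_b

def schema_equal(x, y):
    """Hypothetical schema rule: treat "unordered-list" as a multiset."""
    return sorted(x["unordered-list"]) == sorted(y["unordered-list"])

# A schema layered on top may still choose to call them equivalent.
schema_level_equal = schema_equal(tree_a, tree_b)

print(parsed_equal, schema_level_equal)  # False True
```

The schema can only make that choice because the parser kept the two inputs distinct in the first place.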

> 
> The application is clearly free to then re-order those arrays' elements,
> or to compare them as equivalent.  The application cannot consider them
> not equivalent if the parsers/encoders don't either.

Similarly, the JSON texts:
   {"1":  1, "2": 2}
and
   {"2":  2, "1": 1}

or the JSON texts:
   {"a": 1, "a": 2}
and
   {"a": 2, "a": 1}

have an ordering of the object members that must be preserved by the parser in order for downstream semantics to be applied.  And, in the real world, this ordering can be quite significant.  For example, for both of these cases, the standard JSON to JavaScript language binding produces observably different results. 
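
As a rough illustration of how a language binding can expose member order, here is a sketch using Python's standard `json` module. The message refers to the JavaScript binding; Python is used here only as a stand-in, and the details of what each binding exposes differ from one language to another:

```python
import json

# Same members in a different textual order: with Python 3.7+ dicts,
# iteration order follows the order of the members in the text.
keys_1 = list(json.loads('{"1": 1, "2": 2}'))
keys_2 = list(json.loads('{"2": 2, "1": 1}'))

# Duplicate names: the default dict-based binding keeps the last value,
# so swapping the two members changes the parsed result.
dup_1 = json.loads('{"a": 1, "a": 2}')
dup_2 = json.loads('{"a": 2, "a": 1}')

print(keys_1, keys_2)  # ['1', '2'] ['2', '1']
print(dup_1, dup_2)    # {'a': 2} {'a': 1}
```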

I think that if we cut through the rhetoric we are probably in agreement.   Within a JSON text, there is a clearly observable ordering to both the values that are elements of a JSON array and to the members of a JSON object.   Parsing must preserve this order, in both cases, because downstream semantics may be dependent upon the orderings.

If we agree to that, then at the level of defining JSON syntax it seems that an assertion that JSON arrays are ordered is redundant (the grammar already tells us that) and an assertion that the members of a JSON object are unordered is incorrect. 

Where we seem to disagree is on whether, or where, any such ordering requirements might be imposed. My contention is that they don't belong in a syntactic-level specification such as ECMA-404 but do belong in downstream specifications for data models, language bindings, or application-level schemas.
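
One way to picture this layering, sketched in Python under stated assumptions: `parse_members`, `as_last_wins`, and `as_strict` are hypothetical names, and the two data models shown are only examples of choices a downstream specification might make. For brevity the downstream functions handle a single flat object:

```python
import json

def parse_members(text):
    """Syntax layer: keep each object as an ordered list of (name, value) pairs."""
    return json.loads(text, object_pairs_hook=list)

def as_last_wins(pairs):
    """One possible data model: a later duplicate replaces an earlier one."""
    return dict(pairs)

def as_strict(pairs):
    """Another possible data model: duplicate names are rejected."""
    names = [name for name, _ in pairs]
    if len(names) != len(set(names)):
        raise ValueError("duplicate member name")
    return dict(pairs)

pairs = parse_members('{"a": 1, "a": 2}')
print(pairs)                # [('a', 1), ('a', 2)]
print(as_last_wins(pairs))  # {'a': 2}
```

Because the syntax layer reports members in textual order and drops nothing, either data model can be applied afterwards without changing the parser.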

> 
>> Besides, we already agreed above that if you don't have schema-level
>> agreement then JSON is almost useless.  So why not just let schema
>> specifications or schema language specifications handle this. 
> 
> Because generic filters/tools/apps exist that would be non-conformant if
> they have any expectation about array order preservation in *parsers*
> and *encoders* of related tooling.
> 
> I.e., I very much expect these jq filters to repeat their inputs as-is
> without re-ordering any arrays in them (but they may definitely change
> things like whitespace and it may re-order names in objects):

No, this is where we diverge.  Ordering of names within objects can and does, in the real world, have significance. A generic tool that changes member order within a JSON object will break things.
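
A small sketch of the kind of breakage meant here, in Python. Both functions are hypothetical: `reordering_filter` stands for a generic tool that re-encodes with sorted names, and `first_member_name` for a consumer that treats the first member as a type tag. Neither is taken from jq or any other real tool:

```python
import json

def reordering_filter(text):
    """Hypothetical generic tool: parse, then re-encode with sorted names."""
    return json.dumps(json.loads(text), sort_keys=True)

def first_member_name(text):
    """Hypothetical consumer that reads the first member as a type tag."""
    pairs = json.loads(text, object_pairs_hook=list)
    return pairs[0][0]

original = '{"type": "point", "coords": [1, 2]}'
filtered = reordering_filter(original)

print(filtered)                     # {"coords": [1, 2], "type": "point"}
print(first_member_name(original))  # type
print(first_member_name(filtered))  # coords
```

The array inside the object comes through unchanged, but the consumer that depended on member order no longer sees what it expects.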

> 
>    jq .
>    jq '[.[]]'
> 
> I also expect all of the C JSON libraries I know and have written code
> to (I think that's four C JSON libraries) to preserve array order when
> parsing / encoding JSON texts.  It'd be extremely strange to not be able
> to implement a JSON-using application that cares about array order!

And similarly for JSON-using applications that care about object member order.

> 
>>> That whitespace (outside strings) is not significant may be expressed
>>> syntactically or semantically, but this has to be universally agreed if
>>> we'll have any chance of interoperating.
>> 
>> ECMA-404 states where insignificant whitespace is allowed. Is there
>> any disagreement about this?
> 
> No.  I was listing some cases where there can be significant differences
> in the "syntax only" vs. "syntax and [some] semantics" approaches.
> 
> If ECMA TC39 were to insist that arrays in JSON texts do not denote
> order, that parsers may re-order array elements at will, say, then I
> suspect I'd bet this WG would just... note that difference and move on.
> There's no chance, I think, that the IETF would accept such a departure
> from RFC4627 (which says that an array "is an ordered sequence of zero
> or more values").  The proposal that the original RFC title be restored
> is much less controversial than the idea that JSON arrays are not
> ordered.

Hopefully, it is now clear that this is not what I'm arguing for.  Any statement about array ordering is redundant because the grammar already covers that. The only harm is in somebody misconstruing it to be a requirement about downstream semantics.  However, the same is true about object members.  Your assertion that a generic filter is free to reorder members is a good example of how a statement about ordering, at this level of specification, can be misconstrued.

... [snipping back and forth that I think is already addressed above]
>> 
>> Object name ordering is significant to widely used JSON language
>> bindings (eg, the ECMA-262 JSON parser).  But again this is a semantic
>> issue.
> 
> But there's no general requirement that object name order be preserved.
> Or at least I don't see you asserting that there is.  (But if you were,
> you'd care a lot about this semantic issue, and you'd want that bit of
> semantics specified somewhere, surely.)

It's hopefully clear by now that, yes, I am asserting that object name order is important.

And I do care about the semantic issues.  They just don't belong in a syntactic-level specification of the JSON format such as ECMA-404. A problem I see with RFC4627bis is that it conflates a syntactic-level specification with just a little bit of semantic data model. It is neither a pure syntactic specification nor a complete data model.

> That specific programming language bindings/APIs/implementations make
> object name order significant (or preserve it) does not impose a requirement
> to preserve object name order on other implementations that don't do so
> today.  A great many implementations use hash tables to represent
> objects internally, and they lose any other object name ordering.
> 
>> Because ECMA-404 is trying to restrict itself to describe the space of
>> well-formed JSON text there really is nothing to say about object name
>> ordering at that level. It's a semantic issue. 
> 
> Of course.  And RFC4627 does deal with semantics.  It is appropriate for
> RFC4627bis to do so as well.  Even if we agreed to drop all RFC2119
> language as to semantics we'd still keep interoperability notes about
> the old (and widely-deployed) semantics.
> 
>> The semantics you want to specify can be layered upon a normative
>> reference to ECMA-404. Rather than have competing and potentially
>> divergent specifications we should be looking for a clean separation of
>> concerns. 
> 
> We already have a clean separation in RFC4627bis: there's the ABNF
> (syntax) and everything else (semantics).  And we all now seem to agree
> that the ABNF in draft-ietf-json-rfc4627bis-08 is equivalent to the
> syntax in ECMA-404.  If the title of RFC4627 is restored then what ECMA
> concerns remain?

Multiple normative definitions of the same material.  Whether they are equivalent is a matter of interpretation and opinion that can lead to confusion and possibly divergence over time. A solution to this was requested in the TC39 feedback.  RFC4627bis should normatively reference ECMA-404 with respect to the syntax and static semantics of JSON. If it chooses to also restate the ECMA-404 grammar in a different notation (i.e., ABNF), that material should be designated as informative, with ECMA-404 serving as the normative specification of that material.

> 
>> The position stated by TC39 is that ECMA-404 already exists as a
>> normative specification of the JSON syntax and we have requested that
>> RFC4627bis normatively reference it as such and that any restatement
>> of ECMA-404 subject matter should be marked as informative.  We think
>> that dueling normative specifications would be a bad thing. Seeing
>> that the form of expression used by ECMA-404 seems to be an issue for
>> some JSON WG participants I have suggested that TC39 could probably be
>> convinced to revise ECMA-404 to include a BNF style formalism for the
>> syntax.  If there is interest in this alternative I'd be happy to
>> champion it within TC39.
> 
> Is there an assertion that ECMA-404 and draft-ietf-json-rfc4627bis-08
> disagree as to syntax?  I don't think so.  There's a concern that they
> might, and the easiest way to resolve that concern is to use the same
> syntax specification in both cases.  It would help a lot if TC39 were to
> publish an ABNF syntax for JSON texts, but even without that it's pretty
> clear that the two documents do not disagree as to syntax.

Then I think we should be close to agreement.  Does the JSON WG wish to formally request that TC39 add an ABNF specification to a new edition of ECMA-404?  Would RFC4627bis then normatively reference ECMA-404?

Allen

domenic at domenicdenicola.com (2013-12-10T00:50:15.296Z)
