[Json] Response to Statement from W3C TAG

# Allen Wirfs-Brock (10 years ago)

On Dec 4, 2013, at 11:39 PM, Carsten Bormann wrote:

On 05 Dec 2013, at 06:08, Tim Bray <tbray at textuality.com> wrote:

FWIW, I have never understood what the ECMAnauts mean by the word “semantics” in this context, so I have no idea whether I agree with this statement.

As one of the contributors to ECMA-404 I'd be happy to elaborate.

You know this, but just for the record: we could be applying the meaning we have for these terms in CS.

Yes, that is indeed the starting point. However, TC39 is largely composed of language designers and language implementors so the meaning of "semantics" we use is generally the one used within that branch of CS.

The syntax just tells you which sequences of symbols are part of the language. (This is what we have ABNF for; ECMA-404 uses racetracks plus some English language that maps the characters to the tokens in the syntax-level racetracks for value, object, and array, and to the English language components of the token-level racetracks for number and string.)

Agreed. I would state this as: the syntax tells you which sequences of symbols form valid statements within the language.

Language designers also use the term "static semantics". The static semantics of a language are a set of rules that further restrict which sequences of symbols form valid statements within the language. For example, a rule that the 'member' names must be disjoint within an 'object' production could be a static semantic rule (however, there is intentionally no such rule in ECMA-404).

The line between syntax and static semantics can be fuzzy. Static semantic rules are typically used to express rules that cannot be technically expressed using the chosen syntactic formalism or rules which are simply inconvenient to express using that formalism. For example, the editor of ECMA-404 chose to simplify the RR track expression of the JSON syntax by using static semantic rules for whitespace rather than incorporating them into RR diagrams.

Another form of static semantic rules are equivalences that state when two or more different sequences of symbols must be considered as equivalent. For example, the rules that state equivalencies between escape sequences and individual code points within a JSON 'string'. Such equivalences are not strictly necessary at this level, but it simplifies the specification of higher level semantics if equivalent symbol sequences can be normalized at this level of specification.

When we talk about the "semantics" of a language (rather than "static semantics") we are talking about attributing meaning (in some domain and context) to well-formed (as specified via syntax and static semantics) statements expressed in that language.

ECMA-404 intentionally restricts itself to specifying the syntax and static semantics of the JSON language. More below on why.

Semantics is needed to describe e.g. that some whitespace is “insignificant” (not contributing to the semantics), describe the intended interpretation of escape sequences in strings,

Yes these are static semantic rules (although whitespace rules could be expressed using syntactic formalisms).

that the sequences of symbols enabled by the production “number” are to be interpreted in base 10,

Yes, ECMA-404 includes this as a static semantic statement although it arguably could be classified as a semantic statement above the level of static semantics. Whether "77" is semantically interpreted as the mathematical value 63 or 77 isn't really relevant to whether "77" is a well-formed JSON number.

or that “the order of the values is significant” in arrays (which seems to be intended to contrast them to JSON objects, where ECMA-404 weasels out of saying whether the order is significant).

ECMA-404 removed the statement "an object is an unordered collection..." that exists in RFC 4627. Arguably, ECMA-404 should not have made the statement "the order of values is significant" WRT arrays. I'll file a bug ticket on that. The reason that neither of these statements is appropriate at this level of specification is that they are pure semantic statements that have no impact upon determining whether a sequence of symbols is a well-formed JSON text.

Objectively, the members of a JSON 'object' do occur in a specific order and a semantic interpreter of an object might ascribe meaning to that ordering. Similarly, a JSON 'array' also has an objectively observable ordering of its contained values. It is again up to a semantic interpreter as to whether or not it ascribes meaning to that ordering.

ECMA-404 does quite a bit of the latter, so indeed I also have trouble interpreting such a statement.

So back to "semantics" and why ECMA-404 tries (perhaps imperfectly) to avoid describing JSON beyond the level of static semantics.

ECMA-404 sees JSON as "a text format that facilitates structured data interchange between all programming languages. JSON is a syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications".

There are many possible semantics and categories of semantics that can be applied to well-formed statements expressed using the JSON syntax.

One type of semantics is language bindings that specify how a JSON text might be translated into the data types and structures of some particular programming language or runtime environment. The translation of a JavaScript string encoding of a JSON text into JavaScript objects and values by JSON.parse is one specific example of this kind of semantic application of JSON. But there are many languages that can be supported by such language bindings and there is not necessarily a best or canonical JSON binding for any language.
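To make the language-binding category concrete, here is a minimal JavaScript sketch using the standard JSON.parse binding mentioned above (the sample text and variable name are illustrative only):

    // One concrete semantic application: the ECMAScript binding maps a JSON text
    // onto ECMAScript values (objects, arrays, numbers, strings, booleans, null).
    var value = JSON.parse('{"name": "JSON", "versions": [1, 2], "stable": true}');
    value.name;        // "JSON"  (JSON string  -> ECMAScript String)
    value.versions[0]; // 1       (JSON number  -> ECMAScript Number)
    value.stable;      // true    (JSON literal -> ECMAScript Boolean)

A binding for another language would map the same text onto that language's native types, and nothing in the JSON syntax forces the two bindings to agree on every detail.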

Another form of semantics imposes schema based meaning and restrictions upon a well-formed JSON text. A schema explicitly defines an application level meaning for the elements of some specific subset of well-formed JSON texts. It might require only certain forms of JSON values, provide specific meaning to JSON numbers or strings that occur in specified positions, require the occurrence of certain object members, apply meaning to the ordering of object members or array elements, etc. This is probably the most common form of semantics applied to JSON and is used by almost all real world JSON use cases.
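As a hedged illustration of this kind of schema-imposed restriction, the JavaScript fragment below sketches a tiny validity check; the rules it enforces (a required numeric "id" member and an optional "tags" array of strings) are invented for the example and are not part of any JSON specification:

    // Hypothetical schema: the value must be an object with a numeric "id"
    // and, optionally, a "tags" array whose elements are all strings.
    function conformsToSchema(text) {
      var v = JSON.parse(text); // assumes the input is already well-formed JSON
      if (typeof v !== "object" || v === null || Array.isArray(v)) return false;
      if (typeof v.id !== "number") return false;
      if ("tags" in v && !(Array.isArray(v.tags) &&
          v.tags.every(function (t) { return typeof t === "string"; }))) return false;
      return true;
    }
    conformsToSchema('{"id": 7, "tags": ["a", "b"]}'); // true
    conformsToSchema('{"tags": ["a"]}');               // false: "id" is missing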

The problem with trying to standardize JSON semantics is that the various semantics that can usefully be imposed upon JSON are often mutually incompatible with each other. At a trivial level, we see this with issues like the size of numbers or duplicate object member keys. It is very hard to decide whose semantics are acceptable and whose are not.

What we can do is draw a bright line just above the level of static semantics. This is what ECMA-404 attempts to do. It defines a small set of structuring elements that can be recursively composed and represented in a textual encoding. It provides a common vocabulary upon which various semantics can be overlaid, and nothing else. The intent of ECMA-404 is to provide the definitive specification of the syntax and static semantics of the JSON format that can be used by higher level semantic specifications.
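The following JavaScript sketch simply restates that structural vocabulary; the classification function and sample text are illustrative only and not anything defined by ECMA-404:

    // The complete set of structuring elements: objects, arrays, strings,
    // numbers, true, false, and null, composed recursively.
    function kinds(v) {
      if (v === null) return "null";
      if (Array.isArray(v)) return "array(" + v.map(kinds).join(",") + ")";
      if (typeof v === "object")
        return "object(" + Object.keys(v).map(function (k) {
          return k + ":" + kinds(v[k]);
        }).join(",") + ")";
      return typeof v; // "string", "number", or "boolean"
    }
    kinds(JSON.parse('{"a": [1, "x", null], "b": true}'));
    // "object(a:array(number,string,null),b:boolean)"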

Allen Wirfs-Brock ECMA-262 project editor

# Allen Wirfs-Brock (10 years ago)

On Dec 5, 2013, at 11:34 PM, Carsten Bormann wrote:

Allen,

thank you a lot for this elaborate response. It really helps me understand the different points of view that have emerged. I’ll go ahead and insert my personal point of view in this response to maybe make that more understandable as well (I’m not speaking for the JSON WG at all here, of course). Maybe you can relay this to es-discuss so both mailing lists benefit from it.

Did your reply bounce from es-discuss? I won't elide any of your comments below, just in case.

if you or anybody else know of actual bugs or ambiguities in ECMA-404 the best way to communicate that to TC39 and the ECMA-404 project editor is to open a ticket at bugs.ecmascript.org. Product: "ECMA-404 JSON", Component: "1st Edition".

The syntax just tells you which sequences of symbols are part of the language. (This is what we have ABNF for; ECMA-404 uses racetracks plus some English language that maps the characters to the tokens in the syntax-level racetracks for value, object, and array, and to the English language components of the token-level racetracks for number and string.)

Agreed. I would state this as: the syntax tells you which sequences of symbols form valid statements within the language.

Language designers also use the term "static semantics". The static semantics of a language are a set of rules that further restrict which sequences of symbols form valid statements within the language.

Right, the "static semantics" is used to form a subset of what we arbitrarily call “syntax”, further restricting what sequence of symbols is in the language.

For example, a rule that the 'member' names must be disjoint within an 'object' production could be a static semantic rule (however, there is intentionally no such rule in ECMA-404).

Thanks, it is interesting to hear that this was a deliberate omission.

The line between syntax and static semantics can be fuzzy. Static semantic rules are typically used to express rules that cannot be technically expressed using the chosen syntactic formalism or rules which are simply inconvenient to express using that formalism. For example, the editor of ECMA-404 chose to simplify the RR track expression of the JSON syntax by using static semantic rules for whitespace rather than incorporating them into RR diagrams.

No, that isn’t static semantics. The racetracks don’t have a useful meaning (i.e., express a different, more restricted syntax) without the English language rules about whitespace. (More specifically, three of the racetracks operate on a different domain than the other two, without that having been made explicit.) Static semantics can only serve to restrict the set of syntactically valid symbol sequences. Accepting whitespace is on the syntax level. (Then ignoring it is indeed semantics.)

I think we're quibbling here about unimportant points. Multi-level specification is a common practice for language specification. For example, using regular expressions to define the lexical productions (the tokens) and a BNF grammar to define the syntactic level. It is also common practice to use prose to describe the role of whitespace at the lexical level. For example see www.ecma-international.org/ecma-262/5.1/#sec-5.1.2

The important point is whether or not ECMA-404 under-specifies the language, is ambiguous, or has any other errors. If it does, please file bug reports so corrections can be made in a revised edition.

Another form of static semantic rules are equivalences that state when two or more different sequences of symbols must be considered as equivalent. For example, the rules that state equivalencies between escape sequences and individual code points within a JSON 'string'. Such equivalences are not strictly necessary at this level, but it simplifies the specification of higher level semantics if equivalent symbol sequences can be normalized at this level of specification.

It may be convenient to lump this under static semantics (the static semantics may need to rely on such rules), but we are now in the area of semantic interpretation, no longer in the area of what should be strictly syntax but has been split into “syntax" and "static semantics" for notational convenience.

I disagree. It is useful at the syntactic/static semantic level to specify that two symbol sequences must be equivalent for semantic purposes. And we can do this without providing any actual semantics for the symbol sequences. For example:

We can say that "abc" and "\u0061\u0062\u0063" must be assigned identical semantics without actually specifying what that semantics is. Whether you prefer to call it static semantics or something else, it is independent of any specific semantic domain and reasonably at the level of concerns addressed by ECMA-404.
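A one-line JavaScript check makes the point, with JSON.parse standing in for just one of many possible interpreters; any conforming interpretation must treat the two texts identically even though the specification never says what a string "means":

    // Two different symbol sequences that every semantic interpretation must equate.
    JSON.parse('"abc"') === JSON.parse('"\\u0061\\u0062\\u0063"'); // true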

When we talk about the "semantics" of a language (rather than "static semantics") we are talking about attributing meaning (in some domain and context) to well-formed (as specified via syntax and static semantics) statements expressed in that language.

Exactly.

ECMA-404 intentionally restricts itself to specifying the syntax and static semantics of the JSON language. More below on why.

If that was the intention, that didn’t work out too well.

specific bugs please...

Semantics is needed to describe e.g. that some whitespace is “insignificant” (not contributing to the semantics), describe the intended interpretation of escape sequences in strings,

Yes, these are static semantic rules (although whitespace rules could be expressed using syntactic formalisms).

The syntax allows the whitespace. The semantics tells you it doesn’t make a difference with respect to the meaning. (OK, if you lump in semantic equivalence under static semantics, you can say the above, but this muddies the terms.)

that the sequences of symbols enabled by the production “number” are to be interpreted in base 10,

Yes, ECMA-404 includes this as a static semantic statement although it arguably could be classified as a semantic statement above the level of static semantics. Whether "77" is semantically interpreted as the mathematical value 63 or 77 isn't really relevant to whether "77" is a well-formed JSON number.

ECMA-404 indeed does not provide the full semantics of its numbers, just saying that they are “represented in base 10”, appealing to a deeply rooted common understanding of what that means (which by the way has been codified in ECMA-63 and then ISO 6093). Note that there is no meaning of “represented in” outside of the domain of semantics — the text clearly is about mapping the abstract (semantic) concept of a number to its base-10 representation using JSON’s syntax. It seems that this phrasing is a remnant from a time when the semantics was intended to be part of the specification.

"represented in base 10" probably would be better stated as "represented as a sequence of decimal digits" which would eliminate the semantic implication.

Yes, there are remnants in ECMA-404 (and in RFC 4627bis) from the days when the JSON format and its language binding to ECMAScript tended to be equated. One of the things we should be trying to do is eliminate those remnants.


or that “the order of the values is significant” in arrays (which seems to be intended to contrast them to JSON objects, where ECMA-404 weasels out of saying whether the order is significant).

ECMA-404 removed the statement "an object is an unordered collection..." that exists in RFC 4627.

Indeed, it is again interesting to note that this was an intentional change from the existing JSON specifications.

Arguably, ECMA-404 should not have made the statement "the order of values is significant" WRT arrays. I'll file a bug ticket on that. The reason that neither of these statements is appropriate at this level of specification is that they are pure semantic statements that have no impact upon determining whether a sequence of symbols is a well-formed JSON text.

Well, in your definition of static semantics that includes semantic equivalence, the statement is appropriate. It is, however, somewhat random whether ECMA-404 provides statements about semantic equivalence or not; it is certainly not trying for any completeness.

More specifics please. I don't see how semantic equivalence enters into this discussion of arrays. What equivalences come into play? As I said above, I think the existence of that phrase "the order of values is significant" is a bug. "Significant" to what? Certainly the intent wasn't to forbid a schema level semantics from considering [1,2] and [2,1] as being equivalent in some particular field position.

Objectively, the members of a JSON 'object' do occur in a specific order and a semantic interpreter of an object might ascribe meaning to that ordering. Similarly, a JSON 'array' also has an objectively observable ordering of its contained values. It is again up to a semantic interpreter as to whether or not it ascribes meaning to that ordering.

It is also up to a semantic interpreter as to whether it interprets base-10 numbers from left to right or from right to left. However, I would argue that some of the potential interpretations are violating the principle of least surprise. More so, JSON in the real world benefits from a significant amount of common interpretation.

Agreed. I believe we have this today, at this level.

A reasonable way to capture this at the specification level is to define a generic “JSON data model” and define the semantic processing that leads up to this, but then of course leave it up to the application how to interpret the elements of the JSON data model. A JSON array would be delivered as an ordered sequence to the (application-independent) JSON data model, but the application could still interpret that information as a set or as a record structure, depending on application context.

What do you mean by "delivered" in your second sentence? It sounds like you are either talking about a language binding or perhaps a JSON parser interface. The former is clearly in the realm that I classify as semantics and I would expect any reasonable parser-based interface to preserve all ordering relationships that exist in the parsed text.

As another example, the JSON to ECMAScript language binding defined by ECMA-262 implicitly defines an ordering of the properties of the ECMAScript objects that are created corresponding to JSON objects even though RFC 4627 said that an object is an unordered collection of name/value pairs. It just falls out of the ECMAScript data model.

We could try to say that all semantics applied to the JSON format MUST preserve the ordering of JSON array elements. But it seems unnecessary and in some cases excessively restrictive.

Defining a complete and universal "JSON data model" is hard. It is possible to define a normative JSON syntax without providing such a model and that is the direction that ECMA-404 has taken. If somebody wants to attempt to define such a data model they are welcome to write a spec layered above ECMA-404 and to demonstrate its utility.

In practice, JSON is almost useless without schema level semantic agreement between the producer and consumer of a JSON text. Most of the issues we are discussing here are easily subsumed by such schema level agreements.

ECMA-404 does quite a bit of the latter, so indeed I also have trouble interpreting such a statement.

So back to "semantics" and why ECMA-404 tries (perhaps imperfectly) to avoid describing JSON beyond the level of static semantics.

ECMA-404 sees JSON as "a text format that facilitates structured data interchange between all programming languages. JSON is a syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications”.

I hope by now it should be clear that ECMA-404 is neither very successful in focusing on the syntax only, nor is it a particularly good specification of that syntax due to its mix of English language and racetrack graphics. (I still like it as a tutorial for the syntax.)

No, it isn't clear. Specific bugs would help clarify. I didn't choose to use racetracks in the specification and I might not have made that choice myself. But I will defend it as a valid formalism and one that is well understood. A grammar expressed using ovals and arrows is just as valid as one expressed using ASCII characters.

It's silly to be squabbling over such notational issues and counter-productive if such squabbles result in multiple different normative standards for the same language/format.

TC39 would likely be receptive to a request to add to ECMA-404 an informative annex with a BNF grammar for JSON (even ABNF, even though it isn't TC39's normal BNF conventions). Asking is likely to produce better results than throwing stones.

There are many possible semantics and categories of semantics that can be applied to well-formed statements expressed using the JSON syntax.

The problem with this approach is that much of the interoperability of JSON stems from implementations having derived a common data model. Some of this is in the spec (RFC 4627), some of it has been derived by implementers drawing analogies with its ancestor JavaScript, some of it stems from the fact that the syntax is simply suggestive of a specific data model.

Much more would be gained in documenting that (common) data model (including documenting the differences that have ensued, and selecting some deviations as canonical and others as mistakes) than from retracting some of the explicit semantics (while keeping some of them as well as the implicit ones weakly in place).

This is where I disagree. Do you have any examples of interoperability problems occurring at this level? As I said above, successful JSON interoperability is most dependent upon schema level semantic agreement and good language bindings. In practice, those levels easily can encompass the sort of data model issues you seem to be concerned about.

However, I don't think TC39 wants to put any barriers in front of somebody trying to specify such a data model or models. We tried to avoid such barriers by not including unnecessary semantic restrictions in ECMA-404.

One type of semantics is language bindings that specify how a JSON text might be translated into the data types and structures of some particular programming language or runtime environment. The translation of a JavaScript string encoding of a JSON text into JavaScript objects and values by JSON.parse is one specific example of this kind of semantic application of JSON. But there are many languages that can be supported by such language bindings and there is not necessarily a best or canonical JSON binding for any language.

Leaving out the common data model and going directly from syntax to language binding is a recipe for creating interoperability problems. The approach “works” from the view of a single specific language (and thus may seem palatable for a group of experts in a specific language, such as TC39), but it is not aiding in interoperability of JSON at large.

Examples? The language expertise within TC39 certainly extends beyond just ECMAScript and that expertise informs the consensus decisions we make.

A counter example we have actually discussed is any limitation on the number of digits in a JSON number. While some applications of JSON might want to limit the precision, others have a need for arbitrarily large digit sequences. Such restrictions and allowances must be dealt with at the schema specification level so there is no need to arbitrarily restrict precision at the format/language level of specification.
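A quick JavaScript illustration of why this belongs at the binding/schema level rather than the format level (JSON.parse here stands in for the ECMAScript binding; an arbitrary-precision binding would behave differently):

    // 2^53 + 1 is a perfectly well-formed JSON number, but the ECMAScript
    // binding (IEEE-754 doubles) cannot represent it exactly and rounds it.
    JSON.parse('9007199254740993'); // 9007199254740992 in ECMAScript engines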

That's it for now. I think I've already addressed any substantive issues you raise below.

Happy to continue the conversation.

# Allen Wirfs-Brock (10 years ago)

On Dec 6, 2013, at 12:56 PM, Nico Williams wrote:

On Fri, Dec 06, 2013 at 11:50:13AM -0800, Allen Wirfs-Brock wrote:

In practice, JSON is almost useless without schema level semantic agreement between the producer and consumer of a JSON text. Most of

Yes.

the issues we are discussing here are easily subsumed by such schema level agreements.

Hmmm, well, there has to be some meta-schema.

That arrays preserve order is meta-schema for JSON, else we'd have no interop -- and this is critical for comparisons, so specifying this bit of meta-schema/ semantics enables very important semantics: arrays can be compared for equivalence without having to sort them (which would require further specification of collation for all JSON values!).

What "array" are you talking about. The an 'array' symbol sequence in a JSON text? A language-specific array-like data structure generated from such a symbol sequence by the parser for a specific JSON language binding? A domain data structure generate by a schema aware parser?

Why shouldn't a schema be allowed to consider the following to be semantically equivalent:

  {"unordered-list": [0,1]}

and

  {"unordered-list": [1,0]}

Besides, we already agreed above that if you don't have schema-level agreement then JSON is almost useless. So why not just let schema specifications or schema language specifications handle this.
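For instance, a schema that declares "unordered-list" to be a set could compare parsed values with something like the JavaScript sketch below; the helper name and the sort-then-compare approach are illustrative only, not prescribed by any specification:

    // Hedged sketch: order-insensitive comparison imposed at the schema level.
    function sameUnorderedList(a, b) {
      var sa = a["unordered-list"].slice().sort();
      var sb = b["unordered-list"].slice().sort();
      return JSON.stringify(sa) === JSON.stringify(sb);
    }
    sameUnorderedList(JSON.parse('{"unordered-list": [0,1]}'),
                      JSON.parse('{"unordered-list": [1,0]}')); // true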

That whitespace (outside strings) is not significant may be expressed syntactically or semantically, but this has to be universally agreed if we'll have any chance of interoperating.

ECMA-404 states where insignificant whitespace is allowed. Is there any disagreement about this?

That object names (keys) are not dups is trickier: for on-line processing there may be no way to detect dups or do anything about them, but for many common implementations object name uniqueness is very much a requirement. So here, finally, we have a bit of semantics that could be in the spec but could be left out (we spent a lot of time on the current consensus for RFC4627bis, and I think it's safe to say that we're all happy with it)

Should be left out. Both because of legacy precedent and because it can be dealt with at a language binding or schema semantics specification.

But I think we are already in agreement on leaving this out at the static semantic level.

That object name order is irrelevant and non-deterministic is widely assumed/accepted (though often users ask that JSON filters preserve object name order from input to output). (And, of course, for on-line encoders and parsers name order could well be made significant, but building schemas that demand ordered names means vastly limiting the world of JSON tooling that can be used to interoperably implement such schemas.)

Object name ordering is significant to widely used JSON language bindings (eg, the ECMA-262 JSON parser). But again this is a semantic issue.

Because ECMA-404 is trying to restrict itself to describe the space of well-formed JSON text there really is nothing to say about object name ordering at that level. It's a semantic issue.

Similarly for numbers, the interoperable number ranges and precisions are not really syntactic (they could be expressed via syntax, but it'd be a pain to do it that way).

I think it's clear that we have consensus in the IETF JSON WG for:

  • whitespace semantics (not significant outside strings)

this is a syntactic issue that is covered by ECMA-404

  • array element order semantics (elements are "ordered")
  • object name dups/order semantics (names SHOULD be unique, but interop considerations described; name/value pairs are "unordered")
  • no real constraints on numeric values but interoperable range/precision described

the rest are semantic issues that ECMA-404 does not want to address. The one place it arguably oversteps, by saying that "the order of array values is significant", really has no associated semantics. This is one place where I prefer the current draft language in RFC-4627bis clause 5 over the corresponding language in ECMA-404. The Introduction (is the intro normative?) to 4627bis says "an array is an ordered sequence" and "an object is an unordered collection" but I don't see any actual contextual meaning given to either "ordered" or "unordered" within the document.

If ECMA-404 differs in any way that does not impose more/different semantics, then maybe we don't care as far as RFC4627bis goes. If ECMA-404 does impose more/different semantics then we'll care a great deal.

If it does, that's unintended and a correctable bug in ECMA-404.

Since ECMA-404 targets just the syntax and minimal semantics, it's probably just fine for RFC4627bis to reference ECMA-404, but since RFC4627bis would be specifying a bit more semantics, we'd probably not want to make that reference be normative, at least not with some text explaining that it's normative only because we believe that the JSON syntax given in both docs are equivalent.

The semantics you want to specify can be layered upon a normative reference to ECMA-404. Rather than have competing and potentially divergent specifications we should be looking at a clean separation of concerns.

The position stated by TC39 is that ECMA-404 already exists as a normative specification of the JSON syntax; we have requested that RFC4627bis normatively reference it as such and that any restatement of ECMA-404 subject matter be marked as informative. We think that dueling normative specifications would be a bad thing. Seeing that the form of expression used by ECMA-404 seems to be an issue for some JSON WG participants, I have suggested that TC39 could probably be convinced to revise ECMA-404 to include a BNF style formalism for the syntax. If there is interest in this alternative I'd be happy to champion it within TC39.

# Carsten Bormann (10 years ago)

On 07 Dec 2013, at 12:55, Nico Williams <nico at cryptonector.com> wrote:

And we all now seem to agree that the ABNF in draft-ietf-json-rfc4627bis-08 is equivalent to the syntax in ECMA-404.

Yes, we like to believe that. The thing that worries me is that nobody knows whether that is actually true.

(At least I’d hope someone who is comfortable with the description methods in ECMA-404 makes a serious pass at establishing this equivalence, even when it’s ultimately not possible to actually prove it. That someone will not be me.)

# Allen Wirfs-Brock (10 years ago)

On Dec 7, 2013, at 3:55 AM, Nico Williams wrote:

On Fri, Dec 06, 2013 at 03:00:31PM -0800, Allen Wirfs-Brock wrote:

On Dec 6, 2013, at 12:56 PM, Nico Williams wrote:

...

Why shouldn't a schema be allowed to consider the following to be semantically equivalent:

 {"unordered-list": [0,1]}

and

 {"unordered-list": [1,0]}

A schema is so allowed.

However, if a schema is also to be allowed to treat them as distinct then the meta-schema must treat them as distinct. I.e., no matter what generic programming language bindings of JSON one uses, the above two JSON texts must produce equivalent results when parsed!

"Equivalent" according to what definition?

The most basic form of parsing translator, beyond a simple recognizer that reports valid/invalid, is a translator that produces a parse tree. So let's assume that we create such a parse tree generator using the 4627bis grammar. The parse trees for the two JSON arrays shown above will be different. As you correctly state, if they weren't then any downstream semantics could not apply different meaning to them. So, in what sense are you saying that the result of parsing (in this case the parse trees) must be equivalent?

The application is clearly free to then re-order those arrays' elements, or to compare them as equivalent. The application cannot consider them not equivalent if the parsers/encoders don't either.

Similarly, the JSON texts:

{"1":  1, "2", 2}

and

{"2":  2, "1": 1}

or the JSON texts:

{"a": 1, "a": 2}

and

{"a": 2, "a": 1}

have an ordering of the object members that must be preserved by the parser in order for downstream semantics to be applied. And, in the real world, this ordering can be quite significant. For example, for both of these cases, the standard JSON to JavaScript language binding produces observably different results.

I think that if we cut through the rhetoric we are probably in agreement. Within a JSON text, there is a clearly observable ordering to both the values that are elements of a JSON array and to the members of a JSON object. Parsing must preserve this order, in both cases, because downstream semantics may be dependent upon the orderings.

If we agree to that, then at the level of defining JSON syntax it seems that an assertion that JSON arrays are ordered is redundant (the grammar already tells us that) and an assertion that the members of a JSON object are unordered is incorrect.

Where we seem to disagree is on whether or where any such ordering requirements might be imposed. My contention is that they don't belong in a syntactic level specification such as ECMA-404 but do belong in downstream specifications for data models, language bindings, or application level schema.

Besides, we already agreed above that if you don't have schema-level agreement then JSON is almost useless. So why not just let schema specifications or schema language specifications handle this.

Because generic filters/tools/apps exist that would be non-conformant if they have any expectation about array order preservation in parsers and encoders of related tooling.

I.e., I very much expect these jq filters to repeat their inputs as-is without re-ordering any arrays in them (but they may definitely change things like whitespace and it may re-order names in objects):

No, this is where we diverge. Ordering of names within objects can and does, in the real world, have significance. A generic tool that changes member order within a JSON object will break things.

jq .
jq '[.[]]'

I also expect all of the C JSON libraries I know and have written code to (I think that's four C JSON libraries) to preserve array order when parsing / encoding JSON texts. It'd be extremely strange to not be able to implement a JSON-using application that cares about array order!

and similarly for JSON-using applications that care about object member order

That whitespace (outside strings) is not significant may be expressed syntactically or semantically, but this has to be universally agreed if we'll have any chance of interoperating.

ECMA-404 states where insignificant whitespace is allowed. Is there any disagreement about this?

No. I was listing some cases where there can be significant differences in the "syntax only" vs. "syntax and [some] semantics" approaches.

If ECMA TC39 were to insist that arrays in JSON texts do not denote order, that parsers may re-order array elements at will, say, then I'd bet this WG would just... note that difference and move on. There's no chance, I think, that the IETF would accept such a departure from RFC4627 (which says that an array "is an ordered sequence of zero or more values"). The proposal that the original RFC title be restored is much less controversial than the idea that JSON arrays are not ordered.

Hopefully, it is now clear that this is not what I'm arguing for. Any statement about array ordering is redundant because the grammar already covers that. The only harm is in somebody misconstruing it to be a requirement about downstream semantics. However, the same is true about object members. Your assertion that a generic filter is free to reorder members is a good example of how a statement about ordering, at this level of specification, can be misconstrued.

... [snipping back and forth that I think is already addressed above]

Object name ordering is significant to widely used JSON language bindings (eg, the ECMA-262 JSON parser). But again this is a semantic issue.

But there's no general requirement that object name order be preserved. Or at least I don't see you asserting that there is. (But if you were, you'd care a lot about this semantic issue, and you'd want that bit of semantics specified somewhere, surely.)

It's hopefully clear by now that, yes I am asserting that object name order is important.

And I do care about the semantic issues. They just don't belong in a syntactic level specification of the JSON format such as ECMA-404. A problem I see with RFC4627bis is that it conflates a syntactic level specification with just a little bit of a semantic data model. It is neither a pure syntactic specification nor a complete data model.

That specific programming language bindings/APIs/implementations make object name significant (or preserve it) does not impose a requirement to preserve object name order on other implementations that don't do so today. A great many implementations use hash tables to represent objects internally, and they lose any other object name ordering.

Because ECMA-404 is trying to restrict itself to describe the space of well-formed JSON text there really is nothing to say about object name ordering at that level. It's a semantic issue.

Of course. And RFC4627 does deal with semantics. It is appropriate for RFC4627bis to do so as well. Even if we agreed to drop all RFC2119 language as to semantics we'd still keep interoperability notes about the old (and widely-deployed) semantics.

The semantics you want to specify can be layered upon a normative reference to ECMA-404. Rather than have competing and potentially divergent specifications we should be looking at a clean separation of concerns.

We already have a clean separation in RFC4627bis: there's the ABNF (syntax) and everything else (semantics). And we all now seem to agree that the ABNF in draft-ietf-json-rfc4627bis-08 is equivalent to the syntax in ECMA-404. If the title of RFC4627 is restored then what ECMA concerns remain?

Multiple normative definitions of the same material. Whether they are equivalent is a matter of interpretation and opinion that can lead to confusion and possibly divergence over time. A solution to this was requested in the TC39 feedback. RFC4627bis should normatively reference ECMA-404 WRT the syntax and static semantics of JSON. If it chooses to also restate the ECMA-404 grammar in a different notation (i.e., ABNF) that material should be designated as informative with ECMA-404 serving as the normative specification of that material.

The position stated by TC39 is that ECMA-404 already exists as a normative specification of the JSON syntax; we have requested that RFC4627bis normatively reference it as such and that any restatement of ECMA-404 subject matter be marked as informative. We think that dueling normative specifications would be a bad thing. Seeing that the form of expression used by ECMA-404 seems to be an issue for some JSON WG participants, I have suggested that TC39 could probably be convinced to revise ECMA-404 to include a BNF style formalism for the syntax. If there is interest in this alternative I'd be happy to champion it within TC39.

Is there an assertion that ECMA-404 and draft-ietf-json-rfc4627bis-08 disagree as to syntax? I don't think so. There's a concern that they might, and the easiest way to resolve that concern is to use the same syntax specification in both cases. It would help a lot if TC39 were to publish an ABNF syntax for JSON texts, but even without that it's pretty clear that the two documents do not disagree as to syntax.

Then I think we should be close to agreement. Does the JSON WG wish to formally request that TC39 add an ABNF specification to a new edition of ECMA-404? Would RFC4627bis then normatively reference ECMA-404?

# Carsten Bormann (10 years ago)

On 07 Dec 2013, at 19:05, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

Parsing must preserve this order, in both cases, because downstream semantics may be dependent upon the orderings.

That would be a major breaking change. The JSON WG is chartered not to do those.

If the purpose of removing semantics from the specification is to create a derivative of JSON where this matters, I can finally have my binary data in JSON. You see, I have proposed for a while that any string that is immediately preceded by two newlines is interpreted as a base64url representation of a binary string instead of a text string. Problem solved.

If this usage of whitespace seems somehow revolting, maybe you get an idea of how unacceptable reducing the definition of JSON to its syntax is. Interoperability requires more than common syntax.

In JSON, objects are unordered collections or sets of name/value pairs. It says so right there on json.org (“sets”), and it says so in RFC 4627 (“collections”)*). We may not like it, but it has been a promise for a decade. We need to heed it. (Another promise was that JSON doesn’t change**).)

Data interchange formats where this is not the case may be using the JSON syntax, but aren’t JSON.

*) (The difference is unfortunate, but a fact that we need to deal with.)

**) Which can’t be strictly true, as JSON is as much defined by the collection of its implementations as by its specification. But that’s just limiting the extent of the promise, not giving us a free get out of jail card.

# Allen Wirfs-Brock (10 years ago)

On Dec 7, 2013, at 12:30 PM, Carsten Bormann wrote:

On 07 Dec 2013, at 19:05, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

Parsing must preserve this order, in both cases, because downstream semantics may be dependent upon the orderings.

That would be a major breaking change. The JSON WG is chartered not to do those.

It is also a major breaking change if downstream semantics can't depend upon the ordering of object members. In particular, it means that the standard built-in ECMAScript JSON parsers as well as the classic JavaScript eval-based processing will be non-conforming. The latter is particularly puzzling as that was the original basis upon which JSON was defined.

In fact, the only place that either the current RFC-4627bis draft or the original RFC-4627 says anything about object name/value pairs being "unordered" is their introductions. The 4627bis language appears to have been directly copied from the original RFC. It isn't clear whether or not the introduction to 4627bis is intended to be normative. If it is, then I note that it also says (in both the new and old documents) that JSON's design goals were for it to be "a subset of JavaScript". The syntactic elements of JavaScript that correspond to the JSON object syntax do have a specified ordering semantics.

When we prepared ECMA-404 we concluded that characterizing JSON objects as unordered was a mistake in the original RFC. The original author did not object to this interpretation.

If the purpose of removing semantics from the specification is to create a derivative of JSON where this matters,

No the purpose is to ensure that the specification remains compatible with the most widely deployed JSON parsers. Specifically, the ECMA-262 conforming parsers that are implemented by JavaScript engines in all major browsers.

I can finally have my binary data in JSON. You see, I have proposed for a while that any string that is immediately preceded by two newlines is interpreted as a base64url representation of a binary string instead of a text string. Problem solved.

If this usage of whitespace seems somehow revolting, maybe you get an idea of how unacceptable reducing the definition of JSON to its syntax is. Interoperability requires more than common syntax.

In JSON, objects are unordered collections or sets of name/value pairs. It says so right there on json.org (“sets”), and it says so in RFC 4627 (“collections”)*). We may not like it, but it has been a promise for a decade. We need to heed it. (Another promise was that JSON doesn’t change**).)

You also need to look at objective reality and consider the possibility that the informal (and non-normative) text on both the json.org website and in the original RFC never actually matched reality.

JSON is derived from JavaScript (whose standard is ECMA-262) and since 2009, ECMA-262 (and its clone ISO/IEC-16262) has included a normative specification for parsing JSON text that includes an ordering semantics for object members.

Data interchange formats where this is not the case may be using the JSON syntax, but aren’t JSON.

I disagree with this conclusion, but I think you are approaching an important point of possible agreement. The JSON syntax is used in many ways and for many purposes and is worthy of independent standardization. That is what ECMA-404 does. The JSON WG is certainly free (actually encouraged) to issue a normative standard that addresses interchange requirements for the MIME type application/json. But that should be viewed only as a spec for application/json interchange, not the one and only JSON specification.

# Bjoern Hoehrmann (10 years ago)

Allen Wirfs-Brock wrote:

On Dec 7, 2013, at 3:55 AM, Nico Williams wrote:

However, if a schema is also to be allowed to treat them as distinct then the meta-schema must treat them as distinct. I.e., no matter what generic programming language bindings of JSON one uses, the above two JSON texts must produce equivalent results when parsed!

"Equivalent" according to what definition?

I suspect intended was "must not produce".

And I do care about the semantic issues. They just don't belong in a syntactic level specification of the JSON format such as ECMA-404. A problem I see with RFC4627bis is that it conflates a syntactic level specification with just a little bit of a semantic data model. It is neither a pure syntactic specification nor a complete data model.

JSON_texts = { x | x is a JSON text }

JSON_diffs = { (a,b) | a and b are elements of JSON_texts and
                       a is significantly different from b }

A pure specification in your sense above defines only membership in the JSON_texts set. ECMA-404 is not pure in this sense because it defines that e.g. ("[]", "[ ]") is not a member of JSON_diffs.

ECMA-404 does not define that

('{"x":1,"y":2}', '{"y":2,"x":1}')

is not a member of JSON_diffs. Right? It says the white space in the example is insignificant, but it does not say order of key-value-pairs in objects is insignificant. Carsten Bormann gave other examples like ECMA-404's definition of equivalent escape sequences.

Readers of ECMA-404 might assume that it gives a complete description of what people developing and operating JSON-based systems agree are significant differences. They might build systems that rely on the order of key-value-pairs in objects because of this, for instance wiki.apache.org/solr/UpdateJSON#Solr_3.1_Example

Systems like ECMAScript's JSON.stringify API cannot ordinarily create such JSON texts and would be unable to interact with such a system. That is something the IETF JSON Working Group wishes to avoid; accordingly they provide a more complete definition of the JSON_diffs equivalence relation that better reflects rough consensus and running code of the JSON community.

I believe the combination of impurity and incompleteness in ECMA-404 is harmful to the JSON community.

# Carsten Bormann (10 years ago)

On 08 Dec 2013, at 00:05, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

On Dec 7, 2013, at 12:30 PM, Carsten Bormann wrote:

On 07 Dec 2013, at 19:05, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

Parsing must preserve this order, in both cases, because downstream semantics may be dependent upon the orderings.

That would be a major breaking change. The JSON WG is chartered not to do those.

It is also a major breaking change if downstream semantics can't depend upon the ordering of object members.

Wait a minute. It can’t be a breaking change because it is not a change.

JSON parsers are free to implement extensions (section 4), so none of the JavaScript extensions make them non-conforming JSON parsers. Many JSON parsers won’t implement these extensions, and many JSON generators won’t be able to generate them, so arguing they have become part of JSON because they are in one parser doesn’t quite work.

When we prepared ECMA-404 we concluded that characterizing JSON objects as unordered was a mistake in the original RFC.

Silently making this breaking change is a nice illustration for the process issues that might make some of us a bit reluctant to use ECMA-404 as a normative reference, even if it were turned into a technically superior spec.

The original author did not object to this interpretation.

It is, however, still on json.org, so we seem to have a bit of a communication problem here.

JSON is derived from JavaScript

“Was originally derived” would be closer; after JavaScript changed, JSON is not even a subset of JavaScript any more. And that historical ancestry doesn’t make JavaScript specifications the specification for JSON.

(whose standard is ECMA-262) and since 2009, ECMA-262 (and its clone ISO/IEC-16262) has included a normative specification for parsing JSON text that includes an ordering semantics for object members.

As it is free to do; that doesn’t change JSON the data interchange format though.

# Allen Wirfs-Brock (10 years ago)

On Dec 7, 2013, at 4:55 PM, John Cowan wrote:

Allen Wirfs-Brock scripsit:

Similarly, the JSON texts:

{"1":  1, "2", 2}

and

{"2":  2, "1": 1}

or the JSON texts:

{"a": 1, "a": 2}

and

{"a": 2, "a": 1}

have an ordering of the object members that must be preserved by the parser in order for downstream semantics to be applied.

I cannot accept this statement without proof. Where in the ECMAscript definition does it say this?

First, the console output from an experiment run from the developer console of Firefox 27:

17:22:24.873 var jsonText1 = '{"a": 1, "a": 2}';
17:22:25.244 undefined                                     <----- ignore these, they are a console artifact
17:22:50.107 console.log(jsonText1);
17:22:50.124 undefined
17:22:50.125 "{"a": 1, "a": 2}"    <-----note that the console doesn't properly escape embedded quotes
17:23:45.060 var jsonText2 = '{"a": 2, "a": 1}';
17:23:45.062 undefined
17:24:18.594 console.log(jsonText2);
17:24:18.649 undefined
17:24:18.649 "{"a": 2, "a": 1}"
17:25:31.540 var parsedText1 = JSON.parse(jsonText1);
17:25:31.577 undefined
17:26:36.429 console.log(parsedText1.a)
17:26:36.568 undefined
17:26:36.569 2
17:27:13.754 var parsedText2 = JSON.parse(jsonText2);
17:27:13.882 undefined
17:27:37.533 console.log(parsedText2.a)
17:27:37.660 undefined
17:27:37.661 1

Note that the value of the 'a' property on the JavaScript object produced by JSON.parse is either 1 or 2 depending upon the ordering of the member definitions with duplicate names. I'll leave it to you to try using your favorite browser. However, I'm confident that you will see the same result as this is what ECMA-262, 5th Edition requires. I happen to be fairly familiar with that document, so I can explain how that is:

  1. JSON.parse is specified by the algorithms in section 15.12.2, starting with the first algorithm in that section.
  2. Step 2 of that algorithm requires validating the input string against the JSON grammar provided in 15.12.1.
  3. If the input text cannot be recognized by a parser for that grammar, an exception must be thrown at that point.
  4. If the input text is recognized by the parser, then step 3 says to parse and evaluate the input text as if it were ECMAScript source code. The result of that evaluation is what is normally returned from the function. The ECMAScript parsing and evaluation rules can be used in this manner because a well-formed JSON text (that is verified in step 2) is a subset of an ECMAScript PrimaryExpression.
  5. The text of a JSON object definition will be parsed and evaluated as if it were an ECMAScript ObjectLiteral. The evaluation semantics are specified by the algorithms that follow the BNF in that section.
  6. Note that the body of an ObjectLiteral is described by the PropertyNameAndValueList production which produces a comma separated list of PropertyAssignment productions.
  7. The PropertyAssignments of a PropertyNameAndValueList are evaluated in left to right order, as specified by the 4th algorithm in that section.
  8. As each PropertyAssignment is evaluated, it performs a [[DefineOwnProperty]] operation upon the result object using the property name and value provided by the PropertyAssignment.
  9. [[DefineOwnProperty]] is defined in section 8.12.9. It is a fairly complex operation but the short story is that if a property of that name does not already exist one is created and assigned the associated value. If a property of that name does already exist, the existing value is overwritten with the current value.

In other words, ECMA-262 explicitly specifies that when multiple occurrences of the same member name occur in a JSON object, the value associated with the last (right-most) occurrence is used. Order matters.

A similar analysis applies to the first example.

# Bjoern Hoehrmann (10 years ago)
Allen Wirfs-Brock wrote:

On Dec 7, 2013, at 4:55 PM, John Cowan wrote:

Allen Wirfs-Brock scripsit:

Similarly, the JSON texts: {"1": 1, "2": 2} and {"2": 2, "1": 1}

or the JSON texts: {"a": 1, "a": 2} and {"a": 2, "a": 1}

have an ordering of the object members that must be preserved by the parser in order for downstream semantics to be applied.

I cannot accept this statement without proof. Where in the ECMAscript definition does it say this?

In other words, ECMA-262 explicitly specifies that when multiple occurrences of the same member name occur in a JSON object, the value associated with the last (right-most) occurrence is used. Order matters.

A similar analysis applies to the first example.

Your analysis does not demonstrate that JSON.parse preserves ordering. I am confident that even in the current ES6 draft JSON.stringify does not preserve ordering even if JSON.parse somehow did. It's based on Object.keys which does not define ordering as currently proposed. If you can re-create the key-value-pair order in your first example from the output of JSON.parse without depending on implementation-defined behavior, seeing the code for that would be most instructive.

# Allen Wirfs-Brock (10 years ago)

On Dec 7, 2013, at 6:39 PM, Bjoern Hoehrmann wrote:

Your analysis does not demonstrate that JSON.parse preserves ordering. I am confident that even in the current ES6 draft JSON.stringify does not preserve ordering even if JSON.parse somehow did. It's based on Object.keys which does not define ordering as currently proposed. If you can re-create the key-value-pair order in your first example from the output of JSON.parse without depending on implementation-defined behavior, seeing the code for that would be most instructive.

You are correct that ES5 does not define the for-in enumeration order. But it does say that the Object.keys ordering must be the same as the for-in enumeration order, and there is a de facto standard for a partial enumeration order that all browsers implement.

Quoting from mail.mozilla.org/htdig/es-discuss/2009-October/010060.html:

The common behavior subset here is: for objects with no properties that look like array indices, and no enumerable prototype properties, for..in enumeration returns properties in insertion order. That particular behavior is a de facto standard and required for Web compatibility. A future standard should specify at least that much.

Also esdiscuss/2010-December/012469.html:

We did identify one situation where enumeration order will be the same across all major implementations that are currently in use (including IE6):

The enumeration order of an object's properties will be the order in which the properties were added if all the following conditions hold:

  • The object has no inherited enumerable properties
  • The object has no array indexed properties
  • No properties have been deleted
  • No property has had its attributes modified or been changed from a data property to an accessor property or vice versa

Also see esdiscuss/2011-March/012965 plus other discussion history you can find in the es-discuss archives.

In practice, JavaScript implementations do have a standard enumeration order that applies for the cases that most commonly arise when parsing and generating JSON text. Applications do depend upon that ordering.

# Allen Wirfs-Brock (10 years ago)

On Dec 7, 2013, at 7:22 PM, John Cowan wrote:

Allen Wirfs-Brock scripsit:

In other words, ECMA-262 explicitly specifies that when multiple occurrences of the same member name occurs in a JSON object, the value associated with the last (right-most) occurrence is used. Order matters.

Okay, I concede that order matters when there are duplicate names. I still deny that it matters otherwise.

In reality it defines the JavaScript for-in enumeration order over the JS object properties generated by JSON.parse.

Try this in your favorite browser:

var jText = '{"b": 1, "a": 2, "c": 3}';
for (var key in JSON.parse(jText)) console.log(key);

You will get as output:

 b
 a
 c

I can assure you that code exists on the web that depends, whether intentionally or not, on this ordering. Past experience among browser implementations is that sites break when attempts are made to change this ordering.

# Carsten Bormann (10 years ago)

On 08 Dec 2013, at 10:58, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:

There are two description methods in ECMA-404: text and "railroad" diagrams.

There are actually two quite different kinds of racetracks (“railroad diagrams”), the first three at the parser level and the second two at the scanner level. This is nowhere explained (it just happens to be the only one of the many potential interpretations of ECMA-404 that yields a result getting close to current practice), and you have to piece together from section 4 what the interface between the two levels might be.

The textual descriptions are in some cases quite precise, but in some other cases, leave quite a bit of ambiguity. And stuff like "It may have an exponent of ten, prefixed by e (U+0065) or E (U+0045) and optionally + (U+002B) or – (U+002D)." (in particular the first clause of that sentence) doesn't make much sense.

I read them mainly as a way to give meaning to the bubbles in the racetracks. All the statements that define the semantics of the data are really errata anyway, we have learned. (If one were to do the semantics properly for section 8, one could simply reference ISO 6093. RFC 4627 does that implicitly by saying "The representation of numbers is similar to that used in most programming languages.".)

As for the railroad tracks, besides just floating in the spec without references, the notation is also not at all explained. If one took the most straightforward and obvious interpretation (that's not how standards work, but anyway), it's not too difficult to come up with a formally precise way of converting each of them into a diagram for a finite state machine. From there, conversion to the ABNF, or showing equivalence, on a quite formal level, shouldn't be too much of a problem.

Yes, please. I was asking for someone to do that work, maybe in the process generating some guidance on how to read ECMA-404. It won’t be me.

# Bjoern Hoehrmann (10 years ago)

Martin J. Dürst wrote:

The textual descriptions are in some cases quite precise, but in some other cases, leave quite a bit of ambiguity. And stuff like "It may have an exponent of ten, prefixed by e (U+0065) or E (U+0045) and optionally + (U+002B) or – (U+002D)." (in particular the first clause of that sentence) doesn't make much sense. If e.g. 1.2 has an exponent of 10, it's going to be 6.1917 or so, not at all what this notation is usually used for.

Apparently "10^x" is "an exponent of" ten. That does not make much sense to me either, but it does appear to be a common English idiom.

# Carsten Bormann (10 years ago)

Here are some replies to your messages that I promised. I opted not to use line-by-line responses; I hope they are easier to read this way. Two technical, and two more general points.

Processing model

You are presenting a processing model that is based on a generic parser that creates an AST and then application-specific post-processing. This is pretty much how XML worked.

One of the major advances of JSON was that it has a data model (even if it is somewhat vaguely specified — implementers were still quick in picking it up). JSON implementations typically go right from the characters to a representation of that data model that is appropriate for the platform, and encoders typically go all the way back in one step. Interoperability happens as a result of this processing.

That's a major reason why it is so important to think about JSON in terms of its data model. The IETF JSON WG has elected not to flesh out the model any more for 4627bis than it is in RFC 4627 (which I personally don't agree with, but it would have been more hard work). Dismantling what is there in the way of a data model, and thus falling back into an XML mindset, would be a major regression, though.

Description techniques

You are right that programming language designers have been used to two-level grammars (scanner/parser) for a long time. One thing that RFC 4627 got right was not doing this, but using the single-level ABNF. (Technically, there still is a UTF-8 decoder below the ABNF, but that is a rather well-understood, separable machine.) JSON is simple enough to enable single-level description, and RFC 5234 ABNF provides a rigorous yet understandable way to do this. There are tools that operate on that ABNF and produce useful results, because it has a defined meaning.

Let me be very frank here, because I get the impression that previous statements about this were too diplomatic to be clear.

There is no way on earth that anyone can argue that the description of the JSON syntax in ECMA-404 is in any way superior to that in RFC 4627. By a large margin. This is not about possibly finding bugs in the hodge-podge that ECMA-404 is; thank you for offering to do ECMA's work here, but I'm not biting. This is about making sure from a start that the spec is right. Making 4627bis reference ECMA-404 would be a major regression. There is no reason for incurring this regression seven years after it already was done right. The IETF isn't known for doing things that are unjustifiable on a technical level.

Stewardship

You mention that there is significant programming language expertise in TC39. I couldn't agree more (actually, the previous sentence is a wild understatement), and I can say that I have been using ECMA-262 (ES3 is the last version with which I have significant experience) as a model for programming language standards.

My point was not at all about lack of experience, it is about attention. By its nature, TC39's work on JSON will always focus on the narrower needs of JavaScript. That makes TC39 less qualified for the stewardship of a standard that transcends language barriers than one might like to admit. I'm not going to illustrate this point further; there is ample illustration in the mailing list archives.

Way forward

As always, only the chairs can speak for the JSON WG, and even they need to confirm any needed consensus in the WG beforehand. But I think I can say that we are still only guessing what TC39 is trying to achieve with the sudden creation of ECMA-404. I think we need to have a frank discussion about the objectives of further work on JSON. The JSON WG has a charter that defines its objectives, which is focusing on stability and interoperability. I'd like to understand TC39's objectives with respect to JSON, so we can find out whether there is common ground or not.

# Jorge Chamorro (10 years ago)

On 08/12/2013, at 16:26, Carsten Bormann wrote:

Way forward

As always, only the chairs can speak for the JSON WG, and even they need to confirm any needed consensus in the WG beforehand. But I think I can say that we are still only guessing what TC39 is trying to achieve with the sudden creation of ECMA-404. I think we need to have a frank discussion about the objectives of further work on JSON. The JSON WG has a charter that defines its objectives, which is focusing on stability and interoperability. I'd like to understand TC39's objectives with respect to JSON, so we can find out whether there is common ground or not.

Here's the message from the very same inventor of JSON telling exactly what ECMA-404 is "trying to achieve". Hope it helps:

Begin forwarded message:

From: Douglas Crockford <douglas at crockford.com>

Date: 13 de junio de 2013 17:50:33 GMT+02:00

To: "json at ietf.org" <json at ietf.org>

Subject: [Json] Two Documents

The confusion and controversy around this work is due to a mistake that I made in RFC 4627. The purpose of the RFC, which is clearly indicated in the title, was to establish a MIME type. I also gave a description of the JSON Data Interchange Format. My mistake was in conflating the two, putting details about the MIME type into the description of the format. My intention was to add clarity. That obviously was not the result.

JSON is just a format. It describes a syntax of brackets and commas that is useful in many contexts, profiles, and applications. JSON is agnostic about all of that stuff. JSON shouldn't even care about character encoding. Its only dependence on Unicode is in the hex numbers used in the \u notation. JSON can be encoded in ASCII or EBCDIC or even Hollerith codes. JSON can be used in contexts where there is no character encoding at all, such as paper documents and marble monuments.

There are uses of JSON however in which such choices matter, and where behavior needs to be attached to or derived from the syntax. That is important stuff, and it belongs in different documents. Such documents will place necessary restrictions on JSON's potential. No such document can fit all applications, which causes much of the controversy we've seen here. One size cannot fit all. JSON the format is universal. But real applications require reasonable restrictions.

So we should be working on at least two documents, which is something we have discussed earlier. The first is The JSON Data Interchange Format, which is a simple grammar. The second is a best practices document, which recommends specific conventions of usage.

# Allen Wirfs-Brock (10 years ago)

On Dec 8, 2013, at 7:44 AM, John Cowan wrote:

Allen Wirfs-Brock scripsit:

You are correct that ES5 does not define the for-in enumeration order. But it does say that the Object.keys ordering must be the same as the for-in enumeration order, and there is a de facto standard for a partial enumeration order that all browsers implement.

In short, half a dozen or so JSON implementations in a JavaScript environment agree. That hardly means that all other JSON implementations in whatever environment should be dragged along with them.

Right, from an interoperability perspective, the half dozen or so user agents used by essentially the entire world to run web applications really aren't significant at all...

# Allen Wirfs-Brock (10 years ago)

On Dec 7, 2013, at 11:00 PM, Martin J. Dürst wrote:

On 2013/12/08 9:49, John Cowan wrote:

Tim Bray scripsit:

I assume all parties to the discussion know that in 100% of all programming-language libraries in use for dealing with JSON as transmitted on the wire, JSON objects are loaded into hash tables or dicts or whatever your language idiom is, and there is no way for software using those libraries to discover what order they were serialized in,

Well, no, not 100%. In Lisp-family languages, JSON objects are often deserialized to ordered constructs. Nevertheless:

Similarly, as of somewhere around version 1.9.x or 2.0, Hash entries in Ruby are ordered, and one would assume that the original order in JSON would be reflected in the order of the Hash entries.

any suggestion that this order should be considered significant for application usage would be regarded as insane.

+1 to that.

+1 here, too.

Millions of web developers write code with these sorts of dependencies. Not because they are insane; more often it is because they are unaware of the pitfalls. However, it's not an interoperability issue if they are writing web application code that is only intended to run in a web browser and all browsers behave the same on that code.

More broadly, the JSON language binding parsers that I'm most familiar with do not generate a high fidelity view of all valid JSON texts that they are presented with. It would be a mistake to depend upon such parsers to interchange data using JSON schemas that assign meaning to the ordering of object members. However, that would not necessarily be the case for an application that is using a streaming JSON parser.

Consider this informal description of a data schema that is representable using JSON.

Conversation Schema: A Conversation is a JSON text consisting of a single JSON object. The object may have an arbitrary number of members. The members represent a natural language conversation where the key part of each member identifies a participant in the conversation and the value part of each member is a JSON string value that captures a statement by the associated participant. Multiple members may have the same key value, corresponding to multiple statements by the same participant. The order of the members corresponds to the order in which the statements were made during the conversation.

And here is an example of such a JSON text:

{
"allenwb":  "there is an objectively observable order to the members of a JSON object",
"JSON WG participant 1":  "It would be insane to depend upon that ordering",
"allenwb":  "not if there is agreement between a producer and consumer on the meaning of the ordering",
"JSON WG participant 2":  "But JSON.parse and similar language bindings don't preserve order",
"allenwb":  "A streaming JSON parser would naturally preserve member order",
"JSON WG participant 2": "I din't think there are any such parsers",
"allenwb": "But someone might decide to create one, and if they do it will expose object members, in order",
"allenwb": "Plus, in this particular case the schema is so simple the application developer might well design to write a custom, schema specific streaming parser."
}

# Allen Wirfs-Brock (10 years ago)

On Dec 7, 2013, at 11:09 PM, Martin J. Dürst wrote:

[Somebody please forward this message to es-discuss at mozilla.org, unless it's not rejected.]

On 2013/12/08 8:05, Allen Wirfs-Brock wrote:

In JSON, objects are unordered collections or sets of name/value pairs. It says so right there on json.org (“sets”), and it says so in RFC 4627 (“collections”)*). We may not like it, but it has been a promise for a decade. We need to heed it. (Another promise was that JSON doesn’t change**).)

You also need to look at objective reality and consider the possibility that the informal (and non-normative text) on both the json.org website and in the original RFC never actually matched reality.

JSON is derived from JavaScript (whose standard is ECMA-262) and since 2009, ECMA-262 (and its clone ISO/IEC-16262) has included a normative specification for parsing JSON text that includes an ordering semantics for object members.

RFC 4627 was published in July 2006, so the ECMA-262 version of 2009 may not be very relevant.

My understanding was that one of the reasons for activating the JSON WG was the perceived need for a JSON grammar that could be normatively referenced. RFC 4627 (2006) was not a normative document. ECMA-262, 5th Edition (2009) aka ISO/IEC-16262-3 (2011) is a normative document. So is ECMA-404.

# Allen Wirfs-Brock (10 years ago)

On Dec 7, 2013, at 11:21 PM, Nico Williams wrote:

On Sun, Dec 08, 2013 at 04:09:04PM +0900, "Martin J. Dürst" wrote:

On 2013/12/08 8:05, Allen Wirfs-Brock wrote:

JSON is derived from JavaScript (whose standard is ECMA-262) and since 2009, ECMA-262 (and its clone ISO/IEC-16262) has included a normative specification for parsing JSON text that includes an ordering semantics for object members.

RFC 4627 was published in July 2006, so the ECMA-262 version of 2009 may not be very relevant.

ECMA-262 may well have been a codification of older behavior -- there's no winning this sort of argument. If there really is an irreconcilable divergence as-deployed, then we ought to document that. A simple addition to the last paragraph (or a new paragraph after it) of section 4 of RFC4627bis-08 should suffice.

I agree. Fundamentally I have been asking what is the technical meaning of the statement "an object is an unordered collection" that occurs in section 1 (Introduction) of the current draft for 4627bis. No one has yet responded to my question regarding whether or not statements in the Introduction are considered normative.

Section 4 which presumably is supplying the normative specification of a JSON object currently says nothing about member ordering.

If the intent is for the statement in the introduction to have some normative meaning, then please make that meaning technically clear in section 4.

# Carsten Bormann (10 years ago)

On 08 Dec 2013, at 19:57, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

{
"allenwb":  "there is an objectively observable order to the members of a JSON object",
"JSON WG participant 1":  "It would be insane to depend upon that ordering",
"allenwb":  "not if there is agreement between a producer and consumer on the meaning of the ordering",
"JSON WG participant 2":  "But JSON.parse and similar language bindings don't preserve order",
"allenwb":  "A streaming JSON parser would naturally preserve member order",
"JSON WG participant 2": "I din't think there are any such parsers",
"allenwb": "But someone might decide to create one, and if they do it will expose object members, in order",
"allenwb": "Plus, in this particular case the schema is so simple the application developer might well design to write a custom, schema specific streaming parser."
}

Which by at least one JSON decoder*) is decoded as:

allenwb: Plus, in this particular case the schema is so simple the application developer
  might well design to write a custom, schema specific streaming parser.
JSON WG participant 1: It would be insane to depend upon that ordering
JSON WG participant 2: I din't think there are any such parsers

(For readability, this one encoded in YAML, another JSON extension.)

Nice demonstration of the point here.

*) ruby -rjson -ryaml -e 'puts JSON.parse(File.read("allen.json")).to_yaml' >allen.yaml

# Nick Niemeir (10 years ago)

On Sun, Dec 8, 2013 at 10:57 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

However, that would not necessarily be the case for an application that is using a streaming JSON parser.

{
"allenwb":  "there is an objectively observable order to the members of a
JSON object",
"JSON WG participant 1":  "It would be insane to depend upon that
ordering",
"allenwb":  "not if there is agreement between a producer and consumer on
the meaning of the ordering",
"JSON WG participant 2":  "But JSON.parse and similar language bindings
don't preserve order",
"allenwb":  "A streaming JSON parser would naturally preserve member
order",
"JSON WG participant 2": "I din't think there are any such parsers",
"allenwb": "But someone might decide to create one, and if they do it will
expose object members, in order",
"allenwb": "Plus, in this particular case the schema is so simple the
application developer might well design to write a custom, schema specific
streaming parser."
}

One good example of a streaming parser is the npm package JSONStream. If you wanted to accept conversations on standard in and output allenwb's statements on standard out you could use this node program:

var JSONStream = require('JSONStream')

process.stdin
  .pipe(JSONStream.parse('allenwb'))
  .pipe(process.stdout)

With this JSON text the output is the expected:

there is an objectively observable order to the members of a JSON objectnot
if there is agreement between a producer and consumer on the meaning of the
orderingA streaming JSON parser would naturally preserve member orderBut
someone might decide to create one, and if they do it will expose object
members, in orderPlus, in this particular case the schema is so simple the
application developer might well design to write a custom, schema specific
streaming parser.

# Bjoern Hoehrmann (10 years ago)

Allen Wirfs-Brock wrote:

{
"allenwb":  "there is an objectively observable order to the members of a JSON object",
"JSON WG participant 1":  "It would be insane to depend upon that ordering",
"allenwb":  "not if there is agreement between a producer and consumer on the meaning of the ordering",
"JSON WG participant 2":  "But JSON.parse and similar language bindings don't preserve order",
"allenwb":  "A streaming JSON parser would naturally preserve member order",
"JSON WG participant 2": "I din't think there are any such parsers",
"allenwb": "But someone might decide to create one, and if they do it will expose object members, in order",
"allenwb": "Plus, in this particular case the schema is so simple the application developer might well design to write a custom, schema specific streaming parser."
}

There is observable white space outside strings in JSON texts. It would be insane to depend on the placement of white space outside strings. Not if there is agreement on the meaning of that white space. Most parsers do not preserve such white space. A generic ABNF parser would naturally preserve it...

It is quite possible that there are steganographic or cryptographic protocols that use insignificant white space in JSON texts as a subtle form of communication or for integrity protection, just like they might use the order of object members for the same purpose.

However, what we are discussing here is what people should assume when we say "We use JSON!" so there do not have to be detailed negotiations to establish agreements, i.e., a Standard. And people should very much assume that the ordering of object members is as insignificant as the placement of white space outside strings.

# Carsten Bormann (10 years ago)

On 09 Dec 2013, at 05:10, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:

In the original text, neither are these two usages disambiguated, nor is there any explanation about where the "10" is coming from or how it has to be used.

I think this is symptomatic of a larger problem that we occasionally fall for when writing specs.

ECMA-404 appears to be a textbook example of a “trapdoor spec” — if you already know what it is supposed to say, then it reads fine, but if you approach it as a fresh spec, it is undecipherable, as it relies on tacit knowledge to connect the dots.

Now in this case that may not be as big a problem because everybody already does know what JSON is*). I’m still not thrilled to use it as a normative reference.

More importantly, reducing JSON to its surface syntax, and removing a few points about the data model (even though much of it remains in the form of allusions) opens the door to forking the data model. This will allow all kinds of cool things to be done by repurposing the JSON syntax, but will damage the JSON ecosystem that is built around that data model.

One wonders whether that is the point.

Grüße, Carsten

*) Here specifically, we all know how to write numbers in programming languages, and (as long as you don’t address the hard problems like exactness) the idiosyncratic syntax details (decimal only, no leading zeroes on mantissa, no plus, but leading zeroes or plus are allowed on the exponent, E can be upper or lower case) are all that is needed to detail this spec, even though there is much more to actual interoperability. Few implementers will get the semantics wrong from that skimpy spec.
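
For what it's worth, those constraints fit in a one-line ECMAScript pattern (a sketch for illustration only; the name jsonNumber is made up here, and the ABNF/racetracks remain the actual definitions):

// Decimal only, no leading zeroes and no plus on the mantissa; the exponent
// may carry a sign and leading zeroes, and the E may be upper or lower case.
var jsonNumber = /^-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?$/;

["0", "-0", "1.5", "1e-007", "12E+34"].every(function (s) {
  return jsonNumber.test(s);
});   // true
["+1", "01", ".5", "1.", "0x1F"].some(function (s) {
  return jsonNumber.test(s);
});   // false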

# Carsten Bormann (10 years ago)

So what's the reason you talk about two levels?

If you interpret the first three racetracks as generating a sequence of characters, or the last two as generating a sequence of tokens, you get the wrong result.

RFC 4627 does that implicitly by saying "The representation of numbers is similar to that used in most programming languages.".)

That's not very precise either, but it's at least telling the reader where to look further if s/he doesn't understand what's intended.

Actually, to the extent that RFC 4627 does define JSON's data model, the result of this simple statement is surprisingly precise. It only stops helping you much when you reach the limits of precision or range (e.g., what to do with 1e400.)
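
For instance, in an ECMAScript console (a sketch of one binding's behavior, not of the format itself):

// Number tokens become IEEE-754 doubles in this binding, so the range
// limit shows up immediately, and the result cannot round-trip back out:
JSON.parse("1e400");     // Infinity
JSON.stringify(1e400);   // "null"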

Another problem is that it's not scalable, in the sense that it won't work anymore if everybody would do it.

Right. But then, section 11.8.3.1 of the ES6 draft is an example for why it is tedious to do this. (It is also, I believe, a nice example how easy it would be to get this wrong and that nobody would actually notice a mistake buried in there, unless they do the work to systematically check every detail or to translate it into a machine-checkable form. Fortunately, our number system is relatively stable; I’d hate to maintain a spec that has this level of tedium on something that actually evolves. For added fun, compare with 7.1.3.1.1, which is mostly saying the same thing, but does it in a subtly different way. That’s why ES6 is 531 pages...)

I'm not planning to do any work. I was just trying to point out that the technical work is not that difficult (after some leaps of faith to take the 'most obvious' interpretation of racetracks,…).

Yep. But if nobody does that work (or, more precisely, admits to having done that work), we simply don’t know whether the statement that triggered this little subthread is true or not. I have made too many stupid mistakes in seemingly simple specs that became obvious only as soon as I used a tool to check the spec.

# Allen Wirfs-Brock (10 years ago)

On Dec 9, 2013, at 3:53 AM, Carsten Bormann wrote:

So what's the reason you talk about two levels?

If you interpret the first three racetracks as generating a sequence of characters, or the last two as generating a sequence of tokens, you get the wrong result.

RFC 4627 does that implicitly by saying "The representation of numbers is similar to that used in most programming languages.".)

That's not very precise either, but it's at least telling the reader where to look further if s/he doesn't understand what's intended.

Actually, to the extent that RFC 4627 does define JSON's data model, the result of this simple statement is surprisingly precise. It only stops helping you much when you reach the limits of precision or range (e.g., what to do with 1e400.)

Another problem is that it's not scalable, in the sense that it won't work anymore if everybody would do it.

Right. But then, section 11.8.3.1 of the ES6 draft is an example for why it is tedious to do this. (It is also, I believe, a nice example how easy it would be to get this wrong and that nobody would actually notice a mistake buried in there, unless they do the work to systematically check every detail or to translate it into a machine-checkable form. Fortunately, our number system is relatively stable; I’d hate to maintain a spec that has this level of tedium on something that actually evolves. For added fun, compare with 7.1.3.1.1, which is mostly saying the same thing, but does it in a subtly different way. That’s why ES6 is 531 pages...)

I'm not planning to do any work. I was just trying to point out that the technical work is not that difficult (after some leaps of faith to take the 'most obvious' interpretation of racetracks,…).

Yep. But if nobody does that work (or, more precisely, admits to having done that work), we simply don’t know whether the statement that triggered this little subthread is true or not. I have made too many stupid mistakes in seemingly simple specs that became obvious only as soon as I used a tool to check the spec.

Grüße, Carsten

I want to address a few points brought up in this subthread, primarily between Carsten and Martin.

First, Syntax Diagrams (aka RailRoad Diagrams, and called racetracks in this thread) are a well known formalism for expressing a context free grammar. For example see en.wikipedia.org/wiki/Syntax_diagram. Any competent software engineer should be able to recognize and read a syntax diagram of this sort. There is no mystery about them. Any grammar that can be expressed using BNF can also be expressed using a Syntax Diagram although I think most would agree that BNF is a better alternative for large grammars.

This whole issue of the use of Syntax Diagrams rather than BNF is a stylistic debate that is hard to take seriously. If TC39 informed you that we are converting the notation used in ECMA-404 to a BNF formalism would that end the objections to normatively referencing ECMA-404 from 4627bis? Unfortunately, I'm pretty sure it wouldn't.

Regarding the use of a multi-level definition within ECMA-404: that is a standard practice within language specifications, where the "tokens" of a language are often described using a FSM level formalism and the syntactic structure is described using a PDA level formalism. However, there is nothing that prevents a PDA level abstraction such as a BNF from being used to describe "tokens" even when the full power of a PDA isn't used. The ECMA-262 specification is an example of a language specification that uses a BNF to describe both its lexical and syntactic structure.

In the case of ECMA-404, clause 4 is clearly defining the lexical level of the language (it is talking about "tokens") and it clearly states that numbers and strings are tokens. Hence there is no ambiguity about how to interpret the syntax diagrams for number and string in clauses 8 and 9. None of the subelements of diagrams are "tokens" so there is no plausible way they could be misconstrued as generating or recognizing a sequence of tokens.

The only normative purpose of the first paragraph in clause 8 (Numbers) is to identify the code points that are symbolically referenced by the Syntax diagram. Everything else in that paragraph is either redundant (described by the diagram) or pseudo-semantics that are outside the scope of what ECMA-404 defines.

This is a common problem seen in many specifications that try to clarify a formalism with supplementary prose and instead end up sowing confusion. If a bug is filed against this for ECMA-404 it will probably be cleaned up in the next edition. Note that the current 4627bis draft is very similar in this regard. It talks about an "exponent part" without defining that term (it doesn't appear in the grammar). It doesn't specify how to actually interpret a number token as a mathematical value or how to generate one from a mathematical value. It only says that JSON numbers are similar to those in most programming languages (which includes a very wide range of possibilities).

Specs can have both technical and editorial bugs. If you think there are bugs in ECMA-404 the best thing to do is to submit a bug ticket at bugs.ecmascript.org. If there is a critical bug that you think prevents 4627bis from normatively referencing ECMA-404 say so and assign the bug a high priority in the initial ticket. But please, start with actual errors, ambiguities, inconsistencies, or similar substantive issues. Stylistic issues won't be ignored but they are less important and harder to reach agreement on.

# Bjoern Hoehrmann (10 years ago)
  • Allen Wirfs-Brock wrote:

This whole issue of the use of Syntax Diagrams rather than BNF is a stylistic debate that is hard to take seriously. If TC39 informed you that we are converting the notation used in ECMA-404 to a BNF formalism would that end the objections to normatively referencing ECMA-404 from 4627bis? Unfortunately, I'm pretty sure it wouldn't.

If TC39 said ECMA-404 is going to be replaced by a verbatim copy of the ABNF grammar in draft-ietf-json-rfc4627bis-08 with pretty much no other discussion of JSON and a clear indication that future editions will not add such discussion, and will not change the grammar without IETF consensus, I would be willing to entertain the idea of making ECMA-404 a normative reference.

How soon would TC39 be able to make such a decision and publish a revised edition of ECMA-404 as described above?

# Brendan Eich (10 years ago)

bugs.ecmascript.org -- please use it, you will be amazed at how quickly the bug is resolved. Thanks,

# Allen Wirfs-Brock (10 years ago)

On Dec 9, 2013, at 5:40 PM, Bjoern Hoehrmann wrote:

  • Allen Wirfs-Brock wrote:

This whole issue of the use of Syntax Diagrams rather than BNF is a stylistic debate that is hard to take seriously. If TC39 informed you that we are converting the notation used in ECMA-404 to a BNF formalism would that end the objections to normatively referencing ECMA-404 from 4627bis? Unfortunately, I'm pretty sure it wouldn't.

If TC39 said ECMA-404 is going to be replaced by a verbatim copy of the ABNF grammar in draft-ietf-json-rfc4627bis-08 with pretty much no other discussion of JSON and a clear indication that future editions will not add such discussion, and will not change the grammar without IETF consensus, I would be willing to entertain the idea of making ECMA-404 a normative reference.

Note that ECMA-404 already says (in the introduction):

"It is expected that other standards will refer to this one, strictly adhering to the JSON text format, while imposing restrictions on various encoding details. Such standards may require specific behaviours. JSON itself specifies no behaviour.

Because it is so simple, it is not expected that the JSON grammar will ever change. This gives JSON, as a foundational notation, tremendous stability."

The second paragraph is speaking about the language described by the grammar, not the actual formalism used to express the grammar. I'm quite sure that there is no interest at all within TC39 to ever change the actual JSON language. If you are looking for some sort of contractual commitment from ECMA, I suspect you are wasting your time. Does the IETF make such commitments?

TC39 is a consensus based organization so I can't make commitments for it or the ECMA-404 project editor. But, let me quote two previous statements I've made on this thread concerning the grammar notation:

"It's silly to be squabbling over such a notational issues and counter-productive if such squabbles results multiple different normative standards for the same language/format. TC39 would likely be receptive to a request to add to ECMA-404 an informative annex with a BNF grammar for JSON (even ABNF, even though it isn't TC39's normal BNF conventions). Asking is likely to produce better results than throwing stones."

"The position stated by TC39 that ECMA-404 already exists as a normative specification of the JSON syntax and we have requested that RFC4627bis normatively reference it as such and that any restatement of ECMA-404 subject matter should be marked as informative. We think that dueling normative specifications would be a bad thing. Seeing that the form of expression used by ECMA-404 seems to be a issue for some JSON WG participants I have suggested that TC39 could probably be convinced to revise ECMA-404 to include a BNF style formalism for the syntax. If there is interest in this alternative I'd be happy to champion it within TC39."

This doesn't mean that TC39 would necessarily agree to eliminate the Syntax Diagrams, or that we wouldn't carefully audit any grammar contribution to make sure that it is describing the same language. There may also be minor issues that need to be resolved. But we seem to agree that we already are both accurately describing the same language so this is really about notational agreement.

How soon would TC39 be able to make such a decision and publish a revised edition of ECMA-404 as described above?

As a base line, ECMA-404 was created in less than a week. It takes a couple months to push through a letter ballot to approve a revised standard.

# James Clark (10 years ago)

On Fri, Dec 6, 2013 at 2:51 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

The static semantics of a language are a set of rules that further restrict which sequences of symbols form valid statements within the language. For example, a rule that the 'member' names must be disjoint within an 'object' production could be a static semantic rule (however, there is intentionally no such rule in ECMA-404).

The line between syntax and static semantics can be fuzzy. Static semantic rules are typically used to express rules that cannot be technically expressed using the chosen syntactic formalism or rules which are simply inconvenient to express using that formalism. For example, the editor of ECMA-404 chose to simplify the RR track expression of the JSON syntax by using static semantic rules for whitespace rather than incorporating them into RR diagrams.

Another form of static semantic rules are equivalences that state when two or more different sequences of symbols must be considered as equivalent. For example, the rules that state equivalencies between escape sequences and individual code points within an JSON 'string'. Such equivalences are not strictly necessary at this level, but it it simplifies the specification of higher level semantics if equivalent symbol sequences can be normalized at this level of specification.

When we talk about the "semantics" of a language (rather than "static semantics") we are talking about attributing meaning (in some domain and context) to well-formed (as specified via syntax and static semantics) statements expressed in that language.

...

What we can do, is draw a bright-line just above the level of static semantics. This is what ECMA-404 attempts to do.

I don't see how you can accommodate the second kind of static semantic rule within the definition of conformance that you have chosen for ECMA-404. Section 2 defines conformance in terms of whether a sequence of Unicode code points conforms to the grammar. This doesn't even accommodate the first kind of static semantic rule, but it is obviously easy to extend it so that it does. However, to accommodate the second kind of static semantic rule, you would need a notion of conformance that deals with how conforming parsers interpret a valid sequence of code points.

I think it is coherent to draw a bright-line just above the first level of static semantics. If you did that, then most of the prose of section 9 (on Strings) would have to be removed; but this would be rather inconvenient, because most specifications of higher-level semantics would end up having to specify it themselves.

However, I find it hard to see any bright-line above the second level of static semantics and below semantics generally. Let's consider section 9. I would argue that this section should define a "semantics" for string tokens, by defining a mapping from sequences of code points matching the production string (what I would call the "lexical space") into arbitrary sequences of code points (what I would call the "value space"). The spec sometimes seems to be doing this and sometimes seems to be doing something more like your second kind of static semantics. Sometimes it uses the term "code point" or "character" to refer to code points in the lexical space ("A string is a sequence of Unicode code points wrapped with quotation marks"), and sometimes it uses those terms to refer to code points in the value space ("Any code point may be represented as a hexadecimal number"). You could redraft so that it was expressed purely in terms of code points in the lexical space, but that would be awkward and unnatural: for example, an hexadecimal escape would represent either one or two code points in the lexical space. Furthermore I don't see what you would gain by this. Once you talk about equivalences between sequences, you are into semantics and you need a richer notion of conformance.

So back to "semantics" and why ECMA-404 tries (perhaps imperfectly) to avoid describing JSON beyond the level of static semantics.

ECMA-404 sees JSON as "a text format that facilitates structured data interchange between all programming languages. JSON is syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications".

There are many possible semantics and categories of semantics that can be applied to well-formed statements expressed using the JSON syntax.

...

The problem with trying to standardize JSON semantics is that the various semantics that can be usefully be imposed upon JSON are often mutually incompatible with each other. At a trivial level, we see this with issues like the size of numbers or duplicate object member keys. It is very hard to decide whose semantics are acceptable and whose is not.

I would argue that ECMA-404 should define the least restrictive reasonable semantics: the semantics should not treat as identical any values that higher layers might reasonably want to treat as distinct. This is not the one, true JSON semantics: it is merely a semantic layer on which other higher-level semantic layers can in turn be built. I don't think it's so hard to define this:

  1. a value is an object, array, number, string, boolean or null.

  2. an object is an ordered sequence of <string, value> pairs

  3. an array is an ordered sequence of values

  4. a string is an ordered sequence of Unicode code points

Item 2 may be surprising to some people, but there's not really much choice given that JS preserves the order of object keys. The difficult case is number. But even with number, I would argue that there are clearly some lexical values that can uncontroversially be specified to be equivalent (for example, 1e1 with 1E1 or 1e1 with 1e+1). A set of decisions on lexical equivalence effectively determines a value space for numbers. For example, you might reasonably decide that two values are equivalent if they represent real numbers with the same mathematical value.
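
As a small illustration of that equivalence in the ECMAScript binding (exact equality of doubles only approximates "same mathematical value", so treat this as a sketch):

// 1e1, 1E1, 1e+1, 10 and 10.0 all denote the same mathematical value,
// and JSON.parse maps each of them to the same IEEE-754 double.
["1e1", "1E1", "1e+1", "10", "10.0"].every(function (s) {
  return JSON.parse(s) === 10;
});   // true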

If ECMA-404 doesn't provide such a semantic layer, it becomes quite challenging for higher-level language bindings to specify their semantics in a truly rigorous way. Take strings for example. I think by far the cleanest way to rigorously define a mapping from string tokens to sequences of code points is to have a BNF and a syntax-directed mapping as the ECMAScript spec does very nicely in 7.8.4 (www.ecma-international.org/ecma-262/5.1/#sec-7.8.4). If ECMA-404 provides merely a syntax and a specification of string equivalence, it becomes quite a challenge to draft a specification that somehow expresses the mapping while still normatively relying on the ECMA-404 spec for the syntax. What will happen in practice is that these higher level mappings will not be specified rigorously.
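
For illustration, a rough sketch of what such a lexical-space-to-value-space mapping looks like for string tokens (the helper name jsonStringValue is invented here; ECMAScript 7.8.4 and JSON.parse remain the real definitions, and surrogate pairs are simply left as two code units):

// Map a JSON string token (including its quotes) to the sequence of UTF-16
// code units it denotes; assumes the token already matches the grammar.
function jsonStringValue(token) {
  var escapes = { '"': '"', '\\': '\\', '/': '/', 'b': '\b',
                  'f': '\f', 'n': '\n', 'r': '\r', 't': '\t' };
  var out = "";
  for (var i = 1; i < token.length - 1; i++) {   // skip the outer quotes
    var c = token.charAt(i);
    if (c !== '\\') { out += c; continue; }
    var d = token.charAt(++i);
    if (d === 'u') {                             // \uXXXX escape
      out += String.fromCharCode(parseInt(token.substr(i + 1, 4), 16));
      i += 4;
    } else {
      out += escapes[d];                         // one of " \ / b f n r t
    }
  }
  return out;
}

jsonStringValue('"a\\u0062\\nc"');   // "ab\nc" (four code points)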

I think ECMA-404 would be significantly more useful for its intended purpose if it provided the kind of semantics I am suggesting.

I know XML is not very fashionable these days but we have a couple of decades of experience with XML and SGML which I think do have some relevance to a specification of "structured data interchange between programming language". One conclusion I would draw from this experience is that the concept of an XML Infoset or something like it is very useful. Most users of XML deal with higher-level semantic abstractions rather than directly with the XML Infoset, but it has proven very useful to be able to specify these higher-level semantic abstractions in terms of the XML Infoset rather than having to specify them directly in terms of the XML syntax. Another conclusion I would draw is that it would have worked much better to integrate the XML Infoset specification into the main XML specification. The approach of having a separate XML Infoset specification has meant that there is no proper rigorous specification how to map from the XML syntax to the XML Infoset (it seems to be assumed to be so obvious that it does not need stating). I tried an integrated approach of specifying the syntax and data model together in the MicroXML spec ( dvcs.w3.org/hg/microxml/raw-file/tip/spec/microxml.html), and I think it works much better. The current approach of ECMA-404 is a bit like that of the XML Recommendation: it pretends at times to be just specifying when a sequence of code points is valid, and yet the specification contains a fairly random selection of statements of how a valid sequence should be interpreted.

James

# Carsten Bormann (10 years ago)

On 10 Dec 2013, at 01:32, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

Stylistic issues

Well, for 4627bis, we have tools that allowed us to fuzz the ABNF against a set of existing JSON implementations. This is the kind of care I expect from spec writers. Nobody has fessed up to having done equivalent work for ECMA-404. Matter of style? Yes, but in quite another sense.

Grüße, Carsten

# Carsten Bormann (10 years ago)

On 10 Dec 2013, at 07:52, James Clark <jjc at jclark.com> wrote:

Infoset

Not a bad idea to lead us out of this quagmire.

So a JSON infoset would capture a processed AST, but not yet the transformation to the data model level.

JSON implementations would create the JSON data model from that infoset (typically without actually reifying the latter as an AST), and JSON extensions like ECMAscript's would be free to do whatever they want. It is just important to distinguish the two, so people don’t confuse the data model with the infoset, or think that a JSON implementation needs to provide access to the infoset.

Re the infoset for JSON numbers: That is clearly a rational, expressed as a pair of two integers: a numerator and a (power of ten) denominator. (JSON cannot express any other rationals, or any irrationals for that matter.)

1.23 is [123, 100]
1.5 is [15, 10]
1e4 is [10000, 1]
1e-4 is [1, 10000]

Now one could argue whether the infoset should distinguish 1 and 1.0. Naively, that would be:

1 is [1, 1]
1.0 is [10, 10]

I’d argue that you want to reduce toward the denominator being the minimal power of ten, i.e.:

1 is [1, 1]
1.0 is [1, 1]
1.5 is [15, 10]
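
A small sketch of that reduction (the helper name toRational is made up; it assumes a number token that already matches the grammar and only does exact arithmetic while the integers fit in a double):

// Map a JSON number token to [numerator, denominator] with the denominator
// a minimal power of ten.
function toRational(token) {
  var m = /^(-?)(\d+)(?:\.(\d+))?(?:[eE]([+-]?\d+))?$/.exec(token);
  var sign = m[1] === "-" ? -1 : 1;
  var digits = m[2] + (m[3] || "");                           // mantissa digits
  var exp = (m[4] ? parseInt(m[4], 10) : 0) - (m[3] ? m[3].length : 0);
  var num = sign * parseInt(digits, 10);
  while (exp < 0 && num % 10 === 0) { num /= 10; exp += 1; }  // strip trailing zeroes
  while (exp > 0) { num *= 10; exp -= 1; }                    // fold positive exponents in
  return [num, Math.pow(10, -exp)];
}

toRational("1.23");   // [123, 100]
toRational("1e-4");   // [1, 10000]
toRational("1.0");    // [1, 1]
toRational("1.5");    // [15, 10]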

Grüße, Carsten

# James Clark (10 years ago)

On Tue, Dec 10, 2013 at 4:12 PM, Carsten Bormann <cabo at tzi.org> wrote:

So a JSON infoset would capture a processed AST, but not yet the transformation to the data model level.

JSON implementations would create the JSON data model from that infoset (typically without actually reifying the latter as an AST), and JSON extensions like ECMAscript's would be free to do whatever they want. It is just important to distinguish the two, so people don’t confuse the data model with the infoset, or think that a JSON implementation needs to provide access to the infoset.

I agree it would reduce confusion to use a different term for the infoset versus the data model. "Infoset"/"data model" is one possible choice of terms, though I wonder whether the XML heritage of "infoset" might be off-putting to many. Another possibility would be "abstract data model"/"concrete data model".

I’d argue that you want to reduce toward the denominator being the minimal power of ten, i.e.:

1 is [1, 1]
1.0 is [1, 1]
1.5 is [15, 10]

That would be my preference too.

The only thing that makes me hesitate is that I could imagine implementations that distinguish integers and floats, and use C-style rules to distinguish the two. For example, 1 is an integer but 1.0 or 1e0 is a float. I don't know whether any such implementations exist.
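
A sketch of that C-style rule applied to the lexeme itself (the function name is invented; this is one plausible classification, not a claim about any particular library):

// A lexeme with a fraction or an exponent reads as a float; otherwise integer.
function classifyNumberLexeme(lexeme) {
  return /[.eE]/.test(lexeme) ? "float" : "integer";
}

classifyNumberLexeme("1");     // "integer"
classifyNumberLexeme("1.0");   // "float"
classifyNumberLexeme("1e0");   // "float"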

James

# Carsten Bormann (10 years ago)

On 10 Dec 2013, at 07:52, James Clark <jjc at jclark.com> wrote:

Most users of XML deal with higher-level semantic abstractions rather than directly with the XML Infoset, but it has proven very useful to be able to specify these higher-level semantic abstractions in terms of the XML Infoset rather than having to specify them directly in terms of the XML syntax.

The XML infoset is very much tied to the needs (and idiosyncrasies) of the serialization that XML uses. There are many ways this infoset is mapped into the data model used by an XML-based application.

The main innovation of JSON was to actually supply such a data model as part of the format. I would argue that this property was what made JSON “win” over XML.

Turning back the clock and trying to use JSON as a conveyer of an infoset instead of using it with its data model could be considered unproductive. On the other hand, some people want to do alternative data models with the JSON syntax, so maybe standardization has to cater for that.

One of the reasons many people react so violently to such a proposal is that it is bound to cause confusion that these alternative data models are now also “JSON data models”, reducing the value of the JSON data model as the linchpin of interoperability.

I don’t know how to counteract that confusion while also enabling the use of alternative data models by the definition of the infoset. But maybe we can find a way.

Grüße, Carsten

# Carsten Bormann (10 years ago)

On 10 Dec 2013, at 12:39, James Clark <jjc at jclark.com> wrote:

The only thing that makes me hesitate is that I could imagine implementations that distinguish integers and floats, and use C-style rules to distinguish the two. For example, 1 is an integer but 1.0 or 1e0 is a float. I don't know whether any such implementations exist.

Absolutely, they do, and they all differ in how exactly they do the distinction. www.ietf.org/mail-archive/web/json/current/msg01523.html

This is a cause of real interoperability problems.

The question is how to find a way out of that maze of different interpretations. There is no way this can be done so that none of them “breaks”.

It may seem natural to stick to the way numbers are interpreted in many programming languages that distinguish floating point values from integer values. However, JavaScript doesn’t, so it can’t supply guidance. And that leads to exactly the problem documented in jira.talendforge.org/browse/TDI-26517 — interoperability broken when a non-distinguishing sender accidentally chooses the representation that triggers the wrong behavior at the receiver.

It is probably better to suggest handling 1.0 as 1.

When we are done with that, there is still negative zero. www.ietf.org/mail-archive/web/json/current/msg01661.html
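
For the ECMAScript binding, a quick sketch of how negative zero behaves (in current engines, at least):

var nz = JSON.parse("-0");
1 / nz === -Infinity;    // true: the parsed value really is negative zero
JSON.stringify(nz);      // "0", so the sign does not survive a round trip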

Grüße, Carsten

# Bjoern Hoehrmann (10 years ago)
  • Allen Wirfs-Brock wrote:

On Dec 9, 2013, at 5:40 PM, Bjoern Hoehrmann wrote:

If TC39 said ECMA-404 is going to be replaced by a verbatim copy of the ABNF grammar in draft-ietf-json-rfc4627bis-08 with pretty much no other discussion of JSON and a clear indication that future editions will not add such discussion, and will not change the grammar without IETF consensus, I would be willing to entertain the idea of making ECMA-404 a normative reference.

The second paragraph is speaking about the language described by the grammar, not the actual formalism used to express the grammar. I'm quite sure that there is no interest at all within TC39 to ever change the actual JSON language. If you are looking for some sort of contractual commitment from ECMA, I suspect you are wasting your time. Does the IETF make such commitments?

As you know, the charter of the JSON Working Group says

The resulting document will be jointly published as an RFC and by ECMA. ECMA participants will be participating in the working group editing through the normal process of working group participation.
The responsible AD will coordinate the approval process with ECMA so that the versions of the document that are approved by each body are the same.

If things had gone according to plan, it seems likely that Ecma would have requested that the IANA registration for application/json jointly list the IETF and Ecma International as holding Change Control over it, and it seems unlikely there would have been much disagreement about that.

It is normal to award change control to other organisations, for instance, RFC 3023 gives change control for the XML media types to the W3C. I can look up examples for jointly held change control if that would help.

And no, I am not looking for an enforceable contract, just a clear formal decision and statement.

This doesn't mean that TC39 would necessarily agree to eliminate the Syntax Diagrams, or that we wouldn't carefully audit any grammar contribution to make sure that it is describing the same language.
There may also be minor issues that need to be resolved. But we seem to agree that we already are both accurately describing the same language so this is really about notational agreement.

Having non-normative syntax diagrams in addition to the ABNF grammar would be fine if they can automatically be generated from the ABNF.

I was talking about removing most of the prose, leaving only boilerplate, a very short introduction, and references. Then it would be a specification of only the syntax and most technical concerns would be addressed on both sides. If you see this as a viable way forward, then I think the JSON WG should explore this option further.

As a base line, ECMA-404 was created in less than a week. It takes a couple months to push through a letter ballot to approve a revised standard.

The RFC4627bis draft could be approved and be held for normative references to materialise; this is not uncommon for IETF standards. It usually takes a couple of months for the RFC editor to process the document anyway, so personally a couple of months of waiting for a revised edition of ECMA-404 would be okay with me.

# Allen Wirfs-Brock (10 years ago)

On Dec 9, 2013, at 10:52 PM, James Clark wrote:

On Fri, Dec 6, 2013 at 2:51 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

The static semantics of a language are a set of rules that further restrict which sequences of symbols form valid statements within the language. For example, a rule that the 'member' names must be disjoint within an 'object' production could be a static semantic rule (however, there is intentionally no such rule in ECMA-404).

The line between syntax and static semantics can be fuzzy. Static semantic rules are typically used to express rules that cannot be technically expressed using the chosen syntactic formalism or rules which are simply inconvenient to express using that formalism. For example, the editor of ECMA-404 chose to simplify the RR track expression of the JSON syntax by using static semantic rules for whitespace rather than incorporating them into RR diagrams.

Another form of static semantic rules are equivalences that state when two or more different sequences of symbols must be considered as equivalent. For example, the rules that state equivalencies between escape sequences and individual code points within an JSON 'string'. Such equivalences are not strictly necessary at this level, but it it simplifies the specification of higher level semantics if equivalent symbol sequences can be normalized at this level of specification.

When we talk about the "semantics" of a language (rather than "static semantics") we are talking about attributing meaning (in some domain and context) to well-formed (as specified via syntax and static semantics) statements expressed in that language. ... What we can do, is draw a bright-line just above the level of static semantics. This is what ECMA-404 attempts to do.

I don't see how you can accommodate the second kind of static semantic rule within the definition of conformance that you have chosen for ECMA-404. Section 2 defines conformance in terms of whether a sequence of Unicode code points conforms to the grammar. This doesn't even accommodate the first kind of static semantic rule, but it is obviously easy to extend it so that it does. However, to accommodate the second kind of static semantic rule, you would need a notion of conformance that deals with how conforming parsers interpret a valid sequence of code points.

Well, it certainly is a nit to pick, but in context I interpret the term "grammar" as used in clause 2 (and also the Introduction) as meaning the full normative content of clauses 4 to 9. This includes the actual CFG specification and the associated static semantic rules.

The notion of a conforming parser could be added; I'm less sure that it is really necessary. We don't even need to consider string escapes to get into the issue of equivalent JSON texts as it also exists because of optional white space.

I think it is coherent to draw a bright-line just above the first level of static semantics. If you did that, then most of the prose of section 9 (on Strings) would have to be removed; but this would be rather inconvenient, because most specifications of higher-level semantics would end up having to specify it themselves.

I generally agree with this, including the convenience perspective. It essentially also applies to the decimal interpretation of numbers. There is an argument to be made that both should just be discussed informatively and left to higher level semantic specs to make those interpretations normative.

However, I find it hard to see any bright-line above the second level of static semantics and below semantics generally. Let's consider section 9. I would argue that this section should define a "semantics" for string tokens, by defining a mapping from sequences of code points matching the production string (what I would call the "lexical space") into arbitrary sequences of code points (what I would call the "value space"). The spec sometimes seems to be doing this and sometimes seems to be doing something more like your second kind of static semantics. Sometimes it uses the term "code point" or "character" to refer to code points in the lexical space ("A string is a sequence of Unicode code points wrapped with quotation marks"), and sometimes it uses those terms to refer to code points in the value space ("Any code point may be represented as a hexadecimal number"). You could redraft so that it was expressed purely in terms of code points in the lexical space, but that would be awkward and unnatural: for example, an hexadecimal escape would represent either one or two code points in the lexical space. Furthermore I don't see what you would gain by this. Once you talk about equivalences between sequences, you are into semantics and you need a richer notion of conformance.

Generally agree. We are probably seeing some editorial confusion as feedback (including mine) was integrated into the editor's initial draft. This can all be improved in a subsequent edition.

So back to "semantics" and why ECMA-404 tries (perhaps imperfectly) to avoid describing JSON beyond the level of static semantics.

ECMA-404 sees JSON as "a text format that facilitates structured data interchange between all programming languages. JSON is syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications".

There are many possible semantics and categories of semantics that can be applied to well-formed statements expressed using the JSON syntax. ...

The problem with trying to standardize JSON semantics is that the various semantics that can usefully be imposed upon JSON are often mutually incompatible. At a trivial level, we see this with issues like the size of numbers or duplicate object member keys. It is very hard to decide whose semantics are acceptable and whose are not.
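
Two small ECMAScript examples of where consumer semantics diverge (JSON.parse behaviour as commonly implemented; other consumers legitimately make different choices):

```js
// Duplicate member names: ECMA-404 does not forbid them. ECMAScript's
// JSON.parse keeps the last occurrence; other consumers may reject the
// text outright or keep the first occurrence instead.
JSON.parse('{"a": 1, "a": 2}'); // { a: 2 }

// Number size: a consumer that maps numbers to IEEE 754 doubles loses
// precision that an arbitrary-precision consumer would preserve.
JSON.parse('9007199254740993'); // 9007199254740992
```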

I would argue that ECMA-404 should define the least restrictive reasonable semantics: the semantics should not treat as identical any values that higher layers might reasonably want to treat as distinct. This is not the one true JSON semantics: it is merely a semantic layer on which other higher-level semantic layers can in turn be built. I don't think it's so hard to define this (a rough sketch follows the list):

  1. a value is an object, array, number, string, boolean or null.
  2. an object is an ordered sequence of <string, value> pairs
  3. an array is an ordered sequence of values
  4. a string is an ordered sequence of Unicode code points
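
(As a rough sketch only, not text from either spec, such a layer might be represented along these lines; all of the names below are hypothetical.)

```js
// Hypothetical representation of the proposed "least restrictive" layer:
// objects are ordered sequences of <string, value> pairs (duplicates allowed),
// arrays are ordered sequences of values, strings are code point sequences.
const exampleValue = {
  kind: 'object',
  members: [                                                     // source order preserved
    { key: 'a', value: { kind: 'number', token: '1e1' } },
    { key: 'a', value: { kind: 'string', codePoints: [0x41] } } // duplicate key kept
  ]
};
```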

Indeed, this aligns very well with my perspective.

Item 2 may be surprising to some people, but there's not really much choice given that JS preserves the order of object keys. The difficult case is number. But even with number, I would argue that there are clearly some lexical values that can uncontroversially be specified to be equivalent (for example, 1e1 with 1E1, or 1e1 with 1e+1). A set of decisions on lexical equivalence effectively determines a value space for numbers. For example, you might reasonably decide that two values are equivalent if they represent real numbers with the same mathematical value.
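
(One way to illustrate such an equivalence, offered only as a sketch: compare number tokens by the mathematical value they denote. A real definition would need arbitrary precision rather than ECMAScript Numbers.)

```js
// Hypothetical equivalence on JSON number tokens: two tokens are equivalent
// when they denote the same mathematical value. For illustration this uses
// ECMAScript Number conversion, which is not arbitrary precision.
function sameMathematicalValue(a, b) {
  return Number(a) === Number(b);
}

sameMathematicalValue('1e1', '1E1');  // true
sameMathematicalValue('1e1', '1e+1'); // true
sameMathematicalValue('10',  '1e1');  // true
```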

If ECMA-404 doesn't provide such a semantic layer, it becomes quite challenging for higher-level language bindings to specify their semantics in a truly rigorous way. Take strings, for example. I think by far the cleanest way to rigorously define a mapping from string tokens to sequences of code points is to have a BNF and a syntax-directed mapping, as the ECMAScript spec does very nicely in 7.8.4 (www.ecma-international.org/ecma-262/5.1/#sec-7.8.4). If ECMA-404 provides merely a syntax and a specification of string equivalence, it becomes quite a challenge to draft a specification that somehow expresses the mapping while still normatively relying on the ECMA-404 spec for the syntax. What will happen in practice is that these higher-level mappings will not be specified rigorously.
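
(A simplified sketch of what such a syntax-directed mapping looks like; this is not the ECMA-262 algorithm, and it deliberately omits combining \u surrogate-pair escapes into single code points.)

```js
// Map a well-formed JSON string token (lexical space) to the sequence of
// code points it denotes (value space). Simplified for illustration only.
const SINGLE_ESCAPES = { '"': 0x22, '\\': 0x5C, '/': 0x2F,
                         b: 0x08, f: 0x0C, n: 0x0A, r: 0x0D, t: 0x09 };

function decodeStringToken(token) {
  const body = token.slice(1, -1);            // strip the surrounding quotes
  const codePoints = [];
  for (let i = 0; i < body.length; i++) {
    if (body[i] !== '\\') {                   // an unescaped code point
      const cp = body.codePointAt(i);
      codePoints.push(cp);
      if (cp > 0xFFFF) i++;                   // skip the trailing surrogate
      continue;
    }
    const esc = body[++i];
    if (esc === 'u') {                        // hexadecimal escape
      codePoints.push(parseInt(body.slice(i + 1, i + 5), 16));
      i += 4;
    } else {
      codePoints.push(SINGLE_ESCAPES[esc]);   // two-character escape
    }
  }
  return codePoints;
}

decodeStringToken('"A\\u0042"'); // [0x41, 0x42]
```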

I think ECMA-404 would be significantly more useful for its intended purpose if it provided the kind of semantics I am suggesting.

I know XML is not very fashionable these days, but we have a couple of decades of experience with XML and SGML which I think does have some relevance to a specification of "structured data interchange between programming languages". One conclusion I would draw from this experience is that the concept of an XML Infoset or something like it is very useful. Most users of XML deal with higher-level semantic abstractions rather than directly with the XML Infoset, but it has proven very useful to be able to specify these higher-level semantic abstractions in terms of the XML Infoset rather than having to specify them directly in terms of the XML syntax. Another conclusion I would draw is that it would have worked much better to integrate the XML Infoset specification into the main XML specification. The approach of having a separate XML Infoset specification has meant that there is no proper rigorous specification of how to map from the XML syntax to the XML Infoset (it seems to be assumed to be so obvious that it does not need stating). I tried an integrated approach of specifying the syntax and data model together in the MicroXML spec (dvcs.w3.org/hg/microxml/raw-file/tip/spec/microxml.html), and I think it works much better. The current approach of ECMA-404 is a bit like that of the XML Recommendation: it pretends at times to be just specifying when a sequence of code points is valid, and yet the specification contains a fairly random selection of statements of how a valid sequence should be interpreted.

Thank you, this is very useful feedback. Would you mind submitting this as a bug report against ECMA-404 at bugs.ecmascript.org? I can do it, but community feedback is important, and I'd like to be on the CC list for the bug.

# Allen Wirfs-Brock (10 years ago)

On Dec 10, 2013, at 2:08 AM, Martin J. Dürst wrote:

On 2013/12/10 9:32, Allen Wirfs-Brock wrote:

...

Specs can have both technical and editorial bugs. If you think there are bugs in ECMA-404, the best thing to do is to submit a bug ticket at bugs.ecmascript.org. If there is a critical bug that you think prevents 4627bis from normatively referencing ECMA-404, say so and assign the bug a high priority in the initial ticket. But please, start with actual errors, ambiguities, inconsistencies, or similar substantive issues. Stylistic issues won't be ignored, but they are less important and harder to reach agreement on.

I'll submit some. What about the ECMA people submitting some bug reports on 4627bis in return?

Is there a bug tracking system, or are perceived bugs simply submitted to the mailing list?

I'm sure some of the friction around here is simply a matter of poorly understood processes and differing social conventions.

# Allen Wirfs-Brock (10 years ago)

On Dec 10, 2013, at 3:08 PM, Bjoern Hoehrmann wrote:

  • Allen Wirfs-Brock wrote:

On Dec 9, 2013, at 5:40 PM, Bjoern Hoehrmann wrote:

If TC39 said ECMA-404 is going to be replaced by a verbatim copy of the ABNF grammar in draft-ietf-json-rfc4627bis-08 with pretty much no other discussion of JSON and a clear indication that future editions will not add such discussion, and will not change the grammar without IETF consensus, I would be willing to entertain the idea of making ECMA-404 a normative reference.

The second paragraph is speaking about the language described by the grammar, not the actual formalism used to express the grammar. I'm quite sure that there is no interest at all within TC39 to ever change the actual JSON language. If you are looking for some sort of contractual commitment from ECMA, I suspect you are wasting your time. Does the IETF make such commitments?

As you know, the charter of the JSON Working Group says

The resulting document will be jointly published as an RFC and by ECMA. ECMA participants will be participating in the working group editing through the normal process of working group participation.
The responsible AD will coordinate the approval process with ECMA so that the versions of the document that are approved by each body are the same.

If things had gone according to plan, it seems likely that Ecma would have requested that the IANA registration for application/json jointly list the IETF and Ecma International as holding Change Control over it, and it seems unlikely there would have been much disagreement about that.

It is normal to award change control to other organisations; for instance, RFC 3023 gives change control for the XML media types to the W3C. I can look up examples of jointly held change control if that would help.

And no, I am not looking for an enforceable contract, just a clear formal decision and statement.

Obviously, the originally envisioned process broke down, but I don't think we need to discuss that right here, right now.

It isn't clear to me that TC39 is particularly interested in holding change control for the application/json media type, just as it apparently doesn't have change control for application/ecmascript or application/javascript. In practice those registrations simply have not been of particular concern. Maybe they should be. Does anybody who actually looks up the application/javascript media type really think that the relevant reference is still Netscape Communications Corp., "Core JavaScript Reference 1.5", September 2000?

TC39's concern seems to be both narrower (just the JSON syntax and static semantics, not wire encodings) and wider (implementations that aren't tied to the application/json media type) than the JSON WG's. I know that the TC39 consensus is that ECMA-404 (probably with some revision) should be serviceable as a foundation for other specs that address other issues.

This doesn't mean that TC39 would necessarily agree to eliminate the Syntax Diagrams, or that we wouldn't carefully audit any grammar contribution to make sure that it is describing the same language. There may also be minor issues that need to be resolved. But we seem to agree that we are already both accurately describing the same language, so this is really about notational agreement.

Having non-normative syntax diagrams in addition to the ABNF grammar would be fine if they can automatically be generated from the ABNF.

I was talking about removing most of the prose, leaving only boilerplate, a very short introduction, and references. Then it would be a specification of only the syntax and most technical concerns would be addressed on both sides. If you see this as a viable way forward, then I think the JSON WG should explore this option further.

I agree, this sounds plausible to me.

As a baseline, ECMA-404 was created in less than a week. It takes a couple of months to push a letter ballot through to approve a revised standard.

The RFC4627bis draft could be approved and be held for normative references to materialise; this is not uncommon for IETF standards. It usually takes a couple of months for the RFC editor to process the document anyway, so personally a couple of months of waiting for a revised edition of ECMA-404 would be okay with me.

I don't see why we shouldn't be able to mutually resolve this.

# Allen Wirfs-Brock (10 years ago)

On Dec 10, 2013, at 9:28 PM, James Clark wrote:

On Wed, Dec 11, 2013 at 11:59 AM, R S <sayrer at gmail.com> wrote:

On Tue, Dec 10, 2013 at 8:44 PM, James Clark <jjc at jclark.com> wrote:

For example, the ECMA spec needs the order of key/value pairs in an object to be significant

I don't think ECMA-404 has such a requirement. In fact, it says otherwise in at least two places.

Earlier in this discussion I believe AWB said he thought that ECMA-404 shouldn't say so.

No, currently ECMA-404 intentionally does not say that key/value pairs are unordered, because there is clearly an ordering to them in a JSON text. It doesn't say one way or the other whether any downstream semantics is derived from the ordering.

In JavaScript, objects do preserve the order of keys, and many programs rely on this. ECMAScript 5 doesn't specify this, but I was under the impression that TC39 plans to fix this.

There is a de facto standard for the ordering of the most common cases. At some point this ordering will probably be included in ECMA-262.

The ES6 draft currently depends on ECMAScript evaluation semantics to determine the value represented by a JSON text.

At this point in time, this is just for economy of specification text, as we know that, for syntactically valid JSON input, object literal evaluation produces the same result that JSON.parse needs to produce. In both cases, source code key/value pairs are processed in left-to-right source order, constructing an object whose property enumeration order is derived from the processing order.

So assuming TC39 fixes ECMAScript at some point to match the reality on the Web, JSON.parse will be specified to preserve order. I believe current implementations of JSON.parse already do so.

Yes, and even if we changed how we express the specification of JSON.parse, it would still have to preserve the current de facto ordering.
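
For instance, in current engines:

```js
// Current implementations of JSON.parse preserve the source order of
// object members, observable via property enumeration.
const parsed = JSON.parse('{"b": 1, "a": 2}');
Object.keys(parsed); // ["b", "a"] (source order, not alphabetical)
```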

Maybe I am worrying unnecessarily about this. It would certainly make life simpler to be able to say JSON objects are unordered collections without qualification. But I suspect that this doesn't reflect reality on the Web.

And you'd be correct. I think you could get away with recommending that applications not depend upon the ordering, but I doubt that web developers would pay attention. ECMAScript started out trying not to define a specific enumeration order for object properties. But developers still wrote code that depended upon the ordering they observed in the wild, and interoperability pressures have generally forced convergence upon the ordering used by the browsers with the most market share.

The way to deal with this is by careful spec layering.

An alternative way to deal with this is to write less in the IETF spec, and use a normative reference to ECMA-404.

That is exactly the kind of layering I think we need.

It's probably obvious that I'd also agree with that.

# Allen Wirfs-Brock (10 years ago)

On Dec 10, 2013, at 3:08 PM, Bjoern Hoehrmann wrote:

... If things had gone according to plan, it seems likely that Ecma would have requested that the IANA registration for application/json jointly list the IETF and Ecma International as holding Change Control over it, and it seems unlikely there would have been much disagreement about that.

It is normal to award change control to other organisations; for instance, RFC 3023 gives change control for the XML media types to the W3C. I can look up examples of jointly held change control if that would help.

Speaking of media types: I just noticed that RFC 4329 seems painfully out of date WRT application/ecmascript and application/javascript and the language specification that defines the payload.

Perhaps this is another area that needs coordination.