JSON.canonicalize()

# Anders Rundgren (6 years ago)

Dear List,

Here is a proposal that I would be very happy to get feedback on, since it builds on ES but is not (at all) limited to ES.

The request is for a complement to the ES "JSON" object called canonicalize() which would have identical parameters to the existing stringify() method.

The JSON canonicalization scheme (including ES code for emulating it), is described in: cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html

Current workspace: cyberphone/json-canonicalization
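A minimal sketch of the intended call shape (the sorted-property output follows the draft; the example values are illustrative only):

// Same signature as JSON.stringify(value[, replacer[, space]]), but deterministic output.
JSON.canonicalize({ b: 2, a: 1 });   // '{"a":1,"b":2}'  (properties sorted)
JSON.stringify({ b: 2, a: 1 });      // '{"b":2,"a":1}'  (creation order)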

Thanx, Anders Rundgren

# C. Scott Ananian (6 years ago)

See wiki.laptop.org/go/Canonical_JSON -- you should probably at least mention unicode normalization of strings. You probably should also specify a validator: it doesn't matter if you emit canonical JSON if you can tweak the hash of the value by feeding non-canonical JSON as an input.

# Anders Rundgren (6 years ago)

On 2018-03-16 08:52, C. Scott Ananian wrote:

See wiki.laptop.org/go/Canonical_JSON -- you should probably at least mention unicode normalization of strings.

Yes, I could add that unicode normalization of strings is out of scope for this specification.

You probably should also specify a validator: it doesn't matter if you emit canonical JSON if you can tweak the hash of the value by feeding non-canonical JSON as an input.

Pardon me, but I don't understand what you are writing here.

Hash functions' only raison d'être is providing collision-safe checksums.

thanx, Anders

# C. Scott Ananian (6 years ago)

Canonical JSON is often used to imply a security property: two JSON blobs with identical contents are expected to have identical canonical JSON forms (and thus identical hashed values).

However, unicode normalization allows multiple representations of "the same" string, which defeats this security property. Depending on your implementation language and use, a string with precomposed accents could compare equal to a string with separated accents, even though the canonical JSON or hash differed. In an extreme case (with a weak hash function, say MD5), this can be used to break security by re-encoding all strings in multiple variants until a collision is found. This is just a slight variant on the fact that JSON allows multiple ways to encode a character using escape sequences. You've already taken the trouble to disambiguate this case; security-conscious applications should take care to perform unicode normalization as well, for the same reason.

Similarly, if you don't offer a verifier to ensure that the input is in "canonical JSON" format, then an attacker can try to create collisions by violating the rules of canonical JSON format, whether by using different escape sequences, adding whitespace, etc. This can be used to make JSON which is "the same" appear "different", violating the intent of the canonicalization. Any security application of canonical JSON will require a strict mode for JSON.parse() as well as a strict mode for JSON.stringify().
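For illustration, a strict verifier could be phrased as "parse, re-serialize canonically, compare" (a sketch assuming the proposed JSON.canonicalize; the function name is hypothetical):

function isCanonicalJson(text) {
    try {
        // Rejects extra whitespace, alternate escape sequences, unsorted keys, etc.
        return JSON.canonicalize(JSON.parse(text)) === text;
    } catch (e) {
        return false; // not even well-formed JSON
    }
}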

# Mike Samuel (6 years ago)

On Fri, Mar 16, 2018 at 11:38 AM, C. Scott Ananian <ecmascript at cscott.net>

wrote:

Canonical JSON is often used to imply a security property: two JSON blobs with identical contents are expected to have identical canonical JSON forms (and thus identical hashed values).

What does "identical contents" mean in the context of numbers? JSON intentionally avoids specifying any precision for numbers.

JSON.stringify(1/3) === '0.3333333333333333'

What would happen with JSON from systems that allow higher precision? I.e., what would (JSON.canonicalize(JSON.stringify(1/3) + '3')) produce?
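For reference, this is how ES number round-tripping behaves today, which is presumably what an ES-based canonicalizer would do with such input (examples are mine):

// JSON.parse maps any JSON number to the nearest IEEE-754 double;
// serialization emits the shortest decimal form that round-trips.
JSON.stringify(JSON.parse('1.50'))   // '1.5'
JSON.stringify(JSON.parse('1E+3'))   // '1000'
JSON.stringify(1/3)                  // '0.3333333333333333'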

However, unicode normalization allows multiple representations of "the same" string, which defeats this security property. Depending on your implementation language

We shouldn't normalize unicode in strings that contain packed binary data. JSON strings are strings of UTF-16 code-units, not Unicode scalar values and any system that assumes the latter will break often.

and use, a string with precomposed accents could compare equal to a string with separated accents, even though the canonical JSON or hash differed. In an extreme case (with a weak hash function, say MD5), this can be used to break security by re-encoding all strings in multiple variants until a collision is found. This is just a slight variant on the fact that JSON allows multiple ways to encode a character using escape sequences. You've already taken the trouble to disambiguate this case; security-conscious applications should take care to perform unicode normalization as well, for the same reason.

Similarly, if you don't offer a verifier to ensure that the input is in "canonical JSON" format, then an attacker can try to create collisions by violating the rules of canonical JSON format, whether by using different escape sequences, adding whitespace, etc. This can be used to make JSON which is "the same" appear "different", violating the intent of the canonicalization. Any security application of canonical JSON will require a strict mode for JSON.parse() as well as a strict mode for JSON.stringify().

Given the dodginess of "identical" w.r.t. non-integral numbers, shouldn't endpoints be re-canonicalizing before hashing anyway? Why would one want to ship the canonical form over the wire if it loses precision?

--scott

On Fri, Mar 16, 2018 at 4:48 AM, Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

On 2018-03-16 08:52, C. Scott Ananian wrote:

See wiki.laptop.org/go/Canonical_JSON -- you should probably at least mention unicode normalization of strings.

Yes, I could add that unicode normalization of strings is out of scope for this specification.

You probably should also specify a validator: it doesn't matter if you

emit canonical JSON if you can tweak the hash of the value by feeding non-canonical JSON as an input.

Pardon me, but I don't understand what you are writing here.

Hash functions' only raison d'être is providing collision-safe checksums.

thanx, Anders

--scott

On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <anders.rundgren.net at gmail.com> wrote:

Dear List,

Here is a proposal that I would be very happy getting feedback on

since it builds on ES but is not (at all) limited to ES.

The request is for a complement to the ES "JSON" object called

canonicalize() which would have identical parameters to the existing stringify() method.

Why should canonicalize take a replacer? Hasn't replacement already happened?

# C. Scott Ananian (6 years ago)

On Fri, Mar 16, 2018 at 12:23 PM, Mike Samuel <mikesamuel at gmail.com> wrote:

On Fri, Mar 16, 2018 at 11:38 AM, C. Scott Ananian <ecmascript at cscott.net> wrote:

Canonical JSON is often used to imply a security property: two JSON blobs with identical contents are expected to have identical canonical JSON forms (and thus identical hashed values).

What does "identical contents" mean in the context of numbers? JSON intentionally avoids specifying any precision for numbers.

JSON.stringify(1/3) === '0.3333333333333333'

What would happen with JSON from systems that allow higher precision? I.e., what would (JSON.canonicalize(JSON.stringify(1/3) + '3')) produce?

However, unicode normalization allows multiple representations of "the

same" string, which defeats this security property. Depending on your implementation language

We shouldn't normalize unicode in strings that contain packed binary data. JSON strings are strings of UTF-16 code-units, not Unicode scalar values and any system that assumes the latter will break often.

Both of these points are made on the URL I originally cited: wiki.laptop.org/go/Canonical_JSON

# Carsten Bormann (6 years ago)

On Mar 16, 2018, at 16:23, Mike Samuel <mikesamuel at gmail.com> wrote:

JSON strings are strings of UTF-16 code-units

No.

(You are confusing this with JavaScript strings.)

Grüße, Carsten

# Anders Rundgren (6 years ago)

On 2018-03-16 16:38, C. Scott Ananian wrote:

Canonical JSON is often used to imply a security property: two JSON blobs with identical contents are expected to have identical canonical JSON forms (and thus identical hashed values).

Right.

However, unicode normalization allows multiple representations of "the same" string, which defeats this security property.

This is an aspect that I believe belongs to the "application" level. This specification is only about "on the wire" format.

Rationale: if this was a part of the SPECIFICATION it would either be ignored (=useless) or be a showstopper (=dead) due to complexity.

If applications using the received data want to address this issue they can for example call msdn.microsoft.com/en-us/library/windows/desktop/dd318671(v=vs.85).aspx and reject if they want.

Or always normalize: developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
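A hedged sketch of such an application-level step (normalizeStrings is a hypothetical helper, not part of the proposal):

function normalizeStrings(value) {
    // Recursively apply NFC to every string before canonicalizing/hashing.
    if (typeof value === 'string') return value.normalize('NFC');
    if (Array.isArray(value)) return value.map(normalizeStrings);
    if (value !== null && typeof value === 'object') {
        const result = {};
        Object.keys(value).forEach((key) => {
            result[key.normalize('NFC')] = normalizeStrings(value[key]);
        });
        return result;
    }
    return value;
}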

Depending on your implementation language and use, a string with precomposed accents could compare equal to a string with separated accents, even though the canonical JSON or hash differed.

I don't want to go there for the reasons mentioned.

In an extreme case (with a weak hash function, say MD5), this can be used to break security by re-encoding all strings in multiple variants until a collision is found.  This is just a slight variant on the fact that JSON allows multiple ways to encode a character using escape sequences. You've already taken the trouble to disambiguate this case; security-conscious applications should take care to perform unicode normalization as well, for the same reason.

If you are able to break the hash function all bets are off anyway because then you can presumably change any part of the object and it would still appear authentic.

Escape normalization: If you don't do this normalization, signatures would typically break and that's not really a "security" (=attacker) problem; it is rather a "nuisance" of the same caliber as a server not responding.

Similarly, if you don't offer a verifier to ensure that the input is in "canonical JSON" format, then an attacker can try to create collisions by violating the rules of canonical JSON format, whether by using different escape sequences, adding whitespace, etc.  This can be used to make JSON which is "the same" appear "different", violating the intent of the canonicalization.

Again, if the hash function is broken, there's nothing to do except maybe cry :-(

This a Unicode problem, not a cryptographic problem.

Any security application of canonical JSON will require a strict mode for JSON.parse() as well as a strict mode for JSON.stringify().

Indeed, you must ALWAYS verify that input data conforms to the agreed conventions.

Anyway, feel free to push a different JSON canonicalization scheme!

Here is another: gibson042.github.io/canonicaljson-spec It claims that you should support "lone surrogates" (invalid Unicode) which for example JDK doesn't. I don't go there either.

Anders

# Mike Samuel (6 years ago)

On Fri, Mar 16, 2018 at 12:44 PM, C. Scott Ananian <ecmascript at cscott.net>

wrote:

On Fri, Mar 16, 2018 at 12:23 PM, Mike Samuel <mikesamuel at gmail.com> wrote:

On Fri, Mar 16, 2018 at 11:38 AM, C. Scott Ananian <ecmascript at cscott.net

wrote:

Canonical JSON is often used to imply a security property: two JSON blobs with identical contents are expected to have identical canonical JSON forms (and thus identical hashed values).

What does "identical contents" mean in the context of numbers? JSON intentionally avoids specifying any precision for numbers.

JSON.stringify(1/3) === '0.3333333333333333'

What would happen with JSON from systems that allow higher precision? I.e., what would (JSON.canonicalize(JSON.stringify(1/3) + '3')) produce?

However, unicode normalization allows multiple representations of "the

same" string, which defeats this security property. Depending on your implementation language

We shouldn't normalize unicode in strings that contain packed binary data. JSON strings are strings of UTF-16 code-units, not Unicode scalar values and any system that assumes the latter will break often.

Both of these points are made on the URL I originally cited: wiki.laptop.org/go/Canonical_JSON

Thanks, I see """ Floating point numbers are not allowed in canonical JSON. Neither are leading zeros or "minus 0" for integers. """ which answers my question.

I also see """ A previous version of this specification required strings to be valid unicode, and relied on JSON's \u escape. This was abandoned as it doesn't allow representing arbitrary binary data in a string, and it doesn't preserve the identity of non-canonical unicode strings. """ which addresses my question.

I also see """ It is suggested that unicode strings be represented as the UTF-8 encoding of unicode Normalization Form C www.unicode.org/reports/tr15 (UAX #15). However, arbitrary content may be represented as a string: it is not guaranteed that string contents can be meaningfully parsed as UTF-8. """ which seems to be mixing concerns about the wire format used to encode JSON as octets and NFC which would apply to the text of the JSON string.

If that confusion is cleaned up, then it seems a fine subset of JSON to ship over the wire with a JSON content-type.

It is entirely unsuitable to embedding in HTML or XML though. IIUC, with an implementation based on this

JSON.canonicalize(JSON.stringify("</script>")) === "</script>" && JSON.canonicalize(JSON.stringify("]]>")) === "]]>"

The output of JSON.canonicalize would also not be in the subset of JSON that is also a subset of JavaScript's PrimaryExpression.

JSON.canonicalize(JSON.stringify("\u2028\u2029")) === "\u2028\u2029"

It also is not suitable for use internally within systems that internally use cstrings.

JSON.canonicalize(JSON.stringify("\u0000")) === "\u0000"

# Anders Rundgren (6 years ago)

On 2018-03-16 18:04, Mike Samuel wrote:

It is entirely unsuitable to embedding in HTML or XML though. IIUC, with an implementation based on this

JSON.canonicalize(JSON.stringify("</script>")) === "</script>" && JSON.canonicalize(JSON.stringify("]]>")) === "]]>"

I don't know what you are trying to prove here :-)

The output of JSON.canonicalize would also not be in the subset of JSON that is also a subset of JavaScript's PrimaryExpression.

JSON.canonicalize(JSON.stringify("\u2028\u2029")) === "\u2028\u2029"

It also is not suitable for use internally within systems that internally use cstrings.

JSON.canonicalize(JSON.stringify("\u0000")) === "\u0000"

JSON.canonicalize() would be [almost] identical to JSON.stringify()

JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"' // Returns true

"Emulator":

var canonicalize = function(object) {

 var buffer = '';
 serialize(object);
 return buffer;

 function serialize(object) {
     if (object !== null && typeof object === 'object') {
         if (Array.isArray(object)) {
             buffer += '[';
             let next = false;
             object.forEach((element) => {
                 if (next) {
                     buffer += ',';
                 }
                 next = true;
                 serialize(element);
             });
             buffer += ']';
         } else {
             buffer += '{';
             let next = false;
             Object.keys(object).sort().forEach((property) => {
                 if (next) {
                     buffer += ',';
                 }
                 next = true;
                 buffer += JSON.stringify(property);
                 buffer += ':';
                 serialize(object[property]);
             });
             buffer += '}';
         }
     } else {
         buffer += JSON.stringify(object);
     }
 }

};
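For example, the sketch above yields:

canonicalize({ b: [2, 1], a: "x" })                              // '{"a":"x","b":[2,1]}'
canonicalize({ b: 2, a: 1 }) === canonicalize({ a: 1, b: 2 })    // true: key order is normalized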

# Mike Samuel (6 years ago)

On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

On 2018-03-16 18:04, Mike Samuel wrote:

It is entirely unsuitable to embedding in HTML or XML though.

IIUC, with an implementation based on this

JSON.canonicalize(JSON.stringify("</script>")) === "</script>" && JSON.canonicalize(JSON.stringify("]]>")) === "]]>"

I don't know what you are trying to prove here :-)

Only that canonical JSON is useful in a very narrow context. It cannot be embedded in an HTML script tag. It cannot be embedded in an XML or HTML foreign content context without extra care. If it contains a string literal that embeds a NUL it cannot be embedded in XML period even if extra care is taken.

The output of JSON.canonicalize would also not be in the subset of JSON

that is also a subset of JavaScript's PrimaryExpression.

JSON.canonicalize(JSON.stringify("\u2028\u2029")) ===

"\u2028\u2029"

It also is not suitable for use internally within systems that internally use cstrings.

JSON.canonicalize(JSON.stringify("\u0000")) === "\u0000"

JSON.canonicalize() would be [almost] identical to JSON.stringify()

You're correct. Many JSON producers have a web-safe version, but the JavaScript builtin does not. My point is that JSON.canonicalize undoes those web-safety tweaks.
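For context, the kind of web-safety tweak meant here is post-processing the serialized text with extra (still valid JSON) escapes; a sketch, not part of the proposal:

// These characters can only occur inside JSON string literals,
// so escaping them does not change the parsed value.
var webSafe = function (json) {
    return json
        .replace(/</g, '\\u003c')       // keeps "</script>" from breaking out of HTML
        .replace(/\u2028/g, '\\u2028')  // line/paragraph separators, invalid in pre-ES2019 JS literals
        .replace(/\u2029/g, '\\u2029');
};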

JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"' // Returns true

"Emulator":

var canonicalize = function(object) {

var buffer = '';
serialize(object);

I thought canonicalize took in a string of JSON and produced the same. Am I wrong? "Canonicalize" to my mind means a function that returns the canonical member of an equivalence class given any member from that same equivalence class, so is always 'a -> 'a.

return buffer;

function serialize(object) {
    if (object !== null && typeof object === 'object') {

JSON.stringify(new Date(0)) === '"1970-01-01T00:00:00.000Z"' because Date.prototype.toJSON exists.

If you operate as a JSON_string -> JSON_string function then you can avoid this complexity.

        if (Array.isArray(object)) {
            buffer += '[';
            let next = false;
            object.forEach((element) => {
                if (next) {
                    buffer += ',';
                }
                next = true;
                serialize(element);
            });
            buffer += ']';
        } else {
            buffer += '{';
            let next = false;
            Object.keys(object).sort().forEach((property) => {
                if (next) {
                    buffer += ',';
                }
                next = true;
                buffer += JSON.stringify(property);

I think you need a symbol check here. JSON.stringify(Symbol.for('foo')) === undefined

                buffer += ':';
                serialize(object[property]);
            });
            buffer += '}';
        }
    } else {
        buffer += JSON.stringify(object);

This fails to distinguish non-integral numbers from integral ones, and produces non-standard output when object === undefined. Again, not a problem if the input is required to be valid JSON.
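The edge cases mentioned here, in terms of the plain JSON.stringify built-in (for reference):

JSON.stringify(1.0)          // '1': an integral-valued double loses its decimal point
JSON.stringify(undefined)    // undefined, not a JSON text at all
JSON.stringify(new Date(0))  // '"1970-01-01T00:00:00.000Z"' via Date.prototype.toJSON
JSON.stringify(Symbol('x'))  // undefined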

# C. Scott Ananian (6 years ago)

On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

On 2018-03-16 18:04, Mike Samuel wrote:

It is entirely unsuitable to embedding in HTML or XML though.

IIUC, with an implementation based on this

JSON.canonicalize(JSON.stringify("</script>")) === "</script>" && JSON.canonicalize(JSON.stringify("]]>")) === "]]>"

I don't know what you are trying to prove here :-)

He wants to ship it as application/json and have it be safe if the browser happens to ignore the mime type and interpret it as HTML or XML, I believe. Mandatory encoding of < as an escape would make the output "safe" for such use. I'm not convinced this is in-scope, but it's an interesting case to consider when determining which characters ought to be escaped.

(I think he's writing JSON.canonicalize(JSON.stringify(...)) where he means to write JSON.canonicalize(...), at least if I understand the proposed API correctly.)

The output of JSON.canonicalize would also not be in the subset of JSON

that is also a subset of JavaScript's PrimaryExpression.

JSON.canonicalize(JSON.stringify("\u2028\u2029")) ===

"\u2028\u2029"

I'm not sure about this, but I think he's saying you can't just eval the canonical JSON output, because newlines appear literally, not escaped. I believe I actually ran into some compatibility issues with this back when I was playing around with canonical JSON as well; certain JSON parsers wouldn't accept "JSON" with embedded literal newlines.

OTOH, I don't think anyone should be encouraged to eval JSON! As noted previously, there should be a strict parse function to go along with the strict serialize function.

It also is not suitable for use internally within systems that internally

use cstrings.

JSON.canonicalize(JSON.stringify("\u0000")) === "\u0000"

A literal NUL character is unrepresentable in a naive C implementation. You need to use pascal-style strings in your low-level implementation. This is an important consideration for non-JavaScript use. In my page I noted, "Because only two byte values are escaped, be aware that JSON-encoded data may contain embedded control characters and nulls." A similar warning is at least called for here.

On Fri, Mar 16, 2018 at 12:23 PM, Mike Samuel <mikesamuel at gmail.com> wrote: I also see """ It is suggested that unicode strings be represented as the UTF-8 encoding of unicode Normalization Form C www.unicode.org/reports/tr15 (UAX #15). However, arbitrary content may be represented as a string: it is not guaranteed that string contents can be meaningfully parsed as UTF-8. """ which seems to be mixing concerns about the wire format used to encode JSON as octets and NFC which would apply to the text of the JSON string.

Yes, it is rather unfortunate that we have only one datatype here and a bit of an impedance mismatch. JSON serialization is usually considered literally as a byte-stream, but JavaScript wants to parse those bytes as some encoding (usually UTF-8) of a UTF-16 string.

My suggestion is just to make this very plain in a SHOULD comment to the potential implementor. If the underlying data is unicode string data, it SHOULD be represented as the UTF-8 encoding of unicode Normalization Form C (UAX #15). However, the consumer should be aware that the data may be binary bits and not interpretable as a valid UTF-8 string.

Re:

Escape normalization: If you don't do this normalization, signatures would typically break and that's not really a "security" (=attacker) problem; it is rather a "nuisance" of the same caliber as a server not responding.

Consider signatures for malware detection. If an attacker can trivially modify their (in this example) JSON-encoded payload so that it is still "canonical" and still passes whatever input verifier exists (so much easier if there is not strict parsing!), then they can bypass your signature-based detection system. That's a security problem.

Both sides must be true: equal hashes should mean equal content (to high probability) and unequal hashes should mean different content. Otherwise there is a security problem.

# C. Scott Ananian (6 years ago)

And just to be clear: I'm all for standardizing a canonical JSON form. In addition to my 11-year-old attempt, there have been countless others, and still no standard. I just want us to learn from the previous attempts and try to make something at least as good as everything which has come before, especially in terms of the various non-obvious considerations which individual implementors have discovered the hard way over the years.

# Anders Rundgren (6 years ago)

On 2018-03-16 18:46, Mike Samuel wrote:

On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren <anders.rundgren.net at gmail.com> wrote:

On 2018-03-16 18:04, Mike Samuel wrote:

    It is entirely unsuitable to embedding in HTML or XML though.
    IIUC, with an implementation based on this

        JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
    JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`


I don't know what you are trying to prove here :-)

Only that canonical JSON is useful in a very narrow context. It cannot be embedded in an HTML script tag. It cannot be embedded in an XML or HTML foreign content context without extra care. If it contains a string literal that embeds a NUL it cannot be embedded in XML period even if extra care is taken.

If we stick to browsers, JSON.canonicalize() would presumably be used with WebCrypto, WebSocket etc.

Node.js is probably a more important target.

Related stuff: tools.ietf.org/id/draft-erdtman-jose-cleartext-jws-00.html (JSON signatures without canonicalization).

    The output of JSON.canonicalize would also not be in the subset of JSON that is also a subset of JavaScript's PrimaryExpression.

         JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`

    It also is not suitable for use internally within systems that internally use cstrings.

        JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`


JSON.canonicalize() would be [almost] identical to JSON.stringify()

You're correct.  Many JSON producers have a web-safe version, but the JavaScript builtin does not. My point is that JSON.canonicalize undoes those web-safety tweaks.

JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"'  // Returns true

"Emulator":

var canonicalize = function(object) {

     var buffer = '';
     serialize(object);

I thought canonicalize took in a string of JSON and produced the same.  Am I wrong?

Yes, it is just a variant of JSON.stringify().

"Canonicalize" to my mind means a function that returns the canonical member of an equivalence class given any member from that same equivalence class, so is always 'a -> 'a.

This is rather a canonicalizing serializer.

     return buffer;

     function serialize(object) {
         if (object !== null && typeof object === 'object') {

JSON.stringify(new Date(0)) === '"1970-01-01T00:00:00.000Z"' because Date.prototype.toJSON exists.

If you operate as a JSON_string -> JSON_string function then you can avoid this complexity.

             if (Array.isArray(object)) {
                 buffer += '[';
                 let next = false;
                 object.forEach((element) => {
                     if (next) {
                         buffer += ',';
                     }
                     next = true;
                     serialize(element);
                 });
                 buffer += ']';
             } else {
                 buffer += '{';
                 let next = false;
                 Object.keys(object).sort().forEach((property) => {
                     if (next) {
                         buffer += ',';
                     }
                     next = true; 

                     buffer += JSON.stringify(property);

I think you need a symbol check here.  JSON.stringify(Symbol.for('foo')) === undefined

                     buffer += ':';
                     serialize(object[property]);
                 });
                 buffer += '}';
             }
         } else {
             buffer += JSON.stringify(object);

This fails to distinguish non-integral numbers from integral ones, and produces non-standard output when object === undefined.  Again, not a problem if the input is required to be valid JSON.

Well, a proper implementation would build on JSON.stringify() with property sorting as the only enhancement.

# Mike Samuel (6 years ago)

On Fri, Mar 16, 2018 at 1:54 PM, C. Scott Ananian <ecmascript at cscott.net>

wrote:

And just to be clear: I'm all for standardizing a canonical JSON form. In addition to my 11-year-old attempt, there have been countless others, and still no standard. I just want us to learn from the previous attempts and try to make something at least as good as everything which has come before, especially in terms of the various non-obvious considerations which individual implementors have discovered the hard way over the years.

I think the hashing use case is an important one. At the risk of bikeshedding, "canonical" seems to overstate the usefulness. Many assume that the canonical form of something is usually the one you use in preference to any other equivalent.

If the integer-only restriction is relaxed (see below), then

  • The proposed canonical form seems useful as an input to strong hash functions.
  • It seems usable as a complete message body, but not preferable due to potential loss of precision.
  • It seems usable but not preferable as a long-term storage format.
  • It seems a source of additional risk when used in conjunction with other common web languages.

If that is correct, would people be averse to marketing this as "hashable JSON" instead of "canonical JSON"?


Numbers

There seem to be 3 main forks in the design space w.r.t. numbers. I'm sure cscott has thought of more, but I list these to make it clear why I think canonical JSON is not very useful as a wire/storage format.

  1. Integers only
     PROS: avoids floating point equality issues that have bedeviled many systems
     CONS: can support only a small portion of the JSON value space
     CONS: small loss of precision risk with integers encoded from Decimal values. For example, won't roundtrip Java BigDecimals.
  2. Any numbers with minimal changes: dropping + signs, normalizing zeros, using a fixed threshold for scientific notation.
     PROS: supports whole JSON value-space
     CONS: less useful for hashing
     CONS: risks loss of precision when decoders decide based on presence of decimal point whether to represent as double or int (see the example after this list).
  3. Preserve textual representation.
     PROS: avoids loss of precision
     PROS: can support whole JSON value-space
     CONS: not very useful for hashing
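To illustrate the decimal-point concern in fork 2, here is how ES serialization behaves (examples are mine, not from the proposal):

JSON.stringify(10.0)   // '10': a decoder can no longer tell this was a double
JSON.stringify(1e21)   // '1e+21': ES switches to exponent notation at 1e21
JSON.stringify(-0)     // '0': minus zero is normalized away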

It seems that there is a tradeoff between usefulness for hashing and the ability to support the whole JSON value-space.

Recommending this as a wire / storage format further complicates that tradeoff.

Regardless of which fork is chosen, there are some risks with the current design. For example, 1e100000 takes up some space in memory. This might allow timing attacks. Imagine an attacker can get Alice to embed 1e100000 or another number in her JSON. Alice sends that message to Bob over an encrypted channel. Bob converts the JSON to canonical JSON. If Bob refuses some JSON payloads over a threshold size or the time to process is noticeably different for 1e100000 vs 1e1 then the attacker can tell, via traffic analysis alone, when Alice communicates with Bob. We should avoid that in-memory blowup if possible.

# Anders Rundgren (6 years ago)

On 2018-03-16 19:30, Mike Samuel wrote:

  2. Any numbers with minimal changes: dropping + signs, normalizing zeros, using a fixed threshold for scientific notation.
     PROS: supports whole JSON value-space
     CONS: less useful for hashing
     CONS: risks loss of precision when decoders decide based on presence of decimal point whether to represent as double or int.

Have you actually looked into the specification? cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2 ES6 has all what it takes.

Anders

# Mike Samuel (6 years ago)

On Fri, Mar 16, 2018 at 2:43 PM, Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

On 2018-03-16 19:30, Mike Samuel wrote:

  2. Any numbers with minimal changes: dropping + signs, normalizing zeros, using a fixed threshold for scientific notation.
     PROS: supports whole JSON value-space
     CONS: less useful for hashing
     CONS: risks loss of precision when decoders decide based on presence of decimal point whether to represent as double or int.

Have you actually looked into the specification? cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2 ES6 has all it takes.

Yes, but other notions of canonical equivalence have been mentioned here so reasons to prefer one to another seem in scope.

# Anders Rundgren (6 years ago)

On 2018-03-16 19:51, Mike Samuel wrote:

On Fri, Mar 16, 2018 at 2:43 PM, Anders Rundgren <anders.rundgren.net at gmail.com> wrote:

On 2018-03-16 19:30, Mike Samuel wrote:

    2. Any numbers with minimal changes: dropping + signs, normalizing zeros,
          using a fixed threshold for scientific notation.
          PROS: supports whole JSON value-space
          CONS: less useful for hashing
          CONS: risks loss of precision when decoders decide based on presence of
             decimal point whether to represent as double or int.


Have you actually looked into the specification?
https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2
ES6 has all what it takes.

Yes, but other notions of canonical equivalence have been mentioned here so reasons to prefer one to another seem in scope.

Availability beats perfection anytime. This is the VHS (if anybody remembers that old story) of canonicalization and I don't feel too bad about that :-)

Anders

# Mike Samuel (6 years ago)

On Fri, Mar 16, 2018 at 3:03 PM, Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

On 2018-03-16 19:51, Mike Samuel wrote:

On Fri, Mar 16, 2018 at 2:43 PM, Anders Rundgren <anders.rundgren.net at gmail.com> wrote:

On 2018-03-16 19:30, Mike Samuel wrote:

    2. Any numbers with minimal changes: dropping + signs,

normalizing zeros, using a fixed threshold for scientific notation. PROS: supports whole JSON value-space CONS: less useful for hashing CONS: risks loss of precision when decoders decide based on presence of decimal point whether to represent as double or int.

Have you actually looked into the specification? https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2 ES6 has all it takes.

Yes, but other notions of canonical equivalence have been mentioned here so reasons to prefer one to another seem in scope.

Availability beats perfection anytime. This is the VHS (if anybody remember that old story) of canonicalization and I don't feel too bad about that :-)

Perhaps. Any thoughts on my question about the merits of "Hashable" vs "Canonical"?

# C. Scott Ananian (6 years ago)

I think the horse is out of the barn re hashable-vs-canonical. It has (independently) been invented and named canonical JSON many many times, starting 11 years ago.

gibson042.github.io/canonicaljson-spec, www.npmjs.com/package/another-json, www.npmjs.com/package/canonical-json, www.npmjs.com/package/keyify, www.npmjs.com/package/canonical-tent-json, www.npmjs.com/package/content-addressable-json, godoc.org/github.com/gibson042/canonicaljson-go, tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00, keybase.io/docs/api/1.0/canonical_packings#json, tools.ietf.org/html/rfc7638#section-3.3, wiki.laptop.org/go/Canonical_JSON, mirkokiefer/canonical-json, davidchambers/CANON

"Content Addressable JSON" is a variant of your "hashable JSON" proposal, though. But the "canonicals" seem to vastly outnumber the "hashables".

My question for Anders is: do you actually plan to incorporate any feedback into changes to your proposal? Or were you really just looking for us to validate your work, not actually contribute to it?

# Anders Rundgren (6 years ago)

On 2018-03-16 20:09, Mike Samuel wrote:

Availability beats perfection anytime.  This is the VHS (if anybody remember that old story) of canonicalization and I don't feel too bad about that :-)

Perhaps.  Any thoughts on my question about the merits of "Hashable" vs "Canonical"?

No, there was so much noise here that I may need a more condensed description, if possible.

Anders

# Richard Gibson (6 years ago)

Though ECMAScript JSON.stringify may suffice for certain Javascript-centric use cases or otherwise restricted subsets thereof as addressed by JOSE, it is not suitable for producing canonical/hashable/etc. JSON, which requires a fully general solution such as [1]. Both its number serialization [2] and string serialization [3] specify aspects that harm compatibility (the former having arbitrary branches dependent upon the value of numbers, the latter being capable of producing invalid UTF-8 octet sequences that represent unpaired surrogate code points—unacceptable for exchange outside of a closed ecosystem [4]). JSON is a general language-agnostic interchange format, and ECMAScript JSON.stringify is not a JSON canonicalization solution.

[1]: gibson042.github.io/canonicaljson-spec [2]: ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type [3]: ecma-international.org/ecma-262/7.0/#sec-quotejsonstring [4]: tools.ietf.org/html/rfc8259#section-8.1

# Mike Samuel (6 years ago)

On Fri, Mar 16, 2018 at 3:23 PM, Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

On 2018-03-16 20:09, Mike Samuel wrote:

Availability beats perfection anytime. This is the VHS (if anybody remembers that old story) of canonicalization and I don't feel too bad about that :-)

Perhaps. Any thoughts on my question about the merits of "Hashable" vs "Canonical"?

No, there was so much noise here that I may need a more condensed description, if possible.

In the email to which you responded "Have you actually looked ..." look for "If that is correct, would people be averse to marketing this as 'hashable JSON' instead of 'canonical JSON'?"

# Anders Rundgren (6 years ago)

On 2018-03-16 20:24, Richard Gibson wrote:

Though ECMAScript JSON.stringify may suffice for certain Javascript-centric use cases or otherwise restricted subsets thereof as addressed by JOSE, it is not suitable for producing canonical/hashable/etc. JSON, which requires a fully general solution such as [1]. Both its number serialization [2] and string serialization [3] specify aspects that harm compatibility (the former having arbitrary branches dependent upon the value of numbers, the latter being capable of producing invalid UTF-8 octet sequences that represent unpaired surrogate code points—unacceptable for exchange outside of a closed ecosystem [4]). JSON is a general language-agnostic interchange format, and ECMAScript JSON.stringify is *not* a JSON canonicalization solution.

It effectively depends on your objectives.

#2 is not really a problem; you would typically not output canonicalized JSON, it is only used internally since there are no requirements that input is canonicalized.
#3 yes, if you create bad data you can [always] screw up. It sounds BTW like a bug which will presumably get fixed some day.
#4 If you are targeting Node.js, Browsers, OpenAPI, and all other platforms compatible with those, JSON.stringify() seems to suffice.

The JSON.canonicalize() method proposal was intended for the systems specified in #4.

Perfection is often the enemy of good.

Anders

# C. Scott Ananian (6 years ago)

On Fri, Mar 16, 2018 at 4:07 PM, Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

Perfection is often the enemy of good.

So, to be clear: you don't plan on actually incorporating any feedback into your proposal, since it's already "good"?

# Mike Samuel (6 years ago)

On Fri, Mar 16, 2018 at 4:34 PM, C. Scott Ananian <ecmascript at cscott.net>

wrote:

On Fri, Mar 16, 2018 at 4:07 PM, Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

Perfection is often the enemy of good.

So, to be clear: you don't plan on actually incorporating any feedback into your proposal, since it's already "good"?

To restate my main objections:

I think any proposal to offer an alternative stringify instead of a string->string transform is not very good and could be easily improved by rephrasing it as a string->string transform.

Also, presenting this as a better wire format I think is misleading since I think it has no advantages as a wire format over JSON.stringify's output, and recommending canonical JSON, except for the short duration needed to hash it creates more problems than it solves.

# Anders Rundgren (6 years ago)

On 2018-03-16 21:41, Mike Samuel wrote:

On Fri, Mar 16, 2018 at 4:34 PM, C. Scott Ananian <ecmascript at cscott.net> wrote:

On Fri, Mar 16, 2018 at 4:07 PM, Anders Rundgren <anders.rundgren.net at gmail.com> wrote:

    Perfection is often the enemy of good.


So, to be clear: you don't plan on actually incorporating any feedback into your proposal, since it's already "good"?

I'm not going to incorporate Unicode Normalization because it is better addressed at the application level.

To restate my main objections:

I think any proposal to offer an alternative stringify instead of a string->string transform is not very good and could be easily improved by rephrasing it as a string->string transform.

Could you give a concrete example on that?

Also, presenting this as a better wire format I think is misleading

This was not my intention, I just expressed it poorly. It was rather mixed with my objection to Unicode Normalization.

since I think it has no advantages as a wire format over JSON.stringify's output,

Right, JSON.stringify() is much better for creating the external format since it honors "creation order".

and recommending canonical JSON, except for the short duration needed to hash it creates more problems than it solves.

Wrong, this is exactly what I had in mind. If the hashable/canonicalizable method works as described (does it not?) it solves the hashing problem.

Anders

# Mathias Bynens (6 years ago)

On Fri, Mar 16, 2018 at 9:04 PM, Mike Samuel <mikesamuel at gmail.com> wrote:

The output of JSON.canonicalize would also not be in the subset of JSON that is also a subset of JavaScript's PrimaryExpression.

JSON.canonicalize(JSON.stringify("\u2028\u2029")) === "\u2028\u2029"

Soon U+2028 and U+2029 will no longer be edge cases. A Stage 3 proposal (currently shipping in Chrome) makes them valid in ECMAScript string literals, making JSON a strict subset of ECMAScript: tc39/proposal

# C. Scott Ananian (6 years ago)

My main feedback is that since this topic has been covered so many times in the past, any serious standardization proposal should include a section surveying existing "canonical JSON" standards and implementations and comparing the proposed standard with prior work. A standard should be a "best of breed" implementation, which adequately replaces existing work, not just another average implementation narrowly tailored to the proposer's own particular use cases.

I don't think Unicode Normalization should necessarily be a requirement of a canonical JSON standard. But any reasonable proposal should at least acknowledge the issues raised, as well as the issues of embedded nulls, HTML safety, and the other points that have been raised in this thread (and the many other points addressed by the dozen other "canonical JSON" implementations I linked to). If you're just going to say, "my proposal is good enough", well then mine is "good enough" too, and so are the other dozen, and none of them need to be the "official JavaScript canonical form". What's your compelling argument that your proposal is better than any of the other dozen? And why start the discussion on this list if you're not going to do anything with the information you learn?

# Mike Samuel (6 years ago)

On Fri, Mar 16, 2018, 4:58 PM Anders Rundgren <anders.rundgren.net at gmail.com>

wrote:

On 2018-03-16 21:41, Mike Samuel wrote:

On Fri, Mar 16, 2018 at 4:34 PM, C. Scott Ananian <ecmascript at cscott.net> wrote:

On Fri, Mar 16, 2018 at 4:07 PM, Anders Rundgren <anders.rundgren.net at gmail.com> wrote:

To restate my main objections:

I think any proposal to offer an alternative stringify instead of a string->string transform is not very good and could be easily improved by rephrasing it as a string->string transform.

Could you give a concrete example on that?

I've given three. As written, the proposal produces invalid or low-quality output when given undefined, objects with toJSON methods, or symbols as either keys or values. These would not be problems for a real canonicalizer since none of them are present in a string of JSON.

In addition, two distant users of the canonicalizer who wish to check hashes need to agree on the ancillary arguments like the replacer if canonicalize takes the same arguments and actually uses them. They also need to agree on implementation details of toJSON methods which is a backward compatibility hazard.

If you did solve the toJSON problem by incorporating calls to that method you've now complicated cross-platform behavior. If you phrase it in terms of string->string it is much easier to disentangle the definition of the canonicalizer's JSON from JS and make it language agnostic.
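A sketch of that phrasing, reusing the sorted-key serializer posted earlier in the thread as `canonicalize` (the wrapper name is mine):

// JSON text in, canonical JSON text out. toJSON methods, undefined, symbols and
// replacer arguments never enter the picture because the input is already JSON.
var canonicalizeJsonText = function (jsonText) {
    return canonicalize(JSON.parse(jsonText));
};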

Finally, your proposal is not the VHS of canonicalizers. That would be x=>JSON.stringify(JSON.parse(x)) since it's deployed and used.

# kai zhu (6 years ago)

stepping aside from the security aspect, having your code-base’s json-files normalized with sorted-keys is good-housekeeping, especially when you want to sanely maintain ones >1mb in size (e.g. large swagger json-documentations) [1].

and you can easily operationalize your build-process / pre-commit-checks to auto-key-sort json-files with the following simple shell-function [2].

[1] kaizhu256/node-swgg-github-all/blob/2018.2.2/assets.swgg.swagger.json [2] kaizhu256/node-utility2/blob/2018.1.13/lib.utility2.sh#L1513

#!/bin/sh
# .bashrc
: '
# to install, copy-paste the shell-function shFileJsonNormalize below
# into your shell startup script (.bashrc, .profile, etc...)


# example shell-usage:

source ~/.bashrc
printf "{
    \"version\": \"0.0.1\",
    \"name\": \"my-app\",
    \"aa\": {
        \"zz\": 1,
        \"yy\": {
            \"xx\": 2,
            \"ww\": 3
        }
    },
    \"bb\": [
        3,
        2,
        1,
        null
    ]
}" > package.json

shFileJsonNormalize package.json
cat package.json


# key-sorted output:
{
    "aa": {
        "yy": {
            "ww": 3,
            "xx": 2
        },
        "zz": 1
    },
    "bb": [
        3,
        2,
        1,
        null
    ],
    "name": "my-app",
    "version": "0.0.1"
}
'


shFileJsonNormalize() {(set -e
# this shell-function will
# 1. read the json-data from $FILE
# 2. normalize the json-data
# 3. write the normalized json-data back to $FILE
    FILE="$1"
    node -e "
// <script>
/*jslint
    bitwise: true,
    browser: true,
    maxerr: 8,
    maxlen: 100,
    node: true,
    nomen: true,
    regexp: true,
    stupid: true
*/
'use strict';
var local;
local = {};
local.fs = require('fs');
local.jsonStringifyOrdered = function (jsonObj, replacer, space) {
/*
 * this function will JSON.stringify the jsonObj,
 * with object-keys sorted and circular-references removed
 */
    var circularList, stringify, tmp;
    stringify = function (jsonObj) {
    /*
     * this function will recursively JSON.stringify the jsonObj,
     * with object-keys sorted and circular-references removed
     */
        // if jsonObj is an object, then recurse its items with object-keys sorted
        if (jsonObj &&
                typeof jsonObj === 'object' &&
                typeof jsonObj.toJSON !== 'function') {
            // ignore circular-reference
            if (circularList.indexOf(jsonObj) >= 0) {
                return;
            }
            circularList.push(jsonObj);
            // if jsonObj is an array, then recurse its jsonObjs
            if (Array.isArray(jsonObj)) {
                return '[' + jsonObj.map(function (jsonObj) {
                    // recurse
                    tmp = stringify(jsonObj);
                    return typeof tmp === 'string'
                        ? tmp
                        : 'null';
                }).join(',') + ']';
            }
            return '{' + Object.keys(jsonObj)
                // sort object-keys
                .sort()
                .map(function (key) {
                    // recurse
                    tmp = stringify(jsonObj[key]);
                    if (typeof tmp === 'string') {
                        return JSON.stringify(key) + ':' + tmp;
                    }
                })
                .filter(function (jsonObj) {
                    return typeof jsonObj === 'string';
                })
                .join(',') + '}';
        }
        // else JSON.stringify as normal
        return JSON.stringify(jsonObj);
    };
    circularList = [];
    return JSON.stringify(typeof jsonObj === 'object' && jsonObj
        // recurse
        ? JSON.parse(stringify(jsonObj))
        : jsonObj, replacer, space);
};
local.fs.writeFileSync(process.argv[1], local.jsonStringifyOrdered(
    JSON.parse(local.fs.readFileSync(process.argv[1], 'utf8')),
    null,
    4
) + '\n');
// </script>
    " "$FILE"
)}

# Isiah Meadows (6 years ago)

With files frequently that size, it might be worth considering whether you should use a custom format+validator* instead. It'd take a lot less memory, which could be helpful since the first row alone of this file takes about 4-5K in Firefox when deserialized - I verified this in the console (To be exact, 5032 the first time, 4128 the second, and 4416 the third). Also, a megabyte is a lot to send down the wire in Web terms.

* In this case, you'd need a validator that uses minimal perfect hashes and a compact binary data representation that doesn't rely on a concrete start/end. That would avoid the mess of constantly having to look things up in memory, while leaving your IR much smaller. Another item of note: JS strings are 16-bit, which is wasteful in memory for your entire object.


Isiah Meadows me at isiahmeadows.com

Looking for web consulting? Or a new website? Send me an email and we can get started. www.isiahmeadows.com

# Richard Gibson (6 years ago)

On Sunday, March 18, 2018, Anders Rundgren <anders.rundgren.net at gmail.com>

wrote:

On 2018-03-16 20:24, Richard Gibson wrote:

Though ECMAScript JSON.stringify may suffice for certain Javascript-centric use cases or otherwise restricted subsets thereof as addressed by JOSE, it is not suitable for producing canonical/hashable/etc. JSON, which requires a fully general solution such as [1]. Both its number serialization [2] and string serialization [3] specify aspects that harm compatibility (the former having arbitrary branches dependent upon the value of numbers, the latter being capable of producing invalid UTF-8 octet sequences that represent unpaired surrogate code points—unacceptable for exchange outside of a closed ecosystem [4]). JSON is a general language-agnostic interchange format, and ECMAScript JSON.stringify is not a JSON canonicalization solution.

[1]: gibson042.github.io/canonicaljson-spec [2]: ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type [3]: ecma-international.org/ecma-262/7.0/#sec-quotejsonstring [4]: tools.ietf.org/html/rfc8259#section-8.1

Richard, I may be wrong but AFAICT, our respective canonicalization schemes are in fact principally IDENTICAL.

In that they have the same goal, yes. In that they both achieve that goal, no. I'm not married to choices like exponential notation and uppercase escapes, but a JSON canonicalization scheme MUST cover all of JSON.

That the number serialization provided by JSON.stringify() is unacceptable, is not generally taken as a fact. I also think it looks a bit weird, but that's just a matter of esthetics. Compatibility is an entirely different issue.

I concede this point. The modified algorithm is sufficient, but note that a canonicalization scheme will remain static even if ECMAScript changes.

Sorting on Unicode Code Points is of course "technically 100% right" but strictly put not necessary.

Certain scenarios call for different systems to independently generate equivalent data structures, and it is a necessary property of canonical serialization that it yields identical results for equivalent data structures. JSON does not specify significance of object member ordering, so member ordering does not distinguish otherwise equivalent objects, so canonicalization MUST specify member ordering that is deterministic with respect to all valid data.
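In other words (a sketch using the sorted-key serializer from earlier in the thread):

// Two independently produced but equivalent objects must serialize identically;
// a defined member ordering is what makes that deterministic.
canonicalize(JSON.parse('{"x":1,"y":2}')) === canonicalize(JSON.parse('{"y":2,"x":1}'))   // true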

Your claim about uppercase Unicode escapes is incorrect, there is no such requirement:

tools.ietf.org/html/rfc8259#section-7

I don't recall ever making a claim about uppercase Unicode escapes, other than observing that it is the preferred form for examples in the JSON RFCs... what are you talking about?

# Mike Samuel (6 years ago)

On Sun, Mar 18, 2018 at 10:08 AM, Richard Gibson <richard.gibson at gmail.com>

wrote:

On Sunday, March 18, 2018, Anders Rundgren <anders.rundgren.net at gmail.com> wrote:

On 2018-03-16 20:24, Richard Gibson wrote:

Though ECMAScript JSON.stringify may suffice for certain Javascript-centric use cases or otherwise restricted subsets thereof as addressed by JOSE, it is not suitable for producing canonical/hashable/etc. JSON, which requires a fully general solution such as [1]. Both its number serialization [2] and string serialization [3] specify aspects that harm compatibility (the former having arbitrary branches dependent upon the value of numbers, the latter being capable of producing invalid UTF-8 octet sequences that represent unpaired surrogate code points—unacceptable for exchange outside of a closed ecosystem [4]). JSON is a general language-agnostic interchange format, and ECMAScript JSON.stringify is not a JSON canonicalization solution.

[1]: gibson042.github.io/canonicaljson-spec [2]: ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type [3]: ecma-international.org/ecma-262/7.0/#sec-quotejsonstring [4]: tools.ietf.org/html/rfc8259#section-8.1

Richard, I may be wrong but AFAICT, our respective canoncalization schemes are in fact principally IDENTICAL.

In that they have the same goal, yes. In that they both achieve that goal, no. I'm not married to choices like exponential notation and uppercase escapes, but a JSON canonicalization scheme MUST cover all of JSON.

That the number serialization provided by JSON.stringify() is unacceptable, is not generally taken as a fact. I also think it looks a bit weird, but that's just a matter of esthetics. Compatibility is an entirely different issue.

I concede this point. The modified algorithm is sufficient, but note that a canonicalization scheme will remain static even if ECMAScript changes.

Does this mean that the language below would need to be fixed at a specific version of Unicode or that we would need to cite a specific version for canonicalization but might allow a higher version for String.prototype.normalize and in future versions of the spec require it?

www.ecma-international.org/ecma-262/6.0/#sec-conformance """ A conforming implementation of ECMAScript must interpret source text input in conformance with the Unicode Standard, Version 5.1.0 or later """

and in ECMA 404 www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf

""" For undated references, the latest edition of the referenced document (including any amendments) applies. ISO/IEC 10646, Information Technology – Universal Coded Character Set (UCS) The Unicode Consortium. The Unicode Standard www.unicode.org/versions/latest. """

Sorting on Unicode Code Points is of course "technically 100% right" but

strictly put not necessary.

Certain scenarios call for different systems to independently generate equivalent data structures, and it is a necessary property of canonical serialization that it yields identical results for equivalent data structures. JSON does not specify significance of object member ordering, so member ordering does not distinguish otherwise equivalent objects, so canonicalization MUST specify member ordering that is deterministic with respect to all valid data.

Code points include orphaned surrogates in a way that scalar values do not, right? So both "\uD800" and "\uD800\uDC00" are single codepoints. It seems like a strict prefix of a string should still sort before that string but prefix transitivity in general does not hold: "\uFFFF" < "\uD800\uDC00" && "\uFFFF" > "\uD800".

That shouldn't cause problems for hashability but I thought I'd raise it just in case.
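A quick illustration of the difference (default JS string comparison is by UTF-16 code unit):

'\uFFFF' < '\uD800\uDC00'   // false by code unit: 0xFFFF > 0xD800
// By Unicode code point, U+FFFF sorts before U+10000 (encoded as the pair D800 DC00),
// which is the prefix-transitivity wrinkle described above.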

# Anders Rundgren (6 years ago)

On 2018-03-18 15:08, Richard Gibson wrote:

On Sunday, March 18, 2018, Anders Rundgren <anders.rundgren.net at gmail.com> wrote:

On 2018-03-16 20:24, Richard Gibson wrote:
Though ECMAScript JSON.stringify may suffice for certain Javascript-centric use cases or otherwise restricted subsets thereof as addressed by JOSE, it is not suitable for producing canonical/hashable/etc. JSON, which requires a fully general solution such as [1]. Both its number serialization [2] and string serialization [3] specify aspects that harm compatibility (the former having arbitrary branches dependent upon the value of numbers, the latter being capable of producing invalid UTF-8 octet sequences that represent unpaired surrogate code points—unacceptable for exchange outside of a closed ecosystem [4]). JSON is a general language-agnostic interchange format, and ECMAScript JSON.stringify is not a JSON canonicalization solution.

[1]: http://gibson042.github.io/canonicaljson-spec/
[2]: http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
[3]: http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
[4]: https://tools.ietf.org/html/rfc8259#section-8.1
Richard, I may be wrong but AFAICT, our respective canoncalization schemes are in fact principally IDENTICAL.

In that they have the same goal, yes. In that they both achieve that goal, no. I'm not married to choices like exponential notation and uppercase escapes, but a JSON canonicalization scheme MUST cover all of JSON.

Here it gets interesting...  What in JSON cannot be expressed through JS and JSON.stringify()?

That the number serialization provided by JSON.stringify() is unacceptable is not generally taken as a fact.  I also think it looks a bit weird, but that's just a matter of esthetics.  Compatibility is an entirely different issue.

I concede this point. The modified algorithm is sufficient, but note that a canonicalization scheme will remain static even if ECMAScript changes.

Agreed.

Sorting on Unicode Code Points is of course "technically 100% right" but strictly put not necessary.

Certain scenarios call for different systems to independently generate equivalent data structures, and it is a necessary property of canonical serialization that it yields identical results for equivalent data structures. JSON does not specify significance of object member ordering, so member ordering does not distinguish otherwise equivalent objects, so canonicalization MUST specify member ordering that is deterministic with respect to all valid data.

Violently agree but do not understand (I guess I'm just dumb...) why (for example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal (although the result would differ).

Your claim about uppercase Unicode escapes is incorrect, there is no such requirement:

https://tools.ietf.org/html/rfc8259#section-7 <https://tools.ietf.org/html/rfc8259#section-7>

I don't recall ever making a claim about uppercase Unicode escapes, other than observing that it is the preferred form for examples in the JSON RFCs... what are you talking about?

You're right, I found it in the gibson042.github.io/canonicaljson-spec/#changelog

Thanx, Anders

# C. Scott Ananian (6 years ago)

On Sun, Mar 18, 2018, 10:30 AM Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

Violently agree but do not understand (I guess I'm just dumb...) why (for example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal (although the result would differ).

Because there are JavaScript strings which do not form valid UTF-16 code units. For example, the one-character string '\uD800'. On the input validation side, there are 8-bit strings which can not be decoded as UTF-8. A complete sorting spec needs to describe how these are to be handled. For example, something like WTF-8: simonsapin.github.io/wtf-8

# Michał Wadas (6 years ago)

JSON supports arbitrary precision numbers that can't be properly represented as 64 bit floats. This includes numbers such as 1e9999 or 1/1e9999.
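To see the effect in plain ECMAScript (ordinary JSON.parse/JSON.stringify, not any proposed API), a round trip through Number alters all of these:

JSON.stringify(JSON.parse('1e9999'));            // 'null'  (parses to Infinity, which JSON cannot represent)
JSON.stringify(JSON.parse('1e-9999'));           // '0'
JSON.stringify(JSON.parse('9007199254740993'));  // '9007199254740992' (precision loss above 2^53)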

# Mike Samuel (6 years ago)

On Sun, Mar 18, 2018 at 10:43 AM, C. Scott Ananian <ecmascript at cscott.net> wrote:

On Sun, Mar 18, 2018, 10:30 AM Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

Violently agree but do not understand (I guess I'm just dumb...) why (for example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal (although the result would differ).

Because there are JavaScript strings which do not form valid UTF-16 code units. For example, the one-character string '\uD800'. On the input validation side, there are 8-bit strings which can not be decoded as UTF-8. A complete sorting spec needs to describe how these are to be handled. For example, something like WTF-8: simonsapin.github.io/wtf-8/

Let's get terminology straight. "\uD800" is a valid string of UTF-16 code units. It is also a valid string of codepoints. It is not a valid string of scalar values.

www.unicode.org/glossary/#code_point : Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF₁₆.
www.unicode.org/glossary/#code_unit : The minimal bit combination that can represent a unit of encoded text for processing or interchange.
www.unicode.org/glossary/#unicode_scalar_value : Any Unicode code point except high-surrogate and low-surrogate code points. In other words, the ranges of integers 0 to D7FF₁₆ and E000₁₆ to 10FFFF₁₆ inclusive.

# Mike Samuel (6 years ago)

On Sun, Mar 18, 2018 at 10:47 AM, Michał Wadas <michalwadas at gmail.com> wrote:

JSON supports arbitrary precision numbers that can't be properly represented as 64 bit floats. This includes numbers such as 1e9999 or 1/1e9999.

I posted this on the summary thread but not here.

gist.github.com/mikesamuel/20710f94a53e440691f04bf79bc3d756 is structured as a string to string transform, so doesn't lose precision when round-tripping, e.g. Python bigints and Java BigDecimals.

It also avoids a space explosion for 1e9999 which might help blunt timing attacks as discussed earlier in this thread.
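As a purely hypothetical sketch of that idea (this is not the gist's actual algorithm, and it only strips redundant zeros and signs rather than picking a single mantissa/exponent form), a JSON number literal can be normalized as a text-to-text rewrite without ever constructing a Number:

function normalizeJsonNumberLiteral(literal) {
  // Split the literal into sign, integer, fraction, and exponent parts.
  const m = /^(-?)(\d+)(?:\.(\d+))?(?:[eE]([+-]?\d+))?$/.exec(literal);
  if (!m) throw new SyntaxError('not a JSON number: ' + literal);
  let [, sign, int, frac = '', exp = '0'] = m;
  frac = frac.replace(/0+$/, '');                                   // trim trailing zeros in the fraction
  int = int.replace(/^0+(?=\d)/, '');                               // trim redundant leading zeros
  const e = exp.replace(/^\+/, '').replace(/^(-?)0+(?=\d)/, '$1');  // '+05' -> '5', '-007' -> '-7'
  return sign + int + (frac ? '.' + frac : '') + (e !== '0' ? 'e' + e : '');
}

normalizeJsonNumberLiteral('10.250e+05');            // '10.25e5'
normalizeJsonNumberLiteral('123456789012345678901'); // all digits preserved, unlike Number()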

# Anders Rundgren (6 years ago)

On 2018-03-18 15:47, Michał Wadas wrote:

JSON supports arbitrary precision numbers that can't be properly represented as 64 bit floats. This includes numbers such as 1e9999 or 1/1e9999.

rfc7159: Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision

If interoperability is not an issue you are free to do whatever you feel useful. Targeting a 0.001% customer base with standards is something I gladly leave to others to cater for.

The de-facto standard, featured in any number of applications, is putting unusual/binary/whatever stuff in text strings.

Anders

# Mike Samuel (6 years ago)

Interop with systems that use 64b ints is not a .001% issue.

# Richard Gibson (6 years ago)

On Sun, Mar 18, 2018 at 10:29 AM, Mike Samuel <mikesamuel at gmail.com> wrote:

Does this mean that the language below would need to be fixed at a specific version of Unicode or that we would need to cite a specific version for canonicalization but might allow a higher version for String.prototype.normalize and in future versions of the spec require it?

www.ecma-international.org/ecma-262/6.0/#sec-conformance """ A conforming implementation of ECMAScript must interpret source text input in conformance with the Unicode Standard, Version 5.1.0 or later """

and in ECMA 404 www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf

""" For undated references, the latest edition of the referenced document (including any amendments) applies. ISO/IEC 10646, Information Technology – Universal Coded Character Set (UCS) The Unicode Consortium. The Unicode Standard www.unicode.org/versions/latest. """

I can't see why either would have to change. JSON canonicalization should produce a JSON text in UTF-8, using JSON escape sequences only for double quote, backslash, and ASCII control characters U+0000 through U+001F (which are not valid in JSON strings) and unpaired surrogates U+D800 through U+DFFF (which are not conforming UTF-8). The algorithm doesn't need to know whether any given code point has a UCS assignment.
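A rough sketch of such a serializer follows; it is only meant to illustrate the rule set above, and a real specification might also use the short escapes such as \n for some control characters:

function serializeJsonString(s) {
  let out = '"';
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c === 0x22 || c === 0x5c) {
      out += '\\' + s[i];                                 // escape '"' and '\'
    } else if (c <= 0x1f) {
      out += '\\u' + c.toString(16).padStart(4, '0');     // escape ASCII controls
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length &&
               s.charCodeAt(i + 1) >= 0xdc00 && s.charCodeAt(i + 1) <= 0xdfff) {
      out += s[i] + s[i + 1];                             // well-formed surrogate pair: keep it literal
      i++;
    } else if (c >= 0xd800 && c <= 0xdfff) {
      out += '\\u' + c.toString(16).padStart(4, '0');     // unpaired surrogate: escape it
    } else {
      out += s[i];                                        // everything else, including non-ASCII, stays as-is
    }
  }
  return out + '"';
}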

Code points include orphaned surrogates in a way that scalar values do not, right? So both "\uD800" and "\uD800\uDC00" are single codepoints. It seems like a strict prefix of a string should still sort before that string but prefix transitivity in general does not hold: "\uFFFF" < "\uD800\uDC00" && "\uFFFF" > "\uD800". That shouldn't cause problems for hashability but I thought I'd raise it just in case.

IMO, "\uD800\uDC00" should never be emitted because a proper canonicalization would be "𐀀" (character sequence U+0022 QUOTATION MARK, U+10000 LINEAR B SYLLABLE B008 A, U+0022 QUOTATION MARK; octet sequence 0x22, 0xF0, 0x90, 0x80, 0x80, 0x22).

As for sorting, using the represented code points makes sense to me, but is not the only option (e.g., another option is using the literal characters of the JSON text such that "Z" < "\"" < "\\" < "\u0000" < "\u001F" < "\uD800" < "\uDC00" < "^" < "x" < "ä" < "가" < "Ａ" < "🔥" < "🙃"). Any specification of a total deterministic ordering would suffice, it's just that some are less intuitive than others.

On Sun, Mar 18, 2018 at 10:30 AM, Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

On 2018-03-18 15:08, Richard Gibson wrote:

In that they have the same goal, yes. In that they both achieve that goal, no. I'm not married to choices like exponential notation and uppercase escapes, but a JSON canonicalization scheme MUST cover all of JSON.

Here it gets interesting... What in JSON cannot be expressed through JS and JSON.stringify()?

JSON can express arbitrary numbers, but ECMAScript JSON.stringify is limited to those with an exact IEEE 754 binary64 representation.

And probably more importantly (though not a gap with respect to JSON specifically), it emits octet sequences that don't conform to UTF-8 when serializing unpaired surrogates.
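Both limitations are easy to observe in an engine (pre-ES2019 behavior noted where relevant):

JSON.stringify(10000000000000001);  // '10000000000000000' (the nearest binary64 value)
JSON.stringify('\uD800');           // engines predating the ES2019 "well-formed JSON.stringify" change
                                    // emit the lone surrogate literally; newer engines emit '"\\ud800"'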

Certain scenarios call for different systems to independently generate equivalent data structures, and it is a necessary property of canonical serialization that it yields identical results for equivalent data structures. JSON does not specify significance of object member ordering, so member ordering does not distinguish otherwise equivalent objects, so canonicalization MUST specify member ordering that is deterministic with respect to all valid data.

Violently agree but do not understand (I guess I'm just dumb...) why (for example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal (although the result would differ).

Any specification of a total deterministic ordering would suffice. Relying upon 16-bit code units would impose a greater burden on systems that do not use such representations internally, but is not fundamentally broken.

# Anders Rundgren (6 years ago)

On 2018-03-18 16:47, Mike Samuel wrote:

Interop with systems that use 64b ints is not a .001% issue.

Certainly not, but using "Number" for dealing with such data would never be considered by, for example, the IETF.

This discussion (at least from my point of view), is about creating stuff that fits into standards.

Anders

# Mike Samuel (6 years ago)

A definition of canonical that is not tied to JavaScript's current range of values would fit into more standards than the proposal as it stands.

# Anders Rundgren (6 years ago)

On 2018-03-18 18:40, Mike Samuel wrote:

A definition of canonical that is not tied to JavaScript's current range of values would fit into more standards than the proposal as it stands.

Feel free submitting an Internet-Draft which addresses a more generic Number handling. My guess is that it would be rejected due to [quite valid] interoperability concerns.

It would probably fall in the same category as "Fixing JSON" which has not happened either. www.tbray.org/ongoing/When/201x/2016/08/20/Fixing-JSON

Anders

# Mike Samuel (6 years ago)

I think you misunderstood the criticism. JSON does not have numeric precision limits. There are plenty of systems that use JSON that never involve JavaScript and which pack int64s.

# Anders Rundgren (6 years ago)

On 2018-03-18 19:04, Mike Samuel wrote:

I think you misunderstood the criticism.  JSON does not have numeric precision limits.

I think I understood that, yes.

There are plenty of systems that use JSON that never involve JavaScript and which pack int64s.

Sure, but if these systems use the "Number" type they belong to a proprietary world where disregarding recommendations and best practices is OK.

BTW, this is an ECMAScript mailing list, why push non-JS compliant ideas here?

Anders

# Mike Samuel (6 years ago)

On Sun, Mar 18, 2018 at 2:18 PM, Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

On 2018-03-18 19:04, Mike Samuel wrote:

I think you misunderstood the criticism. JSON does not have numeric precision limits.

I think I understood that, yes.

There are plenty of systems that use JSON that never involve JavaScript and which pack int64s.

Sure, but if these systems use the "Number" type they belong to a proprietary world where disregarding recommendations and best practices is OK.

No. They are simply not following a SHOULD recommendation. I think you have a variance mismatch in your argument.

BTW, this is an ECMAScript mailing list, why push non-JS compliant ideas here?

Let's review.

You asserted "This discussion (at least from my point of view), is about creating stuff that fits into standards."

I agreed and pointed out that not tying the definition to JavaScript's current value limitations would allow it to fit into standards that do not assume those limitations.

You leveled this criticism: "My guess is that it would be rejected due to [quite valid] interoperability concerns." Implicit in that is when one standard specifies that an input MUST have a property that conflicts with an output that a conforming implementation MAY or SHOULD produce then you have an interoperability concern.

But, you are trying to argue that your proposal is more interoperable because it works for fewer inputs in fewer contexts and, if it were ported to other languages, would reject JSON that is parseable without loss of precision in those languages. How you can say with a straight face that being non-runtime-agnostic makes a proposal more interoperable is beyond me.

Here's where variance comes in. MUST on output makes a standard more interoperable. MAY on input makes a standard more interoperable.

SHOULD and SHOULD NOT do not justify denying service. They are guidelines that should be followed absent a compelling reason -- specific rules trumps the general.

Your proposal is less interoperable because you are quoting a SHOULD, interpreting it as MUST and saying inputs MUST fit into an IEEE 754 double without loss of precision.

This makes it strictly less interoperable than a proposal that does not have that constraint.

ECMAScript SHOULD encourage interoperability since it is often a glue language.

At the risk of getting meta-, TC39 SHOULD prefer library functions that provide service for arbitrary inputs in their range. TC39 SHOULD prefer library functions that MUST NOT, by virtue of their semantics, lose precision silently.

Your proposal fails to be more interoperable inasmuch as it reproduces JSON.stringify(JSON.parse('1e1000')) === 'null'

There is simply no need to convert a JSON string to JavaScript values in order to hash it. There is simply no need to specify this in terms of JavaScript values when a runtime agnostic implementation that takes a string and produces a string provides the same value.
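As a sketch of that separation (Node.js crypto is assumed here, and canonicalText stands for the output of whichever string-to-string canonicalizer is chosen), the hash never touches JavaScript Number values:

const crypto = require('crypto');

function hashCanonicalJsonText(canonicalText) {
  // Hash the canonical text directly; no JSON.parse and no Number conversion involved.
  return crypto.createHash('sha256').update(canonicalText, 'utf8').digest('hex');
}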

This is all getting very tedious though. I and others have been trying to move towards consensus on what a hashable form of JSON should look like.

We've identified key areas including

  • property ordering,
  • number canonicalization,
  • string normalization,
  • whether the input should be a JS value or a string of JSON,
  • and others

but, as in this case, you seem to be arguing both sides of a position to support your proposal when you could just say "yes, the proposal could be adjusted along this dimension and still provide what's required."

If you plan on putting a proposal before TC39, are you willing to move on any of these, or are you asking for a YES/NO vote on a proposal that is largely the same as what you've presented?

If the former, then acknowledge that there is a range of options and collect feedback instead of sticking to "the presently drafted one is good enough." If the latter, then I vote NO because I think the proposal in its current form is a poor solution to the problem.

That's not to say that you've done bad work. Most non-incremental stage 0 proposals are poor, and the process is designed to integrate the ideas of people in different specialties to turn poor solutions to interesting problems into robust solutions to a wider range of problems than originally envisioned.

# Anders Rundgren (6 years ago)

On 2018-03-18 20:15, Mike Samuel wrote:

I and others have been trying to move towards consensus on what a hashable form of JSON should look like.

We've identified key areas including

  • property ordering,
  • number canonicalization,
  • string normalization,
  • whether the input should be a JS value or a string of JSON,
  • and others

but, as in this case, you seem to be arguing both sides of a position to support your proposal when you could just say "yes, the proposal could be adjusted along this dimension and still provide what's required."

For good or for worse, my proposal is indeed about leveraging ES6's take on JSON including limitations, {bugs}, and all. I'm not backing from that position because then things get way more complex and probably never even happen.

Extending [*] the range of "Number" is pretty much (in practical terms) the same thing as changing JSON itself.

"Number" is indeed mindless crap but it is what is.

OTOH, the "Number" problem was effectively solved some 10 years ago through putting stuff in "strings". Using JSON Schema or "Old School" strongly typed programmatic solutions of the kind I use, this actually works great.

Anders

*] The RFC gives you the right to do that but existing implementations do not.

# Mike Samuel (6 years ago)

On Sun, Mar 18, 2018, 4:50 PM Anders Rundgren <anders.rundgren.net at gmail.com> wrote:

On 2018-03-18 20:15, Mike Samuel wrote:

I and others have been trying to move towards consensus on what a hashable form of JSON should look like.

We've identified key areas including

  • property ordering,
  • number canonicalization,
  • string normalization,
  • whether the input should be a JS value or a string of JSON,
  • and others

but, as in this case, you seem to be arguing both sides of a position to support your proposal when you could just say "yes, the proposal could be adjusted along this dimension and still provide what's required."

For good or for worse, my proposal is indeed about leveraging ES6's take on JSON including limitations, {bugs}, and all. I'm not backing from that position because then things get way more complex and probably never even happen.

Extending [*] the range of "Number" is pretty much (in practical terms) the same thing as changing JSON itself.

Your proposal is limiting Number; my alternative is not extending Number.

"Number" is indeed mindless crap but it is what is.

# Anders Rundgren (6 years ago)

On 2018-03-18 21:53, Mike Samuel wrote:

For good or for worse, my proposal is indeed about leveraging ES6's take on JSON including limitations, {bugs}, and all.
I'm not backing from that position because then things get way more complex and probably never even happen.

Extending [*] the range of "Number" is pretty much (in practical terms) the same thing as changing JSON itself.

Your proposal is limiting Number; my alternative is not extending Number.

Quoting earlier messages from you:

"Your proposal is less interoperable because you are quoting a SHOULD, interpreting it as MUST and saying inputs MUST fit into an IEEE 754 double without loss of precision. This makes it strictly less interoperable than a proposal that does not have that constraint"

"JSON does not have numeric precision limits. There are plenty of systems that use JSON that never involve JavaScript and which pack int64s"

Well, it took a while figuring this out. No harm done. Nobody died.

I think we can safely put this thread to rest now; you want to fix a problem that was fixed more than 10 years back through other measures [*].

Thanx, Anders

*] Cryptography using JSON exchanges integers that are 256 bits long and more.
Business systems using JSON exchange long decimal numbers.
Scientific systems cramming 80-bit IEEE-754 into "Number" may exist, but then we are probably talking about research projects using forked/home-grown JSON software.

"Number" was never sufficient and will (IMO MUST) remain in its crippled form, at least if we stick to mainstream.

# Mike Samuel (6 years ago)

How does the transform you propose differ from?

JSON.canonicalize = (x) => JSON.stringify(
    x,
    (_, x) => {
      if (x && typeof x === 'object' && !Array.isArray(x)) {
        const sorted = {}
        for (let key of Object.getOwnPropertyNames(x).sort()) {
          sorted[key] = x[key]
        }
        return sorted
      }
      return x
    })

The proposal says "in lexical (alphabetical) order." If "lexical order" differs from the lexicographic order that sort uses, then the above could be adjusted to pass a comparator function.

Applied to your example input,

JSON.canonicalize({ "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\"/", "other": [null, true, false], "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001] }) ===

String.raw{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]} // proposed {"escaping":"\u20ac$\u000f\nA'B"\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}

The canonicalized example from section 3.2.3 seems to conflict with the text of 3.2.2:

""" If the Unicode value is outside of the ASCII control character range, it MUST be serialized "as is" unless it is equivalent to 0x005c () or 0x0022 (") which MUST be serialized as \ and " respectively. """

So I think the "\u20ac" should actually be "€" and the implementation above matches your proposal.

# Anders Rundgren (6 years ago)

On 2018-03-19 14:34, Mike Samuel wrote:

How does the transform you propose differ from?

JSON.canonicalize = (x) => JSON.stringify(
    x,
    (_, x) => {
      if (x && typeof x === 'object' && !Array.isArray(x)) {
        const sorted = {}
        for (let key of Object.getOwnPropertyNames(x).sort()) {
          sorted[key] = x[key]
        }
        return sorted
      }
      return x
    })

Probably not all. You are the JS guru, not me :-)

The proposal says "in lexical (alphabetical) order." If "lexical order" differs from the lexicographic order that sort uses, then the above could be adjusted to pass a comparator function.

I hope (and believe) that this is just a terminology problem.

Applied to your example input,

JSON.canonicalize({
      "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
      "other": [null, true, false],
      "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
    }) ===
        String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
// proposed {"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}

The canonicalized example from section 3.2.3 seems to conflict with the text of 3.2.2:

If you look right under the result you will find a pretty sad explanation:

     "Note: \u20ac denotes the Euro character, which not
      being ASCII, is currently not displayable in RFCs"

After 30 years with RFCs, we can still only use ASCII :-( :-(

Updates: cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md, cyberphone.github.io/doc/security/browser-json-canonicalization.html

Anders

# Mike Samuel (6 years ago)

On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

On 2018-03-19 14:34, Mike Samuel wrote:

How does the transform you propose differ from?

JSON.canonicalize = (x) => JSON.stringify(
    x,
    (_, x) => {
      if (x && typeof x === 'object' && !Array.isArray(x)) {
        const sorted = {}
        for (let key of Object.getOwnPropertyNames(x).sort()) {
          sorted[key] = x[key]
        }
        return sorted
      }
      return x
    })

Probably not all. You are the JS guru, not me :-)

The proposal says "in lexical (alphabetical) order." If "lexical order" differs from the lexicographic order that sort uses, then the above could be adjusted to pass a comparator function.

I hope (and believe) that this is just a terminology problem.

I think you're right. www.ecma-international.org/ecma-262/6.0/#sec-sortcompare is where it's specified. After checking that no custom comparator is present:

  1. Let xString be ToString(x).
  2. ReturnIfAbrupt(xString).
  3. Let yString be ToString(y).
  4. ReturnIfAbrupt(yString).
  5. If xString < yString, return −1.
  6. If xString > yString, return 1.
  7. Return +0.

(<) and (>) do not themselves bring in any locale-specific collation rules.

They bottom out on www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison

If both px and py are Strings, then

  1. If py is a prefix of px, return false. (A String value p is a prefix of String value q if q can be the result of concatenating p and some other String r. Note that any String is a prefix of itself, because r may be the empty String.)
  2. If px is a prefix of py, return true.
  3. Let k be the smallest nonnegative integer such that the code unit at index k within px is different from the code unit at index k within py. (There must be such a k, for neither String is a prefix of the other.)
  4. Let m be the integer that is the code unit value at index k within px.
  5. Let n be the integer that is the code unit value at index k within py.
  6. If m < n, return true. Otherwise, return false.

Those code unit values are UTF-16 code unit values per www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type

each element in the String is treated as a UTF-16 code unit value

As someone mentioned earlier in this thread, lexicographic string comparisons that use different code unit sizes can compute different results for the same semantic string value. Between UTF-8 and UTF-32 you should see no difference, but UTF-16 can differ from those given supplementary codepoints.

It might be worth making explicit that your lexical order is over UTF-16 strings if that's what you intend.
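One concrete pair where the two orders disagree (U+FF21 FULLWIDTH LATIN CAPITAL LETTER A versus the emoji U+1F600):

'\u{1F600}' < '\uFF21';                                // true:  UTF-16 code units compare 0xD83D < 0xFF21
'\u{1F600}'.codePointAt(0) < '\uFF21'.codePointAt(0);  // false: code points compare 0x1F600 > 0xFF21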

Applied to your example input,

JSON.canonicalize({ "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\"/", "other": [null, true, false], "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001] }) === String.raw{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[ 1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]} // proposed {"escaping":"\u20ac$\u000f\nA'B"\\"/","numbers":[1e+30,4 .5,6,0.002,1e-27],"other":[null,true,false]}

The canonicalized example from section 3.2.3 seems to conflict with the text of 3.2.2:

If you look right under the result you will find a pretty sad explanation:

    "Note: \u20ac denotes the Euro character, which not
     being ASCII, is currently not displayable in RFCs"

Cool.

After 30 years with RFCs, we can still only use ASCII :-( :-(

Updates: cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md, cyberphone.github.io/doc/security/browser-json-canonicalization.html

If this can be implemented in a small amount of library code, what do you need from TC39?

# Anders Rundgren (6 years ago)

On 2018-03-19 15:17, Mike Samuel wrote:

On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren <anders.rundgren.net at gmail.com <mailto:anders.rundgren.net at gmail.com>> wrote:

On 2018-03-19 14:34, Mike Samuel wrote:

    How does the transform you propose differ from?

    JSON.canonicalize = (x) => JSON.stringify(
          x,
          (_, x) => {
            if (x && typeof x === 'object' && !Array.isArray(x)) {
              const sorted = {}
              for (let key of Object.getOwnPropertyNames(x).sort()) {
                sorted[key] = x[key]
              }
              return sorted
            }
            return x
          })


Probably not all.  You are the JS guru, not me :-)


    The proposal says "in lexical (alphabetical) order."
    If "lexical order" differs from the lexicographic order that sort uses, then
    the above could be adjusted to pass a comparator function.


I hope (and believe) that this is just a terminology problem.

I think you're right. www.ecma-international.org/ecma-262/6.0/#sec-sortcompare is where it's specified.  After checking that no custom comparator is present:

  1. Let xString be ToString(x).
  2. ReturnIfAbrupt(xString).
  3. Let yString be ToString(y).
  4. ReturnIfAbrupt(yString).
  5. If xString < yString, return −1.
  6. If xString > yString, return 1.
  7. Return +0.

(<) and (>) do not themselves bring in any locale-specific collation rules. They bottom out on www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison

If both px and py are Strings, then

  1. If py is a prefix of px, return false. (A String value p is a prefix of String value q if q can be the result of concatenating p and some other String r. Note that any String is a prefix of itself, because r may be the empty String.)
  2. If px is a prefix of py, return true.
  3. Let k be the smallest nonnegative integer such that the code unit at index k within px is different from the code unit at index k within py. (There must be such a k, for neither String is a prefix of the other.)
  4. Let m be the integer that is the code unit value at index k within px.
  5. Let n be the integer that is the code unit value at index k within py.
  6. If m < n, return true. Otherwise, return false.

Those code unit values are UTF-16 code unit values per www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type

each element in the String is treated as a UTF-16 code unit value

As someone mentioned earlier in this thread, lexicographic string comparisons that use different code unit sizes can compute different results for the same semantic string value.  Between UTF-8 and UTF-32 you should see no difference, but UTF-16 can differ from those given supplementary codepoints.

It might be worth making explicit that your lexical order is over UTF-16 strings if that's what you intend.

Right, it is actually already in 3.2.3:

Property strings to be sorted depend on that strings are represented as arrays of 16-bit unsigned integers where each integer holds a single UCS2/UTF-16 [UNICODE] code unit. The sorting is based on pure value comparisons, independent of locale settings.

This maps "natively" to JS and Java. Probably to .NET as well. Other systems may need a specific comparator.

    Applied to your example input,

    JSON.canonicalize({
          "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
          "other":  [null, true, false],
          "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
        }) ===
            String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
    // proposed {"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}


    The canonicalized example from section 3.2.3 seems to conflict with the text of 3.2.2:


If you look right under the result you will find a pretty sad explanation:

         "Note: \u20ac denotes the Euro character, which not
          being ASCII, is currently not displayable in RFCs"

Cool.

After 30 years with RFCs, we can still only use ASCII :-( :-(

Updates:
https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
https://cyberphone.github.io/doc/security/browser-json-canonicalization.html

If this can be implemented in a small amount of library code, what do you need from TC39?

At this stage probably nothing; the BIG issue is the algorithm, which I took the liberty of airing in this forum. To date, all efforts at creating a JSON canonicalization standard have been shot down or abandoned.

Anders

# Mike Samuel (6 years ago)

On Mon, Mar 19, 2018 at 10:30 AM, Anders Rundgren < anders.rundgren.net at gmail.com> wrote:

On 2018-03-19 15:17, Mike Samuel wrote:

On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren < anders.rundgren.net at gmail.com <mailto:anders.rundgren.net at gmail.com>> wrote:

On 2018-03-19 14:34, Mike Samuel wrote:

    How does the transform you propose differ from?

    JSON.canonicalize = (x) => JSON.stringify(
          x,
          (_, x) => {
            if (x && typeof x === 'object' && !Array.isArray(x)) {
              const sorted = {}
              for (let key of Object.getOwnPropertyNames(x).sort()) {
                sorted[key] = x[key]
              }
              return sorted
            }
            return x
          })


Probably not all.  You are the JS guru, not me :-)


    The proposal says "in lexical (alphabetical) order."
    If "lexical order" differs from the lexicographic order that sort

uses, then the above could be adjusted to pass a comparator function.

I hope (and believe) that this is just a terminology problem.

I think you're right. www.ecma-international.org/ecma-262/6.0/#sec-sortcompare is where it's specified. After checking that no custom comparator is present:

  1. Let xString be ToString(x).
  2. ReturnIfAbrupt(xString).
  3. Let yString be ToString(y).
  4. ReturnIfAbrupt(yString).
  5. If xString < yString, return −1.
  6. If xString > yString, return 1.
  7. Return +0.

(<) and (>) do not themselves bring in any locale-specific collation rules. They bottom out on www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison

If both px and py are Strings, then

  1. If py is a prefix of px, return false. (A String value p is a prefix of String value q if q can be the result of concatenating p and some other String r. Note that any String is a prefix of itself, because r may be the empty String.)
  2. If px is a prefix of py, return true.
  3. Let k be the smallest nonnegative integer such that the code unit at index k within px is different from the code unit at index k within py. (There must be such a k, for neither String is a prefix of the other.)
  4. Let m be the integer that is the code unit value at index k within px.
  5. Let n be the integer that is the code unit value at index k within py.
  6. If m < n, return true. Otherwise, return false.

Those code unit values are UTF-16 code unit values per www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type

each element in the String is treated as a UTF-16 code unit value

As someone mentioned earlier in this thread, lexicographic string comparisons that use different code unit sizes can compute different results for the same semantic string value. Between UTF-8 and UTF-32 you should see no difference, but UTF-16 can differ from those given supplementary codepoints.

It might be worth making explicit that your lexical order is over UTF-16 strings if that's what you intend.

Right, it is actually already in 3.2.3:

My apologies. I missed that.

Property strings to be sorted depend on that strings are represented as arrays of 16-bit unsigned integers where each integer holds a single UCS2/UTF-16 [UNICODE] code unit. The sorting is based on pure value comparisons, independent of locale settings.

This maps "natively" to JS and Java. Probably to .NET as well. Other systems may need a specific comparator.

Yep. Off the top of my head: Go and Rust use UTF-8. Python3 is UTF-16, Python2 is usually UTF-16 but may be UTF-32 depending on sizeof(wchar) when compiling the interpreter. C++ as is its wont is all of them.

    Applied to your example input,

    JSON.canonicalize({
      "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
      "other": [null, true, false],
      "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
    }) ===
        String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
    // proposed {"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}

    The canonicalized example from section 3.2.3 seems to conflict with the text of 3.2.2:

If you look right under the result you will find a pretty sad explanation:

         "Note: \u20ac denotes the Euro character, which not
          being ASCII, is currently not displayable in RFCs"

Cool.

After 30 years with RFCs, we can still only use ASCII :-( :-(

Updates:
https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
https://cyberphone.github.io/doc/security/browser-json-canonicalization.html

If this can be implemented in a small amount of library code, what do you need from TC39?

At this stage probably nothing; the BIG issue is the algorithm, which I took the liberty of airing in this forum. To date, all efforts at creating a JSON canonicalization standard have been shot down or abandoned.

Like I said, I think the hashing use case is worthwhile.

# Michael J. Ryan (6 years ago)

JSON is UTF-8... As far as 16-bit code points go, there are still astral character pairs. Binary data should be encoded to avoid this, such as with base-64.