JSON decoding
On Oct 11, 2006, at 3:06 PM, Robert Sayre wrote:
I'm pretty sure the group accepted some form of Crockford's parseJSON proposal, but I think it would be handy to add something analogous to simplejson's object_hook argument. One weakness JSON has is annotating literals (like strings with custom attributes), and this facility can make that smoother.
<svn.red-bean.com/bob/simplejson/trunk/docs/module-simplejson.html#load>
See "Specializing JSON object decoding" above as well.
Thanks, the wiki will be re-exported after the next face-to-face TG1
meeting (which is the end of next week). We should have finalized
the JSON APIs by then. We were already considering an optional
filter funarg, but Bob's object_hook is more powerful. I've added a
comment to the wiki with that link.
On Oct 11, 2006, at 8:06 PM, Brendan Eich wrote:
On Oct 11, 2006, at 3:06 PM, Robert Sayre wrote:
I'm pretty sure the group accepted some form of Crockford's parseJSON proposal, but I think it would be handy to add something analogous to simplejson's object_hook argument. One weakness JSON has is annotating literals (like strings with custom attributes), and this facility can make that smoother.
<svn.red-bean.com/bob/simplejson/trunk/docs/module-simplejson.html#load>
See "Specializing JSON object decoding" above as well.
Thanks, the wiki will be re-exported after the next face-to-face
TG1 meeting (which is the end of next week). We should have
finalized the JSON APIs by then. We were already considering an
optional filter funarg, but Bob's object_hook is more powerful.
I've added a comment to the wiki with that link.
Sneak preview (wiki re-export coming soon) of Doug Crockford's latest
rev of the proposal:
JSON encoding and decoding
The following built-in functions facilitate handling JSON data, as
specified in Douglas Crockford's RFC 4627.
Encoding
* Object.prototype.toJSONString( filter , pretty )
Returns a String containing the JSON representation of an object,
array, date, string, number, boolean, or null . The proto link is not
used when finding members. A Date object is serialized as an ISO date
string in double quotes. An Array object is serialized as a sequence
of comma separated values wrapped in square brackets. Otherwise, an
object is serialized as a sequence of comma separated pairs
wrapped in curly braces. A string is serialized as a quoted string
with backslash escapement. A finite number is serialized by toString.
true , false , and null are encoded as unquoted strings. See
http://www.ietf.org/rfc/rfc4627.txt?number=4627
An encodingError will be thrown if the encoder reaches a value which
* cannot be encoded as valid JSON (such as a function,
undefined, and NaN) OR
* represents a cycle (as happens when an object has a property
that refers to itself)
Recurring objects which do not cause cycles are allowed, but will
produce a complete text for each occurrence.
A filter is an optional function which takes two parameters: a name
and a value. The function will be called for each candidate pair in
each object being serialized. If the function returns true, then the
pair will be included. If the function returns false, then the pair
will be excluded. If the function returns any other value, or if it
throws, then an encodingError exception is thrown.
If the pretty parameter is true, then linefeeds are inserted after
each { , [ , and , and before } and ] , and multiples of 4 spaces are
inserted to indicate the level of nesting, and one space will be
inserted after : . Otherwise, no whitespace is inserted between the
tokens.
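Neither toJSONString nor its boolean filter ever shipped, so this is a sketch only: the include/exclude contract above can be emulated on top of today's JSON.stringify, whose replacer drops a pair by returning undefined. The wrapper name toJSONStringLike and the sample filter are illustrative assumptions, not part of the proposal.

```javascript
// Sketch: emulate the proposed toJSONString(filter, pretty) contract
// with JSON.stringify's replacer (a stand-in, not the proposed API).
function toJSONStringLike(value, filter, pretty) {
  const replacer = filter
    ? function (name, val) {
        if (name === "") return val; // the root value has no enclosing pair
        const keep = filter(name, val);
        if (keep === true) return val;        // include the pair
        if (keep === false) return undefined; // exclude the pair
        throw new Error("EncodingError");     // proposal: anything else throws
      }
    : undefined;
  return JSON.stringify(value, replacer, pretty ? 4 : undefined);
}

const out = toJSONStringLike(
  { id: 7, _secret: "x" },
  (name, val) => !name.startsWith("_")
);
```

Note the proposal's stricter rule: any non-boolean filter result is an error, which the wrapper models by throwing.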
Decoding
* String.prototype.parseJSON( hook )
Returns the value represented by the string. A syntaxError is thrown
if the string was not a strict JSON text.
The optional hook function will be called for each object value
found. It is passed each object. Its return value will be used
instead of the object in the final structure.
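parseJSON never shipped either; the closest modern analogue is JSON.parse's reviver, which resembles the proposed hook except that the hook sees only object values. A minimal sketch under that assumption (the wrapper name is illustrative):

```javascript
// Sketch: restrict JSON.parse's reviver to plain object values, as the
// proposed parseJSON(hook) describes. Hook return values replace objects.
function parseJSONLike(text, hook) {
  return JSON.parse(text, function (key, value) {
    if (hook && value !== null && typeof value === "object" && !Array.isArray(value)) {
      return hook(value); // hook runs innermost-objects-first
    }
    return value;
  });
}

const decoded = parseJSONLike('{"point": {"x": 1, "y": 2}}', function (obj) {
  // replace {x, y} objects with a pair; leave other objects untouched
  return "x" in obj && "y" in obj ? [obj.x, obj.y] : obj;
});
```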
I like this proposal a lot; it avoids too many bells and whistles,
and captures the desired (different) hooks for encoding and decoding,
as well as pretty-encoding. Comments?
If Dates are serialized as ISO strings, the process of encoding and decoding loses information. If Dates are added to JSON then they should be encoded using new Date(ms). However, Dates are not supported in JSON today and removing them from JS2 seems OK.
On 10/20/06, Brendan Eich <brendan at mozilla.org> wrote:
On Oct 11, 2006, at 8:06 PM, Brendan Eich wrote:
On Oct 11, 2006, at 3:06 PM, Robert Sayre wrote:
I'm pretty sure the group accepted some form of Crockford's parseJSON proposal, but I think it would be handy to add something analogous to simplejson's object_hook argument. One weakness JSON has is annotating literals (like strings with custom attributes), and this facility can make that smoother.
<svn.red-bean.com/bob/simplejson/trunk/docs/module-simplejson.html#load>
See "Specializing JSON object decoding" above as well.
Thanks, the wiki will be re-exported after the next face-to-face TG1 meeting (which is the end of next week). We should have finalized the JSON APIs by then. We were already considering an optional filter funarg, but Bob's object_hook is more powerful. I've added a comment to the wiki with that link.
Sneak preview (wiki re-export coming soon) of Doug Crockford's latest rev of the proposal:
JSON encoding and decoding
The following built-in functions facilitate handling JSON data, as specified in Douglas Crockford's RFC 4627.
Encoding
* Object.prototype.toJSONString( filter , pretty )
Returns a String containing the JSON representation of an object, array, date, string, number, boolean, or null . The proto link is not used when finding members. A Date object is serialized as an ISO date string in double quotes. An Array object is serialized as a sequence of comma separated values wrapped in square brackets. Otherwise, an object is serialized as a sequence of comma separated pairs wrapped in curly braces. A string is serialized as a quoted string with backslash escapement. A finite number is serialized by toString. true , false , and null are encoded as unquoted strings. See http://www.ietf.org/rfc/rfc4627.txt?number=4627
An encodingError will be thrown if the encoder reaches a value which
* cannot be encoded as valid JSON (such as a function,
undefined, and NaN) OR
* represents a cycle (as happens when an object has a property that refers to itself)
Recurring objects which do not cause cycles are allowed, but will produce a complete text for each occurrence.
A filter is an optional function which takes two parameters: a name and a value. The function will be called for each candidate pair in each object being serialized. If the function returns true, then the pair will be included. If the function returns false, then the pair will be excluded. If the function returns any other value, or if it throws, then an encodingError exception is thrown.
If the pretty parameter is true, then linefeeds are inserted after each { , [ , and , and before } and ] , and multiples of 4 spaces are inserted to indicate the level of nesting, and one space will be inserted after : . Otherwise, no whitespace is inserted between the tokens.
Decoding
* String.prototype.parseJSON( hook )
Returns the value represented by the string. A syntaxError is thrown if the string was not a strict JSON text.
The optional hook function will be called for each object value found. It is passed each object. Its return value will be used instead of the object in the final structure.
I like this proposal a lot; it avoids too many bells and whistles, and captures the desired (different) hooks for encoding and decoding, as well as pretty-encoding. Comments?
Encoding seems incredibly weak. You can put Date objects in, but you'll never get a Date object back out. The filter function seems really bizarre and a little useless. Why key:value pair? Why no context? No opportunity for object replacement?
Decoding looks great, though.
On Oct 20, 2006, at 3:36 PM, Erik Arvidsson wrote:
If Dates are serialized as ISO strings, the process of encoding and decoding loses information. If Dates are added to JSON then they should be encoded using new Date(ms). However, Dates are not supported in JSON today and removing them from JS2 seems OK.
ES4 also standardizes Date.parse to accept the same ISO 8601 date
strings, so I don't believe any information is lost.
You're right that this automatic encoding of Date objects as ISO
strings does not result, when decoding, in Date objects again.
Fixing that would require a pass over the decoded structure, or a use
of the optional object hook on the enclosing object. Is this a problem?
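The "pass over the decoded structure" can also be folded into decoding itself with JSON.parse's reviver. The convention below (keys ending in "At" hold ISO date strings) is purely an assumed schema agreement between the two ends, not anything the proposal provides:

```javascript
// Sketch: revive ISO date strings into Date objects during decoding,
// relying on an assumed key-naming convention to supply the lost metadata.
const ISO = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/;
const doc = '{"createdAt": "2006-10-20T15:36:00Z", "note": "hi"}';
const revived = JSON.parse(doc, function (key, value) {
  return key.endsWith("At") && typeof value === "string" && ISO.test(value)
    ? new Date(value)
    : value;
});
```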
On 10/20/06, Brendan Eich <brendan at mozilla.org> wrote:
On Oct 20, 2006, at 3:36 PM, Erik Arvidsson wrote:
If Dates are serialized as ISO strings, the process of encoding and decoding loses information. If Dates are added to JSON then they should be encoded using new Date(ms). However, Dates are not supported in JSON today and removing them from JS2 seems OK.
ES4 also standardizes Date.parse to accept the same ISO 8601 date strings, so I don't believe any information is lost.
Even if there wasn't, you can always turn strings into dates the hard way as MochiKit does...
The important part is that the metadata that it is a date is lost, even if there's an easy way to make dates from strings. You have to know which strings should be treated as dates.
You're right that this automatic encoding of Date objects as ISO strings does not result, when decoding, in Date objects again. Fixing that would require a pass over the decoded structure, or a use of the optional object hook on the enclosing object. Is this a problem?
The object hook would be no good here because it would have to be able to find ISO strings by key (or regexing all string values looking for dates.. bleh).
On Oct 20, 2006, at 3:51 PM, Bob Ippolito wrote:
On 10/20/06, Brendan Eich <brendan at mozilla.org> wrote:
On Oct 20, 2006, at 3:36 PM, Erik Arvidsson wrote:
If Dates are serialized as ISO strings, the process of encoding and decoding loses information. If Dates are added to JSON then they should be encoded using new Date(ms). However, Dates are not supported in JSON today and removing them from JS2 seems OK.
ES4 also standardizes Date.parse to accept the same ISO 8601 date strings, so I don't believe any information is lost.
Even if there wasn't, you can always turn strings into dates the hard way as MochiKit does...
Sure. That was more an FYI (ECMA-262 left Date.parse unspecified, a
botch that requires browsers to reverse engineer IE JScript's
Date.parse implementation).
The important part is that the metadata that it is a date is lost, even if there's an easy way to make dates from strings. You have to know which strings should be treated as dates.
You're right, but JSON does not provide any way to distinguish this
case (preserve the bit of metadata you cite).
You're right that this automatic encoding of Date objects as ISO strings does not result, when decoding, in Date objects again. Fixing that would require a pass over the decoded structure, or a use of the optional object hook on the enclosing object. Is this a problem?
The object hook would be no good here because it would have to be able to find ISO strings by key (or regexing all string values looking for dates.. bleh).
The context would have to determine what's a date and what is not.
Yeah, it's poor, but without extending JSON, what can be done?
[Replying all this time]
Yes. This is a problem because you cannot tell if the data contained a String or a Date.
({string: "20061020125156", date: new Date}).toJSONString() -> {"string":"20061020125156","date":"20061020125156"}
...like Bob said...
On 10/20/06, Brendan Eich <brendan at mozilla.org> wrote:
The context would have to determine what's a date and what is not. Yeah, it's poor, but without extending JSON, what can be done?
It's a problem if you want to write something like Excel for arbitrary JSON objects. Otherwise there is always agreement on a schema ahead of time. Maybe dates should serialize as quoted ISO 8601 strings by default.
Are the purveyors of JSON familiar with the evolution of Lisp's
print/read into make-load-form?
www.lisp.org/HyperSpec/Body/stagenfun_make-load-form.html#make-load
On 10/20/06, Brendan Eich <brendan at mozilla.org> wrote:
On Oct 20, 2006, at 3:51 PM, Bob Ippolito wrote:
On 10/20/06, Brendan Eich <brendan at mozilla.org> wrote:
On Oct 20, 2006, at 3:36 PM, Erik Arvidsson wrote:
If Dates are serialized as ISO strings, the process of encoding and decoding loses information. If Dates are added to JSON then they should be encoded using new Date(ms). However, Dates are not supported in JSON today and removing them from JS2 seems OK.
ES4 also standardizes Date.parse to accept the same ISO 8601 date strings, so I don't believe any information is lost.
Even if there wasn't, you can always turn strings into dates the hard way as MochiKit does...
Sure. That was more an FYI (ECMA-262 left Date.parse unspecified, a botch that requires browsers to reverse engineer IE JScript's Date.parse implementation).
The important part is that the metadata that it is a date is lost, even if there's an easy way to make dates from strings. You have to know which strings should be treated as dates.
You're right, but JSON does not provide any way to distinguish this case (preserve the bit of metadata you cite).
You're right that this automatic encoding of Date objects as ISO strings does not result, when decoding, in Date objects again. Fixing that would require a pass over the decoded structure, or a use of the optional object hook on the enclosing object. Is this a problem?
The object hook would be no good here because it would have to be able to find ISO strings by key (or regexing all string values looking for dates.. bleh).
The context would have to determine what's a date and what is not. Yeah, it's poor, but without extending JSON, what can be done?
The encoder should have an object hook the same way the decoder does. Throw away the filter.
You'd have an encoder hook that knows how to turn a Date object into some specific object representation, and a decoder hook that does the inverse. Then you could use the JSON encoder to encode whatever you wanted into a valid JSON document however it needed to be laid out.
Filtering functionality can be implemented if necessary by writing an encoder hook that inspects the object. If it needs to remove some key:value pairs, then it would create a new object that is missing the keys that should be filtered.
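A sketch of that encoder-hook shape, wired through JSON.stringify's replacer as stand-in plumbing (the hook protocol and the names here are assumptions for illustration, not simplejson's or the proposal's API):

```javascript
// Sketch: an encoder-side object hook. The hook receives each plain object
// and returns the object that should actually be encoded in its place.
function encodeWithObjectHook(value, hook) {
  return JSON.stringify(value, function (key, val) {
    return val !== null && typeof val === "object" && !Array.isArray(val)
      ? hook(val)
      : val;
  });
}

const text = encodeWithObjectHook({ a: 1, _tmp: 2 }, function (obj) {
  // Filtering via the hook: build the copy Bob describes, minus unwanted keys.
  const copy = {};
  for (const k in obj) if (!k.startsWith("_")) copy[k] = obj[k];
  return copy;
});
```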
Brendan Eich scripsit:
* cannot be encoded as valid JSON (such as a function,
undefined, and NaN)
I think NaN is a place where there should be some pushback; there is no reason why JSON should not implement NaN, perhaps in the form +nan or +0.nan .
On Oct 20, 2006, at 4:11 PM, Bob Ippolito wrote:
The encoder should have an object hook the same way the decoder does.
Symmetry, what a good idea. TG1 members should consider this seriously.
Throw away the filter.
There was some desire to filter without copying.
Filtering functionality can be implemented if necessary by writing an encoder hook that inspects the object. If it needs to remove some key:value pairs, then it would create a new object that is missing the keys that should be filtered.
That's the copy we hoped to avoid. Not a problem in your simplejson
experience?
On Oct 20, 2006, at 4:15 PM, John Cowan wrote:
Brendan Eich scripsit:
* cannot be encoded as valid JSON (such as a function,
undefined, and NaN)
I think NaN is a place where there should be some pushback; there is no reason why JSON should not implement NaN, perhaps in the form +nan or +0.nan .
Doug is on this list, but the RFC is published and we are not the
group to evolve it. So for ES4 we are not going beyond what's in the
JSON RFC.
On 10/20/06, Brendan Eich <brendan at mozilla.org> wrote:
On Oct 20, 2006, at 4:11 PM, Bob Ippolito wrote:
The encoder should have an object hook the same way the decoder does.
Symmetry, what a good idea. TG1 members should consider this seriously.
Throw away the filter.
There was some desire to filter without copying.
What was the use case for filtering? I've never seen anyone do that.
Filtering functionality can be implemented if necessary by writing an encoder hook that inspects the object. If it needs to remove some key:value pairs, then it would create a new object that is missing the keys that should be filtered.
That's the copy we hoped to avoid. Not a problem in your simplejson experience?
Not a problem because I've never seen anyone perform such a filter. If the output needs to differ from the input it's usually based on the value, not the key:value pair. When the encoding needs to differ it's usually not a strict subset of the input, but a different object altogether.
On Oct 20, 2006, at 4:35 PM, Bob Ippolito wrote:
On 10/20/06, Brendan Eich <brendan at mozilla.org> wrote:
On Oct 20, 2006, at 4:11 PM, Bob Ippolito wrote:
The encoder should have an object hook the same way the decoder does.
Symmetry, what a good idea. TG1 members should consider this seriously.
Throw away the filter.
There was some desire to filter without copying.
What was the use case for filtering? I've never seen anyone do that.
The use case is taking a vanilla object that may have ad-hoc
properties that should not be serialized. The filter could be used
to whitelist or blacklist properties.
Filtering functionality can be implemented if necessary by writing an encoder hook that inspects the object. If it needs to remove some key:value pairs, then it would create a new object that is missing the keys that should be filtered.
That's the copy we hoped to avoid. Not a problem in your simplejson experience?
Not a problem because I've never seen anyone perform such a filter. If the output needs to differ from the input it's usually based on the value, not the key:value pair. When the encoding needs to differ it's usually not a strict subset of the input, but a different object altogether.
Ok, good feedback. TG1 (or at least yours truly) will take it to heart.
On 10/20/06, Brendan Eich <brendan at mozilla.org> wrote:
On Oct 20, 2006, at 4:35 PM, Bob Ippolito wrote:
On 10/20/06, Brendan Eich <brendan at mozilla.org> wrote:
On Oct 20, 2006, at 4:11 PM, Bob Ippolito wrote:
The encoder should have an object hook the same way the decoder does.
Symmetry, what a good idea. TG1 members should consider this seriously.
Throw away the filter.
There was some desire to filter without copying.
What was the use case for filtering? I've never seen anyone do that.
The use case is taking a vanilla object that may have ad-hoc properties that should not be serialized. The filter could be used to whitelist or blacklist properties.
That sounds mostly like hand-waving... I don't think that actually happens in a way where the filter function would really help out.
Also, the idea that the filter function should be done without copying is kinda silly. Is it faster to make N function calls (returning false M times), or to make 1 function call that copies an object with N-M key:value pairs (only when M>0)?
I'd imagine that for most values of N and M, the latter is better. When M is zero, which it usually will be, then there is no copy made. It also provides symmetry with decoding and is infinitely more flexible and useful.
On Oct 20, 2006, at 4:53 PM, Bob Ippolito wrote:
I'd imagine that for most values of N and M, the latter is better. When M is zero, which it usually will be, then there is no copy made. It also provides symmetry with decoding and is infinitely more flexible and useful.
I agree it's many times better to have a symmetric object hook. I
was re-reading
svn.red-bean.com/bob/simplejson/trunk/docs/module-simplejson.html#load
and wondered why you don't have an object hook for dump/dumps. Also,
is there a way for the object_hook to skip entirely the key/value
pair whose value is the object passed into the hook?
You have infinitely more JSON encoding and decoding experience than I
do (and I mean that mathematically ;-). Any more rationale on the
design decisions that you'd be willing to share here would be helpful.
On 10/20/06, Brendan Eich <brendan at mozilla.org> wrote:
On Oct 20, 2006, at 4:53 PM, Bob Ippolito wrote:
I'd imagine that for most values of N and M, the latter is better. When M is zero, which it usually will be, then there is no copy made. It also provides symmetry with decoding and is infinitely more flexible and useful.
I agree it's many times better to have a symmetric object hook. I was re-reading
svn.red-bean.com/bob/simplejson/trunk/docs/module-simplejson.html#load
and wondered why you don't have an object hook for dump/dumps. Also, is there a way for the object_hook to skip entirely the key/value pair whose value is the object passed into the hook?
It does have a hook, but it's a method not a parameter...
""" To extend this to recognize other objects, subclass and implement a .default() method with another method that returns a serializable object for o if possible, otherwise it should call the superclass implementation (to raise TypeError). """
The object hook gets just object values, not key:value pairs. If you don't want a key:value pair included, return an object that does not have that key:value pair.
If you worked only with key:value pairs, then you wouldn't be able to hook elements of an Array... and that would be silly. It's also not reasonable to implement skip for Arrays, because if they're used as a tuple then you've changed the meaning of the Array. Skipping an object wholesale is best done by returning a placeholder like null, but in general most people implement encoders that raise exceptions if stuff they don't want ends up in the object graph...
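A sketch of the value hook composing with Arrays, which a key:value filter cannot do: the hook sees each element and substitutes a null placeholder rather than skipping, so tuple positions keep their meaning. Built on JSON.parse's reviver as a stand-in; names are illustrative.

```javascript
// Sketch: a value-level hook sees array elements too, and replaces
// unwanted objects with null instead of deleting them.
function decodeValues(text, hook) {
  return JSON.parse(text, (key, value) => hook(value));
}

const result = decodeValues('[1, {"drop": true}, 3]', function (value) {
  // substitute a placeholder so the array keeps its length and positions
  if (value !== null && typeof value === "object" && value.drop === true) return null;
  return value;
});
```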
Here's the latest wiki'd proposal (thanks to Doug for updating it)
for a standard JSON codec ([pretty] means optional argument, not
array initialiser):
JSON encoding and decoding
The following built-in functions facilitate handling JSON data, as
specified in Douglas Crockford's RFC 4627.
Encoding
* Object.prototype.toJSONString( [pretty] )
Returns a String containing the JSON representation of an object,
array, date, string, number, boolean, or null . The [[Prototype]]
internal property is not used when finding members. A Date object is
serialized as an ISO date string in double quotes. An Array object is
serialized as a sequence of comma separated values wrapped in square
brackets. Otherwise, an object is serialized as a sequence of comma
separated pairs wrapped in curly braces. A string is serialized as a
quoted string with backslash escapement. A finite number is
serialized by toString. true , false , and null are encoded as
unquoted strings. See www.ietf.org/rfc/rfc4627.txt?number=4627
An application can replace the toJSONString method on an instance or
a class to change its JSON serialization. So, for example, to
customize the serialization of dates, add a custom toJSONString
method to either the individual date object or Date.prototype .
Members of objects whose values are functions will be skipped.
A new EncodingError instance will be thrown if the encoder reaches a
value that
* cannot be encoded as valid JSON (such as a function in an
array, undefined, and NaN) OR
* represents a cycle (as happens when an object has a property
that refers to itself)
Recurring objects that do not cause cycles are allowed, but will
produce a complete text for each occurrence.
If the optional pretty parameter is true, then linefeeds are inserted
after each { , [ , and , and before } and ] , and multiples of 4
spaces are inserted to indicate the level of nesting, and one space
will be inserted after : . Otherwise, no whitespace is inserted
between the tokens.
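The pretty layout described above is close to what JSON.stringify with a 4-space indent produces today (a later API in the same lineage), so it serves as an approximation of the intended output:

```javascript
// Approximation of the proposed pretty output: newlines around braces and
// brackets, 4 spaces per nesting level, one space after each colon.
const prettyText = JSON.stringify({ a: [1, 2] }, null, 4);
```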
* EncodingError is a new exception type. It contains a name
property with value “EncodingError” and a message (an implementation-defined string).
Decoding
* String.prototype.parseJSON( hook )
Returns the value represented by the string. A syntaxError is thrown
if the string was not a strict JSON text.
The optional hook function will be called for each object value
found. It is passed each object. Its return value will be used
instead of the object in the final structure.
This is pretty minimal. It reflects Bob's helpful comments. It
still gives Date default encoding better than what toString produces,
to wit toISOString's result. Without evolving JSON to talk about ISO
date literals, this is as far as we want to go. We could try to
invent a string date convention that can round-trip, but it could
conflict with JSON schemas in use now or in the future. So we leave
it to anyone who wants that to hook up Date.prototype.toJSONString.
Comments welcome as always.
On 2006-10-24, at 15:46 EDT, Brendan Eich wrote:
Here's the latest wiki'd proposal (thanks to Doug for updating it)
for a standard JSON codec ([pretty] means optional argument, not
array initialiser):
[...]
Seems much cleaner.
Too bad about RFC 4627 and Date Literal, though.
On Oct 24, 2006, at 4:38 PM, P T Withington wrote:
On 2006-10-24, at 15:46 EDT, Brendan Eich wrote:
Here's the latest wiki'd proposal (thanks to Doug for updating it)
for a standard JSON codec ([pretty] means optional argument, not
array initialiser):[...]
Seems much cleaner.
Too bad about RFC 4627 and Date Literal, though.
It's not a big deal in practice, from what I hear. And more
important, the KISS principle really does not want JSON to subsume
ISO Date details. I'll let Bob, Doug, and others who actually deal
with lots of JSON (not saying you don't, just haven't heard that
yet ;-) talk about how infrequently Date objects turn up in trees to
be serialized.
On 10/24/06, Brendan Eich <brendan at mozilla.com> wrote:
An application can replace the toJSONString method on an instance or a class to change its JSON serialization. <snip> This is pretty minimal. It reflects Bob's helpful comments. It still gives Date default encoding better than what toString produces, to wit toISOString's result. Without evolving JSON to talk about ISO date literals, this is as far as we want to go. We could try to invent a string date convention that can round-trip, but it could conflict with JSON schemas in use now or in the future. So we leave it to anyone who wants that to hook up Date.prototype.toJSONString.
Comments welcome as always.
Will ES4 also provide functions for JSON deserialization? And if so, will those functions also be replaceable? Something along the lines of Object.prototype.fromJSONString?
zwetan
On Oct 24, 2006, at 5:24 PM, zwetan wrote:
On 10/24/06, Brendan Eich <brendan at mozilla.com> wrote:
An application can replace the toJSONString method on an instance or a class to change its JSON serialization. <snip> This is pretty minimal. It reflects Bob's helpful comments. It still gives Date default encoding better than what toString produces, to wit toISOString's result. Without evolving JSON to talk about ISO date literals, this is as far as we want to go. We could try to invent a string date convention that can round-trip, but it could conflict with JSON schemas in use now or in the future. So we leave it to anyone who wants that to hook up Date.prototype.toJSONString.
Comments welcome as always.
Will ES4 also provide functions for JSON deserialization? And if so, will those functions also be replaceable? Something along the lines of Object.prototype.fromJSONString?
I realize the formatting wasn't ideal (copy-paste from dokuwiki) but
the answer was in the message to which you replied:
Decoding
* String.prototype.parseJSON( hook )
Returns the value represented by the string. A syntaxError is thrown
if the string was not a strict JSON text.
The optional hook function will be called for each object value
found. It is passed each object. Its return value will be used
instead of the object in the final structure.
Object.prototype.fromJSONString does not make sense, since the string
comes from |this| (if it came from a parameter, then you'd want a
String "static method"). So String.prototype.parseJSON is the
decoder, and it can be wrapped or replaced just like other
String.prototype method properties, although the object hook
(inspired directly by Bob Ippolito's simplejson) will probably
relieve the need to override.
'{"foo": [1, true, null]}'.parseJSON() -> {foo: [1, true, null]}
And using a hook:
'{"foo": [1, true, null]}'.parseJSON(function(obj) { if (typeof obj == 'object' && 'foo' in obj) { return 'Hello World' } })
->
'Hello World'
I guess I should have had an else case there as well :-)
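For completeness, here is the same hook with the else branch filled in, written against JSON.parse's reviver as a stand-in (parseJSON itself never shipped); without the else, every non-matching value would be replaced by undefined:

```javascript
// Sketch: the hook example with the missing else case, via a reviver.
const hooked = JSON.parse('{"foo": [1, true, null]}', function (key, value) {
  if (value !== null && typeof value === "object" && "foo" in value) {
    return "Hello World";
  } else {
    return value; // pass everything else through unchanged
  }
});
```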
On 10/24/06, Brendan Eich <brendan at mozilla.com> wrote: <snip>
I realize the formatting wasn't ideal (copy-paste from dokuwiki) but the answer was in the message to which you replied:
and I read it too fast, so it didn't help :)
Decoding
* String.prototype.parseJSON( hook )
<snip>
Object.prototype.fromJSONString does not make sense, since the string comes from |this| (if it came from a parameter, then you'd want a String "static method"). So String.prototype.parseJSON is the decoder, and it can be wrapped or replaced just like other String.prototype method properties, although the object hook (inspired directly by Bob Ippolito's simplejson) will probably relieve the need to override.
indeed the hook is a good approach, but with what I am trying to achieve, overriding the method looks like the only option I will have
quick pseudo example:
Date.prototype.toJSONString returns something like "new Date( 2006, 9, 24, ... )"
String.prototype.parseJSON should be able to scan an incoming string such as "{ a: new Date(2006, 9, 24), b: new Date( 2006, 11, 25 ), etc. }", recognize the "new" keyword then the "Date" keyword, then scan the open/close parentheses and dynamically create an instance of the Date object (thanks to the .call/.apply functions)
I would prefer to use the hook, but I suppose using a "new class(...)" notation inside a JSON string will yield an EncodingError.
zwetan
On 10/24/06, zwetan <zwetan at gmail.com> wrote:
On 10/24/06, Brendan Eich <brendan at mozilla.com> wrote: <snip>
I realize the formatting wasn't ideal (copy-paste from dokuwiki) but the answer was in the message to which you replied:
and I read it too fast, so it didn't help :)
Decoding
* String.prototype.parseJSON( hook )
<snip>
Object.prototype.fromJSONString does not make sense, since the string comes from |this| (if it came from a parameter, then you'd want a String "static method"). So String.prototype.parseJSON is the decoder, and it can be wrapped or replaced just like other String.prototype method properties, although the object hook (inspired directly by Bob Ippolito's simplejson) will probably relieve the need to override.
indeed the hook is a good approach, but with what I am trying to achieve, overriding the method looks like the only option I will have
quick pseudo example:
Date.prototype.toJSONString returns something like "new Date( 2006, 9, 24, ... )"
String.prototype.parseJSON should be able to scan an incoming string such as "{ a: new Date(2006, 9, 24), b: new Date( 2006, 11, 25 ), etc. }", recognize the "new" keyword then the "Date" keyword, then scan the open/close parentheses and dynamically create an instance of the Date object (thanks to the .call/.apply functions)
I would prefer to use the hook, but I suppose using a "new class(...)" notation inside a JSON string will yield an EncodingError.
This is exactly why toJSONString shouldn't be the API, it should deal in objects instead... toJSONString guarantees that the interoperability we currently enjoy will vanish.
On Oct 24, 2006, at 7:09 PM, Bob Ippolito wrote:
On 10/24/06, zwetan <zwetan at gmail.com> wrote:
quick pseudo example:
Date.prototype.toJSONString returns something like "new Date( 2006, 9, 24, ... )"
String.prototype.parseJSON should be able to scan an incoming string such as "{ a: new Date(2006, 9, 24), b: new Date( 2006, 11, 25 ), etc. }", recognize the "new" keyword then the "Date" keyword, then scan the open/close parentheses and dynamically create an instance of the Date object (thanks to the .call/.apply functions)
I would prefer to use the hook, but I suppose using a "new class(...)" notation inside a JSON string will yield an EncodingError.
This is exactly why toJSONString shouldn't be the API, it should deal in objects instead... toJSONString guarantees that the interoperability we currently enjoy will vanish.
What exactly is the difference between a function that can be wrapped/
overridden (or class that can be subclassed) and a method that can be
overridden? Can you show an example of how this breaks? Sorry if
this is obvious.
On 10/24/06, Brendan Eich <brendan at mozilla.com> wrote:
On Oct 24, 2006, at 7:09 PM, Bob Ippolito wrote:
On 10/24/06, zwetan <zwetan at gmail.com> wrote:
quick pseudo example:
Date.prototype.toJSONString returns something like "new Date( 2006, 9, 24, ... )"
String.prototype.parseJSON should be able to scan an incoming string such as "{ a: new Date(2006, 9, 24), b: new Date( 2006, 11, 25 ), etc. }", recognize the "new" keyword, then the "Date" keyword, then scan the open/close parentheses and dynamically create an instance of the Date object (thanks to the .call/.apply functions)
I would prefer to use the hook, but I suppose using a "new class(...)" notation inside a JSON string will yield an EncodingError.
This is exactly why toJSONString shouldn't be the API, it should deal in objects instead... toJSONString guarantees that the interoperability we currently enjoy will vanish.
What exactly is the difference between a function that can be wrapped/ overridden (or class that can be subclassed) and a method that can be overridden? Can you show an example of how this breaks? Sorry if this is obvious.
There is no meaningful difference, but that's not what I'm talking about.
toJSONString is the wrong level of abstraction. The return value should NOT be a string that inserts arbitrary text into the encoding stream. Injecting arbitrary text into the stream is an absolute guarantee that (most) people will do it incorrectly and produce invalid documents. Worst case they will use it to intermittently produce invalid documents. Maybe there's even something
The return value should be any JSON-encodable object that gets properly encoded into the string (called "toJSON" perhaps). If this approach is taken then it will be impossible to produce invalid documents, much like it's impossible to produce an invalid XML document by serializing a DOM tree.
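Bob's alternative can be sketched as follows. This is not the proposal's code: toJSONText and CustomDate are invented names, and the assumption is simply that the encoder is the only code that ever produces text, while the extension hook returns a JSON-encodable value:

```javascript
// toJSONText is an invented stand-in for the built-in encoder. It
// consults an optional toJSON method that must return a JSON-encodable
// *value*, never raw encoded text.
function toJSONText(v) {
  if (v && typeof v.toJSON === "function") v = v.toJSON();
  if (v === null || typeof v === "number" || typeof v === "boolean")
    return String(v);
  if (typeof v === "string") return JSON.stringify(v);
  if (Array.isArray(v)) return "[" + v.map(toJSONText).join(",") + "]";
  var parts = [];
  for (var k in v) parts.push(JSON.stringify(k) + ":" + toJSONText(v[k]));
  return "{" + parts.join(",") + "}";
}

// A user-defined type opts in by defining toJSON; whatever it returns
// is re-encoded by the encoder, so the output is valid by construction.
function CustomDate(y, m, d) { this.y = y; this.m = m; this.d = d; }
CustomDate.prototype.toJSON = function () {
  return { date: [this.y, this.m, this.d] };
};

var out = toJSONText({ a: new CustomDate(2006, 9, 24) });
// out is '{"a":{"date":[2006,9,24]}}' -- always parseable.
```

Because the hook's return value goes back through the encoder, no override can push unbalanced braces or constructor calls into the stream.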
On Oct 24, 2006, at 10:57 PM, Bob Ippolito wrote:
toJSONString is the wrong level of abstraction. The return value should NOT be a string that inserts arbitrary text into the encoding stream. Injecting arbitrary text into the stream is an absolute guarantee that (most) people will do it incorrectly and produce invalid documents. Worst case they will use it to intermittently produce invalid documents. Maybe there's even something
Last sentence was cut off.
I thought you might mean that, but I don't see how to do it. We
can't declare a type for "the set of all valid JSON values". If we
take an object of some type, we want to rule out cycles, Date objects
that don't have toJSON or whatever methods, etc. How does simplejson
do this?
The return value should be any JSON-encodable object that gets properly encoded into the string (called "toJSON" perhaps). If this approach is taken then it will be impossible to produce invalid documents, much like it's impossible to produce an invalid XML document by serializing a DOM tree.
That's circular in that a DOM implementation will build only well-
formed DOM trees. To express the type "valid JSON value" we would
need recursive structural types, and that still wouldn't catch
cycles. We can't force this type into the nominal part of the type
system.
Anyway, if we are missing something in simplejson, please show us the
way. Thanks,
On 10/24/06, Brendan Eich <brendan at mozilla.com> wrote:
On Oct 24, 2006, at 10:57 PM, Bob Ippolito wrote:
toJSONString is the wrong level of abstraction. The return value should NOT be a string that inserts arbitrary text into the encoding stream. Injecting arbitrary text into the stream is an absolute guarantee that (most) people will do it incorrectly and produce invalid documents. Worst case they will use it to intermittently produce invalid documents. Maybe there's even something
Last sentence was cut off.
Maybe there's even something like SQL injection attacks that will happen with toJSONString.
I thought you might mean that, but I don't see how to do it. We can't declare a type for "the set of all valid JSON values". If we take an object of some type, we want to rule out cycles, Date objects that don't have toJSON or whatever methods, etc. How does simplejson do this?
How would you rule out cycles with toJSONString? Do it exactly the same way.
simplejson (by default) does cycle detection by maintaining a set of objects it has seen. If it sees the same one twice, then it bails (of course the built-in immutable types are not included here). It's not a perfect strategy, because it really should be done with respect to the stack, but for all practical purposes it's fine. If people need to encode a weird object graph they can turn cycle detection off.
toJSONString will actually ensure undetectable cycles if people call it recursively, which they will... because if they don't call it recursively, then they can't be sure they're generating a valid document.
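The seen-set strategy Bob describes can be sketched like this (a transliteration of the idea into JavaScript, not simplejson's actual code):

```javascript
// Keep every object we've started encoding in a list and bail if one
// shows up twice. As Bob notes, because nothing is ever removed from
// the set, this also rejects shared (acyclic) references -- imperfect
// with respect to the stack, but practical.
function encode(v, seen) {
  seen = seen || [];
  if (v === null || typeof v !== "object") return JSON.stringify(v);
  if (seen.indexOf(v) !== -1) throw new Error("circular reference");
  seen.push(v);
  if (Array.isArray(v)) {
    return "[" + v.map(function (x) { return encode(x, seen); }).join(",") + "]";
  }
  var parts = [];
  for (var k in v) parts.push(JSON.stringify(k) + ":" + encode(v[k], seen));
  return "{" + parts.join(",") + "}";
}

var a = { name: "a" };
a.self = a; // a cycle that naive recursion would never escape
var threw;
try { encode(a); threw = false; } catch (e) { threw = true; }
// threw is true; acyclic input like {x: [1, 2]} still encodes normally.
```

The point of contention is that this check is only possible when the encoder sees values; once a toJSONString override returns opaque text, any cycle it traversed is invisible.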
The return value should be any JSON-encodable object that gets properly encoded into the string (called "toJSON" perhaps). If this approach is taken then it will be impossible to produce invalid documents, much like it's impossible to produce an invalid XML document by serializing a DOM tree.
That's circular in that a DOM implementation will build only well- formed DOM trees. To express the type "valid JSON value" we would need recursive structural types, and that still wouldn't catch cycles. We can't force this type into the nominal part of the type system.
Once you find a value that isn't valid in JSON, you throw an error and that's that. I don't see how it's circular. If you pass the wrong kind of object to the XML serializer you'll get an error too.
Anyway, if we are missing something in simplejson, please show us the way. Thanks,
The equivalent encoder with simplejson would be this:
class JSONMethodEncoder(simplejson.JSONEncoder):
    def default(self, obj):
        if callable(getattr(obj, 'toJSON', None)):
            return obj.toJSON()
        return simplejson.JSONEncoder.default(self, obj)
oops, hit the send button too fast
---------- Forwarded message ---------- From: zwetan <zwetan at gmail.com>
Date: Oct 25, 2006 8:31 AM Subject: Re: JSON decoding To: Bob Ippolito <bob at redivi.com>
On 10/25/06, Bob Ippolito <bob at redivi.com> wrote: <snip>
This is exactly why toJSONString shouldn't be the API, it should deal in objects instead... toJSONString guarantees that the interoperability we currently enjoy will vanish.
I agree it should deal in objects; this could allow a much simpler/faster custom serialization/deserialization process.
That's why I asked in another thread whether there are ES4 "language elements" dedicated to code introspection.
zwetan
I thought you might mean that, but I don't see how to do it. We can't declare a type for "the set of all valid JSON values". If we take an
Do we need to? If you can't detect it statically or express it as a type, you can still detect it dynamically.
On Oct 25, 2006, at 7:24 AM, Dave Herman wrote:
I thought you might mean that, but I don't see how to do it. We can't declare a type for "the set of all valid JSON values". If we take an
Do we need to? If you can't detect it statically or express it as a type, you can still detect it dynamically.
Sure, and that's what Bob is proposing. We just didn't see it last
week at the face to face. The built-in encoder must produce a string,
but the customizable method can produce only a JSON value, instead of
producing an encoded substring that would then have to be validated
by parsing. Then the encoder can check the value for cycles, do
standard stuff like pretty encoding, etc.
Doug, what do you think?
Brendan Eich wrote:
On Oct 25, 2006, at 7:24 AM, Dave Herman wrote:
I thought you might mean that, but I don't see how to do it. We can't declare a type for "the set of all valid JSON values". If we take an
Do we need to? If you can't detect it statically or express it as a type, you can still detect it dynamically.
Sure, and that's what Bob is proposing. We just didn't see it last week at the face to face. The built-in encoder must produce a string, but the customizable method can produce only a JSON value, instead of producing an encoded substring that would then have to be validated by parsing.
Then the encoder can check the value for cycles, do standard stuff like pretty encoding, etc.Doug, what do you think?
I think this is getting out of hand. What was originally intended was a convenient little serializer that would be a little faster and more convenient than the current json.js. Bob wants to turn it into a transformation engine. I think he should do all of his transformations in JavaScript, and then hand the structures to a minimal toJSONString method as the final step. I prefer to keep the standard simple, and let Bob do complicated on his side.
On 10/26/06, Douglas Crockford <crock at yahoo-inc.com> wrote:
Brendan Eich wrote:
On Oct 25, 2006, at 7:24 AM, Dave Herman wrote:
I thought you might mean that, but I don't see how to do it. We can't declare a type for "the set of all valid JSON values". If we take an
Do we need to? If you can't detect it statically or express it as a type, you can still detect it dynamically.
Sure, and that's what Bob is proposing. We just didn't see it last week at the face to face. The built-in encoder must produce a string, but the customizable method can produce only a JSON value, instead of producing an encoded substring that would then have to be validated by parsing. Then the encoder can check the value for cycles, do standard stuff like pretty encoding, etc.
Doug, what do you think?
I think this is getting out of hand. What was originally intended was a convenient little serializer that would be a little faster and more convenient than the current json.js. Bob wants to turn it into a transformation engine. I think he should do all of his transformations in JavaScript, and then hand the structures to a minimal toJSONString method as the final step. I prefer to keep the standard simple, and let Bob do complicated on his side.
I think the current proposal is dangerous. I don't think it's an API that people will use correctly. Having to generate an entirely new object graph before encoding really bites for performance and ease of use.
You also over-estimate the difficulty of implementing a "toJSON" method instead of "toJSONString". It's really not that hard, seriously, I've written at least two JSON encoders that do it. It sounds more like you don't want to change the spec for the sake of not having to change it.
Also, I've NEVER seen a serialization protocol that does something silly like toJSONString. If they offer transformations, it's always guaranteed that you get valid output... and the only way to do that is to allow the transformation to return a data structure instead of a raw stream.
On Oct 26, 2006, at 11:22 AM, Douglas Crockford wrote:
Brendan Eich wrote:
Doug, what do you think?
I think this is getting out of hand. What was originally intended
was a convenient little serializer that would be a little faster
and more convenient than the current json.js. Bob wants to turn it
into a transformation engine.
It seems to me Bob's point is that we shouldn't let the extension
mechanism produce arbitrary strings that get included in the string
result of the standard (non-extension-mechanism) JSON encoder,
because people will produce bent JSON-nails if you give them only a
string-hammer. Give them a self-leveling power screwdriver instead,
to stretch the metaphor. This means the extension mechanism cannot
be the same method as the standard encoder.
I'll draft a patch to the current proposal to illustrate this,
without complexifying things too much.
On Oct 26, 2006, at 8:41 AM, Bob Ippolito wrote:
I think the current proposal is dangerous. I don't think it's an API that people will use correctly. Having to generate an entirely new object graph before encoding really bites for performance and ease of use.
Having to generate a new object graph is required either way, Doug's
or Bob's:
Doug's: transform an object graph g to JSON-ready object graph jrg,
then use jrg.toJSONString().
Bob's: use a JSONEncoder to process g; wherever the encoder needs to
transform a non-JSON-ready object subgraph sg into a JSON-ready graph
jrsg, it evaluates jrsg = sg.toJSON(). The encoder then uses
jrsg.toJSONString().
In practice, these are just different ways of doing the same work,
because no one using Doug's way would copy the entire graph, given
that some of it is JSON-ready as posited in the toJSON case (Bob's
way). So Doug's way ends up being "for each sub-graph sg that's not
JSON-ready, override toJSONString with a function that transforms sg
to JSON-ready jrsg and return jrsg.toJSONString()."
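The "Doug's way" pattern Brendan describes might look like this in practice. All names here are invented for illustration; jsonText stands in for the built-in string encoder:

```javascript
// jsonText plays the role of the standard encoder; Point is an
// invented non-JSON-ready type.
function jsonText(v) { return JSON.stringify(v); }

function Point(x, y) { this.x = x; this.y = y; }

// Doug's way: override the string-producing method per type, with the
// discipline of first transforming to a JSON-ready subgraph (jrsg) and
// then delegating -- never hand-assembling the text.
Point.prototype.toJSONString = function () {
  var jrsg = { x: this.x, y: this.y };
  return jsonText(jrsg);
};

var s = new Point(1, 2).toJSONString();
// s is '{"x":1,"y":2}'; but nothing stops a careless override from
// returning arbitrary text instead, which is hazard (a).
```

When the discipline is followed, the work is the same as defining toJSON; the difference is only whether the platform can enforce it.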
Ok, so what's really at stake here: whether users have
a) the opportunity to return any old string by overriding toJSONString;
b) the requirement to override toJSONString for each sub-graph sg
that's not JSON-ready.
Bob's way, there's a b' counterpart to b:
b') define (not override) toJSON for each sub-graph sg that's not
JSON-ready.
Bob's way doesn't have the (a) hazard. His (b') is different too in
how it requires defining a method, toJSON, that does not override a
standard method on a prototype object. So the JSONEncoder only has
to look for toJSON and use it if it's defined; otherwise it can use a
non-overrideable internal method to encode values.
Doug's way follows the toString pattern, and since non-JSON-ready
values in objects and arrays are skipped when encoding, it usually
does what is meant. Or we can hope that it does what is meant. Or
something; I'm not sure what JSON users want, so I'll stop guessing.
In any event, it's simple by one measure: a single toJSONString
method pattern, a method on various prototypes that can be overridden
to customize, instead of JSONEncoder and an optional toJSON.
You also over-estimate the difficulty of implementing a "toJSON" method instead of "toJSONString". It's really not that hard, seriously, I've written at least two JSON encoders that do it.
It's clear from the above that (b) and (b') are about the same amount
of work.
It sounds more like you don't want to change the spec for the sake
of not having to change it.
Doug is still editing the spec, changing it from what I forwarded, so
that's not the case. On the down side from your point of view, the
spec is changing to match www.json.org/json.js in detail. I'm
writing again to make sure that we agree on what's different between
the two proposals ((a) hazard; (b) vs. (b')), and that we include a
rationale in the final spec for whatever ends up there.
Also, I've NEVER seen a serialization protocol that does something silly like toJSONString. If they offer transformations, it's always guaranteed that you get valid output... and the only way to do that is to allow the transformation to return a data structure instead of a raw stream.
I think everyone gets this point. I agree it's a virtue. It sounds
like Doug values it less, but I'll let him speak.
Does anyone else have an opinion?
On 11/2/06, Brendan Eich <brendan at mozilla.com> wrote:
I think everyone gets this point. I agree it's a virtue. It sounds like Doug values it less, but I'll let him speak.
Does anyone else have an opinion?
I think the hazard is very real and very serious. I don't see any additional expressiveness enabled by toJSONString.
On 11/2/06, Brendan Eich <brendan at mozilla.com> wrote:
On Oct 26, 2006, at 8:41 AM, Bob Ippolito wrote:
I think the current proposal is dangerous. I don't think it's an API that people will use correctly. Having to generate an entirely new object graph before encoding really bites for performance and ease of use.
Having to generate a new object graph is required either way, Doug's or Bob's:
Doug's: transform an object graph g to JSON-ready object graph jrg, then use jrg.toJSONString().
Bob's: use a JSONEncoder to process g; wherever the encoder needs to transform a non-JSON-ready object subgraph sg into a JSON-ready graph jrsg, it evaluates jrsg = sg.toJSON(). The encoder then uses jrsg.toJSONString().
In practice, these are just different ways of doing the same work, because no one using Doug's way would copy the entire graph, given that some of it is JSON-ready as posited in the toJSON case (Bob's way). So Doug's way ends up being "for each sub-graph sg that's not JSON-ready, override toJSONString with a function that transforms sg to JSON-ready jrsg and return jrsg.toJSONString()."
Ok, so what's really at stake here: whether users have
a) the opportunity to return any old string by overriding toJSONString; b) the requirement to override toJSONString for each sub-graph sg that's not JSON-ready.
Bob's way, there's a b' counterpart to b:
b') define (not override) toJSON for each sub-graph sg that's not JSON-ready.
Bob's way doesn't have the (a) hazard. His (b') is different too in how it requires defining a method, toJSON, that does not override a standard method on a prototype object. So the JSONEncoder only has to look for toJSON and use it if it's defined; otherwise it can use a non-overrideable internal method to encode values.
Doug's way follows the toString pattern, and since non-JSON-ready values in objects and arrays are skipped when encoding, it usually does what is meant. Or we can hope that it does what is meant. Or something; I'm not sure what JSON users want, so I'll stop guessing. In any event, it's simple by one measure: a single toJSONString method pattern, a method on various prototypes that can be overridden to customize, instead of JSONEncoder and an optional toJSON.
You also over-estimate the difficulty of implementing a "toJSON" method instead of "toJSONString". It's really not that hard, seriously, I've written at least two JSON encoders that do it.
It's clear from the above that (b) and (b') are about the same amount of work.
It sounds more like you don't want to change the spec for the sake of not having to change it.
Doug is still editing the spec, changing it from what I forwarded, so that's not the case. On the down side from your point of view, the spec is changing to match www.json.org/json.js in detail. I'm writing again to make sure that we agree on what's different between the two proposals ((a) hazard; (b) vs. (b')), and that we include a rationale in the final spec for whatever ends up there.
If an implementation that does "toJSON" is wanted, then I could extract one from MochiKit.Base.serializeJSON...
Does anyone else have an opinion?
I like how AS3 handles custom AMF encoding. Native objects are encoded using the default implementation, and most other objects just have their public properties encoded, as if they were generic objects. But you can choose to implement an interface, which the encoder checks for and lets you write out whatever you like.
I like this because it does not involve polluting all objects with this extra method, particularly if JSON is optional for implementation.
Peter
On Nov 2, 2006, at 2:50 PM, Bob Ippolito wrote:
If an implementation that does "toJSON" is wanted, then I could extract one from MochiKit.Base.serializeJSON...
Having something comparable in size to json.js would be helpful. We
could weigh API issues along with "simplicity", and convince
ourselves that the implementation costs when customizing are
comparable, if not almost identical.
On 11/2/06, Brendan Eich <brendan at mozilla.com> wrote: ...
Does anyone else have an opinion?
JSON as a standard is good because it's simple and text-based, so it's not such a big deal to implement in other languages (because that's the goal, right, to communicate outside of ES4). But, like PHP serialization, it will not be enough for more advanced serialization/deserialization (OK, maybe that's a special case that does not apply to everyone). So what I would really want as a developer is other ways to do code introspection in ES4.
For example, why not be able to obtain an XML graph of an object structure? With E4X it would be trivial to serialize to any more advanced format needed, but if there are no introspection functions to do that, developers will have few choices for implementing serialization/deserialization mechanisms in ES4:
- use JSON and nothing else
- maybe use implementor/vendor non-standard mechanisms (AMF, etc.)
- struggle and be stuck at the end
So my opinion, even if narrowly focused: OK, having JSON as a standard in ES4 is good, but having tools inside the language for implementing other serialization/deserialization mechanisms is better.
Something like flash.utils.describeType, in a little more advanced way and defined as standard in ES4, would be perfect IMHO, but OK, perhaps I'm asking too much.
zwetan
On 11/2/06, Brendan Eich <brendan at mozilla.com> wrote:
On Nov 2, 2006, at 2:50 PM, Bob Ippolito wrote:
If an implementation that does "toJSON" is wanted, then I could extract one from MochiKit.Base.serializeJSON...
Having something comparable in size to json.js would be helpful. We could weigh API issues along with "simplicity", and convince ourselves that the implementation costs when customizing are comparable, if not almost identical.
Ok, here you go. Comparable in size, implements toJSON.
So my opinion, even if narrowly focused: OK, having JSON as a standard in ES4 is good, but having tools inside the language for implementing other serialization/deserialization mechanisms is better.
Something like flash.utils.describeType, in a little more advanced way and defined as standard in ES4, would be perfect IMHO, but OK, perhaps I'm asking too much.
There are two kinds of serialization here:
- runtime serialization, based on values
- metadata serialization, based on compile-time types
JSON is a runtime one: it's based on values and does not require additional type information. However, it can't handle classes or custom unserialize methods.
haXe Serialization is also a runtime serialization, but it uses some additional information generated in the JS/SWF code (such as class names). It can then (un)serialize class instances as well. Classes are referenced by name and only the instance fields are serialized (not the prototype ones). Plus, haXe Serialization supports a good number of standard types: arrays, but also dates, lists, enums and exceptions.
For these two kinds of serialization, no extra runtime type information is required beyond what is already available. However, they have one big flaw: there is no way to check that a given unserialized value entirely matches a given type (structurally). You can do an additional check phase, but you need to provide a type description, which is painful to write. It's in such cases (and for introspection) that a type description is useful.
Flash9 adds the describeType function. Because the information is already available in the bytecode, it comes for free. That's not the case for the other haXe-supported languages (JS and Neko), so such XML information is now optionally generated if the class implements haxe.rtti.Infos. The good thing is that the XML also includes documentation comments.
Such additional type information can be used as a type signature when checking an untrusted data source in order to validate the data, but it is not mandatory for serialization.
However, I agree that a good reflection/introspection API is very important in a programming language. For example, haXe has the haxe.org/api/Reflect and haxe.org/api/Type APIs, which can be used to implement a lot of different libraries.
Again, I'm in favor of opening the ES4 language as much as possible by providing the base API that needs VM-level support.
I think we have this upside down. We're worried about the user
creating an invalid JSON string. Why? Because they could break the
reader? Doesn't the reader have to protect itself? (Perhaps the
lure of eval as a reader has confused us.)
It seems more important to tell the user: "You tried to create a
JSON representation of something, but it can't be represented in a
way that parseJSON can meaningfully reconstruct it."
From an historical point of view, perhaps we are trying to do too
much with JSON.
JSON reminds me of Lisp's print/read, which eventually evolved a flag
[print-readably](www.lisp.org/HyperSpec/Body/var_stprint-readablyst.html#STprint-readablyST). If true, you will get an error
if you try to print an object that cannot be represented in a way
that read can parse and construct a 'similar' object. Print/read
relies, as does JSON, on the implicit type-encoding of the language
literals.
toJSON seems like 'half a loaf'. It lets me encode an arbitrary
object, but without a corresponding fromJSON, what use is it?
Compare that to dump/load, which is supported by
[make-load-form](www.lisp.org/HyperSpec/Body/stagenfun_make-load-form.html),
which allows you to write arbitrary expressions for reconstructing
objects. Dangerous for reading arbitrary data. Or compare Java
serialization, which has a more restricted mechanism for
reconstructing arbitrary objects. Less dangerous. Pretty
heavyweight. Maybe not what you are looking for in a lightweight
scripting language.
On 11/3/06, P T Withington <ptw at pobox.com> wrote:
I think we have this upside down. We're worried about the user creating an invalid JSON string. Why? Because they could break the reader? Doesn't the reader have to protect itself? (Perhaps the lure of eval as a reader has confused us.)
The proposal definitely doesn't use eval as a reader.
It seems more important to tell the user: "You tried to create a JSON representation of something, but it can't be represented in a way that parseJSON can meaningfully reconstruct it."
The problem is injection attacks and intermittent brokenness based on input values.
From an historical point of view, perhaps we are trying to do too much with JSON.
JSON reminds me of Lisp's print/read, which eventually evolved a flag [print-readably](www.lisp.org/HyperSpec/Body/var_stprint-readablyst.html#STprint-readablyST). If true, you will get an error if you try to print an object that cannot be represented in a way that read can parse and construct a 'similar' object. Print/read relies, as does JSON, on the implicit type-encoding of the language literals.
toJSON seems like 'half a loaf'. It lets me encode an arbitrary object, but without a corresponding fromJSON, what use is it?
The whole load is there, just not in this particular implementation. The object of current discussion was encoding, not decoding, so I didn't implement a whole parser.
(from Brendan's email on the 20th)
* String.prototype.parseJSON( hook )
Returns the value represented by the string. A SyntaxError is thrown if the string was not a strict JSON text.
The optional hook function will be called for each object value found. It is passed each object. Its return value will be used instead of the object in the final structure.
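Since the proposed parseJSON( hook ) is not implemented here, the sketch below emulates its described behavior: parse, then walk the result bottom-up, replacing each (non-array) object value with the hook's return value. The $date convention is purely hypothetical:

```javascript
// Emulation of the described parseJSON(hook) behavior.
function parseWithHook(text, hook) {
  function walk(v) {
    if (v && typeof v === "object") {
      for (var k in v) v[k] = walk(v[k]);      // hook children first
      if (!Array.isArray(v)) v = hook(v);      // then this object value
    }
    return v;
  }
  return walk(JSON.parse(text));
}

// A hypothetical application-level convention for dates.
function dateHook(obj) {
  if (obj.$date) return new Date(obj.$date[0], obj.$date[1], obj.$date[2]);
  return obj;
}

var v = parseWithHook('{"a": {"$date": [2006, 9, 24]}}', dateHook);
// v.a is now a real Date for October 24, 2006.
```

Note that the input stays strict JSON throughout; the hook only reinterprets values after parsing, which is what makes it safer than recognizing "new Date(...)" in the text.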
On 2006-11-03, at 10:14 EST, Bob Ippolito wrote:
On 11/3/06, P T Withington <ptw at pobox.com> wrote:
I think we have this upside down. We're worried about the user creating an invalid JSON string. Why? Because they could break the reader? Doesn't the reader have to protect itself? (Perhaps the lure of eval as a reader has confused us.)
The proposal definitely doesn't use eval as a reader.
I was referring to www.json.org/json.js. I understood the
original motivation for JSON was that all JavaScripts already had a
built-in (although unsafe) JSON parser in the form of eval, whereas
not all of them had functional XML parsers. If I am wrong, my
parenthetical remark is not crucial to my questions.
It seems more important to tell the user: "You tried to create a JSON representation of something, but it can't be represented in a way that parseJSON can meaningfully reconstruct it."
The problem is injection attacks and intermittent brokenness based on input values.
I understand that the 'from JSON' mechanism must protect itself
against such. But there is no point in preventing the 'to JSON'
mechanism from writing such, since 'from JSON' must validate
anyway. More important is to signal an error if an object that
cannot be round-tripped is passed to 'to JSON'. The end user would
be justifiably upset if 'to JSON' gave the impression that it
preserved their data, only to find out (much) later that 'from JSON'
cannot reconstruct it. These are almost the same condition, but
there is an important difference in emphasis: rather than preventing
the user from using 'to JSON' in a way that breaks 'from JSON', you
are preventing the user from writing data using 'to JSON' that
cannot be reconstructed accurately using 'from JSON'.
From an historical point of view, perhaps we are trying to do too much with JSON.
JSON reminds me of Lisp's print/read, which eventually evolved a flag [print-readably](www.lisp.org/HyperSpec/Body/var_stprint-readablyst.html#STprint-readablyST). If true, you will get an error if you try to print an object that cannot be represented in a way that read can parse and construct a 'similar' object. Print/read relies, as does JSON, on the implicit type-encoding of the language literals.
toJSON seems like 'half a loaf'. It lets me encode an arbitrary object, but without a corresponding fromJSON, what use is it?
The whole load is there, just not in this particular implementation. The object of current discussion was encoding, not decoding, so I didn't implement a whole parser.
(from Brendan's email on the 20th)
- String.prototype.parseJSON( hook )
Returns the value represented by the string. A SyntaxError is thrown if the string was not a strict JSON text.
The optional hook function will be called for each object value found. It is passed each object. Its return value will be used instead of the object in the final structure.
I understand this to mean that you intend to support building, on top
of the JSON substrate, a general dump/load that would preserve
arbitrary values, not just those that can be directly represented as
literals. I don't quite see how you can do that, since it would mean
that you would have to use some part of the literal space that can be
represented by JSON to represent these additional values. At the
very least, you need to define an escape in JSON to do that, no?
Otherwise how do you distinguish between a literal that represents,
say, an instance of a class, and the same pattern meant to represent
just that literal value? As a concrete example, if we did not have
date literals and you chose to implement 'to JSON' of a date object
as {date: <iso-date-string>}, how can 'from JSON' tell whether it
is to reconstruct a Date vs. reconstruct the Object
{date: <iso-date-string>}?
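Withington's ambiguity can be demonstrated directly; the {date: <iso-date-string>} encoding below is the hypothetical convention from his example, not any proposal's behavior:

```javascript
// A Date serialized under the hypothetical {date: <iso-date-string>}
// convention...
function dateToJSON(d) {
  return { date: d.toISOString() };
}

var fromDate = JSON.stringify(dateToJSON(new Date(Date.UTC(2006, 9, 24))));
// ...is byte-for-byte identical to a plain object of the same shape:
var fromPlainObject = JSON.stringify({ date: "2006-10-24T00:00:00.000Z" });
// fromDate === fromPlainObject, so a decoder cannot tell which was
// meant without an out-of-band escape convention.
```

This is exactly why a general dump/load layered on JSON needs to reserve some part of the literal space (an escape) and live with the resulting collision risk.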
On 11/3/06, P T Withington <ptw at pobox.com> wrote:
On 2006-11-03, at 10:14 EST, Bob Ippolito wrote:
On 11/3/06, P T Withington <ptw at pobox.com> wrote:
I think we have this upside down. We're worried about the user creating an invalid JSON string. Why? Because they could break the reader? Doesn't the reader have to protect itself? (Perhaps the lure of eval as a reader has confused us.)
The proposal definitely doesn't use eval as a reader.
I was referring to www.json.org/json.js. I understood the original motivation for JSON was that all JavaScripts already had a built-in (although unsafe) JSON parser in the form of eval, whereas not all of them had functional XML parsers. If I am wrong, my parenthetical remark is not crucial to my questions.
It seems more important to tell the user: "You tried to create a JSON representation of something, but it can't be represented in a way that parseJSON can meaningfully reconstruct it."
The problem is injection attacks and intermittent brokenness based on input values.
I understand that the 'from JSON' mechanism must protect itself against such. But there is no point in preventing the 'to JSON' mechanism from writing such, since 'from JSON' must validate anyway. More important is to signal an error if an object that cannot be round-tripped is passed to 'to JSON'. The end user would be justifiably upset if 'to JSON' gave the impression that it preserved their data, only to find out (much) later that 'from JSON' cannot reconstruct it. These are almost the same condition, but there is an important difference in emphasis: rather than preventing the user from using 'to JSON' in a way that breaks 'from JSON', you are preventing the user from writing data using 'to JSON' that cannot be reconstructed accurately using 'from JSON'.
toJSON does a better job at preserving data than toJSONString. Yes, it's possible that you can encode data in a way such that you don't get the same thing back out. So what? At least you get a valid document.
Your second point doesn't make any sense. toJSONString has no advantage whatsoever in reconstruction over toJSON. It is unsafe, however.
The discussion hasn't been "should we allow custom representations" but how we should allow custom representations.
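For illustration, the toJSON extension mechanism Bob describes can be sketched with the JSON.stringify that later standardized the same hook; the Point type and its chosen representation here are hypothetical:

```javascript
// A user-defined class opts into serialization by implementing toJSON.
// The serializer calls it and encodes the returned plain value instead.
function Point(x, y) {
  this.x = x;
  this.y = y;
}

Point.prototype.toJSON = function () {
  // Choose any plain representation; the decoder must know this shape.
  return { type: "Point", coords: [this.x, this.y] };
};

var text = JSON.stringify(new Point(1, 2));
// text === '{"type":"Point","coords":[1,2]}'
```

The corresponding decoder needs an object hook that recognizes this shape, which is exactly the out-of-band agreement the thread goes on to debate.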
From an historical point of view, perhaps we are trying to do too much with JSON.
JSON reminds me of Lisp's print/read, which eventually evolved a flag, *print-readably* (www.lisp.org/HyperSpec/Body/var_stprint-readablyst.html#STprint-readablyST). If true, you will get an error if you try to print an object that cannot be represented in a way that read can parse and construct a 'similar' object. Print/read relies, as does JSON, on the implicit type-encoding of the language literals.
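A rough JavaScript analogue of that flag, under the strict assumption that only null, finite numbers, strings, booleans, arrays, and plain objects print readably (the function name is illustrative):

```javascript
// Throw, print-readably style, if a value cannot survive a JSON
// round trip with its type intact (e.g. Date, RegExp, undefined, NaN).
function checkPrintReadably(value) {
  if (value === null) return;
  var t = typeof value;
  if (t === "string" || t === "boolean") return;
  if (t === "number") {
    if (!isFinite(value)) {
      throw new TypeError("non-finite number has no JSON literal");
    }
    return;
  }
  if (Array.isArray(value)) {
    value.forEach(checkPrintReadably);
    return;
  }
  if (t === "object" && Object.getPrototypeOf(value) === Object.prototype) {
    Object.keys(value).forEach(function (k) {
      checkPrintReadably(value[k]);
    });
    return;
  }
  // Anything else (Date, function, undefined, class instance) would be
  // silently lossy, so signal the error at write time instead.
  throw new TypeError("value cannot be reconstructed from its JSON text");
}
```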
toJSON seems like 'half a loaf'. It lets me encode an arbitrary object, but without a corresponding fromJSON, what use is it?
The whole loaf is there, just not in this particular implementation. The subject of the current discussion was encoding, not decoding, so I didn't implement a whole parser.
(from Brendan's email on the 20th)
- String.prototype.parseJSON( hook )
Returns the value represented by the string. A SyntaxError is thrown if the string is not a strict JSON text.
The optional hook function will be called for each object value found. It is passed each object, and its return value will be used in place of the object in the final structure.
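The hook's described behavior can be sketched with the JSON.parse reviver that later shipped in ES5 (the wrapper name is illustrative; the proposal makes parseJSON a String method, and this sketch only approximates it):

```javascript
// Call hook on every (non-array) object value found, bottom-up, and
// splice its return value into the final structure -- analogous to
// simplejson's object_hook.
function parseJSONWithHook(text, hook) {
  return JSON.parse(text, function (key, value) {
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      return hook(value);
    }
    return value;
  });
}

// Example hook: reconstruct Dates from a {date: <string>} wrapper.
var result = parseJSONWithHook(
  '{"when": {"date": "2006-11-03T12:00:00Z"}}',
  function (obj) {
    return typeof obj.date === "string" ? new Date(obj.date) : obj;
  }
);
// result.when is now a Date object.
```

Note that this hook cannot tell a genuine {date: ...} object from a wrapped Date, which is precisely the ambiguity the thread raises.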
I understand this to mean that you intend to support building, on top of the JSON substrate, a general dump/load that would preserve arbitrary values, not just those that can be directly represented as literals. I don't quite see how you can do that, since it would mean that you would have to use some part of the literal space that can be represented by JSON to represent these additional values. At the very least, you need to define an escape in JSON to do that, no? Otherwise how do you distinguish between a literal that represents, say, an instance of a class, and the same pattern meant to represent just that literal value? As a concrete example, if we did not have date literals and you chose to implement 'to JSON' of a date object as {date: <iso-date-string>}, how can 'from JSON' tell whether it is to reconstruct a Date vs. reconstruct the Object {date: <iso-date-string>}?
You can do that quite easily. JSON-RPC for example. You just need to use an appropriate decoding object hook that understands how the objects should be decoded.
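Concretely, JSON-RPC 1.0's class hinting reserves a "__jsonclass__" key for this purpose; here is a sketch (everything other than that key name is illustrative):

```javascript
// Encoder: wrap non-literal values in a reserved class hint, so a
// plain {date: ...} object and a hinted Date are distinguishable.
function encodeDate(d) {
  return { __jsonclass__: ["Date", [d.toISOString()]] };
}

// Decoding hook: reconstruct hinted values, pass everything else through.
function decodeHook(obj) {
  var hint = obj.__jsonclass__;
  if (Array.isArray(hint) && hint[0] === "Date") {
    return new Date(hint[1][0]);
  }
  return obj;
}

var text = JSON.stringify({ when: encodeDate(new Date(0)) });
var decoded = JSON.parse(text, function (key, value) {
  return value !== null && typeof value === "object" && !Array.isArray(value)
    ? decodeHook(value)
    : value;
});
// decoded.when is a Date again; an ordinary object is left untouched.
```

The reserved key is the out-of-band agreement: both sides must know it, which is the caveat raised later in the thread.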
On 2006-11-03, at 12:58 EST, Bob Ippolito wrote:
On 11/3/06, P T Withington <ptw at pobox.com> wrote:
[...]
I understand that the 'from JSON' mechanism must protect itself against such. But there is no point in preventing the 'to JSON' mechanism from writing such, since 'from JSON' must validate anyway. More important is to signal an error if an object that cannot be round-tripped is passed to 'to JSON'. The end user would be justifiably upset if 'to JSON' gave the impression that it preserved their data, only to find out (much) later that 'from JSON' cannot reconstruct it. These are almost the same condition, but there is an important difference in emphasis: rather than preventing the user from using 'to JSON' in a way that breaks 'from JSON', you are preventing the user from writing data using 'to JSON' that cannot be reconstructed accurately using 'from JSON'.
toJSON does a better job at preserving data than toJSONString. Yes, it's possible that you can encode data in a way such that you don't get the same thing back out. So what? At least you get a valid document.
Hm. Well, as a user, I would not want to use a serialization mechanism that guarantees my document can be de-serialized but not that it contains the data I serialized. As an implementer, such a spec is easy to implement, though: the serializer can just ignore its input, write an empty document, and return success.
Your second point doesn't make any sense. toJSONString has no advantage whatsoever in reconstruction over toJSON. It is unsafe, however.
The discussion hasn't been "should we allow custom representations" but how we should allow custom representations.
We're missing each other here. I am not arguing toJSONString vs. toJSON. I am saying that we should not create a facility that gives the illusion of preserving your data when it does not. I believe it would be better to not have an extension mechanism at all than to have one that is incomplete.
[...]
I understand this to mean that you intend to support building, on top of the JSON substrate, a general dump/load that would preserve arbitrary values, not just those that can be directly represented as literals. I don't quite see how you can do that, since it would mean that you would have to use some part of the literal space that can be represented by JSON to represent these additional values. At the very least, you need to define an escape in JSON to do that, no? Otherwise how do you distinguish between a literal that represents, say, an instance of a class, and the same pattern meant to represent just that literal value? As a concrete example, if we did not have date literals and you chose to implement 'to JSON' of a date object as {date: <iso-date-string>}, how can 'from JSON' tell whether it is to reconstruct a Date vs. reconstruct the Object {date: <iso-date-string>}?
You can do that quite easily. JSON-RPC for example. You just need to use an appropriate decoding object hook that understands how the objects should be decoded.
In other words, you have to have an out-of-band agreement on the encoding. Once again, this creates the issue that you could save your data in a file and not be able to recover that data (because the encoding scheme is not in the file; if the two become separated, you are out of luck).
Summary: I don't think JSON is powerful enough (in its current specification) to support an extension to general object serialization, so I don't think we should try to layer that feature on it. Either JSON has to evolve, or a different representation should be chosen.
Sorry about being a curmudgeon, but it's in my job description.
On 11/3/06, P T Withington <ptw at pobox.com> wrote:
On 2006-11-03, at 12:58 EST, Bob Ippolito wrote:
On 11/3/06, P T Withington <ptw at pobox.com> wrote:
[...]
I understand that the 'from JSON' mechanism must protect itself against such. But there is no point in preventing the 'to JSON' mechanism from writing such, since 'from JSON' must validate anyway. More important is to signal an error if an object that cannot be round-tripped is passed to 'to JSON'. The end user would be justifiably upset if 'to JSON' gave the impression that it preserved their data, only to find out (much) later that 'from JSON' cannot reconstruct it. These are almost the same condition, but there is an important difference in emphasis: rather than preventing the user from using 'to JSON' in a way that breaks 'from JSON', you are preventing the user from writing data using 'to JSON' that cannot be reconstructed accurately using 'from JSON'.
toJSON does a better job at preserving data than toJSONString. Yes, it's possible that you can encode data in a way such that you don't get the same thing back out. So what? At least you get a valid document.
Hm. Well, as a user, I would not want to use a serialization mechanism that guarantees my document can be de-serialized but not that it contains the data I serialized. As an implementer, such a spec is easy to implement, though: the serializer can just ignore its input, write an empty document, and return success.
You are totally crazy. toJSON is something that the user implements, not the serializer. This doesn't make any sense at all.
Your second point doesn't make any sense. toJSONString has no advantage whatsoever in reconstruction over toJSON. It is unsafe, however.
The discussion hasn't been "should we allow custom representations" but how we should allow custom representations.
We're missing each other here. I am not arguing toJSONString vs. toJSON. I am saying that we should not create a facility that gives the illusion of preserving your data when it does not. I believe it would be better to not have an extension mechanism at all than to have one that is incomplete.
[...]
You're confused. It doesn't give any illusion, and it's not incomplete.
I do agree that it's better to have no extension mechanism at all than have one that's incomplete... but the incomplete one is toJSONString. The toJSON extension mechanism is complete.
I understand this to mean that you intend to support building, on top of the JSON substrate, a general dump/load that would preserve arbitrary values, not just those that can be directly represented as literals. I don't quite see how you can do that, since it would mean that you would have to use some part of the literal space that can be represented by JSON to represent these additional values. At the very least, you need to define an escape in JSON to do that, no? Otherwise how do you distinguish between a literal that represents, say, an instance of a class, and the same pattern meant to represent just that literal value? As a concrete example, if we did not have date literals and you chose to implement 'to JSON' of a date object as {date: <iso-date-string>}, how can 'from JSON' tell whether it is to reconstruct a Date vs. reconstruct the Object {date: <iso-date-string>}?
You can do that quite easily. JSON-RPC for example. You just need to use an appropriate decoding object hook that understands how the objects should be decoded.
In other words, you have to have an out-of-band agreement on the encoding. Once again, this creates the issue that you could save your data in a file and not be able to recover that data (because the encoding scheme is not in the file; if the two become separated, you are out of luck).
Summary: I don't think JSON is powerful enough (in its current specification) to support an extension to general object serialization, so I don't think we should try to layer that feature on it. Either JSON has to evolve, or a different representation should be chosen.
Sorry about being a curmudgeon, but it's in my job description.
JSON is plenty powerful enough to represent anything you want it to in the same way that s-expressions are. Your point seems to be that you like to argue rather than you have something to contribute to the discussion.
On 2006-11-03, at 13:45 EST, Bob Ippolito wrote:
[...]
You are totally crazy.
[...]
You're confused.
[...]
Your point seems to be that you like to argue rather than you have something to contribute to the discussion.
You seem to have taken my comments as a personal attack, which they
surely were not meant as. I misunderstood what was being proposed
here. I thought the proposal was for a way to save and restore
arbitrary data. On re-reading I see that the proposal is simply for
a way to generate and parse valid JSON expressions.
I apologize to the list for the wasted bandwidth.
On Nov 3, 2006, at 1:09 AM, Nicolas Cannasse wrote:
However I agree that a good Reflection/Introspection API is very important in a programming language. For example, haXe has haxe.org/api/Reflect and haxe.org/api/Type APIs which can be used to implement a lot of different libraries. Again, I'm in favor of opening the ES4 language as much as possible by providing the base API that needs VM-level support.
See the newly re-exported wiki dump at developer.mozilla.org/es4/, specifically developer.mozilla.org/es4/proposals/meta_objects.html -- this is a normative spec proposed for an optional part of the standard library. I.e., you can't expect to find it on tiny embeddings, but if it's there, it will all be there as specified.
I'm pretty sure the group accepted some form of Crockford's parseJSON proposal, but I think it would be handy to add something analogous to simplejson's object_hook argument. One weakness JSON has is annotating literals (like strings with custom attributes), and this facility can make that smoother.
svn.red-bean.com/bob/simplejson/trunk/docs/module-simplejson.html#load
See "Specializing JSON object decoding" above as well.