Please help with writing spec for async JSON APIs
The semantics of your proposal are straightforward, so I don't think you need to provide spec text at this point. Instead, what would be helpful is a quantitative analysis showing why these additional methods are needed. Is there any way you can demonstrate the benefit with numbers?
JSON parsing is such a slow process that it motivated me to re-invent Google Protobufs (in a nice, JS-friendly way; see joeedh/STRUCT/wiki/Intro-and-Examples). I never use JSON in production code for this reason. An async API isn't a bad idea.
Joe
You probably don’t want to support reviver/replacer in the async methods as they would be very challenging to make performant.
I confess I don't see the point of this proposal at all, at least with respect to being specifically about JSON.
JSON parsing/stringification is pure computation; it's not like I/O where you need something special inside the language runtime's implementation in order to exploit the asynchrony.
While it would be generally useful to be able to hand a random chunk of CPU-bound work off to another thread on another processor core, there's no point whatsoever in treating JSON as a special case of this. JSON handling is just one use case for asynchronous computation in general. Presumably once the language's async features are fully baked you should be able to just wrap a call to the existing JSON API inside an async function, and get this functionality (and much else besides) directly.
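For what it's worth, a minimal sketch of that generic offload using today's Web Worker API (the helper name and the inline-worker setup are invented for the example, not a proposal):

    // Hypothetical helper: parse JSON off the main thread using a Worker.
    // The worker source is inlined via a Blob URL so the example is self-contained.
    const workerSource = `
      onmessage = function (e) {
        try {
          postMessage({ ok: true, value: JSON.parse(e.data) });
        } catch (err) {
          postMessage({ ok: false, error: String(err) });
        }
      };
    `;
    const workerUrl = URL.createObjectURL(
      new Blob([workerSource], { type: 'application/javascript' }));

    function parseJSONInWorker(text) {
      return new Promise(function (resolve, reject) {
        const worker = new Worker(workerUrl);
        worker.onmessage = function (e) {
          worker.terminate();
          e.data.ok ? resolve(e.data.value) : reject(new Error(e.data.error));
        };
        worker.postMessage(text);
      });
    }

    // Usage: parseJSONInWorker(hugeJsonString).then(obj => ...);
    // Note that the parsed result still has to be structured-cloned back to
    // the main thread, which is itself a serialization/deserialization step.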
Chip
On Jul 31, 2015, at 8:03 PM, Mohsen Azimi <me at azimi.me> wrote:
Hi,
I stumbled on the lack of async APIs for JSON parsing and stringifying in JavaScript a couple of weeks ago. I tried to hack around it (azimi.me/2015/07/30/non-blocking-async-json-parse.html) by abusing the W3C Fetch API, but that's just a hack.
Domenic suggested (twitter.com/domenic/status/626958415181393920) that we should write a proposal spec for native non-blocking JSON processing. I don't know what the API should look like, but I made some assumptions, wrote an initial spec (if I can call it a spec!), and published it on GitHub (mohsen1/async-json).
I need to learn the spec lingo and rewrite the proposal in proper, standard language. I'd appreciate help and resources for learning how to write spec text.
Would you please review the proposal so far (including the outstanding PR)?
Thanks, Mohsen
I agree that we should probably look for a more general solution. What we have at the moment is woefully inadequate though (i.e. WebWorkers on the client and separate processes in Node.js).
What we need is some way of doing multi-threading baked into the language, but that could take a long time to standardise properly.
If we're speaking normatively, some of us don't see the point of using unstructured object serialization for web communication at all (it's not simply a binary-versus-text thing; I once implemented JSON's object model in a binary format, and it had the same speed as the native JSON parser). That said, JSON is a standard, and it'd be nice if I could support it in more use cases (instead of telling clients "can't do that, too slow").
If we must have JSON, let's at least make it minimally usable. If that's too much to ask, then perhaps it's time browsers supported ProtoBufs/STRUCT type systems natively.
Joe
Synchronous JSON parsing can block a Node.js application. See the following test case: jsperf.com/json-parse-vs-1mb-json. The Chromium native parser can handle up to 44MB per second on my hardware (BTW, I'm quite impressed by the V8 garbage collector).
That's enough to perform an easy DoS on a reasonably configured low-end server.
Anyway, I don't think that asynchronous parsing/stringifying would solve this problem (though it could solve some performance issues in WebSockets-based games by preventing frame drops).
A common use case is large JSON feeds: header + lots of entries + trailer
When processing such feeds, you should not bring the whole JSON into memory all at once. Instead you should process the feed incrementally.
So, IMO, an alternate API should not be just asynchronous, it should also be incremental.
FWIW, I have implemented an incremental/evented parser for V8 with a simple API. This parser is incremental but not async (because V8 requires that materialization happen on the main JS thread). But if the V8 restriction could be lifted, it could be made async with the same API. See bjouhier/i-json.
i-json's API is a simple low level API. A more sophisticated solution would be a duplex stream.
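To make the shape of such a low-level API concrete, here is a rough usage sketch; the names (createParser, update, result) and the stream events are illustrative assumptions, not necessarily i-json's actual API:

    // Hypothetical incremental parser usage (names are illustrative only):
    // feed the source to the parser chunk by chunk, then collect the result.
    const parser = createParser();        // assumed factory, not a real API
    readStream.on('data', function (chunk) {
      parser.update(chunk);               // consume one chunk, keep internal state
    });
    readStream.on('end', function () {
      const value = parser.result();      // materialize the final JS value
      handle(value);                      // handle() is an assumed callback
    });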
There was also a long discussion on this topic on node's GitHub: joyent/node#7543
Bruno
Exactly! Incremental and async, i.e., streaming.
XML quickly needed such APIs (en.wikipedia.org/wiki/Simple_API_for_XML, en.wikipedia.org/wiki/StAX). JSON's in the same boat.
Personally I just use small JSON records delimited by newlines in my 'streaming' applications. Best of both worlds IMO.
The SAX approach is not ideal for JSON because we don't want the overhead of a callback on every key (especially if parsing and callbacks are handled by different threads).
To be efficient we need a hybrid approach: an evented (SAX-like) API for top-level keys, and direct mapping to JS for deeper keys. In the feed case, you only need one event for the header, one for every entry and one for the trailer. In the i-json API I'm handling this with a maxDepth option in the parser constructor; a usage sketch follows below.
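Something like this (the FeedParser constructor, event shape and helper functions are hypothetical, just to illustrate the maxDepth idea):

    // Hypothetical evented feed parser: with maxDepth set, whole JS values are
    // emitted for the header, for each entry of the body array, and for the
    // trailer, instead of firing a callback on every key.
    const parser = new FeedParser({ maxDepth: 2 });        // assumed constructor/option
    parser.on('value', function (path, value) {
      if (path[0] === 'header') initFeed(value);           // one event for the header
      else if (path[0] === 'body') processEntry(value);    // one event per entry
      else if (path[0] === 'trailer') finishFeed(value);   // one event for the trailer
    });
    sourceStream.pipe(parser);                             // feed chunks as they arrive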
Bruno
2015-08-03 3:25 GMT+02:00 Brendan Eich <brendan at mozilla.org>:
So, to summarize some things that have been said or are implicit in this thread and related discussions:

- New JSON APIs could be added to JS. We don't have to be limited to JSON.parse/stringify.
- We don't have to be restricted to the JSON.stringify/parse mapping of JS objects from/to JSON texts.
- Streaming is a better processing model for some applications.
- JSON.parse/stringify are pure computational operations. There is no perf benefit to making them asynchronous unless some of their computation can be performed concurrently.
- You can't just run JSON.parse (or stringify) concurrently with other JS "jobs" because of possible races.
- You could concurrently run the parsing phase of JSON.parse (steps 3-5 of ecma-international.org/ecma-262/6.0/#sec-json.parse).
- You can not run the step 8 substeps (reviver processing) concurrently because they may call back into JS code and hence could introduce races. (A sketch of this parse/revive split appears below.)
- Making JSON.stringify concurrent probably requires first copying/transferring the input object graph, but that is essentially the computation that JSON.stringify performs, so it is hard to see any benefit.
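To illustrate the two JSON.parse points above: the pure parsing phase (steps 3-5) could in principle run off-thread, while the step 8 reviver walk must stay on the main thread because it calls arbitrary JS. A rough sketch, where parseConcurrently is an assumed off-thread parse returning a promise and internalize only approximates the spec's InternalizeJSONProperty:

    // Approximation of the step 8 substeps (InternalizeJSONProperty): this part
    // must run on the main thread because the reviver is arbitrary JS code.
    function internalize(holder, key, reviver) {
      const value = holder[key];
      if (value !== null && typeof value === 'object') {
        for (const k of Object.keys(value)) {
          const v = internalize(value, k, reviver);
          if (v === undefined) delete value[k];
          else value[k] = v;
        }
      }
      return reviver.call(holder, key, value);
    }

    function parseAsync(text, reviver) {
      // parseConcurrently is an assumed helper that runs the pure parsing
      // phase (steps 3-5) off-thread and resolves with the materialized value.
      return parseConcurrently(text).then(function (unfiltered) {
        return reviver ? internalize({ '': unfiltered }, '', reviver) : unfiltered;
      });
    }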
On 8/3/15 11:34 AM, Allen Wirfs-Brock wrote:
- JSON.parse/stringify are pure computational operations. There is no perf benefit to making them asynchronous unless some of their computation can be performed concurrently.
Or even just incrementally, right?
In practice, 500 chunks of 5ms of processing may be better for a GUI application that wants to remain responsive than a single chunk of 2500ms, even if it doesn't happen concurrently, as long as it yields to processing of user events.
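Purely as an illustration of that trade-off, assuming the input has already been split into individually parseable records:

    // Illustrative only: process pre-split JSON records in small time slices,
    // yielding to the event loop between slices so user events stay responsive.
    function parseRecordsInSlices(records, budgetMs, onDone) {
      const results = [];
      let i = 0;
      (function slice() {
        const start = Date.now();
        while (i < records.length && Date.now() - start < budgetMs) {
          results.push(JSON.parse(records[i++]));
        }
        if (i < records.length) setTimeout(slice, 0);  // yield, then continue
        else onDone(results);
      })();
    }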
On Aug 3, 2015, at 8:45 AM, Boris Zbarsky wrote:
On 8/3/15 11:34 AM, Allen Wirfs-Brock wrote:
- JSON.parse/stringify are pure computational operations. There is no perf benefit to making them asynchronous unless some of their computation can be performed concurrently.
Or even just incrementally, right?
In practice, 500 chunks of 5ms of processing may be better for a GUI application that wants to remain responsive than a single chunk of 2500ms, even if it doesn't happen concurrently, as long as it yields to processing of user events.
-Boris
sure, but that's a user interactiveness benefit, not a "perf benefit". There is almost always some overhead introduced when making a computation incremental. Responsiveness is a fine reason to make such a trade-off.
On 8/3/15 11:56 AM, Allen Wirfs-Brock wrote:
sure, but that's a user interactiveness benefit, not a "perf benefit".
OK, fair. I just wanted it to be clear that there is a benefit to incremental/asynchronous behavior here apart from raw throughput.
On Mon, Aug 3, 2015 at 8:34 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote: [snip]
- JSON.parse/stringify are pure computational operations. There is no perf benefit to making them asynchronous unless some of their computation can be performed concurrently.
If we're speaking strictly about making the JSON parsing asynchronous, then correct, there is really no performance benefit to speak of. You may be able to offload the parsing to a separate thread, but it's going to take the same amount of time. The real benefit will come when (a) JSON parsing becomes incremental and (b) a developer is given greater control over exactly how the JSON is converted to/from strings.
Something along the lines of...
    JSON.parser(input).
      on('key', function(key, context) {
        if (key === 'foo') console.log(context.value());
        else if (key === 'bar') context.on('key', ...);
      }).
      on('end', function() { });
In other words: allowing for incremental access to the stream and fine grained control over the parsing process, rather than having to block while everything is parsed out, building up the in-memory object model, then being forced to walk that model in order to do anything interesting.
Personally, I'm not overly concerned about the possibility of races.
On Aug 3, 2015, at 9:02 AM, James M Snell wrote:
On Mon, Aug 3, 2015 at 8:34 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote: [snip]
- JSON.parse/stringify are pure computational operations. There is no perf benefit to making them asynchronous unless some of their computation can be performed concurrently.
If we're speaking strictly about making the JSON parsing asynchronous, then correct, there is really no performance benefit to speak of. You may be able to offload the parsing to a separate thread, but it's going to take the same amount of time. The real benefit will come when (a) JSON parsing becomes incremental
yes, incremental is good. But do you really mean just "parsing" rather than "processing"?
and (b) a developer is given greater control over exactly how the JSON is converted to/from strings.
Strictly speaking JSON is strings. JSON.stringify/parse converts JS values (including objects) to/from such strings.
Something along the lines of...
    JSON.parser(input).
      on('key', function(key, context) {
        if (key === 'foo') console.log(context.value());
        else if (key === 'bar') context.on('key', ...);
      }).
      on('end', function() { });
I have to guess at your semantics, but what you are trying to express above seems like something that can already be accomplished using the reviver argument to JSON.parse.
In other words: allowing for incremental access to the stream and fine grained control over the parsing process, rather than having to block while everything is parsed out, building up the in-memory object model, then being forced to walk that model in order to do anything interesting.
Personally, I'm not overly concerned about the possibility of races.
But, TC39 is concerned about races.
On Mon, Aug 3, 2015 at 10:29 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote: [snip]
I have to guess at your semantics, but what you are trying to express above seems like something that can already be accomplished using the reviver argument to JSON.parse.
Yes and no. reviver achieves part of the goal but still assumes that parsing is fundamentally blocking, and assumes that I want to return something and have that in-memory object built up and returned. However, if what I want instead is to forgo the creation of an in-memory model altogether and work simply from an incremental, async stream of events, then I'm out of luck.
In other words: allowing for incremental access to the stream and fine grained control over the parsing process, rather than having to block while everything is parsed out, building up the in-memory object model, then being forced to walk that model in order to do anything interesting.
Personally, I'm not overly concerned about the possibility of races.
But, TC39 is concerned about races.
Granted ;-) ... there's a reason I prefixed that sentence with 'Personally' ;-)
Reviver is a bit of a killer for async parsing because it imposes a callback on every key. It makes it difficult to efficiently offload parsing to a worker thread. Without it, feed entries could be parsed and materialized safely (provided GC allows it) in a separate thread and then emitted to the main JS thread.
In our "big JSON feeds" scenarios we never use revivers, and actually I'm not sure we even use them on small JSON payloads.
Is this feature really necessary in an async/incremental API variant?
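A rough sketch of that division of labor, assuming a newline-delimited feed so entries can be split without a full incremental parser; the file name, message shape and helpers are made up for the example:

    // worker.js (hypothetical): parse entries off the main thread, emit them one by one.
    let buffered = '';
    onmessage = function (e) {
      buffered += e.data;
      const lines = buffered.split('\n');
      buffered = lines.pop();                            // keep the trailing partial line
      for (const line of lines) {
        if (line.trim()) postMessage(JSON.parse(line));  // one materialized entry per message
      }
    };

    // main thread: entries arrive incrementally, no reviver involved.
    const worker = new Worker('worker.js');
    worker.onmessage = function (e) { processEntry(e.data); };    // processEntry is assumed
    fetchChunks(function (chunk) { worker.postMessage(chunk); }); // fetchChunks is assumed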
On Aug 3, 2015, at 12:30 PM, Bruno Jouhier wrote:
Reviver is a bit of a killer for async parsing because it imposes a callback on every key. It makes it difficult to efficiently offload parsing to a worker thread. Without it, feed entries could be parsed and materialized safely (provided GC allows it) in a separate thread and then emitted to the main JS thread.
Exactly, that's why it's naive for anybody to propose that a concurrent JSON reader is simply a matter of wrapping JSON.parse with some async infrastructure.
In our "big JSON feeds" scenarios we never use revivers, and actually I'm not sure we even use them on small JSON payloads.
Is this feature really necessary in an async/incremental API variant?
There's nothing sacred about the JSON.parse reviver argument. It would certainly be possible to have a deserialization function that did not include that functionality.
My understanding of most "streaming JSON" use in Node.js is that it actually is "newline-delimited JSON." That is, each line is parsed, delivered, and processed as a single chunk containing a JSON value, and the streaming nature comes from the incremental processing and backpressure as applied in between newline-delimited chunks.
Part of the advantage of this is that it's extremely easy to specify and implement how it works.
Bruno seems to indicate he's run into use cases where it'd be useful to process a normal JSON object in a streaming fashion. That seems like a harder problem, indeed necessitating a SAX-like API.
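For reference, a minimal Node sketch of that newline-delimited pattern with a Transform stream in object mode (the naive split-on-newline buffering is the usual simplification; error handling omitted):

    const { Transform } = require('stream');

    // Turns a stream of text chunks into a stream of parsed JSON values, one
    // per newline-delimited record; backpressure is applied by the stream
    // machinery between chunks.
    function ndjsonParser() {
      let buffered = '';
      return new Transform({
        readableObjectMode: true,
        transform(chunk, encoding, callback) {
          buffered += chunk;
          const lines = buffered.split('\n');
          buffered = lines.pop();                 // keep any partial trailing line
          for (const line of lines) {
            if (line.trim()) this.push(JSON.parse(line));
          }
          callback();
        },
        flush(callback) {
          if (buffered.trim()) this.push(JSON.parse(buffered));
          callback();
        }
      });
    }

    // Usage: process.stdin.setEncoding('utf8');
    //        process.stdin.pipe(ndjsonParser()).on('data', obj => { /* ... */ });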
It's true that the newline delimited feed is often "good enough". But it has a number of drawbacks:

- It is not JSON. This is especially annoying during development because most tools understand real JSON, not this format. For example, during debugging you often work with small feeds (few entries, but the entries can be long lines); it is very handy to be able to paste a whole feed into an editor and reformat it to visualize (or modify) the entries.
- There is a special MIME type and a special file extension for it (see en.wikipedia.org/wiki/Line_Delimited_JSON#MIME_type_and_file_extensions) but I've never seen anyone use them (maybe I'm not curious enough). On the other hand, I've often seen such feeds saved as .json files. So the level of interoperability is poor.
- It does not let you cleanly associate header/trailer metadata with your feed. With true JSON, you can design your feed as { header: { ... }, body: [...], trailer: {...} }. With ldjson you can still do it by formatting the header as a first line and the trailer (if any) as a trailer line, but this is brittle (kind of CSV in JSON; the trailer forces you to test every entry).
Note: I'm not screaming for an async/incremental parser API in core JS because I already have libraries to handle arbitrarily large JSON feeds. But if such an API were standardized, I'm sure I would get a more robust/optimized parser.
Bruno
2015-08-03 22:22 GMT+02:00 Domenic Denicola <d at domenic.me>:
We have a spec for that: RFC 7464
Regards, Carsten
RFC 7464 has a different format (0x1E at the beginning of every record) and a different media type (application/json-seq vs application/x-ldjson) than line-delimited JSON (en.wikipedia.org/wiki/Line_Delimited_JSON). The 0x1E at the beginning of every record makes it hard to edit these files.
I'm not sure it will improve interoperability; rather, it will fragment things, because it is likely that most people will stick to ldjson.
2015-08-04 1:29 GMT+02:00 Carsten Bormann <cabo at tzi.org>:
+1 for line delimited JSON. It would be good to switch all users of json-seq over to it and to deprecate json-seq. Perhaps an RFC would help.
-- Cheers, --MarkM
FWIW there are very nice ways to encode stream-ready JSON such that it's perfectly backward compatible with JSON.parse but still approximately as easy as line-delimited JSON. The whitespace within stringified JSON can be given optional significance; essentially it can act as a stream chunk delimiter hint. For example, put each element of a top-level array on a single line -- essentially equivalent to line-delimited JSON with 1 additional char per line, the comma. And it's nearly as easy to parse -- its only added complexity is the fact that, as this is a valid profile of JSON, it'd be nice for parsers to validate syntax (really just trailing commas), but this is not strictly necessary.
Top-level objects can likewise be emitted as one key/value pair per line, which allows for elegant streaming over associative array types. This is also really easy to parse and maintains backcompat with JSON.parse.
It would also be possible to introduce optional nesting sigils using more whitespace, e.g. encoding chunk boundaries for two or more levels of nesting in a way that would allow parsers to offer opt-in SAXisms at any arbitrary level of nesting, but in a way that cleanly falls back to JSON.parse.
As a side benefit, the layout of these nesting sigils could be designed in a way that makes it a lot easier to read than the extremely dense JSON.stringify default behavior. As it happens, using leading whitespace to indicate nesting would look a lot like pretty printing -- this wouldn't be anything more than a pretty printer with some optional embedded semantics attached.
Async encode and decode behavior can be layered on top of the depth-1 serialization and deserialization. If you want more depth, just add recursion. If at any point you don't want streaming, just use JSON.parse.
I'm not sure something like this is strictly necessary in the language, but if it is, ISTM it would be a pretty low-footprint way to get it in -- it'd be nice to avoid yet another format.
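To make the first variant concrete, here is a tiny sketch (the function names are invented for the example): the writer puts one top-level array element per line, so the output is both valid JSON for JSON.parse and trivially consumable line by line.

    // Encode: ordinary JSON, but each top-level array element sits on its own line.
    function stringifyLineFriendly(entries) {
      return '[\n' + entries.map(e => JSON.stringify(e)).join(',\n') + '\n]';
    }

    // Decode incrementally: consume one line at a time, trimming the bracket
    // lines and trailing commas; or just hand the whole text to JSON.parse.
    function parseLineFriendly(text, onEntry) {
      for (const line of text.split('\n')) {
        const trimmed = line.replace(/,\s*$/, '').trim();
        if (trimmed === '[' || trimmed === ']' || trimmed === '') continue;
        onEntry(JSON.parse(trimmed));
      }
    }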
JSON is not that hard to parse incrementally. The i-json parser is implemented in C++ with a fallback JS implementation. The C++ implementation is less than 1,000 LOC and the JS implementation less than 400 LOC. The C++ implementation is 1.65 times slower than JSON.parse but, unlike JSON.parse, it does not use any V8 internals and I have spent very little time optimizing it. So there is room to get closer to JSON.parse.
Parsing JSON incrementally has some advantages:
- No need to specify a new format.
- The parser is generic and not limited to feeds. It can handle big, arbitrarily complex JSON payloads.
- Parsing can be performed by a tight automaton in a single pass over the source buffer, without any backtracking or lookahead. This may be more efficient than a two-stage approach in which the first stage identifies the fragments and the second one passes them to JSON.parse.