Stage 0 proposal: specifying concurrency for "for...of" loops potentially containing "await" statements
The current way to write what you want is:
await Promise.all(itemsCursor.map(item => db.insert(item)));
or
await Promise.all(itemsCursor.map(Promise.guard(5, item => db.insert(item))));
if you are using a library like prfun
(cscott/prfun#promiseguardfunctionnumber-condition-function-fn--function).
I'm not sure adding a new parallel loop construct is an obvious improvement on this, especially since unguarded concurrent execution is usually a mistake (can cause memory requirements to blow up), as you point out yourself.
I'd be more in favor of new syntax if it was integrated with a work-stealing mechanism, since (a) that can't as easily be done in a library function (you need access to the entire set of runnable tasks, not just the ones created in this loop), and (b) is more likely to be correct/fast by default and not lead to subtle resource-exhaustion problems.
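To make the library-side comparison concrete, here is a minimal sketch of a bounded-concurrency mapper (a hypothetical `mapConcurrent` helper, not taken from prfun or any other real library) showing roughly what such a guard does under the hood:

```javascript
// Minimal bounded-concurrency map (hypothetical helper, illustration only):
// runs fn over items with at most `limit` calls in flight at once.
async function mapConcurrent(items, limit, fn) {
  const results = [];
  let next = 0;
  // Each worker repeatedly claims the next unclaimed index; the claim is
  // synchronous (no await between check and increment), so two workers
  // never grab the same item.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}

const out = await mapConcurrent([1, 2, 3, 4, 5], 2, async (n) => n * 10);
console.log(out); // [ 10, 20, 30, 40, 50 ]
```

Note that results are written back by index, so the output ordering is preserved even though completion order is not.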
I am more interested in syntax two than syntax one, which I felt should probably be included for completeness. But hey, as you say, maybe not since unguarded concurrency is indeed usually a mistake.
Taken on its own, do you have an objection to for (item of items concurrency 5) { ... }?
It could probably be added as a Promise.all(iterable, concurrency?)
Some existing implementations:
With the Promise.all, would each item in the iterable be an async function (or function returning a promise)? I mean, I'm presuming the point is to not have every item execute at the same time.
Can someone tell me exactly how just omitting "await" doesn't broadly achieve the "concurrency" objective?
Can someone tell me exactly how just omitting "await" doesn't broadly achieve the "concurrency" objective?
Omitting "await" gives us no assurance that all of the iterations completed prior to exit from the "for...of" loop.
My proposal should have specified that regardless of whether "concurrency N" or "concurrent" is used, the loop will not exit until all of the items have successfully executed the loop body.
It could probably be added as a Promise.all(iterable, concurrency?)
While this would be a useful extension to Promise.all and might be part of the underlying implementation of the language feature, this does not meet the goal of avoiding the cognitive load of shifting gears from the "async/await" pattern to thinking overtly about promises. There are similar implementations, such as the one in Bluebird, but the goal of the language feature is to expand the set of problems that can be solved without the cognitive load of collecting async function return values and then awaiting a library function that operates on them.
My own belief is that the percentage of "await-friendly" use cases for asynchronous programming that come up for most developers is pretty close to 90%, and that a syntax for specifying concurrency would take it the rest of the way there, and be much appreciated by the developer who has written the "for...of { await ... }" loop and now wants to improve the performance in a simple way that is also likely to be safe (bounded concurrency).
--
Chief Software Architect, Apostrophe Technologies
Pronouns: he / him / his
With the Promise.all, would each item in the iterable be an async function (or function returning a promise)? I mean, I'm presuming the point is to not have every item execute at the same time.
Ah, correct; as in my example, it should be an array of thunks (functions with no arguments returning promises).
So it should also have a different name than Promise.all; Promise.allThunks, maybe?
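The reason thunks matter here can be shown in a few lines: by the time you have an array of promises, every operation has already started, so Promise.all can only wait, not limit concurrency. Thunks let the caller decide when each operation begins. A small illustration (the counter and names are invented for this sketch):

```javascript
// By the time you hold a promise, the work has started. Thunks defer the
// start, so a scheduler can control how many run at once.
let started = 0; // counts how many operations have actually begun
const thunks = [1, 2, 3, 4].map((n) => () => {
  started++;
  return Promise.resolve(n);
});

console.log(started); // 0 -- nothing has started yet

// Start only the first two, as a concurrency-2 scheduler would:
const firstTwo = thunks.slice(0, 2).map((t) => t());
console.log(started); // 2
await Promise.all(firstTwo);
```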
In my mind there is a cognitive load cost in adding language constructs (unless they significantly otherwise reduce cognitive load). In this case, I find omitting the "await" would suffice for a lot of people, and for those who want the loop to conceptually "hang" until all are completed (which I find strange in and of itself), I think synchronously collecting the operations and then doing a Promise.all is more understandable to a reader. Can you imagine explaining a for loop that doesn't behave linearly? For example, imagine you have a variable outside the loop that is being modified on each iteration; the whole thing would go haywire. Especially compared to building an array followed by a Promise.all.
In my mind, the whole point of "await async" is to linearize asynchronous programming, so that logical flow that happens to be asynchronous is trivial to reason about, and read. In my mind the "concurrent" proposal jeopardises that.
Omitting the "await" would not suffice for the original use case of "await": you need to know it actually completed before you continue to the code after the "for" loop. It would also unleash unbounded concurrency which usually doesn't end well.
It takes us back to:
"Sometimes my code works, sometimes it doesn't, I'm not sure why. Some kind of timing thing?"
In my experience, the developers who try that approach have a terrible time debugging their code later. This is why the ability to write "for...of" loops with "await" has been such a big win for junior JavaScript developers.
As for variables outside the loop, they actually would behave reasonably. For instance, this code would yield sensible results:
const interval = setInterval(progressDisplay, 250);
let progress = 0;

for (const piece of pieces concurrency 5) {
  await db.insert(piece);
  progress++;
}

clearInterval(interval);

function progressDisplay() {
  console.log(progress + ' of ' + pieces.length + ' completed');
}
Even in more complex cases, keep in mind that this is just suspension via "await", not preemptive multitasking, so the side effects are not that strange.
Can you give an example of a situation where this would not behave reasonably?
I can see that my proposal would be stronger if I removed "concurrently" and kept only "concurrency N". That is the feature I really wanted, and including "concurrently" was just a gesture to completeness but as others have pointed out, it's not a completeness we want.
The only place unbounded concurrency really tends to make sense is when you don't have a giant array of things to process - you have a handful of predetermined tasks that you know can run concurrently. I can picture a language feature for that, but it wouldn't involve "for...of" loops (:
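Incidentally, the progress example can be run today by simulating the proposed loop with a small userland helper (a hypothetical `forEachConcurrent`, not a real API). The shared counter stays coherent because each `progress++` executes synchronously after its await, with no preemption in between:

```javascript
// Hypothetical forEachConcurrent helper standing in for "concurrency 5".
// Demonstrates that a shared counter outside the "loop" behaves sensibly.
async function forEachConcurrent(items, limit, body) {
  let next = 0;
  async function worker() {
    while (next < items.length) {
      await body(items[next++]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
}

let progress = 0;
const pieces = Array.from({ length: 12 }, (_, i) => i);
await forEachConcurrent(pieces, 5, async (piece) => {
  // stands in for `await db.insert(piece)`
  await new Promise((resolve) => setTimeout(resolve, 1));
  progress++; // runs synchronously between awaits; no torn updates
});
console.log(progress + ' of ' + pieces.length + ' completed'); // 12 of 12 completed
```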
REVISED PROPOSAL (thanks for the input so far!)
Background
Developers learning async programming in the modern async/await era frequently discover this useful pattern:
for (item of items) {
  await db.insert(item);
  // additional awaited operations, etc.
}
This pattern extends the readability of "await" to cases where each item in an array, or each item in an iterable, requires some asynchronous processing. It also ensures that items are processed one at a time, avoiding unbounded concurrency that results in excessive stress on back ends, triggers API rate limiting, and otherwise results in unpredictable bugs.
However, a common next step is to wish for a manageable level of concurrency. For instance, processing 500 asynchronous calls at once is unwise, especially in a web application that is already handling 100 requests at once. But, processing 5 at once may be reasonable and improve the processing time per request.
Unfortunately, at present, this requires shifting mental gears from async/await to promises. Here is an example based on Bluebird; there are of course other libraries for this:
const Promise = require('bluebird');

await Promise.map(items, async function(item) {
  await db.insert(item);
  // additional awaited operations, etc.
}, { concurrency: 5 });
While effective, this is a lot of boilerplate and a shift of mental model. And in my experience as a mentor to many developers, this is *the only situation in which they frequently need to reach for a promise library.* A language feature to naturally integrate it with async/await would substantially expand the effectiveness of async/await as a tool for reducing the cognitive load of asynchronous programming.
Proposed Feature
I propose extending the existing async / await syntax to accommodate specifying the concurrency of a for...of loop:
for (item of items concurrency 5) {
  // db.insert is an async function
  await db.insert(item);
}
console.log('Processing Complete');
The major benefit here is that *the developer does not have to shift gears mentally from async/await to thinking about promises to deal with a common case in systems programming.* They are able to continue with the pattern they are already comfortable with.
Up to 5 loop bodies commence concurrently with regard to "await" statements (see below).
There is no guarantee that item 3 will finish before item 2, or that item 4 won't start (due to 3 being finished) before item 2 ends, etc.
If an exception is not caught inside the loop body, the loop stops, that exception is thrown beyond the loop, and any further exceptions from that invocation of the loop due to concurrently executing loop bodies are discarded.
Just as with an ordinary "for...of" loop containing an "await" statement, it is guaranteed that, barring an exception, the loop body will execute completely for every item in "items" before the loop exits and the console.log statement executes. The only difference is that the specified amount of concurrency is permitted during the loop.
"5" may be any expression resulting in a number. It is cast to an integer. If the result is not a natural number, an error is thrown (it must not be 0 or a negative number).
FAQs
"What if I want unlimited concurrency?"
It is rarely a good idea. It results in excessive stress on back ends, unnecessary guards that force serialization in interface libraries just to cope with (and effectively negate) it, and API rate limiting. This feature teaches the best practice that the level of concurrency should be mindfully chosen. However, those who really want it can specify "concurrency items.length" or similar.
"What about async iterators?"
The feature should also be supported here:
for await (item of items concurrency 5) {
  // db.insert is an async function
  await db.insert(item);
}
While the async iterator itself is still sequential rather than concurrent, frequently these can supply values considerably faster than they are processed by the loop body, and so there is still potential benefit in having several items "in the hopper" (up to the concurrency limit) at a time.
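This "in the hopper" behavior can be sketched today with a hypothetical helper (names invented for this illustration): the iterator is pulled sequentially, but up to the limit of loop bodies may be in flight at once. Concurrent next() calls on an async generator are safe because the language queues them:

```javascript
// Bounded concurrency over an async iterator: pulls are sequential (async
// generators queue overlapping next() calls), but up to `limit` bodies run
// at once. Hypothetical helper, not a real API.
async function forAwaitConcurrent(asyncIterable, limit, body) {
  const it = asyncIterable[Symbol.asyncIterator]();
  async function worker() {
    while (true) {
      const { value, done } = await it.next(); // one pull per free worker
      if (done) return;
      await body(value);
    }
  }
  await Promise.all(Array.from({ length: limit }, worker));
}

// A small async generator standing in for a database cursor:
async function* cursor() {
  for (let i = 1; i <= 6; i++) yield i;
}

let sum = 0;
await forAwaitConcurrent(cursor(), 3, async (n) => {
  sum += n;
});
console.log(sum); // 21
```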
I don't understand how this would work.
for (const thing of things concurrency 5) {
  const result = await thing();
  console.log(result); // <== what is `result` here, if the call to thing() hasn't completed?
}
Also, it's intellectually unsatisfying that I can't specify concurrency for all the async calls in
await foo();
await bar();
for (const thing of things) await thing();
await baz();
Bob
Up to five instances of the loop body would be in progress, assuming at least one of them awaits at some point. Any await inside the loop body would be respected with regard to code that follows it in that loop body. The only place concurrency comes into play is that more than one of these loop bodies could be in progress for separate items; for instance, they could all be awaiting a long database operation or API call. To avoid the frequent problems that come up with unlimited concurrency when dealing with APIs and similar, you set the concurrency so that you don't exceed those reasonable limits.
It should really be in the form of a static method like Promise.map, or some other naming.
Suppose you wanted to build an ordered array of results given the async processes. Suppose the iterations vary in duration. Without the "concurrent" keyword, the ordering is guaranteed. Suppose someone adds "concurrent" just motivated by performance. All of a sudden the ordering is messed up as some iterations take longer than others. Of course this can be overcome, but my point is that it is not as obvious as when just using Promise.all and keeping the "guaranteed" behaviour of await inside loops.
Suppose you wanted to build an ordered array of results given the async processes. Suppose the iterations vary in duration. Without the "concurrent" keyword, the ordering is guaranteed.
That's a really good point. And I did after all reference Promise.map from bluebird in my own discussion. I wasn't interested in its primary use case - mapping input to output, yielding a new array with the same ordering - but of course others may be.
I suspect the set of cases where you're seriously thinking about using "concurrency 5" is the same set of cases where mapping the whole thing to a new array isn't interesting because you're storing back to a database as you go along, etc., but I recognize it's not a strict correlation between use cases for concurrency and use cases where you wouldn't wish for map() behavior.
So yes, there is some developer education required around the feature's side effects. I still think it's worthwhile. I don't recall having actually used the "map" capability of Promise.map, just the ability to run through all the inputs with bounded concurrency. But others here may tell me they use this all the time and would find the behavior of the new language feature actively frustrating. Let's see.
Also, wouldn't this be better formulated in terms of tasks and not promises? Tasks could be decoupled from list iteration, for one.
Also, wouldn't this be better formulated in terms of tasks and not promises? Tasks could be decoupled from list iteration, for one.
That's interesting, what would this look like?
There's a tradeoff here. If it's an extension of for...of, it's much clearer for users. If it's a method of Promise, it's much more extensible.
Perhaps it could be a little bit of both?
Maybe something like:
for (item of items : Scheduler.limit(5)) {
  // ...
}
where Scheduler.limit is one of several possible execution strategies?
One could imagine more advanced strategies, such as sharing a scheduler between several loops, or adapting scheduling to external constraints such as memory pressure.
One possible semantics for this would be to treat it as syntactic sugar for:
Scheduler.limit(5)(function*() {
  for (item of items) {
    // ...
  }
})
There's a tradeoff here. If it's an extension of for...of, it's much clearer for users. If it's a method of Promise, it's much more extensible. Perhaps it could be a little bit of both?
Maybe something like:
for (item of items : Scheduler.limit(5)) { // ... }
where Scheduler.limit is one of several possible execution strategies?
Cheers, David
I like this idea a lot; it builds on the goal of staying in the linear async/await mindset (as you say, clearer for users) while providing extensibility.
The idea of passing in a function that implements the concurrency strategy is great. I'm a little unclear on how the syntax transformation you suggested would do the trick; I think the loop body would have to be wrapped in an anonymous async function. I'm not sure whether a generator is required.
I would imagine the strategy function would receive an iterable (or async iterable, depending on whether "for await" was used) that retrieves promises, one for each loop body, and can thus pull them at its own pace.
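One plausible shape for this, sketched with invented names (`Scheduler.limit` comes from this thread; the desugaring is an assumption): the strategy receives an iterable of thunks, one per loop body, so it controls when each body starts rather than merely awaiting already-started promises.

```javascript
// Hypothetical Scheduler.limit(n): returns a strategy function that pulls
// loop-body thunks from an iterable at its own pace, keeping at most n
// bodies in flight.
const Scheduler = {
  limit(n) {
    return async function run(bodyThunks) {
      const it = bodyThunks[Symbol.iterator]();
      async function worker() {
        // The shared iterator is advanced synchronously, so workers
        // never receive the same thunk.
        for (let r = it.next(); !r.done; r = it.next()) {
          await r.value(); // start this loop body and wait for it
        }
      }
      await Promise.all(Array.from({ length: n }, worker));
    };
  },
};

// `for (item of items : Scheduler.limit(2)) { ... }` could then desugar to
// wrapping each body in a thunk and handing them to the strategy:
const items = [1, 2, 3, 4];
const order = [];
await Scheduler.limit(2)(items.map((item) => async () => {
  order.push(item);
}));
console.log(order); // all four bodies ran, at most two at a time
```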
Specifying concurrency for "for...of" loops potentially containing "await" statements in the loop body
In the async/await era, I see most developers using the async and await keywords in 90% of situations, shifting to "Promise.all" or the Bluebird library only to cope with concurrency issues.
The most common case in my experience is the need to admit a manageable level of parallelism when iterating a large array (or iterator; see the final section) and performing asynchronous work on each item. Unlimited concurrency (Promise.all) tends to overwhelm the back ends involved in a way that confuses developers as to what is happening. Bluebird's "Promise.map" permits concurrency to be specified, which is great, but requires pulling in a library and switching mental gears ("OK right, these async functions return promises," etc.).
To build on the friendliness of async/await, I propose these two syntaxes be accepted:
SYNTAX ONE
for (item of items concurrent) {
  // db.insert is an async function
  await db.insert(item);
}
In Syntax One, all loop bodies commence concurrently (see below for the definition of "concurrently" with regard to async). If an exception is not caught inside the loop, it is thrown beyond the loop, and all exceptions subsequently thrown by concurrently executing loop bodies are discarded (like Promise.all).
While I feel that unlimited concurrency is usually a mistake in a situation where you have an array of items of unpredictable number, it seems odd not to have a syntax for this case, and "concurrency 0" seems clunky.
SYNTAX TWO
for (item of items concurrency 5) {
  // db.insert is an async function
  await db.insert(item);
}
In Syntax Two, up to 5 loop bodies commence concurrently (see below). There is no guarantee that item 3 will finish before item 2, or that item 4 won't start (due to 3 being finished) before item 2 ends, etc. If an exception is not caught inside the loop, it is thrown beyond the loop, and all exceptions subsequently thrown by concurrently executing loop bodies are discarded (like Promise.all in this respect, except for the restriction of concurrency).
DEFINING CONCURRENCY FOR ASYNC
For purposes of this proposal, "concurrent" execution means that multiple loop bodies may be suspended via "await" at any given time. It does NOT refer to multithreaded execution, worker threads, etc.
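This definition can be demonstrated directly: concurrency here is cooperative, so code between awaits never interleaves. A tiny sketch (names invented) with two "loop bodies" running concurrently:

```javascript
// Cooperative concurrency: each synchronous segment between awaits runs
// atomically; control only transfers at an await.
const log = [];
async function bodyA() {
  log.push('A1');
  await Promise.resolve(); // only here can other code run
  log.push('A2');
}
async function bodyB() {
  log.push('B1');
  await Promise.resolve();
  log.push('B2');
}
await Promise.all([bodyA(), bodyB()]);
console.log(log); // [ 'A1', 'B1', 'A2', 'B2' ]
```

The segments interleave only at the awaits, which is why the side effects in the earlier progress-counter example remain predictable.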
CONSIDERATIONS FOR ASYNC ITERATORS
Async iterator syntax for "for...of" loops, as in:
for await (item of itemsCursor) { ... }
should also support concurrency for the loop body, with the same syntax:
for await (item of itemsCursor concurrency 5) { ... }
*It is important to note that this syntax does not add concurrency to the async iterator itself*, at least not at this time, as I believe the interface for defining async iterators does not currently accommodate this. However this syntax is still useful because it fetches the items sequentially from the iterator, but may "fill the hopper" with up to five iterator results that are currently being actively processed by loop bodies. In many cases, fetching items via an iterator is much faster than the processing that will be done to them in the loop bodies, and so this is still useful.
Thanks for reading!