Dataflow concurrency and promises
Just FYI, there has also been significant discussion on CommonJS regarding the use of promises [1] and concurrency [2] in JavaScript. CommonJS is focused on standardizing library APIs for JavaScript, in particular on the server. Obviously, concurrency needs on the server are extensive, and without the traditional browser platform it has been important to define concurrency expectations. The general consensus has been around maintaining the worker API/model of the browser and employing promises for the majority of asynchronous functions (the current API proposal/outline, albeit perhaps in need of an update, is at [3]). I don't know that our discussions will provide any significant revelations in these areas; it is primarily rehashing well-covered ground. But, obviously, CommonJS would want to stay in sync with ECMAScript plans.

[1] groups.google.com/group/commonjs/browse_thread/thread/fa20fc6a3649b3a
[2] groups.google.com/group/commonjs/browse_thread/thread/e93f73ef97e88439/50fc4d51eb0a9ee5
[3] wiki.commonjs.org/wiki/Promises

Kris
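P.S. For concreteness, here is a minimal sketch of the then()-style promise consumption being discussed; the standard Promise constructor stands in for whatever library implementation the proposal at [3] settles on, and readFileAsync is a made-up example API, not part of any proposal:

    // Purely illustrative: an asynchronous, promise-returning function and a
    // caller that registers success and error callbacks with then(). The
    // readFileAsync name is invented for this example.
    function readFileAsync(path) {
      return new Promise(function (resolve) {
        // Stand-in for an asynchronous I/O call that resolves in a later turn.
        setTimeout(function () { resolve("contents of " + path); }, 0);
      });
    }

    readFileAsync("config.json").then(
      function (text) { console.log("read:", text); },    // success callback
      function (err)  { console.error("failed:", err); }  // error callback (fires only on rejection)
    );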
David-Sarah Hopwood wrote:
Brendan Eich wrote:
Beyond this, concurrency via workers is great for certain use-cases but not enough for others.
In TC39 we are talking about formalizing the run-to-completion execution model of JS, along with asynchronous message passing concurrency. In particular, we're looking at Promises (precedent from E) and Futures (differently, in MultiLisp and Alice-ML). At least one contributor on es-discuss has advocated lower-level components such as dataflow variables.
That was presumably me: mail.mozilla.org/pipermail/es5-discuss/2009-May/002557.html. However, I don't agree that dataflow variables are "lower level".
It's too early to predict what we'll do but I hear strong consensus in favor of asynchronous messaging and shared-nothing, with higher-level abstractions such as Promises favored over lower-level concurrent programming features such as dataflow variables.
"Dataflow variable", "promise", and "future" are different kinds of delayed reference; they are at about the same level of abstraction, and are very similar to each other. The fact that different terminology is used for them that obscures the similarities is something of a historical accident.
See en.wikipedia.org/wiki/Futures_and_promises. (Full disclosure: I wrote quite a bit of this article.)
There are, however, some differences. Attempting to synchronously use an unbound dataflow variable will block until it is bound. Attempting to synchronously use an unbound (unresolved) promise will, at least in E, throw an exception. Note that in both cases, you can use a 'when' construct to wait for the delayed reference to become bound/resolved, and then the observable semantics are the same.
Because the dataflow model (as supported in Oz) does not treat synchronously accessing an unbound delayed reference as an error, it is more general: it allows either a "dataflow style" of programming that depends on blocking, or an "event-loop style" that avoids blocking. An event-loop model (as supported in E) enforces the event-loop style.
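To make the comparison concrete, here is a minimal illustrative JavaScript sketch of a single-assignment cell (the names are mine, not from any proposal). It supports the 'when' construct that both models share, plus a synchronous read with E-like semantics; in the Oz dataflow model the same read would instead block until the cell is bound, which cannot be expressed in single-threaded JavaScript:

    // An illustrative single-assignment cell (names are not from any proposal).
    // 'when' behaves the same under both models; 'read' shows the point of
    // divergence: E-style semantics throw if the cell is unbound, whereas
    // Oz-style semantics would block the thread here instead.
    function makeCell() {
      var bound = false, value, waiters = [];
      return {
        bind: function (v) {
          if (bound) throw new Error("already bound");    // single assignment
          bound = true;
          value = v;
          waiters.forEach(function (cb) {
            setTimeout(function () { cb(v); }, 0);         // deliver in a later turn
          });
          waiters = [];
        },
        read: function () {                                // synchronous use
          if (!bound) throw new Error("not yet bound");    // E: an error; Oz: would block
          return value;
        },
        when: function (cb) {                              // same observable semantics in both models
          if (bound) setTimeout(function () { cb(value); }, 0);
          else waiters.push(cb);
        }
      };
    }

    // Usage: 'when' waits for the binding; calling read() before bind() would throw.
    var x = makeCell();
    x.when(function (v) { console.log("x became", v); });
    x.bind(42);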
I am not arguing that a more general model is necessarily better. A less general model can in some cases be preferable because it can be an advantage for all code to be written in the same style.
OTOH, the dataflow model does provide more flexibility: a programmer can choose to use a pure dataflow style, or a pure event-loop style, or to combine them. Combining them is not entirely without difficulty, but that difficulty is restricted to liveness issues rather than safety issues (that is, the combined programming style is still highly resistant to low-level race conditions, although more care is needed to avoid deadlocks). If a programmer chooses to use the combined style, for programs that need it, there is IMHO a significant payoff in expressiveness.
(This is the model that ideally I would like to be able to program in all the time. Of course you don't have to pay attention to my personal preference, but there is plenty of support for it among Oz programmers, and in the book "Concepts, Techniques, and Models of Computer Programming".)
The dataflow model potentially allows easier interoperation with APIs that are not based on event loops.
It also seems that there is some confusion about "asynchronous messaging and shared-nothing". First, I agree completely that we want to support asynchronous message passing in JavaScript. In fact that combines very naturally with dataflow variables, just as well as it does with promises.
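As an illustration (a sketch only, not a proposed API), an asynchronous send can immediately return a delayed reference for the eventual reply, and that unresolved reference can be passed along or chained before the reply arrives:

    // Illustrative only: sendAsync delivers a message in a later event-loop
    // turn and immediately returns a promise for the result, much as a send
    // to another vat or worker would. The unresolved promise can be chained
    // or handed to other code before the reply exists.
    function sendAsync(target, method, args) {
      return new Promise(function (resolve) {
        setTimeout(function () {
          resolve(target[method].apply(target, args));
        }, 0);
      });
    }

    var calculator = { add: function (a, b) { return a + b; } };

    // The delayed result of one send is fed into another before either resolves.
    sendAsync(calculator, "add", [1, 2])
      .then(function (sum) { return sendAsync(calculator, "add", [sum, 10]); })
      .then(function (total) { console.log("total:", total); });  // logs "total: 13"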
"Shared-nothing" is a term used in the Erlang community to contrast message-passing models with shared-memory models. The term is inaccurate; it is not the case that processes "share nothing" in Erlang or in other message-passing languages. The restriction that this term is intended to refer to is that processes do not have synchronous shared access to the same mutable state.
The following forms of sharing between processes are consistent with maintaining the advantages of message-passing models relative to shared-memory models:
- processes can share references to other processes.
In the case of a vat-based model (where vats are the units of concurrency and each object belongs to a vat), an object can have a reference to an object in another vat, but it can only use the reference by asynchronous message passing. References within the same vat can also be used via synchronous message passing.
The model I would suggest for JavaScript would be like E in this respect: it provides significant and useful additional expressiveness. The partial isolation between vats, which allows vats to fail or be destroyed independently, is also very useful in a web context. For example, the default behaviour (when no additional vats are created explicitly) could be that a vat is created for each JavaScript context.
- processes can have shared access to a single copy of a deeply immutable structure. This is equivalent to copying the structure, except for the important issue of reduced memory usage.
- processes can have shared access to declarative structures -- that is, structures that can be extended but not mutated. This is in practice relatively easy to reason about, and does not introduce the same programming difficulties as a shared-memory model.
(It does introduce a limited form of nondeterminism: if two processes attempt to make a conflicting extension, the program will fail. This is a programming error. Programs without such errors behave deterministically, and programs with such errors deterministically fail, but the side-effects that occur before they fail may be nondeterministic.) A minimal sketch of such an extend-only structure follows below.
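Here is that sketch (the names are mine, not from any library or proposal); a conflicting extension fails deterministically, as described above:

    // An illustrative extend-only ("declarative") map: entries may be added
    // but never changed or removed. A conflicting extension -- binding an
    // existing key to a different value -- fails deterministically.
    function makeExtendOnlyMap() {
      var entries = Object.create(null);
      return {
        extend: function (key, value) {
          if (key in entries) {
            if (entries[key] !== value) {
              throw new Error("conflicting extension for key: " + key);
            }
            return;                      // re-extending with the same value is harmless
          }
          entries[key] = value;
        },
        get: function (key) { return entries[key]; }
      };
    }

    // Usage: two parties extending the same shared structure.
    var shared = makeExtendOnlyMap();
    shared.extend("answer", 42);
    shared.extend("answer", 42);         // ok: same binding
    // shared.extend("answer", 43);      // would throw: conflicting extension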