Function#fork
FWIW, my own primary concern is that most (all?) JS applications are driven by an external event loop, and I suspect that many of the native API wrappers assume that the JS state is immutable while they do their (synchronous) work. This would break that expectation, assuming there's any shared state.
Besides web workers, there are two strawman proposals that address adding parallelism and concurrency to JavaScript: strawman:data_parallelism and strawman:concurrency.
The Parallel JavaScript (River Trail) proposal has a prototype implementation available at rivertrail/rivertrail/wiki. You should be able to implement your example's functionality using this API.
The latest HotPar (www.usenix.org/conference/hotpar12/tech-schedule/workshop-program) had two interesting papers:
Parallel Programming for the Web (www.usenix.org/conference/hotpar12/parallel-programming-web)
and Parallel Closures: A New Twist on an Old Idea (www.usenix.org/conference/hotpar12/parallel-closures-new-twist-old-idea)
These projects each address some important part of the general problem of adding parallelism and concurrency to JavaScript.
Feedback is always appreciated.
-
Rick
From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Jussi Kalliokoski
Sent: Monday, September 24, 2012 8:44 AM
To: es-discuss
Subject: Function#fork
Hello everyone,
I've been thinking a lot about parallel processing in the context of JavaScript, and this is really a hard problem. I'm very curious to hear everyone's opinions about its problems and so forth, but I don't think such an open question will give very interesting results, so I have an example problem for discussion. (While it seems like a bad idea to me, and unlikely to ever make it into the language, what I want to know is everyone's reasoning behind their opinions, whether for or against.)
What if we introduced Function#fork(), which would call the function in another thread that shares state with the current one? (How much state it shares is an open question I'd like to hear ideas about, but one possibility is that only the function arguments are shared.) It would use a signature similar to Function#call, except that the first argument would be a callback, which would receive an error as its first argument (so a forked function throwing with the given arguments can be handled) and the return value of the forked function as its second argument.
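A rough sketch of that callback contract, using a setTimeout fill-in instead of an actual thread (as in the linked example later in this message). Function.prototype.fork is hypothetical, not a real API; nothing here actually runs in parallel.

```javascript
// Hypothetical fill-in: same signature as proposed, but single-threaded.
Function.prototype.fork = function (callback) {
  var fn = this;
  var args = Array.prototype.slice.call(arguments, 1);
  setTimeout(function () {
    try {
      // success: error slot is null, return value is the second argument
      callback(null, fn.apply(null, args));
    } catch (err) {
      // failure: the thrown value arrives as the first argument
      callback(err);
    }
  }, 0);
};

function square(x) { return x * x; }

square.fork(function (err, result) {
  console.log(err, result); // null 9
}, 3);
```

The Node-style (error, result) callback ordering means a throw inside the forked function surfaces in the caller's thread instead of being lost.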
- What are the technical limitations of this?
- What are the bad/good implications of this on the language users?
- Better ideas?
- etc.
I have a detailed example showing Function#fork in action [1] (I was supposed to make a simplified test, but got a bit carried away and made it do "parallel" fragment shading); it uses a simple fill-in for Function#fork that uses setTimeout instead of an actual thread.
Cheers, Jussi
Thanks for the links, very interesting! I was already aware of River Trail and other concurrency proposals for JavaScript; my purpose for this thread was to get a clear picture of which approaches are impossible and why, and which approaches are possible and what their virtues and downsides are. So thanks again, those two papers are more than I hoped for! But I hope that there will be more discussion about this.
On 24/09/2012 14:43, Jussi Kalliokoski wrote:
[...]
The concurrency strawman [1] defines a concurrency as well as a parallelism model. So far, it has been put forward as the favored model for general-purpose parallelism. Different use cases are efficiently solved by different forms of parallelism; for instance, there is another strawman on data parallelism [2] for the case of applying the same computation to a large amount of data.
What if we introduce Function#fork(), which would call the function in another thread that shares state with the current one.
Shared state (no matter how much) always has the same story. Two computation units want to access the shared state concurrently, but for the sake of the shared state's integrity, they can't access it simultaneously. So we need to define a form of mutex (for "MUTual EXclusion") for a computation unit to express the intention to use state that only one computation unit may use at a time. With mutexes as we know them, used at scale, you end up with deadlocks, which are nasty bugs to track down and debug. This is all a consequence of the idea of shared state.
In this story, two parts can be attacked to fix the problem: either define something better than the mutexes we know (I have no idea what that would look like, but it's an interesting idea) or get rid of shared state. The current concurrency strawman does the latter.
One annoying thing about naive no-shared-state systems as we know them is that everything has to be copied from one computation unit to another. That's not exactly true, though. It's always possible to implement a copy-on-write mechanism. Another idea is to define ownership over data. HTML5 defines "transferable" objects [3] which can be passed back and forth from worker to worker but can only be used in one worker at a time. Rust has a concept of "unique pointer" which is the same idea. Yet another idea would be to have data structures which live in 2 or more computation units, showing just an interface to each, and whose integrity would be taken care of under the hood by the "VM" and not client code. This is what local storage does, for instance.
I will fight very hard against the idea of shared state, because it offers very few benefits against everything it costs in large-scale programs.
David
[1] strawman:concurrency
[2] strawman:data_parallelism
[3] updates.html5rocks.com/2011/12/Transferable
On Mon, Sep 24, 2012 at 10:12 AM, David Bruant <bruant.d at gmail.com> wrote:
[...]
I will fight very hard against the idea of shared state, because there are very few benefits against all what it costs in large-scale programs.
I'd also be concerned about something like this...
/* main process */
global.foo = { value: 1 };

function computer( value ) {
  /* fork process */
  global.foo.value = value;
}

computer.fork( 20 );

// nothing is stopping me from calling like this either:
computer( 10 );
If the function that's called with .fork() tries to use references that only exist in the main process, should it throw? This works in your example because the fork() target exists in the same global scope, but I can't see how it would work for an actual new process. The solution to that problem immediately runs into the issues that David mentioned above.
On Mon, Sep 24, 2012 at 10:12 AM, David Bruant <bruant.d at gmail.com> wrote:
[...]
I will fight very hard against the idea of shared state, because there are very few benefits against all what it costs in large-scale programs.
s/shared state/shared mutable state/
You left out perhaps the most interesting option: persistent immutable data structures. Some implementations already exist in user land but in order to really get the concurrency win I suspect we'd probably need vm support.
Interestingly, IIRC all modern js vms already bake in a form of persistent data structure with ropes -- it's not too much of a stretch to imagine similarly immutable collection APIs. (And just to be clear: immutable here does not mean an expensive CoW to "modify" -- with structural sharing these operations can be far cheaper.)
Very nice insight, thanks. I agree with you on the shared state, transferable ownership is a much more tempting option. Especially since even immutable shared state is hard to achieve in JS, given we have getters, setters and proxies, etc.
On 24/09/2012 16:39, Dean Landolt wrote:
s/shared state/shared mutable state/
True, I took it for granted, since objects are by default (very) mutable in JavaScript.
On Mon, Sep 24, 2012 at 10:44 AM, David Bruant <bruant.d at gmail.com> wrote:
On 24/09/2012 16:39, Dean Landolt wrote:
s/shared state/shared mutable state/
True, I took it for granted, since objects are by default (very) mutable in JavaScript.
Not necessarily, and they certainly don't have to be. To reiterate: immutable doesn't mean they have no update operations, just that these operations don't update-in-place. It's the difference between Array.prototype.concat and Array.prototype.push. With vm support this could be done cheaply. Code would be easier to reason about. There are some other subtle wins like being able to keep aggregate calculations in the tree branches (like how CouchDB stores its reduce calcs, but more generalizable, see [1]). Oh, and you'd be able to pass these objects freely between worker boundaries.
There's plenty of room in the language for persistent immutable collections.
Let me put bounds on this, then:
Approaches that enable shared mutable state are non-starters. A "send"-based approach might work (e.g., Worker Transferables), as might automatic parallelization (e.g., RiverTrail) -- but threads and thread-like semantics aren't gonna happen. Turn-based execution with an event loop is how JS works, and anything that changes that apparent semantic won't fly.
As I suspected. Glad to hear my assumptions were correct. :) I think this is a good thing actually, we'll have a good "excuse" not to have shared state in the language (fp yay).
For the record, I've updated my initial JS fragment shading experiment using Workers (on which my previous example was based) to make use of Transferables [1] [2]. If you compare the results on Chrome and Firefox, the benefit of Transferables is quite impressive.
There seems to be a small downside to Transferables though, as I couldn't figure out a way to send parts of an ArrayBuffer using them.
Cheers, Jussi
[1] labs.avd.io/parallel-shading/test.html
[2] gist.github.com/2689799
On Mon, Sep 24, 2012 at 9:17 AM, Jussi Kalliokoski < jussi.kalliokoski at gmail.com> wrote:
[...]
Made laptop too hot for lap ;).
On Mon, Sep 24, 2012 at 9:19 PM, Mark S. Miller <erights at google.com> wrote:
[...]
Made laptop too hot for lap ;).
Heheh, luckily there are better technologies available for this exact use case. :D
A blog post from 2007 on this topic, which you might enjoy:
[...]
[1] gist.github.com/3775697