Dataflow concurrency instead of generators

# David-Sarah Hopwood (16 years ago)

Brendan Eich wrote:

On May 14, 2009, at 4:34 PM, David-Sarah Hopwood wrote:

This approach avoids any problems due to a generator being able to interfere with the control flow of its callers.

A generator can't interfere with the control flow of its callers.

Can you give an example of what you meant by that?

I meant this:

Brendan Eich wrote:

Jason Orendorff wrote:

In ES5, when you call a function, you can expect it to return or throw eventually. (Unless you run out of memory, or time, and the whole script gets terminated.) With shallow generators, this is still true. A 'yield' might never return control, but function calls are ok. But with generators+lambdas, almost any function call anywhere in the program might never return or throw. This weakens 'finally', at least.

To make this clear with an example (thanks to Jason for some IRC interaction):

function gen(arg) { foo((lambda (x) yield x), arg); } function foo(callback, arg) { try { callback(arg); } finally { alert("I'm ok!"); } } g = gen(42); print(g.next()); // tell the user the meaning of life, etc. g = null; gc();

I think finally is the only issue, since how else can you tell that foo didn't see a return or exception from the callback?

The consequences of this issue are not restricted to code using 'finally'; even without finally, yield+generators complicates the conceptual model of call-return control flow, in ways that are not possible with yield alone (since yield is restricted to only making non-local jumps to lexically enclosing labels) or shallow generators alone (since they can be modelled as a local transformation).

The recent huge thread on python-ideas about "yield-from" disabused me once and for all of the idea that (non-shallow) generator semantics are simple to understand, even for experts.

# David-Sarah Hopwood (16 years ago)

[I sent this to es5-discuss, when I intended es-discuss. Sorry for the noise for people subscribed to both.]

# Brendan Eich (16 years ago)

On May 14, 2009, at 5:29 PM, David-Sarah Hopwood wrote:

Brendan Eich wrote:

On May 14, 2009, at 4:34 PM, David-Sarah Hopwood wrote:

This approach avoids any problems due to a generator being able to interfere with the control flow of its callers.

A generator can't interfere with the control flow of its callers.

Can you give an example of what you meant by that?

I meant this:

Brendan Eich wrote:

Jason Orendorff wrote:

In ES5, when you call a function, you can expect it to return or
throw eventually. (Unless you run out of memory, or time, and the whole script gets terminated.) With shallow generators, this is still
true. A 'yield' might never return control, but function calls are ok.
But with generators+lambdas, almost any function call anywhere in the program might never return or throw. This weakens 'finally', at least.

To make this clear with an example (thanks to Jason for some IRC
interaction):

function gen(arg) { foo((lambda (x) yield x), arg);

Yeah, you meant generators plus lambdas -- not just generators.

Leave out lambda and there's no effect on control flow in callers of
the thing (generator function, not lambda) that yields.

# David-Sarah Hopwood (16 years ago)

John Cowan wrote:

David-Sarah Hopwood scripsit:

Then the functionality of a generator can be implemented using a process/thread that extends a list or queue constructed from dataflow variables.

Quite so. How, if at all, do these dataflow variables differ from Prolog variables?

Prolog itself is a sequential language (although there have been many concurrent extensions of it). Prolog supports logic variables, which are a generalisation of single-assignment variables that use a unification algorithm for update. Dataflow variables are generalised from single-assignment variables in a different direction, in order to support sychronization between concurrent threads or lightweight processes.

(view in fixed-width font)

single-assignment + unification logic variables ---------------> variables | | | + concurrency | + concurrency | | v + unification v dataflow variables ---------------> concurrent logic variables

Historically (and at the risk of oversimplifying), the development was from logic programming languages such as Prolog, to concurrent logic languages such as Flat Concurrent Prolog, and later to concurrent constraint languages such as AKL and Oz. Dataflow variables are a simplified version of concurrent logic variables that do not support update by full unification, but do support being bound more than once to identical values.

As a less complex option, lambdas + Lua-style semicoroutines.

Why do you say that the option I've suggested is complex? Anecdotal evidence from programmers with experience of Oz and other languages supporting dataflow concurrency suggests that this programming model is typically found to be quite simple to use. Certainly it is not complicated to specify or implement.

These are first-class and deep; you can yield at any point, not just lexically within the coroutine, but don't support resuming arbitrary coroutines, only the caller (but it's easy to write a general coroutine dispatcher). Lua also provides multiple value returns for both coroutines and functions, but currently has no support for native threads.

I don't agree that coroutines are less complex than dataflow concurrency. In fact, coroutines present almost exactly the same potential complexities in the programming model as threading with shared memory (which is significantly more complex than dataflow concurrency), but without the performance advantage of being able to support actual parallel execution.

# Jason Orendorff (16 years ago)

On Thu, May 14, 2009 at 7:35 PM, David-Sarah Hopwood <david-sarah at jacaranda.org> wrote:

For instance, suppose that you have dataflow concurrency, as supported by Oz and by recent dataflow extensions for Ruby, Python, and Scala:

www.mozart-oz.org/documentation/tutorial/node8.html, larrytheliquid/dataflow/tree/master, pypi.python.org/pypi/dataflow/0.1.0, jboner/scala-dataflow/tree/master

Then the functionality of a generator can be implemented using a process/thread that extends a list or queue constructed from dataflow variables.

I think you are proposing a model in which threads cannot share mutable state and cannot have side effects which other threads can see.

Even the simplest generator use cases can't be implemented using threads and dataflow variables in that kind of model. To take a silly example:

function iter(arraylike) { for (var i = 0; i < arraylike.length; i++) yield arraylike[i]; }

for (elt in iter(document.getElementsByTagName("A"))) ...

DOM NodeLists are mutable and so are the nodes being yielded.

I like the idea of going after the concurrency problem now. I like the idea of doing one language feature instead of two (concurrency and generators). Do you have a solution in mind for this kind of problem?

# Brendan Eich (16 years ago)

On May 15, 2009, at 11:15 AM, Jason Orendorff wrote:

I like the idea of going after the concurrency problem now. I like the idea of doing one language feature instead of two (concurrency and generators). Do you have a solution in mind for this kind of problem?

Not to rain on anyone's parade here, but Ecma TC39 is not the right
group to do research and then standardize it.

We can spend unlimited es-discuss messages on this, of couse, but (a)
at a cost to other discussions; (b) not if the undelivered, always- over-the-next-hill promise of a unified solution is used to stop
progress of extensions toward standardization in TC39. One can easily
wait forever for the best -- Voltaire had something to say about that.

Implementors should experiment and take risks in the market. TC39
should standardize after that all shakes out. If we can have
agreements in committee on direction and even details, great. But the
committee does not have the breadth or competence to invent much
that's new while standardizing.

Meanwhile, until some two-birds-one-stone solution emerges (which
could happen, I'm skeptical about it happening soon), we have
generators shipping in SpiderMonkey and Rhino, and they help
programmers avoid having to hand-code tedious and error-prone state
management (control blocks for callbacks) when solving common
iteration and pseudo-threading problems.

But perhaps generators could be informative for a while. The reason
they came up was, as Mark said, because they were an extension already
considered Harmonious in TC39.

# Graydon Hoare (16 years ago)

On 15/05/09 11:15 AM, Jason Orendorff wrote:

Even the simplest generator use cases can't be implemented using threads and dataflow variables in that kind of model. To take a silly example:

function iter(arraylike) { for (var i = 0; i< arraylike.length; i++) yield arraylike[i]; }

for (elt in iter(document.getElementsByTagName("A"))) ...

DOM NodeLists are mutable and so are the nodes being yielded.

Oh, this will work in the supposed "dataflow" system. It works in the libraries David-Sarah pointed to. It'll just be completely unsafe in the bolt-on-the-language cases (i.e. "not Oz"). Because in those cases, the "dataflow" system is being used as little more than a voluntary half-duplex channel scheme. Key point being voluntary. Threads can still interfere via the shared graph, bypassing the channels. Which everyone will.

Unfortunately ES, ruby, python, etc. are all cyclic-memory, universal-shared-ownership OO languages, so you are transmitting nodes in your shared graph between threads. Even if you lock each thread in a box, you can't reasonably make a full copy of the world on each transmission, and confinement isn't designed-in to the ownership structure, so you're forced to do something cheesy like "attempt deep copy and throw when someone tries to transmit any cyclic bits" as a heuristic for preventing transmission of something that might be shared-ownership.

This is a design problem in the language. You want to be writing in newsqueak or erlang or something properly designed for channels (CoW acyclic ownership, or immutability, take your pick). ES isn't. This is why workers[1] live in isolated universes and communicate via JSON (or possibly this has been changed to an acyclic "structured clone"[2]). Anyway, both approaches do the "deep copy and throw on cyclic" thing. It's a kludge, but the best you can do here. The language wasn't designed to blur concurrency and iteration. If you try, you'll just get a mess (or systematic lack of safety).

Generators are comparatively much lighter (eg. no OS thread) and can be used to structure walks through existing object graphs, by a single thread, without being locked in a box or marshalling yield values. They're just an iteration-modularity device.

-Graydon

[1] www.whatwg.org/specs/web-workers/current-work [2] www.w3.org/TR/html5/infrastructure.html#structured

# David-Sarah Hopwood (16 years ago)

Jason Orendorff wrote:

On Thu, May 14, 2009 at 7:35 PM, David-Sarah Hopwood <david-sarah at jacaranda.org> wrote:

For instance, suppose that you have dataflow concurrency, as supported by Oz and by recent dataflow extensions for Ruby, Python, and Scala:

www.mozart-oz.org/documentation/tutorial/node8.html, larrytheliquid/dataflow/tree/master, pypi.python.org/pypi/dataflow/0.1.0, jboner/scala-dataflow/tree/master

Then the functionality of a generator can be implemented using a process/thread that extends a list or queue constructed from dataflow variables.

I think you are proposing a model in which threads cannot share mutable state and cannot have side effects which other threads can see.

No. Even in a pure dataflow model, binding a dataflow variable causes a side effect that other threads can see, if they have access to that variable.

A pure dataflow model is sufficient to simulate generators, but as you point out, it doesn't suport some of the things you might want to do from a generator (or from other code). What I would like to propose for Harmony is a dataflow + asynchronous message passing model.

Even the simplest generator use cases can't be implemented using threads and dataflow variables in that kind of model.

The use case you've identified requires message passing in addition to dataflow variables, and it ideally requires the message passing to have a particular semantics that maximises compatibility with existing JavaScript. Actually this is needed to support the call to a DOM object in your example, rather than to support generators. But you are right that dataflow concurrency is probably not sufficient on its own if we want to make Harmony a practical concurrent language.

Bear in mind, though, that adding message passing significantly increases the expressiveness of the language, so we should take that into account when comparing the complexity of (lambdas + generators) with that of (lambdas + dataflow concurrency + message passing). The latter is much more powerful.

I'll describe the details of what I'm proposing, and how it applies to your example (which does work in this model), in a separate message, with subject "Dataflow concurrency and message passing for Harmony".

# Brendan Eich (16 years ago)

On May 16, 2009, at 5:50 PM, David-Sarah Hopwood wrote:

But you are right that dataflow concurrency is probably not sufficient on its own if we want to make Harmony a practical concurrent language.

That is not a goal at this point, and TC39 wouldn't hold the "H"- release for it.

Bear in mind, though, that adding message passing significantly increases the expressiveness of the language, so we should take that into account when comparing the complexity of (lambdas + generators) with that of (lambdas + dataflow concurrency + message passing). The latter is much more powerful.

Who says lambdas are "in"? Who says power is the good to maximize?
We've talked about avoiding things like lambdas with completion result
and TCP hazards, and full call/cc, to avoid inappropriate programmer
and implementor burdens and honey traps.

Generators do not require concurrency in the implementation or the
language, and they're easy to implement in current engines.

Harmony is not a research language. Anything like what you describe
would have to be prototyped by one or more of the major implementors.
We'll see. Some grains of salt are prescribed along with enthusiasm in
the mean time.

I'll describe the details of what I'm proposing, and how it applies to your example (which does work in this model), in a separate message, with subject "Dataflow concurrency and message passing for Harmony".

Looking forward to it.

# David-Sarah Hopwood (16 years ago)

Brendan Eich wrote:

On May 16, 2009, at 5:50 PM, David-Sarah Hopwood wrote:

Bear in mind, though, that adding message passing significantly increases the expressiveness of the language, so we should take that into account when comparing the complexity of (lambdas + generators) with that of (lambdas + dataflow concurrency + message passing). The latter is much more powerful.

Who says lambdas are "in"?

The above argument applies equally to a comparison between just (generators) and (dataflow concurrency + message passing).

(But I would be very disappointed if Harmony doesn't end up with lambdas :-)

Who says power is the good to maximize?

Expressive power isn't good to maximize if it is at the expense of safety or the ability to reason about programs (preferably using local reasoning whereever possible). However, if it is feasible to increase expressive power without compromising those properties -- as I believe it is in this case -- then yes, I think that's definitely a good thing.

(Implementation complexity and performance are also important, but personally I consider them to be lower priorities than safety and expressiveness -- especially since safety often has a lower performance cost than many people believe.)

Remember, I started this thread by arguing against (lambdas + generators) precisely because that combination caused a safety problem. So we have no disagreement on the principle that unsafe combinations of features should be avoided even if they would be highly expressive. We disagree on whether the identified problem is best avoided by ditching lambdas (and replacing the lost functionality with what?) or by ditching generators (and exceeding their functionality using dataflow concurrency and message passing, neither of which interact badly with lambdas).

Note that the fact that a language supports highly expressive features doesn't mean that any particular program must or should use those features. A program that is restricted to only using features that are necessary to solve the problem that it is supposed to be solving, will probably be easier to understand, debug, and maintain. This is essentially the "Principle of Least Expressiveness" discussed in the book "Concepts, Techniques and Models of Computer Programming". However, programming in a highly expressive language that supports multiple programming paradigms (but designed with considerable attention to maintaining safety properties) makes it easier, not harder, to follow this principle. And maybe JavaScript just isn't cut out to be that language, but I'd like to at least have a stab at making it closer.

# Brendan Eich (16 years ago)

On May 16, 2009, at 9:29 PM, David-Sarah Hopwood wrote:

(Implementation complexity and performance are also important, but personally I consider them to be lower priorities than safety and expressiveness -- especially since safety often has a lower
performance cost than many people believe.)

I'm still looking for evidence other than your certainty re: the last
dispute about semantic cleanliness (by your lights) vs. performance:

esdiscuss/2009-February/008862

When we see SFX (Nitro, heh), V8, and TraceMonkey on board then I will
be convinced. Until then hand-waves and generalizations don't really
cut it.

Remember, I started this thread by arguing against (lambdas +
generators) precisely because that combination caused a safety problem.

It's already the case that finallys may not run, whether or not you
add generators and lambdas.

So we have no disagreement on the principle that unsafe combinations of features should be avoided even if they would be highly expressive. We disagree on whether the identified problem is best avoided by ditching lambdas (and replacing the lost functionality with what?)

Waldemar argued for reforming function. If we add rest and default
parameters then there isn't much to do, other than try to force
Tennent into a language it doesn't fit.

Note that the fact that a language supports highly expressive features doesn't mean that any particular program must or should use those features.

Yeah, yeah... There are many bodies, fingers and toes already chalked
up to this kind of thinking applied elsewhere (C++).

Besides the honey traps, implementation barriers should not be raised
too high. One good reason to avoid this is to achieve interoperation
among many competing browser vendors. To pick on C++ again, it took
too long to get semi-interoperable implementations due in part to the
complexity inherent in all that expressiveness. There are other
languages from which to learn the same cautionary tale.

However, programming in a highly expressive language that supports multiple programming paradigms (but designed with considerable attention to maintaining safety properties) makes it easier, not harder, to follow this principle. And maybe JavaScript just isn't cut out to be that language, but I'd like to at least have a stab at making it closer.

On this very general point, we agree.