[[strawman:data_parallelism]] |this| and fat arrows

# Hudson, Rick (13 years ago)

Proposed change to [[strawman:data_parallelism]]

The ParallelArray methods map, combine, reduce, scan, scatter, and filter each take a kernel function as an argument. Within this kernel function |this| is currently bound to the ParallelArray. This was natural and non-controversial as long as we used the function(){..} form, which did not restrict how |this| was bound, and explaining the semantics in terms of call or apply was perfectly reasonable. => is likely to change that. =>, as proposed, enforces a lexical |this| and is semantically different from the function () {..} form. Going forward we expect folks to use the => form more than the function form. For the most part we will be leaving the kernel signatures as they are, except that we will no longer bind |this| to the ParallelArray inside a kernel function. Instead |this| will be bound in accordance with existing JavaScript specifications and [[strawman:data_parallelism]] will no longer refer to |this|. For the combine method, which currently gets only an index as an argument, we will now also pass the ParallelArray in. If the programmer wants the ParallelArray in methods like reduce or map then it will have to be passed in as a free variable.

We considered and decided not to mimic Array's function (element, index, array) { ... } form. We felt that it causes intellectual confusion about parallel programming, since passing in index and array forces the programmer to think about things like order and location when using methods like map. It is even more intellectually confusing when using reduce, since in a parallel world both of the values passed in may be the results of previous kernel invocations and as such not have an index in any reasonable sense. For combine and filter we pass the ParallelArray in as the second argument. For the other methods one can pass the ParallelArray in using a free variable.
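The |this| difference the proposal turns on can be seen without ParallelArray at all. The following is a minimal sketch, not the strawman's implementation: `fakePA` and `runKernel` are hypothetical stand-ins for a library that invokes a kernel via call.

```javascript
// Hypothetical stand-in for a library that binds |this| when invoking a kernel.
const fakePA = { name: 'fakePA' };
function runKernel(kernel, value) {
  return kernel.call(fakePA, value);
}

// function form: |this| is whatever the caller passes to call().
const viaFunction = runKernel(function (v) { return this === fakePA; }, 1);

// => form: |this| comes from the enclosing scope; call() cannot rebind it.
const outer = {
  test() { return runKernel((v) => this === fakePA, 1); }
};
const viaArrow = outer.test();

// Referring to the array as a free variable works with either form.
const viaFreeVariable = runKernel((v) => fakePA.name + ':' + v, 1);
```

Here `viaFunction` is true and `viaArrow` is false, because inside the arrow function |this| is `outer`. The free-variable version does not depend on |this| at all.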

# Mark S. Miller (13 years ago)

Even without =>, I think this is an improvement to the API overall, as it makes it more similar to the corresponding array methods. However, I do not see the problem with making it much more similar. I agree regarding reduce and reduceRight, but fortunately, these already violate the general pattern for the other higher order array methods -- in that they call their callbackfn with "this" always bound to undefined. In retrospect, I wish we had called the existing function "reduceLeft" so yours could be called simply "reduce" and have the name difference suggest the lack of order.

Regarding the others, the general pattern from the higher-order array methods is

array.foo(callbackfn, possible-other-args, optional-this-arg)

calls back

callbackfn(value, index, array, this-arg-or-undefined)

I wish that the optional-this-arg had been omitted from ES5, but for the sake of compat with the Prototype library and other implementations of these methods, I lost that argument. In retrospect I agree with that decision, even though I still believe that the this-arg has net negative value. By the same reasoning, I think you should follow this pattern as well except when there's a good argument not to. For "reduce" you make a good argument.
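For reference, the ES5 pattern described above can be observed directly with the built-in Array methods. This is a small sketch; `seen`, `source`, and `thisArg` are names chosen for illustration. Note that the this-arg reaches the callback as its |this| value rather than as a fourth positional argument.

```javascript
const seen = [];
const source = ['a', 'b'];
const thisArg = { tag: 'ctx' };

// map (like forEach, filter, every, some) calls callbackfn(value, index, array)
// with |this| set to the optional trailing argument.
source.map(function (value, index, array) {
  seen.push([value, index, array === source, this === thisArg]);
  return value;
}, thisArg);

// reduce takes no this-arg; a strict-mode callback sees |this| as undefined.
let reduceThis = 'unset';
source.reduce(function (acc, value) {
  'use strict';
  reduceThis = this;
  return acc + value;
});
```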

Further comments inline below.

On Fri, Jun 15, 2012 at 4:27 AM, Hudson, Rick <rick.hudson at intel.com> wrote:

Proposed change to [[strawman:data_parallelism]]

The ParallelArray methods map, combine, reduce, scan, scatter, and filter, each take a kernel function as an argument. Within this kernel function |this| is currently bound to the ParallelArray. This was natural and non-controversial

I disagree that it was non-controversial -- I argued against it on compat grounds at the time. But I agree with your point that => makes that old design even less viable.

as long as we used the function(){..} form which did not restrict how |this| was bound and explaining the semantics in terms of call or apply was perfectly reasonable. => is likely to change that. =>, as proposed, enforces a lexical |this| and is semantically different from the function () {..} form. Going forward we expect folks to use the => form more than the function form. For the most part we will be leaving the kernel signatures as they are except that we will no longer bind |this| to the ParallelArray inside a kernel function. Instead |this| will be bound in accordance with existing JavaScript specifications and [[strawman:data_parallelism]] will no longer refer to |this|. For the combine method which currently gets only an index as an argument we will now also pass the ParallelArray in. If the programmer wants the ParallelArray in methods like reduce or map then it will have to be passed in as a free variable.

We considered and decided not to mimic Array's function (element, index, array) { ... } form. We felt that it causes intellectual confusion about parallel programming since passing in index and array forces the programmer to think about things like order and location when using methods like map.

This seems to be your key point, and I don't understand it at all. Since you single out map, could you please explain how this causes any confusion when using map? Thanks.

# Hudson, Rick (13 years ago)

Hey Mark, ParallelArray and index are left out because of our desire to provide a few good methods that help/force programmers to think about parallel algorithms and not just speeding up sequential algorithms. Array map is really just syntactic sugar for for loops and invites thinking that depends on order. For ParallelArray map we felt that the value was the semantically important thing and the user should not be distracted by the index. Not having index available is one step towards thinking in more parallel ways.

In situations where index is more semantically important than the value we provide the method combine. Here we pass the ParallelArray and the index and not the value to the kernel function. A blur function that averages the values surrounding some location would naturally use combine. The documentation encourages the programmer to think of index as the destination of the value the kernel function returns. Again the intent is to try and break away from sequential thinking.

Filter is like combine: it gets the ParallelArray and the index and allows filtering based on location. If one accepts the above map/combine arguments then one could make a reasonable argument that there should be a filter based on value and a filter based on location. (Thanks, you've forced me to think about this again.)

The other methods have less use for index. Reduce: we agree that passing in the index doesn't make sense. Scan: it has the same problems with index that reduce has. Scatter: it takes an array of destination indices and a two-argument conflict function. The indices of both source elements aren't always available, so passing an index doesn't make sense here either.
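To make the scatter case concrete, here is a sequential sketch of the behaviour described. It is an assumption-laden model rather than the strawman's definition: `scatterSeq`, its parameter order, and the default value are all hypothetical.

```javascript
// Hypothetical sequential model of scatter; not the strawman's implementation.
// source[i] is written to result[indices[i]]; when two sources target the same
// slot, conflictFn(existing, incoming) decides which value is kept.
function scatterSeq(source, indices, defaultValue, conflictFn, length) {
  const result = new Array(length).fill(defaultValue);
  const written = new Array(length).fill(false);
  for (let i = 0; i < source.length; i++) {
    const dest = indices[i];
    result[dest] = written[dest] ? conflictFn(result[dest], source[i]) : source[i];
    written[dest] = true;
  }
  return result;
}

// Sources 2 and 3 both target slot 2, so the conflict function combines them.
const scattered = scatterSeq([1, 2, 3, 4], [0, 2, 2, 3], 0, (a, b) => a + b, 5);
// → [1, 0, 5, 4, 0]
```

If three or more sources collide, one argument to the conflict function is itself the result of an earlier conflict resolution, so it has no single source index to report.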

Not passing in the ParallelArray was a tougher decision. We really liked the way we had |this| bound to the ParallelArray since it made composition / nested parallelism straightforward and natural. With the exception of combine and filter, the programmer has to arrange for the ParallelArray to be available in order to do composition.

I think a clean break from the higher-order Array methods is the way to go here; syntactically similar forms with semantically different meanings are worth avoiding.

  • Rick

Guy Steele "Language design is as much the art of what to leave out as what to put in."

# Mark S. Miller (13 years ago)

On Fri, Jun 15, 2012 at 11:35 PM, Hudson, Rick <rick.hudson at intel.com> wrote:

Hey Mark, ParallelArray and index are left out because of our desire to provide a few good methods that help/force programmers to think about parallel algorithms and not just speeding up sequential algorithms. Array map is really just syntactic sugar for for loops and invites thinking that depends on order. For ParallelArray map we felt that the value was the semantically important thing and the user should not be distracted by the index. Not having index available is one step towards thinking in more parallel ways.

Hi Rick, the claim made in the paragraph above seems to be the core argument. I respect the kind of argument you're making -- programmer psychology is important, and it is our responsibility as language designers to take it into account, and to help steer programmers towards certain ways of thinking about the problem and away from others. Sometimes these psychological issues have no corresponding formal basis, but are still important nevertheless. Arguments by non-psychologists like us about psychology can often be fuzzy, but this does not relieve us of responsibility of taking these into account.

However, I don't have any intuition that supports the specific claim. Let's take "map" specifically. How/why might including index and the array itself distract the programmer from parallel thinking? First, do we agree that there's no formal problem, and the issue is only psychology? If so, perhaps you could provide some examples that would help illustrate the psychological issue you have in mind? At this point, I just don't get it.

# Hudson, Rick (13 years ago)

Hey Mark,

You asked "How/why might including index and the array itself distract the programmer from parallel thinking?" If we present index as a location and not as the nth invocation of the kernel function then we are fine. Array.map overloads index to serve both purposes. My concern is that if we are too close to Array then programmers will assume we are the same and that the same sequential semantics will hold. There is nothing inherently non-parallel about a location.
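The distinction between index as a location and index as an invocation count can be sketched with plain arrays. `mapAnyOrder` is a hypothetical stand-in that visits locations in reverse to mimic an unspecified execution order; it is not part of any proposal.

```javascript
// Hypothetical stand-in: applies kernel(element, index, array) to every
// element, but visits locations in reverse to mimic an unspecified order.
function mapAnyOrder(array, kernel) {
  const result = new Array(array.length);
  for (let i = array.length - 1; i >= 0; i--) {
    result[i] = kernel(array[i], i, array);
  }
  return result;
}

const data = ['a', 'b', 'c'];

// Index used as a location: the result does not depend on visiting order.
const byLocation = (element, index) => element + index;

// Invocation count used as if it were the index: only right for in-order calls.
function makeByInvocation() {
  let calls = 0;
  return (element) => element + calls++;
}

const seqLocation = data.map(byLocation);                     // ['a0', 'b1', 'c2']
const anyLocation = mapAnyOrder(data, byLocation);            // ['a0', 'b1', 'c2']
const seqInvocation = data.map(makeByInvocation());           // ['a0', 'b1', 'c2']
const anyInvocation = mapAnyOrder(data, makeByInvocation());  // ['a2', 'b1', 'c0']
```

Only the kernel that counts invocations changes its answer when the visiting order changes.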

You are correct that we are talking more about programmer psychology, art, and pedagogy and there is no formal technical reason to go either way. While passing 3 arguments might be more expensive than passing one in today's implementations there are probably compiler optimizations that can ameliorate the effects.

Let's look closely at the change you are suggesting on a very simple map invocation.

    function add1(element) { return element + 1; }
    strawmanPA.map(add1);
    alternatePA.map(add1);

OK, JavaScript's argument passing semantics mean no difference for the common case where the result is dependent upon just the element. The alternate is even upwardly compatible.

Now let's consider a typical geometric decomposition function like a vector blur.

    function blurStrawman(index, array) {
      if (index < 1) return array[index];
      return (array[index] + array[index - 1]) / 2;
    }

    function blurAlternate(element, index, array) {
      if (index < 1) return element;
      return (element + array[index - 1]) / 2;
    }

    strawmanPA.combine(blurStrawman);
    alternatePA.map(blurAlternate);

OK, blurAlternate again seems reasonable.
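The equivalence of the two blur kernels can be checked on plain arrays. In this sketch `combineSeq` is a hypothetical sequential stand-in for the strawman's combine, and the built-in Array map plays the role of the alternate map since it already passes (element, index, array).

```javascript
// Hypothetical sequential stand-in for the strawman's combine:
// the kernel receives (index, array) and returns the value for that location.
function combineSeq(array, kernel) {
  return array.map((_, index) => kernel(index, array));
}

function blurStrawman(index, array) {
  if (index < 1) return array[index];
  return (array[index] + array[index - 1]) / 2;
}

function blurAlternate(element, index, array) {
  if (index < 1) return element;
  return (element + array[index - 1]) / 2;
}

const samples = [2, 4, 8, 6];
const viaCombine = combineSeq(samples, blurStrawman); // [2, 3, 6, 7]
const viaMap = samples.map(blurAlternate);            // [2, 3, 6, 7]
```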

I've played with other codes and I'm finding it increasingly hard to argue against your suggestions for arrays. Future proofing the cases where we might want to extend ParallelArray to objects seems fine since JavaScript blurs field names and indices allowing for index to have a reasonable meaning in the context of objects.

Unless someone else speaks up I'll drop combine and change map and filter so the kernel function takes element, index, array.

# Brendan Eich (13 years ago)

Hudson, Rick wrote:

Unless someone else speaks up I'll drop combine and change map and filter so the kernel function takes element, index, array.

+1

# Herhut, Stephan A (13 years ago)

Rick.

Apart from offering a cleaner mental model, which can be discussed at great length, I believe there are practical advantages to having an index free model, e.g., code reuse:

A minimal example: adding two vectors. Assume you already have a convenient addition function on scalar elements like

function add (a,b) { return a+b;}

How do we go to vectors from here? The current map allows you to write

vecA.map(add, vecB);

Note here that all extra arguments to map are implicitly indexed in the same way that the source array vecA is and both values are passed to the elemental function. So the above applies add to all pairs of elements from vecA and vecB (the usual JS semantics for [] applies to implicit selections also).

If we add index and array to the mix, we have to build an adapter for the differing interfaces:

vecA.map((a, ignore, ignore2, b) => add(a, b), vecB)

Maybe just about bearable with fat arrows, plain horrible with function:

vecA.map( function(a, ignore, ignore2, b) { return add(a,b); }, vecB)
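The two calling conventions compared above can be modelled on plain arrays. This is only a sketch: `mapElemental` and `mapIndexed` are hypothetical helpers standing in for the current strawman map and the index-passing alternate, with the argument order taken from the adapter shown above.

```javascript
function add(a, b) { return a + b; }

// Hypothetical model of the current strawman map: each extra argument is
// indexed in step with the source, and the kernel sees only the selected values.
function mapElemental(source, kernel, ...extras) {
  return source.map((element, i) => kernel(element, ...extras.map((e) => e[i])));
}

// Hypothetical model of the alternate: index and array are inserted
// between the element and the selected extra values.
function mapIndexed(source, kernel, ...extras) {
  return source.map((element, i) =>
    kernel(element, i, source, ...extras.map((e) => e[i])));
}

const vecA = [1, 2, 3];
const vecB = [10, 20, 30];

const direct = mapElemental(vecA, add, vecB);                                  // [11, 22, 33]
const adapted = mapIndexed(vecA, (a, ignore, ignore2, b) => add(a, b), vecB);  // [11, 22, 33]
const unadapted = mapIndexed(vecA, add, vecB);                                 // [1, 3, 5]
```

Without the adapter, add silently receives (element, index) and sums each element with its own index, which is the reuse hazard in question.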

Stephan