Aligning from() and map()

# Niko Matsakis (12 years ago)

The ES6 draft specifies that map(), when applied to a typed array, yields another instance of that same type. For converting between array types, the from() method is added:

var f1 = new Float32Array();
...
var f2 = f1.map(x => x * 2); // Yields Float32Array
var i1 = Int32Array.from(f1, x => x * 2); // Yields Int32Array

This is nice. However, there are some small differences between the two that I think ought to be aligned.

In particular, map supplies its closure with three arguments: the Element (this[i]), Index (i), and Collection (this). The index is particularly useful, since you can write things like:

var f3 = f1.map((e, i) => e + i)
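One way to see the three arguments is to record what the callback actually receives. This sketch uses only the standard typed-array map(). The sample values and the `seen` log are illustrative assumptions.

```javascript
// Record the arguments that Float32Array.prototype.map passes to its callback.
const f1 = Float32Array.of(1.5, 2.5, 3.5);
const seen = [];

const f3 = f1.map((e, i, c) => {
  seen.push({ e, i, sameCollection: c === f1 });
  return e + i; // element plus its index
});

console.log(f3 instanceof Float32Array);        // true: map keeps the element type
console.log(Array.from(f3));                    // [ 1.5, 3.5, 5.5 ]
console.log(seen.every(s => s.sameCollection)); // true: c is the source array
```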

It seems that from() supplies only the element argument, not the index and collection. I propose that this be changed to match map().

# Niko Matsakis (12 years ago)

Let me expand on this e-mail a bit more. I am motivated here by our work on the data parallelism API strawman:data_parallelism. As I mentioned in the Boston meeting, we have been adding alternatives to methods like from() that attempt parallel execution (e.g., fromPar()). In general, we aim for these methods to be as compatible with the existing signatures as possible.

In the case of from(), in a sequential world, one might work around the lack of an index argument by writing code like:

var counter = 0;
var f1 = Int32Array.from(f, e => e + counter++)

rather than

var f1 = Int32Array.from(f, (e, i) => e + i)

However, this would not work in a parallel setting, since counter would be shared mutable state. Parallel APIs require the second approach. I would argue the second approach is preferable anyhow, as it is a more declarative style.
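To see why the counter breaks once evaluation order is no longer fixed, the sketch below simulates an implementation that may visit elements in any order. `mapInOrder` is a hypothetical helper, not part of any proposal: it calls the callback in the given visit order with the element and its destination index, and stores each result at that index.

```javascript
// Hypothetical helper: apply fn to each element of src, visiting indices
// in the order given by `order`, and store each result at its own index.
// A parallel runtime makes no promise about visit order; this simulates that.
function mapInOrder(src, fn, order) {
  const out = new Int32Array(src.length);
  for (const i of order) {
    out[i] = fn(src[i], i);
  }
  return out;
}

const f = Int32Array.of(10, 20, 30, 40);
const forward = [0, 1, 2, 3];
const shuffled = [2, 0, 3, 1];

// Counter-based version: the result depends on visit order.
function withCounter(order) {
  let counter = 0;
  return Array.from(mapInOrder(f, e => e + counter++, order));
}

// Index-based version: each result depends only on (e, i).
function withIndex(order) {
  return Array.from(mapInOrder(f, (e, i) => e + i, order));
}

console.log(withCounter(forward));  // [ 10, 21, 32, 43 ]
console.log(withCounter(shuffled)); // [ 11, 23, 30, 42 ]
console.log(withIndex(shuffled));   // [ 10, 21, 32, 43 ]
```

The index-based callback gives the same answer for every visit order, which is the property a parallel from() would rely on.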

I realize that the spec includes this comment about reusability:

"NOTE The from function is an intentionally generic factory method; it does not require that its this value be the Array constructor. Therefore it can be transferred to or inherited by any other constructors that may be called with a single numeric argument."

There is no particular reason, though, that from() cannot supply index and collection arguments, even if the source value is not an array. In that case, the index simply represents the number of items that have been iterated over so far.
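A minimal sketch of the proposed callback signature, assuming the index is the running count of iterated items and the third argument is the source. `fromWithIndex` is a hypothetical name; this is not the draft spec's from().

```javascript
// Hypothetical sketch of the proposed from() callback signature:
// mapFn(element, index, source), where index counts items iterated so far.
function fromWithIndex(Ctor, source, mapFn) {
  const values = [];
  let index = 0;
  for (const element of source) {            // works for any iterable
    values.push(mapFn(element, index, source));
    index++;                                 // number of items seen so far
  }
  // Construct with a single numeric argument, as the spec NOTE assumes.
  const result = new Ctor(values.length);
  values.forEach((v, k) => { result[k] = v; });
  return result;
}

// A Set has no numeric indices, but the count is still well defined.
const s = new Set([5, 7, 9]);
const weighted = fromWithIndex(Int32Array, s, (e, i) => e * i);
console.log(Array.from(weighted)); // [ 0, 7, 18 ]
```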

# Allen Wirfs-Brock (12 years ago)

Note that from is different from map in two important ways:

  1. For the map method, the source and destination collection are the same object, so the collection argument to the closure parameter identifies both the source and destination object, and the sole index parameter identifies both the source and destination index. With the from method, the source and destination objects may have distinct identity and (because of the use of iterators) the actual source index (if there even is one) may be different from the destination index. You might equally argue that the signature of the closure should be (value, destCollection, destIndex, srcCollection, srcIndex).

  2. For the from method, the source collection is usually accessed using an Iterator. That in itself may be problematic from a parallel perspective, which is something you need to consider.
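The first point can be made concrete with an iterator that skips elements. In this sketch the source is a hypothetical generator, `everyOther`, that yields every second element of an array. It also yields the source index purely for illustration; an iterator-driven from() would not see it. The running count, which is all such a from() could supply, matches the destination position but not the source position.

```javascript
// Yields [sourceIndex, value] for every second element of arr.
// The sourceIndex is exposed only so the mismatch can be printed.
function* everyOther(arr) {
  for (let k = 1; k < arr.length; k += 2) {
    yield [k, arr[k]];
  }
}

const src = ["a", "b", "c", "d", "e", "f"];
const pairs = [];
let destIndex = 0; // what an iteration count would report
for (const [srcIndex, value] of everyOther(src)) {
  pairs.push({ value, srcIndex, destIndex });
  destIndex++;
}

const summary = pairs.map(p => `${p.value}:${p.srcIndex}->${p.destIndex}`).join(" ");
console.log(summary); // b:1->0 d:3->1 f:5->2
```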

Finally, I can imagine that the security-conscious might view the implicit passing of the target collection to the closure as a capability leak. Imagine a closure provided by an untrusted source. Passing the target collection allows them to capture a reference to the collection that they might later misuse. I know that the existing Array methods already have this characteristic, but perhaps we should think about whether it is a good idea to add more such capability leaks.
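The concern can be illustrated with the existing Array.prototype.map(), which passes its source collection to the callback. `untrusted` is a hypothetical callback written for this sketch; it keeps the reference and uses it after map() has returned.

```javascript
// An "untrusted" callback that quietly keeps a reference to the
// collection argument passed by map().
const stash = [];
function untrusted(e, i, c) {
  stash.push(c); // captures the collection
  return e * 2;
}

const x = [1, 2, 3];
const doubled = x.map(untrusted);

// Later, the captured reference can be used to modify the original array.
stash[0][0] = 99;
console.log(doubled); // [ 2, 4, 6 ]
console.log(x);       // [ 99, 2, 3 ]
```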

I'm actually not unsympathetic to this request, but I think we should examine some of these design issues before committing to any changes.

Do you have any real use cases where knowledge of the destination collection and index is actually needed?

# Hudson, Rick (12 years ago)

Allen

  1. For the map method, the source and destination collection are the same object so the collection argument to the closure parameter identifies both the source and destination object

Not sure I'm following you here. The collection passed to the closure parameter (aka callback function) in map is the source and not the destination. As far as I can tell the destination isn't available until map returns it.

>>> x = [1,2]
[1,2]
>>> y = x.map((e,i,c) => c)
[[1,2], [1,2]]
>>> x === y[0]
true

# Niko Matsakis (12 years ago)

On Fri, Dec 06, 2013 at 09:08:09AM -0800, Allen Wirfs-Brock wrote:

  1. For the map method, the source and destination collection are the same object ...

How are the source and destination collection the same in the case of map()? Perhaps you meant that they are the same type of collection?

... the actual source index (if there even is one) may be different from the destination index ...

This is true, but I'm not sure how important it is. Just define the index to be the number of items traversed thus far. If users transfer from() to other types for which a numeric index is inappropriate (e.g., Set), so be it.

2. For the from method, the source collection is usually accessed using an Iterator. That itself may be problematic from a parallel perspective that you need to consider.

Yes, a parallel version of from can only be used with collections that support indexing.

[As a side note, we offer a "parallel build" operation that simply iterates over an iteration space without reference to any collection. It is possible to rewrite both from and map in terms of build, at least for any indexable collection. So in some sense this change is not necessary for expressiveness, even in the parallel setting, but I still think it's a good idea.]
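The side note about rewriting from and map in terms of build can be sketched as follows. `build`, `mapViaBuild`, and `fromViaBuild` are hypothetical names; the build operation here is a sequential stand-in that assumes only a length and a per-index function, and fromViaBuild handles indexable sources only.

```javascript
// Hypothetical sequential stand-in for a "build" operation: produce a new
// array of the given length by calling fn(i) for each index i.
// A parallel version could evaluate the calls in any order.
function build(Ctor, length, fn) {
  const out = new Ctor(length);
  for (let i = 0; i < length; i++) {
    out[i] = fn(i);
  }
  return out;
}

// map() expressed via build: same element type, callback gets (e, i, c).
function mapViaBuild(src, fn) {
  return build(src.constructor, src.length, i => fn(src[i], i, src));
}

// from() expressed via build, for indexable sources only.
function fromViaBuild(Ctor, src, fn) {
  return build(Ctor, src.length, i => fn(src[i], i, src));
}

const f1 = Float32Array.of(0.5, 1.5, 2.5);
const f2 = mapViaBuild(f1, (e, i) => e * 2 + i);               // Float32Array
const i1 = fromViaBuild(Int32Array, f1, (e, i) => e * 2 + i);  // Int32Array
console.log(f2 instanceof Float32Array, Array.from(f2)); // true [ 1, 4, 7 ]
console.log(i1 instanceof Int32Array, Array.from(i1));   // true [ 1, 4, 7 ]
```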

...perhaps we should think about whether it is a good idea to add more such capability leaks...

It's easy enough to plug capability leaks like so:

Int32Array.from(f, e => insecureClosure(e))

Given that the cat is out of the bag thanks to map, this doesn't seem like it's even a footgun.
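The wrapping technique can be checked with Array.prototype.map(), since the draft from() is not available to run; the same arrow-function wrapper would apply to from(). `insecureClosure` is a hypothetical callback that records every argument it is given.

```javascript
// Hypothetical untrusted callback that tries to capture every argument.
const captured = [];
function insecureClosure(...args) {
  captured.push(args);
  return args[0] + 1;
}

const f = [10, 20, 30];

// Direct use: the callback receives (element, index, collection).
f.map(insecureClosure);
console.log(captured[0].length); // 3

// Wrapped use: the arrow function forwards only the element.
captured.length = 0;
const out = f.map(e => insecureClosure(e));
console.log(captured[0].length); // 1
console.log(out);                // [ 11, 21, 31 ]
```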

Do you have any real use cases where knowledge of the destination collection and index is actually needed?

I don't personally think the collection argument is especially useful. You can always capture it in the closure if you like. I only included the collection argument for consistency with map.

The index, however, is clearly useful. Almost every generic iteration facility I've seen includes an adapter like Python's enumerate() or Scala's zipWithIndex precisely because having access to the index is a handy thing. As an example, consider a function that wants to access the neighbors of the current element.
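For comparison, an enumerate()-style adapter is easy to sketch in JavaScript with a generator. `enumerate` is a hypothetical helper modeled loosely on Python's, not a built-in. The second part shows neighbor access, which needs both the index and an indexable collection.

```javascript
// An enumerate()-style adapter: pairs each value of any iterable
// with a running index.
function* enumerate(iterable) {
  let i = 0;
  for (const value of iterable) {
    yield [i++, value];
  }
}

// Works on sources without numeric indices, such as a Set.
const pairs = Array.from(enumerate(new Set(["x", "y", "z"])));
console.log(pairs); // [ [ 0, 'x' ], [ 1, 'y' ], [ 2, 'z' ] ]

// Neighbor access needs an indexable source: compare each element
// with its predecessor.
const xs = [3, 5, 4, 8];
const deltas = xs.map((e, i, c) => (i === 0 ? 0 : e - c[i - 1]));
console.log(deltas); // [ 0, 2, -1, 4 ]
```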

Furthermore, I just think people will find it surprising that from/map don't operate as analogously as possible (I certainly did).

# Allen Wirfs-Brock (12 years ago)

On Dec 6, 2013, at 12:18 PM, Hudson, Rick wrote:

Not sure I'm following you here. The collection passed to the closure parameter (aka callback function) in map is the source and not the destination. As far as I can tell the destination isn't available until map returns it.

Right, sorry first post of the morning. Coffee wasn't active yet.

There is still a difference. With map, the source collection is the this value of the map method, so map has implementation-level visibility of the source collection. The map method is specified to iterate explicitly over the indices of the source collection. With the from method, the source may simply be an Iterable Iterator (an Iterator with an @@iterator method). The ultimate source of the data isn't necessarily known to the from method, and since index-based iteration isn't used to access the iterator, any index passed to the closure would be a destination index. But, as you point out, the destination object isn't available to the closure.
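A generator object is a convenient example of an Iterable Iterator. The sketch below uses the standard Array.from() (as later standardized) to show that once such an iterator is consumed, there is no underlying indexed collection to return to. The `readings` generator is an illustrative assumption.

```javascript
// A generator object is an "Iterable Iterator": its @@iterator method
// returns the iterator itself.
function* readings() {
  yield 1;
  yield 2;
  yield 3;
}

const it = readings();
console.log(it[Symbol.iterator]() === it); // true

// Array.from drains the iterator; it cannot go back and index the source.
const first = Array.from(it, v => v * 10);
const second = Array.from(it);
console.log(first);  // [ 10, 20, 30 ]
console.log(second); // []
```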

# Allen Wirfs-Brock (12 years ago)

On Dec 6, 2013, at 1:00 PM, Niko Matsakis wrote:

How are the source and destination collection the same in the case of map()? Perhaps you meant that they are the same type of collection?

No, I just wasn't fully awake yet...

This is true, but I'm not sure how important it is. Just define the index to be the number of items traversed thus far. If users transfer from() to other types for which a numeric index is inappropriate (e.g., Set), so be it.

But aren't we now back to your parallelism problem? It would seem that the only index- or count-like value that is independent of ordering and of the Iterator protocol is the destination index.

Yes, a parallel version of from can only be used with collections that support indexing.

Note that, as currently specified, the from methods prefer the Iterator protocol over indexing when presented with a source collection that supports both.
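The standard Array.from() behaves this way, so the preference can be checked directly. The object below is a contrived assumption whose array-like view and iterable view deliberately disagree.

```javascript
// An object that is both array-like (length + indices) and iterable,
// with the two views deliberately disagreeing.
const both = {
  length: 2,
  0: "indexed-0",
  1: "indexed-1",
  *[Symbol.iterator]() {
    yield "iterated-0";
    yield "iterated-1";
    yield "iterated-2";
  },
};

console.log(Array.from(both)); // [ 'iterated-0', 'iterated-1', 'iterated-2' ]

// Without @@iterator, from() falls back to indexed access via length.
const indexOnly = { length: 2, 0: "indexed-0", 1: "indexed-1" };
console.log(Array.from(indexOnly)); // [ 'indexed-0', 'indexed-1' ]
```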

I don't personally think the collection argument is especially useful. You can always capture it in the closure if you like. I only included the collection argument for consistency with map.

The index, however, is clearly useful. Almost every generic iteration facility I've seen includes an adapter like Python's enumerate() or Scala's zipWithIndex precisely because having access to the index is a handy thing. As an example, consider a function that wants to access the neighbors of the current element.

Yes, but in the iterator-source scenario, indexed access to the source collection may not be available.

Furthermore, I just think people will find it surprising that from/map don't operate as analogously as possible (I certainly did).

Does the surprise then lead you to deeper understanding of essential differences?

Essentially, what we are discussing is whether or not they actually are already as close as possible.

# Hudson, Rick (12 years ago)

Agreed, index is the index of the destination where the value returned from the closure parameter will be placed. This is the right way to think about index. Using it directly on the source is at best redundant since element === collection[index]. The fact that the destination isn't available yet doesn't make the destination index useless. One use case that comes to mind is a convolution like a blur function in image processing. The index indicates the location in the destination where the newly created value is to be placed.
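A minimal sketch of the blur use case, assuming a one-dimensional 3-tap box filter with edge clamping (the filter choice and pixel values are illustrative). It uses the standard typed-array map(), where the destination index coincides with the source index; the callback needs both the index and the source collection.

```javascript
// A 3-tap box blur: each destination index i averages the source values
// at i-1, i, i+1, clamping at the edges.
const pixels = Float32Array.of(0, 3, 6, 9);
const last = pixels.length - 1;

const blurred = pixels.map((e, i, src) => {
  const left = src[Math.max(i - 1, 0)];
  const right = src[Math.min(i + 1, last)];
  return (left + e + right) / 3;
});

console.log(Array.from(blurred)); // [ 1, 3, 6, 8 ]
```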

# Niko Matsakis (12 years ago)

On Fri, Dec 06, 2013 at 01:20:20PM -0800, Allen Wirfs-Brock wrote:

Does the surprise then lead you to deeper understanding of essential differences?

I don't disagree with you that there are potential differences, but I do disagree that these differences are essential. It seems to me that the primary argument against supplying the index is that it might be misused. That is true, but on the other side there are definite advantages:

  1. The index is always meaningful: it is the index in the destination vector.

  2. Often, but not always, the index will also correspond with the index in the source vector. In those cases, the user may make use of the index to access neighboring elements in the source vector.

  3. Even if the source collection is not known to be indexable, the index can still be of use. For example, it may be used to index into other arrays that are not the source collection. Or it could be used as part of a weighting factor, in which case it is not used for indexing at all. And so on.

  4. Not supplying the same arguments for from means that it is harder to convert code that uses map to code that uses from. To me this argues for supplying the source collection as well, although as I said it seems to add very little in practice. But it does no particular harm.
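Points 3 and 4 can be sketched together: a callback that uses the index to look up a weight in a different array, reused unchanged between map() and a from()-like function. `fromWithIndex` is a hypothetical helper that supplies (element, index, source) as proposed; it is not the draft spec's from(), and the weights are illustrative.

```javascript
// Hypothetical from()-like helper that supplies (element, index, source).
function fromWithIndex(Ctor, source, mapFn) {
  const values = [];
  let i = 0;
  for (const e of source) values.push(mapFn(e, i++, source));
  const out = new Ctor(values.length);
  values.forEach((v, k) => { out[k] = v; });
  return out;
}

// Point 3: the index drives a lookup into a different array (weights),
// so the source itself never needs to be indexable.
const weights = [1, 10, 100];
const weigh = (e, i) => e * weights[i];

// Point 4: the same callback moves unchanged between map and from.
const arr = Int32Array.of(1, 2, 3);
const viaMap = arr.map(weigh);
const viaFrom = fromWithIndex(Int32Array, new Set([1, 2, 3]), weigh);

console.log(Array.from(viaMap));  // [ 1, 20, 300 ]
console.log(Array.from(viaFrom)); // [ 1, 20, 300 ]
```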

# Brendan Eich (12 years ago)

I agree with all your points; #4 is particularly good.