Katelyn Gadd (2015-02-15T15:27:26.000Z)
So there's no point in going in circles here, I just want to address
the statement:
Even if we allow this 'optimization' it will take the engines as much
time to implement it so there is no gain from this anyway.

My premise is that this is a rather obvious bottleneck and it applies
to new APIs with no for-loop equivalent. As far as I know, I cannot
use a for-loop to iterate over a Map or Set. So if I want to iterate
over Map or Set, I'm eating the cost of the current iterator API as
specified. The alternative is to use a custom implementation of those
containers (that is what I do right now, and will continue to do.)

Given the constraint that the .next() method is self-hosted in JS, my
proposal is trivial to implement. I think this is a reasonable
constraint, even though at present the API does not appear to be
self-hosted in SpiderMonkey. Many builtins are self-hosted in both
runtimes at present and more become self-hosted on a regular basis.
The current self-hosted JS would look like this:

function Iterator () {
  // ...
}

function next () {
  // ...
  return {
    done: isDone,
    value: value
  };
}

The proposal would change the self-hosted JS to:

function Iterator () {
  // ...
  this.$cachedResultObject = { done: false, value: null };
}

function next () {
  // ...
  this.$cachedResultObject.done = isDone;
  this.$cachedResultObject.value = value;
  return this.$cachedResultObject;
}

As long as the spec allows this behavior change (it's certainly
observable), that's all you really have to do at a basic level to make
this work. I hope it is obvious that a C++ engine-level implementation
of this optimization is not *significantly* more complex, though I
will not say it is 'easy'.

It is certainly, absolutely, I-will-bet-you-real-money simpler to do
this than to implement sufficiently smart escape analysis to be able
to optimize out the new object. I think if the next() method were
self-hosted and fully inlined into the caller, at that point it would
be possible for the .done and .value sets to be store-to-load
forwarded and the new object optimized out. That's the most realistic
case I can think of; I have yet to see a VM do that but I can imagine
it happening sometime soon, given that LuaJIT does it.

-kg

On 15 February 2015 at 07:16, Benjamin (Inglor) Gruenbaum
<inglor at gmail.com> wrote:
> On Sun, Feb 15, 2015 at 3:06 PM, Katelyn Gadd <kg at luminance.org> wrote:
>>
>> In my testing Map and Set are outperformed by a trivial Object or
>> Array based data structure in every case, *despite the fact* that
>> using an Object as a Map requires the use of Object.keys() to be able
>> to sequentially iterate elements. The cost of iterator.next() in v8
>> and spidermonkey is currently extremely profound and profiling shows
>> all the time is being spent in object creation and GC. (To be fair,
>> self-hosting of iterations might improve on this some.)
>
>
> Yes, this is of course true simply because iterating Set/Map was not
> optimized yet in V8/SpiderMoney since usually features are first implemented
> and then optimized. Benchmarking current naive engine implementations to
> make conclusions about the spec is a very risky thing to do. It's entirely
> possible for engines to optimize this. By the same logic I could say that
> `const` has bad performance because how it is currently implemented as much
> much slower than `var` in V8.
>
>> I am sure that given some sufficient amount of compiler engineering,
> the allocations could be entirely optimized out, but this is far from
> a trivial case so odds are it will not happen soon.
>
> Right, I agree with you there it probably won't happen soon. Changing the
> spec won't change that fact. Remember that the engine doesn't actually need
> to allocate a new object in cases where it can reuse the old one safely
> since it's not used anymore - again, this can be solved at the engine level.
>
>> My point is that for my current test cases, were the .next()
> allocation optimized out via a spec revision and *very trivial* change
> to current implementations, the performance would be great. The
> bottleneck would go away. This is much, much easier than having to
> introduce sophisticated escape analysis & allocation removal into JS
> VMs.
>
> This is something that can be solved at the VM level. VMs perform much much
> smarter optimizations than this already. There is no reason to sacrifice
> code correctness and the principle of least astonishment in order to get
> acceptable performance. Benchmarking against "proof of concept"
> implementations is counter-productive in my opinion. The engines are already
> 'allowed' to use the same object when there is no observable difference.
> Even if we allow this 'optimization' it will take the engines as much time
> to implement it so there is no gain from this anyway.
d at domenic.me (2015-02-21T00:41:47.900Z)
So there's no point in going in circles here, I just want to address
the statement:
Even if we allow this 'optimization' it will take the engines as much
time to implement it so there is no gain from this anyway.

My premise is that this is a rather obvious bottleneck and it applies
to new APIs with no for-loop equivalent. As far as I know, I cannot
use a for-loop to iterate over a Map or Set. So if I want to iterate
over Map or Set, I'm eating the cost of the current iterator API as
specified. The alternative is to use a custom implementation of those
containers (that is what I do right now, and will continue to do.)

Given the constraint that the .next() method is self-hosted in JS, my
proposal is trivial to implement. I think this is a reasonable
constraint, even though at present the API does not appear to be
self-hosted in SpiderMonkey. Many builtins are self-hosted in both
runtimes at present and more become self-hosted on a regular basis.
The current self-hosted JS would look like this:

```js
function Iterator () {
  // ...
}

function next () {
  // ...
  return {
    done: isDone,
    value: value
  };
}
```

The proposal would change the self-hosted JS to:

```js
function Iterator () {
  // ...
  this.$cachedResultObject = { done: false, value: null };
}

function next () {
  // ...
  this.$cachedResultObject.done = isDone;
  this.$cachedResultObject.value = value;
  return this.$cachedResultObject;
}
```

As long as the spec allows this behavior change (it's certainly
observable), that's all you really have to do at a basic level to make
this work. I hope it is obvious that a C++ engine-level implementation
of this optimization is not *significantly* more complex, though I
will not say it is 'easy'.

It is certainly, absolutely, I-will-bet-you-real-money simpler to do
this than to implement sufficiently smart escape analysis to be able
to optimize out the new object. I think if the next() method were
self-hosted and fully inlined into the caller, at that point it would
be possible for the .done and .value sets to be store-to-load
forwarded and the new object optimized out. That's the most realistic
case I can think of; I have yet to see a VM do that but I can imagine
it happening sometime soon, given that LuaJIT does it.