Subject: Performance of iterator .next() as specified
On 15 February 2015 at 06:16, Benjamin (Inglor) Gruenbaum <inglor at gmail.com> wrote:
Why?
To quote my original message:
In my testing Map and Set are outperformed by a trivial Object or Array based data structure in every case, despite the fact that using an Object as a Map requires the use of Object.keys() to be able to sequentially iterate elements. The cost of iterator.next() in v8 and spidermonkey is currently extremely profound and profiling shows all the time is being spent in object creation and GC. (To be fair, self-hosting of iterations might improve on this some.)
To make this clearer: I have an existing dictionary container based on Object and Object.keys. Replacing this with Map and Map iterator produced a slowdown, and profiles showed almost all the time was being spent in result object creation and GC.
Allocating a new object isn't really expensive, in cases it is and it introduces GC pressure a clever engine can optimize it away by not really allocating a new object where it is not required. Your suggestion can be fully implemented by the engine already in the cases you speak of (where no reference is kept) so there is no real speed benefit from it performance wise. However in the case an iteration result is kept for later use this can be destructive and create unpredictable results.
It's expensive. In general object allocation & GCs show up in profiles for almost every performance-sensitive application I work on. I'm sure there are applications where this is not the case, though. Also, to quote my original message:
I think that past APIs with this problem (for example TypedArray.set) have proven that 'a sufficiently smart VM can optimize this' is not representative of real VMs or real use cases.
I am sure that given some sufficient amount of compiler engineering, the allocations could be entirely optimized out, but this is far from a trivial case so odds are it will not happen soon.
Also - of course for... of and iterating iterators right now is slower than iterating in ways that have been there for years. Modern JS engines usually add support for features and only then optimize them.
My point is that for my current test cases, were the .next() allocation optimized out via a spec revision and very trivial change to current implementations, the performance would be great. The bottleneck would go away. This is much, much easier than having to introduce sophisticated escape analysis & allocation removal into JS VMs.
On Sun, Feb 15, 2015 at 3:06 PM, Katelyn Gadd <kg at luminance.org> wrote:
In my testing Map and Set are outperformed by a trivial Object or Array based data structure in every case, despite the fact that using an Object as a Map requires the use of Object.keys() to be able to sequentially iterate elements. The cost of iterator.next() in v8 and spidermonkey is currently extremely profound and profiling shows all the time is being spent in object creation and GC. (To be fair, self-hosting of iterations might improve on this some.)
Yes, this is of course true simply because iterating Set/Map was not
optimized yet in V8/SpiderMoney since usually features are first
implemented and then optimized. Benchmarking current naive engine
implementations to make conclusions about the spec is a very risky thing to
do. It's entirely possible for engines to optimize this. By the same logic
I could say that const
has bad performance because how it is currently
implemented as much much slower than var
in V8.
I am sure that given some sufficient amount of compiler engineering, the allocations could be entirely optimized out, but this is far from a trivial case so odds are it will not happen soon.
Right, I agree with you there it probably won't happen soon. Changing the spec won't change that fact. Remember that the engine doesn't actually need to allocate a new object in cases where it can reuse the old one safely since it's not used anymore - again, this can be solved at the engine level.
My point is that for my current test cases, were the .next() allocation optimized out via a spec revision and very trivial change to current implementations, the performance would be great. The bottleneck would go away. This is much, much easier than having to introduce sophisticated escape analysis & allocation removal into JS VMs.
This is something that can be solved at the VM level. VMs perform much much smarter optimizations than this already. There is no reason to sacrifice code correctness and the principle of least astonishment in order to get acceptable performance. Benchmarking against "proof of concept" implementations is counter-productive in my opinion. The engines are already 'allowed' to use the same object when there is no observable difference. Even if we allow this 'optimization' it will take the engines as much time to implement it so there is no gain from this anyway.
So there's no point in going in circles here, I just want to address the statement: Even if we allow this 'optimization' it will take the engines as much time to implement it so there is no gain from this anyway.
My premise is that this is a rather obvious bottleneck and it applies to new APIs with no for-loop equivalent. As far as I know, I cannot use a for-loop to iterate over a Map or Set. So if I want to iterate over Map or Set, I'm eating the cost of the current iterator API as specified. The alternative is to use a custom implementation of those containers (that is what I do right now, and will continue to do.)
Given the constraint that the .next() method is self-hosted in JS, my proposal is trivial to implement. I think this is a reasonable constraint, even though at present the API does not appear to be self-hosted in SpiderMonkey. Many builtins are self-hosted in both runtimes at present and more become self-hosted on a regular basis. The current self-hosted JS would look like this:
function Iterator () {
// ...
}
function next () {
// ...
return {
done: isDone,
value: value
};
}
The proposal would change the self-hosted JS to:
function Iterator () {
// ...
this.$cachedResultObject = { done: false, value: null };
}
function next () {
// ...
this.$cachedResultObject.done = isDone;
this.$cachedResultObject.value = value;
return this.$cachedResultObject;
}
As long as the spec allows this behavior change (it's certainly observable), that's all you really have to do at a basic level to make this work. I hope it is obvious that a C++ engine-level implementation of this optimization is not significantly more complex, though I will not say it is 'easy'.
It is certainly, absolutely, I-will-bet-you-real-money simpler to do this than to implement sufficiently smart escape analysis to be able to optimize out the new object. I think if the next() method were self-hosted and fully inlined into the caller, at that point it would be possible for the .done and .value sets to be store-to-load forwarded and the new object optimized out. That's the most realistic case I can think of; I have yet to see a VM do that but I can imagine it happening sometime soon, given that LuaJIT does it.
On Sun, Feb 15, 2015 at 7:16 AM, Benjamin (Inglor) Gruenbaum <inglor at gmail.com> wrote:
This is something that can be solved at the VM level. VMs perform much much smarter optimizations than this already. There is no reason to sacrifice code correctness and the principle of least astonishment in order to get acceptable performance. Benchmarking against "proof of concept" implementations is counter-productive in my opinion. The engines are already 'allowed' to use the same object when there is no observable difference. Even if we allow this 'optimization' it will take the engines as much time to implement it so there is no gain from this anyway.
Is 60,000 lines of source code a good enough "proof of concept for you"?
I don't see what the problem is. My transpiler generators the following ES5 code:
//for (var item in set)
var iter = obj.iterator();
while (1) {
var ret = obj.next();
if (ret.done) break;
var item = ret.value;
...loop body
}
It works quite well. You may be right about the VM vendors, however. This requirement basically makes JS useless for anything other than minor scripting tasks. The VM people may not have any choice but to optimize it.
This is actually exactly the kind of slow code Katelyn was talking about.
Also, you probably meant set[Symbol.iterator]()
and for... of
.
What Katelyn is talking about is optimizing .next in the example code to
not return a new object but to return the same one every time with
different values filled in. In order to optimize that with your transpiler
you'd need to implement your own Set
and Map
because like Katelyn
correctly noted Set
and Map
do not expose a fast way (in current
implementations) to iterate them at the moment - at least until engines
optimize them.
Katelyn is arguing it would be useful to allow the implementation to return the same instance and I'm arguing the reason it's slow is because it's benchmarked against a slow implementation that was not optimized yet but can easily be - just like Katelyn noted as LuaJIT does.
When I can get more sound abstractions for no additional cost (eventually) I'd rather not introduce implicit mutable state into the application. Developers won't be aware of it being mutating and the whole notion of a mutating-at-a-later-point return value is very error-prone in my opinion.
On Sun, Feb 15, 2015 at 7:43 AM, Benjamin (Inglor) Gruenbaum < inglor at gmail.com> wrote:
This is actually exactly the kind of slow code Katelyn was talking about. Also, you probably meant
set[Symbol.iterator]()
andfor... of
.
You are correct. I haven't switched to the 'of' syntax yet (need to do that).
What Katelyn is talking about is optimizing .next in the example code to not return a new object but to return the same one every time with different values filled in. In order to optimize that with your transpiler you'd need to implement your own
Set
andMap
because like Katelyn correctly notedSet
andMap
do not expose a fast way (in current implementations) to iterate them at the moment - at least until engines optimize them
I did implement my own, yes.
Katelyn is arguing it would be useful to allow the implementation to return the same instance and I'm arguing the reason it's slow is because it's benchmarked against a slow implementation that was not optimized yet but can easily be - just like Katelyn noted as LuaJIT does.
I may have misunderstood; it sounded to me like the two of you were agreeing that JS VMs won't be doing this anytime soon.
When I can get more sound abstractions for no additional cost (eventually) I'd rather not introduce implicit mutable state into the application. Developers won't be aware of it being mutating and the whole notion of a mutating-at-a-later-point return value is very error-prone in my opinion.
Exactly how would developers mutate the return object in the first place? And frankly, I don't see how you can care so little about a major, show-stopping issue such as this. JS performance has gotten to the point where people like me write CAD software in it. Usable iterators are important for such things.
Frankly, I find this sudden embrace of good coding practices odd in a language that practically sets the floor for how horrendously mutable a dynamic runtime language can get.
When I can get more sound abstractions for no additional cost (eventually) I'd rather not introduce implicit mutable state into the application. Developers won't be aware of it being mutating and the whole notion of a mutating-at-a-later-point return value is very error-prone in my opinion.
Right. Mutating the returned object isn't really tenable from a usability point of view.
Again, where does this mutation occur? The spec shouldn't allow any such thing to begin with; it should mandate exactly what my compiler does: as soon as .next returns, copy .value into the loop variable. The developer shouldn't have any access to the return value from within the loop at all; if he does, that should (ideally) be considered a violation of the standard.
Joe, I don't think we're having the same discussion.
Again, this is about the issue Katelyn raised about the return value of
.next
.
Katelyn suggested that the return value of a .next
call in the iteration
protocol - an object of the form {value: value, done: boolean}
should be
allowed to be reused between calls - so an iterator can return the same
object without allocating a new one.
I argue (and from what I understand Kevin agrees with me) that allowing this is error prone and might cause surprises for developers since they have an object returned that has mutated implicitly without the consumer being aware. We can have the same performance gain by a clever engine that figures this sort of thing out itself.
Sorry about that, I think I'm unconsciously substituting "return value of .next" for ".next.value." I am talking about reusing the return object, yes; for some reason in my head, "return object of .next" refers to .next(), while "return value of .next" refers to .next().value. Communication error on my part.
Why?
Allocating a new object isn't really expensive, in cases it is and it introduces GC pressure a clever engine can optimize it away by not really allocating a new object where it is not required. Your suggestion can be fully implemented by the engine already in the cases you speak of (where no reference is kept) so there is no real speed benefit from it performance wise. However in the case an iteration result is kept for later use this can be destructive and create unpredictable results.
Also - of course for... of and iterating iterators right now is slower than iterating in ways that have been there for years. Modern JS engines usually add support for features and only then optimize them.