Iterators, generators, finally, and scarce resources (Was: April 10 2014 Meeting Notes)
I agree with pretty much everything Andy had to say, and would like to add a meta-perspective:
We should be viewing this as a last-minute feature request. The churn that this request has introduced is (in my opinion) exactly the kind of problem that the ES7 process is meant to address. In fact, I would go so far as to say that requirements churn has been the number one problem during ES6 development.
If the "iterator-as-resource-manager" feature is truly desirable (and I'm not even convinced that it is), then the appropriate action is to figure out a way to defer it until ES7. It is simply too late to add this feature now.
On Apr 29, 2014, at 12:40 AM, Andy Wingo <wingo at igalia.com> wrote:
I'm a bit grumpy that this is being brought up again, and this late, and in multiple forums, but as it seems that people want to talk about it again, that talking about it again is the thing to do...
Sorry about that. :( But the fact is Jafar joined TC39 very late and his technical points have merit, so we do have to grapple with them. (As for multiple forums, I wish I had a solution.)
If I may summarize Jafar's argument, it's that the iterator in a for-of may hold a scarce resource, like a file descriptor, and because of that, for-of should be able to release this scarce resource on an early exit via "break". The provisional consensus elaborates a method to do this.
Is this a fair summary?
I don't quite agree with that summary. These are IMO the most important points:
- Iterators might include "on cleanup" logic, although this is admittedly rare in synchronous code.
- Nevertheless, generators are the common implementation strategy for iterators, and try/finally is a part of the language, making "on cleanup" logic more likely to arise. While we can't ever guarantee that code won't run a first-class iterator object to its completion (just like you can't guarantee that a try block won't iloop), it's a bad smell if code that creates an iterator in a for-of loop head, loops over it, and never even touches the iterator directly doesn't shut down the iterator.
- Iterators are intended to be short-lived (in particular, long-lived iterators over mutable data sources are invalidation hazards). So the common consumer of iterators, for-of, should properly dispose of them.
- The uncommon case of using an iterator partially and in several loops can be easily implemented with combinators (or while loops), but we should be safe by default.
- The problems of "on cleanup" logic will be greatly exacerbated with future asynchronous iteration features, which are IMO an extremely high priority for the future. The overwhelming majority of JS programs that operate on sequences of data are doing so asynchronously. The moment we start going down the road of designing asynchronous iteration features (such as for (await ... of ...)), which in fact Jafar and I have been starting work on, the try/finally hazards will show up much more often. If we don't do proper disposal of synchronous iterators, we'll create an asymmetry between synchronous and asynchronous iteration, which would not only be a nasty wart but also a refactoring hazard.
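To make the try/finally hazard concrete, here is a minimal sketch of a generator whose cleanup lives in a finally block, consumed by a for-of that exits early. The `lines` generator and the `events` array are illustrative stand-ins, not code from the thread, and the result assumes an engine in which for-of calls the iterator's return() on early exit, which is the behaviour being argued for here.

```javascript
// Hypothetical stand-in for an iterator that holds a scarce resource.
function* lines(events) {
  events.push("open");            // pretend to acquire the resource
  try {
    yield "first line";
    yield "-- mark --";
    yield "never reached";
  } finally {
    events.push("close");         // "on cleanup" logic
  }
}

const events = [];
for (const line of lines(events)) {
  if (line === "-- mark --") break; // early exit: the loop never sees the iterator
}

// With return() invoked on break, the finally block has run by this point.
console.log(events);
```

Without the return() call, the generator would simply stay suspended at the second yield and the finally block would never run unless someone resumed it by hand.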
Indeed I expect that in practice most iterators in an ES6 program will be map, set, and array iterators, which in practice will not be implemented with generators.
I strongly disagree with this. Generators will by far be the most convenient and common way to implement iterators, regardless of their data source.
Incidentally I think that if TC39 decided to re-add this method, it should be called close() instead, because it doesn't make sense to "return" from a non-generator iterator.
I thought so at first too, until I remembered that iterators have a return value. So I still think return is the right name.
== Calling return() on iterators is rarely appropriate
You're arguing "rarely necessary," not "rarely appropriate," which is a weaker claim. But I dispute that too because of asynchronous iteration, as I explained above.
However in this case it is possible to arrange to close the iterator, with a different interface:
This is a dramatic weakening of the power of iterators, in that you force all iteration abstractions to expose their external resources to consumers. Again, it may not seem like a big deal now but it'll be completely unacceptable for the asynchronous case.
The other case is when you have an iterator consumer which is decoupled from the code that created the iterator, as in:
function (iterable) { ... for (var x of iterable) { if (foo(x)) break; } ... }
But it is precisely in this case when you would not want to close the iterator, because you don't know its lifetime.
As I said above, we should not optimize the design for this case. It's easy enough to create combinators for it:
function (iterable) { ... for (var x of keepalive(iterable)) { if (foo(x)) break; } ... }
But this will not be the common case. The majority of the time when you're working with sequences you use the data you need and then you're done with it.
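The thread does not define keepalive, so the following is only one plausible sketch of such a combinator: it wraps the underlying iterator in an object that forwards next() but deliberately exposes no return(), so a for-of over the wrapper has nothing to close on early exit. The `counter` generator and the variable names are assumptions made for the example.

```javascript
// Sketch of a keepalive combinator: forwards next(), hides return().
function keepalive(iterable) {
  const iterator = iterable[Symbol.iterator]();
  return {
    [Symbol.iterator]() { return this; },
    next(value) { return iterator.next(value); }
    // No return() here, so for-of cannot shut down the underlying iterator.
  };
}

// Illustrative source whose cleanup we can observe.
function* counter(events) {
  try { yield 1; yield 2; yield 3; }
  finally { events.push("closed"); }
}

const events = [];
const shared = counter(events);
const seen = [];

for (const x of keepalive(shared)) {
  seen.push(x);
  if (x === 1) break;             // early exit, but the generator stays open
}
const closedAfterFirstLoop = events.length > 0;

for (const x of keepalive(shared)) {
  seen.push(x);                   // picks up where the first loop stopped
}
```

The second loop runs the generator to completion, at which point its finally block executes normally.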
== return() in generators is semantically weird
It's not as weird as you make it out to be. Return is not much different from throw, first of all. Note also that ES7 do-expressions allow expressions to return.
Also, the insistence on a return() that doesn't run catch blocks seems to me to be ill-placed. I think it's telling that the counter-examples are from Python, which has a different semantic model, as it has finalization. Implementing abstractions over scarce resources in JS is going to necessarily involve different design patterns than those used by Python. For the given use-case, throw() is entirely sufficient. If you don't trust your generators to do the right thing on an exception, you shouldn't be acquiring scarce resources!
It's just more painful with exceptions. It requires us to create some special kind of IteratorAbort exception, and it requires every catch block in a generator to add if-tests.
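A rough sketch of what the throw-based approach would look like in practice. The `IteratorAbort` class is the hypothetical exception named above, and `parseAll` is an invented generator; the point is only that every catch block inside a generator has to recognise and re-throw the abort, whereas return() bypasses catch blocks and still runs finally.

```javascript
// Hypothetical marker exception used to signal "please shut down".
class IteratorAbort extends Error {}

function* parseAll(inputs, log) {
  try {
    for (const text of inputs) {
      try {
        yield JSON.parse(text);
      } catch (e) {
        if (e instanceof IteratorAbort) throw e; // the extra if-test
        log.push("skipped: " + text);
      }
    }
  } finally {
    log.push("cleanup");
  }
}

// Shutting down via throw(): the consumer must also swallow the abort.
const viaThrow = [];
const g1 = parseAll(["1", "oops", "2"], viaThrow);
g1.next();
try {
  g1.throw(new IteratorAbort());
} catch (e) {
  if (!(e instanceof IteratorAbort)) throw e;
}

// Shutting down via return(): catch blocks are skipped, finally still runs.
const viaReturn = [];
const g2 = parseAll(["1", "oops", "2"], viaReturn);
g2.next();
g2.return();
```

Both logs end up containing only the cleanup entry, but the throw-based version needs the marker class, the if-test in the generator, and a catch in the consumer.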
Finally, the given use-case is incompletely specified; a loop can exit prematurely through exceptions as well as through "break".
Yes, also return in a for-of block exits the loop prematurely. Break is a representative example, but the spec would have to invoke .return() on all kinds of abrupt completions.
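As an illustration of the three kinds of abrupt completion mentioned here, the sketch below exits a for-of via break, via return, and via a thrown exception, and records whether the generator's finally block ran. The function names are invented for the example, and the results assume an engine that invokes return() on every abrupt exit from the loop.

```javascript
function* resource(log) {
  try { yield 1; yield 2; }
  finally { log.push("closed"); }
}

function exitViaBreak(log) {
  for (const x of resource(log)) break;
}

function exitViaReturn(log) {
  for (const x of resource(log)) return x;
}

function exitViaThrow(log) {
  try {
    for (const x of resource(log)) throw new Error("boom");
  } catch (e) {
    log.push("caught " + e.message);
  }
}

// Run each variant against its own log.
const results = [exitViaBreak, exitViaReturn, exitViaThrow].map(f => {
  const log = [];
  f(log);
  return log;
});
console.log(results);
```

In the throw case the iterator is closed before the exception reaches the enclosing catch, so "closed" is logged first.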
== Calling return() on early exit from for-of is expensive
Wrapping a try/finally around each for-of is going to be really expensive in all engines right now. I'm skeptical about our ability to optimize this one away. Avoiding try/catch around for-of was one reason to move away from StopIteration, and it would be a pity to re-impose this cost on every for-of because of what is, in the end, an uncommon use case.
This glosses over a critical difference: the StopIteration semantics required catching exceptions on every iteration of the loop. This semantics only requires a check on loop exit.
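The difference can be sketched as a hand-written expansion of for-of. This is a simplified illustration and not the spec algorithm: `forOfSketch`, the `BREAK` sentinel, and the `inBody` flag are all assumptions made for the example. The only try/finally sits outside the loop, and return() is consulted once, on the way out, rather than on each iteration.

```javascript
const BREAK = Symbol("break");    // sentinel a body returns to model `break`

function forOfSketch(iterable, body) {
  const iterator = iterable[Symbol.iterator]();
  let inBody = false;             // true only while user code is running
  try {
    while (true) {
      const step = iterator.next();
      if (step.done) return;      // normal completion: nothing to close
      inBody = true;
      if (body(step.value) === BREAK) return;
      inBody = false;
    }
  } finally {
    // Single check on loop exit: close only if the body exited abruptly.
    if (inBody && typeof iterator.return === "function") iterator.return();
  }
}

function* source(log) {
  try { yield 1; yield 2; yield 3; }
  finally { log.push("closed"); }
}

const earlyExit = [];
forOfSketch(source(earlyExit), x => (x === 2 ? BREAK : undefined));

const fullRun = [];
forOfSketch(source(fullRun), () => {});
```

Iterators without a return method, such as the built-in array iterator, pay only for the `typeof` check on exit.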
I think the expected result of doing this would be performance lore to recommend using other iteration syntaxen instead of for-of.
This argument seems fishy to me. There is no comparable syntax in JS for for-of and generators, so I think the alternative would be stuff like higher-order methods (a la .forEach). I find it hard to believe that a single check on the outside of the loop will make or break their ability to compete with higher-order methods.
There's no perfect answer when it comes to abstractions over scarce resources. Given the constraints of what JS is, its finalization model, its deployment in the browser, and its engines, for me the status quo is the best we can do. I know that for people that open file descriptors, that's somewhat unsatisfying, but perhaps such a cross is what goes with the crown of being a true Unix hacker ;)
This isn't about Unix hackers. We have to be thinking ahead to where custom iteration will really shine in JS, and that is over asynchronous streams of data. Then everyone will be doing this.
I'm sympathetic to the "simplicity" argument that says that .throw is the proper way in JS to clean up an iterator. The .return proposal seems like a kludge to work around the fact that ES6 still doesn't have a discriminatory catch clause that can avoid catching the thrown cleanup exception. But to continue the throw-based design you have to provide a way to throw an exception when exiting the for-of, and I feel like the result is ultimately inferior, both in performance and aesthetics.
I find Dave's rebuttal convincing. From a performance standpoint, it has been mentioned that it is thrown exceptions which are the performance cliff, not merely setting up handlers via try. From that perspective, invoking .return on break or exceptional exit from for-of seems the cheapest safe option.
--scott
ps. fwiw, I believe that concerns about uptake of for of are to some degree putting the cart before the horse: the usual rule of thumb is that 80% of code is not performance critical, and for-of will be used in this code if it helps code clarity, etc. Predictable clean up of iterators may well be one of the factors which allow it to increase code clarity/robustness/etc. If/when developers use it frequently, it will become a target for optimization.
Dave and Andy's responses have me pinging back and forth as to which "side" I'm on. Both seem convincing. Dave's response especially brought the issue into focus for me in a way that I think is clear, so let me explain what I learned from it:
What we are essentially talking about here are two types of things:
- Disposable resources
- Iterable resources
We are talking about both the sync case, and the async case. Andy's contention is that most sync iterable resources are not disposable, whereas Dave's is that most async resources are disposable. Both of these positions seem plausible to me.
The question then comes, should for-of handle iterable resources only (as it does today), or should it take care of disposable resources as well (as it does in e.g. C#)?
Andy's code example, viz.
var file = openFile("foo.txt");
try {
  for (var line of lines(file)) {
    if (line == '-- mark --') {
      break;
    }
  }
} finally {
  file.close();
}
advocates for this separation, putting the burden on the user to handle the disposable part, letting for-of focus on the iterable aspect. Dave advocates that this code become
for (var line of lines(openFile("foo.txt"))) {
  if (line == '-- mark --') {
    break;
  }
}
and the disposableness be handled automatically, which is certainly more convenient for the user.
This second code example, however, hides two kinds of magic: iterability, and disposability, in the same syntax.
An alternative would be to introduce a construct specifically to handle disposability, like C#'s using. You could use it generically such that using(x) { ... } becomes try { ... } finally { x.dispose(); }. In particular the example would become
using (var file = openFile("foo.txt")) {
  for (var line of lines(file)) {
    if (line == '-- mark --') {
      break;
    }
  }
}
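Since the using block is proposed syntax rather than something that exists, the same desugaring can be approximated today with an ordinary helper function. The sketch below assumes a dispose() method as in the expansion above; `openFakeFile` is a hypothetical in-memory stand-in for openFile, and `lines` is an invented generator over its contents.

```javascript
// Helper mirroring: using (x) { ... }  =>  try { ... } finally { x.dispose(); }
function using(resource, body) {
  try {
    return body(resource);
  } finally {
    resource.dispose();
  }
}

// Hypothetical in-memory stand-in for openFile("foo.txt").
function openFakeFile(text, events) {
  return {
    text,
    dispose() { events.push("disposed"); }
  };
}

function* lines(file) {
  yield* file.text.split("\n");
}

const events = [];
const found = using(openFakeFile("a\n-- mark --\nb", events), file => {
  for (const line of lines(file)) {
    if (line === "-- mark --") return true; // early exit from the loop
  }
  return false;
});
```

Disposal here is tied to the helper rather than to for-of, which is the separation of iterability from disposability being argued for.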
This still isn't all that convenient, of course. And going along with Dave's argument, it will become especially inconvenient when async iterables, most of which will be disposable, start appearing. Perhaps this is why C# decided to include both iterable and disposable functionality in their foreach.
But inconvenience is easily solved via MOAR SUGAR:
for (var line using files) {
  if (line == '-- mark --') {
    break;
  }
}
I like this approach for a few reasons:
- It decouples iterability and disposability, giving each a distinct syntactic construct.
- Via sugar, it composes them into something just as convenient as if we had baked both of them into for-of, while giving you an explicit signal of what's going on and what the different semantics are.
- It of course avoids any optimization hazards, being opt-in.
- Most importantly, it pushes off this question into ES7, when we can properly design a counterpart using block to build on top of.
The drawback of this approach is that it doesn't bake a disposability protocol into the language. By saying that for-of will invoke return(), we are essentially saying "if you want a disposable object, use the method named return() to dispose of it." This kind of ecosystem standardization is a good thing. But on the other hand, if in ES6 this disposability protocol is only useful for synchronous iterators---which, as Dave admits, are less likely to represent disposable resources than async ones---then it's unclear that much is gained. I'd rather give the ecosystem another year or so without a standard dispose protocol, if it means we avoid making changes to ES6 this late in the game.
Anyway, regardless of the specifics of my using proposal, I hope that highlighting the iterability vs. disposability aspects of this conversation was helpful to people.
Domenic Denicola wrote:
Dave and Andy's responses have me pinging back and forth as to which "side" I'm on.
Are you off the fence yet? I can't tell :-P.
But inconvenience is easily solved via MOAR SUGAR:
for (var line using files) { if (line == '-- mark --') { break; } }
No, that's syntactic salt. It will be forgotten when needed. It mixes with sugar (for-of) to leave a bad taste. It bloats surface syntax.
The reason to revive close as return is convenience. It's a good reason when fully rationalized. Yes, scenario solving and uncompositional primitives are bad in general. But as Dave argues, the specific case survives by the full rationale given.
I'd rather give the ecosystem another year or so without a standard dispose protocol, if it means we avoid making changes to ES6 this late in the game.
That's not a good argument, compared to the now-or-never one Mark made. Indeed with rapid release, penalizing convenience and waiting for ecosystem effects can make overcomplicated, convenient, and just bad total designs out of piecewise steps that you might like because they avoid committing to convenience :-P.
Design is an art still (Knuth: we can't teach a computer to do it). Robo-processes and ecosystem robots from the future do not replace it.
Brendan Eich wrote:
Indeed with rapid release, penalizing convenience and waiting for ecosystem effects can make overcomplicated, convenient
"inconvenient"
On Tue, Apr 29, 2014 at 2:40 AM, Andy Wingo <wingo at igalia.com> wrote:
== Calling return() on early exit from for-of is expensive
Wrapping a try/finally around each for-of is going to be really expensive in all engines right now. I'm skeptical about our ability to optimize this one away. Avoiding try/catch around for-of was one reason to move away from StopIteration, and it would be a pity to re-impose this cost on every for-of because of what is, in the end, an uncommon use case.
Andy, please see esdiscuss.org/topic/april-8-2014-meeting-notes#content-16. I don't see a big performance cost here.
On Apr 29, 2014, at 12:17 PM, Domenic Denicola <domenic at domenicdenicola.com> wrote:
Anyway, regardless of the specifics of my using proposal, I hope that highlighting the iterability vs. disposability aspects of this conversation was helpful to people.
I do think it's helpful for understanding the space, thanks.
But here's why I don't think it ultimately helps: we are going to need combinators for iterators, and they are going to have to make the same decision. We aren't going to have e.g. map and mapDispose. I still contend that the common case is that iterators are short-lived, and the language should safely ensure they are disposed of. So the combinators should ensure that their underlying iterator is disposed of, and for-of should behave consistently with the combinators.
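One way to see how a combinator can inherit this behaviour: if map is itself written as a generator that consumes its source with for-of, then closing the mapped iterator closes the source as well, with no extra code. This is a sketch under the assumption that for-of calls return() on early exit; `map`, `source`, and the logging are illustrative names, not an API from the thread.

```javascript
// A map combinator written as a generator over a for-of loop.
function* map(iterable, fn) {
  for (const x of iterable) yield fn(x);
}

// Illustrative source with observable cleanup.
function* source(log) {
  try { yield 1; yield 2; yield 3; }
  finally { log.push("source closed"); }
}

const log = [];
const seen = [];
for (const y of map(source(log), x => x * 10)) {
  seen.push(y);
  if (y === 20) break;            // closes map, which in turn closes source
}
```

The break calls return() on the map generator; that unwinds its inner for-of, which calls return() on the source, so disposal propagates down the chain.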
On Tue, Apr 29, 2014 at 4:07 PM, Jason Orendorff <jason.orendorff at gmail.com> wrote:
On Tue, Apr 29, 2014 at 2:40 AM, Andy Wingo <wingo at igalia.com> wrote:
== Calling return() on early exit from for-of is expensive
Wrapping a try/finally around each for-of is going to be really expensive in all engines right now. I'm skeptical about our ability to optimize this one away. Avoiding try/catch around for-of was one reason to move away from StopIteration, and it would be a pity to re-impose this cost on every for-of because of what is, in the end, an uncommon use case.
Andy, please see esdiscuss.org/topic/april-8-2014-meeting-notes#content-16. I don't see a big performance cost here.
I also found Allen's response to my question at <esdiscuss.org/topic/april-8-2014-meeting-notes#content-25> informative.
Minor correction to Domenic's comments in this (interesting) discussion; IEnumerable and IDisposable are separate concepts in C#. Neither IEnumerable nor IEnumerator are disposable objects in C#; however, if you use 'foreach' on an object that yields an enumerator that is also disposable, the compiler-generated sugar for the enumeration will dispose it.
This is an important distinction: As described in the discussion thread here, not all enumerables and/or enumerators are disposable. Many of them are not, and have zero cleanup logic, so there's no need to include or call a Dispose method. That sugar support is explicitly there for enumerators that, if disposed explicitly, can release important resources earlier than the garbage collector would (file handles, sockets, etc.)
On the other side (enumerator/enumerable authoring), C# solves this rather lazily: You can use try/finally (but not try/catch) around a 'yield' in an enumerator function, allowing you to do basic cleanup - the cleanup logic is moved into the compiler-generated object's Dispose method. This means that if you want cleanup, you opt into it explicitly, and use the same mechanism you use in normal non-enumerator functions. There's one caveat that this causes unexpected disposal behavior in some corner cases (due to how C# initializes generators), but that's more of an issue with their generator implementation than anything else.
On 29 April 2014 20:35, David Herman <dherman at mozilla.com> wrote:
On Apr 29, 2014, at 12:40 AM, Andy Wingo <wingo at igalia.com> wrote:
Indeed I expect that in practice most iterators in an ES6 program will be map, set, and array iterators, which in practice will not be implemented with generators.
I strongly disagree with this. Generators will by far be the most convenient and common way to implement iterators, regardless of their data source.
Yes, but Andy was talking about VM-provided iterators, where convenience of implementation does not matter. It is safe to assume that for all VMs a generator-based implementation will be substantially more expensive than a hand-written one (and will remain so in the foreseeable future, I'm pretty sure of that -- e.g. inlining is tricky for generators). So VM implementers won't use generators internally.
I don't think I understand the issue. AFAICT, all the system implemented iterators don't need to clean up anything that's not already cleaned up by GC, so they wouldn't need a .return method anyway. Is there a counter-example?
On Tue, Apr 29, 2014 at 4:32 AM, Kevin Smith <zenparsing at gmail.com> wrote:
I agree with pretty much everything Andy had to say, and would like to add a meta-perspective:
We should be viewing this as a last-minute feature request. The churn that this request has introduced is (in my opinion) exactly the kind of problem that the ES7 process is meant to address. In fact, I would go so far as to say that requirements churn has been the number one problem during ES6 development.
If the "iterator-as-resource-manager" feature is truly desirable (and I'm not even convinced that it is), then the appropriate action is to figure out a way to defer it until ES7. It is simply too late to add this feature now.
Btw, I still agree that if a way can be found to defer this to ES7 without breaking anything, then or now, we should. Suggestions?
To date, none of the suggested means of deferrals work. The only reason we're considering this for ES6 is we do not see any other viable choice. Please provide us one.
Can someone summarize the argument as to why this can't be added later? Is the fear that people will create iterators with return methods and depend on those methods not being called, but then ES7 would start calling them?
Domenic: the argument against is that changing the semantics of for of -- and all of the standard library methods, in the case of exceptional exit -- would result in a user-visible change to the state of the iterator. That is, the iterator would now be closed, whereas ES6 as it stands now would let you continue iteration.
The various proposed deferrals don't prevent the user from observing a different iterator state.
The only working deferral I can think of is to remove for of from the spec entirely, and prevent most methods from taking Iterators as arguments (since the completion state of the Iterator would change in ES7). You might be able to get away with taking Iterables as arguments, even though the user could (for example) abuse the Iterable interface to return a singleton Iterator and thus observe its state. But they'd have to work hard to subvert the spec that way. This 'deferral' would be a huge change to the spec, but one would expect most implementors to ignore the neutered ES6 spec and just implement ES7 iterators.
--scott
I'm not concerned about manually added .return methods, nor manual calls to .return methods, though perhaps I should be. If this is a concern, we could solve it by using an @return symbol in ES7 rather than a string name. Neither am I concerned about system iterators, assuming that none of the system iterators needs a .return method, which depends on Andreas' response to my question.
Rather, I am concerned specifically about the interaction of for/of and generators, even in the absence of any finally clauses in the generator code. If the for/of exits early, under this proposal, the generator will be put into a non-resumable state. In current ES6, the generator is resumable. If ES6 ships with the generators resumable under these conditions, I doubt we will dare make them non-resumable in ES7.
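The observable difference can be shown with a generator that has no finally clause at all. This sketch assumes an engine that implements the proposed behaviour, where early exit from for-of calls return() on the generator; the names are illustrative.

```javascript
function* numbers() {
  yield 1;
  yield 2;
  yield 3;
}

const gen = numbers();
for (const n of gen) {
  if (n === 1) break;             // early exit closes the generator
}

// Under the proposal the generator is finished: no more values come out.
// Without the return() call, this next() would have produced 2 instead.
const afterBreak = gen.next();
console.log(afterBreak);
```

Code written against the non-closing behaviour that resumes `gen` after the loop would silently see an exhausted iterator once the semantics changed, which is why the change cannot easily be deferred.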
I'm not pointing out an issue in particular, just explaining why Andy is right with his specific statement about implementations not using generators (which Dave seemed surprised about, or may have misread, I'm not sure).
(You are right to assume that built-in iterators would not need .return.)
On Fri 25 Apr 2014 16:22, Domenic Denicola <domenic at domenicdenicola.com> writes:
So in the meeting notes the consensus was to send the issue to "generator champions" -- Brendan and David Herman, and somehow I ended up on CC. We had some backs and forths but as it seems that TC39 members are choosing to discuss this issue here, I'll repost my initial note here. Please read the meeting notes for a description of Jafar's use case. I'm a bit grumpy that this is being brought up again, and this late, and in multiple forums, but as it seems that people want to talk about it again, that talking about it again is the thing to do...
If I may summarize Jafar's argument, it's that the iterator in a for-of may hold a scarce resource, like a file descriptor, and because of that, for-of should be able to release this scarce resource on an early exit via "break". The provisional consensus elaborates a method to do this.
Is this a fair summary?
I sympathise with Jafar's plight but I think that the current setup is the best we can do. The summary of my argument is this:
(1) calling return() on iterators is rarely appropriate;
(2) return() in generators is semantically weird; and
(3) making for-of call return() on early exit is expensive at run-time.
I should note first that this situation is not limited to generators, so the starting point mention of "finally" blocks is something of a distraction. To the extent that this issue applies to generators, it also applies to other kinds of iterators. Indeed I expect that in practice most iterators in an ES6 program will be map, set, and array iterators, which in practice will not be implemented with generators. Incidentally I think that if TC39 decided to re-add this method, it should be called close() instead, because it doesn't make sense to "return" from a non-generator iterator.
== Calling return() on iterators is rarely appropriate
Again I do sympathise with the use case, but we should start with a discussion of what is the common case.
If we knew that the @@iterator call in the for-of would return a fresh iterator, then it would make more sense to provide some means for closing on early exit. Jafar argues that this is in fact the common case, which sounds about right to me.
However, holding a scarce resource is also likely to be uncommon. It certainly doesn't come up in the browser, for example. I think it's reasonable in that rare case to require some thought on the part of the user as to what scarce resources they have acquired, and to arrange to release them as appropriate.
Granted, if you are a user of an iterator, you might not know that it has a scarce resource. So there are two cases here: one in which the iterator was created by its consumer, and one in which the consumer is decoupled from the producer.
The first case is the one Jafar gives in his notes:
for (var line of openFile("foo.txt")) if (line == '-- mark --') break;
However in this case it is possible to arrange to close the iterator, with a different interface:
var file = openFile("foo.txt"); try { for (var line of lines(file)) if (line == '-- mark --') break; } finally { file.close(); }
Among other possibilities. Something like Python's "with" might be appropriate here. The point is that although in this case, calling return() on the iterator may indeed be appropriate, the desired behavior can still be implemented.
Note that there is nothing special about for-of or iterators in this example; any abstraction that captures a scarce resource has to do the same thing. It is not that generators are unable to abstract over IO -- it is that they are unable to transparently abstract over scarce resource acquisition. No surprise there.
The other case is when you have an iterator consumer which is decoupled from the code that created the iterator, as in:
function (iterable) { ... for (var x of iterable) { if (foo(x)) break; } ... }
But it is precisely in this case when you would not want to close the iterator, because you don't know its lifetime.
== return() in generators is semantically weird
I know the argument has already been made, but I would like to repeat my point (2) from esdiscuss/2013-May/030683, namely that close() "complicates the mental model of what happens when you yield." It's really strange to consider a yield as not only an expression that produces a value, or possibly a point at which an exception could be thrown, but also a "return". Bizarre. It's a hazard to reading generator functions.
Also, the insistence on a return() that doesn't run catch blocks seems to me to be ill-placed. I think it's telling that the counter-examples are from Python, which has a different semantic model, as it has finalization. Implementing abstractions over scarce resources in JS is going to necessarily involve different design patterns than those used by Python. For the given use-case, throw() is entirely sufficient. If you don't trust your generators to do the right thing on an exception, you shouldn't be acquiring scarce resources!
Finally, the given use-case is incompletely specified; a loop can exit prematurely through exceptions as well as through "break". So really what is proposed is a finally block in every for-of statement, which brings me to my next point...
== Calling return() on early exit from for-of is expensive
Wrapping a try/finally around each for-of is going to be really expensive in all engines right now. I'm skeptical about our ability to optimize this one away. Avoiding try/catch around for-of was one reason to move away from StopIteration, and it would be a pity to re-impose this cost on every for-of because of what is, in the end, an uncommon use case. I think the expected result of doing this would be performance lore to recommend using other iteration syntaxen instead of for-of.
There's no perfect answer when it comes to abstractions over scarce resources. Given the constraints of what JS is, its finalization model, its deployment in the browser, and its engines, for me the status quo is the best we can do. I know that for people that open file descriptors, that's somewhat unsatisfying, but perhaps such a cross is what goes with the crown of being a true Unix hacker ;)
Andy