Immediate closing of iterators
On Dec 14, 2006, at 9:01 PM, Jeff Thompson wrote:
This is a follow-up to a bugzilla discussion at: bugzilla.mozilla.org/show_bug.cgi?id=349326
Jeff, thanks for writing. I'll try to add relevant detail below,
since this question is complicated by a bug you hit when testing in
the js shell built from SpiderMonkey sources, I think. Plus, the
whole topic is complicated, period.
var was_closed = false;
function gen() {
  try {
    yield 1;
  } finally {
    was_closed = true;
  }
}
for (var i in gen()) break;
print("was_closed=" + was_closed);
Right now, this prints "was_closed=false" because when it breaks
out of the for loop, Javascript does not close the iterator, and does not execute the
finally clause to set was_closed true.
Only if you run the above in the js shell, which shows up a bug
(bugzilla.mozilla.org/show_bug.cgi?id=363917). I just fixed
that bug in my build, and with this variation on your testcase:
var was_closed = false;
function gen() {
  try {
    yield 1;
  } finally {
    was_closed = true;
    print("closed!");
  }
}
for (var i in gen()) break;
print("was_closed=" + was_closed);
gc(); // <=== force a GC here
I get this output:
was_closed=false
global
closed!
The gc() call is needed, otherwise the shell does a final GC after
its global object has become unreachable, and any generator scoped by
an unreachable global object is not closed, which is intentional (see
below for why).
This HTML version of your test:
<textarea id="t" rows="4"></textarea>
<script type="application/javascript;version=1.7">
var tarea = document.getElementById('t');
var print = function (s) { t.value += s + '\n'; }
var was_closed = false;
function gen() {
  try {
    yield 1;
  } finally {
    was_closed = true;
    print("closed!");
  }
}
for (var i in gen()) break;
print("was_closed=" + was_closed);
function toString() { return "global"; }
print(this);
</script>
works as expected (if not as desired), with no forced GC (Firefox
runs a GC soon enough after page load that you can see the "closed!"
text in the textarea appear, but off the page-load critical path).
This may seem cold comfort, since the finalization happens well after
the loop terminates.
IOW, what's wanted is not a guarantee that close always runs the
generator (a finally clause). If ES4 did require that, trivial
denial-of-service attacks exist; this is why JS1.7 doesn't close
generators that are unreachable if their scope is unreachable.
What's wanted is that close run promptly only in the case of a for-
in loop where no reference to the iterator-generator escapes to the
heap.
This is a reasonable thing to want, and it's what
https://bugzilla.mozilla.org/show_bug.cgi?id=349326 requests. The question
is, should ES4 require such prompt finalization in the case where the
for-in loop creates the iterator and it never escapes to the heap?
Here's another variation on your testcase:
var was_closed = false;
function gen() {
  try {
    yield 1;
  } finally {
    was_closed = true;
    print("closed!");
  }
}
var i = gen().next();
print("was_closed=" + was_closed);
With the js shell bugfix, but without the explicit gc() call at the
end, this too will fail to close the iterator returned by gen(). But
there's no for-in loop here, so if you think this case shows a bug,
then you are not just asking for for-in loops to close non-escaping
generator-iterators -- you are asking all ES4 implementations to use
reference counting or something equivalent, which promptly finalizes
unreachable generator-iterators. That's a harsher requirement for a
standard to make on all implementations.
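For contrast with the for-in case, here is a minimal sketch of the no-loop variation with an explicit close. It assumes ECMAScript 2015 generator syntax (function*, with return() playing the role that close() plays in this thread), which postdates this discussion and is not the JS1.7 or ES4 API. The names wasClosed, closedBefore and closedAfter are only illustrative.

```javascript
// Assumption: ES2015 generators; return() stands in for the close() discussed here.
let wasClosed = false;

function* gen() {
  try {
    yield 1;
  } finally {
    wasClosed = true; // runs only when the generator is closed or finishes
  }
}

const it = gen();
const first = it.next().value;  // the generator is now suspended inside the try block
const closedBefore = wasClosed; // still false: nothing has closed the generator yet

it.return();                    // explicit close: resumes the generator with a forced return
const closedAfter = wasClosed;  // true: the finally clause has run
```

Without the explicit return() call, the finally clause here would simply never run; when, or whether, something else should close the generator is exactly the finalization question under discussion.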
I want to argue that Javascript should close the iterator:
- No guarantee is made that the 'finally' clause will ever be executed, so if you need it to close a file, etc. it might never
happen.
You can't count on deferred scripted functions such as timeouts
running in the browser. The page may be unloaded and the timeout
canceled. The situation with close hooks (finally clauses in
generators) is entirely analogous.
Also (and this is really an aside -- I'm not arguing about the for-in-
loops-should-close-non-escaping-generator-iterators point):
finalization should never be required to promptly free scarce
resources that have been explicitly allocated by a program. ECMA
requires some kind of GC, but not promptly finalizing GC.
Note that finalization is never canceled, but finally clauses in a
scripted finalize hook (that's what a generator that yields from a
try with a finally is) may have to be canceled, just as timeouts may
be canceled.
- This seems the opposite of how 'finally' works, which usually
means that you can be sure it will execute, even if the 'try' block throws an
exception, etc.
Python actually does not guarantee that finally clauses in all
generators run -- if the generator misbehaves by yielding while being
closed, then an outer finally may not be run. This is considered the
best way to deal with a misbehaving generator. The ES4 design avoids
this by throwing an exception from yield during close, which runs
finallys on the way out if uncaught.
Anyway, the js shell bug aside, the issue for es4-discuss is not
whether finally must always run (it can't or DOS attacks are trivial
using generators); it's whether for-in loops should promptly finalize
non-escaping iterators.
- Python 2.5 and C# both close the iterator and execute the
'finally' clause (as one would expect).
- In fact, before version 2.5, Python gives a compiler error for
this code: 'yield' not allowed in a 'try' block with a 'finally' clause.
I guess this is because before 2.5, Python did not always close
the iterator when leaving a for loop from an exception, etc., so they didn't let
you write code that wouldn't work like you expect.
Right; we prototyped this but then tracked 2.5; later, we eliminated
GeneratorExit (see the python-dev thread mentioned above).
So, could the spec require Javascript to always close the iterator
when leaving a loop?
So long as it's a non-escaping generator-iterator created by the
loop, then the spec could mandate that. It requires some extra work
by non-reference-counting implementations. They need to keep track
of such generator-iterators across nested loops in each live function
or script activation, and close each generator-iterator as control
exits its loop.
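The bookkeeping described above (the loop creates the generator-iterator, nothing else can see it, and the loop closes it on every exit path) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses ECMAScript 2015 generators, where return() plays the role of close(), and loopOver is a hypothetical helper, not anything from SpiderMonkey or the ES4 proposal.

```javascript
// Hypothetical helper: drives a generator-iterator that it created itself, so the
// iterator cannot escape, and closes it however the loop is left.
function loopOver(makeIterator, body) {
  const it = makeIterator(); // newborn iterator, referenced only by this activation
  try {
    for (let r = it.next(); !r.done; r = it.next()) {
      if (body(r.value) === false) break; // the body returns false to "break"
    }
  } finally {
    it.return(); // close on normal exit, break, or exception (a no-op if already finished)
  }
}

const events = [];

function* gen() {
  try {
    yield 1;
    yield 2;
  } finally {
    events.push("closed");
  }
}

loopOver(gen, (v) => {
  events.push(v);
  return false; // leave the loop after the first value
});
events.push("after loop");
```

The close happens before control continues past the loop, with no help from the GC, because the helper is the only holder of the iterator.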
We'll talk about it in the current TG1 meeting, which finishes
tomorrow (day 3). More then, or here in this thread if TG1'ers prefer.
Brendan Eich wrote:
What's wanted is that close run promptly only in the case of a for-in loop where no reference to the iterator-generator escapes to the heap.
This is a reasonable thing to want, and it's what bugzilla.mozilla.org/show_bug.cgi?id=349326 requests.
Yes, as you say, I'm only talking about the case of the for-in loop. Thanks for looking into this.
On Dec 15, 2006, at 11:34 AM, Jeff Thompson wrote:
Brendan Eich wrote:
What's wanted is that close run promptly only in the case of a
for-in loop where no reference to the iterator-generator escapes
to the heap. This is a reasonable thing to want, and it's what https://bugzilla.mozilla.org/show_bug.cgi?id=349326 requests.
Yes, as you say, I'm only talking about the case of the for-in loop. Thanks for looking into this.
TG1 members here today agreed to specify normative prompt close if
the for-in's generator-iterator can't escape -- yay!
I'll comment in the Mozilla bug.
On Dec 14, 2006, at 11:51 PM, Brendan Eich wrote:
<textarea id="t" rows="4"></textarea> <script type="application/javascript;version=1.7"> var tarea = document.getElementById('t'); var print = function (s) { t.value += s + '\n'; }
Heh; this works, but careful readers may wonder why, since the print
function uses t.value, not tarea.value. Turns out Firefox emulates an
IE DOM quirk, only if the name lookup would fail, where elements
given ids (or names? I forget) reflect as global properties. Sick,
but we found too many pages counting on it without detecting non-IE
user agent.
So, could the spec require Javascript to always close the iterator when leaving a loop?
So long as it's a non-escaping generator-iterator created by the loop, then the spec could mandate that. It requires some extra work by non-reference-counting implementations. They need to keep track of such generator-iterators across nested loops in each live function or script activation, and close each generator-iterator as control exits its loop.
Unless the definition of "created by the loop" is very strict won't this effectively mandate a full GC whenever you leave a for-in loop? Even reference counting implementations would have to detect unreachable cycles in the object graph.
On Dec 20, 2006, at 3:02 AM, Chris Hansen wrote:
So, could the spec require Javascript to always close the iterator when leaving a loop?
So long as it's a non-escaping generator-iterator created by the loop, then the spec could mandate that. It requires some extra work by non-reference-counting implementations. They need to keep track of such generator-iterators across nested loops in each live function or script activation, and close each generator-iterator as control exits its loop.
Unless the definition of "created by the loop" is very strict won't this effectively mandate a full GC whenever you leave a for-in loop? Even reference counting implementations would have to detect unreachable cycles in the object graph.
function gen() { yield 1; yield 2 } for (let i in gen()) print(i)
The generator function gen, when invoked to the right of 'in' in the
loop head, constructs a generator-iterator that is referenced only by
the implementation (a stack slot in typical implementations). Its
reference can't be discovered by any meta-object protocol. It should
die when the loop completes, abruptly or normally.
function gen() { yield 1; yield 2 } for (let i in gen()) print(i)
The generator function gen, when invoked to the right of 'in' in the loop head, constructs a generator-iterator that is referenced only by the implementation (a stack slot in typical implementations). Its reference can't be discovered by any meta-object protocol. It should die when the loop completes, abruptly or normally.
That's true, the question is: how do you make sure that it's only on the stack and not stored in some instance variable or closure if you don't run a full GC?
function gen1() { yield 1; yield 2; }
function gen2() { globalVar = gen1(); return globalVar; }
for (let i in gen2()) { print(i); }
In this case the generator shouldn't be closed. If gen2 had been defined as
function gen2() { return gen1(); }
I would expect that it should.
On Dec 20, 2006, at 3:49 PM, Chris Hansen wrote:
In this case the generator shouldn't be closed. If gen2 had been
defined as
function gen2() { return gen1(); }
I would expect that it should.
This is, as you say, asking for a full reference graph analysis. We
don't propose that.
Consider two cases:
1. The Mozilla bug Jeff Thompson mentioned, https://bugzilla.mozilla.org/show_bug.cgi?id=349326, wants only a generator-iterator created "under the hood" to be closed promptly:
for (i in o) break;
where o denotes some object that's not an iterator -- an object that
does not have an iterator::get method at all, or whose iterator::get
returns a different object (o is the iterable, the returned object is
its iterator).
In this case, the for-in loop can tell whether it is calling a well-
known iterator::get native method, or the default used when there is
no o.iterator::get, whose result is newborn and where the result
can't escape. The returned iterator could be a generator-iterator:
o.iterator::get = function () { yield 1; yield 2 }
The internal Generator class's iterator::get method (called
__iterator__ in JS1.7 in Firefox, for want of full namespace support)
is immutable (in ECMA-262 terms a generator-iterator's iterator::get
property has the ReadOnly and DontDelete attributes set). This
method returns its |this| parameter, since a generator is an iterator.
The for-in loop can tell whether o's iterator::get (mutable, but it
doesn't matter for this analysis) references a generator-function.
It knows that the generator-function will return a new generator-
iterator that can't escape, because the Generator class prototype's
iterator::get method can't be replaced or shadowed, so its return
value can't be spied on. Therefore in this case, the implementation
can guarantee prompt close without a full heap scan.
2. To handle the
for (i in gen()) break;
case, the implementation would need to recognize gen, whether via
static analysis in strict mode or in any sufficiently optimizing
implementation, or dynamically in standard mode or in cases where
static analysis can't decide, as a generator function. At this point
the analysis for case 1 applies.
Anything else (function returning generator-iterator that might have
escaped to a global, etc.) defeats the prompt close promise.
Does this work for the cases users care about? I believe it does,
but welcome counterexamples.
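The two cases can be illustrated with a rough sketch in ECMAScript 2015 syntax, which postdates this thread. It is an assumption-laden stand-in: Symbol.iterator plays the part of iterator::get, for-of plays the part of for-in, and the log array is illustrative. ES2015 also closes the iterator on any abrupt loop exit without checking whether it escaped, so it shows the prompt-close behavior rather than the escape conditions discussed above.

```javascript
const log = [];

function* gen() {
  try {
    yield 1;
    yield 2;
  } finally {
    log.push("closed");
  }
}

// Case 1: o is an iterable, not an iterator. The loop obtains a newborn
// generator-iterator by calling o[Symbol.iterator]() under the hood.
const o = { [Symbol.iterator]: gen };
for (const i of o) break;
log.push("after case 1");

// Case 2: the generator function is called directly in the loop head.
for (const i of gen()) break;
log.push("after case 2");
```

In both loops the finally clause runs as control leaves the loop, before the statement that follows it.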
This means that you can't even delegate a call that returns a generator without voiding this guarantee?
class CollectionWrapper { var collection = ...some collection...; function gen() { return this.collection.gen(); } }
var c = new CollectionWrapper(); for (i in c.gen()) break;
If the guarantee is voided so easily, and by operations that users will perceive as completely trivial, I think people should be very reluctant to ever rely on it. And then why even have it?
On Dec 21, 2006, at 3:18 AM, Chris Hansen wrote:
This means that you can't even delegate a call that returns a generator without voiding this guarantee?
class CollectionWrapper { var collection = ...some collection...; function gen() { return this.collection.gen(); }
You're right, the case 2 analysis I wrote up in my last message was
too restrictive. If you put a type annotation on function gen, then
the guarantee should hold:
class CollectionWrapper { var collection = ...some collection...; function gen():Generator.<*,*,*> { return this.collection.gen(); } }
(You could use narrower types than * if appropriate.)
From developer.mozilla.org/es4/proposals iterators_and_generators.html:
"A function containing a yield expression is a generator function,
which when called binds formal parameters to actual arguments but
evaluates no part of its body. Instead, it returns a generator-
iterator of nominal type Generator:
class Generator.<O, I, E> {
    public function send(i: I) : O;
    public function next() : O;
    public function throw(e : E) : O;
    public function close() : void
}; "
So the guarantee has two conditions: that the object to the right of
'in' in the 'for-in' loop is not an iterator (case 1 in my last
message); or if it is, that it is a generator-iterator (case 2,
revised). A generator-iterator is an instance of Generator.<O,I,E>.
In this light, case 1 is really just an optimization to allow
finalization of any non-generator iterator created for an iterable o
given for (i in o), where o has an iterator::get method that's not a
generator-function. It's not just an early out to calling close,
because the iterator may not have a close hook (may not be a
Generator); the benefit is that it can be finalized promptly. This is
informative not normative, since case 2 would uphold the guarantee. I
wasn't clear last time, sorry about that.
You're right, the case 2 analysis I wrote up in my last message was too restrictive. If you put a type annotation on function gen, then the guarantee should hold:
class CollectionWrapper { var collection = ...some collection...; function gen():Generator.<*,*,*> { return this.collection.gen(); } }
But if the rule is less restrictive I would claim that my original objection applies: you need to run a full gc whenever a loop exits abruptly to determine whether or not the generator is reachable:
class CollectionWrapper {
  var collection = ...some collection...;
  function gen():Generator.<*,*,*> {
    var result = this.collection.gen();
    if (someCondition) myGlobal = result;
    return result;
  }
}
On Dec 21, 2006, at 9:55 AM, Chris Hansen wrote:
You're right, the case 2 analysis I wrote up in my last message was too restrictive. If you put a type annotation on function gen, then the guarantee should hold:
class CollectionWrapper { var collection = ...some collection...; function gen():Generator.<*,*,*> { return this.collection.gen(); } }
But if the rule is less restrictive I would claim that my original objection applies: you need to run a full gc whenever a loop exits abruptly to determine whether or not the generator is reachable:
Sorry, caffeinated now. The outcomes to avoid are:
1. Requiring a full gc on loop exit, obviously a non-starter for
performance reasons.
2. Requiring reference counting of all implementations.
3. Requiring static escape analysis of all implementations.
What I was groping for with the revision to case 2 is this: a way for
the type system to promise no escape, and for all implementations to
trivially check it, when a delegate returns the result of a generator-
function call. This type-checking can't require 3, so the form of
the delegate would have to be restricted to linear flow, if not
return of generator call.
Given the constraints, what's better: simple rules for direct
generator calls, or special-casing in the type system just to allow
delegation with guarantee of prompt close?
In C#, as I understand it, they sidestep this problem by (I'll try to formulate this in terms of ES4) simply not allowing you to iterate over generators. You can iterate over objects with an iterator::get method but the returned object is owned by the loop and if it is a generator it will be closed on loop exit whether or not others have references to it. Also, a generator doesn't have an iterator::get method since that would complicate the question of who "owns" it.
That only "solves" Jeff's problem by disallowing it (ta-daa! ;-) but it does away with the need for any kind of finalization, prompt or not. In my experience (from java) GC finalization is something you want to steer well clear of.
What do you think?
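The "loop owns the iterator" rule described in this message can be seen in a small sketch. It assumes ECMAScript 2015 semantics, which postdate this thread and happen to close the iterator on abrupt loop exit whether or not other references exist; it is not C# and not the ES4 proposal. gen1, gen2 and globalVar follow the earlier escaping example, and closed and afterLoop are illustrative names.

```javascript
let closed = false;
let globalVar;

function* gen1() {
  try {
    yield 1;
    yield 2;
  } finally {
    closed = true;
  }
}

// The generator-iterator escapes to a global before the loop ever sees it.
function gen2() {
  globalVar = gen1();
  return globalVar;
}

for (const i of gen2()) break; // the loop closes the iterator on the way out

// The escaped reference now points at a finished generator.
const afterLoop = globalVar.next();
```

No reachability analysis is involved: the loop closes whatever it was iterating, and the holder of the other reference simply finds the generator already done.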
On Dec 21, 2006, at 12:04 PM, Chris Hansen wrote:
In C#, as I understand it, they sidestep this problem by (I'll try to formulate this in terms of ES4) simply not allowing you to iterate over generators. You can iterate over objects with an iterator::get method but the returned object is owned by the loop and if it is a generator it will be closed on loop exit whether or not others have references to it.
That's not Pythonic, but it certainly is simple. I like it.
Also, a generator doesn't have an iterator::get method since that would complicate the question of who "owns" it.
This is not a problem in the ES4 proposal, or in Python. Ownership
of storage and close are coupled only to guarantee that close happens
eventually, even if the client code fails to call gen.close()
explicitly. More below.
That only "solves" Jeff's problem by disallowing it (ta-daa! ;-) but it does away with the need for any kind of finalization, prompt or not. In my experience (from java) GC finalization is something you want to steer well clear of.
Finalization is definitely two-phase in systems that have to support
close (which might resurrect the generator) and then release its
storage. Those of us burdened with GC-based memory management for ES/
JS/AS implementations have to dance with the GC here.
The extensions in SpiderMonkey were not pretty, but became better
after some over-general wrong turns were undone. The current code is
quite concrete: only generators participate in the close phase;
scheduling close operations requires cooperation from the browser
embedding, to prevent trivial Denial Of Service attacks (similar to
those possible already via setTimeout). IOW, a browser embedding may
cancel outstanding close ops in some cases (e.g., the document whose
script created the generator has been unloaded already).
For ref-counting implementations, there's still the possibility of a
reference cycle. So CPython, e.g., has to run close from its cycle
collector, if it didn't already get called by the client code.
CPython doesn't worry about DOSes from generators that resurrect
themselves or spawn more generators from their close hooks, AFAIK.
What do you think?
It's a great simplifying change, but it doesn't avoid the need to add
a close protocol to one's GC, as noted above (there's no way around
that problem, so I'm not dinging your suggestion, just trying to
separate issues). I'll run it up the pole with SpiderMonkey hackers
and look for comments here from other ES4 parties.
Also, a generator doesn't have an iterator::get method since that would complicate the question of who "owns" it.
This is not a problem in the ES4 proposal, or in Python. Ownership of storage and close are coupled only to guarantee that close happens eventually, even if the client code fails to call gen.close() explicitly. More below.
By the "owner" of a generator I meant the loop responsible for closing it, not the storage owner. But maybe there should be an iterator::get method on generators -- people just have to be aware that loops close generators, which might be confusing.
That only "solves" Jeff's problem by disallowing it (ta-daa! ;-) but it does away with the need for any kind of finalization, prompt or not. In my experience (from java) GC finalization is something you want to steer well clear of.
Finalization is definitely two-phase in systems that have to support close (which might resurrect the generator) and then release its storage. Those of us burdened with GC-based memory management for ES/ JS/AS implementations have to dance with the GC here.
I see, you would still need finalization to guarantee that generators not created by loops are eventually closed. But is that a guarantee you actually need to make -- especially if it complicates the implementation and potentially opens the browser up for a new type of DOS attacks? C# doesn't guarantee this. In java, even though you're guaranteed that finalizers will be run, they advise people not to rely on them. Instead, it could just be part of the contract on Generator: if you create it, you have to close it (unless you know that close is a no-op). It's not something that people are likely to do often and I think they will close explicitly anyway rather than rely on finalization, which adds a source of nondeterminism to a program. Especially if the browser might actually cancel close ops.
Also, having finalization will mandate a non-conservative GC.
On Dec 21, 2006, at 2:43 PM, Chris Hansen wrote:
Also, a generator doesn't have an iterator::get method since that would complicate the question of who "owns" it.
This is not a problem in the ES4 proposal, or in Python. Ownership of storage and close are coupled only to guarantee that close happens eventually, even if the client code fails to call gen.close() explicitly. More below.
By the "owner" of a generator I meant the loop responsible for closing it, not the storage owner.
I see. That makes sense in the context of a loop or comprehension,
but generators are used otherwise. The "Motivation" lead-in from PEP
342 (www.python.org/dev/peps/pep-0342) says:
[PEP 255 generators, which lacked close] do not allow execution to be
paused within the "try" portion of try/finally blocks, and therefore
make it difficult for an aborted coroutine to clean up after itself.
Automating close only for generators iterated by for-in loops leaves
non-loop use cases that nevertheless need to clean up after
themselves out in the cold.
But maybe there should be an iterator::get method on generators
I think we should stick to the Pythonic rule that iterators return
themselves from their iterator::get method (Python calls it __iter__
but we can avoid ugly names with less::ugly namespacing).
-- people just have to be aware that loops close generators, which might be confusing.
Yes. A Python hacker moving to ES4 might be outraged by any
deviation, but we've already done away with GeneratorExit in favor of
a forced return (which PEP 325 suggested as one of two approaches,
not the one PEP 342 chose; python-dev interactions have led to both
PEP 342 authors favoring the forced-return approach in a future
version of Python). I've argued that we should avoid gratuitous
differences with Python, given that we are specifying Python-like
generators and not threads, call/cc, or general coroutines for ES4.
But clearly we can afford to diverge, or even try to anticipate, at
the boundary cases.
OTOH, it sounds like C# (and perhaps IronPython? How would it do
otherwise given the single GC ruling the CLR?) may set the "close on
loop exit" expectations for another segment of hackers.
We can document this and stick to it, my gut says.
That only "solves" Jeff's problem by disallowing it (ta-daa! ;-) but it does away with the need for any kind of finalization, prompt or not. In my experience (from java) GC finalization is something you want to steer well clear of.
Finalization is definitely two-phase in systems that have to support close (which might resurrect the generator) and then release its storage. Those of us burdened with GC-based memory management for ES/JS/AS implementations have to dance with the GC here.
I see, you would still need finalization to guarantee that generators not created by loops are eventually closed. But is that a guarantee you actually need to make -- especially if it complicates the implementation and potentially opens the browser up for a new type of DOS attacks?
As Gosling remarked over a decade ago, "DOS attacks are a dime a
dozen" (my paraphrase ;-).
The DOS hazard is like setTimeout, and we've dealt with it for
Firefox 2. We don't bother closing any generator-iterator whose
static scope parent (the window in whose scope the generator was
constructed) is unmarked after the GC's mark phase. In this case,
both the generator and its window object are gone (just not yet
finalized). If you reparent a generator from window A to window B,
by creating it in a script loaded in A, storing it in B, unloading or
closing window A, and then unloading or closing B without manually
calling the generator's close, again the system won't call close for
you.
Any long-lived generator whose window affinity changes will have to
be manually closed.
But consider the general case, outside of the cramped world of web
page scripts: XUL apps and extensions, or other SpiderMonkey
embeddings, will use generators in various ways, and not be subject
to DOS attacks. These embeddings expose stateful APIs to trusted
code, and failing to call close may leak an OS resource or fail to
synchronize important state.
C# doesn't guarantee this. In java, even though you're guaranteed that finalizers will be run, they advise people not to rely on them. Instead, it could just be part of the contract on Generator: if you create it, you have to close it (unless you know that close is a no-op). It's not something that people are likely to do often and I think they will close explicitly anyway rather than rely on finalization, which adds a source of nondeterminism to a program. Especially if the browser might actually cancel close ops.
Agreed on the wisdom of untimely cleanup from finalization.
However, the try/finally issue still makes us want to follow Python,
for general purposes (not just browser purposes). Lacking a DOS
threat, and assuming the generator is written correctly, we convinced
ourselves this summer that finally clauses should run from close
after the last yield in a try has returned a value, and the caller or
the GC is done with the generator. If the caller forgets to close,
finally still should run (says the ES3 spec in all of the cases it
defines).
You could argue that breaking finally to avoid DOS attacks means no
one can count on it in a generator that yields from its try block,
but I think that overstates the case.
We tried that out on some Pythonistas with an earlier variation:
outer finally should run even if the generator yields when called
from close. The response was "the generator's broken, it should get
an exception immediately rather than a chance to run outer finally
blocks." Python is not totally consistent here, since an explicit
close that yields will fail, aborting the close, and the GC will
retry, so the misbehaving generator will get another chance to run.
In Firefox 2, a bad generator of the form
function badsanta() {
  try {
    yield "rock";
  } finally {
    try {
      yield "barf";
    } finally {
      print("closing");
    }
  }
}
var it = badsanta();
it.next();
it.close();
will throw from the 'yield "barf"', and print "closing" from the
inner finally. We say if you're bad, you still get finally
guarantees. :-)
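A rough emulation of that behavior, for readers without a Firefox 2 build: the sketch below assumes ECMAScript 2015 generators, whose built-in return() does not throw at a yield reached during close but simply hands back the yielded value. The close helper is therefore hypothetical; it throws into the generator when that happens so the remaining finally clauses still run. The log array replaces print.

```javascript
// Assumption: ES2015 generators; close() below is a hypothetical helper, not a built-in.
const log = [];

function* badsanta() {
  try {
    yield "rock";
  } finally {
    try {
      yield "barf"; // misbehaves: yields while being closed
    } finally {
      log.push("closing");
    }
  }
}

// Close a generator; if it yields instead of finishing, throw at that yield so
// the finally clauses still run on the way out.
function close(it) {
  const r = it.return();
  if (r.done) return null; // well-behaved generator: closed cleanly
  try {
    it.throw(new TypeError("generator yielded during close"));
  } catch (e) {
    return e; // the exception came back out after the finally clauses ran
  }
  return null;
}

const it = badsanta();
it.next();             // suspend at yield "rock"
const err = close(it); // the inner finally runs, then the TypeError propagates out
```

Here the misbehaving generator gets an exception at its second yield, and the inner finally still runs, which matches the outcome described for Firefox 2.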
You're right that "timely release" means "don't count on the GC", and
we've seen bad embeddings of SpiderMonkey go down the path that Java
warns against, time after time. We're not trying to facilitate such
bad programming. But we do think finally should work in generators
even when the user forgets to close, in the absence of DOS threats.
Also, having finalization will mandate a non-conservative GC.
Why?
Chris Hansen writes:
Also, having finalization will mandate a non-conservative GC.
I assume you're trying to guarantee that the finalizer is run once the object becomes garbage. But non-conservative generational GCs in general do not make any guarantees about the promptness of collection of any particular object, thus also do not guarantee anything about the running of finalizers. A generational GC that maintains a set of independently collectable generations and guesses which ones of them are the best to collect may never collect particular generations if it believes those generations contain very little garbage. A very quick scan of the work by Detlefs et al on "Garbage-first" GC suggests that this collector might behave like that, for example.
You're right, non-conservative GC is still possible.
I thought the guarantee was that an unreachable generator will eventually be closed. If you're using a conservative GC you may not be able to discover whether or not an object is dead and hence may keep dead objects alive indefinitely. With a generational GC you may keep objects alive long after they're dead but at least you can, if you want, determine with certainty whether or not an object is dead. If you had a policy that caused a full collection to be run of all generations with some regularity (and the spec could mandate that) a generational GC could offer (what I thought was) the guarantee.
If, on the other hand, the guarantee is that a generator will be closed before its space is reclaimed then using a conservative GC is still fine because the guarantee doesn't deal with generators that are not discovered to be garbage. But then is that a guarantee that is really useful to anyone?
What I meant was of course that conservative GC is still possible, not non-conservative. D'oh.
Automating close only for generators iterated by for-in loops leaves non-loop use cases that nevertheless need to clean up after themselves out in the cold.
I wouldn't call the absence of automatic closure "leaving them out in the cold". The coroutine generators described in PEP 342 are similar to threads, and the analogous situation for threads would be to guarantee that a thread is automatically "closed" and has all its pending finally clauses run, even if it never returns. If a thread blocks on some call and never runs again I wouldn't consider it "being left out in the cold" if the underlying system didn't forcefully close it eventually and run finally clauses. I would argue that the same reasoning applies to generators: if you create them and want to make sure they're closed then you have to close them yourself. I don't think people (except pythonians maybe) would consider that unreasonable.
But consider the general case, outside of the cramped world of web page scripts: XUL apps and extensions, or other SpiderMonkey embeddings, will use generators in various ways, and not be subject to DOS attacks. These embeddings expose stateful APIs to trusted code, and failing to call close may leak an OS resource or fail to synchronize important state.
Since finalization is tied to the GC (which might be conservative) it cannot, by definition, be relied on to release resources in something that resembles a timely fashion, or at all.
However, the try/finally issue still makes us want to follow Python, for general purposes (not just browser purposes). Lacking a DOS threat, and assuming the generator is written correctly, we convinced ourselves this summer that finally clauses should run from close after the last yield in a try has returned a value, and the caller or the GC is done with the generator. If the caller forgets to close, finally still should run (says the ES3 spec in all of the cases it defines).
I would claim that as soon as generators start to resemble threads or coroutines the situation changes. In Java, if a thread blocks indefinitely in a try clause then the finally clause will never be executed (at least not as far as I can tell).
We say if you're bad, you still get finally guarantees. :-)
If you do have finally guarantees I think that's the right thing to do. Of course I'm arguing for the polar opposite: even if you're good you don't get any guarantees :-).
There's another issue with finally guarantees that I don't think has been mentioned: which thread is it that closes the generators? Consider this example:
let genCount = 0; function makeGenerator() { genCount++; return myGenerator(); }
function myGenerator() { try { yield 1; yield 2; } finally { genCount--; } }
Imagine that the program has been running for a while and the GC decides to set in right after makeGenerator has read genCount but before it has incremented it. If the GC happens to close some of these generators then genCount will be in an inconsistent state when execution continues. If the GC doesn't then who does?
Sorry about the mail address mixup by the way; that's what you get when you mix private and work mail... I did think it was odd, though, that the posting even got through since my work mail address isn't a subscriber to es4-discuss.
On Dec 22, 2006, at 1:23 AM, Chris Hansen wrote:
I thought the guarantee was that an unreachable generator will eventually be closed. If you're using a conservative GC you may not be able to discover whether or not an object is dead and hence may keep dead objects alive indefinitely.
This is a "quality of implementation" issue, about which the spec
will not dictate.
Conservative GC in runtimes I'm familiar with works pretty well,
trading reduced risk of humans failing to manage the root set
correctly against risk of false positive. Such codebases use
classifying allocators pervasively, putting images and other non-
pointer data in unscanned allocations. The hard cases are the thread
stacks managed by good old C and C++, where the GC has no type
information. False positives are possible here, and they can cause
bloat, but it's bounded by the last-in-first-out discipline. If your
event loop has a float on the stack that aliases a pointer into the
heap, though, ....
With a generational GC you may keep objects alive long after they're dead but at least you can, if you want, determine with certainty whether or not an object is dead. If you had a policy that caused a full collection to be run of all generations with some regularity (and the spec could mandate that) a generational GC could offer (what I thought was) the guarantee.
The spec is not going to mandate that; it's not the guarantee we're
looking for.
If, on the other hand, the guarantee is that a generator will be closed before its space is reclaimed then using a conservative GC is still fine because the guarantee doesn't deal with generators that are not discovered to be garbage. But then is that a guarantee that is really useful to anyone?
Good question. We thought so, more on formal grounds to do with
try/finally than on practical grounds that don't hold up as soon as you
hypothesize code failing to call close on a generator and expecting
timely GC to call it. Anyone else care to weigh in?
On Dec 22, 2006, at 9:05 AM, Chris Hansen wrote:
I would argue that the same reasoning applies to generators: if you create them and want to make sure they're closed then you have to close them yourself. I don't think people (except pythonians maybe) would consider that unreasonable.
This is the 64,000 dollar (or euro, stabler currency :-/) question.
Since finalization is tied to the GC (which might be conservative) it cannot, by definition, be relied on to release resources in something that resembles a timely fashion, or at all.
If you can't rely on the GC to recover memory in the absence of
client code bugs that generate uncollectible graphs, you need a new GC.
I would claim that as soon as generators start to resemble threads or coroutines the situation changes. In Java, if a thread blocks indefinitely in a try clause then the finally clause will never be executed (at least not as far as I can tell).
It's a good point, I say again I like it. Need to get feedback from
others.
There's another issue with finally guarantees that I don't think has been mentioned: which thread is it that closes the generators? Consider this example:
let genCount = 0; function makeGenerator() { genCount++; return myGenerator(); }
function myGenerator() { try { yield 1; yield 2; } finally { genCount--; } }
Imagine that the program has been running for a while and the GC decides to set in right after makeGenerator has read genCount but before it has incremented it. If the GC happens to close some of these generators then genCount will be in an inconsistent state when execution continues. If the GC doesn't then who does?
ECMA-262 does not define anything to do with threads, and ES4 won't
either.
In the browser, the execution model is run-to-completion. No event
handler, not even a timeout, may preempt a running script (even if
the script is flying a modal dialog; contrary behavior including in
Firefox is a bug). The GC will not preempt makeGenerator at
arbitrary points, either. It might run synchronously (nesting on the
current thread stack) at a backward branch or return (after the
result has been evaluated), or from a native method, getter, or
setter. It won't run in the midst of genCount++.
Any multi-threaded browser has to preserve the invariants of this
model, for backward compatibility and developer sanity.
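The run-to-completion rule described above can be observed directly. The following is a minimal sketch, assuming a modern JavaScript host with setTimeout (a browser or Node.js), not the JS1.7 engine under discussion. The names timerFired, updateCounter and firedDuringUpdate are made up for illustration. It demonstrates only that event handlers cannot preempt a running script; it says nothing about where a particular GC is allowed to run.

```javascript
// Sketch: a timeout that is already due still cannot interrupt synchronous code.
let genCount = 0;
let timerFired = false;

setTimeout(() => { timerFired = true; }, 0);

function updateCounter() {
  const before = genCount;              // read
  const end = Date.now() + 5;
  while (Date.now() < end) {}           // spin until the timer is overdue
  genCount = before + 1;                // write, with no handler run in between
  return timerFired;
}

const firedDuringUpdate = updateCounter();
console.log(firedDuringUpdate);         // false: the callback waits for the script to finish
```

The callback runs only after the current script has completed, so the read and the write of genCount are never separated by another handler.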
The browser execution rules are not well-specified, but they should
be, in a browser embedding spec (not directly in ES4). Fodder for
the WHATWG or W3C.
Since finalization is tied to the GC (which might be conservative) it cannot, by definition, be relied on to release resources in something that resembles a timely fashion, or at all.
If you can't rely on the GC to recover memory in the absence of client code bugs that generate uncollectible graphs, you need a new GC.
If the spec is written such that an implementation is allowed to use a conservative GC then, by the definition of conservative GC, the spec must allow for implementations to occasionally not reclaim memory and hence not close generators. If the spec mandates generators to be closed eventually then conservative GCs cannot be used to implement ES4.
If the spec does require implementations to eventually close generators then, as Lars points out, there are still non-conservative GCs that keep several generations and very rarely collect older generations. In that case, while you can expect the GC to close generators, you cannot expect it to happen in a timely fashion. That makes it problematic to rely on the GC to release external resources.
Finally, the spec might require generators to be closed in a timely fashion, for some suitable definition of "timely fashion". I'm sure you don't want to do that.
In the absence of finalization none of these problems occur because then the collection or non-collection of an unreachable object cannot be observed from the program.
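To make the point about observability concrete, here is a sketch of the genCount example written with modern ECMAScript generator syntax (function* and return()), which postdates this thread and is used here only as an assumption for illustration. Modern generators are not closed by the GC, so the counter changes only at points the program itself chooses. The variables abandoned, afterAbandon, closed and afterClose are hypothetical names.

```javascript
let genCount = 0;

function* myGenerator() {
  try {
    yield 1;
    yield 2;
  } finally {
    genCount--;
  }
}

function makeGenerator() {
  genCount++;
  return myGenerator();
}

// Abandoned after one step: with no finalization, its finally clause never runs.
let abandoned = makeGenerator();
abandoned.next();
abandoned = null;
const afterAbandon = genCount;          // 1

// Closed explicitly: the finally clause runs synchronously inside return().
const closed = makeGenerator();         // genCount is now 2
closed.next();
closed.return();
const afterClose = genCount;            // 1

console.log(afterAbandon, afterClose);  // 1 1
```

Whether or not the abandoned generator is ever collected, the program cannot tell, which is the property argued for above.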
There's another issue with finally guarantees that I don't think has been mentioned: which thread is it that closes the generators? Consider this example:
let genCount = 0; function makeGenerator() { genCount++; return myGenerator(); }
function myGenerator() { try { yield 1; yield 2; } finally { genCount--; } }
Imagine that the program has been running for a while and the GC decides to set in right after makeGenerator has read genCount but before it has incremented it. If the GC happens to close some of these generators then genCount will be in an inconsistent state when execution continues. If the GC doesn't then who does?
ECMA-262 does not define anything to do with threads, and ES4 won't either.
In the browser, the execution model is run-to-completion. No event handler, not even a timeout, may preempt a running script (even if the script is flying a modal dialog; contrary behavior including in Firefox is a bug).
So there's actually a "negative" finalization guarantee: in a browser, unreachable and unclosed generators are guaranteed not to be closed until after the script or event has run to completion. Long-running scripts should not rely on generators being closed for them I guess...
The GC will not preempt makeGenerator at arbitrary points, either. It might run synchronously (nesting on the current thread stack) at a backward branch or return (after the result has been evaluated), or from a native method, getter, or setter. It won't run in the midst of genCount++. Any multi-threaded browser has to preserve the invariants of this model, for backward compatibility and developer sanity.
The browser execution rules are not well-specified, but they should be, in a browser embedding spec (not directly in ES4). Fodder for the WHATWG or W3C.
I used a counter to keep the example simple but you can easily imagine a more complex example where the critical region contains a backward branch, return, or one of the other cases.
As for the spec not specifying threading behavior, I would claim that this issue forces you to specify it. If my program relies on nothing but the semantics defined in the spec then the program should run correctly on all spec-compliant implementations. If the spec leaves this question unanswered then I can't hope to write implementation- or embedding-independent code. If the spec doesn't explicitly disallow close methods to be run preemptively then you have, in effect, introduced multi threading -- at least, if I want to write an implementation-independent program I have to consider the possibility that there will be (spec compliant) implementations that finalize preemptively.
On the other hand, if you do disallow close methods to be run preemptively then you're more or less guaranteeing that on any compliant implementation, close methods will not be run in a timely fashion, since they will have to wait for the script to finish. That may not be a problem in browsers but who knows where people might want to embed ES4 and run long-running scripts.
On Jan 1, 2007, at 8:36 AM, Chris Hansen wrote:
In the absence of finalization none of these problems occur because then the collection or non-collection of an unreachable object cannot be observed from the program.
Thanks, this is the compelling argument for C#-style close automation
(i.e., the ES4 spec automates close calling on exit from for-in and
for-each-in loops only); all other generator use-cases that want
close must do it themselves. As I've said several times, I'm in
agreement. Is everyone else?
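As a sketch of what loop-only close automation looks like in practice, the example below uses modern ECMAScript syntax (function*, for-of, return()), which is an assumption here since it postdates this thread. In that later syntax, breaking out of a loop closes the generator, while a generator driven by hand must be closed by the caller. The names log and manual are illustrative.

```javascript
const log = [];

function* gen() {
  try {
    yield 1;
    yield 2;
  } finally {
    log.push("closed");
  }
}

// Loop use: leaving the loop early closes the generator automatically.
for (const value of gen()) {
  log.push("got " + value);
  break;
}

// Non-loop use: nothing closes the generator unless the caller does.
const manual = gen();
manual.next();
log.push("after next");
manual.return();                        // explicit close runs the finally clause

console.log(log.join(", "));            // got 1, closed, after next, closed
```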
As for the spec not specifying threading behavior, I would claim that this issue forces you to specify it. If my program relies on nothing but the semantics defined in the spec then the program should run correctly on all spec-compliant implementations. If the spec leaves this question unanswered then I can't hope to write implementation- or embedding-independent code.
It's true that ECMA-262 alone cannot be used to write portable "JS",
across (e.g. tellme.com's) VXML server, Macromedia server, Web
server, and Web browser embeddings, just to name some embeddings with
different execution models.
If the spec doesn't explicitly disallow close methods to be run preemptively then you have, in effect, introduced multi threading -- at least, if I want to write an implementation-independent program I have to consider the possibility that there will be (spec compliant) implementations that finalize preemptively.
The spec should not automate close calling from the GC, we agree, so
(I hope we agree that) it can go back to sticking its head in the
sand and pretending the world is single-threaded. Its SML-NJ
reference implementation will not use threads in any way that could
violate the run-to-completion model.
On the other hand, if you do disallow close methods to be run preemptively then you're more or less guaranteeing that on any compliant implementation, close methods will not be run in a timely fashion, since they will have to wait for the script to finish.
The important timely-close use-case is the for-in loop. I can't think
of any others that aren't contrived. But we agree, so it would be
helpful for others on the list who see things differently (including
you Pythonistas) to speak up.
In the absence of finalization none of these problems occur because then the collection or non-collection of an unreachable object cannot be observed from the program.
Thanks, this is the compelling argument for C#-style close automation (i.e., the ES4 spec automates close calling on exit from for-in and for-each-in loops only); all other generator use-cases that want close must do it themselves. As I've said several times, I'm in agreement. Is everyone else?
We agree about C#-style close and that's not what I'm arguing for. I'm arguing against automatic closing of generators that are not used in for-in loops, for instance ones used as coroutines. It's not my impression that we agree there. But in any case I'll give it a rest and hopefully someone else will have an opinion on this.
The important timely-close use-case is the for-in loop. I can't think of any others that aren't contrived.
You could easily imagine using generators to access files, network sockets or other external resources, and I don't think it is contrived to imagine such a generator used outside a for-in loop. In that case you don't need prompt finalization and that's not what I mean when I say "timely fashion", but you would want it to happen eventually and you probably don't want too much time to pass before it happens.
On Jan 2, 2007, at 3:23 PM, Chris Hansen wrote:
In the absence of finalization none of these problems occur because then the collection or non-collection of an unreachable object cannot be observed from the program.
Thanks, this is the compelling argument for C#-style close automation (i.e., the ES4 spec automates close calling on exit from for-in and for-each-in loops only); all other generator use-cases that want close must do it themselves. As I've said several times, I'm in agreement. Is everyone else?
We agree about C#-style close and that's not what I'm arguing for. I'm arguing against automatic closing of generators that are not used in for-in loops, for instance ones used as coroutines. It's not my impression that we agree there.
No, we agree. That's what I keep saying ("I'm sold", etc.), and the
above words are intentionally exhaustive: "all other generator use-
cases [than for-in loops] that want close must do it themselves."
Sorry if I was unclear. You and I have always agreed that timely
release of scarce resources should not depend on GC, but for JS1.7
(in Firefox 2), we followed Python 2.5 closely for the sake of
guaranteeing finally execution when the last yield is from the
matching try. This seems to cater to bad practices; it creates a
hazard for users, especially those coming from the Python world.
Still troubling, but less so, is the problem for Python people who
expect close to be automated in non-for-in-loop cases.
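The condition "when the last yield is from the matching try" can be illustrated with a short sketch. It uses modern ECMAScript generators, where return() plays the role of close; this is an assumption for illustration and not the JS1.7 implementation discussed here. The array ran and the labels are made up. A finally clause runs on close only if the generator is suspended inside the corresponding try.

```javascript
const ran = [];

function* gen(label) {
  yield "before try";
  try {
    yield "inside try";
  } finally {
    ran.push(label);
  }
}

const notStarted = gen("not started");
notStarted.return();                    // never began executing; nothing to unwind

const beforeTry = gen("before try");
beforeTry.next();                       // suspended at the first yield, outside the try
beforeTry.return();                     // the finally clause is not entered

const insideTry = gen("inside try");
insideTry.next();
insideTry.next();                       // suspended at the yield inside the try
insideTry.return();                     // the finally clause runs

console.log(ran);                       // [ 'inside try' ]
```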
The important timely-close use-case is the for-in loop. I can't think of any others that aren't contrived.
You could easily imagine using generators to access files, network sockets or other external resources, and I don't think it is contrived to imagine such a generator used outside a for-in loop. In that case you don't need prompt finalization and that's not what I mean when I say "timely fashion", but you would want it to happen eventually and you probably don't want too much time to pass before it happens.
This is a use-case mentioned in the PEPs, but as you've argued it
limits GC implementation choices if ES4 were to require it. Since at
least one ECMA member's ECMA-262 + ES4-like extensions implementation
(ActionScript 3 in the Flash Player) uses conservative GC, I do not
believe that TG1 will or should specify any further close automation
than for for-in loops. That's my opinion, at any rate.
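For the resource use-case outside a loop, the caller can take responsibility for closing without any help from the GC. The following is a minimal sketch in modern ECMAScript syntax (an assumption, as above). openResource is a hypothetical stand-in for a file or socket API, and readLines, resource, reader and first are illustrative names.

```javascript
// Hypothetical stand-in for a file or socket handle.
function openResource(lines) {
  return { lines, open: true, close() { this.open = false; } };
}

function* readLines(resource) {
  try {
    for (const line of resource.lines) yield line;
  } finally {
    resource.close();                   // release the resource when closed or exhausted
  }
}

const resource = openResource(["a", "b", "c"]);
const reader = readLines(resource);
let first;
try {
  first = reader.next().value;          // consume only part of the stream
} finally {
  reader.return();                      // caller closes explicitly; no GC involved
}

console.log(first, resource.open);      // a false
```

The caller's own try/finally gives a definite release point, which is the discipline the thread converges on for non-loop uses.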
Expect more responses, I've reminded interested people about this list.
On 1/3/07, Brendan Eich <brendan at mozilla.org> wrote:
On Jan 2, 2007, at 3:23 PM, Chris Hansen wrote: for JS1.7 (in Firefox 2), we followed Python 2.5 closely for the sake of guaranteeing finally execution when the last yield is from the matching try.
A simple way to address it is to require at runtime that, when yield is executed inside a try with a finally, the iterator object must be part of a for-in loop or, potentially, a C#-like autoclose(obj) {} block. If this is not the case, an exception should be thrown.
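The autoclose(obj) {} block mentioned above is only a suggestion in this thread, not existing syntax. A rough approximation as an ordinary helper function might look like the sketch below, written in modern ECMAScript (an assumption). The helper name autoclose, its (iterator, body) signature and the variables cleanedUp and firstValue are all hypothetical. The sketch covers only the scoped close; it does not implement the proposed runtime check that would throw when yield runs inside try/finally outside such a scope.

```javascript
// Hypothetical helper: run body with the iterator, then always close it.
function autoclose(iterator, body) {
  try {
    return body(iterator);
  } finally {
    if (typeof iterator.return === "function") iterator.return();
  }
}

let cleanedUp = false;

function* gen() {
  try {
    yield 1;
    yield 2;
  } finally {
    cleanedUp = true;
  }
}

const firstValue = autoclose(gen(), (it) => it.next().value);
console.log(firstValue, cleanedUp);     // 1 true
```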
But that means a programmer wouldn't have the option to close it manually in that circumstance.
Peter
On 2007-01-02, at 15:30 EST, Brendan Eich wrote:
On Jan 1, 2007, at 8:36 AM, Chris Hansen wrote:
In the absence of finalization none of these problems occur because then the collection or non-collection of an unreachable object cannot be observed from the program.
Thanks, this is the compelling argument for C#-style close
automation (i.e., the ES4 spec automates close calling on exit from
for-in and for-each-in loops only); all other generator use-cases
that want close must do it themselves. As I've said several times,
I'm in agreement. Is everyone else?
Agreed.
My 2p: finalization is the goto of GC-ed languages
On 1/4/07, Peter Hall <peterjoel at gmail.com> wrote:
But that means a programmer wouldn't have the option to close it manually in that circumstance.
You mean that it would not be possible to use a generator with yield inside try with finally outside a for-in loop? Surely calling generatorInstance.next() will throw an exception in that case, but I do not see what would prevent calling generatorInstance.close().
For me the guarantee that finally is always executed at a clearly defined moment is worth the restriction. And if some generator wants to allow its usage outside for-in, then it must not use yield with finally and must rely on the explicit close.
Igor
On 1/4/07, Igor Bukanov <igor.bukanov at gmail.com> wrote:
On 1/4/07, Peter Hall <peterjoel at gmail.com> wrote:
But that means a programmer wouldn't have the option to close it manually in that circumstance.
You mean that it would not be possible to use a generator with yield inside try with finally outside a for-in loop? Surely calling generatorInstance.next() will throw an exception in that case, but I do not see what would prevent calling generatorInstance.close().
Nothing would prevent you calling it. I was rather meaning that, in cases where the generator cannot be automatically closed, a programmer should have the option to close it manually, rather than the language disallowing it and throwing an error.
But this is looking like a non-argument, since a consensus seems to have been reached already.
Peter
This is a follow-up to a bugzilla discussion at: bugzilla.mozilla.org/show_bug.cgi?id=349326
var was_closed = false; function gen() { try { yield 1; } finally { was_closed = true; } }
for (var i in gen()) break; print("was_closed=" + was_closed);
Right now, this prints "was_closed=false" because when it breaks out of the for loop, Javascript does not close the iterator, and does not execute the finally clause to set was_closed true. I want to argue that Javascript should close the iterator:
So, could the spec require Javascript to always close the iterator when leaving a loop?
Thanks,