Weak references and destructors

# Charles Jolley (16 years ago)

I was wondering if any thought has gone into adding weak references and destructors to Harmony.

We are finding that as we build large, long-running JS apps, it is very hard to keep memory under control using the built-in GC, since any reference - even one kept only for caching - will prevent the memory from being reclaimed.

If we had a way to keep weak references for caches, the GC could reclaim a lot more of our memory automatically. If a destructor were called before an object was dealloc'ed, we could clean up caches and tear down additional references, possibly allowing further memory to be reclaimed as well.

Of course I can implement something like explicit reference counting in existing ES engines to get around this but then we lose many of the benefits of automated GC.
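
(A sketch of the cache pattern being asked for here, using the WeakRef and FinalizationRegistry APIs that were standardized years after this thread; the WeakCache class and its method names are illustrative only, not part of any proposal discussed below.)

```js
// Illustrative only: a cache whose entries do not keep their values alive.
class WeakCache {
  constructor() {
    this._map = new Map(); // key -> WeakRef(value)
    // When a cached value is collected, drop its now-useless map entry.
    this._registry = new FinalizationRegistry((key) => {
      const ref = this._map.get(key);
      if (ref !== undefined && ref.deref() === undefined) {
        this._map.delete(key);
      }
    });
  }
  set(key, value) {
    this._map.set(key, new WeakRef(value));
    this._registry.register(value, key);
  }
  get(key) {
    const ref = this._map.get(key);
    return ref && ref.deref(); // undefined if absent or already collected
  }
}
```

Entries stop keeping their values alive, so the GC can reclaim cached data under memory pressure while the cache tidies up its own stale slots.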

# Tom Van Cutsem (16 years ago)

I think what you are looking for is this: strawman:weak_references

# Charles Jolley (16 years ago)

That looks pretty much right on. Thanks Tom! Can anyone give any insight as to how likely it is this might actually happen?

# Brendan Eich (16 years ago)

On Dec 10, 2009, at 10:11 AM, Charles Jolley wrote:

That looks pretty much right on. Thanks Tom! Can anyone give any
insight as to how likely it is this might actually happen?

You want me to lay odds? ;-)

It might help to read these wiki pages, in this order:

harmony:harmony, harmony:proposals, strawman:strawman

According to Goals 1(II) "libraries (possibly including the DOM)
shared by those applications" and 1(III) "code generators targeting
the new edition", weak references are pretty important.

I think we'll get some kind of weak reference / ephemeron support in
for the next edition, but we need to hash out details of what's
normative and what's implementation-dependent, and finalize things to
the point where implementors can take the chance to invest in
prototyping.

# Mark S. Miller (16 years ago)

On Thu, Dec 10, 2009 at 11:10 AM, Brendan Eich <brendan at mozilla.com> wrote:

On Dec 10, 2009, at 10:11 AM, Charles Jolley wrote:

That looks pretty much right on. Thanks Tom! Can anyone give any insight as to how likely it is this might actually happen?

You want me to lay odds? ;-)

It might help to read these wiki pages, in this order:

harmony:harmony, harmony:proposals, strawman:strawman

According to Goals 1(II) "libraries (possibly including the DOM) shared by those applications" and 1(III) "code generators targeting the new edition", weak references are pretty important.

I think we'll get some kind of weak reference / ephemeron support in for the next edition, but we need to hash out details of what's normative and what's implementation-dependent, and finalize things to the point where implementors can take the chance to invest in prototyping.

+1.

By all means, let's continue hashing it out. I posted this proposal to es-discuss and presented it to the committee some time ago. I do not recall any serious objections, and I do recall several positive responses. However, the committee has not yet made any decision. If there were serious objections I have forgotten, my apologies, and I ask that you (re?)post them to es-discuss.

There is one very good suggestion from Cormac Flanagan (cc'ed) that I have yet to incorporate that would make the specification much simpler and clearer. Or, Cormac, please feel free to modify the wiki to incorporate your proposal. Since Cormac's proposal doesn't really affect the meaning of the proposal or how programmers would use the API, we should proceed to hash out this proposal now.

If we seem to have approximate consensus here on es-discuss, then I propose we try to make a decision on this at the next committee meeting. (At this stage, a positive decision operationally means moving it from "strawman" to "proposals".) Istvan & John, I'm cc'ing you to request this be put on the agenda. Thanks.

# Brendan Eich (16 years ago)

On Dec 10, 2009, at 11:27 AM, Mark S. Miller wrote:

By all means, let's continue hashing it out. I posted this proposal
to es-discuss and presented it to the committee some time ago. I do
not recall any serious objections, and I do recall several positive
responses. However, the committee has not yet made any decision. If
there were serious objections I have forgotten, my apologies, and I
ask that you (re?)post them to es-discuss.

Allen had some thoughts, but we were out of time at the last face to
face. I'll let him speak for himself.

The issue that I raised at the last meeting, other than naming nits
(which we can defer for now), was in response to this:

"All visible notification happens via setTimeout() in order to avoid
plan interference hazards. Side effects from these notifications
happen in their own event-loop turn, rather than being interleaved
with any ongoing sequential computation. However, this requires us to
promote setTimeout() and event-loop concurrency into the ES-Harmony
specification, which is still controversial."

from strawman:weak_references.

We may not standardize the execution model to the degree you hope.

We also may not agree on notification being guaranteed. At the last
f2f I mentioned how generators in JS1.7+ have a close method, again
after Python, but without the unnecessary GeneratorExit built-in
exception object thrown by close at a generator in case it has yielded
in a try with a finally. Naively supporting notification guarantees
creates trivial denial of service attacks and accidents.

Of course, we could say the universe ends without pending
notifications being delivered, but in the effectful browser, the
multiverse goes on and there are lots of channels between 'verses ;-).

In general I'd like to decouple weak references from hairy execution
model issues. If we can't do this, then the risk we'll fail to get
weak refs into the next edition goes up quite a bit. The obvious way to
decouple is to underspecify.

# David-Sarah Hopwood (16 years ago)

Tom Van Cutsem wrote:

Hi,

I think what you are looking for is this: strawman:weak_references

On Thu, Dec 10, 2009 at 9:44 AM, Charles Jolley <charles at sproutit.com> wrote: [...]

If we had a way to keep weak references for caches, the GC could reclaim a lot more of our memory automatically. If a destructor were called before an object was dealloc'ed, we could clean up caches and tear down additional references, possibly allowing further memory to be reclaimed as well.

The EphemeronTable abstraction described on the above page is ideal for implementing caches. It doesn't require any extra work to explicitly remove entries from the cache; they just go away when the key does.
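
(For comparison, this is the shape the ephemeron-table idea eventually took as WeakMap; the sketch below is illustrative, and computeExpensiveMetadata is a hypothetical stand-in for whatever the cache is memoizing.)

```js
// Illustrative sketch: a WeakMap (the ephemeron table that eventually
// shipped in ES2015), keyed by the object whose lifetime should control
// the cache entry.
const metadataCache = new WeakMap();

function getMetadata(obj) {
  let meta = metadataCache.get(obj);
  if (meta === undefined) {
    meta = computeExpensiveMetadata(obj); // hypothetical expensive computation
    metadataCache.set(obj, meta);
  }
  return meta;
}

// Once nothing else references `obj`, both the key and its cached metadata
// become collectable; the cache never needs explicit invalidation.
```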

# Mark S. Miller (16 years ago)

On Thu, Dec 10, 2009 at 11:38 AM, Brendan Eich <brendan at mozilla.com> wrote:

On Dec 10, 2009, at 11:27 AM, Mark S. Miller wrote:

By all means, let's continue hashing it out. I posted this proposal to es-discuss and presented it to the committee some time ago. I do not recall any serious objections, and I do recall several positive responses. However, the committee has not yet made any decision. If there were serious objections I have forgotten, my apologies, and I ask that you (re?)post them to es-discuss.

Allen had some thoughts, but we were out of time at the last face to face. I'll let him speak for himself.

The issue that I raised at the last meeting, other than naming nits (which we can defer for now), was in response to this:

"All visible notification happens via setTimeout() in order to avoid plan interference hazards. Side effects from these notifications happen in their own event-loop turn, rather than being interleaved with any ongoing sequential computation. However, this requires us to promote setTimeout()and event-loop concurrency into the ES-Harmony specification, which is still controversial."

from strawman:weak_references.

We may not standardize the execution model to the degree you hope.

I do think we should standardize (set|clear)(Timeout|Interval) and event-loop concurrency, as server-side JavaScript use is already moving towards adopting it, making its need independent of the browser. However, I agree that these proposals should be decoupled if possible. Accordingly, I have kludged the weak pointer proposal by modifying the definitional WeakPtr code at the top and adding the following text to the paragraph you quote above:

"In order to postpone the issue, the spec implied by the above code should be taken literally: If there is no global binding for setTimeout or if it bound to a non-callable value (as the time WeakPtr is called), then no notifications happen. If the value of the global setTimeout is callable, then the GC calls it at some arbitrary time, passing in a frozen function whose only purpose is to call the registered executor function. If setTimeout has its normal binding (e.g., in the browser), then the executor will only be called later in a separate turn as expected, protecting us from plan interference hazards. A secure runtime in such an environment can always freeze the global setTimeout property, preventing its redefinition to something that could cause plan interference."

We also may not agree on notification being guaranteed. At the last f2f I mentioned how generators in JS1.7+ have a close method, again after Python, but without the unnecessary GeneratorExit built-in exception object thrown by close at a generator in case it has yielded in a try with a finally. Naively supporting notification guarantees creates trivial denial of service attacks and accidents.

Of course, we could say the universe ends without pending notifications being delivered, but in the effectful browser, the multiverse goes on and there are lots of channels between 'verses ;-).

In general I'd like to decouple weak references from hairy execution model issues. If we can't do this, then the risk we'll fail to get weak refs into the next edition goes up quite a bit. The obvious way to decouple is to underspecify.

I think I'd be willing to weaken this from "eventual notification" to "optional eventual notification." But I do not yet understand this issue. How does a guarantee of eventual notification lead to any more vulnerability to denial of service than "while(true){}"?

# Brendan Eich (16 years ago)

On Dec 10, 2009, at 10:17 PM, Mark S. Miller wrote:

"In order to postpone the issue, the spec implied by the above code
should be taken literally: If there is no global binding for
setTimeout or if it is bound to a non-callable value (at the time
WeakPtr is called), then no notifications happen. If the value of
the global setTimeout is callable, then the GC calls it at some
arbitrary time, passing in a frozen function whose only purpose is
to call the registered executor function. If setTimeout has its
normal binding (e.g., in the browser), then the executor will only
be called later in a separate turn as expected, protecting us from
plan interference hazards. A secure runtime in such an environment
can always freeze the global setTimeout property, preventing its
redefinition to something that could cause plan interference."

"When in doubt, use brute force." - K. Thompson

In general I'd like to decouple weak references from hairy execution
model issues. If we can't do this, then the risk we'll fail to get
weak refs into the next edition goes up quite a bit. The obvious way
to decouple is to underspecify.

I think I'd be willing to weaken this from "eventual notification"
to "optional eventual notification." But I do not yet understand
this issue. How does a guarantee of eventual notification lead to
any more vulnerability to denial of service than "while(true){}"?

There is no guarantee of eventual notification, any more than there's
a guarantee that an armed timeout will fire (navigation away cancels),
or that an infinite loop can run forever. If there's no guarantee, then the
spec should not say there is.

# David-Sarah Hopwood (16 years ago)

Brendan Eich wrote:

On Dec 10, 2009, at 10:17 PM, Mark S. Miller wrote:

I think I'd be willing to weaken this from "eventual notification" to "optional eventual notification." But I do not yet understand this issue. How does a guarantee of eventual notification lead to any more vulnerability to denial of service than "while(true){}"?

There is no guarantee of eventual notification, any more than there's a guarantee that an armed timeout will fire (navigation away cancels), or that an infinite loop can run forever. If there's no guarantee, then the spec should not say there is.

Ah. When I hear MarkM say "eventual", I assume its meaning from E, which does not actually guarantee that the event eventually happens (because the destination vat may be destroyed, or the message may not reach it).

This is probably a defect in E terminology that we shouldn't reproduce.

OTOH, the notification shouldn't be arbitrarily cancelled in situations where events set using setTimeout wouldn't normally be cancelled.

# Erik Corry (16 years ago)

2009/12/11 Mark S. Miller <erights at google.com>:

On Thu, Dec 10, 2009 at 11:38 AM, Brendan Eich <brendan at mozilla.com> wrote:

On Dec 10, 2009, at 11:27 AM, Mark S. Miller wrote:

By all means, let's continue hashing it out. I posted this proposal to es-discuss and presented it to the committee some time ago. I do not recall any serious objections, and I do recall several positive responses. However, the committee has not yet made any decision. If there were serious objections I have forgotten, my apologies, and I ask that you (re?)post them to es-discuss.

Allen had some thoughts, but we were out of time at the last face to face. I'll let him speak for himself. The issue that I raised at the last meeting, other than naming nits (which we can defer for now), was in response to this:  "All visible notification happens via setTimeout() in order to avoid plan interference hazards. Side effects from these notifications happen in their own event-loop turn, rather than being interleaved with any ongoing sequential computation. However, this requires us to promote setTimeout() and event-loop concurrency into the ES-Harmony specification, which is still controversial." from strawman:weak_references. We may not standardize the execution model to the degree you hope.

I do think we should standardize (set|clear)(Timeout|Interval) and event-loop concurrency, as server-side JavaScript use is already moving towards adopting it, making its need independent of the browser. However, I agree that these proposals should be decoupled if possible. Accordingly, I have kludged the weak pointer proposal by modifying the definitional WeakPtr code at the top and adding the following text to the paragraph you quote above:

"In order to postpone the issue, the spec implied by the above code should be taken literally: If there is no global binding for setTimeout or if it bound to a non-callable value (as the time WeakPtr is called), then no notifications happen. If the value of the global setTimeout is callable, then the GC calls it at some arbitrary time, passing in a frozen function whose only purpose is to call the registered executor function. If setTimeout has its normal binding (e.g., in the browser), then the executor will only be called later in a separate turn as expected, protecting us from plan interference hazards. A secure runtime in such an environment can always freeze the global setTimeout property, preventing its redefinition to something that could cause plan interference."

I really dislike this definition. This would imply that anyone could overwrite setTimeout and get a completely different behaviour. If overwriting is impossible then it introduces setTimeout into the standard by the backdoor.

I'd prefer an underspecified [[QueueForProcessing]] operation with no connection to the global object and a note to say that in a browser it would be expected to use the same mechanism as a setTimeout with a timeout of zero.

We also may not agree on notification being guaranteed. At the last f2f I mentioned how generators in JS1.7+ have a close method, again after Python, but without the unnecessary GeneratorExit built-in exception object thrown by close at a generator in case it has yielded in a try with a finally. Naively supporting notification guarantees creates trivial denial of service attacks and accidents. Of course, we could say the universe ends without pending notifications being delivered, but in the effectful browser, the multiverse goes on and there are lots of channels between 'verses ;-).

There are lots of misunderstandings around GC, where people expect this sort of callback to happen at some predictable time. If there's no memory pressure then there's no reason to expect the GC to ever be run even if the program runs for ever. It would be nice to have some indication in the text of the standard that discouraged people from expecting a callback at some predictable time. For example if people want to close file descriptors or collect other resources that are not memory using this mechanism it would be nice to discourage them (because it won't work on a machine with lots of memory and not so many max open fds).
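
(The point above translates into API guidance: scarce, non-memory resources need an explicit, deterministic release path, with GC-driven notification used at most as a leak detector. The ManagedFile wrapper below is hypothetical and uses the FinalizationRegistry API that appeared long after this thread.)

```js
// There is no guarantee this callback ever runs, or runs promptly; it is a
// leak detector, not a resource manager.
const leakReporter = new FinalizationRegistry((name) => {
  console.warn(`file ${name} was garbage-collected without being closed`);
});

// Hypothetical wrapper: the handle is released by an explicit close(),
// never by waiting for the collector.
class ManagedFile {
  constructor(handle, name) {
    this.handle = handle;
    this.closed = false;
    leakReporter.register(this, name, this);
  }
  close() {
    if (this.closed) return;
    this.closed = true;
    leakReporter.unregister(this); // closed properly, nothing to report
    this.handle.close();           // deterministic release of the scarce fd
  }
}
```

Callers still have to pair construction with close() (for example in a finally block); the registry only flags the cases where that discipline broke down, and may never run at all.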

# Brendan Eich (16 years ago)

On Dec 11, 2009, at 12:45 AM, Erik Corry wrote:

I really dislike this definition. This would imply that anyone could overwrite setTimeout and get a completely different behaviour. If overwriting is impossible then it introduces setTimeout into the standard by the backdoor.

I agree -- I should add that my quoting Ken's "use brute force"
chestnut was not meant as an endorsement.

The ES spec should not be jury-rigged on top of DOM or other browser
API standards, especially not by "object detection" as if
written in JS code running in the browser. That may be necessary for
Caja or other present-day translators, but the Harmony-era spec can do
better, without overspecifying.

I'd prefer an underspecified [[QueueForProcessing]] operation with no connection to the global object and a note to say that in a browser it would be expected to use the same mechanism as a setTimeout with a timeout of zero.

But then you go on to make an excellent point:

There are lots of misunderstandings around GC, where people expect this sort of callback to happen at some predictable time. If there's no memory pressure then there's no reason to expect the GC to ever be run even if the program runs for ever. It would be nice to have some indication in the text of the standard that discouraged people from expecting a callback at some predictable time. For example if people want to close file descriptors or collect other resources that are not memory using this mechanism it would be nice to discourage them (because it won't work on a machine with lots of memory and not so many max open fds).

It would be more than nice. It is important that the spec not mandate
any particular schedule. We have seen endless over-coupling to GC
implementation details where programmers who can hook into
finalization or another GC phase do so for all the wrong reasons: to
close fds, free database cursors, send a message, update UI, etc.
Crazy stuff.

But if there is no guarantee of when the notification might happen,
then programmers should not expect any scheduling akin to setTimeout
with a timeout of zero.

# Mark S. Miller (16 years ago)

On Fri, Dec 11, 2009 at 12:45 AM, Erik Corry <erik.corry at gmail.com> wrote:

2009/12/11 Mark S. Miller <erights at google.com>:

[...] However, I agree that these proposals should be decoupled if possible. Accordingly, I have kludged [...]

I really dislike this definition. This would imply that anyone could overwrite setTimeout and get a completely different behaviour. If overwriting is impossible then it introduces setTimeout into the standard by the backdoor.

I'd prefer an underspecified [[QueueForProcessing]] operation with no connection to the global object and a note to say that in a browser it would be expected to use the same mechanism as a setTimeout with a timeout of zero.

I agree. Done. To be consistent with the spec style on the rest of that page -- perhaps a bad idea -- I called your [[QueueForProcessing]] operation POSTPONE. This is a minor issue and I'm not attached to the choice. In any case, the most relevant new text is at <strawman:weak_references#safe_post_mortem_notification>.

Thanks for the suggestion.

There are lots of misunderstandings around GC, where people expect this sort of callback to happen at some predictable time. If there's no memory pressure then there's no reason to expect the GC to ever be run even if the program runs for ever. It would be nice to have some indication in the text of the standard that discouraged people from expecting a callback at some predictable time. For example if people want to close file descriptors or collect other resources that are not memory using this mechanism it would be nice to discourage them (because it won't work on a machine with lots of memory and not so many max open fds).

Does the current text clarify this to your satisfaction?

# P T Withington (16 years ago)

On 2009-12-11, at 12:43, Brendan Eich wrote:

It would be more than nice. It is important that the spec not mandate any particular schedule. We have seen endless over-coupling to GC implementation details where programmers who can hook into finalization or another GC phase do so for all the wrong reasons: to close fds, free database cursors, send a message, update UI, etc. Crazy stuff.

+n

In my experience, it is always a bad idea for the GC to invoke user-code.

Please don't throw out the weak-key tables with the finalization bathwater.

But if there is no guarantee of when the notification might happen, then programmers should not expect any scheduling akin to setTimeout with a timeout of zero.

I initially mis-read this as saying setTimeout with a timeout of 0 might never be scheduled. But that's not what you said.

--

I was amused by this aside from the strawman:

# Mark S. Miller (16 years ago)

On Sat, Dec 12, 2009 at 3:07 PM, P T Withington <ptw at pobox.com> wrote:

On 2009-12-11, at 12:43, Brendan Eich wrote:

It would be more than nice. It is important that the spec not mandate any particular schedule. We have seen endless over-coupling to GC implementation details where programmers who can hook into finalization or another GC phase do so for all the wrong reasons: to close fds, free database cursors, send a message, update UI, etc. Crazy stuff.

+n

In my experience, it is always a bad idea for the GC to invoke user-code.

Always? E uses async post mortem finalization to implement distributed acyclic GC. Without some way for "GC to invoke user code", I don't see how this is possible. Without it, the resulting memory leak is severe enough to discourage many good distributed programming patterns.

# P T Withington (16 years ago)

On 2009-12-12, at 18:14, Mark S. Miller wrote:

In my experience, it is always a bad idea for the GC to invoke user-code.

Always? E uses async post mortem finalization to implement distributed acyclic GC. Without some way for "GC to invoke user code", I don't see how this is possible. Without it, the resulting memory leak is severe enough to discourage many good distributed programming patterns.

I guess it depends on how you define "user code". And I should never say 'always'.

My main point was to not give up weak-key tables just because finalization semantics are messy.