WeakMap not the weak needed for zombie views

# Peter Michaux (11 years ago)

I've been reading about WeakMap in the draft. To my surprise, it is not at all what I thought it would be or what I was hoping to use. At least that is my understanding.

My use case is in MV* architectures. With current MV* frameworks, a model holds strong references to the views observing that model. If a view is removed from the DOM, every other reference to that view in the application may be gone, but the view never stopped observing the model object; that lingering strong reference from model to view results in a zombie view. Avoiding this means views need destroy methods that unsubscribe the view from the model. It is easy for the application programmer to forget to call a view's destroy method, and then the application leaks memory. As a result of the leak, the user experience and ultimately the reputation of the Web suffers. If a model could hold weak references to its observers, this would safeguard against accidental, and inevitable, application programmer forgetfulness.
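A minimal sketch of the leak being described (the Model/View shapes are illustrative, not from any particular framework):

```js
// Assumed, minimal MV* shapes -- not from any real framework.
function Model() {
  this.observers = []; // strong references to every subscribed view
}
Model.prototype.subscribe = function (view) {
  this.observers.push(view);
};
Model.prototype.notify = function () {
  this.observers.forEach(function (view) { view.render(); });
};

function View(model, element) {
  this.element = element;
  model.subscribe(this); // the model now strongly references this view
}
View.prototype.render = function () { /* update this.element */ };

var model = new Model();
var view = new View(model, document.createElement('div'));

// The element is gone and our reference dropped, but model.observers
// still pins the view: a zombie that re-renders on every notify() and
// can never be garbage-collected.
view = null;
model.notify();
```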

It appears that WeakMap cannot help solve the current MV* zombie view problem. Or did I miss something?

I was expecting WeakMap to hold its values weakly and set them to undefined or delete the associated key when the value was garbage collected.

Does anything exist, or is anything on the way, to help solve the zombie problem?


Smalltalk Squeak models use a WeakIdentityKeyDictionary, which holds its keys weakly. The difference compared with the ECMAScript WeakMap is that instances of WeakIdentityKeyDictionary have an iterator, so the observers can be stored as the keys and still be discoverable without keeping other strong references. The ECMAScript standard specifically disallows an iterator.
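To illustrate what that would look like, here is a hypothetical sketch; note that none of this works with the actual ECMAScript WeakMap, which deliberately exposes no iteration:

```js
// HYPOTHETICAL: imagine a weak-key map that, unlike the real ES WeakMap,
// could be iterated. Observers stored as keys would stay discoverable
// without the map keeping them alive.
var observers = new WeakMap();

function subscribe(view) {
  observers.set(view, true); // key held weakly (real WeakMap behavior)
}

function notifyAll() {
  // NOT real: the ES WeakMap deliberately exposes no keys()/iteration.
  for (var view of observers.keys()) {
    view.render(); // collected views simply never show up here
  }
}
```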

# Till Schneidereit (11 years ago)

There is an ES7 proposal for weak references that would satisfy your requirements. However, at least at Mozilla there is very strong opposition to this from people working on the memory management subsystems (i.e. the GC and CC). It's not clear to me that their arguments have been defeated and I'm not aware of any more recent discussions about this topic than those on Mozilla's platform development mailing list 2, 3.

While I think that weak references are an important feature, I don't think this particular use case is a good argument for them: in my personal experience working with and implementing systems like the one you describe, weak listeners were eventually deprecated and replaced by forced explicit unsubscription every time. If a view is destroyed, you really don't want it to receive any events anymore, regardless of the GC's timing. Now you could say that the framework's event dispatching or handling mechanism can detect this situation. If so, it can also just unsubscribe a strongly-held event listener at that point.
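A minimal sketch of the explicit-unsubscription pattern described above (the subscribe/dispose names are illustrative, and `model.observers` is assumed to be a plain array of listeners):

```js
// Subscribing returns a disposer; whatever tears the view down calls it.
function subscribe(model, view) {
  model.observers.push(view);
  return function dispose() {
    var i = model.observers.indexOf(view);
    if (i !== -1) model.observers.splice(i, 1);
  };
}

var dispose = subscribe(model, view);
// ...when the framework removes the view from the DOM:
dispose(); // the view stops receiving events immediately, GC or no GC
```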

# Katelyn Gadd (11 years ago)

There are some fairly recent es-discuss threads about weak references. I don't know if there is a consensus yet (it is very hard to tell) but some people like Brendan are on record that there are real use cases that require them. I've been pushing hard for them, along with the author of embind. The question is mostly whether solving those problems is worth the cost of exposing GC to content JS (though, if memory serves, there was a claim in one of the discussion threads that you can implement weakrefs without exposing GC - I'm not sure if that was an 'I've figured it out' statement or just a hypothesis).

At present I don't believe WRs will ever make it onto the open web. It seems like there's a huge amount of resistance to the idea that is never going to go away, so any application that needs them is best served by an emscripten-style heap and a manually implemented collector (Boehm, etc.). You could probably achieve some sort of compromise by writing your own user-space collector that walks the reachable JS heap, if you manage to root your JS objects correctly, but I suspect that would only be viable as a codegen strategy for a compiler targeting JS, not something you'd do by hand when writing JS.

For some WR use cases you can probably do manual refcounting yourself, as long as you do it right - you'd want to replace the actual object references with a lightweight 'handle' object that forwards onto the real instance via a handle lookup, so that you can 'collect' the actual instance without requiring the handles to die.
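A rough sketch of that handle indirection, under the assumption that a side table maps handle ids to real instances:

```js
// Side table of real instances; callers only ever hold handles.
var instances = new Map(); // handle id -> real object
var nextId = 1;

function makeHandle(obj) {
  var id = nextId++;
  instances.set(id, obj);
  return {
    get: function () {
      return instances.get(id) || null; // null once 'collected'
    },
    release: function () {
      instances.delete(id); // manual 'collection' of the real instance
    }
  };
}

var h = makeHandle({ big: new ArrayBuffer(1 << 20) });
h.release();
console.log(h.get()); // null: the handle survives, the instance is gone
```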

# Mark S. Miller (11 years ago)

On Sun, Jul 6, 2014 at 7:47 AM, Katelyn Gadd <kg at luminance.org> wrote:

There are some fairly recent es-discuss threads about weak references. I don't know if there is a consensus yet (it is very hard to tell) but some people like Brendan are on record that there are real use cases that require them. I've been pushing hard for them, along with the author of embind. The question is mostly whether solving those problems is worth the cost of exposing GC to content JS (though, if memory serves, there was a claim in one of the discussion threads that you can implement weakrefs without exposing GC - I'm not sure if that was an 'I've figured it out' statement or just a hypothesis).

I would be very curious. It seems impossible by definition. Could you (or anyone) please try to find this? Thanks.

What I have claimed is that we can isolate the communications channel that this provides in ways that make it a reasonable (IMO) security risk. Perhaps this is what you are thinking of?

# Filip Pizlo (11 years ago)

I've read this exchange and might be missing context. I'm intrigued by it and want to know more.

Is the main opposition to weak references just the security implications of information revealed by GC? Has anyone quantified how much information is leaked, or proved that this information cannot be obtained through already exposed APIs or language features? I presume it has something to do with detecting if anyone else has a reference to an object.

# Russell Leggett (11 years ago)

Sorry to take this on a tangent from the topic of WeakRefs, but the way I've solved the OP's problem in my own code is by tying anything that needs cleanup to element ids. Any time I need to update the HTML, I go through a central method that crawls that part of the DOM and purges it, using the ids as keys in maps of bindings/widgets. It has worked very well for me.
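A sketch of this pattern (the names are illustrative):

```js
// A registry of bindings keyed by element id, and a central update path
// that purges the outgoing subtree before replacing it.
var bindings = new Map(); // element id -> binding/widget with a destroy()

function replaceHtml(container, newHtml) {
  var nodes = container.querySelectorAll('[id]');
  for (var i = 0; i < nodes.length; i++) {
    var binding = bindings.get(nodes[i].id);
    if (binding) {
      binding.destroy(); // unsubscribe from models, clear timers, etc.
      bindings.delete(nodes[i].id);
    }
  }
  container.innerHTML = newHtml; // safe: no zombies left behind
}
```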

# Till Schneidereit (11 years ago)

On Sun, Jul 6, 2014 at 7:45 PM, Filip Pizlo <fpizlo at apple.com> wrote:

Is the main opposition to weak references just the security implications of information revealed by GC? Has anyone quantified how much information is leaked, or proved that this information cannot be obtained through already exposed APIs or language features? I presume it has something to do with detecting if anyone else has a reference to an object.

Security is one concern, but I think that Mark's proposal covers this with the "only collect weakrefs between turns" semantics.

I CC'd a few people who voiced strong opposition on our dev mailing list. Posts containing arguments for their position are:

And an argument for alternative solutions to common weakref use cases:

There's a lot more in that thread, but I think this roughly covers the main arguments against weakrefs.

# Filip Pizlo (11 years ago)

Thanks for gathering those links. I rather like Mark's proposal. Does anyone believe that there are security holes if we do the between-turns semantics?

My reading of the linked Mozilla discussions seems to be that some GC implementors think it's hard to get the feature right and that it pessimises the theoretical performance of some algorithms. JVMs have had this feature for a long time, at least one JS engine (JSC) has had it internally for years, and combining weak refs with all manner of exotic GCs is very well understood in the art.

# Boris Zbarsky (11 years ago)

On 7/6/14, 4:11 PM, Filip Pizlo wrote:

My reading of the linked Mozilla discussions seems to be that some GC implementors think it's hard to get the feature right

I'm not sure how you can possibly read groups.google.com/forum/#!msg/mozilla.dev.tech.js-engine.internals/V__5zqll3zc/hLJiNqd8Xq8J that way. That post isn't even from a GC implementor and says nothing about implementation issues!

I think that post presents the strongest argument I know against the "use GC to reclaim your non-memory resources" approach: while it looks promising at first glance, in practice it leads to resources not being reclaimed when they should be, because the GC is not aiming for whatever sort of resource management those particular resources want.

# Jussi Kalliokoski (11 years ago)

To first address the particular case of using weak maps for custom event listeners via iteration:

I think the only relatively sane approach to iterating a WeakMap would be to force GC whenever the WeakMap is being iterated. This would make sure that you couldn't get references to items that are about to be garbage-collected (and thus wouldn't introduce non-deterministic errors and memory leaks from event listeners firing on disposed views). However, this would make iterating a WeakMap potentially unbearably slow and thus not worth using for this case. The performance hit might be reduced by traversing the reference tree only from the items contained in the WeakMap, but I'm not sure that's feasible, and it would probably make performance worse if the WeakMap is large and holds a lot of live resources. Another drawback is that this would invite abuse: for example, all views could be stored in a WeakMap, and the WeakMap then iterated just to force GC on the views.

The linked discussion thread also proposes using weakrefs for DOM event listeners, but I'm not sure that's a workable solution either. You'd have a weak reference locally, but the DOM event listener would still hold a strong reference to the function. You could of course add a weak addEventListener variant, but soon you'd notice that you also need a weak setTimeout, setInterval, requestAnimationFrame, Object.observe and maybe even weak promises. :/
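A sketch of the problem, using the then-proposed WeakRef API purely for illustration:

```js
// WeakRef was only a proposal when this thread was written; it is used
// here to show why a local weak reference doesn't help.
var button = document.querySelector('button');
var handler = function () { /* closes over some view */ };
var weakHandler = new WeakRef(handler);

button.addEventListener('click', handler); // the DOM holds a STRONG ref
handler = null; // our strong reference is gone...
// ...but the listener (and everything in its closure) stays alive until
// someone calls removeEventListener -- exactly the manual cleanup that
// weakrefs were supposed to make unnecessary.
```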

All in all, I'm doubtful that weak references can solve the use cases presented very well. They would basically encourage people to start building frameworks that use weakrefs instead of lifecycle hooks only to notice that there's some part of the platform where they need manual reference clearing anyway. The solution, I think, is to just use frameworks and libraries like angular and react that provide these lifecycle hooks and take care that these hooks are triggered for you, instead of having to manually call a destroy method.

# Katelyn Gadd (11 years ago)

Jussi, one thing about your (totally correct) statements here is that you're addressing this from the perspective of 'I want to observe GC reliably from user code'. But that's not really what is desired in most cases.

For example, forcing a GC whenever iterating the weakmap would ensure you don't get a reference to an 'effectively dead' object, but nobody is likely to want that in most cases. The point isn't that you don't process objects that are about to die; the point is that weakrefs ensure that the GC can collect these object graphs that otherwise form uncollectable cycles.

Once the GC can collect them, the additional layer you want on top is that collections like a weakmap don't expose dead - already collected - objects to the user. It's fine if iteration yields an object that is about to be collected; in fact, it is probably good if it does. Anyone making use of weak references should, as a cost of entry, expect nondeterminism. The point is specifically that weakrefs ensure the GC can collect your object graphs, and they allow you to respond correctly once a graph is collected.

This is also why the 'use lifecycle hooks and/or manual reference clearing instead' solution isn't an alternative to weakrefs. It solves some use cases that you might otherwise solve with weakrefs, but it does so at the cost of considerable manual effort (and bugs/leaks when manual lifetime management is done incorrectly). For the use cases that can't be reliably solved by manual lifetime management, you still need weakrefs.

Similarly, it's important to realize that while some use cases for weakrefs are about managing native resources or doing other 'automatic cleanup' behaviors, many use cases are simply about ensuring that the GC can free up large graphs of dead objects as soon as memory pressure strikes instead of waiting until the (likely fragile or slow) user-space collector gets around to running and collecting user-space objects. Memory pressure is something the browser and JS VM have knowledge of that userspace doesn't know about - if a graph is effectively dead but can't easily have its references cleaned up automatically (as can happen in complex object layouts, where you would normally use refcounting or some other mechanism), it's possible it could remain 'alive' for a long period of time without weakrefs, eating up valuable heap, moving between GC generations, and slowing GCs.

WRs also enable safe interaction with third-party JS that isn't generally possible otherwise. This occurred to me the other day after I suggested the idea of a user-space JS collector that walks the visible JS heap from roots - you can't walk closures, so any JS object held in a closure would escape the sight of your collector (there are other problems, but this is the most obvious one).

Closures are used heavily in modern JS, and have the ability to retain references to a JS object. It becomes non-trivial to figure out the lifetime of a given closure and know when you need to manually release any resources it relies on, whether a graphics context or a big buffer in an asm.js heap. For a simple use case like a setInterval handler, you can manually clean up when removing the setInterval handler, but what if you have 3 different event listeners that all hold a reference to that resource in their closure? How do you clean those up at the appropriate time? The only vaguely reliable answer here is 'every consumer of my library has to painstakingly increment/decrement reference counts any time they retain a reference to my objects', which is not just tedious but extremely easy to mess up. This is further complicated by the fact that currently V8 and SpiderMonkey closures have the ability to capture references to values that are never actually used within the function, so JS that seems like it shouldn't retain an object actually retains it.
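A sketch of that situation (the names and handlers are illustrative):

```js
// Three listeners each capture `buffer`; its lifetime is the union of
// three listener lifetimes that no single component owns. The draw()
// and resize() helpers are assumed.
var canvas = document.querySelector('canvas');
var buffer = new ArrayBuffer(64 << 20); // e.g. a big asm.js-style heap

canvas.addEventListener('mousedown', function (e) { draw(buffer, e); });
canvas.addEventListener('mousemove', function (e) { draw(buffer, e); });
window.addEventListener('resize', function () { resize(buffer); });

// Freeing `buffer` means removing all three listeners at the right
// moment -- but which owner is responsible, and when?
```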

As before, manual lifetime management - where possible - is king, but there are far too many scenarios where it's either near-impossible or far too difficult to make it your only option. A combination of manual lifetime management + weakrefs for corner cases is the ideal approach here (and is in fact the approach used in some desktop scenarios), in my opinion. If we want to have robust, widespread manual lifetime management, people will either need to adopt non-JS languages that compile to JS (ensuring that all the elaborate lifetime management rules are followed), or JS needs to expose construct(s) to simplify lifetime management (C#-style using, Python-style 'with resource' blocks, C++ scoped RAII). Even then, doing it in user space still requires all the JS running in your application to conform to these rules - once you pull in third-party code, or run user scripts, your lifetime management is vulnerable to leaks if that outsider doesn't carefully follow the rules.
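For illustration, a user-space helper in the spirit of those constructs might look like this; it is a sketch, not a proposal, and openGraphicsContext()/dispose() are assumed names:

```js
// A scoped-lifetime helper in the spirit of C#'s `using` / Python's `with`.
function using(resource, body) {
  try {
    return body(resource);
  } finally {
    resource.dispose(); // released on every exit path, even on throw
  }
}

using(openGraphicsContext(), function (ctx) {
  ctx.drawScene();
}); // ctx has been disposed here -- but only if all code plays along
```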

# Till Schneidereit (11 years ago)

I largely agree with your arguments, but one point is actually more of a counterargument to having weakrefs:

On Mon, Jul 7, 2014 at 10:41 AM, Katelyn Gadd <kg at luminance.org> wrote:

Similarly, it's important to realize that while some use cases for weakrefs are about managing native resources or doing other 'automatic cleanup' behaviors, many use cases are simply about ensuring that the GC can free up large graphs of dead objects as soon as memory pressure strikes instead of waiting until the (likely fragile or slow) user-space collector gets around to running and collecting user-space objects. Memory pressure is something the browser and JS VM have knowledge of that userspace doesn't know about - if a graph is effectively dead but can't easily have its references cleaned up automatically (as can happen in complex object layouts, where you would normally use refcounting or some other mechanism), it's possible it could remain 'alive' for a long period of time without weakrefs, eating up valuable heap, moving between GC generations, and slowing GCs.

While this is true, I think that, as others have argued in the discussion thread I linked to and elsewhere, weakrefs are a bad solution for this. The GC cannot distinguish between different types of resources and their freshness, so they'll just blow away everything they can. In most real-world cases, you'd want to take into account both how frequently and how recently a resource is/was accessed. And, of equal importance, how expensive it is to re-create. You can easily have a large, easily re-created buffer that you'd want to dump at the slightest hint of memory pressure (maybe even to prevent costly GCs from running?), while at the same time you have small-ish objects that are expensive to re-create, so you'd only do so under serious memory pressure.

All in all, I think the platform should expose tiered memory pressure notifications, regardless of whether weakrefs are introduced for other reasons.
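Purely for illustration, a tiered notification might look something like this; the event name and levels are invented for the sketch, as no such API exists:

```js
// INVENTED API: 'memorypressure' and its `level` values are made up
// here to show what "tiered" could mean in practice.
window.addEventListener('memorypressure', function (e) {
  if (e.level === 'moderate') {
    dropCheapCaches();     // large but easily re-created buffers go first
  } else if (e.level === 'critical') {
    dropExpensiveCaches(); // costly-to-rebuild objects only under duress
  }
});
```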

# Katelyn Gadd (11 years ago)

On Mon, Jul 7, 2014 at 2:05 AM, Till Schneidereit <till at tillschneidereit.net> wrote:

While this is true, I think that, as others have argued in the discussion thread I linked to and elsewhere, weakrefs are a bad solution for this. The GC cannot distinguish between different types of resources and their freshness, so they'll just blow away everything they can. In most real-world cases, you'd want to take into account both how frequently and how recently a resource is/was accessed. And, of equal importance, how expensive it is to re-create. You can easily have a large, easily re-created buffer that you'd want to dump at the slightest hint of memory pressure (maybe even to prevent costly GCs from running?), while at the same time you have small-ish objects that are expensive to re-create, so you'd only do so under serious memory pressure.

All in all, I think the platform should expose tiered memory pressure notifications, regardless of whether weakrefs are introduced for other reasons.

Maybe I misunderstand, but you seem to be talking about caching? I'm talking about scenarios where the userspace code can't trivially verify whether an object is dead, and is okay with waiting until the next time the GC collects. Resource freshness and resource type don't matter in this case. The object just needs to be dead. I absolutely agree that weakrefs are not a solution for caching or pooling. My comments are in reference to scenarios where it is non-trivial to identify a point where your object graph is dead so you can go in and break references. In those scenarios you might use something like refcounting in order to ensure that no one component has to be responsible for deciding 'okay, it's dead now', and then you are subject to the types of leaks that occur when using refcounts as your lifetime management strategy (especially since you don't have WRs, which would otherwise mitigate the risk of leaks caused by refcounting+cycles.)
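A minimal sketch of the refcounting failure mode mentioned here: a cycle keeps both counts above zero, so release() never frees anything:

```js
function RefCounted() { this.refs = 1; }
RefCounted.prototype.retain = function () { this.refs++; };
RefCounted.prototype.release = function () {
  if (--this.refs === 0) this.destroy();
};
RefCounted.prototype.destroy = function () { /* free resources */ };

var a = new RefCounted(), b = new RefCounted();
a.other = b; b.retain(); // a -> b
b.other = a; a.retain(); // b -> a: a cycle

a.release(); b.release(); // both counts stay at 1: destroy() never runs
```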

Memory pressure notifications are a neat idea but seem like they expose their own GC visibility and fingerprinting concerns. They would at least provide a good opportunity to trigger your own user-space garbage collections, as long as they can occur during an event loop turn instead of having to wait until the next one. If you can't get a pressure notification while a turn is going (as a result of your allocations, etc), that would hurt pressure notifications' viability as anything other than a way to respond to memory usage changes in other tabs/applications.

# Till Schneidereit (11 years ago)

On Mon, Jul 7, 2014 at 11:21 AM, Katelyn Gadd <kg at luminance.org> wrote:

Maybe I misunderstand, but you seem to be talking about caching? I'm talking about scenarios where the userspace code can't trivially verify whether an object is dead, and is okay with waiting until the next time the GC collects.

Ah, no, you didn't - I misunderstood your argument and did indeed think it was about caching. I'm still hesitant about this particular argument because it seems like your framework would still have issues with delayed cleanup if it relied on GC to do that. I know, however, that in practice it's Hard to ensure that all references in a complex system are properly managed (especially in scenarios involving third-party code as you describe), so I also don't think this can be outright dismissed.

Memory pressure notifications are a neat idea but seem like they expose their own GC visibility and fingerprinting concerns. They would at least provide a good opportunity to trigger your own user-space garbage collections, as long as they can occur during an event loop turn instead of having to wait until the next one. If you can't get a pressure notification while a turn is going (as a result of your allocations, etc), that would hurt pressure notifications' viability as anything other than a way to respond to memory usage changes in other tabs/applications.

That's a good point, and I'm pretty sure that notifications delivered during a turn (job) just won't happen. You're right about the security and privacy concerns, I think. More fundamentally, such notifications would violate run-to-completion semantics, so I don't see how they could even work. I think in-turn collection of weakrefs would be impossible for the same reason, at least if their post-mortem finalizers are also supposed to run in-turn.

# Jussi Kalliokoski (11 years ago)

On Mon, Jul 7, 2014 at 12:44 PM, Till Schneidereit <till at tillschneidereit.net> wrote:

Ah, no, you didn't - I misunderstood your argument and did indeed think it was about caching. I'm still hesitant about this particular argument because it seems like your framework would still have issues with delayed cleanup if it relied on GC to do that. I know, however, that in practice it's Hard to ensure that all references in a complex system are properly managed (especially in scenarios involving third-party code as you describe), so I also don't think this can be outright dismissed.

True. However, I think the non-determinism will not help the situation in a complex system, as it can introduce more leaks. For example, the custom event handler scenario can trigger handlers that would otherwise be dead, and those handlers might cause other things to become active again, so reasoning about this requires an even deeper understanding of the system than manual cleanup does. I find this similar to the null pointer exceptions caused when somebody else cleans up your stuff but forgets to tell you; the way I see it, it's just replacing one class of problems with another.

Still, I also acknowledge that weak references have their place in making systems easier to reason about. WeakMap already solves a lot of the problems caused by not knowing the lifecycle of a (possibly 3rd-party) closure. For example, if the closure holds some state associated with an object provided as an input, it can use the object as the key, and then the GC can just do its job, as the closure holds no strong references to its inputs or outputs. I'm just not very convinced that adding features that make GC observable solves any problems big enough to justify the problems caused.
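A sketch of that WeakMap use (the names are illustrative):

```js
// Per-object state keyed by the object itself: the closure pins nothing.
var stateFor = new WeakMap();

function process(obj) {
  var state = stateFor.get(obj);
  if (!state) {
    state = { calls: 0 };
    stateFor.set(obj, state); // the entry dies with obj, automatically
  }
  state.calls += 1;
  return state.calls;
}
// When the caller drops its last reference to obj, the WeakMap entry
// (and the state) become collectable -- no cleanup protocol needed.
```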

# Allen Wirfs-Brock (11 years ago)

On Jul 6, 2014, at 6:49 PM, Boris Zbarsky wrote:

On 7/6/14, 4:11 PM, Filip Pizlo wrote:

My reading of the linked Mozilla discussions seems to be that some GC implementors think it's hard to get the feature right

I'm not sure how you can possibly read groups.google.com/forum/#!msg/mozilla.dev.tech.js-engine.internals/V__5zqll3zc/hLJiNqd8Xq8J that way. That post isn't even from a GC implementor and says nothing about implementation issues!

Well, I might disagree with part of this characterization ;-) www.wirfs-brock.com/allen/things/smalltalk-things/tektronix-smalltalk-document-archi BTW, George Bosworth, who I mentioned in that message, is the person who originally came up with the Ephemeron idea.

I think that post presents the strongest argument I know against the "use GC to reclaim your non-memory resources" approach: while it looks promising at first glance, in practice it leads to resources not being reclaimed when they should be, because the GC is not aiming for whatever sort of resource management those particular resources want.

My position reflects the experience not only of implementing high-perf GCs that include support for various kinds of weak references, but also of observing and supporting the uses of a full-stack commercial application development environment that exposed them. They just aren't the secret sauce that people expect them to be, and this leads to obscure application-level bugs and memory leaks which often go completely unnoticed.

# Boris Zbarsky (11 years ago)

On 7/9/14, 12:21 PM, Allen Wirfs-Brock wrote:

Well, I might disagree with part of this characterization ;-)

I meant "current JS engine GC implementor", sorry. The important part is that your post is not about implementation difficulties in current JS GC implementations but about something entirely different.