Observable GC

# Michał Wadas (7 years ago)

Is there any comprehensive answer about why ability to observe garbage collection is considered undesirable in JavaScript?

Michał Wadas

# Mike Samuel (7 years ago)

IIRC, most sources of non-determinism are obvious:

  • Builtins like Date and Random
  • Host environment artifacts like network message receipt order

and a piece of code that doesn't rely on one of those is deterministic. There are a few sources of non-determinism that are hard to statically rule out:

  • Catchable exceptions due to lack of memory or stack overflow

but you can still reason about fail-stop deterministic code.

If GC is observable then the set of statements that are obviously deterministic by either metric is much smaller.

Sources of non-determinism are also sources of hard-to-diagnose heisenbugs. I don't know too much about how carving out a deterministic subset of the language might be handy otherwise, but non-determinism and side-channels often go hand in hand.

# Domenic Denicola (7 years ago)

w3ctag.github.io/design-principles/#js-gc


# Filip Pizlo (7 years ago)

For what it’s worth, I have never agreed with this policy. This policy seems to be based on feelings not facts.

I remember implementing real time GCs for Java, which changed GC timing and behavior a lot, and having zero problem getting that aspect of the GC to work well with existing code. It seems like we are using non-problems to make excuses to avoid supporting something useful.

In fact, WeakMap is a more restrictive constraint on the GC algorithm than weak refs or finalization or whatever, since it means that a Siebert-style fine-grained incremental GC with O(1) increments is off the table.
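
For readers unfamiliar with the ephemeron constraint being referred to here, a minimal sketch of why WeakMap entries cannot be decided one at a time (the objects and names are illustrative, not from the thread):

```js
// Ephemeron semantics: a WeakMap value is retained only while BOTH the map
// and that entry's key are reachable. Chained entries mean the collector
// cannot decide each entry in isolation; it has to iterate to a fixpoint.
const wm = new WeakMap();

let k1 = {};
{
  const k2 = {};
  wm.set(k1, k2);                       // k2 is kept alive only while k1 is
  wm.set(k2, { payload: "expensive" }); // ...and this value only while k2 is
}

// Marking has to discover k2 through the first entry before it can mark the
// second entry's value, which is the property that is hostile to collectors
// doing O(1) work per increment.
k1 = null; // once k1 dies, the whole chain becomes collectible at once
```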

# Mike Samuel (7 years ago)

On Fri, Oct 20, 2017 at 10:33 AM, Filip Pizlo <fpizlo at apple.com> wrote:

For what it’s worth, I have never agreed with this policy. This policy seems to be based on feelings not facts.

I remember implementing real time GCs for Java, which changed GC timing and behavior a lot, and having zero problem getting that aspect of the GC to work well with existing code. It seems like we are using non-problems to make excuses to avoid supporting something useful.

In fact, WeakMap is a more restrictive constraint on the GC algorithm than weak refs or finalization or whatever, since it means that a Siebert-style fine-grained incremental GC with O(1) increments is off the table.

I'm not familiar with Siebert GCs so I apologize if this is beside your point. My recollection of those discussions was that we rejected weak refs in favor of ephemerons because weak references are still prone to uncollectible cycles that involve a weakly referenced object being used as both a key and a value.
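
To make the cycle being described concrete, a small illustrative sketch (the names are made up; the point is the key/value cycle):

```js
// An object used both as a key and inside the value stored for it. With
// plain weak references plus strongly held values, the value keeps the key
// alive, the live key keeps the entry alive, and the pair is never
// collectible. Ephemerons (WeakMap) break this: the value is traced only if
// the key is otherwise reachable.
const wm = new WeakMap();

let node = { name: "session" };
wm.set(node, { owner: node, meta: "per-object metadata" }); // value points back at its key

node = null;
// With WeakMap the entry is now collectible despite the key <-> value cycle.
// A design that held values strongly would leak this pair forever.
```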

# Filip Pizlo (7 years ago)

On Oct 20, 2017, at 7:45 AM, Mike Samuel <mikesamuel at gmail.com> wrote:

On Fri, Oct 20, 2017 at 10:33 AM, Filip Pizlo <fpizlo at apple.com> wrote: For what it’s worth, I have never agreed with this policy. This policy seems to be based on feelings not facts.

I remember implementing real time GCs for Java, which changed GC timing and behavior a lot, and having zero problem getting that aspect of the GC to work well with existing code. It seems like we are using non-problems to make excuses to avoid supporting something useful.

In fact, WeakMap is a more restrictive constraint on the GC algorithm than weak refs or finalization or whatever, since it means that a Siebert-style fine-grained incremental GC with O(1) increments is off the table.

I'm not familiar with Siebert GCs so I apologize if this is beside your point. My recollection of those discussions was that we rejected weak refs in favor of ephemerons because weak references are still prone to uncollectible cycles that involve a weakly referenced object being used as both a key and a value.

It’s better to have both. Some use cases are not covered by WeakMap, as evidenced by the fact that people still ask for weak refs or a gc trigger or notification.
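
One example of a use case WeakMap cannot express is weakness on the value side with a primitive key, which is roughly what people asking for weak refs want. A minimal sketch using the WeakRef / FinalizationRegistry API that was standardized well after this thread (treat the cache shape and names as illustrative):

```js
// A memoizing cache that must not, by itself, keep its entries alive.
const cache = new Map(); // string -> WeakRef<object>

const registry = new FinalizationRegistry((key) => {
  // Runs some unspecified time after the cached object is collected; that
  // unspecified timing is exactly the non-determinism debated in this thread.
  cache.delete(key);
});

function getThumbnail(url) {
  const hit = cache.get(url)?.deref();
  if (hit) return hit;
  const thumb = { url, pixels: new Uint8Array(64 * 64) }; // stand-in for real work
  cache.set(url, new WeakRef(thumb));
  registry.register(thumb, url);
  return thumb;
}
```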

# Dean Tribble (7 years ago)

I do think that we need weak references for all the reasons given in the proposal. But indeed non-determinism is a concern. The reference Domenic pointed at is one of two primary (areas of) considerations. The other is just how many amazing things you can do if turns are deterministic (more reliable testing, replay debugging, checkpoint/retry in the event of component failure, simpler correctness analysis on code, etc.).

Exposing non-determinism only at turn boundaries and controlling access to the ability to observe GC both help some with the first motivation above (and a lot with the second). However, not enough. I'm hoping to make another run at weakrefs in November with some changes to help concern #1 further.

# Mark Miller (7 years ago)

There is a glaring inconsistency in the criteria we use to evaluate these issues. While we are understandably reluctant to admit more non-determinism into the platform via weakrefs, we have admitted an astonishingly greater degree of non-determinism into the platform via "Shared Array Buffers" (SAB), i.e., shared memory multithreading with data races.

The scenario we legitimately fear for weakrefs: A developer writes code that is not correct according to the weakref specification but happens to work on present implementations. Perhaps it relies on something being collected that is not guaranteed to be collected. Perhaps it relies on something not being collected that is not guaranteed not to be collected. A later correct change to an implementation, or another correct implementation, causes that code to break. The game theory punishes the correct implementation rather than the incorrect code.

The parallel scenario for data races is clear. And on some measures, the magnitude of the problem is orders of magnitude larger for SAB because of

  • the fine granularity of non-deterministic interleaving (especially if observable gc interleaving happens only at turn boundaries),
  • the greater difficulty of reasoning about memory models vs observable gc (again, especially if observable gc interleaving happens only at turn boundaries).

At least we all already understand that the side channel information-leakage opened up by SAB is orders of magnitude larger than that opened up by weakrefs, and so this is not generally raised as an additional argument against weakrefs for a platform that has already admitted SAB.

I will start a separate thread on making the computation within an individual turn more deterministic. SAB aside, the threats to intra-turn determinism can and should be reduced. All the arguments against non-determinism in general should be at least as strong against intra-turn non-determinism specifically.

# Filip Pizlo (7 years ago)

On Oct 20, 2017, at 10:29 AM, Mark Miller <erights at gmail.com> wrote:

There is a glaring inconsistency in the criteria we use to evaluate these issues. While we are understandably reluctant to admit more non-determinism into the platform via weakrefs, we have admitted an astonishingly greater degree of non-determinism into the platform via "Shared Array Buffers" (SAB), i.e., shared memory multithreading with data races.

The scenario we legitimately fear for weakrefs: A developer writes code that is not correct according to the weakref specification but happens to work on present implementations. Perhaps it relies on something being collected that is not guaranteed to be collected.

Being collected when it shouldn’t have been? Like a dangling reference. The game theory of security exploits forces implementations to keep things alive longer, not shorter.

Perhaps it relies on something not being collected that is not guaranteed not to be collected. A later correct change to an implementation, or another correct implementation, causes that code to break. The game theory punishes the correct implementation rather than the incorrect code.

Java had weak refs and multiple different implementations. My claim, as someone who implemented lots of weird GC algorithms in Java, is that I don’t know of a case where different weak ref semantics breaks something. The only time that getting it wrong ever had an observably bad effect is when you break weak refs or finalizers so badly that they never get cleared or called, and then some resource has an unbounded leak. This usually results in not being able to run any benchmarks that have weak references or finalizers, so you fix those bugs pretty quickly.

Here are the motivations:

  • Competitive perf motivates GC authors to try to free things as soon as possible. Smaller heaps mean more speed. Some benchmarks won’t run to completion if you aren’t aggressive enough.
  • The need to avoid dangling references forces us to keep alive at least what we need to, and sometimes a bit more.

I guess a program could rely on the weak references actually being strong in some implementation. I haven’t heard of Java programs ever doing that. It’s unlikely to happen because the major implementations will try to clear weak refs as aggressively as they can to compete on benchmarks.

Perhaps part of the problem here is that we’re assuming that GC is already not observable. It’s totally observable. We have to fix bugs sometimes because the GC didn’t free something it should have, and the memory footprint of an app balloons so much that it crashes. Also, people obsess over GC perf and GC pauses exactly because they observe them. Considering all of the existing ways that GCs are observable, I don’t think that weak refs change the game.

I think there are features of the web platform that are significantly more likely to create incompatibilities than this issue, and you seem to agree:

The parallel scenario for data races is clear. And on some measures, the magnitude of the problem is orders of magnitude larger for SAB because of

  • the fine granularity of non-deterministic interleaving (especially if observable gc interleaving happens only at turn boundaries),
  • the greater difficulty of reasoning about memory models vs observable gc (again, especially if observable gc interleaving happens only at turn boundaries).

At least we all already understand that the side channel information-leakage opened up by SAB is orders of magnitude larger than that opened up by weakrefs, and so this is not generally raised as an additional argument against weakrefs for a platform that has already admitted SAB.

I will start a separate thread on making the computation within an individual turn more deterministic. SAB aside, the threats to intra-turn determinism can and should be reduced. All the arguments against non-determinism in general should be at least as strong against intra-turn non-determinism specifically.

Can you clarify what you think this means for weak refs?

# Mark S. Miller (7 years ago)

On Fri, Oct 20, 2017 at 10:52 AM, Filip Pizlo <fpizlo at apple.com> wrote:

On Oct 20, 2017, at 10:29 AM, Mark Miller <erights at gmail.com> wrote:

There is a glaring inconsistency in the criteria we use to evaluate these issues. While we are understandably reluctant to admit more non-determinism into the platform via weakrefs, we have admitted an astonishingly greater degree of non-determinism into the platform via "Shared Array Buffers" (SAB), i.e., shared memory multithreading with data races.

The scenario we legitimately fear for weakrefs: A developer writes code that is not correct according to the weakref specification but happens to work on present implementations. Perhaps it relies on something being collected that is not guaranteed to be collected.

Being collected when it shouldn’t have been? Like a dangling reference. The game theory of security exploits forces implementations to keep things alive longer, not shorter.

Not what I meant. Being collected when it should not be would clearly violate the spec. Rather, the weakref spec never guarantees that anything is collected. Any program whose correct behavior ever relies on anything being observably collected is an incorrect program. When run on a different correct implementation --- one on which those objects never get observably collected --- it will misbehave.
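
A sketch of the kind of incorrect program being described, written against the finalization API that was standardized later (illustrative only):

```js
// Incorrect by construction: the spec never guarantees that anything is
// observably collected, so this "works" on an engine that happens to collect
// promptly and stalls forever on an equally correct engine that never does.
let resolveReleased;
const released = new Promise((resolve) => { resolveReleased = resolve; });

const registry = new FinalizationRegistry(() => resolveReleased());

let temp = { handle: "some native resource" };
registry.register(temp, undefined);
temp = null; // the object MAY be collected; an engine is free never to do so

released.then(() => console.log("resource reclaimed, continuing")); // may never run
```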

Perhaps it relies on something not being collected that is not guaranteed not to be collected. A later correct change to an implementation, or another correct implementation, causes that code to break. The game theory punishes the correct implementation rather than the incorrect code.

Java had weak refs and multiple different implementations. My claim, as someone who implemented lots of weird GC algorithms in Java, is that I don’t know of a case where different weak ref semantics breaks something. The only time that getting it wrong ever had an observably bad effect is when you break weak refs or finalizers so badly that they never get cleared or called, and then some resource has an unbounded leak. This usually results in not being able to run any benchmarks that have weak references or finalizers, so you fix those bugs pretty quickly.

Here are the motivations:

  • Competitive perf motivates GC authors to try to free things as soon as possible. Smaller heaps mean more speed. Some benchmarks won’t run to completion if you aren’t aggressive enough.

This causes only a tendency to collect more as a quality-of-implementation issue. And even for that, it is at best a statistical issue. There is never a guarantee that anything in particular will get collected.

  • The need to avoid dangling references forces us to keep alive at least what we need to, and sometimes a bit more.

"Need to" in the sense of strongly referenced, absolutely. Any implementation that fails to do so seriously violates the spec and more. There is no controversy on this point.

The concern is "need to" in the sense that a given program points at something only weakly at t1, fetches it at t2 and finds it still there, and comes to count on that behavior. A different correct implementation might cause this same program to find that the object has been collected before t2, causing this buggy program to break.
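
The t1/t2 pattern, sketched with the WeakRef API that was standardized later (illustrative names; the bug is counting on deref() succeeding in a later turn):

```js
let config = { retries: 3 };
const weakConfig = new WeakRef(config); // t1: from here on, only weakly referenced
config = null;

setTimeout(() => {
  // t2, a later turn: buggy -- it assumes the object survived. On another
  // correct implementation deref() may return undefined and this line throws.
  console.log(weakConfig.deref().retries);
}, 1000);
```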

I guess a program could rely on the weak references actually being strong in some implementation. I haven’t heard of Java programs ever doing that. It’s unlikely to happen because the major implementations will try to clear weak refs as aggressively as they can to compete on benchmarks.

Perhaps part of the problem here is that we’re assuming that GC is already not observable. It’s totally observable. We have to fix bugs sometimes because the GC didn’t free something it should have, and the memory footprint of an app balloons so much that it crashes. Also, people obsess over GC perf and GC pauses exactly because they observe them. Considering all of the existing ways that GCs are observable, I don’t think that weak refs change the game.

I can only see this argument becoming solid if we were willing to consider obligating the platform to do accurate gc, rather than conservative gc. 1) no one is willing to advocate that, including me. 2) Even if we were, the timing would still cause all the same issues.

I think there are features of the web platform that are significantly more likely to create incompatibilities than this issue, and you seem to agree:

The parallel scenario for data races is clear. And on some measures, the magnitude of the problem is orders of magnitude larger for SAB because of

  • the fine granularity of non-deterministic interleaving (especially if observable gc interleaving happens only at turn boundaries),
  • the greater difficulty of reasoning about memory models vs observable gc (again, especially if observable gc interleaving happens only at turn boundaries).

At least we all already understand that the side channel information-leakage opened up by SAB is orders of magnitude larger than that opened up by weakrefs, and so this is not generally raised as an additional argument against weakrefs for a platform that has already admitted SAB.

Yes, I do agree with that. I am making a comparative argument, not an argument about absolutes.

I will start a separate thread on making the computation within an individual turn more deterministic. SAB aside, the threats to intra-turn determinism can and should be reduced. All the arguments against non-determinism in general should be at least as strong against intra-turn non-determinism specifically.

Can you clarify what you think this means for weak refs?

I will in that other thread ;)

# Steve Fink (7 years ago)

On 10/20/17 10:52 AM, Filip Pizlo wrote:

On Oct 20, 2017, at 10:29 AM, Mark Miller <erights at gmail.com> wrote:

There is a glaring inconsistency in the criteria we use to evaluate these issues. While we are understandably reluctant to admit more non-determinism into the platform via weakrefs, we have admitted an astonishingly greater degree of non-determinism into the platform via "Shared Array Buffers" (SAB), i.e., shared memory multithreading with data races.

The scenario we legitimately fear for weakrefs: A developer writes code that is not correct according to the weakref specification but happens to work on present implementations. Perhaps it relies on something being collected that is not guaranteed to be collected.

Being collected when it shouldn’t have been?  Like a dangling reference.  The game theory of security exploits forces implementations to keep things alive longer, not shorter.

Perhaps it relies on something not being collected that is not guaranteed not to be collected. A later correct change to an implementation, or another correct implementation, causes that code to break. The game theory punishes the correct implementation rather than the incorrect code.

Java had weak refs and multiple different implementations.  My claim, as someone who implemented lots of weird GC algorithms in Java, is that I don’t know of a case where different weak ref semantics breaks something.  The only time that getting it wrong ever had an observably bad effect is when you break weak refs or finalizers so badly that they never get cleared or called, and then some resource has an unbounded leak.  This usually results in not being able to run any benchmarks that have weak references or finalizers, so you fix those bugs pretty quickly.

Here are the motivations:

  • Competitive perf motivates GC authors to try to free things as soon as possible.  Smaller heaps mean more speed.  Some benchmarks won’t run to completion if you aren’t aggressive enough.

I don't follow this. My GC optimization work usually pushes in the opposite direction -- scanning less, not more (but hopefully not collecting much less). We [spidermonkey] partition the heap in all kinds of ways so we don't have to collect the whole thing all the time. It's partitioned into processes, the processes have thread-local heaps, and the thread-local heaps are partitioned into independently-collectable zones specific to various purposes (in the web browser, they're for tabs, iframes, and some internal things.) It doesn't seem unlikely to have a weakref in a lightly-used zone pointing into a more active zone. So yes, we'd aggressively collect the pointee zone to keep the heap small, but scanning the pointer zones is a waste of time. And we're always looking for more ways to subdivide the heap, given that the overhead of GC is mostly determined by the amount of live stuff you have to scan.

Generational GC similarly partitions the heap, for the same reason. If nothing is surviving minor GCs, you won't bother doing any of the major GCs that would collect the weakref pointees. I have considered (and I believe other engines have implemented) having more generations, by splitting off very long-lived (or alternatively, observed to be read-only) portions of the tenured heap and not scanning those during most major GCs. (I haven't looked enough to decide whether the extra cost and complexity of the write barriers is worth it for us at this point.)

That said, we could treat a weakref as a cue to collect the source and destination zones together. Which would mean weakrefs would be something of a "go slow" bit, but it might help contain the damage.

  • The need to avoid dangling references forces us to keep alive at least what we need to, and sometimes a bit more.

I guess a program could rely on the weak references actually being strong in some implementation.  I haven’t heard of Java programs ever doing that.  It’s unlikely to happen because the major implementations will try to clear weak refs as aggressively as they can to compete on benchmarks.

GC-related competition on benchmarks gets really silly without anyone even intentionally gaming things. I remember making a minor improvement and seeing a benchmark score absolutely plummet. I tracked it down to the benchmark having a "warmup" phase to each subtest (which is important for eliminating variance that prevents detecting small changes, so it's not a wholly bad thing). The minor change shifted several GCs from the unmeasured warmup phase into the measured phase. At which point I realized that much of our parameter tuning, in particular, had been having the effect of hiding GCs under the covers, not necessarily speeding anything up. If a benchmark score started depending on the promptness of weakref cleanup, then you're right, we'll probably end up messing up our heap partitioning to satisfy whatever arbitrary object graph it's testing, at a cost to real-world performance.

We have multiple benchmarks and treat in-the-field telemetry as the gold standard, so the worst of the effects get caught eventually. The length of the feedback loop matters, though.

# Filip Pizlo (7 years ago)

On Oct 26, 2017, at 8:44 AM, Steve Fink <sphink at gmail.com> wrote:

On 10/20/17 10:52 AM, Filip Pizlo wrote:

On Oct 20, 2017, at 10:29 AM, Mark Miller <erights at gmail.com> wrote:

There is a glaring inconsistency in the criteria we use to evaluate these issues. While we are understandably reluctant to admit more non-determinism into the platform via weakrefs, we have admitted an astonishingly greater degree of non-determinism into the platform via "Shared Array Buffers" (SAB), i.e., shared memory multithreading with data races.

The scenario we legitimately fear for weakrefs: A developer writes code that is not correct according to the weakref specification but happens to work on present implementations. Perhaps it relies on something being collected that is not guaranteed to be collected.

Being collected when it shouldn’t have been? Like a dangling reference. The game theory of security exploits forces implementations to keep things alive longer, not shorter.

Perhaps it relies on something not being collected that is not guaranteed not to be collected. A later correct change to an implementation, or another correct implementation, causes that code to break. The game theory punishes the correct implementation rather than the incorrect code.

Java had weak refs and multiple different implementations. My claim, as someone who implemented lots of weird GC algorithms in Java, is that I don’t know of a case where different weak ref semantics breaks something. The only time that getting it wrong ever had an observably bad effect is when you break weak refs or finalizers so badly that they never get cleared or called, and then some resource has an unbounded leak. This usually results in not being able to run any benchmarks that have weak references or finalizers, so you fix those bugs pretty quickly.

Here are the motivations:

  • Competitive perf motivates GC authors to try to free things as soon as possible. Smaller heaps mean more speed. Some benchmarks won’t run to completion if you aren’t aggressive enough.

I don't follow this. My GC optimization work usually pushes in the opposite direction -- scanning less, not more (but hopefully not collecting much less). We [spidermonkey] partition the heap in all kinds of ways so we don't have to collect the whole thing all the time. It's partitioned into processes, the processes have thread-local heaps, and the thread-local heaps are partitioned into independently-collectable zones specific to various purposes (in the web browser, they're for tabs, iframes, and some internal things.) It doesn't seem unlikely to have a weakref in a lightly-used zone pointing into a more active zone. So yes, we'd aggressively collect the pointee zone to keep the heap small, but scanning the pointer zones is a waste of time. And we're always looking for more ways to subdivide the heap, given that the overhead of GC is mostly determined by the amount of live stuff you have to scan.

Generational GC similarly partitions the heap, for the same reason. If nothing is surviving minor GCs, you won't bother doing any of the major GCs that would collect the weakref pointees. I have considered (and I believe other engines have implemented) having more generations, by splitting off very long-lived (or alternatively, observed to be read-only) portions of the tenured heap and not scanning those during most major GCs. (I haven't looked enough to decide whether the extra cost and complexity of the write barriers is worth it for us at this point.)

I think you missed my point. I never said that GCs should scan more things.

That said, we could treat a weakref as a cue to collect the source and destination zones together. Which would mean weakrefs would be something of a "go slow" bit, but it might help contain the damage.

  • The need to avoid dangling references forces us to keep alive at least what we need to, and sometimes a bit more.

I guess a program could rely on the weak references actually being strong in some implementation. I haven’t heard of Java programs ever doing that. It’s unlikely to happen because the major implementations will try to clear weak refs as aggressively as they can to compete on benchmarks.

GC-related competition on benchmarks gets really silly without anyone even intentionally gaming things. I remember making a minor improvement and seeing a benchmark score absolutely plummet. I tracked it down to the benchmark having a "warmup" phase to each subtest (which is important for eliminating variance that prevents detecting small changes, so it's not a wholly bad thing). The minor change shifted several GCs from the unmeasured warmup phase into the measured phase. At which point I realized that much of our parameter tuning, in particular, had been having the effect of hiding GCs under the covers, not necessarily speeding anything up. If a benchmark score started depending on the promptness of weakref cleanup, then you're right, we'll probably end up messing up our heap partitioning to satisfy whatever arbitrary object graph it's testing, at a cost to real-world performance.

Using benchmarks effectively is a skill.