memory safety and weak references

# David Herman (11 years ago)

Patrick Walton sent me this link to a fascinating approach to exploiting weak references in engines that use conservative stack scanning to discover the address of objects:

https://github.com/justdionysus/gcwoah

I don't fully grok all the details, but IIUC the attacker sprays the heap with objects that it holds weak references to, synthesizes a fake reference as an integer, triggers a conservative GC, and then uses the state of the weak references to figure out which object lived at that address. As a concrete example of how this can be used to do bad things: in conjunction with an exploit that allows jumping to an arbitrary memory location, this would effectively enable arbitrary code execution.
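
For concreteness, here is a rough sketch of the shape of such an attack. The makeWeakRef/forceGC helpers, the WeakRef-style get() API, and the address constant are all hypothetical illustration, not anything an engine actually exposes:

// Hypothetical sketch: discover which sprayed object sits at a guessed address.
var weakRefs = [];
for (var i = 0; i < 100000; i++) {
  weakRefs.push(makeWeakRef({ index: i }));   // spray the heap, keeping only weak refs
}
var guess = 0x08041000;                       // integer chosen to look like a heap address;
                                              // it now sits in a stack slot
forceGC();                                    // e.g. by allocating heavily
// A conservative scanner may treat the bits of `guess` as a pointer and keep the
// object at that address alive, while every other sprayed object is collected.
for (var j = 0; j < weakRefs.length; j++) {
  if (weakRefs[j].get() !== null) {
    // This index tells the attacker which sprayed object lives at the guessed address.
  }
}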

One immediate takeaway: Mark deserves serious kudos, because Dionysus was not able to figure out how to use this attack on WeakMaps. He explicitly mentions the work on WeakMaps and credits them for having been well designed for security. Well done!

But we need to take this into account as we consider what to do about weak references in ES7.

# David Herman (11 years ago)

Interestingly, I wonder if the idea of only collecting weak references between turns is immune to such attacks, since it's not possible to have a bogus reference on the stack between turns, where there is no stack.

# Brendan Eich (11 years ago)

David Herman wrote:

Patrick Walton sent me this link to a fascinating approach to exploiting weak references in engines that use conservative stack scanning to discover the address of objects:

 https://github.com/justdionysus/gcwoah

I don't fully grok all the details, but IIUC the attacker sprays the heap with objects that it holds weak references to, synthesizes a fake reference as an integer, triggers a conservative GC, and then uses the state of the weak references to figure out which object lived at that address. As a concrete example of how this can be used to do bad things: in conjunction with an exploit that allows jumping to an arbitrary memory location, this would effectively enable arbitrary code execution.

Dion did the JITSpray paper at BlackHat 2010:

One immediate takeaway: Mark deserves serious kudos, because Dionysus was not able to figure out how to use this attack on WeakMaps. He explicitly mentions the work on WeakMaps and credits them for having been well designed for security. Well done!

Yes, and somehow Andreas Gal and Andrew McCreight's impl in SpiderMonkey / Firefox resisted Dion's hashtable-growth timing channel attack. Double kudos, even if luck!

(This needs more investigation, though.)

But we need to take this into account as we consider what to do about weak references in ES7.

Definitely.

# Brendan Eich (11 years ago)

Brendan Eich wrote:

Dion did the JITSpray paper at BlackHat 2010:

This paper is very hard to find now! I downloaded a copy, but I'm not sure about protocol. Breadcrumbs that time out for me are at

www.woodmann.com/forum/archive/index.php/t-13412.html

# Oliver Hunt (11 years ago)

On Mar 27, 2013, at 1:56 PM, David Herman <dherman at mozilla.com> wrote:

Interestingly, I wonder if the idea of only collecting weak references between turns is immune to such attacks, since it's not possible to have a bogus reference on the stack between turns, where there is no stack.

If you could induce an integer with a controlled value to be on the stack between turns (not entirely inconceivable), it may be attackable, but once you're talking about a fixed number of samples per turn, I suspect the time required renders the attack infeasible.

That said, I believe this does kill any dreams I may have had w.r.t. primitive-keyed WeakMaps, kudos to MarkM.

# Sam Tobin-Hochstadt (11 years ago)

On Tue, Mar 26, 2013 at 11:44 PM, Oliver Hunt <oliver at apple.com> wrote:

That said, I believe this does kill any dreams I may have had w.r.t. primitive-keyed WeakMaps, kudos to MarkM.

Wouldn't a primitive-keyed WeakMap just be a strong Map for those keys? And therefore immune to any GC attacks?

# David Bruant (11 years ago)

On 27/03/2013 01:55, David Herman wrote:

But we need to take this into account as we consider what to do about weak references in ES7.

From what I understand, doing exact rooting (instead of conservative stack scanning) solves the problem, or more precisely prevents the attack by design (because the attack relies on numbers being interpreted as pointer addresses). Assuming I understand correctly (and tell me if I don't), this is more an attack based on an implementation detail than an attack based on the inclusion of weak references in the language, so I'm puzzled as to why this attack should be taken into account when discussing the inclusion of weak references.

Over the last month, after Opera announced moving to WebKit, people on Twitter have gone round and round about the WebKit monoculture and how making spec decisions based on specific implementations is a bad thing ("if specs followed the WebKit implementation, we couldn't have parallel rendering engines like Servo", etc.). I don't see why that would be any better at the ECMAScript level.

# Brendan Eich (11 years ago)

You have it backwards. You are advocating a GC design monoculture (exact rooting only) as an assumption informing a security analysis that must take into account both language features (enumerable weakmaps; weak references) and implementation vulnerabilities across a large space of different implementation strategies.

ECMA-262 does not dictate GC design, and won't.

Dave's observation about turn-based weak-ref notification seems worth considering more deeply, though. I don't think the weak-ref issue Oliver raises, of numeric-typed stack values punning as pointers, necessarily disposes of the idea that an engine with a conservative stack scanner, combined with weak references and turn-based notification, would be safe against attacks of the kind Dionysus demonstrated. NaN-boxing engines can't generally have such controllable wild NaNs in double-typed stack slots, AFAIK.

# Jason Orendorff (11 years ago)

On Wed, Mar 27, 2013 at 4:53 PM, David Bruant <bruant.d at gmail.com> wrote:

Over the last month, after Opera announced moving to WebKit, people on Twitter have gone round and round about the WebKit monoculture and how making spec decisions based on specific implementations is a bad thing ("if specs followed the WebKit implementation, we couldn't have parallel rendering engines like Servo", etc.). I don't see why that would be any better at the ECMAScript level.

This seemed backwards to me too. Brendan's arguing for more latitude for implementations to use different techniques.

# David Bruant (11 years ago)

[answering several messages at once] [cc'ing Oliver Hunt as Apple representative and Luke Hoban as Microsoft representative for the questions at the bottom]

On 27/03/2013 23:47, Brendan Eich wrote:

You have it backwards. You are advocating a GC design monoculture (exact rooting only)

My point was that some implementation strategies don't have the issue, and I pointed at one as an example. I wasn't implying it will/can/should be the only one. I'm open to other implementations that wouldn't use exact rooting but also wouldn't have the conservative-scanning security issue pointed out here. I'm open to any solution within the "memory-safe monoculture". I probably won't be shocking anyone if I say that interpreting a number as a pointer is a bad idea? I think it's the first argument MarkM gives when he says that C++ isn't a memory-safe language.

Jason Orendorff wrote:

Brendan's arguing for more latitude for implementations to use different techniques.

I see. But clearly, the attack demonstrates that some implementation techniques (namely conservative scanning) have flaws. Maybe latitude that goes as far as accepting conservative scanning as an acceptable implementation technique isn't worth it if we want to see the language evolve.

Brendan Eich wrote:

ECMA-262 does not dictate GC design, and won't.

If TC39 makes a judgement call on what features can go in ECMA-262, or how they are designed, based on what implementations are considered acceptable, then ECMA-262, by not having some features or by having features designed in a certain way, carries an invisible weight that doesn't "dictate", but definitely drives implementors into making some choices (like keeping a conservative-scanning GC).

[From the "Weak event listener" thread]

Can you re-defend enumerability of weakmaps now that I've pointed out the security risk does not apply only to SES users, to be addressed by SES removing the @iterator?

Just to clarify, I still feel (60%) against WeakRefs and enumerable WeakMaps/WeakSets. What I was defending was that if WeakRefs are in, then there is no reason to insist on making WeakMaps non-enumerable on determinism grounds, because the non-determinism brought by enumerable WeakMaps isn't worse than the non-determinism brought by WeakRefs (and I still need to answer Kevin Gadd's post). That said, the security risk you pointed out is based on what I consider to be a flawed implementation (try implementing conservative stack scanning in a memory-safe language like Rust), so assuming implementors are willing to admit conservative scanning is a bad idea and plan to move away from it, WeakRefs and enumerable WeakMaps can still be discussed and their design freely debated. V8 has exact rooting, and SpiderMonkey is getting there. As Sam pointed out, there are other incentives (generational GC?) for engines to get rid of conservative scanning (what an interesting non-coincidence :-) ). That's a lot of signal showing that implementors are moving away from conservative scanning.

I guess two relevant questions should be addressed directly to the JSC and IE teams [cc'ing Oliver Hunt and Luke Hoban for that reason]: if you added WeakRefs to your JS engine, would you be subject to the described attack? And do you have a plan to move away from your current implementation at some point in the future (to prevent the attack, or for whatever other good reason)? "At some point in the future", because the when is not important, but the intent really is.

Only if the answers are 'yes' and 'no' for one of the two JS engines does TC39 have a problem with including WeakRefs in ECMA-262, I think. I haven't read such a thing yet (did off-list/off-meeting-notes discussion occur?), so let's wait until they bring these exact answers, and only then either forget about WeakRefs until they change their mind or tweak the WeakRef design until it becomes something they'd accept to implement. Making one of these decisions before being forced to seems premature to me.

# David Herman (11 years ago)

On Mar 27, 2013, at 4:52 AM, Sam Tobin-Hochstadt <samth at ccs.neu.edu> wrote:

On Tue, Mar 26, 2013 at 11:44 PM, Oliver Hunt <oliver at apple.com> wrote:

That said, I believe this does kill any dreams I may have had w.r.t. primitive-keyed WeakMaps, kudos to MarkM.

Wouldn't a primitive-keyed WeakMap just be a strong Map for those keys? And therefore immune to any GC attacks?

Indeed, and also deeply misleading (a weak map with strongly held entries?), which is why I argued that WeakMap should disallow primitive keys.

Oliver-- can you clarify what you were hoping for?

# Oliver Hunt (11 years ago)

On Mar 29, 2013, at 7:36 AM, David Herman <dherman at mozilla.com> wrote:

On Mar 27, 2013, at 4:52 AM, Sam Tobin-Hochstadt <samth at ccs.neu.edu> wrote:

On Tue, Mar 26, 2013 at 11:44 PM, Oliver Hunt <oliver at apple.com> wrote:

That said, I believe this does kill any dreams I may have had w.r.t. primitive-keyed WeakMaps, kudos to MarkM.

Wouldn't a primitive-keyed WeakMap just be a strong Map for those keys? And therefore immune to any GC attacks?

Indeed, and also deeply misleading (a weak map with strongly held entries?), which is why I argued that WeakMap should disallow primitive keys.

Oliver-- can you clarify what you were hoping for?

I was dreaming of primitive keys; I was convinced in an earlier meeting of the problems that they would cause, but this security problem is a nail in the coffin :-/

# Marius Gundersen (11 years ago)

This seems to be more a problem with the garbage collector than with weak references. If I understood it correctly, any double value can look like a pointer, and the garbage collector will check what it is pointing at. To me this seems like a source of memory leaks. This problem exists even without weak references (or weak iterable maps/sets); the weak references just make it observable. Does this mean the main reason weak references (or, again, weak iterable maps/sets) are not to be implemented is a bug in the garbage collectors of popular JS engines? As noted earlier, the implementation of the garbage collector is not specified in the ECMAScript standard, so this is a problem with implementors, not with the specification.

Again, I'm far from an expert on GC or JS implementations (and would love a simplified explanation if I have misunderstood the problem), but this seems less like a problem with weak references, and more like a problem with specific implementations of GCs.

Marius Gundersen

# Oliver Hunt (11 years ago)

There are numerous problems with weak references and primitives, mostly revolving around the ability to regenerate a primitive, e.g.

someWeakRef.set("foo")
gc()
var something = "foo"
someWeakRef.get() // null or "foo"?

vs.

someWeakRef.set("foo")
var something = "foo"
gc()
someWeakRef.get() // null or "foo"?

vs.

someWeakRef.set("foo")
var something = "fo"
something += "o"
gc()
someWeakRef.get() // null or "foo"?

And of course all this just becomes worse for numeric primitives -- all existing engines use tagged values for some set of numeric values, and can also end up with the same value stored in different ways. V8 (at least on 32-bit) GC-allocates doubles but not a subset of integers; this means that if you get the value 1 as a double it might be GC'd, so the weak ref could go away, but if it were in tagged-int form it would not.
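
A small sketch of the representation hazard described above; someWeakRef and gc() are the same hypothetical names used in the earlier examples, and the behaviour in the comments is an assumption about a 32-bit V8-like engine, not specified semantics:

// Scenario A: the value arrives as a tagged small integer.
someWeakRef.set(1)            // nothing is heap-allocated, so the ref can never clear
// Scenario B: the observably identical value arrives as a heap-allocated double.
someWeakRef.set(0.5 + 0.5)
gc()                          // the double box may be collected...
someWeakRef.get()             // ...so this could be null, even though the value is still 1
// Two identical numbers, two different lifetimes: the weak ref would leak the engine's
// internal representation choices.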

JSC doesn't immediately intern strings, but over time duplicates do get merged, at which point weakness starts acquiring odd behaviour. Because of the implicitly shared heap across different pages, this may even be exploitable as a way to find out information about other pages (essentially, a weak reference to a primitive allows a side channel for determining content of other pages that you would not otherwise have access to).

This means that any consistent semantics for primitives results in useless behaviour -- either the weak ref has to be (essentially) strong on primitives, or be cleared on every gc() regardless of the "liveness" of other references.

# Brendan Eich (11 years ago)

Marius Gundersen wrote:

This seems to be more a problem with the garbage collector than with weak references. If I understood it correctly, any double value can look like a pointer,

No, that's not the issue in this (sub-)thread. Oliver was just recollecting thoughts about a position he took in favor of WeakMaps having non-object keys.

You're right that any double (e.g.) that might be confused for a pointer in a VM implementation makes a bad bug, and VMs must carefully avoid (find and fix!) such bugs.

The issue about non-object WeakMap keys was about semantics only, not implementation safety bugs. If I can put "42" in a WeakMap, it can never be removed, since I can "forge" that value by uttering the "42" literal again, or (in a way refractory to analysis) concatenating "4" and "2", etc.
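
A sketch of the forging problem, using a hypothetical WeakMap that accepted primitive keys (secretValue is just a placeholder); real WeakMaps reject primitives for exactly this reason:

var wm = new WeakMap();       // pretend primitive keys were allowed
wm.set("42", secretValue);
// ... later, with no reference to the original "42" string kept anywhere ...
wm.get("4" + "2");            // an equal key "forged" out of thin air must still find the
                              // entry, so the GC can never drop it: the map is strong here.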

# Marius Gundersen (11 years ago)

There are numerous problems with weak references and primitives, mostly revolving around the ability to regenerate a primitive, e.g.

The issue about non-object WeakMap keys was about semantics only, not implementation safety bugs. If I can put "42" in a WeakMap, it can never be removed, since I can "forge" that value by uttering the "42" literal again, or (in a way refractory to analysis) concatenating "4" and "2", etc.

This is why I suggested, in the other thread, a system for weak event listeners. This would not be a problem if the only allowed argument to a weak reference were a function. An iterable weak set of functions would not have this problem, and would solve the suggested use cases for weak references (observables/events).

Marius Gundersen

# Brendan Eich (11 years ago)

Marius Gundersen wrote:

This is why I suggested, in the other thread, a system for weak event listeners. This would not be a problem if the only allowed argument to a weak reference were a function. An iterable weak set of functions would not have this problem, and would solve the suggested use cases for weak references (observables/events).

WeakMaps are useful for membranes too, not just "event listeners":

doku.php?id=harmony:proxies&s=proxy#an_identity-preserving_membrane, doku.php?id=harmony:proxies&s=proxy#garbage_collection_behavior

# Hudson, Rick (11 years ago)

This brings up another interesting point. Do WeakRefs change a compiler's liveness analysis? This could complicate some apparently useful optimizations.

{
var x = new Something();
someWeakRef.set(x);
// Is x dead? (yes) Is x required to contribute to the root set? (I hope not.)
gc();
someWeakRef.get() // null or foo?
...
}

- Rick

For the optimization see Agesen, Detlefs, and Moss, "Garbage collection and local variable type-precision and liveness in Java virtual machines", PLDI '98 (Montreal, Quebec, Canada), pp. 269-279, ACM, doi.acm.org/10.1145/277650.277738.

# Kevin Gadd (11 years ago)

In that case it would be alive unless you destroyed the local variable 'x', but in some environments the compiler is free to treat it as dead. I believe in .NET the variable 'x' is only alive up until its last reference, so the variable would become dead immediately after the construction of the weak reference. The WR itself definitely should not affect object liveness. If you mean liveness in terms of like... escape analysis? Then I think you would have no issues as long as the weak reference doesn't escape. Optimization in the presence of weak references is probably not something authors should rely on, though.

P.S. That gc() call wouldn't be guaranteed to collect x. The GC is free to keep it alive for whatever reason it chooses (I'm not aware of any of these reasons, but I've encountered this behavior before)

# Brendan Eich (11 years ago)

Hudson, Rick wrote:

This brings up another interesting point. Do WeakRefs change a compiler’s liveness analysis?

Yes, of course.

This could complicate some apparently useful optimizations.

{
var x = new Something();
someWeakRef.set(x);
// Is x dead? (yes) Is x required to contribute to the root set? (I hope not.)

You didn't kill x yet. Did you forget

x = null;

here?

gc();
someWeakRef.get() // null or foo?

If x = null; happened before gc() then null else the original ref.

# Mark S. Miller (11 years ago)

On Mon, Apr 1, 2013 at 2:56 PM, Brendan Eich <brendan at mozilla.com> wrote:

Hudson, Rick wrote:

This brings up another interesting point. Do WeakRefs change a compiler’s liveness analysis?

Yes, of course.

This could complicate some apparently useful optimizations.

{
var x = new Something();
someWeakRef.set(x);
// Is x dead? (yes) Is x required to contribute to the root set? (I hope not.)

You didn't kill x yet. Did you forget

x = null;

here?

gc();
someWeakRef.get() // null or foo?

If x = null; happened before gc() then null else the original ref.

Not necessarily. For example, a conservative gc might not be able to see that foo is no longer actually reachable. The strawman:gc_semantics page explains that this is why it states such matters as SHOULDs rather than MUSTs.

Of course, if we imagine a gc() function with a stronger contract, then the above would follow. I am skeptical that we could ever state such a stronger contract that all JS implementors could agree to.

# Hudson, Rick (11 years ago)

If the compiler can prove x does not escape the block and it is not used again then it is dead and the compiler is free to reuse the stack slot holding the last reference.

So I am arguing that x = null; is not required to kill x.

If we agree on that then I think we agree that someWeakRef.get(); is allowed to return null.
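
A sketch of the optimization Rick is describing; gc(), someWeakRef, and Something are the same hypothetical names used earlier in the thread:

function demo() {
  var x = new Something();
  someWeakRef.set(x);
  // x is never read below this point. A compiler that proves this may treat x as dead
  // here and reuse its stack slot, so nothing roots the object any more...
  gc();
  return someWeakRef.get();   // ...which is why null is an allowed result, even though
                              // x is still lexically in scope.
}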

# Mark S. Miller (11 years ago)

On Mon, Apr 1, 2013 at 3:12 PM, Hudson, Rick <rick.hudson at intel.com> wrote:

If the compiler can prove x does not escape the block and it is not used again then it is dead and the compiler is free to reuse the stack slot holding the last reference.

So I am arguing that x = null; is not required to kill x.

If we agree on that then I think we agree that someWeakRef.get(); is allowed to return null.

I see. I misunderstood the question. Yes, I believe this should be allowed but not required. The strawman:gc_semantics page states "Our safety requirements allow some reachable objects to be collected as well, so long as the garbage collector can ascertain that they will never be reached."

# Oliver Hunt (11 years ago)

On Apr 1, 2013, at 3:12 PM, "Hudson, Rick" <rick.hudson at intel.com> wrote:

If the compiler can prove x does not escape the block and it is not used again then it is dead and the compiler is free to reuse the stack slot holding the last reference.

So I am arguing that x = null; is not required to kill x.

That semantic would mean that the interpreter would need to do escape analysis, and then the moment a variable became dead it would be required to clear it, even if it did not need that slot for anything else.

The world is filled with papers on ways to reduce the conservatism of a GC, but you have to balance the cost of the work required to reduce that conservatism against the win you might get from reduced liveness.

But all of this is kind of moot, as weak refs are by definition going to have some degree of non-determinism w.r.t liveness, and the initial discussion was of weak refs to primitives which have their own, completely separate problems (as was already covered)

# Hudson, Rick (11 years ago)

Didn't mean to imply that one is required to use an optimization. I just wanted to make it clear that one could.

- Rick

# Brendan Eich (11 years ago)

Thanks for the clarifications. With the example as written, with no later uses of x (even inside a direct eval or a closure that captures x and might be called after the gc()!), I agree with Mark -- should, not must.

# Mark Miller (9 years ago)

At esdiscuss.org/topic/memory-safety-and-weak-references#content-1 Dave Herman wrote

Interestingly, I wonder if the idea of only collecting weak references between turns is immune to such attacks, since it's not possible to have a bogus reference on the stack between turns, where there is no stack.

Sorry it has taken me more than two years to respond to this ;)

If you actually GC only between turns, then yes. However, I doubt this is practical.

If you use the implementation technique shown at strawman:weak_references so that you never observably collect during a turn, then no, it doesn't help.