Removal of WeakMap/WeakSet clear
The root of the issue is that WeakMaps are thought of as a tool for two unrelated use cases. Quoting 1:
(...) WeakMaps were motivate[d] by two distinct use cases. Soft fields and object-keyed caches.
Now, in short: for the "soft fields" use case a .clear() method is unwanted, but for the "object-keyed caches" use case a .clear() method is welcome.
—Claude
On Nov 26, 2014, at 10:22 AM, Claude Pache wrote:
The root of the issue is that WeakMaps are thought of as a tool for two unrelated use cases. Quoting 1:
(...) WeakMaps were motivate[d] by two distinct use cases. Soft fields and object-keyed caches.
Now, in short: for the "soft fields" use case a .clear() method is unwanted, but for the "object-keyed caches" use case a .clear() method is welcome.
—Claude
Let's ignore security for the moment.
If WeakMaps/WeakSets are not inspectable (via iteration) and do not have a clear operation, then the inverted implementation technique can be used. This technique eliminates significant GC complexity.
The ability to iterate over the elements of a WeakMap/Set would make implementation-specific (and probably non-deterministic) GC timing observable to ES programs. Non-deterministic, observable, implementation-specific behavior is bad for interoperability. That reason alone is enough for TC39 to choose not to make WeakMaps/Sets iterable, as interoperability is one of our highest priorities.
That said, there are use cases for iterable weak collections if you are willing to tolerate the non-determinism and observability of the GC. Implementations supporting situations where those use cases are important might choose to provide a non-standard IterableWeakMap/Set implementation, and there would be no particular reason for not including a 'clear' method. If there is evidence of enough utility for such collections, TC39 might even decide to standardize them. In that case we would probably put them into a standard module that needs to be explicitly imported. Use of the module would be a flag that the program may have non-deterministic behavior.
Agreed, and I believe the only real reason would be the inability to polyfill it properly, in order to discourage its usage.
If this is the reason, it doesn't justify dropping it, IMO.
On Wed, Nov 26, 2014 at 9:33 AM, Katelyn Gadd <kg at luminance.org> wrote:
Is there a detailed rationale for this somewhere?
It is a combination of three issues.
- The security issue.
- The implementation issue.
- The ES process issue.
The implementation issue is that the use cases for WeakMaps basically divide into the following cases:
a. Those for which we're confident that the map's lifetime outlives its typical key.
b. Those for which we're confident that the key's lifetime outlives the typical map it is a key in.
c. Those for which we're not confident which typically lives longer.
For #a, the transposed representation of WeakMaps is strictly better. The non-transposed implementation would promote ephemeral keys to later GC generations, which is very expensive. (I believe the expense of this promotion has been radically underestimated in most prior discussions.) This is the GC pressure that really matters.
For #b, just use a Map rather than a WeakMap.
For #c, transposed rep or not is a tie. In the interests of minimizing implementation mechanism, we should just use a transposed rep for this as well.
Given the transposed representation, the only implementation techniques for clear are
x. Enumerating all memory.
y. Having the WeakMap encapsulate the token used to look up the value in the key's hidden map, and have .clear replace this token.
#x is not reasonable. #y is equivalent to replacing the Map, which would be the user-level technique for emulating clear in any case. This is exactly what the use of WeakMaps for membranes does when the membrane should be revoked. If .clear could be implemented for the transposed representation more efficiently than this, membranes would benefit from .clear as well. I have never expected that it could.
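The user-level technique described above (replacing the internal WeakMap, as a membrane does on revocation) can be sketched as follows. This is an editorial illustration, not code from the thread; the wrapper name is made up:

```javascript
// Sketch: "clearing" by dropping the token -- here, the WeakMap
// itself -- and replacing it with a fresh one. Old entries become
// unreachable through this store and are left to the GC.
function RevocableStore() {
  var wm = new WeakMap();
  this.get = function (k) { return wm.get(k); };
  this.set = function (k, v) { wm.set(k, v); return this; };
  this.has = function (k) { return wm.has(k); };
  this.delete = function (k) { return wm.delete(k); };
  // Emulated clear: swap in a new map; no iteration required.
  this.clear = function () { wm = new WeakMap(); };
}
```

This is precisely the "replace the token" idea of #y, expressed in user code rather than inside the engine.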
The process issue is that .clear was not in the original proposal (for all of these reasons), and it never achieved consensus in committee. It would have violated our process to keep it in the spec. The process issue is not "Should it be dropped?" since it was never legitimately added. The only issue is "Should it be added?".
On Wed, Nov 26, 2014 at 11:09 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
On Nov 26, 2014, at 10:22 AM, Claude Pache wrote:
The root of the issue is that WeakMaps are thought of as a tool for two unrelated use cases. Quoting 1:
(...) WeakMaps were motivate[d] by two distinct use cases. Soft fields and object-keyed caches.
Now, in short: for the "soft fields" use case a .clear() method is unwanted, but for the "object-keyed caches" use case a .clear() method is welcome.
—Claude
Let's ignore security for the moment.
If WeakMaps/WeakSets are not inspectable (via iteration) and do not have a clear operation, then the inverted implementation technique can be used. This technique eliminates significant GC complexity.
The ability to iterate over the elements of a WeakMap/Set would make implementation-specific (and probably non-deterministic) GC timing observable to ES programs. Non-deterministic, observable, implementation-specific behavior is bad for interoperability.
Hi Allen, as much as I'd like to agree with this, I don't see the observability-of-GC issue. Could you explain more? Thanks.
On Wed, Nov 26, 2014 at 11:23 AM, Mark S. Miller <erights at google.com> wrote:
On Wed, Nov 26, 2014 at 11:09 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
On Nov 26, 2014, at 10:22 AM, Claude Pache wrote:
The root of the issue is that WeakMaps are thought of as a tool for two unrelated use cases. Quoting 1:
(...) WeakMaps were motivate[d] by two distinct use cases. Soft fields and object-keyed caches.
Now, in short: for the "soft fields" use case a .clear() method is unwanted, but for the "object-keyed caches" use case a .clear() method is welcome.
—Claude
Let's ignore security for the moment.
If WeakMaps/WeakSets are not inspectable (via iteration) and do not have a clear operation, then the inverted implementation technique can be used. This technique eliminates significant GC complexity.
The ability to iterate over the elements of a WeakMap/Set would make implementation-specific (and probably non-deterministic) GC timing observable to ES programs. Non-deterministic, observable, implementation-specific behavior is bad for interoperability.
Hi Allen, as much as I'd like to agree with this, I don't see the observability-of-GC issue. Could you explain more? Thanks.
Allen appears to be mixing in an iterability argument. This isn't germane to the .clear() discussion, unless you think of .clear() as being implemented by iterating and removing keys.
On Nov 26, 2014, at 11:25 AM, Tab Atkins Jr. wrote:
On Wed, Nov 26, 2014 at 11:23 AM, Mark S. Miller <erights at google.com> wrote:
On Wed, Nov 26, 2014 at 11:09 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
On Nov 26, 2014, at 10:22 AM, Claude Pache wrote:
The root of the issue is that WeakMaps are thought of as a tool for two unrelated use cases. Quoting 1:
(...) WeakMaps were motivate[d] by two distinct use cases. Soft fields and object-keyed caches.
Now, in short: for the "soft fields" use case a .clear() method is unwanted, but for the "object-keyed caches" use case a .clear() method is welcome.
—Claude
Let's ignore security for the moment.
If WeakMaps/WeakSets are not inspectable (via iteration) and do not have a clear operation, then the inverted implementation technique can be used. This technique eliminates significant GC complexity.
The ability to iterate over the elements of a WeakMap/Set would make implementation-specific (and probably non-deterministic) GC timing observable to ES programs. Non-deterministic, observable, implementation-specific behavior is bad for interoperability.
Hi Allen, as much as I'd like to agree with this, I don't see the observability-of-GC issue. Could you explain more? Thanks.
Allen appears to be mixing in an iterability argument. This isn't germane to the .clear() discussion, unless you think of .clear() as being implemented by iterating and removing keys.
Right, it's the lack of iterability and clear that makes the inverted implementation feasible. We justify lack of iterability because it exposes GC.
The detailed explanations are helpful for understanding this, at least when it comes to WeakMap. Thanks. I was not aware that clear() was a tentative addition - MDN (developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WeakMap) lists it as available in all the major browsers. Understanding it as 'should this new thing be added' makes the decision less controversial for me even if I personally want the primitive to be there.
However: Does this mean that if I implement clear myself by creating new WeakMaps, I'm inadvertently doing horrible things to the GC? I get the impression that this would mean every time I throw the old map away, the data in the keys' hidden maps persists for some unknown amount of time, and I'm adding to that hidden map. If I destroy and repopulate these maps on a regular basis, that seems like it could get out of hand quickly. If this usage pattern is fundamentally bad for JS VMs, it seems like the sort of guidance that probably needs to be clearly messaged to users - 'clear() isn't available because you really, really shouldn't do that' vs. 'clear() isn't available because you can write it yourself'. Just my 2c there - very interesting to know about the reasoning behind this.
Katelyn Gadd wrote:
However: Does this mean that if I implement clear myself by creating new WeakMaps, I'm inadvertently doing horrible things to the GC?
Assume yes (only defensive assumption). What goes wrong?
Sorry if I missed it: what is your .clear use case, in example code?
On Wed, Nov 26, 2014 at 2:49 PM, Katelyn Gadd <kg at luminance.org> wrote:
The detailed explanations are helpful for understanding this, at least when it comes to WeakMap. Thanks. I was not aware that clear() was a tentative addition - MDN (developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WeakMap) lists it as available in all the major browsers. Understanding it as 'should this new thing be added' makes the decision less controversial for me even if I personally want the primitive to be there.
However: Does this mean that if I implement clear myself by creating new WeakMaps, I'm inadvertently doing horrible things to the GC?
Well, I'll try to explain how expensive I expect it to be, in qualitative terms, and leave it to you to judge whether it is horrible. As I mentioned, the original motivating use case for WeakMaps -- membranes -- throws WeakMaps away in precisely this manner when the membrane is revoked. So for that use case at least, I do not consider it too horrible. Nor do I know of any better alternative.
I get the impression that this would mean every time I throw the old map away, the data in the keys' hidden maps persists for some unknown amount of time,
Yes.
and I'm adding to that hidden map.
Yes.
If I destroy and repopulate these maps on a regular basis that seems like it could get out of hand quickly.
If you're doing this at greater frequency than the expected lifetime of the keys, then I need to ask: Why are you using a WeakMap instead of a Map? (An example would be great)
If this usage pattern is fundamentally bad for JS VMs, it seems like the sort of guidance that probably needs to be clearly messaged to users - 'clear() isn't available because you really, really shouldn't do that' vs 'clear() isn't available because you can write it yourself'. Just my 2c there - very interesting to know about the reasoning behind this.
Sure. But this kind of advice is outside the normative spec. Perhaps it is more appropriate for the various outlets authored by many of the authors here.
The discussion here still makes various untested assumptions about the transposed representation. With respect to .clear in particular, it seems to assume that this representation does not normally need the ability to iterate (internally) over the keys of a weak map. AFAICT, that is incorrect. Both the implementation we discussed in the V8 team, and from what I heard, the implementation in IE, maintain a list of all its keys for each weak map. Otherwise, when a map dies, you couldn't generally kill the entries and the values associated with them easily and in a timely manner. (And of course, this list means almost twice the space usage for a map, because you duplicate part of the information.)
So from that perspective at least, .clear is not an issue.
I also don't see the security issue, to be honest. In any security-relevant (or abstraction-relevant) scenario you would be crazy to hand out the (mutable!) weak map to third parties anyway. They could wreak just as much havoc by adding random things to the map (or removing things at a guess) as they could by clearing it.
Mark, can you explain the specific scenario you have in mind?
Andreas, I think Mark has a few times already taken for granted that WeakMaps are based on transposed representations. It's also partially pointless, from a polyfill point of view, to not implement them as such, but a .clear() method makes fast polyfills rather pointless ( e.g. [1] ).
I've had similar thoughts when I first realized there was a delete method, since I thought: if you can delete, you should be able to iterate and get keys ... which wasn't even the initial case, but that's how my initial polyfill was implemented, with a real .clear() ability that was making the WeakMap itself not weak anymore [2].
I'm starting to think that for performance and polyfill sake it's maybe better for everyone to keep .clear() out of the equation, so there won't be broken polyfills, and implementations can decide which representation is better for their main use cases.
Whoever wants to expose clear can always do it via a wrapper:
// just for example purposes; methods are re-bound on every clean()
// so they always target the current map (binding them once in the
// constructor would leave them pointing at the old, replaced map)
function CleanableWeakMap() {
this.clean();
}
CleanableWeakMap.prototype.clean = function () {
var wm = this._wm = new WeakMap;
this.delete = wm.delete.bind(wm);
this.get = wm.get.bind(wm);
this.has = wm.has.bind(wm);
this.set = wm.set.bind(wm);
};
The above code should make everyone happy, except the GC in the window between a fake .clean() and the next collection, but at least it could be sort of consistent across ES5 and ES.next platforms.
I'm also willing to drop my non-WeakMap and bring in either the Polymer polyfill, which is already missing .clear() but pollutes the global scope anyway with this "not really full spec" implementation, or the one in link [1].
[1] changing the slot without being able to clean up a thing: gist.github.com/WebReflection/5991636#file-weakmap-js-L15-L19
[2] not a WeakMap, you should delete things ( aka: sort of pointless WeakMap )
On 27 November 2014 at 15:22, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:
Andreas, I think Mark has a few times already taken for granted that WeakMaps are based on transposed representations. It's also partially pointless, from a polyfill point of view, to not implement them as such, but a .clear() method makes fast polyfills rather pointless ( e.g. [1] ).
I think you misunderstood. The argument in this thread that I was responding to was that .clear is incompatible with a (presumably desirable) transposed representation. That is not so, at least not for any realistic implementation of that approach that I am aware of.
Polyfills are a different issue that I haven't considered. However, the only functionally correct polyfill for WeakMaps is via non-weak Maps or even arrays, and then there is no problem either.
I've had similar thoughts when I first realized there was a delete method, since I thought: if you can delete, you should be able to iterate and get keys ... which wasn't even the initial case, but that's how my initial polyfill was implemented, with a real .clear() ability that was making the WeakMap itself not weak anymore [2].
Well, there is no functionally correct polyfill for WeakMaps that is actually weak, regardless of .clear.
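The functionally correct but non-weak polyfill Andreas alludes to can be sketched with two parallel arrays, as Andrea mentions using later in the thread. This is an editorial illustration (names made up): it behaves like a WeakMap for get/set/has/delete, but holds strong references to its keys, so entries are never collected until deleted explicitly:

```javascript
// Functionally correct, but NOT weak: keys and values are retained
// strongly in parallel arrays. Note that clear() is trivial here --
// which is part of the polyfill argument in this thread.
function StrongWeakMap() {
  this._keys = [];
  this._values = [];
}
StrongWeakMap.prototype.set = function (k, v) {
  var i = this._keys.indexOf(k);       // identity lookup
  if (i < 0) { this._keys.push(k); this._values.push(v); }
  else this._values[i] = v;
  return this;
};
StrongWeakMap.prototype.get = function (k) {
  var i = this._keys.indexOf(k);
  return i < 0 ? undefined : this._values[i];
};
StrongWeakMap.prototype.has = function (k) {
  return this._keys.indexOf(k) >= 0;
};
StrongWeakMap.prototype.delete = function (k) {
  var i = this._keys.indexOf(k);
  if (i < 0) return false;
  this._keys.splice(i, 1);
  this._values.splice(i, 1);
  return true;
};
StrongWeakMap.prototype.clear = function () {
  this._keys.length = 0;
  this._values.length = 0;
};
```

The lookups are O(n), and nothing here is weak: that is exactly Andreas's point about what a "correct" polyfill can and cannot provide.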
I'm starting to think that for performance and polyfill sake it's maybe better for everyone to keep .clear() out of the equation, so there won't be broken polyfills, and implementations can decide which representation is better for their main use cases. Whoever wants to expose clear can always do it via a wrapper:
// just for example purposes
function CleanableWeakMap() {
this.clean();
}
CleanableWeakMap.prototype.clean = function () {
var wm = this._wm = new WeakMap;
this.delete = wm.delete.bind(wm);
this.get = wm.get.bind(wm);
this.has = wm.has.bind(wm);
this.set = wm.set.bind(wm);
};
Sure, but the same is true the other way round. So that argument doesn't resolve the discussion either way.
On Thu, Nov 27, 2014 at 2:44 PM, Andreas Rossberg <rossberg at google.com> wrote:
Well, there is no functionally correct polyfill for WeakMaps that is actually weak, regardless of .clear.
Please bear with me so I understand your point: if you have o1 and o2, and o1.link = o2, wouldn't o2 be "free" once o1 is not referenced anymore?
On 27 November 2014 at 15:58, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:
On Thu, Nov 27, 2014 at 2:44 PM, Andreas Rossberg <rossberg at google.com> wrote:
Well, there is no functionally correct polyfill for WeakMaps that is actually weak, regardless of .clear.
Please bear with me so I understand your point: if you have o1 and o2, and o1.link = o2, wouldn't o2 be "free" once o1 is not referenced anymore?
Yes, but this approach only works for regular, extensible, writable objects o1, i.e., fails for frozen or sealed objects, proxies, etc. And of course, as a polyfill for weak maps, it breaks other things, such as correctly iterating over properties of an object used as a key.
Right, the compromise then is usually to use a descriptor with enumerable: false, which won't break "the common web", but agreed, it's not that reliable or safe ... although developers are interested in the "weak" bit, and most of them most likely won't deal with "new" kinds of objects such as Proxies and/or frozen/sealed ones. But I understand your point, and that's why I've been happily using an Array for keys and one for values, which never failed, but also never granted weakness, requiring the usage of either clear or delete, which is usually more problematic.
Now I'll let Mark answer your initial questions and stop talking about polyfills, since there's no clear winner anyway.
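The enumerable: false compromise Andrea describes can be sketched like this. Editorial illustration only (the shim name and key scheme are made up): each map instance stores values directly on the key object under a per-map hidden property, which fails for frozen/sealed objects and proxies exactly as Andreas notes:

```javascript
// Hidden-property polyfill sketch: per-map non-enumerable key.
// Keys must be ordinary extensible objects; the property is hidden
// from for-in and Object.keys, but not truly private or weak in
// any guaranteed sense.
var counter = 0;
function WeakMapShim() {
  this._id = '__wm' + counter++; // assumed unique enough for a sketch
}
WeakMapShim.prototype.set = function (k, v) {
  Object.defineProperty(k, this._id, {
    value: v, enumerable: false, configurable: true, writable: true
  });
  return this;
};
WeakMapShim.prototype.get = function (k) { return k[this._id]; };
WeakMapShim.prototype.has = function (k) {
  return Object.prototype.hasOwnProperty.call(k, this._id);
};
WeakMapShim.prototype.delete = function (k) {
  var had = this.has(k);
  delete k[this._id];
  return had;
};
```

Because the entry lives on the key rather than in the map, the key-to-value association dies with the key, which is why this family of polyfills approximates weakness at all.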
On Nov 27, 2014, at 1:17 AM, Andreas Rossberg wrote:
The discussion here still makes various untested assumptions about the transposed representation. With respect to .clear in particular, it seems to assume that this representation does not normally need the ability to iterate (internally) over the keys of a weak map. AFAICT, that is incorrect. Both the implementation we discussed in the V8 team, and from what I heard, the implementation in IE, maintain a list of all its keys for each weak map. Otherwise, when a map dies, you couldn't generally kill the entries and the values associated with them easily and in a timely manner. (And of course, this list means almost twice the space usage for a map, because you duplicate part of the information.)
So from that perspective at least, .clear is not an issue.
Since I made the assertion that 'clear' interferes with an inverted implementation of WeakMap/Set, I'll explain what I'm thinking.
First, here is my design for an inverted implementation:
Every object internally maintains a table (probably a hash table if it contains more than a few elements) that is used to implement WeakMap/Set. Each entry in the table is a key/value pair where the key is a WeakMap/Set instance. Values are arbitrary ES values. Let's call such a table an "inverted map" and generically refer to such WeakMaps/Sets as WCs.
A WC object has no instance-specific state (other than the state required of all objects, such as [[Prototype]], etc.).
The WeakMap 'set' method applied to a WeakMap M with arguments k and v works by accessing the inverted map of k and creating an entry in the inverted map with key: M and value: k (or updating the value field with v if an entry for M already exists).
WeakMap get/set/has/delete work similarly by accessing keys' inverted map using M as the key.
Note that since such a WC instance does not contain any references to its keys (or values) it does not prevent an object that is used as one of its keys from being garbage collected. When an object is GC'd its inverted map is, of course, also discarded.
There is no change in the GC latency of an object uses as a WeakMap key. If the only reference to an object used as a value in a Weakmap is from a single inverted map entry, then that value object should be GC's when its key value is GC'ed.
So far so great: none of this requires anything special from the GC. But what if there are no references to a WC instance other than from the keys of inverted maps? In that case the WC instance should be considered garbage, but if the inverted map keys are normal strong object references, such a should-be-garbage WC instance won't be recognized as such. (This might not be a very big deal if we were only talking about the WC instances, as they are small and presumably not very numerous, but retaining those instances also causes any corresponding inverted map value references to be retained.)
We can fix this by having the GC treat all key entries of inverted maps as weak references. Note that this requires weak reference support in the GC (definitely a complication) but does not require that the GC implement ephemeron semantics (a bigger complication).
Note that nothing above requires (or allows) enumerating the entries of a WC. Explicit enumeration operations or a 'clear' method would require some way to enumerate the objects that are used as keys of a WC.
This is the end of my assumed inverted WC design and why I assert that a clear method is incompatible with it.
Let's explore how we could extend this design to support iteration (of the sort needed to implement 'clear'). To do iteration we need some way to find all of the objects that are used as keys of a WC. Since we are using an inverted representation (and assuming we don't want to examine all objects to see if the WC is a key in their inverted maps), we need a way to efficiently find all objects that are used as keys of a particular WC. The obvious way to do that would be to add to the internal state of a WC a list of all active key values. This would need to be a list represented using weak references; otherwise, it would prevent key objects from being properly GC'ed. So, at the very least, clear or other iteration requires an additional weak reference list for each WC (or perhaps a likely less efficient single global weak list) that would not otherwise be needed.
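Allen's inverted design can be modeled in user-level JavaScript. This is an editorial sketch, not code from the thread: a per-map Symbol plays the role of the WC's identity, and a symbol-keyed property on each key object stands in for the engine-internal inverted map (real engines hold that state outside the property table, with the GC treating the WC keys weakly):

```javascript
// Model of the inverted (transposed) representation: the entry
// lives on the key object, keyed by the map's identity token, so
// the map itself holds no references to its keys or values.
function InvertedWeakMap() {
  this._token = Symbol('wc'); // identity of this weak collection
}
InvertedWeakMap.prototype.set = function (k, v) {
  k[this._token] = v;                // store on the key, not the map
  return this;
};
InvertedWeakMap.prototype.get = function (k) { return k[this._token]; };
InvertedWeakMap.prototype.has = function (k) { return this._token in k; };
InvertedWeakMap.prototype.delete = function (k) {
  var had = this._token in k;
  delete k[this._token];
  return had;
};
// Note there is no clear(): the map cannot enumerate which objects
// carry its token, which is exactly the incompatibility Allen describes.
```

When a key object dies, its entries die with it; and since the map cannot find its keys, clear would require the extra weak key list discussed above.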
(Nit-correcting along with comments and questions in-line, just trying to parse what you mean, not snarking -- except for the first comment :-D.)
Allen Wirfs-Brock wrote:
On Nov 27, 2014, at 1:17 AM, Andreas Rossberg wrote:
The discussion here still makes various untested assumptions about the transposed representation. With respect to .clear in particular, it seems to assume that this representation does not normally need the ability to iterate (internally) over the keys of a weak map. AFAICT, that is incorrect. Both the implementation we discussed in the V8 team, and from what I heard, the implementation in IE, maintain a list of all its keys for each weak map. Otherwise, when a map dies, you couldn't generally kill the entries and the values associated with them easily and in a timely manner. (And of course, this list means almost twice the space usage for a map, because you duplicate part of the information.)
So from that perspective at least, .clear is not an issue.
Since I made the assertion that 'clear' interferes with an inverted implementation of WeakMap/Set, I'll explain what I'm thinking.
First, here is my design for an inverted implementation:
Every object internally maintains a table (probably a hash table if it contains more than a few elements) that is used to implement WeakMap/Set. Each entry in the table is a key/value pair where the key is a WeakMap/Set instance. Values are arbitrary ES values. Let's call such a table an "inverted map" and generically refer to such WeakMaps/Sets as WCs.
(Water Closet? Just curious what the C stands for :-P.)
A WC object has no instance-specific state (other than the state required of all objects, such as [[Prototype]], etc.).
The WeakMap 'set' method applied to a WeakMap M with arguments k and v works by accessing the inverted map of k and creating an entry in the inverted map with key: M and value: k
value: v, right?
(or updating the value field with v if an entry for M already exists).
WeakMap get/set/has/delete work similarly by accessing keys' inverted map using M as the key.
[A] This means M must be mapped to WC, via some internal member or side table.
Note that since such a WC instance does not contain any references to its keys (or values) it does not prevent an object that is used as one of its keys from being garbage collected. When an object is GC'd its inverted map is, of course, also discarded.
In this sense the inverted map is atomic, featureless, just an identity -- like a symbol, but hidden from inspection. It could be a symbol, even.
There is no change in the GC latency of an object uses
("used")
as a WeakMap key. If the only reference to an object used as a value in a Weakmap is from a single inverted map entry, then that value object should be GC's
"GC'ed"
when its key value is GC'ed.
So far great, none of this requires anything special from the GC. But what if there are no references to a WC instance other than from the keys of inverted maps?
If M is garbage then its connection [A] to its WC or WCs won't be considered by the GC. Just confirming that you are presuming M is garbage in this paragraph.
If M is not garbage, then [A] must keep WC alive.
In that case the WC instance should be considered garbage, but if the inverted map keys are normal strong object references, such a should-be-garbage WC instance won't be recognized as such. (This might not be a very big deal if we were only talking about the WC instances, as they are small and presumably not very numerous, but retaining those instances also causes any corresponding inverted map value references to be retained.)
That's not clear to me: if a WC is implemented using a symbol, and nothing else references that symbol, then (unlike string-equated property keys, where a new name could be computed later), the WC-keyed entries can be purged.
We can fix this by having the GC treat all key entries of inverted maps as weak references. Note that this requires weak reference support in the GC (definitely a complication) but does not require that the GC implement ephemeron semantics (a bigger complication).
I would appreciate a reply to the previous comment ("That's not clear to me") before going on. This may be just a clarifying point, or a tangent, with respect to the feasibility of .clear for maps with inverted representations, but I hope it will clarify things so no one (including me!) goes down a bogus path.
On Nov 27, 2014, at 11:49 AM, Brendan Eich wrote:
(Nit-correcting along with comments and questions in-line, just trying to parse what you mean, not snarking -- except for the first comment :-D.)
Allen Wirfs-Brock wrote:
On Nov 27, 2014, at 1:17 AM, Andreas Rossberg wrote:
The discussion here still makes various untested assumptions about the transposed representation. With respect to .clear in particular, it seems to assume that this representation does not normally need the ability to iterate (internally) over the keys of a weak map. AFAICT, that is incorrect. Both the implementation we discussed in the V8 team, and from what I heard, the implementation in IE, maintain a list of all its keys for each weak map. Otherwise, when a map dies, you couldn't generally kill the entries and the values associated with them easily and in a timely manner. (And of course, this list means almost twice the space usage for a map, because you duplicate part of the information.)
So from that perspective at least, .clear is not an issue.
Since I made the assertion that 'clear' interferes with an inverted implementation of WeakMap/Set, I'll explain what I'm thinking.
First, here is my design for an inverted implementation:
Every object internally maintains a table (probably a hash table if it contains more than a few elements) that is used to implement WeakMap/Set. Each entry in the table is a key/value pair where the key is a WeakMap/Set instance. Values are arbitrary ES values. Let's call such a table an "inverted map" and generically refer to such WeakMaps/Sets as WCs.
(Water Closet? Just curious what the C stands for :-P.)
Collection
A WC object has no instance-specific state (other than the state required of all objects, such as [[Prototype]], etc.).
The WeakMap 'set' method applied to a WeakMap M with arguments k and v works by accessing the inverted map of k and creating an entry in the inverted map with key: M and value: k
value: v, right?
oops, right
(or updating the value field with v if an entry for M already exists).
WeakMap get/set/has/delete work similarly by accessing keys' inverted map using M as the key.
[A] This means M must be mapped to WC, via some internal member or side table.
M is an instance of a WC; e.g., M is a WeakMap, which is a kind of WC.
Note that since such a WC instance does not contain any references to its keys (or values) it does not prevent an object that is used as one of its keys from being garbage collected. When an object is GC'd its inverted map is, of course, also discarded.
In this sense the inverted map is atomic, featureless, just an identity -- like a symbol, but hidden from inspection. It could be a symbol, even.
No, a WC instance is atomic, featureless, etc. internally. Externally it exposes a WeakMap or WeakSet API. The inverted map is part of the implementation-level representation of an object. It is a map-like data structure in the same sense that the own properties of an object might be implemented using a map-like data structure.
There is no change in the GC latency of an object uses
("used")
as a WeakMap key. If the only reference to an object used as a value in a Weakmap is from a single inverted map entry, then that value object should be GC's
"GC'ed"
when its key value is GC'ed.
So far great, none of this requires anything special from the GC. But what if there are no references to a WC instance other than from the keys of inverted maps?
If M is garbage then its connection [A] to its WC or WCs won't be considered by the GC. Just confirming that you are presuming M is garbage in this paragraph.
yes
If M is not garbage, then [A] must keep WC alive.
I think you are seeing an additional pointer that I didn't intend. M is a WC, so there is no [A]. The concern is about the reference from the key side of k's inverted map. It should keep M alive.
In that case the WC instance should be considered garbage, but if the inverted map keys are normal strong object references, such a should-be-garbage WC instance won't be recognized as such. (This might not be a very big deal if we were only talking about the WC instances, as they are small and presumably not very numerous, but retaining those instances also causes any corresponding inverted map value references to be retained.)
That's not clear to me: if a WC is implemented using a symbol, and nothing else references that symbol, then (unlike string-equated property keys, where a new name could be computed later), the WC-keyed entries can be purged.
But WC instances do have mutable state (their [[Prototype]] and [[Extensible]] internal slots). Plus you might use a WC as the key of a WC, so that means a WC instance also needs its own internally mutable inverted map to support such usages.
We can fix this by having the GC treat all key entries of inverted maps as weak references. Note that this requires weak reference support in the GC (definitely a complication) but does not require that the GC implement ephemeron semantics (a bigger complication).
I would appreciate a reply to the previous comment ("That's not clear to me") before going on. This may be just a clarifying point, or a tangent, with respect to the feasibility of .clear for maps with inverted representations, but I hope it will clarify things so no one (including me!) goes down a bogus path.
/be
hope it's clearer now
Allen Wirfs-Brock wrote:
WeakMap get/has/delete work similarly, by accessing the key's inverted map using M as the key.
[A] This means M must be mapped to WC, via some internal member or side-table.
M is an instance of a WC; e.g., M is a WeakMap, which is a kind of WC.
Oh I see -- this means identity, no link [A] from M to WC, just one and the same object. Thanks.
On 27 November 2014 at 19:40, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
On Nov 27, 2014, at 1:17 AM, Andreas Rossberg wrote:
The discussion here still makes various untested assumptions about the transposed representation. With respect to .clear in particular, it seems to assume that this representation does not normally need the ability to iterate (internally) over the keys of a weak map. AFAICT, that is incorrect. Both the implementation we discussed in the V8 team, and from what I heard, the implementation in IE, maintain a list of all its keys for each weak map. Otherwise, when a map dies, you couldn't generally kill the entries and the values associated with them easily and in a timely manner. (And of course, this list means almost twice the space usage for a map, because you duplicate part of the information.)
So from that perspective at least, .clear is not an issue.
Since I made this assertion that 'clear' interferes with an inverted implementation of WeakMap/Set, I'll explain what I'm thinking.
First, here is my design for an inverted implementation:
Every object internally maintains a table (probably a hash table if it contains more than a few elements) that is used to implement WeakMap/Set. Each entry in the table is a key/value pair where the key is a WeakMap/Set instance. Values are arbitrary ES values. Let's call such a table an "inverted map" and generically refer to such WeakMaps/Sets as WCs.
A WC object has no instance-specific state (other than the state required of all objects, such as [[Prototype]], etc.).
The WeakMap 'set' method applied to a WeakMap M with arguments k and v works by accessing the inverted map of k and creating an entry in the inverted map with key M and value v (or updating the value field with v if an entry for M already exists).
WeakMap get/has/delete work similarly, by accessing the key's inverted map using M as the key.
Note that since such a WC instance does not contain any references to its keys (or values) it does not prevent an object that is used as one of its keys from being garbage collected. When an object is GC'd its inverted map is, of course, also discarded.
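As a purely illustrative userland sketch of the inverted-map design just described (the names InvertedWeakMap and INVERTED are made up, and a userland Map cannot hold anything weakly, so this shows only the data layout, not the GC behavior):

```javascript
// Each key object carries its own "inverted map": a hidden,
// Symbol-keyed table whose keys are WC instances. The WC itself
// stores no entries.
const INVERTED = Symbol('invertedMap');

class InvertedWeakMap {
  set(k, v) {
    if (Object(k) !== k) throw new TypeError('key must be an object');
    if (!k[INVERTED]) {
      // Non-enumerable side table on the key (fails on frozen keys;
      // a real engine would attach this at the implementation level).
      Object.defineProperty(k, INVERTED, { value: new Map() });
    }
    k[INVERTED].set(this, v); // entry lives on the key, keyed by this WC
    return this;
  }
  get(k) {
    return k[INVERTED] ? k[INVERTED].get(this) : undefined;
  }
  has(k) {
    return k[INVERTED] ? k[INVERTED].has(this) : false;
  }
  delete(k) {
    return k[INVERTED] ? k[INVERTED].delete(this) : false;
  }
}
```

Note that in this sketch the key's table strongly references the WC instance, which is exactly the should-be-garbage retention issue raised later in this thread.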
There is no change in the GC latency of an object used as a WeakMap key. If the only reference to an object used as a value in a WeakMap is from a single inverted map entry, then that value object should be GC'ed when its key value is GC'ed.
So far, so good: none of this requires anything special from the GC. But what if there are no references to a WC instance other than from the keys of inverted maps? In that case the WC instance should be considered garbage, but if the inverted map keys are normal strong object references, such should-be-garbage WC instances won't be recognized as such. (This might not be a very big deal if we were only talking about the WC instances themselves, as they are small and presumably not very numerous, but retaining those instances also causes any corresponding inverted map value references to be retained.)
With a normal ephemeral weak map implementation, like the one in V8, the value in a (map,key,value) triple can be reclaimed immediately (in the same cycle) when either the map or the key dies. In the transposed implementation you describe that is not the case. If the map dies, the value will survive another (major GC) cycle. Under memory pressure that can increase the rate of major GCs.
To achieve timely reclamation in the former implementation, the GC maintains a list of all live weak maps, and traverses it during the atomic pause in the final marking phase (in an incremental GC). If you wanted to do the analog in the transposed representation, then the GC would dually need to maintain a list of all keys, and always traverse all of that. That would clearly be very expensive (there are typically far more keys than maps). The more practical solution is to have one such list per weak map, so that you only iterate over relevant keys of maps that have died. Alas, every map gets a list of its keys.
If you don't want to do that, then you'll have inferior GC behaviour with the transposed representation. In addition, you'll have to implement the whole weak property mechanism, and make every access to a key/value pair require an indirection through such a weak reference.
We can fix this by having the GC treat all key entries of inverted maps as weak references. Note that this requires weak reference support in the GC (definitely a complication) but does not require that the GC implement ephemeron semantics (a bigger complication).
At least in V8, we have various internal uses of ephemeral data structures anyway. In contrast, there is only one use of weak references, which is for persistent handles in the API, and we have plans to get rid of that. ;)
But more to the point, weak properties are far more complicated than just weak references. At least if you want to avoid substantial secondary costs for object accesses (see my mail from a while ago).
On Thu, Nov 27, 2014 at 8:58 AM, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:
On Thu, Nov 27, 2014 at 2:44 PM, Andreas Rossberg <rossberg at google.com> wrote:
Well, there is no functionally correct polyfill for WeakMaps that is actually weak, regardless of .clear.
Please bear with me so I understand your point: if you have o1 and o2, and o1.link = o2, wouldn't o2 be "free" once o1 is not referenced anymore?
Yes, but consider the other way around: o1 is still referenced, but the WeakMap is not.
Under a polyfill, o2 would still be kept alive, even though per spec there's not supposed to be any way to access it.
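A minimal sketch of the leak Andreas describes, using a hypothetical transposed shim that stores each entry on the key object (names are illustrative; real pre-ES6 WeakMap shims used a similar layout):

```javascript
// Transposed userland shim: the entry lives on the key itself.
const SLOT = Symbol('shimSlot');

class ShimWeakMap {
  set(k, v) {
    if (!k[SLOT]) Object.defineProperty(k, SLOT, { value: new Map() });
    k[SLOT].set(this, v);
    return this;
  }
  get(k) {
    return k[SLOT] ? k[SLOT].get(this) : undefined;
  }
}

const o1 = {};
const o2 = { big: 'payload' };
{
  const wm = new ShimWeakMap();
  wm.set(o1, o2);
}
// wm is unreachable here. Per spec WeakMap semantics, o2 should now
// be collectable, since no code can ever retrieve it again. But the
// shim's slot on o1 still strongly references both the dead map and
// o2, so o2 leaks for as long as o1 is alive.
```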
On Fri, Nov 28, 2014 at 4:21 AM, Andreas Rossberg <rossberg at google.com> wrote: [...]
With a normal ephemeral weak map implementation, like the one in V8, the value in a (map,key,value) triple can be reclaimed immediately (in the same cycle) when either the map or the key dies.
Hi Andreas, it is at this claim that I get confused. This is certainly not true for a normal ephemeron map implementation, and is not true for any ephemeron implementation I've read about, including years of prior postings here on es-discuss. If this is true for v8, great! Please walk me through what the v8 implementation does under the following scenario:
m = new WeakMap();
non_garbage_var = m;
//... stuff not changing whether m is garbage
//... scavenge
{
  const k = {};
  const v = [k, m];
  m.set(k, v);
}
//... assume that variables k and v are no longer reachable.
//... scavenge
// m has remained non-garbage the whole time.
// Immediately after the scavenge above, have k and/or v been either collected or promoted?
I am also interested in the transposed scenario:
k = {};
non_garbage_var = k;
//... stuff not changing whether k is garbage
//... scavenge
{
  const m = new WeakMap();
  const v = [k, m];
  m.set(k, v);
}
//... assume that variables m and v are no longer reachable.
//... scavenge
// k has remained non-garbage the whole time.
// Immediately after the scavenge above, have m and/or v been either collected or promoted?
If the existing v8 implementation efficiently collects v as of the second scavenge in both scenarios, or even if there is any known way of doing so efficiently (that also deals with the general ephemeron issue), I would be pleasantly astonished. My assumptions would indeed be wrong, and I would indeed need to revisit all conclusions that followed from these assumptions.
Do we have such good news?
Just to be explicit, my assumptions are:
Scavenging, in order to be as super efficient as it needs to be, avoids any ephemeron collection algorithm during the scavenge itself.
A normal non-transposed ephemeron map implementation will promote k and v as of the second scavenge in the first scenario, which is the only safe option that postpones the ephemeron part of the gc algorithm.
A normal non-transposed ephemeron map implementation will collect m and v as of the second scavenge in the second scenario, since these are collected merely by the classic scavenge without awareness of ephemeron issues.
A transposed ephemeron map implementation will collect k and v as of the second scavenge in the first scenario, since these are collected merely by the classic scavenge without awareness of ephemeron issues.
A transposed ephemeron map implementation will promote m and v as of the second scavenge in the second scenario, which is the only safe option that postpones the ephemeron part of the gc algorithm for transposed map representations.
On Nov 28, 2014, at 4:21 AM, Andreas Rossberg wrote:
On 27 November 2014 at 19:40, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
On Nov 27, 2014, at 1:17 AM, Andreas Rossberg wrote:
The discussion here still makes various untested assumptions about the transposed representation. With respect to .clear in particular, it seems to assume that this representation does not normally need the ability to iterate (internally) over the keys of a weak map. AFAICT, that is incorrect. Both the implementation we discussed in the V8 team, and from what I heard, the implementation in IE, maintain a list of all its keys for each weak map. Otherwise, when a map dies, you couldn't generally kill the entries and the values associated with them easily and in a timely manner. (And of course, this list means almost twice the space usage for a map, because you duplicate part of the information.)
Based upon your description below, it isn't clear what you mean by a "transposed" representation. Is this something you are actually using in V8? What you describe below sounds like a fairly conventional (non-inverted) map representation.
some more comments below
...
With a normal ephemeral weak map implementation, like the one in V8, the value in a (map,key,value) triple can be reclaimed immediately (in the same cycle) when either the map or the key dies. In the transposed implementation you describe that is not the case. If the map dies, the value will survive another (major GC) cycle. Under memory pressure that can increase the rate of major GCs.
It isn't clear what you mean by "ephemeral weak map implementation". Is this an ephemeron-based design, or is it some other keyed map scheme (reference??)?
If the latter, what does "ephemeral" relate to? Does it assume that the keys are more ephemeral than the map?
Whether there is a delayed value collection issue probably depends upon the patterns of usage and the overall design of the GC. If the keys are generally more ephemeral than the maps, then this probably won't be a problem. If the maps are generally more ephemeral than the keys and the GC is a multi-generation collector, then the map is likely to have already been collected by the time the inverted map in a key object is scanned.
To achieve timely reclamation in the former implementation, the GC maintains a list of all live weak maps, and traverses it during the atomic pause in the final marking phase (in an incremental GC). If you wanted to do the analog in the transposed representation, then the GC would dually need to maintain a list of all keys, and always traverse all of that. That would clearly be very expensive (there are typically far more keys than maps). The more practical solution is to have one such list per weak map, so that you only iterate over relevant keys of maps that have died. Alas, every map gets a list of its keys.
If you don't want to do that, then you'll have inferior GC behaviour with the transposed representation. In addition, you'll have to implement the whole weak property mechanism, and make every access to a key/value pair require an indirection through such a weak reference.
But doesn't this design have complexity similar to an ephemeron table solution? In particular, doesn't it result in key/value circularities if a value is, or references, the same WeakMap as the inverted map's key? To avoid this, won't the inverted map essentially have to have ephemeron table semantics?
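The circularity in question can be written down directly. In this tiny example (variable names follow the scenario above), the value stored under key k references the map itself; if the inverted map on k held the map via a plain weak reference, that reference could never clear, because the entry's own value keeps the map strongly reachable. Collecting such cycles is precisely what ephemeron semantics provide.

```javascript
const wm = new WeakMap();
const k = {};
const v = { backref: wm }; // the value references the same WeakMap
wm.set(k, v);
// Transposed view: k's inverted map holds { wm -> v }. The map side
// is supposed to be weak, but v (strongly held in the entry) points
// back at wm, so a plain weak reference to wm would never be cleared
// even once nothing else references wm.
```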
On 29 November 2014 at 22:30, Mark S. Miller <erights at google.com> wrote:
On Fri, Nov 28, 2014 at 4:21 AM, Andreas Rossberg <rossberg at google.com> wrote: [...]
With a normal ephemeral weak map implementation, like the one in V8, the value in a (map,key,value) triple can be reclaimed immediately (in the same cycle) when either the map or the key dies.
Hi Andreas, it is at this claim that I get confused. This is certainly not true for a normal ephemeron map implementation, and is not true for any ephemeron implementation I've read about, including years of prior postings here on es-discuss. If this is true for v8, great! Please walk me through what the v8 implementation does under the following scenario:
m = new WeakMap();
non_garbage_var = m;
//... stuff not changing whether m is garbage
//... scavenge
{
  const k = {};
  const v = [k, m];
  m.set(k, v);
}
//... assume that variables k and v are no longer reachable.
//... scavenge
// m has remained non-garbage the whole time.
// Immediately after the scavenge above, have k and/or v been either collected or promoted?
In a minor collection nothing is collected, because it doesn't handle ephemerons at all.
If a major collection happens after the block, then both k and v (or any such cycle, for that matter) are reclaimed during that collection. (Ignoring complications like incremental marking, which might already have marked k and v live while the block was still active.)
In a transposed representation for weak maps with key lists, a minor collection would not collect anything either, because k would still be reachable from m through its list of keys (and minor doesn't handle ephemerons specially). A major collection would be able to remove k from m's key list and reclaim it.
I am also interested in the transposed scenario:
k = {};
non_garbage_var = k;
//... stuff not changing whether k is garbage
//... scavenge
{
  const m = new WeakMap();
  const v = [k, m];
  m.set(k, v);
}
//... assume that variables m and v are no longer reachable.
//... scavenge
// k has remained non-garbage the whole time.
// Immediately after the scavenge above, have m and/or v been either collected or promoted?
In both minor or major collection both m and v are immediately reclaimed, because neither is strongly reachable at that point (unless the WeakMap is pretenured, which would make it survive the minor).
However, in a transposed representation for weak maps, a minor collection would no longer reclaim anything, because m would actually be reachable from k (and minor collections do not handle ephemerons specially). A major collection would still reclaim both m and v immediately, presuming m has a list of keys, which it can traverse to find k and remove itself from it. If that wasn't the case, the major collection could not reclaim m or v in this cycle.
So having a list of keys in the transposed map representation potentially causes some keys to survive a minor collection, but prevents values from surviving a major collection. Independent of that, the transposed map representation makes scenario 2 worse wrt minor collections.
If the existing v8 implementation efficiently collects v as of the second scavenge in both scenarios, or even if there is any known way of doing so efficiently (that also deals with the general ephemeron issue), I would be pleasantly astonished. My assumptions would indeed be wrong, and I would indeed need to revisit all conclusions that followed from these assumptions.
Do we have such good news?
Just to be explicit, my assumptions are:
Scavenging, in order to be as super efficient as it needs to be, avoids any ephemeron collection algorithm during the scavenge itself.
All ephemeron collections are visited in the final atomic pause of the marking phase of a major GC cycle, which will only mark values live that still have strongly reachable keys. All others are collected by the following sweeping phase.
However, if by "scavenging" you meant minor GC then that does not handle ephemeron collections at all.
Good. I think we're coming to a mutual understanding. By "scavenge" I mean exactly your "minor collection". I think collecting typical garbage during minor collection, rather than promoting/tenuring it, is desperately important, and dominates the other efficiency considerations mentioned in this thread. Of the common scenario, you say:
In a minor collection nothing is collected, because it doesn't handle ephemerons at all.
which clears up my confusion from your earlier claim
the value in a (map,key,value) triple can be reclaimed immediately (in the same cycle) when either the map or the key dies.
In the common scenario, as of the moment of the minor collection, the key and value are unreachable, but v8 collects neither. Instead, both are expensively promoted/tenured.
Of the transposed scenario, you say:
In both minor or major collection both m and v are immediately reclaimed, because neither is strongly reachable at that point
which shows the asymmetry, and that v8 is effectively optimizing for the wrong side of that asymmetry. By adopting what Allen and I refer to as the transposed representation (as opposed to your transposed representation), you'd instead be able to say of the common scenario
In both minor or major collection both k and v are immediately reclaimed, because neither is strongly reachable at that point
which would be much more valuable than any other efficiency issue discussed in this thread.
In any case, we now both understand that you can't have both at the moment of a minor collection because, as we agree, a minor collection
doesn't handle ephemerons at all.
You have to choose one side or the other. Why optimize for the transposed scenario rather than the common one?
On Dec 2, 2014, at 10:15 AM, Mark S. Miller wrote:
Good. I think we're coming to a mutual understanding. By "scavenge" I mean exactly your "minor collection". I think collecting typical garbage during minor collection, rather than promoting/tenuring it, is desperately important, and dominates the other efficiency considerations mentioned in this thread. Of the common scenario, you say:
In a minor collection nothing is collected, because it doesn't handle ephemerons at all.
which clears up my confusion from your earlier claim
The very first implementation of Ephemerons we did at Digitalk was in the context of a multi-generational (my recollection is that it had 5 generations) scavenging collector, and it did process ephemerons at every generational level of scavenging, quite efficiently, using various remembered-set tricks and other heuristics to deal with cross-generational references from ephemerons.
I've always felt that the essentially two-generation model that Ungar introduced (new/old space, minor/major collection, etc.) was too simplistic. It really has only two modes: fast and slow (or incremental). Multiple generations are better at dealing with the wider distribution of object lifetimes that occur in a typical application. I'm not convinced that only processing ephemerons during a "major collection" is a good heuristic.
the value in a (map,key,value) triple can be reclaimed immediately (in the same cycle) when either the map or the key dies.
In the common scenario, as of the moment of the minor collection, the key and value are unreachable, but v8 collects neither. Instead, both are expensively promoted/tenured.
Of the transposed scenario, you say:
In both minor or major collection both m and v are immediately reclaimed, because neither is strongly reachable at that point
which shows the asymmetry, and that v8 is effectively optimizing for the wrong side of that asymmetry. By adopting what Allen and I refer to as the transposed representation (as opposed to your transposed representation), you'd instead be able to say of the common scenario
In both minor or major collection both k and v are immediately reclaimed, because neither is strongly reachable at that point
which would be much more valuable than any other efficiency issue discussed in this thread.
In any case, we now both understand that you can't have both at the moment of a minor collection because, as we agree, a minor collection
doesn't handle ephemerons at all.
You have to choose one side or the other. Why optimize for the transposed scenario rather than the common one?
My main take-away from this discussion is that not having a 'clear' method on WeakMap/Set does indeed give simpler semantics and hence leaves GC designers more optimization opportunities.
Good point about multi-gen. Lars Hansen even researched "oldest first" collection:
www.cesura17.net/~will/professional/research/presentations/gc/index.html
Clearly, Ungar's model was a simplification, with trade-offs as expected.
Allen Wirfs-Brock wrote:
My main take-away from this discussion is that not having a 'clear' method on WeakMap/Set does indeed give simpler semantics and hence leaves GC designers more optimization opportunities.
Agreed. And we can always add .clear later. The worst case, where some leading implementation (oh, say, V8) adds it sooner, just forces the conclusion. But we should not force that conclusion now, absent evidence.
On 2 December 2014 at 19:15, Mark S. Miller <erights at google.com> wrote:
In both minor or major collection both m and v are immediately reclaimed, because neither is strongly reachable at that point
which shows the asymmetry, and that v8 is effectively optimizing for the wrong side of that asymmetry. By adopting what Allen and I refer to as the transposed representation (as opposed to your transposed representation), you'd instead be able to say of the common scenario
In both minor or major collection both k and v are immediately reclaimed, because neither is strongly reachable at that point
which would be much more valuable than any other efficiency issue discussed in this thread.
What I'm saying is that this is a theoretical conclusion at which you arrive only under various simplifying and idealistic assumptions (such as life time being the only factor that matters). I maintain that there is no evidence yet that this actually translates to better overall performance in practice, and I have severe doubts that it would. I don’t want to reiterate the concerns that I already raised in an earlier thread, but the executive summary is:
Good performance absolutely wants an object layout that is stable and compact. The transposed weak map implementation pretty much is an antithesis to that. It hence induces a potential cost on all key objects (and all their uses throughout a program) instead of just the maps. That is a case of optimising for the wrong thing as well.
(Back to the actual topic of this thread, you still owe me a reply regarding why .clear is bad for security. ;) )
On Wed, Dec 3, 2014 at 8:35 AM, Andreas Rossberg <rossberg at google.com> wrote:
(Back to the actual topic of this thread, you still owe me a reply regarding why .clear is bad for security. ;) )
I'd like to hear this too, just for educational value.
Le 03/12/2014 16:26, Jason Orendorff a écrit :
On Wed, Dec 3, 2014 at 8:35 AM, Andreas Rossberg <rossberg at google.com> wrote:
(Back to the actual topic of this thread, you still owe me a reply regarding why .clear is bad for security. ;) ) I'd like to hear this too, just for educational value.
Unlike Map.prototype.clear, WeakMap.prototype.clear is a capability that cannot be implemented in userland. With WeakMap.prototype.clear, any script can clear any weakmap even if it knows none of the weakmap's keys. A script which builds a weakmap may legitimately assume later that the weakmap is filled. However, passing the weakmap to a mixed-trust (malicious or buggy) script may result in the weakmap being cleared (breaking the assumption that the weakmap is filled and triggering all sorts of bugs). Like all dumb things, at web scale, it will happen. WeakMap.prototype.clear is ambient authority whose necessity remains to be proven.
It remains possible to create clearless weakmaps to pass around (by wrapping a weakmap, etc.), but it makes security (aka code robustness) an opt-in and not the default.
Opt-ins are cool, but are often forgotten, like CSP, like "use strict", like cookie HttpOnly, like HTTPS, you know the list :-) It would be cool if they were by default and people didn't have to learn about them all.
Security by default is cooler in my opinion.
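The clearless wrapper David mentions can be sketched as follows (makeClearlessWeakMap is a hypothetical name, and this is one possible design, not a prescribed one): hand out an object exposing only get/set/has/delete, so holders of the wrapper can manage entries they hold keys for, but cannot wipe the whole map.

```javascript
function makeClearlessWeakMap() {
  const wm = new WeakMap(); // stays private inside this closure
  return {
    get: (k) => wm.get(k),
    set: (k, v) => { wm.set(k, v); },
    has: (k) => wm.has(k),
    delete: (k) => wm.delete(k),
  };
}
```

Since WeakMaps are also non-iterable, a holder of the wrapper alone can neither enumerate nor clear its entries.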
The same argument can be used for removing the array.length setter: some code could remove all entries from an array important to your application.
Maybe we should think about WM methods .clone and .cloneInto(otherWeakMap). On 3 Dec 2014 at 18:04, "David Bruant" <bruant.d at gmail.com> wrote:
On Wed, Dec 3, 2014 at 9:10 AM, Michał Wadas <michalwadas at gmail.com> wrote:
The same argument can be used for removing the array.length setter: some code could remove all entries from an array important to your application.
No it's not, because Arrays are open to editing in all ways. They don't have one single place, impossible to otherwise reproduce, that is the sole channel for messing with the Array contents.
On Thu, Nov 27, 2014 at 10:40 AM, Allen Wirfs-Brock <allen at wirfs-brock.com>
wrote:
This is the end of my assumed inverted WC design and why I assert that a clear method is incompatible with it.
Couldn't this be solved by adding a little state (a monotonically increasing 'generation' counter) to the WC? Then, something like this:
WC.prototype.clear = function() { this.generation ++; }
WC.prototype.has = function(k) {
var slot = getWCSlot(k, this);
return slot && slot.generation == this.generation;
};
WC.prototype.set = function(k, v) {
var slot = getWCSlot(k, this);
if (slot) {
// update the slot's information (including generation)
slot.generation = this.generation;
slot.value = v;
}
else {
k[@@weakContainers][this] = { generation: this.generation, value: v };
}
};
WC.prototype.get = function(k) {
var slot = getWCSlot(k, this);
if (!slot) return undefined;
if (slot.generation != this.generation) {
// purge the key's slot for this weakmap
delete k[@@weakContainers][this];
return undefined;
}
return slot.value;
};
Then clear()'s description can be changed to (if it wasn't this already) simply: "There is no way to retrieve values corresponding to keys added prior to the clear()"
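A runnable approximation of the generation-counter idea above (the SLOTS symbol stands in for the @@weakContainers side table; like any userland sketch, it holds entries strongly rather than weakly, and GenWC is an illustrative name):

```javascript
const SLOTS = Symbol('weakContainers');

class GenWC {
  constructor() { this.generation = 0; }
  clear() { this.generation++; } // makes all existing entries stale
  set(k, v) {
    if (!k[SLOTS]) Object.defineProperty(k, SLOTS, { value: new Map() });
    k[SLOTS].set(this, { generation: this.generation, value: v });
  }
  get(k) {
    const slot = k[SLOTS] && k[SLOTS].get(this);
    if (!slot) return undefined;
    if (slot.generation !== this.generation) {
      k[SLOTS].delete(this); // lazily purge the stale slot
      return undefined;
    }
    return slot.value;
  }
  has(k) {
    const slot = k[SLOTS] && k[SLOTS].get(this);
    return !!slot && slot.generation === this.generation;
  }
}
```

Note that clear() here only makes old entries unobservable; their storage is reclaimed lazily, when a later get() purges a stale slot.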
On Wed, Dec 3, 2014 at 9:04 AM, David Bruant <bruant.d at gmail.com> wrote:
A script which builds a weakmap may legitimately assume later that the weakmap is filled. However, passing the weakmap to a mixed-trust (malicious or buggy) script may result in the weakmap being cleared (breaking the assumption that the weakmap is filled and triggering all sorts of bugs). Like all dumb things, at web scale, it will happen.
OK. I read the whole thing, and I appreciate your writing it.
There's something important that's implicit in this argument that I still don't have yet. If you were using literally any other data structure, any other object, passing a direct reference to it around to untrusted code would not only be dumb, but obviously something the ES spec should not try to defend against. Right? It would be goofy. The language just is not that hardened. Arguably, the point of a data structure is to be useful for storing data, not to be "secure" against code that has a direct reference to it. No?
So what's missing here is, I imagine you must see WeakMap, unlike all the other builtin data structures, as a security feature. Specifically, it must be a kind of secure data structure where inserting or deleting particular keys and values into the WeakMap does not pose a threat, but deleting them all does.
Can you explain that a bit more?
I see the invariant you're talking about, I agree it's elegant, but to be useful it also has to line up with some plausible security use case and threat model.
On Wed, Dec 3, 2014 at 1:10 PM, Jason Orendorff <jason.orendorff at gmail.com>
wrote:
On Wed, Dec 3, 2014 at 9:04 AM, David Bruant <bruant.d at gmail.com> wrote:
A script which builds a weakmap may legitimately assume later that the weakmap is filled. However, passing the weakmap to a mixed-trust (malicious or buggy) script may result in the weakmap being cleared (breaking the assumption that the weakmap is filled and triggering all sorts of bugs). Like all dumb things, at web scale, it will happen.
OK. I read the whole thing, and I appreciate your writing it.
There's something important that's implicit in this argument that I still don't have yet. If you were using literally any other data structure, any other object, passing a direct reference to it around to untrusted code would not only be dumb, but obviously something the ES spec should not try to defend against. Right? It would be goofy. The language just is not that hardened. Arguably, the point of a data structure is to be useful for storing data, not to be "secure" against code that has a direct reference to it. No?
So what's missing here is, I imagine you must see WeakMap, unlike all the other builtin data structures, as a security feature. Specifically, it must be a kind of secure data structure where inserting or deleting particular keys and values into the WeakMap does not pose a threat, but deleting them all does.
Can you explain that a bit more?
I see the invariant you're talking about, I agree it's elegant, but to be useful it also has to line up with some plausible security use case and threat model.
As Mark presented it in the meeting ( rwaldron/tc39-notes/blob/master/es6/2014-11/nov-19.md#412-should-weakmapweakset-have-a-clear-method-markm ):
"the mapping from a weakmap/key pair to a value can only be observed or affected by someone who has both the weakmap and the key. With clear(), someone with only the WeakMap would have been able to affect the WeakMap-and-key-to-value mapping."
(This is the same thing that David was explaining, but slightly different delivery)
As I understand this explanation, removing clear() simply closes off a path that would otherwise allow for potentially malicious meddling—no more, no less.
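That invariant can be sketched concretely (the function and variable names here are illustrative): code holding only the weakmap, without the key, cannot observe or disturb an entry, since WeakMaps are non-iterable and, absent clear(), cannot be wiped.

```javascript
const wm = new WeakMap();

function trusted() {
  const key = {};            // the key never escapes this scope
  wm.set(key, 'secret');
  return () => wm.get(key);  // only this closure can reach the value
}

const read = trusted();
// Any code that receives `wm` alone can neither enumerate its entries
// nor clear them, so read() reliably continues to return 'secret'.
```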
A more compact implementation occurred to me during my commute: just have an internal slot in the WC that it uses when looking up values, instead of the WC reference itself.
This has the downside of not being able to purge old slots on calls to has/get, but removes the possibility of overflow (if 'generation' is, e.g., a uint32).
WC.prototype.clear = function() { this.[[WCIdentity]] = new Object(); }
WC.prototype.has = function(k) {
var slot = GetWCSlot(k, this.[[WCIdentity]]);
return slot !== undefined;
};
WC.prototype.set = function(k,v) {
var slot = GetWCSlot(k, this.[[WCIdentity]]);
if (!slot)
slot = CreateWCSlot(k, this.[[WCIdentity]]);
slot.value = v;
};
WC.prototype.get = function(k) {
var slot = GetWCSlot(k, this.[[WCIdentity]]);
return slot && slot.value;
};
On Dec 3, 2014, at 11:28 AM, Chris Toshok wrote:
A more compact implementation occurred to me during my commute: just have an internal slot in the WC that it uses when looking up values, instead of the WC reference itself.
This has the downside of not being able to purge old slots on calls to has/get, but removes the possibility of overflow (if 'generation' is, e.g., a uint32).
Presumably one use for a clear operation is to release the association between the key and the value, and hence enable GC of the value (if it has no other references). This approach won't accomplish that: it behaviorally clears the map but doesn't provide the potential GC benefits.
Related to Andreas's points, if performance is a real concern here, the dramatically inferior memory locality of a transposed weak map is probably something that is at least worth thinking about. Cache locality matters a lot in modern software, especially if you have a large heap, and it will impact the performance of both normal execution and collections.
There are so many moving parts involved in implementing this container that it seems ill-advised to try and understand its overall performance just by looking narrowly at one specific aspect like whether or not reclaiming happens in a minor collection. In most real-world applications, you will see containers get used in many ways: write-once-read-many, write-many-read-once, write-many-read-many, etc. Immutable lookup tables, mutable short-lived state trackers for in-progress tasks, state associated with live 'entities' or other objects, etc. If the goal here is really to deliver the best possible performance, intuition & experience from implementing WMs in (for example) v8 should in most cases trump the theoretical performance of the algorithm.
Likewise, if performance is important, clearing is a common use case. There are certainly use cases where you don't want to clear, or don't need to, and in those cases it has been pointed out that the necessary infrastructure to clear MIGHT slow them down. But you want to optimize globally, where possible, so that the vast majority of use cases perform acceptably (no horrible, unusably slow cases) while trying to make as many of those cases fast as you can. So entirely ruling out clear() in order to optimize one case is potentially unwise there too.
'It remains possible to create clearless weakmaps to pass around (by wrapping a weakmap, etc.), but it makes security (aka code robustness) an opt-in and not the default.' Seeing this sort of security-first argument when discussing a user-space data structure is really confusing to me. If the stated goal of WeakMap is security, and this is a noble goal, why isn't the same true for the non-weak new containers like Map and Set? Immutable containers built-in to the standard library would certainly be a useful thing to have - people implement them in JS libraries all the time because they need them - so if that's in scope it seems like it should be a goal that's applied more evenly.
As stated before, I believe the benefit of protecting people from having maps accidentally cleared may not outweigh the damage caused by having many applications out there with catastrophically bad performance due to clearing maps incorrectly (or expecting map clearing to be a sensible operation at all, when it isn't). The underlying implementation and characteristics of WeakMaps are really, profoundly strange to anyone who doesn't know a lot about GCs and JS runtimes, if you ask me. A data structure that isn't understood is more likely to be misused. If we're talking about promoting overall code robustness, minimizing surprises in general seems like it should be a priority, and the underlying behavior of a transposed WeakMap - compared to everything else in the stdlib - is surprising.
On 03/12/2014 19:10, Jason Orendorff wrote:
On Wed, Dec 3, 2014 at 9:04 AM, David Bruant <bruant.d at gmail.com> wrote:
A script which builds a weakmap may legitimately later assume the weakmap is filled. However, passing the weakmap to a mixed-trusted (malicious or buggy) script may result in the weakmap being cleared (and break the assumption of the weakmap being filled and trigger all sorts of bugs). Like all dumb things, at web-scale, it will happen.

OK. I read the whole thing, and I appreciate your writing it.
There's something important that's implicit in this argument that I still don't have yet. If you were using literally any other data structure, any other object, passing a direct reference to it around to untrusted code would not only be dumb, but obviously something the ES spec should not try to defend against. Right? It would be goofy.
Object.freeze and friends were added to the ES spec for the very purpose of being able to pass a direct reference to an object and defend against unwanted mutations. Is Object.freeze goofy?
The language just is not that hardened. Arguably, the point of a data structure is to be useful for storing data, not to be "secure" against code that has a direct reference to it. No?
The way I see it, data structures are a tool to efficiently query data. They don't have to be arbitrarily mutable anytime for this purpose. It's a point of view problem, but in my opinion, mutability is the problem, not sharing the same object. Being able to create and share structured data should not have to mean it can be modified by anyone anytime. Hence Object.freeze, hence the recent popularity of React.js.
So what's missing here is, I imagine you must see WeakMap, unlike all the other builtin data structures, as a security feature.
I'm not sure what you mean by "security feature". Any API is a security feature of sort.
Specifically, it must be a kind of secure data structure where inserting or deleting particular keys and values into the WeakMap does not pose a threat, but deleting them all does.
Can you explain that a bit more? I see the invariant you're talking about, I agree it's elegant, but to be useful it also has to line up with some plausible security use case and threat model.
The ability to clear any WeakMap anytime needs to be equally justified in my opinion. I'm curious about plausible use cases.
What about making 'clear' an own property of weakmaps and make it only capable of clearing the weakmap it's attached to?
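One possible shape for that idea, sketched in user code with invented names (makeWeakMapWithOwnClear is hypothetical; a real version would presumably be built into the constructor):

```javascript
// Hypothetical sketch: clear as an own, configurable property of each
// instance, so it can be deleted ("neutered") before the map is shared.
function makeWeakMapWithOwnClear() {
  let wm = new WeakMap();
  const map = {
    get(key) { return wm.get(key); },
    has(key) { return wm.has(key); },
    set(key, value) { wm.set(key, value); return map; },
    delete(key) { return wm.delete(key); },
    // own property: only holders of this instance can clear it, and the
    // capability can be revoked per instance
    clear() { wm = new WeakMap(); }
  };
  return map;
}

// Revoke the capability before handing the map to less-trusted code:
const shared = makeWeakMapWithOwnClear();
delete shared.clear;
```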
On 4 December 2014 at 00:54, David Bruant <bruant.d at gmail.com> wrote:
The way I see it, data structures are a tool to efficiently query data. They don't have to be arbitrarily mutable anytime for this purpose. It's a point of view problem, but in my opinion, mutability is the problem, not sharing the same object. Being able to create and share structured data should not have to mean it can be modified by anyone anytime. Hence Object.freeze, hence the recent popularity of React.js.
I agree, but that is all irrelevant regarding the question of weak maps, because you cannot freeze their content.
So my question stands: What would be a plausible scenario where handing a weak map to an untrusted third party is not utterly crazy to start with? In particular, when can giving them the ability to clear be harmful, while the ability to add random entries, or attempt to remove entries at guess, is not?
On 04/12/2014 09:55, Andreas Rossberg wrote:
On 4 December 2014 at 00:54, David Bruant <bruant.d at gmail.com> wrote:
The way I see it, data structures are a tool to efficiently query data. They don't have to be arbitrarily mutable anytime for this purpose. It's a point of view problem, but in my opinion, mutability is the problem, not sharing the same object. Being able to create and share structured data should not have to mean it can be modified by anyone anytime. Hence Object.freeze, hence the recent popularity of React.js.

I agree, but that is all irrelevant regarding the question of weak maps, because you cannot freeze their content.
The heart of the problem is mutability, and .clear is a mutability capability, so it's relevant. WeakMaps are effectively frozen for some bindings if you don't have the keys.
So my question stands: What would be a plausible scenario where handing a weak map to an untrusted third party is not utterly crazy to start with?
Sometimes you call functions you didn't write and pass arguments to them. WeakMaps are new, but APIs will have functions that take WeakMaps as arguments. I don't see what's crazy. It'd be nice if I didn't have to review all the npm packages I use to make sure they don't call .clear when I pass a weakmap. If you don't want to pass the WeakMap directly, you have to create a new object "just in case" (cloning or wrapping), which carries its own obvious efficiency cost. Security then comes at the cost of performance, while both could have been achieved if the same safe-by-default weakmap could be shared.
In particular, when can giving them the ability to clear be harmful, while the ability to add random entries, or attempt to remove entries at guess, is not?
I don't have an answer to this case, now. That said, I'm uncomfortable with the idea of a decision that affects the language of the web until its end being made based on the inability of a few people to find a scenario that is deemed plausible by a few other people within a limited timeframe. It's almost calling for an "I told you so" one day. I would return the question: can you demonstrate there is no such scenario?
We know ambient authority is a bad thing; examples are endless in JS. The ability to modify global variables has been the source of bugs and vulnerabilities. JSON.parse implementations had to be fixed by browsers because they used the ambient Array constructor, which could be replaced with a malicious version, leading to data leakage. WeakMap.prototype.clear is ambient authority. Admittedly, its effects are less broad and its malicious usage is certainly more subtle.
There are scenarios where both security and performance matter. I think this is more than self-evident at this point in the thread, since examples of both have been provided repeatedly. "Can you demonstrate there is no such scenario" isn't really a necessary question, because we already know the answer: no.
That's not the relevant issue here, though. As discussed before, the basic choice at hand here is whether to optimize the common case for security or for reasonable performance & usability.
The security use case can be addressed by wrapping the weakmap in an opaque object that revokes the ability to clear (along with the ability to write, for example - an ability I think you would want to deny in most security-sensitive scenarios.) It is true, as you state, that wrapping the weakmap imposes a performance cost; however, it seems unlikely that the cost would be more than negligible, given that the wrapper object would literally consist of one property containing an object reference, and its methods could be trivially inlined by every JS runtime I'm aware of.
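A minimal sketch of such a wrapper (names are invented): the holder gets read access only, with clear withheld along with writes, as suggested.

```javascript
// Sketch: a one-property wrapper exposing only read capabilities. The
// underlying WeakMap is captured in a closure, so holders of the view
// cannot reach it to set, delete, or clear.
function makeReadOnlyWeakMapView(wm) {
  return {
    get(key) { return wm.get(key); },
    has(key) { return wm.has(key); }
  };
}
```

Both methods are trivial one-line forwards, which is why the overhead should be negligible once inlined.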
The performance use cases can theoretically be addressed by a sufficiently smart runtime, even if a .clear() method is not present. However I would argue that the difficulty of actually providing good performance for these use cases without a .clear() method is extremely high, for example:
Without a clear() method you have to wait for a sufficiently exhaustive collection to be triggered, likely by memory pressure. If the values being held alive in the interim do not merely use JS heap - for example, they are webgl textures or bitmap images - it is likely that the memory pressure feedback may not be sufficient to trigger an exhaustive collection soon enough. I have seen this exact issue in the past (while using weakmap, actually!)
It was previously stated that 'secure by default' is a noble goal, and I agree. However, in this case secure-by-default is not something a user will expect from JS containers, because no other JS data structure offers secure-by-default. WeakMap as currently specced - with or without clear() - is also not secure by default since writes can still occur to values. You would need to disable writes by default as well, somehow.
On a performance note, I would also argue that it seems profoundly questionable that a transposed weak map implementation can provide consistently good performance out in the real world for typical use cases. I am certain that there are use cases where it is optimal, and it clearly has its advantages, but as someone who spends absurd amounts of time tuning the performance of software - both JS and native - the design of a transposed weakmap contains many red flags that suggest bad performance. I will speculate, based on my understanding of transposed weak maps and my (incomplete) knowledge of modern JS runtimes - please correct any errors:
The transposed weak map must store values as hidden properties on the keys used. This means that any object used as a key - any object reference, that is - must be able to accept hidden properties. This means that it is effectively impossible to allocate object instances with fixed-size, fixed-layout storage unless you reserve space for a place to store the weakmap values. The only way I can imagine to solve this is to make really aggressive use of type information gathering and/or bailouts in the runtime to identify every type used as a weakmap key - at which point I suppose you would have to convert their memory layout on the heap in order to ensure consistency, or support two different memory layouts for the same type.
I don't consider the above an academic concern, either: Dense memory layout is essential if you want good locality (and thus, good cache efficiency) and if you want the ability to cheaply do things like copy your instances into a typed array and upload them onto the GPU. The cheap copying use case will matter a lot once typed objects are introduced since they are all about fixed, dense memory layout and cheap copying.
A transposed weakmap generally implies poor memory locality, extensive pointer-chasing, and higher memory overhead for each key/value pair stored in the map.
If I'm not mistaken, a transposed weakmap may also increase the cost of GC tracing overall, or at least for any object type that can be used as a key - the requirement to allocate space for weakmap values on those types means that the GC must now trace those weakmap value slots regardless of whether they actually contain a value.
A transposed weakmap probably also implies worse memory fragmentation or more wasted heap, because you either have to lazily allocate the space for the weakmap values (which means a separate heap allocation) or reserve empty space in all instances for the values. Neither of these feel particularly ideal.
A transposed weakmap may also imply hindrances to a VM's ability to elide heap allocations or store JS object instances on the stack/in registers, by increasing the minimum size of instances and making them variable-size where they were previously fixed-size. It is unclear to me how often these optimizations currently happen, but they are certainly something that will be desired once typed objects are in use, and SIMD use cases absolutely demand them.
While transposed weakmaps are an interesting representation with clear advantages, I think the risks suggest that perhaps it is not wise to assume that all implementations will use transposed weakmaps, and that perhaps it is unwise to optimize for that layout. Andreas's stated POV in this thread makes me feel more confident in my opinion on this subject.
So, to summarize: It's necessary to pick which of the two concerns to optimize for with this design. They are both legitimate concerns. I would fervently argue that the risks/complications involved in optimizing for security dramatically outweigh the risks involved in optimizing for performance.
I should also note that while much of the above is speculative and based on intuition/experience, I have been shipping a use of WeakMap for performance-critical code for over a year now (though naturally, a fallback is required in browsers without WeakMap - basically everything other than Firefox.) WeakMap offers real advantages there and my use case actually does benefit from an efficient, built-in clear primitive. I care about WeakMap being performant and I suspect once it is widely available in browsers, more applications will use it and their authors will also care about performance.
Some of the performance penalties I describe above should make people especially wary since they will be difficult to debug. Subtle changes to memory layout and GC overhead are virtually impossible to debug with existing JS performance analysis tools - identifying those issues would require extensive knowledge of VM internals.
On 4 December 2014 at 13:58, David Bruant <bruant.d at gmail.com> wrote:
On 04/12/2014 09:55, Andreas Rossberg wrote:
On 4 December 2014 at 00:54, David Bruant <bruant.d at gmail.com> wrote:
The way I see it, data structures are a tool to efficiently query data. They don't have to be arbitrarily mutable anytime for this purpose. It's a point of view problem, but in my opinion, mutability is the problem, not sharing the same object. Being able to create and share structured data should not have to mean it can be modified by anyone anytime. Hence Object.freeze, hence the recent popularity of React.js.
I agree, but that is all irrelevant regarding the question of weak maps, because you cannot freeze their content.
The heart of the problem is mutability, and .clear is a mutability capability, so it's relevant. WeakMaps are effectively frozen for some bindings if you don't have the keys.
No, they are not. Everybody can enter additional keys, for example. In the security or abstraction related examples I'm aware of, allowing that would actually be more disastrous than doing a clear.
So my question stands: What would be a plausible scenario where handing a weak map to an untrusted third party is not utterly crazy to start with?
Sometimes you call functions you didn't write and pass arguments to them. WeakMaps are new, but APIs will have functions that take WeakMaps as arguments. I don't see what's crazy. It'd be nice if I didn't have to review all the npm packages I use to make sure they don't call .clear when I pass a weakmap.
Sure, I should have added "security-related" to the above sentence.
If you don't want to pass the WeakMap directly, you have to create a new object "just in case" (cloning or wrapping), which carries its own obvious efficiency cost. Security then comes at the cost of performance, while both could have been achieved if the same safe-by-default weakmap could be shared.
In particular, when can giving them the ability to clear be harmful, while the ability to add random entries, or attempt to remove entries at guess, is not?
I don't have an answer to this case, now. That said, I'm uncomfortable with the idea of a decision that affects the language of the web until its end being made based on the inability of a few people to find a scenario that is deemed plausible by a few other people within a limited timeframe. It's almost calling for an "I told you so" one day. I would return the question: can you demonstrate there is no such scenario?
We know ambient authority is a bad thing; examples are endless in JS. The ability to modify global variables has been the source of bugs and vulnerabilities. JSON.parse implementations had to be fixed by browsers because they used the ambient Array constructor, which could be replaced with a malicious version, leading to data leakage. WeakMap.prototype.clear is ambient authority. Admittedly, its effects are less broad and its malicious usage is certainly more subtle.
Sure, but WeakMap.prototype.set is no different in that regard. When you hand out a sensitive weak map you've already lost, with or without clear. This really seems like a phantom discussion to me (and I'm saying that although I do care a lot about abstraction and security!).
On Thu, Dec 4, 2014 at 9:25 AM, Katelyn Gadd <kg at luminance.org> wrote:
The only way I can imagine to solve this is to make really aggressive use of type information gathering and/or bailouts in the runtime to identify every type used as a weakmap key - at which point I suppose you would have to convert their memory layout on the heap in order to ensure consistency, or support two different memory layouts for the same type.
Yup, this is how it's done.
In JavaScript every object can have new properties added to it at arbitrary times. jayconrod.com/posts/52/a
On Thu, Dec 4, 2014 at 6:25 AM, Katelyn Gadd <kg at luminance.org> wrote: [...]
I should also note that while much of the above is speculative and based on intuition/experience, I have been shipping a use of WeakMap for performance-critical code for over a year now
Hi Katelyn, could you say more about your shipping code? Is the code something your could post or make available? Thanks.
JSIL has a shim that emulates the 2D portion of the XNA game framework's graphics stack using HTML5 canvas (for compatibility). Many of the stack's features don't have direct equivalents in canvas, so I have to generate and cache various bits of data and graphics resources on-demand to implement them.
A main use case here is that in order to do color multiplication of bitmaps - typically used for text rendering, but used in other cases as well - I have to take a given image I intend to draw and split it into images for each specific color channel (r, g, b, a) and keep the images around. The lifetime of those images needs to be tied to the lifetime of the image they are derived from, and I also need the ability to discard them in response to memory pressure. WeakMap is near-perfect for this.
I have a complex garbage collection scheme where I manually maintain an LRU cache of these images and periodically discard the ones that have not been used recently; when the cache gets too big I discard the oldest ones. Ensuring this collector runs often enough without discarding images too often is a real challenge.
A downside here is that these resources are very heap light (just HTML5 canvases/images) but memory heavy. In the past I have found and filed bugs related to this where a browser was not properly responding to the memory pressure from these images. As a result of this I don't use WeakMap for this feature anymore (but I used to).
Managing the memory pressure here is important so it is very valuable to have both a way to clear out the entire cache (in response to the graphics adapter being reinitialized or a pool of game content being destroyed) and to remove a single value from the cache (in response to a single image resource being destroyed). The clear scenario is thankfully not common but it does happen. This is also an area where the performance is a concern.
I have a similar caching scenario involving textures generated from bitmap fonts + text strings but WeakMap can't solve that since JS strings aren't object references. Oh well :-) That one has to use my terrible hacked-together collector as a result regardless of memory pressure issues.
I do still use WeakMap in a few other places, for example to implement Object.GetHashCode. This is a case where the transposed representation is likely optimal - though in practice, I shouldn't need any sort of container here, if only the hashing mechanisms clearly built into the VM were exposed to user JS.
Out of the demos on the website, I think hildr.luminance.org/Lumberjack/Lumberjack.html makes the most significant use of the cache. The value below the framerate (measured in mb) is the size of the bitmap resource cache. If the demo is running using WebGL, the cache will be very small because WebGL needs far fewer temporary resources. If you load hildr.luminance.org/Lumberjack/Lumberjack.html?forceCanvas instead, it forces the use of the canvas backend and you will see the cache becomes quite large during gameplay. I think this at least provides a realistic scenario where you want a good WeakMap implementation that responds well to all forms of memory pressure.
I also have a more recent use of WeakMap that is used to cache typed array views for memory buffers. This is necessary to implement various pointer manipulation scenarios, so that arbitrary data structures can be unpacked from arbitrary offsets in a given array buffer. You can effectively view this as a Typed Objects polyfill that predates the Typed Objects spec work. I should note that this is something that would be unnecessary if DataView were designed better, but things are what they are. :)
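That view-caching use can be sketched roughly as follows (getCachedView and viewCache are invented names, not the actual JSIL code):

```javascript
// Sketch: one lazily-built table of typed-array views per ArrayBuffer, so
// repeated "pointer" reads don't allocate a fresh view each time. The
// WeakMap ties each table's lifetime to its buffer's lifetime.
const viewCache = new WeakMap();

function getCachedView(buffer, ViewCtor) {
  let views = viewCache.get(buffer);
  if (!views) viewCache.set(buffer, views = new Map());
  let view = views.get(ViewCtor);
  if (!view) views.set(ViewCtor, view = new ViewCtor(buffer));
  return view;
}
```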
I support Katelyn's suggestion to make clear() neuterable on an instance, perhaps with per-object configuration.
It leaves the API intact while allowing those with security concerns to address them.
On 12/04/2014 08:00 PM, Katelyn Gadd wrote:
I do still use WeakMap in a few other places, for example to implement Object.GetHashCode. This is a case where the transposed representation is likely optimal - though in practice, I shouldn't need any sort of container here, if only the hashing mechanisms clearly built into the VM were exposed to user JS.
If I am understanding correctly, I don't think there is any such hashing mechanism in the Spidermonkey VM. We hash on an object's pointer address, which can change during a moving GC. (We update any hashtables that incorporate an object's numeric address into their hash key computations.)
I'm a little curious what you're generating the hashcode from. Is this mimicking a value object? If the contents of the object change, would you want the hashcode to change? Or are the "hashcodes" just incrementing numerical object ids?
(Sorry for the tangent to the current thread.)
.NET's hashing protocol is weird and arguably it's some awful baggage carried over from its Java influences. All instances, value or reference type, have GetHashCode. For a given type there is the 'default' implementation, or you can provide a specific one. This enables anything to be used as a key in a container like a dictionary/map.
For reference types, GetHashCode's default implementation assigns an object a semi-unique value that persists for the entire lifetime of the instance. So in a non-moving GC you could use the pointer address, but in a moving GC the runtime is basically assigning it a permanent identifier that sticks with the instance somewhere. For value types, the default hashing implementation basically walks over the whole type and hashes the fields to create a hash for the value as a whole.
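For reference, the WeakMap-based GetHashCode approach Katelyn mentions can be approximated in user code like this (an incrementing identity id rather than a true hash; all names are invented):

```javascript
// Sketch: lazily assign each object a stable numeric id, mirroring the
// reference-type default described above. The WeakMap ensures the id table
// doesn't keep objects alive.
const hashCodes = new WeakMap();
let nextHashCode = 1;

function getHashCode(obj) {
  let h = hashCodes.get(obj);
  if (h === undefined) {
    h = nextHashCode++;
    hashCodes.set(obj, h);
  }
  return h;
}
```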
I'm surprised to hear that JS runtimes don't necessarily have ways to 'hash' a given JS value, but it makes sense. I can see how that is a great reason for 'get me a hash for this value' to never actually exist in the API, even if it's unfortunate that I have to recreate that facility myself in runtimes that do have it.
On Thu, Dec 4, 2014 at 9:46 PM, Katelyn Gadd <kg at luminance.org> wrote:
I'm surprised to hear that JS runtimes don't necessarily have ways to 'hash' a given JS value, but it makes sense. I can see how that is a great reason for 'get me a hash for this value' to never actually exist in the API, even if it's unfortunate that I have to recreate that facility myself in runtimes that do have it.
JS has maps/sets that take objects natively, hiding any details about how a mutable object is tracked/stored as keys or values, so there's never been any need for such a thing. Explicitly exposing hash codes is leaking implementation details into the user-facing API.
Tab Atkins Jr. wrote:
JS has maps/sets that take objects natively, hiding any details about how a mutable object is tracked/stored as keys or values, so there's never been any need for such a thing. Explicitly exposing hash codes is leaking implementation details into the user-facing API.
Not necessarily an implementation leak. It's really an implementation constraint, or set of constraints, that can be specified. See ye olde
including important safety tip
proposals:hashcodes#a_note_regarding_security
Why might this come to ES7 or beyond?
esdiscuss.org/topic/maps-with-object-keys
especially esdiscuss.org/topic/maps-with-object-keys#content-5
See also esdiscuss.org/topic/fwd-friam-fwd-hash-collision-denial-of-service -- have to avoid DoS attacks in any hashcode implementation.
Never been a need?
What if you want to use objects as keys and not use reference equality? We've been over this.
Here's a ref esdiscuss/2014-February/036389
On Mon, Dec 8, 2014 at 3:46 PM, Tab Atkins Jr. <jackalmage at gmail.com> wrote:
On Thu, Dec 4, 2014 at 9:46 PM, Katelyn Gadd <kg at luminance.org> wrote:
I'm surprised to hear that JS runtimes don't necessarily have ways to 'hash' a given JS value [...]
JS has maps/sets that take objects natively, hiding any details about how a mutable object is tracked/stored as keys or values, so there's never been any need for such a thing.
It's not rare to want a hashmap with compound keys. In current JS there's no convenient, efficient way to make such a map. If value types are still happening, they might solve that problem.
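For illustration, a two-part compound key currently needs something like nested maps, which is workable but neither convenient nor efficient (PairMap is an invented name):

```javascript
// Sketch: emulate a map keyed by (a, b) pairs by nesting one Map per key
// part. Works by reference equality on each part.
class PairMap {
  constructor() {
    this._root = new Map();
  }
  set(a, b, value) {
    let inner = this._root.get(a);
    if (!inner) this._root.set(a, inner = new Map());
    inner.set(b, value);
    return this;
  }
  get(a, b) {
    const inner = this._root.get(a);
    return inner && inner.get(b);
  }
}
```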
Explicitly exposing hash codes is leaking implementation details into the user-facing API.
This is certainly true and (afaik) the reason we haven't exposed this kind of feature so far.
[Katelyn, I decided to share this challenge publicly]
I very much respect experience-driven feedback [...]. [For any of you with an implementation based use case for WeakCollections, do] you have, or can you cook up based on what you have, a realistic performance critical use of WeakMap with .clear? If so, can I ask you to do the following experiment:
Benchmark your use on a browser currently supporting builtin WeakMaps with clear.
Substitute
class ClearableWeakMap {
  constructor(...args) {
    this.wm = new WeakMap(...args);
  }
  get(key) {
    return this.wm.get(key);
  }
  // etc. for anything you're using other than clear
  clear() {
    this.wm = new WeakMap();
  }
}
The reason I ask is we're all assuming that this is slower. It may be. But [if] you have a realistic use you believe in, I'm curious what numbers you see. Thanks!
[In addition, if anyone wants to try micro benchmarks with no claim to engineering realism, I'd still be curious what they say. Thanks]
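Taking up that invitation, a micro-benchmark harness might look like this (no engineering realism claimed; iteration counts and names are arbitrary):

```javascript
// Sketch: time repeated fill-and-clear cycles against a wrapper that
// emulates clear() by replacing the underlying WeakMap.
class ClearableWeakMap {
  constructor() { this.wm = new WeakMap(); }
  set(k, v) { this.wm.set(k, v); return this; }
  get(k) { return this.wm.get(k); }
  clear() { this.wm = new WeakMap(); }
}

function time(fn) {
  const start = Date.now();
  fn();
  return Date.now() - start;
}

const keys = [];
for (let i = 0; i < 1000; i++) keys.push({});

const elapsed = time(() => {
  const wm = new ClearableWeakMap();
  for (let round = 0; round < 100; round++) {
    for (const k of keys) wm.set(k, round);
    wm.clear(); // allocates a fresh WeakMap each round
  }
});
console.log('wrapper fill+clear: ' + elapsed + 'ms');
```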
On Wed, Dec 3, 2014 at 3:54 PM, David Bruant <bruant.d at gmail.com> wrote:
There's something important that's implicit in this argument that I still don't have yet. If you were using literally any other data structure, any other object, passing a direct reference to it around to untrusted code would not only be dumb, but obviously something the ES spec should not try to defend against. Right? It would be goofy.
Object.freeze and friends were added to the ES spec for the very purpose of being able to pass a direct reference to an object and defend against unwanted mutations. Is Object.freeze goofy?
So you're saying it's not a security thing; you have no particular threat model in mind. Hmm.
I really have trouble making sense of this, David. This line of reasoning would make sense if you were arguing for (a) really supporting immutable data structures in the stdlib; or (b) adding a freeze() method on all the stdlib collections; or perhaps (c) making Object.freeze somehow apply to the internal data in the new stdlib collections. In fact all those things sound nice.
But as it stands we're talking about one mutation method on a kind of object that already has a bunch of other mutation methods that you're apparently OK with. I don't get it. I thought it had to do with the invariant we were discussing before, and that's why I asked why that particular invariant is useful... and then you didn't mention it at all in your response.
The ability to clear any WeakMap anytime needs to be equally justified in my opinion. I'm curious about plausible use cases.
Clearing a data structure is useful when your data structure is a cache; WeakMaps are particularly useful for caches, as they do not keep the keys in memory.
Consistency with the Map API is a separate justification. A program migrating from Maps to WeakMaps might quite naturally end up wanting WeakMap.prototype.clear().
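The cache use case can be sketched like this (names are invented; replacing the map stands in for the missing clear()):

```javascript
// Sketch: derived data keyed by its source object. Because the WeakMap
// does not keep keys alive, entries for dead keys are collected
// automatically.
let cache = new WeakMap();

function memoized(obj, derive) {
  let result = cache.get(obj);
  if (result === undefined) { // note: re-derives if derive() returns undefined
    result = derive(obj);
    cache.set(obj, result);
  }
  return result;
}

// Wholesale invalidation, e.g. when the whole context is reset:
function invalidateAll() {
  cache = new WeakMap(); // stand-in for cache.clear()
}
```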
What about making 'clear' an own property of weakmaps and make it only capable of clearing the weakmap it's attached to?
I can't tell what this is meant to achieve.
On Dec 5, 2014, at 05:06, Alex Russell <slightlyoff at google.com> wrote:
I support Katelyn's suggestion to make clear() neuterable on an instance, perhaps with per-object configuration.
It leaves the API intact while allowing those with security concerns to address them.
What is the advantage of having .clear() on instances instead of on the prototype? For in both cases, you have to opt out if you don't want it.
On Thu, Dec 4, 2014 at 6:58 AM, David Bruant <bruant.d at gmail.com> wrote:
Sometimes you call functions you didn't write and pass arguments to them. WeakMaps are new, but APIs will have functions that take WeakMaps as arguments. I don't see what's crazy about that. It'd be nice if I didn't have to review every NPM package I use to make sure it doesn't call .clear when I pass it a WeakMap.
...Under what circumstance would you see yourself doing that? Do you review all the code in NPM packages you use now?
Maybe you look at the source when there's a bug. So is calling .clear() on an argument that's not supposed to be modified a particularly likely bug somehow?
If you don't want to pass the WeakMap directly, you have to create a new object "just in case" (cloning or wrapping), which carries its own obvious efficiency cost. Security then comes at the cost of performance, while both could have been achieved if the same safe-by-default WeakMap could be shared.
This is some rhetorical sleight of hand. Logic can't support any meaning of "safe" here except "safe from clear (and only clear)". But that's not the only or the most interesting mischief a function with a reference to a WeakMap can get up to. set() and delete() are much more interesting.
We know ambient authority is a bad thing; examples are endless in JS.
This much I agree with, and I think it's your best framing of the question so far, but...
WeakMap.prototype.clear is ambient authority.
Here you're just asserting your conclusion. I don't see it as ambient authority. It can only operate on the object you pass to it.
I would turn the question back to you: can you demonstrate that there are no such scenarios?
It's hard to prove a negative, but certainly the same technique which JS programmers already use to avoid having their objects mutated in unwanted ways also works great with WeakMaps: don't pass around direct references to them. And I can certainly demonstrate that removing WeakMap#clear does not make it "safe by default" to pass a WeakMap you care about to bad code.
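The second half of that claim can be demonstrated concretely. In this sketch (`untrusted` is a hypothetical bad callee, not code from the thread), the WeakMap has no clear at all, yet a function holding a reference to it, plus a key it has obtained, can still destroy or corrupt entries with delete() and set():

```javascript
// Even without clear(), handing a raw WeakMap to bad code is not
// "safe by default": set() and delete() are enough for mischief.
const registry = new WeakMap();
const key = {};
registry.set(key, 'important');

function untrusted(map, k) {
  map.delete(k);            // destroy an entry...
  map.set(k, 'poisoned');   // ...or corrupt it
}

untrusted(registry, key);
```

So removing clear narrows the attack surface by one method while leaving the more interesting ones intact, which is the point made about set() and delete() above.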
I think a .freeze() or .frozenCopy() method would be a productive way of getting the sort of guarantee you're interested in. Immutable data structures might be even better.
On Dec 9, 2014, at 2:51 PM, Jason Orendorff wrote:
On Mon, Dec 8, 2014 at 3:46 PM, Tab Atkins Jr. <jackalmage at gmail.com> wrote:
On Thu, Dec 4, 2014 at 9:46 PM, Katelyn Gadd <kg at luminance.org> wrote:
I'm surprised to hear that JS runtimes don't necessarily have ways to 'hash' a given JS value [...]
JS has maps/sets that take objects natively, hiding any details about how a mutable object is tracked/stored as keys or values, so there's never been any need for such a thing.
It's not rare to want a hashmap with compound keys. In current JS there's no convenient, efficient way to make such a map. If value types are still happening, they might solve that problem.
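To illustrate the gap, one common workaround today is to nest one Map per key component (the `PairMap` name and shape are mine, purely illustrative; value types, if they land, would make this unnecessary):

```javascript
// Hypothetical two-component key map built from nested Maps. Works with
// object keys, but every lookup pays for two Map probes and the inner
// Maps are never pruned when they become empty.
class PairMap {
  constructor() { this.outer = new Map(); }
  set(a, b, value) {
    let inner = this.outer.get(a);
    if (!inner) { inner = new Map(); this.outer.set(a, inner); }
    inner.set(b, value);
  }
  get(a, b) {
    const inner = this.outer.get(a);
    return inner && inner.get(b);
  }
}
```

It works, but it is neither convenient nor efficient compared to a native compound key, which is the complaint being made.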
Explicitly exposing hash codes is leaking implementation details into the user-facing API.
This is certainly true and (afaik) the reason we haven't exposed this kind of feature so far.
A rationale that made at least some of us on TC39 semi-comfortable with not exposing per-object hash values is as follows:
If you want to assign per-object hash values to some group of objects, then you can use a built-in WeakMap to associate application-defined hash values with the objects in the group. This makes the actual hash assignment an application responsibility rather than exposing implementation details. It permits an application to assign deterministic and repeatable hash codes, and it even allows different subsystems of an application to use different hash codings for the same objects.
It assumes that the WeakMap lookup used to retrieve the hash code will be fast, either because the implementation internally uses an inverted map or because it has an internal per-object hash value that is used to access WeakMaps.
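That rationale can be sketched in a few lines (the factory name is mine; the technique of keeping hash codes in a WeakMap is the one described above). Each call to the factory gives an independent hasher, so different subsystems can number the same objects differently, and the codes are deterministic and repeatable:

```javascript
// Application-defined per-object hash codes backed by a WeakMap, instead
// of exposing the engine's internal hash. The WeakMap keeps the mapping
// without pinning the objects in memory.
function makeHasher() {
  const hashes = new WeakMap();
  let next = 0;
  return function hashOf(obj) {
    if (!hashes.has(obj)) hashes.set(obj, next++);
    return hashes.get(obj);
  };
}
```

The cost model in the paragraph above then applies: every hashOf call is one WeakMap lookup, so this is only as cheap as the engine's WeakMap access path.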
Is there a detailed rationale for this somewhere? Making typical applications pay the cost here for a specific security scenario seems really bizarre to me. Clearing standard library data structures is an incredibly common operation. If you want to ensure that someone can't clear the map/set, shouldn't you be handing them an encapsulated version of the data structure? This seems like a corner case that shouldn't justify removing an important primitive.
If you have a clear method, the security problem seems solved by wrapping it in an object or using a proxy to deny the ability to clear (you hide the actual map/set, so it can't be cleared - you expose only the operations you want to expose).
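The wrapping half of that argument is a few lines of code (a hedged sketch; the wrapper name and the choice of which operations to expose are illustrative). The real map stays hidden in a closure, so the recipient can read through the view but can never reach clear():

```javascript
// Encapsulated view over a map: expose only the operations you want.
// The underlying map is unreachable from the view, so it can't be cleared.
function readOnlyView(map) {
  return {
    get: (k) => map.get(k),
    has: (k) => map.has(k),
    // no set/delete/clear: those capabilities stay with the owner
  };
}

const secrets = new Map([[1, 'one']]);
const view = readOnlyView(secrets);
```

The owner keeps `secrets` and retains clear(); the view is what gets passed around. This is the encapsulation that, per the argument below, only the security-sensitive minority should have to pay for.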
If you don't have a clear method, anyone wanting to clear the data structure has to throw it away and allocate a new one. This has significant disadvantages:

- The new structure starts empty at a default size, so repopulating it will have to grow the buffer multiple times. This is undesirable for cases where you are reusing a single data structure to store state in a long-running application.
- The allocation adds to GC and memory pressure for a long-running application that needs to clear data structures frequently. Were it a lightweight data type this would matter less, but a typical map instance with data in it can occupy a considerable amount of space in the heap.
- Being able to clear the structure now requires that all consumers support replacing their reference(s) to the old map with the new one. This makes it harder to maintain encapsulation, because you may have saved a reference to the map in a private property or within a closure; now you need to add accessibility points to everything that might retain the map so that you can update the reference. Or you have to encapsulate maps and sets just to recreate the clear operation that should have been there to begin with.
In either case, encapsulation or shielding the container behind a proxy is necessary. I insist that the common case is the one that shouldn't have to encapsulate, because optimizing for that case will benefit the vast majority of web applications that use it and the penalty to security-sensitive cases is small.