{Weak|}{Map|Set}
On Wednesday, September 14, 2011, David Bruant <david.bruant at labri.fr>
wrote:
Also, I would like to talk a little bit about terminology. WeakMaps have their name inspired by the idea of "weak" references which have particular garbage-collection properties. From the developer perspective, this seems to be some sort of implementation detail they should not be aware of. As far as I know, current functions/constructors have their name inspired by the contract they fulfill rather than implementation considerations. The difference between current WeakMaps and Maps is their contract. In the latter, keys can be enumerated, in the former not. I think that this is the difference that should inspire different names rather than the implementation optimisation that is induced by this contract difference.
In the last few days I had to write a piece of code that would strongly benefit from WeakMaps. I needed to store information about DOM nodes and retrieve it later, and these nodes aren't in my control so they can be detached at any time by someone else. If the references I kept were weak, I'd be sure that I wouldn't be causing a memory leak. And that's important in this case because the nodes are very likely Flash objects which can easily mean 20-50mb in memory. So knowing that a reference is weak is important information.
Juan
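Juan's use case can be sketched with the WeakMap API under discussion. The helper names below are illustrative (not part of any proposal), and a plain object stands in for a DOM node:

```javascript
// Associating data with externally-owned nodes without keeping
// them alive. nodeData, remember, and recall are illustrative names.
const nodeData = new WeakMap();

function remember(node, info) {
  nodeData.set(node, info);  // the key is held weakly
}

function recall(node) {
  return nodeData.get(node); // undefined if nothing was stored
}

// A plain object stands in for a detachable DOM/Flash node.
const fakeNode = {};
remember(fakeNode, { kind: "flash" });
console.log(recall(fakeNode).kind); // flash

// Once fakeNode becomes unreachable, the entry is eligible for
// collection; with a strong-keyed table it would leak.
```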
On Wed, Sep 14, 2011 at 6:04 PM, Juan Ignacio Dopazo <dopazo.juan at gmail.com> wrote:
On Wednesday, September 14, 2011, David Bruant <david.bruant at labri.fr> wrote:
Also, I would like to talk a little bit about terminology. WeakMaps have their name inspired by the idea of "weak" references which have particular garbage-collection properties. From the developer perspective, this seems to be some sort of implementation detail they should not be aware of. As far as I know, current functions/constructors have their name inspired by the contract they fulfill rather than implementation considerations. The difference between current WeakMaps and Maps is their contract. In the latter, keys can be enumerated, in the former not. I think that this is the difference that should inspire different names rather than the implementation optimisation that is induced by this contract difference.
In the last few days I had to write a piece of code that would strongly benefit from WeakMaps. I needed to store information about DOM nodes and retrieve it later, and these nodes aren't in my control so they can be detached at any time by someone else. If the references I kept were weak, I'd be sure that I wouldn't be causing a memory leak. And that's important in this case because the nodes are very likely Flash objects which can easily mean 20-50mb in memory. So knowing that a reference is weak is important information.
I agree.
Normally I strongly take the same position David does: emphasize semantics over implementation. But why? It is good when we can label a tool according to its purpose, rather than how it accomplishes that purpose. Associating the tool with its purpose helps us remember the right tool for the right job. Few would reach for the WeakMap tool thinking "I need a non-enumerable table". Granted, there are cases when the non-enumerability is the desired feature, but those cases are rare. The common purpose of a WeakMap is rooted in our understanding, at a high level, of certain implementation costs, and our desire to avoid certain avoidable implementation costs. Generally, that is what a WeakMap is for.
I noticed that these class definitions declare private names within the class body but not the constructor. My latest read of the class proposal was that private could only be declared within the constructor. Have I interpreted the BNF incorrectly?
On Sep 14, 2011, at 9:42 PM, Kyle Simpson wrote:
I too have been confused by the name "weakmap"...partially because the name is misleading, and partially because documentation on it is ambiguous/misleading. Specifically, "weakmap" really means "weakkeymap", because only the key is weak, not the value. But then again, "weakkeymap" would be even more implementation-instead-of-semantics naming.
We've discussed the naming of this abstraction several times: esdiscuss/2010-July/011465 resulted in WeakMap name esdiscuss/2010-September/011711, esdiscuss/2011-May/014211
and at least two of the above threads seem to have their start in confusion over the meaning of "weak" in this context.
I continue to think that "WeakMap" is a poor name (although it is much better than "EphemerionMap", which it replaced). To me, "weak" in this context is not a meaningful term for everyday programmers. It is GC jargon. It also adds nothing to a user's conceptual model of the abstraction. If a "map" is a non-enumerable associative table, then you would never want a non-weak Map, as that would simply be a leaky map, i.e. a buggy map implementation. Perhaps it can be argued that, in the case of WeakMap, "Weak" really is intended to mean nonleaky/nonbuggy, and that making this explicit is necessary because of the prevalence of leaky map implementations in other languages.
I would prefer ObjectMap (the keys are restricted to objects). ObjectRegistry is another possibility that is suggestive of a primary use case. Kyle's suggestion of Map would also be fine, although I think there is a valid objection that the name is too general (and perhaps too conflict-prone) given that the keys may not be values other than objects.
This issue keeps coming back around, starting with different people. Perhaps the present name really is a problem and we could revisit it. If not, my guess is that we will see the same issue again.
The reference to private is actually in the Map, Set classes shown below. The BNF is also below:
CallExpression : ...
  private ( AssignmentExpression )

ConstructorElement : ...
  PrivateVariableDefinition

PrivateVariableDefinition :
  private ExportableDefinition

class Map {
  private keys, vals;
  constructor() {
    private(this).keys = [];
    private(this).vals = [];
  }
  get(key) {
    const keys = private(this).keys;
    const i = indexOfIdentical(keys, key);
    return i < 0 ? undefined : private(this).vals[i];
  }
  has(key) {
    const keys = private(this).keys;
    return indexOfIdentical(keys, key) >= 0;
  }
  set(key, val) {
    const keys = private(this).keys;
    const vals = private(this).vals;
    let i = indexOfIdentical(keys, key);
    if (i < 0) { i = keys.length; }
    keys[i] = key;
    vals[i] = val;
  }
  delete(key) {
    const keys = private(this).keys;
    const vals = private(this).vals;
    const i = indexOfIdentical(keys, key);
    if (i < 0) { return false; }
    keys.splice(i, 1);
    vals.splice(i, 1);
    return true;
  }
  // todo: iteration
}
class Set {
  private map;
  constructor() {
    private(this).map = Map();
  }
  has(key) {
    return private(this).map.has(key);
  }
  add(key) {
    private(this).map.set(key, true);
  }
  delete(key) {
    return private(this).map.delete(key);
  }
  // todo: iteration
}
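For readers who want to run the sketch above, here is a plain-ES version without the proposed private(...) syntax, including one possible definition of the indexOfIdentical helper the classes assume but do not define (all other names here are illustrative):

```javascript
// indexOfIdentical: a linear identity search. Object.is gives
// SameValue semantics, so NaN can be a usable key.
function indexOfIdentical(keys, key) {
  for (let i = 0; i < keys.length; i++) {
    if (Object.is(keys[i], key)) return i;
  }
  return -1;
}

// Closure-held arrays stand in for the proposal's private state.
function makeMap() {
  const keys = [], vals = [];
  return {
    get(key) {
      const i = indexOfIdentical(keys, key);
      return i < 0 ? undefined : vals[i];
    },
    has(key) {
      return indexOfIdentical(keys, key) >= 0;
    },
    set(key, val) {
      let i = indexOfIdentical(keys, key);
      if (i < 0) { i = keys.length; }
      keys[i] = key;
      vals[i] = val;
    },
    delete(key) {
      const i = indexOfIdentical(keys, key);
      if (i < 0) { return false; }
      keys.splice(i, 1);
      vals.splice(i, 1);
      return true;
    }
  };
}

const m = makeMap();
const k = {};
m.set(k, 42);
console.log(m.get(k));    // 42
m.set(NaN, "nan");        // NaN works as a key under Object.is
console.log(m.get(NaN));  // nan
console.log(m.delete(k)); // true
console.log(m.has(k));    // false
```

Note that, unlike a real WeakMap, this sketch holds its keys strongly; it only illustrates the API surface.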
On Sep 14, 2011, at 11:09 PM, Allen Wirfs-Brock wrote:
I would prefer ObjectMap (the keys are restricted to objects).
Now that you point it out (again), I agree.
People who know a bit about GC also confuse WeakMap with WeakRef/WeakPtr, which we have only in the strawman space:
On 15 September 2011 09:10, Brendan Eich <brendan at mozilla.com> wrote:
On Sep 14, 2011, at 11:09 PM, Allen Wirfs-Brock wrote:
I would prefer ObjectMap (the keys are restricted to objects).
Now that you point it out (again), I agree.
I don't. :) It is true to some extent that WeakMap is GC jargon -- but as Mark points out, the normal use case for weak maps is to ensure a certain space behaviour closely related to GC. So why obfuscate the very intent by purposely avoiding what is more or less standard terminology for it (if slightly ambiguous)? If I was a programmer looking for something like weak referencing in JS for the first time, "weak" is what I'd be searching for. "ObjectMap" would be too generic a name to catch my immediate attention.
If I was a programmer looking for something like weak referencing in JS for the first time, "weak" is what I'd be searching for.
But if you're actually aware of weakrefs (as I am), and you're searching for them in JS (as I was), and you see "WeakMap" (as I did), and you conclude that "Weak" in the name means weak references (as I did), then you probably also (as I did) assume that all the refs are weak. That's a faulty conclusion, because only the keyrefs are weak.
The name doesn't do anything to enlighten you that it only offers weak keyrefs and not weak valuerefs -- in fact, by your "discovery" line of reasoning, the name is almost a landmine that traps/misleads someone who does in fact know about weakrefs -- someone who didn't know about weakrefs wouldn't necessarily make the same deductive assumption by seeing "weak" in the name.
Misleading/confusing with an API name is, IMHO, worse than less implementation-self-descriptive naming.
Why constrain WeakMap to object keys, but not Map and Set? I think it is because "Weak" only makes sense for object keys. However, if we base the distinction on iterability rather than weakness, as David suggested, then we do not have to include an object key constraint anywhere. We could have:
Map - current WeakMap minus the object key constraint
IterableMap - current Map
Set - a "WeakSet" (David's above use case) minus the object key constraint
IterableSet - current Set
This would also encourage use of the non-leaky abstractions where iterability is not a requirement.
Also, what are the iteration order semantics? Consider the following:
let s = new Set; // i.e. IterableSet
s.add(0);
s.add(1);    // clearly [0, 1] here
s.add(0);    // [1, 0] now?
s.delete(0);
s.add(0);    // clearly [1, 0] here
let m = new Map; // i.e. IterableMap
m.set(0, 2);
m.set(1, 2); // clearly [0, 1] here
m.set(0, 2); // set to existing value; [1, 0] now?
m.set(0, 3); // actually change value; [1, 0] now?
m.delete(0);
m.set(0, 2); // clearly [1, 0] here
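For what it's worth, here is how this question plays out under the insertion-order semantics that engines eventually shipped for ES2015 collections (re-adding a present key keeps its position; delete followed by add moves it to the end). This is offered as one possible answer, not what the strawman specified:

```javascript
const s = new Set();
s.add(0);
s.add(1);
s.add(0);                       // 0 already present: order unchanged
console.log([...s].join(","));  // 0,1
s.delete(0);
s.add(0);                       // removed then re-added: moves to the end
console.log([...s].join(","));  // 1,0

const m = new Map();
m.set(0, 2);
m.set(1, 2);
m.set(0, 3);                    // changing the value does not move the key
console.log([...m.keys()].join(",")); // 0,1
```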
Also, for the iteration API of Map (i.e. IterableMap) and Set (i.e. IterableSet), why not integrate with iterators [1]:
let m = new Map, s = new Set;
...
for (i of m.iterator()) { ... }
for (i of s.iterator()) { ... }
On Thu, Sep 15, 2011 at 2:10 AM, Brendan Eich <brendan at mozilla.com> wrote:
On Sep 14, 2011, at 11:09 PM, Allen Wirfs-Brock wrote:
I would prefer ObjectMap (the keys are restricted to objects).
Now that you point it out (again), I agree.
People who know a bit about GC also confuse WeakMap with WeakRef/WeakPtr, which we have only in the strawman space:
/be
es-discuss mailing list es-discuss at mozilla.org, mail.mozilla.org/listinfo/es-discuss
Thanks, Sean Eagan
It would be great if we found a name that suggested the full meaning of the abstraction by itself. "WeakMap" is certainly not that. Sometimes we find something like this. Also important is one that avoids suggesting any misunderstandings. I agree that "ObjectMap" beats WeakMap on those grounds. As to the importance of naming for purpose, which is the contrary argument here, I have a question for you.
You say "and you're searching for them in JS (as I was)". Had the abstraction been called ObjectMap or ObjectRegistry, would you have found it? As it was, you found the right thing to look at and think about, but you needed to read more before you understood whether it serves your actual purpose. Was this a better outcome than not finding it?
Other alternate names that were discussed in that thread, listed from my least to most favorite:
- NonLeakyObjectMap // accurate and suggestive, but a mouthful
- WeakKeyMap // inaccurate, less so than the conclusion Kyle leapt to
- EphemeralMap // accurate and suggestive, and far enough from "weak" to avoid quick misunderstandings
The reason WeakKeyMap is technically inaccurate is that in
var wkm = WeakKeyMap();
var k = {};
wkm.set(k, [k]);
k = null;
the k -> [k] association is no longer reachable. However, because weak-key maps are built on the use of a weak reference to point at the key, and to receive notification of the key's demise, no weak-key map can collect the above cyclic garbage association.[1] That's what's so important about ephemerons: the collector knows about such associations, rather than knowing about the key separately from knowing about the value. That's why I like "WeakMap" best -- it is the mapping that is weak, not the keys or the values.
Nevertheless, given the magnitude of misunderstandings people are worried about here, perhaps this technical inaccuracy is the lesser evil. But my own preference remains "WeakMap" first and then "EphemeralMap".
[1] wkm holds onto each value strongly until it notices the corresponding key's demise. In this case, the value holds onto the key strongly, so wkm has no demise to notice.
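Mark's cyclic example can be written out in runnable form with today's WeakMap (which is ephemeron-based, so it can collect the pair; a design that merely holds a weak reference per key could not):

```javascript
const wkm = new WeakMap();
let k = {};
wkm.set(k, [k]);                  // the value strongly references the key
console.log(wkm.get(k)[0] === k); // true: the cycle exists
k = null;
// The k -> [k] association is now unreachable from the program.
// An ephemeron-based collector may reclaim both key and value;
// a per-key weak-reference design never notices the key's demise,
// because the stored value keeps the key alive.
```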
On Sep 15, 2011, at 6:49 AM, Andreas Rossberg wrote:
On 15 September 2011 09:10, Brendan Eich <brendan at mozilla.com> wrote:
On Sep 14, 2011, at 11:09 PM, Allen Wirfs-Brock wrote:
I would prefer ObjectMap (the keys are restricted to objects).
Now that you point it out (again), I agree.
I don't. :) It is true to some extent that WeakMap is GC jargon -- but as Mark points out, the normal use case for weak maps is to ensure a certain space behaviour closely related to GC.
No, the normal use case for WeakMaps is simply to make associations between objects and arbitrary values. The special GC behavior is necessary to avoid memory leaks, but that is a quality-of-implementation issue, not a use case.
The typical JS programmer who wants to form such associations is not going to be thinking about the possibility of a leaky map unless they have already been burnt by them in some other language.
So why obfuscate the very intent by purposely avoiding what is more or less standard terminology for it (if slightly ambiguous)? If I was a programmer looking for something like weak referencing in JS for the first time, "weak" is what I'd be searching for. "ObjectMap" would be too generic a name to catch my immediate attention.
Because, the majority of JS programmers will simply be looking for a way to "map objects to values" and will not be thinking about GC consequences. We want them to find this thing we are currently trying to name. We don't want them to miss it because it has "weak" in its name and they don't know what weak means. We need to design first for the 95% (ass generated number) of JS programmers who don't understand GC.
Finally, none of us are good exemplars of typical JS programmers. We (and our friends) know too much and in general have a level of PL expertise that far exceeds that of the typical JS programmer. In cases such as this our expectations may be the exact opposite of a typical JS programmer's.
On Thu, Sep 15, 2011 at 8:41 AM, Sean Eagan <seaneagan1 at gmail.com> wrote:
Why constrain WeakMap to object keys, but not Map and Set? I think it is because "Weak" only makes sense for object keys. However, if we base the distinction on iterability rather than weakness, as David suggested, then we do not have to include an object key constraint anywhere.
Hi Sean, that is a perfect example of how quickly we could go from de-emphasizing the purpose to losing the purpose. With this restriction relaxed, I can well imagine programmers using large numbers or strings as keys, thinking the corresponding association would be collected when those numbers or strings no longer otherwise exist in their program. Programmers coming from languages where boxed large numbers or strings have an observable unique identity will be especially prone to fall into this trap.
The result is that their program has a memory leak that remains silent until they stress their program enough to run out of memory. At that point, they may have great difficulty tracking down the cause -- especially if this misuse of *Map is buried in some library written by someone else. Given the normal invisibility of the symptom, this is quite likely. Better to error early with a clear diagnostic, so no one is confused about what keys can usefully be used.
Which, btw, does bring up one way in which the FF implementation of WeakMaps is too strict. By the above reasoning, we should reject non-objects in the key position of a .set(). We need not reject them in the key position of a .get(), .has(), or .delete(). In all such cases, we know that a non-object key cannot correspond to any stored association, so .get(), .has(), and .delete() can respond as they would for any other not-found key.
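A sketch of the behaviour Mark argues for, written as a wrapper over an ordinary weak store (makeObjectKeyedMap and isObject are illustrative names, not the FF implementation): .set rejects non-object keys with a clear error, while .get/.has/.delete simply treat them as not found.

```javascript
function makeObjectKeyedMap() {
  const wm = new WeakMap();
  const isObject = (x) =>
    (typeof x === "object" && x !== null) || typeof x === "function";
  return {
    set(key, val) {
      // Error early: a non-object key could never be usefully stored.
      if (!isObject(key)) throw new TypeError("key must be an object");
      wm.set(key, val);
      return this;
    },
    // Non-object keys cannot correspond to any stored association,
    // so these respond as for any other not-found key.
    get(key)    { return isObject(key) ? wm.get(key) : undefined; },
    has(key)    { return isObject(key) ? wm.has(key) : false; },
    delete(key) { return isObject(key) ? wm.delete(key) : false; }
  };
}

const om = makeObjectKeyedMap();
console.log(om.has("str"));  // false, not a TypeError
console.log(om.delete(5));   // false
try {
  om.set("str", 1);
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```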
On Thu, Sep 15, 2011 at 8:47 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote: [...]
No, the normal use case for WeakMaps is simply to make associations between objects and arbitrary values. The special GC behavior is necessary to avoid memory leaks, but that is a quality-of-implementation issue, not a use case.
The typical JS programmer who wants to form such associations is not going to be thinking about the possibility of a leaky map unless they have already been burnt by them in some other language.
I do not know of a non-leak-association-map in any other language or library whose name does not suggest its GC role. Do you know of a counter-example? If not, then let's say "...burnt by them in every other language...". A programmer approaching ES6 wanting simply to make associations and not thinking about GC issues will probably reach for the (iterable) Map anyway, which is how it should be. For them, the iterability of these maps will often be seen as a virtue. If they are unconcerned about storage, they will usually see little reason to give up on this iterability.
Because, the majority of JS programmers will simply be looking for a way to "map objects to values" and will not be thinking about GC consequences. We want them to find this thing we are currently trying to name. We don't want them to miss it because it has "weak" in its name and they don't know what weak means. We need to design first for the 95% (ass generated number) of JS programmers who don't understand GC.
Those programmers should find our iterable maps, not our weak-association maps.
Finally, none of us are good exemplars of typical JS programmers. We (and our friends) know too much and in general have a level of PL expertise that far exceeds that of the typical JS programmer. In cases such as this our expectations may be the exact opposite of a typical JS programmer's.
There are many kinds and levels of programmers. For the typical programmers you talk about, I'd just point them at our current iterable Maps and Sets.
On Thu, Sep 15, 2011 at 8:41 AM, Sean Eagan <seaneagan1 at gmail.com> wrote:
Also, for the iteration API of Map (i.e. IterableMap) and Set (i.e. IterableSet), why not integrate with iterators [1]:
let m = new Map, s = new Set;
...
for (i of m.iterator()) { ... }
for (i of s.iterator()) { ... }
That is the intention -- it's what that mysterious "// todo: iteration" note is about at <harmony:simple_maps_and_sets>. At the time this was written, iterators hadn't yet settled down. Your suggestion about iteration semantics and order is helpful. Thanks.
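As it turned out, the iteration protocol settled in roughly the shape Sean asked for: Map and Set became directly iterable in ES2015, so no explicit .iterator() call is needed. A small example using the shipped for-of form:

```javascript
const m = new Map([["a", 1], ["b", 2]]);
const seen = [];
// Map yields [key, value] pairs in insertion order.
for (const [key, val] of m) {
  seen.push(key + "=" + val);
}
console.log(seen.join(","));  // a=1,b=2

const s = new Set(["x", "y"]);
for (const item of s) {
  console.log(item);          // x, then y
}
```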
On 15 September 2011 17:47, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
No, the normal use case for WeakMaps is simply to make associations between objects and arbitrary values. The special GC behavior is necessary to avoid memory leaks, but that is a quality-of-implementation issue, not a use case.
Just like with tail calls, certain space optimizations are semantically relevant, because code will rely on them and potentially break if they are not performed.
You say "and you're searching for them in JS (as I was)". Had the abstraction been called ObjectMap or ObjectRegistry, would you have found it?
I don't think the API name is the only way someone can discover what they're looking for. Proper documentation for "ObjectMap" which said "keyrefs are held weakly" or something to that respect would probably have ended up on my search radar.
Of course, if the WeakMap really was both weak-key and weak-value, then I'd absolutely expect the name to be WeakMap (I think "weak" in this case is a useful and non-trivial part of the behavior). So my complaint is not "Weak", but that "Weak" implies something more comprehensive than is actually the case. It over-promises and under-delivers, so to speak.
I should also clarify my process for how I found this API (I over-simplified in my previous email). I actually started out with both the weakref need AND the object-as-key ("map") need. For the object-as-key need, I had already constructed a dual-numerically-indexed-array solution to associating an object with a value. But then, realizing that this was not only clunky, but also was going to make manual GC (that is, cleaning up the entries if the object is removed) also more awkward/less performant, I started looking for an API that would do both: Map (object-as-key) and automatically clean up the entries in the Map if either the key-object or the value-object (in my case, both DOM objects) were GC'd.
In that context (and in support of Allen's early assertions), "Map" in the name was most important to focus me into a (theoretical) class of APIs (if there were indeed several different kinds of maps). Then I would have searched through the documentation to see if any of the Maps had the "weak" behavior I was looking for.
OTOH, had I only been looking for pure Weak References, and not a Map structure, then I'd have been looking for some API like "WeakRef", and actually "Map" probably would have been confusing or ignorable noise.
As it was, you found the right thing to look at and think about, but you needed to read more before you understood whether it serves your actual purpose.
I found it because a fellow Mozilla dev said "hey, that sounds like WeakMaps" and I thought "awesome, ask and ye shall find". Of course, the devil was in the details, because it wasn't actually what I needed completely. This was compounded by the fact that the MDN documentation (at least at the time) was ambiguous and didn't make it clear that only keys were "weak". So a well-experienced co-worker and the documentation BOTH were confused (as were several others through various IRC chats) as to exactly what was and was not "weak" in the WeakMap.
How did I figure it out? By writing it into my code, and then seeing mem-leak tests fail. Thankfully, I eventually found some IRC people who clarified that what I was seeing was not a bug but was in fact by-design. But, that's a hard way to learn the lesson.
Would a more accurate name have helped? Perhaps. "WeakKeyMap" certainly would have made it obvious that the Map was not fully "weak". Would more accurate documentation have helped? Absolutely. Would naming and documentation have helped other co-workers not be misled and consequently point me in the wrong path? I hope so.
That's why I like "WeakMap" best -- it is the mapping that is weak, not the keys or the values.
I understand what you're saying here. But as I mentioned before, the way my (far less informed) brain thinks about it, the "map" or "link" between two objects should in fact be weak and ephemeral enough that either side going away (being GC'd) should be enough to cause the link between the two to be cleaned up. I think it's because I tend to think of "Map" as more 2-way than one-way, though I understand it's technically only 1-way.
Saying it a different way... if the focus is on the map or link itself, and the RHS thing the map/link is pointing to is no longer valid/defined, then what use is there keeping a link that points to something now undefined?
It just seems a little unfortunate/misleading to me that from an implementation perspective, creating the map/link is sufficient to prevent the RHS value in question from ever getting to that "undefined" state. When I create a reference using variables/properties, I expect a hard reference that behaves like that. But when I use a specialized API with "Weak" in the name, I definitely expect the opposite.
On Thu, Sep 15, 2011 at 12:24 PM, Kyle Simpson <getify at gmail.com> wrote:
Of course, if the WeakMap really was both weak-key and weak-value, then I'd absolutely expect the name to be WeakMap (I think "weak" in this case is a useful and non-trivial part of the behavior). So my complaint is not "Weak", but that "Weak" implies something more comprehensive than is actually the case. It over-promises and under-delivers, so to speak.
Ok, how about WeakishMap? You would find it where you expected to look, and the name would make you read the docs to see how it is different.
Mostly joking... mostly.
On Sep 15, 2011, at 9:07 AM, Mark S. Miller wrote:
On Thu, Sep 15, 2011 at 8:47 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote: [...] No, the normal use case for WeakMaps is simply to make associations between objects and arbitrary values. The special GC behavior is necessary to avoid memory leaks, but that is a quality-of-implementation issue, not a use case.
The typical JS programmer who wants to form such associations is not going to be thinking about the possibility of a leaky map unless they have already been burnt by them in some other language.
I do not know of a non-leak-association-map in any other language or library whose name does not suggest its GC role. Do you know of a counter-example? If not, then let's say "...burnt by them in every other language...".
Probably not, but in most (if not all) cases those other languages had already used up the "good" names on leaky versions and needed to define more explicit names for the non-leaky versions that were later additions. We don't have that heritage. We can do our naming starting from a clean slate. My speculation is that over the long run, many more programmers will be first introduced to these data structures via JS than will come to them after already being corrupted by some other broken language.
A programmer approaching ES6 wanting simply to make associations and not thinking about GC issues will probably reach for the (iterable) Map anyway, which is how it should be. For them, the iterability of these maps will often be seen as a virtue. If they are unconcerned about storage, they will usually see little reason to give up on this iterability.
Iterability makes Map inherently leaky from an application-logic perspective. If you place a key in a Map, you must manually delete it if you don't want to retain it for the full life of the map. For this reason, the fact that a map is iterable should probably be featured in its name. Using "Map" as the name of iterable associative maps is probably an attractive nuisance. From that perspective, the names I suggest are:

IterableMap -> currently Map; iterable associations between arbitrary-valued keys and values
ObjectMap -> currently WeakMap; non-iterable associations between object-valued keys and arbitrary values
It might be even better to further increase the conceptual distance between the two abstractions by not using "map" in both names. In that case, ObjectRegistry might be better than ObjectMap.
Because, the majority of JS programmers will simply be looking for a way to "map objects to values" and will not be thinking about GC consequences. We want them to find this thing we are currently trying to name. We don't want them to miss it because it has "weak" in its name and they don't know what weak means. We need to design first for the 95% (ass generated number) of JS programmers who don't understand GC.
Those programmers should find our iterable maps, not our weak-association maps.
Only if they need to iterate over them. If not, the weak association maps is what they should find.
(with the caveat that I still have concerns about the potential impact of heavy use of ephemeron-based maps upon GC performance.)
Finally, none of us are good exemplars of typical JS programmers. We (and our friends) know too much and in general have a level of PL expertise that far exceeds that of the typical JS programmer. In cases such as this our expectations may be the exact opposite of a typical JS programmer's.
There are many kinds and levels of programmers. For the typical programmers you talk about, I'd just point them at our current iterable Maps and Sets.
However, for them, application-level leaks are often a problem. Except for situations where they actually need to iterate, they are better served by being pointed towards "WeakMap".
Le 15/09/2011 17:47, Allen Wirfs-Brock a écrit :
On Sep 15, 2011, at 6:49 AM, Andreas Rossberg wrote:
On 15 September 2011 09:10, Brendan Eich<brendan at mozilla.com> wrote:
On Sep 14, 2011, at 11:09 PM, Allen Wirfs-Brock wrote:
I would prefer ObjectMap (the keys are restricted to objects).

Now that you point it out (again), I agree.

I don't. :) It is true to some extent that WeakMap is GC jargon -- but as Mark points out, the normal use case for weak maps is to ensure a certain space behaviour closely related to GC.

No, the normal use case for WeakMaps is simply to make associations between objects and arbitrary values. The special GC behavior is necessary to avoid memory leaks, but that is a quality-of-implementation issue, not a use case.
Furthermore, let's imagine for a minute that I need an ECMAScript implementation for programs I write which I know (for some reason) are all short-lived and use at most a fixed amount of memory that I know in advance. Based on this knowledge (I admit it is weird to have this knowledge), I could decide not to implement a garbage collector. My programs are short-lived and use little memory; I may not need to care about the memory.
In this imaginary ECMAScript environment without garbage collection, why would I care whether references are weak or strong? I don't; I write programs.
The typical JS programmer who wants to form such associations is not going to be thinking about the possibility of a leaky map unless they have already been burnt by them in some other language.
They could have been burnt in JS as well for performance reasons.
So why obfuscate the very intent by purposely avoiding what is more or less standard terminology for it (if slightly ambiguous)? If I was a programmer looking for something like weak referencing in JS for the first time, "weak" is what I'd be searching for. "ObjectMap" would be too generic a name to catch my immediate attention.

Because, the majority of JS programmers will simply be looking for a way to "map objects to values" and will not be thinking about GC consequences. We want them to find this thing we are currently trying to name. We don't want them to miss it because it has "weak" in its name and they don't know what weak means. We need to design first for the 95% (ass generated number) of JS programmers who don't understand GC.
Also, the notion of garbage collection is a separate concern from programming. It comes from the necessity of being careful with resources. I recommend this read on the topic: www.cse.nd.edu/~dthain/courses/cse40243/spring2006/gc-survey.pdf
Ideally, we'd wish the memory used by an object to be freed as soon as we don't need the object anymore. Since this notion is clearly undecidable (some Russian-roulette program could make it undecidable whether or not an object will be used later in a program), we have come up with restrictions of it. A first, naive approach was to count references to an object; when the number of references drops to 0, the object cannot be accessed anymore, so we no longer need the memory it occupies and can free the object. Reference counting is defeated by reference cycles. A second approach is to consider that there are some root references and traverse the graph of accessible objects. Any object not accessible through this graph is not needed anymore, since it cannot be accessed. Once again, this approach is a restriction of the general garbage collection problem of figuring out whether or not an object is needed in a program. A third approach could be to dynamically prove that an object won't be used even though there are still references to it in the program.
var m = new Map(); // iterable
var o = {};
o.o = o;
m.set(o, true);
var f167 = fibonacci(167); // assumed defined beforehand
Since this is my entire program, during the fibonacci call, a (very smart) garbage collector could figure out that even though there is a living reference to o (o.o or the map key), and even though the map is iterable, the o object can be garbage-collected since it won't be used anymore. I admit that this kind of proof is very hard to achieve and impossible in many cases. However, even in that context, an implementation is able to free o's memory without the programmer having to say anything about references being weak or strong.
My point is that current state-of-the-art GC technologies rely on reference-graph traversal. Providing information to help out the traversal algorithm is good, but choosing names tied to this particular help doesn't seem future-proof with regard to future GC technologies which may not need this information.
It may also be misleading to implement a so-called "WeakMap" with ECMAScript arrays for compatibility reasons [1]. Maybe WeakMap is not really what is needed from the programmer's point of view, and weak references are just a convenience (unnecessary in some cases) rather than an actual feature.
David
[1] code.google.com/p/es-lab/source/browse/trunk/src/ses/WeakMap.js
On 16 September 2011 13:52, David Bruant <david.bruant at labri.fr> wrote:
Furthermore, let's imagine for a minute that I need an ECMAScript implementation for programs I write which I know (for some reason) are all short-lived and use a known, finite maximum amount of memory. Based on this knowledge (I admit it is weird to have this knowledge), I could decide not to implement a garbage collector. My programs are short-lived and use little memory; I may not need to care about memory at all.
In this imaginary ECMAScript environment without garbage collection, why would I care whether references are weak or strong? I don't; I write programs.
Right, but why would you care about WeakMap either? The canonical Map is perfectly fine for that situation.
Also, the notion of garbage collection is a separate concern from programming. It comes from the necessity of being careful of resources.
Well yes, but that's part of programming. In practice, all resources are finite. And the difference between finite and infinite space usage is a correctness criterion.
Consider writing a server. If I cannot rely on tail call optimization then writing its message loop as a recursive function (e.g. actors style) would be incorrect. If I cannot rely on GC, then allocating an object for each received message would be incorrect. If I cannot rely on weak maps, then, say, mapping every message object to a return IP would be incorrect.
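Andreas' last point can be sketched in code. This is a hypothetical illustration (the server, `sender`, and `send` are made-up names, not from the thread); the point is that the bookkeeping table itself never pins the message objects:

```javascript
// Per-message bookkeeping kept in a WeakMap, so that dropping the last
// strong reference to a message also lets its bookkeeping entry go away.
const returnAddress = new WeakMap();

function onMessage(message, sender) {
  // Record where the eventual reply should go. The WeakMap holds its
  // key weakly, so this does not keep the message alive.
  returnAddress.set(message, sender);
}

function reply(message, payload) {
  const sender = returnAddress.get(message);
  if (sender) sender.send(payload);
}

// Usage sketch:
const outbox = [];
const msg = { body: "ping" };
onMessage(msg, { send: (p) => outbox.push(p) });
reply(msg, "pong");
console.log(outbox); // ["pong"]
```

With a plain Map the table would retain every message ever received for the lifetime of the server; with a WeakMap, entries for messages that have become unreachable may be reclaimed.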
Le 16/09/2011 14:14, Andreas Rossberg a écrit :
On 16 September 2011 13:52, David Bruant<david.bruant at labri.fr> wrote:
Furthermore, let's imagine for a minute that I need an ECMAScript implementation for programs I write which I know (for some reason) are all short-lived and use a known, finite maximum amount of memory. Based on this knowledge (I admit it is weird to have this knowledge), I could decide not to implement a garbage collector. My programs are short-lived and use little memory; I may not need to care about memory at all.
In this imaginary ECMAScript environment without garbage collection, why would I care whether references are weak or strong? I don't; I write programs. Right, but why would you care about WeakMap either? The canonical Map is perfectly fine for that situation.
True.
Also, the notion of garbage collection is a separate concern from programming. It comes from the necessity of being careful with resources. Well yes, but that's part of programming. In practice, all resources are finite. And the difference between finite and infinite space usage is a correctness criterion.
Consider writing a server. If I cannot rely on tail call optimization then writing its message loop as a recursive function (e.g. actors style) would be incorrect. If I cannot rely on GC, then allocating an object for each received message would be incorrect. If I cannot rely on weak maps, then, say, mapping every message object to a return IP would be incorrect.
You are making a connection between program correctness and implementation considerations of what you use to write these programs. It doesn't sound right.
"If I cannot rely on tail call optimization then writing its message loop as a recursive function (e.g. actors style) would be incorrect." => This is not true. If your server only ever receives one message, you should be fine. The problem is an implementation limitation, not the correctness of your program. It turns out your programming style, with today's reasonable use cases (size of input, number of messages...), makes current implementations fail. I agree that it is annoying, but it doesn't make your program incorrect. If we start considering implementation limitations as sources of program incorrectness, then some ECMAScript programs will always be incorrect.
Regarding tail call optimization, as far as I'm concerned, an agreement between implementors sounds like a more reasonable approach. There is no way in the language to test this feature. In this video [1], David Herman explains that a test can be written (at 40:40), but the test relies on implementation limitations rather than the language itself (unlike all tests that can currently be found on test262).
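The kind of test being discussed boils down to something like the following sketch. Only the depth at which the call fails is implementation-dependent, which is exactly why such a test probes a limit rather than observable language behaviour:

```javascript
"use strict";
// A loop written as a tail call. With proper tail calls this runs in O(1)
// stack space for any n; without them, it throws a stack-overflow error at
// some implementation-defined depth.
function countdown(n) {
  return n === 0 ? "done" : countdown(n - 1); // the call is in tail position
}

// A "TCO test" would call countdown with an n far beyond any plausible
// stack limit and treat success as evidence of tail call optimization.
console.log(countdown(1000)); // small enough to succeed either way: "done"
```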
"If I cannot rely on GC, then allocating an object for each received message would be incorrect." => Can you refine this point? I don't understand the connection between garbage collection and the correctness of your program. I allocate objects on a daily basis and have never /relied/ on garbage collection. Don't get me wrong, I'm glad it exists, I'm glad I can write long-running programs because garbage collection recycles memory, but I can still write programs that run out of memory. The language allows it. With regard to memory management, GC is a convenience and allows me to write programs which can live longer without running out of memory in a minute. It is in no way something that can be relied on for program correctness (unless I am missing something?). Same goes for tail call optimization.
David
[Forgot the list.]
---------- Forwarded message ----------
From: Andreas Rossberg <rossberg at google.com>
Date: 16 September 2011 15:35
Subject: Re: {Weak|}{Map|Set}
To: David Bruant <david.bruant at labri.fr>
On 16 September 2011 15:17, David Bruant <david.bruant at labri.fr> wrote:
Well yes, but that's part of programming. In practice, all resources are finite. And the difference between finite and infinite space usage is a correctness criterion.
Consider writing a server. If I cannot rely on tail call optimization then writing its message loop as a recursive function (e.g. actors style) would be incorrect. If I cannot rely on GC, then allocating an object for each received message would be incorrect. If I cannot rely on weak maps, then, say, mapping every message object to a return IP would be incorrect.
You are making a connection between program correctness and implementation consideration of what you use to write these programs. It doesn't sound right.
"If I cannot rely on tail call optimization then writing its message loop as a recursive function (e.g. actors style) would be incorrect." => This is not true. If your server only ever receives one message, you should be fine.
Obviously, I was talking about a server that is expected to get an unbounded number of messages during its uptime.
The problem is an implementation limitation, not the correctness of your program. It turns out your programming style, with today's reasonable use cases (size of input, number of messages...), makes current implementations fail. I agree that it is annoying, but it doesn't make your program incorrect. If we start considering implementation limitations as sources of program incorrectness, then some ECMAScript programs will always be incorrect.
The difference is, this limitation would be hit for the normal use case of the server. If my program cannot deal with its expected use case, then it is incorrect.
Regarding tail call optimization, as far as I'm concerned, an agreement between implementors sounds like a more reasonable approach. There is no way in the language to test this feature. In this video [1], David Herman explains that a test can be written (at 40:40), but the test relies on implementation limitations rather than the language itself (unlike all tests that can currently be found on test262).
That is true, but whether some property can be observed from within the language itself, or only by its environment, is not relevant.
There is no test that you can reliably write for a 'print' function. Still you want to be able to rely on it printing what you gave it. Or consider a sleep(secs) function. How would you test it?
"If I cannot rely on GC, then allocating an object for each received message would be incorrect." => Can you refine this point? I don't understand the connection between garbage collection and correctness of your program. I allocate objects on a daily basis and have never /relied/ on garbage collection.
I think you implicitly do, all the time. Just try turning off GC and see whether your programs still work reliably.
On Thu, Sep 15, 2011 at 10:56 AM, Kyle Simpson <getify at gmail.com> wrote:
If I was a programmer
looking for something like weak referencing in JS for the first time, "weak" is what I'd be searching for.
But if you're actually aware of weakrefs (as I am), and you're searching for them in JS (as I was), and you see "WeakMap" (as I did), and you make the conclusion that "Weak" in the name means in fact weak references (as I did), then you probably also (as I did) assume that all the refs are weak. That's a failed conclusion, because only the keyrefs are weak.
I can second this exact experience.
On Sep 16, 2011, at 5:14 AM, Andreas Rossberg wrote:
...
Consider writing a server. If I cannot rely on tail call optimization then writing its message loop as a recursive function (e.g. actors style) would be incorrect. If I cannot rely on GC, then allocating an object for each received message would be incorrect.
But at the extreme you can't rely on GC. Any allocation may be the one that exhausts actual available storage or triggers an interminable GC thrashing of virtual memory. In the real world these are quality-of-implementation issues that developers must deal with via load testing on real implementations.
If I cannot rely on weak maps, then, say, mapping every message object to a return IP would be incorrect.
A WeakMap reliably maintains a mapping from an object to a value. If you have access to an object key, then you can rely upon a WeakMap preserving any mappings it has for that key. What you cannot rely upon is the timely recycling of any resources that are used to represent inaccessible WeakMap mappings. This is the exact situation that exists for any ECMAScript dynamic allocation. There are no guarantees in the ES spec that resources allocated with an inaccessible object will be freed in a timely manner, or ever. Given the state of the art of memory management, it isn't clear how you could make such guarantees without unreasonably limiting implementors.
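In code terms, the observable part of this contract is trivial to illustrate (a minimal sketch; the values used are arbitrary):

```javascript
// While you hold the key, the mapping is fully reliable. Only entries whose
// keys have become unreachable may be reclaimed, and that reclamation is
// unobservable from within the language.
const wm = new WeakMap();
let key = { id: 1 };
wm.set(key, "payload");

console.log(wm.get(key)); // "payload" — guaranteed while key is reachable

key = null; // the entry can no longer be observed; whether and when the
            // implementation reclaims its storage is not specified
```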
On Fri, Sep 16, 2011 at 10:14 AM, Rick Waldron <waldron.rick at gmail.com>wrote:
On Thu, Sep 15, 2011 at 10:56 AM, Kyle Simpson <getify at gmail.com> wrote:
[...] "WeakMap" [...] That's a failed conclusion, because only the keyrefs are weak.
I can second this exact experience.
[...]
Does anyone see anything wrong with "EphemeralMap"? It seems like the only name suggested so far that no one has raised any objections to.
On Fri, Sep 16, 2011 at 10:41 AM, Allen Wirfs-Brock <allen at wirfs-brock.com>wrote: [...]
This is the exact situation that exists for any ECMAScript dynamic allocation. There are no guarantees in the ES spec that resources allocated with an inaccessible object will be freed in a timely manner, or ever. Given the state of the art of memory management, it isn't clear how you could make such guarantees without unreasonably limiting implementors.
Does everyone, on both sides of this debate, agree that the WeakMap GC issue and the tail-call issue are equivalent specification problems?
I propose that the way to look at both of these is a way I have seen proposed somewhere[1] about how to deal with specifying tail calls. Rather than taking it as a correctness criterion on an implementation, instead view it as a statement about how to analyze the space complexity of programs. When writing Andreas' server, it would be nice if he could analyze whether his algorithms have O(1) asymptotic space complexity. Does this mean it will succeed on every implementation? Of course not. It may be that a given implementation doesn't even have enough memory for the server to load. Or the server might crash for other reasons.
In order to do this analysis, we need a model of how much memory is retained by a computational state -- not in terms of how many bytes, but in terms adequate to make statements about asymptotic space complexity.
Then an implementation that runs out of space on programs held to be "reasonable" by such analysis can be viewed as a low-quality implementation.
[1] At one of the citations on the tail-call page, I forget which.
On 16 September 2011 19:42, Mark S. Miller <erights at google.com> wrote:
Does anyone see anything wrong with "EphemeralMap"?
Yes. It's a longish name, and one that I will never be able to remember how to spell correctly. And to most programmers it probably sounds about as reassuring as "endofunctor" or "catamorphism". ;)
On Sep 16, 2011, at 10:52 AM, Mark S. Miller wrote:
On Fri, Sep 16, 2011 at 10:41 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote: [...] This is the exact situation that exists for any ECMAScript dynamic allocation. There are no guarantees in the ES spec that resources allocated with an inaccessible object will be freed in a timely manner, or ever. Given the state of the art of memory management, it isn't clear how you could make such guarantees without unreasonably limiting implementors.
Does everyone, on both sides of this debate, agree that the WeakMap GC issue and the tail-call issue are equivalent specification problems?
I propose that the way to look at both of these is a way I have seen proposed somewhere[1] about how to deal with specifying tail calls. Rather than taking it as a correctness criterion on an implementation, instead view it as a statement about how to analyze the space complexity of programs. When writing Andreas' server, it would be nice if he could analyze whether his algorithms have O(1) asymptotic space complexity. Does this mean it will succeed on every implementation? Of course not. It may be that a given implementation doesn't even have enough memory for the server to load. Or the server might crash for other reasons.
In order to do this analysis, we need a model of how much memory is retained by a computational state -- not in terms of how many bytes, but in terms adequate to make statements about asymptotic space complexity.
Then an implementation that runs out of space on programs held to be "reasonable" by such analysis can be viewed as a low-quality implementation.
This all sounds reasonable but I don't see how it particularly impacts the actual specification.
On Fri, Sep 16, 2011 at 12:41 PM, Allen Wirfs-Brock <allen at wirfs-brock.com>wrote: [...]
This all sounds reasonable but I don't see how it particularly impacts the actual specification.
The spec has multiple customers. I agree it doesn't change the spec's normative requirements on implementors. If this simply gets demoted to a clear non-normative note, I think that would be fine.
Do you agree that the specification of the weak map gc issue should be handled the same way as the tail call issue?
On Sep 16, 2011, at 2:47 PM, Mark S. Miller wrote:
On Fri, Sep 16, 2011 at 12:41 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote: [...] This all sounds reasonable but I don't see how it particularly impacts the actual specification.
The spec has multiple customers. I agree it doesn't change the spec's normative requirements on implementors. If this simply gets demoted to a clear non-normative note, I think that would be fine.
Do you agree that the specification of the weak map gc issue should be handled the same way as the tail call issue?
I'm not sure exactly how we are going to specify tail calls. I know that Dave Herman has ideas that I assume we will build upon.
For weak maps I think that a non-normative note that makes explicit the "doesn't leak" expectation and points implementors towards an ephemeron-based implementation will suffice.
On Fri, Sep 16, 2011 at 3:13 PM, Allen Wirfs-Brock <allen at wirfs-brock.com>wrote:
I'm not sure exactly how we are going to specify tail calls. I know that Dave Herman has ideas that I assume we will build upon.
For weak maps I think that a non-normative note that makes explicit the "doesn't leak" expectation and points implementors towards an ephemeron-based implementation will suffice.
+1. At least until we see how Dave proposes specifying tail calls, to see if he has any ideas we might adapt.
I mostly have a similar approach in mind for tail calls. Precise about the interface, imprecise/informative about the implementation requirements. For WeakMaps, that means a well-defined API with informal English describing the expectations about memory consumption. For tail calls, it means a well-defined spec for what is considered a tail call with, again, informal English describing the expectations about memory consumption.
On Fri, Oct 7, 2011 at 5:23 PM, David Herman <dherman at mozilla.com> wrote:
I mostly have a similar approach in mind for tail calls. Precise about the interface, imprecise/informative about the implementation requirements. For WeakMaps, that means a well-defined API with informal English describing the expectations about memory consumption. For tail calls, it means a well-defined spec for what is considered a tail call with, again, informal English describing the expectations about memory consumption.
Thanks Dave,
Allen, ok then. +1 without further reservations ;).
I have come across use cases where I didn't need a Weak/Map/, but rather a Weak/Set/: I didn't need to set a value at all. Usually I set "true", but it feels like I'm doing something useless. The API I actually need is:
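The set-of-objects pattern described above can be sketched on top of WeakMap. This is a hypothetical shim, and the method names are only a guess at the desired interface:

```javascript
// Hypothetical WeakSet-style wrapper over a WeakMap: membership only,
// no values stored per entry.
function makeWeakSet() {
  const wm = new WeakMap();
  return {
    add(obj)    { wm.set(obj, true); return this; },
    has(obj)    { return wm.has(obj); },
    delete(obj) { return wm.delete(obj); },
  };
}

// Usage sketch: track DOM nodes (or any objects) we've already processed,
// without keeping them alive.
const seen = makeWeakSet();
const node = {};
seen.add(node);
console.log(seen.has(node)); // true
```

The `true` stored internally is exactly the useless value the message complains about; a dedicated set API would make that dummy value unnecessary at the surface.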
Also, I would like to talk a little bit about terminology. WeakMaps have their name inspired by the idea of "weak" references which have particular garbage-collection properties. From the developer perspective, this seems to be some sort of implementation detail they should not be aware of. As far as I know, current functions/constructors have their name inspired by the contract they fulfill rather than implementation considerations. The difference between current WeakMaps and Maps is their contract. In the latter, keys can be enumerated, in the former not. I think that this is the difference that should inspire different names rather than the implementation optimisation that is induced by this contract difference.