Template site objects and WeakMap
I am not sure I understood your message. Could you show some example code that would observe the observable difference you have in mind?
It sounds like this is discussing the implementation in V8, unless it’s done similarly in other engines. Possibly it’s talking about a polyfill mechanism that might be used in compile-to-js implementations that target older browsers.
V8’s template map is a Map with smi keys representing the hash of the raw string; each value is a small array of lists of strings which yield the same hash (usually a single template literal).
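To make the shape of that structure concrete, here is a rough model of a hash-bucketed template map. It is only a sketch, not V8's actual code: `templateMap`, `hashRawStrings`, `sameStrings`, `getTemplateObject`, and the toy hash function are all hypothetical names and choices made for illustration.

```javascript
// Hypothetical model of a hash-bucketed template map; not V8's actual code.
const templateMap = new Map(); // small-integer hash -> array of cached entries

// Toy 30-bit string hash over the joined raw strings (an assumption).
function hashRawStrings(rawStrings) {
  let hash = 0;
  for (const ch of rawStrings.join('\u0000')) {
    hash = (hash * 31 + ch.codePointAt(0)) & 0x3fffffff;
  }
  return hash;
}

function sameStrings(a, b) {
  return a.length === b.length && a.every((s, i) => s === b[i]);
}

// Returns the cached frozen entry for these raw strings, creating it on first use.
function getTemplateObject(rawStrings) {
  const hash = hashRawStrings(rawStrings);
  let bucket = templateMap.get(hash);
  if (!bucket) {
    bucket = [];
    templateMap.set(hash, bucket);
  }
  let entry = bucket.find((e) => sameStrings(e.raw, rawStrings));
  if (!entry) {
    entry = Object.freeze({ raw: Object.freeze([...rawStrings]) });
    bucket.push(entry); // strong reference: this model never releases it
  }
  return entry;
}
```

Because the buckets hold their entries strongly, nothing in this model is ever released, which is the leak discussed in this thread.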
Unfortunately, since the Map is not tied directly to the callsite objects (which aren’t available at parse-time when the hashing occurs), there’s no way to store these callsites in a WeakMap, and they are never collected. If WeakSets were iterable, this could be solved pretty easily, but these are the tools we’ve got :(
If WeakSets were iterable, then any such optimization would be observable and therefore disallowed. It is precisely the unobservability of GC that gives us the freedom to engage in such transparent optimizations.
The platform can certainly provide itself with internal iterable WeakSet-like collections for its own internal and unobservable purposes. If we're talking about an unobservable optimization internal to the platform, then why is the non-iterability of externally visible WeakSets an issue?
Is there currently some observability issue I am missing?
It’s not related to observability; this just isn’t used currently, and probably wouldn’t be much of an improvement if it were. Creating the template callsites themselves is pretty costly, and using weak references to the callsites would, in the majority of cases, mean recreating them every time they were used. So, while making the implementation more complicated, I don’t think it would be a win for performance, only for memory consumption.
Thanks. That is clarifying.
Thanks. And sorry for the late reply.
On Wed, Jun 17, 2015 at 11:31 AM, Mark S. Miller <erights at google.com> wrote:
Hi Yusuke, I am not sure I understood your message. Could you show some example code that would observe the observable difference you have in mind?
To look up identical template site objects, they are stored in realm.[[templateMap]]. So they are strongly referenced, and a naive implementation leaks memory.
// The following code leaks memory that the GC cannot collect.
function tag(siteObject) {
  return siteObject;
}
for (var i = 0;; ++i) {
  eval("tag`" + i + "`");
}
However, we can alleviate this situation. Because template site objects are completely frozen, they behave as if they were primitive values. This lets the implementation reference them weakly from the realm. Once none of the disclosed site objects is referenced any longer, we can GC them, because nobody can tell that a given site object was collected and later regenerated.
By implementing realm.[[templateMap]] as a WeakMap, we can alleviate this situation.
function tag(siteObject) {
  // Since siteObject is frozen, we cannot attach a property to it.
  // So if nobody holds a reference to siteObject, we can collect it, since
  // identity cannot be tested between an already collected site object and
  // a newly created one.
}
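The two properties this argument relies on, a stable identity per call site and complete frozenness, can be checked directly. The snippet below is a small sketch; `evaluateSite` and the variable names are made up for illustration.

```javascript
function tag(siteObject) {
  return siteObject;
}

// One call site, evaluated each time the function runs.
function evaluateSite() {
  return tag`hello ${1} world`;
}

const first = evaluateSite();
const second = evaluateSite();

// The same call site yields the same site object on every evaluation.
const sameIdentity = first === second; // true

// Both the site object and its raw array are frozen.
const frozen = Object.isFrozen(first) && Object.isFrozen(first.raw); // true

// Attaching a new property fails (Reflect.set reports this as false).
const extended = Reflect.set(first, 'note', 'x'); // false
```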
But even if the object is frozen, we can still associate a property with it indirectly by using a WeakMap. As a result, if the site objects are referenced weakly by the realm, users can observe their collection by using a WeakMap.
var map = new WeakMap();
function tag(siteObject) {
  return siteObject;
}
var siteObject = tag`hello`;
map.set(siteObject, true);
siteObject = null; // drop the last strong reference held by the program
gc(); // If realm.[[templateMap]] is implemented as a WeakMap, siteObject
      // will be collected.
siteObject = tag`hello`;
map.get(siteObject); // undefined, but should be true.
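Since gc() is not available to ordinary scripts, the same effect can be modelled by making collection explicit. The following is only a simulation under stated assumptions: `SimulatedRealm`, `getSiteObject`, and `simulateCollection` are hypothetical names, and clearing the registry stands in for a GC pass over a weakly held [[templateMap]].

```javascript
// Hypothetical model: a realm whose template registry can drop entries.
class SimulatedRealm {
  constructor() {
    this.templateMap = new Map(); // raw source text -> site object
  }
  getSiteObject(raw) {
    let site = this.templateMap.get(raw);
    if (!site) {
      site = Object.freeze({ raw: Object.freeze([raw]) });
      this.templateMap.set(raw, site);
    }
    return site;
  }
  // Stand-in for a GC pass that reclaims site objects nobody else references.
  simulateCollection() {
    this.templateMap.clear();
  }
}

const realm = new SimulatedRealm();
const userMap = new WeakMap();

let site = realm.getSiteObject('hello');
userMap.set(site, true);
site = null;                // the program drops its only reference
realm.simulateCollection(); // the weakly held entry disappears

const regenerated = realm.getSiteObject('hello');
const stillThere = userMap.has(regenerated); // false: the collection was observed
```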
To avoid this situation, we need to handle template site objects specially in WeakMap: WeakMap must reference template site objects strongly (if we choose the weak reference implementation for realm.[[templateMap]]). But this may complicate the implementation, and it may prevent implementing WeakMap as a per-object table (it can be done, but it is no longer simply private symbols).
var map = new WeakMap();
function tag(siteObject) {
  return siteObject;
}
tag`hello`;
gc(); // siteObject can be collected because there's no reference to it if
      // the [[templateMap]] is implemented as a WeakMap.
(function () {
  var siteObject = tag`hello`;
  // To avoid the previously described situation, WeakMap handles siteObject
  // specially. It is now referenced strongly by the WeakMap.
  map.set(siteObject, true);
}());
gc();
(function () {
  var siteObject = tag`hello`;
  map.get(siteObject); // true
}());
// And if the WeakMap is collected, siteObject can be collected.
map = null;
gc();
(code fixup edited in above)
congratulations and THANK YOU! I learned something important reading your messages. The notion that we can preserve non-observability when making one thing a WeakMap iff we make all other WeakMaps be strong for those same objects is true, novel, and very surprising. I have been working on such concepts for decades and never come across anything like it.
In this case, I suspect that implementers will continue to choose the memory leak rather than make WeakMap more complex in this way. But you have now given them a choice, which is great! The spec does not need to change to enable this choice. The spec is only about observable differences, and the space optimization you suggest would be unobservable.
Your observation, being general, may find other applications even if it is not used to optimize this one. This observation is not language specific; it may well find application in other memory safe languages including those yet to be invented. You have added another tool to our toolbox. You have deepened our understanding of what is possible.
On Wed, Jun 17, 2015 at 10:41 PM, Mark Miller <erights at gmail.com> wrote:
Hi Yusuke, congratulations and THANK YOU! I learned something important reading your messages. The notion that we can preserve non-observability when making one thing a WeakMap iff we make all other WeakMaps be strong for those same objects is true, novel, and very surprising. I have been working on such concepts for decades and never come across anything like it.
Thanks for your clarification! If the target object is resilient (it can be regenerated without observable side effects) and immutable, we can invert the strong and weak maps.
In this case, the strong internal Map is converted into a WeakMap, and the user-exposed WeakMaps are converted into strong Maps. The internal map itself cannot be collected, but the user-exposed maps can be. So by inverting them, the target objects become collectable.
In this case, I suspect that implementers will continue to choose the memory leak rather than make WeakMap more complex in this way. But you have now given them a choice, which is great! The spec does not need to change to enable this choice. The spec is only about observable differences, and the space optimization you suggest would be unobservable.
I'm planning to discuss this implementation in JavaScriptCore, because I'm a template strings implementor in JavaScriptCore.
BTW, by extending the implementation slightly, we can still preserve the performance optimization. When a template site object is first generated, register it in the realm's WeakMap and reference it strongly from the JavaScript code site instead of from the realm. This is the same way that immutable JSStrings are stored and strongly referenced in the JavaScript code itself. When the JavaScript code is discarded, we can collect them.
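A rough model of that ownership arrangement is sketched below. It is not JavaScriptCore's implementation: `realmTemplateMap`, `lookupOrCreateSiteObject`, and `CodeBlock` are hypothetical names, and the standard WeakRef class is used here only as a stand-in for an engine-internal weak reference.

```javascript
// Hypothetical model; WeakRef stands in for an engine-internal weak reference.
const realmTemplateMap = new Map(); // raw source text -> WeakRef to site object

function lookupOrCreateSiteObject(raw) {
  const ref = realmTemplateMap.get(raw);
  const existing = ref && ref.deref();
  if (existing) return existing;
  const site = Object.freeze({ raw: Object.freeze([raw]) });
  realmTemplateMap.set(raw, new WeakRef(site)); // the realm holds it weakly
  return site;
}

// Models a compiled code unit that caches the site objects it uses.
class CodeBlock {
  constructor() {
    this.siteObjects = new Map(); // strong references, owned by the code
  }
  evaluateTemplate(raw) {
    let site = this.siteObjects.get(raw);
    if (!site) {
      site = lookupOrCreateSiteObject(raw);
      this.siteObjects.set(raw, site);
    }
    return site;
  }
}
```

As long as a CodeBlock is alive, repeated evaluation takes the fast path through its own cache; once every CodeBlock holding a given site object is discarded, only the weak realm entry remains and the object becomes collectable.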
And by slightly modifying the per-object table proposal, we can still support it with this change ;) We can define WeakMap as
// Pseudocode: "@privateSymbol" and "is template site object" are not valid
// JavaScript; they stand for engine-internal operations.
class WeakMap {
  constructor(...) {
    this.privateSymbol = @privateSymbol;
    this.map = new Map(); // holds template site objects strongly
  }
  get(object) {
    if (object is template site object) {
      return this.map.get(object);
    }
    return object[this.privateSymbol];
  }
  set(object, value) {
    if (object is template site object) {
      this.map.set(object, value);
      return this;
    }
    object[this.privateSymbol] = value;
    return this;
  }
}
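For readers who want something runnable, the same dispatch can be approximated in plain JavaScript. This is only a sketch under assumptions: `knownSiteObjects` and `registerSiteObject` are a hypothetical stand-in for the engine's "is template site object" check, and a built-in WeakMap replaces the private-symbol table, since private symbols are not available to user code.

```javascript
// Hypothetical stand-in for the engine's "is template site object" check.
const knownSiteObjects = new WeakSet();

function registerSiteObject(site) {
  knownSiteObjects.add(site);
  return site;
}

class SiteAwareWeakMap {
  constructor() {
    this.strong = new Map();   // template site objects: held strongly
    this.weak = new WeakMap(); // everything else: held weakly
  }
  // Picks the backing table for a key.
  tableFor(key) {
    return knownSiteObjects.has(key) ? this.strong : this.weak;
  }
  get(key) {
    return this.tableFor(key).get(key);
  }
  has(key) {
    return this.tableFor(key).has(key);
  }
  set(key, value) {
    this.tableFor(key).set(key, value);
    return this;
  }
  delete(key) {
    return this.tableFor(key).delete(key);
  }
}
```

Usage is the same as for a regular WeakMap; the only difference is that entries keyed by registered site objects live exactly as long as the map itself.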
[+Allen]
Can registered Symbols be used as keys in WeakMaps? If so, we have a fatal unauthorized communications channel that we need to fix in the spec asap!
On Thu, Jun 18, 2015 at 1:18 AM, Mark S. Miller <erights at google.com> wrote:
[+Allen]
Can registered Symbols be used as keys in WeakMaps? If so, we have a fatal unauthorized communications channel that we need to fix in the spec asap!
Why do registered Symbols come up? (oops, maybe I missed some context...) A user-exposed WeakMap only accepts objects as keys.
I was speaking in the context of an inverted per-object table implementation [1, 2]. Now that iteration and the clear method have been dropped, we can implement WeakMap as an inverted per-object table instead of with Ephemerons [3]. The last implementation example shows that my proposal of converting realm.[[templateMap]] into a WeakMap can be implemented even if WeakMap is implemented as an inverted per-object table.
Of course, if we take an inverted per-object table, private symbols should be treated specially in the implementation :)
- we need to carefully extend the object with private symbols even if the object is frozen.
- private symbols' lookup system should not be trapped by ES6 Proxy.
- private symbols should not be exposed to users.
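The three requirements above are exactly the places where an ordinary Symbol falls short as a per-object slot. The snippet below is an illustrative sketch; `slot` and the other names are made up for the example.

```javascript
const slot = Symbol('per-object WeakMap slot'); // ordinary, not private

// 1. A frozen object rejects the new property.
const frozen = Object.freeze({});
const stored = Reflect.set(frozen, slot, 'value'); // false

// 2. A Proxy sees every lookup of the symbol.
const seen = [];
const proxy = new Proxy({}, {
  get(target, key, receiver) {
    seen.push(key);
    return Reflect.get(target, key, receiver);
  },
});
proxy[slot];

// 3. Reflection exposes the symbol to any code holding the object.
const plain = {};
plain[slot] = 'value';
const exposed = Object.getOwnPropertySymbols(plain).includes(slot); // true
```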
Aren't WeakMap keys only objects?
congratulations and THANK YOU! I learned something important reading your messages. The notion that we can preserve non-observability when making one thing a WeakMap iff we make all other WeakMaps be strong for those same objects is true, novel, and very surprising. I have been working on such concepts for decades and never come across anything like it.
I apologize, I understand the problem with a weak registry forcing observable garbage collection in user code - that's nice but isn't this always the case with references to objects when an object pool/flyweight is used?
Isn't this the same issue as == working on strings that have string objects interned but possibly GC'd (and precisely why Java never collects interned strings)?
Could I not use Object(Symbol.for('some global registry symbol')) as a WeakMap key? That would return a realm-specific object, of course.
On 6/17/15 1:54 PM, Jordan Harband wrote:
Could I not use Object(Symbol.for('some global registry symbol')) as a WeakMap key? That would return a realm-specific object, of course.
Object(Symbol.for("x")) == Object(Symbol.for("x")) tests false. That's because www.ecma-international.org/ecma-262/6.0/#sec-object-value reaches step 3 which lands you in www.ecma-international.org/ecma-262/6.0/#sec-toobject which returns a new Symbol object every time.
So while you could use Object(Symbol.for('some global registry symbol')) as a weakmap key it would not be terribly useful unless you hung on to that object somewhere.
It would return a different object each time (for the same Symbol, like new String) so it would not exhibit the problem of being observable.
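A short check of that behavior, with illustrative variable names:

```javascript
const registered = Symbol.for('some global registry symbol');

// Each Object(...) call wraps the same primitive in a fresh Symbol object.
const wrapperA = Object(registered);
const wrapperB = Object(registered);

const sameWrapper = wrapperA === wrapperB;                       // false
const samePrimitive = wrapperA.valueOf() === wrapperB.valueOf(); // true

// A WeakMap entry keyed by one wrapper is invisible through the other.
const map = new WeakMap();
map.set(wrapperA, 'stored');
const viaA = map.get(wrapperA); // 'stored'
const viaB = map.get(wrapperB); // undefined
```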
On Thu, Jun 18, 2015 at 1:18 AM, Mark S. Miller <erights at google.com> wrote:
[+Allen]
Can registered Symbols be used as keys in WeakMaps? If so, we have a fatal unauthorized communications channel that we need to fix in the spec asap!
It turns out the spec is fine. people.mozilla.org/~jorendorff/es6-draft.html#sec-weakmap.prototype.set step 5 says
If Type(key) is not Object, throw a TypeError exception.
as I hoped and expected. The reason I was alarmed is that I got the following behavior on v8/iojs:
> var w = new WeakMap();
undefined
> var r = Symbol.for('foo');
undefined
> w.set(r, true);
{}
> w.get(r)
true
I will file a v8 bug. Please someone, add a test for this to test262.
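A check along these lines could look like the following. It is a plain sketch, not written in test262's harness format, and `rejectsAsWeakMapKey` is a hypothetical helper name.

```javascript
// Returns true when WeakMap.prototype.set rejects the key with a TypeError.
function rejectsAsWeakMapKey(key) {
  try {
    new WeakMap().set(key, true);
    return false;
  } catch (error) {
    return error instanceof TypeError;
  }
}

const registeredSymbolRejected = rejectsAsWeakMapKey(Symbol.for('foo')); // true
const stringRejected = rejectsAsWeakMapKey('foo');                       // true
const objectRejected = rejectsAsWeakMapKey({});                          // false
```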
It turns out the spec is fine
...
I will file a v8 bug. Please someone, add a test for this to test262.
Ah, I see.
FYI (you may know deeper than I ;)), since symbols are primitive values, they cannot be used as a WeakMap's key. And since they are primitive values, they cannot have any properties. It means that primitive values are immutable. So Symbol.for / Symbol.keyFor's registry can be WeakMap in the internal implementation.
Actually, we implemented so :D trac.webkit.org/changeset/182915
On Wed, Jun 17, 2015 at 11:19 AM, Yusuke SUZUKI <utatane.tea at gmail.com> wrote:
FYI (you may know deeper than I ;)), since symbols are primitive values, they cannot be used as a WeakMap's key. And since they are primitive values, they cannot have any properties. It means that primitive values are immutable. So Symbol.for / Symbol.keyFor's registry can be WeakMap in the internal implementation.
Actually, we implemented so :D trac.webkit.org/changeset/182915
Yes, the important issue is the precondition in your "since" statement, which v8 currently violates.
What do other browsers currently do?
By your observation at the start of this thread, we logically could have specified that Symbols could be used as WeakMap keys as long as the WeakMap held them strongly. Needless to say, I am glad we didn't ;).
On 6/17/15 2:35 PM, Mark S. Miller wrote:
What do other browsers currently do?
Firefox:
> var w = new WeakMap(); var r = Symbol.for('foo'); w.set(r, true);
TypeError: r is not a non-null object
WebKit nightly:
> var w = new WeakMap(); var r = Symbol.for('foo'); w.set(r, true);
TypeError: Attempted to set a non-object key in a WeakMap
Don't have a new enough IE to test with offhand.
The v8 bug referred to earlier in this thread was filed by Rick Waldron and fixed back in March. I think engines are on the same page with this, just FYI.
TypeError: Invalid value used as weak map key
Yes, already fixed on v8. Thanks.
actually I was surprised the apparently mentioned native behavior looked too much like my Symbol based WeakMap partial poly:
var WeakMap = WeakMap || (function (s) {'use strict';
  function WeakMap() {
    this[s] = Symbol('WeakMap');
  }
  WeakMap.prototype = {
    'delete': function del(o) {
      delete o[this[s]];
    },
    get: function get(o) {
      return o[this[s]];
    },
    has: function has(o) {
      return -1 < Object.getOwnPropertySymbols(o).indexOf(this[s]);
    },
    set: function set(o, v) {
      o[this[s]] = v;
    }
  };
  return WeakMap;
}(Symbol('WeakMap')));
weird to say the least, and yet I've no idea why that realm check would be needed/done/reliable per each WeakMap.
Sorry for the noise, if any
uh ... never mind then, I don't even need to understand :D
The idea that (a shared Weak interning table of immutable-objects-with-identity + WeakMaps makes gc observable) is not new. The idea that (the shared interning tables of immutable-objects-with-identity must therefore be strong) is not new.
What was new to me is the idea that
a particular immutable-object-with-identity can still safely be weakly interned in an interning table, by marking that object, at interning time, so that all WeakMaps from then on hold it strongly.
A particular immutable-object-with-identity can still safely be weakly interned in an interning table, by marking that object, at interning time, so that all WeakMaps from then on hold it strongly.
Oh cool, I didn't realize that. That is pretty neat :)
On Jun 17, 2015, at 9:18 AM, Mark S. Miller wrote:
[+Allen]
Can registered Symbols be used as keys in WeakMaps? If so, we have a fatal unauthorized communications channel that we need to fix in the spec asap!
No, symbols are not objects and the keys of WeakMaps must be objects.
BTW, some people have identified use cases where allowing symbols values as WeakMap keys would be useful.
I am curious about what these are? But regardless, I would expect there to be examples where it would be useful if it weren't fatal. Regarding the issues in this thread, it actually would be safe to allow unregistered Symbols as keys. But unless these examples are tremendously compelling, please let's not.
this is puzzling me too ... so I've got few cases
- you want/need a one-to-many relation, with a Symbol as key and an Array as value, and you work with that Array's values as needed for each Symbol used as a key in the very same WeakMap
- you invert the logic and have a WeakMap that checks, for each object, whether the Symbol has already been set as its value
- you want to use that very same Symbol with that object, but the object is not extensible, so you fall back to a private WeakMap that associates such an object instead of using the Symbol directly
which case was that?
In the ES6 spec, template site objects are strongly referenced by realm.[[templateMap]]. So a naive implementation leaks memory, because it keeps all the site objects in the realm.
However, we can alleviate this situation. Because template site objects are completely frozen, they behave as if they were primitive values. This lets the implementation reference them weakly from the realm. Once none of the disclosed site objects is referenced any longer, we can GC them, because nobody can tell that a given site object was collected and later regenerated.
But even if the object is frozen, we can still associate a property with it indirectly by using a WeakMap. As a result, if the site objects are referenced weakly by the realm, users can observe their collection by using a WeakMap.
To avoid this situation, we need to handle template site objects specially in WeakMap: WeakMap must reference template site objects strongly (if we choose the weak reference implementation for realm.[[templateMap]]). But this may complicate the implementation, and it may prevent implementing WeakMap as a per-object table (it can be done, but it is no longer simply private symbols).
Are these semantics intentional? I'd like to hear about this. (And please point it out if I have misunderstood.)