@@initialize, or describing subclassing with mixins

# Jussi Kalliokoski (10 years ago)

This is probably an absurd idea, and I have no idea whether it can actually be made to work for host objects, but I wanted to throw it in the air to see if it has any viability and could be polished.

As we all know, mixins are a strongly ingrained pattern in JS out in the wild. One notable example is the EventEmitter in node and the numerous browserland alternatives. The main benefit of mixins is pretty much that they allow multiple inheritance.

The common mixin design pattern is something like:

function A () {
  this.x = 1;
}

function B () {
  this.y = 2;
  A.apply(this, arguments);
}

So you have the constructor function that initializes the state of the instance, and doesn't care whether the state is applied to an instance of the class or even a plain object.
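
A minimal runnable sketch of that point, reusing the A and B constructors from the pattern above. The names `viaNew` and `plain` are illustrative only.

```javascript
// Constructor-style mixins: each function only initializes state on `this`.
function A() {
  this.x = 1;
}

function B() {
  this.y = 2;
  A.apply(this, arguments); // mix A's state into the same object
}

// Applied through `new`, B produces an instance carrying both pieces of state.
const viaNew = new B();

// Applied to a plain object, the same initializers work unchanged.
const plain = {};
B.call(plain);

console.log(viaNew.x, viaNew.y); // 1 2
console.log(plain.x, plain.y);   // 1 2
console.log(plain instanceof B); // false
```

The plain object ends up with the same state as the instance, but it never becomes an instance of B, which is the property the rest of this idea leans on.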

Could subclassing be described in terms of mixins? After all, the problem we're having here, and the reason we're trying to make complex solutions like a deferred creation step, is the need to hide the uninitialized state.

But say, in addition to @@create, we had @@initialize which would just add the hidden state to any given object bound to this. This can be done in terms of for example assigning a host object or other private state into one symbol, or multiple symbols to hold different private variables, or an engine-specific way; it doesn't matter because it's an implementation detail that is not visible to the userland. However, the key thing is that it could be applied to any given object, not just instances of the host object. That sidesteps the whole problem of uninitialized state, because you only have objects that have no state related to the host object or objects that have initialized state of the host object.

So let's say the default value of @@initialize was as follows:

super(...arguments);

That is, just propagate through the @@initialize methods in the inheritance chain.

Then the default value of @@create could be described as follows:

var instance = Object.create(ThisFunction.prototype);
ThisFunction[@@initialize].apply(instance, arguments);
return instance;
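
The two defaults can be emulated in today's JavaScript, as a rough sketch only. Assumptions: ordinary symbols named `initialize` and `create` stand in for the proposed @@initialize and @@create (neither exists in the language), and `Base`, `Derived`, and `defaultCreate` are made-up names for illustration.

```javascript
// Hypothetical stand-ins: ordinary symbols emulate the proposed well-known
// symbols @@initialize and @@create. Neither exists in the language.
const initialize = Symbol("initialize");
const create = Symbol("create");

// Default @@create: allocate a plain object, then initialize it.
function defaultCreate(...args) {
  const instance = Object.create(this.prototype);
  this[initialize].apply(instance, args);
  return instance;
}

function Base() {}
Base[create] = defaultCreate;
Base[initialize] = function (label) {
  this.label = label; // stands in for hidden host state
};

function Derived() {}
Object.setPrototypeOf(Derived, Base); // Derived inherits Base[create]
Derived.prototype = Object.create(Base.prototype);
// Default @@initialize: propagate up the chain, like super(...arguments).
Derived[initialize] = function (...args) {
  Object.getPrototypeOf(Derived)[initialize].apply(this, args);
};

// Created and initialized in one step; no uninitialized instance escapes.
const made = Derived[create]("hello");

// The same initializer applied to an unrelated plain object.
const mixed = {};
Derived[initialize].call(mixed, "plain");

console.log(made.label, made instanceof Derived);   // hello true
console.log(mixed.label, mixed instanceof Derived); // plain false
```

In this sketch the state is an ordinary visible property; the proposal assumes it would live somewhere userland cannot observe.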

As an example, here's how you could self-host Map in these terms: gist.github.com/jussi-kalliokoski/5ef02ef90c6cbb8c1a70 . In the example, the uninitialized state is never revealed.

Simplicity is not the only gain from this approach, since it also opens the door to multiple inheritance, e.g. let's say you wanted a Map whose contents you can append to a DOM node:

class DomBag {
    [@@initialize] () {
        DocumentFragment[@@initialize].apply(this);
        Map[@@initialize].apply(this, arguments);

        for ( let node of Map.prototype.values.call(this) ) {
            DocumentFragment.prototype.appendChild.call(this, node);
        }
    }

    get: Map.prototype.get
    has: Map.prototype.has

    set (key, value) {
        if ( this.has(key) ) {
            this.removeChild(this.get(key));
        }

        Map.prototype.set.call(this, key, value);
        DocumentFragment.prototype.appendChild.call(this, value);
    }

    remove (key) {
        if ( this.has(key) ) {
            this.removeChild(this.get(key));
        }

        Map.prototype.delete.call(this, key);
    }
}

var bag = new DomBag([
    ["foo", document.createElement("foo")],
    ["bar", document.createElement("bar")]
]);
document.body.appendChild(bag);

So the core idea of the proposal is to make host objects completely unobservable. A DOMElement instance for example is no longer a host object; it's a normal object with __proto__ assigned to DOMElement.prototype, however it contains a private reference to a host object that is not observable to userland in any way.

There are obvious problems that need to be thought through, mostly because of the DOM: for example, what happens if you initialize two DOM node types on the same object and then append that object somewhere? However, if we think of it in terms of symbols, we can have a symbol that represents the host object that gets inserted into the tree when the object is appended. The @@initialize of each of these nodes assigns a host object to that symbol, so the assignment in the latter @@initialize overrides the one in the former.
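
As a rough illustration of the "latter overrides the former" rule, here is a sketch in plain JavaScript. Everything in it is hypothetical: `hostObject` is an ordinary symbol standing in for the private slot, and `initFragment`, `initElement`, and `hostKindOf` are invented stand-ins rather than DOM APIs.

```javascript
// Hypothetical: one shared symbol names the slot that holds a node's host object.
const hostObject = Symbol("hostObject");

// Stand-ins for the @@initialize of two node-like types (not real DOM APIs).
function initFragment(target) {
  target[hostObject] = { kind: "fragment" };
}
function initElement(target) {
  target[hostObject] = { kind: "element" };
}

// What tree insertion would look at: only the host object behind the symbol.
function hostKindOf(target) {
  const host = target[hostObject];
  return host ? host.kind : null;
}

const thing = {};
initFragment(thing);
initElement(thing); // the later initializer replaces the earlier host object

console.log(hostKindOf(thing)); // element
console.log(hostKindOf({}));    // null
```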

WDYT?

# Boris Zbarsky (10 years ago)

On 6/30/14, 5:37 AM, Jussi Kalliokoski wrote:

However, the key thing is that it could be applied to any given object, not just instances of the host object.

The problem with this is that it requires either allocating the hidden state outside the object itself somewhere or requiring all objects to have space for this hidden state or both (e.g. allocating some amount of space for hidden state in all objects but spilling out into out-of-line slots when you want more hidden state than that).
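
The first option, state allocated outside the object, can be modeled in userland with a side table. This is only a sketch of the shape of the cost, not of how any engine works; `hiddenState`, `initializeCounter`, and `increment` are hypothetical names.

```javascript
// Side table keyed by object: hidden state allocated outside the object itself.
const hiddenState = new WeakMap();

// Hypothetical initializer that can target any object, as the proposal requires.
function initializeCounter(target) {
  hiddenState.set(target, { count: 0 });
}

// Every operation pays for a table lookup before it can touch the state.
function increment(target) {
  const state = hiddenState.get(target);
  if (state === undefined) {
    throw new TypeError("object has no counter state");
  }
  state.count += 1;
  return state.count;
}

const anyObject = { unrelated: true };
initializeCounter(anyObject);
increment(anyObject);
const second = increment(anyObject);

console.log(second);                 // 2
console.log(Object.keys(anyObject)); // [ 'unrelated' ]
```

The object itself is untouched, so nothing has to be reserved in its layout, but each access goes through the table instead of reading a fixed offset.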

So this fails one of the primary design criteria for @@create: being able to explain the existing builtins (both ES and DOM), which allocate hidden state inline for efficiency's sake.

I realize you consider this an irrelevant implementation detail, but it is important.

As an example, here's how you could self-host Map in these terms: gist.github.com/jussi-kalliokoski/5ef02ef90c6cbb8c1a70 . In the example, the uninitialized state is never revealed.

Right, at the cost of requiring the symbol thing, which costs both performance and memory.

         DocumentFragment.prototype.appendChild.call(this, value);

This is an interesting example. How do you expect the appendChild code to perform the check that its this value is a Node (not necessarily DocumentFragment, note, appendChild needs to work on any node) in this case? Your time budget for this is 20 machine instructions as an absolute limit, 10 preferred and your space budget is "as small as possible".

(This is not even worrying about the fact that in practice we want different in-memory layouts for different DOM objects, that Object.prototype.toString needs to return different values for different sorts of DOM objects, or that some DOM objects need different internal hooks (akin to proxies) from normal objects.)

# Jussi Kalliokoski (10 years ago)

On Mon, Jun 30, 2014 at 4:41 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:

On 6/30/14, 5:37 AM, Jussi Kalliokoski wrote:

However, the key thing is that it could be applied to any given object, not just instances of the host object.

The problem with this is that it requires either allocating the hidden state outside the object itself somewhere or requiring all objects to have space for this hidden state or both (e.g. allocating some amount of space for hidden state in all objects but spilling out into out-of-line slots when you want more hidden state than that).

So this fails one of the primary design criteria for @@create: being able to explain the existing builtins (both ES and DOM), which allocate hidden state inline for efficiency's sake.

I realize you consider this an irrelevant implementation detail, but it is important.

No, I don't consider potential to optimize an irrelevant implementation detail, especially when it comes to DOM. However, my knowledge of engine-level DOM optimization is relatively poor, so I'm glad to have any flaws in the idea pointed out.

As an example, here's how you could self-host Map in these terms: gist.github.com/jussi-kalliokoski/5ef02ef90c6cbb8c1a70 . In the example, the uninitialized state is never revealed.

Right, at the cost of requiring the symbol thing, which costs both performance and memory.

The host environment need not use actual symbols, but I see your point.

          DocumentFragment.prototype.appendChild.call(this, value);

This is an interesting example. How do you expect the appendChild code to perform the check that its this value is a Node (not necessarily DocumentFragment, note, appendChild needs to work on any node) in this case? Your time budget for this is 20 machine instructions as an absolute limit, 10 preferred and your space budget is "as small as possible".

There are various approaches to that, all cost something, but the current approach is not free either. One heavy-handed optimization for the poor performance common case is to have Object's struct contain a pointer to a Node's host object, then if that's null, the object doesn't represent a Node. Depending on what you count in, that's around 4 instructions per check (I'd be surprised if the current implementations do better than that). Memory footprint is 32bits on ARM and x86, and 64 bits on 64bit environments, per every Object, so pretty damn high. Performance often lies in tradeoffs, however, so an implementation might spend some extra cycles on having its own memory map of DOM nodes instead and enforce 32bit, or if they're sadistic towards developers of massive table sites, 24 bits. This can also be implemented as a binary flag where the layout is expanded if the flag is active, bringing the added memory footprint for non-Node objects down to 1 bit.

Another, more general, approach that's less memory-greedy for DOM-light applications is to store that pointer (or the whole state) in the layout of objects that are "naturally" instances of Node, then even possibly use the symbol space (or something less expensive) for the state if it's not. That doesn't change the current situation's best-case performance (i.e. only instances of Node are attempted to be appended to the DOM, where you have to do the instance checks in the current situation as well), adds one check to the case of erroring out for appending a non-Node, and leaves the performance cost on the doorstep of the thing that wasn't even possible before.
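
A userland model of that second approach, as a sketch under loud assumptions: a private class field plays the role of state stored in the layout of "natural" instances, a WeakMap plays the role of the symbol space, and `FakeNode`, `hostOf`, and `mixedInHosts` are invented names. It says nothing about real instruction counts.

```javascript
// Hypothetical model: a private field plays the role of an inline slot that only
// "natural" instances have; a WeakMap plays the role of the slower side storage.
const mixedInHosts = new WeakMap();

class FakeNode {
  #host = { children: [] }; // inline state, present on natural instances only

  // Stand-in for @@initialize applied to an arbitrary object.
  static initialize(target) {
    mixedInHosts.set(target, { children: [] });
  }

  // Returns the host state, or undefined if `obj` was never initialized.
  static hostOf(obj) {
    if (typeof obj !== "object" || obj === null) return undefined;
    if (#host in obj) return obj.#host; // fast path: one brand check
    return mixedInHosts.get(obj);       // slow path: only for non-natural objects
  }

  static appendChild(parent, child) {
    const parentHost = FakeNode.hostOf(parent);
    const childHost = FakeNode.hostOf(child);
    if (!parentHost || !childHost) throw new TypeError("not a node");
    parentHost.children.push(childHost);
    return child;
  }
}

const natural = new FakeNode();
const mixin = {};
FakeNode.initialize(mixin);
FakeNode.appendChild(natural, mixin); // a mixed-in object is accepted as a child

console.log(FakeNode.hostOf(natural).children.length); // 1
console.log(FakeNode.hostOf({}));                      // undefined
```

Natural instances never reach the WeakMap lookup; only mixed-in objects and outright non-nodes pay for the extra check.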

Aside, out of curiosity, which is more problematic in DOM currently: creation or appending of nodes? My guess is appending, but when it comes to performance I'd take data over instinct any day.

(This is not even worrying about the fact that in practice we want different in-memory layouts for different DOM objects,

These different in-memory layouts can be applied to the state behind the pointer.

that Object.prototype.toString needs to return different values for different sorts of DOM objects

Depending on how the internal layout problem is solved, that can be looked up from the internal state just as now, or from the prototype chain (the latter being a slight compatibility hazard). After all, no code is using @@initialize today, so we can't break the behavior of mixins of plain objects and DOM nodes.

, or that some DOM objects need different internal hooks (akin to proxies) from normal objects.)

The internals can again be in the internal state. However, this internal state is now explainable in terms of the spec language. For all the implementation cares, the spec may describe that the implementation uses private symbols to store the internal state, and then the implementation stores the state inline anyway, because no test can prove that it doesn't internally use private symbols for it.

# Boris Zbarsky (10 years ago)

On 6/30/14, 3:00 PM, Jussi Kalliokoski wrote:

There are various approaches to that, all cost something, but the current approach is not free either.

Sure.

One heavy-handed optimization for the poor performance common case is to have Object's struct contain a pointer to a Node's host object

There are non-Node objects browsers have to support too, fwiw.

Would it help the discussion if UA implementors described how they solve these problems now?

Memory footprint is 32bits on ARM and x86, and 64 bits on 64bit environments, per every Object, so pretty damn high.

Yep, which is why implementations don't do this. It's actually higher than that because of alignment requirements...

Aside, out of curiosity, which is more problematic in DOM currently: creation or appending of nodes?

Problematic in what sense? You mean time cost?

It's hard to say and depends a lot on context: creation involves a lot more memory traffic; appending to a node depends a lot on whether that node is in the document, etc.

These different in-memory layouts can be applied to the state behind the pointer.

No, I mean different in-memory layouts for the JS object itself, not for the DOM object it points to. For example, for some objects we (Gecko in this case) want to have extra built-in slots for caching particular data for the JIT to access quickly.

# Jussi Kalliokoski (10 years ago)

On Mon, Jun 30, 2014 at 10:16 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:

On 6/30/14, 3:00 PM, Jussi Kalliokoski wrote:

There are various approaches to that, all cost something, but the current approach is not free either.

Sure.

One heavy-handed optimization for the poor performance common case is to have Object's struct contain a pointer to a Node's host object

There are non-Node objects browsers have to support too, fwiw.

Indeed, but as I mentioned, it's but one approach out of many.

Would it help the discussion if UA implementors described how they solve these problems now?

Yess, please! +9999

Memory footprint is 32bits on ARM and x86, and 64 bits on 64bit environments, per every Object, so pretty damn high.

Yep, which is why implementations don't do this. It's actually higher than that because of alignment requirements...

Affirmative.

Aside, out of curiosity, which is more problematic in DOM currently: creation or appending of nodes?

Problematic in what sense? You mean time cost?

Yup, that was the original intent, but actually both memory and time complexity are of interest. But I'm infinitely curious so this is probably not the right place. I'm not demanding anything, but a blog post about DOM internals performance would be a real treat for people like me. ;)

It's hard to say and depends a lot on context: creation involves a lot more memory traffic; appending to a node depends a lot on whether that node is in the document, etc.

Right.

These different in-memory layouts can be applied to the state behind the pointer.

No, I mean different in-memory layouts for the JS object itself, not for the DOM object it points to. For example, for some objects we (Gecko in this case) want to have extra built-in slots for caching particular data for the JIT to access quickly.

As I suggested in one approach, "natural" instances (for lack of a better word) of Nodes can have all the in-memory layout optimizations they need. The difference between self-hosting with, for example, symbols (either in userland or spec language) and inlining the state internally is not observable to the spec language or the userland (other than in performance).

# Boris Zbarsky (10 years ago)

On 6/30/14, 3:38 PM, Jussi Kalliokoski wrote:

On Mon, Jun 30, 2014 at 10:16 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:

Would it help the discussion if UA implementors described how they solve these problems now?

Yess, please! +9999

ask.mozilla.org/question/850/how-do-webidl-methodsgetterssetters-tell-whether-objects-are-of-the-right-types/?answer=851#post-id-851 for Gecko.

I'd be curious to hear what other UAs do.

As I suggested in one approach, "natural" instances (for lack of a better word) of Nodes can have all the in-memory layout optimizations they need.

Some of these are not just optimizations; some of these are needed for correctness. For example, we use fixed parts of the object layout to cache values for some Web IDL properties that need to keep returning the same thing over and over again.
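
A userland analogy for that correctness requirement, offered as a sketch only: a private field stands in for a fixed slot in the object layout, and `Widget`, `items`, and `impostor` are hypothetical names that do not correspond to any Web IDL interface.

```javascript
// Hypothetical model of a cached attribute: a private field stands in for a
// fixed slot in the object layout that natural instances are guaranteed to have.
class Widget {
  #cachedItems; // reserved when the instance is allocated

  get items() {
    if (this.#cachedItems === undefined) {
      this.#cachedItems = Object.freeze(["a", "b"]); // computed once
    }
    return this.#cachedItems; // same object on every read
  }
}

const widget = new Widget();
const sameEachTime = widget.items === widget.items;

// An object that merely inherits from Widget.prototype has no such slot.
const impostor = Object.create(Widget.prototype);
let failure = null;
try {
  impostor.items;
} catch (err) {
  failure = err; // the private field does not exist on this object
}

console.log(sameEachTime);                 // true
console.log(failure instanceof TypeError); // true
```

The getter can only promise identity across reads because the slot is guaranteed to exist on every object it is allowed to run against.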

# Jussi Kalliokoski (10 years ago)

On Tue, Jul 1, 2014 at 4:06 AM, Boris Zbarsky <bzbarsky at mit.edu> wrote:

On 6/30/14, 3:38 PM, Jussi Kalliokoski wrote:

On Mon, Jun 30, 2014 at 10:16 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:

Would it help the discussion if UA implementors described how they solve these problems now?

Yess, please! +9999

ask.mozilla.org/question/850/how-do-webidl-methodsgetterssetters-tell-whether-objects-are-of-the-right-types/?answer=851#post-id-851 for Gecko.

I'd be curious to hear what other UAs do.

Thanks for the link! Me too. :)

As I suggested in one approach, "natural" instances (for lack of a better word) of Nodes can have all the in-memory layout optimizations they need.

Some of these are not just optimizations; some of these are needed for correctness. For example, we use fixed parts of the object layout to cache values for some Web IDL properties that need to keep returning the same thing over and over again.

I might be missing something, so is there something in the idea I proposed that would prevent doing this?

# Boris Zbarsky (10 years ago)

On 7/4/14, 12:15 PM, Jussi Kalliokoski wrote:

Some of these are not just optimizations; some of these are needed for correctness. For example, we use fixed parts of the object layout to cache values for some Web IDL properties that need to keep returning the same thing over and over again.

I might be missing something, so is there something in the idea I proposed that would prevent doing this?

Apart from not being sure the place to store the cached value will exist?