Proxying built-ins (Was: [[Invoke]] and implicit method calls)

# Tom Van Cutsem (12 years ago)

2013/9/24 Allen Wirfs-Brock <allen at wirfs-brock.com>

I think this is a key point. Things like 'new Proxy(new Date, {}).getDate()' just don't work as expected with direct proxies and we have not been able to fix that while maintaining other important semantic requirements. If JS programmers have an expectation that they can usefully write such code they are going to be disappointed. Direct proxies seem to be a fine primitive for implementing membranes and some virtual objects. They aren't good for things like this Date example.

In my original direct proxies proposal, this would have worked because I proposed that a Proxy-for-a-Date be recognized as a genuine Date, with all Date.prototype.* methods auto-unwrapping the proxy, operating on the actual Date target.

Off the top of my head I don't recall why this was a no-go.

To me, this solution is appealing because it feels analogous to how we solved the "subclassing built-ins" problem.

IIRC, the initial class proposal didn't cater to subclassing built-ins, i.e. instances of class MyDate extends Date {...} would fail to be recognized by built-in Date methods.

Allen then changed the way instance initialization works, allowing the @@create method to "brand" the subclass instance as being of the right "type" / having the right internal state.

Translating this solution to the proxying built-ins problem somehow requires "branding" the proxy that it is of the right primitive type. The easiest way to brand a proxy is simply to check whether its target is of the right type. IOW, a proxy is-a built-in <=> its target is-a built-in.

This solution requires all built-in methods to try and unwrap proxies passed as the |this| value. This would be just another branch in the preamble of such built-ins. The actual built-in body will only operate on the unwrapped value, which is always a genuine built-in value, with whatever custom data layout the built-in expects.

The auto-unwrapping doesn't break membranes, because membranes never expose direct references to built-in functions (they only expose wrapper functions, which can still do whatever interposition they want before calling the actual built-in).
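For readers who want to try this in a current engine, here is a minimal sketch of both halves of the discussion: the plain proxy failing, and a membrane-style handler that hands out wrapper functions rather than the built-in itself. The handler shape and the names `plain` and `forwarding` are assumptions made for illustration; they are not part of any proposal in this thread.

```javascript
// A proxy with an empty handler is not recognized as a Date by built-in methods.
const date = new Date(2013, 8, 25); // 25 September 2013, local time
const plain = new Proxy(date, {});

let plainFailed = false;
try {
  plain.getDate(); // the built-in receives the proxy as |this|
} catch (e) {
  plainFailed = e instanceof TypeError;
}

// Membrane-style alternative (illustrative): never expose the built-in
// function directly, only a wrapper that re-targets |this|.
const forwarding = new Proxy(date, {
  get(target, key, receiver) {
    const value = Reflect.get(target, key);
    if (typeof value !== "function") return value;
    return function (...args) {
      // Interposition could happen here before calling the real built-in.
      const self = this === receiver ? target : this;
      return Reflect.apply(value, self, args);
    };
  },
});

const day = forwarding.getDate();
```

The wrapper approach works without any change to the built-ins, but every proxy author has to write it by hand, which is the burden the auto-unwrapping idea was meant to remove.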

# David Bruant (12 years ago)

On 25/09/2013 11:18, Tom Van Cutsem wrote:

Off the top of my head I don't recall why this was a no-go.

IIRC, the problem was that this solution wasn't generic enough because it couldn't work with userland private state. Auto-unwrapping with Date and Set works because interaction is always through methods. It is less clear how it should work for userland private state.

Builtins are easy, because it's very clear which method is expected to work with which object; there is a clear definition of what an "instance" is and that can be tracked internally. Also, the auto-unwrapping can be done safely, because it's done internally. None of these two are true for userland private state.

I think it's important to have a generic solution to avoid having magic (non-self-hostable) built-ins (but I don't have this solution).

# David Bruant (12 years ago)

On 25/09/2013 12:01, David Bruant wrote:

I think it's important to have a generic solution to avoid having magic (non-self-hostable) built-ins (but I don't have this solution).

Can relationships help here?

# Mark S. Miller (12 years ago)

Why does Date need private state? AFAICT, it only needs uniquely named state. Why not do what we've done for many other bits of internal state that doesn't need to be private: just name it with a unique symbol? This doesn't work for all internal state of course, but it does seem it would work for Date.
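A small sketch of the symbol-named state idea, under the assumption that a Date-like class stores its time value in a symbol-keyed property. `SymbolDate` and `timeValue` are hypothetical names, not anything in the spec. It shows why a plain proxy would then just work, and also why such state is only uniquely named rather than private.

```javascript
// Hypothetical symbol standing in for Date's internal time value.
const timeValue = Symbol("timeValue");

class SymbolDate {
  constructor(ms) {
    this[timeValue] = ms;
  }
  getTime() {
    // Ordinary property lookup: a proxy forwards it to its target.
    return this[timeValue];
  }
}

const d = new SymbolDate(1380067200000);
const p = new Proxy(d, {});
const viaProxy = p.getTime(); // works without any unwrapping

// The state is uniquely named but not private: reflection can find it,
// and a handler can observe or replace it.
const leaked = Object.getOwnPropertySymbols(d).includes(timeValue);
const spoofed = new Proxy(d, {
  get(target, key, receiver) {
    return key === timeValue ? 0 : Reflect.get(target, key, receiver);
  },
}).getTime();
```

The last few lines are the flip side: because the lookup goes through the handler, a proxy can also lie about the state, which is acceptable only for state that never needed to be private.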

# Boris Zbarsky (12 years ago)

On 9/25/13 5:18 AM, Tom Van Cutsem wrote:

The auto-unwrapping doesn't break membranes, because membranes never expose direct references to built-in functions (they only expose wrapper functions, which can still do whatever interposition they want before calling the actual built-in).

I'd like to clarify this part a bit... What happens when I take a built-in from one realm and .call() it on a proxy which is implementing a cross-realm membrane?

In this situation, I will claim the right behavior is to throw if the cross-realm access is disallowed and to unwrap the proxy otherwise. This is the behavior SpiderMonkey implements for its cross-realm proxies, in fact. But at the moment this is implemented via having a special brand on cross-realm proxies that indicates they're cross-realm proxies and giving cross-realm proxies' handler an extra "is it safe to unwrap your proxies?" hook.

All of which is to say, maybe such a hook is needed in general?

# David Bruant (12 years ago)

On 25/09/2013 15:49, Mark S. Miller wrote:

Why does Date need private state? AFAICT, it only needs uniquely named state. Why not do what we've done for many other bits of internal state that doesn't need to be private: just name it with a unique symbol?

yes (assuming unique symbols are a thing in the end. I think they were threatened. A unique string could work though)

This doesn't work for all internal state of course

yes. I think that in all this discussion, Date is a representative of built-ins that have private state. WeakMap internals really can't be exposed as unique properties for instance.

# Allen Wirfs-Brock (12 years ago)

On Sep 25, 2013, at 3:01 AM, David Bruant wrote:

On 25/09/2013 11:18, Tom Van Cutsem wrote:

2013/9/24 Allen Wirfs-Brock <allen at wirfs-brock.com>

I think this is a key point. Things like 'new Proxy(new Date, {}).getDate()' just don't work as expected with direct proxies and we have not been able to fix that while maintaining other important semantic requirements. If JS programmers have an expectation that they can usefully write such code they are going to be disappointed. Direct proxies seem to be a fine primitive for implementing membranes and some virtual objects. They aren't good for things like this Date example.

In my original direct proxies proposal, this would have worked because I proposed that a Proxy-for-a-Date be recognized as a genuine Date, with all Date.prototype.* methods auto-unwrapping the proxy, operating on the actual Date target.

Off the top of my head I don't recall why this was a no-go.

IIRC, the problem was that this solution wasn't generic enough because it couldn't work with userland private state. Auto-unwrapping with Date and Set works because interaction is always through methods. It is less clear how it should work for userland private state.

More generally, simple Proxies such as the Date one above break for any target object that has internal identity dependencies upon the target object. These might be private state dependencies but they could also be external relationships such as registering the object in a Map.

Builtins are easy, because it's very clear which method is expected to work with which object; there is a clear definition of what an "instance" is and that can be tracked internally. Also, the auto-unwrapping can be done safely, because it's done internally. None of these two are true for userland private state.

How is this clear at the level of specifying the behavior of MOP operations for Proxy instances? What is it that distinguishes a function that has such dependencies from one that doesn't?

Also, one of the early goals of Proxy was to support self hosting of built-ins. Saying that the Proxy MOP implementation has special knowledge of all built-in methods would not be supportive of that goal. As it now stands a self-hosted implementation of, for example, Date is quite possible and it would fail for the above Proxy example in exactly the same way as a "native" implementation.

The issue isn't built-in vs. self-hosted. It's passing identity dependencies through a Proxy.

I think it's important to have a generic solution to avoid having magic (non-self-hostable) built-ins (but I don't have this solution).

I don't think there is one, based upon Direct Proxies. That's why if we want to have Proxy as a primitive that supports creating membranes we need to stop thinking about it as a more universal primitive that also supports things like transparent forwarding and try to find ways to avoid leading future JS programmers into that same confusion.
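To make the identity-dependency point concrete, here is a small sketch with no built-ins involved at all. It assumes userland private state kept in a WeakMap plus an external Map registry; `Counter`, `privateState` and `registry` are illustrative names. Both relationships are keyed on the identity of the target, so a transparent-looking proxy breaks them.

```javascript
const privateState = new WeakMap(); // keyed by object identity
const registry = new Map();         // an external relationship

class Counter {
  constructor() {
    privateState.set(this, { count: 0 });
  }
  increment() {
    const state = privateState.get(this);
    if (state === undefined) throw new TypeError("not a Counter");
    state.count += 1;
    return state.count;
  }
}

const counter = new Counter();
registry.set(counter, "primary");

const proxy = new Proxy(counter, {});

const direct = counter.increment();
const registered = registry.has(proxy); // the proxy is a different key

let proxyFailed = false;
try {
  proxy.increment(); // |this| is the proxy, which has no private state
} catch (e) {
  proxyFailed = e instanceof TypeError;
}
```

This fails in the same way as the Date example, which is the sense in which a self-hosted Date and a native one behave alike here.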

# Allen Wirfs-Brock (12 years ago)

On Sep 25, 2013, at 6:49 AM, Mark S. Miller wrote:

Why does Date need private state? AFAICT, it only needs uniquely named state. Why not do what we've done for many other bits of internal state that doesn't need to be private: just name it with a unique symbol? This doesn't work for all internal state of course, but it does seem it would work for Date.

Mostly, because that is its legacy definition and changing it would be observably different in various ways.

But more importantly, Date in these examples is just an exemplar of any object that has internal state or identity dependencies. For example, I suspect you wouldn't propose using a unique symbol keyed property to access the internal state of a Map.

# David Bruant (12 years ago)

On 25/09/2013 17:59, Allen Wirfs-Brock wrote:

On Sep 25, 2013, at 3:01 AM, David Bruant wrote:

I think it's important to have a generic solution to avoid having magic (non-self-hostable) built-ins (but I don't have this solution).

I don't think there is one, based upon Direct Proxies.

There probably is one. Something like for a given class, register a bunch of functions (typically the ones on the prototype) which receive the unwrapped target if this target was the output of the class constructor. It could look like something like (self-hosting Date):

 // don't worry too much about the syntax
 class Date(){
     constructor(){
         this.datetime = readDatetimeFromSystem();
         setInterval(()=>{ this.datetime++; }, 1);
     }

     private datetime;

     public getMonth(){
         return someComputation(this.datetime);
     }
 }

 // only the class creator should be able to do that (maybe "class"
 // should be an expression returning the capability or something)
 Date.functionsWhereThisShouldBeUnwrapped = [Date.prototype.getMonth];

 var d = new Date();
 var p = new Proxy(d, {});
 p.getMonth(); // works
 // the method sees d because 1) it's "whitelisted", 2) the target
 // is a Date instance

 var p2 = new Proxy({}, {}); // random target
 p2.getMonth = p.getMonth;
 p2.getMonth(); // throws because p2 wasn't created by the Date
 // constructor and the private state can't be found

Edges are a bit hand-wavy, but I hope you get my point that, one way or another, it's possible to maintain the information of which constructor created what and to specify at a fine grain which functions should unwrap automatically to the target (leading to private state access).

That's why if we want to have Proxy as a primitive that supports creating membranes we need to stop thinking about it as a more universal primitive that also supports things like transparent forwarding

Why would the 2 goals be contradictory?
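A runnable approximation of the registration idea, using only what exists today. Two loud assumptions: userland code cannot unwrap an arbitrary proxy, so proxies here must be created through a hypothetical `trackedProxy` helper that records their target; and `Clock`, `instances` and `unwrapThis` are illustrative stand-ins for the self-hosted Date and the `functionsWhereThisShouldBeUnwrapped` list. An engine-level version would not need the tracking helper.

```javascript
// Userland code cannot unwrap arbitrary proxies, so this sketch records
// the target of every proxy it creates (an engine would not need this).
const targetOf = new WeakMap();
function trackedProxy(target, handler) {
  const proxy = new Proxy(target, handler);
  targetOf.set(proxy, target);
  return proxy;
}

const instances = new WeakSet(); // objects created by the Clock constructor
const datetime = new WeakMap();  // private state

class Clock {
  constructor(ms) {
    instances.add(this);
    datetime.set(this, ms);
  }
  getTime() {
    return datetime.get(this);
  }
}

// Wrap a method so that |this| is unwrapped only when the proxy's target
// was created by the constructor.
function unwrapThis(method) {
  return function (...args) {
    const target = targetOf.get(this);
    const self = instances.has(target) ? target : this;
    if (!instances.has(self)) throw new TypeError("not a Clock instance");
    return method.apply(self, args);
  };
}

// Plays the role of functionsWhereThisShouldBeUnwrapped.
for (const name of ["getTime"]) {
  Clock.prototype[name] = unwrapThis(Clock.prototype[name]);
}

const d = new Clock(1380067200000);
const p = trackedProxy(d, {});
const seen = p.getTime(); // the method sees d

const p2 = trackedProxy({}, {}); // random target
p2.getTime = p.getTime;
let rejected = false;
try {
  p2.getTime(); // target was not created by the Clock constructor
} catch (e) {
  rejected = e instanceof TypeError;
}
```

Note that the handler of `p` is never consulted once the method runs, which is exactly the property that later messages in the thread question for untrusted proxies.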

# Andreas Rossberg (12 years ago)

On 25 September 2013 15:49, Mark S. Miller <erights at google.com> wrote:

Why does Date need private state? AFAICT, it only needs uniquely named state. Why not do what we've done for many other bits of internal state that doesn't need to be private: just name it with a unique symbol? This doesn't work for all internal state of course, but it does seem it would work for Date.

Because implementations need to cache various aspects of a date value for performance. Anything but true private, on-object state would have far too much overhead for that.

# Mark S. Miller (12 years ago)

On Wed, Sep 25, 2013 at 7:13 AM, Boris Zbarsky <bzbarsky at mit.edu> wrote:

On 9/25/13 5:18 AM, Tom Van Cutsem wrote:

The auto-unwrapping doesn't break membranes, because membranes never expose direct references to built-in functions (they only expose wrapper functions, which can still do whatever interposition they want before calling the actual built-in).

I'd like to clarify this part a bit... What happens when I take a built-in from one realm and .call() it on a proxy which is implementing a cross-realm membrane?

In this situation, I will claim the right behavior is to throw if the cross-realm access is disallowed and to unwrap the proxy otherwise. This is the behavior SpiderMonkey implements for its cross-realm proxies, in fact. But at the moment this is implemented via having a special brand on cross-realm proxies that indicates they're cross-realm proxies and giving cross-realm proxies' handler an extra "is it safe to unwrap your proxies?" hook.

All of which is to say, maybe such a hook is needed in general?

Hi Boris, I don't understand what you mean by "in general". I think the SpiderMonkey use of cross-realm membranes is a great use case for membranes, and I don't understand why they need any special logic at all -- beyond the logic expressed by their handlers, which must include revocation. These membrane handlers should be the only logic which express the semantics of cross realm invocation, revocation, etc. Other membranes across other boundaries express other things. I think this use case is a good acid test of transparency, efficiency, and expressiveness across membrane boundaries. Is there anything about the browser's inter-realm interaction that cannot be expressed by browser-self-hosted vanilla membranes?

# Mark S. Miller (12 years ago)

That's why I picked on Date. We need to distinguish formerly-internal potentially-public state from genuinely private state. WeakMap is (ironically) a great example of an abstraction whose internal state must remain private.

# Mark S. Miller (12 years ago)

On Wed, Sep 25, 2013 at 8:59 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

On Sep 25, 2013, at 3:01 AM, David Bruant wrote:

On 25/09/2013 11:18, Tom Van Cutsem wrote:

2013/9/24 Allen Wirfs-Brock <allen at wirfs-brock.com>

I think this is a key point. Things like 'new Proxy(new Date, {}).getDate()' just don't work as expected with direct proxies and we have not been able to fix that while maintaining other important semantic requirements. If JS programmers have an expectation that they can usefully write such code they are going to be disappointed. Direct proxies seem to be a fine primitive for implementing membranes and some virtual objects. They aren't good for things like this Date example.

In my original direct proxies proposal, this would have worked because I proposed that a Proxy-for-a-Date be recognized as a genuine Date, with all Date.prototype.* methods auto-unwrapping the proxy, operating on the actual Date target.

Off the top of my head I don't recall why this was a no-go.

IIRC, the problem was that this solution wasn't generic enough because it couldn't work with userland private state. Auto-unwrapping with Date and Set works because interaction is always through methods. It is less clear how it should work for userland private state.

More generally, simple Proxies such as the Date one above break for any target object that has internal identity dependencies upon the target object. These might be private state dependencies but they could also be external relationships such as registering the object in a Map.

Identity dependencies work fine across full identity preserving membranes. They cannot work at all across proxying patterns short of membranes. Representing private state in this identity dependent manner enables it to work across membranes. No other solution to private state has survived the joint requirements of: preserving privacy and transparency across membranes. Thus, we have zero solutions to private state that meet both of these requirements and also work across non-membrane proxying patterns. I propose that we solve only solvable problems, and give up on private state across non-membrane proxying patterns.
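A deliberately tiny membrane sketch may help show why identity-keyed private state survives a membrane but not a lone proxy. Assumptions and simplifications: `makeMembrane` is a hypothetical helper; it wraps only values flowing out and unwraps its own proxies flowing in, it traps only `get` and `apply`, and it ignores the proxy invariants for non-configurable properties as well as revocation. A real membrane handles all of these.

```javascript
function makeMembrane() {
  const realToProxy = new WeakMap();
  const proxyToReal = new WeakMap();

  const isObject = (v) =>
    (typeof v === "object" && v !== null) || typeof v === "function";

  // Values flowing in (|this| and arguments) are unwrapped if they are ours.
  const unwrap = (v) => (proxyToReal.has(v) ? proxyToReal.get(v) : v);

  const handler = {
    get(target, key, receiver) {
      return wrap(Reflect.get(target, key, unwrap(receiver)));
    },
    apply(target, thisArg, args) {
      return wrap(Reflect.apply(target, unwrap(thisArg), args.map(unwrap)));
    },
  };

  // Values flowing out are wrapped, reusing one proxy per real object.
  function wrap(real) {
    if (!isObject(real)) return real;
    let proxy = realToProxy.get(real);
    if (proxy === undefined) {
      proxy = new Proxy(real, handler);
      realToProxy.set(real, proxy);
      proxyToReal.set(proxy, real);
    }
    return proxy;
  }

  return wrap;
}

// Private state keyed on identity, as in the discussion above.
const privateState = new WeakMap();
class Counter {
  constructor() {
    privateState.set(this, { count: 0 });
  }
  increment() {
    const state = privateState.get(this);
    if (state === undefined) throw new TypeError("not a Counter");
    state.count += 1;
    return state.count;
  }
}

const wrap = makeMembrane();
const counter = new Counter();
const dry = wrap(counter);

const result = dry.increment();          // method runs with the real |this|
const sameProxy = wrap(counter) === dry; // identity is preserved across the boundary
```

Because the method itself is wrapped, the call crosses the boundary and the real function only ever sees the real object, so the WeakMap lookup succeeds.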

# Boris Zbarsky (12 years ago)

On 9/25/13 3:47 PM, Mark S. Miller wrote:

Hi Boris, I don't understand what you mean by "in general". I think the SpiderMonkey use of cross-realm membranes is a great use case for membranes, and I don't understand why they need any special logic at all -- beyond the logic expressed by their handlers, which must include revocation.

Mark,

The issue is that if I have access to a different-realm Location object called "loc", say, then:

Object.getOwnPropertyDescriptor(Location.prototype, "href").get.call(loc)

should throw if the "loc" is not same-origin with me. But:

Object.getOwnPropertyDescriptor(Location.prototype, "href").set.call(loc, "whatever")

should perform the set. (There are actually some more complications here for the specific case of Location, but let's ignore them for now.)

What that means in practice is that the membrane isn't actually "revoked": it needs to be unwrapped to the real underlying object in some cases, but not others.

The way we implement that, again, is with a way to unconditionally unwrap a cross-realm membrane, and a way to ask a cross-realm membrane whether it's ok to unwrap it. Some of the methods/getters/setters involved use the former, and some use the latter.

Note that in this case the actual property getter/setter is gotten without any interaction with membranes at all. The only question is when a membrane around the thisArg to a function can be pierced to get to the thing the function actually knows how to operate on.

# David Bruant (12 years ago)

On 25/09/2013 22:00, Boris Zbarsky wrote:

On 9/25/13 3:47 PM, Mark S. Miller wrote:

Hi Boris, I don't understand what you mean by "in general". I think the SpiderMonkey use of cross-realm membranes is a great use case for membranes, and I don't understand why they need any special logic at all -- beyond the logic expressed by their handlers, which must include revocation.

Mark,

The issue is that if I have access to a different-realm Location object called "loc", say, then:

Object.getOwnPropertyDescriptor(Location.prototype, "href").get.call(loc)

should throw if the "loc" is not same-origin with me. But:

Object.getOwnPropertyDescriptor(Location.prototype, "href").set.call(loc, "whatever")

should perform the set.

This line looks very WebIDL-ish. Does the web really need that to work? Isn't it a case of WebIDL being overzealous?

(There are actually some more complications here for the specific case of Location, but let's ignore them for now.)

What that means in practice is that the membrane isn't actually "revoked": it needs to be unwrapped to the real underlying object in some cases, but not others.

The way we implement that, again, is with a way to unconditionally unwrap a cross-realm membrane, and a way to ask a cross-realm membrane whether it's ok to unwrap it.

What are the other such cases that require this check?

# Boris Zbarsky (12 years ago)

On 9/25/13 5:33 PM, David Bruant wrote:

This line looks very WebIDL-ish. Does the web really need that to work?

The web doesn't need anything involving getOwnPropertyDescriptor to work. I just picked a simple example.

The web does in general rely on being able to apply methods from some built-in (including DOM) prototype in one realm to objects from another realm.

What are the other such cases that require this check?

Which "this check"?

# Brendan Eich (12 years ago)

Boris Zbarsky wrote:

The web does in general rely on being able to apply methods from some built-in (including DOM) prototype in one realm to objects from another realm.

This goes back to the dawn of JS. You can set location on a reachable frame or window, even if not same-origin.

There's no revocation in the Revocable Membrane sense going on. Rather, the membraning has to distinguish gets and other operations from sets, and throw or unwrap |this| accordingly.

# Tom Van Cutsem (12 years ago)

2013/9/25 Allen Wirfs-Brock <allen at wirfs-brock.com>

On Sep 25, 2013, at 3:01 AM, David Bruant wrote:

Builtins are easy, because it's very clear which method is expected to work with which object; there is a clear definition of what an "instance" is and that can be tracked internally. Also, the auto-unwrapping can be done safely, because it's done internally. None of these two are true for userland private state.

How is this clear at the level of specify the behavior of MOP operations for Proxy instances? What is it that distinguishes a function that has such dependencies from one that doesn't.

Also, one of the early goals of Proxy was to support self hosting of built-ins. Saying that the Proxy MOP implementation has special knowledge of all built-in methods would not be supportive of that goal.

My proposal would not require the Proxy MOP to know about all the built-ins. Rather it's the other way around: all the built-ins for which passing in a proxy should work, should know about proxies and un-wrap them.

But your and David's point is well-taken: this would be an ad hoc mechanism that might work for just the ES6 built-ins, but it does not scale easily beyond those. I withdraw my proposal.

# Tom Van Cutsem (12 years ago)

2013/9/25 Boris Zbarsky <bzbarsky at mit.edu>

On 9/25/13 3:47 PM, Mark S. Miller wrote:

Hi Boris, I don't understand what you mean by "in general". I think the SpiderMonkey use of cross-realm membranes is a great use case for membranes, and I don't understand why they need any special logic at all -- beyond the logic expressed by their handlers, which must include revocation.

Mark,

The issue is that if I have access to a different-realm Location object called "loc", say, then:

Object.getOwnPropertyDescriptor(Location.prototype, "href").get.call(loc)

should throw if the "loc" is not same-origin with me. But:

Object.getOwnPropertyDescriptor(Location.prototype, "href").set.call(loc, "whatever")

should perform the set. (There are actually some more complications here for the specific case of Location, but let's ignore them for now.)

What that means in practice is that the membrane isn't actually "revoked": it needs to be unwrapped to the real underlying object in some cases, but not others.

The way we implement that, again, is with a way to unconditionally unwrap a cross-realm membrane, and a way to ask a cross-realm membrane whether it's ok to unwrap it. Some of the methods/getters/setters involved use the former, and some use the latter.

Note that in this case the actual property getter/setter is gotten without any interaction with membranes at all. The only question is when a membrane around the thisArg to a function can be pierced to get to the thing the function actually knows how to operate on.

I believe the crucial reason this works is that the built-ins can distinguish trusted, cross-realm proxies from arbitrary other proxies.

In general, we can't let the built-in transfer control to an arbitrary proxy handler, because users calling |builtin.call(thing)| don't expect |thing| to influence the result of the call (cf. the discussion in the [[invoke]] thread <esdiscuss.org/topic/invoke-and-implicit-method-calls#content-115>).

Answering MarkM's question of whether we can self-host such behavior, I believe we can:

var trustedMembraneProxies = new WeakMap(); // maps trusted membrane
// proxies to their target, allowing code with access to this WeakMap to
// unwrap them

Object.defineProperty(Location.prototype, "href", {
  get: function() {
    var target = trustedMembraneProxies.get(this);
    if (target === undefined) {
      // this-binding was not a cross-realm wrapper, just operate on the
      // original this-binding
      target = this;
    }
    ... // execute builtin behavior with target as the this-binding
  }
});

# Anne van Kesteren (12 years ago)

On Thu, Sep 26, 2013 at 12:22 AM, Brendan Eich <brendan at mozilla.com> wrote:

Boris Zbarsky wrote:

The web does in general rely on being able to apply methods from some built-in (including DOM) prototype in one realm to objects from another realm.

This goes back to the dawn of JS. You can set location on a reachable frame or window, even if not same-origin.

Probably does not matter much, but the cross-origin case might become proxy objects long term due to process isolation.

# David Bruant (12 years ago)

On 26/09/2013 14:44, Anne van Kesteren wrote:

On Thu, Sep 26, 2013 at 12:22 AM, Brendan Eich <brendan at mozilla.com> wrote:

Boris Zbarsky wrote:

The web does in general rely on being able to apply methods from some built-in (including DOM) prototype in one realm to objects from another realm.

This goes back to the dawn of JS. You can set location on a reachable frame or window, even if not same-origin.

Probably does not matter much, but the cross-origin case might become proxy objects long term due to process isolation.

It seems like it matters a lot. If that's not already the case, would that break the web when that happens? Formulated differently: does the web have requirements making such process isolation impossible? If not, can the specs and implementations be changed now to make process isolation possible?

# Boris Zbarsky (12 years ago)

On 9/26/13 5:16 AM, Tom Van Cutsem wrote:

I believe the crucial reason this works is that the built-ins can distinguish trusted, cross-realm proxies from arbitrary other proxies.

Yes, agreed.

Answering MarkM's question of whether we can self-host such behavior, I believe we can:

var trustedMembraneProxies = new WeakMap(); // maps trusted membrane
// proxies to their target, allowing code with access to this WeakMap to
// unwrap them

Ah, and another weakmap for the per-property security checks. Yes, I guess that would work.
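Putting the two WeakMaps together, here is a self-contained sketch of how the get/set asymmetry discussed earlier could be self-hosted. Everything beyond `trustedMembraneProxies` is an assumption for illustration: `FakeLocation` stands in for the DOM Location interface (which is not available outside a browser), and `permittedOperations`, `makeCrossRealmWrapper` and `resolveThis` are hypothetical names. This is not how SpiderMonkey implements it.

```javascript
// Both maps are keyed by proxy identity and held only by trusted code.
const trustedMembraneProxies = new WeakMap(); // proxy -> target
const permittedOperations = new WeakMap();    // proxy -> Set of operation names

// Hypothetical helper for creating a trusted cross-realm wrapper.
function makeCrossRealmWrapper(target, operations) {
  const proxy = new Proxy(target, {});
  trustedMembraneProxies.set(proxy, target);
  permittedOperations.set(proxy, new Set(operations));
  return proxy;
}

// Decide what a built-in should operate on for a given |this| value.
function resolveThis(thisArg, operation) {
  const target = trustedMembraneProxies.get(thisArg);
  if (target === undefined) return thisArg; // not a trusted wrapper
  if (!permittedOperations.get(thisArg).has(operation)) {
    throw new TypeError("cross-realm access denied: " + operation);
  }
  return target;
}

// FakeLocation stands in for the DOM Location interface.
const hrefState = new WeakMap();
class FakeLocation {
  constructor(href) {
    hrefState.set(this, href);
  }
}

function requireLocation(value) {
  if (!hrefState.has(value)) throw new TypeError("not a FakeLocation");
  return value;
}

Object.defineProperty(FakeLocation.prototype, "href", {
  configurable: true,
  get() {
    return hrefState.get(requireLocation(resolveThis(this, "href.get")));
  },
  set(value) {
    hrefState.set(requireLocation(resolveThis(this, "href.set")), String(value));
  },
});

const loc = new FakeLocation("https://a.example/");
const wrapper = makeCrossRealmWrapper(loc, ["href.set"]);
const href = Object.getOwnPropertyDescriptor(FakeLocation.prototype, "href");
```

With this setup, `href.get.call(wrapper)` throws while `href.set.call(wrapper, "whatever")` performs the set on the real object, and a proxy that is not in `trustedMembraneProxies` is never unwrapped and never has its handler consulted by the accessor.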

# Boris Zbarsky (12 years ago)

On 9/26/13 10:45 AM, David Bruant wrote:

Formulated differently: does the web have requirements making such process isolation impossible?

Unclear.

The web has a requirement that if I have a pointer to a cross-origin window "win" then doing win.name will find child windows of it with name="name". See bugzilla.mozilla.org/show_bug.cgi?id=916939 for what happened when we tried to disallow this in Firefox.

Now this could be implemented by having "win" be a cross-process proxy and having the get be a blocking RPC call that returns another cross-process proxy... maybe. Deadlock hazards abound.

# Anne van Kesteren (12 years ago)

On Thu, Sep 26, 2013 at 10:45 AM, David Bruant <bruant.d at gmail.com> wrote:

On 26/09/2013 14:44, Anne van Kesteren wrote:

Probably does not matter much, but the cross-origin case might become proxy objects long term due to process isolation.

It seems like it matters a lot.

Still not sure how it matters in this context. In any event, this is mostly a browser research project. Once someone has done it we know it's feasible. (We already stopped introducing new cross-origin synchronous dependencies.)

# Mark S. Miller (12 years ago)

I think we need to distinguish two senses of process when we ask this question:

a) address space separation in the implementation

b) concurrency

b) would be a breaking semantic change, so I'm going to write that off here. Feel free to start a separate thread on public-script-coord if you like about whether this breaking change might be possible, but I'm skeptical.

a) without b) might make use of OS processes in the implementation, since most OSes only provide separate address spaces to separate processes. But from a scheduling perspective, a group of address spaces can function as a single process by adopting the discipline that at most one is active at a time. A sync IPC makes the caller inactive and the callee active. It simulates exactly the locus of activity we have now in a single process single thread implementation.

So the a) question in isolation really does reduce to the membrane question. Since no object references ever directly cross a membrane boundary implementing a realm boundary, the two sides of a membrane can be connected by only a bit channel with no loss of observable functionality. However, we'd lose the easy GC of cross compartment cycles we get right now by building in-address-space membranes using proxies and weakmaps. Cross-address-space GC is essentially the same problem as distributed GC, which is a big topic in itself.

# Brendan Eich (12 years ago)

Anne van Kesteren <annevk at annevk.nl> wrote (September 26, 2013 5:44 AM):

Probably does not matter much, but the cross-origin case might become proxy objects long term due to process isolation.

Backward compat says it doesn't matter, but process isolation really wants async APIs. Some browsers currently, e.g., make window.open cross-process more async than other browsers.

Something to pin down, and more important: provide promise-based modern (and sane) equivalent APIs.

# Brendan Eich (12 years ago)

Mark S. Miller <erights at google.com> wrote (September 26, 2013 8:55 AM):

So the a) question in isolation really does reduce to the membrane question. Since no object references ever directly cross a membrane boundary implementing a realm boundary, the two sides of a membrane can be connected by only a bit channel with no loss of observable functionality. However, we'd lose the easy GC of cross compartment cycles we get right now by building in-address-space membranes using proxies and weakmaps. Cross-address-space GC is essentially the same problem as distributed GC, which is a big topic in itself.

Distributed GC is harder still because of network partitioning and multiple machine failure scenarios, but you're right that the cycle collection problem arises with multi-process GC, as with multi-language-VM-heap and with distributed multi-machine.