Proxy performance: JIT-compilation?

# Alex Vincent (a year ago)

Recently, a fellow engineer observed that ECMAScript proxies are "around 700x slower" in retrieving a property than a simple getter function with a closure. He thought this meant "the JIT compiler can't optimize the code".

I'm not so sure, so I thought I'd ask the experts here.

Proxies are one of the frontiers of JavaScript, largely unexplored. The ECMAScript specification makes no mention whatsoever of just-in-time compilation (rightly so, I would think). But this means that proxy code execution might be one of those areas where we do not currently enjoy the benefits of JIT... and we could.

So, I have a direct question for JavaScript engine developers: other than having to look up and execute a proxy handler's traps, are there specific reasons why proxy performance might be significantly slower than a simple function execution?

I can imagine a three-fold performance test: (1) direct inspection of an object, (2) inspection of Proxy({}, Reflect), and (3) inspection of Proxy({}, handler) where

allTraps.forEach(function(t) { handler[t] = function() { return Reflect[t].apply(Reflect, arguments); } });
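The three-way test described above could be sketched roughly as follows. This is only an illustration: `allTraps` is written out by hand, and `timeReads`, `makeTarget`, `subjects` and the iteration count are hypothetical choices, not part of any proposal or library. It measures only property reads, and the timings will vary widely between engines and runs.

```javascript
// The 13 proxy trap names, which are also the 13 Reflect method names.
const allTraps = [
  "apply", "construct", "defineProperty", "deleteProperty", "get",
  "getOwnPropertyDescriptor", "getPrototypeOf", "has", "isExtensible",
  "ownKeys", "preventExtensions", "set", "setPrototypeOf",
];

// Case (3): a handler whose traps each forward to the matching Reflect method.
const handler = {};
allTraps.forEach(function (t) {
  handler[t] = function () {
    return Reflect[t].apply(Reflect, arguments);
  };
});

// Hypothetical helper: read obj.value repeatedly and report elapsed time.
function timeReads(obj, iterations) {
  const start = Date.now();
  let sum = 0;
  for (let i = 0; i < iterations; i++) {
    sum += obj.value;
  }
  return { ms: Date.now() - start, sum: sum };
}

const makeTarget = () => ({ value: 1 });
const subjects = {
  direct: makeTarget(),                                 // case (1)
  reflectHandler: new Proxy(makeTarget(), Reflect),     // case (2)
  forwardingHandler: new Proxy(makeTarget(), handler),  // case (3)
};

const results = {};
for (const name of Object.keys(subjects)) {
  results[name] = timeReads(subjects[name], 1e5);
  console.log(name + ": " + results[name].ms + " ms");
}
```

A serious benchmark would also need warm-up runs and more operations than `get`; the point here is only the shape of the comparison.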

# Mark S. Miller (a year ago)

At tvcutsem/es-lab#21 Tom and I have an idea (that we should turn into a proposal) for a subtle change to proxy semantics that

* should break essentially no current code,
* repair the cycle detection transparency violation bug,
* enable many proxies to be much faster.

# Allen Wirfs-Brock (a year ago)

On Aug 4, 2017, at 2:22 PM, Mark S. Miller <erights at google.com> wrote:

At tvcutsem/es-lab#21 Tom and I have an idea (that we should turn into a proposal) for a subtle change to proxy semantics that

* should break essentially no current code,
* repair the cycle detection transparency violation bug,
* enable many proxies to be much faster.

I actually don’t see why any semantic changes are needed to enable better Proxy performance. Once abstractions are sufficiently lowered, a proxy trap invocation is just a series of procedure calls (some dynamically dispatched; some to built-in procedures). I don’t see any reason why the same sort of PIC + dynamic-type-based specialization + inlining that is used to optimize more conventional JS code isn’t also applicable to Proxy-using code.

I don’t think the barriers to such optimization are technical. It’s more a matter of convincing engine implementors that doing the (probably significant) work of optimizing Proxies in this manner is a sound investment and a high priority.

# Mark S. Miller (a year ago)

On Fri, Aug 4, 2017 at 2:52 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

On Aug 4, 2017, at 2:22 PM, Mark S. Miller <erights at google.com> wrote:

At tvcutsem/es-lab#21 Tom and I have an idea (that we should turn into a proposal) for a subtle change to proxy semantics that

* should break essentially no current code,
* repair the cycle detection transparency violation bug,
* enable many proxies to be much faster.

I actually don’t see why any semantic changes are needed to enable better Proxy performance. Once abstractions are sufficiently lowered, a proxy trap invocation is just a series of procedure calls (some dynamically dispatched; some to built-in procedures). I don’t see any reason why the same sort of PIC + dynamic-type-based specialization + inlining that is used to optimize more conventional JS code isn’t also applicable to Proxy-using code.

I don’t think the barriers to such optimization are technical. It’s more a matter of convincing engine implementors that doing the (probably significant) work of optimizing Proxies in this manner is a sound investment and a high priority.

I agree. So I change the claims to

* should break essentially no current code,
* repair the cycle detection transparency violation bug,
* enable many proxies to be *much* faster with less work by implementors.

In particular, for many permanently absent traps, the proxy can just pass these through directly to the target without much analysis.
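To see why an absent trap is not "permanently" absent under the current semantics, consider this small sketch (the object and variable names are illustrative). The handler starts with no traps, so reads fall through to the target, but a trap added to the handler later takes effect on the very next operation. An engine therefore cannot assume the pass-through will stay valid without some check.

```javascript
const target = { x: 1 };
const handler = {};                    // no traps: operations fall through to target
const proxy = new Proxy(target, handler);

const before = proxy.x;                // read straight from target: 1

// The handler is an ordinary mutable object, so a trap can appear later.
handler.get = function (t, key, receiver) {
  return "trapped:" + String(key);
};

const after = proxy.x;                 // now goes through the new trap: "trapped:x"

console.log(before, after);
```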

# Alex Vincent (a year ago)

In particular, for many permanently absent traps, the proxy can just pass these through directly to the target without much analysis.

Your suggested changes to the ECMAScript specifications seem to focus on permanently absent traps... which doesn't do much good for present traps. For instance, new Proxy({}, Reflect), which I mentioned in my initial e-mail and, by the way, implements all the traps. :-)

# Mark Miller (a year ago)

On Fri, Aug 4, 2017 at 3:18 PM, Alex Vincent <ajvincent at gmail.com> wrote:

In particular, for many permanently absent traps, the proxy can just pass these through directly to the target without much analysis.

Your suggested changes to the ECMAScript specifications seem to focus on permanently absent traps... which doesn't do much good for present traps. For instance, new Proxy({}, Reflect), which I mentioned in my initial e-mail and, by the way, implements all the traps. :-)

Hence the "many" qualification. It is true that the main use case for proxies is membranes, in which case all traps are present. But even for these, if the trap is permanently present, it need only be [[Get]]ed from the handler once on initializing the proxy. Granted, this is much less helpful for speeding up implementations, but it is not nothing ;).
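For contrast with "[[Get]]ed from the handler once", the following sketch shows that today the trap is fetched from the handler on every operation. It uses an accessor property on the handler purely as a counting device; `lookups` and `getTrap` are illustrative names.

```javascript
let lookups = 0;

// An ordinary forwarding trap.
const getTrap = (t, key, receiver) => Reflect.get(t, key, receiver);

// The handler exposes "get" through an accessor so each fetch of the trap is counted.
const handler = {
  get get() {
    lookups++;
    return getTrap;
  },
};

const proxy = new Proxy({ a: 1, b: 2 }, handler);

proxy.a;
proxy.b;
proxy.a;

console.log(lookups); // 3: one trap lookup per property read
```

Creating the proxy does not read any traps, so all three lookups come from the three property reads.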

# Alex Vincent (a year ago)

So, how many boxes of chocolates do I need to send to the two big vendors in Mountain View? :-)

It's been fifteen years since I seriously tried to profile C++ code, and I didn't really know what I was doing back then: unfamiliar tools, and less competence in C++ than I would've liked. What little knowledge of profiling I had back then has long since faded.

Even if I could generate a pretty picture of how long we spent in each code path, I wouldn't know how to interpret it.

I recently submitted a patch for improving error reporting in SpiderMonkey [1], so I can occasionally dip my toes in the JSAPI code...

[1] bugzilla.mozilla.org/show_bug.cgi?id=1383630

# Mark Miller (a year ago)

Alex, I'll just point out that you are already engaged in the best kind of activity to get implementors to optimize these paths: building a membrane library that can get widespread use, which encapsulates the complexity of proxies behind a more usable API, and for which these proxy operations are the bottleneck. If these costs were sufficient to deter use of your library this would not be a good strategy. But many uses of membranes will be for cases where membrane crossings are rare compared to direct object-to-object interaction on either side of the membrane. For most of these, faster proxies will not matter. But for some of these, proxy performance will not be enough to deter use, but faster proxies would still produce a noticeably more pleasant experience.

This is a long term strategy. For the short term, if you can manage it, make proxy performance significant in some widely used benchmark suite.

None of this is meant to detract from the box of chocolates strategy. Try everything!

# kai zhu (a year ago)

of course Proxy is going to cause deoptimization problems when you start breaking assumptions about Object builtins (which obviously have more aggressive and specialized optimizations than generic methods). in v8, i understand the common-case optimization for builtin getters to be a direct property lookup of a c++ hidden class. adding a trap with dynamic code to the getter throws that direct lookup out the window, and likely no reasonable amount of jit-optimization will get you back the common-case performance.

# Sam Tobin-Hochstadt (a year ago)

On Fri, Aug 4, 2017 at 4:52 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

On Aug 4, 2017, at 2:22 PM, Mark S. Miller <erights at google.com> wrote:

At tvcutsem/es-lab#21 Tom and I have an idea (that we should turn into a proposal) for a subtle change to proxy semantics that

* should break essentially no current code,
* repair the cycle detection transparency violation bug,
* enable many proxies to be much faster.

I actually don’t see why any semantic changes are needed to enable better Proxy performance. Once abstractions are sufficiently lowered, a proxy trap invocation is just a series of procedure calls (some dynamically dispatched; some to built-in procedures). I don’t see any reason why the same sort of PIC + dynamic-type-based specialization + inlining that is used to optimize more conventional JS code isn’t also applicable to Proxy-using code.

Indeed, this works well in practice with other proxy constructs in other languages -- my collaborators and I have a paper showing this that should be out soon. The only barrier to good proxy performance is implementation work.

# Isiah Meadows (a year ago)

Yes, it's possible to optimize them using specialized ICs on the proxy handler itself, but it would be far easier to optimize if the ICs weren't necessary in the first place, since you can just build it into the proxy's type, almost like a lazily-generated vtable. That's far less work than the complex ICs you'd otherwise need.
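One way to picture the "vtable" idea in user-land terms is a handler whose set of traps is fixed when the proxy is created. The sketch below is only an analogy and an assumption about what is meant: `snapshotHandler` and `TRAP_NAMES` are hypothetical names, and the code does not by itself make any current engine faster. It just shows a trap table that later mutation of the original handler cannot change.

```javascript
const TRAP_NAMES = [
  "apply", "construct", "defineProperty", "deleteProperty", "get",
  "getOwnPropertyDescriptor", "getPrototypeOf", "has", "isExtensible",
  "ownKeys", "preventExtensions", "set", "setPrototypeOf",
];

// Hypothetical helper: copy the traps present right now into a frozen table.
// Traps absent at snapshot time stay absent, so those operations always
// fall through to the target.
function snapshotHandler(handler) {
  const table = {};
  for (const name of TRAP_NAMES) {
    const trap = handler[name];
    if (typeof trap === "function") {
      table[name] = trap.bind(handler); // keep the original `this`
    }
  }
  return Object.freeze(table);
}

const liveHandler = { get: (t, key) => 42 };
const fixedProxy = new Proxy({}, snapshotHandler(liveHandler));

// Later changes to the live handler are not observed through the proxy.
liveHandler.get = (t, key) => 0;
liveHandler.has = () => true;

const value = fixedProxy.x;       // 42, from the trap captured at creation
const hasX = "x" in fixedProxy;   // false, no "has" trap was captured

console.log(value, hasX);
```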

Even though it is in theory possible to optimize such proxies, it's pretty complicated to set up, and JS engines aren't exactly magic.


# Mark Miller (a year ago)

So from y'all's various implementation perspectives, how does tvcutsem/es-lab#21 look? Do you think it would make it easier to achieve much higher performance proxies than we could without these subtle semantic changes? Or do you think we can as easily achieve these performance gains with no observable changes at all?

By "subtle", I mean that it is unlikely to affect any normal code.

(Note that, even if the answer is that they don't help, these changes are still adequately motivated by the cycle-transparency bug. But it would be good to know.)

# Allen Wirfs-Brock (a year ago)

On Aug 8, 2017, at 5:03 PM, Mark Miller <erights at gmail.com> wrote:

So from y'all's various implementation perspectives, how does tvcutsem/es-lab#21 look? Do you think it would make it easier to achieve much higher performance proxies than we could without these subtle semantic changes? Or do you think we can as easily achieve these performance gains with no observable changes at all?

Detecting mutation of a handler object seems isomorphic to detecting mutation of an object on the [[Prototype]] chain of an ordinary object. I suspect a smart implementation would use the same detection/invalidation techniques for both. Sure, an immutable handler object (or prototype object) may eliminate the need for that check, but the mechanism is going to have to be there anyway since most prototype objects are mutable. The basic idea is to lower every abstraction to primitive, highly optimizable IR. It has to be able to optimize potentially mutable dispatch objects (prototypes, etc.) even though actual mutation is very rare. So, it is not clear whether special-casing non-mutable dispatch objects would be worth the added complexity. Obviously, mileage may vary among actual designs.
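The parallel between the two kinds of dispatch object can be shown side by side. In this sketch (all names are illustrative), a method is replaced on a prototype and a trap is replaced on a handler; in both cases the next lookup must observe the change, so any cached dispatch has to be invalidated in the same way.

```javascript
// Dispatch through a prototype.
const proto = { greet() { return "hello"; } };
const obj = Object.create(proto);
const r1 = obj.greet();                       // "hello"
proto.greet = function () { return "hi"; };   // mutate the dispatch object
const r2 = obj.greet();                       // "hi"

// Dispatch through a proxy handler.
const handler = { get: (t, key) => "v1" };
const proxy = new Proxy({}, handler);
const p1 = proxy.anything;                    // "v1"
handler.get = (t, key) => "v2";               // mutate the dispatch object
const p2 = proxy.anything;                    // "v2"

console.log(r1, r2, p1, p2);
```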