Sharing a JavaScript implementation across realms
My hypothesis is that it requires no further changes to JavaScript proper beyond ES5 strict mode, ES6 modules, and the inter-realm (aka "global") Symbol registry. But some future changes under discussion may help, such as extensible value types, if done right.
First, an impractical straw man ("straw man" in the negative sense, that is) that would have worked even in ES3 days, just to make a point:
When the same url-full-path.js file is loaded multiple times to populate multiple realms, the browser cache will (one hopes) typically hit, avoiding actually loading the source code over the web multiple times. Such cacheable URLs might, for example, be URLs on a CDN.
The string representing this source code can then of course be shared across realms, and even between workers sharing an address space. With enough cleverness, large strings can even be shared between address spaces.
All the code generated from this string can be re-derived from it, so all that generated code can live in a memory-budget-limited cache. As long as the cache is big enough for the working set of code that needs to be run, a finite cache + some per-realm bookkeeping can handle an unbounded number of realms loading the same sources.
The per-realm bookkeeping has to preserve the correspondence between the identity and state of function objects and the code describing their behavior. At a minimum, the code part of each function object can refer to the shared source string and to the position of its own code within that string. The remaining identity and state is still per realm with no further economizing, but the same is true of builtin (C++, Rust, etc.) functions.
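To make that concrete, here is a minimal sketch in ordinary JavaScript of the arrangement an engine might keep internally. This is purely a model: compileFromSource and the cache-key scheme are hypothetical stand-ins, not real engine APIs.

    // Illustrative model only: an inter-realm cache of generated code keyed by
    // (source string, position), with a memory budget and LRU-style eviction.
    class CodeCache {
      constructor(maxEntries) {
        this.maxEntries = maxEntries;
        this.entries = new Map(); // insertion order doubles as LRU order
      }
      get(sourceText, position, compileFromSource) {
        // In a real engine the key would be an interned source identity plus an
        // offset; string concatenation is just a cheap stand-in here.
        const key = position + "\u0000" + sourceText;
        let code = this.entries.get(key);
        if (code === undefined) {
          code = compileFromSource(sourceText, position); // cache miss: re-derive
        } else {
          this.entries.delete(key); // refresh its position in LRU order
        }
        this.entries.set(key, code);
        if (this.entries.size > this.maxEntries) {
          // Evict the least recently used entry; it can always be re-derived.
          this.entries.delete(this.entries.keys().next().value);
        }
        return code;
      }
    }

    // Per-realm bookkeeping: identity and mutable state live in the realm, but
    // the behaviour is described only by the shared source string and position.
    function makeRealmFunction(realmState, sourceText, position, cache, compileFromSource) {
      return function (...args) {
        const code = cache.get(sourceText, position, compileFromSource);
        return code.apply(realmState, args);
      };
    }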
The above scenario "solves" the memory consumption problem, but at a cost of regenerating the code from string source on a generated-code-cache-miss. Much of the time spent regenerating from string sources is lexing and parsing, which are context independent even in ES3, so we can cache some immutable representation of the parsed form rather than the source strings, probably taking more space, but reducing the regeneration time. Next is scope analysis, which is stable up to free variables (typically globals) in ES5 strict mode code, so this can be cached inter-realm as well.
ES6 modules bring even more stability to scope analysis, given that our cache-hit test takes transitive imports into account as well.
The inter-realm Symbol registry gives us an inter-realm namespace that we can use for reliable runtime inter-realm brand testing, public slot naming, and duck typing.
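For example (a minimal sketch; the registry key string is made up for illustration), a library loaded into two different realms can agree on a brand without sharing any objects:

    // Symbol.for returns the very same symbol in every realm for a given key,
    // so both copies of this library mark and recognize the same brand.
    var BRAND = Symbol.for("example.org/geometry/point-brand");

    function makePoint(x, y) {
      var p = { x: x, y: y };
      p[BRAND] = true;
      return p;
    }

    function isPoint(value) {
      // True even for points made by this library's copy in another realm.
      return value !== null && typeof value === "object" && value[BRAND] === true;
    }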
Note that all modern JS engines JIT compile to generate the actual machine code, which they invalidate when assumptions change, so all actual machine code is in an invalidate-able cache that cannot be shared between realms. This is an irreducible cost compared to a builtin (C++, Rust, etc) implementation. The point of the inter-realm cache is to reduce the time taken to repopulate this unsharable part. Both caches can be memory-budget limited.
Unfortunately, browser caches do not test cache hits on a sound basis. Really, we need soundly cacheable code URLs to carry a cryptographic hash in the URL, where the browser only considers the loaded content to be valid if its hash matches. Then, a browser can cache and reuse soundly based on hash match. www.tahoe-lafs.org/trac/tahoe-lafs refers to such
URLs as "self-authenticating designators". See also the threads rooted at www.eros-os.org/pipermail/e-lang/2000-January/003188.html, www.eros-os.org/pipermail/e-lang/2000-January/003194.html and the message at www.eros-os.org/pipermail/e-lang/2009-April/013098.html on hashing the transitive closure on import dependencies, rather than hashing each module independently. In retrospect, I expect this further logic to be past the point of diminishing returns. But none of this has been subjected to any measurements.
A political problem arises in the first step -- the browser cache. No one expects a browser to provide an affordance to remove its C++ builtins, forcing them (if there were such an option) to be reloaded over the web. By contrast, all browsers provide, and must provide, an affordance to clear their caches. Under current assumptions, this forces all the externally loaded code to be reloaded the next time it is fetched. The extensible web agenda needs to come to grips with this political problem. I don't know how.
On Tue, Jan 13, 2015 at 6:14 PM, Mark S. Miller <erights at google.com> wrote:
My hypothesis is that it requires no further changes to JavaScript proper beyond ES5 strict mode, ES6 modules, and the inter-realm (aka "global") Symbol registry. But some future changes under discussion may help, such as extensible value types, if done right.
From talking to implementers it seems like there are some things we could still try memory-wise, but it becomes ever more complicated to do and is not currently a high priority. That makes me wonder how much progress we can realistically expect here, and whether there's something we could do language-wise to alleviate some of the issues.
SpiderMonkey's self-hosted code apparently has access to some mechanisms that allow lazy loading of that code and allow it to be collected again as well. The bytecode generated from the self-hosted code can apparently be shared cross-realm under certain circumstances. However, this is limited to realms running on the same thread, which makes workers vastly less efficient, and if at some point the main browser starts using more threads it would become less efficient overall. But perhaps there are further tricks to share this across threads as well as realms. These self-hosting tricks are also currently limited to SpiderMonkey and cannot be used to implement features outside the JavaScript engine.
(Thanks for the analysis by the way. Curious what others have to say.)
We have been trying to improve sharing in JSC for a while now. We can share bytecode between realms, but this is mostly about reducing parse time rather than space saving - the bytecode has to be "linked" before a realm uses it, which involves making a copy of most of the data structures.
I don't think that full sharing is impossible in the current language. Turning off the link step - or eliminating the need for a full copy - is almost doable, but would have some cost for variable resolution performance in the early iterations of a function. I've been toying with sharing JIT code at the lowest JIT tier, which would have similar trade offs.
Something that JSC does have going for it if we had a larger library footprint is that we interpret bytecode for the first 100 or so executions of any function. Even when we have to splat a copy of the bytecode, it still takes less memory than machine code at any level of optimization.
So, I'm curious what issues you were specifically concerned about.
On 13/01/2015 13:21, Anne van Kesteren wrote:
A big challenge with self-hosting is memory consumption. A JavaScript implementation is tied to a realm and therefore each realm will have its own implementation. Contrast this with a C++ implementation of the same feature that can be shared across many realms. The C++ implementation is much more efficient.
Why would a JS implementation have to be tied to a realm? I understand if this is how things are done today, but does it need to be? Asked differently, what is so different about JS (vs C++) as an implementation language? It seems like the kinds of sharing that are possible in C++ should be possible in JS. What is (or can be) shared in C++ that cannot be in JS?
PS: Alternative explanation available here: annevankesteren.nl/2015/01/javascript-web-platform
From your post: "More concretely, this means that an implementation of |Array.prototype.map| in JavaScript will end up existing in each realm, whereas an identical implementation of that feature in C++ will only exist once."
Why? You could have a single privileged-JS implementation, and each content-JS context (~realm) would only have access to a proxy to Array.prototype.map (transparently forwarding calls, which I imagine can be optimized/inlined by engines to be a direct call in the optimistic case). It would cost a proxy per content-JS context, but that is already much, much less than a full Array.prototype.map implementation. In a hand-wavy fashion, I'd say the proxy handler can be shared across all content-JS. There is per-content storage to be created (lazily) in case Array.prototype.map is mutated (a property added, etc.), but the normal case is fine (no mutation on built-ins means no cost).
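A rough sketch of what that could look like (sharedMapImpl stands in for the single privileged-JS implementation; this glosses over the freezing issue discussed next):

    // One shared implementation, one shared handler; each content realm only
    // pays for a thin forwarding proxy in the common, unmutated case.
    var sharedMapImpl = function (callback, thisArg) {
      var result = [];
      for (var i = 0; i < this.length; i++) {
        if (i in this) result[i] = callback.call(thisArg, this[i], i, this);
      }
      return result;
    };

    var sharedHandler = {
      apply: function (target, thisArg, args) {
        return Reflect.apply(sharedMapImpl, thisArg, args);
      }
    };

    function makeMapForRealm() {
      // The proxy (and its dummy target) is the only per-realm allocation here;
      // lazy per-realm storage would be added only if the built-in is mutated.
      return new Proxy(function map() {}, sharedHandler);
    }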
One drawback shows up with Object.freeze(Array.prototype.map). For this to work with proxies as they are, either the privileged-JS Array.prototype.map needs to be frozen (unacceptable, of course), or each proxy needs a new target (which is just as bad as one Array.prototype.map implementation per content-JS context). The solution might be to allow proxies in privileged-JS contexts that are more powerful than the standard ones (for instance, they can pretend the object is frozen even when the underlying target isn't).
This is a bit annoying as a suggestion, because it means JS isn't really implemented in normal JS any longer, but it sounds like a reasonable trade-off (that's open for debate, of course). The "problem" with proxies as they are today is that they were retrofitted into JS, which severely constrained their design, making use cases like the one we're discussing (or even membranes) possible but cumbersome. Privileged-JS taking some liberties with this design sounds reasonable.
(It was pointed out to me that SpiderMonkey has some tricks to share the bytecode of a JavaScript implementation of a feature across realms, though not across threads (still expensive for workers). And SpiderMonkey has the ability to load these JavaScript implementations lazily and collect them when no longer used, further reducing memory footprint. However, this requires very special code that is currently not available for features outside of SpiderMonkey. Whether that is feasible might be up for investigation at some point.)
For contexts running in parallel to be able to share (read-only) data in JS, we would need immutable data structures in JS, I believe. esdiscuss/2014-November/040218, esdiscuss/2014-November/040219
Before we go tl;dr on this topic, how about some data to back up the asserted problem size? Filip gently raised the question. How much memory does a realm cost in top open source engines? Fair question, empirical and (I think) not hard to answer. Burdened malloc/GC heap full cost, not net estimate from source analysis, would be best. Cc'ing Nick, who may already know. Thanks,
I don't think there is any fundamental difference between self-hosting in JavaScript and implementing the JS engine in C++. Take Array.prototype.map as the example case: in C++, we code a native function and create a corresponding function object for each realm (note that the only shared part is the native function). In a JS-based JS engine, we can likewise create multiple objects sharing the same [[Call]] behavior - exactly the same as in the C++ implementation.
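A hedged sketch of that framing, where sharedBehaviour plays the role of the shared [[Call]] code that a real engine would hold internally rather than as a closure:

    // One shared behaviour, many per-realm function objects: each realm's
    // function object has its own identity and properties, but all of them
    // delegate to the single shared implementation.
    function sharedBehaviour(thisValue, args) {
      // ... the one copy of the actual algorithm would live here ...
      return Array.prototype.slice.call(thisValue); // placeholder body
    }

    function makeFunctionForRealm() {
      return function (...args) {
        return sharedBehaviour(this, args);
      };
    }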
On Wed, Jan 14, 2015 at 1:28 AM, Brendan Eich <brendan at mozilla.org> wrote:
Before we go tl;dr on this topic, how about some data to back up the asserted problem size? Filip gently raised the question. How much memory does a realm cost in top open source engines? Fair question, empirical and (I think) not hard to answer. Burdened malloc/GC heap full cost, not net estimate from source analysis, would be best. Cc'ing Nick, who may already know. Thanks,
Well, I heard that for e.g. B2G we moved from JavaScript workers to C++ threads due to memory constraints. It might well be that this is a solvable problem in JavaScript engines with sufficient research; it's just that at the moment (and in the past) it's been blocking us from doing certain things.
SpiderMonkey and Firefox OS people I asked about this just now say the problems are not realm-specific, rather worker-specific and implementation-specific. Best to catch up with them first and get real numbers, attributed to realm and worker separately.
I can provide some insight on this from JSC’s innards.
A realm ain’t free but it is lightweight enough that we don’t really sweat creating them. It’s an object with some prototypes hanging off of it, most of which can be reified lazily if you’re paranoid about their overhead. They incur some theoretical long-term overhead due to the JIT. Even if two realms execute identical code, the JIT will optimize that code separately in the different realms. That’s not a fundamental impasse; rather it’s just how we’ve adapted to the reality of the code we see. We're currently not in a world where many realms all get super hot and run the same code. One realm might get hot enough to JIT, or multiple realms may execute the same code, but multiple realms getting hot on the same code isn’t a thing yet. If it became a thing, then we’d have some cold-hearted engineering to do.
On the other hand a worker might as well be a new process. We end up firing up a new VM instance along with its own heap. That heap and all of that VM’s resources grow and shrink without any interaction with the other VMs you’ve also spawned. Having multiple things in one VM like in the realm case gives what systems guys think of as “elasticity” - if one thing suddenly doesn’t need a resource anymore and another thing simultaneously decides that it needs that same resource, you can easily have a hand-off occur. But if your heaps are separate - and have separate GCs - then the hand-off of resources isn’t so elastic. If one heap suddenly shrinks, then the VM will probably think that it’s best to still hold on to the underlying memory in case the heap grows again soon - it won’t have any way of knowing whether or not some other VM in the same process is actively growing its heap and needs pages. You could achieve some elasticity by having multiple heaps that talk to each other a lot, but probably if you really cared, you’d just have one single heap under the hood. But that’s not how it works in JSC right now, and so workers cost you memory overhead and most of that overhead is from lost elasticity.
This isn’t really how it needs to be, long term. But because threads aren’t a thing yet in JavaScript, the VM would rather believe that none of its resources will be touched by multiple threads at the same time (other than VM-internal threads like GC and JIT), since that gives some small benefits, mostly for maintainability of the VM itself. This implies that each worker currently needs a separate VM.
You could imagine building a multi-threaded VM and then making workers be just an illusion of isolation. Then, workers would be much cheaper - they would be the cost of a realm (lightweight, as I say above) plus the cost of a thread (super cheap on modern-ish OSes). But if you are willing to go to that trouble, then you might as well also change fundamental limitations of workers such as the share-almost-nothing model.
A big challenge with self-hosting is memory consumption. A JavaScript implementation is tied to a realm and therefore each realm will have its own implementation. Contrast this with a C++ implementation of the same feature that can be shared across many realms. The C++ implementation is much more efficient.
If we want to get further with turning the web platform into a giant JavaScript library, we need to tackle this somehow.
Has anyone been thinking about how to do this and what changes it would require from JavaScript proper? We're now at the point where we can implement platform objects in terms of JavaScript, but JavaScript loses out due to lack of efficiency.
PS: Alternative explanation available here: annevankesteren.nl/2015/01/javascript