Mark S. Miller (2015-01-13T17:14:56.000Z)
d at domenic.me (2015-01-20T21:59:07.656Z)
My hypothesis is that it *requires* no further changes to JavaScript proper beyond ES5 strict mode, ES6 modules, and the inter-realm (aka "global") Symbol registry. But some future changes under discussion may help, such as extensible value types, if done right.

First, an impractical straw man ("straw man" in the negative sense, that is) that would have worked even in ES3 days, just to make a point:

When the same url-full-path.js file is loaded multiple times to populate multiple realms, the browser cache will hopefully hit most of the time, avoiding loading the source code over the web multiple times. Such cacheable URLs might, for example, be URLs on a CDN. The string representing this source code can then of course be shared across realms, and even between workers sharing an address space. With enough cleverness, large strings can even be shared between address spaces. All the code generated from this string can be re-derived from this string, so all that generated code can live in a memory-budget-limited cache. As long as the cache is big enough for the working set of code that needs to be run, a finite cache plus some per-realm bookkeeping can handle an unbounded number of realms loading the same sources.

The per-realm bookkeeping has to preserve the correspondence between the identity and state of function objects and the code describing their behavior. The code part of these function objects can, at a minimum, refer to its source string and the position in that source string of its own source code. The remaining identity and state are still per realm with no further economizing, but this is true for builtin (C++, Rust, etc) functions as well.

The above scenario "solves" the memory consumption problem, but at the cost of regenerating the code from string source on a generated-code-cache miss.
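To make the straw man concrete, here is a minimal sketch of a budget-limited generated-code cache shared by several realms. Everything in it is an assumption for illustration: `GeneratedCodeCache` and `Realm` are hypothetical names, an entry count stands in for a memory budget, `new Function` stands in for an engine's code generator, and a "realm" is just an object holding its own globals.

```javascript
// Illustrative sketch only: GeneratedCodeCache and Realm are hypothetical
// names, and `new Function` stands in for an engine's code generator.
class GeneratedCodeCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries; // stand-in for a memory budget
    this.entries = new Map();     // insertion order doubles as LRU order
    this.regenerations = 0;       // how often code was rebuilt from source
  }

  // Return generated code for `source`, regenerating it on a cache miss.
  get(source) {
    let code = this.entries.get(source);
    if (code === undefined) {
      this.regenerations += 1;
      code = new Function('globals', source);
    } else {
      this.entries.delete(source); // re-inserted below as most recently used
    }
    this.entries.set(source, code);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry; it can always be re-derived.
      this.entries.delete(this.entries.keys().next().value);
    }
    return code;
  }
}

class Realm {
  constructor(cache, globals) {
    this.cache = cache;
    this.globals = globals; // per-realm state, never shared
  }

  // Per-realm bookkeeping: each record has its own identity and remembers
  // which source string describes its behavior.
  load(source) {
    return { realm: this, source };
  }

  call(fn) {
    return this.cache.get(fn.source)(this.globals);
  }
}

const sharedCache = new GeneratedCodeCache(1);
const realmA = new Realm(sharedCache, { x: 1 });
const realmB = new Realm(sharedCache, { x: 10 });
const source = 'return globals.x + 1;';
const fnA = realmA.load(source);
const fnB = realmB.load(source);
console.log(realmA.call(fnA), realmB.call(fnB), sharedCache.regenerations);
// prints: 2 11 1
```

Both realms run the same source against their own state, and the code is generated only once. If the entry is evicted, the next call simply regenerates it, which is the time cost the straw man trades for bounded memory.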
Much of the time spent regenerating from string sources goes to lexing and parsing, which are context independent even in ES3, so we can cache some immutable representation of the parsed form rather than the source strings, probably taking more space, but reducing the regeneration time. Next is scope analysis, which is stable up to free variables (typically globals) in ES5 strict mode code, so this can be cached inter-realm as well. ES6 modules bring even more stability to scope analysis, provided that our cache-hit test takes transitive imports into account as well. The inter-realm Symbol registry gives us an inter-realm namespace that we can use for reliable runtime inter-realm brand testing, public slot naming, and duck typing.

Note that all modern JS engines JIT compile to generate the actual machine code, which they invalidate when assumptions change, so all actual machine code is in an invalidate-able cache that cannot be shared between realms. This is an irreducible cost compared to a builtin (C++, Rust, etc) implementation. The point of the inter-realm cache is to reduce the time taken to repopulate this unsharable part. Both caches can be memory-budget limited.

Unfortunately, browser caches do not test cache hits on a sound basis. Really, we need soundly cacheable code URLs to carry a cryptographic hash in the URL, where the browser only considers the loaded content to be valid if its hash matches. Then a browser can cache and reuse soundly based on hash match. <https://www.tahoe-lafs.org/trac/tahoe-lafs> refers to such URLs as "self-authenticating designators". See also the threads rooted at

http://www.eros-os.org/pipermail/e-lang/2000-January/003188.html

http://www.eros-os.org/pipermail/e-lang/2000-January/003194.html

and the message at

http://www.eros-os.org/pipermail/e-lang/2009-April/013098.html

on hashing the transitive closure of import dependencies, rather than hashing each module independently.
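As a rough illustration of hashing the transitive closure rather than each module independently, the sketch below derives one cache key from a module and everything it imports. The module graph shape, the `closureHash` name, and the choice of SHA-256 via Node's `crypto` module are all assumptions made for the example; none of this is an existing browser mechanism.

```javascript
const { createHash } = require('crypto');

// Hash a single source text.
function sha256Hex(text) {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Hypothetical graph shape: Map of name -> { source, imports: [names] }.
// Returns one hash covering `root` and every module it transitively imports.
function closureHash(graph, root) {
  const seen = new Set();
  const pending = [root];
  while (pending.length > 0) {
    const name = pending.pop();
    if (seen.has(name)) continue; // also makes import cycles terminate
    seen.add(name);
    const mod = graph.get(name);
    if (mod === undefined) throw new Error(`unknown module: ${name}`);
    pending.push(...mod.imports);
  }
  // Sort so the result does not depend on traversal order.
  const hash = createHash('sha256');
  for (const name of [...seen].sort()) {
    hash.update(JSON.stringify([name, sha256Hex(graph.get(name).source)]));
  }
  return hash.digest('hex');
}

const graph = new Map([
  ['app.js', { source: 'import "./lib.js";', imports: ['lib.js'] }],
  ['lib.js', { source: 'export const n = 1;', imports: [] }],
  ['other.js', { source: '// unrelated', imports: [] }],
]);
const before = closureHash(graph, 'app.js');
graph.set('lib.js', { source: 'export const n = 2;', imports: [] });
console.log(before === closureHash(graph, 'app.js')); // prints: false
```

A cache keyed this way misses whenever any transitive import changes, which is the property the scope-analysis cache needs, while edits to modules outside the closure leave the key untouched.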
In retrospect, I expect this further logic to be past the point of diminishing returns. But none of this has been subjected to any measurements.

A political problem arises in the first step -- the browser cache. No one expects a browser to provide an affordance to remove its C++ builtins, forcing them (if there were such an option) to be reloaded over the web. By contrast, all browsers provide, and must provide, an affordance to clear their caches. Under current assumptions, this forces all externally loaded code to be reloaded the next time it is fetched. The extensible web agenda needs to come to grips with this political problem. I don't know how.