Spawn proposal strawman

# Mark S. Miller (16 years ago)

On Wed, May 6, 2009 at 10:53 AM, Mark S. Miller <erights at google.com> wrote:

[...] For the Valija-like level, I think the most important enabler would be some kind of hermetic eval or spawn primitive for making a new global context (global object and set of primordials) whose connection to the world outside itself is under control of its spawner. With such a primitive, we would no longer need to emulate inheritance and mutable globals per sandbox.

May as well get us started with a strawman, intended only to provide the global object separations, not any scheduling separations. Whether these two issues should in fact be coupled or decoupled is an interesting question.

eval.hermetic(program :(string | AST), thisValue, optBindings) :any
eval.spawn(optBindings, optGlobalSubsetDirective :Opt(string)) :Sandbox

interface Sandbox {
  public getGlobal() :GlobalObject;
  public deactivate(problem :string) :void;
}

eval.hermetic() does an indirect eval of the program in the current global context, but, as in strawman:lexical_scope, without the global object at the bottom of the scope chain.

  • Unlike strawman:lexical_scope, the 'this' value in scope at the top of the evaluated program is the provided thisValue; so a caller without access to their own global object cannot grant access they don't have. If the 'thisValue' is absent, then the global 'this' of the evaled program is undefined.
  • If optBindings is absent or falsy, then the bottom of the scope chain is populated with the ES spec defined global bindings, i.e., {Object: Object, Array: Array, etc...}. If optBindings is provided and truthy, then its own properties (or perhaps its enumerable properties, or its enumerable own properties?) are used to populate the bottom of the scope chain. In either case, the object at the bottom of a hermetic scope chain is a declarative environment record, not an object environment record.
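The Function constructor available today can roughly approximate the eval.hermetic() semantics described above. It is only a sketch. `hermeticSketch` is a hypothetical name, and the program must use `return` rather than yielding a completion value. Unlike the proposal, free variables still resolve against the real global object. The sketch does show two things: the 'this' rule (strict mode leaves it undefined when no thisValue is given), and bindings taken from optBindings' own enumerable properties into a declarative (parameter) scope.

```javascript
// Hypothetical approximation of eval.hermetic(program, thisValue, optBindings).
// Caveat: code built with the Function constructor still resolves free
// variables against the real global object; only `this` and the supplied
// bindings are controlled here.
function hermeticSketch(program, thisValue, optBindings) {
  const bindings = optBindings || {};
  // Own enumerable properties become parameters: a declarative scope,
  // not an object environment record.
  const names = Object.keys(bindings);
  const values = names.map((name) => bindings[name]);
  // Strict mode keeps `this` exactly as supplied (undefined when absent)
  // instead of defaulting to the global object.
  const fn = new Function(...names, '"use strict";\n' + program);
  return fn.apply(thisValue, values);
}

console.log(hermeticSketch("return x + 1;", undefined, { x: 41 }));  // 42
console.log(hermeticSketch("return this;"));                         // undefined
console.log(hermeticSketch("return this.tag;", { tag: "granted" })); // granted
```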

eval.spawn() makes a new global context -- a global object and a set of primordials. This new global context is subsidiary to the present one for (at least) deactivation and language subsetting purposes, so spawning forms a tree.

  • If optBindings is provided, its own properties are copied over to populate the newly created global object.
  • If optGlobalSubsetDirective is provided, then all code evaluated in this global context is constrained as if by an outer lexical use subset directive. Subset constraints compose across subsidiary spawns -- as if the optGlobalSubsetDirective of outer sandboxes were yet more outer lexical use subset directives.
  • It returns a Sandbox object for interacting with that new global context: obtaining its global object or deactivating all objects created within this global context or any subsidiary global contexts. If there are any live stack frames executing within a context that would be deactivated at the time sandbox.deactivate() is called, then deactivate throws and has no other effect.
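A few lines suffice to model the tree-shaped bookkeeping behind eval.spawn(). Subset directives compose. Deactivation cascades to subsidiary contexts but is refused while frames are live. This is a hypothetical sketch: `SandboxSketch`, `run`, and the directive names are invented, and `global` is a plain object rather than a fresh set of primordials.

```javascript
// Hypothetical model of eval.spawn() bookkeeping: a tree of global contexts in
// which subset directives accumulate and deactivation cascades to children.
class SandboxSketch {
  constructor(parent, optBindings, optSubsetDirective) {
    this.parent = parent;
    this.children = [];
    this.active = true;
    this.problem = null;
    this.liveFrames = 0; // stand-in for stack frames executing in this context
    // Outer directives remain in force; an inner one is layered on top.
    this.subsets = parent ? parent.subsets.slice() : [];
    if (optSubsetDirective) this.subsets.push(optSubsetDirective);
    // Own enumerable properties of optBindings populate the new global object.
    this.global = Object.assign(Object.create(null), optBindings || {});
    if (parent) parent.children.push(this);
  }

  spawn(optBindings, optSubsetDirective) {
    return new SandboxSketch(this, optBindings, optSubsetDirective);
  }

  getGlobal() {
    if (!this.active) throw new Error("deactivated: " + this.problem);
    return this.global;
  }

  // Runs `thunk` "inside" this context so deactivate() can see the frame.
  run(thunk) {
    this.liveFrames += 1;
    try {
      return thunk(this.getGlobal());
    } finally {
      this.liveFrames -= 1;
    }
  }

  hasLiveFrames() {
    return this.liveFrames > 0 || this.children.some((child) => child.hasLiveFrames());
  }

  deactivate(problem) {
    // All or nothing: refuse while any frame in this subtree is executing.
    if (this.hasLiveFrames()) throw new Error("cannot deactivate: live frames");
    this.active = false;
    this.problem = problem;
    this.children.forEach((child) => child.deactivate(problem));
  }
}

const root = new SandboxSketch(null);
const outer = root.spawn({ greeting: "hi" }, "cajita");
const inner = outer.spawn({}, "no-eval");
console.log(inner.subsets.join(" > "));    // cajita > no-eval
console.log(outer.run((g) => g.greeting)); // hi

let refused = false;
try {
  outer.run(() => outer.deactivate("too early"));
} catch (e) {
  refused = true;
}
console.log(refused); // true

outer.deactivate("done");
console.log(inner.active); // false
```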

One then runs code hermetically within a sandbox by

sandbox.getGlobal().eval.hermetic(...)

Given good catchalls and weak-key-tables, good membranes should be possible. Indeed, this should be a litmus test of catchall proposals. Given good membranes, one can easily gain other desired security properties by interposing membranes around some of the above objects.
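The catchalls and weak-key tables mentioned here were later standardized as Proxy and WeakMap (ES2015). Under that assumption, a minimal one-directional membrane looks roughly like the sketch below. `makeMembrane` is a hypothetical helper. A production membrane would also wrap values flowing inward, and would use shadow targets to respect proxy invariants on frozen objects.

```javascript
// Minimal one-directional membrane: everything reached through the wrapper is
// itself wrapped, and revoke() cuts all of it off at once.
function makeMembrane(target) {
  let revoked = false;
  const proxies = new WeakMap();   // original -> proxy, so identity is preserved
  const originals = new WeakMap(); // proxy -> original, for unwrapping

  function check() {
    if (revoked) throw new TypeError("membrane revoked");
  }
  function unwrap(value) {
    return originals.has(value) ? originals.get(value) : value;
  }
  function wrap(value) {
    if (Object(value) !== value) return value; // primitives cross unchanged
    if (originals.has(value)) return value;    // already one of our proxies
    if (proxies.has(value)) return proxies.get(value);
    const proxy = new Proxy(value, {
      get(obj, key) {
        check();
        return wrap(Reflect.get(obj, key));
      },
      set(obj, key, v) {
        check();
        return Reflect.set(obj, key, unwrap(v));
      },
      apply(fn, thisArg, args) {
        check();
        return wrap(Reflect.apply(fn, unwrap(thisArg), args.map(unwrap)));
      },
    });
    proxies.set(value, proxy);
    originals.set(proxy, value);
    return proxy;
  }
  return { proxy: wrap(target), revoke() { revoked = true; } };
}

const inside = {
  counter: { n: 0 },
  bump() { this.counter.n += 1; return this.counter; },
};
const { proxy, revoke } = makeMembrane(inside);
const c = proxy.bump();           // the returned object comes back wrapped
console.log(c.n);                 // 1
console.log(c === proxy.counter); // true: one proxy per original object
revoke();
try {
  void c.n;
} catch (e) {
  console.log(e.message);         // membrane revoked
}
```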

# Kris Kowal (16 years ago)

On Wed, May 6, 2009 at 12:59 PM, Mark S. Miller <erights at google.com> wrote:

On Wed, May 6, 2009 at 10:53 AM, Mark S. Miller <erights at google.com> wrote:

[...] For the Valija-like level, I think the most important enabler would be some kind of hermetic eval or spawn primitive for making a new global context (global object and set of primordials) whose connection to the world outside itself is under control of its spawner. With such a primitive, we would no longer need to emulate inheritance and mutable globals per sandbox.

Before I dive into inline comments, I'll lay out some background for the discussion.

There are now seven prototypes for the securable module proposal in the server-side JavaScript group. The spec is weak on implementation details, so we're seeing a variety of ways to implement what is effectively the "salty" or "transitional" require/exports system from Ihab's and my proposal in January. While hermetic eval is all we "need", some of the implementations benefit from a more specialized "module evaluator".

As was pointed out in Mountain View, modules must be "program" constructions. We also want module factory functions to be reusable factory methods that accept "require" and "exports" (and "system" for dependency injection, but that is an orthogonal matter and still a subject of debate).

When we use hermetic eval, we coax it to return a module factory function by injecting the text of the module into a function constructor.

"(function (require, exports) {" + text + "/**/\n})"

This doesn't enforce the "program" construction, and some of the JavaScript language semantics suffer for it. For example, function statements aren't defined and do not necessarily get bound to the module scope in some implementations. It also is "vulnerable" to "injection" style attacks, like:

"}); /* run at load-time */ (function (require, exports) {"
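The sketch below makes the injection concrete. It applies the wrapper string (with its closing `})`) and an indirect eval to the hostile text. `naiveFactory` and `loadTimeLog` are illustrative names. The payload runs when the module is loaded, before anyone calls the factory.

```javascript
// Wraps module text by string concatenation, then evaluates it as global code.
function naiveFactory(text) {
  const source = "(function (require, exports) {" + text + "/**/\n})";
  return (0, eval)(source); // indirect eval: completion value of the last statement
}

globalThis.loadTimeLog = [];
const hostile =
  "}); loadTimeLog.push('ran at load time'); (function (require, exports) {";
const factory = naiveFactory(hostile);

console.log(loadTimeLog.length); // 1: the payload ran before the factory was called
console.log(typeof factory);     // function
```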

Some of the implementations avoid these problems in various ways with various side-effects. GPSEE and Helma NG parse the module as a program and save the AST in the module loader's memo. Then, "require" creates a fresh context and scope. The scope contains "require" and "exports" and is parented in the sandbox's primordial or global object. This has its own set of implications:

  • "this" is the module instance's unique scope object at the top level.
  • in functions that are called anonymously, "this" is the module instance's unique scope object, unless the interpreter is very recent and strict (I'm vague on the details of this change to the language).
  • the primordials can only be accessed as free variables or as members of "__proto__", if that's supported.
  • "require" and "exports" are accessible and mutable on the module instance's unique scope object.

In my opinion, it would be ideal if:

  • "this" were "undefined" in the top scope.
  • "this" were "undefined" in functions that are called anonymously.
  • the primordials could only be accessed as free variables.
  • "require" and "exports" were only accessible as free variables.
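The ES5 strict-mode change alluded to above is what makes the second wish achievable. A function called without a receiver sees `this` as undefined in strict code, but as the global object in sloppy code. The sketch uses the Function constructor so the result does not depend on whether the surrounding file is itself strict.

```javascript
// Functions built by the Function constructor are sloppy unless their body
// opts in, regardless of the surrounding file's mode.
const sloppyThis = new Function("return this;");
const strictThis = new Function('"use strict"; return this;');

console.log(sloppyThis() === globalThis); // true: anonymous call gets the global object
console.log(strictThis() === undefined);  // true: strict mode leaves `this` undefined
```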

At this stage, there are two kinds of module systems we can implement, that vary in character. They vaguely correspond to the character of Cajita and Valija respectively (strict/simple/fast/easy vs permissive). The Valija loader has a unique tree of primordial objects for each sandbox. The Cajita loader only needs one, deeply frozen primordial tree. (Note: in my present server-side experiments, there are two global trees: one is the original, thawed global object in the bootstrapping context, with which I make the loader and attenuate other authorities like file system access. In this case, there's an outer module system that generates its file API from its ambient authority, and an inner module system where all authority flows through the "system" free-variable or module).

Valija-style:

  • the transitive primordials can be monkey patched
  • sandboxes are expensive: not only do you need to create fresh primordials for each sandbox, you cannot share module factory functions among sandboxes since module factory functions close over their primordials.
  • matching the types of objects passed among sandboxes is hard. A lot of the time, this means that serializing and de-serializing objects as JSON across sandbox membranes (much like worker threads) will save a lot of frustration. However, this would not be possible for functions, which means no object capabilities. With herculean effort, we could use shadowing/proxy-objects among sandboxes, much like we presently have to do between client and server for RPC.

Cajita-style:

  • for both good and ill, the primordials cannot be monkey patched
  • sandboxes are cheap
  • objects can be shared among sandboxes
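In the Cajita-style case, one way to get a single shareable tree is to deep-freeze it once. The sketch freezes a stand-in object rather than the real primordials, since freezing Object.prototype would affect the whole process. `deepFreeze` and the sandbox objects are hypothetical.

```javascript
// Deep-freezes an object graph by walking own properties (data values and
// accessor functions). Prototypes are deliberately not followed, so the
// process-wide Object.prototype and Function.prototype are left untouched.
function deepFreeze(root) {
  const seen = new Set();
  const pending = [root];
  while (pending.length > 0) {
    const value = pending.pop();
    const isObject =
      (typeof value === "object" && value !== null) || typeof value === "function";
    if (!isObject || seen.has(value)) continue;
    seen.add(value);
    Object.freeze(value);
    for (const key of Reflect.ownKeys(value)) {
      const desc = Object.getOwnPropertyDescriptor(value, key);
      if ("value" in desc) pending.push(desc.value);
      else pending.push(desc.get, desc.set);
    }
  }
  return root;
}

// One frozen tree, shared by two hypothetical Cajita-style sandboxes.
const sharedGlobals = deepFreeze({ util: { add: (a, b) => a + b } });
const sandboxA = { globals: sharedGlobals };
const sandboxB = { globals: sharedGlobals };

console.log(sandboxA.globals === sandboxB.globals);       // true
console.log(Object.isFrozen(sharedGlobals.util));          // true
console.log(Reflect.set(sharedGlobals.util, "add", null)); // false: no monkey patching
console.log(sandboxB.globals.util.add(2, 3));              // 5
```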

With the server-side JavaScript group, I've been working on one prototype called Narwhal with Tom Robinson. It runs on Rhino (with Helma, GAE, or bare-bones) or V8 (with K7 or v8cgi) so far. There's a kernel loader that boot-straps itself using a "module evaluator" function, a file reader function, a file existence tester function, and a copy of the system environment object. It uses the kernel-loader to load up a sandbox module and its transitive dependencies. These dependencies include a platform-specific file API module, which gets its authority to access the file system from ambient authorities like the Java package name spaces or dynamically-loaded FFI modules. Then, it loads your application module.

At this point, you have a module system and all the ambient authority you normally have in JavaScript. You can then elect to create attenuated sandboxes. This is a compromise. It's my hope that the value of having sandboxes will compel people to use them as often as possible, and write modules that will work with their restrictions (no monkey-patching of globals/primordials). I think that Kevin Dangoor and the ServerJS community are in agreement that the ServerJS standard library, at least, will abide by the rules of the sandbox.

That being said, it will occasionally be necessary to use modules that do not play well with these rules. When that's the case, we could still support Valija-style modules by providing an alternate interface that creates new primordial trees for every sandbox and just suck up the performance loss and communication complexity.

So, onward to the straw man:

eval.hermetic(program :(string | AST), thisValue, optBindings) :any

The most closely matching interface for the needs of Caja-style modules would be:

eval.hermetic(program:String)
    :Function(require:Function, exports:Object)
    :Undefined

So a module could be loaded:

var factory = eval.hermetic(program);

And instantiated:

let require = Require(baseId);
let exports = {};
memo[baseId] = exports;
factory(require, exports);
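Storing the exports object in the memo before calling the factory (as above) is what lets cyclic dependencies terminate. The second module receives the first one's partially filled exports. The sketch assumes the Function constructor as a stand-in for eval.hermetic and uses in-memory module texts; `sources` and `makeFactory` are hypothetical.

```javascript
// Hypothetical module texts; a real loader would fetch these from storage.
const sources = {
  a: "exports.name = 'a'; var b = require('b'); exports.seenFromA = b.name;",
  b: "exports.name = 'b'; var a = require('a'); exports.seenFromB = a.name;",
};
const memo = Object.create(null); // module id -> exports

// Stand-in for eval.hermetic(program): returns a factory(require, exports).
function makeFactory(text) {
  return new Function("require", "exports", '"use strict";\n' + text);
}

// baseId would be used to resolve relative ids; this sketch only handles
// top-level ids, so it is ignored here.
function Require(baseId) {
  return function require(id) {
    if (id in memo) return memo[id];
    const exports = {};
    memo[id] = exports; // memoize *before* running the factory
    makeFactory(sources[id])(Require(id), exports);
    return exports;
  };
}

const a = Require("")("a");
console.log(a.seenFromA);      // b
console.log(memo.b.seenFromB); // a (b saw a's partially built exports)
```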

As for the value of "this", I'd hope for "undefined" or the frozen globals, but that poses a problem for the GPSEE/Helma-style implementation. If the top and only scope for a module is "undefined", or even a frozen object, there's nowhere to put top-level declarations.

There are a couple of ways to dodge this issue: we could either beget the top-level object from globals, or we could make the initial scope chain contain both the frozen globals and an empty function closure for module locals. Both of these solutions are a stretch to implement in present-day *Monkey, Rhino, and presumably all others (which, I hope we agree, is a red herring). To address the former problem, the engine would need to provide a program evaluator that distinguishes the top-level scope from the "this" value passed to anonymous functions. To address the latter problem, the engine would need to provide a program evaluator that begins with two scopes, one with globals, the other for module locals, wherein top-level declarations (and potentially free assignment, if that can't simply be culled from modules) are bound to the locals instead of the globals. I believe this addresses issues with Mark's idea:

eval.hermetic() does an indirect eval of the program in the current global context, but, as in strawman:lexical_scope, without the global object at the bottom of the scope chain.

Implementations would need to decouple the top of the scope chain and the global object.

We could fall back to a consistent value for "this", like a module scope object or a singleton frozen global scope object (since there's no need for more than one frozen global scope object).

For the Cajita style, there is no need to expose initGlobalObjects (a.k.a. "eval.spawn") to the user, since a single deeply-frozen global scope object can be closed over by "eval.hermetic" and safely shared among sandboxes. There's also no need to add bindings to the global object.

Then, addressing Valija-style:

eval.spawn(optBindings, optGlobalSubsetDirective :Opt(string)) :Sandbox

The most closely matching interface for the needs of Valija-style modules would be:

eval.hermetic(program:(String|AST), global:Object, local:Object)
eval.spawn() :Object

I presume that, like Mark's optBindings, both global and local might not be the exact objects provided in the sandbox, and that we might deem it necessary to copy the [enumerable] members of those objects into lexical scope frames to avoid prototype inheritance slip-ups.

So, a module could be run by:

let require = Require(baseId);
let exports = memo[baseId] = {};
let sandbox = eval.spawn();
evaluate(program, sandbox, {require, exports});

With the addition of an "eval.parse(text):AST", we could recover some of the performance lost with this method by sharing ASTs among sandboxes.

Loading:

var ast = eval.parse(program);
programs[baseId] = ast;

Instantiating:

let require = Require(baseId);
let exports = memo[baseId] = {};
let sandbox = eval.spawn();
evaluate(programs[baseId], sandbox, {require, exports});

It might be desirable for spawn to provide both deeply frozen and thawed global trees, or for hermetic to use its own deeply frozen singleton if no scope is provided.

Once:

var sandbox = eval.spawn(true);

Loading:

var ast = eval.parse(program);
programs[baseId] = ast;

Instantiating:

let require = Require(baseId);
let exports = memo[baseId] = {};
evaluate(programs[baseId], sandbox, {require, exports});

interface Sandbox {
  public getGlobal() :GlobalObject;
  public deactivate(problem :string) :void;
}

I would hope that deactivation would be implicit by dropping all references to the sandbox, apart from its internal references.

My present Sandbox objects are Functions that look like (please forgive my pseudo-syntax):

interface Sandbox({loader:Loader, modules:Object, system:Object}) {
  public getSystem() :Object;
  public getLoader() :Loader;
  public invoke(mainId) :Object; // returns the exports
}

getGlobal() would be necessary for the Valija-style, and for executing non-module code snippets in the sandbox.

I find that there's only ever need of one sandbox implementation. It works for both client-side, and server-side. It works for secure and insecure boxes. The only thing that ever seems to vary for me is what type to use for the module exports memo object (a usual Object, or something more elaborate like a Map).

interface Loader {
  public resolve(id, [baseId]);
  public load(resolvedId);
}

That's the minimum interface for Loader as used by Sandbox. Implementations generally have:

interface Loader {
  public resolve(id, [baseId]);
  public load(resolvedId); // fetch->evaluate->memoize->return
  public fetch(resolvedId); // storage dependent, attenuated
  public evaluate(text, resolvedId); // hermetic eval
  public reload(resolvedId); // for convenience
  public isLoaded() :Boolean; // convenience
  public canLoad(resolvedId) :Boolean; // for multi-loader searching
}
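A minimal in-memory Loader following the interface above, plus a sandbox built on it, might look like the sketch below. Several pieces are assumptions:

- `MemoryLoader` and `makeSandbox` are hypothetical names.
- `isLoaded` takes a resolvedId here, which the listed signature omits.
- Relative ids are resolved against the directory of baseId.
- The Function constructor stands in for hermetic eval.

Factories are memoized in the loader and shared. Each sandbox keeps its own exports memo, which is a Cajita-style assumption.

```javascript
class MemoryLoader {
  constructor(sources) {
    this.sources = sources;               // resolvedId -> module text
    this.factories = Object.create(null); // resolvedId -> memoized factory
  }
  // "./x" and "../x" resolve against baseId's directory; other ids are top-level.
  resolve(id, baseId) {
    if (!id.startsWith("./") && !id.startsWith("../")) return id;
    const parts = baseId ? baseId.split("/").slice(0, -1) : [];
    for (const part of id.split("/")) {
      if (part === "..") parts.pop();
      else if (part !== ".") parts.push(part);
    }
    return parts.join("/");
  }
  canLoad(resolvedId) {
    return Object.prototype.hasOwnProperty.call(this.sources, resolvedId);
  }
  isLoaded(resolvedId) {
    return resolvedId in this.factories;
  }
  fetch(resolvedId) {
    if (!this.canLoad(resolvedId)) throw new Error("cannot load " + resolvedId);
    return this.sources[resolvedId];
  }
  // Stand-in for hermetic eval; resolvedId is unused in this sketch.
  evaluate(text, resolvedId) {
    return new Function("require", "exports", '"use strict";\n' + text);
  }
  load(resolvedId) {
    if (!this.isLoaded(resolvedId)) {
      this.factories[resolvedId] = this.evaluate(this.fetch(resolvedId), resolvedId);
    }
    return this.factories[resolvedId];
  }
  reload(resolvedId) {
    delete this.factories[resolvedId];
    return this.load(resolvedId);
  }
}

// A sandbox owns its exports memo; the loader (and its factories) can be shared.
function makeSandbox(loader) {
  const memo = Object.create(null);
  function makeRequire(baseId) {
    return function require(id) {
      const resolvedId = loader.resolve(id, baseId);
      if (!(resolvedId in memo)) {
        memo[resolvedId] = {};
        loader.load(resolvedId)(makeRequire(resolvedId), memo[resolvedId]);
      }
      return memo[resolvedId];
    };
  }
  return { invoke: (mainId) => makeRequire("")(mainId) };
}

const loader = new MemoryLoader({
  "lib/math": "exports.add = function (a, b) { return a + b; };",
  "lib/main": "exports.sum = require('./math').add(2, 3);",
});
const box1 = makeSandbox(loader);
const box2 = makeSandbox(loader);
console.log(box1.invoke("lib/main").sum);                         // 5
console.log(box1.invoke("lib/main") === box2.invoke("lib/main")); // false: separate memos
console.log(loader.isLoaded("lib/math"));                         // true
```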

  • Unlike strawman:lexical_scope, the 'this' value in scope at the top of the evaluated program is the provided thisValue; so a caller without access to their own global object cannot grant access they don't have. If the 'thisValue' is absent, then the global 'this' of the evaled program is undefined.

This is very important, and can be satisfied many ways.

eval.spawn() makes a new global context -- a global object and a set of primordials. This new global context is subsidiary to the present one for (at least) deactivation and language subsetting purposes, so spawning forms a tree.

I'm not sure what these details about subsidization and deactivation are meant to address. I envision that the details would emerge from the garbage collector. If there are multiple garbage collectors with disjointly managed but cross-referenced objects (like the DOM and JS today)…well, just let me know. I don't want to make a fuss about a phantom, but I have thoughts about how that would be a problem and how we could address it.

  • If optGlobalSubsetDirective is provided, then all code evaluated in this global context is constrained as if by an outer lexical use subset directive. Subset constraints compose across subsidiary spawns -- as if the optGlobalSubsetDirective of outer sandboxes were yet more outer lexical use subset directives.

So that sandboxes can become recursively more restrictive, but not implicitly add objects they did not receive, I presume. This would apply to the Valija world, where you might want to pass the same monkey patching you've done to yourself onto your children. I doubt this could be pulled off well, since it would involve not merely enumerating the shallow properties, but merging the properties of the entire tree. I think it would be better if recursive sandboxes were responsible for monkey patching themselves with the modules they have access to, even though this would have a performance penalty.

One then runs code hermetically within a sandbox by

sandbox.getGlobal().eval.hermetic(...)

Interesting.

Given good catchalls and weak-key-tables, good membranes should be possible. Indeed, this should be a litmus test of catchall proposals. Given good membranes, one can easily gain other desired security properties by interposing membranes around some of the above objects.

Exciting. I've definitely felt the need for weak references. I'm not sure why they're necessary for the module loader though, since dropping a sandbox should be as simple (or complicated) as ditching all references you possess to the sandbox or its contents. Wes, Ihab, and I discussed at length what it would mean if the module memo table were weak, but resolved that it would be of little practical value and that the motivating problem would be better solved with generational garbage collection.

In summary, I think that this straw man tackles too much complexity in the line of appeasing monkey-patchers, but appears to do so well. I've got a sneaking suspicion that monkey-patching globals has such strong support that we must permit it. If that's the case, I hope we can provide a switch on the sandbox machinery so the application programmer can choose whether they want light-weight sandboxes or heavy ones.

Kris Kowal

# kevin curtis (16 years ago)

Re: eval.hermetic(program :(string | AST), thisValue, optBindings) :any

Is a 'canonical' AST part of the plans for ECMAScript 6/Harmony?

# David-Sarah Hopwood (16 years ago)

kevin curtis wrote:

Re: eval.hermetic(program :(string | AST), thisValue, optBindings) :any

Is a 'canonical' AST part of the plans for ecmascript 6/harmony.

I hope so; that would be extremely useful. I would like to see an ECMAScript source -> AST parser (as well as an AST evaluator) in the Harmony standard library.

# Eugene Lazutkin (16 years ago)

+1 from me too. AST + tools (compile JS to AST, get AST from a function object, produce a function object from AST, simple AST manipulations, and so on) will help us (JS developers) tremendously by reducing the existing hackery, and increasing the performance, while clearing the road for the various meta-programming techniques.

Eugene Lazutkin Dojo Toolkit, Committer lazutkin.com

# Brendan Eich (16 years ago)

On May 9, 2009, at 9:19 AM, David-Sarah Hopwood wrote:

kevin curtis wrote:

Re: eval.hermetic(program :(string | AST), thisValue, optBindings) :any

Is a 'canonical' AST part of the plans for ecmascript 6/harmony.

I hope so; that would be extremely useful. I would like to see an ECMAScript source -> AST parser (as well as an AST evaluator) in the Harmony standard library.

We've wanted this since early in ES4 days. It would help many projects
and experimental extensions (type checkers, template systems, macro
processors, etc.) to have a standard AST, which could be serialized to
JSON.

The AST should not be lowered much, if at all. I believe users will
want to capture idioms and nuances in the source. As you've noted re:
+= and kin, the language has high-level compound semantics without a
low form that can express Reference (lvalue) computation, [[Put]], etc.

I am thinking of implementing support for a JSON-friendly AST in
SpiderMonkey to speed up Narcissus (hat tip: Blake Kaplan suggested
this). Thus Narcissus could use the host JS compiler to produce ASTs
it then interprets (as it currently produces trees, slowly, using its
own self-hosted regexp-based lexer and recursive-descent/operator-precedence parser). I'll post a link to the bug so people can track
progress if they are interested.

With some successful experiments, perhaps we can converge quickly on a
standardizable high-level AST.

Suggestions welcome on any issues that crop up. Here's one: && and ||
want to group to the right, for short-circuiting evaluation, yet the
ES1-5 grammars use standard left-recursive bottom-up-style productions:

LogicalORExpression: See 11.11
    LogicalANDExpression
    LogicalORExpression || LogicalANDExpression

This will produce a tree for X || Y || Z shaped like this initializer
(skipping tedious quoting of property names):

{op: "||", left: {op: "||", left: {op: "Id", value: "X"}, right: {op: "Id", value: "Y"}}, right: {op: "Id", value: "Z"}}

A definitional interpreter that recursively evaluates nodes can handle
this easily enough, although the right-recursive AST alternative,
which is equivalent AFAIK, would be better for avoiding recursion. A
simple code generator, on the other hand, definitely wants the right-recursive tree:

{op: "||", left: {op: "Id", value: "Z"}, right: {op: "||", left: {op: "Id", value: "X"}, right: {op: "Id", value: "Y"}}}

so that, at each recursion level, branches to the next bytecode after
the expression can be emitted and fixed up using local variables only.
Of course backpatching could be used (and must be used for other
control structures in the language), but all else equal, || and &&
want to be right-associated in the AST.
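A toy one-pass generator illustrates the code-generation argument. `OR_JUMP` is an invented instruction: if the top of the stack is truthy, jump to the target keeping it; otherwise pop it and fall through. Only `||` is handled. Each jump is fixed up through a local reference, as described above. With the left-associated tree, a truthy X jumps to the second OR_JUMP and is tested again. With the right-associated tree, it jumps straight past the whole expression.

```javascript
// Emits a toy stack-machine listing for || chains, fixing up each forward
// jump through a local reference once the right operand has been emitted.
function gen(node, code) {
  if (node.op === "Id") {
    code.push(["LOAD", node.value]);
    return code;
  }
  gen(node.left, code);
  const jump = ["OR_JUMP", null];
  code.push(jump);
  gen(node.right, code);
  jump[1] = code.length; // target: first instruction after this expression
  return code;
}

const id = (value) => ({ op: "Id", value });
const leftAssoc = { op: "||", left: { op: "||", left: id("X"), right: id("Y") }, right: id("Z") };
const rightAssoc = { op: "||", left: id("X"), right: { op: "||", left: id("Y"), right: id("Z") } };
const listing = (code) => code.map(([op, arg]) => op + " " + arg).join("; ");

console.log(listing(gen(leftAssoc, [])));  // LOAD X; OR_JUMP 3; LOAD Y; OR_JUMP 5; LOAD Z
console.log(listing(gen(rightAssoc, []))); // LOAD X; OR_JUMP 5; LOAD Y; OR_JUMP 5; LOAD Z
```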

Users chain || and && with the same op, without parentheses. GCC-nagged programmers know to parenthesize && against || even though &&
is higher precedence, to avoid a warning.

The issue here seems to be that the ES1-5 grammars over-specify the AST
in some cases. If we do standardize the AST, then we have to choose,
and I'm in favor of right association for these logical connective
ops, since they short-circuit evaluation based on truthiness (||) or
falsiness (&&) of the left operand. Thoughts?

# Brendan Eich (16 years ago)

On May 9, 2009, at 11:57 AM, Brendan Eich wrote:

{op: "||", left: {op: "||", left: {op: "Id", value: "X"}, right: {op: "Id", value: "Y"}}, right: {op: "Id", value: "Z"}}

A definitional interpreter that recursively evaluates nodes can
handle this easily enough, although the right-recursive AST
alternative, which is equivalent AFAIK, would be better for avoiding
recursion. A simple code generator, on the other hand, definitely
wants the right-recursive tree:

{op: "||", left: {op: "Id", value: "Z"}, right: {op: "||", left: {op: "Id", value: "X"}, right: {op: "Id", value: "Y"}}}

Of course I got sleepy after cutting and pasting, and didn't edit the
values ;-). That should be:

{op: "||", left: {op: "Id", value: "X"}, right: {op: "||", left: {op: "Id", value: "Y"}, right: {op: "Id", value: "Z"}}}

# David-Sarah Hopwood (16 years ago)

Brendan Eich wrote:

With some successful experiments, perhaps we can converge quickly on a standardizable high-level AST.

Suggestions welcome on any issues that crop up. Here's one: && and || want to group to the right, for short-circuiting evaluation, yet the ES1-5 grammars use standard left-recursive bottom-up-style productions:

LogicalORExpression: See 11.11
    LogicalANDExpression
    LogicalORExpression || LogicalANDExpression

This will produce a tree for X || Y || Z shaped like this initializer (skipping tedious quoting of property names):

{op: "||", left: {op: "||", left: {op: "Id", value: "X"}, right: {op: "Id", value: "Y"}}, right: {op: "Id", value: "Z"}}

I would mildly prefer to use an S-expression-style AST, like this:

["||", ["||", ["Id", "X"], ["Id", "Y"]], ["Id", "Z"]]

which is more concise, does not lose any useful information, and is easier to remember. This is the same style as used in JsonML (www.jsonml.org).
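Converting between the object-style nodes and the S-expression form is mechanical, which supports the claim that no information is lost. The following is a sketch with hypothetical helper names.

```javascript
// Object-style node -> JsonML-style array, and back.
function toSexpr(node) {
  return node.op === "Id"
    ? ["Id", node.value]
    : [node.op, toSexpr(node.left), toSexpr(node.right)];
}
function fromSexpr([op, a, b]) {
  return op === "Id" ? { op, value: a } : { op, left: fromSexpr(a), right: fromSexpr(b) };
}

const id = (value) => ({ op: "Id", value });
const tree = { op: "||", left: { op: "||", left: id("X"), right: id("Y") }, right: id("Z") };

console.log(JSON.stringify(toSexpr(tree)));
// ["||",["||",["Id","X"],["Id","Y"]],["Id","Z"]]
console.log(JSON.stringify(fromSexpr(toSexpr(tree))) === JSON.stringify(tree)); // true
```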

A definitional interpreter that recursively evaluates nodes can handle this easily enough, although the right-recursive AST alternative, which is equivalent AFAIK, would be better for avoiding recursion. A simple code generator, on the other hand, definitely wants the right-recursive tree:

{op: "||", left: {op: "Id", value: "Z"}, right: {op: "||", left: {op: "Id", value: "X"}, right: {op: "Id", value: "Y"}}}

so that, at each recursion level, branches to the next bytecode after the expression can be emitted and fixed up using local variables only. Of course backpatching could be used (and must be used for other control structures in the language), but all else equal, || and && want to be right-associated in the AST.

I'm not convinced about this. '||' and '&&' are defined in ES5 to be left-associative; that is, "a || b || c" is defined to mean "(a || b) || c", and that is how its evaluation is specified.

It so happens that this is semantically equivalent to "a || (b || c)", but that is not an equivalence of abstract syntax. It's trivial for a code generator (even a very naïve one) to convert between these if it needs to, but the AST should preserve the associativity defined in the language spec.
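The conversion a code generator would need is indeed small. The hypothetical `reassociateRight` below rotates same-operator chains of `||` or `&&` into right-leaning form. It leaves mixed `&&`/`||` nesting alone, since that nesting encodes precedence rather than associativity.

```javascript
// (a op b) op c  ==>  a op (b op c), applied until the chain leans right.
function reassociateRight(node) {
  if (node.op !== "||" && node.op !== "&&") return node;
  const left = reassociateRight(node.left);
  const right = reassociateRight(node.right);
  if (left.op === node.op) {
    // Pull the leftmost operand up and keep rotating the new right side.
    return {
      op: node.op,
      left: left.left,
      right: reassociateRight({ op: node.op, left: left.right, right }),
    };
  }
  return { op: node.op, left, right };
}

const id = (value) => ({ op: "Id", value });
const or = (left, right) => ({ op: "||", left, right });
const and = (left, right) => ({ op: "&&", left, right });
const show = (n) =>
  n.op === "Id" ? n.value : "(" + show(n.left) + " " + n.op + " " + show(n.right) + ")";

const chain = or(or(or(id("A"), id("B")), id("C")), id("D"));
console.log(show(chain));                   // (((A || B) || C) || D)
console.log(show(reassociateRight(chain))); // (A || (B || (C || D)))
```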

I'm in favor of right association for these logical connective ops, since they short-circuit evaluation based on truthiness (||) or falsiness (&&) of the left operand. Thoughts?

Evaluating "a || b || c" always evaluates the left operand of the outer '||', which is "a || b". This in turn always evaluates "a". But the left operand of the outer expression is not "a", unless we also want to change how '||' and '&&' are specified.

# Brendan Eich (16 years ago)

On May 9, 2009, at 2:32 PM, David-Sarah Hopwood wrote:

I would mildly prefer to use an S-expression-style AST, like this:

["||", ["||", ["Id", "X"], ["Id", "Y"]], ["Id", "Z"]]

which is more concise, does not lose any useful information, and is easier to remember. This is the same style as used in JsonML (www.jsonml.org).

Not bad. I will give it a try in the prototype I've started. Thanks
for the pointer.

It so happens that this is semantically equivalent to "a || (b ||
c)", but that is not an equivalence of abstract syntax.

I know, that's my point. The connectives can be parsed and evaluated
either way, but the short-circuiting seems to favor right
associativity, on one hand.

On the other hand, I'm defying a long tradition here. The precedent
going back to C (K&R before ANSI got in the loop; ignore Algol and
BCPL precedents ;-) uses left associativity, and ES1 followed suit.

So this is not a big deal, and it would be quixotic of me to make too
much of it :-). I wanted to raise it as a potential issue. It's a
practical issue in SpiderMonkey since we do right-associate || and &&
in the internal syntax tree.

# Mark S. Miller (16 years ago)

On Sat, May 9, 2009 at 2:32 PM, David-Sarah Hopwood <david-sarah at jacaranda.org> wrote:

[...] but the AST should preserve the associativity defined in the language spec.

But which language spec? Again, specs only traffic in observable differences. Since ES5 does not define any std parse or AST API, there is no observable difference in ES5 whether this is specified as left- or right-associative. Assuming ES6 does define such APIs, the difference becomes observable. I see no reason why ES6 could not compatibly specify a right associative grammar for || and &&.

# Brendan Eich (16 years ago)

On May 9, 2009, at 2:59 PM, Mark S. Miller wrote:

On Sat, May 9, 2009 at 2:32 PM, David-Sarah Hopwood <david-sarah at jacaranda.org> wrote:

[...] but the AST should preserve the associativity defined in the language spec.

But which language spec? Again, specs only traffic in observable differences. Since ES5 does not define any std parse or AST API, there is no observable difference in ES5 whether this is specified as left-or-right associative.

Exactly.

Assuming ES6 does define such APIs, the difference becomes observable. I see no reason why ES6 could not compatibly specify a right associative grammar for || and &&.

Glad to hear this. I'm going with right associativity for local-convention and lazy-hacker reasons, in the SpiderMonkey prototype.

Of course if we standardize left-associative || and &&, I will cope --
software can do anything. But there's a slight and (IMHO) winning
economy of expression, interpretation, and compilation, in right
associativity for these logical connectives.

Ritchie's original C compiler, and Thompson's implementations of B
(initially for a very limited memory machine, PDP-7), focused on one-pass code generation. This obviously made for busy-work burdens
(forward declarations, e.g.) for the C programmer. At first (B on the
PDP-7) there was no alternative. BCPL had a "let rec" equivalent ("let
X be ... and Y be ...") but Thompson's target machine had insufficient
core to hold an intermediate representation in memory for real
programs. I'm pretty sure all these compilers treated & and | (which
split into && and || later, explaining the poor precedence for & and
|) as right-associative.

Today we have too much memory to waste, and hardly any living
programmer worries about such things as fitting a compiler in 8K.
More's the pity.

But why shouldn't we bias the AST, in this particular case, toward the
more economical associativity mode?

# David-Sarah Hopwood (16 years ago)

Mark S. Miller wrote:

On Sat, May 9, 2009 at 2:32 PM, David-Sarah Hopwood <david-sarah at jacaranda.org> wrote:

[...] but the AST should preserve the associativity defined in the language spec.

But which language spec? Again, specs only traffic in observable differences. Since ES5 does not define any std parse or AST API, there is no observable difference in ES5 whether this is specified as left-or-right associative. Assuming ES6 does define such APIs, the difference becomes observable. I see no reason why ES6 could not compatibly specify a right associative grammar for || and &&.

I have no objection to that as long as the AST API and the ES6 grammar are consistent with each other.

# kevin curtis (16 years ago)

Are there two approaches to adding AST functionality to ECMAScript?

  1. The AST as JSON: either Brendan's original example or the S-expression-ish JsonML. This covers both the in-memory representation of the AST and its serialization format.

  2. API: similar to the ast module in Python, where multiple API calls build the AST nodes.
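The two approaches need not be exclusive. A builder API in the spirit of Python's `ast` module can construct exactly the arrays that a JSON (JsonML-style) representation would serialize. The `ast` object below is a hypothetical illustration, not a proposed standard API.

```javascript
// Each builder call constructs one node; the nodes are plain JSON-compatible arrays.
const ast = {
  Id: (name) => ["Id", name],
  Or: (left, right) => ["||", left, right],
  And: (left, right) => ["&&", left, right],
};

const built = ast.Or(ast.Or(ast.Id("X"), ast.Id("Y")), ast.Id("Z"));
const parsed = JSON.parse('["||", ["||", ["Id", "X"], ["Id", "Y"]], ["Id", "Z"]]');

console.log(JSON.stringify(built) === JSON.stringify(parsed)); // true
```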

# Brendan Eich (16 years ago)

(Targeted replies below, too much to digest in one pass. :-P)

On May 8, 2009, at 8:49 PM, Kris Kowal wrote:

"(function (require, exports) {" + text + "/**/\n})"

Nit-picking a bit on names: require : provide :: import : export -- so
mixing require and export mixes metaphors. Never stopped me ;-).

This doesn't enforce the "program" construction, and some of the JavaScript language semantics suffer for it. For example, function statements aren't defined and do not necessarily get bound to the module scope in some implementations.

What makes functions eval'ed hermetically by the module function occur
in a statement context? They should be nested function declarations,
not (sub-)statements. Or I'm missing something.

To address the latter problem, the engine would need to provide a program evaluator that begins with two scopes, one with globals, the other for module locals, wherein top-level declarations (and potentially free assignment, if that can't simply be culled from modules) are bound to the locals instead of the globals.

This is a language change. ES1-5 put free variables created by
assignment in the object at the bottom of the scope chain.
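The current behaviour described here is observable today. Sloppy-mode global code puts top-level `var` declarations on the global object, and free assignments in sloppy code land there too. A function body, by contrast, keeps its declarations local. The sketch is illustrative only: the variable names are arbitrary, and it deliberately creates a few global properties.

```javascript
// Indirect eval runs as global code: a top-level `var` lands on the global object.
(0, eval)("var leakedByEval = 1;");
console.log("leakedByEval" in globalThis); // true

// The same declaration in a function body stays in the function's own scope.
new Function("var keptLocal = 1; return keptLocal;")();
console.log("keptLocal" in globalThis); // false

// A free assignment in sloppy code still reaches the global object...
new Function("freeAssigned = 1;")();
console.log("freeAssigned" in globalThis); // true

// ...while strict code rejects it.
let strictRejected = false;
try {
  new Function('"use strict"; strictFree = 1;')();
} catch (e) {
  strictRejected = e instanceof ReferenceError;
}
console.log(strictRejected); // true
```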

I believe this addresses issues with Mark's idea:

eval.hermetic() does an indirect eval of the program in the current global context, but, as in strawman:lexical_scope, without the global object at the bottom of the scope chain.

Mark is citing a proposal that removes the global object from the
scope chain; that proposal does not fiddle with where declared and
free vars go.

Implementations would need to decouple the top of the scope chain and the global object.

Implementations can do this easily, but the issue is language-level:
is the global object at the bottom of the scope chain? So far, it is.

With the addition of an "eval.parse(text):AST", we could recover some of the performance lost with this method by sharing ASTs among sandboxes.

I've considered exposing the AST encoder as eval.parse. It's a cute
trick to use eval as a namespace, tempting in order to minimize
compatibility hits in adding to the global object. But it feels a
little "off" here.

Also, the AST codec I'm writing produces and consumes strings of
JsonML. It's up to the user to invoke the JSON codec (or not), taking
the hit of building a big array/object graph only as needed. This
should not be a mandatory consequence of parsing from source to an AST
form that can be consumed by the engine. You shouldn't have to go from

source string -> object graph -> JsonML string

in the language itself, since the middle step can chew up a fair
amount of memory compared to an incremental (top-level-declaration-or-statement at a time or better), native parser that feeds its internal
AST into a native stringifier.

# Kris Kowal (16 years ago)

On Mon, May 11, 2009 at 9:26 AM, Brendan Eich <brendan at mozilla.com> wrote:

On May 8, 2009, at 8:49 PM, Kris Kowal wrote:

"(function (require, exports) {" + text + "/**/\n}" Nit-picking a bit on names: require : provide :: import : export -- so mixing require and export mixes metaphors. Never stopped me ;-).

I agree about mixing metaphors. The befuddlement of start : stop :: begin : end is one that bothers me a lot. The notion is to desugar "import" and "export" to these two facets, importing and exporting. imports : exports would be proper, but doesn't read well in code. The reason for using the term "exports" is to ease migration, since:

exports.a = function a() {};

Is easy to transform textually to:

export a = function a() {};

So, I'm inclined to stick with "exports" instead of "provide". The metaphor would be complete if we used "imports(id)" or "import(id)". Since "import" is a keyword, it would not be available for the desugared syntax. That leaves "imports".

const {a} = imports("module");
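A minimal sketch of the `imports`/`exports` naming in a CommonJS-style loader. It assumes module texts are held in an in-memory table, and `sources`, `memo` and the module ids are hypothetical. The factory is built with the Function constructor, which is not a hermetic eval: the real global object is still at the bottom of the scope chain.

```javascript
// Hypothetical module sources, keyed by id. A real loader would fetch these.
const sources = {
  "module": "exports.a = function a() { return 'a from module'; };",
  "main": "const {a} = imports('module'); exports.result = a();",
};

const memo = {}; // id -> exports object, so each module runs at most once

function imports(id) {
  if (Object.prototype.hasOwnProperty.call(memo, id)) return memo[id];
  if (!Object.prototype.hasOwnProperty.call(sources, id)) {
    throw new Error("unknown module: " + id);
  }
  // Register exports before running, so a cyclic import sees a partial
  // object instead of recursing forever.
  const exports = memo[id] = {};
  // Parse the text as a function body with two parameters.
  // NOTE: not hermetic; globals remain reachable from the module.
  const factory = new Function("imports", "exports", sources[id]);
  factory(imports, exports);
  return exports;
}

console.log(imports("main").result); // a from module
```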

What makes functions eval'ed hermetically by the module function occur in a statement context? They should be nested function declarations, not (sub-)statements. Or I'm missing something.

Perhaps I'm behind on the times, but I'm under the impression that presently the behavior of this function "foo" declaration has no standard behavior:

(function () { function foo() { } })();

If foo gets bound in function block scope, there's no problem (which is the case in most browsers, I believe), but if it gets bound as a member of global, that would be a problem, and if it gets bound like a free assignment, it would only be a problem if free assignment isn't localized to the module somehow.

This is a language change. ES1-5 put free variables created by assignment in the object at the bottom of the scope chain.

I'm of course in favor of changing as little as possible. If the bottom-most scope is unique to the present module instead of the global object, there's no need for change here.

Mark is citing a proposal that removes the global object from the scope chain; that proposal does not fiddle with where declared and free vars go.

Alright, I'm following now. I'll explain why I think that this would be sufficient to fix some problems, although perhaps not necessary.

Implementations would need to decouple the top of the scope chain and the global object.

Implementations can do this easily, but the issue is language-level: is the global object at the bottom of the scope chain? So far, it is.

I've operated on the assumption that the global object was on the bottom of the scope chain. There are some concerns about module texts for parsing and interpreting modules, some of which might be sufficiently addressed by moving global off the scope chain for module evaluation, but perhaps not necessarily.

  • free assignment. I'm less concerned about the behavior of free assignment. I'd prefer assignment to global to be explicit, but this ship may have sailed long ago. It might be more appropriate for free assignment to create module locals or module exports, which could be accomplished by changing the bottom-of-the-scope-chain, or by changing the behavior of free assignment in the context of a hermetic eval. In any case, this is not something I'm deeply concerned with.
  • function statements. These really must be module local. I'm not in-the-know about whether this is a problem or not. In the case where hermetic eval runs a program, we'd have to wrap the program in a function declaration. In that case, if function statements create function block scope locals, there's no problem. If they operate like free assignment, then there's a problem if the bottom-scope is "global", but not if it's a module local object. If hermetic eval returns a module factory function that runs a program with a given require and exports object, function statements would occur in the bottom-scope. In that case, it would be a problem if the bottom-scope were "global", whether or not function statements behave like free assignment or function block scope declarations.
  • return statement. This should be a parse error in the topmost scope of a module. If hermetic eval wraps a module's text in a function declaration, the return statement would not be a parse error, which would be a problem. If hermetic eval returns a function that executes the module with a given require and exports object, then return would be a parse error in the bottom-scope.
  • injection attack strings. These are a weakness of using a hermetic eval that immediately evaluates a module factory function expression with the module text inside.

In present implementations, it's possible to have globals available as free variables by replacing the bottom-scope-chain object with one begotten from globals. I'd concede that this is a hack.

Here's a pseudo-code representation of our options and their ramifications:

running a module looks like: let require = Require(id); let exports = memo[id] = {}; factory(require, exports);

if hermetic eval, like eval, immediately evaluates a program and returns the last evaluated expression:

creating a module factory function looks like: let factory = eval.hermetic( "(function(require,exports){" + moduleText + "/**/})" );

PROBLEM: "return" statements ARE NOT an error.

PROBLEM: "}); (function () {" attacks ARE possible.

if bottom of the scope chain is global:

   if function statements are free assignment:

       PROBLEM: function statements should
       be local to modules.

   if function statements are function block
   local:

       no problem.

if bottom of the scope chain is local:

   no problem with free assignment.

   if the bottom-scope is frozen:

       no problem

   if the bottom-scope is mutable:

       PROBLEM: module factories cannot be shared
       since data can be communicated among
       instances of the same module via their common
       bottom-scope-object.

if hermetic eval returns a module factory function that evaluates a program with a given (require, exports) pair upon invocation:

let factory = eval.hermetic(moduleText);

"return" statements ARE an error.

"}); (function () {" attacks ARE NOT possible.

if bottom of the scope chain is global:

   if global is mutable:

       PROBLEM: local variables are shared among
       all modules.

   if global is frozen:

       PROBLEM: local variable declarations are
       run-time errors.

   if function statements are free assignment:

       if free assignment binds to bottom-scope:

           PROBLEM: function statements are global.

       if free assignment binds to module scope:

           no problem.

   if function statements are function block
   local:

       PROBLEM: function statements are global.

if bottom of the scope chain is local:

   PROBLEM: where do the primordials come from?
   Presumably, the primordials would have to be
   either copied onto the module local scope,
   or the module scope must be begotten from
   globals.

   if function statements are free assignment:

       no problem.

   if function statements are function block
   local:

       no problem.
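The two PROBLEM entries for the string-wrapping strategy can be reproduced with today's indirect eval. In this sketch, `wrapWithEval` and `wrapWithFunction` are illustrative helpers, not proposed APIs. For contrast, the Function constructor in current engines parses the body text on its own, so the injection fails with a SyntaxError. A top-level `return` is still accepted, however, so neither helper gives true Program semantics.

```javascript
// Build a module factory by string concatenation plus indirect eval,
// mirroring the wrapper strategy discussed above.
function wrapWithEval(moduleText) {
  const indirectEval = eval; // calling eval through a variable makes it indirect
  return indirectEval("(function(require,exports){" + moduleText + "/**/})");
}

// 1. A top-level `return` is not rejected; it just returns from the wrapper.
const returning = wrapWithEval("return 42;");
console.log(returning(null, {})); // 42

// 2. Injection: the module text closes the wrapper early, so the middle
// statement runs when the factory is *created*, not when it is invoked.
globalThis.leaked = false;
wrapWithEval("}); globalThis.leaked = true; (function () {");
console.log(globalThis.leaked); // true

// Contrast: the Function constructor parses the body by itself, so the same
// injection is a SyntaxError, but `return` is still accepted.
function wrapWithFunction(moduleText) {
  return new Function("require", "exports", moduleText);
}
let injectionRejected = false;
try {
  wrapWithFunction("}); globalThis.leaked2 = true; (function () {");
} catch (e) {
  injectionRejected = e instanceof SyntaxError;
}
console.log(injectionRejected); // true
console.log(wrapWithFunction("return 42;")(null, {})); // 42
```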

I've considered exposing the AST encoder as eval.parse. It's a cute trick to use eval as a namespace, tempting in order to minimize compatibility hits in adding to the global object. But it feels a little "off" here.

Agreed. I'm not attached to the name.

Kris Kowal

# Brendan Eich (16 years ago)

On May 11, 2009, at 4:10 PM, Kris Kowal wrote:

I agree about mixing metaphors. The befuddlement of start : stop :: begin : end is one that bothers me a lot. The notion is to desugar "import" and "export" to these two facets, importing and exporting. imports : exports would be proper, but doesn't read well in code. The reason for using the term "exports" is to ease migration, since:

exports.a = function a() {};

Is easy to transform textually to:

export a = function a() {};

Hmm, that cuts both ways. You can run a sed script or whatever, but if
skimming or otherwise manually inspecting (humans are error-prone ;-)
the difference is slight.

So, I'm inclined to stick with "exports" instead of "provide". The metaphor would be complete if we used "imports(id)" or "import(id)". Since "import" is a keyword, it would not be available for the desugared syntax. That leaves "imports".

const {a} = imports("module");

Kinda works, yeah.

What makes functions eval'ed hermetically by the module function
occur in a statement context? They should be nested function declarations, not (sub-)statements. Or I'm missing something.

Perhaps I'm behind on the times, but I'm under the impression that presently the behavior of this function "foo" declaration has no standard behavior:

(function () { function foo() { } })();

No, that's fully specified by ES3.

If foo gets bound in function block scope, there's no problem (which is the case in most browsers, I believe), but if it gets bound as a member of global, that would be a problem,

I know of no such bug. You might be thinking of a different IE bug,
where named function expressions are evaluated to bind their names in
the variable object.

and if it gets bound like a free assignment, it would only be a problem if free assignment isn't localized to the module somehow.

No such bug in any implementation I've ever seen or heard of.

Implementations would need to decouple the top of the scope chain
and the global object.

Implementations can do this easily, but the issue is language-level: is the global object at the bottom of the scope chain? So far, it is.

I've operated on the assumption that the global object was on the bottom of the scope chain. There are some concerns about module texts for parsing and interpreting modules, some of which might be sufficiently addressed by moving global off the scope chain for module evaluation, but perhaps not necessarily.

Sounds like "use lexical scope" or (as the proposal happily allows)
something subsuming it would do the trick, and provide other benefits.

  • free assignment. I'm less concerned about the behavior of free assignment. I'd prefer assignment to global to be explicit, but this ship may have sailed long ago. It might be more appropriate for free assignment to create module locals or module exports, which could be accomplished by changing the bottom-of-the-scope-chain, or by changing the behavior of free assignment in the context of a hermetic eval. In any case, this is not something I'm deeply concerned with.

ES5 strict mode makes free variable creation via assignment an error,
so let's roll with that.
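The ES5 strict-mode behavior referred to here can be observed directly. The sketch uses the Function constructor so that the first body is sloppy-mode code even if the surrounding file is a strict module. The variable names are illustrative.

```javascript
// Sloppy mode: assigning to an undeclared name silently creates a global.
function sloppyAssign() {
  return new Function("undeclaredSloppy = 1;")();
}

// Strict mode (ES5): the same kind of assignment throws a ReferenceError.
function strictAssign() {
  return new Function('"use strict"; undeclaredStrict = 1;')();
}

sloppyAssign();
console.log(typeof globalThis.undeclaredSloppy); // number

let error = null;
try {
  strictAssign();
} catch (e) {
  error = e;
}
console.log(error instanceof ReferenceError); // true
console.log("undeclaredStrict" in globalThis); // false
```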

  • function statements. These really must be module local. I'm not in-the-know about whether this is a problem or not.

Not a problem.

  • return statement. This should be a parse error in the top most scope of a module. If hermetic eval wraps a module's text in a function declaration, the return statement would not be a parse error, which would be a problem.

The only way around this with existing tools is to eval the string
body of the module-function, which will again make return a parse
error. Double eval overhead.

If hermetic eval returns a function that executes the module with a given require and exports object, then return would be a parse error in the bottom-scope.

I don't follow this sentence.

  • injection attack strings. These are a weakness of using a hermetic eval that immediately evaluates a module factory function expression with the module text inside.

ASTs to the rescue? This just moves the injection attack surface,
arguably shrinks it a good deal by breaking eval up into parse +
execute (let's call it).

# David-Sarah Hopwood (16 years ago)

Kris Kowal wrote:

On Mon, May 11, 2009 at 9:26 AM, Brendan Eich <brendan at mozilla.com> wrote:

On May 8, 2009, at 8:49 PM, Kris Kowal wrote:

"(function (require, exports) {" + text + "/**/\n}" Nit-picking a bit on names: require : provide :: import : export -- so mixing require and export mixes metaphors. Never stopped me ;-).

I agree about mixing metaphors. The befuddlement of start : stop :: begin : end is one that bothers me a lot. The notion is to desugar "import" and "export" to these two facets, importing and exporting. imports : exports would be proper, but doesn't read well in code. The reason for using the term "exports" is to ease migration, since:

exports.a = function a() {};

Is easy to transform textually to:

export a = function a() {};

So, I'm inclined to stick with "exports" instead of "provide". The metaphor would be complete if we used "imports(id)" or "import(id)". Since "import" is a keyword, it would not be available for the desugared syntax.

Neither "import" nor "export" are ES3 or ES5 keywords. However, both are context-dependent keywords in Mozilla JavaScript:

developer.mozilla.org/En/Core_JavaScript_1.5_Reference/Statements/import, developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Statements/export

I don't know whether any future 'import' or 'export' syntax could be made not to collide with the Mozilla extensions.

Perhaps I'm behind on the times, but I'm under the impression that presently the behavior of this function "foo" declaration has no standard behavior:

(function () { function foo() { } })();

No, that is perfectly standard (and implemented correctly cross-browser). The body of the outer function is a sequence of SourceElements, which allows a FunctionDeclaration. 'foo' is bound only within the outer function's scope.
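A quick check of the scoping rule described above (the names are illustrative). The declaration of `foo` is visible inside the enclosing function and nowhere else.

```javascript
// A FunctionDeclaration in a function body is bound in that function's
// own scope, not on the global object.
let innerType;
(function () {
  function foo() { return "inner"; }
  innerType = typeof foo; // visible here
})();

console.log(innerType); // function
console.log(typeof foo); // undefined (not visible outside)
console.log("foo" in globalThis); // false
```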

# Brendan Eich (16 years ago)

On May 11, 2009, at 6:14 PM, David-Sarah Hopwood wrote:

I don't know whether any future 'import' or 'export' syntax could be made not to collide with the Mozilla extensions.

Those keywords from Netscape 4 are gone in Firefox 3.5.

# kevin curtis (16 years ago)

JsonML looks good for AST handling:

["||", ["||", ["Id", "X"], ["Id", "Y"]], ["Id", "Z"]]

Maybe the 'canonical' AST serialized string format could actually be more scheme-y:

(or (or X Y) Z)

JsonML could be used for building pure js in-memory AST graphs which could then be easily stringified to the 'canonical' format.

The benefit is that a scheme-y format could help the thinking on the semantics for es6/harmony. (Downside compared to a JSON canonical format is that with JSON the parsing/stringifying is free via the JSON api in es5).

For convenience JSON could remain JSON in this scheme-y format: var x = [1,4,5] becomes: (var x [1,4,5])
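For comparison, translating the JsonML form into the S-expression notation suggested here takes only a short recursive function. The tag-to-head table (`HEADS`) and the rule that `["Id", name]` prints as a bare name are assumptions inferred from the single `(or (or X Y) Z)` example above.

```javascript
// Hypothetical mapping from JsonML node tags to S-expression heads.
const HEADS = { "||": "or", "&&": "and" };

function toSexp(node) {
  if (!Array.isArray(node)) {
    // Leaf literal (number, string, boolean, null): reuse JSON's spelling.
    return JSON.stringify(node);
  }
  const [tag, ...children] = node;
  if (tag === "Id") {
    return children[0]; // identifiers print bare: ["Id", "X"] -> X
  }
  const head = Object.prototype.hasOwnProperty.call(HEADS, tag) ? HEADS[tag] : tag;
  return "(" + [head, ...children.map(toSexp)].join(" ") + ")";
}

const ast = ["||", ["||", ["Id", "X"], ["Id", "Y"]], ["Id", "Z"]];
console.log(toSexp(ast)); // (or (or X Y) Z)
```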

# Brendan Eich (16 years ago)

On May 12, 2009, at 12:24 AM, kevin curtis wrote:

JsonML looks good for AST handling:

["||", ["||", ["Id", "X"], ["Id", "Y"]], ["Id", "Z"]]

Yes.

Maybe the 'canonical' AST serialized string format could actually be more scheme-y:

(or (or X Y) Z)

JsonML could be used for building pure js in-memory AST graphs which could then be easily stringified to the 'canonical' format.

JsonML wouldn't be used to build object graphs -- the JSON decoder
would do that given JsonML in a string, from the AST encoder. That's
the point I made in the words you bottom-cite about not mandating a
big fat object graph if the use-case doesn't need the graph, just the
string containing the AST serialization.

The benefit is that a scheme-y format could help the thinking on the semantics for es6/harmony.

That seems like no benefit in memory use or cycles, only in thinking.
If you squint, don't the []s turn into ()s? :-P

(Downside compared to a JSON canonical format is that with JSON the parsing/stringifying is free via the JSON api in es5).

This is a big downside.

For convenience JSON could remain JSON in this scheme-y format: var x = [1,4,5] becomes: (var x [1,4,5])

I don't see why we'd invent a third language.

# kevin curtis (16 years ago)

Just an idea - () seems natural for ASTs. jsonML is more than good.

I am definitely not suggesting building a jsonML object graph if the programmer does not require it, e.g. SAX vs. DOM.

Is your approach:

// jsonML AST string
var js_str = "2 + 2"
var jsonML_ast_str = eval.parse(js_str)
print(jsonML_ast_str) // ["+", 2, 2] or [ast.plus, 2, 2]
var res = eval.execute(jsonML_ast_str)
print(res) // 4

// if the programmer wants to manipulate the jsonML AST in memory
var js_lst = ["+", 2]
js_lst[2] = 2
var jsonML_ast_str = JSON.stringify(js_lst)
... then same as above

The eval.execute function (implemented in c/c++) would take the jsonML_ast_str and build the native c/c++ ast tree. Which is then walked to generate bytecode (and maybe machine code) for execution.
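Here is a toy stand-in for the `eval.execute` half of this approach, assuming a JsonML string as input. `execute`, `evaluate` and the `OPS` table are hypothetical, and they handle only numeric `+`, `-` and `*`. A native implementation would build its internal AST from the string rather than walk a JavaScript object graph.

```javascript
// Supported operators for this toy evaluator.
const OPS = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
};

// Recursively evaluate a decoded JsonML node.
function evaluate(node) {
  if (typeof node === "number") return node;
  if (!Array.isArray(node) || !Object.prototype.hasOwnProperty.call(OPS, node[0])) {
    throw new SyntaxError("unsupported node: " + JSON.stringify(node));
  }
  const [op, left, right] = node;
  return OPS[op](evaluate(left), evaluate(right));
}

// Accept a JsonML AST *string*, decode it, and evaluate it.
function execute(jsonMLAstString) {
  return evaluate(JSON.parse(jsonMLAstString));
}

// Manipulate the AST in memory, then serialize it, as in the message above.
const jsList = ["+", 2];
jsList[2] = 2;
console.log(execute(JSON.stringify(jsList))); // 4
```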

In the original code snippet which prompted the discussion: eval.hermetic(program :(string | AST), thisValue, optBindings) :any

I wasn't sure if the AST type meant a string (in jsonML) or a reference to the native AST object (in C++) which could be manipulated via an API.

# Kris Kowal (16 years ago)

On Mon, May 11, 2009 at 4:21 PM, Brendan Eich <brendan at mozilla.com> wrote:

On May 11, 2009, at 4:10 PM, Kris Kowal wrote:

Perhaps I'm behind on the times, but I'm under the impression that presently the behavior of this function "foo" declaration has no standard behavior:

(function () { function foo() { } })();

No, that's fully specified by ES3.

Once again, I've been chasing a JS phantom. That cuts the complexity of the options tree roughly in half. Let's consider:

let asts = {};
let memo = {};

// loading
if (Object.prototype.hasOwnProperty.call(asts, id)) return id;
let ast = parse(moduleText);
asts[id] = ast;

// executing
if (Object.prototype.hasOwnProperty.call(memo, id)) return id;
let ast = asts[id];
let exports = memo[id] = {};
let require = Require(id);
execute(ast, {require, exports});

Furthermore, let's assume that "execute" enforces "use lexical scope" and "use strict".

These are the ramifications if I understand correctly:

  • free assignment becomes an error at run time.
  • free variable access, apart from primordials, require, and exports, throws reference errors.
  • the object bearing the primordials has no name.
  • global object has no name.
  • the bottom scope has no name.
  • default "this" for functions and the bottom-scope is "undefined".
  • function statements are local to the module and only accessible lexically.
  • var and let declarations in the bottom scope.
  • "require" and "exports" get injected into the bottom scope.

What scope contains primordials? Should primordials be injected into the bottom scope before "require" and "exports" or should there be two scopes (global, local) in a module? I see several potential definitions of execute:

execute(program:AST|String, scope); // wherein we create a new global scope and add require and exports for each program

execute(program:AST|String, local); // wherein a shared frozen global scope is implied and a local scope is pushed above it

execute(program:AST|String, global, local); // wherein we reuse the global scope frame and put require and exports in a scope right above it

In both of these cases, Mark's comments about copying members to a scope frame instead of using the object itself might apply. I presume that this is to avoid following the prototype chain when resolving a variable. If globals are shallowly copied into the local/bottom scope of the module, some things get simpler. For one, we can freeze the global object without freezing the bottom of the scope chain, which would impair module local declarations. We also wouldn't need two scopes initially. However, it would be nominally slower. I think that execute should take two arguments either way, since it would be inconvenient and slower than necessary to do this for every module execution:

var scope = copy(global);
scope.require = Require(id);
scope.exports = memo[id] = {};
execute(ast, scope); // scope chain is [Frame(scope)]

…since execute effectively hides an implicit copy to the scope frame, making the explicit copy superfluous, as opposed to:

var local = { require: Require(id), exports: memo[id] = {} };
execute(ast, global, local); // scope chain is [Frame(global), Frame(local)]
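The trade-off between the two `execute` signatures can be modelled with plain objects standing in for scope frames. In this sketch `copy` and `layer` are hypothetical helpers (the first mirrors the `copy(global)` above), the prototype chain plays the role of the outer frame, and the frozen `sharedGlobal` holds a few primordials.

```javascript
// Shared primordials, frozen so modules cannot tamper with them.
const sharedGlobal = Object.freeze({ Math, JSON, Object, Array });

// Option A: shallow-copy the globals into a fresh, mutable per-module scope.
function copy(obj) {
  return Object.assign({}, obj);
}

// Option B: push a small local scope above the shared global; lookups of
// names not found locally fall through the prototype chain to the global.
function layer(global, local) {
  return Object.assign(Object.create(global), local);
}

const exportsA = {};
const scopeA = copy(sharedGlobal);
scopeA.exports = exportsA;
scopeA.helper = 1; // a module-local binding

const exportsB = {};
const scopeB = layer(sharedGlobal, { exports: exportsB });
scopeB.helper = 2; // lands on the local layer, not on the frozen global

console.log(scopeA.Math === Math, scopeB.Math === Math); // true true
console.log("helper" in sharedGlobal); // false
console.log(Object.keys(scopeA).length, Object.keys(scopeB).length); // 6 2
```

One caveat of modelling the layer with the prototype chain: assigning to a name the frozen global already has (for example `scopeB.Math = 0`) fails, because an inherited non-writable property blocks assignment. A real declarative scope frame would let a module-local declaration shadow the global without that problem.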

Is this on the right line of reasoning?

Kris Kowal

# Eugene Lazutkin (16 years ago)

I think JsonML (with []) is perfect:

  • it doesn't introduce new concepts like the other proposal
  • working with it doesn't require new APIs
  • it is simple but not more restrictive than the other proposal
  • we can add new "tokens" if we need to without introducing new objects/APIs

Given these positives and the lack of notable negatives I give +1 for JsonML. If we need more user-friendly facade we can always implement it with a library that uses JsonML-based facilities under the hood.

Eugene Lazutkin Dojo Toolkit, Committer lazutkin.com

# Kris Kowal (16 years ago)

kevin curtis wrote:

Is a 'canonical' AST part of the plans for ecmascript 6/harmony.

On May 9, 2009, at 9:19 AM, David-Sarah Hopwood wrote:

I hope so; that would be extremely useful. I would like to see an ECMAScript source -> AST parser (as well as an AST evaluator) in the Harmony standard library.

On Sat, May 9, 2009 at 11:57 AM, Brendan Eich <brendan at mozilla.com> wrote:

We've wanted this since early in ES4 days. It would help many projects and experimental extensions (type checkers, template systems, macro processors, etc.) to have a standard AST, which could be serialized to JSON.

Other neat uses for the AST would potentially include comment scraping for automated documentation tools and minification, which have their own requirements beyond those for execution, optimization, and static analysis.

Upon further reflection, I'm not sure that parse(program:String):AST would serve the purpose of fast sandboxing. The intent of splitting parse and execute is to reduce the cost of execution, so that modules can be reused in many small sandboxes. Having parse produce a (mutable) AST, and then leaving execute to translate the AST for the interpreter might constrain our options for producing the desired performance. It might be better to have a compile() routine that returns an opaque, engine-specific Program object that can in turn be executed multiple times.
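A sketch of the compile-once, execute-many split, using the hypothetical names `compile` and `exec`. Here the Function constructor does the one-time parse, with a `"use strict"` prologue prepended. The result is not hermetic, since the real global object is still reachable, but it shows module-local state being created per execution rather than shared.

```javascript
// Parse once; each exec() runs the body with fresh require/exports bindings.
function compile(programText) {
  const body = new Function("require", "exports", '"use strict";\n' + programText);
  return Object.freeze({
    exec({ require, exports }) {
      body.call(undefined, require, exports); // strict: `this` is undefined
      return exports;
    },
  });
}

const program = compile("let count = 0; exports.next = () => ++count;");

// Two "sandboxes" share the compiled program but not its state.
const first = program.exec({ require: null, exports: {} });
const second = program.exec({ require: null, exports: {} });
first.next();
first.next();
console.log(first.next(), second.next()); // 3 1
```

Because `count` lives in the per-call frame, two executions of one compiled program do not share module-local variables, unlike the mutable shared bottom-scope case listed earlier as a PROBLEM.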

Kris Kowal

# Brendan Eich (16 years ago)

On May 21, 2009, at 4:15 PM, Kris Kowal wrote:

Upon further reflection, I'm not sure that parse(program:String):AST would serve the purpose of fast sandboxing. The intent of splitting parse and execute is to reduce the cost of execution, so that modules can be reused in many small sandboxes.

Premature optimization?

Implementations can do ccache(1)-style compiled-results-caching.

Having parse produce a (mutable) AST, and then leaving execute to translate the AST for the interpreter might constrain our options for producing the desired performance. It might be better to have a compile() routine that returns an opaque, engine-specific Program object that can in turn be executed multiple times.

SpiderMonkey used to have a Script object you could create via s = new
Script(source), or s = Script.compile(source) -- then you'd s.exec()
to run the script. It was eval split in two. That made it pure evil,
since the implementation couldn't analyze for it easily to deoptimize
the calling code so that its scope could be used. We had code to
deoptimize at runtime, but that was many years ago, when we didn't
optimize much.

I don't think we want any such primitives split from eval's guts,
although if we were to have these, the eval-as-container idea Mark
started and you ran with does have appeal, since eval is already a
quasi-keyword that implementations must recognize: eval.{parse,compile,exec}.

Anyway, AST serialization is what I'm working on (back burner this
week). Beyond that is a cliff....

# kevin curtis (16 years ago)

I initially thought that: parse(program:String):AST

would return a handle to the native C/C++ AST tree. Then the question was whether this native AST tree should be manipulable by JavaScript, i.e. should there be a wrapper JavaScript API which updates the native tree. Python says no: its native ASDL C AST tree is not manipulable by Python. Rather, a copy is made into pure Python objects, which can be manipulated and then converted back to a new native ASDL AST. So the native AST is immutable.

I was thinking along the lines of:

var astObj = parse(js_source) // returns a handle to an immutable native C/C++ AST object
var jsonML = astObj.toJson() // the ast object has a single method, toJson(), which returns a jsonML string

// the jsonML is manipulated either as a string or via JSON.parse
var jsonAstObj = JSON.parse(jsonML)
.... processing
var jsonML2 = JSON.stringify(jsonAstObj)

var ast2 = parseJson(jsonML2) // creates a handle to a new immutable native ast object
execute(ast2)

This would cover the 2 use cases of:

  • efficient processing of the AST when no JavaScript manipulation of it is required, as in fast sandboxing, i.e. no use of the ast.toJson and parseJson methods.
  • manipulating the AST in javascript for templating etc via jsonML

There could be issues re resource usage/garbage collection of native AST object. Maybe the ast object could have a method - release().