Module linking (Was: The global object should not be the "global scope instance object")
On Jan 30, 2012, at 5:00 AM, Andreas Rossberg wrote:
On 28 January 2012 02:08, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
I played around a bit to see if I could come up with a troublesome example of the sort you may be thinking about. What I came up with is the follow:
<script> module a { import {x:y} from b;
I think you wanted to say {y:x} here.
no, I think {x:y} means creating a binding for x that is linked to b.y
module b{ export let y = x; //essentially this is let y=y } } </script>
1)The script is parsed, and static semantic checks are made. There are no static errors.
Just to be clear: the static semantics is where most of the complication of modules lie. Depending on what exactly we want to allow, it amounts to a non-trivial type inference problem.
The static semantics I have in mind are all about disallowing duplicate declarations and hosting. It's about names not about values. I don't see where any type-like analysis comes into play.
- module instantiation is performed for the block. This instantiates each module defined by the top level of the block, instantiating a module includes producing the list of identifiers exported by the module. Each identifier is associated with a new uninitialized binding. Instantiated modules are not initialized (their body is not executed) at this time.
You need a new uninitialized binding for all identifiers in the module's local scope, not just the exported ones.
The non-exported declarations within a module don't need to be instantiated until module initialization. You probably could instantiate them durning my module instantiation step but I'm trying to only identify what must happen at that step.
- An initialized binding for "a" is is created in the top level environment for the script. (all top level binding are instantiate at this point, if there were any others). Note that the binding for a is initialized (it reference a module instance object) but the module itself is not yet initialized
- initialize module a 5) module instantiation is performed for the body of module a. This instantiates a module instance for module b with exported identifier "y" and its binding.
It's not quite that simple. Recursive initialization does not suffice, because there can be arbitrary import/export/aliasing cycles (e.g., b could import from a as well). Consequently, you generally need to create the instance object for all modules in the program first, before you can start initializing any of them (or the toplevel, for that matter).
I'm trying to demonstrate an algorithm that takes care of this. Give me a specific example that you don't think works. Note that my algorithm, at each lexical level, first looks at the exports of all the modules that are immediately defined (or referenced) within it. It is only those exports that add names to the current level. However, in general, the value that is accessed via the exported name is not yet relevant.
6) An initialized binding for "b" is is created in module a's inner
environment; (but module b is not yet initialized) 7) An binding for "x" is created in module a's inner environment. The binding is linked to the binding of "y" exported from b. Both bindings share the same initialization state. (currently uninitialized) 8) initialize module b 9) The binding for for "y" that was created when module b was instantiated is added to module b's inner environment 10) evaluate the LHS of the exported let; the binding found for "x" is uninitialized so we throw and the script terminates.
I suppose you mean RHS here and below.
oops, yes
If evaluating LHS didn't have any dependencies upon uninitialized bindings (say it was a constant or a function expression) we would continue as follows:
11) set the "y" binding to the value of the LHS and mark "y" as
initialized, this also mark the "x" binding in module a as initialized 12) module b is not fully initialized 13) module a is not fully initialized
I've only mentally walked through the steps but it looks to me like this process will also work for circular dependencies such as harmony:modules_examples#cyclic_dependencies
Here is what my toy source-to-source translator generates for your example:
"use strict"; { // Create: const _ = {}; const __a = {}; const __a_b = {}; // Link: Object.defineProperty(__a_b, "y", {get: function() { return __a_b_y }}); Object.freeze(__a_b); Object.freeze(__a); Object.freeze(_); // Run: let __a_x = __a_b_y; let __a_b_y = __a_x; }
Executing this snippet on V8 throws a reference error as expected, when evaluating the first let binding. (The additional __a_x is due to my translator not supporting import renaming, so I expanded "import {y:x} from b" to "import y from b; let x = y".)
More generally, in a program with modules, you basically have three phases of execution:
- Instantiation. Bindings to fresh instance objects are created for all modules in the program, and uninitialized bindings for all other (non-local) bindings in the program (i.e., hoisting the let's above).
- Linking. The exported bindings of each module are installed on the respective instance objects. (For exported modules, these are data properties carrying the already created instance objects, for others, accessor properties forwarding to the yet uninitialized bindings.)
- Execution. All non-module bindings' RHSs are evaluated in order of appearance, initializing the respective bindings.
Note that each phase is a separate recursion over the whole program.
I agree that there are logically three phases. However, I'm not yet convinced that a three separate traversals over the entire program is needed. I still think it can be interweaved within a single breadth-first traversal that is driven off of the initialization (really evaluation phase) of a module body. The "trick" is that each level peaks at the names (but not values) exported by the modules it immediately defines (or imports). The exported name list of a module can be generated while the module is parsed, prior and prior to any execution of the Program..
Allen Wirfs-Brock <mailto:allen at wirfs-brock.com> January 30, 2012 10:17 AM On Jan 30, 2012, at 5:00 AM, Andreas Rossberg wrote:
On 28 January 2012 02:08, Allen Wirfs-Brock<allen at wirfs-brock.com> wrote:
I played around a bit to see if I could come up with a troublesome example of the sort you may be thinking about. What I came up with is the follow:
<script> module a { import {x:y} from b; I think you wanted to say {y:x} here.
no, I think {x:y} means creating a binding for x that is linked to b.y
No, the binding name is in the property value position. This is why the shorthand works: import {x} from b; would bind local x to b.x.
Some find this "backwards" but it has to be this way -- the property name destructured from the RHS is on the left of : and the binding name is on the right. The shorthand helps in most cases, and backwards-sensitive people learn :-|.
On Jan 30, 2012, at 11:10 AM, Brendan Eich wrote:
Allen Wirfs-Brock <mailto:allen at wirfs-brock.com> January 30, 2012 10:17 AM On Jan 30, 2012, at 5:00 AM, Andreas Rossberg wrote:
On 28 January 2012 02:08, Allen Wirfs-Brock<allen at wirfs-brock.com> wrote:
I played around a bit to see if I could come up with a troublesome example of the sort you may be thinking about. What I came up with is the follow:
<script> module a { import {x:y} from b; I think you wanted to say {y:x} here.
no, I think {x:y} means creating a binding for x that is linked to b.y
No, the binding name is in the property value position. This is why the shorthand works: import {x} from b; would bind local x to b.x.
Some find this "backwards" but it has to be this way -- the property name destructured from the RHS is on the left of : and the binding name is on the right. The shorthand helps in most cases, and backwards-sensitive people learn :-|.
Of source, and I've even written the spec. for destructuring with the ordering you intend. Obviously, by backwards bit flipped sometime recently...
(Just a quick reply, because I'm in a bit of a hurry.)
On 30 January 2012 19:17, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
<script> module a { import {x:y} from b;
I think you wanted to say {y:x} here.
no, I think {x:y} means creating a binding for x that is linked to b.y
It's supposed to be read as a destructuring. At least that was my understanding so far.
module b{ export let y = x; //essentially this is let y=y } } </script>
1)The script is parsed, and static semantic checks are made. There are no static errors.
Just to be clear: the static semantics is where most of the complication of modules lie. Depending on what exactly we want to allow, it amounts to a non-trivial type inference problem.
The static semantics I have in mind are all about disallowing duplicate declarations and hosting. It's about names not about values. I don't see where any type-like analysis comes into play.
Modules have structural interfaces (their lists of exports, and recursively, the interfaces of those) that you have to infer to do the static analysis and name resolution. These interfaces can naturally be viewed as types (not in the JS sense, of course), and the analysis can naturally be stated as a type inference problem (you can also formalise it differently, but it doesn't make a difference). That's really the essence of being "static".
The analysis is not as straightforward as it might seem. In particular, this:
The exported name list of a module can be generated while the module is parsed
unfortunately is not true, because of (1) alias bindings, and (2) import *. For example, with
module A = B.C
you won't know what A exports before you haven't analysed B and its nested C, which might come later in the program. And those can recursively refer to A, which might require you knowing what A exports. Here is a more messed up, random example:
module A = B.C module D { ... } module B { let x = B.E.x import * from A export module B { export module E = D export module C { export module D { ... } } } }
You couldn't tell which D is aliased by E before processing C and so on. In general, it's all recursive, so a simple AST traversal alone, in whatever order, won't do. Instead you need to collect and solve constraints. In terms of type inference, you infer and unify (principal) types.
- module instantiation is performed for the block. This instantiates each module defined by the top level of the block, instantiating a module includes producing the list of identifiers exported by the module. Each identifier is associated with a new uninitialized binding. Instantiated modules are not initialized (their body is not executed) at this time.
You need a new uninitialized binding for all identifiers in the module's local scope, not just the exported ones. The non-exported declarations within a module don't need to be instantiated until module initialization. You probably could instantiate them durning my module instantiation step but I'm trying to only identify what must happen at that step.
I'm not so sure. You could export parts of a local module:
module A { module B { export module C { ... } } export module C = B.C } module C = A.C
- An initialized binding for "a" is is created in the top level environment for the script. (all top level binding are instantiate at this point, if there were any others). Note that the binding for a is initialized (it reference a module instance object) but the module itself is not yet initialized
- initialize module a 5) module instantiation is performed for the body of module a. This instantiates a module instance for module b with exported identifier "y" and its binding.
It's not quite that simple. Recursive initialization does not suffice, because there can be arbitrary import/export/aliasing cycles (e.g., b could import from a as well). Consequently, you generally need to create the instance object for all modules in the program first, before you can start initializing any of them (or the toplevel, for that matter).
I'm trying to demonstrate an algorithm that takes care of this. Give me a specific example that you don't think works.
If I understand your description correctly, then consider this:
module A { export module C = B.C; export module D { ... } } module B { export module C { ... }; export module D = A.D }
If you haven't created the instance object for B.C already, then how do you initialise the binding for A.C? And because D works the other way round, there is no way you can reorder the modules either, so that one pass would work.
Note that my algorithm, at each lexical level, first looks at the exports of all the modules that are immediately defined (or referenced) within it. It is only those exports that add names to the current level. However, in general, the value that is accessed via the exported name is not yet relevant.
I think to initialise either module alias bindings or import bindings you need the value already, in order to do the proper linking at this step.
On Jan 30, 2012, at 12:09 PM, Andreas Rossberg wrote:
(Just a quick reply, because I'm in a bit of a hurry.)
On 30 January 2012 19:17, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
...
The analysis is not as straightforward as it might seem. In particular, this:
The exported name list of a module can be generated while the module is parsed
unfortunately is not true, because of (1) alias bindings, and (2) import *. For example, with
module A = B.C
Dave/Sam, it would be nice if we had up to date syntax at harmony:modules
you won't know what A exports before you haven't analysed B and its nested C, which might come later in the program. And those can recursively refer to A, which might require you knowing what A exports. Here is a more messed up, random example:
1 module A = B.C 2 module D { ... } 3 module B { 4 let x = B.E.x 5 import * from A 6 export module B { 7 export module E = D 8 export module C { 9 export module D { ... } } } }
I added line numbers above for reference. I think there is a basic bug in the above that may just be a typo. "B" reference in line 1 must refer to the module defined in line 3. Hence "B.C" must refer to an export of "C" from that module. However, the only export from that module is line 6 which exports "B" so the "B.C reference in line 1 is invalid. This does require any circular analysis and line 5 doesn't impact it.
To get around this this issue, I looked at assuming that line 6 wasn't really intended to be there but that seems to break your example in other ways. What did you really have in mind?
You couldn't tell which D is aliased by E before processing C and so on. In general, it's all recursive, so a simple AST traversal alone, in whatever order, won't do. Instead you need to collect and solve constraints. In terms of type inference, you infer and unify (principal) types.
I agree that * imports complicate matters. I thought there had been discussion at some point about eliminating them, but I'm not yet convinced that by themselves they turn this into a constraint solving problems. In stead I think that we need appropriate static restrictions to ensure that it can never be such a problem. If it takes solving to understand module compositions then we have created a too power model of module composition. If the module linkages aren't pretty damn obvious to both the writer and the reader, then we've done something wrong.
I'm actually not a bit fan of using nesting to do module wiring, but I don't think it has result in needing constraint solving to understand the wiring.
- module instantiation is performed for the block. This instantiates each module defined by the top level of the block, instantiating a module includes producing the list of identifiers exported by the module. Each identifier is associated with a new uninitialized binding. Instantiated modules are not initialized (their body is not executed) at this time.
You need a new uninitialized binding for all identifiers in the module's local scope, not just the exported ones. The non-exported declarations within a module don't need to be instantiated until module initialization. You probably could instantiate them durning my module instantiation step but I'm trying to only identify what must happen at that step.
I was really thinking about non-module declarations when I wrote the above statement. Module alias declarations like in these examples do lead to a more complex traversal. I have to think about is some more.
I'm not so sure. You could export parts of a local module:
module A { module B { export module C { ... } } export module C = B.C } module C = A.C
- An initialized binding for "a" is is created in the top level environment for the script. (all top level binding are instantiate at this point, if there were any others). Note that the binding for a is initialized (it reference a module instance object) but the module itself is not yet initialized
- initialize module a 5) module instantiation is performed for the body of module a. This instantiates a module instance for module b with exported identifier "y" and its binding.
It's not quite that simple. Recursive initialization does not suffice, because there can be arbitrary import/export/aliasing cycles (e.g., b could import from a as well). Consequently, you generally need to create the instance object for all modules in the program first, before you can start initializing any of them (or the toplevel, for that matter).
I'm trying to demonstrate an algorithm that takes care of this. Give me a specific example that you don't think works.
If I understand your description correctly, then consider this:
module A { export module C = B.C; export module D { ... } } module B { export module C { ... }; export module D = A.D }
Sold, it certainly does look like that modules need to be identified and linked before evaluation starts.
that feels to me link a distinct pass.
On 31 January 2012 00:50, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
1 module A = B.C 2 module D { ... } 3 module B { 4 let x = B.E.x 5 import * from A 6 export module B { 7 export module E = D 8 export module C { 9 export module D { ... } } } }
I added line numbers above for reference. I think there is a basic bug in the above that may just be a typo. "B" reference in line 1 must refer to the module defined in line 3. Hence "B.C" must refer to an export of "C" from that module. However, the only export from that module is line 6 which exports "B" so the "B.C reference in line 1 is invalid. This does require any circular analysis and line 5 doesn't impact it.
To get around this this issue, I looked at assuming that line 6 wasn't really intended to be there but that seems to break your example in other ways. What did you really have in mind?
Yes, I'm sorry, the example was messed up indeed. Let me try to fix it:
module A = B.C.F module D { ... } module B { import * from A import x from G export module C { export module E = D export module F { export module G = E export module D { ... } } } }
It's probably still overcomplicated. It was just intended to show how entangled things can get. In particular, there is no bottom-up, left-to-right, or otherwise straightforward order in which you can derive module interfaces. The dependency graph generally contains arbitrary cycles, so not even dependency analysis is of much use. That's why it becomes (more easily expressible as) a constraint problem.
Full-blown import * adds the additional issue that you may have to analyse its RHS (here, A aka B.C.F) which refers to an inner scope (here, F) before you can even tell the extent of an outer scope (here, B), which makes straightforward variable lookup impossible when analysing the inner scope. The semantics I played around with actually rejects such uses of import *, requiring that it only refers to outer modules. But that may be a pretty severe restriction. (On the other hand, it's the only restriction that I impose.)
I agree that * imports complicate matters. I thought there had been discussion at some point about eliminating them, but I'm not yet convinced that by themselves they turn this into a constraint solving problems.
Just to clarify: static checking amounts to a constraint solving problem even without import *, due to the highly cyclic nature of definitions and uses.
I realize that I might be scaring off people here by talking about constraint solving, or -- dare! -- type inference. Don't worry, modulo full-blown import *, these constraints are pretty simple, and easy and efficient to solve. That they resemble a well-understood type inference problem is good news, not bad. The implementation is not difficult either, my static analysis prototype is ~100 lines for a core version of modules.
Instead I think that we need appropriate static restrictions to ensure that it can never be such a problem. If it takes solving to understand module compositions then we have created a too power model of module composition. If the module linkages aren't pretty damn obvious to both the writer and the reader, then we've done something wrong.
I'm actually not a bit fan of using nesting to do module wiring, but I don't think it has result in needing constraint solving to understand the wiring.
Well, I don't see much of a way around it, unless you are thinking about making modules non-recursive. Any other restriction I was able to come up with would have vastly reduced the utility of some features, like module aliasing.
In terms of user experience, I don't think it matters much, since I don't expect users to write programs whose structure is a brain teaser. Even the bad example above isn't that difficult to untangle for a human reader, she just has to read back and forth a little.
Reformulating it as a constraint problem is just an elegant way to avoid doing that back and forth in the algorithm, so that one efficient pass suffices for the analysis.
Note also that the spec doesn't need to specify it that way. In fact, it shouldn't. A static semantics should be specified declaratively, not algorithmically. And declaratively, the story is rather simple. (Although admittedly, I currently wouldn't know right now how to formulate that declarative specifications in a style that fits the ES spec.)
On 28 January 2012 02:08, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
I think you wanted to say {y:x} here.
Just to be clear: the static semantics is where most of the complication of modules lie. Depending on what exactly we want to allow, it amounts to a non-trivial type inference problem.
You need a new uninitialized binding for all identifiers in the module's local scope, not just the exported ones.
It's not quite that simple. Recursive initialization does not suffice, because there can be arbitrary import/export/aliasing cycles (e.g., b could import from a as well). Consequently, you generally need to create the instance object for all modules in the program first, before you can start initializing any of them (or the toplevel, for that matter).
I suppose you mean RHS here and below.
Here is what my toy source-to-source translator generates for your example:
"use strict"; { // Create: const _ = {}; const __a = {}; const __a_b = {}; // Link: Object.defineProperty(__a_b, "y", {get: function() { return __a_b_y }}); Object.freeze(__a_b); Object.freeze(__a); Object.freeze(_); // Run: let __a_x = __a_b_y; let __a_b_y = __a_x; }
Executing this snippet on V8 throws a reference error as expected, when evaluating the first let binding. (The additional __a_x is due to my translator not supporting import renaming, so I expanded "import {y:x} from b" to "import y from b; let x = y".)
More generally, in a program with modules, you basically have three phases of execution:
Note that each phase is a separate recursion over the whole program.