Modules: Name capture

# ihab.awad at gmail.com (15 years ago)

As promised, this is the first major issue I wish to raise regarding the Simple Modules strawman. This point was first brought to our collective attention by my colleague, Jasvir Nagra.

I'm describing this from scratch here, even though this came up previously on the list, to help anyone who has not been following the previous thread in detail get a handle on the issues. I've divided it into sections to avoid tl;dr for people who are already up to speed.

== Motivating factors we agree upon ==

The Web is large, and full of code. For any given corpus, there may be millions of copies on various people's servers; hundreds of forks on github; and dozens of "officially supported" versions on the Website of the maintainers. This presents a distributed version matching and naming problem quite unlike any other we have encountered so far in software development. Software is now created as casually as Web pages.

All of us who are interested in the module problem seem to agree (and please correct me if I am mistaken in this or any other collective claim) that even URL string equality is not a good metric of whether the resource found at that URL is "the same": on the one hand, the stuff retrieved via a URL can change from one moment to the next; and on the other hand, distinct URLs may validly refer to bitwise identical code.

All of us also seem to agree that we cannot brush this problem under the rug completely: we cannot completely relegate it to off-spec "configuration management" or "build process". When one is composing software from various independent sources, this software must sometimes make demands for loading things that the parent loader did not even know about. Yet the parent loader must sometimes provide modules to what is being loaded for convenience and unification. So, if I write an application that loads modules M1 and M2:

  • I may want to provide a common module jQuery to M1 and M2; but also

  • In order to do its work, M1 may want to fetch some module Z of which I have never heard.

To put it another way, modules on the Web must be able to wire themselves, and compute, over the Web as it is, rather than over the small set of software that happens to be "installed" in (say) the "/usr/lib" directory of the local system. In Python, I can simply "import smtplib". On the Web, the question is, "which one"?

== Current Simple Modules solution ==

The Simple Modules strawman shows a rather clever solution to this problem. It distributes the mappings -- from agreed-upon names to gnarly URLs -- across the codebase in a fine-grained fashion. At the most basic level, I can map the name "jQuery" to my chosen copy of the jQuery library, and use it as below. Nested modules can choose their own versions of (say) YUI, and use them as well.

module jQuery = load "http://.../jquery-1.3.2.js";
module Foo {
  module YUI = load "http://.../yui-3.1.1.js";
  module Bar {
    import jQuery.ajax;
    import YUI.Accordion;
    ajax(...);
    Accordion(...);
  }
}

This also works if I know that some library simply uses the name "jQuery", and I map that to my own chosen copy. In the following example:

// somelib.js
import jQuery.ajax;
ajax(...);

// My code
module jQuery = load "http://.../jquery-1.3.2.js";
module Somelib = load "http://.../somelib.js";

the code in "somelib.js" will pick up the jQuery that I have defined prior to the "load".

The clever part of this is, again, the decentralized name assignments.

== Pitfalls in the Simple Modules solution ==

The problem is that, in the space of module names, the current Simple Modules strawman introduces a hazard of inadvertent name capture (though it does clearly separate other names such as "var", "const" and "function" declarations). An example is:

// zero.js:
module jQuery = load 'jquery.js';
module Drawing = load 'footgun.js';
module One = load 'one.js';

// one.js:
import jQuery.ajax;
module Two = load 'two.js';

// two.js:
import jQuery.ajax;

At the time that "one.js" was written, "two.js" did not contain a reference to Drawing. Now, unbeknownst to the author of "one.js", "two.js" changed and now refers to something named Drawing, which it expected to draw a picture:

// two.js:
import jQuery.ajax;
import Drawing.draw;
draw(); // intended to draw a picture

Unfortunately, it is now actually using the gun library, and thus breaks the correctness of the system.
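The capture can be reproduced outside the strawman syntax. The following is a minimal sketch in plain JavaScript, not Simple Modules code: `makeScope` and `resolve` are hypothetical helpers, assumed here only to imitate a loaded file inheriting the module bindings of the file that loaded it.

```javascript
// Hypothetical model: each loaded file gets a scope whose parent is the
// scope of the file that loaded it (implicit linking).
function makeScope(parent, bindings) {
  return { parent, bindings };
}

// Walk outward until some enclosing scope binds the name.
function resolve(scope, name) {
  for (let s = scope; s !== null; s = s.parent) {
    if (Object.prototype.hasOwnProperty.call(s.bindings, name)) {
      return s.bindings[name];
    }
  }
  throw new ReferenceError("unbound module name: " + name);
}

const zero = makeScope(null, { jQuery: "jquery.js", Drawing: "footgun.js" });
const one = makeScope(zero, {}); // one.js binds nothing named Drawing
const two = makeScope(one, {});  // the changed two.js imports Drawing

console.log(resolve(two, "Drawing")); // prints "footgun.js"
```

Nothing in the scope of "one.js" mentions Drawing, yet the lookup from "two.js" succeeds by passing straight through it.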

The key things to note here are:

  1. Under mutation of the code of "two.js" on the Web, which we agree will happen (see above regarding mistrust of URL equality), the author of "one.js" has no defense against this accidental capture of the name "Drawing". The author cannot prevent this capture by controlling the environment of "two.js", nor can they foresee all possible environments such as "one.js" in which they may be embedded.

  2. The above is an easily seen (not to mention crippling...) manifestation of this problem. More generally, in a large system with many independently named versions of the same library floating around, more subtle captures, with harder-to-debug problems, may occur.

  3. Since we agree we cannot control "two.js" in this scenario, our choice here is whether to fail subtly or to fail fast. I claim we must fail fast.

There are some fail-fast solutions to this problem that come to mind immediately. I encourage us to brainstorm others as well.

== Solution 1: Explicit sub-module environment ==

Whenever there is a transition from one compilation unit to the other -- i.e., across any "load" -- we should explicitly specify the imports the module is allowed to inherit. No other imports may cross the "load" boundary. So, we would rewrite the example above as:

// zero.js:
module jQuery = load 'jquery.js';
module Drawing = load 'footgun.js';
module One = load 'one.js' with {Drawing, jQuery};

// one.js:
import jQuery.ajax;
module Two = load 'two.js' with {jQuery};

// two.js:
import jQuery.ajax;
import Drawing.draw; // => static error!!!
draw();

Clearly, we could also support renaming at the boundary, like this:

// zero.js:
module JQ = load 'jquery.js';
module Gun = load 'footgun.js';
module One = load 'one.js' with {Drawing: Gun, jQuery: JQ};
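As a rough illustration of the fail-fast behaviour, here is a sketch in plain JavaScript. The `link` function is a hypothetical stand-in for a static check at the "load" boundary, and the strawman does not define it. It assumes a loaded file declares the names it imports and sees only the bindings handed to it.

```javascript
// Hypothetical model of "load ... with {...}": the loaded file sees only
// the bindings passed explicitly, never the loader's whole scope.
function link(file, imports, env) {
  const missing = imports.filter(
    (name) => !Object.prototype.hasOwnProperty.call(env, name)
  );
  if (missing.length > 0) {
    throw new Error(file + ": unbound import(s): " + missing.join(", "));
  }
  return Object.fromEntries(imports.map((name) => [name, env[name]]));
}

const zeroEnv = { JQ: "jquery.js", Gun: "footgun.js" };
// zero.js: load 'one.js' with {Drawing: Gun, jQuery: JQ}
const oneEnv = { Drawing: zeroEnv.Gun, jQuery: zeroEnv.JQ };
// one.js: load 'two.js' with {jQuery}
const twoEnv = { jQuery: oneEnv.jQuery };

link("two.js", ["jQuery"], twoEnv); // the original two.js links cleanly
try {
  link("two.js", ["jQuery", "Drawing"], twoEnv); // the changed two.js
} catch (e) {
  console.log(e.message); // prints "two.js: unbound import(s): Drawing"
}
```

Renaming at the boundary is just the choice of keys in the object passed down; the changed "two.js" fails at link time because "one.js" never passed it a Drawing.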

== Solution 2: Catalog per loader ==

A module loader represents a community of mutually independent module instances which together form a coherent subsystem. If there is to be any sharing of module instances, and absent URL equality, these modules must be written with some knowledge of the "community" in which they live.

Perhaps a good way to understand this is to make the analogy with Linux distros. A distro effectively maps memorable, agreed-upon names like "libmpeg2" to concrete software resources. There is nothing keeping a dozen programmers in the wild from building a dozen different things and calling them "libmpeg2" -- but, within the Linux community, and specifically within a distro, there is only one "libmpeg2". Programs declare their dependency on it, often with a version identifier.

If we view modules in a loader in the same way, then each loader should contain a single mapping -- call it a catalog -- from memorable names to concrete resources (perhaps identified by URL). Each catalog is an implicit community, or agreement point:

All modules in a loader would be expected to use the same catalog. Catalogs could refer to one another, and so reuse one another's bindings. This would allow all programmers to predict the effect of using a new name. In the example of the modified "two.js" previously, the statement:

import Drawing.draw;

would work but, by referring to the same catalog, the authors of all three of "zero.js", "one.js" and "two.js" would have agreed that the symbol Drawing stands for something that operates a footgun and treat it with due care. The catalog would contain an entry like:

{ Drawing: "http://.../footgun.js", ... }

That all said, I don't yet have a good solution for how a module would declare that it is defined relative to a specific catalog. It's not enough to add an extra argument -- the catalog -- to the constructor of a module loader. The individual modules would have to specify their dependency on the catalog. This requires extra syntax and machinery. I am loath to add either.

== Solution 3: Forming a more generative Union ==

Going back to basics, let's ask ourselves: Why is this such a problem in the first place? In other words, why is it so important that "one.js" and "two.js" use the same jQuery and Drawing modules? There are a variety of answers, including:

  • Performance optimization: The code can be shared and memory saved.

  • Shared state: Each module contains important shared state (such as cached information about the DOM, or off-screen bitmaps used to double-buffer a <canvas> widget module) which its clients need to share in order to properly collaborate according to the module's rules.

  • Programmer familiarity: Programmers are accustomed to dividing up their program into sections, each of which is logically a singleton in "the system", and establishing communication between them.

What if we assumed that the result of "importing" a module is just the code of the module, ready to be instantiated with external state? In low-level terms, a module represents a "code segment" in memory which may be shared between its instantiations.

The effect of this choice is that dependencies between software are written using direct object passing, rather than attempts to denote the "same" module. Each module specifies as free variables the objects it requires from its caller and, when it loads another module, it does so with no expectation that what it gets is shared with anyone else. So "two.js" could start out like:

// two.js
/** @require jQuery a jQuery instance */
jQuery.ajax();

and could move to:

// two.js
/** @require jQuery a jQuery instance */
module Drawing = load "http://.../picturesOfCats.js" with { jQuery: jQuery };
jQuery.ajax();
Drawing.draw();

The difference between this and Solution 1 (and the original Simple Modules strawman) is that there are no promises made that are not explicitly wired. With Solution 1, there is still lack of clarity about what a "singleton" module represents -- depending on how modules are redefined down the chain of loadings, some singletons are more single than others. With this solution, only object APIs may define the behavior expected, and nothing is naturally expected to be a singleton.

To put it another way, if the original Simple Modules and Solution 1 are taken to their logical conclusion, one must assume, because modules may be redefined along the loading chain, that nothing is really a singleton. If one is to code defensively against this situation anyway, why not make this the default and gain the attendant simplicity?
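One way to picture the generative reading is to treat a loaded module as a factory. This is a sketch under that assumption, written in plain JavaScript. `counterModule`, `deps.log` and `client` are hypothetical names chosen for illustration; none of them comes from the strawman.

```javascript
// Hypothetical model: "loading" yields code (a factory), not a shared
// instance. State exists only once the factory is instantiated.
function counterModule(deps) {
  let count = 0; // fresh state per instantiation
  return {
    increment() {
      count += 1;
      deps.log("count = " + count);
      return count;
    },
  };
}

const lines = [];
const deps = { log: (msg) => lines.push(msg) };

const a = counterModule(deps);
const b = counterModule(deps); // same code, independent state
a.increment();
a.increment();
b.increment();

// Clients that must share state are handed the same object explicitly.
function client(counter) {
  return counter.increment();
}
console.log(client(a), client(a), client(b)); // prints: 3 4 2
```

The two clients of `a` collaborate only because both were given `a`; no naming scheme has to decide whether they meant the "same" module.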

Let's revisit the points brought up earlier:

  • Performance optimization: This is up to the implementation. For example, a perfectly reasonable implementation can do a HEAD request for every URL loaded and, if it detects no change, reuse the code.

  • Shared state: Shared state is now represented explicitly using objects.

  • Programmer familiarity: It's really not that bad. :) Programs continue to be written with free variables, just as <script>s are. Programmers learn to introduce concrete objects into the lexical scope of their programs. And they write "export" statements to export variables back.

To optimize this a bit, it is possible to introduce a concept of "packages" (under active debate in CommonJS at the moment) to gather things up. This improves, but does not intrinsically modify, the model presented here.

== Afterword ==

Some of the solutions I present here are similar to proposals I have already made. In all sincerity, I cannot help that. But, if there are other solutions extant, I assure you that my brain is open. :)

Ihab

# Kam Kasravi (15 years ago)

This problem has many similarities to the XML / WSDL world. The use of namespaces and versioning has been leveraged there to disambiguate names which otherwise could be occluded by changing interfaces. That being said, I like your suggestion of 'renaming at the boundary' where one could possibly disambiguate like names. Independent of versioning, which is how many large software systems guarantee integrity, we need something like namespaces or your 'boundary renaming' to avoid name collisions. Section 4.2.1 in XSchema describes import, include and redefine as mechanisms to allow composition of multiple schemas and is worth reading.

thanks kam

# ihab.awad at gmail.com (15 years ago)

On Wed, May 26, 2010 at 7:34 PM, Kam Kasravi <kamkasravi at yahoo.com> wrote:

This problem has many similarities to the XML / WSDL world. The use of namespaces and versioning has been leveraged there to disambiguate names ...

That's a very interesting idea, thank you. I whiteboarded this a bit and here's what I came up with.

I will refer to management of "modules" here in order not to introduce extra machinery. However, we would wish to use this scheme to manage collections of modules that are distributed and developed together. So, for example, instead of using this scheme to manage the Accordion module, we would use it to manage the entirety of YUI, and select individual pieces from that. I will try to extend from individual modules to collections at the end of the post.

Assume each module has a canonical URL (cURL). This serves as a global name for the module, not a location. It incorporates DNS addresses and thus allows for decentralized name assignment. As you point out, it follows the precedent of W3C usage. For example, some cURLs might be:

yahoo.com/frameworks/YUI, jquery.com/jquery

The guidance about whether your library should be assigned a new cURL or given an existing one is that you have to answer the question: "Does it make sense for multiple independent instances of my library to be running within the same loader?" If the answer is "Yes", then you should use a separate cURL. If not, then you shouldn't. Practically speaking, distinct versions of a library should not have distinct cURLs, but libraries that do completely different things should.

To instantiate a loader, you supply a catalog. The catalog maps cURLs to actual code resources. It may do so in any of a number of ways: by consulting some third party; by keeping a literal table; by implementing some general code that does a Google search; whatever. The important thing is the contract of a catalog: For each distinct cURL, there exists zero or one code resources:

yahoo.com/frameworks/YUI => retrieve "http://developer.yahoo.com/yui/3/download/yui-min.js"
jquery.com/jquery => retrieve "code.jquery.com/jquery-1.4.2.min.js"
[anything else] => [undefined]

If a catalog does not contain (in the general sense, meaning "cannot locate") a code resource for a cURL, it fails fast; there is no facility for an individual module to render its own "opinion" about what a cURL maps to. If the module wishes to implement a new or extended (cURL -> code) mapping, it does so by instantiating a new loader.

To review: the usefulness of cURLs is that they provide a reasonable, distributed, Webby way to manage the key space of catalogs. They do not themselves implement the catalogs. One can imagine sites out there that host catalogs, for everyone's convenience, and one can connect one's loader to one of these; build code that uses these catalogs but overrides a couple of entries; whatever.
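The catalog contract can be sketched in a few lines of plain JavaScript. `makeCatalog`, `makeLoader` and the override URL "my-mirror.example/jquery.js" are hypothetical and exist only for this illustration. The sketch assumes a literal table, which is just one of the lookup strategies mentioned above.

```javascript
// Hypothetical catalog: maps canonical names (cURLs) to at most one
// resource, optionally deferring to another catalog for missing keys.
function makeCatalog(entries, fallback = null) {
  const table = new Map(Object.entries(entries));
  return {
    locate(curl) {
      if (table.has(curl)) return table.get(curl);
      if (fallback !== null) return fallback.locate(curl);
      throw new Error("no resource for cURL: " + curl); // fail fast
    },
  };
}

// Hypothetical loader: every module inside it resolves through one catalog.
function makeLoader(catalog) {
  return { resolve: (curl) => catalog.locate(curl) };
}

const base = makeCatalog({
  "yahoo.com/frameworks/YUI": "http://developer.yahoo.com/yui/3/download/yui-min.js",
  "jquery.com/jquery": "code.jquery.com/jquery-1.4.2.min.js",
});
const loader = makeLoader(base);

// A module wanting a different mapping instantiates a new loader whose
// catalog overrides one entry and defers to the base for the rest.
const child = makeLoader(
  makeCatalog({ "jquery.com/jquery": "my-mirror.example/jquery.js" }, base)
);

console.log(loader.resolve("jquery.com/jquery"));
console.log(child.resolve("jquery.com/jquery"));
console.log(child.resolve("yahoo.com/frameworks/YUI"));
```

A lookup for any other cURL throws rather than letting the requesting module substitute its own answer.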

The scheme is safe against the naming "substitution" attack scenario. If I go rogue and decide to camp on the name "jquery.com/jquery" with my own crazy library, nothing prevents me -- but I have to go convince the authors of all the catalogs in the world to map:

jquery.com/jquery => retrieve "ihabsmalware.com/badjquery.js"

which is an unlikely scenario.

As for collections of modules, we can map cURLs to some notion of "packages", or we can design "entry point" JS files that pull in all the contents of their respective modules, or whatever. I'm sure we can come up with a design were we to accept this approach.

 *   *   *   *   *

The following are the remaining questions in my mind:

  1. How much should we standardize? To my mind, the use of cURLs simply provides a plausibility argument for having "globally recognizable" keys in catalogs, but should not be mandated.

  2. How does this support the casual development model where a developer writes up some module but does not attempt (or does not have the resources) to choose a cURL for it? How can this module be used in a simple manner? Clearly, if this module is to be used in a loader, and if it wishes to be treated as a singleton, the loader should assign it a name unique within that loader. How is that done? By using the exact location URL for that library as the library's cURL? That means we are doing "url equality" -- an evil. But at least we are doing it in a controlled manner....

  3. What is the syntax for importing? I imagine something like a library using jQuery saying:

    from "jquery.com/jquery" import { ... };

Ihab

# David Herman (15 years ago)

Thanks for your thoughts on this. I'll just respond to what I understand to be your main points.

The problem is that, in the space of module names, the current Simple Modules strawman introduces a hazard of inadvertent name capture

Yes, implicit linking means that you can mis-link. In my mind, the questions are how much of a hazard this really is, how much architecture it would require imposing on programmers to address it, and what we lose if we do that.

Many of your suggested alternatives involve explicit linking. Explicit linking certainly has its appeal: it lets you really say exactly what you mean. But there are massive differences in the convenience of explicit and implicit linking systems. Years of PL research and experience have demonstrated that explicit linking tends to be unwieldy and inconvenient.

The author cannot prevent this capture by controlling the environment of "two.js", nor can they foresee all possible environments such as "one.js" in which they may be embedded.

This is true. Now, it's true regardless of whether linking is implicit or explicit. Either way, if the interface of the library changes in the wild to add another dependency, its clients will likely break. The difference is that with explicit linking it will necessarily fail whereas with implicit linking, it might a) succeed, b) fail by not receiving a module binding, or c) fail unpredictably by receiving the wrong binding. You said as much, I'm just calling out the fact that no module system can change the fact that code changes and programmers must deal with versioning.

I claim we must fail fast.

Failing fast would be nice, but I'm not convinced it's a necessity. We are not trying to solve all versioning problems ever. People can easily add version information to their modules with whatever protocols they like, and we don't need to enforce them. There are plenty of over-engineered library systems in use today, with crypto hashes and byzantine manifest formats etc etc, and they're a nightmare for programmers.

module One = load 'one.js' with {Drawing, jQuery};

This is just the kind of thing that looks nice with a single example, but as soon as you put it into practice it gets out of control. Watch what happens with even a trivial cyclic dependency:

module Even = load 'even.js' with { Even: Even, Odd: Odd };
module Odd = load 'odd.js' with { Even: Even, Odd: Odd };

Now imagine what happens to the combinatorics as your program size increases.

So then you either go down the road of trying to build a more expressive language for wiring together the module graph (that way madness lies), or you fall back to first-class modules-as-objects and programmers have to wire together the module graph by mutating objects.

In my experience, explicit linking is the better-is-better solution that makes programmers' lives harder for not enough gain.

== Solution 3: Forming a more generative Union ==

I didn't understand all this, but eliminating side effects in modules is not going to change the fact that when you load different bits, you get a different module. Solutions involving canonicalization are either going to be too brittle (e.g., trust the user to provide a single, stable set of bits for a given canonical name) or too clunky (e.g., crypto-hashes, a total non-starter).

# Brendan Eich (15 years ago)

On May 27, 2010, at 4:19 PM, David Herman wrote:

The problem is that, in the space of module names, the current Simple Modules strawman introduces a hazard of inadvertent name capture

Yes, implicit linking means that you can mis-link. In my mind, the
questions are how much of a hazard this really is, how much
architecture it would require imposing on programmers to address it,
and what we lose if we do that.

Many of your suggested alternatives involve explicit linking.
Explicit linking certainly has its appeal: it lets you really say
exactly what you mean. But there are massive differences in the
convenience of explicit and implicit linking systems. Years of PL
research and experience have demonstrated that explicit linking
tends to be unwieldy and inconvenient.

Last I checked, CommonJS (ignoring packages, which NodeJS and others
are avoiding) uses implicit linking via the filesystem.

This is extremely common -- see Python and many other languages. It is
so convenient that any explicit-linking system delivered in an Ecma
de-jure standard will beget implicit linking systems built on top (like
CommonJS's) in the wild, imposing costs that we could head off by
standardizing implicit linking. Which simple modules proposes.

Hazards involve trade-offs. There's no risk-free solution. It is hard
to prove decisively that implicit linking won't lead to some bad name
dependency bug down the road -- it could. But I argue that it's
inevitable in the wild if we try to impose explicit linking, so we
should not standardize around it.

Rather, we should try to standardize a module system people will use
well, which restores lexical scoping all the way up (ridding us of the
global object), which improves integrity (const exports, shallowly
frozen if functions, etc.), and which does not instantly beget another
module system (or non-standard systems) on its back for want of
implicit-linking convenience.

# ihab.awad at gmail.com (15 years ago)

Sorry for the slow reply -- was sick....

On Thu, May 27, 2010 at 4:19 PM, David Herman <dherman at mozilla.com> wrote:

Years of PL research and experience have demonstrated that explicit linking tends to be unwieldy and inconvenient.

That needs to be added to my reading list. Cite away! :)

People can easily add version information to their modules with whatever protocols they like, and we don't need to enforce them. ...

People are already creating module systems with versioning information (see CommonJS). We need to make the world safe for them.

module Even = load 'even.js' with { Even: Even, Odd: Odd };
module Odd = load 'odd.js' with { Even: Even, Odd: Odd };

With concise object literals, would that not be:

module Even = load 'even.js' with { Odd };
module Odd = load 'odd.js' with { Even };

In my experience, explicit linking is the better-is-better solution that makes programmers' lives harder for not enough gain.

But didn't you hear? Worse is also worse:

dreamsongs.com/Files/worse-is-worse.pdf

# ihab.awad at gmail.com (15 years ago)

On Thu, May 27, 2010 at 6:57 PM, Brendan Eich <brendan at mozilla.com> wrote:

Last I checked, CommonJS (ignoring packages, which NodeJS and others are avoiding) uses implicit linking via the filesystem.

CommonJS is right now working on much the same issues: what does it mean for two modules to be the "same"? There is an active discussion, and a variety of implemented and in-process specs including:

wiki.commonjs.org/wiki/Packages/Mappings/B

The common desire is to be able to grant a package (of modules) autonomy regarding what it depends upon. This is by no means a filesystem-only approach, and the "implicit"-ness of the linking is debatable given that it is driven by pretty extensive metadata.

Now, a bit of background. In CommonJS, the expression:

require('foo/bar/baz')

means, "find the module 'foo/bar/baz' in whatever package mappings may exist; instantiate a singleton in the current sandbox; and return the singleton". A CommonJS sandbox is equivalent to Sam and Dave's "loader".
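To make the "singleton per sandbox" reading concrete, here is a small sketch in plain JavaScript. `makeSandbox` and the factory table are hypothetical; this is not the CommonJS or NodeJS API. It only assumes that each sandbox caches one instance per module identifier.

```javascript
// Hypothetical sandbox: one module instance per identifier per sandbox.
function makeSandbox(factories) {
  const instances = new Map();
  return function sandboxRequire(id) {
    if (!instances.has(id)) {
      if (!Object.prototype.hasOwnProperty.call(factories, id)) {
        throw new Error("cannot find module: " + id);
      }
      instances.set(id, factories[id]()); // instantiate on first use
    }
    return instances.get(id);
  };
}

const factories = { "foo/bar/baz": () => ({ hits: 0 }) };
const requireA = makeSandbox(factories);
const requireB = makeSandbox(factories);

requireA("foo/bar/baz").hits += 1;
console.log(requireA("foo/bar/baz").hits); // prints 1
console.log(requireB("foo/bar/baz").hits); // prints 0
```

Within one sandbox every caller sees the same instance; a second sandbox built from the same code has its own.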

With that in mind, NodeJS does not, at this point, implement an API for building a new sandbox/loader:

nodejs.org/api.html

This means that the singleton of 'foo/bar/baz' is a singleton in an entire OS process. They are doing just what Python has done for a long time, based on filesystems, and are not trying to deal with the problem of multiple loaders connected by object references.

This is extremely common -- see Python and many other languages.

Yes, and Python and these other languages pull modules out of a centrally curated PATH of some sort wherein 'foo/bar/baz' has an unambiguous meaning to all other modules "installed" on the "system". We are tasked with creating a module system for a world in which these initial conditions do not apply.

It is so convenient that any explicit-linking system delivered in an Ecma de-jure standard will beget implicit linking systems built on top (like CommonJS's) in the wild, imposing costs that we could head off by standardizing implicit linking. Which simple modules proposes.

At this point, it's not clear where CommonJS is going to end up after navel-gazing over the problem of distributed module management. It is clear that they have not arrived at the current "simple modules" design. Pace the definition of implicit vs. explicit linking, I think the problem deserves some further thought. I'm on the hook to provide a clear restatement of my Solution 2 (which, I repeat, may not solve the problem but will I hope clarify some issues), and I have every intention of delivering. :)

Ihab

# ihab.awad at gmail.com (15 years ago)

As promised, more detail on Solution 2.

Reviewing the problematic case:

// zero.js:
module jQuery = load 'jquery.js';
module Drawing = load 'footgun.js';
module One = load 'one.js';

// one.js:
import jQuery.ajax;
module Two = load 'two.js';

// two.js:
import jQuery.ajax;
import Drawing.draw;
draw(); // intended to draw a picture

There are dual formulations of the problem:

(a) The author of "one.js" is not given the opportunity to intermediate the name propagation -- they can neither control the names introduced by "zero.js", nor hide them from the view of "two.js"; or

(b) The names used in "import" statements are short, local nicknames, admitting of diverse interpretations as to their semantics.

This specific solution to the problem attempts to show the consequences of approach (b). In other words, if we got everyone to "import" via names that have a low likelihood of collision, then modules "zero.js", "one.js" and "two.js" can, assuming good intentions (our scheme is not intended to guard against malice), cooperate to populate that namespace.

We can construct long-winded names that are unlikely to collide, perhaps using DNS. Therefore, a module can:

import ... "com.drawings.art" ...;

or, to use XML namespace-like URIs-as-names:

import ... "http://art.drawings.com/module" ...;

The mapping from the uttered name to an actual code resource can occur in any of a number of places. Perhaps each loader has a single mapping. Perhaps each module can override the mappings for each module that it "load"s. The point is that these names are known to be distinct from:

import ... "com.guns.footshooting" ...;

The important thing is that we are now importing names, not just capturing program identifiers, and the semantics are those of a map lookup somewhere.

Ihab

# Waldemar Horwat (15 years ago)

ihab.awad at gmail.com wrote:

// one.js:
import jQuery.ajax;
module Two = load 'two.js';

// two.js:
import jQuery.ajax;

At the time that "one.js" was written, "two.js" did not contain a reference to Drawing. Now, unbeknownst to the author of "one.js", "two.js" changed and now refers to something named Drawing, which it expected to draw a picture:

// two.js:
import jQuery.ajax;
import Drawing.draw;
draw(); // intended to draw a picture

I don't understand your example of how this is supposed to work in the regular (non-accidental-aliasing) case. As you wrote in your example, two.js evolves to reference the identifier "Drawing" unbeknownst to one.js. There is no definition of it, so two.js wouldn't work at all.

Waldemar

# ihab.awad at gmail.com (15 years ago)

On Tue, Jun 1, 2010 at 6:26 PM, Waldemar Horwat <waldemar at google.com> wrote:

I don't understand your example of how this is supposed to work in the regular (non-accidental-aliasing) case. As you wrote in your example, two.js evolves to reference the identifier "Drawing" unbeknownst to one.js. There is no definition of it, so two.js wouldn't work at all.

[ I hope I understand your question. ]

In my original example, "zero.js" defined "Drawing". According to the current proposal, this would be propagated down to "two.js".

Does that help? Or do I misunderstand?

Ihab

# David Herman (15 years ago)

Sorry for the slow reply -- was sick....

No worries-- hope you're feeling better.

Years of PL research and experience have demonstrated that explicit linking tends to be unwieldy and inconvenient.

That needs to be added to my reading list. Cite away! :)

ML is dead; what more evidence do you need? ;)

Really, though, the research literature on modules is enormous. I don't have the time or inclination to provide a full bibliography. Personally, I've worked with several advanced, explicitly-linked module systems, including ML functors and PLT Scheme units.

With concise object literals, would that not be:

module Even = load 'even.js' with { Odd };
module Odd = load 'odd.js' with { Even };

Possibly, depending on whether you want to present modules to themselves as well.

But really, I've seen it before: these kinds of specification languages for module graphs spin out of control. You'll wish you had the ability to abstract the thing on the RHS of "with" -- and then you'll have to introduce the complexity of compile-time bindings of module graphs, and figure out how to shoe-horn those into the existing syntax and semantics. Or, you'll hold the line and force programmers to keep writing out the full module graph over and over again, in which case they just won't ever use modules at all.

But seriously: I am not necessarily suggesting explicit linking (however defined). I am pointing out the necessary consequences of a dangerous design that promises more than it can deliver.

You've not demonstrated that.

# ihab.awad at gmail.com (15 years ago)

We are having two discussions here:

  • Discussion of the relative merits of explicit linking in its various forms; and

  • Discussion of the specifics of the current proposal for implicit linking, and alternatives holding fixed the initial condition that implicit linking is a desideratum.

I'm mostly interested in the second discussion, but I do not wish to let slip by some important distinctions regarding the first. As such, I will respond to the first here, and to the second in a separate email.

On Wed, Jun 2, 2010 at 10:38 AM, David Herman <dherman at mozilla.com> wrote:

I don't have the time or inclination to provide a full bibliography.

I consider your argument withdrawn, then.

Personally, I've worked with several advanced, explicitly-linked module systems, including ML functors and PLT Scheme units.

I'd be interested to hear more about your experience.

module Even = load 'even.js' with { Odd };
module Odd = load 'odd.js' with { Even };

Possibly, depending on whether you want to present modules to themselves as well.

As I believe we discussed in our most recent f2f, it is possible to provide modular code with access to its own reified module instance via some distinguished symbol (e.g., "this" at the top level). And of course, modular code always has direct individual access to its own exports.

As recast, therefore, the example introduces Odd to "even.js" and Even to "odd.js". It's pretty minimal.
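A minimal sketch of what that recast linkage might look like in present-day JavaScript, for readers who want something concrete. The factory functions `evenModule` and `oddModule` and the two imports objects are hypothetical stand-ins for "even.js" and "odd.js"; nothing here is part of the strawman syntax. Each module receives only the one name it was given, and the imports are filled in after both instances exist so the cycle can close.

```javascript
// Hypothetical stand-ins for "even.js" and "odd.js": each factory receives
// its imports explicitly instead of finding them in an enclosing scope.
function evenModule(imports) {
  return {
    isEven: function (n) { return n === 0 ? true : imports.Odd.isOdd(n - 1); }
  };
}

function oddModule(imports) {
  return {
    isOdd: function (n) { return n === 0 ? false : imports.Even.isEven(n - 1); }
  };
}

// The linking code: the only place that decides what each name refers to.
const evenImports = {};
const oddImports = {};
const Even = evenModule(evenImports);
const Odd = oddModule(oddImports);
evenImports.Odd = Odd;   // introduce Odd to "even.js"
oddImports.Even = Even;  // introduce Even to "odd.js"

console.log(Even.isEven(10)); // true
console.log(Odd.isOdd(7));    // true
```

The sketch only handles non-negative integers; it is meant to show the wiring, not a useful parity test.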

Ihab

# David Herman (15 years ago)

I don't have the time or inclination to provide a full bibliography.

I consider your argument withdrawn, then.

Excuse me? My argument is not "withdrawn" (are we in court?). If you are unaware of decades of prior art on modules, that's not my failing but yours.

My argument was and remains that others have gone down that road, and it's still very much an open research topic how to create module systems that provide the generality of explicit linking with the convenience of implicit linking. See e.g. Derek Dreyer's work, starting with his thesis and continuing to this day.

Possibly, depending on whether you want to present modules to themselves as well.

As I believe we discussed in our most recent f2f, it is possible to provide modular code with access to its own reified module instance via some distinguished symbol (e.g., "this" at the top level). And of course, modular code always has direct individual access to its own exports.

Hence "depending."

As recast, therefore, the example introduces Odd to "even.js" and Even to "odd.js". It's pretty minimal.

And yet it's still too expensive. No one will take the step from non-module code to module code. They just won't. Besides, a not-quite-so-bad example of the Odd and Even modules is pretty weak tea.

The point is, you can special-case "this" if you want, but if you have a module graph of N modules, and each needs to be explicitly linked with N - 1 other modules, then you impose a quadratic code-size requirement on programmers. Unless, as I said, you beef up your linking-specification language.
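To make the quadratic growth concrete, here is a rough sketch that writes out the link declarations for the worst case described above, where every module is linked with every other. The helpers `explicitLinks`, `render`, and `importCount` are invented for illustration, and the output merely imitates the `load ... with { ... }` lines quoted earlier in the thread.

```javascript
// For each module, list every other module as an explicit import.
function explicitLinks(names) {
  return names.map(function (name) {
    return {
      name: name,
      imports: names.filter(function (other) { return other !== name; })
    };
  });
}

// Render one link in the style of the "load ... with { ... }" lines above.
function render(link) {
  return "module " + link.name + " = load '" + link.name.toLowerCase() +
    ".js' with { " + link.imports.join(", ") + " };";
}

// Total number of names a programmer must write inside the "with" clauses.
function importCount(links) {
  return links.reduce(function (total, link) {
    return total + link.imports.length;
  }, 0);
}

const links = explicitLinks(["Even", "Odd", "Prime"]);
links.forEach(function (link) { console.log(render(link)); });
console.log(importCount(links)); // 6, i.e. N * (N - 1) for N = 3
```

With ten fully connected modules the same count is 90, which is the growth being objected to.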

# Kris Kowal (15 years ago)

On Wed, Jun 2, 2010 at 12:14 PM, David Herman <dherman at mozilla.com> wrote:

but if you have a module graph of N modules, and each needs to be explicitly linked with N - 1 other modules, then you impose a quadratic code-size requirement on programmers. Unless, as I said, you beef up your linking-specification language.

I agree that requiring explicit linking is a non-starter. I do however favor the option of explicit linking at some level of granularity. At reasonable expense, Narwhal provides several layers at which someone can buy into explicit linking:

  • by manually instantiating a module using the module constructor proffered by the loader.

    var module = require.loader.load(id); module(freeVariables);

  • by manually instantiating a module using a facility of the sandbox that provides the import and export facilities but (a) does not memoize the module and (b) permits additional free variables to be injected. This is useful for creating module-enhanced "DSLs" that permit scripts designed for QUnit or Bogart to be migrated without alteration, subverting their use of global variables with explicitly injected free variables.

    require.once(id, freeVariables);

  • by manually instantiating a system of modules with a prepopulated memo of module instances.

    var SANDBOX = require("narwhal/sandbox");
    // EVEN is an existing module instance placed in the new sandbox's memo.
    var subRequire = SANDBOX.Sandbox({ "modules": { "even": EVEN } });
    var ODD = subRequire("odd");

I would invoke the axiom, "Simple should be easy, powerful should be possible". It's reasonable to pay for what you get. At the risk of misrepresenting their views, Ihab and Mark have argued that people should always use explicit linking for a variety of reasons, but I for one agree that implicit linking should be the norm, and explicit linking can at least be deferred to the layer of "packages", or coherently designed sets of modules linking to other coherently designed sets of modules. I presume that it is possible to isolate and explicitly link groups of modules.

Kris Kowal

# Brendan Eich (15 years ago)

On Jun 1, 2010, at 5:23 PM, ihab.awad at gmail.com wrote:

This is extremely common -- see Python and many other languages.

Yes, and Python and these other languages pull modules out of a
centrally curated PATH of some sort wherein 'foo/bar/baz' has an
unambiguous meaning to all other modules "installed" on the
"system". We are tasked with creating a module system for a world in
which these initial conditions do not apply.

"We are tasked" is a bit much. More important: you don't define the
novel initial conditions, but I'm guessing you mean the Web, which is
of course not centrally curated.

If so, then my response is to dispute your premise and not swallow
whatever conclusion you think follows from it.

It's not clear at all that the Harmony module system has so different
a set of constraints from those facing NodeJS and other users of
CommonJS modules-and-not-packages. Client-side programmers do not
include random modules from uncontrolled domains -- not in production
pages and web apps. Let's dig into this a bit.

I'm not suggesting that we use the web server's doc-tree (filesystem)
for implicit linking, only that some embeddings of the language, in
particular Node, seem to want exactly that. From where Node sits on
the server side, the nearby parts of filesystem are sufficiently well-curated ("centrally" or not). TC39 is trying to take non-browser
embedding use-cases into account.

To turn to the browser embedding, at least these questions seem to be
raised by analogy to the server-side situation:

  1. Is the URL space used by a web app not curated well (centrally or
    otherwise), and somehow fatally unreliable?

  2. Is the lexical binding space, a tree of scopes with no object
    aliasing badness, which simple modules proposes programmers can create
    by writing module declarations in <script type="harmony"> tags,
    inherently not well-curated, unlike the case of the server-side
    filesystem for a Node app?

I contend that the answers are "no" and "no".

Any web app author has to control URLs and provide the needed
resources, whether they use code.google.com/apis/ajaxlibs or
their own hosted/edge-cached copies of modules.

For in-language module naming (as opposed to URL naming), the simple
modules proposal lets the author of the web page compose modules in
lexical scopes (and only lexical scopes), naming modules with consumer-chosen identifiers and protecting inner bindings within explicit outer
modules if needed.

We're not considering mutual suspicion. However, the usual rules for
production web apps apply: URL provisioning means you don't just
include some uncontrolled version of a module in a production app.

Therefore if there is a good filesystem/PATH curator -- or set of
curators cooperating -- in the Node server-side case, then there's a
good curator or set of curators cooperating in the client-side page or
web app content and the set of modules loaded by that content.

It's true that a module loaded later in a page, app, or other module
may use a free name that ends up bound at top level by the module
loader (and only at top level -- there's no injection possible in the
middle of a module). But this aside, below the top level lexical
scope, all naming is under control of whoever curates the module
source at that level.

The extensibility of the simple modules top-level lexical environment
under the default module loader is neither a fatal flaw ("unhygienic
capture") nor an unalloyed good. On the plus side, it avoids any new,
secondary naming system, package indirection, brittle reverse-DNS
convention, or explicit-linking configuration language. This is a big
plus for most programmers.

At this point, it's not clear where CommonJS is going to end up
after navel-gazing over the problem of distributed module
management. It is clear that they have not arrived at the current
"simple modules" design.

The "require" system is pretty close, but lacking new syntax to help
static analysis -- or really to support second-class modules properly.
Simple modules fills this gap by extending syntax.

Pace the definition of implicit vs. explicit linking, I think the
problem deserves some further thought. I'm on the hook to provide a
clear restatement of my Solution 2 (which, I repeat, may not solve
the problem but will I hope clarify some issues), and I have every
intention of delivering. :)

Ok, but this is a discussion list, and discussion can go on and on. In
TC39, we've been going over first-class module ideas for about 18
months. Until simple modules were proposed this year, we weren't
getting anywhere quickly.

TC39 is not going to wait for novel research. The simple modules
strawman is heading toward prototype implementation and
harmony:proposals status. So some evolution of it is likely to be in
the next edition.

# Kam Kasravi (15 years ago)

By explicit linking, are we talking about mechanisms to unambiguously reference modules that may otherwise be ambiguous? For example, in Java, if 'Node' could refer to a Node class in several different packages, the language allows one to fully qualify the Node class, e.g., foo.Node. Just want to make sure I'm clear on the distinction between explicit and implicit linking.

Thx kam

# David Herman (15 years ago)

At some point we switched from "internal vs external" to "implicit vs explicit" (might've been me) but it's the same basic idea. Here are some handy definitions from Owens and Flatt '06 [1]:

"Internal linking supports definite reference to a specific implementation of an interface. Internally linked module systems are common; examples include Java's packages and classes, ML's structures, and Haskell's modules. In each of these systems, two modules are linked when one directly mentions the name of another, either with a dotted path (ModuleName.x), or import statement (import ModuleName).

External linking supports parameterized reference to an arbitrary implementation of an interface. ML's module system includes a functor construct that supports external linking. A functor declares an interface over which its definitions are parameterized; the parameterization is resolved outside of the functor..."

In other words: in an internal linking system, each module names -- directly within its body -- the specific modules that implement its dependencies, and all the modules are wired together automatically by the linking system. In an external linking system, a module lists only its dependencies by name and/or interface, and some external linking code, written by a programmer, chooses the particular implementations of those interfaces by explicitly wiring up the module graph.
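As a rough illustration of the two definitions, the sketch below writes the same tiny module both ways in plain JavaScript. The names `consoleLogger`, `quietLogger`, `internalGreeter`, and `makeGreeter` are invented for this example; a function over its dependencies is used here only as an approximation of an ML-style functor.

```javascript
// Two hypothetical implementations of the same "logger" interface.
const consoleLogger = { log: function (msg) { return "console: " + msg; } };
const quietLogger = { log: function (msg) { return "quiet: " + msg; } };

// Internal linking: the module body names the specific implementation it uses.
const internalGreeter = (function () {
  const logger = consoleLogger; // definite reference, fixed inside the body
  return { greet: function (name) { return logger.log("hello " + name); } };
})();

// External linking: the module is parameterized over any logger.
function makeGreeter(logger) {
  return { greet: function (name) { return logger.log("hello " + name); } };
}

// Linking code outside the module chooses the implementation.
const externalGreeter = makeGreeter(quietLogger);

console.log(internalGreeter.greet("world")); // console: hello world
console.log(externalGreeter.greet("world")); // quiet: hello world
```

Swapping the logger requires editing the body of `internalGreeter`, but only the call site of `makeGreeter`.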

Dave

[1] www.cl.cam.ac.uk/~so294/documents/icfp06.pdf

# ihab.awad at gmail.com (15 years ago)

On Wed, Jun 2, 2010 at 12:14 PM, David Herman <dherman at mozilla.com> wrote:

See e.g. Derek Dreyer's work, starting with his thesis and continuing to this day.

Aha. Finally, something vaguely resembling a citation. I will look at this. Thank you.

Ihab

# Waldemar Horwat (15 years ago)

ihab.awad at gmail.com wrote:

On Tue, Jun 1, 2010 at 6:26 PM, Waldemar Horwat <waldemar at google.com> wrote:

I don't understand your example of how this is supposed to work in
the regular (non-accidental-aliasing) case.  As you wrote in your
example, two.js evolves to reference the identifier "Drawing"
unbeknownst to one.js.  There is no definition of it, so two.js
wouldn't work at all.

[ I hope I understand your question. ]

In my original example, "zero.js" defined "Drawing". According to the current proposal, this would be propagated down to "two.js".

Does that help?

No. In one sentence you wrote that two.js changed to require its invoker to provide a Drawing API; in another you wrote that two.js did not tell its invoker, one.js, to provide a Drawing API. The combination of the two is meaningless.

Waldemar