Modules: Name capture

# ihab.awad at gmail.com (15 years ago)

Hi folks,

As promised, this is the first major issue I wish to raise regarding
the Simple Modules strawman. This point was first brought to our
collective attention by my colleague, Jasvir Nagra.

I'm describing this from scratch here, even though this came up
previously on the list, to help anyone who has not been following the
previous thread in detail get a handle on the issues. I've divided it
into sections to avoid tl;dr for people who are already up to speed.

== Motivating factors we agree upon ==

The Web is large, and full of code. For any given corpus, there may be
millions of copies on various people's servers; hundreds of forks on
GitHub; and dozens of "officially supported" versions on the Website
of the maintainers. This presents a distributed version matching and
*naming* problem quite unlike any other we have encountered so far in
software development. Software is now created as casually as Web
pages.

All of us who are interested in the module problem seem to agree (and
please correct me if I am mistaken in this or any other collective
claim) that even URL string equality is not a good metric of whether
the resource found at that URL is "the same": on the one hand, the
stuff retrieved via a URL can change from one moment to the next; and
on the other hand, distinct URLs may validly refer to bitwise
identical code.

All of us also seem to agree that we cannot brush this problem under
the rug completely: we cannot *completely* relegate it to off-spec
"configuration management" or "build process". When one is composing
software from various independent sources, this software must
sometimes make demands for loading things that the parent loader did
not even know about. Yet the parent loader must sometimes provide
modules to what is being loaded for convenience and unification. So,
if I write an application that loads modules M1 and M2:

* I may want to provide a common module jQuery to M1 and M2; but also

* In order to do its work, M1 may want to fetch some module Z of which
I have never heard.

To put it another way, modules on the Web must be able to wire
themselves, and compute, over the Web as it is, rather than over the
small set of software that happens to be "installed" in (say) the
"/usr/lib" directory of the local system. In Python, I can simply
"import smtplib". On the Web, the question is, "which one"?

== Current Simple Modules solution ==

The Simple Modules strawman shows a rather clever solution to this
problem. It distributes the mappings -- from agreed-upon names to
gnarly URLs -- across the codebase in a fine-grained fashion. At the
most basic level, I can map the name "jQuery" to my chosen copy of the
jQuery library, and use it as below. Nested modules can choose their
own versions of (say) YUI, and use them as well.

  module jQuery = load "http://.../jquery-1.3.2.js";
  module Foo {
    module YUI = load "http://.../yui-3.1.1.js";
    module Bar {
      import jQuery.ajax;
      import YUI.Accordion;
      ajax(...); Accordion(...);
    }
  }

This also works if I know that some library simply uses the name
"jQuery", and I map that to my own chosen copy. In the following
example:

  // somelib.js
  import jQuery.ajax;
  ajax(...);

  // My code
  module jQuery = load "http://.../jquery-1.3.2.js";
  module Somelib = load "http://.../somelib.js";

the code in "somelib.js" will pick up the jQuery that I have defined
prior to the "load".

The clever part of this is, again, the decentralized name assignments.
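
A minimal sketch of that lookup rule, written in present-day JavaScript
rather than the strawman syntax. The helpers makeScope and resolve are
hypothetical names for this illustration only; the assumption is simply
that a module name is looked up in the innermost scope first and then in
each enclosing scope in turn.

```javascript
// Hypothetical model: a scope holds module-name bindings and a parent.
function makeScope(parent, bindings) {
  return { parent, bindings };
}

// Walk outward from the innermost scope until the name is found.
function resolve(scope, name) {
  for (let s = scope; s !== null; s = s.parent) {
    if (Object.prototype.hasOwnProperty.call(s.bindings, name)) {
      return s.bindings[name];
    }
  }
  throw new ReferenceError("unbound module name: " + name);
}

// Mirrors the nesting in the example: top level, then Foo, then Bar.
const topLevel = makeScope(null, { jQuery: "http://.../jquery-1.3.2.js" });
const foo = makeScope(topLevel, { YUI: "http://.../yui-3.1.1.js" });
const bar = makeScope(foo, {});

console.log(resolve(bar, "jQuery")); // found two levels out
console.log(resolve(bar, "YUI"));    // found one level out
```

Each scope contributes only the names it cares about, which is the
decentralized part; everything else is inherited from whoever encloses
it.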

== Pitfalls in the Simple Modules solution ==

The problem is that, in the space of module names, the current Simple
Modules strawman introduces a hazard of inadvertent name capture
(though it does clearly separate other names such as "var", "const"
and "function" declarations). An example is:

  // zero.js:
  module jQuery = load 'jquery.js';
  module Drawing = load 'footgun.js';
  module One = load 'one.js';

  // one.js:
  import jQuery.ajax;
  module Two = load 'two.js';

  // two.js:
  import jQuery.ajax;

At the time that "one.js" was written, "two.js" did not contain a
reference to Drawing. Now, unbeknownst to the author of "one.js",
"two.js" changed and now refers to something named Drawing, which it
expected to draw a picture:

  // two.js:
  import jQuery.ajax;
  import Drawing.draw;
  draw(); // intended to draw a picture

Unfortunately, it is now actually using the gun library, and thus
breaks the correctness of the system.
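
The capture can be reproduced in a small simulation. This is a sketch
in present-day JavaScript, not the strawman's semantics verbatim: the
load function and the environment objects are hypothetical, and the
only assumption modeled is that a loaded file sees every module name
visible to the file that loaded it.

```javascript
// Hypothetical model: resolve a file's imports against whatever
// environment it inherits from its loader.
function load(imports, inheritedEnv) {
  const resolved = {};
  for (const name of imports) {
    if (!(name in inheritedEnv)) {
      throw new ReferenceError("unbound module name: " + name);
    }
    resolved[name] = inheritedEnv[name];
  }
  return resolved;
}

// zero.js binds both names. one.js binds nothing new, so its
// environment (and therefore that of two.js) inherits both.
const zeroEnv = Object.assign(Object.create(null), {
  jQuery: "jquery.js",
  Drawing: "footgun.js",
});
const oneEnv = Object.create(zeroEnv);

// two.js as originally written, and after it changed on the Web.
const twoBefore = load(["jQuery"], oneEnv);
const twoAfter = load(["jQuery", "Drawing"], oneEnv);

console.log("Drawing" in twoBefore); // false
console.log(twoAfter.Drawing);       // footgun.js
```

Nothing in one.js changed between the two loads, and no error was
raised; the new import silently bound to a name that zero.js chose for
an unrelated purpose.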

The key things to note here are:

1. Under *mutation* of the code of "two.js" on the Web, which we agree
will happen (see above regarding mistrust of URL equality), the author
of "one.js" has no defense against this accidental capture of the name
"Drawing". The author of "one.js" cannot prevent this capture by
controlling the environment of "two.js", nor can the author of
"two.js" foresee all possible environments, such as that of "one.js",
in which their code may be embedded.

2. The above is an easily seen (not to mention crippling...)
manifestation of this problem. More generally, in a large system with
many independently named versions of the same library floating around,
more subtle captures, with harder-to-debug problems, may occur.

3. Since we agree we cannot control "two.js" in this scenario, our
choice here is whether to fail subtly or to fail fast. I claim we must
fail fast.

There are some fail-fast solutions to this problem that come to mind
immediately. I encourage us to brainstorm others as well.

== Solution 1: Explicit sub-module environment ==

Whenever there is a transition from one compilation unit to the other
-- i.e., across any "load" -- we should explicitly specify the imports
the module is allowed to inherit. No other imports may cross the
"load" boundary. So, we would rewrite the example above as:

  // zero.js:
  module jQuery = load 'jquery.js';
  module Drawing = load 'footgun.js';
  module One = load 'one.js' with {Drawing, jQuery};

  // one.js:
  import jQuery.ajax;
  module Two = load 'two.js' with {jQuery};

  // two.js:
  import jQuery.ajax;
  import Drawing.draw; // => static error!!!
  draw();

Clearly, we could also support renaming at the boundary, like this:

  // zero.js:
  module JQ = load 'jquery.js';
  module Gun = load 'footgun.js';
  module One = load 'one.js' with {Drawing: Gun, jQuery: JQ};
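
A sketch of how such a boundary might behave, in present-day
JavaScript. The functions environmentFor and checkImports are
hypothetical, and the "with" clause is modeled as a plain object
mapping the name the child sees to the name in the parent. The
proposal calls for a static error; this sketch only approximates that
with a check made before the child would run.

```javascript
// Build the child's environment from an explicit mapping of
// child-visible name -> name in the parent's environment.
function environmentFor(parentEnv, withClause) {
  const childEnv = Object.create(null);
  for (const [childName, parentName] of Object.entries(withClause)) {
    if (!(parentName in parentEnv)) {
      throw new ReferenceError("cannot pass unbound name: " + parentName);
    }
    childEnv[childName] = parentEnv[parentName];
  }
  return Object.freeze(childEnv);
}

// Check a file's imports against its environment before running it.
function checkImports(file, imports, env) {
  for (const name of imports) {
    if (!(name in env)) {
      throw new ReferenceError(file + ": module name not in scope: " + name);
    }
  }
}

// zero.js, using the renamed bindings from the example above.
const zeroEnv = Object.assign(Object.create(null), {
  JQ: "jquery.js",
  Gun: "footgun.js",
});
const oneEnv = environmentFor(zeroEnv, { Drawing: "Gun", jQuery: "JQ" });
// one.js passes only jQuery across its own "load" boundary.
const twoEnv = environmentFor(oneEnv, { jQuery: "jQuery" });

checkImports("one.js", ["jQuery"], oneEnv); // passes

let failure = null;
try {
  checkImports("two.js", ["jQuery", "Drawing"], twoEnv);
} catch (e) {
  failure = e.message;
}
console.log(failure); // two.js: module name not in scope: Drawing
```

The changed "two.js" now fails at the boundary instead of picking up
whatever "Drawing" happens to mean two levels up.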

== Solution 2: Catalog per loader ==

A module loader represents a community of mutually independent module
instances which together form a coherent subsystem. If there is to be
any sharing of module instances, and absent URL equality, these
modules must be written with some knowledge of the "community" in
which they live.

Perhaps a good way to understand this is to make the analogy with
Linux distros. A distro effectively maps memorable, agreed-upon names
like "libmpeg2" to concrete software resources. There is nothing
keeping a dozen programmers in the wild from building a dozen
different things and calling them "libmpeg2" -- but, within the Linux
community, and specifically within a distro, there is only one
"libmpeg2". Programs declare their dependency on it, often with a
version identifier.

If we view modules in a loader in the same way, then each loader
should contain a *single* mapping -- call it a catalog -- from
memorable names to concrete resources (perhaps identified by URL).
Each catalog is an implicit community, or agreement point:

  * The catalog of "biology software" hosted at
http://biojavascript.org/catalog.json

  * The catalog of "canvas widgets" hosted at http://cwidgets.org/cat.json

  * A meta-catalog of "general stuff" hosted at http://allmyjs.org/root.json
        Perhaps there are distinct versions:
              http://allmyjs.org/distros/distro3.1.8.json
              http://allmyjs.org/distros/distro4.2.9.json

All modules in a loader would be expected to use the same catalog.
Catalogs could refer to one another, and so reuse one another's
bindings. This would allow all programmers to predict the effect of
using a new name. In the example of the modified "two.js" previously,
the statement:

  import Drawing.draw;

would work but, by referring to the same catalog, the authors of all
three of "zero.js", "one.js" and "two.js" would have agreed that the
symbol Drawing stands for something that operates a footgun and treat
it with due care. The catalog would contain an entry like:

  {
    Drawing: "http://.../footgun.js",
    ...
  }

That all said, I don't yet have a good solution for how a module would
declare that it is defined relative to a specific catalog. It's not
enough to add an extra argument -- the catalog -- to the constructor
of a module loader. The individual modules would have to specify their
dependency on the catalog. This requires extra syntax and machinery. I
am loath to add either.
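
Leaving aside the open question of how a module names its catalog, the
lookup side can be sketched in present-day JavaScript. Everything here
is an assumption made for illustration: the catalog shape (own bindings
plus a list of catalogs it refers to), the helpers makeCatalog, lookup
and makeLoader, and the choice that a catalog's own bindings take
precedence over those it refers to.

```javascript
// Hypothetical catalog shape: own bindings plus referenced catalogs.
function makeCatalog(bindings, parents = []) {
  return { bindings, parents };
}

// Own bindings win; otherwise search referenced catalogs in order.
function lookup(catalog, name, seen = new Set()) {
  if (seen.has(catalog)) return undefined; // guard against cycles
  seen.add(catalog);
  if (Object.prototype.hasOwnProperty.call(catalog.bindings, name)) {
    return catalog.bindings[name];
  }
  for (const parent of catalog.parents) {
    const found = lookup(parent, name, seen);
    if (found !== undefined) return found;
  }
  return undefined;
}

// One catalog per loader: every module in the loader resolves
// names through the same mapping.
function makeLoader(catalog) {
  return {
    resolve(name) {
      const url = lookup(catalog, name);
      if (url === undefined) {
        throw new ReferenceError("not in catalog: " + name);
      }
      return url;
    },
  };
}

const general = makeCatalog({ jQuery: "http://.../jquery-1.3.2.js" });
const community = makeCatalog({ Drawing: "http://.../footgun.js" }, [general]);
const loader = makeLoader(community);

console.log(loader.resolve("Drawing")); // http://.../footgun.js
console.log(loader.resolve("jQuery"));  // http://.../jquery-1.3.2.js
```

Because the mapping lives in one place per loader, what "Drawing"
means can be read off the catalog rather than inferred from the chain
of loads.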

== Solution 3: Forming a more generative Union ==

Going back to basics, let's ask ourselves: Why is this such a problem
in the first place? In other words, why is it so important that
"one.js" and "two.js" use the *same* jQuery and Drawing modules? There
are a variety of answers, including:

* Performance optimization: The code can be shared and memory saved.

* Shared state: Each module contains important shared state (such as
cached information about the DOM, or off-screen bitmaps used to
double-buffer a <canvas> widget module) which its clients need to
share in order to properly collaborate according to the module's
rules.

* Programmer familiarity: Programmers are accustomed to dividing up
their program into sections, each of which is logically a singleton in
"the system", and establishing communication between them.

What if we assumed that the result of "importing" a module is just the
code of the module, ready to be instantiated with external state? In
low-level terms, a module represents a "code segment" in memory which
may be shared between its instantiations.

The effect of this choice is that dependencies between software are
written using direct object passing, rather than attempts to denote
the "same" module. Each module specifies as free variables the objects
it requires from its caller and, when it loads another module, it does
so with no expectation that what it gets is shared with anyone else.
So "two.js" could start out like:

  // two.js
  /** @require jQuery a jQuery instance */
  jQuery.ajax();

and could move to:

  // two.js
  /** @require jQuery a jQuery instance */
  module Drawing = load "http://.../picturesOfCats.js" with {
    jQuery: jQuery
  };
  jQuery.ajax();
  Drawing.draw();

The difference between this and Solution 1 (and the original Simple
Modules strawman) is that there are no promises made that are not
explicitly wired. With Solution 1, there is still a lack of clarity
about what a "singleton" module represents -- depending on how modules
are redefined down the chain of loadings, some singletons are more
single than others. With this solution, only object APIs may define
the behavior expected, and nothing is naturally expected to be a
singleton.
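
One way to picture this in present-day JavaScript is a module as a
factory function: the function is the shareable "code segment", and
each call is an instantiation wired with whatever objects the caller
passes in. The names drawingModule and makeJQuery are hypothetical
stand-ins, not real library APIs.

```javascript
// Hypothetical stand-in for the code of a drawing module: a factory
// that receives its dependencies and returns a fresh instance.
function drawingModule({ jQuery }) {
  let drawCount = 0; // per-instance state, not shared between instances
  return {
    draw() {
      drawCount += 1;
      jQuery.ajax("picture");
      return drawCount;
    },
  };
}

// Hypothetical jQuery-like object that records the requests it gets.
function makeJQuery() {
  const requests = [];
  return {
    ajax(what) {
      requests.push(what);
    },
    requests,
  };
}

// Sharing happens only because the same object is passed explicitly.
const sharedJQuery = makeJQuery();
const first = drawingModule({ jQuery: sharedJQuery });
const second = drawingModule({ jQuery: sharedJQuery });

first.draw();
first.draw();
console.log(second.draw());                // 1
console.log(sharedJQuery.requests.length); // 3
```

The two instances keep separate counters, so neither is a singleton,
yet they collaborate through the one jQuery object both were handed.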

To put it another way, if the original Simple Modules and Solution 1
are taken to their logical conclusion, one must assume, because
modules may be redefined along the loading chain, that nothing is
*really* a singleton. If one is to code defensively against this
situation anyway, why not make this the default and gain the attendant
simplicity?

Let's revisit the points brought up earlier:

* Performance optimization: This is up to the implementation. For
example, a perfectly reasonable implementation can do a HEAD request
for every URL loaded and, if it detects no change, reuse the code.

* Shared state: Shared state is now represented explicitly using objects.

* Programmer familiarity: It's really not that bad. :) Programs
continue to be written with free variables, just as <script>s are.
Programmers learn to introduce concrete objects into the lexical scope
of their programs. And they write "export" statements to export
variables back.
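
The performance point can be sketched as a cache of module code that
is revalidated before each reuse. This is an illustration in
present-day JavaScript under loud assumptions: head and get are
injected, synchronous stand-ins for HTTP HEAD and GET (a real loader
would be asynchronous), and head is assumed to return some validator
string, such as an ETag, that changes whenever the resource does.

```javascript
// Hypothetical cache: reuse code when the validator is unchanged.
function makeCodeCache(head, get) {
  const cache = new Map(); // url -> { validator, code }
  return function fetchCode(url) {
    const validator = head(url);
    const entry = cache.get(url);
    if (entry && entry.validator === validator) {
      return entry.code; // unchanged: reuse the code already held
    }
    const code = get(url);
    cache.set(url, { validator, code });
    return code;
  };
}

// Fake server state for the demonstration.
const server = {
  "two.js": { validator: "v1", code: "import jQuery.ajax;" },
};
let gets = 0;
const fetchCode = makeCodeCache(
  (url) => server[url].validator,
  (url) => {
    gets += 1;
    return server[url].code;
  }
);

fetchCode("two.js");
fetchCode("two.js");
console.log(gets); // 1

// The file changes on the Web; the next load fetches the new code.
server["two.js"] = {
  validator: "v2",
  code: "import jQuery.ajax; import Drawing.draw;",
};
fetchCode("two.js");
console.log(gets); // 2
```

Only the code is shared here; each load would still instantiate it
afresh with the objects its caller supplies.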

To optimize this a bit, it is possible to introduce a concept of
"packages" (under active debate in CommonJS at the moment) to gather
things up. This improves, but does not intrinsically modify, the model
presented here.

== Afterword ==

Some of the solutions I present here are similar to proposals I have
already made. In all sincerity, I cannot help that. But, if there are
other solutions extant, I assure you that my brain is open. :)

Ihab

-- 
Ihab A.B. Awad, Palo Alto, CA

# Kam Kasravi (15 years ago)

Hi Ihab

This problem has many similarities to the XML / WSDL world.
The use of namespaces and versioning has been leveraged there to
disambiguate names which otherwise could be occluded by 
changing interfaces. That being said, I like your suggestion of 
'renaming at the boundary' where one could possibly 
disambiguate like names. Independent of versioning, 
which is how many large software systems guarantee 
integrity, we need something like namespaces or 
your 'boundary renaming' to avoid name collisions. 
Section 4.2.1 in XSchema describes import, include and 
redefine as mechanisms to allow composition of multiple 
schemas and is worth reading.

thanks
kam







________________________________
From: "ihab.awad at gmail.com" <ihab.awad at gmail.com>
To: es-discuss Steen <es-discuss at mozilla.org>
Sent: Wed, May 26, 2010 3:39:10 PM
Subject: Modules: Name capture

Hi folks,

As promised, this is the first major issue I wish to raise regarding
the Simple Modules strawman. This point was first brought to our
collective attention by my colleague, Jasvir Nagra.

I'm describing this from scratch here, even though this came up
previously on the list, to help anyone who has not been following the
previous thread in detail get a handle on the issues. I've divided it
into sections to avoid tl;dr for people who are already up to speed.

== Motivating factors we agree upon ==

The Web is large, and full of code. For any given corpus, there may be
millions of copies on various people's servers; hundreds of forks on
github; and dozens of "officially supported" versions on the Website
of the maintainers. This presents a distributed version matching and
*naming* problem quite unlike any other we have encountered so far in
software development. Software is now created as casually as Web
pages.

All of us who are interested in the module problem seem to agree (and
please correct me if I am mistaken in this or any other collective
claim) that even URL string equality is not a good metric of whether
the resource found at that URL is "the same": on the one hand, the
stuff retrieved via a URL can change from one moment to the next; and
on the other hand, distinct URLs may validly refer to bitwise
identical code.

All of us also seem to agree that we cannot brush this problem under
the rug completely: we cannot *completely* relegate it to off-spec
"configuration management" or "build process". When one is composing
software from various independent sources, this software must
sometimes make demands for loading things that the parent loader did
not even know about. Yet the parent loader must sometimes provide
modules to what is being loaded for convenience and unification. So,
if I write an application that loads modules M1 and M2:

* I may want to provide a common module jQuery to M1 and M2; but also

* In order to do its work, M1 may want to fetch some module Z of which
I have never heard.

To put it another way, modules on the Web must be able to wire
themselves, and compute, over the Web as it is, rather than over the
small set of software that happens to be "installed" in (say) the
"/usr/lib" directory of the local system. In Python, I can simply
"import smtplib". On the Web, the question is, "which one"?

== Current Simple Modules solution ==

The Simple Modules strawman shows a rather clever solution to this
problem. It distributes the mappings -- from agreed-upon names to
gnarly URLs -- across the codebase in a fine-grained fashion. At the
most basic level, I can map the name "jQuery" to my chosen copy of the
jQuery library, and use it as below. Nested modules can choose their
own versions of (say) YUI, and use them as well.

  module jQuery = load "http://.../jquery-1.3.2.js";
  module Foo {
    module YUI = load "http://.../yui-3.1.1.js";
    module Bar {
      import jQuery.ajax;
      import YUI.Accordion;
      ajax(...); Accordion(...);
    }
  }

This also works if I know that some library simply uses the name
"jQuery", and I map that to my own chosen copy. In the following
example:

  // somelib.js
  import jQuery.ajax;
  ajax(...);

  // My code
  module jQuery = load "http://.../jquery-1.3.2.js";
  module Somelib = load "http://.../somelib.js";

the code in "somelib.js" will pick up the jQuery that I have defined
prior to the "load".

The clever part of this is, again, the decentralized name assignments.

== Pitfalls in the Simple Modules solution ==

The problem is that, in the space of module names, the current Simple
Modules strawman introduces a hazard of inadvertent name capture
(though it does clearly separate other names such as "var", "const"
and "function" declarations). An example is:

  // zero.js:
  module jQuery = load 'jquery.js';
  module Drawing = load 'footgun.js';
  module One = load 'one.js';

  // one.js:
  import jQuery.ajax;
  module Two = load 'two.js';

  // two.js:
  import jQuery.ajax;

At the time that "one.js" was written, "two.js" did not contain a
reference to Drawing. Now, unbeknownst to the author of "one.js",
"two.js" changed and now refers to something named Drawing, which it
expected to draw a picture:

  // two.js:
  import jQuery.ajax;
  import Drawing.draw;
  draw(); // intended to draw a picture

Unfortunately, it is now actually using the gun library, and thus
breaks the correctness of the system.

The key things to note here are:

1. Under *mutation* of the code of "two.js" on the Web, which we agree
will happen (see above regarding mistrust of URL equality), the author
of "one.js" has no defense against this accidental capture of the name
"Drawing". The author cannot prevent this capture by controlling the
environment of "two.js", nor can they foresee all possible
environments such as "one.js" in which they may be embedded.

2. The above is an easily seen (not to mention crippling...)
manifestation of this problem. More generally, in a large system with
many independently named versions of the same library floating around,
more subtle captures, with harder-to-debug problems, may occur.

3. Since we agree we cannot control "two.js" in this scenario, our
choice here is whether to fail subtly or to fail fast. I claim we must
fail fast.

There are some fail-fast solutions to this problem that come to mind
immediately. I encourage us to brainstorm others as well.

== Solution 1: Explicit sub-module environment ==

Whenever there is a transition from one compilation unit to the other
-- i.e., across any "load" -- we should explicitly specify the imports
the module is allowed to inherit. No other imports may cross the
"load" boundary. So, we would rewrite the example above as:

  // zero.js:
  module jQuery = load 'jquery.js';
  module Drawing = load 'footgun.js';
  module One = load 'one.js' with {Drawing, jQuery};

  // one.js:
  import jQuery.ajax;
  module Two = load 'two.js' with {jQuery};

  // two.js:
  import jQuery.ajax;
  import Drawing.draw; // => static error!!!
  draw();

Clearly, we could also support renaming at the boundary, like this:

  // zero.js:
  module JQ = load 'jquery.js';
  module Gun = load 'footgun.js';
  module One = load 'one.js' with {Drawing: Gun, jQuery: JQ};
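
As a sanity check on the idea, the boundary rule can be modelled in plain JavaScript. This is a sketch under assumptions: `loadWith`, the `imports` list and the `body` function are hypothetical stand-ins for the proposed syntax, and the runtime `ReferenceError` stands in for what would be a static error.

```javascript
// Hypothetical model of `load ... with {...}`: a unit sees only the
// bindings explicitly granted to it, optionally renamed at the boundary.
function loadWith(unit, granted) {
  for (const name of unit.imports) {
    if (!Object.prototype.hasOwnProperty.call(granted, name)) {
      // Stands in for the static error in the example above.
      throw new ReferenceError(`${unit.name}: '${name}' was not granted`);
    }
  }
  return unit.body(Object.freeze({ ...granted }));
}

const JQ = { ajax: () => "ajax" };
const Gun = { draw: () => "bang" };

const two = {
  name: "two.js",
  imports: ["jQuery", "Drawing"], // the revised two.js
  body: (env) => env.Drawing.draw(),
};
const one = {
  name: "one.js",
  imports: ["jQuery"],
  // one.js passes on jQuery only, so Drawing cannot leak through.
  body: (env) => loadWith(two, { jQuery: env.jQuery }),
};

let failure = null;
try {
  // zero.js renames at the boundary: Drawing is Gun, jQuery is JQ.
  loadWith(one, { Drawing: Gun, jQuery: JQ });
} catch (e) {
  failure = e;
}
console.log(failure.message); // two.js: 'Drawing' was not granted
```

The footgun is never invoked: the load of "two.js" fails before its body runs.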

== Solution 2: Catalog per loader ==

A module loader represents a community of mutually independent module
instances which together form a coherent subsystem. If there is to be
any sharing of module instances, and absent URL equality, these
modules must be written with some knowledge of the "community" in
which they live.

Perhaps a good way to understand this is to make the analogy with
Linux distros. A distro effectively maps memorable, agreed-upon names
like "libmpeg2" to concrete software resources. There is nothing
keeping a dozen programmers in the wild from building a dozen
different things and calling them "libmpeg2" -- but, within the Linux
community, and specifically within a distro, there is only one
"libmpeg2". Programs declare their dependency on it, often with a
version identifier.

If we view modules in a loader in the same way, then each loader
should contain a *single* mapping -- call it a catalog -- from
memorable names to concrete resources (perhaps identified by URL).
Each catalog is an implicit community, or agreement point:

  * The catalog of "biology software" hosted at
    http://biojavascript.org/catalog.json

  * The catalog of "canvas widgets" hosted at
    http://cwidgets.org/cat.json

  * A meta-catalog of "general stuff" hosted at
    http://allmyjs.org/root.json
    Perhaps there are distinct versions:
      http://allmyjs.org/distros/distro3.1.8.json
      http://allmyjs.org/distros/distro4.2.9.json

All modules in a loader would be expected to use the same catalog.
Catalogs could refer to one another, and so reuse one another's
bindings. This would allow all programmers to predict the effect of
using a new name. In the example of the modified "two.js" previously,
the statement:

  import Drawing.draw;

would work but, by referring to the same catalog, the authors of all
three of "zero.js", "one.js" and "two.js" would have agreed that the
symbol Drawing stands for something that operates a footgun, and would
treat it with due care. The catalog would contain an entry like:

  {
    Drawing: "http://.../footgun.js",
    ...
  }

That all said, I don't yet have a good solution for how a module would
declare that it is defined relative to a specific catalog. It's not
enough to add an extra argument -- the catalog -- to the constructor
of a module loader. The individual modules would have to specify their
dependency on the catalog. This requires extra syntax and machinery. I
am loath to add either.
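
For what it is worth, the lookup side of this is easy to sketch in plain JavaScript, even though the declaration side remains open. The shapes below (`makeCatalog`, `makeLoader`, `resolve`, `locate`) are hypothetical, and the example.org URLs are placeholders rather than real resources.

```javascript
// Hypothetical catalog shape: local entries plus other catalogs it reuses.
function makeCatalog(entries, reused = []) {
  return {
    resolve(name) {
      if (Object.prototype.hasOwnProperty.call(entries, name)) {
        return entries[name];
      }
      for (const other of reused) {
        const url = other.resolve(name);
        if (url !== undefined) return url;
      }
      return undefined;
    },
  };
}

// Placeholder URLs on example.org; they are not from the original post.
const general = makeCatalog({ jQuery: "http://example.org/jquery.js" });
const widgets = makeCatalog(
  { Drawing: "http://example.org/footgun.js" },
  [general]
);

// One loader, one catalog: every module in it resolves names the same way.
function makeLoader(catalog) {
  return {
    locate(name) {
      const url = catalog.resolve(name);
      if (url === undefined) throw new Error(`no catalog entry for ${name}`);
      return url;
    },
  };
}

const loader = makeLoader(widgets);
console.log(loader.locate("Drawing")); // http://example.org/footgun.js
console.log(loader.locate("jQuery")); // http://example.org/jquery.js
```

Because the catalog is a property of the loader, a module using the name Drawing gets whatever the community behind that catalog agreed on, not whatever an ancestor happened to bind.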

== Solution 3: Forming a more generative Union ==

Going back to basics, let's ask ourselves: Why is this such a problem
in the first place? In other words, why is it so important that
"one.js" and "two.js" use the *same* jQuery and Drawing modules? There
are a variety of answers, including:

* Performance optimization: The code can be shared and memory saved.

* Shared state: Each module contains important shared state (such as
cached information about the DOM, or off-screen bitmaps used to
double-buffer a <canvas> widget module) which its clients need to
share in order to properly collaborate according to the module's
rules.

* Programmer familiarity: Programmers are accustomed to dividing up
their program into sections, each of which is logically a singleton in
"the system", and establishing communication between them.

What if we assumed that the result of "importing" a module is just the
code of the module, ready to be instantiated with external state? In
low-level terms, a module represents a "code segment" in memory which
may be shared between its instantiations.

The effect of this choice is that dependencies between software are
written using direct object passing, rather than attempts to denote
the "same" module. Each module specifies as free variables the objects
it requires from its caller and, when it loads another module, it does
so with no expectation that what it gets is shared with anyone else.
So "two.js" could start out like:

  // two.js
  /** @require jQuery a jQuery instance */
  jQuery.ajax();

and could move to:

  // two.js
  /** @require jQuery a jQuery instance */
  module Drawing = load "http://.../picturesOfCats.js" with {
    jQuery: jQuery
  };
  jQuery.ajax();
  Drawing.draw();

The difference between this and Solution 1 (and the original Simple
Modules strawman) is that there are no promises made that are not
explicitly wired. With Solution 1, there is still lack of clarity
about what a "singleton" module represents -- depending on how modules
are redefined down the chain of loadings, some singletons are more
single than others. With this solution, only object APIs may define
the behavior expected, and nothing is naturally expected to be a
singleton.

To put it another way, if the original Simple Modules and Solution 1
are taken to their logical conclusion, one must assume, because
modules may be redefined along the loading chain, that nothing is
*really* a singleton. If one is to code defensively against this
situation anyway, why not make this the default and gain the attendant
simplicity?
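
A rough way to picture this in plain JavaScript is to treat the result of an import as a factory function. This is only a sketch: the factory `picturesOfCats` is a hypothetical stand-in for the code at the elided URL above, and the jQuery object is a toy.

```javascript
// Hypothetical stand-in for the code of picturesOfCats.js: importing it
// yields a factory, and each instantiation gets its own state.
function picturesOfCats({ jQuery }) {
  let drawn = 0; // per-instance state, not shared between instantiations
  return {
    draw() {
      jQuery.ajax();
      drawn += 1;
      return drawn;
    },
  };
}

// Toy jQuery that only counts calls.
const jQuery = { calls: 0, ajax() { this.calls += 1; } };

// Two clients instantiate the same code; nothing is a singleton by default.
const drawingA = picturesOfCats({ jQuery });
const drawingB = picturesOfCats({ jQuery });
drawingA.draw();
const secondA = drawingA.draw();
const firstB = drawingB.draw();

// Sharing happens only through the object that was explicitly passed in.
console.log(secondA, firstB, jQuery.calls); // 2 1 3
```

The two Drawing instances do not see each other's state, while the jQuery object they were both handed is shared because it was wired that way.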

Let's revisit the points brought up earlier:

* Performance optimization: This is up to the implementation. For
example, a perfectly reasonable implementation can do a HEAD request
for every URL loaded and, if it detects no change, reuse the code.

* Shared state: Shared state is now represented explicitly using objects.

* Programmer familiarity: It's really not that bad. :) Programs
continue to be written with free variables, just as <script>s are.
Programmers learn to introduce concrete objects into the lexical scope
of their programs. And they write "export" statements to export
variables back.
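
The performance point can be sketched as a small code cache. Everything here is an assumption for illustration: the `transport` object with `head` and `get` methods is hypothetical, it is synchronous for brevity, and the "validator" could be something like an ETag.

```javascript
// Hypothetical code cache: `transport.head(url)` returns a validator string
// (for example an ETag) and `transport.get(url)` returns source text.
function makeCodeCache(transport) {
  const cache = new Map(); // url -> { validator, source }
  return function fetchCode(url) {
    const validator = transport.head(url);
    const hit = cache.get(url);
    if (hit && hit.validator === validator) return hit.source; // unchanged
    const source = transport.get(url);
    cache.set(url, { validator, source });
    return source;
  };
}

// Fake transport for illustration; a real one would issue HTTP requests.
const server = { validator: "v1", source: "draw = 1;", gets: 0 };
const transport = {
  head: () => server.validator,
  get: () => {
    server.gets += 1;
    return server.source;
  },
};

const fetchCode = makeCodeCache(transport);
fetchCode("http://example.org/two.js");
fetchCode("http://example.org/two.js"); // validator unchanged: code reused
server.validator = "v2"; // the resource changed on the Web
server.source = "draw = 2;";
const latest = fetchCode("http://example.org/two.js");
console.log(server.gets, latest); // 2 draw = 2;
```

Only the code is shared here; each instantiation would still get its own state.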

To optimize this a bit, it is possible to introduce a concept of
"packages" (under active debate in CommonJS at the moment) to gather
things up. This improves, but does not intrinsically modify, the model
presented here.

== Afterword ==

Some of the solutions I present here are similar to proposals I have
already made. In all sincerity, I cannot help that. But, if there are
other solutions extant, I assure you that my brain is open. :)

Ihab

-- 
Ihab A.B. Awad, Palo Alto, CA

# ihab.awad at gmail.com (15 years ago)

On Wed, May 26, 2010 at 7:34 PM, Kam Kasravi <kamkasravi at yahoo.com> wrote:

> This problem has many similarities to the XML / WSDL world. The use of namespaces and versioning has been leveraged there to disambiguate names ...

That's a very interesting idea, thank you. I whiteboarded this a bit and here's what I came up with.

I will refer to management of "modules" here in order not to introduce extra machinery. However, we would wish to use this scheme to manage collections of modules that are distributed and developed together. So, for example, instead of using this scheme to manage the Accordion module, we would use it to manage the entirety of YUI, and select individual pieces from that. I will try to extend from individual modules to collections at the end of the post.

Assume each module has a canonical URL (cURL). This serves as a global name for the module, not a location. It incorporates DNS addresses and thus allows for decentralized name assignment. As you point out, it follows the precedent of W3C usage. For example, some cURLs might be:

  http://yahoo.com/frameworks/YUI
  http://jquery.com/jquery

The guidance about whether your library should be assigned a new cURL or given an existing one is that you have to answer the question: "Does it make sense for multiple independent instances of my library to be running within the same loader?" If the answer is "Yes", then you should use a separate cURL. If not, then you shouldn't. Practically speaking, distinct versions of a library should not have distinct cURLs, but libraries that do completely different things should.

To instantiate a loader, you supply a catalog. The catalog maps cURLs to actual code resources. It may do so in any of a number of ways: by consulting some third party; by keeping a literal table; by implementing some general code that does a Google search; whatever. The important thing is the contract of a catalog: For each distinct cURL, there exists zero or one code resources:

  http://yahoo.com/frameworks/YUI
      => retrieve "http://developer.yahoo.com/yui/3/download/yui-min.js"
  http://jquery.com/jquery
      => retrieve "http://code.jquery.com/jquery-1.4.2.min.js"
  [anything else]
      => [undefined]

If a catalog does not contain (in the general sense, meaning "cannot locate") a code resource for a cURL, it fails fast; there is no facility for an individual module to render its own "opinion" about what a cURL maps to. If the module wishes to implement a new or extended (cURL -> code) mapping, it does so by instantiating a new loader.

To review: the usefulness of cURLs is that they provide a reasonable, distributed, Webby way to manage the key space of catalogs. They do not themselves implement the catalogs. One can imagine sites out there that host catalogs, for everyone's convenience, and one can connect one's loader to one of these; build code that uses these catalogs but overrides a couple of entries; whatever.
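
A minimal sketch of that contract in plain JavaScript, assuming a literal-table catalog: the names `makeLoader`, `locate` and `extend` are hypothetical, and the example.org location is a placeholder.

```javascript
// Hypothetical loader keyed by cURL; the catalog is fixed at construction.
function makeLoader(catalog) {
  const table = new Map(Object.entries(catalog));
  return {
    locate(curl) {
      // Fail fast: a module cannot supply its own opinion for a missing key.
      if (!table.has(curl)) throw new Error(`unknown cURL: ${curl}`);
      return table.get(curl);
    },
    // A module that wants a different mapping builds a new loader
    // rather than altering this one.
    extend(overrides) {
      return makeLoader({ ...catalog, ...overrides });
    },
  };
}

const base = makeLoader({
  "http://jquery.com/jquery": "http://code.jquery.com/jquery-1.4.2.min.js",
});

// Placeholder location on example.org, used only for illustration.
const patched = base.extend({
  "http://jquery.com/jquery": "http://example.org/my-jquery.js",
});

console.log(base.locate("http://jquery.com/jquery"));
console.log(patched.locate("http://jquery.com/jquery"));
```

The override is visible only to code loaded through `patched`; everything loaded through `base` keeps the original mapping.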

This scheme resists the naming "substitution" attack. If I go rogue and decide to camp on the name "http://jquery.com/jquery" with my own crazy library, nothing prevents me -- but I have to go convince the authors of all the catalogs in the world to map:

  http://jquery.com/jquery
      => retrieve "http://ihabsmalware.com/badjquery.js"

which is an unlikely scenario.

As for collections of modules, we can map cURLs to some notion of "packages", or we can design "entry point" JS files that pull in all the contents of their respective modules, or whatever. I'm sure we can come up with a design were we to accept this approach.

 *   *   *   *   *

The following are the remaining questions in my mind:

1. How much should we standardize? To my mind, the use of cURLs simply provides a plausibility argument for having "globally recognizable" keys in catalogs, but should not be mandated.

2. How does this support the casual development model where a developer writes up some module but does not attempt (or does not have the resources) to choose a cURL for it? How can this module be used in a simple manner? Clearly, if this module is to be used in a loader, and if it wishes to be treated as a singleton, the loader should assign it a name unique within that loader. How is that done? By using the exact location URL for that library as the library's cURL? That means we are doing "url equality" -- an evil. But at least we are doing it in a controlled manner....

3. What is the syntax for importing? I imagine something like a library using jQuery saying:

      from "http://jquery.com/jquery" import { ... };

Ihab


# David Herman (15 years ago)

Thanks for your thoughts on this. I'll just respond to what I understand to be your main points.

> The problem is that, in the space of module names, the current Simple Modules strawman introduces a hazard of inadvertent name capture

Yes, implicit linking means that you can mis-link. In my mind, the questions are how much of a hazard this really is, how much architecture it would require imposing on programmers to address it, and what we lose if we do that.

Many of your suggested alternatives involve explicit linking. Explicit linking certainly has its appeal: it lets you really say exactly what you mean. But there are massive differences in the convenience of explicit and implicit linking systems. Years of PL research and experience have demonstrated that explicit linking tends to be unwieldy and inconvenient.

> The author cannot prevent this capture by controlling the environment of "two.js", nor can they foresee all possible environments such as "one.js" in which they may be embedded.

This is true. Now, it's true regardless of whether linking is implicit or explicit. Either way, if the interface of the library changes in the wild to add another dependency, its clients will likely break. The difference is that with explicit linking it will necessarily fail whereas with implicit linking, it might a) succeed, b) fail by not receiving a module binding, or c) fail unpredictably by receiving the wrong binding. You said as much, I'm just calling out the fact that no module system can change the fact that code changes and programmers must deal with versioning.

> I claim we must fail fast.

Failing fast would be nice, but I'm not convinced it's a necessity. We are not trying to solve all versioning problems ever. People can easily add version information to their modules with whatever protocols they like, and we don't need to enforce them. There are plenty of over-engineered library systems in use today, with crypto hashes and byzantine manifest formats etc etc, and they're a nightmare for programmers.

>   module One = load 'one.js' with {Drawing, jQuery};

This is just the kind of thing that looks nice with a single example, but as soon as you put it into practice it gets out of control. Watch what happens with even a trivial cyclic dependency:

    module Even = load 'even.js' with { Even: Even, Odd: Odd };
    module Odd = load 'odd.js' with { Even: Even, Odd: Odd };

Now imagine what happens to the combinatorics as your program size increases.

So then you either go down the road of trying to build a more expressive language for wiring together the module graph (that way madness lies), or you fall back to first-class modules-as-objects and programmers have to wire together the module graph by mutating objects.
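
The second fallback, wiring the graph by mutating objects, can be illustrated in plain JavaScript. This is a sketch under assumptions: `makeEven`, `makeOdd` and the shared `env` object are hypothetical stand-ins for 'even.js', 'odd.js' and their environment.

```javascript
// Hypothetical module factories: each receives an environment object and
// looks its collaborator up lazily, at call time.
function makeEven(env) {
  return { isEven: (n) => (n === 0 ? true : env.Odd.isOdd(n - 1)) };
}
function makeOdd(env) {
  return { isOdd: (n) => (n === 0 ? false : env.Even.isEven(n - 1)) };
}

// The cycle cannot be expressed in one step, so the graph is tied by
// mutating a shared environment after both modules exist.
const env = {};
env.Even = makeEven(env);
env.Odd = makeOdd(env);

console.log(env.Even.isEven(10)); // true
console.log(env.Odd.isOdd(7)); // true
```

The cycle works only because `env` is filled in after the fact, which is the mutation-based wiring referred to above.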

In my experience, explicit linking is the better-is-better solution that makes programmers' lives harder for not enough gain.

> == Solution 3: Forming a more generative Union ==

I didn't understand all this, but eliminating side effects in modules is not going to change the fact that when you load different bits, you get a different module. Solutions involving canonicalization are either going to be too brittle (e.g., trust the user to provide a single, stable set of bits for a given canonical name) or too clunky (e.g., crypto-hashes, a total non-starter).

Dave

# Brendan Eich (15 years ago)

On May 27, 2010, at 4:19 PM, David Herman wrote:

>> The problem is that, in the space of module names, the current Simple Modules strawman introduces a hazard of inadvertent name capture
>
> Yes, implicit linking means that you can mis-link. In my mind, the
> questions are how much of a hazard this really is, how much
> architecture it would require imposing on programmers to address it,
> and what we lose if we do that.
>
> Many of your suggested alternatives involve explicit linking.
> Explicit linking certainly has its appeal: it lets you really say
> exactly what you mean. But there are massive differences in the
> convenience of explicit and implicit linking systems. Years of PL
> research and experience have demonstrated that explicit linking
> tends to be unwieldy and inconvenient.

Last I checked, CommonJS (ignoring packages, which NodeJS and others
are avoiding) uses implicit linking via the filesystem.

This is extremely common -- see Python and many other languages. It is
so convenient that any explicit-linking system delivered in an Ecma
de jure standard will beget implicit linking systems built on top (like
CommonJS's) in the wild, imposing costs that we could head off by
standardizing implicit linking. Which simple modules proposes.

Hazards involve trade-offs. There's no risk-free solution. It is hard
to prove decisively that implicit linking won't lead to some bad name
dependency bug down the road -- it could. But I argue that it's
inevitable in the wild if we try to impose explicit linking, so we
should not standardize around it.

Rather, we should try to standardize a module system people will use
well, which restores lexical scoping all the way up (ridding us of the
global object), which improves integrity (const exports, shallowly
frozen if functions, etc.), and which does not instantly beget another
module system (or non-standard systems) on its back for want of
implicit-linking convenience.

/be

# ihab.awad at gmail.com (15 years ago)

Sorry for the slow reply -- was sick....

On Thu, May 27, 2010 at 4:19 PM, David Herman <dherman at mozilla.com> wrote:

> Years of PL research and experience have demonstrated that explicit linking tends to be unwieldy and inconvenient.

That needs to be added to my reading list. Cite away! :)

> People can easily add version information to their modules with whatever protocols they like, and we don't need to enforce them. ...

People are already creating module systems with versioning information (see CommonJS). We need to make the world safe for them.

>   module Even = load 'even.js' with { Even: Even, Odd: Odd };
>   module Odd = load 'odd.js' with { Even: Even, Odd: Odd };

With concise object literals, would that not be:

    module Even = load 'even.js' with { Odd };
    module Odd = load 'odd.js' with { Even };

> In my experience, explicit linking is the better-is-better solution that makes programmers' lives harder for not enough gain.

But didn't you hear? Worse is also worse:

  http://dreamsongs.com/Files/worse-is-worse.pdf

# ihab.awad at gmail.com (15 years ago)

On Thu, May 27, 2010 at 6:57 PM, Brendan Eich <brendan at mozilla.com> wrote:

> Last I checked, CommonJS (ignoring packages, which NodeJS and others are avoiding) uses implicit linking via the filesystem.

CommonJS is right now working on much the same issues: what does it mean for two modules to be the "same"? There is an active discussion, and a variety of implemented and in-process specs including:

  http://wiki.commonjs.org/wiki/Packages/Mappings/B

The common desire is to be able to grant a package (of modules) autonomy regarding what it depends upon. This is by no means a filesystem-only approach, and the "implicit"-ness of the linking is debatable given that it is driven by pretty extensive metadata.

Now, a bit of background. In CommonJS, the expression:

  require('foo/bar/baz')

means, "find the module 'foo/bar/baz' in whatever package mappings may exist; instantiate a singleton in the current sandbox; and return the singleton". A CommonJS sandbox is equivalent to Sam and Dave's "loader".

With that in mind, NodeJS does not, at this point, implement an API for building a new sandbox/loader:

  http://nodejs.org/api.html

This means that the singleton of 'foo/bar/baz' is a singleton in an entire OS process. They are doing just what Python has done for a long time, based on filesystems, and are not trying to deal with the problem of multiple loaders connected by object references.

> This is extremely common -- see Python and many other languages.

Yes, and Python and these other languages pull modules out of a centrally curated PATH of some sort wherein 'foo/bar/baz' has an unambiguous meaning to all other modules "installed" on the "system". We are tasked with creating a module system for a world in which these initial conditions do not apply.

> It is so convenient that any explicit-linking system delivered in an Ecma de jure standard will beget implicit linking systems built on top (like CommonJS's) in the wild, imposing costs that we could head off by standardizing implicit linking. Which simple modules proposes.

At this point, it's not clear where CommonJS is going to end up after navel-gazing over the problem of distributed module management. It is clear that they have not arrived at the current "simple modules" design. _Pace_ the definition of implicit vs. explicit linking, I think the problem deserves some further thought. I'm on the hook to provide a clear restatement of my Solution 2 (which, I repeat, may not solve the problem but will I hope clarify some issues), and I have every intention of delivering. :)

Ihab


# ihab.awad at gmail.com (15 years ago)

As promised, more detail on Solution 2.

Reviewing the problematic case:

// zero.js:
module jQuery = load 'jquery.js';
module Drawing = load 'footgun.js';
module One = load 'one.js';

// one.js:
import jQuery.ajax;
module Two = load 'two.js';

// two.js:
import jQuery.ajax;
import Drawing.draw;
draw(); // intended to draw a picture

There are dual formulations of the problem:

(a) The author of "one.js" is not given the opportunity to intermediate the name propagation -- they can neither control the names introduced by "zero.js", nor hide them from the view of "two.js"; or

(b) The names used in "import" statements are short, local nicknames, admitting of diverse interpretations as to their semantics.

This specific solution to the problem attempts to show the consequences of approach (b). In other words, if we got everyone to "import" via names that have a low likelihood of collision, then modules "zero.js", "one.js" and "two.js" can, assuming good intentions (our scheme is not intended to guard against malice), cooperate to populate that namespace.

We can construct long-winded names that are unlikely to collide, perhaps using DNS. Therefore, a module can:

import ... "com.drawings.art" ...;

or, to use XML namespace-like URIs-as-names:

import ... "http://art.drawings.com/module" ...;

The mapping from the uttered name to an actual code resource can occur in any of a number of places. Perhaps each loader has a single mapping. Perhaps each module can override the mappings for each module that it "load"s. The point is that these names are known to be distinct from:

import ... "com.guns.footshooting" ...;

The important thing is that we are now importing names, not just capturing program identifiers, and the semantics are those of a map lookup somewhere.
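
The "map lookup somewhere" can be sketched as follows. This is only an illustration under stated assumptions: `makeLoader`, `resolve`, `child`, and the example.com URLs are hypothetical, and the strawman defines no such API. The sketch shows one loader-wide mapping plus a per-module override for the modules that it loads.

```javascript
// Hypothetical sketch of "Solution 2": imports are resolved by looking up a
// long, collision-resistant name in a mapping, not by lexical capture.
function makeLoader(mappings, parent) {
  return {
    // Resolve a name here first, then fall back to the parent loader, if any.
    resolve(name) {
      if (mappings.has(name)) return mappings.get(name);
      if (parent) return parent.resolve(name);
      throw new Error("No mapping for " + name);
    },
    // A module may override some mappings for the modules that it loads.
    child(overrides) {
      return makeLoader(new Map(Object.entries(overrides)), this);
    },
  };
}

// The root loader holds the default mapping from names to code resources.
const root = makeLoader(new Map([
  ["com.drawings.art", "http://example.com/art.js"],
  ["com.guns.footshooting", "http://example.com/footgun.js"],
]));

// A loading module overrides one name; every other name falls through.
const forTwo = root.child({ "com.drawings.art": "http://example.com/art-v2.js" });
```

Because the two names are distinct keys, a request for "com.drawings.art" can never resolve to the resource registered under "com.guns.footshooting", whichever level supplies the mapping.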

Ihab


# Waldemar Horwat (15 years ago)

ihab.awad at gmail.com wrote:

>   // one.js:
>   import jQuery.ajax;
>   module Two = load 'two.js';
>
>   // two.js:
>   import jQuery.ajax;
>
> At the time that "one.js" was written, "two.js" did not contain a reference to Drawing. Now, unbeknownst to the author of "one.js", "two.js" changed and now refers to something named Drawing, which it expected to draw a picture:
>
>   // two.js:
>   import jQuery.ajax;
>   import Drawing.draw;
>   draw(); // intended to draw a picture

I don't understand your example of how this is supposed to work in the regular (non-accidental-aliasing) case. As you wrote in your example, two.js evolves to reference the identifier "Drawing" unbeknownst to one.js. There is no definition of it, so two.js wouldn't work at all.

Waldemar


# ihab.awad at gmail.com (15 years ago)

On Tue, Jun 1, 2010 at 6:26 PM, Waldemar Horwat <waldemar at google.com> wrote:

> I don't understand your example of how this is supposed to work in the regular (non-accidental-aliasing) case. As you wrote in your example, two.js evolves to reference the identifier "Drawing" unbeknownst to one.js. There is no definition of it, so two.js wouldn't work at all.

[ I hope I understand your question. ]

In my original example, "zero.js" defined "Drawing". According to the current proposal, this would be propagated down to "two.js".

Does that help? Or do I misunderstand?

Ihab
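
The propagation described above can be modeled with a chain of scopes. This is a hypothetical model, not the strawman's semantics: `childScope` and the use of prototype inheritance to stand in for lexical nesting are assumptions made only for illustration.

```javascript
// Hypothetical model: each loaded file sees the bindings of the file that
// loaded it, the way the example propagates names from "zero.js" downward.
function childScope(parent, bindings) {
  return Object.assign(Object.create(parent), bindings);
}

const zero = childScope(null, {
  jQuery: "jquery.js",
  Drawing: "footgun.js", // zero.js chose this name for its own purposes
});
const one = childScope(zero, {}); // in this sketch one.js neither adds nor hides names
const two = childScope(one, {});

// two.js says `import Drawing.draw` and silently gets zero.js's binding.
const captured = two.Drawing;
const definedByTwo = Object.prototype.hasOwnProperty.call(two, "Drawing");
```

Here `captured` is "footgun.js" even though neither "one.js" nor "two.js" ever defined Drawing, which is the accidental capture under discussion.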


# David Herman (15 years ago)

> Sorry for the slow reply -- was sick....

No worries-- hope you're feeling better.

> Years of PL research and experience have demonstrated that explicit linking tends to be unwieldy and inconvenient.
>
> That needs to be added to my reading list. Cite away! :)

ML is dead; what more evidence do you need? ;)

Really, though, the research literature on modules is enormous. I don't have the time or inclination to provide a full bibliography. Personally, I've worked with several advanced, explicitly-linked module systems, including ML functors and PLT Scheme units.

> With concise object literals, would that not be:
>
>     module Even = load 'even.js' with { Odd };
>     module Odd = load 'odd.js' with { Even };

Possibly, depending on whether you want to present modules to themselves as well.

But really, I've seen it before: these kinds of specification languages for module graphs spin out of control. You'll wish you had the ability to abstract the thing on the RHS of "with" -- and then you'll have to introduce the complexity of compile-time bindings of module graphs, and figure out how to shoe-horn those into the existing syntax and semantics. Or, you'll hold the line and force programmers to keep writing out the full module graph over and over again, in which case they just won't ever use modules at all.

> But seriously: I am not necessarily suggesting explicit linking (however defined). I am pointing out the necessary consequences of a dangerous design that promises more than it can deliver.

You've not demonstrated that.

Dave


# ihab.awad at gmail.com (15 years ago)

We are having two discussions here:

* Discussion of the relative merits of explicit linking in its various forms; and
* Discussion of the specifics of the current proposal for implicit linking, and alternatives holding fixed the initial condition that implicit linking is a desideratum.

I'm mostly interested in the second discussion, but I do not wish to let slip by some important distinctions regarding the first. As such, I will respond to the first here, and to the second in a separate email.

On Wed, Jun 2, 2010 at 10:38 AM, David Herman <dherman at mozilla.com> wrote:

> I don't have the time or inclination to provide a full bibliography.

I consider your argument withdrawn, then.

> Personally, I've worked with several advanced, explicitly-linked module systems, including ML functors and PLT Scheme units.

I'd be interested to hear more about your experience.

>     module Even = load 'even.js' with { Odd };
>     module Odd = load 'odd.js' with { Even };
>
> Possibly, depending on whether you want to present modules to themselves as well.

As I believe we discussed in our most recent f2f, it is possible to provide modular code with access to its own reified module instance via some distinguished symbol (e.g., "this" at the top level). And of course, modular code always has direct individual access to its own exports.

As recast, therefore, the example introduces Odd to "even.js" and Even to "odd.js". It's pretty minimal.
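
As a rough illustration of what the recast example asks of a linker, here is a sketch in plain JavaScript. The factory functions `evenModule` and `oddModule` are hypothetical stand-ins for 'even.js' and 'odd.js', and the getter-based `linked` table is an assumption about how the mutual reference could be tied together; none of this is the proposed syntax. The functions assume non-negative integers.

```javascript
// Hypothetical stand-ins for 'even.js' and 'odd.js': each is a factory that
// receives its imports explicitly, mirroring `load ... with { ... }`.
function evenModule(imports) {
  return {
    isEven(n) { return n === 0 ? true : imports.Odd.isOdd(n - 1); },
  };
}
function oddModule(imports) {
  return {
    isOdd(n) { return n === 0 ? false : imports.Even.isEven(n - 1); },
  };
}

// The linker ties the knot. Getters defer each lookup until call time, so
// Even can be created before Odd exists, and each module sees only the
// single name it was given.
const linked = {};
linked.Even = evenModule({ get Odd() { return linked.Odd; } });
linked.Odd = oddModule({ get Even() { return linked.Even; } });
```

Each module receives exactly one import here, which is the sense in which the example is minimal.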

Ihab


# David Herman (15 years ago)

> I don't have the time or inclination to provide a full bibliography.
>
> I consider your argument withdrawn, then.

Excuse me? My argument is not "withdrawn" (are we in court?). If you are unaware of decades of prior art on modules, that's not my failing but yours.

My argument was and remains that others have gone down that road, and it's still very much an open research topic how to create module systems that provide the generality of explicit linking with the convenience of implicit linking. See e.g. Derek Dreyer's work, starting with his thesis and continuing to this day.

> Possibly, depending on whether you want to present modules to themselves as well.
>
> As I believe we discussed in our most recent f2f, it is possible to provide modular code with access to its own reified module instance via some distinguished symbol (e.g., "this" at the top level). And of course, modular code always has direct individual access to its own exports.

Hence "depending."

> As recast, therefore, the example introduces Odd to "even.js" and Even to "odd.js". It's pretty minimal.

And yet it's still too expensive. No one will take the step from non-module code to module code. They just won't. Besides, a not-quite-so-bad example of the Odd and Even modules is pretty weak tea.

The point is, you can special-case "this" if you want, but if you have a module graph of N modules, and each needs to be explicitly linked with N - 1 other modules, then you impose a quadratic code-size requirement on programmers. Unless, as I said, you beef up your linking-specification language.

Dave
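
The quadratic growth can be made concrete with a small sketch. It assumes the worst case named above, a fully connected graph in which each of N modules is linked with the other N - 1, and it reuses the `with { ... }` form from earlier in the thread; the helper names `explicitLinks` and `linkCount` are hypothetical.

```javascript
// Hypothetical generator: write out the explicit link declarations for a
// fully connected graph, one `with { ... }` entry per (module, import) pair.
function explicitLinks(names) {
  return names.map(function (name) {
    const others = names.filter(function (other) { return other !== name; });
    return "module " + name + " = load '" + name.toLowerCase() + ".js' with { " +
      others.join(", ") + " };";
  });
}

// Total number of imported names that must be written out for n modules.
function linkCount(n) {
  return n * (n - 1);
}

const declarations = explicitLinks(["A", "B", "C"]);
```

Three modules need 6 imported names written out, and ten modules need 90, so the declarations grow as N(N - 1) even though the module count grows linearly.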


# Kris Kowal (15 years ago)

On Wed, Jun 2, 2010 at 12:14 PM, David Herman <dherman at mozilla.com> wrote:

> but if you have a module graph of N modules, and each needs to be explicitly linked with N - 1 other modules, then you impose a quadratic code-size requirement on programmers. Unless, as I said, you beef up your linking-specification language.

I agree that requiring explicit linking is a non-starter. I do, however, favor the option of explicit linking at some level of granularity. At reasonable expense, Narwhal provides several layers at which someone can buy into explicit linking:

* by manually instantiating a module using the module constructor proffered by the loader.

    var module = require.loader.load(id);
    module(freeVariables);

* by manually instantiating a module using a facility of the sandbox that provides the import and export facilities but (a) does not memoize the module and (b) permits additional free variables to be injected. This is useful for creating module-enhanced "DSLs" that permit scripts designed for QUnit or Bogart to be migrated without alteration, subverting their use of global variables with explicitly injected free variables.

    require.once(id, freeVariables);

* by manually instantiating a system of modules with a prepopulated memo of module instances.

    var SANDBOX = require("narwhal/sandbox");
    var subRequire = SANDBOX.Sandbox({
        "modules": {
            "even": EVEN
        }
    });
    var EVEN = subRequire("odd");

I would invoke the axiom, "Simple should be easy, powerful should be possible". It's reasonable to pay for what you get. At the risk of misrepresenting their views, Ihab and Mark have argued that people should always use explicit linking for a variety of reasons, but I for one agree that implicit linking should be the norm, and explicit linking can at least be deferred to the layer of "packages", or coherently designed sets of modules linking to other coherently designed sets of modules. I presume that it is possible to isolate and explicitly link groups of modules.

Kris Kowal


# Brendan Eich (15 years ago)

On Jun 1, 2010, at 5:23 PM, ihab.awad at gmail.com wrote:

> This is extremely common -- see Python and many other languages.
>
> Yes, and Python and these other languages pull modules out of a
> centrally curated PATH of some sort wherein 'foo/bar/baz' has an
> unambiguous meaning to all other modules "installed" on the
> "system". We are tasked with creating a module system for a world in
> which these initial conditions do not apply.

"We are tasked" is a bit much. More important: you don't define the
novel initial conditions, but I'm guessing you mean the Web, which is
of course not centrally curated.

If so, then my response is to dispute your premise and not swallow
whatever conclusion you think follows from it.

It's not clear at all that the Harmony module system has so different
a set of constraints from those facing NodeJS and other users of
CommonJS modules-and-not-packages. Client-side programmers do not
include random modules from uncontrolled domains -- not in production
pages and web apps. Let's dig into this a bit.

I'm not suggesting that we use the web server's doc-tree (filesystem)
for implicit linking, only that some embeddings of the language, in
particular Node, seem to want exactly that. From where Node sits on
the server side, the nearby parts of filesystem are sufficiently
well-curated ("centrally" or not). TC39 is trying to take non-browser
embedding use-cases into account.

To turn to the browser embedding, at least these questions seem to be
raised by analogy to the server-side situation:

1. Is the URL space used by a web app not curated well (centrally or
otherwise), and somehow fatally unreliable?

2. Is the lexical binding space, a tree of scopes with no object
aliasing badness, which simple modules proposes programmers can create
by writing module declarations in <script type="harmony"> tags,
inherently not well-curated, unlike the case of the server-side
filesystem for a Node app?

I contend that the answers are "no" and "no".

Any web app author has to control URLs and provide the needed
resources, whether they use code.google.com/apis/ajaxlibs or
their own hosted/edge-cached copies of modules.

For in-language module naming (as opposed to URL naming), the simple
modules proposal lets the author of the web page compose modules in
lexical scopes (and only lexical scopes), naming modules with
consumer-chosen identifiers and protecting inner bindings within explicit outer
modules if needed.

We're not considering mutual suspicion. However, the usual rules for
production web apps apply: URL provisioning means you don't just
include some uncontrolled version of a module in a production app.

Therefore if there is a good filesystem/PATH curator -- or set of
curators cooperating -- in the Node server-side case, then there's a
good curator or set of curators cooperating in the client-side page or
web app content and the set of modules loaded by that content.

It's true that a module loaded later in a page, app, or other module
may use a free name that ends up bound at top level by the module
loader (and only at top level -- there's no injection possible in the
middle of a module). But this aside, below the top level lexical
scope, all naming is under control of whoever curates the module
source at that level.

The extensibility of the simple modules top-level lexical environment
under the default module loader is neither a fatal flaw ("unhygienic
capture") nor an unalloyed good. On the plus side, it avoids any new,
secondary naming system, package indirection, brittle reverse-DNS
convention, or explicit-linking configuration language. This is a big
plus for most programmers.

> At this point, it's not clear where CommonJS is going to end up
> after navel-gazing over the problem of distributed module
> management. It is clear that they have not arrived at the current
> "simple modules" design.

The "require" system is pretty close, but lacking new syntax to help
static analysis -- or really to support second-class modules properly.
Simple modules fills this gap by extending syntax.

> Pace the definition of implicit vs. explicit linking, I think the
> problem deserves some further thought. I'm on the hook to provide a
> clear restatement of my Solution 2 (which, I repeat, may not solve
> the problem but will I hope clarify some issues), and I have every
> intention of delivering. :)

Ok, but this is a discussion list, and discussion can go on and on. In
TC39, we've been going over first-class module ideas for about 18
months. Until simple modules were proposed this year, we weren't
getting anywhere quickly.

TC39 is not going to wait for novel research. The simple modules
strawman is heading toward prototype implementation and
harmony:proposals status. So some evolution of it is likely to be in
the next edition.


/be

# Kam Kasravi (15 years ago)

Hi Dave

By explicit linking, are we talking about mechanisms to unambiguously reference modules that may otherwise be ambiguous? For example, in Java, if 'Node' could refer to a Node class in several different packages, the language allows one to fully qualify the Node class, e.g. foo.Node. Just want to make sure I'm clear on the distinction between explicit and implicit linking.

Thx
kam



On Jun 2, 2010, at 12:14 PM, David Herman <dherman at mozilla.com> wrote:

>> I don't have the time or inclination to provide a full bibliography.
>> 
>> I consider your argument withdrawn, then.
> 
> Excuse me? My argument is not "withdrawn" (are we in court?). If you are unaware of decades of prior art on modules, that's not my failing but yours.
> 
> My argument was and remains that others have gone down that road, and it's still very much an open research topic how to create module systems that provide the generality of explicit linking with the convenience of implicit linking. See e.g. Derek Dreyer's work, starting with his thesis and continuing to this day.
> 
>> Possibly, depending on whether you want to present modules to themselves as well.
>> 
>> As I believe we discussed in our most recent f2f, it is possible to provide modular code with access to its own reified module instance via some distinguished symbol (e.g., "this" at the top level). And of course, modular code always has direct individual access to its own exports.
> 
> Hence "depending."
> 
>> As recast, therefore, the example introduces Odd to "even.js" and Even to "odd.js". It's pretty minimal.
> 
> And yet it's still too expensive. No one will take the step from non-module code to module code. They just won't. Besides, a not-quite-so-bad example of the Odd and Even modules is pretty weak tea.
> 
> The point is, you can special-case "this" if you want, but if you have a module graph of N modules, and each needs to be explicitly linked with N - 1 other modules, then you impose a quadratic code-size requirement on programmers. Unless, as I said, you beef up your linking-specification language.
> 
> Dave

# David Herman (15 years ago)

At some point we switched from "internal vs external" to "implicit vs explicit" (might've been me) but it's the same basic idea. Here are some handy definitions from Owens and Flatt '06 [1]:

"Internal linking supports definite reference to a specific implementation of an interface. Internally linked module systems are common; examples include Java's packages and classes, ML's structures, and Haskell's modules. In each of these systems, two modules are linked when one directly mentions the name of another, either with a dotted path (ModuleName.x), or import statement (import ModuleName).

External linking supports parameterized reference to an arbitrary implementation of an interface. ML's module system includes a functor construct that supports external linking. A functor declares an interface over which its definitions are parameterized; the parameterization is resolved outside of the functor..."

In other words: in an internal linking system, each module names -- directly within its body -- the specific modules that implement its dependencies, and all the modules are wired together automatically by the linking system. In an external linking system, a module lists only its dependencies by name and/or interface, and some external linking code, written by a programmer, chooses the particular implementations of those interfaces by explicitly wiring up the module graph.

Dave

[1] http://www.cl.cam.ac.uk/~so294/documents/icfp06.pdf
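
As a rough illustration of the two linking styles described above, here is a small JavaScript sketch. It is not code from the thread or from the strawman: `registry`, `makeEven`, `makeOdd`, and `link` are hypothetical names, and plain objects stand in for modules. The Even/Odd pair is borrowed from the example discussed in this thread.

```javascript
// Internal (implicit) linking: each module body names the specific module
// that implements its dependency. No separate wiring code is written.
const registry = {
  Even: { isEven: (n) => n === 0 || registry.Odd.isOdd(n - 1) },
  Odd: { isOdd: (n) => n !== 0 && registry.Even.isEven(n - 1) },
};

// External (explicit) linking: each module is a function of its
// dependencies, loosely like an ML functor. It names an interface only.
const makeEven = (odd) => ({ isEven: (n) => n === 0 || odd.isOdd(n - 1) });
const makeOdd = (even) => ({ isOdd: (n) => n !== 0 && even.isEven(n - 1) });

// Linking code, written by a programmer, chooses the implementations and
// wires up the graph. The cycle is tied by filling in empty objects,
// since neither module exists yet when the other is constructed.
const link = () => {
  const even = {};
  const odd = {};
  Object.assign(even, makeEven(odd));
  Object.assign(odd, makeOdd(even));
  return { even, odd };
};

const linked = link();
console.log(registry.Even.isEven(4)); // true
console.log(linked.even.isEven(4)); // true
console.log(linked.odd.isOdd(4)); // false
```

In the external version, `link` grows with every dependency edge in the graph. With N modules that each depend on the other N - 1, that is N(N - 1) edges to wire by hand, which is the quadratic cost raised earlier in the thread.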


# ihab.awad at gmail.com (15 years ago)

On Wed, Jun 2, 2010 at 12:14 PM, David Herman <dherman at mozilla.com> wrote:
> See e.g. Derek Dreyer's work, starting with his thesis and
> continuing to this day.

Aha. Finally, something vaguely resembling a citation. I will look at
this. Thank you.

Ihab

-- 
Ihab A.B. Awad, Palo Alto, CA

# Waldemar Horwat (15 years ago)

ihab.awad at gmail.com wrote:
> 
> On Tue, Jun 1, 2010 at 6:26 PM, Waldemar Horwat <waldemar at google.com> wrote:
> 
>     I don't understand your example of how this is supposed to work in
>     the regular (non-accidental-aliasing) case.  As you wrote in your
>     example, two.js evolves to reference the identifier "Drawing"
>     unbeknownst to one.js.  There is no definition of it, so two.js
>     wouldn't work at all.
> 
> 
> [ I hope I understand your question. ]
> 
> In my original example, "zero.js" defined "Drawing". According to the 
> current proposal, this would be propagated down to "two.js".
> 
> Does that help?

No.  In one sentence you wrote that two.js changed to require its invoker to provide a Drawing API; in another you wrote that two.js did not tell its invoker, one.js, to provide a Drawing API.  The combination of the two is meaningless.

    Waldemar