Module naming and declarations

internal

The module proposal has made good progress, thanks to the hard work by
Dave & Sam. I'm glad to see it close to the home stretch for the ES6
race (some healthy minor controversies on the side notwithstanding :)
).

However, there is one central part of the design over which we still
do not have agreement: the naming and declaration mechanism for
modules. We did not yet have a serious discussion about it -- and I
fully admit this being my fault as well, since despite my repeated
criticism, I hadn't found the time to put that in a coherent form. I
try to make up for that with this post. :) Summary is that I still
(strongly) believe that adopting what's currently on the table would
be a severe mistake, and that it's best to take a step back and
discuss the motivation as well as alternatives.

My sincere apologies for the excessively long post...

** The problem **

In the original module proposal, modules were declared via ordinary
lexical identifiers, but could also be imported from external sources
denoted by strings. In November, the proposal was changed to use
string-valued "module ids" for both. The motivation was to simplify
the language, and to provide better support for configuration and
concatenation.

Unfortunately, the new scheme leads to a significant conflation of
concerns. In particular, it tries to obscure the fact that there are
fairly fundamental differences between _internal_ names and _external_ ones:

- internal names reference _language entities_, external names
reference _resources_;
- internal references are under control of the language, while
external references are delegated to a platform mechanism (with
potentially platform-dependent interpretations and behaviour);
- internal references are definite, stable, and high-integrity, while
external references are indefinite, late-bound, and can fail or be
clobbered.

As an analogy, take strings. A string may exist inside the language,
as a value bound to a variable. Or it may be stored in some file that
you can access via a file name. Both the variable and the file name
ultimately reference "a string", but in completely different worlds.
Nobody would suggest to use file paths in place of variable
identifiers internally. Yet, that is almost exactly what the proposal
does for modules!

Conflating the two notions may seem tempting and convenient at first,
but it's bad. In the case of the current ES module proposal in
particular, the attempt to do so with the "module id" scheme has all
kinds of awkward consequences:

* As an internal naming mechanisms, it lacks standard scoping semantics.

  As various discussions show, people want and expect scope chain
behaviour, and for very good reasons: e.g. nesting modules, confining
modules to local scope, convenient local names, etc. The module id
approach cannot sanely support that (which is why e.g. nested modules
got pulled from the proposal).

* As an external naming mechanisms, it violates standard relative
path/URL semantics.

  When using paths to actually reference external names, one might
likewise expect certain semantics, e.g., that "a" and "./a" refer to
the same thing, like they usually do on the web or in a file system.
The current mechanism intentionally breaks with this via a
non-standard interpretation of "paths". It implies that a set of
module files, by default, is not relocatable within a project tree, or
across project trees, as the standard idiom for module names actually
denotes semi-absolute paths within a project tree. The main reason for
it is that paths are overloaded to serve both internal and external
naming. (The path semantics is inherited from legacy module frameworks
for JS, such as AMD. It is a fine solution under the constraints that
these frameworks have to operate in -- in particular, the inability to
extend syntax or add new primitives. However, for ES6 most of these
constraints don't apply, and there is no particular reason to limit
the design to making the same compromises.)

* The shared name space between internal and external modules can lead
to incoherent programs.

  Internal and external module definitions can resolve to the same
path. For example. there might be a definition, somewhere, for module
"a/b", but also a file "a/b.js". Which one takes precedence generally
depends on the execution order of a (staged) program (e.g., when
_other_ imports are performed), and can, in fact, differ at different
points in time. It is worth noting that, presumably for this reason,
AMD strongly discourages the use of named module declarations, except
by optimization tools (and Node does not support it at all, AFAICT).
With the ES proposal, however, nifty syntax strongly conveys the
impression that named module declarations are a good and recommended
feature to use manually, instead of discouraging their day-to-day use.

* Likewise, a single global name space for all internally defined
modules can lead to incoherent programs.

  Several internally defined modules can clash arbitrarily, there is
nothing preventing two completely unrelated modules in completely
unrelated files from clobbering the same name and stepping on each
other's feet. Worse, there is no way to confine a module to a local
scope, all definitions _have to be_ globally visible and compete
within the same global name space. (And unlike URLs, this name space
is not particularly structured.) Of course, conventions can help to
work around the problem in practice, but clearly, a well-designed
language mechanism is preferable and more reliable.

* "Local" references are globally overridable.

  Since every reference goes through the loader, there is no way to
have a definite (i.e., static, stable) module reference, even within a
single script or scope. Any other script can come by and _modify the
meaning_ of any seemingly local module reference, either accidentally
or intentionally (unless laboriously sandboxed). This clearly is bad
for abstraction, encapsulation, integrity, security, and all related
notions. And to add insult to injury, the loader also induces
non-trivial runtime cost for a mechanism that isn't even desirable in
these cases. Plain and simple, we repeat the mistake of the JS global
object, but worse, because there is no other scope for modules to
escape to if you care about integrity.

* Internal module definition is coupled with external module
registration, and modules cannot be renamed or re-registered.

  It is not possible to define a module without registering it
globally, and dually, it is not possible to register a module without
defining a new one. In particular, that prevents renaming a module,
and more importantly, registering a module under a different
(external) name than the one under which it was defined/imported. The
latter, however, is needed for some configuration use cases. (In the
current proposal, it has to be simulated awkwardly by "eta-expanding"
the renamed module, i.e., declaring "module 'A' { export * from 'B';
}", which creates a separate module.)

* Language-level naming semantics interferes with file system/URL
naming semantics.

  As other communities have rediscovered many times before, it is a
problem to naively map between internal and external names, because
the meaning of internal definitions or references may then depend on
idiosyncrasies of the hosting environment. For example, it may affect
the program behaviour whether names are case-sensitive in the OS (not
all JS hosts are browsers), or what makes a well-formed file name.
What if you define a module "M" and also have a file "m.js"? Something
else in node.js on Windows than on Linux, potentially. It is best to
minimise the problem by limiting the use of external names to actual
external references.

* Bundling ("concatenation") generally would require embedding
arbitrary string resources, not syntactic modules.

  One powerful feature of loaders is the translation hook. When
importing a module from an external source, there is no restriction on
its syntactic content, since a translation hook can transform it
freely. But if one were to bundle ("concatenate") an application that
actually makes use of this liberty, then the current proposal could
not actually support that consistently. Consequently, if one primary
goal the current proposal is to make module declarations a mechanism
for bundling modules, then they are an incomplete solution. In
general, you'd need to be able to embed arbitrary _resources_ (as
strings), and be able to run translate hooks on those.

* Module "declarations" are not a declarative mechanism, but an operational one.

  Because module declaration is coupled with loader registration, it
has a non-trivial operational effect, and its semantics is in turn
subject to interference from other operational effects from inside and
outside the program (as described above). Yet, it is disguised in a
seemingly innocent declarative syntax. That is likely to create
pitfalls and wrong assumptions (again, like with the global object).

In summary, not only do path-named module declarations lack desirable
expressiveness, regularity, and integrity, they also do not support
more interesting configuration and concatenation use cases, which is
what they were intended for.

Most of the above problems cannot be fixed without adding lexically
scoped modules to the language. It seems very clear that we need
those, rather sooner than later. Also, I think we want a more
structured approach to the global name space for external modules. At
that point, rethinking the proposed approach may be the best idea.

** Proposal **

I think it is highly advisable to follow a simple strategy for the
whole naming business: avoid ad-hoc inventions, stick to
well-established, standard mechanisms.

Specifically:

1. Have a clean separation of concerns between internal and external names.
2. For internal names, use the standard language mechanism: lexical scope.
3. For external names, use the standard web mechanism: URLs.
4. Have a clean separation of concerns between the declarative
_definition_ of a module, and the operational notion of _registering_
it with a loader.

Let me point out again that both lexical scope and URLs are backed by
decades of experience and have proved superior over and over again, in
many, many different languages and environments. _We can only lose if
we try to do "better"_.

More concretely, I envision using lexical module declarations as the
primary means for defining modules (you guessed that from the start :)
). Modules can nest. (Supporting local modules, like Mark suggested,
is more difficult, because of unclear interactions with other
constructs like classes or eval. Certainly not ES6.)

There may (or may not, see below) be a (separate?) pseudo-declarative
form for registering modules as resources with the loader -- however,
if so, it should ideally scale to handle non-ES sources.

Module resources are identified by URLs. Those can be absolute or
relative; relative URLs are interpreted relative to the importing file
(just like HTML links). The loader table only contains absolute URLs.
Likewise, every script is associated with its absolute URL. Any
relative import is first normalised to absolute using the absolute URL
of the importer (plus obvious steps for normalising occurrences of "."
and "..").

A custom loader can, in principle, perform arbitrary interpretation or
rewriting of URLs. In particular, this could be used to implement
interop to absolute repository paths a la AMD or Node, e.g. by
interpreting an "amd:" schema for importing AMD modules that are
relative to a separately configured base URL. In other words, you'd
write

  import M1 from "a/b";  // native ES6 import, relative path
  import M2 from "amd:c/d";  // import of AMD module, relative to AMD base URL

Should we have a declarative form for registering resources, then its
URL would be resolved in the same manner, relative to the path of the
containing file. However, the programmer is free to use an absolute
URL.

At that point, the only remaining purpose for path-named module
declarations would be registering external references. However,
registration is already possible through the loader API. Doing so
requires additional staging (a script setting up the loader before
executing the actual script). But staging is necessary anyway, for
every slightly more interesting configuration case -- e.g. any
scenario that involves loader.ondemand, translation, etc. It is not
clear to me that the remaining cases justify an additional semantic
short cut, and I think there is a case to be made that path-named
module declarations are neither sufficient _nor_ necessary. But that
point is mostly independent from the rest.

Clearly, there are more details to be discussed and worked out. I also
know that Dave & Sam have been over lots of it before. Still, I'm
positive that it is a fairly solvable problem, and will yield a more
well-behaved and more scalable solution. And yes, essentially it means
reverting this part of the module proposal to its earlier stage -- but
the good news is that not much else in the proposal is affected. :)

/Andreas

Module naming and declarations

The key takeaway

Some smaller points

Assets

Liabilities

The key takeaway

Some smaller points

The key takeaway

Some smaller points