simple modules: module management vs. configuration management

# Allen Wirfs-Brock (16 years ago)

There is quite a bit of literature on "2nd class" module systems, but most of it is from a period before first class modules became the primary modularity research interest. I'll see if I can dig up some references to papers that may be helpful.

One of the points that I recall is the importance of not entangling "module management" with "configuration management". In this context, "module management" means structuring the source code of a program into independent, explicitly coupled pieces. This is done to manage complexity, allow parallel or independent development, simplify maintenance, etc. "Configuration management" means controlling and structuring the assembly instructions that describe how an actual runnable program is instantiated as a composition of specific module instances. Configuration management is about selecting which specific module versions from various possible alternatives and sources get bound together for execution.

As a more concrete example, a "program" might be logically composed of 4 modules with import/export based dependencies:

module Main { ... };
module Subsys { ... };
module DB { ... };
module yui3 { ... };

To actually run this program you have to identify specific versions of containers that contain the module definitions (e.g., which specific files to load, which versions to load from a repository, which server to load from, etc.). How configurations are specified is usually quite specific to the actual mechanism used to construct/instantiate executable programs. It might be a makefile, or a manifest, or some other declarative or imperative description. On the web, it might simply be the set of script tags in an HTML document:

<script type="text/javascript" src="/myapp/main/version2.js"></script>
<script type="text/javascript" src="/myapp/subsys/version1.js"></script>
<script type="text/javascript" src="/library/DB-2010.js"></script>
<script type="text/javascript" src="developer.yahoo.com/modules/yui3.js"></script>

Module management becomes entangled with configuration management when configuration management information is embedded within the actual module definitions. When this occurs, configuration changes require the modification of the actual source code of modules. For example, if a module import statement uses a hard-coded file path or server name then moving the location of an imported module requires modification of every module that performs such an import.
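A minimal sketch of the separation described above, written in plain JavaScript rather than in the proposed module syntax. The names `moduleGraph`, `devConfig`, `prodConfig`, and `bind` are hypothetical, and the `/local-mirror/` path is invented for illustration. The only point is that relocating a module changes a configuration table and leaves every module description untouched.

```javascript
// Module management: each module names its dependencies logically.
// Nothing here says where a module lives or which version is used.
const moduleGraph = {
  Main: ["Subsys", "DB"],
  Subsys: ["yui3"],
  DB: [],
  yui3: [],
};

// Configuration management: separate tables bind logical names to containers.
// The paths are illustrative assumptions loosely based on the script tags above.
const devConfig = {
  Main: "/myapp/main/version2.js",
  Subsys: "/myapp/subsys/version1.js",
  DB: "/library/DB-2010.js",
  yui3: "/local-mirror/yui3.js",
};
const prodConfig = { ...devConfig, yui3: "developer.yahoo.com/modules/yui3.js" };

// Produce the list of containers to load for a root module, dependencies first.
function bind(graph, config, root, seen = new Set(), out = []) {
  if (seen.has(root)) return out;
  seen.add(root);
  for (const dep of graph[root] || []) bind(graph, config, dep, seen, out);
  if (!(root in config)) throw new Error(`no container configured for ${root}`);
  out.push(config[root]);
  return out;
}
```

Calling `bind(moduleGraph, devConfig, "Main")` and `bind(moduleGraph, prodConfig, "Main")` yields two different load lists from the same module graph, which is the kind of change that a hard-coded path inside an import would instead force into the module source.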

The simple module proposal has a couple places where this sort of entanglement is possible. For example, from the samples:

import 'json.org/modules/json2.js' as JSON;

import 'developer.yahoo.com/modules/yui3.js#dom' as dom;

import "compiler/Lexer"; (if "compiler/Lexer" is interpreted by the module id resolver as an actual access path)

import ['JSON', 'json.org/modules/json2.js'] as JSON;

(but loadModule probably isn't a problem. See note below)

This is a problem even if the intent of the proposal is that module ID's are implementation dependent values and any concrete interpretation is imposed by an implementation dependent module id resolver. As the examples illustrate, it is all too easy for module writers or module id resolver designers to decide to use the ModuleSpecifier string literal in a configuration management like manner. The use of string literals as ModuleSpecifiers is an attractive nuisance that makes it too easy to entangle module management and configuration management.

As an alternative, my suggestion is that we completely avoid using strings within ECMAScript source code to (statically) identify modules. Instead, modules might be identified (in both Module declarations and ImportDeclarations) as any of the following alternatives: 1) a simple identifier, 2) a sequence of dot-separated identifiers (my current preference), or 3) a sequence of identifiers separated by some other delimiter such as "::". These identifiers populate a single module name space that is distinct from the EnvironmentRecords used for normal identifier resolution. The module name space gets populated in accordance with a specified set of semantic rules by processing Module declarations contained within Applications. The manner in which the actual containers (files, etc.) of the Applications for a specific program are identified would be completely implementation or environment dependent.

(note about loadModule: I don't see a problem with loadModule using container/configuration information. By its very nature, loadModule exists to support dynamic, open-ended program configuration. (If you know the static module structure you shouldn't need to use loadModule.) In order to dynamically add modules to an in-flight program you are going to have to provide some sort of container or configuration information, and loadModule seems like a fine place to do so. However, while I think that the semantics of dynamically processing a Module declaration (for example, via eval...) needs to be well specified in the ECMAScript core semantics, it is less clear to me that loadModule needs to be part of the standard ECMAScript library. Externally locating and retrieving a container of an Application declaration seems like an inherently implementation or environment specific action, and perhaps the definition of an API like loadModule should be relegated to environment specific libraries (e.g., a browser API).)
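To make the note above concrete, here is a rough sketch of what an environment-supplied loader could look like. Everything in it is an assumption: `makeLoader`, `readContainer`, and the in-memory `containers` table are hypothetical, the loader is synchronous only to keep the example short, and it does not reproduce the strawman's actual loadModule signature.

```javascript
// Hypothetical environment-specific loader; not part of any ECMAScript standard.
// `readContainer` stands in for whatever the host uses to obtain source text
// (a file read, an HTTP request, ...). It is an assumption of this sketch.
function makeLoader(readContainer) {
  const loaded = new Map(); // container location -> exports object

  return function loadModule(location) {
    if (loaded.has(location)) return loaded.get(location);
    const source = readContainer(location);
    const exports = {};
    // Evaluate the container's source with `exports` in scope.
    new Function("exports", source)(exports);
    loaded.set(location, exports);
    return exports;
  };
}

// A fake in-memory "environment" so the sketch runs without files or a network.
const containers = {
  "plugins/greeter.js": "exports.greet = function (name) { return 'hello ' + name; };",
};
const loadModule = makeLoader((location) => {
  if (!(location in containers)) throw new Error(`cannot locate ${location}`);
  return containers[location];
});
```

The container location appears only at the dynamic call site, for example `loadModule("plugins/greeter.js")`, so no statically declared module has to embed it.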

# Kam Kasravi (16 years ago)

Allen:

I agree with the assessment of entangling configuration management and module management. Both module specifications have this issue, although Ihab's package proposal attempts to separate the two. In the community, the CommonJS syntax is 'require("./compiler/Lexer");' which implicitly says to go get the script at relative location ./compiler/Lexer.js, where a '.js' is appended to the literal string. This is an implicit rule which inherently ties configuration management with module management. BTW, string literals (which are in both module specifications) don't work well with combo handlers used by Yahoo and others, where a 'module' (defined to be a collection of script files) is fetched using the URL 'yui.yahooapis.com/combo?version/resource1.js&version/resource2.js&version/resource3.js'. The implicit rule would be to add '.js' to the above que

An approach taken by CommonJS when specifying a string literal, e.g. require('import/Lexer'), is to have a 'paths' property on the require function like require.paths=["url1","url2","url3"]; which resembles Java's class path mechanism where first found is first loaded. Of course this assumes that the relative path is constant among the different path entries.
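The search-path idea can be sketched as follows. This is an illustration of "first found is first loaded", not CommonJS's actual resolution algorithm; `resolveOnPaths`, the `exists` probe, and the `available` set are hypothetical names introduced for the example.

```javascript
// Resolve a module id against an ordered list of base paths; the first match wins.
// `exists` is a stand-in for a filesystem or network probe (an assumption).
function resolveOnPaths(id, paths, exists) {
  for (const base of paths) {
    // Strip trailing slashes from the base, then append the id and an implicit '.js'.
    const candidate = `${base.replace(/\/+$/, "")}/${id}.js`;
    if (exists(candidate)) return candidate;
  }
  return null; // not found on any path
}

// Pretend only the second and third path entries contain the module.
const available = new Set(["url2/import/Lexer.js", "url3/import/Lexer.js"]);
const resolved = resolveOnPaths(
  "import/Lexer",
  ["url1", "url2", "url3"],
  (candidate) => available.has(candidate)
);
```

Here `resolved` is the `url2` candidate, because `url1` has no match and `url2` precedes `url3`. The sketch also shows the assumption mentioned above: the relative part `import/Lexer.js` must be the same under every path entry.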

An approach I had taken was to separate loading resources completely. I use client-side websockets which communicate using a derived MessageEvent that specifies resources by name, mimetype, if-modified-since or etag. This is a 'pull' mechanism where the client asks for resources which are then used to create modules. These derived MessageTypes would need to be specified at the WebIDL (W3C) level and used by a module loader. Delegating resource acquisition to an implementation of the loader does decouple module management from configuration management, but there should be MessageEvents or JSON structures that module management can use when evaluating resources fetched by the configuration management system. This would be a great example of specifying the ECMAScript mapping from WebIDL.

kam

# Dave Herman (16 years ago)

There is quite a bit of literature on “2nd class” module systems, but most of it is from a period before first class modules became the primary modularity research interest. I’ll see if I can dig up some references to papers that may be helpful.

That would be great. I'll be glad to include them in a "references" section on the strawman page. I should also throw in some references to some languages that have similar or related module systems.

One of the points that I recall is the importance of not entangling “module management” with “configuration management”. In this context,

Absolutely. Like in Ihab and Kris's proposals, the intention was to separate out the notion of module ID's and module ID resolution in order to allow host-dependent behavior. The examples I gave were to give some concreteness, but the actual mechanism(s) for resolving those identifiers would not be (at least fully) specified in the language itself.

As a more concrete example, a “program” might be logically composed of

FWIW, I tried using the word "application" instead of "program" for this use, to distinguish from the Program non-terminal in ES <= 5.

Module management becomes entangled with configuration management when configuration management information is embedded within the actual module definitions. When this occurs, configuration changes require the modification of the actual source code of modules.

Well put, and I agree. Still, as you mentioned in the F2F, it's critical to make the common cases as absolutely convenient as possible. In that light, it might be ideal to allow imports to specify their dependencies either directly or indirectly through some mediating entity. That way, simple one-offs could be written like:

import 'http://json.org/modules/json2.js' as JSON;

and more mature applications could write:

import org.json.JSON as JSON;

with some top-level table that indicates how to interpret org.json.JSON.
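One way to picture the two forms side by side is a resolver that treats a string literal as a direct location and a dotted name as a key into an application-owned table. This is only a sketch: `moduleTable`, `resolveSpecifier`, and the `{ kind, value }` representation of a parsed ModuleSpecifier are hypothetical and do not come from the strawman.

```javascript
// Hypothetical top-level table, owned by the application rather than by any module.
const moduleTable = new Map([
  ["org.json.JSON", "http://json.org/modules/json2.js"],
]);

// A parsed specifier is modelled as { kind: "literal", value } for a quoted
// string, or { kind: "name", value } for a dotted identifier sequence.
function resolveSpecifier(spec, table) {
  if (spec.kind === "literal") {
    return spec.value; // direct: the string itself is the location
  }
  if (spec.kind === "name") {
    if (!table.has(spec.value)) throw new Error(`unbound module name: ${spec.value}`);
    return table.get(spec.value); // indirect: the table decides the location
  }
  throw new TypeError(`unknown specifier kind: ${spec.kind}`);
}
```

With this table, the one-off literal form and the `org.json.JSON` form resolve to the same location, but only the second can be redirected by editing the table alone.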

I don't feel this strawman depends too deeply on the exact nature of the ModuleSpecifier non-terminal, syntactically or semantically. In many ways it's probably necessary not to specify at all for ES. But independent of what goes into any eventual standard, it does have to be compatible with actual implementations, so I'm glad you're bringing up these points.

As an alternative, my suggestion is that we completely avoid using strings within ECMAScript source code to (statically) identify modules. Instead, modules might be identified (in both Module declarations and ImportDeclarations) as any of the following alternatives: 1) a simple identifier, 2) a sequence of dot-separated identifiers (my current preference), or 3) a sequence of identifiers separated by some other delimiter such as “::”. These identifiers populate a single module name space that is distinct from the EnvironmentRecords used for normal identifier resolution.

Syntactically, I don't have strong feelings, as long as it's statically resolvable. I agree that these live in a separate namespace, entirely independent of the lexical environment. (If I didn't mention it, that was meant to be part of the strawman.)

However, while I think that the semantics of dynamically processing a Module declaration (for example, via eval…) needs to be well specified in the ECMAScript core semantics, it is less clear to me that loadModule needs to be part of the standard ECMAScript library.

Great point-- agreed. Consider loadModule to be a plausibility argument. I'll work on identifying which parts of the strawman are "suggestive" rather than "strawman-normative". :)

Externally locating and retrieving a container of an Application declaration seems like an inherently implementation or environment specific action and perhaps the definition of an API like loadModule should be relegated to environment specific libraries (eg a browser API)).

Indeed.

Thanks for the feedback-- keep it coming!

# ihab.awad at gmail.com (16 years ago)

Since this thread was referred to by the "simple modules" thread, here are some remarks.

On Sat, Jan 30, 2010 at 7:43 PM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:

One of the points that I recall is the importance of not entangling “module management” with “configuration management”. ... Module management becomes entangled with configuration management when configuration management information is embedded within the actual module definitions. When this occurs, configuration changes require the modification of the actual source code of modules. For example, if a module import statement uses a hard-coded file path or server name then moving the location of an imported module requires modification of every module that performs such an import. ... The use of string literals as ModuleSpecifiers is an attractive nuisance that makes it too easy to entangle module management and configuration management.

I don't have an issue per se with whether module IDs are string literals or not, but just to guide where this is going, I'd like to propose a candidate goal: The module naming and identification scheme should scale to the open Web.

One of the most instructive examples of module naming, imho, is Java; how it evolved as it grew, and the solutions developed that attempt to address its shortcomings.

Java classes declare themselves into a global namespace. They do so using short names, which were okay for casual development. In response to the problem of packaging, distribution and versioning at scale, OSGi came along defining "bundles" in which classes live, and bundles could import Java packages from one another. Crucially, however, it is still impossible for one bundle to import two different versions of class com.foo.Utils from two different bundles. This is baked into the default interaction between Java ClassLoaders and classes. It is not impossible to fix this, theoretically, but in practice, the ClassLoader gymnastics required render it pretty much unusable.

OSGi was built to allow modules to be assembled together, and a mutually acceptable version solution to be computed, at run time (think: the Eclipse plugin system, which uses OSGi). Apache Maven is a similar solution to OSGi -- but one that operates at build time. It has the same properties -- and problems.
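The "mutually acceptable version" computation can be illustrated with a deliberately tiny sketch. It is not how OSGi or Maven actually resolve versions: versions are plain integers, ranges are inclusive `{ min, max }` pairs, and `pickVersion` is a hypothetical name.

```javascript
// Pick the highest available version that every dependent accepts.
// Versions are plain integers to keep the sketch short; real systems
// use richer version and range syntaxes.
function pickVersion(available, requirements) {
  const acceptable = available.filter((version) =>
    requirements.every((req) => version >= req.min && version <= req.max)
  );
  return acceptable.length > 0 ? Math.max(...acceptable) : null;
}
```

If one dependent accepts versions 2 to 4 and another accepts 1 to 3, the highest shared choice among 1, 2, 3, 4 is 3. If the two ranges do not overlap, the function returns `null`, which corresponds to the case described above where a single namespace cannot hold two versions of the same class.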

The module name space gets populated in accordance with a specified set of semantic rules by processing Module declarations contained within Applications.

As Kris Kowal pointed out in a separate thread, this is the first mistake of Java: that classes have authority over where they are placed in the namespace. The practicality of your suggestion is dependent on what you mean by "module name space" and how it is managed, but the most important point is that the declared name of a module must not be the [only] way in which it is imported by others.

As for whether modules should declare their own names by which they are imported at all: I am a bit on the fence, but leaning towards "no". It's a pain to have to write modules and manually try to keep the file names in sync with the internal module names; it makes refactoring hard. Of course, "files" and HTTP resources are an antiquated concept that should be discarded in favor of pure objects, right? :) But if I were designing a module object system, I would put the name -> module mapping logically outside the module; the module would be reachable from a dictionary but would not contain its own name in that dictionary.
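A small sketch of a name -> module mapping kept outside the module. The names `lexerModule`, `hostNames`, and `guestNames` are hypothetical; the point is only that the module object carries no name, so different importers can reach it under different names.

```javascript
// The module object carries no name of its own.
const lexerModule = Object.freeze({
  tokenize: (text) => text.split(/\s+/).filter(Boolean),
});

// Names live in dictionaries outside the module, and two importers may
// use different names for the same module object.
const hostNames = new Map([["compiler.Lexer", lexerModule]]);
const guestNames = new Map([["lex", lexerModule]]);
```

Renaming an entry in either dictionary, or moving the file that defines the module, requires no change inside the module itself.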

Another point is that Applications are increasingly obsolete. :) The Web is made of large numbers of entities collaborating. One of the important use cases is plugins, gadgets, ... loaded recursively into other environments. A "guest" module should be able to have some say in how its dependencies are resolved, even if this is subject to the permission of its "host".

My answer to this is the packages proposal. If you can recast your comments as a discussion of that, I think it would be helpful.
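As a rough illustration of the guest/host point above (and not a rendering of the packages proposal), a resolver could let a guest propose bindings for its dependencies while the host decides which proposals to honor. All names here, including `makeResolver`, `hostPermits`, and the example locations, are invented for the sketch.

```javascript
// Hypothetical: a guest proposes bindings for its dependencies, and the host
// decides which proposals to honor. Anything the host rejects falls back to
// the host's own binding, or fails if there is none.
function makeResolver(hostBindings, guestProposals, hostPermits) {
  return function resolve(name) {
    const proposed = guestProposals[name];
    if (proposed !== undefined && hostPermits(name, proposed)) return proposed;
    if (name in hostBindings) return hostBindings[name];
    throw new Error(`unresolved dependency: ${name}`);
  };
}

const resolveForGadget = makeResolver(
  { dom: "host/dom-v2.js" }, // bindings the host provides
  { dom: "evil.example/dom.js", charts: "gadget.example/charts-v1.js" }, // guest's wishes
  (name, location) => location.startsWith("gadget.example/") // host policy (assumed)
);
```

Under this assumed policy the guest's `charts` proposal is accepted, while its `dom` proposal is rejected and replaced by the host's own binding.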

Instead, modules might be identified (in both Module declarations and ImportDeclarations) as any of the following alternatives: 1) a simple identifier, 2) a sequence of dot-separated identifiers (my current preference), or 3) a sequence of identifiers separated by some other delimiter such as “::”. These identifiers populate a single module name space that is distinct from the EnvironmentRecords used for normal identifier resolution.

It seems like you want the identifiers to have some internal, compiler-visible structure. What does that map onto? Just a convention? Some hierarchy in deployment?

Ihab

# ihab.awad at gmail.com (16 years ago)

Reading over my comments, I realized I could summarize:

Configuration management is just as important to the integrity of the code in modules as the code itself. Getting the wrong "version" of some important library can break your code just as badly as having a bug in your own code. With that, scaling to the open Web means:

  1. We should anticipate that the world will look at our module solution and try to build on top of it -- if we haven't already built for them -- something like OSGi or Apache Maven in order to manage their dependencies precisely; and

  2. We should anticipate what the equivalent of OSGi or Apache Maven would look like applied to a fine-grained, distributed system where snippets of code are deployed with no central point of administration on the target environments.

Ihab

# Allen Wirfs-Brock (16 years ago)

-----Original Message-----
From: ihab.awad at gmail.com [mailto:ihab.awad at gmail.com]
Sent: Wednesday, February 03, 2010 12:52 PM

...

I don't have an issue per se with whether module IDs are string literals or not, but just to guide where this is going, I'd like to propose a candidate goal: The module naming and identification scheme should scale to the open Web.

It's hard to disagree in the abstract, but that does not mean that the specific form of identification ("name") that appears in a module declaration needs to scale to have meaning that unambiguously spans the entire web. You have to look at the specific usage of the name. In this case it is to form associations between imports and exports that are combined within a single "container" (avoiding the tainted word "context"). As the module declarations that assign the names may be independently authored, it is important to support conventions that have a low risk of unintended name conflict. It is generally undesirable to associate additional semantics with these names, so any proposal to do so should be very carefully examined.

... Java...

I'm relatively familiar with Java naming, OSGi, classloaders, etc...

It is important not to conflate the identity of runtime entities described by a source code declaration with the actual source code declaration itself or with the source code based name of the entity. Arguably, much of the complexity of Java compiling and loading arises from such issues and is also related to the need for Java to maintain the static invariants of the Java type system.

Because ECMAScript is a dynamically typed language, we avoid the static type system issues, and we should also try hard to avoid conflating names and identity.

OSGi was built to allow modules to be assembled together, and a mutually acceptable version solution to be computed, at run time (think: the Eclipse plugin system, which uses OSGi). Apache Maven is a similar solution to OSGi -- but one that operates at build time. It has the same properties -- and problems.

The simple module system being proposed imposes very few static invariants. That makes it easy for component and/or configuration management systems to do their stuff, including layering on additional compositional constraints and invariants. By keeping language constructs simple we empower higher level tool makers.

The module name space gets populated in accordance with a specified set of semantic rules by processing Module declarations contained within Applications.

As Kris Kowal pointed out in a separate thread, this is the first mistake of Java: that classes have authority over where they are placed in the namespace. The practicality of your suggestion is dependent on what you mean by "module name space" and how it is managed, but the most important point is that the declared name of a module must not be the [only] way in which it is imported by others.

Kris' issue is only a problem if semantics is associated with placement in a namespace.

I intentionally didn't try to define the semantics of "module name space" as I didn't think it was necessary in the context of the statement. However, here are the characteristics that it should have:

  1. Flat - there is no semantic hierarchy associated with names
  2. Identifier based names, not string literals
  3. Associated with a top-level/"global context"/"primordial world" or whatever you want to call that concept.

I support using dotted identifier sequences as names as long as no additional semantics is assigned to the sequence structure.
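The characteristics listed above could be modelled roughly like this. It is a sketch only: `ModuleNameSpace` is a hypothetical class, and the regular expression is an ASCII-only simplification of ECMAScript identifier syntax.

```javascript
// A flat module name space: dotted names are opaque keys, nothing more.
class ModuleNameSpace {
  constructor() {
    this.modules = new Map();
  }

  declare(name, moduleValue) {
    // Identifier-based names only: one or more identifiers separated by dots.
    if (!/^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/.test(name)) {
      throw new SyntaxError(`not a dotted identifier sequence: ${name}`);
    }
    if (this.modules.has(name)) throw new Error(`duplicate module name: ${name}`);
    this.modules.set(name, moduleValue);
  }

  lookup(name) {
    if (!this.modules.has(name)) throw new ReferenceError(`no module named ${name}`);
    return this.modules.get(name);
  }
}

const space = new ModuleNameSpace();
space.declare("org.json.JSON", { parse: JSON.parse });
space.declare("org.json", { description: "unrelated to org.json.JSON" });
```

Because the name space is flat, declaring `org.json` and `org.json.JSON` creates two unrelated entries, and looking up the prefix `org` fails rather than returning some enclosing package. A path-like string such as 'json.org/modules/json2.js' is rejected outright.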

As for whether modules should declare their own names by which they are imported at all: I am a bit on the fence, but leaning towards "no". It's a pain to have to write modules and manually try to keep the file names in sync with the internal module names; it makes refactoring hard. Of course, "files" and HTTP resources are an antiquated concept that should be discarded in favor of pure objects, right? :) But if I were designing a module object system, I would put the name -> module mapping logically outside the module; the module would be reachable from a dictionary but would not contain its own name in that dictionary.

There needs to be some way to identify individual modules, particularly when multiple module declarations are contained in a single "file".

At the language level there has to be a mechanism to "name" modules that is independent of any specific external storage mechanism.

The language should impose no requirement that a module name must match a file name.

Keep simple things simple: assigning a name to a module definition will be simple and natural for the majority of ECMAScript programmers. Decoupled naming won't be.

Things that are separate tend to get lost...

I would argue that the success of the web relative to other hypermedia systems was in part due to the fact that it is composed of "files" with little or no intrinsic semantics rather than a richer object system.

Another point is that Applications are increasingly obsolete. :) The Web is made of large numbers of entities collaborating. One of the important use cases is plugins, gadgets, ... loaded recursively into other environments. A "guest" module should be able to have some say in how its dependencies are resolved, even if this is subject to the permission of its "host".

"Application" is just a word. Simple modules are trying to provide an organization structure for what could be represented as a single source file. We should first solve that problem and then built upon it for other forms of containerization/isolation/componentization/etc.

My answer to this is the packages proposal. If you can recast your comments as a discussion of that, I think it would be helpful.

I can only digest a limited number of proposals at one time. My general feeling is that we should focus on things that can only be dealt with at the language semantics level and leave the rest to hosts and tools. Simple modules have a place within the language because they have a direct effect upon the semantics of identifier resolution. If packaging/code distribution doesn't impact the language at that level (and I hope it doesn't) then it probably doesn't need to be part of the language definition.