feature-based and compartment-based language versioning (was: JavaScript parser API)

# Claus Reinke (14 years ago)

The discussion of parser APIs had a side-thread on language versioning which I'd like to raise separately, for those not following the parser discussions. I include the relevant context, then continue the thread below.

> > > But there are also tough questions about what the parser should do with engine-specific language extensions.
> >
> > Actually, that starts before the AST: I'd like to see feature-based language versioning, instead of the current monolithic version numbering. Take generators as an example feature:
> >
> > Perhaps JS1.7 ("javascript;version=1.7") happens to be the first JS version to support "yield", and is backwards compatible with JS1.5, which might happen to match ES3; and JS1.8.5, which happens to match ES5, might be backwards compatible with JS1.7. But it is unlikely that the JSx which happens to match ES6 will be backwards compatible with JS1.7 (while ES5-breaking changes will be limited, replacing experimental JS1.x features with standardized variants is another matter).
> >
> > Whereas, if I were able to specify "use yield", and be similarly selective about other language features, then any of JS1.7, JS1.8.5, or ES6 engines might be able to do the job, depending on what other language features my code depends on. Also, other engines might want to implement some features (like "yield") selectively, without aiming to support all of JS1.7, and long before being able to support all of ES6.
>
> That's asking for quite a modularized/configurable parser.

Not necessarily: provided the extensions do not conflict, write a single parser for the whole collection of extensions, then use semantic actions to reject code that tries to use extensions that are not currently active.

As long as each language extension is connected to new syntax (rather than reinterpreting existing syntax), that also means one can give reasonable error messages ("looks like you need to enable 'yield', or find an engine that supports this extension").
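A minimal sketch of that approach, assuming a made-up feature table and a toy tokenizer rather than any real engine's parser: the scanner knows the syntax of every extension, and a check (standing in for a semantic action) rejects uses of extensions that are not active. The names `FEATURE_KEYWORDS` and `checkFeatures` are hypothetical.

```javascript
// Hypothetical table: which keyword introduces which named extension.
const FEATURE_KEYWORDS = new Map([
  ["yield", "yield"],
  ["let", "let"],
]);

// Toy stand-in for a parser: split the source into identifier-like tokens.
// A real parser would run this check as a semantic action on the relevant
// grammar production; this version also looks inside strings and comments.
function checkFeatures(source, enabled) {
  const errors = [];
  const tokens = source.match(/[A-Za-z_$][\w$]*/g) || [];
  for (const token of tokens) {
    const feature = FEATURE_KEYWORDS.get(token);
    if (feature !== undefined && !enabled.has(feature)) {
      errors.push(
        `looks like you need to enable '${feature}', ` +
        "or find an engine that supports this extension"
      );
    }
  }
  return errors;
}

const code = "function gen() { yield 1; }";
console.log(checkFeatures(code, new Set()));          // one error about 'yield'
console.log(checkFeatures(code, new Set(["yield"]))); // no errors
```

Because the keyword is always recognized, the error can name the missing extension instead of reporting a generic syntax error.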

For comparison, take Haskell, which has been a hotbed of language research and experimentation, so it has the versioning problem in spades. Initially, each implementation had its own catch-all flags to enable whatever extensions that system supported. Nowadays, every extension gets its own name. Most implementations support only a few extensions, but the extensions are standardized across implementations (they are still extensions: the language committees have been very conservative about incorporating extensions into the language report).

Extensions can be selectively enabled via (per-module) source code pragmas, so that parsers, compilers, package repositories, and build tools can agree on what language is being processed, and whether the chosen compiler supports the set of extensions required by the package source about to be built.
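Carried over to JS, that could look like the sketch below. It assumes a hypothetical directive form, `"use <feature>";` at the top of a script (by analogy with `"use strict";`), and a made-up set of features an engine claims to support. Neither is an existing standard, and the function names are invented for illustration.

```javascript
// Collect hypothetical feature directives such as "use yield"; from the
// start of a script. Scanning stops at the first statement that is not a
// string-literal directive. Comments before directives are not handled.
function readFeaturePragmas(source) {
  const features = [];
  const directive = /^\s*(["'])use ([\w-]+)\1\s*;?/;
  let rest = source;
  let match;
  while ((match = directive.exec(rest)) !== null) {
    features.push(match[2]);
    rest = rest.slice(match[0].length);
  }
  return features;
}

// Features the script asks for that the engine does not provide.
function missingFeatures(required, supported) {
  return required.filter((feature) => !supported.has(feature));
}

const script = '"use yield";\n"use let";\nfunction f() {}';
const required = readFeaturePragmas(script);
console.log(required);                                      // yield, let
console.log(missingFeatures(required, new Set(["yield"]))); // let
```

A build tool or package repository could run the same two steps without executing the script, which is the point of putting the information in the source.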

Here is a list of pragma-enabled extensions that a Haskell parser has to support these days, from the main Haskell-in-Haskell parser:

http://hackage.haskell.org/packages/archive/haskell-src-exts/1.11.1/doc/html/Language-Haskell-Exts-Extension.html

A real problem might be that, with JS, the phase distinction between compile time and run time is in user-land. So the agent rejecting the code might be the browser visiting a web page. But that problem already exists, even with monolithic language versioning.

There is another versioning problem for which more fine control would help, and that is about building an application from scripts written for different language versions.

In many discussions of ES evolution, there seems to be a background assumption that, with monolithic language versions, a feature is supported either everywhere or nowhere; so if we don't want to break old code, we have to carry old features into new versions of the language.

However, versioning by script-type seems to allow for loading scripts written for different language versions, so it might be possible to limit the backwards-compatibility tax to support for loading old-version code.

- Has this been discussed before? For instance, what happens if an application consists mostly of ES/next scripts, but there is one ES5 script loaded as well (which happens to use 'with')?
- How does this carry over to other language features?
    - Can an ES/next script call 'eval' on ES5 code? How, if 'eval' has no version parameters? What about the other way round?
    - Modules obviously require ES/next at least, but how are they language-versioned beyond that, if they are not loaded via a script tag?

At first glance, there would seem to be advantages to having in-source language versioning (similar to strict pragmas), either monolithic or feature-based, so that ES/N code can load/eval ES/M code (for M<=N, at least).

Source without language-version pragmas could either default to ES5, or one might ask developers to tag their code as ES5, if they do not want to port it to ES/next.
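As a rough illustration of the monolithic variant, here is a sketch that assumes a hypothetical `"use version N";` pragma as the first directive of a script and treats untagged source as ES5. The pragma syntax and the names `sourceVersion` and `canLoad` are invented for this example.

```javascript
// Hypothetical pragma: "use version N"; as the first directive of a script.
// Untagged source defaults to ES5.
const DEFAULT_VERSION = 5;

function sourceVersion(source) {
  const match = /^\s*(["'])use version (\d+)\1/.exec(source);
  return match === null ? DEFAULT_VERSION : Number(match[2]);
}

// An ES/N host accepts ES/M source when M <= N.
function canLoad(hostVersion, source) {
  return sourceVersion(source) <= hostVersion;
}

const legacy = "with (obj) { x = 1; }";        // untagged: treated as ES5
const modern = '"use version 6";\nlet x = 1;';

console.log(canLoad(6, legacy)); // true
console.log(canLoad(6, modern)); // true
console.log(canLoad(5, modern)); // false
```

This only decides whether loading is allowed; a host would still have to parse and evaluate the loaded code under the rules of version M, which is where the remaining backwards-compatibility cost would sit.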

Once such a convention is established, future ES versions might feel less pressure to simply continue including old features in new language versions by default.

Thoughts?

Claus
