ES4 draft meta-issues

# Lars Hansen (18 years ago)

Over the next several weeks I'll be sending out draft specs for all(?) the ES4 library classes, one class at a time (in the order I get to them).

The ES4 library is expressed in terms of ES4 fragments: the spec uses executable -- and tested -- ES4 code in places where the ES3 spec uses pseudocode. As a consequence, the draft library spec makes some assumptions about what ES4 will look like when it's finished.

Below I am going to outline some aspects of ES4 that it will be useful for the readers of the draft specs to know, beyond what's in ES3. This outline will be updated from time to time as new draft specs require it. (Probably much of the information here is already written up in the language overview available on ecmascript.org, so go there for the full story.)

Namespaces, names

ES4 puts all names into namespaces. A name is in exactly one namespace and it is placed in that namespace by prefixing the binding keyword for the name (class, var, const, function, and others) with the namespace name. If MyNS is a namespace then

    MyNS var x

creates a variable whose fully qualified name is "MyNS::x".

There are several predefined namespaces. The namespace "__ES4__" is used for all top-level names that are new to ES4 if they're not in one of the other namespaces (except for the name "__ES4__" itself, which is the only unqualified top-level name introduced by ES4). Important predefined namespaces are "__ES4__::intrinsic" and "__ES4__::reflect".

In order to avoid having to fully qualify names all the time, namespaces can be opened; the names defined in the namespace will then be available without qualification. The namespace "__ES4__" is opened for all ES4 code, so in practice the two predefined namespaces listed above are known just as "intrinsic" and "reflect". (Opening a namespace may introduce ambiguities, which can be resolved by fully qualifying ambiguous names. Ambiguities are not common because a namespace opened in an inner lexical scope takes precedence over namespaces opened in outer scopes.)
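For readers who think better in code, here is a rough TypeScript model of qualified names and opened namespaces. It is only a sketch of the lookup rule described above, not the ES4 algorithm; `NsBindings`, `NsScope`, `defineName`, and `resolveName` are names made up for the illustration.

```typescript
// Hypothetical model: every binding is stored under its fully qualified name "NS::name".
type NsBindings = { [qualifiedName: string]: unknown };

// A lexical scope lists the namespaces it opens and points at its enclosing scope.
interface NsScope {
  opened: string[];
  enclosing: NsScope | null;
}

function defineName(b: NsBindings, ns: string, id: string, value: unknown): void {
  b[ns + "::" + id] = value;
}

// Resolve an unqualified name, innermost scope first, so that namespaces opened
// in an inner scope take precedence over those opened further out.
function resolveName(b: NsBindings, scope: NsScope | null, id: string): unknown {
  for (let s = scope; s !== null; s = s.enclosing) {
    const hits = s.opened.filter(ns => (ns + "::" + id) in b);
    if (hits.length > 1) {
      throw new Error("ambiguous name " + id + "; qualify it with one of " + hits.join(", "));
    }
    if (hits.length === 1) {
      return b[hits[0] + "::" + id];
    }
  }
  throw new Error("unresolved name " + id);
}

const nsBindings: NsBindings = {};
defineName(nsBindings, "MyNS", "x", 1);       // like: MyNS var x
defineName(nsBindings, "OtherNS", "x", 2);

const outerScope: NsScope = { opened: ["MyNS", "OtherNS"], enclosing: null };
const innerScope: NsScope = { opened: ["OtherNS"], enclosing: outerScope };

const fromInner = resolveName(nsBindings, innerScope, "x");   // 2: the inner scope wins
const fullyQualified = nsBindings["MyNS::x"];                 // 1: qualification is never ambiguous
```

Resolving "x" from `outerScope` alone throws in this model, because both opened namespaces define it.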

The intrinsic namespace is reserved; user code is not allowed to introduce new names in this namespace. The intrinsic namespace is used primarily for methods in the predefined classes. For every prototype method M there is a corresponding intrinsic method M in the class. For example, there is Array.prototype.concat and also an intrinsic::concat method on Array instances. The prototype methods are fully compatible with ES3 in the types they accept and how they convert values. The intrinsic methods normally have more tightly constrained signatures and, like all class methods, are immutable (though they can be overridden in subclasses -- that's allowed even for user code).

The intrinsic namespace provides integrity (code that calls an intrinsic method will know that it references the original method; it is not at the mercy of changes to the prototype method) and optimization opportunities (early binding to the slot that holds the method in the presence of type annotations). The specification of the predefined classes in terms of ES4 code makes use of other predefined classes and their methods, and predefined methods are careful to call intrinsic methods to invoke known behavior and to call public methods to invoke explicitly variant behavior. Normally, such invocations are explicitly qualified in the text in order to avoid any ambiguity in the reader's mind.
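The integrity argument can be sketched in TypeScript, with the caveat that TypeScript has no intrinsic namespace: the sketch fakes one by capturing the original method before anyone can replace it. ES4 would enforce this in the language; `intrinsicConcat`, `viaPrototype`, and `viaIntrinsic` are hypothetical names.

```typescript
// Stand-in for intrinsic::concat: a reference to the original method, captured
// up front (an assumption of this sketch, not how ES4 does it).
const intrinsicConcat = Array.prototype.concat;

function viaPrototype(a: number[], b: number[]): number[] {
  return a.concat(b);                    // looks up Array.prototype.concat at call time
}

function viaIntrinsic(a: number[], b: number[]): number[] {
  return intrinsicConcat.call(a, b);     // always the original behavior
}

// Some other code replaces the prototype method.
const savedConcat = Array.prototype.concat;
Array.prototype.concat = function (this: unknown[]): unknown[] { return []; };

const patchedResult = viaPrototype([1], [2]);      // []: at the mercy of the change
const intrinsicResult = viaIntrinsic([1], [2]);    // [1, 2]: unaffected

Array.prototype.concat = savedConcat;              // put the original back
```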

Types and annotations

Bindings in ES4 are typed, and the type can be provided explicitly by following the name with a colon and the type:

    var x: Array

If the type is omitted, it is "*" (read as "any"), which means it is unconstrained. If we assume just run-time type checking for the time being, then a check is performed every time a value is stored into an annotated variable: the type of the value must be a subtype of the annotated type.
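A small TypeScript model of the run-time check on stores may help. TypeScript itself checks annotations statically, so the check is written out by hand here; `TypedSlot` and its members are invented for the sketch.

```typescript
// Hypothetical run-time model of an annotated binding: every store is checked.
class TypedSlot<T> {
  private value: T | undefined = undefined;
  private typeName: string;
  private accepts: (v: unknown) => v is T;

  constructor(typeName: string, accepts: (v: unknown) => v is T) {
    this.typeName = typeName;
    this.accepts = accepts;
  }

  set(v: unknown): void {
    if (!this.accepts(v)) {
      throw new TypeError("value is not of type " + this.typeName);
    }
    this.value = v;
  }

  get(): T | undefined {
    return this.value;
  }
}

// Like "var x: Array": only arrays (including instances of Array subclasses) may be stored.
const xSlot = new TypedSlot<unknown[]>("Array", (v: unknown): v is unknown[] => v instanceof Array);
// Like "var y" with the implicit type "*": every value passes the check.
const ySlot = new TypedSlot<unknown>("*", (v: unknown): v is unknown => true);

xSlot.set([1, 2, 3]);       // accepted
ySlot.set("anything");      // accepted
let storeRejected = false;
try {
  xSlot.set("not an array");
} catch (e) {
  storeRejected = e instanceof TypeError;   // the store is refused
}
```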

Functions can be annotated too, in both their parameter and return positions. Annotations on parameters constrain how the function can be called. Annotations in the return position constrain what the function can return:

    function f(x: string): RegExp { ... }

There are two classes of types, nominal types and structural types.

Nominal types are introduced by class definitions and interface definitions. Values of nominal types are created by instantiating classes (using the "new" operator). The syntax and semantics are broadly as in Java: A nominal type is equal only to itself; a value is of a class type only if it was instantiated from that type; and it is of an interface type only if it was instantiated from a class type that declares that it implements that interface. (Note that the access control keywords like private and public are actually aliases for language-provided namespaces.)

Methods on classes appear as function definitions in the class body. The class instance is in scope in the body of a method.

Structural types are record types (for example {x:int, y:int}), array types (for example [int]), tuple types (for example [int,string]), union types (for example (int|string|RegExp)), function types (for example function(int):boolean), and some special types (null and undefined). A structural type is equal to any other structural type that has the same fields with the same types (in any order), and a value is of a structural type if it has fixed (non-deletable) fields with the names and types given by the structural type. (So if Point is a class with x and y integer fields, an instance of Point is of the structural type {x:int, y:int}.) Structural types can't be recursive.
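TypeScript happens to use structural typing for object types, so the Point example can be shown almost directly. The differences to keep in mind: TypeScript checks this statically rather than at run time, and its `number` stands in for ES4's `int`. `StructPoint`, `XY`, `YX`, and `sumXY` are illustrative names.

```typescript
class StructPoint {
  x: number;
  y: number;
  constructor(x: number, y: number) {
    this.x = x;
    this.y = y;
  }
}

type XY = { x: number; y: number };   // roughly {x:int, y:int}
type YX = { y: number; x: number };   // same fields in another order: the same type

function sumXY(p: XY): number {
  return p.x + p.y;
}

const asRecord: XY = new StructPoint(3, 4);   // a class instance is of the structural type
const reordered: YX = asRecord;               // field order does not matter
const total = sumXY(reordered);               // 7
```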

Types can be given names by type definitions:

    type Num = (int|double)

Type definitions, class definitions, and interface definitions can be parameterized:

    class Map.<K,V> { ... }
    type Box.<T> = { value: T }

Record and array types are instantiated by suffixing the literal with the type:

    { value: 7 } : Box.<int>
    [1,2,3] : [int]

but now we're getting esoteric so let's stop there -- this is not the language spec.

Any type is a subtype of *, and Box.<T> is a subtype of Box.<*>, for any T.

One of the important aspects of the type system is that the types provide a specification for fixtures on the objects that are of the type: in any value of type Box.<T>, the "value" property can't be removed. (Instances of structural types can always have extra non-fixture fields, as can instances of classes designated "dynamic".)
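The fixture idea has a loose JavaScript analogue in non-configurable properties. The following TypeScript sketch uses that analogue; it is not how ES4 implements fixtures, and `makeBox` and `tryDelete` are hypothetical helpers.

```typescript
// Build something like a Box.<T> value whose "value" property is a fixture.
function makeBox<T>(value: T): { value: T } {
  const box = {} as { value: T };
  Object.defineProperty(box, "value", {
    value: value,
    writable: true,
    enumerable: true,
    configurable: false,    // non-deletable, like an ES4 fixture
  });
  return box;
}

// delete throws in strict mode and returns false otherwise; normalize to a boolean.
function tryDelete(obj: object, key: string): boolean {
  try {
    return delete (obj as { [k: string]: unknown })[key];
  } catch (e) {
    return false;
  }
}

const intBox = makeBox(7);
(intBox as { [k: string]: unknown })["note"] = "extra";   // an extra non-fixture field

const removedValue = tryDelete(intBox, "value");   // false: the fixture stays
const removedNote = tryDelete(intBox, "note");     // true: non-fixture fields can be removed
```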

Functions

Functions can take optional arguments (they have default values) and rest arguments:

    function f(x, y=0) { ... }   // y is optional
    function f(x, ...rest) { ... }

The rest argument appears as a regular Array object holding the excess parameter values.

Function bodies that contain a simple return statement (which typically returns the result of a call to another function) are common; ES4 introduces a shorthand where the body is a brace-less expression:

    function f(x, y)
        g(x*2, y, 0)
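TypeScript has close counterparts to all three features, which makes for an easy comparison. The expression-bodied arrow function is only the nearest analogue of the ES4 shorthand, not the same syntax; `scale`, `countExtras`, and the body of `g` are made up for the example.

```typescript
// y is optional and defaults to 0.
function scale(x: number, y: number = 0): number {
  return x * 2 + y;
}

// rest is a regular array holding the excess arguments.
function countExtras(first: number, ...rest: number[]): number {
  return rest.length;
}

// Nearest analogue of the brace-less body: an arrow function whose body is
// a single expression, here forwarding to another function as in the text.
const g = (a: number, b: number, c: number): number => a + b + c;
const f = (x: number, y: number): number => g(x * 2, y, 0);

const defaulted = scale(5);              // 10
const extras = countExtras(1, 2, 3);     // 2
const viaShorthand = f(1, 2);            // 4
```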

Informative and helper methods

The spec is normative, which means the ES4 code in the spec is normative too. In order to avoid overspecification the spec factors out non-normative sections as methods in the "informative" namespace, which are described by prose. A good example is the global hashcode function:

    intrinsic const function hashcode(o): uint {
        switch type (o) {
        case (x: null)      { return 0u }
        case (x: undefined) { return 0u }
        case (x: boolean)   { return uint(x) }
        case (x: Boolean)   { return uint(x) }
        case (x: int)       { return x < 0 ? -x : x }
        case (x: uint)      { return x }
        case (x: double)    { return isNaN(x) ? 0u : uint(x) }
        case (x: decimal)   { return isNaN(x) ? 0u : uint(x) }
        case (x: Number)    { return isNaN(x) ? 0u : uint(x) }
        case (x: string)    { return informative::stringHash(string(x)) }
        case (x: String)    { return informative::stringHash(string(x)) }
        case (x: *)         { return informative::objectHash(x) }
        }
    }

Hashing on null, undefined, booleans, and numbers is normatively specified, but hashing on strings and other objects is only informatively specified.
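The normative/informative split can be mimicked in TypeScript as a function with fixed behavior that delegates to replaceable hooks. This is only an approximation: TypeScript has a single `number` type, so the int, uint, double, and decimal cases collapse into one (the sketch applies the double rule), and the two hook bodies are placeholders invented here, since the draft describes them only in prose.

```typescript
// Informative hooks: placeholder bodies, not specified behavior.
const informative = {
  stringHash(s: string): number {
    let h = 0;
    for (let i = 0; i < s.length; i++) {
      h = (h * 31 + s.charCodeAt(i)) >>> 0;   // keep the value in uint32 range
    }
    return h;
  },
  objectHash(_o: object): number {
    return 1;   // placeholder; an implementation would choose its own value
  },
};

function hashcode(o: unknown): number {
  // Normative part: null, undefined, booleans, and numbers.
  if (o === null || o === undefined) return 0;
  if (typeof o === "boolean" || o instanceof Boolean) return o.valueOf() ? 1 : 0;
  if (typeof o === "number" || o instanceof Number) {
    const n = o.valueOf();
    return isNaN(n) ? 0 : n >>> 0;   // ToUint32, approximating uint(x)
  }
  // Informative part: strings and all other objects.
  if (typeof o === "string" || o instanceof String) {
    return informative.stringHash(String(o));
  }
  return informative.objectHash(o as object);
}

const hashOfNull = hashcode(null);   // 0
const hashOfTrue = hashcode(true);   // 1
const hashOfNaN = hashcode(NaN);     // 0
const hashOf42 = hashcode(42);       // 42
```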

In order to share code, the spec also factors out commonalities as methods in the "helper" namespace. A common case is where both prototype methods and intrinsic methods take a variable number of arguments, as for the concat method in Array:

    prototype function concat(...items)
        Array.helper::concat(this, items);

    intrinsic function concat(...items): Array
        Array.helper::concat(this, items);

(In this case the helper function is a static method on the Array class, because it accommodates the static concat method too.)
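The same factoring, sketched in TypeScript: one static helper holds the logic, and the loosely typed and tightly typed entry points both forward to it. `SketchArray` and its method names are hypothetical, and the flattening rule in the helper is a simplification made for the example, not the concat algorithm from the draft.

```typescript
class SketchArray {
  items: unknown[];
  constructor(items: unknown[]) {
    this.items = items;
  }

  // Shared implementation, standing in for Array.helper::concat.
  static helperConcat(self: SketchArray, items: unknown[]): SketchArray {
    let out = self.items.slice();
    for (const item of items) {
      out = out.concat(item instanceof SketchArray ? item.items : [item]);
    }
    return new SketchArray(out);
  }

  // Stand-in for the prototype method: loosely typed result.
  protoConcat(...items: unknown[]): unknown {
    return SketchArray.helperConcat(this, items);
  }

  // Stand-in for the intrinsic method: same helper, tighter return type.
  intrinsicConcat(...items: unknown[]): SketchArray {
    return SketchArray.helperConcat(this, items);
  }

  // The static form can reuse the helper as well.
  static concat(self: SketchArray, ...items: unknown[]): SketchArray {
    return SketchArray.helperConcat(self, items);
  }
}

const joined = new SketchArray([1]).intrinsicConcat(2, new SketchArray([3, 4]));
// joined.items is [1, 2, 3, 4]
```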

Meta-level methods

The predefined namespace "meta" is used for methods that participate in language-level protocols: invocation and property access and update. A class that defines meta::invoke is callable as a function (the meta::invoke method is invoked in response to the call); the meta::get, meta::set, meta::has, and meta::delete methods are invoked in response to accesses to non-fixture properties on the object.
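JavaScript's later `Proxy` mechanism is a reasonable analogue for getting a feel for these protocols, so here is a TypeScript sketch using it. It is an analogy only: the traps below fire for every access, and the sketch itself decides what counts as a fixture. `Dyn`, `isFixture`, `metaLog`, and `dynamicProps` are invented names.

```typescript
type Dyn = { fixed: number; [k: string]: unknown };

const metaLog: string[] = [];
const dynamicProps: { [k: string]: unknown } = {};

// In this sketch, own properties of the target play the role of fixtures.
function isFixture(t: Dyn, key: string | symbol): boolean {
  return typeof key !== "string" || Object.prototype.hasOwnProperty.call(t, key);
}

const dynObj: Dyn = new Proxy<Dyn>({ fixed: 1 }, {
  get(t, key) {                       // plays the role of meta::get
    if (isFixture(t, key)) return t[key as string];
    metaLog.push("get " + String(key));
    return dynamicProps[key as string];
  },
  set(t, key, value) {                // plays the role of meta::set
    if (isFixture(t, key)) { t[key as string] = value; return true; }
    metaLog.push("set " + String(key));
    dynamicProps[key as string] = value;
    return true;
  },
  has(t, key) {                       // plays the role of meta::has
    if (isFixture(t, key)) return key in t;
    metaLog.push("has " + String(key));
    return (key as string) in dynamicProps;
  },
  deleteProperty(t, key) {            // plays the role of meta::delete
    if (isFixture(t, key)) return false;
    metaLog.push("delete " + String(key));
    return delete dynamicProps[key as string];
  },
});

dynObj["color"] = "red";
const color = dynObj["color"];         // "red"
const hasColor = "color" in dynObj;    // true
delete dynObj["color"];
const fixedValue = dynObj.fixed;       // 1, and nothing is logged for the fixture
```

After these statements `metaLog` holds "set color", "get color", "has color", and "delete color", in that order.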

Other aspects of the language will hopefully become clear as things move along. Do ask.

--lars

# Brendan Eich (18 years ago)

On Feb 27, 2008, at 9:00 AM, Lars Hansen wrote:

> Meta-level methods
>
> The predefined namespace "meta" is used for methods that participate in language-level protocols: invocation and property access and update. A class that defines meta::invoke is callable as a function (the meta::invoke method is invoked in response to the call); the meta::get, meta::set, meta::has, and meta::delete methods are invoked in response to accesses to non-fixture properties on the object.

Pedantry alert, forgive me -- but it may be important to know that
meta::invoke has static and instance forms.

Given class C { ... meta static function invoke(...) ... }, you can
call C as a function:

    x = y + C(z);

This is used, e.g., by class Date in builtins/Date.es.

If you define a non-static function (a method) named meta::invoke
(via class C { ... meta function invoke(...) ... }), then as with
meta::get, etc., it is the instances of C that are themselves callable:

    c = new C;
    x = y + c(z);

So there's a meta function invoke(...) ... in class Function in the
RI's builtins/Function.es, for example.
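The two forms can be imitated in TypeScript, again only as an analogy. The static form uses a `Proxy` with an `apply` trap so that the class itself is callable; the instance form builds objects that are functions with extra fields. `Celsius`, `CallableClass`, `Adder`, and `makeAdder` are hypothetical names, and neither technique is ES4's meta::invoke mechanism.

```typescript
class Celsius {
  degrees: number;
  constructor(degrees: number) {
    this.degrees = degrees;
  }
}

// Static form: the class itself is callable, as in C(z). The apply trap plays
// the role of "meta static function invoke".
type CallableClass = typeof Celsius & ((z: number) => string);
const C = new Proxy(Celsius, {
  apply(_target, _thisArg, args: number[]) {
    return "called C(" + args[0] + ")";
  },
}) as CallableClass;

// Instance form: each instance is callable, as in c(z). The function body
// plays the role of "meta function invoke".
interface Adder {
  (z: number): number;
  base: number;
}
function makeAdder(base: number): Adder {
  const adder: Adder = ((z: number) => adder.base + z) as Adder;
  adder.base = base;
  return adder;
}

const staticCall = C(3);               // "called C(3)"
const constructed = new C(3);          // construction still works
const c = makeAdder(5);
const instanceCall = c(2);             // 7
```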

HTH,

/be

# Lars Hansen (18 years ago)

For some drafts coming this week, the following information will also be useful.

The specification makes use of a predefined namespace "magic". This namespace is reserved in the specification but not in any actual implementation of the language. It is used only to tag top-level functions that are implementation hooks. The hooks provide functionality that is not available in the language, for example, accessing the internal [[prototype]] property of objects.

Magic functions are defined by prose for the moment; it is probable that they will be (partly?) exposed as SML fragments later, in the style of the semantic functions we're planning for other parts of the spec.

The specification also makes use of a type EnumerableId, which is a union type that currently looks like this:

    type EnumerableId = (int | uint | string | Name)

--lars

# Lars Hansen (18 years ago)

Information/discussion item.

In the drafts for predefined classes I've sent out so far, the interaction between the intrinsic methods and the prototype methods has more or less uniformly been specified as the prototype calling the corresponding intrinsic method on its "this" object, for example:

    class C {

        prototype function toString(this:C)
            this.intrinsic::toString()

        intrinsic function toString()
            ...
    }

The thinking was that by using this structure the prototype method can then take advantage of subclasses that override the intrinsic method (since the intrinsic is virtual the prototype method picks up the method in the subclass when calling the intrinsic). However, this thinking is flawed. The meaning of many prototype methods is fixed by E262-3 and must remain unchanged in E262-4 for compatibility reasons. An important example of this is the original Object.prototype.toString method. Not infrequently one sees code like this:

    var x = <some object of unknown class, call it "C">
    x.toString = Object.prototype.toString
    x.toString() // expected to return "[object C]"

It's a hack but it can be used to discover class names. However, if C overrides intrinsic::toString then this idiom no longer works, because Object.prototype.toString calls this.intrinsic::toString which is the overridden method in C, not the one in Object.

As a consequence, I think that the libraries need to be adjusted a little bit. Prototype methods should have fixed meanings that depend only on the type of object they were extracted from, whereas intrinsic methods can be overridden and can be specified such that they will pick up overridden methods. The structure would now be:

    class C {

        prototype function toString(this:C)
            this.private::toString()

        intrinsic function toString()
            private::toString()

        private function toString()
            ...
    }

With this structure, the idiom outlined above would return the expected string, but (new C).intrinsic::toString() would pick up C's toString method as expected. In the absence of overriding, the prototype and intrinsic methods would work identically, as expected.
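To make the difference between the two structures concrete, here is a TypeScript sketch of both. TypeScript has neither intrinsic nor truly private virtual-free methods, so a module-level function stands in for private::toString and an explicit `klass` field stands in for the internal class name; all names (`ObjectSketch`, `CSketch`, `protoToStringV1`, `protoToStringV2`, `privateToString`) are made up for the illustration.

```typescript
class ObjectSketch {
  klass: string;
  constructor(klass: string = "Object") {
    this.klass = klass;
  }

  // Stand-in for intrinsic::toString: virtual, so subclasses may override it.
  intrinsicToString(): string {
    return privateToString(this);
  }
}

// Stand-in for Object's private::toString: subclasses cannot replace it.
function privateToString(self: ObjectSketch): string {
  return "[object " + self.klass + "]";
}

// First structure: the prototype method forwards to the virtual intrinsic method.
function protoToStringV1(this: ObjectSketch): string {
  return this.intrinsicToString();
}

// Revised structure: the prototype method forwards to the private method.
function protoToStringV2(this: ObjectSketch): string {
  return privateToString(this);
}

class CSketch extends ObjectSketch {
  constructor() {
    super("C");
  }
  intrinsicToString(): string {      // C overrides the intrinsic method
    return "custom C text";
  }
}

const obj = new CSketch();
const v1 = protoToStringV1.call(obj);               // "custom C text": the class-name idiom breaks
const v2 = protoToStringV2.call(obj);               // "[object C]": the idiom works again
const viaIntrinsicCall = obj.intrinsicToString();   // "custom C text": overriding still works
```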

There are variations on this pattern; if the prototype method is generic, then it might forward to a static helper method (since an instance method would be this-constrained).

The complexity increase of this is annoying but less bad than it might seem at first glance because the truly huge classes -- Array, String -- already have all the functionality factored as static helper methods, and the main adjustment is to the prototype methods. For specification purposes it will make sense to follow the pattern pretty strictly, but in a practical implementation it would be more reasonable to duplicate some functionality, especially for simple methods.

--lars