on semantics, specification, metacircularity, λJS, etc.

# Shriram Krishnamurthi (15 years ago)

The recent thread about specifications and meta-circularity devolved into a discussion about semantics. Before one can argue whether or not the standards committee should invest effort in helping out (much less incorporating) a semantics, it helps to understand what form a useful semantics can take, and how much (in this case, how little) the committee needs to do to help the semantics along.

First I want to summarize the semantics effort from my group at Brown (Arjun Guha, Joe Politz, Claudiu Saftoiu, Spiros Eliopoulos), because I don't think it has been represented accurately, and the details really matter to this discussion. The semantics comes as a bundle of software tools, tests, and other mechanized and executable artifacts, all of which are here:

www.cs.brown.edu/research/plt/dl/jssem/v1

This effort is meant to be a bridge between practitioners and researchers. In particular, practitioners want and need tools that work on actual source programs in the wild, not just on cute little abstract artifacts; some researchers (like us) need the abstract artifact but also want it to conform to reality.

(NB: This is not meant to be a coded criticism of the work of Taly, Maffeis, and Mitchell, whose semantics effort is heroic. Both groups are aligned in wanting to build the same kind of bridge; but we differ enormously in our choices of media and means of construction.)

===== The Origin of the Semantics =====

First of all, why did we build our semantics? Our goal wasn't to build a semantics -- it was (and is) to build real tools that work on real programs. We built three different tools:

  • a very rich flow analysis for use in Ajax intrusion detection;
  • a static type system;
  • a different flow analysis for use in the type system.

By the end of this we were very frustrated: it is hard to build tools for a language this large, and it was impossible for us to state anything formal about what these tools did.

This forced us to split the language into syntactic sugar and core, and re-build all our tools atop the core. This has tremendously simplified and accelerated our work -- both re-implementing the above tools and creating new ones. Ultimately, all these tools run on real JavaScript source -- for instance, we currently have a family of checkers that verify the properties of ADsafe by operating on the actual ADsafe source code.

===== The Structure and Guarantee of the Semantics =====

As I said, our semantics splits the language into a small core and a desugarer. The core (λJS: read "lambda JS") is small (fits on the proverbial page, at least A4 ;-)) and is designed to support the proofs researchers do. Desugar is much larger, but it is a combination of mostly small pieces that are themselves amenable to study.
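
To make the shape of this concrete, here is a minimal sketch of one desugaring rule, written as a JavaScript program over ASTs. Everything in it is invented for illustration -- the node shapes, the helper constructors, and the core form names are not those of the actual desugar in the bundle above:

```js
// Minimal AST constructors for a hypothetical core language.
function varRef(name) { return { type: "Var", name: name }; }
function letExpr(name, bound, body) {
  return { type: "Let", name: name, bound: bound, body: body };
}
function app(fn, args) { return { type: "App", fn: fn, args: args }; }
function getField(obj, field) {
  return { type: "GetField", obj: obj, field: field };
}

var counter = 0;
function fresh(hint) { return hint + "$" + counter++; } // an unused name

// One illustrative rule: a surface method call obj.m(args...) desugars
// into core forms that bind the receiver once and pass it explicitly.
function desugar(node) {
  switch (node.type) {
    case "MethodCall":
      var obj = fresh("obj");
      return letExpr(obj, desugar(node.object),
        app(getField(varRef(obj), node.method),
            [varRef(obj)].concat(node.args.map(desugar))));
    case "Num":
    case "Var":
      return node; // literals and variables are already core
    default:
      throw new Error("unhandled surface form: " + node.type);
  }
}
```

The point is only that each rule is small and mechanical; the full desugarer is a larger collection of such pieces, each of which can be studied in isolation.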

So we have two functions, desugar and λJS. For a program e, we can run λJS(desugar(e)) to obtain an answer. But does this have any connection to reality? We can also just run e through a JavaScript evaluator like the one in your favorite browser. We show, for a significant part of the Mozilla JavaScript test suite, that these produce the SAME ANSWER. That is, the semantics is (nearly) every bit as tested as your browser's evaluator itself is!
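
As a hedged sketch of that testing discipline -- the function names below are placeholders, not the interfaces of the actual tools:

```js
// Differential testing: a test passes when the composed semantics and a
// real JavaScript engine produce the same printed answer.  parse,
// interpretCore, and runInEngine are hypothetical stand-ins.
function conforms(testSource) {
  var viaSemantics = interpretCore(desugar(parse(testSource)));
  var viaEngine = runInEngine(testSource); // e.g., a browser's evaluator
  return String(viaSemantics) === String(viaEngine);
}
```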

This now yields a strategy for researchers to have impact on practice: develop your work atop λJS; if you ever want to run it on a real source program, use desugar to convert the source into λJS; you can now have very high confidence in your result, since we commit to maintaining conformity with the browser test suites.

===== Presentation of the Semantics (where Metacircularity Goes) =====

Some prior messages have assumed that developers are incapable of reading a semantics (I'm going to sidestep this very condescending assumption) and, implicitly, that there is only one way of writing one. This is not true at all.

Each of the artifacts above can be written in a variety of ways:

desugar:

  • as a set of transformation rules on paper
  • in some funky term-rewriting language (Maude, XSLT, ...)
  • as, say, a Haskell program
  • as a JavaScript program

λJS:

  • as a set of semantic rules on paper
  • using a mechanization of those rules (e.g., PLT Redex)
  • as an interpreter in, say, Scheme
  • as an interpreter in JavaScript

Note:

  • Because each strategy offers strengths and weaknesses, there's value to having more than one description of each layer. (Conformity follows from testing.)

  • The two layers don't need to be written in the same language: the surface syntax of λJS defines the interface between them. Systems can fruitfully use implementations of each layer in different languages; a sketch follows this note.
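
For instance, if the core's concrete syntax is plain text, a desugarer written in one language can feed an interpreter written in another. A sketch, reusing the hypothetical core constructors from earlier (the s-expression-style output format is likewise invented):

```js
// Print a core AST in a textual surface syntax; that text is the entire
// interface between the desugaring layer and the interpreter layer.
function printCore(node) {
  switch (node.type) {
    case "Var":
      return node.name;
    case "Let":
      return "(let (" + node.name + " " + printCore(node.bound) + ") " +
             printCore(node.body) + ")";
    case "App":
      return "(" + [node.fn].concat(node.args).map(printCore).join(" ") + ")";
    case "GetField":
      return "(get-field " + printCore(node.obj) + " " +
             JSON.stringify(node.field) + ")";
    default:
      throw new Error("unhandled core form: " + node.type);
  }
}
```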

Observe that in this world, metacircularity is just one of many ways of presenting the system. Personally, I think using a metacircular interpretation as a way of understanding the language is both misguided and under-defined, but that is a separate point that I won't get into here. More importantly for those who are really attached to that idea, it is perfectly consistent with this semantics strategy, and it lives in harmony with others.

===== The Relationship to Standardization =====

Finally, what does all this have to do with standardization?

Proving properties of systems is not the business of the standardization committee. However, a committee can decide that it is sympathetic to this goal, and wants to enable it. It can do so in two ways.

  1. Working with practicing semanticists can be of real value. For instance, we are currently migrating desugar and λJS to ES5; this has been a very interesting exercise, and we are struggling to find the right locus for certain decisions. This experience might be of value to the standardizers. In short, the standard and a semantics can potentially grow symbiotically.

  2. But if the standardization committee wants to have nothing to do with formal semantics, that's fine, too. In that case, the most valuable thing you can do is commit to maintaining conformance suites. Our re-design for ES5 is using the Conformance Suite to guide our work: think of it as test-driven development of the semantics (a sketch follows this list).
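
To make "test-driven development of the semantics" concrete, here is a sketch of that loop under two assumptions: the conforms helper from the earlier sketch, and a conformance suite laid out as a directory of JavaScript files:

```js
// Run every suite file through both the semantics and an engine; each
// divergence localizes a bug in desugar or in the core interpreter.
var fs = require("fs");
var path = require("path");

function runSuite(dir) {
  var failures = [];
  fs.readdirSync(dir).forEach(function (file) {
    var source = fs.readFileSync(path.join(dir, file), "utf8");
    if (!conforms(source)) failures.push(file);
  });
  return failures; // fix the semantics, re-run, repeat
}
```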

Thus, I hope this message contains at least one surprise for those working on ES5 standardization: you may have assumed that hardening the Conformance Suite is only of value to JavaScript evaluator builders; we're here to say it is just as valuable to pointy-headed semanticists, at least ones who use a semantics strategy like ours.

I appreciate your reading this far, and we value your comments.

Shriram

# Mike Samuel (15 years ago)

2010/5/21 Shriram Krishnamurthi <sk at cs.brown.edu> wrote the message quoted in full above.

David Herman argued that overspecification is a problem in some parts of the spec and cited Array.prototype.sort. In building your semantics, were there parts of the spec that stood out as under-specified that could benefit from being described via λJS or desugar?

# Allen Wirfs-Brock (15 years ago)

Mike Samuel wrote: "David Herman argued that overspecification is a problem in some parts of the spec and cited Array.prototype.sort. In building your semantics, were there parts of the spec that stood out as under-specified that could benefit from being described via λJS or desugar?"

Did you really mean "overspecification" in the first sentence? Array.prototype.sort is actually one of the most loosely specified built-in methods. For example, it doesn't say anything about requiring a specific sort algorithm and leaves many possible cases as "implementation-defined". Whether or not this is an appropriate level of specification for this function is a separate discussion.

# Brendan Eich (15 years ago)

On May 21, 2010, at 11:44 AM, Allen Wirfs-Brock wrote the message quoted above.

You two are in agreement. Dave's point was that sorting must be underspecified. The spec does not want to dictate a sort algorithm or even asymptotic complexity. Mike was citing that point as a potential issue with any more thorough or systematic semantic specification. Graydon noted it in connection with self-hosting.

# Mike Samuel (15 years ago)

2010/5/21 Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote the message quoted above.

I did mean "overspecification." David Herman (CCed) said in esdiscuss/2010-May/011236:

For example, I don't think the spec should ever provide an executable presentation of |Array.prototype.sort|, for example.

and I tend to agree.

He may not agree with my reasons.

I think the spec should (1) obviously specify for sort that, given a valid comparator, the array is sorted afterwards; (2) try to bound the badness that can happen in corner cases, e.g. around bad comparators; and (3) specify other properties, like stability, that are of major concern to users.
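
To make (1) and (3) concrete as executable checks rather than spec text (this harness is illustrative; at the time of this thread the spec did not require stability):

```js
// (1) Given a valid comparator, the result must be sorted.
function isSortedBy(arr, cmp) {
  for (var i = 1; i < arr.length; i++) {
    if (cmp(arr[i - 1], arr[i]) > 0) return false;
  }
  return true;
}

// (3) Stability: elements that compare equal keep their original order.
function checkSort(items, cmp) {
  var tagged = items.map(function (value, index) {
    return { value: value, index: index };
  });
  tagged.sort(function (a, b) { return cmp(a.value, b.value); }); // sort under test
  return {
    sorted: isSortedBy(tagged, function (a, b) { return cmp(a.value, b.value); }),
    stable: isSortedBy(tagged, function (a, b) {
      return cmp(a.value, b.value) || (a.index - b.index);
    })
  };
}

// Example: compare numbers by last digit; ties must keep input order.
// checkSort([21, 11, 32, 42], function (a, b) { return (a % 10) - (b % 10); })
// => { sorted: true, stable: true } on a stable implementation
```

A property-style harness like this can express (1) and (3) without dictating an algorithm, which is exactly the room for implementations that the rest of this message argues for.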

But inevitably some APIs, and the design choices around them are better understood than others. And in the first version of a spec to specify an API, we should provide leeway to implementors to experiment with tradeoffs.

The spec shouldn't require exact conformance for all corner cases across the board until the tradeoffs have been properly weighed.

Specifically it shouldn't prematurely nail down exactly what should happen if a misbehaving comparator decides to modify the array mid-sort.

Doing so prematurely would not leave enough leeway for implementors to experiment with ways to make sort behave as efficiently as possible for the typical case.

# Arjun Guha (15 years ago)

In building your semantics, were there parts of the spec that stood out as under-specified that could benefit from being described via λJS or desugar?

I don't have a good answer to this question. The paper by Maffeis, Mitchell, and Taly catalogs some of these cases. We had the same problems they did, but didn't think to catalog them since we tried to match implementations, not the specification. Joe Politz is building a semantics for the ES5 spec, so he'll be able to say more about this.

Arjun