Specification Language

# Douglas Crockford (15 years ago)

I would like to see the next edition of the specification use the ECMAScript language to describe the ECMAScript language. I think this would significantly improve the likelihood that a programmer could correctly understand ECMAScript by reading the standard. I think that would make the web a better place. A strawman is in the wiki.

strawman:specification_language

# Graydon Hoare (15 years ago)

On 17/05/2010 2:48 PM, Douglas Crockford wrote:

I would like to see the next edition of the specification use the ECMAScript language to describe the ECMAScript language. I think this would significantly improve the likelihood that a programmer could correctly understand ECMAScript by reading the standard. I think that would make the web a better place. A strawman is in the wiki.

strawman:specification_language

Some thoughts on this (as one of the authors of the ES4 RI):

  • The circularity hazards you mention are real, and were a large reason why we "bottomed out" in SML. Unless you're expecting users to take the fixpoint of the spec or something (which may not even be unique) you will probably need to draw a line around a kernel language that you define "some other way". SML has a denotational semantics, which is why we picked it. I'm not suggesting you pick it again (plenty of problems arose; mostly cultural) but I think you'll get the most mileage here when describing the standard libraries, and possibly the front-end, rather than the core concepts like "evaluate expression" or "look up name".

  • There is already a fair quantity of code for both those bits (library and front-end). ES4 libraries were being written in ES4; Narcissus has a decent front-end and Mozilla has people working on it still. Consider cribbing from these.

  • Another serious hazard (and strong-ish argument for drawing a line around a non-self-defined kernel language) is the risk of over-specification. It's easier to avoid in library code, much harder when dealing with bits of the most-primitive semantics, which differ quite a lot between different implementations of ES.

  • Be careful with legibility. Another (secondary) reason we settled on SML is that it presented what we considered at the time to be a relatively clean typographic quality and can be rendered as "not very much like source code" with a certain amount of mechanical translation and/or marked redaction work. We were repeatedly warned that we would need to have a "natural language" translation produced from the executable steps, on the basis that ISO and/or ECMA would reject standards containing source and that "IP issues" would substantially cloud and possibly derail the process if any source code was proposed. Find out if this is a real problem before you go too far down this road.

# Dirk Pranke (15 years ago)

I wonder if you can answer some of the metacircularity concerns by defining the necessary parts using operational semantics, as in jssec.net/semantics/sjs.pdf .

As an aside, has anyone actually attempted to formally document the necessary kernel subset (apart from the above paper)?

# David Herman (15 years ago)

Arjun Guha, Claudio Saftoiu and Shriram Krishnamurthi have a recent paper on the topic:

http://www.cs.brown.edu/~sk/Publications/Papers/Published/gsk-essence-javascript/

Having had some experience with this question myself, let me just say that while formalization is appealing, it's a very subtle and time-consuming task.

Despite its well-known shortcomings, English is hard to beat for flexibility. When it comes to hitting the right level of specification (not over- and not under-), formalism can be frustratingly rigid. That includes more rigorous approaches like operational semantics as well as reference implementations or meta-circular definitions. As Graydon pointed out, you end up being forced to choose concrete representations that over-specify, particularly with the choice of data structures. Math gives you more wiggle-room, but it's still finicky and much slower going than code. And even most research languages don't have fully formalized semantics. [ Standard ML is about the only one I know of that is 100% formalized. Don't tell me Scheme is unless you want to hear me drone on about my thesis. ;) ]

There are specific pitfalls to the meta-circular approach. As nice as it sounds to just take a language that everyone already knows, you end up standing on a delicate precipice. Fall over one side, and you accidentally leave implicit parts of the semantics that never get defined. Fall over the other side, and you end up spending all your time carefully defining the subset of ES that you want to use for your defining language-- in which case, you've essentially designed another language (one spec for the price of two!).

(I'm not saying improving the spec language is a bad idea, just trying to spell out some of the challenges.)

# Arjun Guha (15 years ago)

Arjun Guha, Claudio Saftoiu and Shriram Krishnamurthi have a recent paper on the topic: www.cs.brown.edu/~sk/Publications/Papers/Published/gsk-essence-javascript Having had some experience with this question myself, let me just say that while formalization is appealing, it's a very subtle and time-consuming task.

I don't agree with this. Two of us formalized, implemented, and tested that paper in a month. That is hardly time-consuming, and it's not very subtle since we have test suites to test our formalization.

We haven't perfectly matched the entire spec. (e.g. we mapped JS regexps to regexps in Scheme). But, I believe we got a covered a good portion of ES3 (see section 3 of the paper).

That includes more rigorous approaches like operational semantics

Don't try to build a semantics for the entire language. Instead, build a semantics for the "essentials" (i.e. objects and functions) and write a function that elaborates all the details of language into the semantics.

Math gives you more wiggle-room, but it's still finicky and much slower going than code.

You do not have to choose between "math" and "code". You can use both as appropriate. If you look at the paper, think of LambdaJS as the essential math, and desugaring as code for the remaining details.

And > even most research languages don't have fully formalized

semantics. [ Standard ML is about the only

one I know of that is 100% formalized. Don't tell me Scheme is

I don't think anyone claims that it is.

(I'm not saying improving the spec language is a bad idea, just trying to spell out some of the challenges.)

Btw, Joe Politz (student at Brown) is currently formalizing ES5 following the approach Claudiu and I used for current-JavaScript. He's striving to conform to the conformance suite and has already made some progress. The project is on Github, as usual:

arjunguha/ML-LambdaJS

It's the LambdaJS approach: a semantics for the essentials, which lets you write precisely write down the algorithms in the spec:

arjunguha/ML-LambdaJS/blob/master/data/es5-lib.es5

Arjun

# David Herman (15 years ago)

I don't agree with this. Two of us formalized, implemented, and tested that paper in a month. That is hardly time-consuming, and it's not very subtle since we have test suites to test our formalization.

I brought up your paper because it's good work. I wasn't criticizing it. But there's a difference between formalizing the operational core of a language and writing a language standard.

BTW, only having two people work on it is an advantage -- committee work is hard. :)

That includes more rigorous approaches like operational semantics

Don't try to build a semantics for the entire language. Instead, build a semantics for the "essentials" (i.e. objects and functions) and write a function that elaborates all the details of language into the semantics.

a) I'm the one who advertised your paper, remember?

b) You're arguing with a strawman. I'm not saying "operational semantics is hard! let's go shopping!" I'm saying that making things more precise and formal runs the risk of over-specification. Maybe these will becomes less of an issue if we decide to determinize more of the language (e.g. object enumeration). But I believe libraries will still be tricky. For example, I don't think the spec should ever provide an executable presentation of |Array.prototype.sort|, for example. Again, English can always swoop in and save the day. But at any rate, these are some of the challenges I'm talking about.

c) This isn't the first time we've discussed a formally specified core + elaboration. We considered it back in the ES4 days. But the committee decided to go with something more traditional and moved towards a reference implementation. At the time, people complained that our choice of ML was too freaky and academic. (Now imagine selling a reduction semantics...)

Doug has more recently suggested we use a subset of ES and write a meta-circular implementation, with the rationale that ES is more familiar and would be more approachable than ML.

Semanticists prefer choosing the framework that most naturally expresses the semantics at hand. I'm certainly on board in principle. But I'm not really the one to convince. ;)

d) If you're serious about suggesting lambda-JS as a basis (or starting point, anyway) for future editions of ECMA-262, may I make a suggestion? Do a proof-of-concept by taking the ES5 document and rewriting some of it in your suggested style. Not just the semantics but the document itself. It could be an illuminating exercise, and it would also likely result in a more compelling product. Nothing sells like concrete examples.

# Arjun Guha (15 years ago)

I brought up your paper because it's good work. I wasn't criticizing it. But there's a difference between formalizing the operational core of a language and writing a language standard.

To be clear, we formalized much more than an operational core of the language. We've formalized a portion of the standard. We run and pass portions of the Mozilla JS test suite, which is only possible once you account for (some) standard library functions.

For example, here is Array.prototype.sort:

arjunguha/LambdaJS/blob/master/LambdaJS/src/BrownPLT/JavaScript/Semantics/ECMAEnvironment.hs#L169

(Claudiu apologizes for implementing insertion sort instead of quicksort. He's really lazy.)

Arjun

# David Herman (15 years ago)

You're still not understanding me. My cat could write an executable implementation of the ES standard library. (Seriously, he's an amazing cat.) The point is whether you can hit the right level of abstraction, the right level of presentation, and the right level of specification-- neither over- nor under-. The fact that you wrote this:

For example, here is Array.prototype.sort:

arjunguha/LambdaJS/blob/master/LambdaJS/src/BrownPLT/JavaScript/Semantics/ECMAEnvironment.hs#L169

indicates that I'm failing to get this point across.

# David Herman (15 years ago)

You're still not understanding me. My cat could write an executable implementation of the ES standard library. (Seriously, he's an amazing cat.)

Hm. I was feeling silly at the time but that just came across mean. Sorry about that. All I meant was that there's more to the spec than just whether it's executable, and in fact being executable is sometimes exactly not what you want.

# Mike Samuel (15 years ago)

2010/5/19 David Herman <dherman at mozilla.com>:

You're still not understanding me. My cat could write an executable implementation of the ES standard library. (Seriously, he's an amazing cat.)

Hm. I was feeling silly at the time but that just came across mean. Sorry about that. All I meant was that there's more to the spec than just whether it's executable, and in fact being executable is sometimes exactly not what you want.

If I understand,

we could specify Array.prototype.sort in some subset of an imperative language which would mean that for all inputs we would specify the state of the outputs.

In English, we specify macroscopic properties of the outputs -- the properties that programmers can rely on -- while leaving enough slack for implementors to experiment.

(The below is only for comparative value, not a suggestion.) One language based specification equivalent of this might be specifying Array.prototype.sort in something like prolog, where you have can define predicates describing sortedness (given a good comparator), stability, termination given misbehaving comparators, etc.

But ES is better suited to concisely defining a sort algorithm than defining sortedness.

# Joseph Politz (15 years ago)

d) If you're serious about suggesting lambda-JS as a basis (or starting point, anyway) for future editions of ECMA-262, may I make a suggestion? Do a proof-of-concept by taking the ES5 document and rewriting some of it in your suggested style.

If I understand your suggestion right, this is exactly what's going on here: arjunguha/ML-LambdaJS/blob/master/data/es5-lib.es5

In the repo Arjun pointed to (arjunguha/ML-LambdaJS), there is:

  1. An updated LambdaJS for ES5, with an interpreter;
  2. An implementation of desugaring (elaboration) from JavaScript source to ES5-LambdaJS;
  3. A start at writing down built-in objects and functions from the specification in ES5-LambdaJS.

The implementation passes around 5-6% of the tests at es5conform.codeplex.com, predominantly those that test the object model (e.g. Object.defineProperty, Object.getOwnPropertyDescriptor). We're on our way to implementing more of the built-in objects' functionality.

We are serious about using LambdaJS to model real JavaScript, and we're quite focused on getting the abstractions right at the levels of semantics, libraries, and elaboration. I'd love for people to check out the implementation and give feedback on this and any other issues as we move forward with it.

Joe

# Mark S. Miller (15 years ago)

I'd just like to express my enthusiasm for taking a formal approach to the kernel language. For everything outside the kernel, defining it by self-hosting, by a meta-circular interpreter (where the interpreter is written only in the kernel subset of the language) or by desugaring is fine. These other techniques may or may not be ideal for expressing the semantics of these non-kernel constructs, but it does ensure that an adversary's code cannot take any action beyond those actions allowed by the kernel language.

All security reasoning is reasoning about limits on what an adversary may do. Thus, all security arguments rest on an induction over all the actions available to the adversary. I think it is no accident that both prominent attempts to formalize the semantics of JS (LambdaJS and < www-cs-students.stanford.edu/~ataly/Papers>) took security as their

driving motivation.

The related pressing issue is how to tighten up the chapter 16 exemptions. So long as these exemptions are this broad, the induction is broken and all security arguments are unsound. I do not yet have concrete suggestions about how we can tighten these up in ways that JS vendors will agree to. But until we do, we are just as vulnerable to disasters such as having "foo[-6]", for function foo, give foo's caller. By ch16, the major browser that did indeed ship this cannot even be said to have violated the spec. This simple extension allowed by ch16 simultaneously broke all secure JS subsets (Caja, FBJS, WebSandbox, ADsafe, Jacaranda) and all proof about security of any possible subsets.

# Mark S. Miller (15 years ago)

On Thu, May 20, 2010 at 8:49 AM, Mark S. Miller <erights at google.com> wrote:

I'd just like to express my enthusiasm for taking a formal approach to the kernel language. For everything outside the kernel, defining it by self-hosting, by a meta-circular interpreter (where the interpreter is written only in the kernel subset of the language) or by desugaring is fine. These other techniques may or may not be ideal for expressing the semantics of these non-kernel constructs, but it does ensure that an adversary's code cannot take any action beyond those actions allowed by the kernel language.

All security reasoning is reasoning about limits on what an adversary may do. Thus, all security arguments rest on an induction over all the actions available to the adversary. I think it is no accident that both prominent attempts to formalize the semantics of JS (LambdaJS and < www-cs-students.stanford.edu/~ataly/Papers>) took security as their driving motivation.

I had meant to link to < www-cs-students.stanford.edu/~ataly/#publications> above.

# Allen Wirfs-Brock (15 years ago)

There is another approach to using JavaScript to specify JavaScript that we have discussed within TC39. The idea to use the exact same style of specification as current used for ES5 but to use JavaScript as the "meta language" rather than pseudo-code.

Because this work is relevant to this discussion but hasn't had visibility outside of TC39 I've made it available on the ES wiki: strawman:es5definterp.js.txt

Allen

From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Mark S. Miller Sent: Thursday, May 20, 2010 8:52 AM To: Joseph Politz; Arjun Guha; Ankur Taly; John Mitchell; Sergio Maffeis Cc: es-discuss at mozilla.org Subject: Re: Specification Language

On Thu, May 20, 2010 at 8:49 AM, Mark S. Miller <erights at google.com<mailto:erights at google.com>> wrote:

I'd just like to express my enthusiasm for taking a formal approach to the kernel language. For everything outside the kernel, defining it by self-hosting, by a meta-circular interpreter (where the interpreter is written only in the kernel subset of the language) or by desugaring is fine. These other techniques may or may not be ideal for expressing the semantics of these non-kernel constructs, but it does ensure that an adversary's code cannot take any action beyond those actions allowed by the kernel language.

All security reasoning is reasoning about limits on what an adversary may do. Thus, all security arguments rest on an induction over all the actions available to the adversary. I think it is no accident that both prominent attempts to formalize the semantics of JS (LambdaJS and www-cs-students.stanford.edu/~ataly/Papers) took security as their driving motivation.

I had meant to link to www-cs-students.stanford.edu/~ataly/#publications above.

The related pressing issue is how to tighten up the chapter 16 exemptions. So long as these exemptions are this broad, the induction is broken and all security arguments are unsound. I do not yet have concrete suggestions about how we can tighten these up in ways that JS vendors will agree to. But until we do, we are just as vulnerable to disasters such as having "foo[-6]", for function foo, give foo's caller. By ch16, the major browser that did indeed ship this cannot even be said to have violated the spec. This simple extension allowed by ch16 simultaneously broke all secure JS subsets (Caja, FBJS, WebSandbox, ADsafe, Jacaranda) and all proof about security of any possible subsets.

On Wed, May 19, 2010 at 7:03 PM, Joseph Politz <joe at cs.brown.edu<mailto:joe at cs.brown.edu>> wrote:

d) If you're serious about suggesting lambda-JS as a basis (or starting point, anyway) for future editions of ECMA-262, may I make a suggestion? Do a proof-of-concept by taking the ES5 document and rewriting some of it in your suggested style.

If I understand your suggestion right, this is exactly what's going on here: arjunguha/ML-LambdaJS/blob/master/data/es5-lib.es5

In the repo Arjun pointed to (arjunguha/ML-LambdaJS), there is:

  1. An updated LambdaJS for ES5, with an interpreter;
  2. An implementation of desugaring (elaboration) from JavaScript source to ES5-LambdaJS;
  3. A start at writing down built-in objects and functions from the specification in ES5-LambdaJS.

The implementation passes around 5-6% of the tests at es5conform.codeplex.com, predominantly those that test the object model (e.g. Object.defineProperty, Object.getOwnPropertyDescriptor). We're on our way to implementing more of the built-in objects' functionality.

We are serious about using LambdaJS to model real JavaScript, and we're quite focused on getting the abstractions right at the levels of semantics, libraries, and elaboration. I'd love for people to check out the implementation and give feedback on this and any other issues as we move forward with it.

Joe

# David Herman (15 years ago)

I'd just like to express my enthusiasm for taking a formal approach to the kernel language. For everything outside the kernel, defining it by self-hosting, by a meta-circular interpreter (where the interpreter is written only in the kernel subset of the language) or by desugaring is fine. These other techniques may or may not be ideal for expressing the semantics of these non-kernel constructs, but it does ensure that an adversary's code cannot take any action beyond those actions allowed by the kernel language.

It's important to distinguish whether this is a valuable exercise for:

  1. research;
  2. groups focused on security; or
  3. the language standard.

(I know security is not a separable concern. But it is also not the exclusive concern of language design.)

I am thrilled that researchers are working on formal models of ES. But it's not within the purview of TC39 to do research. And there's nothing wrong with an unofficial, non-normative formalization of a normative spec coming out of an entirely different group. It could be just as useful-- and would undoubtedly be of higher quality-- than one we tried to do in the standard.

All security reasoning is reasoning about limits on what an adversary may do. Thus, all security arguments rest on an induction over all the actions available to the adversary.

This statement's pretty muddled. Your "thus" does not follow (there are more proof techniques on heaven and earth, Horatio...) and your induction is not defined. But I see that Sergio, John, and Ankur have some papers with formal definitions-- I look forward to reading them. Thanks for the links.

# Mark S. Miller (15 years ago)

On Thu, May 20, 2010 at 9:11 AM, Allen Wirfs-Brock < Allen.Wirfs-Brock at microsoft.com> wrote:

There is another approach to using JavaScript to specify JavaScript that we have discussed within TC39. The idea to use the exact same style of specification as current used for ES5 but to use JavaScript as the “meta language” rather than pseudo-code.

Because this work is relevant to this discussion but hasn’t had visibility outside of TC39 I’ve made it available on the ES wiki:

strawman:es5definterp.js.txt

Hi Allen,

As you know, I have avoided looking at this until the IPR situation was straightened out. Since it now seems to be, could you (or someone representing Ecma) repost this with a BSD license?

Everyone concerned about keeping their brain free of unnecessary proprietary contamination, I suggest not clicking on the link above until we get an ack that the text at that link has been reposted with an open source license.

# Allen Wirfs-Brock (15 years ago)

It's just a document copyrighted by ecma that happens to mostly be code and is largely derivate from the ES5 spec. which is also copyrighted by Ecma. Paranoia about reading it is a matter between you and your lawyers.

Allen

From: Mark S. Miller [mailto:erights at google.com] Sent: Thursday, May 20, 2010 1:37 PM To: Allen Wirfs-Brock Cc: Joseph Politz; Arjun Guha; Ankur Taly; John Mitchell; Sergio Maffeis; es-discuss at mozilla.org Subject: Re: Specification Language

On Thu, May 20, 2010 at 9:11 AM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com<mailto:Allen.Wirfs-Brock at microsoft.com>> wrote:

There is another approach to using JavaScript to specify JavaScript that we have discussed within TC39. The idea to use the exact same style of specification as current used for ES5 but to use JavaScript as the "meta language" rather than pseudo-code.

Because this work is relevant to this discussion but hasn't had visibility outside of TC39 I've made it available on the ES wiki: strawman:es5definterp.js.txt Hi Allen,

As you know, I have avoided looking at this until the IPR situation was straightened out. Since it now seems to be, could you (or someone representing Ecma) repost this with a BSD license?

Everyone concerned about keeping their brain free of unnecessary proprietary contamination, I suggest not clicking on the link above until we get an ack that the text at that link has been reposted with an open source license.

Allen

From: es-discuss-bounces at mozilla.org<mailto:es-discuss-bounces at mozilla.org> [mailto:es-discuss-bounces at mozilla.org<mailto:es-discuss-bounces at mozilla.org>] On Behalf Of Mark S. Miller

Sent: Thursday, May 20, 2010 8:52 AM To: Joseph Politz; Arjun Guha; Ankur Taly; John Mitchell; Sergio Maffeis Cc: es-discuss at mozilla.org<mailto:es-discuss at mozilla.org>

Subject: Re: Specification Language

On Thu, May 20, 2010 at 8:49 AM, Mark S. Miller <erights at google.com<mailto:erights at google.com>> wrote:

I'd just like to express my enthusiasm for taking a formal approach to the kernel language. For everything outside the kernel, defining it by self-hosting, by a meta-circular interpreter (where the interpreter is written only in the kernel subset of the language) or by desugaring is fine. These other techniques may or may not be ideal for expressing the semantics of these non-kernel constructs, but it does ensure that an adversary's code cannot take any action beyond those actions allowed by the kernel language.

All security reasoning is reasoning about limits on what an adversary may do. Thus, all security arguments rest on an induction over all the actions available to the adversary. I think it is no accident that both prominent attempts to formalize the semantics of JS (LambdaJS and www-cs-students.stanford.edu/~ataly/Papers) took security as their driving motivation.

I had meant to link to www-cs-students.stanford.edu/~ataly/#publications above.

The related pressing issue is how to tighten up the chapter 16 exemptions. So long as these exemptions are this broad, the induction is broken and all security arguments are unsound. I do not yet have concrete suggestions about how we can tighten these up in ways that JS vendors will agree to. But until we do, we are just as vulnerable to disasters such as having "foo[-6]", for function foo, give foo's caller. By ch16, the major browser that did indeed ship this cannot even be said to have violated the spec. This simple extension allowed by ch16 simultaneously broke all secure JS subsets (Caja, FBJS, WebSandbox, ADsafe, Jacaranda) and all proof about security of any possible subsets.

On Wed, May 19, 2010 at 7:03 PM, Joseph Politz <joe at cs.brown.edu<mailto:joe at cs.brown.edu>> wrote:

d) If you're serious about suggesting lambda-JS as a basis (or starting point, anyway) for future editions of ECMA-262, may I make a suggestion? Do a proof-of-concept by taking the ES5 document and rewriting some of it in your suggested style.

If I understand your suggestion right, this is exactly what's going on here: arjunguha/ML-LambdaJS/blob/master/data/es5-lib.es5

In the repo Arjun pointed to (arjunguha/ML-LambdaJS), there is:

  1. An updated LambdaJS for ES5, with an interpreter;
  2. An implementation of desugaring (elaboration) from JavaScript source to ES5-LambdaJS;
  3. A start at writing down built-in objects and functions from the specification in ES5-LambdaJS.

The implementation passes around 5-6% of the tests at es5conform.codeplex.com, predominantly those that test the object model (e.g. Object.defineProperty, Object.getOwnPropertyDescriptor). We're on our way to implementing more of the built-in objects' functionality.

We are serious about using LambdaJS to model real JavaScript, and we're quite focused on getting the abstractions right at the levels of semantics, libraries, and elaboration. I'd love for people to check out the implementation and give feedback on this and any other issues as we move forward with it.

Joe

# Mark S. Miller (15 years ago)

I wrote:

As you know, I have avoided looking at this until the IPR situation was straightened out. Since it now seems to be, could you (or someone representing Ecma) repost this with a BSD license?

On Thu, May 20, 2010 at 2:08 PM, Allen Wirfs-Brock < Allen.Wirfs-Brock at microsoft.com> wrote:

It’s just a document copyrighted by ecma that happens to mostly be code and is largely derivate from the ES5 spec. which is also copyrighted by Ecma. Paranoia about reading it is a matter between you and your lawyers.

Allen's response surprised me, as I thought we were now well past this issue. My mistake. The ECMA CC has agreed, but it's not official until the ECMA GA approves this decision at their June meeting. Hopefully, after the June GA meeting, we will see this code reposted with a BSD license.