Machine readable specifications
I expect that what you'll hear from implementers is that parsing isn't the hard bit of a modern JS engine -- it's certainly not the thorniest part of Traceur, and it doesn't do most of the work a JIT-ing engine would.
If you would like to concretely improve the situation, you might ask Allen and folks from engine teams what would simplify their jobs before proposing a solution.
Aren't you confusing "machine readable specification" with "machine readable syntax specification"? Syntax is only a small part of a language spec, and quite frankly, the least interesting one by far. Also, I don't see the benefit of your format. The EcmaScript spec actually goes to quite some length to give a grammar that is LR(1). The whole point is to keep it de facto "machine readable". But unlike yours, it also is human readable.
(FWIW, there are at least two serious efforts of actually formalising and mechanising the JavaScript language, i.e. its semantics, which is a much more interesting endeavor. Techniques for formalised language specs have made a great leap over the last decade, and I sincerely hope we'll see them becoming mainstream at some point. But as usual, the mainstream is very conservative, probably for valid reasons.)
The specification is hard enough to digest for average developers now; I'd hate to point one of my guys at a machine-readable document when he's having trouble with some corner-case.
On 22 March 2013 12:33, Andreas Rossberg <rossberg at google.com> wrote:
Aren't you confusing "machine readable specification" with "machine readable syntax specification"? Syntax is only a small part of a language spec, and quite frankly, the least interesting one by far. Also, I don't see the benefit of your format. The EcmaScript spec actually goes to quite some length to give a grammar that is LR(1). The whole point is to keep it de facto "machine readable". But unlike yours, it also is human readable.
That was the reason to discuss, I think it could be applied to more than just syntax rules. You don't like it and that's fine with me I just wanted to post my ideas.
On 22 March 2013 12:36, Wes Garland <wes at page.ca> wrote:
The specification is hard enough to digest for average developers now; I'd hate to point one of my guys at a machine-readable document when he's having trouble with some corner-case.
You wouldn't do that, you'd point them to the human readable form. The machine version is specifically designed to be read by machines with low overhead to update functionality without touching code.
Is there a formalized way to translate between the human and machine-readable specifications? If they are found to have different meanings (not hard to imagine), which specification would be considered authoritative?
The human readable form would be generated from the machine readable form. The human form should be a descriptive representation of the actual rules defined by the machine readable form. There should be no confusion only human error in understanding the descriptive text. The machine readable form should be authoritative since the idea is to implement whatever code based on the rules from the machine readable specification.
To translate what I wrote in the initial mail into human form would be something like:
Function Statement can follow Nothing and then expects Function Statement Identifier to follow. and Function Expression cannot follow Nothing.
On 22 March 2013 13:37, gaz Heyes <gazheyes at gmail.com> wrote:
On 22 March 2013 12:33, Andreas Rossberg <rossberg at google.com> wrote:
Aren't you confusing "machine readable specification" with "machine readable syntax specification"? Syntax is only a small part of a language spec, and quite frankly, the least interesting one by far. Also, I don't see the benefit of your format. The EcmaScript spec actually goes to quite some length to give a grammar that is LR(1). The whole point is to keep it de facto "machine readable". But unlike yours, it also is human readable.
That was the reason to discuss, I think it could be applied to more than just syntax rules. You don't like it and that's fine with me I just wanted to post my ideas.
You may be underestimating the challenge ;). If you are aiming for more than just syntax then you certainly wouldn't want a low-level encoding like the one you suggest. The standard approach to formalising a language is through a structured operational semantics over an abstract syntax, and the standard approach to making that accessible to a machine is encoding it in an established theorem prover or language like Coq or Isabelle or Agda or Twelf.
Andreas Rossberg wrote:
FWIW, there are at least two serious efforts of actually formalising and mechanising the JavaScript language, i.e. its semantics, which is a much more interesting endeavor.
Could you give (or point me to) details of these projects?
I've done some work along these lines (though I'm not sure it qualifies as "serious"), so I'm wondering if I'm just reinventing the work of others.
The two projects Andreas is referring to are:
S5 and LambdaJS, by Arjun Guha and others: www.jswebtools.org Another, unnamed semantics, by Ankur Taly and others: jssec.net
S5, which covers (almost) all of ES5, is still under development, and passes a large portion of Test262.
[+arjun, +joe, +ankur, +philippa, +gareth]
On Fri, Mar 22, 2013 at 11:34 PM, Sam Tobin-Hochstadt <samth at ccs.neu.edu>wrote:
On Fri, Mar 22, 2013 at 6:15 PM, Michael Dyck <jmdyck at ibiblio.org> wrote:
Andreas Rossberg wrote:
FWIW, there are at least two serious efforts of actually formalising and mechanising the JavaScript language, i.e. its semantics, which is a much more interesting endeavor.
Could you give (or point me to) details of these projects?
I've done some work along these lines (though I'm not sure it qualifies as "serious"), so I'm wondering if I'm just reinventing the work of others.
The two projects Andreas is referring to are:
S5 and LambdaJS, by Arjun Guha and others: www.jswebtools.org Another, unnamed semantics, by Ankur Taly and others: jssec.net
See also static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/37199.pdffor a description of Ankur's semantics for SES-on-ES5.
Both semantics have been used as the basis for further work by other groups, such as the JSCert group you link to above. (Btw, I am in London right now in order to meet with them this Monday and Tuesday. I'll also be doing a presentation on Dr. SES at Imperial on Monday.)
On Sat, Mar 23, 2013 at 3:36 PM, Mark S. Miller <erights at google.com> wrote:
[+arjun, +joe, +ankur, +philippa, +gareth]
On Fri, Mar 22, 2013 at 11:34 PM, Sam Tobin-Hochstadt <samth at ccs.neu.edu>wrote:
On Fri, Mar 22, 2013 at 6:15 PM, Michael Dyck <jmdyck at ibiblio.org> wrote:
Andreas Rossberg wrote:
FWIW, there are at least two serious efforts of actually formalising and mechanising the JavaScript language, i.e. its semantics, which is a much more interesting endeavor.
Could you give (or point me to) details of these projects?
I've done some work along these lines (though I'm not sure it qualifies as "serious"), so I'm wondering if I'm just reinventing the work of others.
The two projects Andreas is referring to are:
S5 and LambdaJS, by Arjun Guha and others: www.jswebtools.org Another, unnamed semantics, by Ankur Taly and others: jssec.net
See also static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/37199.pdffor a description of Ankur's semantics for SES-on-ES5.
Both semantics have been used as the basis for further work by other groups, such as the JSCert group you link to above.
My mistake. You didn't link directly to JSCert, but rather the page you did link to does. JSCert is at jscert.org
I'd like to discuss a radical change on how JS specifications and others are constructed. I suggest int based rules are used to define language behaviour. I know this works with last state and next state tracking and expected states too maybe other behaviour could be defined this way as well. The first part of the specification should define a list of ints that correspond to the state starting from 0 with 0 being a start state of Nothing. The initial states should be specified as follows:
Once every state is defined as an int, each state can be used in a lookup table to determine the allowed last state and/or expected states. The lookup table is used for error messages by the parser to convert the corresponding int to human readable form. To define that FunctionStatement can follow Nothing but FunctionExpression can't follow nothing and FunctionStatementIdentifier is expected after FunctionStatement.
"lastStates" and "expectedStates" would also be ints but I added the text for clarity, true could be shortened to 1 for compression. To use these rules the parser can simply check the state machine which was the last state and if the next state is valid. E.g.
The same technique could be used for expected states since you'd just need to lookup the next state with the assigned expected state from the last state. Once a machine readable specification has been done we could then generate a human readable form of it based on the rules and it could be checked that it conforms to what was intended. Both specifications (human readable and machine readable) could be used by the implementer, when a specification changes the parsers could automatically update based on the machine readable form. Emulation for older parsers could be added within the browser itself by using the new specification rules in a parser shim.