Machine readable specifications

# gaz Heyes (12 years ago)

I'd like to discuss a radical change on how JS specifications and others are constructed. I suggest int based rules are used to define language behaviour. I know this works with last state and next state tracking and expected states too maybe other behaviour could be defined this way as well. The first part of the specification should define a list of ints that correspond to the state starting from 0 with 0 being a start state of Nothing. The initial states should be specified as follows:

initalStates = {0:"Nothing",1:"FunctionExpression", 2:"FunctionStatement", 3:"FunctionStatementIdentifier" ...

Once every state is defined as an int, each state can be used in a lookup table to determine the allowed last state and/or expected states. The lookup table is used for error messages by the parser to convert the corresponding int to human readable form. To define that FunctionStatement can follow Nothing but FunctionExpression can't follow nothing and FunctionStatementIdentifier is expected after FunctionStatement.

rules = {
    //FunctionStatement State
    2: {
        lastStates: {
            //FunctionStatement is allowed after last state "Nothing"
            0: true
        },
        expectedStates: {
          //FunctionStatement expects the next state to be FunctionStatementIdentifier
          3:true
        }
    }
};

lastState = 0;//Last state was Nothing
state = 2;//FunctionStatement state
if(rules[state].lastStates[lastState]) {
  // FunctionStatement is allowed to follow Nothing
}

lastState = 0;//Last state was Nothing
state = 1;//FunctionExpression state
if(!rules[state].lastStates[lastState]) {
   //FunctionExpression is not allowed to follow Nothing
   error();
}

The same technique could be used for expected states since you'd just need to lookup the next state with the assigned expected state from the last state. Once a machine readable specification has been done we could then generate a human readable form of it based on the rules and it could be checked that it conforms to what was intended. Both specifications (human readable and machine readable) could be used by the implementer, when a specification changes the parsers could automatically update based on the machine readable form. Emulation for older parsers could be added within the browser itself by using the new specification rules in a parser shim.

Hi all

I'd like to discuss a radical change on how JS specifications and others
are constructed. I suggest int based rules are used to define language
behaviour. I know this works with last state and next state tracking and
expected states too maybe other behaviour could be defined this way as
well. The first part of the specification should define a list of ints that
correspond to the state starting from 0 with 0 being a start state of
Nothing. The initial states should be specified as follows:

initalStates = {0:"Nothing",1:"FunctionExpression", 2:"FunctionStatement",
3:"FunctionStatementIdentifier" ...

Once every state is defined as an int, each state can be used in a lookup
table to determine the allowed last state and/or expected states. The
lookup table is used for error messages by the parser to convert the
corresponding int to human readable form. To define that FunctionStatement
can follow Nothing but FunctionExpression can't follow nothing and
FunctionStatementIdentifier is expected after FunctionStatement.

rules = {
//FunctionStatement State
2: {
lastStates: {
//FunctionStatement is allowed
after last state "Nothing"
0: true
},
expectedStates: {
//FunctionStatement expects the
next state to be FunctionStatementIdentifier
3:true
}
}
};

"lastStates" and "expectedStates" would also be ints but I added the text
for clarity, true could be shortened to 1 for compression. To use these
rules the parser can simply check the state machine which was the last
state and if the next state is valid. E.g.
lastState = 0;//Last state was Nothing
state = 2;//FunctionStatement state
if(rules[state].lastStates[lastState]) {
// FunctionStatement is allowed to follow Nothing
}

lastState = 0;//Last state was Nothing
state = 1;//FunctionExpression state
if(!rules[state].lastStates[lastState]) {
//FunctionExpression is not allowed to follow Nothing
error();
}

The same technique could be used for expected states since you'd just need
to lookup the next state with the assigned expected state from the last
state.
Once a machine readable specification has been done we could then generate
a human readable form of it based on the rules and it could be checked that
it conforms to what was intended. Both specifications (human readable and
machine readable) could be used by the implementer, when a specification
changes the parsers could automatically update based on the machine
readable form. Emulation for older parsers could be added within the
browser itself by using the new specification rules in a parser shim.

Thoughts?

Cheers

Gareth
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20130322/696a7005/attachment-0001.html>

# Alex Russell (12 years ago)

I expect that what you'll hear from implementers is that parsing isn't the hard bit of a modern JS engine -- it's certainly not the thorniest part of Traceur, and it doesn't do most of the work a JIT-ing engine would.

If you would like to concretely improve the situation, you might ask Allen and folks from engine teams what would simplify their jobs before proposing a solution.

I expect that what you'll hear from implementers is that parsing isn't the
hard bit of a modern JS engine -- it's certainly not the thorniest part of
Traceur, and it doesn't do _most_ of the work a JIT-ing engine would.

If you would like to concretely improve the situation, you might ask Allen
and folks from engine teams what would simplify their jobs before proposing
a solution.

On Friday, March 22, 2013, gaz Heyes wrote:

> Hi all
>
> I'd like to discuss a radical change on how JS specifications and others
> are constructed. I suggest int based rules are used to define language
> behaviour. I know this works with last state and next state tracking and
> expected states too maybe other behaviour could be defined this way as
> well. The first part of the specification should define a list of ints that
> correspond to the state starting from 0 with 0 being a start state of
> Nothing. The initial states should be specified as follows:
>
> initalStates = {0:"Nothing",1:"FunctionExpression", 2:"FunctionStatement",
> 3:"FunctionStatementIdentifier" ...
>
> Once every state is defined as an int, each state can be used in a lookup
> table to determine the allowed last state and/or expected states. The
> lookup table is used for error messages by the parser to convert the
> corresponding int to human readable form. To define that FunctionStatement
> can follow Nothing but FunctionExpression can't follow nothing and
> FunctionStatementIdentifier is expected after FunctionStatement.
>
> rules = {
>         //FunctionStatement State
>          2:    {
>                    lastStates: {
>                                           //FunctionStatement is allowed
> after last state "Nothing"
>                                           0: true
>                                     },
>                    expectedStates: {
>                                           //FunctionStatement expects the
> next state to be FunctionStatementIdentifier
>                                           3:true
>                                    }
>                 }
> };
>
> "lastStates" and "expectedStates" would also be ints but I added the text
> for clarity, true could be shortened to 1 for compression. To use these
> rules the parser can simply check the state machine which was the last
> state and if the next state is valid. E.g.
> lastState = 0;//Last state was Nothing
> state = 2;//FunctionStatement state
> if(rules[state].lastStates[lastState]) {
>   // FunctionStatement is allowed to follow Nothing
> }
>
> lastState = 0;//Last state was Nothing
> state = 1;//FunctionExpression state
> if(!rules[state].lastStates[lastState]) {
>    //FunctionExpression is not allowed to follow Nothing
>    error();
> }
>
> The same technique could be used for expected states since you'd just need
> to lookup the next state with the assigned expected state from the last
> state.
> Once a machine readable specification has been done we could then generate
> a human readable form of it based on the rules and it could be checked that
> it conforms to what was intended. Both specifications (human readable and
> machine readable) could be used by the implementer, when a specification
> changes the parsers could automatically update based on the machine
> readable form. Emulation for older parsers could be added within the
> browser itself by using the new specification rules in a parser shim.
>
> Thoughts?
>
> Cheers
>
> Gareth
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20130322/d3ff83e1/attachment.html>

# Andreas Rossberg (12 years ago)

Aren't you confusing "machine readable specification" with "machine readable syntax specification"? Syntax is only a small part of a language spec, and quite frankly, the least interesting one by far. Also, I don't see the benefit of your format. The EcmaScript spec actually goes to quite some length to give a grammar that is LR(1). The whole point is to keep it de facto "machine readable". But unlike yours, it also is human readable.

(FWIW, there are at least two serious efforts of actually formalising and mechanising the JavaScript language, i.e. its semantics, which is a much more interesting endeavor. Techniques for formalised language specs have made a great leap over the last decade, and I sincerely hope we'll see them becoming mainstream at some point. But as usual, the mainstream is very conservative, probably for valid reasons.)

On 22 March 2013 12:51, gaz Heyes <gazheyes at gmail.com> wrote:
> I'd like to discuss a radical change on how JS specifications and others are
> constructed. I suggest int based rules are used to define language
> behaviour. I know this works with last state and next state tracking and
> expected states too maybe other behaviour could be defined this way as well.
> The first part of the specification should define a list of ints that
> correspond to the state starting from 0 with 0 being a start state of
> Nothing. The initial states should be specified as follows:
>
> initalStates = {0:"Nothing",1:"FunctionExpression", 2:"FunctionStatement",
> 3:"FunctionStatementIdentifier" ...

Aren't you confusing "machine readable specification" with "machine
readable syntax specification"? Syntax is only a small part of a
language spec, and quite frankly, the least interesting one by far.
Also, I don't see the benefit of your format. The EcmaScript spec
actually goes to quite some length to give a grammar that is LR(1).
The whole point is to keep it de facto "machine readable". But unlike
yours, it also is human readable.

(FWIW, there are at least two serious efforts of actually formalising
and mechanising the JavaScript language, i.e. its _semantics_, which
is a much more interesting endeavor. Techniques for formalised
language specs have made a great leap over the last decade, and I
sincerely hope we'll see them becoming mainstream at some point. But
as usual, the mainstream is very conservative, probably for valid
reasons.)

/Andreas

# Wes Garland (12 years ago)

The specification is hard enough to digest for average developers now; I'd hate to point one of my guys at a machine-readable document when he's having trouble with some corner-case.

The specification is hard enough to digest for average developers now;
I'd hate to point one of my guys at a machine-readable document when
he's having trouble with some corner-case.

# gaz Heyes (12 years ago)

On 22 March 2013 12:33, Andreas Rossberg <rossberg at google.com> wrote:

Aren't you confusing "machine readable specification" with "machine readable syntax specification"? Syntax is only a small part of a language spec, and quite frankly, the least interesting one by far. Also, I don't see the benefit of your format. The EcmaScript spec actually goes to quite some length to give a grammar that is LR(1). The whole point is to keep it de facto "machine readable". But unlike yours, it also is human readable.

That was the reason to discuss, I think it could be applied to more than just syntax rules. You don't like it and that's fine with me I just wanted to post my ideas.

On 22 March 2013 12:33, Andreas Rossberg <rossberg at google.com> wrote:

> Aren't you confusing "machine readable specification" with "machine
> readable syntax specification"? Syntax is only a small part of a
> language spec, and quite frankly, the least interesting one by far.
> Also, I don't see the benefit of your format. The EcmaScript spec
> actually goes to quite some length to give a grammar that is LR(1).
> The whole point is to keep it de facto "machine readable". But unlike
> yours, it also is human readable.
>

That was the reason to discuss, I think it could be applied to more than
just syntax rules. You don't like it and that's fine with me I just wanted
to post my ideas.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20130322/f5b7afbe/attachment.html>

# gaz Heyes (12 years ago)

On 22 March 2013 12:36, Wes Garland <wes at page.ca> wrote:

The specification is hard enough to digest for average developers now; I'd hate to point one of my guys at a machine-readable document when he's having trouble with some corner-case.

You wouldn't do that, you'd point them to the human readable form. The machine version is specifically designed to be read by machines with low overhead to update functionality without touching code.

On 22 March 2013 12:36, Wes Garland <wes at page.ca> wrote:

> The specification is hard enough to digest for average developers now;
> I'd hate to point one of my guys at a machine-readable document when
> he's having trouble with some corner-case.
>

You wouldn't do that, you'd point them to the human readable form. The
machine version is specifically designed to be read by machines with low
overhead to update functionality without touching code.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20130322/3c806492/attachment.html>

# Wes Garland (12 years ago)

Is there a formalized way to translate between the human and machine-readable specifications? If they are found to have different meanings (not hard to imagine), which specification would be considered authoritative?

On 22 March 2013 08:41, gaz Heyes <gazheyes at gmail.com> wrote:
> You wouldn't do that, you'd point them to the human readable form. The
> machine version is specifically designed to be read by machines with low
> overhead to update functionality without touching code.

Is there a formalized way to translate between the human and
machine-readable specifications?  If they are found to have different
meanings (not hard to imagine), which specification would be
considered authoritative?

-- 
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

# gaz Heyes (12 years ago)

The human readable form would be generated from the machine readable form. The human form should be a descriptive representation of the actual rules defined by the machine readable form. There should be no confusion only human error in understanding the descriptive text. The machine readable form should be authoritative since the idea is to implement whatever code based on the rules from the machine readable specification.

To translate what I wrote in the initial mail into human form would be something like:

Function Statement can follow Nothing and then expects Function Statement Identifier to follow. and Function Expression cannot follow Nothing.

On 22 March 2013 12:57, Wes Garland <wes at page.ca> wrote:

> Is there a formalized way to translate between the human and
> machine-readable specifications?  If they are found to have different
> meanings (not hard to imagine), which specification would be
> considered authoritative?

The human readable form would be generated from the machine readable form.
The human form should be a descriptive representation of the actual rules
defined by the machine readable form. There should be no confusion only
human error in understanding the descriptive text. The machine readable
form should be authoritative since the idea is to implement whatever code
based on the rules from the machine readable specification.

To translate what I wrote in the initial mail into human form would be
something like:

Function Statement can follow Nothing and then expects Function Statement
Identifier to follow.
and
Function Expression cannot follow Nothing.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20130322/f10b368e/attachment.html>

# Andreas Rossberg (12 years ago)

On 22 March 2013 13:37, gaz Heyes <gazheyes at gmail.com> wrote:

On 22 March 2013 12:33, Andreas Rossberg <rossberg at google.com> wrote:

Aren't you confusing "machine readable specification" with "machine readable syntax specification"? Syntax is only a small part of a language spec, and quite frankly, the least interesting one by far. Also, I don't see the benefit of your format. The EcmaScript spec actually goes to quite some length to give a grammar that is LR(1). The whole point is to keep it de facto "machine readable". But unlike yours, it also is human readable.

That was the reason to discuss, I think it could be applied to more than just syntax rules. You don't like it and that's fine with me I just wanted to post my ideas.

You may be underestimating the challenge ;). If you are aiming for more than just syntax then you certainly wouldn't want a low-level encoding like the one you suggest. The standard approach to formalising a language is through a structured operational semantics over an abstract syntax, and the standard approach to making that accessible to a machine is encoding it in an established theorem prover or language like Coq or Isabelle or Agda or Twelf.

On 22 March 2013 13:37, gaz Heyes <gazheyes at gmail.com> wrote:
> On 22 March 2013 12:33, Andreas Rossberg <rossberg at google.com> wrote:
>>
>> Aren't you confusing "machine readable specification" with "machine
>> readable syntax specification"? Syntax is only a small part of a
>> language spec, and quite frankly, the least interesting one by far.
>> Also, I don't see the benefit of your format. The EcmaScript spec
>> actually goes to quite some length to give a grammar that is LR(1).
>> The whole point is to keep it de facto "machine readable". But unlike
>> yours, it also is human readable.
>
> That was the reason to discuss, I think it could be applied to more than
> just syntax rules. You don't like it and that's fine with me I just wanted
> to post my ideas.

You may be underestimating the challenge ;). If you are aiming for
more than just syntax then you certainly wouldn't want a low-level
encoding like the one you suggest. The standard approach to
formalising a language is through a structured operational semantics
over an abstract syntax, and the standard approach to making that
accessible to a machine is encoding it in an established theorem
prover or language like Coq or Isabelle or Agda or Twelf.

/Andreas

# Michael Dyck (12 years ago)

Andreas Rossberg wrote:

FWIW, there are at least two serious efforts of actually formalising and mechanising the JavaScript language, i.e. its semantics, which is a much more interesting endeavor.

Could you give (or point me to) details of these projects?

I've done some work along these lines (though I'm not sure it qualifies as "serious"), so I'm wondering if I'm just reinventing the work of others.

Andreas Rossberg wrote:
> 
> FWIW, there are at least two serious efforts of actually formalising
> and mechanising the JavaScript language, i.e. its _semantics_, which
> is a much more interesting endeavor.

Could you give (or point me to) details of these projects?

I've done some work along these lines (though I'm not sure it qualifies
as "serious"), so I'm wondering if I'm just reinventing the work of
others.

-Michael

# Sam Tobin-Hochstadt (12 years ago)

The two projects Andreas is referring to are:

S5 and LambdaJS, by Arjun Guha and others: www.jswebtools.org Another, unnamed semantics, by Ankur Taly and others: jssec.net

S5, which covers (almost) all of ES5, is still under development, and passes a large portion of Test262.

On Fri, Mar 22, 2013 at 6:15 PM, Michael Dyck <jmdyck at ibiblio.org> wrote:
> Andreas Rossberg wrote:
>>
>>
>> FWIW, there are at least two serious efforts of actually formalising
>> and mechanising the JavaScript language, i.e. its _semantics_, which
>> is a much more interesting endeavor.
>
>
> Could you give (or point me to) details of these projects?
>
> I've done some work along these lines (though I'm not sure it qualifies
> as "serious"), so I'm wondering if I'm just reinventing the work of
> others.

The two projects Andreas is referring to are:

S5 and LambdaJS, by Arjun Guha and others: http://www.jswebtools.org/
Another, unnamed semantics, by Ankur Taly and others: http://jssec.net/

S5, which covers (almost) all of ES5, is still under development, and
passes a large portion of Test262.

Sam

# Mark S. Miller (12 years ago)

[+arjun, +joe, +ankur, +philippa, +gareth]

On Fri, Mar 22, 2013 at 11:34 PM, Sam Tobin-Hochstadt <samth at ccs.neu.edu>wrote:

On Fri, Mar 22, 2013 at 6:15 PM, Michael Dyck <jmdyck at ibiblio.org> wrote:

Andreas Rossberg wrote:

FWIW, there are at least two serious efforts of actually formalising and mechanising the JavaScript language, i.e. its semantics, which is a much more interesting endeavor.

Could you give (or point me to) details of these projects?

I've done some work along these lines (though I'm not sure it qualifies as "serious"), so I'm wondering if I'm just reinventing the work of others.

The two projects Andreas is referring to are:

S5 and LambdaJS, by Arjun Guha and others: www.jswebtools.org Another, unnamed semantics, by Ankur Taly and others: jssec.net

See also static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/37199.pdffor a description of Ankur's semantics for SES-on-ES5.

Both semantics have been used as the basis for further work by other groups, such as the JSCert group you link to above. (Btw, I am in London right now in order to meet with them this Monday and Tuesday. I'll also be doing a presentation on Dr. SES at Imperial on Monday.)

[+arjun, +joe, +ankur, +philippa, +gareth]


On Fri, Mar 22, 2013 at 11:34 PM, Sam Tobin-Hochstadt <samth at ccs.neu.edu>wrote:

> On Fri, Mar 22, 2013 at 6:15 PM, Michael Dyck <jmdyck at ibiblio.org> wrote:
> > Andreas Rossberg wrote:
> >>
> >>
> >> FWIW, there are at least two serious efforts of actually formalising
> >> and mechanising the JavaScript language, i.e. its _semantics_, which
> >> is a much more interesting endeavor.
> >
> >
> > Could you give (or point me to) details of these projects?
> >
> > I've done some work along these lines (though I'm not sure it qualifies
> > as "serious"), so I'm wondering if I'm just reinventing the work of
> > others.
>
> The two projects Andreas is referring to are:
>
> S5 and LambdaJS, by Arjun Guha and others: http://www.jswebtools.org/
> Another, unnamed semantics, by Ankur Taly and others: http://jssec.net/


See also
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/37199.pdffor
a description of Ankur's semantics for SES-on-ES5.

Both semantics have been used as the basis for further work by other
groups, such as the JSCert group you link to above. (Btw, I am in London
right now in order to meet with them this Monday and Tuesday. I'll also be
doing a presentation on Dr. SES at Imperial on Monday.)



>
>
> S5, which covers (almost) all of ES5, is still under development, and
> passes a large portion of Test262.
>
> Sam
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>



-- 
    Cheers,
    --MarkM
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20130323/b61309b8/attachment-0001.html>

# Mark S. Miller (12 years ago)

On Sat, Mar 23, 2013 at 3:36 PM, Mark S. Miller <erights at google.com> wrote:

[+arjun, +joe, +ankur, +philippa, +gareth]

On Fri, Mar 22, 2013 at 11:34 PM, Sam Tobin-Hochstadt <samth at ccs.neu.edu>wrote:

On Fri, Mar 22, 2013 at 6:15 PM, Michael Dyck <jmdyck at ibiblio.org> wrote:

Andreas Rossberg wrote:

FWIW, there are at least two serious efforts of actually formalising and mechanising the JavaScript language, i.e. its semantics, which is a much more interesting endeavor.

Could you give (or point me to) details of these projects?

I've done some work along these lines (though I'm not sure it qualifies as "serious"), so I'm wondering if I'm just reinventing the work of others.

The two projects Andreas is referring to are:

S5 and LambdaJS, by Arjun Guha and others: www.jswebtools.org Another, unnamed semantics, by Ankur Taly and others: jssec.net

See also static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/37199.pdffor a description of Ankur's semantics for SES-on-ES5.

Both semantics have been used as the basis for further work by other groups, such as the JSCert group you link to above.

My mistake. You didn't link directly to JSCert, but rather the page you did link to does. JSCert is at jscert.org

On Sat, Mar 23, 2013 at 3:36 PM, Mark S. Miller <erights at google.com> wrote:

> [+arjun, +joe, +ankur, +philippa, +gareth]
>
>
> On Fri, Mar 22, 2013 at 11:34 PM, Sam Tobin-Hochstadt <samth at ccs.neu.edu>wrote:
>
>> On Fri, Mar 22, 2013 at 6:15 PM, Michael Dyck <jmdyck at ibiblio.org> wrote:
>> > Andreas Rossberg wrote:
>> >>
>> >>
>> >> FWIW, there are at least two serious efforts of actually formalising
>> >> and mechanising the JavaScript language, i.e. its _semantics_, which
>> >> is a much more interesting endeavor.
>> >
>> >
>> > Could you give (or point me to) details of these projects?
>> >
>> > I've done some work along these lines (though I'm not sure it qualifies
>> > as "serious"), so I'm wondering if I'm just reinventing the work of
>> > others.
>>
>> The two projects Andreas is referring to are:
>>
>> S5 and LambdaJS, by Arjun Guha and others: http://www.jswebtools.org/
>> Another, unnamed semantics, by Ankur Taly and others: http://jssec.net/
>
>
> See also
> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/37199.pdffor a description of Ankur's semantics for SES-on-ES5.
>
> Both semantics have been used as the basis for further work by other
> groups, such as the JSCert group you link to above.
>

My mistake. You didn't link directly to JSCert, but rather the page you did
link to does. JSCert is at http://jscert.org/



> (Btw, I am in London right now in order to meet with them this Monday and
> Tuesday. I'll also be doing a presentation on Dr. SES at Imperial on
> Monday.)
>
>
>
>>
>>
>> S5, which covers (almost) all of ES5, is still under development, and
>> passes a large portion of Test262.
>>
>> Sam
>> _______________________________________________
>> es-discuss mailing list
>> es-discuss at mozilla.org
>> https://mail.mozilla.org/listinfo/es-discuss
>>
>
>
>
> --
>     Cheers,
>     --MarkM
>



-- 
    Cheers,
    --MarkM
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20130323/f0edbdc7/attachment.html>