A new proposal for syntax-checking and sandbox: ECMAScript Parser proposal

# Jack Works (6 years ago)

Just like DOMParser (http://mdn.io/DOMParser) in HTML and Houdini's parser API in CSS (https://github.com/WICG/CSS-Parser-API/blob/master/README.md), a built-in parser for ECMAScript itself is quite useful in many ways.

Check out https://github.com/Jack-Works/proposal-ecmascript-parser for details (I'm also looking for champions!)
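
For context on the syntax-checking use case: the closest thing the language offers today is the `Function` constructor, which parses its argument without running it. The sketch below is only an illustration of that workaround, not anything from the proposal; the helper name `checkSyntax` is hypothetical.

```javascript
// Hypothetical helper: reports whether `source` parses as a function body.
// The Function constructor parses its argument eagerly but does not run it.
function checkSyntax(source) {
  try {
    new Function(source); // parsed here; the resulting function is never called
    return { ok: true, error: null };
  } catch (err) {
    if (err instanceof SyntaxError) {
      return { ok: false, error: err.message };
    }
    throw err; // anything else (for example, a policy that blocks dynamic code)
  }
}

console.log(checkSyntax("let x = 1; x += 2;").ok);
console.log(checkSyntax("let x = ;").ok);
```

This only answers yes or no with an error message. It yields no AST, it treats the input as a sloppy-mode function body (so a bare `return` is accepted and `import`/`export` declarations are rejected), and pages with a strict Content Security Policy may forbid it altogether.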

# David Teller (6 years ago)

Out of curiosity, what is the expected benefit wrt Esprima, Babel or Shift? In particular since there is no standard AST for ECMAScript yet [1]?

Cheers, David

[1] Ok, that's a subset of tc39/proposal-binary-ast, which is in the pipes.

# Jack Works (6 years ago)

This proposal is not part of the Binary AST proposal, because that proposal wants a binary representation and will not generate an AST directly from the ECMAScript spec. As for the benefit over Esprima, Babel or Shift: running those parsers in the browser is pretty slow. Since the JS engine can already parse JavaScript code, just exposing those interfaces would make things easier.

# Gareth Heyes (6 years ago)

I had a few goes at making a JS sandbox. I also created a safe DOM environment that allowed safe manipulation of innerHTML, etc.

JS sandbox with regular expressions www.businessinfo.co.uk/labs/jsreg/jsreg.html

JS sandbox and safe DOM environment businessinfo.co.uk/labs/MentalJS/MentalJS.html

It would be great to have a parser in JS!

# Isiah Meadows (6 years ago)

I do want to note a couple things here, as someone familiar with the implementation aspect of JS and programming languages in general:

1. The HTML and CSS parsers (for inline style sheets) have to build full DOM trees for each anyway just to conform to spec, so they can't just, say, parse `.foo { display: block; color: red; }` as `.foo { display: block; } .foo { color: red }` with a cached selector (which *would* be easier to process later on). In this case, they're basically just exposing the same parsers they'd have to use in practice anyway, so it's literally trivial for them to add.
2. No JS engine parses nodes the way the spec processes them, just in a way that's unobservable modulo timing. They internally parse `1` and `1.0` as different types, and they will do things like constant folding: `3 * 5` usually gets parsed as `15`, and `"a" + "b"` will usually get read as `"ab"` by some engines. Furthermore, browser engines lazily parse functions where they can, only validating them for early errors and storing the source code to reparse them on first call, because it helps them start up faster with less memory. And of course, `typeof value === "string"` is often not simply compiled to `%IsString(value)` but literally parsed as such if `value` is defined in that scope. And finally, engines typically merge the steps of AST generation and scope detection, not only to detect `let`/`const` errors but also to speed up bytecode generation.

So although it sounds like JS engines could reuse their logic, they really couldn't. This is further evidenced by SpiderMonkey's parser API (the predecessor to the ESTree spec) not sharing the same implementation as the core language parser. There are two vastly different concerns between generating an AST for tooling and generating an AST to execute. In the former, you want as much info as possible readily available. In the latter, you just want to have the bare minimum to compile to bytecode with relevant source locations for stack traces, and anything else is literally just unnecessary overhead.
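
To make the tooling-versus-execution gap concrete, here is a small sketch built around the `3 * 5` and `"a" + "b"` examples above. The nodes are ESTree-style objects written by hand (no parser involved), and `foldConstants` is a hypothetical helper that imitates, very roughly, the kind of simplification an engine might apply while parsing. It is not how any particular engine is implemented.

```javascript
// ESTree-style nodes written by hand: this is the shape a tooling parser keeps.
const product = {
  type: "BinaryExpression",
  operator: "*",
  left: { type: "Literal", value: 3, raw: "3" },
  right: { type: "Literal", value: 5, raw: "5" },
};
const concat = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Literal", value: "a", raw: '"a"' },
  right: { type: "Literal", value: "b", raw: '"b"' },
};

// Folds `*` on two numbers and `+` on two strings; leaves everything else alone.
function foldConstants(node) {
  if (node.type !== "BinaryExpression") return node;
  const left = foldConstants(node.left);
  const right = foldConstants(node.right);
  if (left.type === "Literal" && right.type === "Literal") {
    const bothNumbers =
      typeof left.value === "number" && typeof right.value === "number";
    const bothStrings =
      typeof left.value === "string" && typeof right.value === "string";
    if (node.operator === "*" && bothNumbers) {
      const value = left.value * right.value;
      return { type: "Literal", value, raw: String(value) };
    }
    if (node.operator === "+" && bothStrings) {
      const value = left.value + right.value;
      return { type: "Literal", value, raw: JSON.stringify(value) };
    }
  }
  return { ...node, left, right };
}

console.log(foldConstants(product)); // a single Literal node with value 15
console.log(foldConstants(concat)); // a single Literal node with value "ab"
```

After folding, the operator and both operands are gone. That is fine for generating bytecode, but a linter or formatter needs the original three nodes and their `raw` text, which is the information an execution-oriented parser has no reason to keep.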

Isiah Meadows contact at isiahmeadows.com, www.isiahmeadows.com

# David Teller (6 years ago)

Before you can have a standard parser, you need a standard AST. There is no such thing at the moment, so the V8 parser, the SpiderMonkey parser, the JSCore parser, etc. all use distinct internal ASTs, each of which changes every so often, either because the language changes or because the VM needs to attach different information to help with compilation.

That's the main reason why there hasn't been a standard user-accessible ECMAScript parser in ECMAScript.

As Binary AST relies upon having a standard AST, standardizing the AST is part of the Binary AST proposal. You may find the latest version of this AST online: https://github.com/binast/binjs-ref/blob/master/spec/es6.webidl
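
The same lack of a common AST is visible among the userland parsers mentioned earlier in the thread. As a rough sketch, here is the numeric literal `42` in the node shapes used by ESTree-based parsers (Esprima, Acorn), Babel and Shift, trimmed by hand to the relevant fields, plus a hypothetical helper `numericValue` that a tool would need in order to handle all three.

```javascript
// The numeric literal `42` in three AST dialects (location data omitted).
const estreeNode = { type: "Literal", value: 42, raw: "42" }; // Esprima, Acorn
const babelNode = { type: "NumericLiteral", value: 42 }; // Babel
const shiftNode = { type: "LiteralNumericExpression", value: 42 }; // Shift

// Hypothetical helper: returns the number a node denotes, or undefined.
function numericValue(node) {
  switch (node.type) {
    case "Literal": // ESTree uses one node type for every kind of literal
      return typeof node.value === "number" ? node.value : undefined;
    case "NumericLiteral":
    case "LiteralNumericExpression":
      return node.value;
    default:
      return undefined;
  }
}

console.log([estreeNode, babelNode, shiftNode].map(numericValue));
```

Every tool that wants to accept more than one parser's output ends up carrying this kind of adapter, which is the cost a standard AST would remove.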

Cheers,
 David

# Jack Works (6 years ago)

Happy to see a standard AST in the Binary AST proposal.

As for the compiler, it could have a "slow" mode when parsing through this parser API and still use fast code generation in other cases. But unfortunately, it seems there is much more work than I thought to provide such an API.

# David Teller (6 years ago)

In theory, it should be possible to have both modes, if the parser is designed for it. Unfortunately, that's not the case at the moment.

Mozilla has recently started working on a new parser which could be used both by VMs and by JS/wasm devs. It might help towards this issue, but it's still early days.

Cheers,
 David

# kai zhu (6 years ago)

adding a datapoint on an application in code-coverage.

a builtin parser-api would be ideal (and i appreciate the insight on implementation difficulties). lacking that, the next best alternative i've found is acorn (based on esprima), available as a single, embeddable file runnable in the browser:

```shell
# extract the bundled parser from the npm tarball, then check its size
curl https://registry.npmjs.org/acorn/-/acorn-6.3.0.tgz | tar -O -xz package/dist/acorn.js > acorn.rollup.js
ls -l acorn.rollup.js
# -rwxr-xr-x 1 root root 191715 Sep 15 16:49 acorn.rollup.js
```

i recently added es9 syntax-support to an in-browser variant of istanbul by replacing its aging esprima parser with acorn [1]. ideally, i hope a standardized ast will be available someday, so we can get rid of acorn/babel/shift altogether (or maybe acorn can become that standard?). even better would be if [cross-compatible] instrumentation became a common builtin feature in engines, so we could get rid of istanbul too.

chrome/puppeteer's instrumentation-api is not yet ideal for my use-case because it currently lacks code-coverage info on branches (which istanbul instrumentation provides).

[1] istanbul-lite - embeddable, es9 browser-variant of istanbul code-coverage
https://kaizhu256.github.io/node-istanbul-lite/build..beta..travis-ci.org/app/
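
To show what branch-level coverage adds, here is a hand-instrumented sketch. It is only in the spirit of what a source-rewriting tool produces and is not istanbul's actual output format; the `coverage` object and its counter names are hypothetical.

```javascript
// Original function under test:
//   function sign(n) { return n < 0 ? "negative" : "non-negative"; }

// Hypothetical counters: one per function, one per arm of the conditional.
const coverage = { functions: { sign: 0 }, branches: { signTernary: [0, 0] } };

// Hand-instrumented version: each arm bumps its counter via the comma operator.
function sign(n) {
  coverage.functions.sign++;
  return n < 0
    ? (coverage.branches.signTernary[0]++, "negative")
    : (coverage.branches.signTernary[1]++, "non-negative");
}

sign(1);
sign(2);
console.log(coverage.functions.sign); // how many times the function ran
console.log(coverage.branches.signTernary); // how many times each arm ran
```

With only function-level counts, `sign` looks fully covered after these two calls. The per-arm counters reveal that the `"negative"` arm never ran. Inserting those counters requires knowing exactly where each arm starts and ends, which is why this kind of tool needs a parser in the first place.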

# Isiah Meadows (6 years ago)

Nit: Acorn's *output* is based on Esprima. Its code is *not* and hasn't been for a few years now. It started as a fork of Esprima, but it wasn't long before it was rewritten the first time.

Isiah Meadows contact at isiahmeadows.com, www.isiahmeadows.com
