A new proposal for syntax-checking and sandbox: ECMAScript Parser proposal

# Jack Works (5 years ago)

Just like DOMParser mdn.io/DOMParser in HTML and Houdini's parser API in CSS WICG/CSS-Parser-API/blob/master/README.md, a built-in parser for ECMAScript itself is quite useful in many ways.

Check out Jack-Works/proposal-ecmascript-parser for details (I am also looking for champions!)
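For context on the syntax-checking use case, here is a minimal sketch of what can be done today without a built-in parser API. It relies on the standard `Function` constructor, which parses its body without running it. The helper name `checkSyntax` is hypothetical and is not part of the proposal; the approach only reports pass/fail plus an engine-specific message, with no AST.

```javascript
// Hypothetical helper: reports whether `source` parses as a function body.
// Limitations: module syntax (import/export) is rejected, a top-level
// `return` is accepted, and no AST or source location is returned.
function checkSyntax(source) {
  try {
    // Constructing the function parses the body but does not execute it.
    new Function(source);
    return { ok: true, error: null };
  } catch (err) {
    if (err instanceof SyntaxError) {
      // The message text is engine-specific and not standardized.
      return { ok: false, error: err.message };
    }
    // Anything else (for example, a policy that blocks dynamic code) is rethrown.
    throw err;
  }
}

console.log(checkSyntax("let x = 1;").ok); // true
console.log(checkSyntax("let x = ;").ok);  // false
```

The gap between this pass/fail result and a full syntax tree is roughly what a standardized parser API would have to fill.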

# David Teller (5 years ago)

Out of curiosity, what is the expected benefit wrt Esprima, Babel or Shift? In particular since there is no standard AST for ECMAScript yet [1]?

Cheers, David

[1] Ok, that's a subset of tc39/proposal-binary-ast, which is in the pipes.

# Jack Works (5 years ago)

This proposal is not part of the Binary AST proposal, because that proposal wants a binary representation and will not generate an AST directly from the ECMAScript spec. As for the benefit: running parsers such as Esprima, Babel or Shift in the browser is pretty slow. Since the JS engine can already parse JavaScript code, just exposing those interfaces would make things easier.


# Gareth Heyes (5 years ago)

I had a few goes with making a JS sandbox. I also created a safe DOM environment that allowed safe manipulation of innerHTML etc

JS sandbox with regular expressions www.businessinfo.co.uk/labs/jsreg/jsreg.html

JS sandbox and safe DOM environment businessinfo.co.uk/labs/MentalJS/MentalJS.html

It would be great to have a parser in JS!

# Isiah Meadows (5 years ago)

I do want to note a couple things here, as someone familiar with the implementation aspect of JS and programming languages in general:

  1. The HTML and CSS parsers (for inline style sheets) have to build full DOM trees for each anyway just to conform to spec, so they can't just, say, parse .foo { display: block; color: red; } as .foo { display: block; } .foo { color: red } with a cached selector (which would be easier to process later on). In this case, they're basically just exposing the same parsers they'd have to use in practice anyway, so it's literally trivial for them to add.
  2. No JS engine parses nodes the way the spec processes them; they only do so in a way that is unobservably different, modulo timings. They internally parse 1 and 1.0 as different types, and they will do things like constant propagation - 3 * 5 usually gets parsed as 15, and "a" + "b" will usually get read as "ab" by some engines. Furthermore, browser engines lazily parse functions where they can, only validating them for early errors and storing the source code to reparse them on first call, because it helps them start up faster with less memory. And of course, typeof value === "string" is often not simply compiled to %IsString(value) but literally parsed as such if value is defined in that scope. And finally, engines typically merge the steps of AST generation and scope detection, not only to detect let/const errors but also to speed up bytecode generation.
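To make the constant-folding point concrete, here is a rough sketch over ESTree-style nodes. Everything in it is illustrative: the node shapes follow ESTree conventions, `foldConstants` is a hypothetical helper, and no real engine is claimed to work this way internally. It only shows how folding during parsing discards information (such as the original `3 * 5` and the `raw` text) that a tooling-oriented AST would want to keep.

```javascript
// ESTree-style tree for the source text `3 * 5 + x`, as a tooling parser might emit it.
const toolingAst = {
  type: "BinaryExpression",
  operator: "+",
  left: {
    type: "BinaryExpression",
    operator: "*",
    left: { type: "Literal", value: 3, raw: "3" },
    right: { type: "Literal", value: 5, raw: "5" },
  },
  right: { type: "Identifier", name: "x" },
};

// Hypothetical folding pass: collapses number*number, number+number and
// string+string literals, and leaves every other node as it was.
function foldConstants(node) {
  if (node.type !== "BinaryExpression") return node;
  const left = foldConstants(node.left);
  const right = foldConstants(node.right);
  if (left.type === "Literal" && right.type === "Literal") {
    const bothNumbers =
      typeof left.value === "number" && typeof right.value === "number";
    const bothStrings =
      typeof left.value === "string" && typeof right.value === "string";
    if (node.operator === "*" && bothNumbers) {
      return { type: "Literal", value: left.value * right.value };
    }
    if (node.operator === "+" && (bothNumbers || bothStrings)) {
      return { type: "Literal", value: left.value + right.value };
    }
  }
  return { ...node, left, right };
}

const folded = foldConstants(toolingAst);
console.log(folded.left); // { type: 'Literal', value: 15 }
```

After folding, the tree no longer records that the source said `3 * 5`, which is fine for execution but not for a formatter, linter or coverage tool.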

So although it sounds like JS engines could reuse their logic, they really couldn't. This is further evidenced by SpiderMonkey's parser API (the predecessor to the ESTree spec) not sharing the same implementation as the core language parser. There are two vastly different concerns between generating an AST for tooling and generating an AST to execute. In the former, you want as much info as possible readily available. In the latter, you just want to have the bare minimum to compile to bytecode with relevant source locations for stack traces, and anything else is literally just unnecessary overhead.


Isiah Meadows contact at isiahmeadows.com, www.isiahmeadows.com

# David Teller (5 years ago)

Before you can have a standard parser, you need a standard AST. There is no such thing at the moment, so the v8 parser, the SpiderMonkey parser, the JSCore parser, etc. all use distinct internal ASTs, each of which changes every so often, either because the language changes or because the VM needs to attach different information to help with compilation.

That's the main reason for which there hasn't been a standard user-accessible ECMAScript parser in ECMAScript.

As Binary AST relies upon having a standard AST, standardizing the AST is part of the Binary AST proposal. You may find the latest version of this AST online at binast/binjs-ref/blob/master/spec/es6.webidl

# Jack Works (5 years ago)

Happy to see a standard AST in the Binary AST proposal.

A compiler could have a "slow" mode for parsing with this parser API and still use fast code generation in other cases. But unfortunately it seems there is much more work than I thought in providing such an API.


# David Teller (5 years ago)

In theory, it should be possible to have both modes, if the parser is designed for it. Unfortunately, that's not the case at the moment.

Mozilla has recently started working on a new parser which could be used both by VMs and by JS/wasm devs. It might help towards this issue, but it's still early days.

# kai zhu (5 years ago)

adding datapoint on application in code-coverage.

a builtin parser-api would be ideal (and appreciate the insight on implementation difficulties). lacking that, the next best alternative i've found is acorn (based on esprima), available as a single, embeddable file runnable in browser:

curl https://registry.npmjs.org/acorn/-/acorn-6.3.0.tgz | tar -O -xz package/dist/acorn.js > acorn.rollup.js

ls -l acorn.rollup.js
-rwxr-xr-x 1 root root 191715 Sep 15 16:49 acorn.rollup.js

i recently added es9 syntax-support to in-browser-variant of istanbul by replacing its aging esprima-parser with acorn [1]. ideally, i hope a standardized ast will be available someday, and get rid of acorn/babel/shift altogether (or maybe acorn can become that standard?). even better is if [cross-compatible] instrumentation becomes a common builtin-feature in engines, and gets rid of istanbul.

chrome/puppeteer's instrumentation-api is not yet ideal for my use-case because it currently lacks code-coverage-info on branches (which istanbul-instrumentation provides).

[1] istanbul-lite - embeddable, es9 browser-variant of istanbul code-coverage kaizhu256.github.io/node-istanbul-lite/build..beta..travis-ci.org/app
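to illustrate what branch-level coverage means here, a hand-written sketch of source instrumentation follows. it is not istanbul's actual output or data format; the `coverage` object, the "abs:ternary" key and `uncoveredBranches` are hypothetical names chosen for the example. the point is that each arm of a branch gets its own counter, which requires rewriting the source from an ast.

```javascript
// Hypothetical counter store: one [consequent, alternate] pair per branch.
const coverage = { branches: { "abs:ternary": [0, 0] } };

// Original source: function abs(x) { return x < 0 ? -x : x; }
// Instrumented by hand: each arm bumps its own counter before yielding its value.
function abs(x) {
  return x < 0
    ? (coverage.branches["abs:ternary"][0]++, -x)
    : (coverage.branches["abs:ternary"][1]++, x);
}

// Lists every branch arm that was never executed, as "<branch>#<arm index>".
function uncoveredBranches(cov) {
  const missing = [];
  for (const [name, arms] of Object.entries(cov.branches)) {
    arms.forEach((count, index) => {
      if (count === 0) missing.push(`${name}#${index}`);
    });
  }
  return missing;
}

// A "test suite" that only exercises non-negative inputs.
abs(3);
abs(7);

console.log(uncoveredBranches(coverage)); // [ 'abs:ternary#0' ]
```

function-level coverage would report `abs` as fully covered in this run, while the per-arm counters show the negative case was never exercised.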

# Isiah Meadows (5 years ago)

Nit: Acorn's output is based on Esprima. Its code is not and hasn't been for a few years now. It started as a fork of Esprima, but it wasn't long before it was rewritten the first time.

