LR(1) grammar/parser and lookahead-restrictions

# Michael Dyck (9 days ago)

In the past, it has been said (usually by Brendan Eich) that TC39 intends that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so much asking about the "1", but more about the "LR".)

If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <! terminals]) fit into the LR approach. It seems like there are three possibilities:

  • modify the grammar (before the parser is constructed),
  • modify the construction of the parser, and/or
  • modify the operation of the parser. It's not clear to me that any of these (or any combination of these) will work.

Has anyone algorithmically generated an LR parser from the spec grammar? If so, how did you you handle lookahead-restrictions?

# Dmitry Soshnikov (9 days ago)

One yet has to check the pure ES6+ grammar to be LR(1) compatible (the Cover grammar with nodes reinterpretations makes it harder), but for a very practical use-case, e.g. combined with Flow, it'll not be easy to have an automated parser (since Flow's parser uses backtracking for some edge cases which is already not LR-parser).

I have some subset of ES6 working as an example language for Syntax tool[1] (since I just took a bunch of productions from JS), but I haven't tried the full ES6 grammar (not having time to work on this project, since it'll be not a straightforward process of just matching the grammar 1:1 to the parser generator -- based on the above reasons).

Long time ago I also had a brief discussion with authors of another popular parser generators for JS. Both didn't try it as well, but mentioned that even for ES5 it was hard to parse e.g. regexp, etc.

Someone (with enough time for the project) should just sit, and spend some time actually porting the grammar from the spec on some powerful parsers generator.

[1] Syntax: DmitrySoshnikov/syntax [2] twitter.com/DmitrySoshnikov/status/666327209959751680

# Isiah Meadows (9 days ago)

Heuristically, I doubt it's even context-free at this point, considering the concept and widespread prevalence of early errors now. I suspect it's mildly context-sensitive (maybe tree-adjoining?), but I'm no formal language expert here.

# Michael Dyck (9 days ago)

On 17-01-11 09:09 PM, Isiah Meadows wrote:

Heuristically, I doubt it's even context-free at this point, considering the concept and widespread prevalence of early errors now. I suspect it's mildly context-sensitive (maybe tree-adjoining?), but I'm no formal language expert here.

I think the claim of/desire for LR(1)-ness only applies to the language defined by the BNF-ish rules, i.e. ignoring the effects of prose-y things like early errors and ASI. Anyhow, that's the language I'm asking about.

# Dmitry Soshnikov (9 days ago)

Well, it's still context-free (contex-sensitive may have terminals on LHS too, JS cannot). It requires extra cover grammar, and additional static semantics, but still potentially can be parsed by a LALR/CLR parser - yes, you'll have to do parsed nodes reinterpretation in semantics handlers (e.g. validate and transform parsed ObjectLiteral into an ObjectPattern when parsing destructuring), but should be doable in general. It won't be doable if lookaheafs won't be enough, and only a backtracking can recover (like the cases with Flow parser).

Dmitry

# Isiah Meadows (9 days ago)

Okay. I see now. How do early errors factor into this?

# Dmitry Soshnikov (9 days ago)

The "early errors" are just parser errors which are enforced not by grammar rules, but by additional validation algorithms which follow the parsed node. E.g. eval cannot be assigned in strict mode: it's not a grammar issue (grammar allows eval on LHS), and it's not a runtime issue, it's an "early error" enforced by static semantics, which still issues it as a ParseError. Example in handling it in a semantic action for a production (on a simplified AssignmentExpression):

%%

AssignmentExpression
  : IDENTIFIER '=' Expression {
      const id = $1;

      // Static semantics: early parser error:

      if (mode.isStrict && (id === 'eval' || id == 'arguments')) {
        throw new ParseError(
          '`eval` and `arguments` cannot be assigned in strict mode.'
        );
      }

      return {
        type: 'AssigmentExpression',
        left: $1,
        right: $3,
      };
    }

Similarly other "early errors".

BTW, here is the example I mentioned above of how Flow has to do backtracking in parsing in some edge cases, making the grammar completely not LR(1) anymore: babel/babel#1991

Dmitry

# Isiah Meadows (8 days ago)

Okay. I see now. Lots of backtracking in the case of arrow functions and ternary expressions vs type annotations.