LR(1) grammar/parser and lookahead-restrictions

# Michael Dyck (9 years ago)

In the past, it has been said (usually by Brendan Eich) that TC39 intends that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so much asking about the "1", but more about the "LR".)

If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <! terminals]) fit into the LR approach. It seems like there are three possibilities:

modify the grammar (before the parser is constructed),
modify the construction of the parser, and/or
modify the operation of the parser. It's not clear to me that any of these (or any combination of these) will work.

Has anyone algorithmically generated an LR parser from the spec grammar? If so, how did you you handle lookahead-restrictions?

In the past, it has been said (usually by Brendan Eich) that TC39 intends 
that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so 
much asking about the "1", but more about the "LR".)

If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <! 
terminals]) fit into the LR approach. It seems like there are three 
possibilities:
   - modify the grammar (before the parser is constructed),
   - modify the construction of the parser, and/or
   - modify the operation of the parser.
It's not clear to me that any of these (or any combination of these) will work.

Has anyone algorithmically generated an LR parser from the spec grammar? If 
so, how did you you handle lookahead-restrictions?

-Michael

# Dmitry Soshnikov (9 years ago)

One yet has to check the pure ES6+ grammar to be LR(1) compatible (the Cover grammar with nodes reinterpretations makes it harder), but for a very practical use-case, e.g. combined with Flow, it'll not be easy to have an automated parser (since Flow's parser uses backtracking for some edge cases which is already not LR-parser).

I have some subset of ES6 working as an example language for Syntax tool[1] (since I just took a bunch of productions from JS), but I haven't tried the full ES6 grammar (not having time to work on this project, since it'll be not a straightforward process of just matching the grammar 1:1 to the parser generator -- based on the above reasons).

Long time ago I also had a brief discussion with authors of another popular parser generators for JS. Both didn't try it as well, but mentioned that even for ES5 it was hard to parse e.g. regexp, etc.

Someone (with enough time for the project) should just sit, and spend some time actually porting the grammar from the spec on some powerful parsers generator.

[1] Syntax: DmitrySoshnikov/syntax [2] twitter.com/DmitrySoshnikov/status/666327209959751680

One yet has to check the pure ES6+ grammar to be LR(1) compatible (the
Cover grammar with nodes reinterpretations makes it harder), but for a very
practical use-case, e.g. combined with Flow, it'll not be easy to have an
automated parser (since Flow's parser uses backtracking for some edge cases
which is already not LR-parser).

I have some subset of ES6 working as an example language for Syntax tool[1]
(since I just took a bunch of productions from JS), but I haven't tried the
full ES6 grammar (not having time to work on this project, since it'll be
not a straightforward process of just matching the grammar 1:1 to the
parser generator -- based on the above reasons).

Long time ago I also had a brief discussion with authors of another popular
parser generators for JS. Both didn't try it as well, but mentioned that
even for ES5 it was hard to parse e.g. regexp, etc.

Someone (with enough time for the project) should just sit, and spend some
time actually porting the grammar from the spec on some powerful parsers
generator.

[1] Syntax: https://github.com/DmitrySoshnikov/syntax
[2] https://twitter.com/DmitrySoshnikov/status/666327209959751680

On Wed, Jan 11, 2017 at 10:28 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:

> In the past, it has been said (usually by Brendan Eich) that TC39 intends
> that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so
> much asking about the "1", but more about the "LR".)
>
> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
> terminals]) fit into the LR approach. It seems like there are three
> possibilities:
>   - modify the grammar (before the parser is constructed),
>   - modify the construction of the parser, and/or
>   - modify the operation of the parser.
> It's not clear to me that any of these (or any combination of these) will
> work.
>
> Has anyone algorithmically generated an LR parser from the spec grammar?
> If so, how did you you handle lookahead-restrictions?
>
> -Michael
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170111/4bcef884/attachment.html>

# Isiah Meadows (9 years ago)

Heuristically, I doubt it's even context-free at this point, considering the concept and widespread prevalence of early errors now. I suspect it's mildly context-sensitive (maybe tree-adjoining?), but I'm no formal language expert here.

Heuristically, I doubt it's even context-free at this point, considering
the concept and widespread prevalence of early errors now. I suspect it's
mildly context-sensitive (maybe tree-adjoining?), but I'm no formal
language expert here.

On Wed, Jan 11, 2017, 14:23 Dmitry Soshnikov <dmitry.soshnikov at gmail.com>
wrote:

> One yet has to check the pure ES6+ grammar to be LR(1) compatible (the
> Cover grammar with nodes reinterpretations makes it harder), but for a very
> practical use-case, e.g. combined with Flow, it'll not be easy to have an
> automated parser (since Flow's parser uses backtracking for some edge cases
> which is already not LR-parser).
>
> I have some subset of ES6 working as an example language for Syntax
> tool[1] (since I just took a bunch of productions from JS), but I haven't
> tried the full ES6 grammar (not having time to work on this project, since
> it'll be not a straightforward process of just matching the grammar 1:1 to
> the parser generator -- based on the above reasons).
>
> Long time ago I also had a brief discussion with authors of another
> popular parser generators for JS. Both didn't try it as well, but mentioned
> that even for ES5 it was hard to parse e.g. regexp, etc.
>
> Someone (with enough time for the project) should just sit, and spend some
> time actually porting the grammar from the spec on some powerful parsers
> generator.
>
> [1] Syntax: https://github.com/DmitrySoshnikov/syntax
> [2] https://twitter.com/DmitrySoshnikov/status/666327209959751680
>
> On Wed, Jan 11, 2017 at 10:28 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:
>
> In the past, it has been said (usually by Brendan Eich) that TC39 intends
> that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so
> much asking about the "1", but more about the "LR".)
>
> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
> terminals]) fit into the LR approach. It seems like there are three
> possibilities:
>   - modify the grammar (before the parser is constructed),
>   - modify the construction of the parser, and/or
>   - modify the operation of the parser.
> It's not clear to me that any of these (or any combination of these) will
> work.
>
> Has anyone algorithmically generated an LR parser from the spec grammar?
> If so, how did you you handle lookahead-restrictions?
>
> -Michael
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
>
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170112/d12942e6/attachment.html>

# Michael Dyck (9 years ago)

On 17-01-11 09:09 PM, Isiah Meadows wrote:

Heuristically, I doubt it's even context-free at this point, considering the concept and widespread prevalence of early errors now. I suspect it's mildly context-sensitive (maybe tree-adjoining?), but I'm no formal language expert here.

I think the claim of/desire for LR(1)-ness only applies to the language defined by the BNF-ish rules, i.e. ignoring the effects of prose-y things like early errors and ASI. Anyhow, that's the language I'm asking about.

On 17-01-11 09:09 PM, Isiah Meadows wrote:
> Heuristically, I doubt it's even context-free at this point, considering the
> concept and widespread prevalence of early errors now. I suspect it's mildly
> context-sensitive (maybe tree-adjoining?), but I'm no formal language expert
> here.

I think the claim of/desire for LR(1)-ness only applies to the language 
defined by the BNF-ish rules, i.e. ignoring the effects of prose-y things 
like early errors and ASI. Anyhow, that's the language I'm asking about.

-Michael

# Dmitry Soshnikov (9 years ago)

Well, it's still context-free (contex-sensitive may have terminals on LHS too, JS cannot). It requires extra cover grammar, and additional static semantics, but still potentially can be parsed by a LALR/CLR parser - yes, you'll have to do parsed nodes reinterpretation in semantics handlers (e.g. validate and transform parsed ObjectLiteral into an ObjectPattern when parsing destructuring), but should be doable in general. It won't be doable if lookaheafs won't be enough, and only a backtracking can recover (like the cases with Flow parser).

Dmitry

Well, it's still context-free (contex-sensitive may have terminals on LHS
too, JS cannot). It requires extra cover grammar, and additional static
semantics, but still potentially can be parsed by a LALR/CLR parser - yes,
you'll have to do parsed nodes reinterpretation in semantics handlers (e.g.
validate and transform parsed ObjectLiteral into an ObjectPattern when
parsing destructuring), but should be doable in general. It won't be doable
if lookaheafs won't be enough, and only a backtracking can recover (like
the cases with Flow parser).

Dmitry


On Wed, Jan 11, 2017 at 6:09 PM Isiah Meadows <isiahmeadows at gmail.com>
wrote:

> Heuristically, I doubt it's even context-free at this point, considering
> the concept and widespread prevalence of early errors now. I suspect it's
> mildly context-sensitive (maybe tree-adjoining?), but I'm no formal
> language expert here.
>
>
>
> On Wed, Jan 11, 2017, 14:23 Dmitry Soshnikov <dmitry.soshnikov at gmail.com>
> wrote:
>
> One yet has to check the pure ES6+ grammar to be LR(1) compatible (the
> Cover grammar with nodes reinterpretations makes it harder), but for a very
> practical use-case, e.g. combined with Flow, it'll not be easy to have an
> automated parser (since Flow's parser uses backtracking for some edge cases
> which is already not LR-parser).
>
> I have some subset of ES6 working as an example language for Syntax
> tool[1] (since I just took a bunch of productions from JS), but I haven't
> tried the full ES6 grammar (not having time to work on this project, since
> it'll be not a straightforward process of just matching the grammar 1:1 to
> the parser generator -- based on the above reasons).
>
> Long time ago I also had a brief discussion with authors of another
> popular parser generators for JS. Both didn't try it as well, but mentioned
> that even for ES5 it was hard to parse e.g. regexp, etc.
>
> Someone (with enough time for the project) should just sit, and spend some
> time actually porting the grammar from the spec on some powerful parsers
> generator.
>
> [1] Syntax: https://github.com/DmitrySoshnikov/syntax
> [2] https://twitter.com/DmitrySoshnikov/status/666327209959751680
>
> On Wed, Jan 11, 2017 at 10:28 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:
>
> In the past, it has been said (usually by Brendan Eich) that TC39 intends
> that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so
> much asking about the "1", but more about the "LR".)
>
>
>
>
>
> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
> terminals]) fit into the LR approach. It seems like there are three
> possibilities:
>
>
>   - modify the grammar (before the parser is constructed),
>
>
>   - modify the construction of the parser, and/or
>
>
>   - modify the operation of the parser.
>
>
> It's not clear to me that any of these (or any combination of these) will
> work.
>
>
>
>
>
> Has anyone algorithmically generated an LR parser from the spec grammar?
> If so, how did you you handle lookahead-restrictions?
>
>
>
>
>
> -Michael
>
>
> _______________________________________________
>
>
> es-discuss mailing list
>
>
> es-discuss at mozilla.org
>
>
> https://mail.mozilla.org/listinfo/es-discuss
>
>
>
>
>
> _______________________________________________
>
>
> es-discuss mailing list
>
>
> es-discuss at mozilla.org
>
>
> https://mail.mozilla.org/listinfo/es-discuss
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170112/1e536068/attachment.html>

# Isiah Meadows (9 years ago)

Okay. I see now. How do early errors factor into this?

Okay. I see now. How do early errors factor into this?

On Wed, Jan 11, 2017, 21:49 Dmitry Soshnikov <dmitry.soshnikov at gmail.com>
wrote:

> Well, it's still context-free (contex-sensitive may have terminals on LHS
> too, JS cannot). It requires extra cover grammar, and additional static
> semantics, but still potentially can be parsed by a LALR/CLR parser - yes,
> you'll have to do parsed nodes reinterpretation in semantics handlers (e.g.
> validate and transform parsed ObjectLiteral into an ObjectPattern when
> parsing destructuring), but should be doable in general. It won't be doable
> if lookaheafs won't be enough, and only a backtracking can recover (like
> the cases with Flow parser).
>
> Dmitry
>
>
> On Wed, Jan 11, 2017 at 6:09 PM Isiah Meadows <isiahmeadows at gmail.com>
> wrote:
>
> Heuristically, I doubt it's even context-free at this point, considering
> the concept and widespread prevalence of early errors now. I suspect it's
> mildly context-sensitive (maybe tree-adjoining?), but I'm no formal
> language expert here.
>
>
>
> On Wed, Jan 11, 2017, 14:23 Dmitry Soshnikov <dmitry.soshnikov at gmail.com>
> wrote:
>
> One yet has to check the pure ES6+ grammar to be LR(1) compatible (the
> Cover grammar with nodes reinterpretations makes it harder), but for a very
> practical use-case, e.g. combined with Flow, it'll not be easy to have an
> automated parser (since Flow's parser uses backtracking for some edge cases
> which is already not LR-parser).
>
> I have some subset of ES6 working as an example language for Syntax
> tool[1] (since I just took a bunch of productions from JS), but I haven't
> tried the full ES6 grammar (not having time to work on this project, since
> it'll be not a straightforward process of just matching the grammar 1:1 to
> the parser generator -- based on the above reasons).
>
> Long time ago I also had a brief discussion with authors of another
> popular parser generators for JS. Both didn't try it as well, but mentioned
> that even for ES5 it was hard to parse e.g. regexp, etc.
>
> Someone (with enough time for the project) should just sit, and spend some
> time actually porting the grammar from the spec on some powerful parsers
> generator.
>
> [1] Syntax: https://github.com/DmitrySoshnikov/syntax
> [2] https://twitter.com/DmitrySoshnikov/status/666327209959751680
>
> On Wed, Jan 11, 2017 at 10:28 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:
>
> In the past, it has been said (usually by Brendan Eich) that TC39 intends
> that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so
> much asking about the "1", but more about the "LR".)
>
>
>
>
>
> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
> terminals]) fit into the LR approach. It seems like there are three
> possibilities:
>
>
>   - modify the grammar (before the parser is constructed),
>
>
>   - modify the construction of the parser, and/or
>
>
>   - modify the operation of the parser.
>
>
> It's not clear to me that any of these (or any combination of these) will
> work.
>
>
>
>
>
> Has anyone algorithmically generated an LR parser from the spec grammar?
> If so, how did you you handle lookahead-restrictions?
>
>
>
>
>
> -Michael
>
>
> _______________________________________________
>
>
> es-discuss mailing list
>
>
> es-discuss at mozilla.org
>
>
> https://mail.mozilla.org/listinfo/es-discuss
>
>
>
>
>
> _______________________________________________
>
>
> es-discuss mailing list
>
>
> es-discuss at mozilla.org
>
>
> https://mail.mozilla.org/listinfo/es-discuss
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170112/07770aad/attachment.html>

# Dmitry Soshnikov (9 years ago)

The "early errors" are just parser errors which are enforced not by grammar rules, but by additional validation algorithms which follow the parsed node. E.g. eval cannot be assigned in strict mode: it's not a grammar issue (grammar allows eval on LHS), and it's not a runtime issue, it's an "early error" enforced by static semantics, which still issues it as a ParseError. Example in handling it in a semantic action for a production (on a simplified AssignmentExpression):

%%

AssignmentExpression
  : IDENTIFIER '=' Expression {
      const id = $1;

      // Static semantics: early parser error:

      if (mode.isStrict && (id === 'eval' || id == 'arguments')) {
        throw new ParseError(
          '`eval` and `arguments` cannot be assigned in strict mode.'
        );
      }

      return {
        type: 'AssigmentExpression',
        left: $1,
        right: $3,
      };
    }

Similarly other "early errors".

BTW, here is the example I mentioned above of how Flow has to do backtracking in parsing in some edge cases, making the grammar completely not LR(1) anymore: babel/babel#1991

Dmitry

The "early errors" are just parser errors which are enforced not by grammar
rules, but by additional validation algorithms which follow the parsed
node. E.g. `eval` cannot be assigned in strict mode: it's not a grammar
issue (grammar allows `eval` on LHS), and it's not a runtime issue, it's an
"early error" enforced by static semantics, which still issues it as a
`ParseError`. Example in handling it in a semantic action for a production
(on a simplified `AssignmentExpression`):

```
%%

AssignmentExpression
  : IDENTIFIER '=' Expression {
      const id = $1;

      // Static semantics: early parser error:

      if (mode.isStrict && (id === 'eval' || id == 'arguments')) {
        throw new ParseError(
          '`eval` and `arguments` cannot be assigned in strict mode.'
        );
      }

      return {
        type: 'AssigmentExpression',
        left: $1,
        right: $3,
      };
    }
```

Similarly other "early errors".

BTW, here is the example I mentioned above of how Flow has to do
backtracking in parsing in some edge cases, making the grammar completely
not LR(1) anymore:
https://github.com/babel/babel/issues/1991#issuecomment-121708242

Dmitry


On Wed, Jan 11, 2017 at 7:13 PM, Isiah Meadows <isiahmeadows at gmail.com>
wrote:

> Okay. I see now. How do early errors factor into this?
>
> On Wed, Jan 11, 2017, 21:49 Dmitry Soshnikov <dmitry.soshnikov at gmail.com>
> wrote:
>
>> Well, it's still context-free (contex-sensitive may have terminals on LHS
>> too, JS cannot). It requires extra cover grammar, and additional static
>> semantics, but still potentially can be parsed by a LALR/CLR parser - yes,
>> you'll have to do parsed nodes reinterpretation in semantics handlers (e.g.
>> validate and transform parsed ObjectLiteral into an ObjectPattern when
>> parsing destructuring), but should be doable in general. It won't be doable
>> if lookaheafs won't be enough, and only a backtracking can recover (like
>> the cases with Flow parser).
>>
>> Dmitry
>>
>>
>> On Wed, Jan 11, 2017 at 6:09 PM Isiah Meadows <isiahmeadows at gmail.com>
>> wrote:
>>
>> Heuristically, I doubt it's even context-free at this point, considering
>> the concept and widespread prevalence of early errors now. I suspect it's
>> mildly context-sensitive (maybe tree-adjoining?), but I'm no formal
>> language expert here.
>>
>>
>>
>> On Wed, Jan 11, 2017, 14:23 Dmitry Soshnikov <dmitry.soshnikov at gmail.com>
>> wrote:
>>
>> One yet has to check the pure ES6+ grammar to be LR(1) compatible (the
>> Cover grammar with nodes reinterpretations makes it harder), but for a very
>> practical use-case, e.g. combined with Flow, it'll not be easy to have an
>> automated parser (since Flow's parser uses backtracking for some edge cases
>> which is already not LR-parser).
>>
>> I have some subset of ES6 working as an example language for Syntax
>> tool[1] (since I just took a bunch of productions from JS), but I haven't
>> tried the full ES6 grammar (not having time to work on this project, since
>> it'll be not a straightforward process of just matching the grammar 1:1 to
>> the parser generator -- based on the above reasons).
>>
>> Long time ago I also had a brief discussion with authors of another
>> popular parser generators for JS. Both didn't try it as well, but mentioned
>> that even for ES5 it was hard to parse e.g. regexp, etc.
>>
>> Someone (with enough time for the project) should just sit, and spend
>> some time actually porting the grammar from the spec on some powerful
>> parsers generator.
>>
>> [1] Syntax: https://github.com/DmitrySoshnikov/syntax
>> [2] https://twitter.com/DmitrySoshnikov/status/666327209959751680
>>
>> On Wed, Jan 11, 2017 at 10:28 AM, Michael Dyck <jmdyck at ibiblio.org>
>> wrote:
>>
>> In the past, it has been said (usually by Brendan Eich) that TC39 intends
>> that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so
>> much asking about the "1", but more about the "LR".)
>>
>>
>>
>>
>>
>> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
>> terminals]) fit into the LR approach. It seems like there are three
>> possibilities:
>>
>>
>>   - modify the grammar (before the parser is constructed),
>>
>>
>>   - modify the construction of the parser, and/or
>>
>>
>>   - modify the operation of the parser.
>>
>>
>> It's not clear to me that any of these (or any combination of these) will
>> work.
>>
>>
>>
>>
>>
>> Has anyone algorithmically generated an LR parser from the spec grammar?
>> If so, how did you you handle lookahead-restrictions?
>>
>>
>>
>>
>>
>> -Michael
>>
>>
>> _______________________________________________
>>
>>
>> es-discuss mailing list
>>
>>
>> es-discuss at mozilla.org
>>
>>
>> https://mail.mozilla.org/listinfo/es-discuss
>>
>>
>>
>>
>>
>> _______________________________________________
>>
>>
>> es-discuss mailing list
>>
>>
>> es-discuss at mozilla.org
>>
>>
>> https://mail.mozilla.org/listinfo/es-discuss
>>
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170111/682ec400/attachment.html>

# Isiah Meadows (9 years ago)

Okay. I see now. Lots of backtracking in the case of arrow functions and ternary expressions vs type annotations.

Okay. I see now. Lots of backtracking in the case of arrow functions and
ternary expressions vs type annotations.

On Thu, Jan 12, 2017, 00:45 Dmitry Soshnikov <dmitry.soshnikov at gmail.com>
wrote:

> The "early errors" are just parser errors which are enforced not by
> grammar rules, but by additional validation algorithms which follow the
> parsed node. E.g. `eval` cannot be assigned in strict mode: it's not a
> grammar issue (grammar allows `eval` on LHS), and it's not a runtime issue,
> it's an "early error" enforced by static semantics, which still issues it
> as a `ParseError`. Example in handling it in a semantic action for a
> production (on a simplified `AssignmentExpression`):
>
> ```
> %%
>
> AssignmentExpression
>   : IDENTIFIER '=' Expression {
>       const id = $1;
>
>       // Static semantics: early parser error:
>
>       if (mode.isStrict && (id === 'eval' || id == 'arguments')) {
>         throw new ParseError(
>           '`eval` and `arguments` cannot be assigned in strict mode.'
>         );
>       }
>
>       return {
>         type: 'AssigmentExpression',
>         left: $1,
>         right: $3,
>       };
>     }
> ```
>
> Similarly other "early errors".
>
> BTW, here is the example I mentioned above of how Flow has to do
> backtracking in parsing in some edge cases, making the grammar completely
> not LR(1) anymore:
> https://github.com/babel/babel/issues/1991#issuecomment-121708242
>
> Dmitry
>
>
> On Wed, Jan 11, 2017 at 7:13 PM, Isiah Meadows <isiahmeadows at gmail.com>
> wrote:
>
> Okay. I see now. How do early errors factor into this?
>
> On Wed, Jan 11, 2017, 21:49 Dmitry Soshnikov <dmitry.soshnikov at gmail.com>
> wrote:
>
> Well, it's still context-free (contex-sensitive may have terminals on LHS
> too, JS cannot). It requires extra cover grammar, and additional static
> semantics, but still potentially can be parsed by a LALR/CLR parser - yes,
> you'll have to do parsed nodes reinterpretation in semantics handlers (e.g.
> validate and transform parsed ObjectLiteral into an ObjectPattern when
> parsing destructuring), but should be doable in general. It won't be doable
> if lookaheafs won't be enough, and only a backtracking can recover (like
> the cases with Flow parser).
>
> Dmitry
>
>
> On Wed, Jan 11, 2017 at 6:09 PM Isiah Meadows <isiahmeadows at gmail.com>
> wrote:
>
> Heuristically, I doubt it's even context-free at this point, considering
> the concept and widespread prevalence of early errors now. I suspect it's
> mildly context-sensitive (maybe tree-adjoining?), but I'm no formal
> language expert here.
>
>
>
> On Wed, Jan 11, 2017, 14:23 Dmitry Soshnikov <dmitry.soshnikov at gmail.com>
> wrote:
>
> One yet has to check the pure ES6+ grammar to be LR(1) compatible (the
> Cover grammar with nodes reinterpretations makes it harder), but for a very
> practical use-case, e.g. combined with Flow, it'll not be easy to have an
> automated parser (since Flow's parser uses backtracking for some edge cases
> which is already not LR-parser).
>
> I have some subset of ES6 working as an example language for Syntax
> tool[1] (since I just took a bunch of productions from JS), but I haven't
> tried the full ES6 grammar (not having time to work on this project, since
> it'll be not a straightforward process of just matching the grammar 1:1 to
> the parser generator -- based on the above reasons).
>
> Long time ago I also had a brief discussion with authors of another
> popular parser generators for JS. Both didn't try it as well, but mentioned
> that even for ES5 it was hard to parse e.g. regexp, etc.
>
> Someone (with enough time for the project) should just sit, and spend some
> time actually porting the grammar from the spec on some powerful parsers
> generator.
>
> [1] Syntax: https://github.com/DmitrySoshnikov/syntax
> [2] https://twitter.com/DmitrySoshnikov/status/666327209959751680
>
> On Wed, Jan 11, 2017 at 10:28 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:
>
> In the past, it has been said (usually by Brendan Eich) that TC39 intends
> that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so
> much asking about the "1", but more about the "LR".)
>
>
>
>
>
> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
> terminals]) fit into the LR approach. It seems like there are three
> possibilities:
>
>
>   - modify the grammar (before the parser is constructed),
>
>
>   - modify the construction of the parser, and/or
>
>
>   - modify the operation of the parser.
>
>
> It's not clear to me that any of these (or any combination of these) will
> work.
>
>
>
>
>
> Has anyone algorithmically generated an LR parser from the spec grammar?
> If so, how did you you handle lookahead-restrictions?
>
>
>
>
>
> -Michael
>
>
> _______________________________________________
>
>
> es-discuss mailing list
>
>
> es-discuss at mozilla.org
>
>
> https://mail.mozilla.org/listinfo/es-discuss
>
>
>
>
>
> _______________________________________________
>
>
> es-discuss mailing list
>
>
> es-discuss at mozilla.org
>
>
> https://mail.mozilla.org/listinfo/es-discuss
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170112/27b7e7f0/attachment-0001.html>

# Waldemar Horwat (9 years ago)

On 01/11/2017 10:28, Michael Dyck wrote:

In the past, it has been said (usually by Brendan Eich) that TC39 intends that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so much asking about the "1", but more about the "LR".)

If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <! terminals]) fit into the LR approach. It seems like there are three possibilities:

modify the grammar (before the parser is constructed),

modify the construction of the parser, and/or

modify the operation of the parser. It's not clear to me that any of these (or any combination of these) will work.

Has anyone algorithmically generated an LR parser from the spec grammar? If so, how did you you handle lookahead-restrictions?

I'm the source of the LR(1) condition. I've been doing that ever since ECMAScript Edition 3, and in fact am using the LR parser to help design and debug the spec. I have an implementation of the parser with a few extensions to the LR grammar, including support for parametrized productions, lookahead restrictions, and lexer-parser feedback used to disambiguate things such as what token / will start. I validate various things such as making sure that there is no place where both an identifier and a regular expression can occur.

 Waldemar

On 01/11/2017 10:28, Michael Dyck wrote:
> In the past, it has been said (usually by Brendan Eich) that TC39 intends that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so much asking about the "1", but more about the "LR".)
>
> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <! terminals]) fit into the LR approach. It seems like there are three possibilities:
>   - modify the grammar (before the parser is constructed),
>   - modify the construction of the parser, and/or
>   - modify the operation of the parser.
> It's not clear to me that any of these (or any combination of these) will work.
>
> Has anyone algorithmically generated an LR parser from the spec grammar? If so, how did you you handle lookahead-restrictions?

I'm the source of the LR(1) condition.  I've been doing that ever since ECMAScript Edition 3, and in fact am using the LR parser to help design and debug the spec.  I have an implementation of the parser with a few extensions to the LR grammar, including support for parametrized productions, lookahead restrictions, and lexer-parser feedback used to disambiguate things such as what token / will start.  I validate various things such as making sure that there is no place where both an identifier and a regular expression can occur.

     Waldemar

# Dmitry Soshnikov (9 years ago)

On Mon, Jan 23, 2017 at 5:24 PM, Waldemar Horwat <waldemar at google.com>

wrote:

On 01/11/2017 10:28, Michael Dyck wrote:

In the past, it has been said (usually by Brendan Eich) that TC39 intends that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so much asking about the "1", but more about the "LR".)

If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <! terminals]) fit into the LR approach. It seems like there are three possibilities:

modify the grammar (before the parser is constructed),

modify the construction of the parser, and/or

modify the operation of the parser. It's not clear to me that any of these (or any combination of these) will work.

Has anyone algorithmically generated an LR parser from the spec grammar? If so, how did you you handle lookahead-restrictions?

I'm the source of the LR(1) condition. I've been doing that ever since ECMAScript Edition 3, and in fact am using the LR parser to help design and debug the spec. I have an implementation of the parser with a few extensions to the LR grammar, including support for parametrized productions, lookahead restrictions, and lexer-parser feedback used to disambiguate things such as what token / will start. I validate various things such as making sure that there is no place where both an identifier and a regular expression can occur.

Thanks for the confirmation.

lookahead restrictions

Yeah, all the lookahead restrictions, such as lookahead ∉ {{, function} add a lot of extra "No<Lookahead>" productions, which makes the grammar (and hence the parsing table) bigger unfortunately. Example is e.g. ExpressionNoIn, VariableDeclarationNoIn from the spec, and similar for ...NoLeftCurrly, and ...NoFunction to support those lookahead restrictions, and allow ExpressionStatement to be parsed without "reduce/reduce" conflicts.

That's why in the mentioned "sub-set" of the ECMAScript example language [1] I chose not to use the same syntax for function expression and function declaration: fn foo() {} for declaration, and just an arrow function for function-expressions :) -- no ambiguity, no lookahead restrictions, hence no larger grammar, hence a faster parser.

lexer-parser feedback

Yes, having a lexer/parser state solves it more elegantly. Also, if a parser tool allows semantic action not only at the end (i.e on reduce), but for each symbol on RHS (i.e. "S-attributed" definition, and not "L-attributed"), handling the lookahead ∉ {{, function} could be even easier: on matching the { from the statement position, one can track the parser state that the block should be parsed, and not an object literal. This probably can be solved with lexer state too -- to avoid again a bunch of extra "No<lookahead>" productions. And such cases are simpler in manual

recursive decent.

[1] Example lang based on subset of ECMAScript grammar: DmitrySoshnikov/syntax/blob/master/examples/lang.bnf

Dmitry

On Mon, Jan 23, 2017 at 5:24 PM, Waldemar Horwat <waldemar at google.com>
wrote:

> On 01/11/2017 10:28, Michael Dyck wrote:
>
>> In the past, it has been said (usually by Brendan Eich) that TC39 intends
>> that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so
>> much asking about the "1", but more about the "LR".)
>>
>> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
>> terminals]) fit into the LR approach. It seems like there are three
>> possibilities:
>>   - modify the grammar (before the parser is constructed),
>>   - modify the construction of the parser, and/or
>>   - modify the operation of the parser.
>> It's not clear to me that any of these (or any combination of these) will
>> work.
>>
>> Has anyone algorithmically generated an LR parser from the spec grammar?
>> If so, how did you you handle lookahead-restrictions?
>>
>
> I'm the source of the LR(1) condition.  I've been doing that ever since
> ECMAScript Edition 3, and in fact am using the LR parser to help design and
> debug the spec.  I have an implementation of the parser with a few
> extensions to the LR grammar, including support for parametrized
> productions, lookahead restrictions, and lexer-parser feedback used to
> disambiguate things such as what token / will start.  I validate various
> things such as making sure that there is no place where both an identifier
> and a regular expression can occur.
>
>
Thanks for the confirmation.

> lookahead restrictions

Yeah, all the lookahead restrictions, such as `lookahead ∉ {{, function}`
add a lot of extra "No<Lookahead>" productions, which makes the grammar
(and hence the parsing table) bigger unfortunately. Example is e.g.
`ExpressionNoIn`, `VariableDeclarationNoIn` from the spec, and similar for
`...NoLeftCurrly`, and `...NoFunction` to support those lookahead
restrictions, and allow `ExpressionStatement` to be parsed without
"reduce/reduce" conflicts.

That's why in the mentioned "sub-set" of the ECMAScript example language
[1] I chose not to use the same syntax for function expression and function
declaration: `fn foo() {}` for declaration, and just an arrow function for
function-expressions :) -- no ambiguity, no lookahead restrictions, hence
no larger grammar, hence a faster parser.

 > lexer-parser feedback

Yes, having a lexer/parser state solves it more elegantly. Also, if a
parser tool allows semantic action not only at the end (i.e on reduce), but
for each symbol on RHS (i.e. "S-attributed" definition, and not
"L-attributed"), handling the `lookahead ∉ {{, function}` could be even
easier: on matching the `{` from the statement position, one can track the
parser state that the block should be parsed, and not an object literal.
This probably can be solved with lexer state too -- to avoid again a bunch
of extra "No<lookahead>" productions. And such cases are simpler in manual
recursive decent.

[1] Example lang based on subset of ECMAScript grammar:
https://github.com/DmitrySoshnikov/syntax/blob/master/examples/lang.bnf

Dmitry

>     Waldemar
>
>
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170123/0937e6b9/attachment-0001.html>

# Michael Dyck (9 years ago)

On 17-01-23 08:24 PM, Waldemar Horwat wrote:

On 01/11/2017 10:28, Michael Dyck wrote:

If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <! terminals]) fit into the LR approach. It seems like there are three possibilities:

modify the grammar (before the parser is constructed),

modify the construction of the parser, and/or

modify the operation of the parser. It's not clear to me that any of these (or any combination of these) will work.

Has anyone algorithmically generated an LR parser from the spec grammar? If so, how did you you handle lookahead-restrictions?

I have an implementation of the parser with a few extensions to the LR grammar, including support for ... lookahead restrictions, ...

Great! So how do you handle lookahead restrictions?

On 17-01-23 08:24 PM, Waldemar Horwat wrote:
> On 01/11/2017 10:28, Michael Dyck wrote:
>>
>> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
>> terminals]) fit into the LR approach. It seems like there are three
>> possibilities:
>>   - modify the grammar (before the parser is constructed),
>>   - modify the construction of the parser, and/or
>>   - modify the operation of the parser.
>> It's not clear to me that any of these (or any combination of these) will
>> work.
>>
>> Has anyone algorithmically generated an LR parser from the spec grammar?
>> If so, how did you you handle lookahead-restrictions?
>
> I have an implementation of the parser with a few extensions to the LR
> grammar, including support for ... lookahead  restrictions, ...

Great! So how do you handle lookahead restrictions?

-Michael

# Dmitry Soshnikov (9 years ago)

On Tue, Jan 24, 2017 at 8:23 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:

On 17-01-23 08:24 PM, Waldemar Horwat wrote:

On 01/11/2017 10:28, Michael Dyck wrote:

If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <! terminals]) fit into the LR approach. It seems like there are three possibilities:

modify the grammar (before the parser is constructed),

modify the construction of the parser, and/or

modify the operation of the parser. It's not clear to me that any of these (or any combination of these) will work.

Has anyone algorithmically generated an LR parser from the spec grammar? If so, how did you you handle lookahead-restrictions?

I have an implementation of the parser with a few extensions to the LR grammar, including support for ... lookahead restrictions, ...

Great! So how do you handle lookahead restrictions?

As mentioned above, there are two ways to handle lookahead restrictions in LR-parser:

Using "No<Lookahead>" productions. This unfortunately may double number

of productions in some sub-grammar. E.g. to handle Block vs. ObjectLiteral in ECMAScript, one needs to double all the expression productions to correctly parse ExpressionStatement. As an example in the spec itself, take a look e.g. at RelationalExpression and RelationalExpressionNoIn, and how it repeats (doubles) all the productions but with "NoIn" suffix es5.github.io/#x11.8

Another approach is to use Cover grammar. It's when you parse a node with two possible ways, and enforce (reinterpret) needed sub-nodes in the cover-grammar post-processor (after having parsed a generic node). E.g. I chose this approach to parse generically ObjectLiteral, and Block to handle { lookahead restriction for ExpressionStatement. This approach allows not to double grammar (hence making parsing table smaller), however adds some post-processing overhead. DmitrySoshnikov/syntax/blob/master/examples/lang.bnf#L206-L222

P.S.: there is actually option #3 -- to use "S-attributed" parser actions, it's when you may execute handling action not only at the end of production (i.e. not only on "reduce"), but after each element on RHS:

statement
options
{
k = 1 ;
}
: { input.LA(1) == LBRACE }? block
| statementTail
;
statementTail
: variableStatement
| emptyStatement
| expressionStatement
        ...

This option is probably the best alternative.

Dmitry

On Tue, Jan 24, 2017 at 8:23 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:

> On 17-01-23 08:24 PM, Waldemar Horwat wrote:
>
>> On 01/11/2017 10:28, Michael Dyck wrote:
>>
>>>
>>> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
>>> terminals]) fit into the LR approach. It seems like there are three
>>> possibilities:
>>>   - modify the grammar (before the parser is constructed),
>>>   - modify the construction of the parser, and/or
>>>   - modify the operation of the parser.
>>> It's not clear to me that any of these (or any combination of these) will
>>> work.
>>>
>>> Has anyone algorithmically generated an LR parser from the spec grammar?
>>> If so, how did you you handle lookahead-restrictions?
>>>
>>
>> I have an implementation of the parser with a few extensions to the LR
>> grammar, including support for ... lookahead  restrictions, ...
>>
>
> Great! So how do you handle lookahead restrictions?
>
>
As mentioned above, there are two ways to handle lookahead restrictions in
LR-parser:

1. Using "No<Lookahead>" productions. This unfortunately may double number
of productions in some sub-grammar. E.g. to handle Block vs. ObjectLiteral
in ECMAScript, one needs to double all the expression productions to
correctly parse `ExpressionStatement`. As an example in the spec itself,
take a look e.g. at `RelationalExpression` and `RelationalExpressionNoIn`,
and how it repeats (doubles) all the productions but with "NoIn" suffix
https://es5.github.io/#x11.8

2. Another approach is to use Cover grammar. It's when you parse a node
with two possible ways, and enforce (reinterpret) needed sub-nodes in the
cover-grammar post-processor (after having parsed a generic node). E.g. I
chose this approach to parse generically `ObjectLiteral`, and `Block` to
handle `{` lookahead restriction for `ExpressionStatement`. This approach
allows not to double grammar (hence making parsing table smaller), however
adds some post-processing overhead.
https://github.com/DmitrySoshnikov/syntax/blob/master/examples/lang.bnf#L206-L222

P.S.: there is actually option #3 -- to use "S-attributed" parser actions,
it's when you may execute handling action not only at the end of production
(i.e. not only on "reduce"), but after each element on RHS:

```
statement
options
{
k = 1 ;
}
: { input.LA(1) == LBRACE }? block
| statementTail
;
statementTail
: variableStatement
| emptyStatement
| expressionStatement
        ...
```

This option is probably the best alternative.

Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170129/f89aadb7/attachment-0001.html>

# Michael Dyck (9 years ago)

On 17-01-30 12:15 AM, Dmitry Soshnikov wrote:

As mentioned above, there are two ways to handle lookahead restrictions in LR-parser:

Using "No<Lookahead>" productions. This unfortunately may double number of productions in some sub-grammar. E.g. to handle Block vs. ObjectLiteral in ECMAScript, one needs to double all the expression productions to correctly parse ExpressionStatement. As an example in the spec itself, take a look e.g. at RelationalExpression and RelationalExpressionNoIn, and how it repeats (doubles) all the productions but with "NoIn" suffix es5.github.io/#x11.8

So, e.g., one could take the production IterationStatement : ... [lookahead != 'let'] LHSExpr ... and transform it into IterationStatement : ... LHSExpr-that-does-not-start-with-let ... and then generate productions for the new nonterminal, e.g. LHSExpr-that-does-not-start-with-let : New-Expression-that-does-not-start-with-let Call-Expression-that-does-not-start-with-let etc, (eventually bottoming out when a RHS of the original grammar will either always or never start with 'let').

The problem with this approach is that, in general, a lookahead-restriction is not a restriction on the nonterminal that immediately follows it in the production, it's a restriction on the next few input elements, which might be generated by more than just that nonterminal. I'm not saying this is insurmountable, but it does seem to get messy fairly quickly.

And it's unclear how this approach would handle a lookahead-restriction at the end of a production.

Another approach is to use Cover grammar.

Is there a way to automatically (a) construct the cover grammar and (b) reconstruct the parse tree as if parsed by the original grammar, or do you code it manually?

P.S.: there is actually option #3 -- to use "S-attributed" parser actions, it's when you may execute handling action not only at the end of production (i.e. not only on "reduce"), but after each element on RHS:

I'm doubtful this would help. I wouldn't expect the LR-parser-generator to 'understand' the content of actions, so it can't use them to alter the generation of the parser, and so the LR automaton will have lots of conflicts (because it can't 'separate' the alternatives that the lookahead-restrictions are there to separate).

On 17-01-30 12:15 AM, Dmitry Soshnikov wrote:
>
> As mentioned above, there are two ways to handle lookahead restrictions in
> LR-parser:
>
> 1. Using "No<Lookahead>" productions. This unfortunately may double number
> of productions in some sub-grammar. E.g. to handle Block vs. ObjectLiteral
> in ECMAScript, one needs to double all the expression productions to
> correctly parse `ExpressionStatement`. As an example in the spec itself,
> take a look e.g. at `RelationalExpression` and `RelationalExpressionNoIn`,
> and how it repeats (doubles) all the productions but with "NoIn" suffix
> https://es5.github.io/#x11.8

So, e.g., one could take the production
     IterationStatement : ...  [lookahead != 'let'] LHSExpr ...
and transform it into
     IterationStatement : ... LHSExpr-that-does-not-start-with-let ...
and then generate productions for the new nonterminal, e.g.
     LHSExpr-that-does-not-start-with-let :
         New-Expression-that-does-not-start-with-let
         Call-Expression-that-does-not-start-with-let
etc, (eventually bottoming out when a RHS of the original grammar will 
either always or never start with 'let').

The problem with this approach is that, in general, a lookahead-restriction 
is not a restriction on the nonterminal that immediately follows it in the 
production, it's a restriction on the next few input elements, which might 
be generated by more than just that nonterminal. I'm not saying this is 
insurmountable, but it does seem to get messy fairly quickly.

And it's unclear how this approach would handle a lookahead-restriction at 
the end of a production.

> 2. Another approach is to use Cover grammar.

Is there a way to automatically (a) construct the cover grammar and (b) 
reconstruct the parse tree as if parsed by the original grammar, or do you 
code it manually?

> P.S.: there is actually option #3 -- to use "S-attributed" parser actions,
> it's when you may execute handling action not only at the end of production
> (i.e. not only on "reduce"), but after each element on RHS:

I'm doubtful this would help. I wouldn't expect the LR-parser-generator to 
'understand' the content of actions, so it can't use them to alter the 
generation of the parser, and so the LR automaton will have lots of 
conflicts (because it can't 'separate' the alternatives that the 
lookahead-restrictions are there to separate).

-Michael

# Dmitry Soshnikov (9 years ago)

On Mon, Jan 30, 2017 at 9:26 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:

On 17-01-30 12:15 AM, Dmitry Soshnikov wrote:

As mentioned above, there are two ways to handle lookahead restrictions in LR-parser:

Using "No<Lookahead>" productions. This unfortunately may double number of productions in some sub-grammar. E.g. to handle Block vs. ObjectLiteral in ECMAScript, one needs to double all the expression productions to correctly parse ExpressionStatement. As an example in the spec itself, take a look e.g. at RelationalExpression and RelationalExpressionNoIn, and how it repeats (doubles) all the productions but with "NoIn" suffix es5.github.io/#x11.8

So, e.g., one could take the production IterationStatement : ... [lookahead != 'let'] LHSExpr ... and transform it into IterationStatement : ... LHSExpr-that-does-not-start-with-let ... and then generate productions for the new nonterminal, e.g. LHSExpr-that-does-not-start-with-let : New-Expression-that-does-not-start-with-let Call-Expression-that-does-not-start-with-let etc, (eventually bottoming out when a RHS of the original grammar will either always or never start with 'let').

Correct, and this is how ES spec itself handles all those "No<lookahead>"

duplicated productions.

The problem with this approach is that, in general, a lookahead-restriction is not a restriction on the nonterminal that immediately follows it in the production, it's a restriction on the next few input elements, which might be generated by more than just that nonterminal.

Well, in this case it's not a "lookahead" anymore. You could have k=2,3,...N for lookahead in parser generators, but it's not like parse the whole non-terminal, and then see a lookahead after it. This "parse whole non-terminal till the end" might be parse the whole program. In this case a cover grammar is simpler, and that's why it's used starting since ES6 spec. Take a look e.g. at CoverParenthesizedExpressionAndArrowParameterList from ES2015.

A simplified example: DmitrySoshnikov/syntax/blob/master/examples/lang.bnf#L514-L524

By just looking at:

(x, y)

you can't tell whether it's a grouping operator (SequenceExpression node), or a beginning of a lambda function (i.e. its parameters node). So in the generic ParenthisizedExpression production, you reinterpret it either to lambda parameters, or keep just a grouping operator, if there is no LambdaTail. Here it's a function of course, though for this you have to parse (x, y) not as LambdaParameters, and not as SequenceExpression, but as a generic ParenthisizedExpression, and using Cover grammar:

(x, y) -> { return x + y; }

I'm not saying this is insurmountable, but it does seem to get messy fairly quickly.

And it's unclear how this approach would handle a lookahead-restriction at the end of a production.

Yes, it adds a bunch of duplicated productions to the grammar, which on practice may increases number of parsing states simnifically. As a real practical example: ES5 spec in LALR(1) mode would increase from ~350 states to 470.

Another approach is to use Cover grammar.

Is there a way to automatically (a) construct the cover grammar and (b) reconstruct the parse tree as if parsed by the original grammar, or do you code it manually?

Usually it's a manual process of designing a grammar, and defining post-processing static semantics. Take a look again in the example language I gave above, it have couple of case of cover grammar.

P.S.: there is actually option #3 -- to use "S-attributed" parser actions,

it's when you may execute handling action not only at the end of production (i.e. not only on "reduce"), but after each element on RHS:

I'm doubtful this would help. I wouldn't expect the LR-parser-generator to 'understand' the content of actions, so it can't use them to alter the generation of the parser, and so the LR automaton will have lots of conflicts (because it can't 'separate' the alternatives that the lookahead-restrictions are there to separate).

Oh, it'll work for sure, depending on how powerful a parser generator is. Yes, technically it'll be a "reduce-reduce" conflict in the table, but parser will never enter this state based on the semantic action executed before entering some non-terminal (not after). In this "before" action you can communicated to lexer, and check the lookahead.

Dmitry

On Mon, Jan 30, 2017 at 9:26 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:

> On 17-01-30 12:15 AM, Dmitry Soshnikov wrote:
>
>>
>> As mentioned above, there are two ways to handle lookahead restrictions in
>> LR-parser:
>>
>> 1. Using "No<Lookahead>" productions. This unfortunately may double number
>> of productions in some sub-grammar. E.g. to handle Block vs. ObjectLiteral
>> in ECMAScript, one needs to double all the expression productions to
>> correctly parse `ExpressionStatement`. As an example in the spec itself,
>> take a look e.g. at `RelationalExpression` and `RelationalExpressionNoIn`,
>> and how it repeats (doubles) all the productions but with "NoIn" suffix
>> https://es5.github.io/#x11.8
>>
>
> So, e.g., one could take the production
>     IterationStatement : ...  [lookahead != 'let'] LHSExpr ...
> and transform it into
>     IterationStatement : ... LHSExpr-that-does-not-start-with-let ...
> and then generate productions for the new nonterminal, e.g.
>     LHSExpr-that-does-not-start-with-let :
>         New-Expression-that-does-not-start-with-let
>         Call-Expression-that-does-not-start-with-let
> etc, (eventually bottoming out when a RHS of the original grammar will
> either always or never start with 'let').
>
>

Correct, and this is how ES spec itself handles all those "No<lookahead>"
duplicated productions.



> The problem with this approach is that, in general, a
> lookahead-restriction is not a restriction on the nonterminal that
> immediately follows it in the production, it's a restriction on the next
> few input elements, which might be generated by more than just that
> nonterminal.


Well, in this case it's not a "lookahead" anymore. You could have
k=2,3,...N for lookahead in parser generators, but it's not like parse the
whole non-terminal, and then see a lookahead after it. This "parse whole
non-terminal till the end" might be parse the whole program. In this case a
cover grammar is simpler, and that's why it's used starting since ES6 spec.
Take a look e.g. at `CoverParenthesizedExpressionAndArrowParameterList`
from ES2015.

A simplified example:
https://github.com/DmitrySoshnikov/syntax/blob/master/examples/lang.bnf#L514-L524

By just looking at:

```
(x, y)
```

you can't tell whether it's a grouping operator (`SequenceExpression`
node), or a beginning of a lambda function (i.e. its parameters node). So
in the generic `ParenthisizedExpression` production, you reinterpret it
either to lambda parameters, or keep just a grouping operator, if there is
no `LambdaTail`. Here it's a function of course, though for this you have
to parse `(x, y)` not as `LambdaParameters`, and not as
`SequenceExpression`, but as a generic `ParenthisizedExpression`, and using
Cover grammar:

```
(x, y) -> { return x + y; }
```



> I'm not saying this is insurmountable, but it does seem to get messy
> fairly quickly.
>
> And it's unclear how this approach would handle a lookahead-restriction at
> the end of a production.


Yes, it adds a bunch of duplicated productions to the grammar, which on
practice may increases number of parsing states simnifically. As a real
practical example: ES5 spec in LALR(1) mode would increase from ~350 states
to 470.


>
>
>
> 2. Another approach is to use Cover grammar.
>>
>
> Is there a way to automatically (a) construct the cover grammar and (b)
> reconstruct the parse tree as if parsed by the original grammar, or do you
> code it manually?
>
>
Usually it's a manual process of designing a grammar, and defining
post-processing static semantics. Take a look again in the example language
I gave above, it have couple of case of cover grammar.


>
> P.S.: there is actually option #3 -- to use "S-attributed" parser actions,
>> it's when you may execute handling action not only at the end of
>> production
>> (i.e. not only on "reduce"), but after each element on RHS:
>>
>
> I'm doubtful this would help. I wouldn't expect the LR-parser-generator to
> 'understand' the content of actions, so it can't use them to alter the
> generation of the parser, and so the LR automaton will have lots of
> conflicts (because it can't 'separate' the alternatives that the
> lookahead-restrictions are there to separate).
>
>
Oh, it'll work for sure, depending on how powerful a parser generator is.
Yes, technically it'll be a "reduce-reduce" conflict in the table, but
parser will never enter this state based on the semantic action executed
*before* entering some non-terminal (not after). In this "before" action
you can communicated to lexer, and check the lookahead.

Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170130/ed464968/attachment.html>

# Michael Dyck (9 years ago)

On 17-01-30 03:00 PM, Dmitry Soshnikov wrote:

On Mon, Jan 30, 2017 at 9:26 AM, Michael Dyck <jmdyck at ibiblio.org <mailto:jmdyck at ibiblio.org>> wrote:

On 17-01-30 12:15 AM, Dmitry Soshnikov wrote:


    As mentioned above, there are two ways to handle lookahead
    restrictions in
    LR-parser:

    1. Using "No<Lookahead>" productions.

So, e.g., one could take the production
    IterationStatement : ...  [lookahead != 'let'] LHSExpr ...
and transform it into
    IterationStatement : ... LHSExpr-that-does-not-start-with-let ...
and then generate productions for the new nonterminal, e.g.
    LHSExpr-that-does-not-start-with-let :
        New-Expression-that-does-not-start-with-let
        Call-Expression-that-does-not-start-with-let
etc, (eventually bottoming out when a RHS of the original grammar will
either always or never start with 'let').

Correct, and this is how ES spec itself handles all those "No<lookahead>" duplicated productions.

If you're talking about the 'NoIn' productions in ES5, that seems only marginally relevant, since they don't address a lookahead problem (i.e., a problem that could have been solved via a lookahead-restriction).

The problem with this approach is that, in general, a
lookahead-restriction is not a restriction on the nonterminal that
immediately follows it in the production, it's a restriction on the next
few input elements, which might be generated by more than just that
nonterminal.

Well, in this case it's not a "lookahead" anymore.

It isn't a "lookahead" in the resulting grammar, but it is in the original, and the two grammars should define the same language. What I'm saying is that it's incorrect to assume that the effect of the lookahead (in the original) can be confined to the nonterminal that immediately follows, which is what the suggested transformation does.

You could have k=2,3,...N for lookahead in parser generators, but it's not like parse the whole non-terminal, and then see a lookahead after it. This "parse whole non-terminal till the end" might be parse the whole program.

I don't really follow what you're suggesting there.

In this case a cover grammar is simpler, and that's why it's used starting since ES6 spec.

But note that none of the ES5 lookahead-restrictions was replaced with a cover grammar in ES6. They're all still lookahead-restrictions.

I'm not saying this is insurmountable, but it does seem to get messy
fairly quickly.

And it's unclear how this approach would handle a lookahead-restriction
at the end of a production.
Yes, it adds a bunch of duplicated productions to the grammar, which on practice may increases number of parsing states simnifically. As a real practical example: ES5 spec in LALR(1) mode would increase from ~350 states to 470.

By "messy", I wasn't talking about the duplicated productions, but rather the process of generating them.

    2. Another approach is to use Cover grammar.


Is there a way to automatically (a) construct the cover grammar and (b)
reconstruct the parse tree as if parsed by the original grammar, or do
you code it manually?
Usually it's a manual process of designing a grammar, and defining post-processing static semantics.

Okay, I'm not really interested then. I'm looking for an algorithm (i.e., a modification of the LR algorithm) that starts with a grammar-with-lookahead-restrictions and ends with an LR parser.

    P.S.: there is actually option #3 -- to use "S-attributed" parser
    actions,
    it's when you may execute handling action not only at the end of
    production
    (i.e. not only on "reduce"), but after each element on RHS:


I'm doubtful this would help. I wouldn't expect the LR-parser-generator
to 'understand' the content of actions, so it can't use them to alter
the generation of the parser, and so the LR automaton will have lots of
conflicts (because it can't 'separate' the alternatives that the
lookahead-restrictions are there to separate).

Oh, it'll work for sure, depending on how powerful a parser generator is. Yes, technically it'll be a "reduce-reduce" conflict in the table,

So when your generated parser has conflicts, is it easy to tell which you don't have to worry about and which you do?

but parser will never enter this state based on the semantic action executed before entering some non-terminal (not after). In this "before" action you can communicated to lexer, and check the lookahead.

It seems to me that the parser will enter such a state, but always via only one of the conflicting 'paths' (the other(s) being prevented at parse-time by the semantic actions). So only one of the conflicting actions will be valid, but you've still got to figure out which one that is.

On 17-01-30 03:00 PM, Dmitry Soshnikov wrote:
>
> On Mon, Jan 30, 2017 at 9:26 AM, Michael Dyck <jmdyck at ibiblio.org
> <mailto:jmdyck at ibiblio.org>> wrote:
>
>     On 17-01-30 12:15 AM, Dmitry Soshnikov wrote:
>
>
>         As mentioned above, there are two ways to handle lookahead
>         restrictions in
>         LR-parser:
>
>         1. Using "No<Lookahead>" productions.
>
>     So, e.g., one could take the production
>         IterationStatement : ...  [lookahead != 'let'] LHSExpr ...
>     and transform it into
>         IterationStatement : ... LHSExpr-that-does-not-start-with-let ...
>     and then generate productions for the new nonterminal, e.g.
>         LHSExpr-that-does-not-start-with-let :
>             New-Expression-that-does-not-start-with-let
>             Call-Expression-that-does-not-start-with-let
>     etc, (eventually bottoming out when a RHS of the original grammar will
>     either always or never start with 'let').
>
> Correct, and this is how ES spec itself handles all those "No<lookahead>"
> duplicated productions.

If you're talking about the 'NoIn' productions in ES5, that seems only 
marginally relevant, since they don't address a lookahead problem (i.e., a 
problem that could have been solved via a lookahead-restriction).

>     The problem with this approach is that, in general, a
>     lookahead-restriction is not a restriction on the nonterminal that
>     immediately follows it in the production, it's a restriction on the next
>     few input elements, which might be generated by more than just that
>     nonterminal.
>
>
> Well, in this case it's not a "lookahead" anymore.

It isn't a "lookahead" in the resulting grammar, but it is in the original, 
and the two grammars should define the same language. What I'm saying is 
that it's incorrect to assume that the effect of the lookahead (in the 
original) can be confined to the nonterminal that immediately follows, which 
is what the suggested transformation does.

> You could have k=2,3,...N for lookahead in parser generators, but it's
> not like parse the whole non-terminal, and then see a lookahead after it.
> This "parse whole non-terminal till the end" might be parse the whole
> program.

I don't really follow what you're suggesting there.


> In this case a cover grammar is simpler, and that's why it's used
> starting since ES6 spec.

But note that none of the ES5 lookahead-restrictions was replaced with a 
cover grammar in ES6. They're all still lookahead-restrictions.

>
>     I'm not saying this is insurmountable, but it does seem to get messy
>     fairly quickly.
>
>     And it's unclear how this approach would handle a lookahead-restriction
>     at the end of a production.
>
>
> Yes, it adds a bunch of duplicated productions to the grammar, which on
> practice may increases number of parsing states simnifically. As a real
> practical example: ES5 spec in LALR(1) mode would increase from ~350 states
> to 470.

By "messy", I wasn't talking about the duplicated productions, but rather 
the process of generating them.


>         2. Another approach is to use Cover grammar.
>
>
>     Is there a way to automatically (a) construct the cover grammar and (b)
>     reconstruct the parse tree as if parsed by the original grammar, or do
>     you code it manually?
>
>
> Usually it's a manual process of designing a grammar, and defining
> post-processing static semantics.

Okay, I'm not really interested then. I'm looking for an algorithm (i.e., a 
modification of the LR algorithm) that starts with a 
grammar-with-lookahead-restrictions and ends with an LR parser.


>         P.S.: there is actually option #3 -- to use "S-attributed" parser
>         actions,
>         it's when you may execute handling action not only at the end of
>         production
>         (i.e. not only on "reduce"), but after each element on RHS:
>
>
>     I'm doubtful this would help. I wouldn't expect the LR-parser-generator
>     to 'understand' the content of actions, so it can't use them to alter
>     the generation of the parser, and so the LR automaton will have lots of
>     conflicts (because it can't 'separate' the alternatives that the
>     lookahead-restrictions are there to separate).
>
>
> Oh, it'll work for sure, depending on how powerful a parser generator is.
> Yes, technically it'll be a "reduce-reduce" conflict in the table,

So when your generated parser has conflicts, is it easy to tell which you 
don't have to worry about and which you do?

> but parser will never enter this state based on the semantic action executed
> *before* entering some non-terminal (not after). In this "before" action you
> can communicated to lexer, and check the lookahead.

It seems to me that the parser *will* enter such a state, but always via 
only one of the conflicting 'paths' (the other(s) being prevented at 
parse-time by the semantic actions). So only one of the conflicting actions 
will be valid, but you've still got to figure out which one that is.

-Michael

# Dmitry Soshnikov (9 years ago)

On Mon, Jan 30, 2017 at 2:22 PM, Michael Dyck <jmdyck at ibiblio.org> wrote:

On 17-01-30 03:00 PM, Dmitry Soshnikov wrote:
On Mon, Jan 30, 2017 at 9:26 AM, Michael Dyck <jmdyck at ibiblio.org <mailto:jmdyck at ibiblio.org>> wrote:
On 17-01-30 12:15 AM, Dmitry Soshnikov wrote:


    As mentioned above, there are two ways to handle lookahead
    restrictions in
    LR-parser:

    1. Using "No<Lookahead>" productions.

So, e.g., one could take the production
    IterationStatement : ...  [lookahead != 'let'] LHSExpr ...
and transform it into
    IterationStatement : ... LHSExpr-that-does-not-start-with-let ...
and then generate productions for the new nonterminal, e.g.
    LHSExpr-that-does-not-start-with-let :
        New-Expression-that-does-not-start-with-let
        Call-Expression-that-does-not-start-with-let
etc, (eventually bottoming out when a RHS of the original grammar will
either always or never start with 'let').
Correct, and this is how ES spec itself handles all those "No<lookahead>" duplicated productions.
If you're talking about the 'NoIn' productions in ES5, that seems only marginally relevant, since they don't address a lookahead problem (i.e., a problem that could have been solved via a lookahead-restriction).
The problem with this approach is that, in general, a
lookahead-restriction is not a restriction on the nonterminal that
immediately follows it in the production, it's a restriction on the
next few input elements, which might be generated by more than just that nonterminal.

Well, in this case it's not a "lookahead" anymore.
It isn't a "lookahead" in the resulting grammar, but it is in the original, and the two grammars should define the same language. What I'm saying is that it's incorrect to assume that the effect of the lookahead (in the original) can be confined to the nonterminal that immediately follows, which is what the suggested transformation does.

A "lookahead" is a terminal token. A non-terminal is not a "lookahead".

To reiterate, imaging you have:

• (x, y)

and

• (x, y) -> { return x + y; }

where • is a cursor position. Your lookaheads are "(", "x", "," "y", etc. Usually 1 lookahead token is enough to route the parser, in this case "(".

What you're talking that you need not these lookaheads (at the beginning of the cursor), but after the parsed non-terminal (i.e. after fully parsed (x, y)), i.e. you're looking for "->", which goes at the end of the

non-terminal. This is not possible, since it's not a lookahead at that specific cursor position. It will be a lookahead after you have parsed the non-terminal corresponding to "(x, y)".

So you can't have two non-terminals for this, since it'll be a "reduce/reduce" conflict, and you need a cover grammar here.

You could have k=2,3,...N for lookahead in parser generators, but it's

not like parse the whole non-terminal, and then see a lookahead after it. This "parse whole non-terminal till the end" might be parse the whole program.

I don't really follow what you're suggesting there.

Like mentioned above:

• (x, y) -> { return x + y; }

In order to determine whether it's a grouping operator, or arrow function params, it's not enough analyzing a "lookahead" being that the cursor position. You need to actually fully parse this non-terminal, and advance to this position:

(x, y) • -> { return x + y; }

and from there you may already analyze a real lookahead. However, the problem is the "(x, y)" might the full program (imagine million of entries on several GBs file source code -- what kind of "lookahead" can you analyze here? :))

In this case a cover grammar is simpler, and that's why it's used

starting since ES6 spec.

But note that none of the ES5 lookahead-restrictions was replaced with a cover grammar in ES6. They're all still lookahead-restrictions.

Sure, because it's normally possible to do with the "No<Lookahead>" doubled

productions. So this actually answers your initial question (it's a real practice).

I'm not saying this is insurmountable, but it does seem to get messy
fairly quickly.

And it's unclear how this approach would handle a
lookahead-restriction at the end of a production.

Yes, it adds a bunch of duplicated productions to the grammar, which on practice may increases number of parsing states simnifically. As a real practical example: ES5 spec in LALR(1) mode would increase from ~350 states to 470.
By "messy", I wasn't talking about the duplicated productions, but rather the process of generating them.
    2. Another approach is to use Cover grammar.
Is there a way to automatically (a) construct the cover grammar and
(b) reconstruct the parse tree as if parsed by the original grammar, or do you code it manually?

Usually it's a manual process of designing a grammar, and defining post-processing static semantics.
Okay, I'm not really interested then. I'm looking for an algorithm (i.e., a modification of the LR algorithm) that starts with a grammar-with-lookahead-restrictions and ends with an LR parser.

OK, yeah, at this point I think we already clarified that it is possible to parse ES grammar, and handle it's lookahead restrictions with several techniques described. If you're interested in generic LR parsing, feel free to contact me (we can omit generic off-topic discussions on this list).

Dmitry

On Mon, Jan 30, 2017 at 2:22 PM, Michael Dyck <jmdyck at ibiblio.org> wrote:

> On 17-01-30 03:00 PM, Dmitry Soshnikov wrote:
>
>>
>> On Mon, Jan 30, 2017 at 9:26 AM, Michael Dyck <jmdyck at ibiblio.org
>> <mailto:jmdyck at ibiblio.org>> wrote:
>>
>>     On 17-01-30 12:15 AM, Dmitry Soshnikov wrote:
>>
>>
>>         As mentioned above, there are two ways to handle lookahead
>>         restrictions in
>>         LR-parser:
>>
>>         1. Using "No<Lookahead>" productions.
>>
>>     So, e.g., one could take the production
>>         IterationStatement : ...  [lookahead != 'let'] LHSExpr ...
>>     and transform it into
>>         IterationStatement : ... LHSExpr-that-does-not-start-with-let ...
>>     and then generate productions for the new nonterminal, e.g.
>>         LHSExpr-that-does-not-start-with-let :
>>             New-Expression-that-does-not-start-with-let
>>             Call-Expression-that-does-not-start-with-let
>>     etc, (eventually bottoming out when a RHS of the original grammar will
>>     either always or never start with 'let').
>>
>> Correct, and this is how ES spec itself handles all those "No<lookahead>"
>> duplicated productions.
>>
>
> If you're talking about the 'NoIn' productions in ES5, that seems only
> marginally relevant, since they don't address a lookahead problem (i.e., a
> problem that could have been solved via a lookahead-restriction).
>
>     The problem with this approach is that, in general, a
>>     lookahead-restriction is not a restriction on the nonterminal that
>>     immediately follows it in the production, it's a restriction on the
>> next
>>     few input elements, which might be generated by more than just that
>>     nonterminal.
>>
>>
>> Well, in this case it's not a "lookahead" anymore.
>>
>
> It isn't a "lookahead" in the resulting grammar, but it is in the
> original, and the two grammars should define the same language. What I'm
> saying is that it's incorrect to assume that the effect of the lookahead
> (in the original) can be confined to the nonterminal that immediately
> follows, which is what the suggested transformation does.

A "lookahead" is a terminal token. A non-terminal is not a "lookahead".

To reiterate, imaging you have:

```
• (x, y)
```

and

```
• (x, y) -> { return x + y; }
```

where `•` is a cursor position. Your lookaheads are "(", "x", "," "y", etc.
Usually 1 lookahead token is enough to route the parser, in this case "(".

What you're talking that you need not these lookaheads (at the beginning of
the cursor), but *after* the parsed non-terminal (i.e. after fully parsed
`(x, y)`), i.e. you're looking for "->", which goes at the end of the
non-terminal. This is not possible, since it's not a lookahead at that
specific cursor position. It will be a lookahead after you have parsed the
non-terminal corresponding to "(x, y)".

So you can't have two non-terminals for this, since it'll be a
"reduce/reduce" conflict, and you need a cover grammar here.

>
>
> You could have k=2,3,...N for lookahead in parser generators, but it's
>> not like parse the whole non-terminal, and then see a lookahead after it.
>> This "parse whole non-terminal till the end" might be parse the whole
>> program.
>>
>
> I don't really follow what you're suggesting there.

Like mentioned above:

```
• (x, y) -> { return x + y; }
```

In order to determine whether it's a grouping operator, or arrow function
params, it's not enough analyzing a "lookahead" being that the cursor
position. You need to actually fully parse this non-terminal, and advance
to this position:

```
(x, y) • -> { return x + y; }
```

and from there you may already analyze a real lookahead. However, the
problem is the "(x, y)" might the full program (imagine million of entries
on several GBs file source code -- what kind of "lookahead" can you analyze
here? :))

>
>
>
> In this case a cover grammar is simpler, and that's why it's used
>> starting since ES6 spec.
>>
>
> But note that none of the ES5 lookahead-restrictions was replaced with a
> cover grammar in ES6. They're all still lookahead-restrictions.

Sure, because it's normally possible to do with the "No<Lookahead>" doubled
productions. So this actually answers your initial question (it's a real
practice).

>
>
>
>>     I'm not saying this is insurmountable, but it does seem to get messy
>>     fairly quickly.
>>
>>     And it's unclear how this approach would handle a
>> lookahead-restriction
>>     at the end of a production.
>>
>>
>> Yes, it adds a bunch of duplicated productions to the grammar, which on
>> practice may increases number of parsing states simnifically. As a real
>> practical example: ES5 spec in LALR(1) mode would increase from ~350
>> states
>> to 470.
>>
>
> By "messy", I wasn't talking about the duplicated productions, but rather
> the process of generating them.
>
>
>         2. Another approach is to use Cover grammar.
>>
>>
>>     Is there a way to automatically (a) construct the cover grammar and
>> (b)
>>     reconstruct the parse tree as if parsed by the original grammar, or do
>>     you code it manually?
>>
>>
>> Usually it's a manual process of designing a grammar, and defining
>> post-processing static semantics.
>>
>
> Okay, I'm not really interested then. I'm looking for an algorithm (i.e.,
> a modification of the LR algorithm) that starts with a
> grammar-with-lookahead-restrictions and ends with an LR parser.

OK, yeah, at this point I think we already clarified that it is possible to
parse ES grammar, and handle it's lookahead restrictions with several
techniques described. If you're interested in generic LR parsing, feel free
to contact me (we can omit generic off-topic discussions on this list).

Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170130/f069c4c6/attachment.html>

# Michael Dyck (9 years ago)

On 17-01-30 06:20 PM, Dmitry Soshnikov wrote:

On Mon, Jan 30, 2017 at 2:22 PM, Michael Dyck <jmdyck at ibiblio.org <mailto:jmdyck at ibiblio.org>> wrote:

            1. Using "No<Lookahead>" productions.

        The problem with this approach is that, in general, a
        lookahead-restriction is not a restriction on the nonterminal that
        immediately follows it in the production, it's a restriction on
        the next few input elements, which might be generated by more
        than just that nonterminal.

    Well, in this case it's not a "lookahead" anymore.

It isn't a "lookahead" in the resulting grammar, but it is in the
original, and the two grammars should define the same language. What I'm
saying is that it's incorrect to assume that the effect of the lookahead
(in the original) can be confined to the nonterminal that immediately
follows, which is what the suggested transformation does.

A "lookahead" is a terminal token. A non-terminal is not a "lookahead".

Okay, so then in my previous statement, read "lookahead" as "lookahead-restriction". (But I did say "lookahead-restriction" in the statement before that.)

To reiterate, imaging you have:
• (x, y)
and
• (x, y) -> { return x + y; }
where • is a cursor position. Your lookaheads are "(", "x", "," "y", etc. Usually 1 lookahead token is enough to route the parser, in this case "(".

What you're talking that you need not these lookaheads (at the beginning of the cursor), but after the parsed non-terminal (i.e. after fully parsed (x, y)), i.e. you're looking for "->", which goes at the end of the non-terminal.

I'm not saying I don't need the lookahead tokens right after the cursor. But I am saying that the lookahead tokens that the hypothetical parser needs to examine (in order to enforce a lookahead-restriction), even if it's only 1 or 2, might extend past the 'next' nonterminal.

But that's an awkward way to express my objection.

To recap, I'm talking about the suggestion to replace: [lookahead != <some sequence of terminals>] Nonterminal

in a production in the original grammar with Nonterminal-that-does-not-start-with-that-sequence (and a bunch of new productions) in the transformed grammar, and whether that's a valid transformation.

I'm saying it isn't valid when the Nonterminal can derive phrases with fewer terminals than the prohibited lookahead-sequence. (E.g., if Nonterminal is nullable, or if it can derive a phrase of length 1 when the prohibited lookahead-sequence is length 2.)

This is not possible, since it's not a lookahead at that specific cursor position. It will be a lookahead after you have parsed the non-terminal corresponding to "(x, y)".

So you can't have two non-terminals for this, since it'll be a "reduce/reduce" conflict, and you need a cover grammar here.

Well, I'm saying you'd run into problems before you even tried to construct the LR automaton, but I think you're agreeing with me that this approach is not generally applicable?

    In this case a cover grammar is simpler, and that's why it's used
    starting since ES6 spec.

But note that none of the ES5 lookahead-restrictions was replaced with a
cover grammar in ES6. They're all still lookahead-restrictions.

Sure, because it's normally possible to do with the "No<Lookahead>" doubled productions.

But they weren't replaced with "No" productions (or grammatical parameters) either! All the lookahead-restrictions in ES5 are still lookahead-restrictions in the current draft.

So this actually answers your initial question (it's a real practice).

No, I don't think it does.

            2. Another approach is to use Cover grammar.

    Usually it's a manual process of designing a grammar, and defining
    post-processing static semantics.

Okay, I'm not really interested then. I'm looking for an algorithm
(i.e., a modification of the LR algorithm) that starts with a
grammar-with-lookahead-restrictions and ends with an LR parser.
OK, yeah, at this point I think we already clarified that it is possible to parse ES grammar, and handle it's lookahead restrictions with several techniques described.

Well, ES implementations exist, so clearly there are techniques for handling those aspects of the ES grammar. But that doesn't mean that any of them are parsing according to the ES grammar. E.g., it's unlikely that they're finding exactly the parse tree indicated by the ES grammar. (Nor is there any requirement that they do so.)

By way of contrast, consider "opt" subscripts and grammatical parameters. If you look up the LR parser-construction algorithm, it doesn't talk about how to handle such things, but the spec says how to transform them into plain context-free productions, which the LR algorithm can handle. So it's an easy shorthand to say that the ES grammar is LR(1) iff the 'expanded' grammar is LR(1).

But the spec doesn't give that kind of CFG-equivalence for lookahead-restrictions. Until there's agreement about what they mean in a CFG or LR sense, it isn't really meaningful to ask if the ES grammar is LR(1).

On 17-01-30 06:20 PM, Dmitry Soshnikov wrote:
> On Mon, Jan 30, 2017 at 2:22 PM, Michael Dyck <jmdyck at ibiblio.org
> <mailto:jmdyck at ibiblio.org>> wrote:
>
>
>                 1. Using "No<Lookahead>" productions.
>
>             The problem with this approach is that, in general, a
>             lookahead-restriction is not a restriction on the nonterminal that
>             immediately follows it in the production, it's a restriction on
>             the next few input elements, which might be generated by more
>             than just that nonterminal.
>
>         Well, in this case it's not a "lookahead" anymore.
>
>     It isn't a "lookahead" in the resulting grammar, but it is in the
>     original, and the two grammars should define the same language. What I'm
>     saying is that it's incorrect to assume that the effect of the lookahead
>     (in the original) can be confined to the nonterminal that immediately
>     follows, which is what the suggested transformation does.
>
>
> A "lookahead" is a terminal token. A non-terminal is not a "lookahead".

Okay, so then in my previous statement, read "lookahead" as 
"lookahead-restriction". (But I did say "lookahead-restriction" in the 
statement before that.)

> To reiterate, imaging you have:
>
> ```
> • (x, y)
> ```
>
> and
>
> ```
> • (x, y) -> { return x + y; }
> ```
>
> where `•` is a cursor position. Your lookaheads are "(", "x", "," "y", etc.
> Usually 1 lookahead token is enough to route the parser, in this case "(".
>
> What you're talking that you need not these lookaheads (at the beginning of
> the cursor), but *after* the parsed non-terminal (i.e. after fully parsed
> `(x, y)`), i.e. you're looking for "->", which goes at the end of the
> non-terminal.

I'm not saying I don't need the lookahead tokens right after the cursor. But 
I am saying that the lookahead tokens that the hypothetical parser needs to 
examine (in order to enforce a lookahead-restriction), even if it's only 1 
or 2, might extend past the 'next' nonterminal.

But that's an awkward way to express my objection.

To recap, I'm talking about the suggestion to replace:
     [lookahead != <some sequence of terminals>] Nonterminal
in a production in the original grammar with
     Nonterminal-that-does-not-start-with-that-sequence
(and a bunch of new productions) in the transformed grammar, and whether 
that's a valid transformation.

I'm saying it isn't valid when the Nonterminal can derive phrases with fewer 
terminals than the prohibited lookahead-sequence. (E.g., if Nonterminal is 
nullable, or if it can derive a phrase of length 1 when the prohibited 
lookahead-sequence is length 2.)

> This is not possible, since it's not a lookahead at that
> specific cursor position. It will be a lookahead after you have parsed the
> non-terminal corresponding to "(x, y)".
>
> So you can't have two non-terminals for this, since it'll be a
> "reduce/reduce" conflict, and you need a cover grammar here.

Well, I'm saying you'd run into problems before you even tried to construct 
the LR automaton, but I think you're agreeing with me that this approach is 
not generally applicable?

>         In this case a cover grammar is simpler, and that's why it's used
>         starting since ES6 spec.
>
>     But note that none of the ES5 lookahead-restrictions was replaced with a
>     cover grammar in ES6. They're all still lookahead-restrictions.
>
> Sure, because it's normally possible to do with the "No<Lookahead>" doubled
> productions.

But they weren't replaced with "No" productions (or grammatical parameters) 
either! All the lookahead-restrictions in ES5 are still 
lookahead-restrictions in the current draft.

> So this actually answers your initial question (it's a real practice).

No, I don't think it does.

----------------------------

>                 2. Another approach is to use Cover grammar.
>
>         Usually it's a manual process of designing a grammar, and defining
>         post-processing static semantics.
>
>     Okay, I'm not really interested then. I'm looking for an algorithm
>     (i.e., a modification of the LR algorithm) that starts with a
>     grammar-with-lookahead-restrictions and ends with an LR parser.
>
> OK, yeah, at this point I think we already clarified that it is possible to
> parse ES grammar,  and handle it's lookahead restrictions with several
> techniques described.

Well, ES implementations exist, so clearly there are techniques for handling 
those aspects of the ES grammar. But that doesn't mean that any of them are 
parsing *according to* the ES grammar. E.g., it's unlikely that they're 
finding exactly the parse tree indicated by the ES grammar. (Nor is there 
any requirement that they do so.)

----

By way of contrast, consider "opt" subscripts and grammatical parameters. If 
you look up the LR parser-construction algorithm, it doesn't talk about how 
to handle such things, but the spec says how to transform them into plain 
context-free productions, which the LR algorithm can handle. So it's an easy 
shorthand to say that the ES grammar is LR(1) iff the 'expanded' grammar is 
LR(1).

But the spec doesn't give that kind of CFG-equivalence for 
lookahead-restrictions. Until there's agreement about what they mean in a 
CFG or LR sense, it isn't really meaningful to ask if the ES grammar is LR(1).

-Michael

# Dmitry Soshnikov (9 years ago)

The spec is a formal language definition (including for syntactic grammar). It doesn't talk about specifics of implementation much, nor it's supposed to do so (so demanding this from spec isn't appropriate).

It doesn't tell one to implement an LR(1), neither it describes the techniques how one going to do so (even though as was confirmed in this thread it's LR(1)-parsable). It just uses some formal notation for this, e.g. lookahead ∉ {{, function}. Specific techniques how exactly you going to achieve this, is not the purpose of the specification.

(but it's possible to clarify such techniques on discussions which we have now -- please note though, the techniques described above are not an "assumption", or a "guess", they are actual techniques used on practice to achieve this in LR(1)).

The spec in definitions is not consistent though. In one place it uses explicit technique for lookahead restriction via the "No<lookahead>"

productions. In some cases it just uses formal description as lookahead ∉ {{, function}, leaving a specific technique to achieve this up to implementations.

In fact (for consistency), the rule from ES5:

for ( ExpressionNoIn ; Expression ; Expression ) Statement

could be described as:

for ( `lookahead ∉ {in}` Expression ; Expression ; Expression ) Statement

and would mean the same.

You may search for ES5 LR(1) construction (there are several over internets), they use the described above techniques for lookahead ∉ {{, function} (adding NoCurlyBrace, or NoFunction productions). That's also said, it could be rewritten using Cover grammar which adds though post-processing overhead -- that might be the reason why some of the No<lookahead> productions were removed from ES6 spec, but lookahead ∉ {{, function} is still there. More complex cases like Arrow functions params is still implemented as a Cover grammar.

Dmitry

The spec is a formal language definition (including for syntactic grammar).
It doesn't talk about specifics of implementation much, nor it's supposed
to do so (so demanding this from spec isn't appropriate).

It doesn't tell one to implement an LR(1), neither it describes the
techniques *how* one going to do so (even though as was confirmed in this
thread it's LR(1)-parsable). It just uses some formal notation for this,
e.g. `lookahead ∉ {{, function}`. Specific techniques *how exactly* you
going to achieve this, is not the purpose of the specification.

(but it's possible to clarify such techniques on discussions which we have
now -- please note though, the techniques described above are not an
"assumption", or a "guess", they are actual techniques used on practice to
achieve this in LR(1)).

The spec in definitions is not consistent though. In one place it uses
explicit *technique* for lookahead restriction via the "No<lookahead>"
productions. In some cases it just uses formal description as  `lookahead ∉
{{, function}`, leaving a specific technique to achieve this up to
implementations.

In fact (for consistency), the rule from ES5:

```
for ( ExpressionNoIn ; Expression ; Expression ) Statement
```

could be described as:

```
for ( `lookahead ∉ {in}` Expression ; Expression ; Expression ) Statement
```

and would mean the same.

You may search for ES5 LR(1) construction (there are several over
internets), they use the described above techniques for `lookahead ∉ {{,
function}` (adding `NoCurlyBrace`, or `NoFunction` productions). That's
also said, it could be rewritten using Cover grammar which adds though
post-processing overhead -- that might be the reason why some of the
`No<lookahead>` productions were removed from ES6 spec, but `lookahead ∉
{{, function}` is still there. More complex cases like Arrow functions
params is still implemented as a Cover grammar.

Dmitry


On Tue, Jan 31, 2017 at 7:56 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:

> On 17-01-30 06:20 PM, Dmitry Soshnikov wrote:
>
>> On Mon, Jan 30, 2017 at 2:22 PM, Michael Dyck <jmdyck at ibiblio.org
>> <mailto:jmdyck at ibiblio.org>> wrote:
>>
>>
>>                 1. Using "No<Lookahead>" productions.
>>
>>             The problem with this approach is that, in general, a
>>             lookahead-restriction is not a restriction on the nonterminal
>> that
>>             immediately follows it in the production, it's a restriction
>> on
>>             the next few input elements, which might be generated by more
>>             than just that nonterminal.
>>
>>         Well, in this case it's not a "lookahead" anymore.
>>
>>     It isn't a "lookahead" in the resulting grammar, but it is in the
>>     original, and the two grammars should define the same language. What
>> I'm
>>     saying is that it's incorrect to assume that the effect of the
>> lookahead
>>     (in the original) can be confined to the nonterminal that immediately
>>     follows, which is what the suggested transformation does.
>>
>>
>> A "lookahead" is a terminal token. A non-terminal is not a "lookahead".
>>
>
> Okay, so then in my previous statement, read "lookahead" as
> "lookahead-restriction". (But I did say "lookahead-restriction" in the
> statement before that.)
>
>
> To reiterate, imaging you have:
>>
>> ```
>> • (x, y)
>> ```
>>
>> and
>>
>> ```
>> • (x, y) -> { return x + y; }
>> ```
>>
>> where `•` is a cursor position. Your lookaheads are "(", "x", "," "y",
>> etc.
>> Usually 1 lookahead token is enough to route the parser, in this case "(".
>>
>> What you're talking that you need not these lookaheads (at the beginning
>> of
>> the cursor), but *after* the parsed non-terminal (i.e. after fully parsed
>> `(x, y)`), i.e. you're looking for "->", which goes at the end of the
>> non-terminal.
>>
>
> I'm not saying I don't need the lookahead tokens right after the cursor.
> But I am saying that the lookahead tokens that the hypothetical parser
> needs to examine (in order to enforce a lookahead-restriction), even if
> it's only 1 or 2, might extend past the 'next' nonterminal.
>
> But that's an awkward way to express my objection.
>
> To recap, I'm talking about the suggestion to replace:
>     [lookahead != <some sequence of terminals>] Nonterminal
> in a production in the original grammar with
>     Nonterminal-that-does-not-start-with-that-sequence
> (and a bunch of new productions) in the transformed grammar, and whether
> that's a valid transformation.
>
> I'm saying it isn't valid when the Nonterminal can derive phrases with
> fewer terminals than the prohibited lookahead-sequence. (E.g., if
> Nonterminal is nullable, or if it can derive a phrase of length 1 when the
> prohibited lookahead-sequence is length 2.)
>
> This is not possible, since it's not a lookahead at that
>> specific cursor position. It will be a lookahead after you have parsed the
>> non-terminal corresponding to "(x, y)".
>>
>> So you can't have two non-terminals for this, since it'll be a
>> "reduce/reduce" conflict, and you need a cover grammar here.
>>
>
> Well, I'm saying you'd run into problems before you even tried to
> construct the LR automaton, but I think you're agreeing with me that this
> approach is not generally applicable?
>
>
>         In this case a cover grammar is simpler, and that's why it's used
>>         starting since ES6 spec.
>>
>>     But note that none of the ES5 lookahead-restrictions was replaced
>> with a
>>     cover grammar in ES6. They're all still lookahead-restrictions.
>>
>> Sure, because it's normally possible to do with the "No<Lookahead>"
>> doubled
>> productions.
>>
>
> But they weren't replaced with "No" productions (or grammatical
> parameters) either! All the lookahead-restrictions in ES5 are still
> lookahead-restrictions in the current draft.
>
> So this actually answers your initial question (it's a real practice).
>>
>
> No, I don't think it does.
>
> ----------------------------
>
>                 2. Another approach is to use Cover grammar.
>>
>>         Usually it's a manual process of designing a grammar, and defining
>>         post-processing static semantics.
>>
>>     Okay, I'm not really interested then. I'm looking for an algorithm
>>     (i.e., a modification of the LR algorithm) that starts with a
>>     grammar-with-lookahead-restrictions and ends with an LR parser.
>>
>> OK, yeah, at this point I think we already clarified that it is possible
>> to
>> parse ES grammar,  and handle it's lookahead restrictions with several
>> techniques described.
>>
>
> Well, ES implementations exist, so clearly there are techniques for
> handling those aspects of the ES grammar. But that doesn't mean that any of
> them are parsing *according to* the ES grammar. E.g., it's unlikely that
> they're finding exactly the parse tree indicated by the ES grammar. (Nor is
> there any requirement that they do so.)
>
> ----
>
> By way of contrast, consider "opt" subscripts and grammatical parameters.
> If you look up the LR parser-construction algorithm, it doesn't talk about
> how to handle such things, but the spec says how to transform them into
> plain context-free productions, which the LR algorithm can handle. So it's
> an easy shorthand to say that the ES grammar is LR(1) iff the 'expanded'
> grammar is LR(1).
>
> But the spec doesn't give that kind of CFG-equivalence for
> lookahead-restrictions. Until there's agreement about what they mean in a
> CFG or LR sense, it isn't really meaningful to ask if the ES grammar is
> LR(1).
>
> -Michael
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170131/66e42705/attachment-0001.html>

# Michael Dyck (9 years ago)

On 17-01-31 02:17 PM, Dmitry Soshnikov wrote:

The spec is a formal language definition (including for syntactic grammar).

(Well, the extent to which the spec is formal is debatable. The grammar is the most formal part of it, but that's true of most programming language definitions.)

It doesn't talk about specifics of implementation much, nor it's supposed to do so (so demanding this from spec isn't appropriate).

Right (and I'm pretty sure I didn't make such a demand).

It doesn't tell one to implement an LR(1),

Nor should it.

neither it describes the techniques how one going to do so (even though as was confirmed in this thread it's LR(1)-parsable). It just uses some formal notation for this, e.g. lookahead ∉ {{, function}. Specific techniques how exactly you going to achieve this, is not the purpose of the specification.

Right.

(but it's possible to clarify such techniques on discussions which we have now -- please note though, the techniques described above are not an "assumption", or a "guess", they are actual techniques used on practice to achieve this in LR(1)).

I'm not disputing the actualness of the techniques. I'm saying:

technique #1 (duplicated productions): As stated, looks insufficient for the ES grammar. Which may just mean that there's a more complicated version of the approach that is sufficient for ES.
technique #2 (cover grammar): Can you algorithmically (a) construct the cover grammar and (b) recover the parse according to the original grammar?
technique #3 (parse-time actions): In the LR automaton, can you algorithmically distinguish action-avoided conflicts from 'true' conflicts?

The spec in definitions is not consistent though. In one place it uses explicit technique for lookahead restriction via the "No<lookahead>" productions. In some cases it just uses formal description as lookahead ∉ {{, function}, leaving a specific technique to achieve this up to implementations.

In fact (for consistency), the rule from ES5:
for ( ExpressionNoIn ; Expression ; Expression ) Statement
could be described as:
for ( `lookahead ∉ {in}` Expression ; Expression ; Expression ) Statement
and would mean the same.

No, they wouldn't. ExpressionNoIn (or Expression[~In]) is a version of Expression that doesn't ("openly") use the 'in' operator (at the RelationalExpression level), i.e. it doesn't have the 'in' operator somewhere in the middle, whereas [lookahead ∉ {in}] Expression prohibits an Expression that starts with the token 'in'. You can't express the 'NoIn' restriction with any finite amount of lookahead, because a RelationalExpression can be arbitrarily large.

You may search for ES5 LR(1) construction (there are several over internets), they use the described above techniques for lookahead ∉ {{, function} (adding NoCurlyBrace, or NoFunction productions).

But the chances are low that they've been algorithmically generated from the ES spec. And while it's conceivable I could figure out how to do it algorithmically by examining examples of it done manually, I'm doubtful of that too.

That's also said, it could be rewritten using Cover grammar which adds though post-processing overhead -- that might be the reason why some of the No<lookahead> productions were removed from ES6 spec,

What removed productions are you thinking of? ES5's "NoIn" productions became ES6+'s "[In]" parameter (which is basically the same thing with a more compact notation). And ES5's "NoDash" productions are still in the current draft.

On 17-01-31 02:17 PM, Dmitry Soshnikov wrote:
> The spec is a formal language definition (including for syntactic grammar).

(Well, the extent to which the spec is formal is debatable. The grammar is 
the most formal part of it, but that's true of most programming language 
definitions.)

> It doesn't talk about specifics of implementation much, nor it's supposed to
> do so (so demanding this from spec isn't appropriate).

Right (and I'm pretty sure I didn't make such a demand).

> It doesn't tell one to implement an LR(1),

Nor should it.

> neither it describes the
> techniques *how* one going to do so (even though as was confirmed in this
> thread it's LR(1)-parsable). It just uses some formal notation for this,
> e.g. `lookahead ∉ {{, function}`. Specific techniques *how exactly* you
> going to achieve this, is not the purpose of the specification.

Right.

> (but it's possible to clarify such techniques on discussions which we have
> now -- please note though, the techniques described above are not an
> "assumption", or a "guess", they are actual techniques used on practice to
> achieve this in LR(1)).

I'm not disputing the actualness of the techniques. I'm saying:

  - technique #1 (duplicated productions): As stated, looks insufficient for 
the ES grammar. Which may just mean that there's a more complicated version 
of the approach that *is* sufficient for ES.

  - technique #2 (cover grammar): Can you algorithmically (a) construct the 
cover grammar and (b) recover the parse according to the original grammar?

  - technique #3 (parse-time actions): In the LR automaton, can you 
algorithmically distinguish action-avoided conflicts from 'true' conflicts?


> The spec in definitions is not consistent though. In one place it uses
> explicit *technique* for lookahead restriction via the "No<lookahead>"
> productions. In some cases it just uses formal description as  `lookahead ∉
> {{, function}`, leaving a specific technique to achieve this up to
> implementations.
>
> In fact (for consistency), the rule from ES5:
>
> ```
> for ( ExpressionNoIn ; Expression ; Expression ) Statement
> ```
>
> could be described as:
>
> ```
> for ( `lookahead ∉ {in}` Expression ; Expression ; Expression ) Statement
> ```
>
> and would mean the same.

No, they wouldn't. ExpressionNoIn (or Expression[~In]) is a version of 
Expression that doesn't ("openly") use the 'in' operator (at the 
RelationalExpression level), i.e. it doesn't have the 'in' operator 
somewhere in the *middle*, whereas
    [lookahead ∉ {in}] Expression
prohibits an Expression that *starts* with the token 'in'. You can't express 
the 'NoIn' restriction with any finite amount of lookahead, because a 
RelationalExpression can be arbitrarily large.


> You may search for ES5 LR(1) construction (there are several over
> internets), they use the described above techniques for `lookahead ∉ {{,
> function}` (adding `NoCurlyBrace`, or `NoFunction` productions).

But the chances are low that they've been algorithmically generated from the 
ES spec. And while it's conceivable I could figure out how to do it 
algorithmically by examining examples of it done manually, I'm doubtful of 
that too.

> That's also said, it could be rewritten using Cover grammar which adds
> though post-processing overhead -- that might be the reason why some of
> the `No<lookahead>` productions were removed from ES6 spec,

What removed productions are you thinking of? ES5's "NoIn" productions 
became ES6+'s "[In]" parameter (which is basically the same thing with a 
more compact notation). And ES5's "NoDash" productions are still in the 
current draft.

-Michael

# Dmitry Soshnikov (9 years ago)

On Wed, Feb 1, 2017 at 7:37 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:

On 17-01-31 02:17 PM, Dmitry Soshnikov wrote:

The spec is a formal language definition (including for syntactic grammar).

(Well, the extent to which the spec is formal is debatable. The grammar is the most formal part of it, but that's true of most programming language definitions.)

It doesn't talk about specifics of implementation much, nor it's supposed

to do so (so demanding this from spec isn't appropriate).

Right (and I'm pretty sure I didn't make such a demand).

It doesn't tell one to implement an LR(1),

Nor should it.

neither it describes the

techniques how one going to do so (even though as was confirmed in this thread it's LR(1)-parsable). It just uses some formal notation for this, e.g. lookahead ∉ {{, function}. Specific techniques how exactly you going to achieve this, is not the purpose of the specification.

Right.

OK, I mostly meant that a spec can be fully abstract, even in abstract definitions themselves, and use different notations instead of providing fully "CFG-equivalence for lookahead restrictions".

(but it's possible to clarify such techniques on discussions which we have

now -- please note though, the techniques described above are not an "assumption", or a "guess", they are actual techniques used on practice to achieve this in LR(1)).

I'm not disputing the actualness of the techniques. I'm saying:

technique #1 (duplicated productions): As stated, looks insufficient for the ES grammar. Which may just mean that there's a more complicated version of the approach that is sufficient for ES.

Here's an example: raw.githubusercontent.com/zaach/jison/master/examples/jscore.jison (see how lookahead restriction for { in ExpressionStatement is implemented using NoBF productions).

technique #2 (cover grammar): Can you algorithmically (a) construct the cover grammar and (b) recover the parse according to the original grammar?

And this approach I chose (instead of duplicating productions) to handle the same { lookahead restriction, what this comment explains: DmitrySoshnikov/syntax/blob/master/examples/lang.bnf#L206-L222

As mentioned, Cover grammar is usually the process of the grammar design itself (as in ES6 spec itself). I'm not aware of automatic transformations for this (if you find any please let me know).

technique #3 (parse-time actions): In the LR automaton, can you algorithmically distinguish action-avoided conflicts from 'true' conflicts?

The spec in definitions is not consistent though. In one place it uses
explicit technique for lookahead restriction via the "No<lookahead>" productions. In some cases it just uses formal description as lookahead ∉ {{, function}, leaving a specific technique to achieve this up to implementations.

In fact (for consistency), the rule from ES5:
for ( ExpressionNoIn ; Expression ; Expression ) Statement
could be described as:
for ( `lookahead ∉ {in}` Expression ; Expression ; Expression ) Statement
and would mean the same.
No, they wouldn't. ExpressionNoIn (or Expression[~In]) is a version of Expression that doesn't ("openly") use the 'in' operator (at the RelationalExpression level), i.e. it doesn't have the 'in' operator somewhere in the middle, whereas [lookahead ∉ {in}] Expression prohibits an Expression that starts with the token 'in'. You can't express the 'NoIn' restriction with any finite amount of lookahead, because a RelationalExpression can be arbitrarily large.

I agree, the example not the best, I was leaning to, that a spec could use even more abstract definition for this instead of explicit productions.

You may search for ES5 LR(1) construction (there are several over

internets), they use the described above techniques for lookahead ∉ {{, function} (adding NoCurlyBrace, or NoFunction productions).

But the chances are low that they've been algorithmically generated from the ES spec. And while it's conceivable I could figure out how to do it algorithmically by examining examples of it done manually, I'm doubtful of that too.

I'm not sure about "algorithmic" transformations, but yeah, how the grammar is described in the spec, and which extra transformation you have to do in the resulting LR(1), might be different things. Does it mean the spec doesn't define an LR(1) grammar? Perhaps, but it doesn't really matter, since again the spec uses formal notation, and for shortness may omit actual techniques.

That's also said, it could be rewritten using Cover grammar which adds

though post-processing overhead -- that might be the reason why some of the No<lookahead> productions were removed from ES6 spec,

Yeah, seems most of them were replaced with inverted notation (using [In] suffix) -- still leaving the interpretation of this abstract notation up to implementations.

P.S:

All this reduces to that practically today it might be easier to do just a manual recursive decent/accent. Yes, it's a lot of recursion, yes, it might be slower than table-driven parser. But gives you much control, and avoid all those described techniques (hence a manual recursive descent is chosen on practice by most of the implementations today). Or, the syntax of a language could be different itself, and avoid ambiguities.

Dmitry

On Wed, Feb 1, 2017 at 7:37 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:

> On 17-01-31 02:17 PM, Dmitry Soshnikov wrote:
>
>> The spec is a formal language definition (including for syntactic
>> grammar).
>>
>
> (Well, the extent to which the spec is formal is debatable. The grammar is
> the most formal part of it, but that's true of most programming language
> definitions.)
>
> It doesn't talk about specifics of implementation much, nor it's supposed
>> to
>> do so (so demanding this from spec isn't appropriate).
>>
>
> Right (and I'm pretty sure I didn't make such a demand).
>
> It doesn't tell one to implement an LR(1),
>>
>
> Nor should it.
>
> neither it describes the
>> techniques *how* one going to do so (even though as was confirmed in this
>> thread it's LR(1)-parsable). It just uses some formal notation for this,
>> e.g. `lookahead ∉ {{, function}`. Specific techniques *how exactly* you
>> going to achieve this, is not the purpose of the specification.
>>
>
> Right.



OK, I mostly meant that a spec can be fully abstract, even in abstract
definitions themselves, and use different notations instead of providing
fully "CFG-equivalence for lookahead restrictions".


>
>
> (but it's possible to clarify such techniques on discussions which we have
>> now -- please note though, the techniques described above are not an
>> "assumption", or a "guess", they are actual techniques used on practice to
>> achieve this in LR(1)).
>>
>
> I'm not disputing the actualness of the techniques. I'm saying:
>
>  - technique #1 (duplicated productions): As stated, looks insufficient
> for the ES grammar. Which may just mean that there's a more complicated
> version of the approach that *is* sufficient for ES.
>


Here's an example:
https://raw.githubusercontent.com/zaach/jison/master/examples/jscore.jison
(see how lookahead restriction for `{` in `ExpressionStatement` is
implemented using `NoBF` productions).



>
>  - technique #2 (cover grammar): Can you algorithmically (a) construct the
> cover grammar and (b) recover the parse according to the original grammar?
>

And this approach I chose (instead of duplicating productions) to handle
the same `{` lookahead restriction, what this comment explains:
https://github.com/DmitrySoshnikov/syntax/blob/master/examples/lang.bnf#L206-L222

As mentioned, Cover grammar is usually the process of the grammar design
itself (as in ES6 spec itself). I'm not aware of automatic transformations
for this (if you find any please let me know).



>
>  - technique #3 (parse-time actions): In the LR automaton, can you
> algorithmically distinguish action-avoided conflicts from 'true' conflicts?
>
>
> The spec in definitions is not consistent though. In one place it uses
>> explicit *technique* for lookahead restriction via the "No<lookahead>"
>> productions. In some cases it just uses formal description as  `lookahead
>> ∉
>> {{, function}`, leaving a specific technique to achieve this up to
>> implementations.
>>
>> In fact (for consistency), the rule from ES5:
>>
>> ```
>> for ( ExpressionNoIn ; Expression ; Expression ) Statement
>> ```
>>
>> could be described as:
>>
>> ```
>> for ( `lookahead ∉ {in}` Expression ; Expression ; Expression ) Statement
>> ```
>>
>> and would mean the same.
>>
>
> No, they wouldn't. ExpressionNoIn (or Expression[~In]) is a version of
> Expression that doesn't ("openly") use the 'in' operator (at the
> RelationalExpression level), i.e. it doesn't have the 'in' operator
> somewhere in the *middle*, whereas
>    [lookahead ∉ {in}] Expression
> prohibits an Expression that *starts* with the token 'in'. You can't
> express the 'NoIn' restriction with any finite amount of lookahead, because
> a RelationalExpression can be arbitrarily large.
>
>
I agree, the example not the best, I was leaning to, that a spec could use
even more abstract definition for this instead of explicit productions.



>
> You may search for ES5 LR(1) construction (there are several over
>> internets), they use the described above techniques for `lookahead ∉ {{,
>> function}` (adding `NoCurlyBrace`, or `NoFunction` productions).
>>
>
> But the chances are low that they've been algorithmically generated from
> the ES spec. And while it's conceivable I could figure out how to do it
> algorithmically by examining examples of it done manually, I'm doubtful of
> that too.
>
>
I'm not sure about "algorithmic" transformations, but yeah, how the grammar
is described in the spec, and which extra transformation you have to do in
the resulting LR(1), might be different things. Does it mean the spec
doesn't define an LR(1) grammar? Perhaps, but it doesn't really matter,
since again the spec uses formal notation, and for shortness may omit
actual techniques.



> That's also said, it could be rewritten using Cover grammar which adds
>> though post-processing overhead -- that might be the reason why some of
>> the `No<lookahead>` productions were removed from ES6 spec,
>>
>
>
Yeah, seems most of them were replaced with inverted notation (using [In]
suffix) -- still leaving the interpretation of this abstract notation up to
implementations.

P.S:

All this reduces to that practically today it might be easier to do just a
manual recursive decent/accent. Yes, it's a lot of recursion, yes, it might
be slower than table-driven parser. But gives you much control, and avoid
all those described techniques (hence a manual recursive descent is chosen
on practice by most of the implementations today). Or, the syntax of a
language could be different itself, and avoid ambiguities.

Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170201/6433541e/attachment-0001.html>

# Waldemar Horwat (8 years ago)

On 02/01/2017 10:25, Dmitry Soshnikov wrote:

As mentioned, Cover grammar is usually the process of the grammar design itself (as in ES6 spec itself). I'm not aware of automatic transformations for this (if you find any please let me know).

Cover grammars are an ugly hack that we had to add when there was no other practical choice. Avoid them as much as possible. They are only used in situations where lookahead restrictions and parametrized grammar rules do not work in any practical way.

When designing the grammar, the preferences are:

Use standard LR(1) productions
Use parametrized productions
Use lookahead restrictions if parametrized productions would introduce too many parameters into rules
Use a cover grammar if the grammar can't be reasonably expressed in any other way. They're a last resort with lots of problems.

Lookahead restrictions fit very well into an LR(1) engine as long as they're limited to only one token, and that's what I've implemented in the validator. You need to be very careful with them if looking more than one token ahead because lexing of the tokens can vary based on context. For example, if the next few characters in front of the cursor are )/abc/i+, then what is the second token? What is the third token? It's actually context-dependent.

The same problem is even worse for cover grammars.

 Waldemar

On 02/01/2017 10:25, Dmitry Soshnikov wrote:
> As mentioned, Cover grammar is usually the process of the grammar design itself (as in ES6 spec itself). I'm not aware of automatic transformations for this (if you find any please let me know).

Cover grammars are an ugly hack that we had to add when there was no other practical choice.  Avoid them as much as possible.  They are only used in situations where lookahead restrictions and parametrized grammar rules do not work in any practical way.

When designing the grammar, the preferences are:

- Use standard LR(1) productions
- Use parametrized productions
- Use lookahead restrictions if parametrized productions would introduce too many parameters into rules
- Use a cover grammar if the grammar can't be reasonably expressed in any other way.  They're a last resort with lots of problems.

Lookahead restrictions fit very well into an LR(1) engine as long as they're limited to only one token, and that's what I've implemented in the validator.  You need to be very careful with them if looking more than one token ahead because lexing of the tokens can vary based on context.  For example, if the next few characters in front of the cursor are )/abc/i+, then what is the second token?  What is the third token?  It's actually context-dependent.

The same problem is even worse for cover grammars.

     Waldemar

# Dmitry Soshnikov (8 years ago)

On Thu, Feb 2, 2017 at 3:23 PM, Waldemar Horwat <waldemar at google.com> wrote:

On 02/01/2017 10:25, Dmitry Soshnikov wrote:

As mentioned, Cover grammar is usually the process of the grammar design itself (as in ES6 spec itself). I'm not aware of automatic transformations for this (if you find any please let me know).

Cover grammars are an ugly hack that we had to add when there was no other practical choice. Avoid them as much as possible. They are only used in situations where lookahead restrictions and parametrized grammar rules do not work in any practical way.

Yeah, absolutely agreed. The reason why I used cover grammar in that example to parse { as a Block vs. ObjectLiteral, and handle ExpressionStatement is to make the resulting grammar short, and don't introduce a bunch of NoCurly, etc extra productions for this. That's also said, this ugly hack also forces you to do post-processing overhead -- either validation of the nodes, or even re-interpretation (rewriting) to other nodes -- in my case I have to actually check that all nodes between { and } are properties, or vice-versa, statements, based on the expression/statement position.

Practically, a cover grammar still can be slower because of such post-processing overhead (imaging you have a giant file with just object -- all those now are needed to be validated to contain property nodes). OTOH, a trade off with lookahead restrictions is introducing ~1.5x more parsing states.

When designing the grammar, the preferences are:

Use standard LR(1) productions

Use parametrized productions

Use lookahead restrictions if parametrized productions would introduce too many parameters into rules

Use a cover grammar if the grammar can't be reasonably expressed in any other way. They're a last resort with lots of problems.

Just to double-check, by "parametrized productions" you mean the No<lookahead> extra productions?

Lookahead restrictions fit very well into an LR(1) engine as long as they're limited to only one token, and that's what I've implemented in the validator. You need to be very careful with them if looking more than one token ahead because lexing of the tokens can vary based on context. For example, if the next few characters in front of the cursor are )/abc/i+, then what is the second token? What is the third token? It's actually context-dependent.

The same problem is even worse for cover grammars.

Yeah, that's also true. Thanks again for the confirmation, that such a validator even exists, it's good -- at least we know it's parsable in principle. Those, that's said, on real practice now implementing a manual recursive descent for ECMAScript is more approachable now.

Dmitry

On Thu, Feb 2, 2017 at 3:23 PM, Waldemar Horwat <waldemar at google.com> wrote:

> On 02/01/2017 10:25, Dmitry Soshnikov wrote:
>
>> As mentioned, Cover grammar is usually the process of the grammar design
>> itself (as in ES6 spec itself). I'm not aware of automatic transformations
>> for this (if you find any please let me know).
>>
>
> Cover grammars are an ugly hack that we had to add when there was no other
> practical choice.  Avoid them as much as possible.  They are only used in
> situations where lookahead restrictions and parametrized grammar rules do
> not work in any practical way.
>
>
Yeah, absolutely agreed. The reason why I used cover grammar in that
example to parse `{` as a `Block` vs. `ObjectLiteral`, and handle
`ExpressionStatement` is to make the resulting grammar short, and don't
introduce a bunch of `NoCurly`, etc extra productions for this. That's also
said, this ugly hack also forces you to do post-processing overhead --
either validation of the nodes, or even re-interpretation (rewriting) to
other nodes -- in my case I have to actually check that all nodes between
`{` and `}` are properties, or vice-versa, statements, based on the
expression/statement position.

Practically, a cover grammar still can be slower because of such
post-processing overhead (imaging you have a giant file with just object --
all those now are needed to be validated to contain property nodes). OTOH,
a trade off with lookahead restrictions is introducing ~1.5x more parsing
states.


> When designing the grammar, the preferences are:
>
> - Use standard LR(1) productions
> - Use parametrized productions
> - Use lookahead restrictions if parametrized productions would introduce
> too many parameters into rules
> - Use a cover grammar if the grammar can't be reasonably expressed in any
> other way.  They're a last resort with lots of problems.
>
>
Just to double-check, by "parametrized productions" you mean the
`No<lookahead>` extra productions?


> Lookahead restrictions fit very well into an LR(1) engine as long as
> they're limited to only one token, and that's what I've implemented in the
> validator.  You need to be very careful with them if looking more than one
> token ahead because lexing of the tokens can vary based on context.  For
> example, if the next few characters in front of the cursor are )/abc/i+,
> then what is the second token?  What is the third token?  It's actually
> context-dependent.
>
> The same problem is even worse for cover grammars.
>
>
Yeah, that's also true. Thanks again for the confirmation, that such a
validator even exists, it's good -- at least we know it's parsable in
principle. Those, that's said, on real practice now implementing a manual
recursive descent for ECMAScript is more approachable now.

Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170202/61ff3cc3/attachment-0001.html>

# Dmitry Soshnikov (8 years ago)

On Thu, Feb 2, 2017 at 3:56 PM, Dmitry Soshnikov <dmitry.soshnikov at gmail.com

wrote:

On Thu, Feb 2, 2017 at 3:23 PM, Waldemar Horwat <waldemar at google.com> wrote:

On 02/01/2017 10:25, Dmitry Soshnikov wrote:

As mentioned, Cover grammar is usually the process of the grammar design itself (as in ES6 spec itself). I'm not aware of automatic transformations for this (if you find any please let me know).

Cover grammars are an ugly hack that we had to add when there was no other practical choice. Avoid them as much as possible. They are only used in situations where lookahead restrictions and parametrized grammar rules do not work in any practical way.

Yeah, absolutely agreed. The reason why I used cover grammar in that example to parse { as a Block vs. ObjectLiteral, and handle ExpressionStatement is to make the resulting grammar short, and don't introduce a bunch of NoCurly, etc extra productions for this. That's also said, this ugly hack also forces you to do post-processing overhead -- either validation of the nodes, or even re-interpretation (rewriting) to other nodes -- in my case I have to actually check that all nodes between { and } are properties, or vice-versa, statements, based on the expression/statement position.

Practically, a cover grammar still can be slower because of such post-processing overhead (imaging you have a giant file with just object -- all those now are needed to be validated to contain property nodes). OTOH, a trade off with lookahead restrictions is introducing ~1.5x more parsing states.

When designing the grammar, the preferences are:

Use standard LR(1) productions

Use parametrized productions

Use lookahead restrictions if parametrized productions would introduce too many parameters into rules

Use a cover grammar if the grammar can't be reasonably expressed in any other way. They're a last resort with lots of problems.

Just to double-check, by "parametrized productions" you mean the No<lookahead> extra productions?

Or you actually mean "lookahead-parametrized" productions (like in SLR(1) which doesn't use it, vs. LALR(1), or CLR(1) which do use it)?

Dmitry

On Thu, Feb 2, 2017 at 3:56 PM, Dmitry Soshnikov <dmitry.soshnikov at gmail.com
> wrote:

> On Thu, Feb 2, 2017 at 3:23 PM, Waldemar Horwat <waldemar at google.com>
> wrote:
>
>> On 02/01/2017 10:25, Dmitry Soshnikov wrote:
>>
>>> As mentioned, Cover grammar is usually the process of the grammar design
>>> itself (as in ES6 spec itself). I'm not aware of automatic transformations
>>> for this (if you find any please let me know).
>>>
>>
>> Cover grammars are an ugly hack that we had to add when there was no
>> other practical choice.  Avoid them as much as possible.  They are only
>> used in situations where lookahead restrictions and parametrized grammar
>> rules do not work in any practical way.
>>
>>
> Yeah, absolutely agreed. The reason why I used cover grammar in that
> example to parse `{` as a `Block` vs. `ObjectLiteral`, and handle
> `ExpressionStatement` is to make the resulting grammar short, and don't
> introduce a bunch of `NoCurly`, etc extra productions for this. That's also
> said, this ugly hack also forces you to do post-processing overhead --
> either validation of the nodes, or even re-interpretation (rewriting) to
> other nodes -- in my case I have to actually check that all nodes between
> `{` and `}` are properties, or vice-versa, statements, based on the
> expression/statement position.
>
> Practically, a cover grammar still can be slower because of such
> post-processing overhead (imaging you have a giant file with just object --
> all those now are needed to be validated to contain property nodes). OTOH,
> a trade off with lookahead restrictions is introducing ~1.5x more parsing
> states.
>
>
>> When designing the grammar, the preferences are:
>>
>> - Use standard LR(1) productions
>> - Use parametrized productions
>> - Use lookahead restrictions if parametrized productions would introduce
>> too many parameters into rules
>> - Use a cover grammar if the grammar can't be reasonably expressed in any
>> other way.  They're a last resort with lots of problems.
>>
>>
> Just to double-check, by "parametrized productions" you mean the
> `No<lookahead>` extra productions?
>
>

Or you actually mean "lookahead-parametrized" productions (like in SLR(1)
which doesn't use it, vs. LALR(1), or CLR(1) which do use it)?

Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170202/f80b32d9/attachment.html>

# Michael Dyck (8 years ago)

On 17-02-02 06:23 PM, Waldemar Horwat wrote:

Lookahead restrictions fit very well into an LR(1) engine

Again: Great, but how? E.g., do you pre-process the grammar, modify the construction of the automaton, and/or modify the operation of the parser?

as long as they're limited to only one token, and that's what I've implemented in the validator.

So when the grammar has a prohibited lookahead-sequence with more than one token (in ExpressionStatement, IterationStatement, and ExportDeclaration), does the validator just use the first token?

You need to be very careful with them if looking more than one token ahead because lexing of the tokens can vary based on context. For example, if the next few characters in front of the cursor are )/abc/i+, then what is the second token? What is the third token? It's actually context-dependent.

But the context-dependentness of lexing is a parse-time problem, not a validation-time problem, right? The syntactic level can just assume a stream of (correctly lexed) input elements. Or do you validate deeper than the syntactic grammar?

(And it seems to me that multi-token lookahead-restrictions would be hard for a validator to handle even if lexing wasn't context-dependent, but maybe that depends on how you handle them.)

On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>
> Lookahead restrictions fit very well into an LR(1) engine

Again: Great, but how? E.g., do you pre-process the grammar, modify the 
construction of the automaton, and/or modify the operation of the parser?

> as long as they're limited to only one token, and that's what I've
> implemented in the validator.

So when the grammar has a prohibited lookahead-sequence with more than one 
token (in ExpressionStatement, IterationStatement, and ExportDeclaration), 
does the validator just use the first token?

> You need to be very careful with them if looking more than one token
> ahead because lexing of the tokens can vary based on context. For
> example, if the next few characters in front of the cursor are )/abc/i+,
> then what is the second token? What is the third token? It's actually
> context-dependent.

But the context-dependentness of lexing is a parse-time problem, not a 
validation-time problem, right? The syntactic level can just assume a stream 
of (correctly lexed) input elements. Or do you validate deeper than the 
syntactic grammar?

(And it seems to me that multi-token lookahead-restrictions would be hard 
for a validator to handle even if lexing *wasn't* context-dependent, but 
maybe that depends on how you handle them.)

-Michael

# Michael Dyck (8 years ago)

On 17-02-02 06:56 PM, Dmitry Soshnikov wrote:

Just to double-check, by "parametrized productions" you mean the No<lookahead> extra productions?

See the 'Grammar Notation' section of ES6+, at the paragraph beginning "A production may be parameterized by a subscripted annotation".

They were introduced in draft 20 of ES6, and replaced the use of the "NoIn" productions of the previous draft (and ES3-5).

(which, strictly speaking, weren't "No<lookahead> productions", because they

didn't express a restriction on lookahead)

On 17-02-02 06:56 PM, Dmitry Soshnikov wrote:
>
> Just to double-check, by "parametrized productions" you mean the
> `No<lookahead>` extra productions?

See the 'Grammar Notation' section of ES6+, at the paragraph beginning
"A production may be parameterized by a subscripted annotation".

They were introduced in draft 20 of ES6, and *replaced* the use of the 
"NoIn" productions of the previous draft (and ES3-5).

(which, strictly speaking, weren't "No<lookahead> productions", because they 
didn't express a restriction on lookahead)

-Michael

# Waldemar Horwat (8 years ago)

On 02/02/2017 15:56, Dmitry Soshnikov wrote:

On Thu, Feb 2, 2017 at 3:23 PM, Waldemar Horwat <waldemar at google.com <mailto:waldemar at google.com>> wrote:
On 02/01/2017 10:25, Dmitry Soshnikov wrote:

    As mentioned, Cover grammar is usually the process of the grammar design itself (as in ES6 spec itself). I'm not aware of automatic transformations for this (if you find any please let me know).


Cover grammars are an ugly hack that we had to add when there was no other practical choice.  Avoid them as much as possible.  They are only used in situations where lookahead restrictions and parametrized grammar rules do not work in any practical way.
Yeah, absolutely agreed. The reason why I used cover grammar in that example to parse { as a Block vs. ObjectLiteral, and handle ExpressionStatement is to make the resulting grammar short, and don't introduce a bunch of NoCurly, etc extra productions for this. That's also said, this ugly hack also forces you to do post-processing overhead -- either validation of the nodes, or even re-interpretation (rewriting) to other nodes -- in my case I have to actually check that all nodes between { and } are properties, or vice-versa, statements, based on the expression/statement position.

Don't use a cover grammar to unify blocks with object literals. That's a really bad idea and you'll likely get it wrong. If the { starts a block, then if the next token is a keyword such as if then it's parsed as a keyword. If the { starts an object literal, then if the next token is a keyword then it's parsed as an identifiername. As we extend the language, the expansions of these can later diverge to the point where you won't know whether a / starts a regular expression or is a division symbol.

 Waldemar

On 02/02/2017 15:56, Dmitry Soshnikov wrote:
> On Thu, Feb 2, 2017 at 3:23 PM, Waldemar Horwat <waldemar at google.com <mailto:waldemar at google.com>> wrote:
>
>     On 02/01/2017 10:25, Dmitry Soshnikov wrote:
>
>         As mentioned, Cover grammar is usually the process of the grammar design itself (as in ES6 spec itself). I'm not aware of automatic transformations for this (if you find any please let me know).
>
>
>     Cover grammars are an ugly hack that we had to add when there was no other practical choice.  Avoid them as much as possible.  They are only used in situations where lookahead restrictions and parametrized grammar rules do not work in any practical way.
>
>
> Yeah, absolutely agreed. The reason why I used cover grammar in that example to parse `{` as a `Block` vs. `ObjectLiteral`, and handle `ExpressionStatement` is to make the resulting grammar short, and don't introduce a bunch of `NoCurly`, etc extra productions for this. That's also said, this ugly hack also forces you to do post-processing overhead -- either validation of the nodes, or even re-interpretation (rewriting) to other nodes -- in my case I have to actually check that all nodes between `{` and `}` are properties, or vice-versa, statements, based on the expression/statement position.

Don't use a cover grammar to unify blocks with object literals.  That's a really bad idea and you'll likely get it wrong.  If the `{` starts a block, then if the next token is a keyword such as `if` then it's parsed as a keyword.  If the `{` starts an object literal, then if the next token is a keyword then it's parsed as an identifiername.  As we extend the language, the expansions of these can later diverge to the point where you won't know whether a `/` starts a regular expression or is a division symbol.

     Waldemar

# Waldemar Horwat (8 years ago)

On 02/03/2017 08:17, Michael Dyck wrote:

On 17-02-02 06:23 PM, Waldemar Horwat wrote:

Lookahead restrictions fit very well into an LR(1) engine

Again: Great, but how? E.g., do you pre-process the grammar, modify the construction of the automaton, and/or modify the operation of the parser?

For each state × token combination, the automaton describes what happens when you're in state S and see a token T. The lookahead restrictions remove possible transitions; without them there would be ambiguities where a given state × token combination would want to do two incompatible things.

That's different from parametrized rules, which simply macro-expand into lots of individual rules.

as long as they're limited to only one token, and that's what I've implemented in the validator.

So when the grammar has a prohibited lookahead-sequence with more than one token (in ExpressionStatement, IterationStatement, and ExportDeclaration), does the validator just use the first token?

My LR(1) validator can't actually handle the case of multiple tokens of lookahead, and I didn't feel like doing an LR(2) grammar. I had to check these by hand.

You need to be very careful with them if looking more than one token ahead because lexing of the tokens can vary based on context. For example, if the next few characters in front of the cursor are )/abc/i+, then what is the second token? What is the third token? It's actually context-dependent.

But the context-dependentness of lexing is a parse-time problem, not a validation-time problem, right?

No.

The syntactic level can just assume a stream of (correctly lexed) input elements.

No! It's impossible to create a stream of correctly lexed input elements without doing syntactic level parsing. See the example I gave above.

Or do you validate deeper than the syntactic grammar?

Yes. The syntactic grammar controls the lexer. The validator looks for problems such as the syntactic grammar giving the lexer contradictory instructions. An example would be any syntactic grammar automaton state where one outgoing syntactic transition would swallow a regexp token and another outgoing syntactic transition from the same state would swallow a / token. If any such state exists, the grammar is broken.

On 02/03/2017 08:17, Michael Dyck wrote:
> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>
>> Lookahead restrictions fit very well into an LR(1) engine
>
> Again: Great, but how? E.g., do you pre-process the grammar, modify the construction of the automaton, and/or modify the operation of the parser?

For each state × token combination, the automaton describes what happens when you're in state S and see a token T.  The lookahead restrictions remove possible transitions; without them there would be ambiguities where a given state × token combination would want to do two incompatible things.

That's different from parametrized rules, which simply macro-expand into lots of individual rules.

>> as long as they're limited to only one token, and that's what I've
>> implemented in the validator.
>
> So when the grammar has a prohibited lookahead-sequence with more than one token (in ExpressionStatement, IterationStatement, and ExportDeclaration), does the validator just use the first token?

My LR(1) validator can't actually handle the case of multiple tokens of lookahead, and I didn't feel like doing an LR(2) grammar.  I had to check these by hand.

>> You need to be very careful with them if looking more than one token
>> ahead because lexing of the tokens can vary based on context. For
>> example, if the next few characters in front of the cursor are )/abc/i+,
>> then what is the second token? What is the third token? It's actually
>> context-dependent.
>
> But the context-dependentness of lexing is a parse-time problem, not a validation-time problem, right?

No.

> The syntactic level can just assume a stream of (correctly lexed) input elements.

No!  It's impossible to create a stream of correctly lexed input elements without doing syntactic level parsing.  See the example I gave above.

> Or do you validate deeper than the syntactic grammar?

Yes.  The syntactic grammar controls the lexer.  The validator looks for problems such as the syntactic grammar giving the lexer contradictory instructions.  An example would be any syntactic grammar automaton state where one outgoing syntactic transition would swallow a regexp token and another outgoing syntactic transition from the same state would swallow a / token.  If any such state exists, the grammar is broken.

> (And it seems to me that multi-token lookahead-restrictions would be hard for a validator to handle even if lexing *wasn't* context-dependent, but maybe that depends on how you handle them.)
>
> -Michael
>

# Dmitry Soshnikov (8 years ago)

On Fri, Feb 3, 2017 at 2:32 PM, Waldemar Horwat <waldemar at google.com> wrote:

On 02/02/2017 15:56, Dmitry Soshnikov wrote:
On Thu, Feb 2, 2017 at 3:23 PM, Waldemar Horwat <waldemar at google.com <mailto:waldemar at google.com>> wrote:
On 02/01/2017 10:25, Dmitry Soshnikov wrote:

    As mentioned, Cover grammar is usually the process of the grammar
design itself (as in ES6 spec itself). I'm not aware of automatic transformations for this (if you find any please let me know).
Cover grammars are an ugly hack that we had to add when there was no
other practical choice. Avoid them as much as possible. They are only used in situations where lookahead restrictions and parametrized grammar rules do not work in any practical way.

Yeah, absolutely agreed. The reason why I used cover grammar in that example to parse { as a Block vs. ObjectLiteral, and handle ExpressionStatement is to make the resulting grammar short, and don't introduce a bunch of NoCurly, etc extra productions for this. That's also said, this ugly hack also forces you to do post-processing overhead -- either validation of the nodes, or even re-interpretation (rewriting) to other nodes -- in my case I have to actually check that all nodes between { and } are properties, or vice-versa, statements, based on the expression/statement position.
Don't use a cover grammar to unify blocks with object literals. That's a really bad idea and you'll likely get it wrong. If the { starts a block, then if the next token is a keyword such as if then it's parsed as a keyword. If the { starts an object literal, then if the next token is a keyword then it's parsed as an identifiername. As we extend the language, the expansions of these can later diverge to the point where you won't know whether a / starts a regular expression or is a division symbol.

Ah, good catch. In fact, this can be fixed by allowing keywords as property names, as in JS (and in fact, I have just fixed it after your comment -- see example code changes in this commit: DmitrySoshnikov/syntax/commit/ccb3029cf7515bc078436c37831e027b50cb7fa4 -- there are no ambiguities, and it's still correctly parsed as an ObjectLiteral, or Block in needed position).

Just to note, this example language is a smaller sub-set of ECMAScript grammar, but not ES grammar itself, so, so far, Cover grammar works ;) (still being a "dirty hack" since the language syntax was designed so, allowing the same construct in different logically positions).

P.S.:

Another observation to note, using Cover grammar for this specific case with { requires parsing at least in LALR(1) mode, and SLR(1) will not work, eventually reaching to "reduce-reduce" conflict for when parsing e.g. an empty block inside the empty { { } }:

(output from the parsing states analysis tool)

- BlockStatement -> BlockOrObject • (kernel, reduce by production 13)
- ObjectLiteral -> BlockOrObject • (kernel, reduce by production 68)

LALR(1) uses parametrized lookahead set, and is free from this issue. It's the one which is used the most on practice anyways (taking more time for table calculation though)

Dmitry

On Fri, Feb 3, 2017 at 2:32 PM, Waldemar Horwat <waldemar at google.com> wrote:

> On 02/02/2017 15:56, Dmitry Soshnikov wrote:
>
>> On Thu, Feb 2, 2017 at 3:23 PM, Waldemar Horwat <waldemar at google.com
>> <mailto:waldemar at google.com>> wrote:
>>
>>     On 02/01/2017 10:25, Dmitry Soshnikov wrote:
>>
>>         As mentioned, Cover grammar is usually the process of the grammar
>> design itself (as in ES6 spec itself). I'm not aware of automatic
>> transformations for this (if you find any please let me know).
>>
>>
>>     Cover grammars are an ugly hack that we had to add when there was no
>> other practical choice.  Avoid them as much as possible.  They are only
>> used in situations where lookahead restrictions and parametrized grammar
>> rules do not work in any practical way.
>>
>>
>> Yeah, absolutely agreed. The reason why I used cover grammar in that
>> example to parse `{` as a `Block` vs. `ObjectLiteral`, and handle
>> `ExpressionStatement` is to make the resulting grammar short, and don't
>> introduce a bunch of `NoCurly`, etc extra productions for this. That's also
>> said, this ugly hack also forces you to do post-processing overhead --
>> either validation of the nodes, or even re-interpretation (rewriting) to
>> other nodes -- in my case I have to actually check that all nodes between
>> `{` and `}` are properties, or vice-versa, statements, based on the
>> expression/statement position.
>>
>
> Don't use a cover grammar to unify blocks with object literals.  That's a
> really bad idea and you'll likely get it wrong.  If the `{` starts a block,
> then if the next token is a keyword such as `if` then it's parsed as a
> keyword.  If the `{` starts an object literal, then if the next token is a
> keyword then it's parsed as an identifiername.  As we extend the language,
> the expansions of these can later diverge to the point where you won't know
> whether a `/` starts a regular expression or is a division symbol.
>
>
Ah, good catch. In fact, this can be fixed by allowing keywords as property
names, as in JS (and in fact, I have just fixed it after your comment --
see example code changes in this commit:
https://github.com/DmitrySoshnikov/syntax/commit/ccb3029cf7515bc078436c37831e027b50cb7fa4
-- there are no ambiguities, and it's still correctly parsed as an
`ObjectLiteral`, or `Block` in needed position).

Just to note, this example language is a smaller sub-set of ECMAScript
grammar, but not ES grammar itself, so, so far, Cover grammar works ;)
(still being a "dirty hack" since the language syntax was designed so,
allowing the same construct in different logically positions).

P.S.:

Another observation to note, using Cover grammar for this specific case
with `{` requires parsing at least in LALR(1) mode, and SLR(1) will not
work, eventually reaching to "reduce-reduce" conflict for when parsing e.g.
an empty block inside the empty `{ { } }`:

(output from the parsing states analysis tool)

```
- BlockStatement -> BlockOrObject • (kernel, reduce by production 13)
- ObjectLiteral -> BlockOrObject • (kernel, reduce by production 68)
```

LALR(1) uses parametrized lookahead set, and is free from this issue. It's
the one which is used the most on practice anyways (taking more time for
table calculation though)

Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170203/e2a1de31/attachment-0001.html>

# Michael Dyck (8 years ago)

On 17-02-03 05:32 PM, Waldemar Horwat wrote:

On 02/03/2017 08:17, Michael Dyck wrote:

On 17-02-02 06:23 PM, Waldemar Horwat wrote:

Lookahead restrictions fit very well into an LR(1) engine

Again: Great, but how? E.g., do you pre-process the grammar, modify the construction of the automaton, and/or modify the operation of the parser?

For each state × token combination, the automaton describes what happens when you're in state S and see a token T. The lookahead restrictions remove possible transitions; without them there would be ambiguities where a given state × token combination would want to do two incompatible things.

Okay, so do you generate the automaton (ignoring lookahead restrictions) and then remove transitions (using lookahead restrictions)? Or do you integrate the lookahead-restrictions into the generation of the automaton?

That's different from parametrized rules, which simply macro-expand into lots of individual rules.

Yup.

But the context-dependentness of lexing is a parse-time problem, not a validation-time problem, right?

No.

The syntactic level can just assume a stream of (correctly lexed) input elements.

(I should have said "Validation of the syntactic level [etc]")

No! It's impossible to create a stream of correctly lexed input elements without doing syntactic level parsing.

I quite agree. I didn't mean to suggest otherwise. What I mean is that, once you've generated the automaton for the syntactic grammar, you can just look at each state's set of expected terminals, and from that deduce the goal symbol that the lexer will need to use to get the next token when the parser is in that state. The point being that you can do that after generating the syntactic automaton. So the context-dependentness of lexing doesn't have to affect the process of generating the syntactic automaton.

(That's assuming an LR(1)-ish parser, and an approach where you don't try to combine the syntactic and lexical grammars to generate a single [scannerless] automaton. Which may not be your approach.)

The validator looks for problems such as the syntactic grammar giving the lexer contradictory instructions. An example would be any syntactic grammar automaton state where one outgoing syntactic transition would swallow a regexp token and another outgoing syntactic transition from the same state would swallow a / token. If any such state exists, the grammar is broken.

Yup, the spec asserts the non-existence of such states ("syntactic grammar contexts").

On 17-02-03 05:32 PM, Waldemar Horwat wrote:
> On 02/03/2017 08:17, Michael Dyck wrote:
>> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>>
>>> Lookahead restrictions fit very well into an LR(1) engine
>>
>> Again: Great, but how? E.g., do you pre-process the grammar, modify the
>> construction of the automaton, and/or modify the operation of the parser?
>
> For each state × token combination, the automaton describes what happens
> when you're in state S and see a token T.  The lookahead restrictions remove
> possible transitions; without them there would be ambiguities where a given
> state × token combination would want to do two incompatible things.

Okay, so do you generate the automaton (ignoring lookahead restrictions) and 
then remove transitions (using lookahead restrictions)? Or do you integrate 
the lookahead-restrictions into the generation of the automaton?

> That's different from parametrized rules, which simply macro-expand into
> lots of individual rules.

Yup.

>> But the context-dependentness of lexing is a parse-time problem, not a
>> validation-time problem, right?
>
> No.
>
>> The syntactic level can just assume a stream of (correctly lexed) input
>> elements.

(I should have said "*Validation* of the syntactic level [etc]")

> No!  It's impossible to create a stream of correctly lexed input elements
> without doing syntactic level parsing.

I quite agree. I didn't mean to suggest otherwise. What I mean is that, once 
you've generated the automaton for the syntactic grammar, you can just look 
at each state's set of expected terminals, and from that deduce the goal 
symbol that the lexer will need to use to get the next token when the parser 
is in that state. The point being that you can do that *after* generating 
the syntactic automaton. So the context-dependentness of lexing doesn't have 
to affect the process of generating the syntactic automaton.

(That's assuming an LR(1)-ish parser, and an approach where you don't try to 
combine the syntactic and lexical grammars to generate a single 
[scannerless] automaton. Which may not be your approach.)

> The validator looks for problems such as the syntactic grammar giving the
> lexer contradictory instructions. An example would be any syntactic
> grammar automaton state where one outgoing syntactic transition would
> swallow a regexp token and another outgoing syntactic transition from the
> same state would swallow a / token. If any such state exists, the grammar
> is broken.

Yup, the spec asserts the non-existence of such states ("syntactic grammar 
contexts").

-Michael

# Waldemar Horwat (8 years ago)

On 02/04/2017 07:20, Michael Dyck wrote:

On 17-02-03 05:32 PM, Waldemar Horwat wrote:

On 02/03/2017 08:17, Michael Dyck wrote:

On 17-02-02 06:23 PM, Waldemar Horwat wrote:

Lookahead restrictions fit very well into an LR(1) engine

Again: Great, but how? E.g., do you pre-process the grammar, modify the construction of the automaton, and/or modify the operation of the parser?

For each state × token combination, the automaton describes what happens when you're in state S and see a token T. The lookahead restrictions remove possible transitions; without them there would be ambiguities where a given state × token combination would want to do two incompatible things.

Okay, so do you generate the automaton (ignoring lookahead restrictions) and then remove transitions (using lookahead restrictions)? Or do you integrate the lookahead-restrictions into the generation of the automaton?

It's integrated. You can't generate a valid automaton without the lookahead restrictions.

That's different from parametrized rules, which simply macro-expand into lots of individual rules.

Yup.

But the context-dependentness of lexing is a parse-time problem, not a validation-time problem, right?

No.

The syntactic level can just assume a stream of (correctly lexed) input elements.

(I should have said "Validation of the syntactic level [etc]")

No! It's impossible to create a stream of correctly lexed input elements without doing syntactic level parsing.

I quite agree. I didn't mean to suggest otherwise. What I mean is that, once you've generated the automaton for the syntactic grammar, you can just look at each state's set of expected terminals, and from that deduce the goal symbol that the lexer will need to use to get the next token when the parser is in that state. The point being that you can do that after generating the syntactic automaton. So the context-dependentness of lexing doesn't have to affect the process of generating the syntactic automaton.

That's correct.

(That's assuming an LR(1)-ish parser, and an approach where you don't try to combine the syntactic and lexical grammars to generate a single [scannerless] automaton. Which may not be your approach.)

The parser and lexer stay separate, other than the lexer providing tokens to the parser and the parser selecting one of several top-level lexer goal symbols for lexing the next token. I do not use any kind of unified parser-lexer grammar; that could run into issues such as the syntactic grammar making lexing non-greedy: a+++++b lexing as a ++ + ++ b instead of the correct a ++ ++ + b (which would then generate a syntax error at the syntactic level).

The validator looks for problems such as the syntactic grammar giving the lexer contradictory instructions. An example would be any syntactic grammar automaton state where one outgoing syntactic transition would swallow a regexp token and another outgoing syntactic transition from the same state would swallow a / token. If any such state exists, the grammar is broken.

Yup, the spec asserts the non-existence of such states ("syntactic grammar contexts").

-Michael

 Waldemar

On 02/04/2017 07:20, Michael Dyck wrote:
> On 17-02-03 05:32 PM, Waldemar Horwat wrote:
>> On 02/03/2017 08:17, Michael Dyck wrote:
>>> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>>>
>>>> Lookahead restrictions fit very well into an LR(1) engine
>>>
>>> Again: Great, but how? E.g., do you pre-process the grammar, modify the
>>> construction of the automaton, and/or modify the operation of the parser?
>>
>> For each state × token combination, the automaton describes what happens
>> when you're in state S and see a token T.  The lookahead restrictions remove
>> possible transitions; without them there would be ambiguities where a given
>> state × token combination would want to do two incompatible things.
>
> Okay, so do you generate the automaton (ignoring lookahead restrictions) and then remove transitions (using lookahead restrictions)? Or do you integrate the lookahead-restrictions into the generation of the automaton?

It's integrated.  You can't generate a valid automaton without the lookahead restrictions.

>> That's different from parametrized rules, which simply macro-expand into
>> lots of individual rules.
>
> Yup.
>
>
>>> But the context-dependentness of lexing is a parse-time problem, not a
>>> validation-time problem, right?
>>
>> No.
>>
>>> The syntactic level can just assume a stream of (correctly lexed) input
>>> elements.
>
> (I should have said "*Validation* of the syntactic level [etc]")
>
>> No!  It's impossible to create a stream of correctly lexed input elements
>> without doing syntactic level parsing.
>
> I quite agree. I didn't mean to suggest otherwise. What I mean is that, once you've generated the automaton for the syntactic grammar, you can just look at each state's set of expected terminals, and from that deduce the goal symbol that the lexer will need to use to get the next token when the parser is in that state. The point being that you can do that *after* generating the syntactic automaton. So the context-dependentness of lexing doesn't have to affect the process of generating the syntactic automaton.

That's correct.

> (That's assuming an LR(1)-ish parser, and an approach where you don't try to combine the syntactic and lexical grammars to generate a single [scannerless] automaton. Which may not be your approach.)

The parser and lexer stay separate, other than the lexer providing tokens to the parser and the parser selecting one of several top-level lexer goal symbols for lexing the next token.  I do not use any kind of unified parser-lexer grammar; that could run into issues such as the syntactic grammar making lexing non-greedy: a+++++b lexing as a ++ + ++ b instead of the correct a ++ ++ + b (which would then generate a syntax error at the syntactic level).

>> The validator looks for problems such as the syntactic grammar giving the
>> lexer contradictory instructions. An example would be any syntactic
>> grammar automaton state where one outgoing syntactic transition would
>> swallow a regexp token and another outgoing syntactic transition from the
>> same state would swallow a / token. If any such state exists, the grammar
>> is broken.
>
> Yup, the spec asserts the non-existence of such states ("syntactic grammar contexts").
>
> -Michael

     Waldemar

# Michael Dyck (8 years ago)

On 17-02-06 07:32 PM, Waldemar Horwat wrote:

On 02/04/2017 07:20, Michael Dyck wrote:

On 17-02-03 05:32 PM, Waldemar Horwat wrote:

On 02/03/2017 08:17, Michael Dyck wrote:

On 17-02-02 06:23 PM, Waldemar Horwat wrote:

Lookahead restrictions fit very well into an LR(1) engine

Again: Great, but how? E.g., do you pre-process the grammar, modify the construction of the automaton, and/or modify the operation of the parser?

For each state × token combination, the automaton describes what happens when you're in state S and see a token T. The lookahead restrictions remove possible transitions; without them there would be ambiguities where a given state × token combination would want to do two incompatible things.

Okay, so do you generate the automaton (ignoring lookahead restrictions) and then remove transitions (using lookahead restrictions)? Or do you integrate the lookahead-restrictions into the generation of the automaton?

It's integrated. You can't generate a valid automaton without the lookahead restrictions.

So, when you're making LR items for a production with a lookahead-restriction (l-r), do you:

-- treat the l-r as a (nullable) symbol, so that you get an item with the dot before the l-r, and one with the dot after, or

-- treat the l-r as occurring between two adjacent symbols, so that you get an item where the dot is "at" the l-r, or

-- something else?

On 17-02-06 07:32 PM, Waldemar Horwat wrote:
> On 02/04/2017 07:20, Michael Dyck wrote:
>> On 17-02-03 05:32 PM, Waldemar Horwat wrote:
>>> On 02/03/2017 08:17, Michael Dyck wrote:
>>>> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>>>>
>>>>> Lookahead restrictions fit very well into an LR(1) engine
>>>>
>>>> Again: Great, but how? E.g., do you pre-process the grammar, modify the
>>>> construction of the automaton, and/or modify the operation of the parser?
>>>
>>> For each state × token combination, the automaton describes what happens
>>> when you're in state S and see a token T.  The lookahead restrictions remove
>>> possible transitions; without them there would be ambiguities where a given
>>> state × token combination would want to do two incompatible things.
>>
>> Okay, so do you generate the automaton (ignoring lookahead restrictions)
>> and then remove transitions (using lookahead restrictions)? Or do you
>> integrate the lookahead-restrictions into the generation of the automaton?
>
> It's integrated.  You can't generate a valid automaton without the lookahead
> restrictions.

So, when you're making LR items for a production with a 
lookahead-restriction (l-r), do you:

-- treat the l-r as a (nullable) symbol, so that you get an item with the 
dot before the l-r, and one with the dot after, or

-- treat the l-r as occurring between two adjacent symbols, so that you get 
an item where the dot is "at" the l-r, or

-- something else?

-Michael

# Isiah Meadows (8 years ago)

Just out of curiosity, what's the best way to handle async arrow function arguments vs `async` function calls? Is it term rewriting, or is there a better way of handling that? I'm much more familiar with recursive-descent and LL parsers than LR or shift-reduce ones, so I'm not as readily familiar with how to handle these cases.

Isiah Meadows me at isiahmeadows.com

Just out of curiosity, what's the best way to handle async arrow
function arguments vs `async` function calls? Is it term rewriting, or
is there a better way of handling that? I'm much more familiar with
recursive-descent and LL parsers than LR or shift-reduce ones, so I'm
not as readily familiar with how to handle these cases.
-----

Isiah Meadows
me at isiahmeadows.com


On Tue, Feb 7, 2017 at 9:39 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:
> On 17-02-06 07:32 PM, Waldemar Horwat wrote:
>>
>> On 02/04/2017 07:20, Michael Dyck wrote:
>>>
>>> On 17-02-03 05:32 PM, Waldemar Horwat wrote:
>>>>
>>>> On 02/03/2017 08:17, Michael Dyck wrote:
>>>>>
>>>>> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>>>>>
>>>>>>
>>>>>> Lookahead restrictions fit very well into an LR(1) engine
>>>>>
>>>>>
>>>>> Again: Great, but how? E.g., do you pre-process the grammar, modify the
>>>>> construction of the automaton, and/or modify the operation of the
>>>>> parser?
>>>>
>>>>
>>>> For each state × token combination, the automaton describes what happens
>>>> when you're in state S and see a token T.  The lookahead restrictions
>>>> remove
>>>> possible transitions; without them there would be ambiguities where a
>>>> given
>>>> state × token combination would want to do two incompatible things.
>>>
>>>
>>> Okay, so do you generate the automaton (ignoring lookahead restrictions)
>>> and then remove transitions (using lookahead restrictions)? Or do you
>>> integrate the lookahead-restrictions into the generation of the
>>> automaton?
>>
>>
>> It's integrated.  You can't generate a valid automaton without the
>> lookahead
>> restrictions.
>
>
> So, when you're making LR items for a production with a
> lookahead-restriction (l-r), do you:
>
> -- treat the l-r as a (nullable) symbol, so that you get an item with the
> dot before the l-r, and one with the dot after, or
>
> -- treat the l-r as occurring between two adjacent symbols, so that you get
> an item where the dot is "at" the l-r, or
>
> -- something else?
>
>
> -Michael
>
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss

# Waldemar Horwat (8 years ago)

On 02/07/2017 06:39, Michael Dyck wrote:

On 17-02-06 07:32 PM, Waldemar Horwat wrote:

On 02/04/2017 07:20, Michael Dyck wrote:

On 17-02-03 05:32 PM, Waldemar Horwat wrote:

On 02/03/2017 08:17, Michael Dyck wrote:

On 17-02-02 06:23 PM, Waldemar Horwat wrote:

Lookahead restrictions fit very well into an LR(1) engine

Again: Great, but how? E.g., do you pre-process the grammar, modify the construction of the automaton, and/or modify the operation of the parser?

For each state × token combination, the automaton describes what happens when you're in state S and see a token T. The lookahead restrictions remove possible transitions; without them there would be ambiguities where a given state × token combination would want to do two incompatible things.

Okay, so do you generate the automaton (ignoring lookahead restrictions) and then remove transitions (using lookahead restrictions)? Or do you integrate the lookahead-restrictions into the generation of the automaton?

It's integrated. You can't generate a valid automaton without the lookahead restrictions.

So, when you're making LR items for a production with a lookahead-restriction (l-r), do you:

-- treat the l-r as a (nullable) symbol, so that you get an item with the dot before the l-r, and one with the dot after, or

-- treat the l-r as occurring between two adjacent symbols, so that you get an item where the dot is "at" the l-r, or

-- something else?

It's something else — it's directly tied into the generation of the automaton states. Each automaton state contains a collection of possible places in the expansions of grammar rules that the state can represent. Following a terminal symbol T from a state A leads to a state B. A lookahead restriction prevents state B's collection from including expansions that would have been prohibited by the lookahead restriction on T. If that generates an inconsistency (for example, if there are two ways to get to an identical state, one with a lookahead restriction and one without), the grammar validation fails.

 Waldemar

On 02/07/2017 06:39, Michael Dyck wrote:
> On 17-02-06 07:32 PM, Waldemar Horwat wrote:
>> On 02/04/2017 07:20, Michael Dyck wrote:
>>> On 17-02-03 05:32 PM, Waldemar Horwat wrote:
>>>> On 02/03/2017 08:17, Michael Dyck wrote:
>>>>> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>>>>>
>>>>>> Lookahead restrictions fit very well into an LR(1) engine
>>>>>
>>>>> Again: Great, but how? E.g., do you pre-process the grammar, modify the
>>>>> construction of the automaton, and/or modify the operation of the parser?
>>>>
>>>> For each state × token combination, the automaton describes what happens
>>>> when you're in state S and see a token T.  The lookahead restrictions remove
>>>> possible transitions; without them there would be ambiguities where a given
>>>> state × token combination would want to do two incompatible things.
>>>
>>> Okay, so do you generate the automaton (ignoring lookahead restrictions)
>>> and then remove transitions (using lookahead restrictions)? Or do you
>>> integrate the lookahead-restrictions into the generation of the automaton?
>>
>> It's integrated.  You can't generate a valid automaton without the lookahead
>> restrictions.
>
> So, when you're making LR items for a production with a lookahead-restriction (l-r), do you:
>
> -- treat the l-r as a (nullable) symbol, so that you get an item with the dot before the l-r, and one with the dot after, or
>
> -- treat the l-r as occurring between two adjacent symbols, so that you get an item where the dot is "at" the l-r, or
>
> -- something else?

It's something else — it's directly tied into the generation of the automaton states.  Each automaton state contains a collection of possible places in the expansions of grammar rules that the state can represent.  Following a terminal symbol T from a state A leads to a state B.  A lookahead restriction prevents state B's collection from including expansions that would have been prohibited by the lookahead restriction on T.  If that generates an inconsistency (for example, if there are two ways to get to an identical state, one with a lookahead restriction and one without), the grammar validation fails.

     Waldemar

# Michael Dyck (8 years ago)

On 17-02-07 06:02 PM, Waldemar Horwat wrote:

Lookahead restrictions fit very well into an LR(1) engine [...] it's directly tied into the generation of the automaton states. Each automaton state contains a collection of possible places in the expansions of grammar rules that the state can represent. Following a terminal symbol T from a state A leads to a state B. A lookahead restriction prevents state B's collection from including expansions that would have been prohibited by the lookahead restriction on T. If that generates an inconsistency (for example, if there are two ways to get to an identical state, one with a lookahead restriction and one without), the grammar validation fails.

So that sounds like you form the closure of state A in the standard way, and then use the lookahead-restriction to limit the transitions out of A (and the items that A contributes to the kernel of B). But it seems to me that forming the closure is also affected by the lookahead-restriction. Is that true in your validator?

On 17-02-07 06:02 PM, Waldemar Horwat wrote:
>
> Lookahead restrictions fit very well into an LR(1) engine [...]
> it's directly tied into the generation of the automaton states. Each
> automaton state contains a collection of possible places in the
> expansions of grammar rules that the state can represent. Following a
> terminal symbol T from a state A leads to a state B. A lookahead
> restriction prevents state B's collection from including expansions that
> would have been prohibited by the lookahead restriction on T. If that
> generates an inconsistency (for example, if there are two ways to get to
> an identical state, one with a lookahead restriction and one without),
> the grammar validation fails.

So that sounds like you form the closure of state A in the standard way, and 
then use the lookahead-restriction to limit the transitions out of A (and 
the items that A contributes to the kernel of B). But it seems to me that 
forming the closure is also affected by the lookahead-restriction. Is that 
true in your validator?

-Michael

# Dmitry Soshnikov (8 years ago)

On Mon, Jan 23, 2017 at 10:46 PM, Dmitry Soshnikov < dmitry.soshnikov at gmail.com> wrote:

On Mon, Jan 23, 2017 at 5:24 PM, Waldemar Horwat <waldemar at google.com> wrote:

On 01/11/2017 10:28, Michael Dyck wrote:

In the past, it has been said (usually by Brendan Eich) that TC39 intends that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so much asking about the "1", but more about the "LR".)

If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <! terminals]) fit into the LR approach. It seems like there are three possibilities:

modify the grammar (before the parser is constructed),

modify the construction of the parser, and/or

modify the operation of the parser. It's not clear to me that any of these (or any combination of these) will work.

Has anyone algorithmically generated an LR parser from the spec grammar? If so, how did you you handle lookahead-restrictions?

I'm the source of the LR(1) condition. I've been doing that ever since ECMAScript Edition 3, and in fact am using the LR parser to help design and debug the spec. I have an implementation of the parser with a few extensions to the LR grammar, including support for parametrized productions, lookahead restrictions, and lexer-parser feedback used to disambiguate things such as what token / will start. I validate various things such as making sure that there is no place where both an identifier and a regular expression can occur.

Thanks for the confirmation.

lookahead restrictions

Yeah, all the lookahead restrictions, such as lookahead ∉ {{, function} add a lot of extra "No<Lookahead>" productions, which makes the grammar (and hence the parsing table) bigger unfortunately. Example is e.g. ExpressionNoIn, VariableDeclarationNoIn from the spec, and similar for ...NoLeftCurrly, and ...NoFunction to support those lookahead restrictions, and allow ExpressionStatement to be parsed without "reduce/reduce" conflicts.

That's why in the mentioned "sub-set" of the ECMAScript example language [1] I chose not to use the same syntax for function expression and function declaration: fn foo() {} for declaration, and just an arrow function for function-expressions :) -- no ambiguity, no lookahead restrictions, hence no larger grammar, hence a faster parser.

lexer-parser feedback

Yes, having a lexer/parser state solves it more elegantly.

FYI: I just built this as a proof of concept for ECMAScript's { and }, and have parsed normally { } in statement position as a BlockStatement, and in the expression position as an ObjectLiteral. With special "activation productions" one can change the lexer state, and yield different token types for the same characters.

See the example here: DmitrySoshnikov/syntax/blob/master/examples/parser-lexer-communication.g

And some details in the docs: DmitrySoshnikov/syntax#access-tokenizer-from-parser-semantic-actions

Dmitry

On Mon, Jan 23, 2017 at 10:46 PM, Dmitry Soshnikov <
dmitry.soshnikov at gmail.com> wrote:

>
> On Mon, Jan 23, 2017 at 5:24 PM, Waldemar Horwat <waldemar at google.com>
> wrote:
>
>> On 01/11/2017 10:28, Michael Dyck wrote:
>>
>>> In the past, it has been said (usually by Brendan Eich) that TC39
>>> intends that the ECMAScript grammar be LR(1). Is that still the case? (I'm
>>> not so much asking about the "1", but more about the "LR".)
>>>
>>> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
>>> terminals]) fit into the LR approach. It seems like there are three
>>> possibilities:
>>>   - modify the grammar (before the parser is constructed),
>>>   - modify the construction of the parser, and/or
>>>   - modify the operation of the parser.
>>> It's not clear to me that any of these (or any combination of these)
>>> will work.
>>>
>>> Has anyone algorithmically generated an LR parser from the spec grammar?
>>> If so, how did you you handle lookahead-restrictions?
>>>
>>
>> I'm the source of the LR(1) condition.  I've been doing that ever since
>> ECMAScript Edition 3, and in fact am using the LR parser to help design and
>> debug the spec.  I have an implementation of the parser with a few
>> extensions to the LR grammar, including support for parametrized
>> productions, lookahead restrictions, and lexer-parser feedback used to
>> disambiguate things such as what token / will start.  I validate various
>> things such as making sure that there is no place where both an identifier
>> and a regular expression can occur.
>>
>>
> Thanks for the confirmation.
>
> > lookahead restrictions
>
> Yeah, all the lookahead restrictions, such as `lookahead ∉ {{, function}`
> add a lot of extra "No<Lookahead>" productions, which makes the grammar
> (and hence the parsing table) bigger unfortunately. Example is e.g.
> `ExpressionNoIn`, `VariableDeclarationNoIn` from the spec, and similar for
> `...NoLeftCurrly`, and `...NoFunction` to support those lookahead
> restrictions, and allow `ExpressionStatement` to be parsed without
> "reduce/reduce" conflicts.
>
> That's why in the mentioned "sub-set" of the ECMAScript example language
> [1] I chose not to use the same syntax for function expression and function
> declaration: `fn foo() {}` for declaration, and just an arrow function for
> function-expressions :) -- no ambiguity, no lookahead restrictions, hence
> no larger grammar, hence a faster parser.
>
>  > lexer-parser feedback
>
> Yes, having a lexer/parser state solves it more elegantly.
>


FYI: I just built this as a proof of concept for ECMAScript's `{` and `}`,
and have parsed normally `{ }` in statement position as a `BlockStatement`,
and in the expression position as an `ObjectLiteral`. With special
"activation productions" one can change the lexer state, and yield
different token types for the same characters.

See the example here:
https://github.com/DmitrySoshnikov/syntax/blob/master/examples/parser-lexer-communication.g

And some details in the docs:
https://github.com/DmitrySoshnikov/syntax#access-tokenizer-from-parser-semantic-actions

Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170214/43fa084a/attachment.html>