MultiLineCommentChars and PostAsteriskCommentChars productions

# Darien Valentine (8 years ago)

I am curious about this lexical production, because if I understand
correctly, it seems to imply either backtracking or a lookahead that isn’t
made explicit.

```
MultiLineComment ::
  /* MultiLineCommentChars[opt] */

MultiLineCommentChars ::
  MultiLineNotAsteriskChar MultiLineCommentChars[opt]
  * PostAsteriskCommentChars[opt]

PostAsteriskCommentChars ::
  MultiLineNotForwardSlashOrAsteriskChar MultiLineCommentChars[opt]
  * PostAsteriskCommentChars[opt]

MultiLineNotAsteriskChar ::
  SourceCharacter but not *

MultiLineNotForwardSlashOrAsteriskChar ::
  SourceCharacter but not one of / or *
```

In the second choices within PostAsteriskCommentChars and
MultiLineCommentChars, the appearance of PostAsteriskCommentChars following
the literal asterisk is optional. Because of this, a naive match will be
made for PostAsteriskCommentChars against the `*` of a terminal `*/` of the
MultiLineComment.

While this is not ultimately ambiguous because, having made that match, the
next attempt will fail and we can backtrack one step to find another way
out; or, more realistically, an implementation would look ahead at whether
the next character (after "*") is `/` before deciding that
PostAsteriskCommentChars/2 should _really_ be matched. However it seems
unusual that the grammar is written this way since elsewhere the grammar
seems to carefully avoid implied backtracking, and lookaheads are rare and
explicit.

This section of the lexical grammar has remained unchanged since ES1, so it
may be that it just hasn’t been revisited as the grammar specification
style developed. Am I reading it correctly and, if so, is it actually
unusual, or does this sort of thing fall within the boundaries of how the
grammar is normally described?
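
For illustration, here is a minimal sketch of the "look ahead after `*`" strategy described above, written as a hand-rolled scanner in Python. It is not taken from the spec or from any engine; the function name `match_multi_line_comment` is hypothetical, and the loop stands in for the right-recursive productions.

```python
from typing import Optional


def match_multi_line_comment(src: str, pos: int = 0) -> Optional[int]:
    """Return the index just past a MultiLineComment starting at pos, or None."""
    if not src.startswith("/*", pos):
        return None
    i = pos + 2
    n = len(src)
    while i < n:
        if src[i] != "*":
            # MultiLineNotAsteriskChar MultiLineCommentChars[opt]
            i += 1
        elif i + 1 < n and src[i + 1] == "/":
            # The implicit lookahead: this `*` belongs to the closing `*/`,
            # so it must not be consumed as a comment character.
            return i + 2
        else:
            # `*` PostAsteriskCommentChars[opt]: an asterisk inside the body.
            i += 1
    return None  # input ended before a closing `*/`
```

The only place the scanner peeks at a second character is the `elif` branch, which is exactly the decision the grammar leaves implicit.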

# Isiah Meadows (8 years ago)

It's a single-character lookahead, which is sufficient for an LR(1)
language. All it does is validate that `/* * */` is a complete block
comment, but not `/* *` or `/* * /`.
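
The three cases can be checked with a regular expression whose shape follows the productions. This is a sketch under assumptions: the pattern is my own transcription of the grammar, not something stated in the spec, and `is_complete_block_comment` is a hypothetical helper name.

```python
import re

# [^*]       ~ MultiLineNotAsteriskChar
# \*+[^*/]   ~ a run of `*` followed by MultiLineNotForwardSlashOrAsteriskChar
# \*+/       ~ any trailing asterisks plus the closing `*/`
MULTI_LINE_COMMENT = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")


def is_complete_block_comment(text: str) -> bool:
    """True if text is exactly one complete block comment."""
    return MULTI_LINE_COMMENT.fullmatch(text) is not None


for sample in ("/* * */", "/* *", "/* * /"):
    print(repr(sample), is_complete_block_comment(sample))
```

Only the first sample is accepted; the other two never reach a closing `*/`.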


# Darien Valentine (8 years ago)

Thanks, Isiah. The grammar in the abstract made sense to me, but the lack
of an explicit lookahead notice seemed anomalous. For example,
TemplateCharacter is written with the explicit lookahead for "{":

```
TemplateCharacter ::
  $ [lookahead ≠ {]
  \ EscapeSequence
  LineContinuation
  LineTerminatorSequence
  SourceCharacter but not one of ` or \ or $ or LineTerminator
```

Perhaps this is only owing to the need to distinguish TemplateMiddle
from TemplateTail, though. Since MultiLineComment and TemplateMiddle are
the only lexical productions which have a multi-character termination
sequence whose first character might appear anywhere prior, maybe that
alone accounts for why I couldn’t turn up any other examples where a
negative lookahead at the lexical level was left implicit — in which case I
suppose it’s not inconsistent after all. But FWIW, I think the explicit
form adds clarity.
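
To make the contrast concrete, here is a rough Python sketch of a matcher for a single TemplateCharacter, where the `$ [lookahead ≠ {]` restriction shows up as an explicit check. The function name is hypothetical, and the escape handling is a deliberate simplification (an assumption, not the spec's EscapeSequence or LineContinuation rules).

```python
from typing import Optional


def match_template_character(src: str, pos: int) -> Optional[int]:
    """Return the index after one TemplateCharacter at pos, or None."""
    if pos >= len(src):
        return None
    ch = src[pos]
    if ch == "$":
        # $ [lookahead ≠ {]: the production itself forbids a following `{`.
        if pos + 1 < len(src) and src[pos + 1] == "{":
            return None
        return pos + 1
    if ch == "\\":
        # Simplification (assumption): a backslash plus any one character
        # stands in for both `\ EscapeSequence` and LineContinuation.
        return pos + 2 if pos + 1 < len(src) else None
    if ch == "`":
        return None  # a backtick ends the template, it is not a character of it
    if src.startswith("\r\n", pos):
        return pos + 2  # LineTerminatorSequence <CR><LF>
    # Any other line terminator, or an ordinary SourceCharacter.
    return pos + 1
```

Here the `$` branch corresponds to a restriction written into the production, whereas in MultiLineComment the equivalent check after `*` has to be inferred from how the productions interact.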


# Michael Dyck (8 years ago)

On 17-04-09 05:13 PM, Darien Valentine wrote:
> I am curious about this lexical production, because if I understand
> correctly, it seems to imply either backtracking or a lookahead that isn’t
> made explicit.

Yes, depending on your parsing technique.

> ..., a naive match will be made for PostAsteriskCommentChars against
> the `*` of a terminal `*/` of the MultiLineComment.
>
> While this is not ultimately ambiguous because, having made that match, the
> next attempt will fail and we can backtrack one step to find another way
> out; or, more realistically, an implementation would look ahead at whether
> the next character (after "*") is `/` before deciding that
> PostAsteriskCommentChars/2 should _really_ be matched.

In a bottom-up parser, one would say that, with a next symbol (i.e., 
character) of '*', there is a shift-reduce conflict that cannot be resolved 
by that symbol alone. Instead, two symbols of lookahead are required.

> However it seems unusual that the grammar is written this way since
> elsewhere the grammar seems to carefully avoid implied backtracking,
> and lookaheads are rare and explicit.

Apparently you're referring to phrases like "[lookahead != foo]". You 
shouldn't think of these as "lookaheads". The spec doesn't have a name for 
these, but I call them "lookahead-restrictions". Each is a restriction on 
the applicability of a production, based on the next one or two symbols. 
They do indicate places where a parser would need to employ lookahead to 
make a decision, but there are many other such places (such as the one you 
noted above), where the need for lookahead is simply a consequence of the 
interaction of productions, and is not called out explicitly in the grammar.

-Michael
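
As an illustration of "depending on your parsing technique": the comment language is regular, so a small state machine can recognize it one character at a time with neither backtracking nor a peek at the next character. The sketch below is an assumption-laden example, not how any particular engine does it; the function name and state labels are hypothetical.

```python
def is_multi_line_comment(text: str) -> bool:
    """True if text is exactly one complete MultiLineComment."""
    if not text.startswith("/*"):
        return False
    state = "body"  # inside the comment, previous character was not `*`
    for ch in text[2:]:
        if state == "done":
            return False  # characters after the closing `*/`
        if state == "body":
            state = "star" if ch == "*" else "body"
        else:
            # state == "star": one or more `*` have just been consumed
            if ch == "/":
                state = "done"
            elif ch != "*":
                state = "body"
    return state == "done"
```

The "star" state remembers that an asterisk was just seen, which plays the role that the extra symbol of lookahead plays in a bottom-up parser driven directly by the grammar.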

# Darien Valentine (8 years ago)

Thanks for explaining, Michael. That does make sense; there are indeed
plenty of places where such distinctions must be made for syntactic
productions. I’m not sure why I assumed the lexical grammar was special
in this regard; it just happens that the only other similar case has
the explicit direction for different reasons that don’t apply here.