Nested Quasis

# Erik Arvidsson (14 years ago)

Under the open issues for Quasi Literals, harmony:quasis#nesting , the topic of nesting is brought up.

After implementing Quasi Literals in Traceur it is clear that supporting nested quasi literals is easier than not supporting them. What is the argument for not supporting nesting? Can we resolve this?

# Brendan Eich (14 years ago)

This is really relaxing the (over-restrictive, IMHO) design that says that for ${expr} in a quasi, expr can be only a very few forms such as Identifier, Identifier '.' IdentifierName, and the like -- right?

I.e. if we allow expr to be the grammar's Expression non-terminal, then it follows that quasi-literals nest inside ${...} in outer quasis. +1 on this.

# Erik Arvidsson (14 years ago)

On Sat, Jan 28, 2012 at 16:07, Brendan Eich <brendan at mozilla.org> wrote:

This is really relaxing the (over-restrictive, IMHO) design that says that for ${expr} in a quasi, expr can be only a very few forms such as Identifier, Identifier '.' IdentifierName, and the like -- right?

Yes. Only allowing IdentifierExpression and MemberLookup is too restrictive. A realistic use case is to allow binary operator and once you allow that it makes sense to allow any expression which includes other quasi literals which leads to the conclusion that nested quasi literals should be allowed.

# Mark S. Miller (14 years ago)

On Sat, Jan 28, 2012 at 5:54 PM, Erik Arvidsson <erik.arvidsson at gmail.com> wrote:

Under the open issues for Quasi Literals, harmony:quasis#nesting , the topic of nesting is brought up.

After implementing Quasi Literals in Traceur it is clear that supporting nested quasi literals is easier than not supporting them.

+1000. Quasis as originally proposed had no such restriction. The Unicorns example at harmony:quasis#nesting

is I think fairly representative of what will become a common kind of use case -- unless we cripple quasis. I would be interested in seeing what this code looks like when refactored to live within this restriction.

In E we have quasis that are somewhat similar and somewhat different. But we make much use of the ability to place arbitrary expressions within the dollar-hole, including nested quasis. I think our quasis as well should allow any expression. The issue is not just nested quasis.

# Waldemar Horwat (14 years ago)

On 01/28/2012 02:54 PM, Erik Arvidsson wrote:

Under the open issues for Quasi Literals, harmony:quasis#nesting , the topic of nesting is brought up.

After implementing Quasi Literals in Traceur it is clear that supporting nested quasi literals is easier than not supporting them. What is the argument for not supporting nesting? Can we resolve this?

This has been hashed out in committee before. Do you have a solution to the grammar problems, such as having a full ECMAScript parser inside the lexer? You can't just count parentheses because that breaks regexps.
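To make the regexp concern concrete, here is a minimal Python sketch. It is not from the proposal; the function name and the sample quasi are made up for illustration, and it counts curly braces rather than parentheses because the holes under discussion are written ${...}. A scanner that only counts delimiters pairs the opening brace with the "}" inside the regexp literal instead of the one that really closes the hole:

```python
def naive_hole_end(source: str, open_index: int) -> int:
    """Return the index of the '}' that plain counting pairs with source[open_index]."""
    depth = 0
    for i in range(open_index, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces")


# Hypothetical quasi whose hole contains a regexp literal that matches "}".
quasi = "`a${ /}/.test(s) }b`"
open_index = quasi.index("{")

naive_end = naive_hole_end(quasi, open_index)   # stops inside the regexp
intended_end = quasi.rindex("}")                # the brace that closes the hole
```

Telling the two braces apart requires knowing that the "/" starts a regexp rather than a division, which is exactly the decision the ECMAScript lexer cannot make without help from the parser.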

 Waldemar
# Allen Wirfs-Brock (14 years ago)

On Jan 31, 2012, at 2:36 PM, Waldemar Horwat wrote:

On 01/28/2012 02:54 PM, Erik Arvidsson wrote:

Under the open issues for Quasi Literals, harmony:quasis#nesting , the topic of nesting is brought up.

After implementing Quasi Literals in Traceur it is clear that supporting nested quasi literals is easier than not supporting them. What is the argument for not supporting nesting? Can we resolve this?

This has been hashed out in committee before. Do you have a solution to the grammar problems, such as having a full ECMAScript parser inside the lexer? You can't just count parentheses because that breaks regexps.

I would think the solution to this is pretty straightforward. Basically, a Quasi is not a single token. The grammar in the proposal can almost be read that way right now. It should only take a little cleanup to factor it into a pure lexical part and a syntactic part. A few [no whitespace here] tokens will probably be needed.

# Mike Samuel (14 years ago)

2012/1/31 Allen Wirfs-Brock <allen at wirfs-brock.com>:

On Jan 31, 2012, at 2:36 PM, Waldemar Horwat wrote:

On 01/28/2012 02:54 PM, Erik Arvidsson wrote:

Under the open issues for Quasi Literals, harmony:quasis#nesting , the topic of nesting is brought up.

After implementing Quasi Literals in Traceur it is clear that supporting nested quasi literals is easier than not supporting them. What is the argument for not supporting nesting? Can we resolve this?

This has been hashed out in committee before.  Do you have a solution to the grammar problems, such as having a full ECMAScript parser inside the lexer?  You can't just count parentheses because that breaks regexps.

I would think the solution to this is pretty straightforward.  Basically, a Quasi is not a single token.   the grammar in the proposal can almost be read that way right now.   It should only take a little cleanup to factor it into a pure lexical part and a syntactic part. A few [no whitespace here] tokens will probably be needed

I addressed this at js-quasis-libraries-and-repl.googlecode.com/svn/trunk/tokenize.html

# Allen Wirfs-Brock (14 years ago)

On Jan 31, 2012, at 4:06 PM, Mike Samuel wrote:

I would think the solution to this is pretty straightforward. Basically, a Quasi is not a single token. the grammar in the proposal can almost be read that way right now. It should only take a little cleanup to factor it into a pure lexical part and a syntactic part. A few [no whitespace here] tokens will probably be needed

I addressed this at js-quasis-libraries-and-repl.googlecode.com/svn/trunk/tokenize.html

A more direct link to this from the Quasis ecmascript.org wiki page would be helpful. The only current link goes directly to the demo shell.

# Waldemar Horwat (14 years ago)

On 01/31/2012 03:04 PM, Allen Wirfs-Brock wrote:

On Jan 31, 2012, at 2:36 PM, Waldemar Horwat wrote:

On 01/28/2012 02:54 PM, Erik Arvidsson wrote:

Under the open issues for Quasi Literals, harmony:quasis#nesting , the topic of nesting is brought up.

After implementing Quasi Literals in Traceur it is clear that supporting nested quasi literals is easier than not supporting them. What is the argument for not supporting nesting? Can we resolve this?

This has been hashed out in committee before. Do you have a solution to the grammar problems, such as having a full ECMAScript parser inside the lexer? You can't just count parentheses because that breaks regexps.

I would think the solution to this is pretty straightforward. Basically, a Quasi is not a single token. the grammar in the proposal can almost be read that way right now. It should only take a little cleanup to factor it into a pure lexical part and a syntactic part.

I'd love to see this little cleanup. I thought about it for a while and couldn't come up with it myself; I'm not sure it can even be done.

 Waldemar
# Allen Wirfs-Brock (14 years ago)

On Feb 1, 2012, at 11:28 AM, Waldemar Horwat wrote:

On 01/31/2012 03:04 PM, Allen Wirfs-Brock wrote:

On Jan 31, 2012, at 2:36 PM, Waldemar Horwat wrote:

On 01/28/2012 02:54 PM, Erik Arvidsson wrote:

Under the open issues for Quasi Literals, harmony:quasis#nesting , the topic of nesting is brought up.

After implementing Quasi Literals in Traceur it is clear that supporting nested quasi literals is easier than not supporting them. What is the argument for not supporting nesting? Can we resolve this?

This has been hashed out in committee before. Do you have a solution to the grammar problems, such as having a full ECMAScript parser inside the lexer? You can't just count parentheses because that breaks regexps.

I would think the solution to this is pretty straightforward. Basically, a Quasi is not a single token. the grammar in the proposal can almost be read that way right now. It should only take a little cleanup to factor it into a pure lexical part and a syntactic part.

I'd love to see this little cleanup. I thought about it for a while and couldn't come up with it myself; I'm not sure it can even be done.

Was there some particular issue you were running into?

# Mike Samuel (14 years ago)

2012/2/1 Waldemar Horwat <waldemar at google.com>:

On 01/31/2012 03:04 PM, Allen Wirfs-Brock wrote:

On Jan 31, 2012, at 2:36 PM, Waldemar Horwat wrote:

On 01/28/2012 02:54 PM, Erik Arvidsson wrote:

Under the open issues for Quasi Literals, harmony:quasis#nesting , the topic of nesting is brought up.

After implementing Quasi Literals in Traceur it is clear that supporting nested quasi literals is easier than not supporting them. What is the argument for not supporting nesting? Can we resolve this?

This has been hashed out in committee before.  Do you have a solution to the grammar problems, such as having a full ECMAScript parser inside the lexer?  You can't just count parentheses because that breaks regexps.

I would think the solution to this is pretty straightforward.  Basically, a Quasi is not a single token.   the grammar in the proposal can almost be read that way right now.   It should only take a little cleanup to factor it into a pure lexical part and a syntactic part.

I'd love to see this little cleanup.  I thought about it for a while and couldn't come up with it myself; I'm not sure it can even be done.

What should I put in the proposal? A delta to the lexical grammar?

# Allen Wirfs-Brock (14 years ago)

On Feb 1, 2012, at 12:12 PM, Mike Samuel wrote:

2012/2/1 Waldemar Horwat <waldemar at google.com>:

On 01/31/2012 03:04 PM, Allen Wirfs-Brock wrote:

I would think the solution to this is pretty straightforward. Basically, a Quasi is not a single token. the grammar in the proposal can almost be read that way right now. It should only take a little cleanup to factor it into a pure lexical part and a syntactic part.

I'd love to see this little cleanup. I thought about it for a while and couldn't come up with it myself; I'm not sure it can even be done.

What should I put in the proposal? A delta to the lexical grammar?

I expect that what we will ultimately end up with is some token additions to the lexical grammar and some new syntactic grammar productions that put those tokens together into complete Quasis. If you want to work on a first cut at those it would be great. Otherwise, I'll need to do the work when I start editing Quasis into the actual specification.

# Waldemar Horwat (14 years ago)

On 02/01/2012 11:35 AM, Allen Wirfs-Brock wrote:

On Feb 1, 2012, at 11:28 AM, Waldemar Horwat wrote:

On 01/31/2012 03:04 PM, Allen Wirfs-Brock wrote:

On Jan 31, 2012, at 2:36 PM, Waldemar Horwat wrote:

On 01/28/2012 02:54 PM, Erik Arvidsson wrote:

Under the open issues for Quasi Literals, harmony:quasis#nesting , the topic of nesting is brought up.

After implementing Quasi Literals in Traceur it is clear that supporting nested quasi literals is easier than not supporting them. What is the argument for not supporting nesting? Can we resolve this?

This has been hashed out in committee before. Do you have a solution to the grammar problems, such as having a full ECMAScript parser inside the lexer? You can't just count parentheses because that breaks regexps.

I would think the solution to this is pretty straightforward. Basically, a Quasi is not a single token. the grammar in the proposal can almost be read that way right now. It should only take a little cleanup to factor it into a pure lexical part and a syntactic part.

I'd love to see this little cleanup. I thought about it for a while and couldn't come up with it myself; I'm not sure it can even be done.

Was there some particular issue you were running into?

Here's one which I couldn't express in a lexer grammar: How to restart the quasi after an included expression is over.

 Waldemar
# Allen Wirfs-Brock (14 years ago)

On Feb 1, 2012, at 5:33 PM, Waldemar Horwat wrote:

On 02/01/2012 11:35 AM, Allen Wirfs-Brock wrote:

On Feb 1, 2012, at 11:28 AM, Waldemar Horwat wrote:

On 01/31/2012 03:04 PM, Allen Wirfs-Brock wrote:

On Jan 31, 2012, at 2:36 PM, Waldemar Horwat wrote:

On 01/28/2012 02:54 PM, Erik Arvidsson wrote:

Under the open issues for Quasi Literals, harmony:quasis#nesting , the topic of nesting is brought up.

After implementing Quasi Literals in Traceur it is clear that supporting nested quasi literals is easier than not supporting them. What is the argument for not supporting nesting? Can we resolve this?

This has been hashed out in committee before. Do you have a solution to the grammar problems, such as having a full ECMAScript parser inside the lexer? You can't just count parentheses because that breaks regexps.

I would think the solution to this is pretty straightforward. Basically, a Quasi is not a single token. the grammar in the proposal can almost be read that way right now. It should only take a little cleanup to factor it into a pure lexical part and a syntactic part.

I'd love to see this little cleanup. I thought about it for a while and couldn't come up with it myself; I'm not sure it can even be done.

Was there some particular issue you were running into?

Here's one which I couldn't express in a lexer grammar: How to restart the quasi after an included expression is over.

I wouldn't, because I wouldn't produce the complete quasi as a single token. I would leave it up to the syntactic grammar to assemble the quasi pieces and inclusion expressions into a complete unit.

# Douglas Crockford (14 years ago)

On 11:59 AM, Waldemar Horwat wrote:

On 02/01/2012 11:35 AM, Allen Wirfs-Brock wrote: Here's one which I couldn't express in a lexer grammar: How to restart the quasi after an included expression is over.

If quasis are not nested, then the lexical rule is really simple: Just match the `s, and within the literal, match the {}s.

I would prefer to keep it simple, unless there is a compelling requirement to provide nesting. If we do the simple version now, we could allow the nested case in the future.

# Mark S. Miller (14 years ago)

On Thu, Feb 2, 2012 at 5:09 AM, Douglas Crockford <douglas at crockford.com> wrote:

On 11:59 AM, Waldemar Horwat wrote:

On 02/01/2012 11:35 AM, Allen Wirfs-Brock wrote: Here's one which I couldn't express in a lexer grammar: How to restart the quasi after an included expression is over.

If quasis are not nested, then the lexical rule is really simple: Just match the `s, and within the literal, match the {}s.

I would prefer to keep it simple, unless there is a compelling requirement to provide nesting. If we do the simple version now, we could allow the nested case in the future.

When we came up with this "simplification", I thought I could live with it. Now, having tried to write some examples within these restrictions, I find it unusable.

I think we're overestimating the parsing difficulty. I'll let Mike speak for the real plan. But I'd like to explain what I do in E, so that we can see that none of this need be complicated. It does involve an interaction between the parsing and lexing levels, but much less complex than you may expect, and comparable (IMO less) than the existing unclean interaction that JS already has:

Lexing grammar has four new token types.

QuasiOnly ::

    ` QuasiChar* `

QuasiOpen ::

    ` QuasiChar* $

QuasiMiddle ::

    QuasiChar*

QuasiEnd ::

    QuasiChar `

Parsing grammar:

quasiExpr :

    Identifier? quasiExprLiteral

quasiExprLiteral :

    QuasiOnly

    QuasiOpen quasiHole (QuasiMiddle quasiHole)* QuasiClose

quasiHole :

    Identifier

    curlyBalancedTokenSequence

curlyBalancedTokenSequence :

    { expr }

The key thing is that the curlyBalancedTokenSequence starts a normal lexical expression context and counts curlies. When it sees a "}" token that matches its opening "{", the curlyBalancedTokenSequence is done, and we proceed to continue lexing QuasiChar* until we've lexed a QuasiMiddle or QuasiEnd.

Of course, if you don't need to keep your parser and lexer so strongly separated, you can just use the above grammar directly as a one-level grammar, where you use the full expression parser after the "{". This is what I did the first time in E. Either way works. The reason I changed to the looser coupling is so that I could fully lex a program that didn't parse, so I could give more informative error messages.

# Mark S. Miller (14 years ago)

On Thu, Feb 2, 2012 at 11:03 AM, Mark S. Miller <erights at google.com> wrote:

Of course, if you don't need to keep your parser and lexer so strongly separated, you can just use the above grammar directly as a one-level grammar, where you use the full expression parser after the "{". This is what I did the first time in E. Either way works. The reason I changed to the looser coupling is so that I could fully lex a program that didn't parse, so I could give more informative error messages.

This loose coupling is also exactly what we want for syntax highlighting. Syntax highlighting mainly (always?) distinguishes lexical categories, so we want it to be accurate for a program with only parse errors.

# Waldemar Horwat (14 years ago)

On 02/02/2012 11:03 AM, Mark S. Miller wrote:

On Thu, Feb 2, 2012 at 5:09 AM, Douglas Crockford <douglas at crockford.com> wrote:

On 11:59 AM, Waldemar Horwat wrote:

    On 02/01/2012 11:35 AM, Allen Wirfs-Brock wrote:
    Here's one which I couldn't express in a lexer grammar: How to restart
    the quasi after an included expression is over.


If quasis are not nested, then the lexical rule is really simple: Just match the `s, and within the literal, match the {}s.

I would prefer to keep it simple, unless there is a compelling requirement to provide nesting. If we do the simple version now, we could allow the nested case in the future.

When we came up with this "simplification", I thought I could live with it. Now, having tried to write some examples within these restrictions, I find it unusable.

I think we're overestimating the parsing difficulty. I'll let Mike speak for the real plan. But I'd like to explain what I do in E, so that we can see that none of this need be complicated. It does involve an interaction between the parsing and lexing levels, but much less complex than you may expect, and comparable (IMO less) than the existing unclean interaction that JS already has:

Lexing grammar has four new token types.

 QuasiOnly ::

     ` QuasiChar* `

 QuasiOpen ::

     ` QuasiChar* $

 QuasiMiddle ::

     QuasiChar*

 QuasiEnd ::

     QuasiChar `

(presumably you forgot a * in QuasiEnd?)

That's not a valid lexer grammar. The input

if

is now ambiguous -- it can lex as either a keyword or a QuasiMiddle. The input

3+`

will now lex as QuasiEnd, which may or may not be what you want.

 Waldemar
# Mark S. Miller (14 years ago)

On Thu, Feb 2, 2012 at 11:27 AM, Waldemar Horwat <waldemar at google.com> wrote:

On 02/02/2012 11:03 AM, Mark S. Miller wrote:

On Thu, Feb 2, 2012 at 5:09 AM, Douglas Crockford <douglas at crockford.com> wrote:

On 11:59 AM, Waldemar Horwat wrote:

On 02/01/2012 11:35 AM, Allen Wirfs-Brock wrote: Here's one which I couldn't express in a lexer grammar: How to restart the quasi after an included expression is over.

If quasis are not nested, then the lexical rule is really simple: Just match the `s, and within the literal, match the {}s.

I would prefer to keep it simple, unless there is a compelling requirement to provide nesting. If we do the simple version now, we could allow the nested case in the future.

When we came up with this "simplification", I thought I could live with it. Now, having tried to write some examples within these restrictions, I find it unusable.

I think we're overestimating the parsing difficulty. I'll let Mike speak for the real plan. But I'd like to explain what I do in E, so that we can see that none of this need be complicated. It does involve an interaction between the parsing and lexing levels, but much less complex than you may expect, and comparable (IMO less) than the existing unclean interaction that JS already has:

Lexing grammar has four new token types.

QuasiOnly ::

    ` QuasiChar* `

QuasiOpen ::

    ` QuasiChar* $

QuasiMiddle ::

    QuasiChar*

QuasiEnd ::

    QuasiChar `

(presumably you forgot a * in QuasiEnd?)

y. I also messed up one more thing:

 QuasiMiddle ::

     QuasiChar* $

Sorry for the confusion.

That's not a valid lexer grammar.

I didn't explain well enough. QuasiMiddle and QuasiEnd apply only after a quasiHole, and they apply immediately after a quasiHole. That's the complexity I was referring to: it introduces yet another lexing context, and the determination about whether we're in that lexing context demands counting curlies -- which a regular expression can't do.

The input

if

is now ambiguous -- it can lex as either a keyword or a QuasiMiddle.

If it occurs immediately after a quasiHole, then it is a QuasiMiddle or QuasiEnd, depending on whether it is terminated by a $ or `. (See correction above).

The input

3+`

will now lex as QuasiEnd, which may or may not be what you want.

Only if after a quasiHole.

# John Tamplin (14 years ago)

I think this could take the same approach as Dart in dealing with embedded expressions - code.google.com/p/dart/source/browse/branches/bleeding_edge/dart/compiler/java/com/google/dart/compiler/parser/DartScanner.java?r=1805#898

Basically, the scanner returns token sequences like:

    "foo" => STRING(foo)
    "foo $bar baz" => STRING_SEGMENT(foo ) STRING_EMBED_EXPR_START IDENTIFIER(bar) STRING_EMBED_EXPR_END

which the parser then handles normally, with rules like:

    string-expression : STRING | string-interpolation ;
    string-interpolation : ( STRING_SEGMENT? embedded-exp? )* STRING_LAST_SEGMENT ; // a simplification
    embedded-exp : STRING_EMBED_EXP_START expression STRING_EMBED_EXP_END ;

I don't know if this would cause ambiguities in the JS grammar, or if it would have other issues applying it to quasis in JS, but it keeps a clean separation between the scanner and parser (it does require some additional state in the scanner, since these can be nested).
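As a rough illustration of that scanner shape, here is a small Python sketch. It is not the Dart scanner: the function names, the simplified identifier pattern, the catch-all PUNCT token and the use of STRING_LAST_SEGMENT for the trailing text are all assumptions made for this example. The "additional state" needed for nesting is carried by the recursion between the two functions.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # simplified identifier (assumption)


def scan_string(src, i, out):
    """Scan a double-quoted string starting at src[i]; return the index after it."""
    i += 1  # skip the opening quote
    segment, embedded = [], False
    while i < len(src) and src[i] != '"':
        if src[i] == "$":
            m = IDENT.match(src, i + 1)
            if m or src.startswith("{", i + 1):
                out.append(("STRING_SEGMENT", "".join(segment)))
                out.append(("STRING_EMBED_EXPR_START", ""))
                segment, embedded = [], True
                if m:  # $identifier form
                    out.append(("IDENTIFIER", m.group()))
                    i = m.end()
                else:  # ${ ... } form: hand over to the ordinary token scanner
                    i = scan_code(src, i + 2, out)
                out.append(("STRING_EMBED_EXPR_END", ""))
                continue
        segment.append(src[i])
        i += 1
    if i >= len(src):
        raise ValueError("unterminated string")
    out.append(("STRING_LAST_SEGMENT" if embedded else "STRING", "".join(segment)))
    return i + 1


def scan_code(src, i, out):
    """Scan ordinary tokens up to the '}' that closes the embedded expression."""
    depth = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            i = scan_string(src, i, out)  # nested string, possibly with its own holes
        elif ch == "}" and depth == 0:
            return i + 1
        else:
            m = IDENT.match(src, i)
            if m:
                out.append(("IDENTIFIER", m.group()))
                i = m.end()
            else:
                depth += {"{": 1, "}": -1}.get(ch, 0)
                out.append(("PUNCT", ch))
                i += 1
    raise ValueError("unterminated embedded expression")


def scan(src):
    out = []
    scan_string(src, 0, out)
    return out
```

With these assumptions, scan('"foo $bar baz"') yields the sequence shown above followed by a STRING_LAST_SEGMENT for " baz", and a string nested inside a ${...} expression is scanned by the same code.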

# Waldemar Horwat (14 years ago)

OK. This introduces yet another lexing context, in which all productions except QuasiMiddle and QuasiEnd are disallowed, and white space and comment handling is funny. That works if the expressions must be one of the two forms:

    $id
    ${expr}

Is that the exhaustive list, or are we looking at other forms such as $$, $id.id, $id[expr], etc.?

 Waldemar
# Mark S. Miller (14 years ago)

On Thu, Feb 2, 2012 at 2:00 PM, Waldemar Horwat <waldemar at google.com> wrote:

OK. This introduces yet another lexing context, in which all productions except QuasiMiddle and QuasiEnd are disallowed, and white space and comment handling is funny. That works if the expressions must be one of the two forms:

$id ${expr}

Is that the exhaustive list, or are we looking at other forms such as $$, $id.id, $id[expr], etc.?

I'll let Mike speak for the details of what he really wants to propose. But here are the answers from E:

escapes with the quasi literal text are taken care of by the QuasiChar production, much like the existing definition of DoubleStringCharacter:

QuasiChar ::
    SourceCharacter but not one of $ or `
    $ $
    $ `
    $ \ EscapeSequence

So that `$$` === "$", `$`` === "`", and `$\n` === "\n", respectively.

Regarding ...$id.id... and ...$id[expr]..., only the first id in each case is in the quasiHole. All the text afterwards is part of the QuasiClose.

# Mark S. Miller (14 years ago)

On Thu, Feb 2, 2012 at 4:15 PM, Mark S. Miller <erights at google.com> wrote:

On Thu, Feb 2, 2012 at 2:00 PM, Waldemar Horwat <waldemar at google.com> wrote:

OK. This introduces yet another lexing context, in which all productions except QuasiMiddle and QuasiEnd are disallowed, and white space and comment handling is funny. That works if the expressions must be one of the two forms:

$id ${expr}

Is that the exhaustive list, or are we looking at other forms such as $$, $id.id, $id[expr], etc.?

I'll let Mike speak for the details of what he really wants to propose. But here are the answers from E:

escapes with the quasi literal text are taken care of by the QuasiChar production, much like the existing definition of

escapes within ...

# Waldemar Horwat (14 years ago)

On 02/02/2012 04:15 PM, Mark S. Miller wrote:

On Thu, Feb 2, 2012 at 2:00 PM, Waldemar Horwat <waldemar at google.com> wrote:

OK.  This introduces yet another lexing context, in which all productions *except* QuasiMiddle and QuasiEnd are disallowed, and white space and comment handling is funny.  That works if the expressions must be one of the two forms:

$id
${expr}

Is that the exhaustive list, or are we looking at other forms such as $$, $id.id, $id[expr], etc.?

I'll let Mike speak for the details of what he really wants to propose. But here are the answers from E:

escapes with the quasi literal text are taken care of by the QuasiChar production, much like the existing definition of DoubleStringCharacter:

 QuasiChar ::
     SourceCharacter but not one of $ or `
     $ $
     $ `
     $ \ EscapeSequence

So that $$ === "$", $`` === "", and $\n === "\n", respectively.

Regarding ...$id.id... and ...$id[expr]..., only the first id in each case in in the quasiHole. All the text afterwards is part of the QuasiClose.

Good. I'll have to think about this a bit more, but there's a chance you converted me.

 Waldemar
# Waldemar Horwat (14 years ago)

On 02/02/2012 06:27 PM, Waldemar Horwat wrote:

On 02/02/2012 04:15 PM, Mark S. Miller wrote:

On Thu, Feb 2, 2012 at 2:00 PM, Waldemar Horwat <waldemar at google.com> wrote:

OK. This introduces yet another lexing context, in which all productions except QuasiMiddle and QuasiEnd are disallowed, and white space and comment handling is funny. That works if the expressions must be one of the two forms:

$id ${expr}

Is that the exhaustive list, or are we looking at other forms such as $$, $id.id, $id[expr], etc.?

I'll let Mike speak for the details of what he really wants to propose. But here are the answers from E:

escapes with the quasi literal text are taken care of by the QuasiChar production, much like the existing definition of DoubleStringCharacter:

QuasiChar ::
    SourceCharacter but not one of $ or `
    $ $
    $ `
    $ \ EscapeSequence

So that $$ === "$", $`` === "", and $\n === "\n", respectively.

Regarding ...$id.id... and ...$id[expr]..., only the first id in each case in in the quasiHole. All the text afterwards is part of the QuasiClose.

Good. I'll have to think about this a bit more, but there's a chance you converted me.

Note that this is more complex than just having the parser switch modes for the treatment of / as division vs. regexp. Here comments and white space are also affected, which can turn the structure of the lexer upside down. The kinds of cases I'm thinking of are:

    `abc$/*comment*/identifier//
    `

(here we have a /**/ comment and a // comment)

    `abc$/**/{/**//re//**/}/**/def`

vs:

    `abc$/**/{/**//re//**/}/*def`

(in the former all four "/**/"'s are comments. Not sure what the latter would do.)

    `abc$id def`
    `abc$ id def`

(the lexer removes spaces before all tokens, so the quasi would not contain a space before the "def")

 Waldemar
# Mark S. Miller (14 years ago)

On Fri, Feb 3, 2012 at 12:58 PM, Waldemar Horwat <waldemar at google.com> wrote:

On 02/02/2012 06:27 PM, Waldemar Horwat wrote:

[...]

Note that this is more complex than just having the parser switch modes for the treatment of / as division vs. regexp. Here comments and white space are also affected, which can turn the structure of the lexer upside down. The kinds of cases I'm thinking of are:

abc$/*comment*/identifier// (here we have a /**/ comment and a // comment)

There is no valid quasiHole above, so the whole thing matches a QuasiOnly. The QuasiOnly includes all characters between the backticks. Nothing is taken to be a comment, just like it wouldn't be if it appeared within a string.

abc$/**/{/**//re//**/}/**/def vs: abc$/**/{/**//re//**/}/*def (in the former all four "/**/"'s are comments. Not sure what the latter would do.)

Same thing. There is no valid quasiHole here.

abc$id def abc$ id def (the lexer removes spaces before all tokens, so the quasi would not contain a space before the "def")

The first has a valid quasiHole, and so would parse as QuasiOpen("abc"), Identifier("id"), QuasiClose(" def"). (Note space captured in the QuasiClose text.)

The second has no valid quasiHole, and so the whole thing would again parse as a QuasiOnly.

I think of, for example, QuasiMiddle as being much like DoubleStringChars. Once you're lexing that, all spaces are significant. As a lexing context, I don't really see how quasis are weirder than strings.

However, from your example, I think I see what you're getting at. I forgot to state that a quasiExpr is only started if a QuasiOpen or QuasiMiddle ends with an (unescaped by previous $\ ) $ followed immediately, with no intervening characters, by either an Identifier or a "{". I don't see this as weirder than having a string terminate by (") but not by (\ "). The " is only processed specially if it comes immediately after an (unescaped by previous \ ) \ .

Similarly, you go back into quasi context immediately following the identifier or matching } respectively, i.e., exactly when the quasiHole production is over.
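The hole-start rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the proposal's grammar: the function names and the "Expr" tag are made up, identifiers are simplified to ASCII, the text after $\ is kept raw rather than decoded, a $ that does not start a hole or an escape is simply kept as text, and the curlies of a ${...} hole are counted character by character (so strings, regexps and comments inside the hole are not understood).

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # simplified identifier (assumption)


def matching_curly(text, open_index):
    """Index of the '}' balancing text[open_index] == '{', by plain counting."""
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced curlies in hole")


def split_quasi(source):
    """Split a backtick-delimited quasi into (kind, text) pieces."""
    if len(source) < 2 or source[0] != "`" or source[-1] != "`":
        raise ValueError("expected a backtick-delimited quasi")
    body = source[1:-1]
    parts, literal, i = [], [], 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if ch == "$" and nxt in ("$", "`"):      # $$ and $` escapes
            literal.append(nxt)
            i += 2
            continue
        if ch == "$" and nxt == "\\":            # $\x escape, kept raw here
            literal.append(body[i:i + 3])
            i += 3
            continue
        if ch == "$":
            hole = None
            m = IDENT.match(body, i + 1)
            if m:                                # $identifier hole
                hole, end = ("Identifier", m.group()), m.end()
            elif nxt == "{":                     # ${ ... } hole
                close = matching_curly(body, i + 1)
                hole, end = ("Expr", body[i + 2:close]), close + 1
            if hole:
                kind = "QuasiOpen" if not parts else "QuasiMiddle"
                parts.append((kind, "".join(literal)))
                parts.append(hole)
                literal, i = [], end             # back into quasi context
                continue
        literal.append(ch)                       # ordinary text, spaces included
        i += 1
    parts.append(("QuasiClose" if parts else "QuasiOnly", "".join(literal)))
    return parts
```

Under these assumptions, `abc$id def` splits into QuasiOpen("abc"), Identifier("id"), QuasiClose(" def"), while `abc$ id def` stays a single QuasiOnly because the $ is not immediately followed by an identifier or a "{".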

# Waldemar Horwat (14 years ago)

On 02/03/2012 08:07 PM, Mark S. Miller wrote:

On Fri, Feb 3, 2012 at 12:58 PM, Waldemar Horwat <waldemar at google.com> wrote:

On 02/02/2012 06:27 PM, Waldemar Horwat wrote:

[...]

Note that this is more complex than just having the parser switch modes for the treatment of / as division vs. regexp. Here comments and white space are also affected, which can turn the structure of the lexer upside down. The kinds of cases I'm thinking of are:

`abc$/*comment*/identifier//
`
(here we have a /**/ comment and a // comment)

There is no valid quasiHole above, so the whole thing matches a QuasiOnly. The QuasiOnly includes all characters between the backticks. Nothing is taken to be a comment, just like it wouldn't be if it appeared within a string.

According to which lexical grammar? According to the one you provided earlier in this thread, `abc$ is a QuasiOpen token:

QuasiOpen :: ` QuasiChar* $

Parsing further, /*comment*/identifier is a single identifier token as far as the syntactic grammar is concerned.

 Waldemar
# Mark S. Miller (14 years ago)

On Mon, Feb 6, 2012 at 3:26 PM, Waldemar Horwat <waldemar at google.com> wrote:

On 02/03/2012 08:07 PM, Mark S. Miller wrote:

On Fri, Feb 3, 2012 at 12:58 PM, Waldemar Horwat <waldemar at google.com> wrote:

On 02/02/2012 06:27 PM, Waldemar Horwat wrote:

[...]

Note that this is more complex than just having the parser switch modes for the treatment of / as division vs. regexp. Here comments and white space are also affected, which can turn the structure of the lexer upside down. The kinds of cases I'm thinking of are:

abc$/*comment*/identifier// (here we have a /**/ comment and a // comment)

There is no valid quasiHole above, so the whole thing matches a QuasiOnly. The QuasiOnly includes all characters between the backticks. Nothing is taken to be a comment, just like it wouldn't be if it appeared within a string.

According to which lexical grammar? According to the one you provided earlier in this thread, `abc$ is a QuasiOpen token:

QuasiOpen :: ` QuasiChar* $

Parsing further, /*comment*/identifier is a single identifier token as far as the syntactic grammar is concerned.

I was imprecise. I'll try again, using only lexical grammar concepts and making explicit where whitespace, comments, etc may appear.

Token ::
    IdentifierName
    Punctuator
    NumericLiteral
    StringLiteral
    Quasi

Quasi ::
    QuasiOnly
    QuasiOpen QuasiHole (QuasiMiddle QuasiHole)* QuasiClose

QuasiOnly ::
    ` QuasiChar* `

QuasiOpen ::
    ` QuasiChar* $

QuasiMiddle ::
    QuasiChar* $

QuasiClose ::
    QuasiChar* `

QuasiChar ::
    SourceCharacter *but not one of $ or `*
    $ $
    $ `
    $ \ EscapeSequence

QuasiHole ::
    Identifier
    { Spacing* (BalancedCurlySequence Spacing*)* }

BalancedCurlySequence ::
    Token *but not one of { or }*
    { Spacing* (BalancedCurlySequence Spacing*)* }

Spacing ::
    WhiteSpace
    LineTerminator
    Comment

Within a Quasi, no character sequences are interpreted as whitespace or comments except where indicated by Spacing above.
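
As an illustration of that last rule, here is a small sketch using the template literals that later shipped in ES2015. This is an approximation under a stated assumption: the shipped syntax kept only the `${...}` hole form and has no bare `$identifier` hole, so it is not the exact grammar above. The variable names are made up for the example.

```javascript
// Assumption: ES2015 template literals stand in for the quasi grammar above.
const identifier = "X";

// No ${...} hole here, so every character between the backticks is literal
// text, including the things that look like comments.
const literalOnly = `abc$/*comment*/identifier//
`;

// Inside a ${...} hole, ordinary spacing and comments are recognized again.
const withHole = `abc${ /* a real comment */ identifier // another real comment
}def`;

console.log(JSON.stringify(literalOnly));
console.log(withHole);
```

The first value keeps the comment-like characters verbatim, while the second contains only the substituted value between the literal parts.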

# Erik Arvidsson (14 years ago)

On Mon, Feb 6, 2012 at 18:49, Mark S. Miller <erights at google.com> wrote:

QuasiHole ::
    Identifier
    { Spacing* (BalancedCurlySequence Spacing*)* }

If you replace that with:

QuasiHole ::
    Identifier
    { Spacing* Expression Spacing* }

You can now support nested quasis

Your grammar allows abc{ }def. Was that intentional?

# Erik Arvidsson (14 years ago)

On Mon, Feb 6, 2012 at 18:49, Mark S. Miller <erights at google.com> wrote:

QuasiChar ::
    SourceCharacter *but not one of $ or `*
    $ $
    $ `
    $ \ EscapeSequence

This part was never in the Quasi proposal. abc$$def is the same as abc\${def} according to the current proposal.

# Waldemar Horwat (14 years ago)

On 02/06/2012 06:49 PM, Mark S. Miller wrote:

On Mon, Feb 6, 2012 at 3:26 PM, Waldemar Horwat <waldemar at google.com> wrote:

On 02/03/2012 08:07 PM, Mark S. Miller wrote:

    On Fri, Feb 3, 2012 at 12:58 PM, Waldemar Horwat <waldemar at google.com> wrote:

        On 02/02/2012 06:27 PM, Waldemar Horwat wrote:

    [...]

        Note that this is more complex than just having the parser switch modes for the treatment of / as division vs. regexp.  Here comments and white space are also affected, which can turn the structure of the lexer upside down.  The kinds of cases I'm thinking of are:

        `abc$/*comment*/identifier//
        `
        (here we have a /**/ comment and a // comment)


    There is no valid quasiHole above, so the whole thing matches a QuasiOnly. The QuasiOnly includes all characters between the backticks. Nothing is taken to be a comment, just like it wouldn't be if it appeared within a string.


According to which lexical grammar?  According to the one you provided earlier in this thread, `abc$ is a QuasiOpen token:

  QuasiOpen ::
        ` QuasiChar* $


Parsing further, /*comment*/identifier is a single identifier token as far as the syntactic grammar is concerned.

I was imprecise. I'll try again, using only lexical grammar concepts and making explicit where whitespace, comments, etc may appear.

 Token ::
     IdentifierName
     Punctuator
     NumericLiteral
     StringLiteral
     Quasi

 Quasi ::
     QuasiOnly
     QuasiOpen QuasiHole (QuasiMiddle QuasiHole)* QuasiClose

 QuasiOnly ::
     ` QuasiChar* `

 QuasiOpen ::
     ` QuasiChar* $

 QuasiMiddle ::
     QuasiChar* $

 QuasiClose ::
     QuasiChar* `

 QuasiChar ::
     SourceCharacter *but not one of $ or `*
     $ $
     $ `
     $ \ EscapeSequence

 QuasiHole ::
     Identifier
     { Spacing* (BalancedCurlySequence Spacing*)* }

 BalancedCurlySequence ::
     Token *but not one of { or }*
     { Spacing* (BalancedCurlySequence Spacing*)* }

 Spacing ::
     WhiteSpace
     LineTerminator
     Comment

Within a Quasi, no character sequences are interpreted as whitespace or comments except where indicated by Spacing above.

That's going back to the previous approach of treating the whole quasi as a single token. This doesn't work because it's not possible to specify the BalancedCurlySequence production as a lexical grammar. You're confusing the lexical with the syntactic grammars here.

Examples of why BalancedCurlySequence doesn't work:

{/[{]/} (interior parses as five single-character tokens but no matching closing bracket)

{ainb} (interior parses as three tokens: a in b)

{3.toString()} (interior parses as 3 . toString ( ))

 Waldemar

# Erik Arvidsson (14 years ago)

Correction...

This part was never in the Quasi proposal. abc$$def is the same as abc\${def} according to the current proposal.

abc\$${def}

The point is that only $ident and ${...} are special. In all other contexts, $ is a normal character.
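
A minimal sketch of that rule, assuming a hypothetical helper `splitQuasi` that receives the text between the backticks. Two simplifications are mine and not part of the proposal: identifiers are limited to ASCII letters, digits and `_`, and the end of a `${...}` hole is found by naive brace counting, which ignores braces inside strings, comments or regexps (the very problem discussed elsewhere in this thread).

```javascript
// Hypothetical helper: split a quasi body into literal chunks and holes.
// Only `$ident` and `${...}` are special; any other `$` is an ordinary character.
function splitQuasi(body) {
  const identStart = /[A-Za-z_]/;   // simplification: ASCII identifiers only
  const identPart = /[A-Za-z0-9_]/;
  const parts = [];
  let literal = "";
  let i = 0;
  while (i < body.length) {
    const ch = body[i];
    const next = body[i + 1];
    if (ch === "\\" && next !== undefined) {
      // Simplification: a backslash makes the next character literal.
      literal += next;
      i += 2;
    } else if (ch === "$" && next === "{") {
      // Naive brace counting to find the end of the hole (assumption).
      let depth = 1;
      let j = i + 2;
      while (j < body.length && depth > 0) {
        if (body[j] === "{") depth++;
        else if (body[j] === "}") depth--;
        j++;
      }
      if (depth !== 0) throw new SyntaxError("unterminated ${...} hole");
      parts.push({ literal }, { hole: body.slice(i + 2, j - 1) });
      literal = "";
      i = j;
    } else if (ch === "$" && next !== undefined && identStart.test(next)) {
      // `$ident` form: the hole is the identifier that follows the `$`.
      let j = i + 1;
      while (j < body.length && identPart.test(body[j])) j++;
      parts.push({ literal }, { hole: body.slice(i + 1, j) });
      literal = "";
      i = j;
    } else {
      literal += ch;
      i++;
    }
  }
  parts.push({ literal });
  return parts;
}

console.log(JSON.stringify(splitQuasi("abc$$def")));
console.log(JSON.stringify(splitQuasi("abc\\$${def}")));
console.log(JSON.stringify(splitQuasi("cost: $5")));
```

Under these assumptions the first two inputs split the same way (a literal `abc$` followed by a hole for `def`), and the `$` in the third input stays literal because no identifier or `{` follows it.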

# Mark S. Miller (14 years ago)

On Tue, Feb 7, 2012 at 9:48 AM, Erik Arvidsson <erik.arvidsson at gmail.com> wrote:

On Mon, Feb 6, 2012 at 18:49, Mark S. Miller <erights at google.com> wrote:

QuasiHole ::
    Identifier
    { Spacing* (BalancedCurlySequence Spacing*)* }

If you replace that with:

QuasiHole ::
    Identifier
    { Spacing* Expression Spacing* }

You can now support nested quasis

Your grammar allows abc{ }def. Was that intentional?

Hi Erik, it was not my intention. Your grammar does better capture my intention, and is approximately what I specified on Feb 2 before Waldemar's question about spaces and comments. If there's no objection to your way of mixing the lexical and parsing issues, and if it succeeds at avoiding the spacing and comment placement issues Waldemar raises (I think it does), I think that's superior to the approach I was taking. Thanks.

# Mark S. Miller (14 years ago)

To reiterate, my posts here are expository, to explain how I approached these same matters in E, to see if that helps resolve any remaining controversy. Regarding what we're actually proposing, I'll let Mike speak for that.

# Mark S. Miller (14 years ago)

On Tue, Feb 7, 2012 at 1:52 PM, Waldemar Horwat <waldemar at google.com> wrote: [...]

That's going back to the previous approach of treating the whole quasi as a single token. This doesn't work because it's not possible to specify the BalancedCurlySequence production as a lexical grammar. You're confusing the lexical with the syntactic grammars here.

Hi Waldemar, I am first of all trying to make clear what we're actually proposing, and to resolve any genuine ambiguity. As for how we phrase this proposal so that it fits with the rest of our spec language, what do you suggest?

Examples of why BalancedCurlySequence doesn't work:

{/[{]/} (interior parses as five single-character tokens but no matching closing bracket)

Yes, and therefore a program consisting of

`{/[{]/}`

fails to lex and fails to parse. That seems like the correct outcome.

{ainb} (interior parses as three tokens: a in b)

Why doesn't it parse as one token: ainb ?

{3.toString()} (interior parses as 3 . toString ( ))

Why? That's not what the JS lexer does anywhere else?

I don't at all see how you arrived at your conclusions. Is it actually unclear what I am trying to say, or are you simply taking issue with how I'm saying it? If you find Erik's way of specifying ok, let's just use that. As I just said in reply to him, it does capture my actual intent more directly.

# Waldemar Horwat (14 years ago)

On 02/07/2012 02:51 PM, Mark S. Miller wrote:

On Tue, Feb 7, 2012 at 1:52 PM, Waldemar Horwat <waldemar at google.com> wrote: [...]

That's going back to the previous approach of treating the whole quasi as a single token.  This doesn't work because it's not possible to specify the BalancedCurlySequence production as a lexical grammar.  You're confusing the lexical with the syntactic grammars here.

Hi Waldemar, I am first of all trying to make clear what we're actually proposing, and to resolve any genuine ambiguity. As for how we phrase this proposal so that it fits with the rest of our spec language, what do you suggest?

Examples of why BalancedCurlySequence doesn't work:

{/[{]/}
(interior parses as five single-character tokens but no matching closing bracket)

Yes, and therefore a program consisting of

 `{/[{]/}`

fails to lex and fails to parse. That seems like the correct outcome.

Why? It's just a regexp.

{ainb}
(interior parses as three tokens: a in b)

Why doesn't it parse as one token: ainb ?

The point is that a in b is one valid parse. I don't need to show that there are no other valid parses. In fact, there are lots of other valid parses because the grammar is very ambiguous.

{3.toString()}
(interior parses as 3 . toString ( ))

Why? That's not what the JS lexer does anywhere else?

That's the problem with the rule you gave.

I don't at all see how you arrived at your conclusions. Is it actually unclear what I am trying to say, or are you simply taking issue with how I'm saying it? If you find Erik's way of specifying ok, let's just use that. As I just said in reply to him, it does capture my actual intent more directly.

The bug is in what you're trying to say, not in how you're saying it. You're confusing the lexical and syntactic grammars. Due to this confusion you're trying lexical productions such as

BalancedCurlySequence ::
    Token *but not one of { or }*
    { Spacing* (BalancedCurlySequence Spacing*)* }

To illustrate the problem, consider a simpler lexer rule:

TokenSequence :: Token*

This will lex ainb as many things, including for example a in b. The existing lexer resolves it by always chomping the largest sequence of characters to bite off as the next lexical token. Once it accepts a token, it doesn't backtrack if it later finds an alternative parse for that token that would have made future tokens work better. On the other hand, if you allow productions such as a TokenSequence inside a lexical token, then you get full backtracking and ambiguity across the Tokens that make up the TokenSequence because they are all part of one lexical token.
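
The difference can be sketched with a toy lexer. This is an illustration only: the token set is reduced to lowercase identifier-like names (which covers `a`, `in`, `b` and `ainb`), and the function names `lexLongest` and `allSplits` are hypothetical.

```javascript
// Toy token definition (assumption): a token is a run of lowercase letters.
const isToken = (s) => /^[a-z]+$/.test(s);

// Longest-match lexing: bite off the largest possible token and never
// backtrack, which is how the existing lexer resolves the choice.
function lexLongest(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const m = /^[a-z]+/.exec(src.slice(i));
    if (!m) throw new SyntaxError("unexpected character at offset " + i);
    tokens.push(m[0]);
    i += m[0].length;
  }
  return tokens;
}

// A rule like TokenSequence :: Token* nested inside one lexical token has no
// longest-match constraint, so every way of cutting the text is a derivation.
function allSplits(src) {
  if (src === "") return [[]];
  const results = [];
  for (let end = 1; end <= src.length; end++) {
    const head = src.slice(0, end);
    if (!isToken(head)) continue;
    for (const rest of allSplits(src.slice(end))) {
      results.push([head, ...rest]);
    }
  }
  return results;
}

const longest = lexLongest("ainb");
const splits = allSplits("ainb");
console.log(longest.join(" "));                          // ainb
console.log(splits.length);                              // 8
console.log(splits.some((s) => s.join(" ") === "a in b")); // true
```

With longest match there is exactly one result, the single token `ainb`. Without it, the four characters can be cut in eight ways, and `a in b` is one of them.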

I was favorable to splitting up a quasi into multiple tokens, where this problem for the most part doesn't arise. If you want to make the whole quasi into one token, then you'll need to solve this problem.

 Waldemar

# Mark S. Miller (14 years ago)

On Tue, Feb 7, 2012 at 3:47 PM, Waldemar Horwat <waldemar at google.com> wrote: [...]

To illustrate the problem, consider a simpler lexer rule:

TokenSequence :: Token*

This will lex ainb as many things, including for example a in b.

I now understand your objection. Rather than trying to repair my way of saying this, do you find Erik's approach clear? If so, let's just start there. It does correspond exactly to what I've been trying to explain.

# Brendan Eich (14 years ago)

I like Erik's way, but it makes a strange loop from lexical to syntactic grammar. It all works, I believe.

The loop is here:

 QuasiHole ::
     Identifier
     { Spacing* Expression Spacing* }

Expression is a syntactic grammar non-terminal, yet here we are in a lexical production.

Waldemar, is this sound?

# Waldemar Horwat (14 years ago)

On 02/07/2012 04:40 PM, Brendan Eich wrote:

I like Erik's way, but it makes a strange loop from lexical to syntactic grammar. It all works, I believe.

The loop is here:

QuasiHole ::
    Identifier
    { Spacing* Expression Spacing* }

Expression is a syntactic grammar non-terminal, yet here we are in a lexical production.

Waldemar, is this sound?

QuasiHole is a syntactic production, not a lexical one. See Mark's grammar in his 02/02/2012 11:03 AM message.

I believe that it works, except for the treatment of comments and whitespace along the boundary of a QuasiHole. I recently gave some examples of the mischief those can create unless we can figure out what to do about them.

 Waldemar

# Mark S. Miller (14 years ago)

I believe we have all figured out what to do about them and agree on the same answer. We're only struggling to find a way to state the answer.

If you understand what we're trying to say, please suggest a way to say it you would find acceptable. If you don't understand, can we proceed by example until you understand our intent, so that we can then proceed to discuss how to say it?