How can a lexer decide whether a token is "get", "IdentifierName" or "Identifier"?

# 程劭非 (13 years ago)

Hi everyone,

I noticed that both “IdentifierName” and “Identifier” appear in the syntactic grammar of ES5. It seems a lexer cannot decide whether a token is an "IdentifierName" or an "Identifier" during the lexing phase.

A similar problem is “get”. "get" is not a keyword, but it is used like one in object literals. It can also be used as an "IdentifierName" or "Identifier".

To solve the above issues, I guess we can treat "IdentifierName" as a syntactical symbol instead of a token type, using the following production:

IdentifierName ::
    Identifier
    Keywords
    FutureReservedWord

/Shaofei Cheng

# Brendan Eich (13 years ago)

SpiderMonkey, at least, uses feed-forward from parser to lexer, a one-bit mode flag:

mxr.mozilla.org/mozilla-central/search?string=TSF_KEYWORD_IS_NAME
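A minimal sketch of the feed-forward idea, not SpiderMonkey's actual code: the parser sets a one-bit flag before asking for the next token, and the lexer then hands back reserved words as plain names. The `Lexer` class, the `keywordIsName` flag, the token type strings, and the abbreviated keyword list are all hypothetical names chosen for illustration.

```javascript
// Abbreviated keyword list, for illustration only.
const KEYWORDS = new Set(["if", "else", "function", "return", "var", "new"]);

class Lexer {
  constructor(source) {
    this.source = source;
    this.pos = 0;
    // One-bit mode flag set by the parser (hypothetical name).
    this.keywordIsName = false;
  }

  next() {
    // Skip whitespace.
    while (this.pos < this.source.length && /\s/.test(this.source[this.pos])) {
      this.pos++;
    }
    if (this.pos >= this.source.length) return { type: "EOF", text: "" };

    // ASCII-only identifier pattern; real ECMAScript identifiers are broader.
    const re = /[A-Za-z_$][A-Za-z0-9_$]*/y;
    re.lastIndex = this.pos;
    const m = re.exec(this.source);
    if (m) {
      this.pos = re.lastIndex;
      const isKeyword = KEYWORDS.has(m[0]) && !this.keywordIsName;
      return { type: isKeyword ? "Keyword" : "Name", text: m[0] };
    }
    // Anything else is a single-character punctuator in this sketch.
    return { type: "Punctuator", text: this.source[this.pos++] };
  }
}

// The parser knows that any IdentifierName may follow ".",
// so it flips the flag before fetching the property name.
const lexer = new Lexer("a.if");
const first = lexer.next();   // the name "a"
const dot = lexer.next();     // the punctuator "."
lexer.keywordIsName = true;
const prop = lexer.next();    // "if" is returned as a Name, not a Keyword
lexer.keywordIsName = false;
```

The design trade-off is that the lexer is no longer independent of the parser: the same source text can yield different token types depending on the flag.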

# Allen Wirfs-Brock (13 years ago)

If I were going to move something to the syntactic grammar, it would probably be the current definition of Identifier:

Identifier : IdentifierName but not ReservedWord

You might then lex all IdentifierNames (including ReservedWord) as IdentifierName tokens and treat all occurrences of keyword terminals in the syntactic grammar as short-hands for saying: IdentifierName matching this specific keyword. For example:

PropertyAssignment : get PropertyName ( ) { FunctionBody }

could be interpreted as:

PropertyAssignment : IdentifierName PropertyName ( ) { FunctionBody }

with the static semantic restriction that the text of IdentifierName must be "get".
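As a rough illustration of this approach, assume the lexer emits a single `IdentifierName` token type for every identifier-like word and leaves the reserved-word check to the parser. The helper names (`lexWords`, `isIdentifier`, `isWord`) and the abbreviated reserved-word list are assumptions made for the sketch, not part of any specification or engine.

```javascript
// Abbreviated reserved-word list, for illustration only.
const RESERVED_WORDS = new Set(["if", "else", "function", "return", "var", "new"]);

// Toy lexer: every whitespace-separated word becomes an IdentifierName token.
function lexWords(source) {
  return source
    .split(/\s+/)
    .filter(Boolean)
    .map((text) => ({ type: "IdentifierName", text }));
}

// Identifier : IdentifierName but not ReservedWord
function isIdentifier(token) {
  return token.type === "IdentifierName" && !RESERVED_WORDS.has(token.text);
}

// A keyword terminal in the grammar (such as `get`) is read as
// "an IdentifierName whose text is exactly this word".
function isWord(token, word) {
  return token.type === "IdentifierName" && token.text === word;
}

const words = lexWords("get if foo");
```

With these tokens, `isWord(words[0], "get")` and `isIdentifier(words[0])` both hold, while `words[1]` ("if") passes `isWord(words[1], "if")` but fails `isIdentifier`.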

# 程劭非 (12 years ago)

Though it's been a long while since this discussion, I've tried Allen's idea in my parser and still find a conflict.

Consider the following rules:

PropertyAssignment : IdentifierName PropertyName ( ) { FunctionBody }

PropertyAssignment : PropertyName : AssignmentExpression

PropertyName : IdentifierName

When a parser gets “IdentifierName”, it needs to decide whether to reduce the IdentifierName into "get" or “PropertyName”. For LR parsers there is no way to do this.

I would suggest another way:

IdentifierName ::
    FutureReservedWord
    Keywords
    Identifier
    SpecialWord

SpecialWord ::
    get
    set


# Michael Dyck (12 years ago)

程劭非 wrote:

> Though it's been a long while since this discussion, I've tried Allen's idea in my parser and still find a conflict.
>
> Consider the following rules:
>
> PropertyAssignment : IdentifierName PropertyName ( ) { FunctionBody }
>
> PropertyAssignment : PropertyName : AssignmentExpression
>
> PropertyName : IdentifierName
>
> When a parser gets “IdentifierName”, it needs to decide whether to reduce the IdentifierName into "get" or “PropertyName”.

Strictly speaking, it wouldn't reduce IdentifierName to "get", because there isn't a production "get" : IdentifierName. Instead, it's a shift-reduce conflict.

> For LR parsers there is no way to do these things.

There's no way only if you're talking about an LR(0) parser. But an LR(1) parser would have 1 token of lookahead to resolve the conflict:
-- if the next token is ":", reduce IdentifierName to PropertyName;
-- if the next token is an IdentifierName, shift that.
-- if the next token is anything else, syntax error.

Similarly, an LL(1) parser wouldn't be able to decide between the two alternatives for PropertyAssignment, but an LL(2) could do it.

(All of this is ignoring the effects of other productions.)
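The three lookahead cases above can be sketched as a small decision function. This is only an illustration under the same simplification (other productions are ignored); the token shape, the helper names `id`, `punct`, and `decide`, and the choice to accept both "get" and "set" as accessor words are assumptions of the sketch.

```javascript
// Tokens are plain objects: { type: "IdentifierName" | "Punctuator", text }.
const id = (text) => ({ type: "IdentifierName", text });
const punct = (text) => ({ type: "Punctuator", text });

// Given the IdentifierName just read and one token of lookahead,
// choose between reducing to PropertyName and shifting the next token.
function decide(current, lookahead) {
  if (current.type !== "IdentifierName") {
    throw new SyntaxError("expected IdentifierName");
  }
  if (lookahead.type === "Punctuator" && lookahead.text === ":") {
    // PropertyAssignment : PropertyName : AssignmentExpression
    return "reduce";
  }
  if (lookahead.type === "IdentifierName") {
    // PropertyAssignment : IdentifierName PropertyName ( ) { FunctionBody }
    // Static restriction: the first IdentifierName must be an accessor word.
    if (current.text !== "get" && current.text !== "set") {
      throw new SyntaxError("unexpected IdentifierName after " + current.text);
    }
    return "shift";
  }
  throw new SyntaxError("unexpected token " + lookahead.text);
}

const forDataProperty = decide(id("get"), punct(":")); // as in { get: 1 }
const forAccessor = decide(id("get"), id("x"));        // as in { get x() {} }
```

Because other productions are ignored, this sketch rejects a "(" lookahead, which a grammar with method shorthand would have to accept.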

# Brendan Eich (12 years ago)

Michael Dyck wrote:

> There's no way only if you're talking about an LR(0) parser. But an LR(1) parser would have 1 token of lookahead to resolve the conflict:
> -- if the next token is ":", reduce IdentifierName to PropertyName;
> -- if the next token is an IdentifierName, shift that.
> -- if the next token is anything else, syntax error.

Don't forget method definition shorthand syntax, if the next token is "(". Then the method is named "get", of course.

# Michael Dyck (12 years ago)

Brendan Eich wrote:

> Don't forget method definition shorthand syntax, if the next token is "(". Then the method is named "get", of course.

Yup. Like I said, I was ignoring the effect of other productions. (That's what the OP appeared to be doing too.)

# gaz Heyes (12 years ago)

On 4 February 2013 21:56, Michael Dyck <jmdyck at ibiblio.org> wrote:

> Brendan Eich wrote:
>
>> Don't forget method definition shorthand syntax, if the next token is "(". Then the method is named "get", of course.

I find the syntax of set/get confusing: ({'get'x(){return 123;}}).x

# Brendan Eich (12 years ago)

gaz Heyes wrote:

> I find the syntax of set/get confusing:

What's confusing?

> ({'get'x(){return 123;}}).x

That's not legal ES5.

# gaz Heyes (12 years ago)

On 4 February 2013 23:44, Brendan Eich <brendan at mozilla.com> wrote:

> What's confusing?

The fact that you can have an object property without a colon and a function without a function keyword. Then a property descriptor uses a completely new syntax to define the same thing. Why? Object.defineProperty(window,'x',{set:alert}); x=1;

To me this seems hacked together.

>> ({'get'x(){return 123;}}).x
>
> That's not legal ES5.

Some engines support it though and I'm pretty sure Firefox did at some point.

# Rick Waldron (12 years ago)

On Tue, Feb 5, 2013 at 3:19 AM, gaz Heyes <gazheyes at gmail.com> wrote:

> On 4 February 2013 23:44, Brendan Eich <brendan at mozilla.com> wrote:
>
>> What's confusing?
>
> The fact that you can have an object property without a colon and a function without a function keyword.

ES6 concise methods will make this the norm:

let o = { meaning() { return 42; } };

o.meaning(); // 42

> Then a property descriptor uses a completely new syntax to define the same thing. Why? Object.defineProperty(window,'x',{set:alert}); x=1;

What part is "new syntax"? Property descriptors are just object literal syntax—did you mean "different syntax"?

> To me this seems hacked together.
>
>>> ({'get'x(){return 123;}}).x
>
>> That's not legal ES5.
>
> Some engines support it though and I'm pretty sure Firefox did at some point.

I think Brendan was referring to the quotes, ie. 'get'. Remove those for legal syntax:

({ get x() { return 123; } }).x
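To make the three readings of `get` in an object literal concrete, here is a short sketch that assumes an engine with ES6 concise methods; the variable names are illustrative only.

```javascript
// Accessor property: `get` acts like a keyword, `x` is the property name.
const accessor = { get x() { return 123; } };

// Data property whose name happens to be "get".
const data = { get: 123 };

// ES6 concise method whose name is "get".
const method = { get() { return 123; } };

// The descriptor shows that `x` is an accessor, not a data property.
const accessorDesc = Object.getOwnPropertyDescriptor(accessor, "x");
```

Here `accessor.x`, `data.get`, and `method.get()` all evaluate to 123, yet each literal is parsed by a different production, which is why the token after `get` matters.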

# Brandon Benvie (12 years ago)

Indeed, and with ES6 in use, I expect things like this wouldn't be uncommon (I think it's supposed to be Object.define, right?):

Object.define(x, { get a(){}, set a(v){}, get b(){}, c(){} });

Instead of most current descriptor stuff (since enumerability and configurability are rarely desired to be false).
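A minimal sketch of what such a function might do, written as a standalone hypothetical helper named `define` rather than a built-in: it copies each own property of the source, accessors included, by reusing the descriptor the object literal produced. The names `define`, `target`, and `stored` are assumptions for illustration.

```javascript
// Hypothetical helper, not a built-in: copies own properties, including
// accessors, from `source` onto `target` using their existing descriptors.
function define(target, source) {
  for (const key of Object.getOwnPropertyNames(source)) {
    Object.defineProperty(target, key, Object.getOwnPropertyDescriptor(source, key));
  }
  return target;
}

const target = {};
let stored = 0;

define(target, {
  get a() { return stored; },
  set a(v) { stored = v; },
  get b() { return "b"; },
  c() { return "c"; }
});

target.a = 5; // goes through the copied setter

const copiedDesc = Object.getOwnPropertyDescriptor(target, "a");
```

Properties created by an object literal are enumerable and configurable, so the copied descriptors keep those attributes set to true, whereas a hand-written descriptor passed to Object.defineProperty defaults any omitted attribute to false.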

# Rick Waldron (12 years ago)

On Tue, Feb 5, 2013 at 11:55 AM, Brandon Benvie <brandon at brandonbenvie.com> wrote:

> Indeed, and with ES6 in use, I expect things like this wouldn't be uncommon (I think it's supposed to be Object.define, right?):

Nothing there yet, though I suspect Object.mixin() will have more traction.

esdiscuss/2012-December/027037

# Allen Wirfs-Brock (12 years ago)

Right, I think "mixin" is winning over "define" as the name. Same semantics in either case.