How can a lexer decide whether a token is "get", "IdentifierName" or "Identifier"?

# 程劭非 (13 years ago)

Hi, everyone

I noticed that both "IdentifierName" and "Identifier" appear in the
syntactic grammar of ES5. It seems a lexer cannot decide whether a
token is an "IdentifierName" or an "Identifier" during the lexing
phase.

A similar problem is "get". "get" is not a keyword, but it is used like
one in object literals. It can also be used as an "IdentifierName"
or an "Identifier".

To solve these issues, I guess we can treat "IdentifierName" as a
syntactic symbol instead of a token type, using the following
production:

IdentifierName ::
    Identifier
    Keywords
    FutureReservedWord


/Shaofei Cheng

# Brendan Eich (13 years ago)

SpiderMonkey, at least, uses feed-forward from parser to lexer, a 
one-bit mode flag:

http://mxr.mozilla.org/mozilla-central/search?string=TSF_KEYWORD_IS_NAME
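To illustrate the general idea (this is not SpiderMonkey's actual code), here is a minimal sketch of a lexer whose `next` method takes a one-bit flag from the parser. The names `Lexer`, `keywordIsName` and `RESERVED` are hypothetical, the reserved-word list is deliberately abbreviated, and numeric literals are simply skipped.

```javascript
// Hypothetical sketch: the parser tells the lexer, per token, whether a
// reserved word should be returned as a plain name (for example after "."
// or in property-name position inside an object literal).
const RESERVED = new Set(["if", "for", "function", "return", "var"]);

class Lexer {
  constructor(source) {
    // Words and single-character punctuators; whitespace and digits are skipped.
    this.tokens = source.match(/[A-Za-z_$][\w$]*|[^\s\w$]/g) || [];
    this.pos = 0;
  }

  // keywordIsName is the one-bit mode flag fed forward from the parser.
  next(keywordIsName = false) {
    if (this.pos >= this.tokens.length) return { type: "EOF", text: "" };
    const text = this.tokens[this.pos++];
    if (!/^[A-Za-z_$]/.test(text)) return { type: "Punctuator", text };
    if (RESERVED.has(text) && !keywordIsName) return { type: "Keyword", text };
    return { type: "Name", text };
  }
}

// Usage: lexing `a.if` the way a parser handling member access might.
const lexer = new Lexer("a.if");
const objTok = lexer.next();      // Name "a"
const dotTok = lexer.next();      // Punctuator "."
const propTok = lexer.next(true); // Name "if", because the flag is set
```

The lexer stays context-free in its own code; only the parser knows when a reserved word is acceptable as a name, and it passes that knowledge along one token at a time.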

/be

# Allen Wirfs-Brock (13 years ago)

If I were going to move something to the syntactic grammar, it would probably be the current definition of Identifier:

Identifier :
    IdentifierName but not ReservedWord

You might then lex all IdentifierNames (including ReservedWords) as IdentifierName tokens and treat every occurrence of a keyword terminal in the syntactic grammar as shorthand for "an IdentifierName matching this specific keyword". For example:

PropertyAssignment :
   get PropertyName ( ) { FunctionBody }

could be interpreted as:

PropertyAssignment :
   IdentifierName PropertyName ( ) { FunctionBody }

with the static semantic restriction that the text of IdentifierName must be "get"
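
One way to picture this suggestion: every word is lexed as an IdentifierName token, and the parser applies the restrictions afterwards. The helper names (`lexWord`, `isIdentifier`, `matchesKeyword`) and the shortened reserved-word list below are assumptions made for illustration; they do not come from the specification.

```javascript
// Abbreviated list for illustration; the real ES5 ReservedWord set is larger.
const RESERVED_WORDS = new Set(["var", "function", "return", "if", "else", "new"]);

// Lexer side: every word becomes an IdentifierName token, reserved or not.
function lexWord(text) {
  return { type: "IdentifierName", text };
}

// Identifier : IdentifierName but not ReservedWord, checked by the parser.
function isIdentifier(token) {
  return token.type === "IdentifierName" && !RESERVED_WORDS.has(token.text);
}

// A keyword terminal such as `get` or `var` in the syntactic grammar is read
// as "an IdentifierName whose text is exactly this word".
function matchesKeyword(token, word) {
  return token.type === "IdentifierName" && token.text === word;
}

const getToken = lexWord("get");
const varToken = lexWord("var");
console.log(isIdentifier(getToken), matchesKeyword(getToken, "get")); // true true
console.log(isIdentifier(varToken), matchesKeyword(varToken, "var")); // false true
```

Under this reading, "get" needs no special token type: it is an ordinary Identifier that also happens to satisfy the text check in the accessor production.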


Allen

# 程劭非 (12 years ago)

Although it has been a long time since this discussion, I tried Allen's
idea in my parser and still ran into conflicts.

Consider the following rules:

PropertyAssignment :
    IdentifierName PropertyName ( ) { FunctionBody }

PropertyAssignment :
    PropertyName : AssignmentExpression

PropertyName :
    IdentifierName

When a parser gets an "IdentifierName", it needs to decide whether to
reduce the IdentifierName to "get" or to "PropertyName". For LR parsers
there is no way to do this.

I would suggest another way:

IdentifierName ::
    FutureReservedWord
    Keywords
    Identifier
    SpecialWord

SpecialWord ::
    get
    set

# Michael Dyck (12 years ago)

程劭非 wrote:
> Although it has been a long time since this discussion, I tried Allen's
> idea in my parser and still ran into conflicts.
> 
> Consider the following rules:
> 
> PropertyAssignment :
>     IdentifierName PropertyName ( ) { FunctionBody }
> 
> PropertyAssignment :
>     PropertyName : AssignmentExpression
> 
> PropertyName :
>     IdentifierName
> 
> When a parser gets an "IdentifierName", it needs to decide whether to
> reduce the IdentifierName to "get" or to "PropertyName".

Strictly speaking, it wouldn't reduce IdentifierName to "get", because 
there isn't a production
     "get" : IdentifierName
Instead, it's a shift-reduce conflict.

> For LR parsers there is no way to do these things.

There's no way only if you're talking about an LR(0) parser. But an LR(1) 
parser would have 1 token of lookahead to resolve the conflict:
  -- if the next token is ":", reduce IdentifierName to PropertyName;
  -- if the next token is an IdentifierName, shift that.
  -- if the next token is anything else, syntax error.

Similarly, an LL(1) parser wouldn't be able to decide between the two 
alternatives for PropertyAssignment, but an LL(2) could do it.

(All of this is ignoring the effects of other productions.)
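
The lookahead rule above can be sketched as a small decision function. This is a hand-written illustration rather than generated LR tables; the function name `decideAfterGet` and the token shape are hypothetical, and, as in the message, all other productions are ignored.

```javascript
// Called when the parser has just seen an IdentifierName at the start of a
// PropertyAssignment. `next` is the single token of lookahead.
function decideAfterGet(next) {
  if (next.type === "Punctuator" && next.text === ":") {
    // PropertyName : AssignmentExpression, so this is an ordinary property name.
    return "reduce IdentifierName to PropertyName";
  }
  if (next.type === "IdentifierName") {
    // IdentifierName PropertyName ( ) { FunctionBody }, so keep reading.
    return "shift";
  }
  throw new SyntaxError(`Unexpected token ${next.text}`);
}

console.log(decideAfterGet({ type: "Punctuator", text: ":" }));
console.log(decideAfterGet({ type: "IdentifierName", text: "x" }));
```

The point of the sketch is that the decision depends only on one token beyond the IdentifierName, which is what LR(1) provides.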

-Michael

# Brendan Eich (12 years ago)

Michael Dyck wrote:
> There's no way only if you're talking about an LR(0) parser. But an 
> LR(1) parser would have 1 token of lookahead to resolve the conflict:
>  -- if the next token is ":", reduce IdentifierName to PropertyName;
>  -- if the next token is an IdentifierName, shift that.
>  -- if the next token is anything else, syntax error.

Don't forget method definition shorthand syntax, if the next token is 
"(". Then the method is named "get", of course.

/be
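
A short illustration of the method shorthand case mentioned above, assuming an engine that supports ES6 method definitions (this is not valid ES5). What follows `get` determines its role: "(" makes it a method name, a property name makes it an accessor, and ":" makes it an ordinary data property.

```javascript
const withMethod = { get() { return "method"; } };       // method named "get"
const withAccessor = { get x() { return "accessor"; } }; // getter for "x"
const withData = { get: "data" };                        // data property "get"

console.log(typeof withMethod.get); // function
console.log(withMethod.get());      // method
console.log(withAccessor.x);        // accessor
console.log(withData.get);          // data
```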

# Michael Dyck (12 years ago)

Brendan Eich wrote:
> 
> Don't forget method definition shorthand syntax, if the next token is 
> "(". Then the method is named "get", of course.

Yup. Like I said, I was ignoring the effect of other productions.
(That's what the OP appeared to be doing too.)

-Michael

# gaz Heyes (12 years ago)

On 4 February 2013 21:56, Michael Dyck <jmdyck at ibiblio.org> wrote:

> Brendan Eich wrote:
>
>>
>> Don't forget method definition shorthand syntax, if the next token is
>> "(". Then the method is named "get", of course.
>>
>
I find the syntax of set/get confusing:
({'get'x(){return 123;}}).x

# Brendan Eich (12 years ago)

gaz Heyes wrote:
> I find the syntax of set/get confusing:

What's confusing?

> ({'get'x(){return 123;}}).x 

That's not legal ES5.

/be

# gaz Heyes (12 years ago)

On 4 February 2013 23:44, Brendan Eich <brendan at mozilla.com> wrote:

> What's confusing?
>

The fact that you can have an object property without a colon and a
function without a function keyword. Then a property descriptor uses a
completely new syntax to define the same thing. Why?
Object.defineProperty(window,'x',{set:alert});
x=1;

To me this seems hacked together.

>> ({'get'x(){return 123;}}).x
>
> That's not legal ES5.

Some engines support it though and I'm pretty sure Firefox did at some
point.

# Rick Waldron (12 years ago)

On Tue, Feb 5, 2013 at 3:19 AM, gaz Heyes <gazheyes at gmail.com> wrote:

> On 4 February 2013 23:44, Brendan Eich <brendan at mozilla.com> wrote:
>
>> What's confusing?
>>
>
> The fact that you can have an object property without a colon and a
> function without a function keyword.
>

ES6 concise methods will make this the norm:

let o = {
  meaning() {
    return 42;
  }
};

o.meaning(); // 42



> Then a property descriptor uses a completely new syntax to define the same
> thing. Why?
> Object.defineProperty(window,'x',{set:alert});
> x=1;
>


What part is "new syntax"? Property descriptors are just object literal
syntax—did you mean "different syntax"?
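
For comparison, here is the same setter written both ways. This sketch uses plain objects instead of `window` so that it runs outside a browser, and the `log` array is a stand-in for `alert`. Both forms create an accessor property; they differ in the attribute defaults.

```javascript
const log = [];

// Accessor syntax inside an object literal.
const literal = {
  set x(value) {
    log.push(value);
  },
};

// The same setter supplied through a property descriptor, which is itself an
// ordinary object literal with a data property named "set".
const described = {};
Object.defineProperty(described, "x", {
  set: function (value) {
    log.push(value);
  },
});

literal.x = 1;
described.x = 2;
console.log(log); // [ 1, 2 ]

// Attributes omitted from a descriptor default to false, whereas accessors
// written in a literal are enumerable and configurable.
console.log(Object.getOwnPropertyDescriptor(literal, "x").enumerable);   // true
console.log(Object.getOwnPropertyDescriptor(described, "x").enumerable); // false
```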



> To me this seems hacked together.
>
>>> ({'get'x(){return 123;}}).x
>>
>> That's not legal ES5.
>
> Some engines support it though and I'm pretty sure Firefox did at some
> point.

I think Brendan was referring to the quotes, i.e. 'get'. Remove those for
legal syntax:

({ get x() { return 123; } }).x
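
To check the two forms without aborting a script, they can be parsed at run time with `new Function`, which throws a `SyntaxError` for source text the engine cannot parse. The helper name `parses` is made up for this sketch, and the result for the quoted form reflects engines that follow the standard grammar; engines with non-standard extensions may have behaved differently, as noted above.

```javascript
// Returns true if `source` parses as an expression, false on a SyntaxError.
function parses(source) {
  try {
    new Function(`return (${source});`);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(parses("({ get x() { return 123; } }).x")); // true
console.log(parses("({'get'x(){return 123;}}).x"));     // false

// The legal form evaluates the getter:
console.log(({ get x() { return 123; } }).x); // 123
```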


Rick

# Brandon Benvie (12 years ago)

Indeed, and once ES6 is in use, I expect things like this won't be
uncommon (I think it is supposed to be Object.define, right?):

Object.define(x, {
  get a(){},
  set a(v){},
  get b(){},
  c(){}
});

Instead of most current descriptor stuff (since enumerability and
configurability are rarely desired to be false).
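
`Object.define` is only a proposed name at this point in the thread, so the helper below is a hypothetical stand-in written with existing descriptor APIs. It sketches the intended effect: accessors and methods from an object literal are copied onto a target by descriptor, so getters and setters stay accessors and keep the literal's enumerable and configurable defaults.

```javascript
// Hypothetical helper, not a built-in: copy every own string-keyed property
// of `source` onto `target` using its property descriptor.
function defineFrom(target, source) {
  for (const key of Object.getOwnPropertyNames(source)) {
    Object.defineProperty(target, key, Object.getOwnPropertyDescriptor(source, key));
  }
  return target;
}

const x = {};
let stored = 0;
defineFrom(x, {
  get a() { return stored; },
  set a(v) { stored = v; },
  get b() { return "b"; },
  c() { return "c"; },
});

x.a = 5;
console.log(x.a, x.b, x.c());                                    // 5 b c
console.log(Object.getOwnPropertyDescriptor(x, "a").enumerable); // true
```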

# Rick Waldron (12 years ago)

On Tue, Feb 5, 2013 at 11:55 AM, Brandon Benvie
<brandon at brandonbenvie.com> wrote:

> Indeed, and given use of ES6, I expect things like this wouldn't be very
> uncommon (I think is supposed to be Object.define right?):
>

Nothing there yet, though I suspect Object.mixin() will have more traction.

https://mail.mozilla.org/pipermail/es-discuss/2012-December/027037.html

Rick


# Allen Wirfs-Brock (12 years ago)

Right, I think "mixin" is winning over "define" as the name. Same semantics in either case.

Allen
