Is \u006eew a valid Identifier?

# Eric Suen (9 years ago)

"The ReservedWord definitions are specified as literal sequences of specific SourceCharacter elements. A code point in a ReservedWord cannot be expressed by a \ UnicodeEscapeSequence." - what does it mean?

The following code is valid in Chrome, but invalid in Firefox and IE.

var \u006eew = 1; // \u006e = 'n'

It is also valid in Babel/Traceur, but invalid in TypeScript/Esprima...
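One way to see what a particular engine does with the line above is to hand it to the `Function` constructor and watch for a SyntaxError. This is only a sketch: the helper name `parses` is made up for illustration, and the result depends on the engine that runs it.

```javascript
// Hypothetical helper: true if the engine accepts `source` as a function body.
function parses(source) {
  try {
    new Function(source); // parses the body without running it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected, so surface it
  }
}

// Backslashes are doubled so that the engine's parser, not this string
// literal, sees the escape sequence.
console.log(parses("var \\u006eew = 1;")); // escaped spelling of `new`
console.log(parses("var \\u0061bc = 1;")); // escaped spelling of `abc`
```

An engine that follows the ES2015 rules discussed in this thread should print false for the first line and true for the second.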

# Caitlin Potter (9 years ago)

That it works in Chrome is a bug, which will hopefully be fixed by Monday or Tuesday!

Per tc39.github.io/ecma262/#sec-keywords, “new” is a Keyword, which makes it a ReservedWord.

Per tc39.github.io/ecma262/#sec-identifiers-static-semantics-early-errors, under the “Identifier: IdentifierName but not ReservedWord” section, the second early error applies here. This applies to new, which is always a reserved word. So whenever an Identifier is expected, if it contains UnicodeEscapeSequences which result in the same StringValue as a ReservedWord, it’s an error.
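The early error described above can be modelled in a few lines. This is a sketch rather than spec text: `RESERVED` is a deliberately small subset of the real ReservedWord list, and `stringValue` / `checkIdentifier` are hypothetical names chosen for illustration.

```javascript
// A small subset of ReservedWord, enough for the demonstration.
const RESERVED = new Set(["new", "var", "function", "delete", "this", "if"]);

// Decode \uXXXX and \u{X...} escapes, giving the StringValue of an IdentifierName.
function stringValue(rawName) {
  return rawName.replace(
    /\\u(?:\{([0-9a-fA-F]+)\}|([0-9a-fA-F]{4}))/g,
    (_, braced, four) => String.fromCodePoint(parseInt(braced || four, 16))
  );
}

// Models the early error: an Identifier whose StringValue is a ReservedWord
// is rejected, whether or not it was written with escapes.
function checkIdentifier(rawName) {
  const value = stringValue(rawName);
  if (RESERVED.has(value)) {
    throw new SyntaxError(`${rawName} has StringValue "${value}", a ReservedWord`);
  }
  return value;
}

console.log(checkIdentifier("\\u0061bc")); // abc
try {
  checkIdentifier("\\u006eew");
} catch (e) {
  console.log(e.name); // SyntaxError
}
```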

The spec is similarly explicit in saying that escaped ReservedWords are not valid as ReservedWords. Browsers behave differently here (for instance, jsfiddle.net/jd51pqae: at the time of this writing WebKit Nightly prints the text, while other browsers throw a SyntaxError). In Chrome’s case, this is because it’s tokenized as an Identifier, so the second Identifier “f” is unexpected when parsing a MemberExpression. SpiderMonkey is doing a nice job of reporting clean errors for this kind of thing, which are easier to understand.

There are some odd points though:

  1. ReservedWord restrictions never apply to get or set, even in ObjectLiterals (though currently Chrome fails to treat g\u{65}t or s\u{65}t as an accessor prefix, this is a bug).

  2. In the case of “new.target”, it’s technically legal to write new.t\u{61}rget, but this mostly just seems like an oversight in the spec.

# Allen Wirfs-Brock (9 years ago)

On Nov 7, 2015, at 7:32 AM, Caitlin Potter <caitpotter88 at gmail.com> wrote:

That it works in Chrome is a bug, which will hopefully be fixed by Monday or Tuesday!

Per tc39.github.io/ecma262/#sec-keywords, “new” is a Keyword, which makes it a ReservedWord.

Per tc39.github.io/ecma262/#sec-identifiers-static-semantics-early-errors, under the “Identifier: IdentifierName but not ReservedWord” section, the second early error applies here. This applies to new, which is always a reserved word. So whenever an Identifier is expected, if it contains UnicodeEscapeSequences which result in the same StringValue as a ReservedWord, it’s an error.

yup

The spec is similarly explicit in saying that escaped ReservedWords are not valid as ReservedWords. Browsers behave differently here (for instance, jsfiddle.net/jd51pqae: at the time of this writing WebKit Nightly prints the text, while other browsers throw a SyntaxError). In Chrome’s case, this is because it’s tokenized as an Identifier, so the second Identifier “f” is unexpected when parsing a MemberExpression. SpiderMonkey is doing a nice job of reporting clean errors for this kind of thing, which are easier to understand.

There are some odd points though:

  1. ReservedWord restrictions never apply to get or set, even in ObjectLiterals (though currently Chrome fails to treat g\u{65}t or s\u{65}t as an accessor prefix, this is a bug).

Chrome is doing the right thing here and should not change. The intent is that in the syntactic grammar a sequence of characters in bold font such as get (for example, see ecma-international.org/ecma-262/6.0/#sec-method-definitions) matches exactly that sequence and does not match the escaped equivalents. That is why the grammar says get rather than saying Identifier with a static semantic restriction that the StringValue of the Identifier must be “get”.

I think this is implicit in a careful reading of the grammars, but perhaps it would benefit from an explicit note stating this.

  2. In the case of “new.target”, it’s technically legal to write new.t\u{61}rget, but this mostly just seems like an oversight in the spec.

No, for the same reason. ecma-international.org/ecma-262/6.0/#sec-left-hand-side-expressions says:

NewTarget : new . target

new is a keyword, target is an explicit character sequence that doesn’t allow for unicode escapes.
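The difference between matching the exact characters and matching the decoded text can be seen by analogy with tagged templates, which expose both the raw and the cooked form of the same source characters. This is only an analogy (template literals are a different part of the grammar), and `bothViews` is a hypothetical helper name.

```javascript
// Tag function that exposes both views of the same source characters.
function bothViews(strings) {
  return { raw: strings.raw[0], cooked: strings[0] };
}

const samples = [bothViews`g\u{65}t`, bothViews`t\u{61}rget`, bothViews`get`];
for (const { raw, cooked } of samples) {
  // Comparing decoded text corresponds to comparing StringValues;
  // comparing raw text corresponds to matching a bold grammar terminal.
  console.log(raw, "| decoded:", cooked, "| raw equals decoded:", raw === cooked);
}
```

Only the last sample, written without escapes, has a raw form equal to its decoded form, which is the sense in which an escaped spelling does not match a terminal such as get or target.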

# Eric Suen (9 years ago)

So escaped ReservedWords are valid neither as ReservedWords nor as Identifiers. What are they, then?

And in export { IdentifierName }, why is it IdentifierName and not Identifier? Since IdentifierName is only valid as a property name in an ObjectLiteral or a method name in a class, is there any way to define a ReservedWord as a local name?

'get'/'set' are ContextuallyReservedIdentifier; maybe that's the reason?

# Caitlin Potter (9 years ago)

It should be reserved, logically, but the spec does not explicitly forbid this. Unless you take Allen’s “it’s a bold sequence of characters, therefore…” argument, which is all well and good, but is only really explained in the spec in the first paragraph of tc39.github.io/ecma262/#sec-grammar-notation. Further, unicode escapes tend to be eagerly converted, so “appear in a script exactly as written” can be misleading. Explicit static semantics work a lot better for this sort of thing, as they can be easily referenced.

So, in that case, it’s SpiderMonkey with the bug in the case of accessor methods, and Chrome without it, while Firefox behaves correctly with escaped new . target, and Chrome doesn’t.

# Caitlin Potter (9 years ago)

Actually, I take that back, FF Nightly is behaving differently this morning than it was earlier in the week. Go figure

# Eric Suen (9 years ago)

get/set/as/from/target are valid Identifiers, so they can't be reserved words.

# Caitlin Potter (9 years ago)

You seem to be misunderstanding me

# Eric Suen (9 years ago)

I see, I thought you were referring to 'get'/'set'. Indeed, escaped ReservedWords should be ReservedWords.

Class a = \u006eew Class()

is valid in Java and C#.

# Allen Wirfs-Brock (9 years ago)

On Nov 7, 2015, at 9:58 AM, Eric Suen <eric.suen.tech at gmail.com> wrote:

I see, I thought you were referring to 'get'/'set'. Indeed, escaped ReservedWords should be ReservedWords.

Class a = \u006eew Class()

is valid in Java and C#.

But not in ECMAScript 2015. JavaScript is neither Java nor C#.

# Eric Suen (9 years ago)

Like Caitlin said, logically

Escaped ReservedWords is IdentifierName

Escaped ReservedWords is not ReservedWord

Identifier is IdentifierName but not ReservedWord

Escaped ReservedWords is not Identifier?

I'm writing a JavaScript parser myself, and these inconsistencies really confuse me...

# Andreas Rossberg (9 years ago)

Allen, what was the motivation for allowing random escapes in identifiers but not in keywords? AFAICS, it would be simpler and more consistent to allow them anywhere and render "escape normalisation" a uniform prepass before tokenisation. IIUC, that's what other languages do. The current ES rules are far from ideal, and require jumping through extra hoops, in particular, to handle context-dependent keywords like yield.

# Allen Wirfs-Brock (9 years ago)

On Nov 9, 2015, at 6:55 AM, Andreas Rossberg <rossberg at google.com> wrote:

Allen, what was the motivation for allowing random escapes in identifiers but not in keywords? AFAICS, it would be simpler and more consistent to allow them anywhere and render "escape normalisation" a uniform prepass before tokenisation. IIUC, that's what other languages do. The current ES rules are far from ideal, and require jumping through extra hoops, in particular, to handle context-dependent keywords like yield.

/Andreas

see:

Here are some references:

tc39/tc39-notes/blob/master/es6/2013-11/nov-20.md#42-clarification-of-the-interaction-of-unicode-escapes-and-identification-syntax

ecmascript#277

esdiscuss.org/topic/fw-unicode-escape-sequences-for-keywords-what-s-the-correct-behaviour

esdiscuss.org/topic/this-vs-thi-u0073

there are many others, and also there were earlier TC39 meeting discussions that I didn’t find in my quick search.

It’s a usability vs. implementor convenience trade-off. The TC39 decision was to go with usability (and in particular readability).

(Also, my recollection is that in some TC39 discussions (that I didn’t find in my search) there were security concerns raised WRT allowing unicode escapes in keywords. Probably concerns about code injection filters not recognizing escaped keywords)

In ES6 (and I believe that Waldemar would claim in previous editions) unicode escapes cannot be handled with such a prepass. Essentially, escaped and non-escaped IdentifierName characters are only equated when doing identifier binding or property name lookups. It’s probably a misperception of the lexical grammar and static semantics that leads some implementors down the path of thinking that such a prepass is reasonable.

Regarding yield, if it is written containing unicode escapes it is never a contextual keyword.

BTW, personally I would be just fine with never allowing unicode escapes within IdentifierName. But that would be a web breaking change.
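A rough illustration of why a blanket "replace escapes first, then tokenize" prepass changes which programs parse. The two inputs are examples picked for this sketch and are not taken from the thread; `naivePrepass` and `parses` are hypothetical helpers.

```javascript
// Hypothetical prepass: replace every \uXXXX in the source text before parsing.
function naivePrepass(source) {
  return source.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// True if the engine accepts `source` as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// An escape inside an identifier may only stand for an identifier character,
// so a\u002Bb is not the same program as a+b.
const sum = "var a = 1, b = 2; return a\\u002Bb;";
console.log(parses(sum));               // false
console.log(parses(naivePrepass(sum))); // true

// Inside a string literal the escape is data; the prepass turns it into syntax.
const quote = 'return "\\u0022";';
console.log(parses(quote));               // true
console.log(parses(naivePrepass(quote))); // false
```

In both cases the prepass changes whether the program is accepted, so it cannot simply be layered in front of an ECMAScript tokenizer.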

# Isiah Meadows (9 years ago)

Is there a reason why escapes like that in the title shouldn't evaluate to keywords? To be honest, it's bad style, but I don't get how it would be breaking the Web.

# Coroutines (9 years ago)

On Mon, Nov 9, 2015 at 10:10 AM, Isiah Meadows <isiahmeadows at gmail.com> wrote:

Is there a reason why escapes like that in the title shouldn't evaluate to keywords? To be honest, it's bad style, but I don't get how it would be breaking the Web.

This. I do not understand this either.

If you were to write:

var x = "\u006eew";

it'd be obvious that you're referring to the codepoint in the source (written in ASCII or something ASCII-compatible) so that it doesn't get garbled in the editors of your fellow programmers who might not be using the same locale.

It's strange that you would do that in an identifier because in no one's editor would the editor replace it with the unicode character it refers to.

I've always thought JS identifiers were a little too liberal with what is allowed, and in that spirit I think it should continue - but Grumpy Cat[1] disapproves.

1: img2-2.timeinc.net/people/i/2014/pets/news/141124/grumpy-cat-800.jpg

# Caitlin Potter (9 years ago)

var x = "\u006eew"; has a different meaning in the grammar, as compared with var \u006eew = x;.

Since you can’t declare var new, there is no good reason why you should be able to declare var \u006eew;.

On a similar note, allowing escaped keywords to have the same semantic meaning as the keywords (which would break allowing them as Identifiers anyways), is a problem. A naive user-input sanitizer could cause a page to be defaced, or worse, arbitrary code evaluated on a web-server, in the case of an input like (f\u{75}nction() { /* somethingSneaky */ })();.

Given that, these seem like perfectly good reasons not to allow it in the language.
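The sanitizer concern can be sketched as follows. The filter here is a made-up stand-in for "a naive user-input sanitizer" (`looksLikeFunction` is a hypothetical name), and the last line depends on the engine running it.

```javascript
// Hypothetical naive filter: flags input that contains the word `function`.
function looksLikeFunction(input) {
  return /\bfunction\b/.test(input);
}

// True if the engine accepts `source` as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const plain = "(function() {})();";
const escaped = "(f\\u{75}nction() {})();";

console.log(looksLikeFunction(plain));   // true
console.log(looksLikeFunction(escaped)); // false: the filter misses the escaped spelling
console.log(parses(escaped));            // false unless the engine treats the escape as a keyword
```

Because the escaped spelling is never given keyword meaning, the input that slips past the filter does not parse as a function expression at all.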

# Mark S. Miller (9 years ago)

On Mon, Nov 9, 2015 at 12:05 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

On Nov 9, 2015, at 6:55 AM, Andreas Rossberg <rossberg at google.com> wrote:

Allen, what was the motivation for allowing random escapes in identifiers but not in keywords? AFAICS, it would be simpler and more consistent to allow them anywhere and render "escape normalisation" a uniform prepass before tokenisation. IIUC, that's what other languages do. The current ES rules are far from ideal, and require jumping through extra hoops, in particular, to handle context-dependent keywords like yield.

/Andreas

see:

Here are some references:

tc39/tc39-notes/blob/master/es6/2013-11/nov-20.md#42-clarification-of-the-interaction-of-unicode-escapes-and-identification-syntax

ecmascript#277

esdiscuss.org/topic/fw-unicode-escape-sequences-for-keywords-what-s-the-correct-behaviour

esdiscuss.org/topic/this-vs-thi-u0073

there are many others, and also there were earlier TC39 meeting discussions that I didn’t find in my quick search.

It’s a usability vs. implementor convenience trade-off. The TC39 decision was to go with usability (and in particular readability).

Yes.

(Also, my recollection is that in some TC39 discussions (that I didn’t find in my search) there were security concerns raised WRT allowing unicode escapes in keywords. Probably concerns about code injection filters not recognizing escaped keywords)

Yes.

This was extensively discussed. Consensus was reached and declared. Other tools were then built that rely on these decisions. Let us not change these.

TC39 decided this the right way. Even if they did not, the arguments against do not outweigh the arguments for stability and against spec thrashing this late in the game. It is way too late to revisit this if the counter-arguments are only moderate implementor inconvenience.

In ES6 (and I believe that Waldemar would claim in previous editions) unicode escapes cannot be handled with such a prepass. Essentially, escaped and non-escaped IdentifierName characters are only equated when doing identifier binding or property name lookups. It’s probably a misperception of the lexical grammar and static semantics that leads some implementors down the path of thinking that such a prepass is reasonable.

Regarding yield, if it is written containing unicode escapes it is never a contextual keyword.

BTW, personally I would be just fine with never allowing unicode escapes within IdentifierName. But that would be a web breaking change.

Yes and yes.

# Waldemar Horwat (9 years ago)

On 11/09/2015 10:10, Isiah Meadows wrote:

Is there a reason why escapes like that in the title shouldn't evaluate to keywords? To be honest, it's bad style, but I don't get how it would be breaking the Web.

Yes. We did it for the same reason that 0xC1 0xA2 is not a valid UTF-8 encoding of 'b'. Having multiple ways to encode keywords causes trouble.
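The UTF-8 comparison can be checked directly. A minimal sketch, assuming a runtime that provides the standard `TextDecoder` global (browsers and current Node.js do):

```javascript
const bytes = new Uint8Array([0xc1, 0xa2]);

// Reading the bytes by hand as a two-byte sequence (110xxxxx 10xxxxxx)
// yields U+0062, the same code point as the one-byte encoding 0x62.
const codePoint = ((bytes[0] & 0x1f) << 6) | (bytes[1] & 0x3f);
console.log(codePoint.toString(16), String.fromCodePoint(codePoint)); // 62 b

// A strict decoder refuses the overlong form instead of producing 'b'.
let rejected = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(bytes);
} catch (e) {
  rejected = true;
}
console.log(rejected); // true
```

The bit pattern would decode to 'b', but UTF-8 allows only the shortest encoding of each code point, so there is exactly one valid byte sequence for it.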

Waldemar

# Eric Suen (9 years ago)

In the spec it's clear that an escaped ReservedWord is neither an Identifier nor a ReservedWord.

In esdiscuss.org/topic/fw-unicode-escape-sequences-for-keywords-what-s-the-correct-behaviour you said it's a keyword... I said that in Java/C# escaped keywords are keywords, and you said JavaScript is neither Java nor C#...

In code.google.com/p/google-caja/wiki/SecurityAdvisory20131121: "JavaScript parsers differ on whether they interpret escaped sequences of letters spelling a reserved word, such as "de\u006Cete", as an identifier or a reserved word." That may cause issues.

To this day, no engine or tool parses it correctly: Chrome/Babel treat it as an Identifier, while IE 11, Firefox 42.0 and Esprima treat it as a keyword.

It's confirmed that an escaped ReservedWord is not an Identifier. Can I have a final conclusion: is it a keyword or not?

# Caitlin Potter (9 years ago)

Per spec, it’s not a keyword