Backwards compatibility and U+2E2F in `Identifier`s
On 19 Aug 2013, at 11:25, Mathias Bynens <mathias at qiwi.be> wrote:
After comparing the output, I noticed that both regular expressions are identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL TILDE in `IdentifierStart` and `IdentifierPart`, but ECMAScript 6 / Unicode TR31 doesn’t.
Was this potentially breaking change intentional? I’m fine with disallowing U+2E2F, but only if we’re sure it doesn’t break any existing code.
Follow-up: since this thread is being ignored, I filed ecmascript#1802.
Thanks for filing. I don't recall any reason for this and it seems bad to break compatibility.
It may be that Norbert and Allen just missed your post on Monday; cc'ing them.
I had no intentions specific to U+2E2F when I proposed relying on UTR 31 - the change is simply the effect of the character properties that the Unicode Technical Committee assigned to this character.
I don't think there's a real problem. U+2E2F was added in Unicode version 5.1. ECMAScript 5.1 requires only support for Unicode 3.0, and warns "If portability is a concern, programmers should only employ identifier characters defined in Unicode 3.0" (section 7.6). IE 10 throws a SyntaxError if the character is used in an identifier.
BTW, if that's the only difference between the regular expressions for ES 5.1 and ES 6, then at least one of them is wrong - ES 6 allows supplementary characters in identifiers, while ES 5.1 doesn't.
Norbert
It’s the only difference in the BMP range. (Like you said, differences in the astral range are to be expected, since astral symbols weren’t allowed in ES5. No back-compat issues there.)
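To make the BMP/astral distinction concrete, here is a minimal sketch, assuming an engine that has `String.prototype.codePointAt` and `\u{...}` escapes (ES2015 or later, so newer than this thread). The helper name `isAstral` is hypothetical and only used for illustration.

```javascript
// Hypothetical helper: true if the first code point of `symbol` lies
// outside the Basic Multilingual Plane, i.e. above U+FFFF.
function isAstral(symbol) {
  return symbol.codePointAt(0) > 0xFFFF;
}

console.log(isAstral('\u2E2F'));    // U+2E2F VERTICAL TILDE is a BMP code point
console.log(isAstral('\u{1D49C}')); // U+1D49C is encoded as a surrogate pair
console.log('\u{1D49C}'.length);    // counts UTF-16 code units, not code points
```

Only code points for which this check is false are relevant to the ES5 back-compat question, since ES5 identifiers could not contain astral symbols at all.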
On Mon, Aug 19, 2013 at 5:25 AM, Mathias Bynens <mathias at qiwi.be> wrote:
After comparing the output, I noticed that both regular expressions are identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL TILDE in `IdentifierStart` and `IdentifierPart`, but ECMAScript 6 / Unicode TR31 doesn’t.
Per ES6, identifiers start with code points whose category is ID_Start, which per www.unicode.org/reports/tr31 includes Lm, which per www.unicode.org/Public/UNIDATA/UnicodeData.txt is true for U+2E2F. So why exactly is it disallowed?
Unless I'm missing something, the discussion we had in TC39 yesterday was moot, and these bugs are invalid:
ecmascript#1802, bugzilla.mozilla.org/show_bug.cgi?id=917436, bugs.webkit.org/show_bug.cgi?id=121541, code.google.com/p/v8/issues/detail?id=2892
And IE10 and maybe IE11 have a bug in not allowing it.
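Whether a given engine accepts U+2E2F in an identifier can be probed directly. The following is a rough sketch, not a conformance test: the helper name `acceptsIdentifier` is hypothetical, and it assumes the environment permits `new Function` (no restrictive Content Security Policy).

```javascript
// Hypothetical helper: asks the engine to parse `name` as a variable name
// and reports whether that raises a SyntaxError.
function acceptsIdentifier(name) {
  try {
    new Function('var ' + name + ';');
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected, so do not swallow it
  }
}

console.log(acceptsIdentifier('a'));      // a plainly valid identifier
console.log(acceptsIdentifier('1a'));     // a plainly invalid identifier
console.log(acceptsIdentifier('\u2E2F')); // result depends on the engine
```

The first two calls are sanity checks on the helper itself; only the third result says anything about how the engine treats U+2E2F.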
On 18 Sep 2013, at 21:05, Anne van Kesteren <annevk at annevk.nl> wrote:
On Mon, Aug 19, 2013 at 5:25 AM, Mathias Bynens <mathias at qiwi.be> wrote:
After comparing the output, I noticed that both regular expressions are identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL TILDE in `IdentifierStart` and `IdentifierPart`, but ECMAScript 6 / Unicode TR31 doesn’t.
Per ES6 identifiers start with code points whose category is ID_Start which per www.unicode.org/reports/tr31 includes Lm which per www.unicode.org/Public/UNIDATA/UnicodeData.txt is true for U+2E2F. So why exactly is it disallowed?
`ID_Start` includes code points in the `Lm` category indeed, but then later explicitly disallows `Pattern_Syntax` and `Pattern_White_Space` code points. As it says on the page you linked to:
In set notation, this is [[:L:][:Nl:]--[:Pattern_Syntax:]--[:Pattern_White_Space:]] plus stability extensions.
U+2E2F has the `Pattern_Syntax` property and is thus not a valid `ID_Start` code point.
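The set notation quoted above can be approximated with Unicode property escapes. This is a sketch under assumptions: it needs an engine that supports `\p{...}` with the `u` flag (ES2018 or later, which postdates this thread), it uses whatever Unicode version that engine ships, and it deliberately ignores the "stability extensions". The name `isApproxIDStart` is hypothetical.

```javascript
// [[:L:][:Nl:]--[:Pattern_Syntax:]--[:Pattern_White_Space:]]
// expressed as one inclusion test and two exclusion tests.
const letterOrNl = /^[\p{L}\p{Nl}]$/u;
const patternSyntax = /^\p{Pattern_Syntax}$/u;
const patternWhiteSpace = /^\p{Pattern_White_Space}$/u;

// Hypothetical helper; omits the stability extensions mentioned in UAX #31.
function isApproxIDStart(ch) {
  return letterOrNl.test(ch) &&
    !patternSyntax.test(ch) &&
    !patternWhiteSpace.test(ch);
}

const verticalTilde = '\u2E2F';
console.log(/^\p{Lm}$/u.test(verticalTilde)); // is it in general category Lm?
console.log(patternSyntax.test(verticalTilde)); // does it have Pattern_Syntax?
console.log(isApproxIDStart(verticalTilde));    // survives the subtraction?
console.log(isApproxIDStart('a'));              // ordinary letter, for contrast
```

If the reasoning in this message holds for the engine's Unicode data, the first two checks report `true` and the third reports `false`.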
I wrote a (new) script that generates a regular expression that matches valid JavaScript identifiers as per ECMAScript 5.1 / Unicode v6.2.0. mathiasbynens.be/demo/javascript-identifier-regex
Then, I made it do the same thing according to the latest ECMAScript 6 draft, which refers to Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax (www.unicode.org/reports/tr31).
After comparing the output, I noticed that both regular expressions are identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL TILDE in `IdentifierStart` and `IdentifierPart`, but ECMAScript 6 / Unicode TR31 doesn’t.
Was this potentially breaking change intentional? I’m fine with disallowing U+2E2F, but only if we’re sure it doesn’t break any existing code.
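A comparison of this kind can be sketched without generating the full regular expressions. The code below is not the script linked above; it is a simplified illustration that assumes an engine with Unicode property escapes (ES2018 or later) and relies on that engine's Unicode version instead of v6.2.0, so its output may differ from the result described here. It compares only the letter categories behind ES5.1 `IdentifierStart` (Lu, Ll, Lt, Lm, Lo, Nl) with `ID_Start`, and the function name `bmpDifferences` is hypothetical.

```javascript
// ES5.1 UnicodeLetter: general categories Lu, Ll, Lt, Lm, Lo, Nl (L plus Nl).
const es5Letter = /^[\p{L}\p{Nl}]$/u;
// UAX #31 ID_Start, as defined by the engine's own Unicode data.
const idStart = /^\p{ID_Start}$/u;

// Hypothetical helper: walks the BMP and records where the two sets disagree.
function bmpDifferences() {
  const es5Only = [];
  const idStartOnly = [];
  for (let cp = 0; cp <= 0xFFFF; cp++) {
    const ch = String.fromCodePoint(cp);
    const inEs5 = es5Letter.test(ch);
    const inIdStart = idStart.test(ch);
    if (inEs5 && !inIdStart) es5Only.push(cp);
    if (!inEs5 && inIdStart) idStartOnly.push(cp);
  }
  return { es5Only, idStartOnly };
}

const toHex = cp => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
const { es5Only, idStartOnly } = bmpDifferences();
console.log('ES5 letters not in ID_Start:', es5Only.map(toHex).join(' '));
console.log('ID_Start not in ES5 letters:', idStartOnly.map(toHex).join(' '));
```

The first list is the one that matters for backwards compatibility, since it holds characters that were valid at the start of an ES5 identifier but are not `ID_Start`.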