Using Unicode code point escape sequences in regular expressions without the `u` flag
On Thu, Nov 21, 2013 at 2:41 PM, Mathias Bynens <mathias at qiwi.be> wrote:
I’d suggest allowing `\u{xxxxxx}`-style escape sequences everywhere, and simply changing the behavior of the resulting regular expression depending on the `u` flag. There’s no good reason to disallow e.g. `/\u{20}/` or even `/\u{1F4A9}/`.
That would unfortunately not be backwards compatible since /\u{123}/ is a valid RegExp in ES5.1.
Ah, doh! I was thinking in terms of strings: modern engines throw errors for things like `'\u{123}'`. Thanks for the explanation.
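To make the compatibility point concrete, here is a minimal sketch of how the same pattern source behaves with and without the `u` flag. It assumes an engine that implements both the legacy (Annex B) regular expression grammar and the `u` flag as it was eventually standardized; the `^` and `$` anchors and the variable names are additions for illustration and are not part of the thread.

```javascript
// Without the u flag, \u not followed by four hex digits is treated as an
// identity escape for the letter "u", so {123} is parsed as a quantifier.
const legacy = /^\u{123}$/;
const legacyMatchesUs = legacy.test("u".repeat(123));
const legacyMatchesCodePoint = legacy.test("\u0123");

// With the u flag, the same source text is a code point escape for U+0123.
const unicode = /^\u{123}$/u;
const unicodeMatchesCodePoint = unicode.test("\u0123");
const unicodeMatchesUs = unicode.test("u".repeat(123));

console.log({
  legacyMatchesUs,
  legacyMatchesCodePoint,
  unicodeMatchesCodePoint,
  unicodeMatchesUs,
});
```

Because the flagless form already has a meaning, reinterpreting it as a code point escape would change the behavior of existing patterns.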
One more related question: are these three regular expression literals equivalent?
(1) `/[💩-💫]/u`: raw astral symbols
(2) `/[\u{1F4A9}-\u{1F4AB}]/u`: astral symbols represented using Unicode code point escape sequences
(3) `/[\uD83D\uDCA9-\uD83D\uDCAB]/u`: astral symbols represented as surrogate pairs
Did you check the ES6 draft grammar? The answer to that should be fairly obvious there and if it isn't it would be good to know so we can try to make it clearer in the spec.
It’s pretty clear that (1) is equivalent to (2). I guess (3) is equivalent to (1) and (2) because of the following:
RegExpUnicodeEscapeSequence[U] ::
[+U] LeadSurrogate \u TrailSurrogate
…but I was looking for confirmation.
Yes, that's what the above production says. Also see the semantics for that production in people.mozilla.org/~jorendorff/es6-draft.html#sec-characterescape
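The equivalence of the three literals can also be checked empirically. The following is a rough sketch, assuming an engine with `u` flag support; the anchors, the sample strings, and all variable names are illustrative additions rather than anything from the draft.

```javascript
// The three spellings of the same character class, anchored so that each
// test string must be exactly one symbol matched by the class.
const raw = /^[💩-💫]$/u;
const codePointEscapes = /^[\u{1F4A9}-\u{1F4AB}]$/u;
const surrogatePairs = /^[\uD83D\uDCA9-\uD83D\uDCAB]$/u;
const patterns = [raw, codePointEscapes, surrogatePairs];

// Code points inside the range U+1F4A9..U+1F4AB, and some outside it
// (including a lone lead surrogate, which must not match in u mode).
const inside = ["\u{1F4A9}", "\u{1F4AA}", "\u{1F4AB}"];
const outside = ["\u{1F4A8}", "\u{1F4AC}", "a", "\uD83D"];

const allAgree = patterns.every(
  (re) => inside.every((s) => re.test(s)) && outside.every((s) => !re.test(s))
);

// Without the u flag, the same class is read as individual code units, so
// \uDCA9-\uD83D forms an out-of-order range and the pattern is rejected.
let rejectedWithoutFlag = false;
try {
  new RegExp("[\\uD83D\\uDCA9-\\uD83D\\uDCAB]");
} catch (e) {
  rejectedWithoutFlag = e instanceof SyntaxError;
}

console.log({ allAgree, rejectedWithoutFlag });
```

The last check illustrates why the surrogate pair form depends on the `u` flag: only in that mode are a lead and trail surrogate combined into a single code point.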
If I’m reading the latest draft correctly, `RegExpUnicodeEscapeSequence`s aren’t allowed in regular expressions without the `u` flag. Why is that?
AFAICT, the only situations that require looking at code points rather than UCS-2/UTF-16 code units in order to support full Unicode are:
.
;
I’d suggest allowing `\u{xxxxxx}`-style escape sequences everywhere, and simply changing the behavior of the resulting regular expression depending