Using Unicode code point escape sequences in regular expressions without the `u` flag

# Mathias Bynens (12 years ago)

If I’m reading the latest draft correctly, RegExpUnicodeEscapeSequences aren’t allowed in regular expressions without the u flag. Why is that?

AFAICT, the only situations that require looking at code points rather than UCS-2/UTF-16 code units in order to support full Unicode are:

  • the regex is case-insensitive;
  • the regex contains a character class;
  • the regex uses .;
  • the regex uses a quantifier.
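A minimal sketch of the four situations listed above, assuming an engine that already implements the `u` flag. The `results` object and its property names are made up for illustration; U+10400 and U+10428 were chosen only because they are an astral upper/lowercase pair.

```javascript
const poo = '\u{1F4A9}';   // 💩: one code point, two UTF-16 code units
const upper = '\u{10400}'; // astral uppercase letter
const lower = '\u{10428}'; // its lowercase counterpart

const results = {
  // `.` consumes a single code unit without u, a whole code point with u.
  dotWithoutU: /^.$/.test(poo),
  dotWithU: /^.$/u.test(poo),
  // Without u, the class holds two separate surrogate code units.
  classWithoutU: /^[💩]$/.test(poo),
  classWithU: /^[💩]$/u.test(poo),
  // Without u, the quantifier applies to the trail surrogate only.
  quantifierWithoutU: /^💩{2}$/.test(poo + poo),
  quantifierWithU: /^💩{2}$/u.test(poo + poo),
  // Case-insensitive matching compares code units without u.
  ignoreCaseWithoutU: new RegExp(upper, 'i').test(lower),
  ignoreCaseWithU: new RegExp(upper, 'iu').test(lower)
};

console.log(results);
```

Every `...WithoutU` entry comes out `false` and every `...WithU` entry `true`.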

I’d suggest allowing \u{xxxxxx}-style escape sequences everywhere, and simply changing the behavior of the resulting regular expression depending on the u flag. There’s no good reason to disallow e.g. /\u{20}/ or even /\u{1F4A9}/.

# Erik Arvidsson (12 years ago)

On Thu, Nov 21, 2013 at 2:41 PM, Mathias Bynens <mathias at qiwi.be> wrote:

> I’d suggest allowing \u{xxxxxx}-style escape sequences everywhere, and simply changing the behavior of the resulting regular expression depending on the u flag. There’s no good reason to disallow e.g. /\u{20}/ or even /\u{1F4A9}/.

That would unfortunately not be backwards compatible since /\u{123}/ is a valid RegExp in ES5.1.
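To illustrate the compatibility problem: as far as I can tell, engines treat `\u` without four following hex digits as a plain `u` when the `u` flag is absent, so `{123}` is read as a quantifier. A small sketch, assuming an engine with `u` flag support (the names `legacy`, `unicode` and `checks` are illustrative only):

```javascript
// Same pattern source, two different meanings.
const legacy = /^\u{123}$/;   // "u" repeated exactly 123 times
const unicode = /^\u{123}$/u; // the single code point U+0123

const checks = {
  legacyMatchesRepeatedU: legacy.test('u'.repeat(123)),   // true
  legacyMatchesCodePoint: legacy.test('\u0123'),          // false
  unicodeMatchesCodePoint: unicode.test('\u0123'),        // true
  unicodeMatchesRepeatedU: unicode.test('u'.repeat(123))  // false
};

console.log(checks);
```

Giving `\u{123}` its code point meaning without the flag would silently change what existing patterns of this shape match.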

# Mathias Bynens (12 years ago)

Ah, doh! I was thinking in terms of strings: modern engines throw errors for things like '\u{123}'. Thanks for the explanation.

# Mathias Bynens (12 years ago)

One more related question: are these three regular expression literals equivalent?

  1. /[💩-💫]/u: raw astral symbols
  2. /[\u{1F4A9}-\u{1F4AB}]/u: astral symbols represented using Unicode code point escape sequences
  3. /[\uD83D\uDCA9-\uD83D\uDCAB]/u: astral symbols represented as a surrogate pair

# Allen Wirfs-Brock (12 years ago)

Did you check the ES6 draft grammar? The answer to that should be fairly obvious there and if it isn't it would be good to know so we can try to make it clearer in the spec.

# Mathias Bynens (12 years ago)

It’s pretty clear that (1) is equivalent to (2). I guess (3) is equivalent to (1) and (2) because of the following:

RegExpUnicodeEscapeSequence[U] ::
    [+U] LeadSurrogate \u TrailSurrogate

…but I was looking for confirmation.

# Allen Wirfs-Brock (12 years ago)

Yes, that's what the above production says. Also see the semantics for that production at people.mozilla.org/~jorendorff/es6-draft.html#sec-characterescape
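The equivalence of the three literals from the question can also be checked empirically. This is only a sketch, assuming an engine with `u` flag support; it is not a substitute for the grammar, and the sample strings and variable names are made up for illustration.

```javascript
// The three forms from the question, anchored so that each must match
// exactly one character of the input.
const forms = [
  /^[💩-💫]$/u,
  /^[\u{1F4A9}-\u{1F4AB}]$/u,
  /^[\uD83D\uDCA9-\uD83D\uDCAB]$/u
];

// Code points just outside and inside the range, a lone lead surrogate,
// and an unrelated BMP character.
const samples = ['\u{1F4A8}', '\u{1F4A9}', '\u{1F4AA}', '\u{1F4AB}',
                 '\u{1F4AC}', '\uD83D', 'a'];

// One row per sample, one column per form.
const table = samples.map(s => forms.map(re => re.test(s)));
const allAgree = table.every(row => row.every(v => v === row[0]));
const matched = samples.filter((s, i) => table[i][0]);

// Without the u flag the same class is read code unit by code unit, so it
// contains the range \uDCA9-\uD83D, which is out of order.
let legacyError = null;
try {
  new RegExp('[\uD83D\uDCA9-\uD83D\uDCAB]');
} catch (e) {
  legacyError = e.name;
}

console.log(allAgree, matched.length, legacyError);
```

On the samples above all three forms agree and match exactly U+1F4A9 through U+1F4AB, while constructing the class without the `u` flag throws a SyntaxError.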