RegExp `x` flag

# Jacob Pratt (6 years ago)

Has there been any previous discussion of adding the x flag to JS? It exists in other languages, and can make relatively complicated regex much easier to read. It also allows for comments, which are incredibly helpful when trying to understand some regexes.
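To give a sense of what free-spacing mode buys you, here is a minimal userland sketch. The `freeSpacing` tag is a hypothetical helper, not an existing or proposed API, and it simplifies the semantics other languages use: unescaped whitespace and `#` comments are stripped everywhere, including inside character classes.

```javascript
// Hypothetical helper emulating a free-spacing (x) mode in userland.
// Assumption: unescaped whitespace and "#" comments are stripped everywhere,
// even inside character classes; write "\#" or "\ " to match them literally.
function freeSpacing(flags = "") {
  return (strings, ...values) => {
    // String.raw keeps backslashes exactly as written, so "\d" stays "\d".
    const raw = String.raw(strings, ...values);
    const compact = raw.replace(
      /\\[\s\S]|#.*|\s+/g,
      // Keep escape sequences; drop comments and runs of whitespace.
      (match) => (match[0] === "\\" ? match : "")
    );
    return new RegExp(compact, flags);
  };
}

const isoDate = freeSpacing()`
  ^ (\d{4})   # year
  - (\d{2})   # month
  - (\d{2}) $ # day
`;

console.log(isoDate.source);             // ^(\d{4})-(\d{2})-(\d{2})$
console.log(isoDate.test("2019-06-03")); // true
```

Because the helper produces an ordinary `RegExp`, the compiled pattern should be identical to the compact one-line form; only the source text differs.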

For prior art, XRegExp has this flag (though I've no idea how to figure out how frequently it's used), as do a few other languages.

Quick overview: www.regular-expressions.info/freespacing.html

Language references: Python: docs.python.org/3/library/re.html#re.X Rust: docs.rs/regex/1.1.6/regex XRegExp: xregexp.com/xregexp/flags/#extended .NET: docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference#regular-expression-options

Jacob Pratt

# Isiah Meadows (6 years ago)

I would personally love this (as well as interpolations in regexp literals). I do have a concern about whether removing the newline restriction creates ambiguities with division, but I suspect this is not the case.


Isiah Meadows contact at isiahmeadows.com, www.isiahmeadows.com

# Isiah Meadows (6 years ago)

Let me clarify that previous message: I mean "newline restriction" in the sense that newlines are not permitted in regexp literals. A /x flag would make removing it practically required for it to have any utility.
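The restriction is easy to observe today. A minimal sketch, assuming an environment where `new Function` is allowed (e.g. Node, or a page without a strict CSP); `parses` is a hypothetical helper that only checks whether a snippet parses:

```javascript
// Hypothetical helper: reports whether a snippet parses as a function body.
function parses(body) {
  try {
    new Function(body); // parse only; the function is never called
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parses("return /a b/;"));   // true: spaces are allowed
console.log(parses("return /a\nb/;"));  // false: raw newline inside the literal
console.log(parses("return /a\\nb/;")); // true: the \n escape is allowed
```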



# Jacob Pratt (6 years ago)

Even if this flag were restricted to constructors instead of both constructors and literals, it could be worthwhile.

# kai zhu (6 years ago)
  1. is this minifier-friendly?
  2. is parsing-impact minimal enough to not affect load-times? regexp-detection/bounding is among the most expensive/complex parts of javascript-parsing.

those 2 nits aside, i'm on the fence. regexp-spaghetti is a valid painpoint, and jslint's author has expressed desire for multiline regexp [1]. otoh, there is a good-enough solution by falling-back to constructor-form to improve readability:

// real-world spaghetti-regexp code in jslint.js
const rx_token =
/^((\s+)|([a-zA-Z_$][a-zA-Z0-9_$]*)|[(){}\[\],:;'"~`]|\?\.?|=(?:==?|>)?|\.+|[*\/][*\/=]?|\+[=+]?|-[=\-]?|[\^%]=?|&[&=]?|\|[|=]?|>{1,3}=?|<<?=?|!(?:!|==?)?|(0|[1-9][0-9]*))(.*)$/;

// vs

/*
 * break JSON.stringify(rx_token.source)
 * into multiline constructor-form for readability
 */
const rx_token = new RegExp(
    "^("
    + "(\\s+)"
    + "|([a-zA-Z_$][a-zA-Z0-9_$]*)"
    + "|[(){}\\[\\],:;'\"~`]"
    + "|\\?\\.?"
    + "|=(?:==?|>)?"
    + "|\\.+"
    + "|[*\\/][*\\/=]?"
    + "|\\+[=+]?"
    + "|-[=\\-]?"
    + "|[\\^%]=?"
    + "|&[&=]?"
    + "|\\|[|=]?"
    + "|>{1,3}=?"
    + "|<<?=?"
    + "|!(?:!|==?)?"
    + "|(0|[1-9][0-9]*)"
    + ")(.*)$"
);

[1] github jslint-issue #231 - ignore long regexp's (and comments) douglascrockford/JSLint#231

# Jacob Pratt (6 years ago)

With regard to a minifier, I see no reason it couldn't compact it to the regex we have today. After all, the only changes are the addition of whitespace and comments.

I can't speak as to parse time, though in production that would largely be removed by the aforementioned minification.

String concatenation certainly works, but then every escape has to be doubled, or else you have to use String.raw on template literals in every situation, quickly cluttering things back up.
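A small illustration of the escaping problem, using a made-up decimal-number pattern (the variable names are only for this example):

```javascript
// The same pattern written three ways.
const fromLiteral = /\d+\.\d+/;
const fromString = new RegExp("\\d+\\.\\d+");     // every backslash doubled
const fromRaw = new RegExp(String.raw`\d+\.\d+`); // no doubling, but noisier

console.log(fromString.source === fromLiteral.source); // true
console.log(fromRaw.source === fromLiteral.source);    // true

// Forgetting to double the backslashes fails silently: in a plain string,
// "\d" is just "d" and "\." is just ".".
const forgotten = new RegExp("\d+\.\d+");
console.log(forgotten.source);       // d+.d+
console.log(forgotten.test("3.14")); // false
```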

# Isiah Meadows (6 years ago)
  1. Very. Just strip the extra whitespace and replace it with the non-/x version.
  2. Whitespace is negligible in parsing performance, and regexps have a fairly simple grammar to begin with. (It can be done with a single character of lookahead easily and the only thing that can nest more than a single level is parentheses.) 90% of the actual time spent on them is on compilation and /x would have zero effect on that.

The issue of detection is actually pretty trivial: a / is assumed to be division any time you can continue an expression, and regexps are only consumed when no binary operator could potentially be expected. The one hazard, a line that starts with a regexp literal being read as a continuation of the previous line, is a rather obscure edge case often left out of ASI posts. I've yet to even hear about it being hit in practice, although I could contemplate it happening in code bases which use cond && foo() instead of if (cond) foo() and cond || foo() instead of if (!cond) foo().
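A short sketch of that rule in action (the variable names are invented for the example). The same characters, `/2/g`, are read as two divisions or as a regexp literal depending only on whether an expression could continue:

```javascript
const total = 12;
const g = 2;

// No semicolon after `total`, and the next line starts with "/".
// The expression can continue, so this is total / 2 / g, not a regexp.
const quotient = total
/2/g;

// At the start of an expression no binary operator is possible,
// so the same characters form a regexp literal with the g flag.
const pattern = /2/g;

console.log(quotient);                  // 3
console.log(pattern instanceof RegExp); // true
```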

new RegExp(multilineString) is a valid fallback, something I already use today quite a bit, but I'd prefer to use one or the other consistently for static regexps.

