Should RegExp(regexp, flags) always return a functional RegExp for reasonable values of flags?

# Claude Pache (9 years ago)

Hi,

Given a RegExp object `rx` and a string `f` that contains legal RegExp flag characters, should ideally the following expressions

```js
RegExp(rx, f)
eval("/" + rx.source + "/" + f)
```

always return a functional regexp?

Practical example: `rx = /\-/`, and `f === "u"` (recall that `\-` is invalid in u-regexps but valid in non-u-regexps).

One may wish that `RegExp(/\-/, "u")` and `eval("/" + /\-/.source + "/u")` be both equivalent to `/-/u`.

Context: That question came up while I was thinking about a possible precise specification for `RegExp.prototype.source`.

—Claude
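For readers who want to see the behavior in question, here is a minimal sketch. The helper name `attemptRegExp` is invented for this illustration and only reports whether building the regexp throws. The second form uses the constructor on `rx.source` as a stand-in for the `eval` form, and it assumes an engine where `/\-/.source` is the two-character string `\-`.

```javascript
// Hypothetical helper (not part of any API): run a factory and report
// either the resulting regexp or the name of the error it threw.
function attemptRegExp(factory) {
  try {
    return { ok: true, regexp: factory() };
  } catch (err) {
    return { ok: false, errorName: err.name };
  }
}

const rx = /\-/; // valid without the u flag
const f = "u";

// Form 1: pass the RegExp object together with new flags.
const viaConstructor = attemptRegExp(() => RegExp(rx, f));

// Form 2: rebuild from the source text (stand-in for the eval form).
const viaSource = attemptRegExp(() => new RegExp(rx.source, f));

console.log(viaConstructor); // ok: false, errorName: "SyntaxError"
console.log(viaSource);      // also a SyntaxError when rx.source is "\-"
```

Neither form yields something equivalent to `/-/u`, which is the gap the question is about.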

# Jordan Harband (9 years ago)

I'm not as sure about `eval`, but absolutely `new RegExp(rx.source, rx.flags)` should always, IMO, reproduce a functionally equivalent regex.
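As an illustration of the round trip described here, the following sketch rebuilds a few regexps from their own `source` and `flags` and compares results on sample inputs. The function name `cloneRegExp` and the sample strings are assumptions made for this example; agreeing on a handful of inputs is a spot check, not a proof of equivalence.

```javascript
// Hypothetical helper: rebuild a regexp from its own source and flags.
function cloneRegExp(re) {
  return new RegExp(re.source, re.flags);
}

const samples = ["a-b", "a/b", "A/B", "\u{1F4A9}", "plain"];
const originals = [/\-/, /a\/b/i, /^\u{1F4A9}$/u];

// None of these regexps uses the g or y flag, so test() keeps no
// lastIndex state between calls and results can be compared directly.
const roundTripResults = originals.map((re) => {
  const copy = cloneRegExp(re);
  const agrees = samples.every((s) => re.test(s) === copy.test(s));
  return { original: String(re), copy: String(copy), agrees };
});

console.log(roundTripResults);
```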


# Claude Pache (9 years ago)

> On May 19, 2016, at 17:54, Jordan Harband <ljharb at gmail.com> wrote:
> 
> I'm not as sure about `eval`, but absolutely `new RegExp(rx.source, rx.flags)` should always imo reproduce a functionally equivalent regex.

Sure, but it doesn’t answer the question. I am concerned with, e.g., `new RegExp(rx.source, "u")` where `rx.unicode` is `false`, because u-regexp syntax is stricter.

—Claude
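One way a caller might cope with the stricter u-mode syntax is to attempt the conversion and fall back when it fails. The sketch below assumes that such a fallback is acceptable; `tryAddUnicodeFlag` is a name invented for this example, not an existing API.

```javascript
// Hypothetical helper: try to rebuild `re` with the u flag added.
// Returns the new regexp, or null if the source is not valid u-mode syntax.
function tryAddUnicodeFlag(re) {
  if (re.unicode) return re; // already a u-regexp, nothing to do
  try {
    return new RegExp(re.source, re.flags + "u");
  } catch (err) {
    if (err instanceof SyntaxError) return null;
    throw err; // anything else is unexpected, so rethrow it
  }
}

console.log(tryAddUnicodeFlag(/\-/));        // null: \- is rejected in u-mode
console.log(tryAddUnicodeFlag(/a+/g).flags); // "gu"
```

A non-null result only means the pattern parsed under the u flag; it does not guarantee that the new regexp matches the same strings as the old one.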

# Jordan Harband (9 years ago)

Ah - in that case, no, I would not necessarily expect that the source of a
u-mode regex would produce a valid regex in another context without the "u"
flag.


# Claude Pache (9 years ago)

Thinking more about it, there is a fatal incompatibility:

```js
/^\u{12345}$/u.test("\u{12345}") // true
/^\u{12345}$/.test("u".repeat(12345)) // true (annex b)
```
and

```js
/^\u{1F4A9}$/u.test("\u{1F4A9}") // true
/^\u{1F4A9}$/.test("u{1F4A9}") // true (annex b)
```

—Claude
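The practical consequence can be sketched as follows: adding the u flag to such a pattern does not throw; it silently yields a regexp that matches different strings. The variable names are illustrative only.

```javascript
// Without the u flag, \u is an identity escape for "u" (Annex B) and the
// braces are literal characters.
const nonUnicode = /^\u{1F4A9}$/;

// Re-flagging reuses the same pattern text, which is also valid in u-mode,
// where \u{1F4A9} is a code point escape.
const reflagged = new RegExp(nonUnicode, "u");

const literalText = "u{1F4A9}";
const codePointText = "\u{1F4A9}";

console.log(nonUnicode.test(literalText));   // true
console.log(reflagged.test(literalText));    // false
console.log(nonUnicode.test(codePointText)); // false
console.log(reflagged.test(codePointText));  // true
```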



# Allen Wirfs-Brock (9 years ago)

> On May 19, 2016, at 11:21 AM, Claude Pache <claude.pache at gmail.com> wrote:
> 
> Hi,
> 
> Given a RegExp object `rx` and a string `f` that contains legal RegExp flag characters, should ideally the following expressions
> 
> ```js
> RegExp(rx, f)
> ```

This will throw a syntax error if `rx` is a non-u RegExp object that uses syntax that is invalid in a u-regexp and `f` includes `"u"`. Note that this form of the constructor uses `rx.[[OriginalSource]]` as the source that is parsed (using the `f` flags). See https://tc39.github.io/ecma262/#sec-regexp-pattern-flags

> ```js
> eval("/" + rx.source + "/" + f)
> ```

As currently spec'd, this may or may not throw a syntax error, depending upon the implementation. See https://tc39.github.io/ecma262/#sec-escaperegexppattern. The text that is returned is derived from `rx.[[OriginalSource]]`, but in addition to adding certain required escapes, the spec text seems to allow deleting redundant escapes.

> always return a functional regexp?

If either returns (rather than throwing), the returned regexp must be functional.

> Practical example: `rx = /\-/`, and `f === "u"` (recall that `\-` is invalid in u-regexps but valid in non-u-regexps).

It throws using the constructor; it may or may not throw using `rx.source` and `eval`.

> One may wish that `RegExp(/\-/, "u")` and `eval("/" + /\-/.source + "/u")` be both equivalent to `/-/u`.

The constructor definitely is not.

> Context: That question came up while I was thinking about a possible precise specification for `RegExp.prototype.source`.

This may be an unintentional spec oversight. Note that once upon a time, the input was that implementations did not want to change their implementation-dependent escaping. Perhaps things are different now.

Allen
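To make the role of `source` more concrete, here is a small sketch of the escaping that lets `source` be placed between slashes and re-parsed. The values noted in comments cover only the cases the specification requires (`/`, line terminators, and the empty pattern); whether redundant escapes are kept or dropped is deliberately not asserted, since the message above describes that as implementation-dependent. The helper name `reparseAsLiteral` is invented for this example.

```javascript
// A "/" in the pattern is escaped so the literal form does not end early.
const withSlash = new RegExp("a/b");
console.log(withSlash.source); // a\/b

// The empty pattern is reported as (?:) because // would start a comment.
const emptyPattern = new RegExp("");
console.log(emptyPattern.source); // (?:)

// A raw line terminator is reported in escaped form.
const withNewline = new RegExp("\n");
console.log(withNewline.source); // \n (a backslash followed by "n")

// Hypothetical helper: re-parse a regexp through its literal form.
function reparseAsLiteral(re) {
  return eval("/" + re.source + "/" + re.flags);
}

const reparsed = reparseAsLiteral(withSlash);
console.log(reparsed.test("a/b")); // true
```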
