Disabling nonstandard RegExp functionalities for proper subclasses of RegExp

# Claude Pache (9 years ago)

About the bad old nonstandard RegExp functionalities:

  • RegExp.prototype.compile() — currently in Annex B;
  • RegExp.$1, RegExp.leftContext, etc. — currently unspecced.

Although we could probably not get rid of them for plain regexps, I think it is a good idea to disable them for proper subclasses of RegExp (e.g., class MyRegExp extends RegExp {}).

Basically, the reason is that these methods break encapsulation (they operate on the raw regexp), leading to potential bugs when using them. Moreover RegExp.$1 and friends have the additional troublesome property of relying on a global state that could be unexpectedly modified. For concrete examples of what could go wrong, see the subclass-restriction-motivation.md subpage in the proposal linked below.

Here is a link to a possible specification for the regexp statics (RegExp.$1, ...), that includes the disabling of bad legacy features for proper subclasses of RegExp (and for some edge-cases around cross-realm interactions):

 https://github.com/claudepache/es-regexp-legacy-static-properties <https://github.com/claudepache/es-regexp-legacy-static-properties>

What do you think? Are implementations willing to try?

# Andrea Giammarchi (9 years ago)

My 2 cents.

I always had the feeling people complaining about RegExp.$1 and friends never really used them.

For instance, your example:

/(a)/.exec('a')
Object.keys(bar)
RegExp.$1

might have side effects but it's also made up and, I believe, not a real-world concern. If you use re.exec you address it, otherwise you go re.test while if you use re.test you (or at least me) are aware of possible side effects and will use the RegExp.$1 or other property instantly after.

Following just a simple example:

if (/^item-(\d+)$/.test(id)) {
  let num = parseInt(RegExp.$1, 10);
  // the rest of the code
}

I'm not saying these properties are cool, always safe, or anything, I'm just saying there are few useful cases for them.

However, while I agree the problem is that these modify the globally shared constructor, I also think that having these magic properties available in a subclass only, would be probably the key to solve pretty much all problems described in that page.

reg.exp = class extends RegExp {};
function reg(source, ...flags) {
  return typeof source === 'string' ?
    new reg.exp(...[source, ...flags]) :
    new reg.exp(source.source, source.flags);
}

if (reg(/^item-(\d+)$/).test(id)) {
  let num = parseInt(reg.exp.$1, 10);
  // the rest of the code

  RegExp.$1 === reg.exp.$1; // false
  // the extended RegExp didn't modify
  // the global RegExp
}

Of course this would still suffer same, or very similar, problems in case re.class is exported as module and consumed by many different libraries, but I would be surprised if subclassing RegExp will create a limited subset or, at least, I wouldn't call that an extend.

After all, if "safe" is the main concern, writing class SRegExp extends RegExp {} on top of a module that uses RegExp in various ways doesn't look like a big piece of extra code to write, it's like a function declaration and it will make your code immune from global RegExp gotchas.

Or ... doesn't it?

Best

# Claude Pache (9 years ago)

Le 27 avr. 2016 à 06:50, Andrea Giammarchi <andrea.giammarchi at gmail.com> a écrit :

My 2 cents.

I always had the feeling people complaining about RegExp.$1 and friends never really used them.

For instance, your example:

/(a)/.exec('a')
Object.keys(bar)
RegExp.$1

might have side effects but it's also made up and, I believe, not a real-world concern. If you use re.exec you address it, otherwise you go re.test while if you use re.test you (or at least me) are aware of possible side effects and will use the RegExp.$1 or other property instantly after.

Following just a simple example:

if (/^item-(\d+)$/.test(id)) {
  let num = parseInt(RegExp.$1, 10);
  // the rest of the code
}

I'm not saying these properties are cool, always safe, or anything, I'm just saying there are few useful cases for them.

An important point in my made-up example with Object.keys(), is that such an apparently innocuous function call might be hidden deep in the subclass implementation. That was an illustration of a way (among others) how a user-defined subclass could make the built-in RegExp.$1 feature brittle or buggy. (And no, I haven't taken time to find a realistic example, and I won't.)

However, while I agree the problem is that these modify the globally shared constructor, I also think that having these magic properties available in a subclass only, would be probably the key to solve pretty much all problems described in that page.

reg.exp = class extends RegExp {};
function reg(source, ...flags) {
  return typeof source === 'string' ?
    new reg.exp(...[source, ...flags]) :
    new reg.exp(source.source, source.flags);
}

if (reg(/^item-(\d+)$/).test(id)) {
  let num = parseInt(reg.exp.$1, 10);
  // the rest of the code

  RegExp.$1 === reg.exp.$1; // false
  // the extended RegExp didn't modify
  // the global RegExp
}

Of course this would still suffer same, or very similar, problems in case re.class is exported as module and consumed by many different libraries, but I would be surprised if subclassing RegExp will create a limited subset or, at least, I wouldn't call that an extend.

Of course, subclasses of RegExp are free to define their own reg.exp.$1 static properties, insomuch that they are free to implement features that other people judge horrific.

In any case, it might be good that reg.exp.$1 does not inherit from RegExp.$1 by default (as it is currently the case), in case the semantics and implementation of reg.exp renders the value of the inherited property misleading.

After all, if "safe" is the main concern, writing class SRegExp extends RegExp {} on top of a module that uses RegExp in various ways doesn't look like a big piece of extra code to write, it's like a function declaration and it will make your code immune from global RegExp gotchas.

Or ... doesn't it?

I think that safety is not the main concern here. I mean, safety is better served by completely removing RegExp.$1 and friends, including for plain regexps, which, although it couldn't be achieved by default, can be made possible by leaving to the user the ability to remove completely the API (e.g., by making RegExp.$1 configurable and deletable) and by limiting cross-realm leaks. I.e., by rendering the entire environment safe instead of leaving that task to individual modules.

# Andrea Giammarchi (9 years ago)

What I am saying is that class MyRE extends RegExp {} should not inherit any magic property from global RegExp but also should not pollute the global RegExp when used.

Right now, the code I've tested on Chrome, suffers pollution either ways: MyRE.$1 is RegExp.$1 and vice versa, even if I'm using new MyRE() when testing or matching my own regular expressions.

Having same features improved on the global pollution side looks like a better win than removing features one might or not like.

Best

# Mark S. Miller (9 years ago)

Sorry, none of these examples are persuasive. Such non-local causality is a nightmare on many grounds. If we did want to provide equivalent functionality somehow, we should find a non-stupid way to package it.

I like Claude's proposal as stated: Annex B normative optional, configurable and actually deletable, no contamination of cross-realm or subclasses. The only reason to admit even that is legacy. Claude's proposal is a clean proposal that accommodates legacy while minimizing further impact.

# Claude Pache (9 years ago)

Le 27 avr. 2016 à 15:46, Andrea Giammarchi <andrea.giammarchi at gmail.com> a écrit :

What I am saying is that class MyRE extends RegExp {} should not inherit any magic property from global RegExp but also should not pollute the global RegExp when used.

Right now, the code I've tested on Chrome, suffers pollution either ways: MyRE.$1 is RegExp.$1 and vice versa, even if I'm using new MyRE() when testing or matching my own regular expressions.

Having same features improved on the global pollution side looks like a better win than removing features one might or not like.

Best

Right. I have just amended my proposal so that MyRE.$1 is undefined by default.

In the same time, I have simplified the semantics: more precisely, I have removed attempts to be smart that resulted in being more risky w.r.t backward compatibility. It is more important to have something that is agreed to be implemented.