Should capturing groups with a quantifier in look-behind assertions capture the leftmost substring matched by that group, or the rightmost one?

# ziyunfei (9 years ago)

$ d8 --harmony-regexp-lookbehind -e '"123".match(/(?<=(.){3})/);print(RegExp.$1)'
1

$ perl -e '"123" =~ /(?<=(.){3})/;print $1'
3

Currently, V8's implementation stores the leftmost substring in $1 (and \1), which surprised me a bit.

Also note that in .NET you can get all substrings captured by that group through the .Captures property (msdn.microsoft.com/en-us/library/system.text.regularexpressions.group.captures(v=vs.110).aspx); in this case it would be [3, 2, 1] (ordered from right to left).
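For readers who want to reproduce the comparison without the `d8` flag, here is a minimal sketch. It assumes a JavaScript engine with lookbehind support enabled by default, which is an assumption on top of the flagged build used in this thread; the variable names `behind` and `forward` are illustrative only. It contrasts the quantified group inside a look-behind with the same group matched in the ordinary forward direction.

```javascript
// Quantified capturing group inside a look-behind: (.) is matched three
// times, and only one of those captures is kept in the result.
const behind = "123".match(/(?<=(.){3})/);

// The same quantified group matched left to right, outside a look-behind.
const forward = "123".match(/(.){3}/);

console.log(behind.index); // 3   (the empty match sits after the third character)
console.log(behind[1]);    // "1" (leftmost character, as reported for V8 above)
console.log(forward[1]);   // "3" (rightmost character)
```

The forward case gives the same answer as the Perl one-liner, which is what makes the look-behind result look surprising at first.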

# Claude Pache (9 years ago)

On 9 Jan 2016 at 09:28, ziyunfei <446240525 at qq.com> wrote:

$ d8 --harmony-regexp-lookbehind -e '"123".match(/(?<=(.){3})/);print(RegExp.$1)'
1

$ perl -e '"123" =~ /(?<=(.){3})/;print $1'
3

Currently, V8's implementation stores the leftmost substring in $1 (and \1), which surprised me a bit.

This is a consequence of two things: V8 implements lookbehind by traversing the string in reverse order (unlike Perl), and the general rule is to return the last matched substring. I don't think it is worth complicating the algorithm in order to "correct" that behaviour because, to me, it is intrinsically ambiguous which match $1 should refer to.
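The interaction of those two rules can be modelled in a few lines. This is a hypothetical sketch, not V8's actual code: `simulateGroup` and its parameters are invented for illustration, and it only models `(.){n}` anchored at a given position. It records what the group captures on each iteration of the quantifier and then applies the "last iteration wins" rule.

```javascript
// Hypothetical model of (.){count}: consume `count` characters starting at
// `pos`, moving left when `backward` is true and right otherwise.
function simulateGroup(input, pos, count, backward) {
  const captures = [];
  let i = pos;
  for (let n = 0; n < count; n++) {
    if (backward) {
      if (i - 1 < 0) return null;          // ran off the start: no match
      i -= 1;                              // step left, then read the character
      captures.push(input[i]);
    } else {
      if (i >= input.length) return null;  // ran off the end: no match
      captures.push(input[i]);             // read the character, then step right
      i += 1;
    }
  }
  // "Last matched substring" rule: the final iteration is the one kept.
  return { captures, last: captures[captures.length - 1] };
}

const back = simulateGroup("123", 3, 3, true);
const fwd = simulateGroup("123", 0, 3, false);

console.log(back.captures.join(","), back.last); // 3,2,1 1
console.log(fwd.captures.join(","), fwd.last);   // 1,2,3 3
```

In this model the backward capture order is the same right-to-left sequence mentioned for .NET's .Captures, and the last entry is the leftmost character.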

# Yang Guo (9 years ago)

Please note that RegExp.$1 is not part of the spec. The implementation in V8 is designed to mirror .NET as closely as possible. Ignoring the .Captures property, which has no equivalent in JavaScript, capturing the leftmost sub-match inside a lookbehind is what .NET does.

Yang

# Andrea Giammarchi (9 years ago)

FWIW, RegExp.$1 and the others are a de facto standard, and removing them would break the Web and much more.

I'm not sure how these would affect a lookbehind proposal, but I think they cannot be excluded from the list of possible gotchas.

Best

# Yang Guo (9 years ago)

I'm not even sure why RegExp.$1 is mentioned here. The submatches can be observed just fine as part of the match result. And I don't think it's a "gotcha" if it's reflected in the spec, which it is in the current draft, afaict.
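To make the point about the match result concrete, here is a small sketch. It assumes an engine with lookbehind support that also still exposes the legacy `RegExp.$1` static property; both are assumptions about the runtime, and the names `result` and `legacy` are illustrative.

```javascript
// The capture is an ordinary element of the array returned by exec().
const result = /(?<=(.){3})/.exec("123");

// Read the legacy static property immediately, before any other regular
// expression runs and overwrites it.
const legacy = RegExp.$1;

console.log(result[1]);              // capture from the match result
console.log(legacy === result[1]);   // whether the legacy property agrees
```

Reading `result[1]` avoids any dependence on the non-standard static property, so the behaviour in question can be discussed purely in terms of the match result.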

Yang

# Andrea Giammarchi (9 years ago)

Yeah, sorry for the noise, but this part confused me too:

Please note that RegExp.$1 is not part of the spec

All good then.

# Simon Pieters (9 years ago)

On Fri, 15 Jan 2016 16:49:13 +0100, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:

FWIW, RegExp.$1 and the others are a de facto standard, and removing them would break the Web and much more.

Indeed. These are currently "specified" at javascript.spec.whatwg.org/#regexp.$n

I think it would be good to have these defined in the ES spec proper.

# Claude Pache (9 years ago)

On 19 Jan 2016 at 08:55, Simon Pieters <simonp at opera.com> wrote:

On Fri, 15 Jan 2016 16:49:13 +0100, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:

FWIW, RegExp.$1 and the others are a de facto standard, and removing them would break the Web and much more.

Indeed. These are currently "specified" at javascript.spec.whatwg.org/#regexp.$n

I think it would be good to have these defined in the ES spec proper.

See: tc39/ecma262#137

But that wasn't the subject of this thread.
