/\1/ could be a valid RegExp through Chapter 16 Extension clause?
On Jul 6, 2011, at 4:35 PM, Dave Fugate wrote:
var x = /\1/;
According to 15.10.2.11, the RegExp snippet above should throw something, as there aren’t any capturing parentheses within the RegExp, yet one is referenced. I just now noticed that step 4 of 15.10.2.9 is more precise and shows that a SyntaxError gets thrown. Isn’t the snippet then potentially valid ES5 code through Chapter 16’s SyntaxError extension clause?
It isn't valid ES5 code, but it is valid as an implementation-defined extension to ES5.
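A minimal sketch of how one might probe a given engine here, assuming only the standard RegExp constructor. The helper name classifyPattern is hypothetical and not from the thread. An engine that follows the ES5 text strictly would be expected to report "SyntaxError", while one that uses the Chapter 16 extension would accept the pattern, possibly treating \1 as an octal escape.

```javascript
// Hypothetical helper (not from the thread): reports whether this engine
// accepts a pattern source or rejects it with a SyntaxError.
function classifyPattern(source, flags) {
  try {
    new RegExp(source, flags);
    return "accepted";
  } catch (e) {
    if (e instanceof SyntaxError) {
      return "SyntaxError";
    }
    throw e; // anything else is unexpected, so let it propagate
  }
}

// "\\1" is the pattern source for /\1/: a decimal escape with no capturing groups.
const loneBackref = classifyPattern("\\1", "");
console.log("/\\1/ is", loneBackref);

// If the engine accepts the pattern, check whether it reads \1 as the octal
// escape for U+0001. The result is engine-dependent, so it is only printed.
if (loneBackref === "accepted") {
  console.log("matches \\u0001:", new RegExp("\\1").test("\u0001"));
}
```

The result depends on the engine, so no expected output is given.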
2011/7/6 Dave Fugate <dfugate at microsoft.com>:
var x = /\1/;
According to 15.10.2.11, the RegExp snippet above should throw something, as there aren’t any capturing parentheses within the RegExp, yet one is referenced. I just now noticed that step 4 of 15.10.2.9 is more precise and shows that a SyntaxError gets thrown. Isn’t the snippet then potentially valid ES5 code through Chapter 16’s SyntaxError extension clause?
Yes, by the extension, and whether an <octal> is a backreference or an octal escape sequence is determined by whether there are parseInt(<octal>, 10) capturing groups to the left of it in the regular expression. So /\1(foo)\1/ matches the same language as /\u0001(foo)\1/ but does not match the same language as /\u0001(foo)\u0001/.
On Thu, Jul 7, 2011 at 3:52 AM, Mike Samuel <mikesamuel at gmail.com> wrote:
Yes, by the extension, and whether a <octal> is a backreference or an octal escape sequence is determined by whether there are parseInt(<octal>, 10) capturing groups to the left of it in the regular expression. So /\1(foo)\1/ matches the same language as /\u0001(foo)\1/
I don't think that's correct. The \1 is a valid DecimalEscape, its value is 1, which is not greater than NCapturingParens in 15.10.2.9 step 7 (NCapturingParens is defined globally for the pattern, not just to the left of the current escape). I.e., it is not a SyntaxError, so the \1 must be treated as a back-reference. The first \1 will always refer to a non-participating capture, so the regexp is equivalent to /(foo)\1/ or just /(foo)foo/, but never to /\u0001(foo)foo/.
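The reading above can be checked with an anchored version of the pattern. This is a small sketch with illustrative variable names, assuming an engine that treats the leading \1 as a back-reference to a group that has not yet participated, so that it matches the empty string.

```javascript
// Anchored so that a leading U+0001 in the input cannot simply be skipped over.
const pattern = /^\1(foo)\1$/;

// The first \1 refers to a capture that has not participated yet and matches
// the empty string; the second \1 must then match "foo".
const matchesPlain = pattern.test("foofoo");

// If the first \1 were an octal escape for U+0001, this input would match.
const matchesWithControlChar = pattern.test("\u0001foofoo");

console.log("foofoo:", matchesPlain);
console.log("\\u0001foofoo:", matchesWithControlChar);
```

Under the spec reading described above, the first test should succeed and the second should fail.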
2011/7/7 Lasse Reichstein <reichsteinatwork at gmail.com>:
On Thu, Jul 7, 2011 at 3:52 AM, Mike Samuel <mikesamuel at gmail.com> wrote:
Yes, by the extension, and whether a <octal> is a backreference or an octal escape sequence is determined by whether there are parseInt(<octal>, 10) capturing groups to the left of it in the regular expression. So /\1(foo)\1/ matches the same language as /\u0001(foo)\1/
I don't think that's correct. The \1 is a valid DecimalEscape, its value is 1, which is not greater than NCapturingParens in 15.10.2.9 step 7 (NCapturingParens is defined globally for the pattern, not just to the left of the current escape). I.e., it is not a SyntaxError, so the \1 must be treated as a back-reference. The first \1 will always refer to a non-participating capture, so the regexp is equivalent to /(foo)\1/ or just /(foo)foo/, but never to /\u0001(foo)foo/.
/Lasse
I was wrong. You're right about the spec language, of course. Empirically, /\1(foo)\1/.test("\u0001foofoo") is true, but what I should have been testing is /^\1(foo)\1$/.test("\u0001foofoo"), which is false on all the interpreters I have installed. I think the first test spuriously matches because group 1 is initialized empty at the point that it matches the first \1.
One way to tell whether the group initialized to empty works on an interpreter is to test /^(?:\1x(y)x){2}$/.test("xyxyxyx") which is true in most interpreters, but false in Rhino1.7 and Chrome12.
Interestingly other perl 5 interpreters
perl -e '$s = "xyxyxyx"; $m = scalar($s =~ /^(?:\1x(y)x){2}$/) ?
"true" : "false"; print "$m\n"'
yields false, as does the java
public class Foo {
  public static void main(String... argv) {
    System.out.println(java.util.regex.Pattern.compile(
        "^(?:\\1x(y)x){2}\\z").matcher("xyxyxyx").matches());
  }
}
The Python equivalent, import re; re.match(r"^(?:\1x(y)x){2}$", "xyxyxyx"), fails with sre_constants.error: bogus escape: '\1', but not if the \1 comes after the capturing group.
CC'ing Gavin as he's been looking at RegExp compatibility in the real world vs. the spec recently.
On Thu, 07 Jul 2011 21:17:17 +0200, Mike Samuel <mikesamuel at gmail.com>
wrote:
One way to tell whether the group initialized to empty works on an interpreter is to test /^(?:\1x(y)x){2}$/.test("xyxyxyx") which is true in most interpreters, but false in Rhino1.7 and Chrome12.
I do believe it should be false. The captures are cleared for each iteration of a quantified atom (RepeatMatcher in section 15.10.2.5, step 4), so the \1 will always be non-participating (and match the empty string).
Interestingly other perl 5 interpreters
I don't think ES RegExps should count as a PCRE :)
On Jul 7, 2011, at 2:40 PM, Lasse Reichstein wrote:
On Thu, 07 Jul 2011 21:17:17 +0200, Mike Samuel <mikesamuel at gmail.com> wrote:
One way to tell whether the group initialized to empty works on an interpreter is to test /^(?:\1x(y)x){2}$/.test("xyxyxyx") which is true in most interpreters, but false in Rhino1.7 and Chrome12.
I do believe it should be false. The captures are cleared for each iteration of a quantified atom (RepeatMatcher in section 15.10.2.5, step 4), so the \1 will always be non-participating (and match the empty string).
Agreed.
Interestingly other perl 5 interpreters
I don't think ES RegExps should count as a PCRE :)
+1 (and I'm the guy who copied perl4 in the pre-ES3 cowpath treading exercise that led to the ES3 paving of RegExp's cowpath; I even apprised lwall of the plan, saying it would lead to an ISO standard -- he turned three shades of green ;-)
On Jul 7, 2011, at 2:40 PM, Lasse Reichstein wrote:
On Thu, 07 Jul 2011 21:17:17 +0200, Mike Samuel <
One way to tell whether the group initialized to empty works on an interpreter is to test /^(?:\1x(y)x){2}$/.test("xyxyxyx") which is true in most interpreters, but false in Rhino1.7 and Chrome12.
I do believe it should be false
Yep, this is our understanding too (and we agree with the same results).
G.
2011/7/7 Brendan Eich <brendan at mozilla.com>:
On Jul 7, 2011, at 2:40 PM, Lasse Reichstein wrote:
On Thu, 07 Jul 2011 21:17:17 +0200, Mike Samuel <mikesamuel at gmail.com> wrote:
One way to tell whether the group initialized to empty works on an interpreter is to test /^(?:\1x(y)x){2}$/.test("xyxyxyx") which is true in most interpreters, but false in Rhino1.7 and Chrome12.
I do believe it should be false. The captures are cleared for each iteration of a quantified atom (RepeatMatcher in section 15.10.2.5, step 4), so the \1 will always be non-participating (and match the empty string).
Agreed.
Would that mean that
/^(?:\1x(y)x){2}$/.test("xyxxyx") && !/^(?:\1x(y)x){2}$/.test("xyxyxyx")
If so, V8 agrees with that, the species of monkey in FF 5 does not, the JsCore in Safari 533.21 does not, and Rhino does.
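The check above can be written out as a short script. This is a sketch with illustrative variable names. It assumes the spec behaviour discussed in this thread, where captures are cleared at the start of each iteration of the quantified group, so \1 is always non-participating and matches the empty string.

```javascript
const repeated = /^(?:\1x(y)x){2}$/;

// With captures cleared per iteration, each pass consumes exactly "xyx",
// so two passes consume "xyxxyx".
const matchesShort = repeated.test("xyxxyx");

// This input matches only if \1 still holds "y" from the first iteration,
// i.e. if the engine does not clear captures between iterations.
const matchesLong = repeated.test("xyxyxyx");

const clearsCapturesPerIteration = matchesShort && !matchesLong;

console.log("xyxxyx:", matchesShort);
console.log("xyxyxyx:", matchesLong);
console.log("clears captures per iteration:", clearsCapturesPerIteration);
```

An engine that keeps the capture from the previous iteration would report the opposite result for each input.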
On Jul 7, 2011, at 2:59 PM, Mike Samuel wrote:
Agreed.
Would that mean that
/^(?:\1x(y)x){2}$/.test("xyxxyx") && !/^(?:\1x(y)x){2}$/.test("xyxyxyx")
If so, V8 agrees with that, the species of monkey in FF 5 does not, the JsCore in Safari 533.21 does not, and Rhino does.
Yes. This is fixed for Safari in the WebKit nightly builds.
2011/7/7 Gavin Barraclough <barraclough at apple.com>:
On Jul 7, 2011, at 2:59 PM, Mike Samuel wrote:
Agreed.
Would that mean that
/^(?:\1x(y)x){2}$/.test("xyxxyx") && !/^(?:\1x(y)x){2}$/.test("xyxyxyx")
If so, V8 agrees with that, the species of monkey in FF 5 does not, the JsCore in Safari 533.21 does not, and Rhino does.
Yes. This is fixed for Safari in the WebKit nightly builds.
Great. Filed a bug against FF : bugzilla.mozilla.org/show_bug.cgi?id=670031
On Jul 7, 2011, at 3:27 PM, Gavin Barraclough wrote:
On Jul 7, 2011, at 2:59 PM, Mike Samuel wrote:
Agreed.
Would that mean that
/^(?:\1x(y)x){2}$/.test("xyxxyx") && !/^(?:\1x(y)x){2}$/.test("xyxyxyx")
If so, V8 agrees with that, the species of monkey in FF 5 does not, the JsCore in Safari 533.21 does not, and Rhino does.
Yes. This is fixed for Safari in the WebKit nightly builds.
Great -- to answer Mike's question, we use JSC's yarr in SpiderMonkey, so we'll pick this up soon too.
On Jul 7, 2011, at 3:34 PM, Mike Samuel wrote:
2011/7/7 Gavin Barraclough <barraclough at apple.com>:
On Jul 7, 2011, at 2:59 PM, Mike Samuel wrote:
Agreed.
Would that mean that
/^(?:\1x(y)x){2}$/.test("xyxxyx") && !/^(?:\1x(y)x){2}$/.test("xyxyxyx")
If so, V8 agrees with that, the species of monkey in FF 5 does not, the JsCore in Safari 533.21 does not, and Rhino does.
Yes. This is fixed for Safari in the WebKit nightly builds.
Great. Filed a bug against FF : bugzilla.mozilla.org/show_bug.cgi?id=670031
Already fixed for FF7. Fast releases FTW!