should calling RegExp constructor as function without arguments throw?
On Wed, 14 Jan 2009 14:13:13 +0100, Hallvord R. M. Steen <hallvord at opera.com> wrote:
Apologies if this has already been covered, I tried googling but found only tangentially related stuff about "/regexp/()" syntax.
There are a few parts of the regexp syntax that wouldn't mind a look-over.
My two primary peeves are that look-aheads are Atoms, not Assertions, and that back-references to captures occurring later in the source are valid.
The only difference between an Atom and an Assertion is that the former can have a quantifier attached. There is absolutely no reason to put a quantifier on a look-ahead, and look-aheads are zero-width matches just like all assertions, so they would fit much better as assertions. Changing the grammar to make look-aheads actual assertions wouldn't even require implementations to change. It would just change quantified look-aheads from being standard to being an extension, like so many other things in regexps already are. (The feature was only added to JSC recently - I'm guessing nobody had needed it).
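As a rough illustration of the "standard vs. extension" split, the sketch below checks whether a quantified look-ahead is accepted as a pattern. `compiles` is a hypothetical helper, not part of any API. The second call relies on the `u` flag, which was added to the language well after this thread, and assumes an engine where patterns compiled with `u` reject a quantifier on a look-ahead while flag-less patterns still accept it.

```javascript
// Hypothetical helper: reports whether the engine accepts a pattern.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) {
      return false;
    }
    throw e; // anything other than a syntax error is unexpected here
  }
}

// A quantifier attached to a look-ahead, i.e. the look-ahead used as an Atom.
var quantifiedLookahead = "(?=a)*ab";

// Without flags: accepted, for compatibility with existing content.
var acceptedLegacy = compiles(quantifiedLookahead, "");

// With the later "u" flag (assumption: engine supports it): rejected.
var acceptedUnicode = compiles(quantifiedLookahead, "u");
```

The quantifier adds nothing to what the pattern can match, which is the point made above: nothing of value is lost by treating it as an extension.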
The problem with back-references is that the requirement prevents a one-pass parser, because you need to scan the entire regexp to know whether a decimal escape is valid. Actually, that wouldn't be a problem if you didn't want to be compatible with all the current implementations that treat invalid decimal escapes as octal escapes: to parse a decimal sequence as octal when it isn't a valid back-reference, you first need to know whether it is one. At least IE6 actually limits the valid back-references to the captures that were started before the back-reference in the source. That's a reasonable approach from a parsing perspective (I'd be happy if that was what was required), but really you only need to be able to reference captures that can be completed at the point where the back-reference occurs, i.e., where both the start and end parentheses of the capture being referenced occur prior to the back-reference in the source.
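A small sketch of why the parser needs the total capture count: the same `\1` is read differently depending on whether a capture appears anywhere in the pattern, even after the escape. The results in the comments assume an engine with the web-compatible behaviour described above (invalid decimal escapes fall back to octal, and a reference to a capture that has not yet participated matches the empty string); an engine following the IE6 approach would differ on the second case.

```javascript
// No capture anywhere in the pattern: \1 is not a valid back-reference,
// so it falls back to the octal escape for the character U+0001.
var octalFallback = new RegExp("\\1").test("\x01");

// A capture appears later in the source: \1 is now a (forward) back-reference.
// Capture 1 has not matched yet when \1 is reached, so \1 matches the empty
// string and the rest of the pattern matches "a".
var forwardRef = new RegExp("\\1(a)").exec("a");

// The ordinary case: the capture is complete before the back-reference.
var ordinaryBackRef = new RegExp("(a)\\1").test("aa");

// \2 with only one capture in the pattern: again an octal escape (U+0002).
var outOfRange = new RegExp("\\2(a)").test("\x02a");
```

In the first two patterns the parser sees the identical characters `\1` at position 0 and cannot choose an interpretation until it has scanned to the end.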
ES3 does not specify an error for this case as far as I can see, and
it does specify returning /(?:)/ (the empty-match regexp):
15.10.3 The RegExp Constructor Called as a Function
15.10.3.1 RegExp(pattern, flags)
If pattern is an object R whose [[Class]] property is "RegExp" and
flags is undefined, then return R unchanged. Otherwise call the RegExp
constructor (section 15.10.4.1), passing it the pattern and flags
arguments and return the object constructed by that constructor.
15.10.4 The RegExp Constructor
When RegExp is called as part of a new expression, it is a
constructor: it initialises the newly created object.
15.10.4.1 new RegExp(pattern, flags)
If pattern is an object R whose [[Class]] property is "RegExp" and
flags is undefined, then let P be the pattern used to construct R and
let F be the flags used to construct R. If pattern is an object R
whose [[Class]] property is "RegExp" and flags is not undefined, then
throw a TypeError exception. Otherwise, let P be the empty string if
pattern is undefined and ToString(pattern) otherwise, and let F be the
empty string if flags is undefined and ToString(flags) otherwise.
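The argument handling quoted above can be sketched roughly as follows. `es3RegExpArgs` is a hypothetical helper written only to mirror the three cases in 15.10.4.1; the `Object.prototype.toString` check is an approximation of the [[Class]] test, and the flags are rebuilt from the `global`, `ignoreCase` and `multiline` properties since ES3 has no other way to read them back.

```javascript
// Hypothetical helper mirroring the ES3 15.10.4.1 steps that compute P and F.
function es3RegExpArgs(pattern, flags) {
  // Approximation of "an object R whose [[Class]] property is "RegExp"".
  var isRegExp = Object.prototype.toString.call(pattern) === "[object RegExp]";

  if (isRegExp && flags === undefined) {
    // Case 1: reuse the pattern and flags that were used to construct R.
    return {
      P: pattern.source,
      F: (pattern.global ? "g" : "") +
         (pattern.ignoreCase ? "i" : "") +
         (pattern.multiline ? "m" : "")
    };
  }

  if (isRegExp) {
    // Case 2: a RegExp object together with explicit flags.
    throw new TypeError("flags must be undefined when pattern is a RegExp");
  }

  // Case 3: undefined becomes the empty string, anything else goes through ToString.
  return {
    P: pattern === undefined ? "" : String(pattern),
    F: flags === undefined ? "" : String(flags)
  };
}

// RegExp() with no arguments lands in case 3 with both arguments undefined.
var noArgs = es3RegExpArgs(undefined, undefined); // P and F are both ""
```

Nothing in these steps raises an error for a missing pattern, which is why the call with no arguments ends up constructing a regexp from the empty string.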
Opera throws an error if you do
var foo = RegExp();
("new RegExp()" is OK though). I've noticed other engines do not throw, so for compatibility reasons we should perhaps fix it. However, I'm inclined to think it is simply a mistake if a script ever does this - the empty regexp object other engines return is, after all, pretty useless for anything, no? Hence, perhaps throwing makes sense? If so, please consider standardising it.
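For reference, a short sketch of what the non-throwing engines hand back. The values in the comments assume an engine that follows the spec text quoted in this thread; the exact `source` string is an assumption based on current engines and may have differed in engines of the time.

```javascript
// Calling the constructor as a function with no arguments.
var viaCall = RegExp();
var viaNew = new RegExp();

// Both forms produce a RegExp object built from the empty pattern.
var isRegExpObject = viaCall instanceof RegExp;

// The empty pattern matches the empty string at every position,
// so it matches any input at all.
var matchesAnything = viaCall.test("anything");

// In current engines the source is reported as "(?:)" so that it
// round-trips as a regexp literal.
var sameSource = viaCall.source === viaNew.source;
```

Because the result matches every input, a script that reaches it by accident gets no error and no obvious symptom either.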