should calling RegExp constructor as function without arguments throw?

# Hallvord R. M. Steen (17 years ago)

Opera throws an error if you do

var foo = RegExp();

("new RegExp()" is OK though). I've noticed other engines do not throw, so for compatibility reasons we should perhaps fix it. However, I'm inclined to think it might simply be a mistake if a script ever does this - the empty regexp object other engines return is after all pretty useless for anything, no? Hence, perhaps throwing makes sense? If so please consider standardising it. Apologies if this has already been covered, I tried googling but found only tangentially related stuff about "/regexp/()" syntax.

-- 
Hallvord R. M. Steen
Core JavaScript tester, Opera Software
http://www.opera.com/
Opera - simply the best Internet experience

# Lasse R.H. Nielsen (17 years ago)

On Wed, 14 Jan 2009 14:13:13 +0100, Hallvord R. M. Steen <hallvord at opera.com> wrote:

> Apologies if this has already been covered, I tried googling but found only tangentially related stuff about "/regexp/()" syntax.

There are a few parts of the regexp syntax that wouldn't mind a look-over.

My two primary peeves are that look-aheads are Atoms, not Assertions, and that back-references to captures occurring later in the source are valid.

The only difference between an Atom and an Assertion is that the former can have a quantifier attached. There is absolutely no reason to put a quantifier on a look-ahead, and look-aheads are zero-width matches just like all assertions, so they would fit much better as assertions. Changing the grammar to make look-aheads actual assertions wouldn't even require implementations to change. It would just change quantified look-aheads from being standard to being an extension, like so many other things in regexps already are. (The feature was only added to JSC recently - I'm guessing nobody had needed it).
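A minimal sketch of why a quantifier on a look-ahead adds nothing, assuming an engine that accepts quantified look-aheads in non-Unicode patterns. The pattern and the sample strings are made up for illustration.

```javascript
// A look-ahead is zero-width, so repeating it "zero or more times"
// cannot consume anything: the quantified pattern behaves like /b/.
const plain = /b/;
const quantified = /(?=a)*b/; // quantifier attached to a look-ahead

// Hypothetical sample inputs, chosen only for this illustration.
const samples = ["b", "ab", "aab", "a", ""];

// Compare the two patterns on every sample.
const sameResults = samples.every(
  (s) => plain.test(s) === quantified.test(s)
);

console.log(sameResults); // true
```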

The problem with back-references is that the requirement prevents a one-pass parser, because you need to scan the entire regexp to know whether a decimal escape is valid. Well, actually it wouldn't be a problem if you didn't want to be compatible with all the current implementations that treat invalid decimal escapes as octal escapes - so you need to know whether a given decimal sequence is a valid back-reference in order to parse it as octal if it isn't valid.

At least IE6 actually limits the valid back-references to the captures that were started before the back-reference in the source. That's a reasonable approach from a parsing perspective (I'd be happy if that was what was required), but really you only need to be able to reference captures that can be completed at the point where the back-reference occurs, i.e., where both the start and end parentheses of the capture being referenced occur prior to the back-reference in the source.
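The parsing problem can be seen with three small patterns. This is a sketch, assuming an engine that keeps the legacy octal fallback for non-Unicode patterns; the variable names are only for illustration.

```javascript
// No capture group anywhere: \1 cannot be a back-reference, so the
// legacy behaviour reads it as the octal escape for U+0001.
const octalLike = /^\1$/;
console.log(octalLike.test("\u0001")); // true

// Capture group before the escape: \1 is a back-reference to "a".
const backward = /^(a)\1$/;
console.log(backward.test("aa"));      // true
console.log(backward.test("a\u0001")); // false

// Capture group after the escape: \1 is still a back-reference, which
// the parser only learns once it reaches "(a)". The capture is not set
// yet when \1 is evaluated, so the back-reference matches the empty string.
const forward = /^\1(a)$/;
console.log(forward.test("a"));        // true
console.log(forward.test("\u0001a"));  // false
```

The first and third patterns start with the same two characters, yet they mean different things depending on what follows, which is why a single left-to-right pass is not enough.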

/L 
-- 
Lasse R.H. Nielsen
Speaking only for myself ... if even that.
'Faith without judgement merely degrades the spirit divine'

# Brendan Eich (17 years ago)

ES3 does not specify an error for this case as far as I can see, and
it does specify returning /(?:)/ (the empty-match regexp):

15.10.3 The RegExp Constructor Called as a Function

15.10.3.1 RegExp(pattern, flags)

If pattern is an object R whose [[Class]] property is "RegExp" and flags is undefined, then return R unchanged. Otherwise call the RegExp constructor (section 15.10.4.1), passing it the pattern and flags arguments and return the object constructed by that constructor.

15.10.4 The RegExp Constructor

When RegExp is called as part of a new expression, it is a constructor: it initialises the newly created object.

15.10.4.1 new RegExp(pattern, flags)

If pattern is an object R whose [[Class]] property is "RegExp" and flags is undefined, then let P be the pattern used to construct R and let F be the flags used to construct R. If pattern is an object R whose [[Class]] property is "RegExp" and flags is not undefined, then throw a TypeError exception. Otherwise, let P be the empty string if pattern is undefined and ToString(pattern) otherwise, and let F be the empty string if flags is undefined and ToString(flags) otherwise.
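A short sketch of what the quoted sections imply, as observed in an engine that follows them. The exact `source` string shown is what current engines report and is an assumption here, not something the quoted ES3 text fixes.

```javascript
// Called as a function with no arguments: pattern is undefined, so the
// call is forwarded to the constructor with the empty pattern.
const called = RegExp();
// Called as a constructor: same empty pattern.
const constructed = new RegExp();

console.log(called.source);           // "(?:)" in current engines
console.log(constructed.source);      // same as above
console.log(called.test("anything")); // true: the empty pattern matches everywhere

// An existing RegExp with no flags argument is returned unchanged
// by the function form, while the constructor form makes a new object.
const r = /abc/;
console.log(RegExp(r) === r);         // true
console.log(new RegExp(r) === r);     // false
```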

/be
