Inline regexps and caching

# Laurens Holst (17 years ago)

Hi,

A colleague and I were puzzled by some strange behaviour in Firefox: we 
found that in some browsers literal regular expressions are cached and 
reused. Test case:

function test(str) {
    var regexp = /^[^d]*\bd{1,4}\b[^d]*$/g;
    alert('Expected: 0/true, result: ' +
            regexp.lastIndex + '/' +
            regexp.test(str)
        );
}

var xxx = "MM/dd/yyyy";
test(xxx);
test(xxx);

It turns out that Firefox and Opera return ‘false’ for the second test 
result, whereas Internet Explorer and Safari return ‘true’ in both cases.
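The sketch below is a runnable version of the test case. It assumes a modern engine such as Node.js and returns its string instead of calling `alert()`. `testShared` and `sharedRegexp` are hypothetical names: they emulate the ES3 behaviour by hoisting one RegExp object out of the function, so every call reuses it.

```javascript
// Same pattern as the test case; returns "lastIndex/result" instead of alerting.
function test(str) {
  // In ES5 and later engines this literal yields a fresh RegExp on every call.
  var regexp = /^[^d]*\bd{1,4}\b[^d]*$/g;
  return regexp.lastIndex + '/' + regexp.test(str);
}

// Hypothetical emulation of the ES3 model: one object shared by every call.
var sharedRegexp = /^[^d]*\bd{1,4}\b[^d]*$/g;
function testShared(str) {
  // lastIndex is read first; a successful test() then moves it to the match end.
  return sharedRegexp.lastIndex + '/' + sharedRegexp.test(str);
}

var xxx = "MM/dd/yyyy";
console.log(test(xxx), test(xxx));             // 0/true 0/true
console.log(testShared(xxx), testShared(xxx)); // 0/true 10/false
```

The second shared call starts searching at index 10, where the `^` anchor cannot match. It therefore fails and `lastIndex` falls back to 0, so a third call would succeed again.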

The latter behaviour seems most sensible and expected to me; browsers 
can of course cache the regular expression object to avoid parsing it 
over and over again, but they should IMO clone that cached object every 
time it is used.

~Laurens

-- 
Laurens Holst, student, university of Utrecht, the Netherlands
Website: www.grauw.nl. Backbase employee; www.backbase.com

# Mike Shaver (17 years ago)

2009/1/23 Laurens Holst <lholst at students.cs.uu.nl>:
> Hi,
>
> I and a colleague were puzzled by some strange behaviour in Firefox, we
> found that in some browsers literal regular expressions are cached and
> reused. Testcase:
>
> function test(str) {
>   var regexp = /^[^d]*\bd{1,4}\b[^d]*$/g;
>   alert('Expected: 0/true, result: ' +
>           regexp.lastIndex + '/' +
>           regexp.test(str)
>       );
> }
>
> var xxx = "MM/dd/yyyy";
> test(xxx);
> test(xxx);
>
> It turns out that Firefox and Opera return 'false' for the second test
> result, whereas Internet Explorer and Safari return 'true' in both cases.

Firefox and Opera are doing what ES3 requires (s 7.8.5:
http://bclary.com/2004/11/07/#a-7.8.5 ), but I believe that it's being
changed in 3.1 to produce a new one each time the literal expression
is executed.

Mike

# Mark S. Miller (17 years ago)

On Fri, Jan 23, 2009 at 7:33 AM, Mike Shaver <mike.shaver at gmail.com> wrote:

> Firefox and Opera are doing what ES3 requires (s 7.8.5:
> http://bclary.com/2004/11/07/#a-7.8.5 ),

Correct.

> but I believe that it's being
> changed in 3.1 to produce a new one each time the literal expression
> is executed.
>

Correct. In the meantime, you can change expressions like

    var regexp = /^[^d]*\bd{1,4}\b[^d]*$/g;

into

    var regexp = new RegExp("^[^d]*\\bd{1,4}\\b[^d]*$","g");

Yes, this is ugly. But an ugly program that works is better than a pretty
one that doesn't.
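Here is a sketch of the constructor workaround, runnable in a modern engine; the name `testWithConstructor` is hypothetical. Note the doubled backslashes: in a string literal, `\b` on its own is a backspace character, so `\\b` is needed for the RegExp constructor to see a word-boundary assertion.

```javascript
function testWithConstructor(str) {
  // new RegExp(...) always builds a new object, under ES3 as well as later
  // editions, so lastIndex is 0 at the start of every call.
  var regexp = new RegExp("^[^d]*\\bd{1,4}\\b[^d]*$", "g");
  return regexp.lastIndex + '/' + regexp.test(str);
}

// The string form denotes the same pattern as the literal.
var fromLiteral = /^[^d]*\bd{1,4}\b[^d]*$/g;
var fromString = new RegExp("^[^d]*\\bd{1,4}\\b[^d]*$", "g");
console.log(fromLiteral.source === fromString.source); // true

console.log(testWithConstructor("MM/dd/yyyy"), testWithConstructor("MM/dd/yyyy")); // 0/true 0/true
```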

-- 
   Cheers,
   --MarkM

# David-Sarah Hopwood (17 years ago)

Laurens Holst wrote:
> Hi,
> 
> I and a colleague were puzzled by some strange behaviour in Firefox, we
> found that in some browsers literal regular expressions are cached and
> reused. Testcase:
> 
> function test(str) {
>    var regexp = /^[^d]*\bd{1,4}\b[^d]*$/g;
>    alert('Expected: 0/true, result: ' +
>            regexp.lastIndex + '/' +
>            regexp.test(str)
>        );
> }
> 
> var xxx = "MM/dd/yyyy";
> test(xxx);
> test(xxx);
> 
> It turns out that Firefox and Opera return ‘false’ for the second test
> result, whereas Internet Explorer and Safari return ‘true’ in both cases.
> 
> The latter behaviour seems most sensible and expected to me; browsers
> can of course cache the regular expression object to avoid parsing it
> over and over again, but they should IMO clone that cached object every
> time it is used.

This is a known problem that has been fixed in ES3.1.

ES3 section 7.8.5:
# A regular expression literal is an input element that is converted to
# a RegExp object (section 15.10) when it is scanned.

ES3.1 section 7.8.5:
# A regular expression literal is an input element that is converted to
# a RegExp object (section 15.10) each time the literal is evaluated.
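The ES3.1 wording can be seen in a modern engine, which follows it: each evaluation of the literal produces a distinct object, so state left on one does not leak into the next. The function name `literalEachTime` is hypothetical. Under the ES3 wording the literal would have been converted once, when scanned, and both calls would have returned the same object.

```javascript
function literalEachTime() {
  return /d{1,4}/g; // evaluated anew on every call under ES3.1/ES5 and later
}

var first = literalEachTime();
first.test("MM/dd/yyyy");        // matches "dd" at index 3, so first.lastIndex becomes 5
var second = literalEachTime();

console.log(first === second);   // false: two distinct objects
console.log(first.lastIndex);    // 5
console.log(second.lastIndex);   // 0: the new object starts clean
```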

-- 
David-Sarah Hopwood ⚥

# Richard Cornford (17 years ago)

Mark S. Miller wrote:
> Mike Shaver wrote:
>> Firefox and Opera are doing what ES3 requires (s 7.8.5:
>> http://bclary.com/2004/11/07/#a-7.8.5 ),
>
> Correct.
>
>> but I believe that it's being changed in 3.1 to produce a new
>> one each time the literal expression is executed.
>>
>
> Correct. In the meantime, you can change expressions like
>
>    var regexp = /^[^d]*\bd{1,4}\b[^d]*$/g;
>
> into
>
>    var regexp = new RegExp("^[^d]*\\bd{1,4}\\b[^d]*$","g");
>
> Yes, this is ugly. But an ugly program that works is better than
> a pretty one that doesn't.

Ugly, and an example of using a hammer to crack a nut. The issue is 
provoked by the fact that, for a regular expression with the global flag 
set, the - exec - method employs the regular expression object's - 
lastIndex - property, leaving it set to the end index of the last match 
made. Knowing that suggests that a simple 'solution' would be to 
explicitly set the regular expression object's - lastIndex - property to 
zero before using it. That must be cheaper than creating a new regular 
expression object just for the side effect of then having one with a 
zero - lastIndex - property.
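This is a minimal sketch of the lastIndex-reset approach. It assumes the regexp is deliberately shared; it is hoisted here to stand in for the ES3 parse-time object. `dateFormat` and `testReset` are hypothetical names.

```javascript
// One object reused by every call, as an ES3 regexp literal would be.
var dateFormat = /^[^d]*\bd{1,4}\b[^d]*$/g;

function testReset(str) {
  // Discard the position left behind by any previous global match.
  dateFormat.lastIndex = 0;
  return dateFormat.test(str);
}

console.log(testReset("MM/dd/yyyy"), testReset("MM/dd/yyyy")); // true true
```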

In addition, knowing the mechanism also directs attention towards the 
global flag; does the regular expression being used need to have the 
global flag set in the first place? If the flag is not set then 
subsequent - exec - uses will always start at the zero index. The example 
regular expression used above only appears to be interested in making a 
single match so probably there was never a need to have the flag set.
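The sketch below drops the flag as suggested. Without `g`, `test` and `exec` always start at index 0 and do not advance `lastIndex` after a successful match. `dateFormatOnce` is a hypothetical name.

```javascript
var dateFormatOnce = /^[^d]*\bd{1,4}\b[^d]*$/; // no 'g' flag

console.log(dateFormatOnce.test("MM/dd/yyyy")); // true
console.log(dateFormatOnce.lastIndex);          // 0: not advanced by the match
console.log(dateFormatOnce.test("MM/dd/yyyy")); // true again
```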

Richard Cornford.

# Brendan Eich (17 years ago)

On Jan 24, 2009, at 5:42 PM, Richard Cornford wrote:

> Ugly, and an example of using a hammer to crack a nut.

I do this all the time, works great ;-).

Seriously, there's more afoot than can be patched by resetting  
lastIndex.

> The issue is provoked by the fact that for a regular expression with  
> the global flag set the - exec - method employs the regular  
> expression object's - lastIndex - property, leaving it set to the  
> end index of the last match made. Knowing that suggests that a  
> simple 'solution' would be to explicitly set the regular expression  
> object's - lastIndex - property to zero before using it. That must  
> be cheaper than creating a new regular expression object just for  
> the side effect of then having one with a zero - lastIndex - property.

The more general problem is shared, mutable, literal-expressed  
singletons. In no other case (object or array initialiser, function  
expressions, primitive literals) does evaluation return a singleton  
created as if at parse time. Mutation hurts; sharing should be  
explicit. Regexp literals should therefore evaluate to a fresh object,  
both to match the other kinds of literals and to avoid bugs such as

https://bugzilla.mozilla.org/show_bug.cgi?id=474412

Efficiency concerns are secondary but can be addressed by lightweight  
cloning of a shared-immutable compiler-created regexp.
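The comparison with other literal forms can be checked in a modern engine. Object and array initialisers and function expressions yield a new object on every evaluation, and since ES5 regexp literals do too. `makeLiterals` is a hypothetical helper.

```javascript
function makeLiterals() {
  return {
    object: {},
    array: [],
    fn: function () {},
    regexp: /x/g
  };
}

var a = makeLiterals();
var b = makeLiterals();

console.log(a.object === b.object); // false
console.log(a.array === b.array);   // false
console.log(a.fn === b.fn);         // false
console.log(a.regexp === b.regexp); // false in ES5+ engines; ES3 required true
```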

> In addition, knowing the mechanism also directs attention towards  
> the global flag; does the regular expression being used need to have  
> the global flag set in the first place? If the flag is not set then  
> subsequent - exec - uses will always start at the zero index. The  
> example regular expression used above only appears to be interested  
> in making a single match so probably there was never a need to have  
> the flag set.

This is an optimization challenge for implementors, not a reason to  
specify a shared singleton with mutable state (lastIndex is mutable  
and set to 0 even without the 'g' flag).

/be

# Richard Cornford (17 years ago)

Brendan Eich wrote:
> On Jan 24, 2009, at 5:42 PM, Richard Cornford wrote:
>> Ugly, and an example of using a hammer to crack a nut.
>
> I do this all the time, works great ;-).

And I never do it, and that works great too.

> Seriously, there's more afoot than can be patched by
> resetting lastIndex.

My intention was to suggest that the 'prettiest' solution was to delete 
the superfluous 'g' from the end of the regular expression literal.  But 
resetting the - lastIndex - prior to using the regular expression object 
would also eliminate the undesirable behaviour in Laurens Holst's code, 
and has the merit of directly addressing the characteristic of regular 
expressions that results in the issue.

I admit, though, that the - new RegExp - thing is a bit of a bugbear for 
me. There are two reasons for that. The first is that I have encountered 
orders of magnitude more issues arising from people failing to cope with 
the 'double escaping' needed in string literal arguments for the RegExp 
constructor than issues following from the handling of - lastIndex -. So, 
for example, if you want to match a dot or a backslash in a regular 
expression it must be escaped by preceding it with a backslash, but in a 
string literal that backslash must itself be escaped with a second 
backslash if the RegExp constructor is going to see it (and when the 
character to be matched is a backslash, it also needs escaping in the 
string literal, so four backslashes are needed). People just seem to make 
a lot of mistakes when required to do that, and those mistakes are not 
easy to spot, as the resulting regular expressions still 'work', even to 
the extent of sometimes making some 'correct'/expected matches.
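The sketch below illustrates the double-escaping trap; the variable names are illustrative. A pattern that escapes a dot needs one backslash in a literal but two in a string. Forgetting one still yields a regexp that "works" on the expected input.

```javascript
// Matching a literal dot.
var dotLiteral = /a\.b/;
var dotString = new RegExp("a\\.b");  // the string holds a\.b
var dotMistake = new RegExp("a\.b");  // "\." is just "." in a string: matches any character

console.log(dotLiteral.test("a.b"), dotLiteral.test("axb")); // true false
console.log(dotString.test("a.b"), dotString.test("axb"));   // true false
console.log(dotMistake.test("a.b"), dotMistake.test("axb")); // true true (the hidden bug)

// Matching a literal backslash: two backslashes in a literal, four in a string.
var slashLiteral = /a\\b/;
var slashString = new RegExp("a\\\\b");
console.log(slashLiteral.source === slashString.source); // true
console.log(slashString.test("a\\b"));                   // true: the input is a, backslash, b
```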

The second reason is that the construct is often proposed without 
explanation and so can be received as a mystical incantation to be chanted 
in the face of every regular expression regardless of whether it is 
achieving anything useful in context. And so you encounter things like:-

...
 format: function(s) {
  return $.tablesorter.formatFloat(s.replace(new RegExp(/%/g),""));
 },
...
(from a jQuery table sorting plug-in)

- and end up wondering what on earth the author thought that - new 
RegExp - was supposed to achieve.
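For comparison, here is the same replacement written with a plain literal. `stripPercent` is a hypothetical stand-in for the plug-in's `format` function, and `$.tablesorter.formatFloat` is left out. `String.prototype.replace` with a global regexp resets `lastIndex` to 0 before searching, so even a shared literal is safe there. Wrapping the literal in `new RegExp(...)` merely copies it.

```javascript
var percentSign = /%/g; // shared on purpose, to show that replace() ignores stale state

function stripPercent(s) {
  return s.replace(percentSign, "");
}

percentSign.lastIndex = 10;            // simulate state left over from an earlier exec()
console.log(stripPercent("12.5%"));    // 12.5 (replace() searched from index 0 anyway)
console.log(percentSign.lastIndex);    // 0
console.log(new RegExp(/%/g).source);  // % (just a copy of the literal's pattern)
```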

>> The issue is provoked by the fact that for a regular
>> expression with the global flag set the - exec - method
>> employs the regular expression object's - lastIndex - property,
>> leaving it set to the end index of the last match made. Knowing
>> that suggests that a simple 'solution' would be to explicitly
>> set the regular expression object's - lastIndex - property to
>> zero before using it. That must be cheaper than creating a new
>> regular expression object just for the side effect of then
>> having one with a zero - lastIndex - property.
>
> The more general problem is shared mutable literal-expressed
> singletons. In no other case (object or array initialiser,
> function expressions, primitive literals) does evaluation
> return the singleton created as if at parse time. Mutation
> hurts, sharing should be explicit.

All of that is true, and making sure the next language version eliminates 
that is a good idea. But that does not help people who have to address 
current ES 3 implementations.

> To match the other kinds of literals and avoid bugs such as
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=474412

Now that is an issue that relates to the identity of regular expression 
objects, and so can only be addressed by creating distinct objects with - 
new RegExp -.

> Efficiency concerns are secondary but can be addressed by
> lightweight cloning of a shared-immutable compiler-created
> regexp.

"Can be addressed by ...", 'will be addressed by ...' and 'MUST be 
addressed by ...' are all very different things. It is not in the remit of 
the new specification to be requiring specific optimisations in future 
implementations.

>> In addition, knowing the mechanism also directs attention
>> towards the global flag; does the regular expression being
>> used need to have the global flag set in the first place?
>> If the flag is not set then subsequent - exec - uses will
>> always start at the zero index. The example regular
>> expression used above only appears to be interested in
>> making a single match so probably there was never a need
>> to have the flag set.
>
> This is an optimization challenge for implementors, not a
> reason to specify a shared singleton with mutable state
> (lastIndex is mutable and set to 0 even without the 'g' flag).

I am not saying that there should be a shared singleton. In the situation 
as we have it now there are implementations that create regular expression 
literals while parsing, and others that create them when the expression is 
evaluated. So it is not possible to rely on the former or expect the 
latter. The result is a minefield that needs to be cleaned up. But in the 
meanwhile bulldozing all regular expression uses with - new RegExp - seems 
an extreme alternative to recognising the few that can blow up in your 
face and defusing them individually.

Richard Cornford.

# Brendan Eich (17 years ago)

On Jan 24, 2009, at 10:56 PM, Richard Cornford wrote:

> All of that is true, and making sure the next language version  
> eliminates that is a good idea. But that does not help people who  
> have to address current ES 3 implementations.

The bug I cited,

https://bugzilla.mozilla.org/show_bug.cgi?id=474412

claims IE and Safari do what ES3.1 specifies already. Pratap's JScript  
Deviations doc didn't mention this, at least not in the early form I  
just checked, and I can't test IE here. It's true for Safari 3.2.1.

>> To match the other kinds of literals and avoid bugs such as
>>
>> https://bugzilla.mozilla.org/show_bug.cgi?id=474412
>
> Now that is an issue that relates to the identity of regular  
> expression objects, and so can only be addressed by creating  
> distinct objects with - new RegExp -.

The question of identity is potentially involved, unless you can show  
that resetting lastIndex but preserving the lexical singleton ES3  
conformance satisfies most of the complaints behind the highly-dup'ed  
bug

http://bugzilla.mozilla.org/show_bug.cgi?id=98409

(see http://weblogs.mozillazine.org/roadmap/archives/008325.html where  
the top three most-frequently dup'ed bugzilla.mozilla.org JS bugs were  
tabulated).

I'm not sure how much lastIndex resetting would help, but I know it  
would hurt those reporters who complain (also or separately) about the  
odd lexical singleton evaluation model. I'd rather fix regexp literals  
to evaluate like other mutable-object literals. This fix subsumes any  
ad-hoc fix for the lastIndex bug, for principled reasons: by  
eliminating implicit sharing of literally expressed mutable objects.

> "Can be addressed by ...", 'will be addressed by ...' and 'MUST be  
> addressed by ...' are all very different things. It is not in the  
> remit of the new specification to be requiring specific  
> optimisations in future implementations.

Of course not -- nor is it the spec's job to prematurely optimize at  
the expense of semantic cleanliness and principle-of-least-astonishment.

The market will sort out the implementation quality issue, and it's  
already forcing major performance work from all the top vendors. It  
won't happen overnight, but we've already got a cross-browser  
difference to deal with. Better to make the right long-term fix to the  
spec.

> I am not saying that there should be a shared singleton. In the  
> situation as we have it now there are implementations that create  
> regular expression literals while parsing, and others that create  
> them when the expression is evaluated.

The latter are not conforming to ES3, FWIW. The spec is clear.

> So it is not possible to rely on the former or expect the latter.  
> The result is a minefield that needs to be cleaned up. But in the  
> meanwhile bulldozing all regular expression uses with - new RegExp -  
> seems an extreme alternative to recognising the few that can blow up  
> in your face and defusing them individually.

Oh, I didn't mean to argue against your objection to Mark's advice to  
use new RegExp(...) exclusively and never use literals! Sorry if I  
misread you; I thought you were arguing for lastIndex resetting as an  
alternative to the 3.1 evaluation change.

/be

# Richard Cornford (17 years ago)

Brendan Eich wrote:
> On Jan 24, 2009, at 10:56 PM, Richard Cornford wrote:
<snip>
>> I am not saying that there should be a shared singleton. In
>> the situation as we have it now there are implementations
>> that create regular expression literals while parsing, and
>> others that create them when the expression is evaluated.
>
> The latter are not conforming to ES3, FWIW. The spec is clear.

They are not conforming, but their (now long-term) existence has prevented
people from expecting conformity in this area. That, in turn, is what
allows the new spec to change the way regular expression literals are
handled.

>> So it is not possible to rely on the former or expect the
>> latter. The result is a minefield that needs to be cleaned
>> up. But in the meanwhile bulldozing all regular expression
>> uses with - new RegExp - seems an extreme alternative to
>> recognising the few that can blow up in your face and
>> defusing them individually.
>
> Oh, I didn't mean to argue against your argument with Mark's
> advice to use new RegExp(...) exclusively and never use
> literals! Sorry if I misread you, I thought you were arguing
> for lastIndex resetting as an alternative to the 3.1 evaluation
> change.

It would never have been realistic/practical to change the way -
lastIndex - is handled in the - exec - method as that would break code
such as:-

var m, rx = / ... /g;
while (( m =  rx.exec( st ) )){
   ... // handle each successive match in turn.
}

- which, even if not often seen, is something people do.
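Here is a runnable version of that loop, with a hypothetical pattern (`/\d+/g`) and input standing in for the elided ones. `exec` resumes from `rx.lastIndex`, which each successful match advances. The loop ends when `exec` returns null, which also resets `lastIndex` to 0.

```javascript
function allNumbers(st) {
  var m, rx = /\d+/g, found = [];
  while ((m = rx.exec(st))) {
    found.push(m[0]); // rx.lastIndex now points just past this match
  }
  return found;
}

console.log(allNumbers("a1 b22 c333")); // [ '1', '22', '333' ]
```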

Fortunately the existing non-conforming implementations will have
prevented the variation:-

var m;
while (( m =  / ... /g.exec( st ) )){
    ... // handle each successive match in turn.
}

-  which would otherwise have been broken by getting rid of the shared
singleton.
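To see why that variation would break, here is the same hypothetical pattern written inline and run in a modern engine, where the literal is re-evaluated on each loop test. Every `exec` starts from index 0 and finds the first match again, so the loop never ends on its own. `maxIterations` is a guard added only so the sketch terminates.

```javascript
function inlineLiteralLoop(st, maxIterations) {
  var m, found = [];
  // A fresh /\d+/g (lastIndex 0) is created on every evaluation of the condition.
  while ((m = /\d+/g.exec(st)) && found.length < maxIterations) {
    found.push(m[0]);
  }
  return found;
}

console.log(inlineLiteralLoop("a1 b22 c333", 3)); // [ '1', '1', '1' ]
```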

Richard Cornford.