ES Discuss - Message History

Benjamin Gruenbaum (2015-06-13T18:39:39.000Z)


On Sat, Jun 13, 2015 at 9:07 PM, Mark S. Miller <erights at google.com> wrote:

> On Sat, Jun 13, 2015 at 9:17 AM, Domenic Denicola <d at domenic.me> wrote:
>
>>  All of these should be building on top of RegExp.escape :P
>>
>
> It's funny how, by considering it as leading to a proposal, I quickly saw
> deep flaws that I was previously missing.
>
>
That was a big part of making a proposal out of it - to find these things :)


> the overall result does not do this. For example:
>
>     const data = ':x';
>     const rebad = RegExp.tag`(?${data})`;
>     console.log(rebad.test('x')); // true
>
> is nonsense. Since the RegExp grammar can be extended per platform, the
> same argument that says we should have the platform provide RegExp.escape
> says we should have the platform provide RegExp.tag -- so that they can
> consistently reflect these platform extensions.
>
>
This is a good point; I considered whether or not `-` should be included
for a similar reason. I think it is reasonable to escape only syntax
characters and expect users to deal with pattern parts longer than one
character themselves (by wrapping the string in `()` in the
constructor). This is what practically every other language does.
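To illustrate the multi-character case, here is a minimal sketch that assumes a hypothetical `escapeSyntax` helper escaping only the `SyntaxCharacter` set. A quantifier written after an escaped multi-character string binds only to its last character unless the caller wraps the string in a group.

```javascript
// Hypothetical helper: escapes only SyntaxCharacter (^ $ \ . * + ? ( ) [ ] { } |).
function escapeSyntax(str) {
  return str.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
}

const word = 'ab';

// Without a group, `*` applies only to the final "b": the pattern is ^ab*$.
const ungrouped = new RegExp('^' + escapeSyntax(word) + '*$');
console.log(ungrouped.test('a'));    // true
console.log(ungrouped.test('abab')); // false

// Wrapping the escaped string in `()` makes `*` apply to the whole word.
const grouped = new RegExp('^(' + escapeSyntax(word) + ')*$');
console.log(grouped.test('abab'));   // true
```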

That said, I'm very open to allowing implementations to escape _more_ than
`SyntaxCharacter`, and even to recommending that they do so in a way that
is consistent with their regular expression grammar. What do you think
about doing that?
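The `-` case shows why escaping more than `SyntaxCharacter` can matter. A sketch, assuming a hypothetical `escapeSyntaxChars` (the `SyntaxCharacter` set only) and a hypothetical stricter `escapeMore` that also escapes `-`: interpolated into a character class, an unescaped `-` silently becomes a range operator.

```javascript
// Hypothetical helper: escapes only SyntaxCharacter, so "-" passes through.
function escapeSyntaxChars(str) {
  return str.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
}

// Hypothetical stricter variant that also escapes "-".
function escapeMore(str) {
  return str.replace(/[\\^$.*+?()[\]{}|-]/g, '\\$&');
}

const sep = '-';

// Builds "[a-z]": the interpolated "-" turned the class into a range.
const asRange = new RegExp('^[a' + escapeSyntaxChars(sep) + 'z]$');
console.log(asRange.test('m'));   // true

// Builds "[a\-z]": the class now contains only "a", "-" and "z".
const asLiteral = new RegExp('^[a' + escapeMore(sep) + 'z]$');
console.log(asLiteral.test('m')); // false
console.log(asLiteral.test('-')); // true
```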

I'm also open to having `.tag` wrap substitutions in `()` to avoid these
issues, but I'm not sure whether we have a way in JavaScript to avoid
making a capturing group out of it.
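JavaScript does have a non-capturing group, `(?:...)`, and the `re` tag quoted further down this thread uses it. The sketch below contrasts two hypothetical tags: `plainTag` splices `SyntaxCharacter`-escaped substitutions in as-is, while `groupingTag` also wraps each one in `(?:...)`. With the wrapping, Mark's `(?${data})` example is rejected with a `SyntaxError` instead of silently matching. Both tag names are illustrative, not a proposed API.

```javascript
// Hypothetical: escape only SyntaxCharacter (":" is not one of them).
function escapeSyntax(str) {
  return String(str).replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
}

// Hypothetical tag that splices escaped substitutions in as-is.
function plainTag(template, ...subs) {
  let src = template.raw[0];
  subs.forEach((sub, i) => {
    src += escapeSyntax(sub) + template.raw[i + 1];
  });
  return new RegExp(src);
}

// Hypothetical tag that wraps each escaped substitution in a non-capturing group.
function groupingTag(template, ...subs) {
  let src = template.raw[0];
  subs.forEach((sub, i) => {
    src += `(?:${escapeSyntax(sub)})` + template.raw[i + 1];
  });
  return new RegExp(src);
}

const data = ':x';

// Mark's example: the data is spliced into the group syntax, giving /(?:x)/.
const rebad = plainTag`(?${data})`;
console.log(rebad.source);    // (?:x)
console.log(rebad.test('x')); // true

// With wrapping, the pattern would be "(?(?::x))", which is invalid.
let rejected = false;
try {
  groupingTag`(?${data})`;
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true

// Wrapping also makes a following quantifier apply to the whole substitution.
console.log(groupingTag`^${'ab'}*$`.test('abab')); // true
```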


> * Now that we have modules, I would like to see us stop having each
> proposal for new functionality come at the price of further global
> namespace pollution. I would like to see us transition towards having most
> new std library entry points be provided by std modules. I understand why
> we haven't yet, but something needs to go first.
>
>
I think that doing this should be an eventual target, but I don't think
adding a single, much-asked-for static function to the RegExp constructor
would be a good place to start. I think the committee first needs to agree
on how this form of modularisation should be done; there are much bigger
targets to address first, and I would not like to see this proposal tied
to, and held back by, that (useful) goal.


> * ES6 made RegExp subclassable with most methods delegating to a common
> @exec method, so that a subclass only needs to consistently override a
> small number of things to stay consistent. Neither RegExpSubclass.escape
> nor RegExpSubclass.tag can be derived from aRegExpSubclass[@exec]. Because
> of the first bullet, RegExpSubclass.tag also cannot be derived from
> RegExpSubclass.escape. But having RegExpSubclass.escape delegating to
> RegExpSubclass.tag seems weird.
>
>
Right, but it makes sense that `escape` does not play in this game, since
it is a static method that takes a string argument; I'm not sure how it
could use @exec.
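As background for Mark's point, here is a minimal sketch of the ES6 delegation. An illustrative subclass (the `CountingRegExp` name is made up) overrides only `exec`, and both `RegExp.prototype.test` and `String.prototype.match` end up calling it. A static `escape` receives only a string, so it has no instance whose `exec` it could delegate to.

```javascript
// Illustrative subclass that overrides only `exec` and counts calls to it.
class CountingRegExp extends RegExp {
  constructor(pattern, flags) {
    super(pattern, flags);
    this.execCalls = 0;
  }

  exec(str) {
    this.execCalls++;
    return super.exec(str);
  }
}

const counted = new CountingRegExp('a+');

// RegExp.prototype.test looks up and calls this.exec.
console.log(counted.test('caaat')); // true

// String.prototype.match dispatches to RegExp.prototype[Symbol.match],
// which, for a non-global regexp, also calls this.exec.
console.log('caaat'.match(counted)[0]); // aaa

console.log(counted.execCalls); // 2
```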


> * The instanceof below prevents this polyfill from working cross-frame.
> Also, when doing RegExpSubclass1.tag`xx${aRegExpSubclass2}yy`, where
> RegExpSubclass2.source produces a regexp grammar string that
> RegExpSubclass1 does not understand, I have no idea what the composition
> should do other than reject with an error. But what if the strings happen
> to be mutually valid but with conflicting meaning between these subclasses?
>

This is hacky, but in my code I just did `argument.exec ? treatAsRegExp : treatAsString`.
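A sketch of that duck-typed check, under a hypothetical `isRegExpLike` name. A RegExp created in another frame fails `instanceof RegExp`, because each frame has its own `RegExp` constructor, but it still has an `exec` method. The hacky part is that any object with an `exec` method passes too.

```javascript
// Hypothetical helper implementing the `argument.exec ? ... : ...` check.
function isRegExpLike(value) {
  return value != null && typeof value.exec === 'function';
}

console.log(isRegExpLike(/a|b/)); // true
console.log(isRegExpLike('a|b')); // false

// Unlike `instanceof RegExp`, this does not depend on which frame's RegExp
// constructor created the value. But an unrelated object with an `exec`
// method also passes:
const impostor = { exec() { return null; } };
console.log(isRegExpLike(impostor)); // true
```

A commonly used alternative is `Object.prototype.toString.call(value) === '[object RegExp]'`, which also works across frames for genuine RegExp objects and rejects plain objects like `impostor`.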


>
>
>
>>
>>
>> *From:* es-discuss [mailto:es-discuss-bounces at mozilla.org] *On Behalf Of* Mark S. Miller
>> *Sent:* Saturday, June 13, 2015 02:39
>> *To:* C. Scott Ananian
>> *Cc:* Benjamin Gruenbaum; es-discuss
>> *Subject:* Re: RegExp.escape()
>>
>>
>>
>> The point of this last variant is that data gets escaped but RegExp
>> objects do not -- allowing you to compose RegExps:
>> re`${re1}|${re2}*|${data}`
>> But this requires one more adjustment:
>>
>>
>> >
>> >   function re(first, ...args) {
>> >     let flags = first;
>> >     function tag(template, ...subs) {
>> >       const parts = [];
>> >       const numSubs = subs.length;
>> >       for (let i = 0; i < numSubs; i++) {
>> >         parts.push(template.raw[i]);
>> >         const subst = subs[i] instanceof RegExp ?
>>               `(?:${subs[i].source})` :
>> >             subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&');
>> >         parts.push(subst);
>> >       }
>> >       parts.push(template.raw[numSubs]);
>> >       return RegExp(parts.join(''), flags);
>> >     }
>> >     if (typeof first === 'string') {
>> >       return tag;
>> >     } else {
>> >       flags = void 0;  // Should this be '' ?
>> >       return tag(first, ...args);
>> >     }
>> >   }
>>
>
>
>
> --
>     Cheers,
>     --MarkM
>
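A runnable sketch of composing patterns with the `re` tag quoted above. It uses `'\\$&'` as the replacement string, so each matched special character gets a backslash prefix. RegExp substitutions such as `re1` and `re2` are spliced in as `(?:...)` groups, so `${re2}*` repeats the whole of `re2`. The string `literal` is escaped and matched literally. The variable names are illustrative.

```javascript
function re(first, ...args) {
  let flags = first;
  function tag(template, ...subs) {
    const parts = [];
    const numSubs = subs.length;
    for (let i = 0; i < numSubs; i++) {
      parts.push(template.raw[i]);
      // RegExps are spliced in as a non-capturing group; strings are escaped.
      const subst = subs[i] instanceof RegExp ?
          `(?:${subs[i].source})` :
          subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&');
      parts.push(subst);
    }
    parts.push(template.raw[numSubs]);
    return RegExp(parts.join(''), flags);
  }
  if (typeof first === 'string') {
    return tag;                  // re('i')`...`: flags first, then the template
  } else {
    flags = void 0;
    return tag(first, ...args);  // re`...`: used directly as a tag
  }
}

const re1 = /a|b/;
const re2 = /c/;
const literal = '1+1';

const composed = re`^(?:${re1}|${re2}*|${literal})$`;
console.log(composed.source);      // ^(?:(?:a|b)|(?:c)*|1\+1)$
console.log(composed.test('b'));   // true
console.log(composed.test('ccc')); // true
console.log(composed.test('1+1')); // true
console.log(composed.test('11'));  // false

const ci = re('i')`^${re1}$`;
console.log(ci.test('A'));         // true
```

Without the `(?:...)` wrapping, `${re1}c` would become `a|bc`, and the alternation would leak out of the substitution.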

d at domenic.me (2015-06-16T16:55:30.381Z)
