Suggest adopting .NET/Perl regexp named capture syntax
On 10/24/07, StevenLevithan <steves_list at hotmail.com> wrote:
ECMAScript 4 regular expression extension proposals indicate that the Python syntax will be used for named capture. Python uses (?P<name>...) for named capture, (?P=name) for a backreference within the regex, and \g<name> for a backreference within a replacement string. Personally, I feel this is a mistake.
Although Python was the first to implement named capture, other libraries seem to be standardizing around .NET's alternative syntax, which uses (?<name>...) or (?'name'...) for capture, \k<name> or \k'name' for a backreference within the regex, and ${name} for a backreference within a replacement string. Perl 5.10 has adopted .NET's syntax (although backreferences within a replacement string use $+{name}) since "most people consider it to be nicer". Recent versions of PCRE have followed Perl's lead by supporting .NET's syntax for named capture as the preferred style.
I guess that counts as momentum...
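For readers who want to see the .NET-style forms in action, here is a minimal sketch. It assumes an engine that implements the named groups later standardized in ECMAScript 2018 (for example a recent Node.js), which is not the ES4 proposal under discussion in this thread. The pattern and the variable names are illustrative only.

```javascript
// Assumption: the engine supports ECMAScript 2018 named groups.
// (?<word>...) captures under the name "word"; \k<word> refers back to it
// inside the same pattern, as a single token rather than a group.
const doubledWord = /\b(?<word>\w+)\s+\k<word>\b/;

const result = doubledWord.exec("this is is a test");

// The named capture is exposed on the match result's "groups" object.
const captured = result.groups.word;
console.log(captured); // "is"
```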
Here are the problems I see with the Python syntax:
(?P<name>...)
- What does the "P" stand for? "Python"? The character is unnecessary and unhelpful.
Or alternatively, it makes it possible to use other characters later for other purposes.
(?P=name)
- Backreferences should not use parentheses since they are a single token and not a grouping.
\g<name>
- Is this a single token in ES4, or a string which in a string literal will have to be written as "\g<name>"?
ES4 does not have this functionality. That may be an oversight and is now logged as bugs.ecmascript.org/ticket/255.
IMO the most natural syntax for ES4 is something like $<name>, where name is restricted to one of the names actually captured by the RegExp: it's not an arbitrary property name or variable name.
If it is the former, how will you be able to generate a replacement string using e.g. a textarea with user input, and if it's the latter, it seems less elegant than ${name} , which follows the '$ denotes a backreference' convention.
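The replacement-string question can be illustrated with a short sketch. It assumes an ECMAScript 2018 engine, where a $<name> reference in a replacement string is resolved against the names captured by the RegExp; that is outside the ES4 drafts discussed here. The pattern and the template are made up for the example, and the template is an ordinary string, so it could equally come from user input such as a textarea.

```javascript
// Assumption: the engine supports ECMAScript 2018 named groups and
// $<name> references in replacement strings.
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// A plain string; no regex-literal escaping is involved, so the same text
// could be typed by a user and passed to replace() unchanged.
const template = "$<day>/$<month>/$<year>";

const reformatted = "2007-10-24".replace(isoDate, template);
console.log(reformatted); // "24/10/2007"
```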
One other question... can the results from named capture be used in a replacement closure function? I.e., will you be able to do something like str.replace(/(?P<name>)/,function(match){return match.name;}); ?
The captured substrings with names are available as properties on the match result object, so that should work, yes.
Thanks for the feedback and for opening the ticket.
Lars T Hansen-2 wrote:
One other question... can the results from named capture be used in a replacement closure function? I.e., will you be able to do something like str.replace(/(?P<name>)/,function(match){return match.name;}); ?
The captured substrings with names are available as properties on the match result object, so that should work, yes.
--lars
But the match result object is not available within a replacement closure function, hence the question. At least in ES3, arguments[0] is a string primitive containing the entire match (i.e. backreference zero). It could be changed to a String object and have the named backreferences attached to it as properties (which is how I handle the issue in my XRegExp library, stevenlevithan.com/regex/xregexp), but this is fundamentally a different thing, and might have some obscure potential for compatibility issues.
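A rough sketch of both points follows. The first part shows what a replacement function receives in ES3-style code. The second part uses a hypothetical helper, replaceNamed, with a hand-written list of names; it only illustrates the "String object with properties" idea and is not XRegExp's actual implementation.

```javascript
// Part 1: a replacement function receives the full match as a string
// primitive, then the numbered captures, then the offset and the input.
var seen = [];
"2007-10".replace(/(\d{4})-(\d{2})/, function (match, year, month, offset) {
  seen.push(typeof match, year, month, offset);
  return match;
});
// seen is now ["string", "2007", "10", 0]

// Part 2: hypothetical helper. names[i] labels capture group i + 1.
function replaceNamed(str, regex, names, fn) {
  return str.replace(regex, function () {
    // Wrap the match in a String object so it can carry properties.
    var match = new String(arguments[0]);
    for (var i = 0; i < names.length; i++) {
      match[names[i]] = arguments[i + 1];
    }
    return fn(match);
  });
}

var observedType;
var swapped = replaceNamed("2007-10", /(\d{4})-(\d{2})/, ["year", "month"],
  function (match) {
    observedType = typeof match; // "object", no longer "string"
    return match.month + "/" + match.year;
  });
console.log(swapped); // "10/2007"
```

The typeof difference recorded in observedType is one concrete form the compatibility concern could take: existing callbacks that test typeof match === "string" would behave differently.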
-----Original Message----- From: es4-discuss-bounces at mozilla.org [mailto:es4-discuss-bounces at mozilla.org] On Behalf Of StevenLevithan
One other question... can the results from named capture be used in a replacement closure function? I.e., will you be able to do something like str.replace(/(?P<name>)/,function(match){return match.name;}); ?
The captured substrings with names are available as properties on the match result object, so that should work, yes.
But the match result object is not available within a replacement closure function, hence the question.
Sorry, brain fart on my part. Interesting point. Will look into it; expect another ticket tomorrow.