Look-behind proposal

# Nozomu Katō (10 years ago)

I submit a proposal for adding the look-behind assertions to RegExp.

Abstract

Basically, the evaluation for the look-behind assertion is performed by "rewind the target sequence and do the same thing as the look-ahead assertion":

When compiling a regular expression pattern, count the total number of PatternCharacter and . and \ AtomEscape and CharacterClass in Disjunction per look-behind assertion. Each Atom may be followed by {n}, but may not be followed by the other quantifiers.
When a look-behind assertion evaluated, rewind the target sequence by that number and evaluate as if the look-ahead assertion from that point.

-- Static Semantics: Early Errors (21.2.1.1 in ECMAScript Specifition 6)

Assertion :: ( ? < = Disjunction ) Assertion :: ( ? < ! Disjunction )

It is a Syntax Error if Disjunction contains Quantifier :: QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.

When Disjunction is separated by the | regular expression, if the number of the sequence of code points which the left Alternative matches and the number of the sequence of code points which the right Disjunction matches are different, then it is a Syntax Error.

Assertion (21.2.2.6 in ECMAScript Specifition 6)

The production Assertion :: ( ? < = Disjunction ) evaluates as follows:

Evaluate Disjunction to obtain a Matcher m.
Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
Let n be the exact number of the sequence of code points which Disjunction matches [NOTE].
Let xe be x's endIndex.
If xe < n, return failure.
Let xcap be x's captures List.
Let y be the State (xe-n, xcap).
Let d be a Continuation that always returns its State argument as a successful MatchResult.
Call m(y, d) and let r be its result.
If r is failure, return failure.
Let z be r's State.
Let zcap be z's captures List.
Let a be the State (xe, zcap).
Call c(a) and return its result.

The production Assertion :: ( ? < ! Disjunction ) evaluates as follows:

Evaluate Disjunction to obtain a Matcher m.
Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
Let n be the exact number of the sequence of code points which Disjunction matches [NOTE].
Let xe be x's endIndex.
If xe < n, then call c(x) and return its result.
Let xcap be x's captures List.
Let y be the State (xe-n, xcap).
Let d be a Continuation that always returns its State argument as a successful MatchResult.
Call m(y, d) and let r be its result.
If r is not failure, return failure.
Call c(x) and return its result.

NOTE: n is computed by counting and summing up the following Atoms in the Disjunction: a. Atom :: PatternCharacter b. Atom :: . c. Atom :: \ AtomEscape d. Atom :: CharacterClass

If an Atom is followed by a "QuantifierPrefix :: { DecimalDigits }", then that Atom is counted as the number of DecimalDigits instead of one.

If the Disjunction contains an "Atom :: ( Disjunction )" or "Atom :: ( ? : Disjunction )" that is followed by a "QuantifierPrefix :: { DecimalDigits }", then the number of code points which that inner-Disjunction matches is multiplied by the number of DecimalDigits.

If it is not necessarily important to strictly follow the syntax of Perl 5, then adopting the following syntax may make implemetation simple:

(?<{n}= Disjunction) (?<{n}! Disjunction)

where n in {} is the number of code points by which the target sequence is rewound when a look-behind assertion is evaluated. In the case of this syntax, it is not required to count Atoms in Disjunction and the number to rewind the sequence of code points is taken from n directly.

, Nozomu

I submit a proposal for adding the look-behind assertions to RegExp.

----
Abstract

Basically, the evaluation for the look-behind assertion is performed by
"rewind the target sequence and do the same thing as the look-ahead
assertion":

  1. When compiling a regular expression pattern, count the total number
     of PatternCharacter and . and \ AtomEscape and CharacterClass in
     Disjunction per look-behind assertion. Each Atom may be followed by
     {n}, but may not be followed by the other quantifiers.
  2. When a look-behind assertion evaluated, rewind the target sequence
     by that number and evaluate as if the look-ahead assertion from
     that point.

--
Static Semantics: Early Errors (21.2.1.1 in ECMAScript Specifition 6)

  Assertion :: ( ? < = Disjunction )
  Assertion :: ( ? < ! Disjunction )

  It is a Syntax Error if Disjunction contains Quantifier ::
  QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.

  When Disjunction is separated by the | regular expression, if the
  number of the sequence of code points which the left Alternative
  matches and the number of the sequence of code points which the right
  Disjunction matches are different, then it is a Syntax Error.


Assertion (21.2.2.6 in ECMAScript Specifition 6)

The production Assertion :: ( ? < = Disjunction ) evaluates as follows:

1. Evaluate Disjunction to obtain a Matcher m.
2. Return an internal Matcher closure that takes two arguments, a State
   x and a Continuation c, and performs the following steps:
  1. Let n be the exact number of the sequence of code points which
     Disjunction matches [NOTE].
  2. Let xe be x's endIndex.
  3. If xe < n, return failure.
  4. Let xcap be x's captures List.
  5. Let y be the State (xe-n, xcap).
  6. Let d be a Continuation that always returns its State argument as a
     successful MatchResult.
  7. Call m(y, d) and let r be its result.
  8. If r is failure, return failure.
  9. Let z be r's State.
  10. Let zcap be z's captures List.
  11. Let a be the State (xe, zcap).
  12. Call c(a) and return its result.

The production Assertion :: ( ? < ! Disjunction ) evaluates as follows:

1. Evaluate Disjunction to obtain a Matcher m.
2. Return an internal Matcher closure that takes two arguments, a State
   x and a Continuation c, and performs the following steps:
  1. Let n be the exact number of the sequence of code points which
     Disjunction matches [NOTE].
  2. Let xe be x's endIndex.
  3. If xe < n, then call c(x) and return its result.
  4. Let xcap be x's captures List.
  5. Let y be the State (xe-n, xcap).
  6. Let d be a Continuation that always returns its State argument as a
     successful MatchResult.
  7. Call m(y, d) and let r be its result.
  8. If r is not failure, return failure.
  9. Call c(x) and return its result.

NOTE: n is computed by counting and summing up the following Atoms in
the Disjunction:
  a. Atom :: PatternCharacter
  b. Atom :: .
  c. Atom :: \ AtomEscape
  d. Atom :: CharacterClass

If an Atom is followed by a "QuantifierPrefix :: { DecimalDigits }",
then that Atom is counted as the number of DecimalDigits instead of one.

If the Disjunction contains an "Atom :: ( Disjunction )" or "Atom :: ( ?
: Disjunction )" that is followed by a "QuantifierPrefix :: {
DecimalDigits }", then the number of code points which that
inner-Disjunction matches is multiplied by the number of DecimalDigits.

----

If it is not necessarily important to strictly follow the syntax of Perl
5, then adopting the following syntax may make implemetation simple:

  (?<{n}= Disjunction)
  (?<{n}! Disjunction)

where n in {} is the number of code points by which the target sequence
is rewound when a look-behind assertion is evaluated. In the case of
this syntax, it is not required to count Atoms in Disjunction and the
number to rewind the sequence of code points is taken from n directly.


Regards,
  Nozomu

# Claude Pache (10 years ago)

Old thread of interest:

esdiscuss.org/topic/regexp-lookbehind, esdiscuss.org/topic/regexp-lookbehind

I've also personally thought quite thoroughly on how to add look-behind assertions in ES RegExps.

Your approach imposes that the exact length of a string matched by the sub-pattern ( ? < = Disjunction ) is statically determinable This is, indeed, the original semantics of Perl 5.

Another more powerful approach (which I prefer), is the one followed by .NET: the sub-pattern ( ? < = Disjunction ) is tentatively matched backwards. The advantage is that it does not impose any restriction on the subpattern, while remaining as efficient as possible (in theory, about as efficient as an equivalent look-ahead assertion). The main drawback is that it may be somewhat confusing for some types of complicated subpatterns, especially for those that contain backreferences. (Also it is more mind-bending to spec.)

That said, I think there are easier missing features to be added, that are supported by most other RegExp flavours. I'm thinking of:

the s, or .dotall flag: the dot . matches every character, including newlines;
true support of Unicode, namely: escape sequences such as \Lu for uppercase letter, or \X for grapheme cluster.

Old thread of interest:

https://esdiscuss.org/topic/regexp-lookbehind <https://esdiscuss.org/topic/regexp-lookbehind>

I've also personally thought quite thoroughly on how to add look-behind assertions in ES RegExps.

Your approach imposes that the exact length of a string matched by the sub-pattern `( ? < = Disjunction )` is statically determinable This is, indeed, the original semantics of Perl 5. 

Another more powerful approach (which I prefer), is the one followed by .NET: the sub-pattern `( ? < = Disjunction )` is tentatively matched _backwards_. The advantage is that it does not impose any restriction on the subpattern, while remaining as efficient as possible (in theory, about as efficient as an equivalent look-ahead assertion). The main drawback is that it may be somewhat confusing for some types of complicated subpatterns, especially for those that contain backreferences. (Also it is more mind-bending to spec.)

That said, I think there are easier missing features to be added, that are supported by most other RegExp flavours. I'm thinking of:

* the `s`, or `.dotall` flag: the dot `.` matches every character, including newlines;
* true support of Unicode, namely: escape sequences such as `\Lu` for uppercase letter, or `\X` for grapheme cluster.

—Claude


> Le 18 mai 2015 à 15:50, Nozomu Katō <noz.ka at akenotsuki.com> a écrit :
> 
> I submit a proposal for adding the look-behind assertions to RegExp.
> 
> ----
> Abstract
> 
> Basically, the evaluation for the look-behind assertion is performed by
> "rewind the target sequence and do the same thing as the look-ahead
> assertion":
> 
>  1. When compiling a regular expression pattern, count the total number
>     of PatternCharacter and . and \ AtomEscape and CharacterClass in
>     Disjunction per look-behind assertion. Each Atom may be followed by
>     {n}, but may not be followed by the other quantifiers.
>  2. When a look-behind assertion evaluated, rewind the target sequence
>     by that number and evaluate as if the look-ahead assertion from
>     that point.
> 
> --
> Static Semantics: Early Errors (21.2.1.1 in ECMAScript Specifition 6)
> 
>  Assertion :: ( ? < = Disjunction )
>  Assertion :: ( ? < ! Disjunction )
> 
>  It is a Syntax Error if Disjunction contains Quantifier ::
>  QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.
> 
>  When Disjunction is separated by the | regular expression, if the
>  number of the sequence of code points which the left Alternative
>  matches and the number of the sequence of code points which the right
>  Disjunction matches are different, then it is a Syntax Error.
> 
> 
> Assertion (21.2.2.6 in ECMAScript Specifition 6)
> 
> The production Assertion :: ( ? < = Disjunction ) evaluates as follows:
> 
> 1. Evaluate Disjunction to obtain a Matcher m.
> 2. Return an internal Matcher closure that takes two arguments, a State
>   x and a Continuation c, and performs the following steps:
>  1. Let n be the exact number of the sequence of code points which
>     Disjunction matches [NOTE].
>  2. Let xe be x's endIndex.
>  3. If xe < n, return failure.
>  4. Let xcap be x's captures List.
>  5. Let y be the State (xe-n, xcap).
>  6. Let d be a Continuation that always returns its State argument as a
>     successful MatchResult.
>  7. Call m(y, d) and let r be its result.
>  8. If r is failure, return failure.
>  9. Let z be r's State.
>  10. Let zcap be z's captures List.
>  11. Let a be the State (xe, zcap).
>  12. Call c(a) and return its result.
> 
> The production Assertion :: ( ? < ! Disjunction ) evaluates as follows:
> 
> 1. Evaluate Disjunction to obtain a Matcher m.
> 2. Return an internal Matcher closure that takes two arguments, a State
>   x and a Continuation c, and performs the following steps:
>  1. Let n be the exact number of the sequence of code points which
>     Disjunction matches [NOTE].
>  2. Let xe be x's endIndex.
>  3. If xe < n, then call c(x) and return its result.
>  4. Let xcap be x's captures List.
>  5. Let y be the State (xe-n, xcap).
>  6. Let d be a Continuation that always returns its State argument as a
>     successful MatchResult.
>  7. Call m(y, d) and let r be its result.
>  8. If r is not failure, return failure.
>  9. Call c(x) and return its result.
> 
> NOTE: n is computed by counting and summing up the following Atoms in
> the Disjunction:
>  a. Atom :: PatternCharacter
>  b. Atom :: .
>  c. Atom :: \ AtomEscape
>  d. Atom :: CharacterClass
> 
> If an Atom is followed by a "QuantifierPrefix :: { DecimalDigits }",
> then that Atom is counted as the number of DecimalDigits instead of one.
> 
> If the Disjunction contains an "Atom :: ( Disjunction )" or "Atom :: ( ?
> : Disjunction )" that is followed by a "QuantifierPrefix :: {
> DecimalDigits }", then the number of code points which that
> inner-Disjunction matches is multiplied by the number of DecimalDigits.
> 
> ----
> 
> If it is not necessarily important to strictly follow the syntax of Perl
> 5, then adopting the following syntax may make implemetation simple:
> 
>  (?<{n}= Disjunction)
>  (?<{n}! Disjunction)
> 
> where n in {} is the number of code points by which the target sequence
> is rewound when a look-behind assertion is evaluated. In the case of
> this syntax, it is not required to count Atoms in Disjunction and the
> number to rewind the sequence of code points is taken from n directly.
> 
> 
> Regards,
>  Nozomu
> 
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20150518/a792b5e6/attachment.html>

# Jason Orendorff (10 years ago)

On Mon, May 18, 2015 at 8:50 AM, Nozomu Katō <noz.ka at akenotsuki.com> wrote:

I submit a proposal for adding the look-behind assertions to RegExp.

Fantastic! I'm not a TC39 member but I've been hoping this would happen for some time.

I have often thought that the major obstacle was that no one has done this work. I hope I was not wrong!

It is a Syntax Error if Disjunction contains Quantifier :: QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.

Backreferences must be ruled out, too.

Let n be the exact number of the sequence of code points which Disjunction matches [NOTE].

I don't think a NOTE is strong enough for spec purposes, so you can improve the proposal by adding a "Static Semantics:" section that formally specifies this.

The spec uses attribute grammars for dozens of things like this, where some piece of information has to be determined from the parse tree. See "Static Semantics: ElisionWidth" for a simple example.

On Mon, May 18, 2015 at 8:50 AM, Nozomu Katō <noz.ka at akenotsuki.com> wrote:
> I submit a proposal for adding the look-behind assertions to RegExp.

Fantastic! I'm not a TC39 member but I've been hoping this would
happen for some time.

I have often thought that the major obstacle was that no one has done
this work. I hope I was not wrong!

>  It is a Syntax Error if Disjunction contains Quantifier ::
>  QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.

Backreferences must be ruled out, too.

>   1. Let n be the exact number of the sequence of code points which
>      Disjunction matches [NOTE].

I don't think a NOTE is strong enough for spec purposes, so you can
improve the proposal by adding a "Static Semantics:" section that
formally specifies this.

The spec uses attribute grammars for dozens of things like this, where
some piece of information has to be determined from the parse tree.
See "Static Semantics: ElisionWidth" for a simple example.

-j

# Jason Orendorff (10 years ago)

On Mon, May 18, 2015 at 1:51 PM, Claude Pache <claude.pache at gmail.com> wrote:

Another more powerful approach (which I prefer), [...]

As long as we prohibit matching groups within lookbehind assertions, I think the approach you prefer is a strict superset of what's proposed. So there's no opportunity cost to taking the proposal as-is: the .NET semantics could be compatibly added later.

I imagine TC39 gives some preference to proposals that exist in a concrete form. So if you prefer the .NET approach enough to do anything about it, now's a great time to write some spec text.

That said, I think there are easier missing features to be added, that are supported by most other RegExp flavours. I'm thinking of:

the s, or .dotall flag: the dot . matches every character, including newlines;

true support of Unicode, namely: escape sequences such as \Lu for uppercase letter, or \X for grapheme cluster.

Why not submit proposals for these?

On Mon, May 18, 2015 at 1:51 PM, Claude Pache <claude.pache at gmail.com> wrote:
> Another more powerful approach (which I prefer), [...]

As long as we prohibit matching groups within lookbehind assertions, I
think the approach you prefer is a strict superset of what's proposed.
So there's no opportunity cost to taking the proposal as-is: the .NET
semantics could be compatibly added later.

I imagine TC39 gives some preference to proposals that exist in a
concrete form. So if you prefer the .NET approach enough to do
anything about it, now's a great time to write some spec text.

> That said, I think there are easier missing features to be added, that are
> supported by most other RegExp flavours. I'm thinking of:
>
> * the `s`, or `.dotall` flag: the dot `.` matches every character, including
> newlines;
> * true support of Unicode, namely: escape sequences such as `\Lu` for
> uppercase letter, or `\X` for grapheme cluster.

Why not submit proposals for these?

-j

# Allen Wirfs-Brock (10 years ago)

On May 18, 2015, at 2:20 PM, Jason Orendorff wrote:

On Mon, May 18, 2015 at 8:50 AM, Nozomu Katō <noz.ka at akenotsuki.com> wrote:

I submit a proposal for adding the look-behind assertions to RegExp.

Fantastic! I'm not a TC39 member but I've been hoping this would happen for some time.

I have often thought that the major obstacle was that no one has done this work. I hope I was not wrong!

You're exactly right, we've been waiting for somebody (anybody) to create a technically complete proposal

It is a Syntax Error if Disjunction contains Quantifier :: QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.

Backreferences must be ruled out, too.

Let n be the exact number of the sequence of code points which Disjunction matches [NOTE].

I don't think a NOTE is strong enough for spec purposes, so you can improve the proposal by adding a "Static Semantics:" section that formally specifies this.

The spec uses attribute grammars for dozens of things like this, where some piece of information has to be determined from the parse tree. See "Static Semantics: ElisionWidth" for a simple example.

The proposal also needs to follow the conventions and terminology of the ES6 RegExp spec. In particular, the RegExp pattern matching algorithms use the term "character" with a specific meaning rather than using "code unit" or "code point" (see people.mozilla.org/~jorendorff/es6-draft.html#sec-pattern-semantics ). It does this so that the same algorithms can be use to describe the matching semantic of both "normal" and "unicode" patterns. Any extensions also will need t work with both kinds of patterns.

On May 18, 2015, at 2:20 PM, Jason Orendorff wrote:

> On Mon, May 18, 2015 at 8:50 AM, Nozomu Katō <noz.ka at akenotsuki.com> wrote:
>> I submit a proposal for adding the look-behind assertions to RegExp.
> 
> Fantastic! I'm not a TC39 member but I've been hoping this would
> happen for some time.
> 
> I have often thought that the major obstacle was that no one has done
> this work. I hope I was not wrong!

You're exactly right, we've been waiting for somebody (anybody) to create a technically complete proposal 

> 
>> It is a Syntax Error if Disjunction contains Quantifier ::
>> QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.
> 
> Backreferences must be ruled out, too.
> 
>>  1. Let n be the exact number of the sequence of code points which
>>     Disjunction matches [NOTE].
> 
> I don't think a NOTE is strong enough for spec purposes, so you can
> improve the proposal by adding a "Static Semantics:" section that
> formally specifies this.
> 
> The spec uses attribute grammars for dozens of things like this, where
> some piece of information has to be determined from the parse tree.
> See "Static Semantics: ElisionWidth" for a simple example.
> 

The proposal also needs to follow the conventions and terminology of the ES6 RegExp spec.  In particular, the RegExp pattern matching algorithms use the term "character" with a specific meaning rather than using "code unit" or "code point"  (see http://people.mozilla.org/~jorendorff/es6-draft.html#sec-pattern-semantics ). It does this so that the same algorithms can be use to describe the matching semantic of both "normal" and "unicode" patterns. Any extensions also will need t work with both kinds of patterns.

Allen

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20150518/ff8e4edc/attachment.html>

# Claude Pache (10 years ago)

Le 18 mai 2015 à 23:53, Jason Orendorff <jason.orendorff at gmail.com> a écrit :

On Mon, May 18, 2015 at 1:51 PM, Claude Pache <claude.pache at gmail.com> wrote:

Another more powerful approach (which I prefer), [...]

As long as we prohibit matching groups within lookbehind assertions, I think the approach you prefer is a strict superset of what's proposed. So there's no opportunity cost to taking the proposal as-is: the .NET semantics could be compatibly added later.

I imagine TC39 gives some preference to proposals that exist in a concrete form. So if you prefer the .NET approach enough to do anything about it, now's a great time to write some spec text.

That said, I think there are easier missing features to be added, that are supported by most other RegExp flavours. I'm thinking of:

the s, or .dotall flag: the dot . matches every character, including newlines;

true support of Unicode, namely: escape sequences such as \Lu for uppercase letter, or \X for grapheme cluster.

Why not submit proposals for these?

You are right, but procrastination and other activities... But I'll try to write some more concrete proposals for those ideas.

> Le 18 mai 2015 à 23:53, Jason Orendorff <jason.orendorff at gmail.com> a écrit :
> 
> On Mon, May 18, 2015 at 1:51 PM, Claude Pache <claude.pache at gmail.com> wrote:
>> Another more powerful approach (which I prefer), [...]
> 
> As long as we prohibit matching groups within lookbehind assertions, I
> think the approach you prefer is a strict superset of what's proposed.
> So there's no opportunity cost to taking the proposal as-is: the .NET
> semantics could be compatibly added later.
> 
> I imagine TC39 gives some preference to proposals that exist in a
> concrete form. So if you prefer the .NET approach enough to do
> anything about it, now's a great time to write some spec text.
> 
>> That said, I think there are easier missing features to be added, that are
>> supported by most other RegExp flavours. I'm thinking of:
>> 
>> * the `s`, or `.dotall` flag: the dot `.` matches every character, including
>> newlines;
>> * true support of Unicode, namely: escape sequences such as `\Lu` for
>> uppercase letter, or `\X` for grapheme cluster.
> 
> Why not submit proposals for these?
> 

You are right, but procrastination and other activities... 
But I'll try to write some more concrete proposals for those ideas.

—Claude

# Nozomu Katō (10 years ago)

Allen Wirfs-Brock wrote on Mon, 18 May 2015, at 15:08:13 -0700:

On May 18, 2015, at 2:20 PM, Jason Orendorff wrote:

On Mon, May 18, 2015 at 8:50 AM, Nozomu Katō wrote:

It is a Syntax Error if Disjunction contains Quantifier :: QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.

Backreferences must be ruled out, too.

Yes, I forgot to mention backreferences in my proposal. I have modified my proposal to reflect this.

Let n be the exact number of the sequence of code points which Disjunction matches [NOTE].

I don't think a NOTE is strong enough for spec purposes, so you can improve the proposal by adding a "Static Semantics:" section that formally specifies this.

The spec uses attribute grammars for dozens of things like this, where some piece of information has to be determined from the parse tree. See "Static Semantics: ElisionWidth" for a simple example.

Thank you. This sample is helpful. I was unsure how to describe this information.

The proposal also needs to follow the conventions and terminology of the ES6 RegExp spec. In particular, the RegExp pattern matching algorithms use the term "character" with a specific meaning rather than using "code unit" or "code point" (see people.mozilla.org/~jorendorff/es6-draft.html#sec-pattern-semantics ). It does this so that the same algorithms can be use to describe the matching semantic of both "normal" and "unicode" patterns. Any extensions also will need t work with both kinds of patterns.

Based on these comments, I re-submit a modified proposal:

Abstract

Basically, the evaluation for the look-behind assertion is performed by "rewind the target sequence and do the same thing as the look-ahead assertion":

When compiling a regular expression pattern, count the total number of PatternCharacter and . and \ AtomEscape and CharacterClass in Disjunction per look-behind assertion. Each Atom may be followed by {n}, but may not be followed by the other quantifiers.
When a look-behind assertion evaluated, rewind the target sequence by that number and evaluate as if the look-ahead assertion from that point.

-- Static Semantics: Early Errors

Assertion :: ( ? < = Disjunction ) Assertion :: ( ? < ! Disjunction )

It is a Syntax Error if Disjunction contains Quantifier :: QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.

It is a Syntax Error if Disjunction contains AtomEscape :: DecimalEscape.

When Disjunction is separated by the | regular expression, if the number of the sequence of characters which the left Alternative matches and the number of the sequence of characters which the right Disjunction matches are different, then it is a Syntax Error.

Static Semantics: DisjunctionSize

return the total sum of the following Atoms in the Disjunction: a. Atom :: PatternCharacter b. Atom :: . c. Atom :: \ AtomEscape except AtomEscape :: DecimalEscape d. Atom :: CharacterClass

If an Atom is followed by a "QuantifierPrefix :: { DecimalDigits }", then that Atom is counted as the number of DecimalDigits instead of one. If the Disjunction contains an "Atom :: ( Disjunction )" or "Atom :: ( ? : Disjunction )" that is followed by a "QuantifierPrefix :: { DecimalDigits }", then the total number of the sequence of Atoms in that inner-Disjunction is multiplied by the number of DecimalDigits.

Assertion

The production Assertion :: ( ? < = Disjunction ) evaluates as follows:

Evaluate Disjunction to obtain a Matcher m.
Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
Let n be the result of DisjunctionSize for Disjunction.
Let xe be x's endIndex.
If xe < n, return failure.
Let xcap be x's captures List.
Let y be the State (xe-n, xcap).
Let d be a Continuation that always returns its State argument as a successful MatchResult.
Call m(y, d) and let r be its result.
If r is failure, return failure.
Let z be r's State.
Let zcap be z's captures List.
Let a be the State (xe, zcap).
Call c(a) and return its result.

The production Assertion :: ( ? < ! Disjunction ) evaluates as follows:

Evaluate Disjunction to obtain a Matcher m.
Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
Let n be the result of DisjunctionSize for Disjunction.
Let xe be x's endIndex.
If xe < n, then call c(x) and return its result.
Let xcap be x's captures List.
Let y be the State (xe-n, xcap).
Let d be a Continuation that always returns its State argument as a successful MatchResult.
Call m(y, d) and let r be its result.
If r is not failure, return failure.
Call c(x) and return its result.

Nozomu

Allen Wirfs-Brock wrote on Mon, 18 May 2015, at 15:08:13 -0700:
> 
> On May 18, 2015, at 2:20 PM, Jason Orendorff wrote:
> 
>> On Mon, May 18, 2015 at 8:50 AM, Nozomu Katō wrote:
>>> It is a Syntax Error if Disjunction contains Quantifier ::
>>> QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.
>>
>> Backreferences must be ruled out, too.

Yes, I forgot to mention backreferences in my proposal. I have modified
my proposal to reflect this.

>>>  1. Let n be the exact number of the sequence of code points which
>>>     Disjunction matches [NOTE].
>>
>> I don't think a NOTE is strong enough for spec purposes, so you can
>> improve the proposal by adding a "Static Semantics:" section that
>> formally specifies this.
>>
>> The spec uses attribute grammars for dozens of things like this, where
>> some piece of information has to be determined from the parse tree.
>> See "Static Semantics: ElisionWidth" for a simple example.

Thank you. This sample is helpful. I was unsure how to describe this
information.

> The proposal also needs to follow the conventions and terminology of
> the ES6 RegExp spec.  In particular, the RegExp pattern matching
> algorithms use the term "character" with a specific meaning rather
> than using "code unit" or "code point"
> (see http://people.mozilla.org/~jorendorff/es6-draft.html#sec-pattern-semantics ).
> It does this so that the same algorithms can be use to describe the
> matching semantic of both "normal" and "unicode" patterns. Any
> extensions also will need t work with both kinds of patterns.

Based on these comments, I re-submit a modified proposal:

----
Abstract

Basically, the evaluation for the look-behind assertion is performed by
"rewind the target sequence and do the same thing as the look-ahead
assertion":

  1. When compiling a regular expression pattern, count the total number
     of PatternCharacter and . and \ AtomEscape and CharacterClass in
     Disjunction per look-behind assertion. Each Atom may be followed by
     {n}, but may not be followed by the other quantifiers.
  2. When a look-behind assertion evaluated, rewind the target sequence
     by that number and evaluate as if the look-ahead assertion from
     that point.

--
Static Semantics: Early Errors

  Assertion :: ( ? < = Disjunction )
  Assertion :: ( ? < ! Disjunction )

  It is a Syntax Error if Disjunction contains Quantifier ::
  QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.

  It is a Syntax Error if Disjunction contains AtomEscape ::
  DecimalEscape.

  When Disjunction is separated by the | regular expression, if the
  number of the sequence of characters which the left Alternative
  matches and the number of the sequence of characters which the right
  Disjunction matches are different, then it is a Syntax Error.


Static Semantics: DisjunctionSize
1. return the total sum of the following Atoms in the Disjunction:
     a. Atom :: PatternCharacter
     b. Atom :: .
     c. Atom :: \ AtomEscape except AtomEscape :: DecimalEscape
     d. Atom :: CharacterClass

   If an Atom is followed by a "QuantifierPrefix :: { DecimalDigits }",
   then that Atom is counted as the number of DecimalDigits instead of
   one. If the Disjunction contains an "Atom :: ( Disjunction )" or
   "Atom :: ( ? : Disjunction )" that is followed by a "QuantifierPrefix
   :: { DecimalDigits }", then the total number of the sequence of Atoms
   in that inner-Disjunction is multiplied by the number of
   DecimalDigits.


Assertion

The production Assertion :: ( ? < = Disjunction ) evaluates as follows:

1. Evaluate Disjunction to obtain a Matcher m.
2. Return an internal Matcher closure that takes two arguments, a State
   x and a Continuation c, and performs the following steps:
  1. Let n be the result of DisjunctionSize for Disjunction.
  2. Let xe be x's endIndex.
  3. If xe < n, return failure.
  4. Let xcap be x's captures List.
  5. Let y be the State (xe-n, xcap).
  6. Let d be a Continuation that always returns its State argument as a
     successful MatchResult.
  7. Call m(y, d) and let r be its result.
  8. If r is failure, return failure.
  9. Let z be r's State.
  10. Let zcap be z's captures List.
  11. Let a be the State (xe, zcap).
  12. Call c(a) and return its result.

The production Assertion :: ( ? < ! Disjunction ) evaluates as follows:

1. Evaluate Disjunction to obtain a Matcher m.
2. Return an internal Matcher closure that takes two arguments, a State
   x and a Continuation c, and performs the following steps:
  1. Let n be the result of DisjunctionSize for Disjunction.
  2. Let xe be x's endIndex.
  3. If xe < n, then call c(x) and return its result.
  4. Let xcap be x's captures List.
  5. Let y be the State (xe-n, xcap).
  6. Let d be a Continuation that always returns its State argument as a
     successful MatchResult.
  7. Call m(y, d) and let r be its result.
  8. If r is not failure, return failure.
  9. Call c(x) and return its result.

----

Nozomu

# Nozomu Katō (10 years ago)

the s, or .dotall flag: the dot . matches every character, including newlines;

true support of Unicode, namely: escape sequences such as \Lu for uppercase letter, or \X for grapheme cluster.

Why not submit proposals for these?

You are right, but procrastination and other activities... But I'll try to write some more concrete proposals for those ideas.

Unlike Perl, there is no special treatment for ] that appears just after [ or [^ in RegExp of ECMAScript. Thanks to this, [^] can be used for what matches every character including newlines.

>>> * the `s`, or `.dotall` flag: the dot `.` matches every character,
>>> including newlines;
>>> * true support of Unicode, namely: escape sequences such as `\Lu`
>>> for uppercase letter, or `\X` for grapheme cluster.
>> 
>> Why not submit proposals for these?
>> 
> 
> You are right, but procrastination and other activities... 
> But I'll try to write some more concrete proposals for those ideas.

Unlike Perl, there is no special treatment for ] that appears just after
[ or [^ in RegExp of ECMAScript. Thanks to this, [^] can be used for
what matches every character including newlines.

# Nozomu Katō (10 years ago)

Sorry, I accidentally posted an unfinished e-mail. I was about to add the link of a html version of my proposal to my previous post: www.akenotsuki.com/misc/srell/lookbehind_proposal.html

Nozomu

Sorry, I accidentally posted an unfinished e-mail. I was about to add
the link of a html version of my proposal to my previous post:
http://www.akenotsuki.com/misc/srell/lookbehind_proposal.html

Nozomu

# Nozomu Katō (10 years ago)

I revised my proposal for lookbehinds: www.akenotsuki.com/misc/srell/lookbehind_proposal.html

I rewrote the paragraphs about how to count Atoms in Disjunction, to clarify that the recursive call is used for nested Disjunctions.

, Nozomu

I revised my proposal for lookbehinds:
http://www.akenotsuki.com/misc/srell/lookbehind_proposal.html

I rewrote the paragraphs about how to count Atoms in Disjunction, to
clarify that the recursive call is used for nested Disjunctions.

Regards,
  Nozomu

# Sebastian Zartner (10 years ago)

Thank you for picking this up again! I asked for adding look-behinds back in 2013[1], though I didn't find the time to come up with an algorithm and read into writing this down for the specification. So let's hope this discussion will result in something this time.

Sebastian

[1] esdiscuss/2013-September/033837

Thank you for picking this up again!
I asked for adding look-behinds back in 2013[1], though I didn't find the
time to come up with an algorithm and read into writing this down for the
specification.
So let's hope this discussion will result in something this time.

Sebastian

[1] https://mail.mozilla.org/pipermail/es-discuss/2013-September/033837.html

On 21 May 2015 at 14:01, Nozomu Katō <noz.ka at akenotsuki.com> wrote:

> I revised my proposal for lookbehinds:
> http://www.akenotsuki.com/misc/srell/lookbehind_proposal.html
>
> I rewrote the paragraphs about how to count Atoms in Disjunction, to
> clarify that the recursive call is used for nested Disjunctions.
>
> Regards,
>   Nozomu
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20150603/e42ae5e4/attachment.html>

# Nozomu Katō (10 years ago)

Sebastian Zartner wrote on Wed, 3 Jun 2015, at 13:44:06 +0200:

Thank you for picking this up again! I asked for adding look-behinds back in 2013[1], though I didn't find the time to come up with an algorithm and read into writing this down for the specification. So let's hope this discussion will result in something this time.

Yes, I also really hope so. Although I read old discussions about look-behinds before posting my contribution, I am still uncertain whether additional concrete actions are required for look-behind support to go forward. Anyhow, I am looking forward to seeing new Meeting Notes.

Nozomu

Sebastian Zartner wrote on Wed, 3 Jun 2015, at 13:44:06 +0200:
> Thank you for picking this up again!
> I asked for adding look-behinds back in 2013[1], though I didn't find the
> time to come up with an algorithm and read into writing this down for the
> specification.
> So let's hope this discussion will result in something this time.

Yes, I also really hope so. Although I read old discussions about
look-behinds before posting my contribution, I am still uncertain
whether additional concrete actions are required for look-behind support
to go forward. Anyhow, I am looking forward to seeing new Meeting Notes.


Nozomu

# Allen Wirfs-Brock (10 years ago)

In order to keep this moving forward, you need to recruit a TC39 member who will work with you and act as the champion for the proposal within TC39.

In order to keep this moving forward, you need to recruit a TC39 member who will work with you and act as the champion for the proposal within TC39.

Allen

On Jun 4, 2015, at 6:16 AM, Nozomu Katō wrote:

> Sebastian Zartner wrote on Wed, 3 Jun 2015, at 13:44:06 +0200:
>> Thank you for picking this up again!
>> I asked for adding look-behinds back in 2013[1], though I didn't find the
>> time to come up with an algorithm and read into writing this down for the
>> specification.
>> So let's hope this discussion will result in something this time.
> 
> Yes, I also really hope so. Although I read old discussions about
> look-behinds before posting my contribution, I am still uncertain
> whether additional concrete actions are required for look-behind support
> to go forward. Anyhow, I am looking forward to seeing new Meeting Notes.
> 
> 
> Nozomu
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>

# Nozomu Katō (10 years ago)

Thank you for the clarification.

Unfortunately, I do not know at all who is a TC39 member. I wonder if there is a member who will work for this proposal.

Even in the case that no one is found this time, I will leave the html version of the proposal as it is at that url. If anyone later come to have an interest in that proposal please take it and modify (if needed) freely.

Nozomu

Thank you for the clarification.

Unfortunately, I do not know at all who is a TC39 member. I wonder if
there is a member who will work for this proposal.

Even in the case that no one is found this time, I will leave the html
version of the proposal as it is at that url. If anyone later come to
have an interest in that proposal please take it and modify (if needed)
freely.

Nozomu


Allen Wirfs-Brock wrote on Thu, 4 Jun 2015, at 13:23:08 -0700:
> In order to keep this moving forward, you need to recruit a TC39
> member who will work with you and act as the champion for the
> proposal within TC39.
> 
> Allen

# Sebastian Zartner (10 years ago)

So the question is, how to recruit a TC39 member? Through this list? Is it possible to become a member as a private person?

Sebastian

So the question is, how to recruit a TC39 member? Through this list? Is it
possible to become a member as a private person?

Sebastian

On 5 June 2015 at 14:58, Nozomu Katō <noz.ka at akenotsuki.com> wrote:

> Thank you for the clarification.
>
> Unfortunately, I do not know at all who is a TC39 member. I wonder if
> there is a member who will work for this proposal.
>
> Even in the case that no one is found this time, I will leave the html
> version of the proposal as it is at that url. If anyone later come to
> have an interest in that proposal please take it and modify (if needed)
> freely.
>
> Nozomu
>
>
> Allen Wirfs-Brock wrote on Thu, 4 Jun 2015, at 13:23:08 -0700:
> > In order to keep this moving forward, you need to recruit a TC39
> > member who will work with you and act as the champion for the
> > proposal within TC39.
> >
> > Allen
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20150608/f3aca116/attachment.html>

# Brendan Eich (10 years ago)

I will do it if no one else steps up by 10pm tonight PDT.

Thank you to Nozomu Katō for writing it up!

I will do it if no one else steps up by 10pm tonight PDT.

Thank you to Nozomu Katō for writing it up!

/be

Sebastian Zartner wrote:
> So the question is, how to recruit a TC39 member? Through this list? 
> Is it possible to become a member as a private person?

# Sebastian Zartner (10 years ago)

Thank you Brendan for picking this up! Nozomu, Brendan, please let me know if I can be of any help.

Sebastian

Thank you Brendan for picking this up!
Nozomu, Brendan, please let me know if I can be of any help.

Sebastian

On 8 June 2015 at 18:47, Brendan Eich <brendan at mozilla.org> wrote:
> I will do it if no one else steps up by 10pm tonight PDT.
>
> Thank you to Nozomu Katō for writing it up!
>
> /be
>
>
> Sebastian Zartner wrote:
>>
>> So the question is, how to recruit a TC39 member? Through this list? Is it
>> possible to become a member as a private person?

# Matthew Robb (10 years ago)

This thread is a cool example of why this list is still do important. +1 to all of it!

This thread is a cool example of why this list is still do important. +1 to
all of it!
On Jun 9, 2015 4:20 AM, "Sebastian Zartner" <sebastianzartner at gmail.com>
wrote:

> Thank you Brendan for picking this up!
> Nozomu, Brendan, please let me know if I can be of any help.
>
> Sebastian
>
> On 8 June 2015 at 18:47, Brendan Eich <brendan at mozilla.org> wrote:
> > I will do it if no one else steps up by 10pm tonight PDT.
> >
> > Thank you to Nozomu Katō for writing it up!
> >
> > /be
> >
> >
> > Sebastian Zartner wrote:
> >>
> >> So the question is, how to recruit a TC39 member? Through this list? Is
> it
> >> possible to become a member as a private person?
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20150609/e24f7323/attachment.html>

# Nozomu Katō (10 years ago)

On 8 June 2015 at 18:47, Brendan Eich wrote:

I will do it if no one else steps up by 10pm tonight PDT.

I thank you very much indeed for taking this proposal on. I am very relieved!

Sebastian Zartner wrote on Tue, 9 Jun 2015, at 10:20:00 +0200:

Nozomu, Brendan, please let me know if I can be of any help.

Thank you for your kindness. Any action that could make the proposal move forward is a great help to me.

This is the first time for me to see how a proposal becomes part of the ECMAScript standard. If/whenever I do not a required action please urge me to do it.

, Nozomu

> On 8 June 2015 at 18:47, Brendan Eich wrote:
>> I will do it if no one else steps up by 10pm tonight PDT.

I thank you very much indeed for taking this proposal on. I am very
relieved!

Sebastian Zartner wrote on Tue, 9 Jun 2015, at 10:20:00 +0200:
> Nozomu, Brendan, please let me know if I can be of any help.

Thank you for your kindness. Any action that could make the proposal
move forward is a great help to me.

This is the first time for me to see how a proposal becomes part of the
ECMAScript standard. If/whenever I do not a required action please urge
me to do it.

Regards,
  Nozomu

# Allen Wirfs-Brock (10 years ago)

On Jun 9, 2015, at 6:18 AM, Nozomu Katō wrote:

On 8 June 2015 at 18:47, Brendan Eich wrote:

I will do it if no one else steps up by 10pm tonight PDT.

I thank you very much indeed for taking this proposal on. I am very relieved!

Nozomu,

Also, please register as an Ecma TC39 non-member contributor by following the steps described at tc39/ecma262#contributing-new-proposals

Thanks,

On Jun 9, 2015, at 6:18 AM, Nozomu Katō wrote:

>> On 8 June 2015 at 18:47, Brendan Eich wrote:
>>> I will do it if no one else steps up by 10pm tonight PDT.
> 
> I thank you very much indeed for taking this proposal on. I am very
> relieved!

Nozomu,

Also, please register as an Ecma TC39 non-member contributor by following the steps described at https://github.com/tc39/ecma262#contributing-new-proposals 

Thanks,
Allen









> 
> Sebastian Zartner wrote on Tue, 9 Jun 2015, at 10:20:00 +0200:
>> Nozomu, Brendan, please let me know if I can be of any help.
> 
> Thank you for your kindness. Any action that could make the proposal
> move forward is a great help to me.
> 
> This is the first time for me to see how a proposal becomes part of the
> ECMAScript standard. If/whenever I do not a required action please urge
> me to do it.
> 
> Regards,
>  Nozomu
> 
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20150609/ab438a2a/attachment.html>

# Nozomu Katō (10 years ago)

Allen Wirfs-Brock wrote on Tue, 9 Jun 2015, at 07:27:36 -0700:

Nozomu,

Also, please register as an Ecma TC39 non-member contributor by following the steps described at tc39/ecma262#contributing-new-proposals

I just did. Thank you for telling me that.

Nozomu

Allen Wirfs-Brock wrote on Tue, 9 Jun 2015, at 07:27:36 -0700:
> Nozomu,
> 
> Also, please register as an Ecma TC39 non-member contributor by
> following the steps described at
> https://github.com/tc39/ecma262#contributing-new-proposals

I just did. Thank you for telling me that.

Nozomu