ES Discuss - Message History

Claude Pache (2015-10-07T10:08:47.000Z)


> On 7 Oct 2015, at 11:16, Erik Corry <erik.corry at gmail.com> wrote:
> 
> The proposal needs to be clarified to explain that you are stepping back a number of code points, not units.  This implies that you are inspecting the input string as you step backwards.  Also it should be explained what to do if there are unpaired surrogates in the input string and inside the lookbehind expression source.

Looking at the proposal [1], there is a Note section (recently added) clarifying that point if needed. 

The way of counting, and the meaning of the words "character", "code point" and "code unit", are the same as in ES2015; there is really nothing new here. See [2] for details. If anything needs to be clarified e.g. regarding unpaired surrogates, it is not specific to lookbehind, but applies to the whole regexp semantics.
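As a rough illustration of the ES2015 counting rules referred to above, the following sketch compares code units with code points. The sample string and variable names are assumptions chosen for illustration; they do not come from the proposal.

```javascript
// Illustrative sketch only; sample string and names are hypothetical.
const sample = "xa\u{1F639}"; // x, a, U+1F639 (stored as a surrogate pair)

const codeUnitCount = sample.length;              // counts UTF-16 code units
const codePointCount = Array.from(sample).length; // the string iterator walks code points

// `.` consumes one code unit without /u, and one code point with /u.
const dotMatchesWithoutU = sample.match(/./g).length;
const dotMatchesWithU = sample.match(/./gu).length;

// Under /u, an unpaired surrogate is treated as a single code point.
const loneSurrogateIsOneChar = /^.$/u.test("\uD83D");

console.log(codeUnitCount, codePointCount, dotMatchesWithoutU, dotMatchesWithU, loneSurrogateIsOneChar);
// prints: 4 3 4 3 true
```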

—Claude

[1] http://www.akenotsuki.com/misc/srell/lookbehind_proposal.html
[2] http://www.ecma-international.org/ecma-262/6.0/#sec-pattern-semantics


> 
> I think the proposal would benefit from a pointer to an implementation or two.  Of course the implementations should also fully support /u.
> 
> On Wed, Oct 7, 2015 at 11:10 AM, Claude Pache <claude.pache at gmail.com> wrote:
> This should not be a problem: with the /u flag, you work with code points, not code units. In particular, `.` always matches a sequence of length 1 (one code point with /u, one code unit otherwise).
> 
> —Claude
> 
> 
> 
>> On 7 Oct 2015, at 10:08, Erik Corry <erik.corry at gmail.com> wrote:
>> 
>> Oops forgot the /u on the regexp in the example.
>> 
>> On Wed, Oct 7, 2015 at 10:06 AM, Erik Corry <erik.corry at gmail.com> wrote:
>> Your proposal for look-behind relies on being able to count the match length of the look-behind in order to step back that far.  This presupposes that atoms like . and character classes have a fixed length.
>> 
>> However, with the /u flag, the . and some character classes can match either one or two code units.  This means you don't know how far to step back.  This needs to be fixed in a way that is not incompatible with the "correct" .NET way of doing things.
>> 
>> E.g. matching /a.(?<!x..)/ against "xa😹"  (x, a, cat-face-with-tears-of-joy, which is a surrogate pair).  The lookbehind has an apparent width of 3, so we step back 3 code units, but that hits the 'a', not the 'x', and so the lookbehind fails to spot the 'x'.
>> 
>> 
>> On Sun, Oct 4, 2015 at 1:52 PM, Nozomu Katō <noz.ka at akenotsuki.com> wrote:
>> Apparently my proposal for adding the look-behind assertions to RegExp
>> has been in trouble. I would like to ask anyone for help.
>> 
>> The following story is what I know about the proposal after my previous
>> post:
>> 
>> I created a pull request for the proposal in July and sent an email to
>> Brendan Eich asking if I can put his name as a champion:
>> https://github.com/tc39/ecma262/pull/48
>> 
>> I have not received a reply to my email, but in September I received a
>> notification email, sent in reply to the pull request, saying that the
>> proposal had been moved to stage 0. Today, however, I noticed that the
>> proposal had been dropped from stage 0, with the note "RegExp lookbehind has no champion".
>> https://github.com/tc39/ecma262/commits/master/stage0.md (Oct 4, 2015)
>> 
>> I am uncertain about what happened. Does this mean that Brendan Eich is
>> no longer the champion, or that he never took on the role in the first
>> place, or ...?
>> 
>> 
>> Regards,
>>   Nozomu
>> _______________________________________________
>> es-discuss mailing list
>> es-discuss at mozilla.org <mailto:es-discuss at mozilla.org>
>> https://mail.mozilla.org/listinfo/es-discuss <https://mail.mozilla.org/listinfo/es-discuss>
>> 
>> 
> 
> 


d at domenic.me (2015-10-12T20:37:23.477Z)

> On 7 Oct 2015, at 11:16, Erik Corry <erik.corry at gmail.com> wrote:
> 
> The proposal needs to be clarified to explain that you are stepping back a number of code points, not units.  This implies that you are inspecting the input string as you step backwards.  Also it should be explained what to do if there are unpaired surrogates in the input string and inside the lookbehind expression source.

Looking at [the proposal][1], there is a Note section (recently added) clarifying that point if needed. 

The way of counting, and the meaning of the words "character", "code point" and "code unit", are the same as in ES2015; there is really nothing new here. See [2][] for details. If anything needs to be clarified e.g. regarding unpaired surrogates, it is not specific to lookbehind, but applies to the whole regexp semantics.
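To make the concern from earlier in the thread concrete, here is a minimal sketch that assumes an engine with lookbehind support (lookbehind assertions were standardized later, in ES2018). It is not the algorithm from the proposal; the pattern and variable names are hypothetical and only show how the /u flag changes what `..` covers inside a lookbehind.

```javascript
// Illustrative sketch; requires an engine that supports lookbehind (ES2018 or later).
const input = "xa\u{1F639}"; // x, a, U+1F639 (a surrogate pair, so 4 code units in total)

// With /u, `..` covers two code points (a and U+1F639), so the 'x' is found.
const foundWithU = /(?<=x..)$/u.test(input);

// Without /u, `..` covers only the two surrogate halves, so the character
// before them is 'a', not 'x', and the lookbehind does not match.
const foundWithoutU = /(?<=x..)$/.test(input);

console.log(foundWithU, foundWithoutU);
// prints: true false
```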

[1]: http://www.akenotsuki.com/misc/srell/lookbehind_proposal.html
[2]: http://www.ecma-international.org/ecma-262/6.0/#sec-pattern-semantics
