Erik Corry (2015-10-07T08:06:17.000Z)
d at domenic.me (2015-10-12T20:36:30.350Z)
Your proposal for look-behind relies on being able to count the match length of the look-behind in order to step back that far. This presupposes that atoms like `.` and character classes have a fixed length. However, with the `/u` flag, `.` and some character classes can match either one or two code units, so you don't know how far to step back. This needs to be fixed in a way that stays compatible with the "correct" .NET way of doing things. For example, consider matching `/a.(?<!x..)/u` against `"xa😹"` (x, a, cat-face-with-tears-of-joy, which is a surrogate pair). The look-behind has an apparent width of 3, so we step back 3 code units. But that lands on the 'a', not the 'x', so the look-behind fails to spot the 'x'. The negative look-behind therefore succeeds and the regexp wrongly matches.
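Here is a minimal JavaScript sketch of the example above. It assumes the look-behind is implemented by stepping back a fixed amount from the current position and then matching the body `x..` forwards. The helper names `stepBackCodeUnits`, `stepBackCodePoints` and `bodyMatchesAt` are hypothetical. Stepping back by code points is only one illustrative way of counting correctly, not necessarily what the proposal or .NET would do.

```javascript
// Hypothetical helpers for illustration only; not part of any proposal or engine API.

// Naive fixed-width approach: step back `n` UTF-16 code units.
function stepBackCodeUnits(str, end, n) {
  return end - n;
}

// Alternative (assumed for illustration): step back `n` code points,
// treating a leading surrogate followed by a trailing surrogate as one step.
function stepBackCodePoints(str, end, n) {
  let pos = end;
  for (let i = 0; i < n && pos > 0; i++) {
    const last = str.charCodeAt(pos - 1);
    const prev = pos >= 2 ? str.charCodeAt(pos - 2) : -1;
    const isPair =
      last >= 0xdc00 && last <= 0xdfff && prev >= 0xd800 && prev <= 0xdbff;
    pos -= isPair ? 2 : 1;
  }
  return pos;
}

// Check whether the look-behind body /x../u, matched forwards from `start`,
// ends exactly at `end` (sticky flag anchors the match at `start`).
function bodyMatchesAt(str, start, end) {
  if (start < 0) return false;
  const body = /x../uy;
  body.lastIndex = start;
  const m = body.exec(str);
  return m !== null && body.lastIndex === end;
}

const str = "xa\u{1F639}"; // x, a, cat-face-with-tears-of-joy (a surrogate pair)
const end = str.length;     // /a./u consumes "a" plus the whole pair, ending at index 4

console.log(str.length);      // prints 4 (code units)
console.log([...str].length); // prints 3 (code points)

// Naive: stepping back 3 code units lands on index 1, the 'a'.
const naiveStart = stepBackCodeUnits(str, end, 3);
console.log(naiveStart, str[naiveStart]);         // prints 1 a
console.log(bodyMatchesAt(str, naiveStart, end)); // prints false: the 'x' is missed

// Counting code points lands on index 0, the 'x'.
const cpStart = stepBackCodePoints(str, end, 3);
console.log(cpStart, str[cpStart]);               // prints 0 x
console.log(bodyMatchesAt(str, cpStart, end));    // prints true: (?<!x..) should fail
```

Counting in code points works for this example because, under `/u`, each atom of a fixed-length body such as `x..` matches exactly one code point. That code point may span one or two code units.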