ES Discuss - Message History

Norbert Lindenberg (2013-10-27T03:12:51.000Z)


On Oct 26, 2013, at 6:58 , Jason Orendorff <jason.orendorff at gmail.com> wrote:

> On Fri, Oct 25, 2013 at 11:42 PM, Norbert Lindenberg
> <ecmascript at lindenbergsoftware.com> wrote:
>> 
>> On Oct 25, 2013, at 18:35 , Jason Orendorff <jason.orendorff at gmail.com> wrote:
>> 
>>> UTF-16 is designed so that you can search based on code units
>>> alone, without computing boundaries. RegExp searches fall in this
>>> category.
>> 
>> Not if the RegExp is case insensitive, or uses a character class, or ".", or a quantifier - these all require looking at code points rather than UTF-16 code units in order to support the full Unicode character set.

> I'd like to know what you have in mind regarding quantifiers though.

When I write /💩{2}/, I mean /💩💩/, but the current code unit based RegExp will interpret it as /💩\uDCA9/, which can't match any well-formed UTF-16 string.
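A minimal sketch of the behavior described above, assuming an engine that supports the `u` flag later standardized in ES2015 (that flag is not part of this message; it is used here only to contrast code unit and code point interpretation). The variable names are illustrative. The patterns are built with the `RegExp` constructor, which should be equivalent to writing the literals /^💩{2}$/ and /^💩{2}$/u.

```javascript
// U+1F4A9 (💩) is encoded in UTF-16 as the surrogate pair \uD83D \uDCA9.
const poo = "\u{1F4A9}";

// Code unit interpretation (no u flag): {2} binds only to the trailing
// code unit \uDCA9, so the pattern means \uD83D \uDCA9 \uDCA9.
const codeUnitRe = new RegExp("^" + poo + "{2}$");

// Code point interpretation (u flag, an assumption beyond the 2013 thread):
// {2} binds to the whole character U+1F4A9.
const codePointRe = new RegExp("^" + poo + "{2}$", "u");

const twoPoos = poo + poo;        // well-formed: 💩💩
const malformed = poo + "\uDCA9"; // ill-formed: 💩 followed by a lone low surrogate

const results = {
  codeUnitMatchesTwo: codeUnitRe.test(twoPoos),
  codeUnitMatchesMalformed: codeUnitRe.test(malformed),
  codePointMatchesTwo: codePointRe.test(twoPoos),
  codePointMatchesMalformed: codePointRe.test(malformed),
};

console.log(results);
```

Under these assumptions, the code unit pattern matches only the ill-formed string, while the code point pattern matches the intended 💩💩.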

Norbert

domenic at domenicdenicola.com (2013-10-28T14:53:24.365Z)

On Oct 26, 2013, at 6:58 , Jason Orendorff <jason.orendorff at gmail.com> wrote:

> I'd like to know what you have in mind regarding quantifiers though.

When I write /💩{2}/, I mean /💩💩/, but the current code unit based RegExp will interpret it as /💩\uDCA9/, which can't match any well-formed UTF-16 string.
