domenic at domenicdenicola.com (2013-10-28T14:53:24.365Z)
On Oct 26, 2013, at 6:58, Jason Orendorff <jason.orendorff at gmail.com> wrote:
> On Fri, Oct 25, 2013 at 11:42 PM, Norbert Lindenberg <ecmascript at lindenbergsoftware.com> wrote:
>> On Oct 25, 2013, at 18:35, Jason Orendorff <jason.orendorff at gmail.com> wrote:
>>> UTF-16 is designed so that you can search based on code units alone, without computing boundaries. RegExp searches fall in this category.
>>
>> Not if the RegExp is case insensitive, or uses a character class, or ".", or a quantifier - these all require looking at code points rather than UTF-16 code units in order to support the full Unicode character set.
>
> I'd like to know what you have in mind regarding quantifiers though.

When I write /💩{2}/, I mean /💩💩/, but the current code unit based RegExp will interpret it as /💩\uDCA9/, which can't match any well-formed UTF-16 string.

Norbert
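
The quantifier case can be checked directly. The following is a minimal sketch, assuming an engine that implements the ES2015 `u` flag (only a proposal when this thread was written). The variable names are illustrative, and the character is spelled with escapes so the two code units are visible.

```javascript
// U+1F4A9 (💩) is encoded in UTF-16 as the surrogate pair \uD83D \uDCA9.
const pile = "\uD83D\uDCA9";
const twoPiles = pile + pile;             // well-formed: 💩💩
const illFormed = "\uD83D\uDCA9\uDCA9";   // 💩 followed by a lone trail surrogate

// Without the u flag the pattern is read per code unit, so {2} binds only
// to the trailing \uDCA9: the pattern means \uD83D then \uDCA9 twice.
const codeUnitRe = new RegExp(pile + "{2}");

// With the u flag (assumed available) the pattern is read per code point,
// so {2} applies to the whole character.
const codePointRe = new RegExp(pile + "{2}", "u");

const results = {
  codeUnitMatchesTwoPiles: codeUnitRe.test(twoPiles),    // false
  codeUnitMatchesIllFormed: codeUnitRe.test(illFormed),  // true
  codePointMatchesTwoPiles: codePointRe.test(twoPiles),  // true
  codePointMatchesIllFormed: codePointRe.test(illFormed) // false
};

console.log(results);
```

The code unit based pattern only matches the ill-formed string, which is the behavior described above; the code point based pattern matches the string the author of the pattern actually meant.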