Mathias Bynens (2013-10-27T09:44:09.000Z)
domenic at domenicdenicola.com (2013-10-28T14:54:03.124Z)
On 26 Oct 2013, at 14:39, Bjoern Hoehrmann <derhoermi at gmx.net> wrote: > If you have a regular expression over an alphabet like "Unicode scalar > values" it is easy to turn it into an equivalent regular expression over > an alphabet like "UTF-16 code units". FWIW, [Regenerate](http://mths.be/regenerate) is a JavaScript library that can be used for this. A few examples from http://mathiasbynens.be/notes/javascript-unicode#regex: > Here’s a regular expression is created that matches any Unicode scalar value: > > >> regenerate() > .addRange(0x0, 0x10FFFF) // all Unicode code points > .removeRange(0xD800, 0xDBFF) // minus high surrogates > .removeRange(0xDC00, 0xDFFF) // minus low surrogates > .toRegExp() > /[\0-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]/ Similarly, to polyfill `.` in a Unicode-enabled ES6 regex: