Regex

# Mark Davis ☕ (14 years ago)

Regex has not been part of the scope of the Globalization API work. I wanted to find out whether any improvements from an internationalization point of view are being planned separately.

Some of the problems include:

  • Regexes fail on supplementary characters (above U+FFFF). Most of these are rather low frequency, but they include a large number of Chinese characters, some used in people's names or place names.
  • The Unicode support is otherwise extremely limited, especially for properties. See 98.245.80.27/tcpc/OSCON2011/gbu.html for a comparison with other programming languages. The downside is that this encourages hard-coded character lists: people "think" they know which characters occur in words, etc., but get it wrong.
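
To make the two problems above concrete, here is a rough sketch of how a regex behaves when it matches UTF-16 code units and has no Unicode-aware mode. The sample character (U+20BB7, a Han character outside the BMP) and the sample strings are my own illustrations, not taken from the comparison linked above.

```javascript
// U+20BB7 is stored as the surrogate pair D842 DFB7 (two code units).
const ch = "\uD842\uDFB7";

// "." consumes a single code unit, so one supplementary character
// does not match a pattern that expects exactly one character.
const dotMatchesWhole = /^.$/.test(ch);

// Inside a character class the pair is treated as two separate members,
// so the class also matches a lone (ill-formed) high surrogate.
const classMatchesHalf = /^[\uD842\uDFB7]$/.test("\uD842");

// \w is a hard-coded ASCII list ([A-Za-z0-9_]), so accented letters
// split words. The input here is "naïve café" written with escapes.
const words = "na\u00EFve caf\u00E9".match(/\w+/g);

console.log(ch.length);        // 2
console.log(dotMatchesWhole);  // false
console.log(classMatchesHalf); // true
console.log(words);            // [ 'na', 've', 'caf' ]
```

The last case is the hard-coded-list trap in miniature: the pattern looks like it means "a word", but it only means "a run of ASCII letters, digits, and underscores".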