Are regex character class set operations (subtraction, intersection) worth the parsing complexity?

# Steve (18 years ago)

C-style syntax is hard to parse in general, and regex literals can be particularly tricky. Many kinds of tools (syntax highlighters, minifiers, etc.) need to parse them accurately, but unfortunately most ECMAScript-based tools of this kind don't. (I could start naming high-profile tools with edge-case regex-syntax parsing bugs, but it should be obvious that parsing them is not entirely trivial.) The ES4 regex proposals make this even harder in several ways, but worst of all (from a regex syntax parser complexity perspective) is the java.util.regex-inspired, infinitely nesting character class subtraction and intersection syntax.

Now, I understand that the feature is powerful (and I assume also quite useful in regexes that make heavy use of ES4's Unicode property tokens), but it effectively makes it impossible to parse ES4 regex syntax using ES4 regexes (which lack the recursion support of PCRE and Perl, as well as .NET's balancing groups). And considering that java.util.regex is the only (major) regex library to include full character class set operations (.NET only does class subtraction), I don't think people would miss the feature all that much.
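To make the nesting problem concrete, here is a rough Python sketch (not taken from any ES4 proposal or existing tool). `FLAT_CLASS` is the kind of non-recursive pattern a tool might use to skip over a character class, and `find_class_end` is a hypothetical helper that counts bracket depth instead. Both are simplified on purpose: they handle backslash escapes but ignore flavor-specific rules such as a literal `]` directly after `[` or `[^`.

```python
import re

# Non-recursive approach: "[", then escapes or anything that is not "]", then "]".
# This stops at the FIRST unescaped "]", so it cannot see nested classes.
FLAT_CLASS = re.compile(r"\[(?:\\.|[^\]\\])*\]")


def find_class_end(pattern: str, start: int) -> int:
    """Return the index of the "]" that closes the class opened at `start`.

    Simplified sketch: tracks nesting depth and skips backslash escapes.
    """
    if pattern[start] != "[":
        raise ValueError("start must point at '['")
    depth = 0
    i = start
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unterminated character class")
```

For the Java-style class in `[a-z&&[^aeiou]]x`, `FLAT_CLASS` ends its match at the inner `]` (index 13), while `find_class_end` returns the outer one (index 14). The depth counter is exactly the piece of state a regex without recursion cannot keep for arbitrary nesting.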

Of course, mixing recursion support into existing regex syntax parsers is probably not really all that difficult in most cases, but nevertheless, I'm interested in what others think about the character class subtraction and intersection features. Personally, I think only allowing one level of character class nesting might be a reasonable compromise, especially since people could emulate more levels of nesting using lookahead anyway.
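As a sketch of the lookahead emulation, here is what it could look like in Python's `re` module, which has no set operations of its own. The Java and .NET patterns in the comments are the ones being emulated; the variable names are just for illustration.

```python
import re

# Subtraction. Java: [a-z&&[^aeiou]]   .NET: [a-z-[aeiou]]
# A negative lookahead rejects the subtracted set before the class consumes.
consonant = re.compile(r"(?![aeiou])[a-z]")

# Intersection. Java: [a-m&&[h-z]]
# A positive lookahead requires membership in the second set as well.
overlap = re.compile(r"(?=[h-z])[a-m]")

# Deeper nesting becomes additional stacked lookaheads rather than nested brackets.
# Roughly: lowercase letters, minus vowels, minus x, y and z.
stacked = re.compile(r"(?![aeiou])(?![xyz])[a-z]")

print("".join(consonant.findall("regex literal")))  # rgxltrl
print(overlap.findall("ghmn"))                      # ['h', 'm']
print("".join(stacked.findall("syntax")))           # snt
```

The emulation works because each lookahead tests the same single character that the trailing class then consumes, so it is wordier than the set syntax but needs no nesting at all.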

# Steve (18 years ago)

BTW, since I mentioned .NET 2.0's character class subtraction syntax, I'll note that it actually comes from the W3C's XML Schema regex flavor, and is also used in the XPath flavor.

BTW2, I don't actually think it would make much sense to arbitrarily remove or limit a feature for the sake of arguably reduced parsing complexity. I was mostly just interested in whether others had similar concerns.


From: "Steve" <steves_list at hotmail.com>

Sent: Saturday, December 22, 2007 10:40 PM