Specify exactly how RegExp.source should be escaped
On Mar 19, 2013, at 8:05 AM, Simon Pieters wrote:
Hi
The spec says about RegExp.source:
[[ The characters / or backslash \ occurring in the pattern shall be escaped in S as necessary to ensure that the String value formed by concatenating the Strings "/", S, "/", and F can be parsed (in an appropriate lexical context) as a RegularExpressionLiteral that behaves identically to the constructed regular expression. For example, if P is "/", then S could be "\/" or "\u002F", among other possibilities, but not "/", because /// followed by F would be parsed as a SingleLineComment rather than a RegularExpressionLiteral. If P is the empty String, this specification can be met by letting S be "(?:)".
...
The source property of the newly constructed object is set to S. ]] es5.github.com/#x15.10.4.1
Why is the requirement so vague? I would like the spec to state exactly how source is to be escaped, maybe with an algorithm like:
Prior to ES5, the escaping wasn't specified at all; the spec simply said that the pattern was implementation-defined. We could probably specify it, but somebody would need to develop a proposal that completely specifies the required escaping.
- If S is the empty string, let S be "(?:)".
- Replace all instances of "/" in S with "\/".
- Replace all instances of literal new lines in S with ???
- ???
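The steps listed above can be sketched in JavaScript roughly as follows. This is only an illustration, not proposed spec text. The name escapeSource is hypothetical, and the replacements chosen for line terminators (\n, \r, \u2028, \u2029) are an assumption standing in for the open "???" items. The sketch also assumes the pattern is already syntactically valid and ignores flags. It scans the pattern instead of doing a blanket replace, because a slash that is already escaped, or that sits inside a character class, would be damaged or needlessly changed by a plain replace.

```javascript
// Assumed replacements for characters that cannot appear raw in a literal.
const LINE_TERMINATORS = new Map([
  ["\n", "\\n"],
  ["\r", "\\r"],
  ["\u2028", "\\u2028"],
  ["\u2029", "\\u2029"],
]);

// Hypothetical helper: produce a string S such that "/" + S + "/" can be
// parsed as a regular expression literal equivalent to the given pattern.
function escapeSource(pattern) {
  if (pattern === "") return "(?:)";
  let out = "";
  let inClass = false; // true while scanning inside [...]
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\" && i + 1 < pattern.length) {
      // Existing escape: copy it unchanged, unless it escapes a raw line
      // terminator, which a literal cannot contain.
      const next = pattern[++i];
      out += LINE_TERMINATORS.has(next) ? LINE_TERMINATORS.get(next) : "\\" + next;
    } else if (LINE_TERMINATORS.has(ch)) {
      out += LINE_TERMINATORS.get(ch);
    } else if (ch === "/" && !inClass) {
      out += "\\/";
    } else {
      if (ch === "[") inClass = true;
      else if (ch === "]") inClass = false;
      out += ch;
    }
  }
  return out;
}
```

With this sketch, a pattern consisting of a single slash becomes a backslash followed by a slash, while a pattern that already contains an escaped slash is returned unchanged.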
Currently, I have no idea what to check for when writing test cases for the .source property when testing e.g. empty string or a slash as P.
The key requirement in the specification is that a RegExp created from the .source property string behaves identically to the RegExp created from the original pattern. I would test this by developing a set of interesting patterns that require escaping and then testing that a RegExp created from their .source properties produces identical results to the original patterns.
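A minimal sketch of such a round-trip test in JavaScript might look like the following. The helper names (rebuildFromSource, describeMatch, findMismatches) and the sample patterns and inputs are hypothetical, chosen only for illustration. The sketch assumes an engine that provides RegExp.prototype.flags, and it evaluates the literal text "/" + source + "/" + flags, since that is the concatenation the quoted spec paragraph talks about.

```javascript
// Rebuild a regular expression by evaluating the literal text
// "/" + source + "/" + flags as program source.
function rebuildFromSource(re) {
  return new Function("return /" + re.source + "/" + re.flags + ";")();
}

// Summarize a match as a string so two results can be compared directly.
function describeMatch(re, input) {
  const m = re.exec(input);
  return m === null ? "no match" : JSON.stringify([m.index, ...m]);
}

// Return the patterns whose rebuilt regular expression cannot be parsed,
// is not a RegExp, or matches differently on any of the inputs.
function findMismatches(patterns, inputs) {
  const failures = [];
  for (const pattern of patterns) {
    const original = new RegExp(pattern);
    let rebuilt;
    try {
      rebuilt = rebuildFromSource(original);
    } catch (err) {
      failures.push(pattern);
      continue;
    }
    if (!(rebuilt instanceof RegExp)) {
      // For example, "//" parsed as a comment instead of a literal.
      failures.push(pattern);
      continue;
    }
    const same = inputs.every(
      (input) => describeMatch(original, input) === describeMatch(rebuilt, input)
    );
    if (!same) failures.push(pattern);
  }
  return failures;
}

// Illustrative patterns that need escaping, and inputs to try them on.
const samplePatterns = ["", "/", "a/b", "[/]", "\\/", "\n"];
const sampleInputs = ["", "/", "a/b", "a\nb"];
console.log(findMismatches(samplePatterns, sampleInputs));
```

An empty list of failures means every sample pattern survived the round trip. Because the check only compares behavior, it does not depend on which particular escaping an implementation chooses.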