invalid escape sequences

# Mike Samuel (14 years ago)

During the last meeting, the semantics of "\z" came up. Specifically, what does \ followed by a character not in the set with a specified escape expand to?

From 7.8.4 StringLiteral

"
EscapeSequence :: CharacterEscapeSequence
"

leads to

"
CharacterEscapeSequence :: ...
    NonEscapeCharacter

NonEscapeCharacter :: SourceCharacter but not one of EscapeCharacter or LineTerminator
"

and the semantics of NonEscapeCharacter is given thus

"
The CV of CharacterEscapeSequence :: NonEscapeCharacter is the CV of the NonEscapeCharacter.
"
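Read literally, that rule means an escape with no other definition just drops its backslash. A minimal check, assuming an engine that follows 7.8.4 for NonEscapeCharacter (the variable names are only for illustration):

```javascript
// "\z" is a CharacterEscapeSequence whose NonEscapeCharacter is "z",
// so its CV is "z" and the backslash contributes nothing.
var viaEscape = "\z";
var plain = "z";

console.log(viaEscape === plain, viaEscape.length); // true 1
```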

so are the following assertions true?

(1)

The only SourceCharacter sequences that do not match ( DoubleStringCharacter | SingleStringCharacter ) applied one or more times are a LineTerminator not preceded by an odd number of backslashes, "u" not followed by 4 valid hex digits and not preceded by an even number of backslashes, "x" not followed by 2 valid hex digits and not preceded by an even number of backslashes, or a decimal digit not preceded by an even number of backslashes. I.e., the regular expression

    /(?:^|[^\\])(?:\\\\)*([\r\n\u2028\u2029]|\\u(?![0-9A-Fa-f]{4})|\\x(?![0-9A-Fa-f]{2})|\\[0-9])/

matches exactly those sequences of SourceCharacters that fail to match zero or more ( DoubleStringCharacter | SingleStringCharacter ).
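As a rough sketch of assertion (1), the check can be wrapped in a function and tried on a few raw literal bodies. The names `INVALID_STRICT` and `hasInvalidSequence` are hypothetical, and the pattern assumes every backslash in the expression is meant as a literal backslash in the source text being tested.

```javascript
// Hypothetical helper: true when `body` (the raw text between the quotes)
// contains one of the sequences listed in assertion (1).
var INVALID_STRICT = /(?:^|[^\\])(?:\\\\)*(?:[\r\n\u2028\u2029]|\\u(?![0-9A-Fa-f]{4})|\\x(?![0-9A-Fa-f]{2})|\\[0-9])/;

function hasInvalidSequence(body) {
  return INVALID_STRICT.test(body);
}

console.log(hasInvalidSequence("a\\u"));          // true: \u without 4 hex digits
console.log(hasInvalidSequence("a\\u0041"));      // false: complete \u escape
console.log(hasInvalidSequence("a\\\\u"));        // false: escaped backslash, then a plain u
console.log(hasInvalidSequence("line\nbreak"));   // true: raw line terminator
console.log(hasInvalidSequence("line\\\nbreak")); // false: backslash before the line terminator
```

The leading `(?:^|[^\\])(?:\\\\)*` consumes an even number of backslashes, so whatever follows starts a fresh escape rather than sitting in the middle of one.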

(2)

The B.1.2 additional octal syntax, quoted below, does change the validity of the test above.

"
OctalEscapeSequence ::
    OctalDigit [lookahead not in DecimalDigit]
    ZeroToThree OctalDigit [lookahead not in DecimalDigit]
    FourToSeven OctalDigit
    ZeroToThree OctalDigit OctalDigit
"

NonEscapeCharacter excludes DecimalDigit through EscapeCharacter, but OctalEscapeSequence allows [0-7]. So under B.1.2, the regular expression

    /(?:^|[^\\])(?:\\\\)*([\r\n\u2028\u2029]|\\u(?![0-9A-Fa-f]{4})|\\x(?![0-9A-Fa-f]{2})|\\[89]|\\[0-3][0-7]?(?![89])|\\[4-7])/

matches exactly those sequences of SourceCharacters that fail to match zero or more ( DoubleStringCharacter | SingleStringCharacter ).
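One way to sketch the B.1.2 variant is to build the pattern from one alternative per rule. This is my reading of the quoted grammar, not a transcription of the expression in the message: it treats an octal escape as invalid only when the grammar's lookahead restriction is violated, that is, when an 8 or 9 follows a one- or two-digit escape that cannot absorb it. The names `alternatives`, `INVALID_ANNEX_B`, and `isInvalidUnderAnnexB` are hypothetical.

```javascript
// Each entry describes one kind of invalid sequence (assumed reading of B.1.2).
var alternatives = [
  /[\r\n\u2028\u2029]/,      // raw line terminator
  /\\u(?![0-9A-Fa-f]{4})/,   // \u without 4 hex digits
  /\\x(?![0-9A-Fa-f]{2})/,   // \x without 2 hex digits
  /\\[89]/,                  // 8 and 9 are not octal digits
  /\\[0-7](?=[89])/,         // single OctalDigit followed by 8 or 9
  /\\[0-3][0-7](?=[89])/     // ZeroToThree OctalDigit followed by 8 or 9
];

// Prefix: start of input or a non-backslash, then an even run of backslashes.
var INVALID_ANNEX_B = new RegExp(
  "(?:^|[^\\\\])(?:\\\\\\\\)*(?:" +
  alternatives.map(function (re) { return re.source; }).join("|") +
  ")"
);

function isInvalidUnderAnnexB(body) {
  return INVALID_ANNEX_B.test(body);
}

var bodies = ["\r", "\\u", "\\x", "\\8", "\\28", "\\228", "\\3778", "\\478", "\\778"];
var stillValid = bodies.filter(function (body) {
  return !isInvalidUnderAnnexB(body);
});
console.log(JSON.stringify(stillValid)); // ["\\3778","\\478","\\778"]
```

Under this reading only the bodies \3778, \478, and \778 pass, because a three-digit escape or a FourToSeven OctalDigit escape carries no lookahead restriction.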

I did some empirical testing to see what is actually allowed by running the code below in the squarefree shell in a variety of browsers.

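    // "\r" is a raw carriage return; every other entry is a backslash
    // sequence that reaches eval unchanged.
    var notStringLiterals = [
      "\r", "\\u", "\\x", "\\8", "\\28", "\\228", "\\3778", "\\478", "\\778"
    ];
    for (var i = 0; i < notStringLiterals.length; ++i) {
      var result;
      try {
        // Wrap the candidate in double quotes and parse it as a string literal.
        result = eval('"' + notStringLiterals[i] + '"');
      } catch (ex) {
        result = "ERROR";
      }
      print(JSON.stringify(notStringLiterals[i]) + " : " + JSON.stringify(result));
    }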
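For environments without the shell's print(), the same probe can be sketched as a function. This variant is an assumption on my part rather than what was run for the results in this thread: `probe` is a hypothetical name, console.log stands in for print(), and an indirect eval is used so the literal is parsed as non-strict global code even if the caller is strict.

```javascript
// Hypothetical helper: parse `body` as the contents of a double-quoted
// string literal and return its value, or "ERROR" if parsing fails.
function probe(body) {
  try {
    // (0, eval) is an indirect eval: it runs in the global scope and does
    // not inherit strictness from this function.
    return (0, eval)('"' + body + '"');
  } catch (ex) {
    return "ERROR";
  }
}

var candidates = ["\r", "\\u", "\\x", "\\8", "\\28", "\\228", "\\3778", "\\478", "\\778"];
candidates.forEach(function (body) {
  console.log(JSON.stringify(body) + " : " + JSON.stringify(probe(body)));
});
```

What each engine prints for the \u, \x, and \8 cases is exactly what the probe is meant to find out, so no expected output is given here.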

All are invalid absent B.1.2 if the assertions above are true. With B.1.2, "\3778", "\478", and "\778" are valid.

I'm having trouble running IE today, but on other browsers, in alphabetical order:

Chrome

    "\r" : "ERROR"
    "\\u" : "u"
    "\\x" : "x"
    "\\8" : "8"
    "\\28" : "\u00028"
    "\\228" : "\u00128"
    "\\3778" : "ÿ8"
    "\\478" : "'8"
    "\\778" : "?8"

FF3

    "\u000d" : "ERROR"
    "\\u" : "u"
    "\\x" : "x"
    "\\8" : "8"
    "\\28" : "\u00028"
    "\\228" : "\u00128"
    "\\3778" : "ÿ8"
    "\\478" : "'8"
    "\\778" : "?8"

Safari

    "\r" : "ERROR"
    "\\u" : "u"
    "\\x" : "x"
    "\\8" : "8"
    "\\28" : "\u00028"
    "\\228" : "\u00128"
    "\\3778" : "ÿ8"
    "\\478" : "'8"
    "\\778" : "?8"

So at least 3 different interpreter strains treat "\u" === "u", "\x" === "x", "\8" === "8", and don't care whether there is a decimal digit after an octal escape sequence. All reject unescaped newlines in string literals.

I would like to be able to specify quasiliteral literal-part decoding in terms of the SV defined in 7.8.4. If user code is going to have decoded literal parts available when they validly decode, but at least have access to the raw literal parts otherwise, then it would be good for both to be consistently available across interpreters. Would it be worthwhile having the SV and CV in 7.8.4 specify the decoding of some SourceCharacter sequences that can't actually reach the SV or CV via the StringLiteral production?
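As a rough illustration of the decoded-versus-raw split described here, tagged template literals in engines that implement ES2018 or later expose both forms to the tag function, and leave the decoded form undefined when a part contains an invalid escape. The tag name `literalParts` is hypothetical, and this shows current engine behavior, not the proposal as it stood in this thread.

```javascript
// Hypothetical tag function: returns the decoded ("cooked") and raw forms
// of the first literal part it receives.
function literalParts(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

var ok = literalParts`\x41`;  // valid escape: both forms are available
var bad = literalParts`\u`;   // invalid escape: only the raw form is available

console.log(ok.cooked, JSON.stringify(ok.raw));   // A "\\x41"
console.log(bad.cooked, JSON.stringify(bad.raw)); // undefined "\\u"
```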

# Mike Samuel (14 years ago)

2011/5/31 Mike Samuel <mikesamuel at gmail.com>:

I'm having trouble running IE today, but on other browsers, in alphabetical order:

IE 7 loves me but apparently hates \u.

    "\r" : "ERROR"
    "\\u" : "ERROR"
    "\\x" : "ERROR"
    "\\8" : "8"
    "\\28" : "\u00028"
    "\\228" : "\u00128"
    "\\3778" : "ÿ8"
    "\\478" : "'8"
    "\\778" : "?8"

# Dave Fugate (14 years ago)

Results for IE9 ("IE9 standards" mode) given the snippet below:

    "\r" : "ERROR"
    "\\u" : "ERROR"
    "\\x" : "ERROR"
    "\\8" : "8"
    "\\28" : "\u00028"
    "\\228" : "\u00128"
    "\\3778" : "ÿ8"
    "\\478" : "'8"
    "\\778" : "?8"

My best,