Why should e.g. '\u2xao' throw an error?
On Sat, Mar 17, 2012 at 11:16 AM, Mathias Bynens <mathias at qiwi.be> wrote:
Why should e.g. '\u2xao' throw an error? I can’t find this in the spec, but Test262 actually has a test for this behavior, so I must be missing something obvious.

I know UnicodeEscapeSequence is defined as follows:

    UnicodeEscapeSequence :: u HexDigit HexDigit HexDigit HexDigit

But since x is not a HexDigit, I’d expect '\u2xao' to equal 'u2xao', i.e. \u is an escape for u and the rest of the string is nothing special.

Thanks in advance, Mathias
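For concreteness, here’s how I’d check that reading (a rough sketch; the lenient behavior is engine-specific, so results may vary):

    // In a lenient engine this logs true; in a strictly conforming
    // engine the eval'd program fails to parse and this throws a SyntaxError.
    console.log(eval("'\\u2xao'") === 'u2xao');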
The spec doesn't say that "\u" is an escape for 'u'. That's just implementations trying to be lenient rather than tell the user that his program can't be parsed. I.e., it's a language extension.
In strings (not RegExps), the 'g' in "\g" is matched by the production

    NonEscapeCharacter :: SourceCharacter but not EscapeCharacter or LineTerminator

Since 'u' is an EscapeCharacter, "\u" is not a valid production using NonEscapeCharacter. The only part of the lexical grammar for strings that allows "\u" requires it to be followed by four hex digits.
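To make the rule concrete, here is a minimal sketch of the check a tokenizer would perform (isValidUnicodeEscape is a hypothetical helper, not anything from the spec):

    // src[i] is the character immediately after the backslash.
    // Per UnicodeEscapeSequence :: u HexDigit HexDigit HexDigit HexDigit,
    // the 'u' must be followed by exactly four hex digits.
    function isValidUnicodeEscape(src, i) {
      return src[i] === 'u' && /^[0-9A-Fa-f]{4}$/.test(src.slice(i + 1, i + 5));
    }

    isValidUnicodeEscape('u2xao', 0); // false ('x' is not a HexDigit)
    isValidUnicodeEscape('u00a0', 0); // true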
I.e., the string "\u2xao" can't be parsed by the lexical grammar at all, so you have your syntax error.
On the other hand, I agree that I can't point to a place in the spec where it says that you must throw a SyntaxError if you are unable to parse global code. If it's eval code, it throws in 15.1.2.1 step 2, and if it's an argument to the Function constructor, it throws in 15.3.2.1 step 8 (i.e., where a String is parsed at runtime). In all other places, the spec assumes that both lexical and syntactic parsing succeeded, and describes what to do with the result. If it doesn't parse, the input isn't even ECMAScript, so presumably it's the surrounding system that must report an error.
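Both of those runtime cases are easy to observe, since the SyntaxError is thrown from ordinary code and can be caught (a sketch; the step numbers are the ones cited above):

    try {
      eval("'\\u2xao'");                 // parsing fails: 15.1.2.1 step 2
    } catch (e) {
      console.log(e instanceof SyntaxError); // true
    }

    try {
      new Function("return '\\u2xao';"); // parsing fails: 15.3.2.1 step 8
    } catch (e) {
      console.log(e instanceof SyntaxError); // true
    }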
On Mar 17, 2012, at 4:50 AM, Lasse Reichstein wrote:
... On the other hand, I agree that I can't point to a place in the spec where it says that you must throw a SyntaxError if you are unable to parse global code. ...
See section 16. Any syntax error is an "early error". The process used to report early errors for an ECMAScript Program production is implementation-defined (see the note in section 14), except for the case you mention for eval.
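One observable consequence of "early" here: the error is reported before any part of the Program runs (a sketch, using eval so the error is catchable):

    try {
      eval("var ok = 1; '\\u2xao';"); // the whole program fails to parse
    } catch (e) {
      console.log(typeof ok); // "undefined": nothing was declared or executed
    }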