Unicode non-character being treated as space on Firefox/Chrome

# Gareth Heyes (8 years ago)

Hi all

Not sure if this is a bug or not. Non-character is being treated as a space
even though it's not defined as one. Edge and Safari treat it as an invalid
character.

```javascript
�alert�(1)�
```

In case the characters get mangled:
```javascript
eval("alert"+String.fromCharCode(65534)+"(1)");
```
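
For comparing engines, a variant of the same test that has no side effects may be easier to run. The sketch below only compiles the source with `new Function` and never calls it, so `alert` does not need to exist. `parsesWithSeparator` is a hypothetical helper name, not something from this thread.

```javascript
// Hypothetical helper: does "alert<ch>(1)" parse when <ch> is the given
// UTF-16 code unit? Compiles only; the resulting function is never invoked.
function parsesWithSeparator(codeUnit) {
  const source = "alert" + String.fromCharCode(codeUnit) + "(1)";
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything other than a parse failure is unexpected
  }
}

// An ordinary space between the identifier and "(" is expected to parse.
console.log("U+0020:", parsesWithSeparator(0x20));
// The noncharacter in question; the result depends on the engine.
console.log("U+FFFE:", parsesWithSeparator(0xfffe));
```

Note that a `true` result only shows that the engine accepted the source. Characters that are valid inside an identifier would also parse here, so this is a quick probe rather than a strict whitespace test.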

Cheers
Gareth

# Michał Wadas (8 years ago)

I believe the Unicode specification makes this undefined behaviour:

> In effect, noncharacters can be thought of as application-internal private-use code points. Unlike the private-use characters discussed in Section 16.5, Private-Use Characters, which are assigned characters and which are intended for use in open interchange, subject to interpretation by private agreement, noncharacters are permanently reserved (unassigned) and have no interpretation whatsoever outside of their possible application-internal private uses.

http://www.unicode.org/versions/Unicode6.0.0/ch16.pdf
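
For reference, the set of noncharacters is small and fixed: U+FDD0 through U+FDEF, plus the last two code points of each of the 17 planes (66 in total). A minimal sketch of a membership check follows; `isNoncharacter` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: true if the code point is a Unicode noncharacter.
// Noncharacters are U+FDD0..U+FDEF plus every code point whose low 16 bits
// are FFFE or FFFF (U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF).
function isNoncharacter(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError("not a Unicode code point: " + codePoint);
  }
  if (codePoint >= 0xfdd0 && codePoint <= 0xfdef) return true;
  // Masking with 0xFFFE maps both xxFFFE and xxFFFF to 0xFFFE.
  return (codePoint & 0xfffe) === 0xfffe;
}

console.log(isNoncharacter(0xfffe)); // true: the code point from the original report
console.log(isNoncharacter(0xfeff)); // false: U+FEFF is the byte order mark, an assigned character
```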

# Mark S. Miller (8 years ago)

What is the relevant ECMAScript standards text that would delegate to this? Even if Unicode implies an undefined case, ECMAScript should not. If ECMAScript behavior for such cases is undefined, we should define it.

# Gareth Heyes (8 years ago)

On 25 May 2017 at 14:04, Mark S. Miller <erights at google.com> wrote:

> What is the relevant ECMAScript standards text that would delegate to this? Even if Unicode implies an undefined case, ECMAScript should not. If ECMAScript behavior for such cases is undefined, we should define it.

Looking at the spec, it seems undefined: 0xFFFE isn't defined as a whitespace character. This is probably why we have different behaviour in different browsers.

# Domenic Denicola (8 years ago)

We should probably move this to a GitHub issue then, so ES can have clarity on it.

If it helps, I am pretty sure (although I should double-check) that HTML treats such noncharacters as conformance errors (i.e. external tools like validators will warn you about them), but does not let them impact the processing model; they are passed through as-is.

# Allen Wirfs-Brock (8 years ago)

Clause 10.1:

> ECMAScript code is expressed using Unicode. ECMAScript source text is a sequence of code points. All Unicode code point values from U+0000 to U+10FFFF, including surrogate code points, may occur in source text where permitted by the ECMAScript grammars.

https://tc39.github.io/ecma262/#sec-white-space defines exactly which code points are treated as WhiteSpace by the ECMAScript grammar. It does not include unassigned code points in the set of valid WhiteSpace.
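
To make that definition concrete, here is a rough sketch of the WhiteSpace production as a predicate: six code points are listed explicitly, and the rest come from Unicode general category Zs (Space_Separator). `isWhiteSpaceCodePoint` is a hypothetical name, and the `\p{Zs}` property escape assumes an engine with ES2018 regular expression support. The exact Zs membership follows whatever Unicode version the engine ships.

```javascript
// Code points named explicitly in the WhiteSpace production:
// TAB, VT, FF, SP, NBSP, ZWNBSP.
const NAMED_WHITE_SPACE = new Set([0x0009, 0x000b, 0x000c, 0x0020, 0x00a0, 0xfeff]);

// <USP>: any other code point in general category Zs (Space_Separator).
const SPACE_SEPARATOR = /^\p{Zs}$/u;

// Hypothetical helper: true if the grammar treats the code point as WhiteSpace.
function isWhiteSpaceCodePoint(codePoint) {
  return NAMED_WHITE_SPACE.has(codePoint) ||
    SPACE_SEPARATOR.test(String.fromCodePoint(codePoint));
}

console.log(isWhiteSpaceCodePoint(0xfeff)); // true: ZWNBSP is listed explicitly
console.log(isWhiteSpaceCodePoint(0xfffe)); // false: a noncharacter, not in category Zs
```

Line terminators such as U+000A are handled by a separate production, so this predicate returns false for them.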

# Mark S. Miller (8 years ago)

Allen, I'm very glad to hear that it is unambiguous after all.

Gareth, could you file bugs against the non-conforming browsers? Thanks for finding this!

# Gareth Heyes (8 years ago)

On 25 May 2017 at 17:02, Mark S. Miller <erights at google.com> wrote:

> Allen, I'm very glad to hear that it is unambiguous after all.
>
> Gareth, could you file bugs against the non-conforming browsers? Thanks for finding this!

Yeah, sure. I'll file the bugs now.
