BOM in script sources
# Lars T Hansen (19 years ago)
Section 7.1 of E262-3 requires all format control (class Cf)
characters to be stripped from the source before the program is
compiled. Opera has never done this, and is actually at fault here.
Mea culpa.
The ECMAScript 4 committee has since concluded that the requirement
to strip class Cf characters is a bug in the spec (people want to
have regexes and strings containing those characters literally) and
ECMAScript 4 will not contain that requirement. See
http://developer.mozilla.org/es4/proposals/update_unicode.html.
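To make the consequence of the E262-3 rule concrete, here is a minimal Python sketch of what stripping every class Cf character before compilation would do to a script. The helper name strip_format_controls is hypothetical and not taken from any engine; it only illustrates why a string or regex that contains such a character literally would silently lose it.

```python
import unicodedata


def strip_format_controls(source: str) -> str:
    """Drop every character whose Unicode general category is Cf.

    Hypothetical helper: a literal reading of the E262-3 section 7.1 rule,
    not code from Opera or any other engine.
    """
    return "".join(ch for ch in source if unicodedata.category(ch) != "Cf")


# A script whose string literal deliberately contains
# U+200D ZERO WIDTH JOINER, which is a class Cf character.
script = 'var s = "a\u200db";'
stripped = strip_format_controls(script)

print(len(script), len(stripped))
```

After stripping, the string literal no longer contains the joiner, which is the behaviour the committee considered a bug.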
I've come across an incompatibility between Opera and some other browsers:
if there is a Unicode Zero Width No-Break Space character in the script
source the script will not compile in Opera. This character is usually
known as the Unicode Byte Order Mark (BOM). If it is at the start of a
script file sent as UTF-8 it will be removed before compilation, but if it
is inside the script and not within a string it will break the script.
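The difference between a leading BOM and an embedded one can be reproduced with a short Python sketch. It assumes the script arrives as UTF-8 bytes and uses Python's "utf-8-sig" codec as a stand-in for the decoding step; it is not how Opera decodes scripts, only an illustration of the same behaviour.

```python
import codecs

BOM = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE

# Hypothetical script bytes: one BOM at the very start, one in the middle.
raw = codecs.BOM_UTF8 + b"var x = 1;" + codecs.BOM_UTF8 + b" var y = 2;"

# "utf-8-sig" removes a BOM only when it is the first thing in the stream.
decoded = raw.decode("utf-8-sig")

print(decoded.startswith(BOM))  # is the leading BOM still there?
print(decoded.count(BOM))       # how many U+FEFF characters remain?
```

The leading BOM is consumed by the decoder, while the embedded one survives into the source text that the tokenizer then has to deal with.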
According to ECMA-262 "Any other Unicode space separator <USP>" should be
treated as whitespace. But apparently that only covers the Zs class in
Unicode, which currently consists of the following code points:
0020;SPACE;Zs;0;WS;;;;;N;;;;;
00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
1680;OGHAM SPACE MARK;Zs;0;WS;;;;;N;;;;;
180E;MONGOLIAN VOWEL SEPARATOR;Zs;0;WS;;;;;N;;;;;
2000;EN QUAD;Zs;0;WS;2002;;;;N;;;;;
2001;EM QUAD;Zs;0;WS;2003;;;;N;;;;;
2002;EN SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2003;EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2004;THREE-PER-EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2005;FOUR-PER-EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2006;SIX-PER-EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
2008;PUNCTUATION SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2009;THIN SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
200A;HAIR SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
205F;MEDIUM MATHEMATICAL SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
3000;IDEOGRAPHIC SPACE;Zs;0;WS;<wide> 0020;;;;N;;;;;
FEFF has the class "Cf" which means "Other, format".
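The classification can be checked against a Unicode database, for example with Python's unicodedata module. This is a small sketch; the helper is_usp is a hypothetical name for "member of class Zs", and the result depends on the Unicode version bundled with the interpreter, so membership of Zs may not match the listing above exactly.

```python
import unicodedata


def is_usp(ch: str) -> bool:
    """Hypothetical helper: True if ch is in Unicode general category Zs."""
    return unicodedata.category(ch) == "Zs"


# A few code points from the discussion, keyed by their usual notation.
samples = {
    "U+0020": "\u0020",  # SPACE
    "U+00A0": "\u00a0",  # NO-BREAK SPACE
    "U+3000": "\u3000",  # IDEOGRAPHIC SPACE
    "U+FEFF": "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (BOM)
}

classes = {name: unicodedata.category(ch) for name, ch in samples.items()}

for name, category in classes.items():
    print(name, category, is_usp(samples[name]))
```

U+FEFF reports category Cf rather than Zs, so a tokenizer that treats only Zs as <USP> will not accept it as white space.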
Hence, Opera is compliant with the ECMA-262 spec in not considering the
U+FEFF character a "white space" character in script source. Is this
something Firefox would consider a bug and fix, or would it be better to
spec ES4 to allow the U+FEFF character inside script source?