Allen Wirfs-Brock (2012-02-27T21:58:47.000Z)
domenic at domenicdenicola.com (2013-08-29T19:38:35.286Z)
Yes, this interpretation is consistent with my understanding of the requirements as expressed in the ES5 spec. ES5 logically only works with UCS-2 characters corresponding to the BMP. Some (probably most) implementations pass UTF-16 encodings of supplemental characters to the JavaScript compiler. According to the spec, these are processed as two UCS-2 characters, neither of which is a member of any of the above character categories. Their use in an identifier context should result in a syntax error. Within a string literal, the two UCS-2 characters generate two string elements.

This is something that I think can be clarified for the ES6 specification, independent of the ongoing discussion of the possibility of 21-bit string elements. My preference for the future is to simply define the input alphabet of ECMAScript as all Unicode characters, independent of the actual encoding. var \ud87e\udc00 would probably still be illegal, because each \uXXXX defines a separate character, but var \u{2f800} = 42; should be fine, as should a direct, non-escaped occurrence of that character.
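
For concreteness, a short sketch of how this would play out in source text (assuming an engine that implements the proposed \u{...} identifier escapes; the comments restate the expectations above, not specified behavior):

    // U+2F800 is encoded in UTF-16 as the surrogate pair \uD87E\uDC00.
    var s = "\uD87E\uDC00";  // in a string literal the two UCS-2 code units
    s.length;                // become two string elements, so length is 2

    var \u{2F800} = 42;      // proposed: legal, the escape denotes one Unicode character
    var \uD87E\uDC00 = 42;   // expected to remain a SyntaxError: each \uXXXX defines a
                             // separate character, and neither lone surrogate is a valid
                             // identifier character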