Brendan Eich (2013-09-04T16:34:09.000Z)
domenic at domenicdenicola.com (2013-09-09T02:00:42.413Z)
Anne van Kesteren <mailto:annevk at annevk.nl>
September 4, 2013 9:06 AM

> Oops. Any reason this is not just String.from() btw? Give the better
> method a nice short name?

Because of String.fromCharCode precedent. Balanced names with noun phrases that distinguish the "from" domains are better than longAndPortly vs. tiny.

> Yes, but it cannot encode lone surrogates. It can only deal in Unicode
> scalar values.

Sure, but you wanted to reduce "three concepts" and I don't see how to do that. Most developers can ignore UTF-8, for sure. Probably I just misunderstood what you meant, and you were simply pointing out that lone surrogates arise only from legacy APIs?

> Unicode scalar values are code points sans surrogates, i.e. completely
> compatible with what a utf-8 encoder/decoder pair can handle.
>
> Why do you want to expose surrogates?

I'm not sure I do! Sounds scandalous. :-P

Here, from the latest ES6 draft, is 15.5.2.3 String.fromCodePoint (...codePoints):

```
The String.fromCodePoint function may be called with a variable number of arguments which form the rest parameter codePoints. The following steps are taken:

1. Assert: codePoints is a well-formed rest parameter object.
2. Let length be the result of Get(codePoints, "length").
3. Let elements be a new List.
4. Let nextIndex be 0.
5. Repeat while nextIndex < length
   a. Let next be the result of Get(codePoints, ToString(nextIndex)).
   b. Let nextCP be ToNumber(next).
   c. ReturnIfAbrupt(nextCP).
   d. If SameValue(nextCP, ToInteger(nextCP)) is false, then throw a RangeError exception.
   e. If nextCP < 0 or nextCP > 0x10FFFF, then throw a RangeError exception.
   f. Append the elements of the UTF-16 Encoding (clause 6) of nextCP to the end of elements.
   g. Let nextIndex be nextIndex + 1.
6. Return the String value whose elements are, in order, the elements in the List elements. If length is 0, the empty string is returned.
```

No exposed surrogates here!
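For readers who find code easier than spec steps, here is a rough JavaScript sketch of the fromCodePoint steps quoted above. It is not the draft's normative text or any engine's implementation. The name `fromCodePointSketch` is hypothetical. Step 5d is approximated with `Number.isInteger`, which throws the same RangeError for non-integers and NaN. It also throws for the infinities, which step 5e would reject anyway. The UTF-16 encoding of step 5f is written out explicitly: values below 0x10000 become one code unit, and larger ones become a lead/trail surrogate pair.

```javascript
// Hypothetical sketch of the ES6 draft's String.fromCodePoint steps
// (15.5.2.3); not a built-in and not any engine's actual code.
function fromCodePointSketch(...codePoints) {
  const elements = []; // step 3: the List of UTF-16 code units
  // Steps 2, 4, 5, 5a, 5g: visit each argument in order.
  for (const next of codePoints) {
    const nextCP = Number(next); // step 5b: ToNumber (may throw, cf. 5c)
    // Step 5d (approximated): reject non-integral values, including NaN.
    if (!Number.isInteger(nextCP)) {
      throw new RangeError("Invalid code point " + String(next));
    }
    // Step 5e: reject values outside 0..0x10FFFF.
    if (nextCP < 0 || nextCP > 0x10ffff) {
      throw new RangeError("Invalid code point " + String(next));
    }
    // Step 5f: append the UTF-16 encoding of nextCP.
    if (nextCP < 0x10000) {
      elements.push(nextCP); // a single code unit
    } else {
      const offset = nextCP - 0x10000;
      elements.push(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff));
    }
  }
  // Step 6: the string made of those code units ("" when there are none).
  return String.fromCharCode(...elements);
}
```

For example, `fromCodePointSketch(0x41, 0x1F600)` yields `"A\uD83D\uDE00"`, the same as the built-in `String.fromCodePoint` in engines that ship it.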
Here's the spec for String.prototype.codePointAt:

```
When the codePointAt method is called with one argument pos, the following steps are taken:

1. Let O be CheckObjectCoercible(this value).
2. Let S be ToString(O).
3. ReturnIfAbrupt(S).
4. Let position be ToInteger(pos).
5. ReturnIfAbrupt(position).
6. Let size be the number of elements in S.
7. If position < 0 or position ≥ size, return undefined.
8. Let first be the code unit value of the element at index position in the String S.
9. If first < 0xD800 or first > 0xDBFF or position+1 = size, then return first.
10. Let second be the code unit value of the element at index position+1 in the String S.
11. If second < 0xDC00 or second > 0xDFFF, then return first.
12. Return ((first – 0xD800) × 1024) + (second – 0xDC00) + 0x10000.

NOTE The codePointAt function is intentionally generic; it does not require that its this value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
```

I take it you are objecting to step 11?

> "\udfff".codePointAt(0) == "\udfff"
>
> It seems better if that returns "\ufffd", as you'd get with utf-8
> (assuming it accepts code points as input rather than just Unicode
> scalar values, in which case it'd throw).

Maybe. Allen and Norbert should weigh in.

> The indexing of codePointAt() is also kind of sad as it just passes
> through to charCodeAt(),

I don't see that in the spec cited above.
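To make the codePointAt steps concrete, here is a rough sketch of them, plus a hypothetical variant along the lines Anne suggests. Both function names are made up for illustration. The sketch takes the string as an argument rather than a generic this value, so steps 1 to 3 are elided. The U+FFFD variant is not in the draft, and it returns the number 0xFFFD, since codePointAt returns numbers.

```javascript
// Hypothetical sketch of the draft's String.prototype.codePointAt steps.
// Assumes S is already a string (steps 1-3 elided); not a built-in.
function codePointAtSketch(S, pos) {
  // Step 4: ToInteger(pos); NaN (e.g. a missing pos) becomes 0.
  let position = Math.trunc(Number(pos));
  if (Number.isNaN(position)) position = 0;
  const size = S.length; // step 6: number of UTF-16 code units
  if (position < 0 || position >= size) return undefined; // step 7
  const first = S.charCodeAt(position); // step 8
  // Step 9: not a lead surrogate, or nothing follows it: return first.
  if (first < 0xd800 || first > 0xdbff || position + 1 === size) return first;
  const second = S.charCodeAt(position + 1); // step 10
  // Step 11: lead surrogate not followed by a trail surrogate.
  if (second < 0xdc00 || second > 0xdfff) return first;
  // Step 12: combine the surrogate pair into one code point.
  return (first - 0xd800) * 1024 + (second - 0xdc00) + 0x10000;
}

// Hypothetical variant (not in the draft): report any lone surrogate
// as U+FFFD, roughly as a UTF-8 decoder would replace bad input.
function codePointAtReplacingSketch(S, pos) {
  const cp = codePointAtSketch(S, pos);
  // Step 12 results are >= 0x10000, so anything in this range is lone.
  if (cp !== undefined && cp >= 0xd800 && cp <= 0xdfff) return 0xfffd;
  return cp;
}
```

Positions are code-unit indices in both functions. For example, `codePointAtSketch("\uD83D\uDE00", 1)` lands on the trail surrogate and returns 0xDE00, while the replacing variant returns 0xFFFD there.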