mathias at qiwi.be (2018-03-25T13:26:15.507Z)
On Thu, Sep 12, 2013 at 6:42 PM, Brendan Eich <brendan at mozilla.com> wrote:
> Iterators forward and (if needed backward) over Unicode characters (scalar
> values; I'm allowed to call those "characters", no?) would be good. Github
> beats TC39 as usual, prollyfill FTW.
No, there are noncharacters that are Unicode scalar values and can
(therefore) be expressed using UTF-8, such as U+FFFF.
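To make that concrete, here is a quick check. It is only a sketch, and it assumes an environment that provides `TextEncoder` from the Encoding Standard, which not every engine does. The variable names are just for illustration.

```js
// U+FFFF is a noncharacter, but it is not a surrogate, so it is still a
// scalar value and gets an ordinary three-byte UTF-8 encoding.
var bytes = Array.from(new TextEncoder().encode("\uFFFF"))
// bytes is [0xEF, 0xBF, 0xBF]

// A lone surrogate is not a scalar value, so the encoder substitutes
// U+FFFD before encoding.
var lone = Array.from(new TextEncoder().encode("\uD800"))
// lone is [0xEF, 0xBF, 0xBD]
```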
This should do what you asked for, although it's late. It's not an
iterator, as those don't really work in browsers yet, but it should be
easy enough to convert:
```js
function toUnicode(str) {
  var output = ""
  for (var i = 0, l = str.length; i < l; i++) {
    var c = str.charCodeAt(i)
    if (0xD800 <= c && c <= 0xDBFF) {
      // High surrogate: only valid when a low surrogate follows.
      // charCodeAt returns NaN past the end, which fails this test too.
      var nextC = str.charCodeAt(i + 1)
      if (0xDC00 <= nextC && nextC <= 0xDFFF) {
        output += str[i] + str[++i]
      } else {
        output += "\uFFFD"
      }
    } else if (0xDC00 <= c && c <= 0xDFFF) {
      // Lone low surrogate.
      output += "\uFFFD"
    } else {
      output += str[i]
    }
  }
  return output
}
toUnicode("\ud800a")       // "\uFFFDa"
toUnicode("\ud800\udc01")  // "\ud800\udc01" (unchanged)
toUnicode("\udc00a")       // "\uFFFDa"
```
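For what the conversion to an iterator might look like once generators are available, here is a rough sketch. The name `scalarValues` is made up for this example, and it yields numeric scalar values rather than strings, which is one possible reading of the request. It only goes forward; a backward variant would need the mirror-image checks.

```js
// Hypothetical generator version; assumes an engine with function* support.
function* scalarValues(str) {
  for (var i = 0; i < str.length; i++) {
    var c = str.charCodeAt(i)
    if (0xD800 <= c && c <= 0xDBFF) {
      var next = str.charCodeAt(i + 1) // NaN at the end of the string
      if (0xDC00 <= next && next <= 0xDFFF) {
        i++
        // Combine the surrogate pair into a single scalar value.
        yield ((c - 0xD800) << 10) + (next - 0xDC00) + 0x10000
      } else {
        yield 0xFFFD // lone high surrogate
      }
    } else if (0xDC00 <= c && c <= 0xDFFF) {
      yield 0xFFFD // lone low surrogate
    } else {
      yield c
    }
  }
}

Array.from(scalarValues("\ud800a"))       // [0xFFFD, 0x61]
Array.from(scalarValues("\ud800\udc01"))  // [0x10001]
```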
-- http://annevankesteren.nl/