ES Discuss - Message History

Anne van Kesteren (2013-09-12T18:25:12.000Z)

On Thu, Sep 12, 2013 at 6:42 PM, Brendan Eich <brendan at mozilla.com> wrote:
> Iterators forward and (if needed backward) over Unicode characters (scalar
> values; I'm allowed to call those "characters", no?) would be good. Github
> beats TC39 as usual, prollyfill FTW.

No, there are noncharacters that are Unicode scalar values and can
(therefore) be expressed using UTF-8, such as U+FFFF.
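
A quick way to check this, sketched under the assumption that the `TextEncoder` API from the Encoding Standard is available (it is not part of the original message): the noncharacter U+FFFF encodes to ordinary UTF-8 bytes, whereas a lone surrogate, which is not a scalar value, is replaced by U+FFFD first.

```js
// Assumes a global TextEncoder (modern browsers and recent Node.js).
var encoder = new TextEncoder()

// U+FFFF is a noncharacter but still a scalar value, so it encodes as-is.
var nonchar = Array.from(encoder.encode("\uFFFF")) // [0xEF, 0xBF, 0xBF]

// A lone surrogate is not a scalar value; the encoder substitutes U+FFFD.
var lone = Array.from(encoder.encode("\uD800"))    // [0xEF, 0xBF, 0xBD]
```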

This should do what you asked for, although it's late and it's not an
iterator as those don't really work in browsers yet, but should be
easy enough to convert:

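function toUnicode(str) {
  // Replace every lone surrogate with U+FFFD; pass valid pairs through.
  var output = ""
  for (var i = 0, l = str.length; i < l; i++) {
    var c = str.charCodeAt(i)
    if (0xD800 <= c && c <= 0xDBFF) {
      // High surrogate: valid only if a low surrogate follows.
      // charCodeAt returns NaN past the end, which fails this test.
      var nextC = str.charCodeAt(i + 1)
      if (0xDC00 <= nextC && nextC <= 0xDFFF) {
        output += str[i] + str[++i]
      } else {
        output += "\uFFFD"
      }
    } else if (0xDC00 <= c && c <= 0xDFFF) {
      // Low surrogate with no preceding high surrogate.
      output += "\uFFFD"
    } else {
      output += str[i]
    }
  }
  return output
}
toUnicode("\ud800a")       // "\uFFFDa"
toUnicode("\ud800\udc01")  // "\ud800\udc01" (unchanged)
toUnicode("\udc00a")       // "\uFFFDa"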
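
For the iterator form the message alludes to, here is a rough sketch of one possible conversion. It assumes an engine with ES2015 generators, which were not widely available when this was written. The name `scalarValues` is hypothetical, and yielding numeric scalar values rather than strings is a design choice made here for illustration.

```js
// Hypothetical helper: yields one Unicode scalar value (as a number) per
// step, substituting U+FFFD for any lone surrogate.
function* scalarValues(str) {
  for (var i = 0; i < str.length; i++) {
    var c = str.charCodeAt(i)
    if (0xD800 <= c && c <= 0xDBFF) {
      var next = str.charCodeAt(i + 1) // NaN past the end of the string
      if (0xDC00 <= next && next <= 0xDFFF) {
        i++ // consume the low surrogate as well
        yield 0x10000 + ((c - 0xD800) << 10) + (next - 0xDC00)
      } else {
        yield 0xFFFD
      }
    } else if (0xDC00 <= c && c <= 0xDFFF) {
      yield 0xFFFD
    } else {
      yield c
    }
  }
}

// "a", a valid pair (U+10001), then a lone low surrogate.
var values = Array.from(scalarValues("a\ud800\udc01\udc00"))
// values is [0x61, 0x10001, 0xFFFD]
```

Because each step yields a number, a caller can turn a value back into a string with `String.fromCodePoint` where that function is available.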


-- 
http://annevankesteren.nl/

mathias at qiwi.be (2018-03-25T13:26:15.507Z)
