`String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

# Mathias Bynens (12 years ago)

ES6 fixes `String.fromCharCode` by introducing `String.fromCodePoint`.

Similarly, `String.prototype.charCodeAt` is fixed by `String.prototype.codePointAt`.

Should there be a method that is like `String.prototype.charAt` except it deals with astral Unicode symbols wherever possible?

    >> '𝌆'.charAt(0) // U+1D306
    '\uD834' // the first surrogate half for U+1D306

    >> '𝌆'.symbolAt(0) // U+1D306
    '𝌆' // U+1D306

Has this been discussed before? If there’s any interest I’d be happy to create a strawman.

Mathias  
http://mathiasbynens.be/

# Rick Waldron (12 years ago)

I think the idea is good, but the name may be confusing with regard to Symbols (maybe not?)

# Mathias Bynens (12 years ago)

Yeah, I thought about that, but couldn’t figure out a better name. “Glyph” or “Grapheme” wouldn’t be accurate. Any suggestions?

Anyway, if everyone agrees this is a good idea I’ll get started on fleshing out a proposal. We can then use this thread to bikeshed about the name.

# Benjamin (Inglor) Gruenbaum (12 years ago)

I also noticed the naming similarity to ES6 `Symbol`s.

I've seen people polyfill `String.prototype.getFullChar` and similar things like `String.prototype.fromFullCharCode` for dealing with surrogates. I like `String.prototype.signAt`, but I haven't seen it used before.

I'm eager to hear what Allen has to say about this given his work on Unicode in ECMAScript, especially how it fits with this strawman:
http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings&rev=1304034700

I also think that this is important enough to be there.

# Rick Waldron (12 years ago)

On Fri, Oct 18, 2013 at 10:47 AM, Mathias Bynens <mathias at qiwi.be> wrote:

> Anyway, if everyone agrees this is a good idea I’ll get started on fleshing out a proposal. We can then use this thread to bikeshed about the name.

I think it's worthwhile to write up a proposal.

And the shed should always be pink ;)

# Mathias Bynens (12 years ago)

Here’s my proposal. Feedback welcome, as well as suggestions for a better name (if any).

## String.prototype.symbolAt(pos)

NOTE: Returns a single-element String containing the code point at element position `pos` in the String value resulting from converting the `this` object to a String. If there is no element at that position, the result is the empty String. The result is a String value, not a String object.

When the `symbolAt` method is called with one argument `pos`, the following steps are taken:

01. Let `O` be `CheckObjectCoercible(this value)`.
02. Let `S` be `ToString(O)`.
03. `ReturnIfAbrupt(S)`.
04. Let `position` be `ToInteger(pos)`.
05. `ReturnIfAbrupt(position)`.
06. Let `size` be the number of elements in `S`.
07. If `position < 0` or `position ≥ size`, return the empty String.
08. Let `first` be the code unit at index `position` in the String `S`.
09. Let `cuFirst` be the code unit value of the element at index `0` in the String `first`.
10. If `cuFirst < 0xD800` or `cuFirst > 0xDBFF` or `position + 1 = size`, then return `first`.
11. Let `cuSecond` be the code unit value of the element at index `position + 1` in the String `S`.
12. If `cuSecond < 0xDC00` or `cuSecond > 0xDFFF`, then return `first`.
13. Let `second` be the code unit at index `position + 1` in the string `S`.
14. Let `cp` be `(cuFirst – 0xD800) × 0x400 + (cuSecond – 0xDC00) + 0x10000`.
15. Return the elements of the UTF-16 Encoding (clause 6) of `cp`.

NOTE: The `symbolAt` function is intentionally generic; it does not require that its `this` value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
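
The steps above can be sketched in plain JavaScript. This is only an illustration: it is written as a standalone function rather than installed on `String.prototype`, `String(value)` and `Math.trunc` merely approximate the spec's `ToString` and `ToInteger`, and the arithmetic uses the code unit values (`cuFirst`, `cuSecond`) rather than the code unit strings.

```javascript
// Illustrative sketch of the proposed algorithm; not a spec-exact polyfill.
function symbolAt(value, pos) {
  if (value === null || value === undefined) {
    throw new TypeError('symbolAt called on null or undefined'); // step 1
  }
  const S = String(value);                          // steps 2-3 (approximation)
  let position = Math.trunc(Number(pos));           // steps 4-5 (approximation)
  if (Number.isNaN(position)) position = 0;         // ToInteger maps NaN to 0
  const size = S.length;                            // step 6
  if (position < 0 || position >= size) return '';  // step 7
  const first = S.charAt(position);                 // step 8
  const cuFirst = first.charCodeAt(0);              // step 9
  if (cuFirst < 0xD800 || cuFirst > 0xDBFF || position + 1 === size) {
    return first;                                   // step 10: not a lead surrogate
  }
  const cuSecond = S.charCodeAt(position + 1);      // step 11
  if (cuSecond < 0xDC00 || cuSecond > 0xDFFF) {
    return first;                                   // step 12: unpaired lead surrogate
  }
  // Step 14, computed from the code unit values.
  const cp = (cuFirst - 0xD800) * 0x400 + (cuSecond - 0xDC00) + 0x10000;
  return String.fromCodePoint(cp);                  // step 15
}

symbolAt('\uD834\uDF06', 0); // '\uD834\uDF06', i.e. U+1D306 as a whole
symbolAt('\uD834\uDF06', 1); // '\uDF06', the lone trail surrogate
symbolAt('abc', 5);          // ''
```

Note that `pos` is still a code unit index here, exactly as in the proposal text.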

# Rick Waldron (12 years ago)

On Fri, Oct 18, 2013 at 11:15 AM, Mathias Bynens <mathias at qiwi.be> wrote:

> Here’s my proposal. Feedback welcome, as well as suggestions for a better name (if any).
>
> `String.prototype.symbolAt(pos)`

Here goes...

String.prototype.elementAt?

# Domenic Denicola (12 years ago)

Doesn't Unicode have some name for "visual representation of a code point"? Maybe it's "symbol"?

# Anne van Kesteren (12 years ago)

On Fri, Oct 18, 2013 at 1:46 PM, Mathias Bynens <mathias at qiwi.be> wrote:

> Similarly, `String.prototype.charCodeAt` is fixed by `String.prototype.codePointAt`.

When you phrase it like that, I see another problem with codePointAt(). You can't just replace existing usage of charCodeAt() with codePointAt() as that would fail for input with paired surrogates. E.g. a simple loop over a string that prints code points would print both the code point and the trail surrogate code point for a surrogate pair.

The same goes for this new method. I still think that only offering a better way to iterate strings (as planned) seems like a much safer start into this brave new code point-based world.
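
The pitfall described here can be reproduced with a short sketch. The helper names `naiveCodePoints` and `codePoints` are made up for illustration; `codePointAt` is used as it was eventually specified in ES2015.

```javascript
// A charCodeAt loop where only the method name was swapped for codePointAt.
function naiveCodePoints(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    out.push(str.codePointAt(i)); // i is still a code unit index
  }
  return out;
}

// Same loop, but skipping the trail surrogate after reading a full pair.
function codePoints(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    const cp = str.codePointAt(i);
    out.push(cp);
    if (cp > 0xFFFF) i++; // the pair occupied two code units
  }
  return out;
}

const sample = 'a\uD834\uDF06'; // 'a' followed by U+1D306
naiveCodePoints(sample); // [0x61, 0x1D306, 0xDF06]: the trail surrogate is reported again
codePoints(sample);      // [0x61, 0x1D306]
```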

# Mathias Bynens (12 years ago)

On 18 Oct 2013, at 10:39, Domenic Denicola <domenic at domenicdenicola.com> wrote:

> Doesn't Unicode have some name for "visual representation of a code point"? Maybe it's "symbol"?

Not that I know of. I guess “Character” (http://www.unicode.org/glossary/#character) comes close, but we can’t really use that because `String.prototype.charAt` already exists. FWIW, I always use the term “symbol” to refer to a string that represents a single code point.

IMHO it’s not _really_ confusing to name this new method `symbolAt` because it’s defined on `String.prototype`, which indicates that it acts on strings and has nothing to do with ES6 Symbols. That said, I welcome better suggestions :)

# Mathias Bynens (12 years ago)

On 18 Oct 2013, at 10:25, Rick Waldron <waldron.rick at gmail.com> wrote:

> String.prototype.elementAt?

This may be confusing too, since the spec refers to `elements` as code units, not code points.

# Mathias Bynens (12 years ago)

On 18 Oct 2013, at 10:48, Anne van Kesteren <annevk at annevk.nl> wrote:

> On Fri, Oct 18, 2013 at 1:46 PM, Mathias Bynens <mathias at qiwi.be> wrote:
>> Similarly, `String.prototype.charCodeAt` is fixed by `String.prototype.codePointAt`.
> 
> When you phrase it like that, I see another problem with
> codePointAt(). You can't just replace existing usage of charCodeAt()
> with codePointAt() as that would fail for input with paired
> surrogates. E.g. a simple loop over a string that prints code points
> would print both the code point and the trail surrogate code point for
> a surrogate pair.

I disagree. In those situations you should just iterate over the string using `for…of`.

`.symbolAt()` can be a useful replacement for `.charAt()` in case you only need to get the first symbol in the string. The same goes for `.codePointAt()` vs. `.charCodeAt()`.
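
For the "first symbol only" case, a rough equivalent can be put together from `codePointAt` and `String.fromCodePoint`. This is a sketch, assuming an engine that ships those two ES2015 methods; `firstSymbol` is a hypothetical helper name, not part of any proposal.

```javascript
// Returns the first whole symbol (code point) of a string, or '' if it is empty.
function firstSymbol(str) {
  const cp = str.codePointAt(0);
  return cp === undefined ? '' : String.fromCodePoint(cp);
}

const text = '\uD834\uDF06abc'; // U+1D306 followed by 'abc'
text.charAt(0);    // '\uD834', only the lead surrogate
firstSymbol(text); // '\uD834\uDF06', the whole symbol
```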

# Rick Waldron (12 years ago)

On Fri, Oct 18, 2013 at 11:53 AM, Mathias Bynens <mathias at qiwi.be> wrote:

> On 18 Oct 2013, at 10:25, Rick Waldron <waldron.rick at gmail.com> wrote:
>
> > String.prototype.elementAt?
>
> This may be confusing too, since the spec refers to `elements` as code units, not code points.

Yes, slight misreading of your proposal. Thanks for clarifying.

# Anne van Kesteren (12 years ago)

On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens <mathias at qiwi.be> wrote:
> On 18 Oct 2013, at 10:48, Anne van Kesteren <annevk at annevk.nl> wrote:
>> When you phrase it like that, I see another problem with
>> codePointAt(). You can't just replace existing usage of charCodeAt()
>> with codePointAt() as that would fail for input with paired
>> surrogates. E.g. a simple loop over a string that prints code points
>> would print both the code point and the trail surrogate code point for
>> a surrogate pair.
>
> I disagree. In those situations you should just iterate over the string using `for…of`.

That seems to iterate over code units as far as I can tell.

  for (var x of "💩")
    print(x.charCodeAt(0))

invokes print() twice in Gecko.


-- 
http://annevankesteren.nl/

# André Bargull (12 years ago)

SpiderMonkey does not implement the (yet to be) spec'ed String.prototype.@@iterator function, instead it simply aliases String.prototype["@@iterator"] to Array.prototype["@@iterator"]:

    js> String.prototype["@@iterator"] === Array.prototype["@@iterator"]
    true

# Allen Wirfs-Brock (12 years ago)

On Oct 18, 2013, at 7:21 AM, Rick Waldron wrote:

> I think the idea is good, but the name may be confusing with regard to Symbols (maybe not?)

Given that we have `charAt`, `charCodeAt` and `codePointAt`, I think the most appropriate name for such a method would be `at`:

    '𝌆'.at(0)

The issue when this sort of method has been discussed in the past has been what to do when you index at a trailing surrogate position:

    '𝌆'.at(1)

do you still get '𝌆' or do you get the equivalent of `String.fromCharCode('𝌆'[1])`?

# Allen Wirfs-Brock (12 years ago)

On Oct 18, 2013, at 9:05 AM, Anne van Kesteren wrote:

> That seems to iterate over code units as far as I can tell.
>
>     for (var x of "💩")
>       print(x.charCodeAt(0))
>
> invokes print() twice in Gecko.

No, that's not correct. The `@@iterator` method of `String.prototype` is supposed to return an iterator that iterates over code points and returns single-code-point strings.

The spec. for this will be in the next draft that I release.
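
In an engine that implements the string iterator as it was later specified in ES2015, the behavior described here can be checked with a few lines. The variable names below are illustrative only.

```javascript
const pile = '\uD83D\uDCA9'; // U+1F4A9, the character from Anne's test
const seen = [];
for (const symbol of pile) {
  // Each iteration yields one whole code point as a string.
  seen.push(symbol.codePointAt(0));
}
// seen now holds a single entry, 0x1F4A9, although pile.length is 2.
```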

# Andrea Giammarchi (12 years ago)

+1 for the simplified `at(symbolIndex)`

I would expect `'𝌆'.at(1)` to fail the same way `'a'.charAt(1)` or `'a'.charCodeAt(1)` would.

I would expect `'𝌆'.at(symbolIndex)` to behave as `length` does, based on unique symbols (Unicode extras included), so that everyone, except RAM and CPU, will have an easier life with strings.

Long story short: there's no symbol at 1; the symbol is at 0 because the size of that Unicode string is 1.

That said, I am sure the discussion went through this already ^_^

# Andrea Giammarchi (12 years ago)

"the size of that unicode string is 1" ... meaning the **virtual** size for human eyes

# Andrea Giammarchi (12 years ago)

If this is true, then `.at(symbolIndex)` should be a no-brainer?

```
var virtualLength = 0;
for (var x of "💩") {
  virtualLength++;
}

// equivalent of
for (var i = 0; i < virtualLength; i++) {
  "💩".at(i);
}
```

Am I missing something?


On Fri, Oct 18, 2013 at 10:03 AM, Allen Wirfs-Brock
<allen at wirfs-brock.com>wrote:

>
> On Oct 18, 2013, at 9:05 AM, Anne van Kesteren wrote:
>
> > On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens <mathias at qiwi.be> wrote:
> >> On 18 Oct 2013, at 10:48, Anne van Kesteren <annevk at annevk.nl> wrote:
> >>> When you phrase it like that, I see another problem with
> >>> codePointAt(). You can't just replace existing usage of charCodeAt()
> >>> with codePointAt() as that would fail for input with paired
> >>> surrogates. E.g. a simple loop over a string that prints code points
> >>> would print both the code point and the trail surrogate code point for
> >>> a surrogate pair.
> >>
> >> I disagree. In those situations you should just iterate over the string
> using `for…of`.
> >
> > That seems to iterate over code units as far as I can tell.
> >
> >  for (var x of "💩")
> >    print(x.charCodeAt(0))
> >
> > invokes print() twice in Gecko.
> >
>
> No that's not correct, the @@iterator method of String.prototype is
> supposed to returns an interator the iterates code points and returns
> single codepoint strings.
>
> The spec. for this will be in the next draft that I release.
>
> Allen
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20131018/edf7332a/attachment.html>

# Allen Wirfs-Brock (12 years ago)

On Oct 18, 2013, at 10:06 AM, Andrea Giammarchi wrote:

> +1 for the simplified `at(symbolIndex)`
>
> I would expect '𝌆'.at(1) to fail same way 'a'.charAt(1) or 'a'.charCodeAt(1) would.

They aren't comparable, as the 'a' examples are "index out of bounds" errors. We only use code unit indices with strings, so `'𝌆'[1]` is valid (and so, presumably, should be `'𝌆'.at(1)`, with 1 having the same meaning in each case).

The most consistent way to define `String.prototype.at` would be:

    String.prototype.at = function(pos) {
        // Read the code point that starts at code unit index `pos`
        let cp = this.codePointAt(pos);
        // Out-of-range indices yield undefined, mirroring codePointAt
        return cp === undefined ? undefined : String.fromCodePoint(cp);
    };
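
To try this definition without touching `String.prototype`, it can be written as a standalone function. The name `atCodeUnitIndex` is hypothetical, chosen here because the `String.prototype.at` that later shipped in ES2022 has different semantics (it returns a single code unit and accepts negative indices).

```javascript
// Standalone sketch of the definition suggested in this message.
function atCodeUnitIndex(str, pos) {
  const cp = str.codePointAt(pos);
  return cp === undefined ? undefined : String.fromCodePoint(cp);
}

const tetragram = '\uD834\uDF06';  // U+1D306
atCodeUnitIndex(tetragram, 0);     // '\uD834\uDF06', the whole symbol
atCodeUnitIndex(tetragram, 1);     // '\uDF06', the lone trail surrogate
atCodeUnitIndex('a', 1);           // undefined, index out of bounds
```

Under this definition index 1 stays valid and answers the earlier question: it yields the trail surrogate on its own rather than the whole symbol.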

# Mathias Bynens (12 years ago)

On 18 Oct 2013, at 11:05, Anne van Kesteren <annevk at annevk.nl> wrote:

> On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens <mathias at qiwi.be> wrote:
>> I disagree. In those situations you should just iterate over the string using `for…of`.
> 
> That seems to iterate over code units as far as I can tell.
> 
> for (var x of "💩")
>  print(x.charCodeAt(0))
> 
> invokes print() twice in Gecko.

Woah, that doesn’t seem very useful. Is that a bug, or the way it’s supposed to work? I thought it was supposed to only iterate over whole code points (i.e. only print once for each code point, not once for each surrogate half).

# Allen Wirfs-Brock (12 years ago)

On Oct 18, 2013, at 10:18 AM, Andrea Giammarchi wrote:

> Am I missing something ?

Yes, we don't want to introduce code point based direct indexing, which always requires scanning from the front of the string. We already made that decision in the context of `codePointAt`, which only uses code unit indices.
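
The cost being avoided can be seen in a sketch of what code point based indexing would have to do. `symbolAtCodePointIndex` is a hypothetical helper written only for illustration; it relies on the code point string iterator from ES2015.

```javascript
// Looks up the n-th code point by walking the string from the start: O(n) per call.
function symbolAtCodePointIndex(str, index) {
  let i = 0;
  for (const symbol of str) {
    if (i === index) return symbol;
    i++;
  }
  return undefined; // fewer than index + 1 code points
}

const mixed = 'a\uD834\uDF06b';    // 'a', U+1D306, 'b': 3 code points, 4 code units
symbolAtCodePointIndex(mixed, 2);  // 'b'
mixed[2];                          // '\uDF06', because code unit indexing is direct
```

Because the width of each earlier symbol is unknown until it is read, there is no way to jump straight to the n-th code point in a UTF-16 string.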

On Oct 18, 2013, at 10:18 AM, Andrea Giammarchi wrote:

> if this is true then .at(symbolIndex) should be a no-brain ?
> 
> ```
> var virtualLength = 0;
> for (var x of "💩") {
>   virtualLength++;
> }
> 
> // equivalent of
> for(var i = 0; i < virtualLength; i++) {
>   "💩".at(i);
> }
> 
> ```
> 
> Am I missing something ?

Yes, we don't want to introduce code point based direct indexing, which always requires scanning from the front of the string.  We already made that decision in the context of codePointAt, which only uses code unit indices.
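
To illustrate what "code unit indices" means here, a short sketch (the sample string is chosen only for illustration):

```js
const s = 'a𝌆b';

// Code unit layout: 0 = 'a', 1 = lead surrogate, 2 = trail surrogate, 3 = 'b'.
console.log(s.length);                      // 4
console.log(s.codePointAt(1).toString(16)); // 1d306: index 1 starts the pair
console.log(s.codePointAt(2).toString(16)); // df06: index 2 lands mid-pair
console.log(s.codePointAt(3).toString(16)); // 62: 'b' is at code unit index 3, not 2
```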

Allen






> 
> 
> On Fri, Oct 18, 2013 at 10:03 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
> 
> On Oct 18, 2013, at 9:05 AM, Anne van Kesteren wrote:
> 
> > On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens <mathias at qiwi.be> wrote:
> >> On 18 Oct 2013, at 10:48, Anne van Kesteren <annevk at annevk.nl> wrote:
> >>> When you phrase it like that, I see another problem with
> >>> codePointAt(). You can't just replace existing usage of charCodeAt()
> >>> with codePointAt() as that would fail for input with paired
> >>> surrogates. E.g. a simple loop over a string that prints code points
> >>> would print both the code point and the trail surrogate code point for
> >>> a surrogate pair.
> >>
> >> I disagree. In those situations you should just iterate over the string using `for…of`.
> >
> > That seems to iterate over code units as far as I can tell.
> >
> >  for (var x of "💩")
> >    print(x.charCodeAt(0))
> >
> > invokes print() twice in Gecko.
> >
> 
> No, that's not correct: the @@iterator method of String.prototype is supposed to return an iterator that iterates code points and returns single code point strings.
> 
> The spec. for this will be in the next draft that I release.
> 
> Allen
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
> 


# Jason Orendorff (12 years ago)

On Fri, Oct 18, 2013 at 12:03 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

for (var x of "💩")
  print(x.charCodeAt(0))
invokes print() twice in Gecko.
No, that's not correct: the @@iterator method of String.prototype is supposed to return an iterator that iterates code points and returns single code point strings.

Filed: bugzilla.mozilla.org/show_bug.cgi?id=928508

On Fri, Oct 18, 2013 at 12:03 PM, Allen Wirfs-Brock
<allen at wirfs-brock.com> wrote:
>>  for (var x of "💩")
>>    print(x.charCodeAt(0))
>>
>> invokes print() twice in Gecko.
>
> No, that's not correct: the @@iterator method of String.prototype is supposed to return an iterator that iterates code points and returns single code point strings.

Filed: https://bugzilla.mozilla.org/show_bug.cgi?id=928508

-j

# Andrea Giammarchi (12 years ago)

fair enough, that was my point about

except for RAM and CPU, life is going to be easier for devs

so my counter-question would be: is there any way to do that in core so that we can “💩💩💩”.split() it so that we can have an ArrayLike that with [1] gives back the single “💩” and not the whole thing ?

Or does Mathias already have a RegExp able to split like that with reasonable performance?

P.S. I am in Chrome and Safari and I had no idea until I've seen that on twitter what kind of “💩” we were talking about :D

fair enough, that was my point about

> except for RAM and CPU, life is going to be easier for devs

so my counter-question would be: is there any way to do that in core so
that we can “💩💩💩”.split() it so that we can have an ArrayLike that with
[1] gives back the single “💩” and not the whole thing ?

Or does Mathias already have a RegExp able to split like that with
reasonable performance?

P.S. I am in Chrome and Safari and I had no idea until I've seen that on
twitter what kind of “💩” we were talking about :D

On Fri, Oct 18, 2013 at 10:34 AM, Allen Wirfs-Brock
<allen at wirfs-brock.com> wrote:

>
> On Oct 18, 2013, at 10:18 AM, Andrea Giammarchi wrote:
>
> if this is true then .at(symbolIndex) should be a no-brain ?
>
> ```
> var virtualLength = 0;
> for (var x of "💩") {
>   virtualLength++;
> }
>
> // equivalent of
> for(var i = 0; i < virtualLength; i++) {
>   "💩".at(i);
> }
>
> ```
>
> Am I missing something ?
>
>
> Yes, we don't want to introduce code point based direct indexing, which
> always requires scanning from the front of the string.  We already made that
> decision in the context of codePointAt, which only uses code unit indices.
>
> Allen
>
>
>
>
>
>
>
>
> On Fri, Oct 18, 2013 at 10:03 AM, Allen Wirfs-Brock <allen at wirfs-brock.com
> > wrote:
>
>>
>> On Oct 18, 2013, at 9:05 AM, Anne van Kesteren wrote:
>>
>> > On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens <mathias at qiwi.be>
>> wrote:
>> >> On 18 Oct 2013, at 10:48, Anne van Kesteren <annevk at annevk.nl> wrote:
>> >>> When you phrase it like that, I see another problem with
>> >>> codePointAt(). You can't just replace existing usage of charCodeAt()
>> >>> with codePointAt() as that would fail for input with paired
>> >>> surrogates. E.g. a simple loop over a string that prints code points
>> >>> would print both the code point and the trail surrogate code point for
>> >>> a surrogate pair.
>> >>
>> >> I disagree. In those situations you should just iterate over the
>> string using `for…of`.
>> >
>> > That seems to iterate over code units as far as I can tell.
>> >
>> >  for (var x of "💩")
>> >    print(x.charCodeAt(0))
>> >
>> > invokes print() twice in Gecko.
>> >
>>
>> No, that's not correct: the @@iterator method of String.prototype is
>> supposed to return an iterator that iterates code points and returns
>> single code point strings.
>>
>> The spec. for this will be in the next draft that I release.
>>
>> Allen
>> _______________________________________________
>> es-discuss mailing list
>> es-discuss at mozilla.org
>> https://mail.mozilla.org/listinfo/es-discuss
>>
>
>
>

# Allen Wirfs-Brock (12 years ago)

On Oct 18, 2013, at 1:12 PM, Andrea Giammarchi wrote:

fair enough, that was my point about

except for RAM and CPU, life is going to be easier for devs

so my counter-question would be: is there any way to do that in core so that we can “💩💩💩”.split() it so that we can have an ArrayLike that with [1] gives back the single “💩” and not the whole thing ?

Array.from('𝌆𝌆𝌆')[1]

On Oct 18, 2013, at 1:12 PM, Andrea Giammarchi wrote:

> fair enough, that was my point about 
> 
> > except for RAM and CPU, life is going to be easier for devs
> 
> so my counter-question would be: is there any way to do that in core so that we can “💩💩💩”.split() it so that we can have an ArrayLike that with [1] gives back the single “💩” and not the whole thing ?

Array.from('𝌆𝌆𝌆')[1]

Allen
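
A quick sketch of the difference between `Array.from` and a plain `split('')` on the string from the question, assuming ES2015 string iteration:

```js
const text = '💩💩💩';

// Array.from uses the string iterator, so each element is a whole code point.
const symbols = Array.from(text);
console.log(symbols.length);       // 3
console.log(symbols[1] === '💩');  // true

// split('') works on code units and tears each surrogate pair apart.
console.log(text.split('').length); // 6
```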

# Mathias Bynens (12 years ago)

Please ignore my previous email; it has been answered already. (It was a draft I wrote up this morning before I lost my internet connection.)

On 18 Oct 2013, at 11:57, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

Given that we have charAt, charCodeAt and codePointAt, I think the most appropriate name for such a method would be 'at': '𝌆'.at(0)

Love it!

The issue when this sort of method has been discussed in the past has been what to do when you index at a trailing surrogate position:
'𝌆'.at(1)
do you still get '𝌆' or do you get the equivalent of String.fromCharCode('𝌆'[1]) ?

In my proposal it would return the equivalent of String.fromCharCode('𝌆'[1]). I think that’s the most sane behavior in that case. This also mimics the way String.prototype.codePointAt works in such a case.

Here’s a prollyfill for String.prototype.at based on my earlier proposal: mathiasbynens/String.prototype.at Tests: mathiasbynens/String.prototype.at/blob/master/tests/tests.js

Please ignore my previous email; it has been answered already. (It was a draft I wrote up this morning before I lost my internet connection.)

On 18 Oct 2013, at 11:57, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

> Given that we have charAt, charCodeAt and codePointAt,  I think the most appropriate name for such a method would be 'at':
>      '𝌆'.at(0)

Love it!

> The issue when this sort of method has been discussed in the past has been what to do when you index at a trailing surrogate position:
> 
>     '𝌆'.at(1)
> 
> do you still get '𝌆' or do you get the equivalent of String.fromCharCode('𝌆'[1]) ?

In my proposal it would return the equivalent of `String.fromCharCode('𝌆'[1])`. I think that’s the most sane behavior in that case. This also mimics the way `String.prototype.codePointAt` works in such a case.

Here’s a prollyfill for `String.prototype.at` based on my earlier proposal: https://github.com/mathiasbynens/String.prototype.at Tests: https://github.com/mathiasbynens/String.prototype.at/blob/master/tests/tests.js

# Mathias Bynens (12 years ago)

On 18 Oct 2013, at 15:12, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:

so my counter-question would be: is there any way to do that in core so that we can “💩💩💩”.split() it so that we can have an ArrayLike that with [1] gives back the single “💩” and not the whole thing ?

This brings us back to the earlier discussion of whether something like String.prototype.codePoints should be added: esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string I think it would be useful

On 18 Oct 2013, at 15:12, Andrea Giammarchi <andrea.giammarchi at gmail.com> wrote:

> so my counter-question would be: is there any way to do that in core so that we can “💩💩💩”.split() it so that we can have an ArrayLike that with [1] gives back the single “💩” and not the whole thing ?

This brings us back to the earlier discussion of whether something like `String.prototype.codePoints` should be added: http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string I think it would be useful

# Andrea Giammarchi (12 years ago)

If I understand Allen's answer, it looks like Array.from(“💩&💩”).length would do, being 3, and making the operation straightforward?

If I understand Allen's answer, it looks like `Array.from(“💩&💩”).length` would
do, being 3, and making the operation straightforward?

Cheers


On Fri, Oct 18, 2013 at 1:33 PM, Mathias Bynens <mathias at qiwi.be> wrote:

> On 18 Oct 2013, at 15:12, Andrea Giammarchi <andrea.giammarchi at gmail.com>
> wrote:
>
> > so my counter-question would be: is there any way to do that in core so
> that we can “💩💩💩”.split() it so that we can have an ArrayLike that with
> [1] gives back the single “💩” and not the whole thing ?
>
> This brings us back to the earlier discussion of whether something like
> `String.prototype.codePoints` should be added:
> http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string I think it would be useful
>
>

# Joshua Bell (12 years ago)

Given that you can only use the proposed String.prototype.at() properly for indexes > 0 if you know the index of a non-BMP character or lead surrogate by some other means, or if you will test the return value for a trailing surrogate, is it really an advantage over using codePointAt / fromCodePoint?

The name "at" is so tempting I'm imagining naive scripts of the form for (i = 0; i < s.length; ++i) { r += s.at(i); } which will work fine until they get a non-BMP input at which point they're suddenly duplicating the trailing surrogates.

Pushing people towards for-of iteration and even Allen's Array.from('𝌆𝌆𝌆')[1] seems safer; users who need more subtle things have codePointAt / fromCodePoint available and hopefully the knowledge to use them.

Given that you can only use the proposed String.prototype.at() properly for
indexes > 0 if you know the index of a non-BMP character or lead surrogate
by some other means, or if you will test the return value for a trailing
surrogate, is it really an advantage over using codePointAt / fromCodePoint?

The name "at" is so tempting I'm imagining naive scripts of the form for (i
= 0; i < s.length; ++i) { r += s.at(i); } which will work fine until they
get a non-BMP input at which point they're suddenly duplicating the
trailing surrogates.

Pushing people towards for-of iteration and even Allen's
Array.from('𝌆𝌆𝌆')[1] seems safer; users who need more subtle things have
codePointAt / fromCodePoint available and hopefully the knowledge to use them.
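
The failure mode described above can be reproduced with a stand-in for the proposed method. The helper name `symbolAt` is hypothetical; it follows the `fromCodePoint(codePointAt(pos))` definition discussed in this thread.

```js
// Hypothetical stand-in for the proposed method, written as a plain function.
function symbolAt(str, pos) {
  const cp = str.codePointAt(pos);
  return cp === undefined ? undefined : String.fromCodePoint(cp);
}

const s = 'a𝌆b';

// Naive loop: advances one code unit at a time.
let naive = '';
for (let i = 0; i < s.length; ++i) {
  naive += symbolAt(s, i);
}

// for-of visits each code point exactly once.
let safe = '';
for (const symbol of s) {
  safe += symbol;
}

console.log(naive === 'a𝌆\uDF06b'); // true: the trail surrogate is duplicated
console.log(safe === s);             // true
```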

On Fri, Oct 18, 2013 at 1:30 PM, Mathias Bynens <mathias at qiwi.be> wrote:

> Please ignore my previous email; it has been answered already. (It was a
> draft I wrote up this morning before I lost my internet connection.)
>
> On 18 Oct 2013, at 11:57, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
>
> > Given that we have charAt, charCodeAt and codePointAt,  I think the most
> appropiate name for such a method would be 'at':
> >      '𝌆'.at(0)
>
> Love it!
>
> > The issue when this sort of method has been discussed in the past has
> been what to do when you index at a trailing surrogate possition:
> >
> >     '𝌆'.at(1)
> >
> > do you still get '𝌆' or do you get the equivalent of
> String.fromCharCode('𝌆'[1]) ?
>
> In my proposal it would return the equivalent of
> `String.fromCharCode('𝌆'[1])`. I think that’s the most sane behavior in
> that case. This also mimics the way `String.codePointAt` works in such a
> case.
>
> Here’s a prollyfill for `String.prototype.at` based on my earlier
> proposal: https://github.com/mathiasbynens/String.prototype.at Tests:
> https://github.com/mathiasbynens/String.prototype.at/blob/master/tests/tests.js
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>

# Allen Wirfs-Brock (12 years ago)

On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote:

Array.from('𝌆𝌆𝌆')[1]

maybe even better:

Uint32Array.from('𝌆𝌆𝌆')[1]

On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote:
> 
> Array.from('𝌆𝌆𝌆')[1]

maybe even better:

Uint32Array.from('𝌆𝌆𝌆')[1]

Allen

# Allen Wirfs-Brock (12 years ago)

err...maybe not if you want a string value:

String.fromCodePoint(Uint32Array.from( '𝌆𝌆𝌆')[1])

On Oct 18, 2013, at 4:01 PM, Allen Wirfs-Brock wrote:

> 
> On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote:
>> 
>> Array.from('𝌆𝌆𝌆')[1]
> 
> maybe even better:
> 
> Uint32Array.from('𝌆𝌆𝌆')[1]

err...maybe not if you want a string value:

String.fromCodePoint(Uint32Array.from( '𝌆𝌆𝌆')[1])

# André Bargull (12 years ago)

On Oct 18, 2013, at 4:01 PM, Allen Wirfs-Brock wrote:

String.fromCodePoint(Uint32Array.from( '𝌆𝌆𝌆')[1])

That does not seem to be too useful:

js> String.fromCodePoint(Uint32Array.from("\u{1d306}\u{1d306}\u{1d306}")[1])
"\u0000"

According to norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html#String, String.prototype[@@iterator] does not return plain code points, but the String value for the code point.

> On Oct 18, 2013, at 4:01 PM, Allen Wirfs-Brock wrote:
>
>> On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote:
>>>
>>> Array.from('𝌆𝌆𝌆')[1]
>>
>> maybe even better:
>>
>> Uint32Array.from('𝌆𝌆𝌆')[1]
>
> err...maybe not if you want a string value:
>
> String.fromCodePoint(Uint32Array.from( '𝌆𝌆𝌆')[1])

That does not seem to be too useful:

js> String.fromCodePoint(Uint32Array.from("\u{1d306}\u{1d306}\u{1d306}")[1])
"\u0000"


According to 
http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html#String, 
String.prototype[@@iterator] does not return plain code points, but the 
String value for the code point.


- André

# Mathias Bynens (12 years ago)

On 18 Oct 2013, at 17:51, Joshua Bell <jsbell at google.com> wrote:

Given that you can only use the proposed String.prototype.at() properly for indexes > 0 if you know the index of a non-BMP character or lead surrogate by some other means, or if you will test the return value for a trailing surrogate, is it really an advantage over using codePointAt / fromCodePoint?

The name "at" is so tempting I'm imagining naive scripts of the form for (i = 0; i < s.length; ++i) { r += s.at(i); } which will work fine until they get a non-BMP input at which point they're suddenly duplicating the trailing surrogates.

Pushing people towards for-of iteration and even Allen's Array.from('𝌆𝌆𝌆')[1] seems safer; users who need more subtle things have codePointAt / fromCodePoint available and hopefully the knowledge to use them.

Just because new features can be used incorrectly doesn’t mean the feature isn’t useful. for…of on strings and String.prototype.at are two very different things for two very different use cases. It’s a matter of using the right tool for the job, IMHO.

In your example (iterating over all code points in a string), for…of should be used.

String.prototype.codePointAt or String.prototype.at come in handy in case you only need to get the first code point or symbol in a string, for example.

On 18 Oct 2013, at 17:51, Joshua Bell <jsbell at google.com> wrote:

> Given that you can only use the proposed String.prototype.at() properly for indexes > 0 if you know the index of a non-BMP character or lead surrogate by some other means, or if you will test the return value for a trailing surrogate, is it really an advantage over using codePointAt / fromCodePoint?
> 
> The name "at" is so tempting I'm imagining naive scripts of the form for (i = 0; i < s.length; ++i) { r += s.at(i); } which will work fine until they get a non-BMP input at which point they're suddenly duplicating the trailing surrogates.
> 
> Pushing people towards for-of iteration and even Allen's Array.from('𝌆𝌆𝌆')[1] seems safer; users who need more subtle things have codePointAt / fromCodePoint available and hopefully the knowledge to use them.

Just because new features can be used incorrectly doesn’t mean the feature isn’t useful. `for…of` on strings and `String.prototype.at` are two very different things for two very different use cases. It’s a matter of using the right tool for the job, IMHO.

In your example (iterating over all code points in a string), `for…of` should be used.

`String.prototype.codePointAt` or `String.prototype.at` come in handy in case you only need to get the first code point or symbol in a string, for example.

# Domenic Denicola (12 years ago)

On 19 Oct 2013, at 01:12, "Mathias Bynens" <mathias at qiwi.be> wrote:

String.prototype.codePointAt or String.prototype.at come in handy in case you only need to get the first code point or symbol in a string, for example.

Are they useful for anything else, though? For example, if I wanted to get the second symbol in a string, how would I do that?

On 19 Oct 2013, at 01:12, "Mathias Bynens" <mathias at qiwi.be> wrote:
> `String.prototype.codePointAt` or `String.prototype.at` come in handy in case you only need to get the first code point or symbol in a string, for example.

Are they useful for anything else, though? For example, if I wanted to get the second symbol in a string, how would I do that?

# Andrea Giammarchi (12 years ago)

so it's a for/of with a break when it finds a code point? if that's the only use case I'd like to have an example of how convenient it is. I am just wondering, not saying it is not useful (trying to understand when/where/why I'd like to use .at())

so it's a for/of with a break when it finds a code point? if that's the
only use case I'd like to have an example of how convenient it is. I am
just wondering, not saying it is not useful (trying to understand
when/where/why I'd like to use .at())

Thanks


On Fri, Oct 18, 2013 at 10:12 PM, Mathias Bynens <mathias at qiwi.be> wrote:

> On 18 Oct 2013, at 17:51, Joshua Bell <jsbell at google.com> wrote:
>
> > Given that you can only use the proposed String.prototype.at() properly
> for indexes > 0 if you know the index of a non-BMP character or lead
> surrogate by some other means, or if you will test the return value for a
> trailing surrogate, is it really an advantage over using codePointAt /
> fromCodePoint?
> >
> > The name "at" is so tempting I'm imagining naive scripts of the form for
> (i = 0; i < s.length; ++i) { r += s.at(i); } which will work fine until
> they get a non-BMP input at which point they're suddenly duplicating the
> trailing surrogates.
> >
> > Pushing people towards for-of iteration and even Allen's Array.from(
> '𝌆𝌆𝌆'))[1] seems safer; users who need more subtle things have have
> codePointAt / fromCodePoint available and hopefully the knowledge to use
> them.
>
> Just because new features can be used incorrectly doesn’t mean the feature
> isn’t useful. `for…of` on strings and `String.prototype.at` are two very
> different things for two very different use cases. It’s a matter of using
> the right tool for the job, IMHO.
>
> In your example (iterating over all code points in a string), `for…of`
> should be used.
>
> `String.prototype.codePointAt` or `String.prototype.at` come in handy in
> case you only need to get the first code point or symbol in a string, for
> example.
>
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>

# Allen Wirfs-Brock (12 years ago)

On Oct 18, 2013, at 10:53 PM, Domenic Denicola wrote:

On 19 Oct 2013, at 01:12, "Mathias Bynens" <mathias at qiwi.be> wrote:

String.prototype.codePointAt or String.prototype.at come in handy in case you only need to get the first code point or symbol in a string, for example.

Are they useful for anything else, though? For example, if I wanted to get the second symbol in a string, how would I do that?

We discussed the utility of 'codePointAt' in the context of Norbert's full Unicode support proposal. At that time we concluded that it was something we needed. I don't see any new evidence that suggests that we need to reopen that decision at this point in the process.

The utility of a hypothetical 'at' method is presumably exactly that of 'codePointAt'.

str.at(p)

would just be a convenience for expressing

String.fromCodePoint(str.codePointAt(p))

So the real question is probably, how common is that use case.

It's relatively easy to write a for loop over the characters of a string using 'at'. Something like:

let c = '';
for (let p=0; p<str.length; p+=c.length) {
   c = str.at(p);
   ...
}

although, a for-of would be better in most cases:

for (let c of str)

The use case that we don't support well is any sort of backwards iteration of the characters of a string. We don't currently have an iterator specified to do it, nor do we have a one-stop way to test whether we are looking at the trailing surrogate of a surrogate pair.

On Oct 18, 2013, at 10:53 PM, Domenic Denicola wrote:

> On 19 Oct 2013, at 01:12, "Mathias Bynens" <mathias at qiwi.be> wrote:
>> `String.prototype.codePointAt` or `String.prototype.at` come in handy in case you only need to get the first code point or symbol in a string, for example.
> 
> Are they useful for anything else, though? For example, if I wanted to get the second symbol in a string, how would I do that?

We discussed the utility of 'codePointAt' in the context of Norbert's full Unicode support proposal.  At that time we concluded that it was something we needed.  I don't see any new evidence  that suggests that we need to reopen that decision at this point in the process.

The utility of a hypothetical 'at' method is presumably exactly that of 'codePointAt'. 

   str.at(p)
would just be a convenience  for expressing
   String.fromCodePoint(str.codePointAt(p))

So the real question is probably, how common is that  use case.

It's relatively easy to write a for loop over the characters of a string using 'at'. Something like:

let c = '';
for (let p=0; p<str.length; p+=c.length) {
   c = str.at(p);
   ...
}

although, a for-of would be better in most cases:
   for (let c of str)

The use case that we don't support well is any sort of backwards iteration of the characters of a string. We don't currently have an iterator specified to do it, nor do we have a one-stop way to test whether we are looking at the trailing surrogate of a surrogate pair.

Allen
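
For the backwards case, user code has to do the surrogate test by hand. A minimal sketch, assuming a hypothetical generator named `symbolsBackwards` (nothing like it is specified):

```js
// Hypothetical helper: yields the symbols of `str` from last to first.
function* symbolsBackwards(str) {
  let i = str.length;
  while (i > 0) {
    const unit = str.charCodeAt(i - 1);
    const isTrail = unit >= 0xDC00 && unit <= 0xDFFF;
    if (isTrail && i >= 2) {
      const prev = str.charCodeAt(i - 2);
      const isLead = prev >= 0xD800 && prev <= 0xDBFF;
      if (isLead) {
        // A well-formed pair: emit both code units as one symbol.
        yield str.slice(i - 2, i);
        i -= 2;
        continue;
      }
    }
    // BMP character or lone surrogate: emit the single code unit.
    yield str[i - 1];
    i -= 1;
  }
}

console.log(Array.from(symbolsBackwards('a𝌆b')).join('') === 'b𝌆a'); // true
```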

# Allen Wirfs-Brock (12 years ago)

On Oct 18, 2013, at 4:22 PM, André Bargull wrote:

That does not seem to be too useful:

js> String.fromCodePoint(Uint32Array.from("\u{1d306}\u{1d306}\u{1d306}")[1])
"\u0000"

right, it would need to be

String.fromCodePoint(Uint32Array.from( '𝌆𝌆𝌆', s=>s.codePointAt(0))[1])

According to norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html#String, String.prototype[@@iterator] does not return plain code points, but the String value for the code point.

yes, that's correct and how I have it spec'ed in rev20

On Oct 18, 2013, at 4:22 PM, André Bargull wrote:

>> On Oct 18, 2013, at 4:01 PM, Allen Wirfs-Brock wrote:
>> 
>> > 
>> > On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote:
>> >> 
>> >> Array.from('𝌆𝌆𝌆')[1]
>> > 
>> > maybe even better:
>> > 
>> > Uint32Array.from('𝌆𝌆𝌆')[1]
>> 
>> err...maybe not if you want a string value:
>> 
>> String.fromCodePoint(Uint32Array.from( '𝌆𝌆𝌆')[1])
> 
> That does not seem to be too useful:
> 
> js> String.fromCodePoint(Uint32Array.from("\u{1d306}\u{1d306}\u{1d306}")[1])
> "\u0000"

right, it would need to be

String.fromCodePoint(Uint32Array.from( '𝌆𝌆𝌆', s=>s.codePointAt(0))[1])
> 
> 
> According to http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html#String, String.prototype[@@iterator] does not return plain code points, but the String value for the code point.

yes, that's correct and how I have it spec'ed in rev20

Allen
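
The exchange above can be checked side by side. This sketch assumes an ES2015 engine with `TypedArray.from` and its optional map function:

```js
const text = '𝌆𝌆𝌆';

// Without a map function each iterated string ('𝌆') is converted to a
// number, which gives NaN and is stored as 0.
const withoutMap = Uint32Array.from(text);
console.log(String.fromCodePoint(withoutMap[1]) === '\u0000'); // true

// With a map function each symbol is turned into its code point first.
const withMap = Uint32Array.from(text, (s) => s.codePointAt(0));
console.log(String.fromCodePoint(withMap[1]) === '𝌆'); // true
```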

# Bjoern Hoehrmann (12 years ago)

Allen Wirfs-Brock wrote:

The utility of a hypothetical 'at' method is presumably exactly that of 'codePointAt'.
str.at(p)
would just be a convenience for expressing
String.fromCodePoint(str.codePointAt(p))
So the real question is probably, how common is that use case.

Certainly not common enough to warrant a two-character method on the native string type. Odds are people will use it incorrectly in an attempt to make their code look concise, not understanding that it'll retrieve a substring of .length 1 or 2, possibly consisting of a lone surrogate, based on a 16 bit index that might fall in the middle of a character; the problematic cases are fairly rare, so it's hard to notice improper use of .at in automated testing or in code review.

* Allen Wirfs-Brock wrote:
>The utility of a hypothetical 'at' method is presumably exactly that of 'codePointAt'. 
>
>   str.at(p)
>would just be a convenience  for expressing
>   String.fromCodePoint(str.codePointAt(p))
>
>So the real question is probably, how common is that  use case.

Certainly not common enough to warrant a two-character method on the
native string type. Odds are people will use it incorrectly in an
attempt to make their code look concise, not understanding that it'll
retrieve a substring of .length 1 or 2, possibly consisting of a lone
surrogate, based on a 16 bit index that might fall in the middle of a
character; the problematic cases are fairly rare, so it's hard to
notice improper use of `.at` in automated testing or in code review.
-- 
Björn Höhrmann · mailto:bjoern at hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

# Mathias Bynens (12 years ago)

On 19 Oct 2013, at 12:15, Bjoern Hoehrmann <derhoermi at gmx.net> wrote:

Certainly not common enough to warrant a two-character method on the native string type. Odds are people will use it incorrectly in an attempt to make their code look concise […]

Are you saying that changing the name to something that is longer than at would solve this problem?

[…] not understanding that it'll retrieve a substring of .length 1 or 2, possibly consisting of a lone surrogate, based on a 16 bit index that might fall in the middle of a character; the problematic cases are fairly rare, so it's hard to notice improper use of .at in automated testing or in code review.

People are using String.prototype.charAt() incorrectly too, expecting it to return whole symbols instead of surrogate halves wherever possible. How would not introducing a method that avoids this problem help?

On 19 Oct 2013, at 12:15, Bjoern Hoehrmann <derhoermi at gmx.net> wrote:

> Certainly not common enough to warrant a two-character method on the
> native string type. Odds are people will use it incorrectly in an
> attempt to make their code look concise […]

Are you saying that changing the name to something that is longer than `at` would solve this problem?

> […] not understanding that it'll retrieve a substring of .length 1 or 2,
> possibly consisting of a lone surrogate, based on a 16 bit index that
> might fall in the middle of a character; the problematic cases are
> fairly rare, so it's hard to notice improper use of `.at` in automated
> testing or in code review.

People are using `String.prototype.charAt()` incorrectly too, expecting it to return whole symbols instead of surrogate halves wherever possible. How would _not_ introducing a method that avoids this problem help?

# Mathias Bynens (12 years ago)

On 19 Oct 2013, at 00:53, Domenic Denicola <domenic at domenicdenicola.com> wrote:

On 19 Oct 2013, at 01:12, "Mathias Bynens" <mathias at qiwi.be> wrote:

String.prototype.codePointAt or String.prototype.at come in handy in case you only need to get the first code point or symbol in a string, for example.

Are they useful for anything else, though? For example, if I wanted to get the second symbol in a string, how would I do that?

Yeah, that’s the problem with these methods. Additional user code is required to handle non-zero position arguments, unless you’re sure the position is actually the start of a code point (and not in the middle of a surrogate pair). I guess there are situations where that’s a certainty, for example when you’re dealing with a string in which the user selected some text.

On 19 Oct 2013, at 00:53, Domenic Denicola <domenic at domenicdenicola.com> wrote:

> On 19 Oct 2013, at 01:12, "Mathias Bynens" <mathias at qiwi.be> wrote:
>> `String.prototype.codePointAt` or `String.prototype.at` come in handy in case you only need to get the first code point or symbol in a string, for example.
> 
> Are they useful for anything else, though? For example, if I wanted to get the second symbol in a string, how would I do that?

Yeah, that’s the problem with these methods. Additional user code is required to handle non-zero `position` arguments, unless you’re sure the `position` is actually the start of a code point (and not in the middle of a surrogate pair). I guess there are situations where that’s a certainty, for example when you’re dealing with a string in which the user selected some text.

This brings us back to the earlier discussion of whether something like `String.prototype.codePoints` should be added: http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string It could be a getter or a generator… Or does `for…of` iteration handle this use case adequately?

# Domenic Denicola (12 years ago)

From: Mathias Bynens [mailto:mathias at qiwi.be]


> This brings us back to the earlier discussion of whether something like `String.prototype.codePoints` should be added: http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string It could be a getter or a generator… Or does `for…of` iteration handle this use case adequately?

It sounds like you are proposing a second name for `String.prototype[Symbol.iterator]`, which does not sound very useful.

A property for the string's "real length" does seem somewhat useful, as does a method that does random-access on "real characters." Certainly more useful than the proposed symbolAt/at. But I suppose we can pave whatever cowpaths arise.

My proposed cowpaths:

```js
Object.mixin(String.prototype, {
  realCharacterAt(i) {
    let index = 0;
    for (var c of this) {
      if (index++ === i) {
        return c;
      }
    }
  },
  get realLength() {
    let counter = 0;
    for (var c of this) {
      ++counter;
    }
    return counter;
  }
});
```

This would allow you to e.g. find the character in the "real" middle of a string with code like

```js
var middleIndex = Math.floor(theString.realLength / 2);
var middleRealCharacter = theString.realCharacterAt(middleIndex);
```
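For readers who want to run this today: `Object.mixin` did not end up in the final ES6 specification, so the following is a hedged sketch of the same two cowpaths as standalone functions. The function signatures and the sample string are assumptions made for illustration; the logic mirrors the loops above.

```js
// Count code points rather than UTF-16 code units.
function realLength(string) {
  let counter = 0;
  for (const symbol of string) {
    ++counter;
  }
  return counter;
}

// Return the i-th code point as a string, or undefined when i is out of range.
// This is O(n) on every call, because it walks the string from the start.
function realCharacterAt(string, i) {
  let index = 0;
  for (const symbol of string) {
    if (index++ === i) {
      return symbol;
    }
  }
  return undefined;
}

const theString = 'ab\u{1D306}cd';
const middleIndex = Math.floor(realLength(theString) / 2);
const middleRealCharacter = realCharacterAt(theString, middleIndex);

console.log(theString.length, realLength(theString)); // 6 5
console.log(middleRealCharacter === '\u{1D306}');     // true
```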

# Andrea Giammarchi (12 years ago)

AFAIK that's also what Allen said he didn't want to implement in core: an
expensive operation on each invocation, due to a stateless loop over arbitrary
indexes.

Although, since strings are immutable in JS, I'd implement that logic by creating
a snapshot once and using it as if it were an Array ... something like the
following:

```javascript

!function(dict){

  function getOrCreate(str) {
    if (!(str in dict)) {
      dict[str] = {
        i: 0,
        l: 0,
        v: (Array.from || function(){
          // miserable fallback: splits by code unit, not by code point
          return str.split('')
        })(str)
        // or the for/of loop
      };
    }
    // times it's used
    dict[str].i++;
    return dict[str].v;
  }

  setInterval(function () {
    var key, value;
    for(key in dict) {
      value = dict[key];
      value.l = value.i - value.l;
      // used only once or never used again
      if (value.l < 2) {
        // free all the RAM
        delete dict[key];
      }
    }
  }, 5000); // 5 seconds should be enough ?
            // incremental works better with
            // slower timeout though
            // 500 might be good too

  Object.defineProperties(
    String.prototype,
    {
      at: {
        configurable: true,
        writable: true,
        value: function at(i) {
          return getOrCreate(this)[i];
        }
      },
      // or any meaningful name
      size: {
        configurable: true,
        get: function () {
          return getOrCreate(this).length;
        }
      }
    }
  );

}(Object.create(null));


// @example
var str = 'abc';
alert([
  str.size, // 3
  str.at(1) // b
]);


```

Regards





# Andrea Giammarchi (12 years ago)

A more readable example, with some typos fixed, is on GitHub:
https://gist.github.com/WebReflection/7059536

license wtfpl v2 http://www.wtfpl.net/txt/copying/

Cheers



# Bjoern Hoehrmann (12 years ago)

* Mathias Bynens wrote:
>On 19 Oct 2013, at 12:15, Bjoern Hoehrmann <derhoermi at gmx.net> wrote:
>
>> Certainly not common enough to warrant a two-character method on the
>> native string type. Odds are people will use it incorrectly in an
>> attempt to make their code look concise […]
>
>Are you saying that changing the name to something that is longer than 
>`at` would solve this problem?

If it was `.getOneOrTwoCodepointLongSubstringAtUcs2CodeUnitIndex(...)`
I am sure people would be reluctant to use it because it's unreasonably
long compared to `String.fromCodePoint(str.codePointAt(p))` and harder
to understand than the combination of those two primitives.

>> […] not understanding that it'll retrieve a substring of .length 1 or 2,
>> possibly consisting of a lone surrogate, based on a 16 bit index that
>> might fall in the middle of a character; the problematic cases are
>> fairly rare, so it's hard to notice improper use of `.at` in automated
>> testing or in code review.
>
>People are using `String.prototype.charAt()` incorrectly too, expecting
>it to return whole symbols instead of surrogate halves wherever possible.
>How would _not_ introducing a method that avoids this problem help?

Right now people do not have much of a choice other than writing code
that does not do the right thing when faced with malformed strings or
non-BMP characters; it's unreasonable to call a method like `substr`
and then manually smooth it up around the edges and perhaps scan the
interior for lone surrogates to ensure that at least your code doesn't
do the wrong thing. That gives you "well-known bad" code, which is a
good thing to have, better than more complicated code that might have
unknown bugs. Allen's loop `for (let p=0; p<str.length; p+=c.length)`
for instance is just waiting for someone to improve or replace it with
code that increments by `1` instead of `.length` because that's simpler.
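A hedged sketch of the loop shape being discussed may help here. It assumes `c` is obtained with `String.fromCodePoint(str.codePointAt(p))`, as suggested earlier in this message; the function names `symbolsStepByLength` and `symbolsStepByOne` are hypothetical.

```js
// Advances by the length of the symbol just read (1 or 2 code units).
function symbolsStepByLength(str) {
  const out = [];
  let c;
  for (let p = 0; p < str.length; p += c.length) {
    c = String.fromCodePoint(str.codePointAt(p));
    out.push(c);
  }
  return out;
}

// The "simplified" variant: advancing by 1 lands in the middle of a pair.
function symbolsStepByOne(str) {
  const out = [];
  for (let p = 0; p < str.length; p += 1) {
    out.push(String.fromCodePoint(str.codePointAt(p)));
  }
  return out;
}

const sample = 'a\u{1D306}b';
console.log(symbolsStepByLength(sample).length); // 3
console.log(symbolsStepByOne(sample).length);    // 4, including a lone '\uDF06'
```

With BMP-only input both variants return the same result, which is why the second one is hard to catch in testing or code review.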

The methods `fromCodePoint` and `codePointAt` can be used to get ugly
constants out of code that tries to do the right thing, and they will
offer some insight into how developers might go from UCS-only code to
something more proper, but for the moment duplicating all the UCS-based
methods strikes me as premature, especially when giving them seductive
names. How would a somewhat-surrogate-aware `substring` method work and
what would it be called, for instance? If it is omitted, we would be
back to square one: someone in need of substring functionality has to
jump through overly complicated hoops to make it work "correctly" and
ends up mixing surrogate-pair-aware with -unaware code.
-- 
Björn Höhrmann · mailto:bjoern at hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

# Brendan Eich (12 years ago)

Allen Wirfs-Brock wrote:
> The use case that we don't support well is any sort of backwards iteration of the characters of a string. We don't currently have an iterator specified to do it, nor do we have a one stop way to test whether we are looking at the trailing surrogate of a surrogate pair.

What do you mean by "one stop"? O(1)? We aren't going to mandate 
implementations make such tests (or backward iteration) that cheap.

Is there yet a real world (from the field, not a testcase) use-case for 
backward iteration?

/be
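For context, the trailing-surrogate test and backward iteration mentioned in the quote can be sketched in user code as follows. This is only an illustration under stated assumptions: the names `isTrailOfPair` and `symbolsBackward` are hypothetical, and nothing like them was specified in ES6.

```js
// True when the code unit at index i is a trail surrogate that is
// immediately preceded by a lead surrogate.
function isTrailOfPair(str, i) {
  if (i <= 0 || i >= str.length) return false;
  const unit = str.charCodeAt(i);
  const prev = str.charCodeAt(i - 1);
  return unit >= 0xDC00 && unit <= 0xDFFF && prev >= 0xD800 && prev <= 0xDBFF;
}

// Yields the symbols of a string from the end to the start.
// Lone surrogates are yielded as they are.
function* symbolsBackward(str) {
  let end = str.length;
  while (end > 0) {
    const start = isTrailOfPair(str, end - 1) ? end - 2 : end - 1;
    yield str.slice(start, end);
    end = start;
  }
}

const reversedSymbols = [...symbolsBackward('a\u{1D306}b')];
console.log(reversedSymbols.length); // 3
```

Each step inspects at most two code units, so a full backward pass is linear in the string length.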

# Andrea Giammarchi (12 years ago)

a nested loop might be a concrete case where `O(n)` happens ... not so
common with strings but quite possibly used in many parsers implemented in
JS itself.



# Mathias Bynens (12 years ago)

On 19 Oct 2013, at 12:54, Domenic Denicola <domenic at domenicdenicola.com> wrote:

> My proposed cowpaths:
> 
> ```js
> Object.mixin(String.prototype, {
>  realCharacterAt(i) {
>    let index = 0;
>    for (var c of this) {
>      if (index++ === i) {
>        return c;
>      }
>  },
>  }
>  get realLength() {
>    let counter = 0;
>    for (var c of this) {
>      ++counter;
>    }
>    return counter;
>  }
> });
> ```

Good stuff!

To account for [lookalike symbols due to combining marks] [1], just add a call to `String.prototype.normalize`:

    Object.mixin(String.prototype, {
      get realLength() {
        let counter = 0;
        for (var c of this.normalize('NFC')) {
          ++counter;
        }
        return counter;
      }
    });
    
    assert('ma\xF1ana'.realLength == 'man\u0303ana'.realLength);

[1]: http://mathiasbynens.be/notes/javascript-unicode#accounting-for-lookalikes
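As a runnable illustration of the normalization step, here is a hedged sketch written as a standalone function, since `Object.mixin` is not available in shipped engines. The optional `normalizationForm` parameter is an assumption added for comparison purposes.

```js
// Counts code points, optionally after normalizing the string first.
function realLength(string, normalizationForm) {
  const source = normalizationForm ? string.normalize(normalizationForm) : string;
  let counter = 0;
  for (const symbol of source) {
    ++counter;
  }
  return counter;
}

const precomposed = 'ma\xF1ana';    // U+00F1 LATIN SMALL LETTER N WITH TILDE
const decomposed = 'man\u0303ana';  // 'n' followed by U+0303 COMBINING TILDE

console.log(realLength(precomposed), realLength(decomposed));               // 6 7
console.log(realLength(precomposed, 'NFC'), realLength(decomposed, 'NFC')); // 6 6
```

NFC only helps where a precomposed form exists. A sequence such as `'q\u0307\u0323'` has none, so it still counts as three code points after normalization.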

# Mathias Bynens (11 years ago)

Allen mentioned that `String#at` might not make it to ES6 because nobody in TC39 is championing it. I’ve now asked Rick if he would be the champion for this, and he agreed. (Thanks again!)

Looking over the ‘TC39 progress’ document at <https://docs.google.com/a/chromium.org/document/d/1QbEE0BsO4lvl7NFTn5WXWeiEIBfaVUF7Dk0hpPpPDzU>, it seems most of the work is already taken care of: the use case was discussed in this thread, the proposal has a complete spec text, and there’s an example implementation/polyfill with unit tests. See <http://mths.be/at>.

Is there anything else I can do to help get this included as a non-TC39-member?

# Domenic Denicola (11 years ago)

This was the method that was only useful if you pass `0` to it?


# C. Scott Ananian (11 years ago)

Note that `Array.from(str)` and `str[Symbol.iterator]` overlap
significantly.  In particular, it's somewhat awkward to iterate over
code points using `String#symbolAt`; it's much easier to use
`substr()` and then use the StringIterator.
  --scott
ps. I see that Domenic has said something similar.


# Mathias Bynens (11 years ago)

On 14 Feb 2014, at 11:11, Domenic Denicola <domenic at domenicdenicola.com> wrote:

> This was the method that was only useful if you pass `0` to it?

I’ll just avoid the infinite loop here by pointing to earlier posts in this thread where this was discussed before: <http://esdiscuss.org/topic/string-prototype-symbolat-improved-string-prototype-charat#content-34> and <http://esdiscuss.org/topic/string-prototype-symbolat-improved-string-prototype-charat#content-40>.

This method is just as useful as `String.prototype.codePointAt`. If that method is included, so should `String.prototype.at`. If `String.prototype.at` is found not to be useful, `String.prototype.codePointAt` should be removed too.

# Mathias Bynens (11 years ago)

On 14 Feb 2014, at 11:14, C. Scott Ananian <ecmascript at cscott.net> wrote:

> Note that `Array.from(str)` and `str[Symbol.iterator]` overlap
> significantly.  In particular, it's somewhat awkward to iterate over
> code points using `String#symbolAt`; it's much easier to use
> `substr()` and then use the StringIterator.

`String#at` is not meant for iterating over code points – that’s what the `StringIterator` is for.

`String#at` is exactly like `String#codePointAt` except it returns strings (containing the symbol) instead of numbers (representing the code point value). It can be used to get the symbol at a given code unit position in a string (similar to how `String#codePointAt` can be used to get the code point at a given code unit position in a string).
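The relationship described above can be approximated with the two primitives that did ship in ES6. This is a sketch, not the proposal's spec text: the function name `symbolAt` is hypothetical, and returning an empty string for an out-of-range position is an assumption modelled on `charAt`.

```js
// Returns the symbol starting at the given code unit position.
function symbolAt(string, position) {
  const codePoint = string.codePointAt(position);
  // codePointAt returns undefined when the position is out of range.
  return codePoint === undefined ? '' : String.fromCodePoint(codePoint);
}

const tetragram = '\u{1D306}';

console.log(tetragram.codePointAt(0).toString(16)); // 1d306 (a number)
console.log(symbolAt(tetragram, 0) === tetragram);  // true (a string)
console.log(symbolAt(tetragram, 1) === '\uDF06');   // true (lone trail surrogate)
```

As with `codePointAt`, a position in the middle of a surrogate pair yields only the trail surrogate.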

# C. Scott Ananian (11 years ago)

Yes, I know what `String#at` is supposed to do.

I was pointing out that `String#at` makes it easy to do the wrong
thing.  If you do `Array.from(str)` then you suddenly have a complete
random-access data structure where you can find out the number of code
points in the String, iterate it in reverse from the end to the start,
slice it, find the midpoint, etc.  `Array.from` looks like an O(n)
operation, and it is -- so it encourages developers to cache the value
and reuse it.
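A minimal sketch of that snapshot approach, using only `Array.from` and ordinary array methods; the sample string and variable names are illustrative assumptions.

```js
const text = 'ab\u{1D306}cd';

// One O(n) pass produces an array of code points that can be cached and reused.
const snapshot = Array.from(text);

const count = snapshot.length;                           // number of code points
const middle = snapshot[Math.floor(count / 2)];          // random access
const firstThree = snapshot.slice(0, 3).join('');        // slicing by code point
const reversed = snapshot.slice().reverse().join('');    // reverse iteration

console.log(count);                        // 5
console.log(middle === '\u{1D306}');       // true
console.log(reversed === 'dc\u{1D306}ba'); // true
```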

That said, I can see where a lexer might want to use `String#at`,
being careful to do the correct index bump based on `result.length`.
However, the fastest JS lexers don't create String objects, they
operate directly on the code point (see
http://marijnhaverbeke.nl/acorn/#section-58).  So I'm -0, mostly
because the name isn't great.  But I have exactly zero say in the
matter anyway.  So I'll shut up now.
 --scott
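To illustrate the lexer point, here is a hedged sketch of scanning by numeric code point with a manual index bump. It is not taken from Acorn; the generator name `codePoints` is hypothetical.

```js
// Yields the numeric code points of a string without creating
// intermediate one-symbol strings.
function* codePoints(str) {
  let i = 0;
  while (i < str.length) {
    const cp = str.codePointAt(i);
    yield cp;
    // Astral code points occupy two code units, everything else one.
    i += cp > 0xFFFF ? 2 : 1;
  }
}

const scanned = [...codePoints('a\u{1D306}b')];
console.log(scanned.map((cp) => cp.toString(16)).join(' ')); // 61 1d306 62
```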

# Domenic Denicola (11 years ago)

I think Mathias's point, that it is exactly as useful or useless as `codePointAt`, is a reasonable one. However,

> This method is just as useful as `String.prototype.codePointAt`. If that method is included, so should `String.prototype.at`. If `String.prototype.at` is found not to be useful, `String.prototype.codePointAt` should be removed too.

This does not follow. The choice is not between adding two useless methods and adding zero. There is no reason to exclude the possibility of adding only one useless method.

But anyway, as some people seem to think that both methods are in fact useful---including Rick, who has agreed to champion---I agree with Scott that after having said our piece it's time to exit the thread.


# Rick Waldron (11 years ago)

On Fri, Feb 14, 2014 at 1:34 AM, Mathias Bynens <mathias at qiwi.be> wrote:

> Allen mentioned that `String#at` might not make it to ES6 because nobody
> in TC39 is championing it. I've now asked Rick if he would be the champion
> for this, and he agreed. (Thanks again!)
>

Published to wiki here:
http://wiki.ecmascript.org/doku.php?id=strawman:string_at

Rick

# Allen Wirfs-Brock (11 years ago)

On Feb 14, 2014, at 1:34 AM, Mathias Bynens wrote:

> Allen mentioned that `String#at` might not make it to ES6 because nobody in TC39 is championing it. I’ve now asked Rick if he would be the champion for this, and he agreed. (Thanks again!)
> 
> Looking over the ‘TC39 progress’ document at <https://docs.google.com/a/chromium.org/document/d/1QbEE0BsO4lvl7NFTn5WXWeiEIBfaVUF7Dk0hpPpPDzU>, it seems most of the work is already taken care of: the use case was discussed in this thread, the proposal has a complete spec text, and there’s an example implementation/polyfill with unit tests. See <http://mths.be/at>.
> 
> Is there anything else I can do to help get this included as a non-TC39-member?
> 

But just to be even clear, the new feature gate for ES6 is officially closed.

It's a really high bar to get over that closed gate. Unless the exclusion of a feature was a mistake, fixes a bug, or is somehow essential to supporting something that is already in ES6, I don't think we should be talking about adding it to ES6.

I don't think String.prototype.at fits any of those criteria. We've talked about it several times, including in the context of Norbert's original ES6 full unicode support proposal, and never achieved consensus on including it. Personally, I think it should be there but it's time to start talking about it for ES7 not ES6.

Allen

# Rick Waldron (11 years ago)

On Fri, Feb 14, 2014 at 10:59 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

> On Feb 14, 2014, at 1:34 AM, Mathias Bynens wrote:
>
> > Allen mentioned that `String#at` might not make it to ES6 because nobody in TC39 is championing it. I've now asked Rick if he would be the champion for this, and he agreed. (Thanks again!)
> >
> > Looking over the 'TC39 progress' document at <https://docs.google.com/a/chromium.org/document/d/1QbEE0BsO4lvl7NFTn5WXWeiEIBfaVUF7Dk0hpPpPDzU>, it seems most of the work is already taken care of: the use case was discussed in this thread, the proposal has a complete spec text, and there's an example implementation/polyfill with unit tests. See <http://mths.be/at>.
> >
> > Is there anything else I can do to help get this included as a non-TC39-member?
>
> But just to be even clear, the new feature gate for ES6 is officially closed.
>
> It's a really high bar to get over that closed gate. Unless the exclusion of a feature was a mistake, fixes a bug, or is somehow essential to supporting something that is already in ES6, I don't think we should be talking about adding it to ES6.
>
> I don't think String.prototype.at fits any of those criteria. We've talked about it several times, including in the context of Norbert's original ES6 full unicode support proposal, and never achieved consensus on including it. Personally, I think it should be there but it's time to start talking about it for ES7 not ES6.

Yes, I absolutely agree, apologies as I realize that was not addressed in
my previous message.

Rick


> Allen
>
>
>

# C. Scott Ananian (11 years ago)

I'm excited to start working on es7-shim once we get to that point!
(String.prototype.at has a particularly simple shim, thankfully...)
  --scott

# Rick Waldron (11 years ago)

On Fri, Feb 14, 2014 at 12:23 PM, C. Scott Ananian <ecmascript at cscott.net> wrote:

> I'm excited to start working on es7-shim once we get to that point!
> (String.prototype.at has a particularly simple shim, thankfully...)
>

Have you seen: https://github.com/mathiasbynens/String.prototype.at ?

Rick

# C. Scott Ananian (11 years ago)

yes, of course.  es6-shim is a large-ish collection of such.

However, it would be much better to use an implementation of
`String#at` which used substr and thus avoided creating and appending
a new string object.
 --scott
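
A minimal sketch of the `substr`-based idea suggested here, assuming a hypothetical standalone function `symbolAt(str, pos)`. It is neither the polyfill's code nor the strawman's spec text. The function inspects code units to decide whether the symbol is one or two units long, then takes a single slice of the original string instead of building the result from two halves.

```javascript
// Hypothetical helper (an assumption for illustration): returns the symbol
// that starts at code unit index `pos`, or '' when `pos` is out of range.
function symbolAt(str, pos) {
  const size = str.length;
  if (pos < 0 || pos >= size) return '';
  const first = str.charCodeAt(pos);
  let len = 1;
  // A lead surrogate followed by a trail surrogate forms one astral symbol.
  if (first >= 0xD800 && first <= 0xDBFF && pos + 1 < size) {
    const second = str.charCodeAt(pos + 1);
    if (second >= 0xDC00 && second <= 0xDFFF) len = 2;
  }
  // One slice of the original string; no concatenation of two pieces.
  return str.substr(pos, len);
}

const tetragram = '\uD834\uDF06'; // U+1D306
console.log(symbolAt(tetragram, 0) === tetragram); // true
console.log(symbolAt(tetragram, 1) === '\uDF06');  // true (lone trail surrogate)
console.log(symbolAt('abc', 1));                   // 'b'
```

Whether a single slice actually avoids an allocation depends on the engine's string representation, so any performance benefit is an assumption that would need measuring.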

# Brendan Eich (11 years ago)

Aside: "ECMASpeak" is neither accurate (we don't work for Ecma, it's JS 
not ES :-P), nor euphonious. But here's a pointer:


C. Scott Ananian wrote:
> new string object.

"new string primitive", because "string object" (especially with "new" 
in front) suggests new String('hi').

/be

# Mathias Bynens (11 years ago)

On 14 Feb 2014, at 19:59, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

> It's a really high bar to get over that closed gate.  Unless the exclusion of a feature was a mistake […] I don't think we should be talking about adding it to ES6.

It does feel like a mistake to me to introduce `String.prototype.codePointAt`, but no similar function that returns the symbol instead.

# C. Scott Ananian (11 years ago)

On Feb 15, 2014 9:13 AM, "Brendan Eich" <brendan at mozilla.com> wrote:
> Aside: "ECMASpeak" is neither accurate (we don't work for Ecma, it's JS not ES :-P), nor euphonious.

I'm learning all sorts of things! I guess there are two names here; what's
your preferred phrase for "the language used to write algorithms in the ES6
spec" (JS6?), and, if it differs, "the language used by members of the TC39
committee among themselves when describing language primitives in a very
precise way"?

>> new string object.
>
> "new string primitive", because "string object" (especially with "new" in front) suggests new String('hi').

I wrestled with the phrasing there. I think what I really mean is "avoid
allocating new backing storage", since there are "new string primitives"
returned regardless.  If there's a better phrase for "string backing
storage" I'd be glad to add that to my dictionary.
  --scott

# Brendan Eich (11 years ago)

C. Scott Ananian wrote:
>
> On Feb 15, 2014 9:13 AM, "Brendan Eich" <brendan at mozilla.com> wrote:
> > Aside: "ECMASpeak" is neither accurate (we don't work for Ecma, it's JS not ES :-P), nor euphonious.
>
> I'm learning all sorts of things! I guess there are two names here; what's your preferred phrase for "the language used to write algorithms in the ES6 spec" (JS6?), and, if it differs, "the language used by members of the TC39 committee among themselves when describing language primitives in a very precise way"?

When I'm in a bad mood, I call it VisualCobol. It's painfully low-level 
and verbose, yet hard to verify. Let's hope that the JSCert work will 
help, and Allen has been common'ing subroutines. Whatever we call it, 
the spec language ain't great.

Using "-Speak" as a stem conjures Orwell. Not good.

The definition of array-like -- an informal bit of jargon, useful (e.g., 
"array-like vs. iterable" in context in larger discussions about 
Array.from) until it's time to get precise -- is a spec matter. I agree 
we need a common definition that we use consistently.

> >> new string object.
> >
> > "new string primitive", because "string object" (especially with "new" in front) suggests new String('hi').
>
> I wrestled with the phrasing there. I think what I really mean is "avoid allocating new backing storage", since there are "new string primitives" returned regardless. If there's a better phrase for "string backing storage" I'd be glad to add that to my dictionary.

What does "backing storage" mean? There are no new String objects in any 
event. There may be ropes or dependent strings under the hood, but 
that's all unobservable (apart from performance) implementation-land.

/be

# Allen Wirfs-Brock (11 years ago)

On Feb 15, 2014, at 11:47 AM, Brendan Eich wrote:

> C. Scott Ananian wrote:
>> 
>> On Feb 15, 2014 9:13 AM, "Brendan Eich" <brendan at mozilla.com> wrote:
>> > Aside: "ECMASpeak" is neither accurate (we don't work for Ecma, it's JS not ES :-P), nor euphonious.
>> 
>> I'm learning all sorts of things! I guess there are two names here; what's your preferred phrase for "the language used to write algorithms in the ES6 spec" (JS6?), and, if it differs, "the language used by members of the TC39 committee among themselves when describing language primitives in a very precise way"?
>> 
> 
> When I'm in a bad mood, I call it VisualCobol. It's painfully low-level and verbose, yet hard to verify. Let's hope that the JSCert work will help, and Allen has been common'ing subroutines. Whatever we call it, the spec language ain't great.

But remember, prior to ES5, it was closer to Cobolish machine language.  No structured control, goto's targeting numeric step numbers, intermediate results referenced by step number (sorta  SSA with numeric ids), etc.

There has never been a complete redo, just incremental improvements and refactorings. But we've definitely advanced from the early 1950s to the late 1970s.  

Allen

# Andreas Rossberg (11 years ago)

On 15 February 2014 21:06, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
> On Feb 15, 2014, at 11:47 AM, Brendan Eich wrote:
>> C. Scott Ananian wrote:
>>>
>>> On Feb 15, 2014 9:13 AM, "Brendan Eich" <brendan at mozilla.com> wrote:
>>> > Aside: "ECMASpeak" is neither accurate (we don't work for Ecma, it's JS not ES :-P), nor euphonious.
>>>
>>> I'm learning all sorts of things! I guess there are two names here; what's your preferred phrase for "the language used to write algorithms in the ES6 spec" (JS6?), and, if it differs, "the language used by members of the TC39 committee among themselves when describing language primitives in a very precise way"?
>>
>> When I'm in a bad mood, I call it VisualCobol. It's painfully low-level and verbose, yet hard to verify. Let's hope that the JSCert work will help, and Allen has been common'ing subroutines. Whatever we call it, the spec language ain't great.
>
> But remember, prior to ES5, it was closer to Cobolish machine language.  No structured control, goto's targeting numeric step numbers, intermediate results referenced by step number (sorta  SSA with numeric ids), etc.
>
> There has never been a complete redo, just incremental improvements and refactorings. But we've definitely advanced from the early 1950s to the late 1970s.

Well, Algol-60 already was more structured a language than our
spec-speak. Let alone how far the Algol-68 spec was ahead of us. :)

/Andreas

# Andreas Rossberg (11 years ago)

On 15 February 2014 20:47, Brendan Eich <brendan at mozilla.com> wrote:
> Using "-Speak" as a stem conjures Orwell. Not good.

Ah, relax. Gilad Bracha even named his own language Newspeak.
Self-mockery is good.

/Andreas

# Allen Wirfs-Brock (11 years ago)

On Feb 17, 2014, at 4:38 AM, Andreas Rossberg wrote:

> On 15 February 2014 21:06, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
>> ...
>> But remember, prior to ES5, it was closer to Cobolish machine language.  No structured control, goto's targeting numeric step numbers, intermediate results referenced by step number (sorta  SSA with numeric ids), etc.
>> 
>> There has never been a complete redo, just incremental improvements and refactorings. But we've definitely advanced from the early 1950s to the late 1970s.
> 
> Well, Algol-60 already was more structured a language than our
> spec-speak. Let alone how far the Algol-68 spec was ahead of us. :)
> 

We were discussing the nature of the ES spec. pseudo code, not comparing pseudo code to a complete programming language.

Structured programming styles weren't widely adopted until the mid to late 1970's.

The Algol 60 Report used English prose to describe its semantics. The ES specs are closer in style to the Algol 68 Report, although less formal and arguably more approachable.

Allen

# Brendan Eich (11 years ago)

Andreas Rossberg wrote:
> On 15 February 2014 20:47, Brendan Eich <brendan at mozilla.com> wrote:
>> Using "-Speak" as a stem conjures Orwell. Not good.
>
> Ah, relax. Gilad Bracha even named his own language Newspeak.

Yeah, but no "ECMA" -- the double-whammy.

> Self-mockery is good.

I pay my dues (see "wat" played with commentary at Fluent 2012 and 
narrated with tech details at Strange Loop 2012).

/be

# C. Scott Ananian (11 years ago)

Are recordings available?
  --scott
On Feb 17, 2014 10:26 AM, "Brendan Eich" <brendan at mozilla.com> wrote:

> Andreas Rossberg wrote:
>
>> On 15 February 2014 20:47, Brendan Eich <brendan at mozilla.com> wrote:
>>
>>> Using "-Speak" as a stem conjures Orwell. Not good.
>>
>> Ah, relax. Gilad Bracha even named his own language Newspeak.
>
> Yeah, but no "ECMA" -- the double-whammy.
>
>> Self-mockery is good.
>
> I pay my dues (see "wat" played with commentary at Fluent 2012 and
> narrated with tech details at Strange Loop 2012).
>
> /be

# Brendan Eich (11 years ago)

C. Scott Ananian wrote:
> Are recordings available?

http://www.infoq.com/presentations/State-JavaScript starting at 1:50

Youtube has more.

/be