Need a champion? StringView strawman

# Allen Wirfs-Brock (12 years ago)

ecmascript#1557 (https://bugs.ecmascript.org/show_bug.cgi?id=1557) is a request that StringView (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays/StringView) over ArrayBuffers be added to ES.

The current StringView proposal is a WebIDL-based design and not particularly integrated into the ES6 Typed Array support, the ES6 Unicode support, or the post-ES6 "Binary Data" work. It isn't clear to me exactly how much momentum, if any, this proposal has in any standards process outside of TC39. However, if something like it is going to emerge, TC39 should be involved early to ensure that it is well integrated into ES.

It sounds to me like we need a TC39 champion (or perhaps an anti-champion) to shepherd this work in the context of the new TC39 development process. Any volunteers? I can help but have limited time available for this right now.

# Anne van Kesteren (12 years ago)

Where is this from?

Google and Mozilla have implemented the API from encoding.spec.whatwg.org as a means to get strings out of bytes (and bytes out of strings). It's not clear we need anything else.
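
As a minimal sketch of what "strings out of bytes (and bytes out of strings)" looks like with that API, assuming a runtime that exposes `TextEncoder` and `TextDecoder` as globals (current browsers, Node.js, Deno); the variable names are illustrative:

```typescript
// Round trip: string -> UTF-8 bytes -> string.
// In current implementations TextEncoder always produces UTF-8.
const encoder = new TextEncoder();
const bytes: Uint8Array = encoder.encode("h\u00e9llo");

// U+00E9 needs two bytes in UTF-8, so 5 code points become 6 bytes.
console.log(bytes.length); // 6

// TextDecoder takes an encoding label; "utf-8" is also the default.
const decoder = new TextDecoder("utf-8");
const text: string = decoder.decode(bytes);
console.log(text === "h\u00e9llo"); // true
```

The bytes live in an ordinary `Uint8Array`, so nothing about character encodings has to be added to the Typed Array definitions themselves.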

# Kenneth Russell (12 years ago)

There was some discussion about implementing StringView on the blink-dev mailing list in August 2013. My opinion was and is that the Encoding spec satisfies these use cases.

Adding a StringView to Typed Arrays would bring along all of the complexities of character set encoding and decoding to the Typed Array definitions. Typed Arrays were designed to be small, simple, and comprehensible enough that they would be easily implementable and optimizable. I believe that adding a StringView would contradict these goals.

# Allen Wirfs-Brock (12 years ago)

On Jan 10, 2014, at 10:26 AM, Anne van Kesteren wrote:

> Where is this from?

Don't know, I'm just creating visibility of the ES bug/feature request.

> Google and Mozilla have implemented the API from encoding.spec.whatwg.org as a means to get strings out of bytes (and bytes out of strings). It's not clear we need anything else.

Same base point applies to that proposal. If anybody wants this capability to be considered as a standard ES capability, it needs to have a champion within TC39. I note that (as would be expected) the whatwg encoding spec. is expressed in WebIDL terms (DOMStrings, etc.) and it isn't yet clear to me how well it integrates with ES, ES standard library conventions, or non-browser hosts. Perhaps it's fine to leave this as a web platform API, but support for character set encoding/decoding is a general-purpose capability and it might be reasonable to have a solution that isn't tied to a specific environment.

# Brendan Eich (12 years ago)

Kenneth Russell wrote:

> Adding a StringView to Typed Arrays would bring along all of the complexities of character set encoding and decoding to the Typed Array definitions. Typed Arrays were designed to be small, simple, and comprehensible enough that they would be easily implementable and optimizable. I believe that adding a StringView would contradict these goals.

+ a lot.

What can we do to make sure this thing stays dead, if it is dead? Anne may know some weird W3C protocol trick. ;-)

# Dwayne (12 years ago)

I disagree. I think this should progress. It doesn't have to add any additional functionality to Typed Arrays. As it stands I would consider it a replacement for the purposes of TextEncoder and TextDecoder APIs. Currently the Mozilla TextDecoder Web API does not accept ASCII as a valid encoding option and defaults to UTF-8, if left unspecified.

# Boris Zbarsky (12 years ago)

On 1/10/14 3:47 PM, Dwayne wrote:

> Currently the Mozilla TextDecoder Web API does not accept ASCII as a valid encoding option

I'm curious. What would you expect such an option to do? Byte-inflate like ISO-8859-1? Byte-inflate but throw on bytes with values > 127? Act as a synonym for ISO-8859-9? Something else?

> and defaults to UTF-8, if left unspecified.

Right, because it's meant for text, and for text UTF-8 is a pretty reasonable default nowadays.
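
To make the first two options concrete, here is a rough sketch of what each could mean. Both function names are hypothetical and neither is part of any standard API; index loops are used so the code does not depend on typed-array iteration support:

```typescript
// (a) Byte-inflate like ISO-8859-1: each byte becomes the code unit
//     with the same numeric value.
function byteInflate(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// (b) Byte-inflate, but reject any byte outside the 7-bit ASCII range.
function strictAsciiDecode(bytes: Uint8Array): string {
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] > 127) {
      throw new RangeError("byte at index " + i + " is not ASCII");
    }
  }
  return byteInflate(bytes);
}

const sample = new Uint8Array([0x48, 0x69, 0xe9]); // "H", "i", then 0xE9

const inflated = byteInflate(sample); // "Hi\u00e9"

let strictRejected = false;
try {
  strictAsciiDecode(sample);
} catch (e) {
  strictRejected = e instanceof RangeError;
}
console.log(strictRejected); // true
```

The two give different answers for the same input, which is why a bare "ascii" option is ambiguous.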

# Brendan Eich (12 years ago)

Dwayne wrote:

> Currently the Mozilla TextDecoder Web API does not accept ASCII as a valid encoding option and defaults to UTF-8, if left unspecified.

That's a feature.

The '90s are over, let's not go back.

Why do you want ASCII, and what do you do with it?

# Dwayne (12 years ago)

On Fri, Jan 10, 2014 at 3:14 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:

> I'm curious. What would you expect such an option to do? Byte-inflate like ISO-8859-1? Byte-inflate but throw on bytes with values > 127? Act as a synonym for ISO-8859-9? Something else?

Exactly how StringView handles the option now. If I generate a random string using byte values, then each char in that string should correspond to a single byte when specifying ISO-8859-1. It doesn't really make sense to use UTF-8 for bytes when that data should be manipulated as bytes in the first place. That is the case for data that is represented as a string but needs to be handled as bytes.

https://bugzilla.mozilla.org/show_bug.cgi?id=957424

UTF-8 being the default is not the problem of course. Throwing an exception for ASCII is.

# Dwayne (12 years ago)

UDP Datagrams.

# Boris Zbarsky (12 years ago)

On 1/10/14 4:29 PM, Dwayne wrote:

> Exactly how StringView handles the option now. If I generate a random string using byte values then each char in that string should correspond to a single byte when specifying the ISO-8859-1.

OK, so specify ISO-8859-1, if that's what you're really doing. Or are you saying that you just want "ascii" to be a synonym for "iso-8859-1" here? But it'd be a lie, because ASCII actually means something, and it means something different from ISO-8859-1.

But really, if you just have bytes, not text, why are you generating a string from those byte values at all? This is where a typed array would make more sense...

# Dwayne (12 years ago)

I mean char code points in the range 0-255, i.e. a byte. Use whatever terminology or name you prefer.

Primarily because of this bug -> Expose raw data on UDP socket messages:

https://bugzilla.mozilla.org/show_bug.cgi?id=952927

I generate a random string using code points that I eventually convert to bytes, specifically in the case of a two- or 20-char/byte ID, where I need to be able to use the entire 16-bit or 160-bit space, then send it as bytes and trust that the ID will be the same for both parties consistently. To elaborate, I need to bencode this information before converting to bytes. I understand all of this could be worked around by just using String.prototype.charCodeAt or String.prototype.codePointAt (equivalent for code points in this range), but why then have such a powerful API and disallow the aforementioned feature?

And why exactly have two separate APIs?
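
For reference, the `charCodeAt` workaround mentioned above could be sketched roughly as follows. The helper names are hypothetical, and the sketch assumes every code unit in the string is meant to be one byte:

```typescript
// Treat each UTF-16 code unit in the range 0-255 as exactly one byte.
function binaryStringToBytes(s: string): Uint8Array {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit > 0xff) {
      throw new RangeError("code unit at index " + i + " does not fit in a byte");
    }
    bytes[i] = unit;
  }
  return bytes;
}

// Inverse direction: one byte becomes one code unit of the same value.
function bytesToBinaryString(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

const id = "\x00\x7f\x80\xff";
const idBytes = binaryStringToBytes(id);            // bytes 0x00 0x7F 0x80 0xFF
const roundTripped = bytesToBinaryString(idBytes);
console.log(roundTripped === id); // true
```

No character encoding is involved at any point, so the values above 0x7F survive unchanged.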

# Brendan Eich (12 years ago)

Dwayne wrote:

> UDP Datagrams.

Use a Uint8Array and string decoding/encoding API. Browsers have to copy anyway, you're not "optimizing" by using the (soon to be dead, I hope) StringView.
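
A rough sketch of that approach for the ID use case described earlier in the thread: keep the 20-byte ID as bytes end to end and bencode it as a byte string (`<length>:<bytes>`) without ever turning the ID into a JS string. `makeRandomId` and `bencodeBytes` are hypothetical helpers, not taken from any code posted here:

```typescript
function makeRandomId(length: number): Uint8Array {
  const id = new Uint8Array(length);
  for (let i = 0; i < length; i++) {
    // Math.random keeps the sketch dependency-free; it is not suitable
    // where the ID must be unpredictable.
    id[i] = Math.floor(Math.random() * 256);
  }
  return id;
}

// Bencode a byte string: decimal length in ASCII, a colon, then the raw bytes.
function bencodeBytes(payload: Uint8Array): Uint8Array {
  const prefix = String(payload.length) + ":"; // ASCII digits and a colon only
  const out = new Uint8Array(prefix.length + payload.length);
  for (let i = 0; i < prefix.length; i++) {
    out[i] = prefix.charCodeAt(i);
  }
  out.set(payload, prefix.length);
  return out;
}

const nodeId = makeRandomId(20);
const encoded = bencodeBytes(nodeId); // 0x32 0x30 0x3A ("20:"), then the 20 ID bytes
console.log(encoded.length); // 23
```

Only the length prefix ever passes through string form, and it is pure ASCII, so no encoding label is needed.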

# Dwayne (12 years ago)

No joke. But as far as optimization goes I'm limited. You wrote the book so thanks for at least hearing me out. ;)

# Brendan Eich (12 years ago)

Dwayne wrote:

> Primarily because of this bug -> Expose raw data on UDP socket messages: https://bugzilla.mozilla.org/show_bug.cgi?id=952927

Answering for bz: why do you need string-views or string-anythings to wrangle bytes in and out of a Uint8Array? Can you show some code?

# Dwayne (12 years ago)

To compensate for the lack of a rawData property --> Bug 952927

Buffer is a Uint8Array which has non-standard methods on its prototype using a WeakMap technique. -->

This module will be used with BitTorrent PWP as well, so it's definitely necessary. https://github.com/DecipherCode/Firebit/blob/master/lib/dgram.js#L78

And here is a snippet covering the other reason(s): http://pastebin.mozilla.org/3986282

Thanks.

# Boris Zbarsky (12 years ago)

On 1/10/14 10:46 PM, Dwayne wrote:

> Compensate the lack of rawData property --> Bug 952927

Sure, but that should be fixed by adding such a property in this case, no? The only reason this is using a string is because it's using a somewhat braindead IDL (_way_ more braindead for purposes of JS sanity than WebIDL is) to expose C data structures pretty directly.

# Anne van Kesteren (12 years ago)

On Fri, Jan 10, 2014 at 7:00 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

> [...] it might be reasonable to have a solution that isn't tied to a specific environment.

Agreed. I have argued the same for URL parsing (http://url.spec.whatwg.org/) at some point.

As for the API in the Encoding Standard, I think the only strong tie to the DOM at this point is its usage of DOMException. I have a few times on this list tried to figure out what the right way forward is for exceptions within the web platform so that they still fit well within the ES universe, but none of those led anywhere satisfactory yet.

# Anne van Kesteren (12 years ago)

On Fri, Jan 10, 2014 at 9:40 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:

> OK, so specify ISO-8859-1, if that's what you're really doing.

Note that iso-8859-1 maps to windows-1252. There is an open bug on exposing a label to the API that has the "real" iso-8859-1 behavior: https://www.w3.org/Bugs/Public/show_bug.cgi?id=23971
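
The label mapping can be observed directly, assuming a runtime with a global `TextDecoder` that supports legacy encodings. The helper `decodeIso88591` below is a hypothetical hand-written stand-in for the "real" behavior, not an existing API:

```typescript
// The label "iso-8859-1" resolves to the windows-1252 encoding.
const latin1Decoder = new TextDecoder("iso-8859-1");
console.log(latin1Decoder.encoding); // "windows-1252"

// "Real" ISO-8859-1: byte N maps to code point U+00NN, with no exceptions.
function decodeIso88591(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// The two differ in the 0x80-0x9F range: windows-1252 assigns 0x80 to
// U+20AC (the euro sign), while ISO-8859-1 maps it to the control U+0080.
const viaHelper = decodeIso88591(new Uint8Array([0x80]));
console.log(viaHelper.charCodeAt(0).toString(16)); // "80"
```

So the "iso-8859-1" label does not give a byte-for-code-unit mapping, which is what the use case in this thread actually relies on.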

# Brendan Eich (12 years ago)

I think, based on the bugs and bz's advice, that Dwayne has been misled by a bad old pre-WebIDL API in Gecko -- there's no reason to do any string-viewing here. Certainly not punning bytes as points in a character set encoding.

# Allen Wirfs-Brock (12 years ago)

On Jan 11, 2014, at 6:13 AM, Anne van Kesteren wrote:

> As for the API in the Encoding Standard, I think the only strong tie to the DOM at this point is its usage of DOMException. I have a few times on this list tried to figure out what the right way forward is for exceptions within the web platform so that they still fit well within the ES universe, but none of those led anywhere satisfactory yet.

I don't see any occurrences of DOMException in http://encoding.spec.whatwg.org/

It seems to be throwing TypeError for parameter validation issues, which is what a TC39 spec. would generally do, except that for a few of those cases we might throw a RangeError instead.

There are a couple of places where a string such as "EncodingError" is thrown. We'd never do that and would use either TypeError or RangeError.

The major Web platform dependency I see is the use of DOMString and associated attributes such as [EnsureUTF16]. Those shouldn't be there for a host-environment-independent spec.

Finally, it seems likely that the subclassing contract for TextEncoder/TextDecoder hasn't been thought through, and I notice that the examples instantiate instances of them without using the new operator.

But overall, it shouldn't be hard to fix these things and make it completely independent of the web platform. It could drop quite nicely into the new TC39 process model if you wanted to go that route of standardization.
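
For comparison, here is a small sketch of how `TextDecoder` reports errors in implementations of the current Encoding Standard. This is an assumption about today's runtimes, not a description of the January 2014 draft discussed above, and `classify` is a hypothetical helper:

```typescript
// Run an action and report which built-in error class it threw, if any.
function classify(action: () => void): string {
  try {
    action();
    return "no error";
  } catch (e) {
    if (e instanceof RangeError) return "RangeError";
    if (e instanceof TypeError) return "TypeError";
    return "other";
  }
}

// An unsupported label is rejected by the constructor.
const badLabel = classify(() => new TextDecoder("no-such-encoding"));

// With { fatal: true }, malformed input makes decode() throw.
const badBytes = classify(() =>
  new TextDecoder("utf-8", { fatal: true }).decode(new Uint8Array([0xff]))
);

console.log(badLabel, badBytes); // "RangeError" "TypeError"
```

Note that both objects are created with `new`, as a constructor-based ES library design would expect.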

# Anne van Kesteren (12 years ago)

On Sat, Jan 11, 2014 at 5:27 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

> There are a couple places where a string such as "EncodingError" is thrown. We'd never do that and would use either TypeError or RangeError.

If you follow the link for "throw", you'll find it's a DOMException.

> The major Web platform dependency I see is the use of DOMString and associated attributes such as [EnsureUTF16]. Those shouldn't be there for a host environment independent spec.

Sure, that's easily mapped though. (The whole EnsureUTF16 thing is in need of fixing in IDL.)

> But overall, it shouldn't be hard to fix these things and make it completely independent of the web platform. It could drop quite nicely into the new TC39 process model if you wanted to go that route of standardization.

Agreed. I don't really have the bandwidth at the moment to work on this though. I have fixed the examples:

https://github.com/whatwg/encoding/commit/da5d1426a4e7ff7c7fea6724957b2c70df09bce4

# Allen Wirfs-Brock (12 years ago)

Another nit: the definition of "ASCII whitespace" is different from the definition of whitespace used by String.prototype.trim [1]. That means that an implementation of this spec written in JS couldn't use String.prototype.trim to process labels as described in the spec.

[1]: http://people.mozilla.org/~jorendorff/es6-draft.html#sec-string.prototype.trim 

Allen
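
To make the mismatch concrete, here is a minimal sketch. It assumes "ASCII whitespace" means exactly TAB, LF, FF, CR and SPACE, and `stripAsciiWhitespace` is a hypothetical helper name, not something taken from the Encoding spec. String.prototype.trim also removes characters such as U+000B and U+00A0, so the two can disagree on the same label:

```javascript
// Assumption: the five code points treated as "ASCII whitespace".
const ASCII_WHITESPACE = "\t\n\f\r ";

// Hypothetical helper: strip only ASCII whitespace from both ends of a label.
function stripAsciiWhitespace(label) {
  let start = 0;
  let end = label.length;
  while (start < end && ASCII_WHITESPACE.indexOf(label[start]) !== -1) start++;
  while (end > start && ASCII_WHITESPACE.indexOf(label[end - 1]) !== -1) end--;
  return label.slice(start, end);
}

// A label padded with NO-BREAK SPACE (U+00A0) and VERTICAL TAB (U+000B).
const padded = "\u00A0utf-8\u000B";

// trim() treats both characters as whitespace and removes them...
console.log(padded.trim() === "utf-8");
// ...while the ASCII-only helper leaves the label untouched.
console.log(stripAsciiWhitespace(padded) === padded);
// Both agree when the padding is plain ASCII whitespace.
console.log(stripAsciiWhitespace(" \tutf-8\r\n") === " \tutf-8\r\n".trim());
```

A JS implementation that simply called trim() would therefore accept labels that the ASCII-only rule leaves unmatched.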


# Brendan Eich (12 years ago)

This seems more than a nit!

/be

# Anne van Kesteren (12 years ago)

You cannot use that method for CSS, HTML, HTTP, etc. either. For this API we could have a different definition of whitespace I suppose, but e.g. for `<meta charset=...>` I doubt we could do that without risking breakage (or at the HTML parser level, say).


# Allen Wirfs-Brock (12 years ago)

I'm only talking about this specification and what it takes to decouple it from web platform dependencies. In this case, ASCII whitespace seems to only be used in processing the label parameter passed to the TextDecoder and TextEncoder constructors. So, it isn't clear how CSS or anything else is relevant to that.

Allen
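
For reference, a small sketch of the label processing in question, as I understand it to behave in runtimes that ship TextDecoder today (an assumption about current engines, not a statement about the draft under discussion). `labelIsAccepted` is a hypothetical helper used only for illustration:

```javascript
// The label is matched case-insensitively after surrounding ASCII whitespace
// is removed, so this resolves to the canonical encoding name.
const decoder = new TextDecoder(" UTF-8\n");
console.log(decoder.encoding);

// Hypothetical helper: report whether the constructor accepts a label.
function labelIsAccepted(label) {
  try {
    new TextDecoder(label);
    return true;
  } catch (e) {
    // The exact exception type is deliberately not inspected here.
    return false;
  }
}

console.log(labelIsAccepted("utf-8"));
// NO-BREAK SPACE is not ASCII whitespace, so it is not stripped and the
// label does not match any known encoding.
console.log(labelIsAccepted("\u00A0utf-8"));
```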

# Anne van Kesteren (12 years ago)

As I said, the algorithm used to get an encoding is also used by HTML, CSS et al. See e.g. http://dev.w3.org/csswg/css-syntax/#input-byte-stream and http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding

# Allen Wirfs-Brock (12 years ago)

So, just don't couple them. Do this in TC39 and apply a multiple app platform perspective.

Allen


# Stefano Gioffré (12 years ago)

Hi all! I'm the author of the API.

Stefano (alias "fusionchess")
