Suggestions for the proposed ByteArray in ES4

# Jeff Walden (19 years ago)

developer.mozilla.org/es4/proposals/bytearray.html

What was the rationale for disallowing use of array literals for initializing a ByteArray? As described now, there exists no way to seed a ByteArray with values other than by using some host functionality to create it (or, of course, by manually assigning). There's nothing wrong with that, of course, but it does basically make creating large byte arrays within JS hard unless you use an array and loop over it to copy it (an eminently unsatisfactory solution). Brendan's suggestion of ByteArray.fromArray would help, but without some good reason not to steal array syntax I think it unnecessarily adds extra characters and typing.

ByteArray instances certainly don't need all 30-odd methods included in the AS implementation (especially since the thrust of the AS ByteArray is basically to act like an input/output stream). However, ByteArray does need some methods to actually be usable; assigning index-by-index is painful and ugly. I think it makes sense to copy /some/ of the methods over from Array to ByteArray, keeping in mind how ByteArray would actually be used.

* push, pop
Since the length property still has the magical behavior of the length property of an Array, ByteArray is clearly still intended to have a dynamic size. The alternatives of directly assigning to the relevant index or getting the last element and decrementing the length are more verbose and probably slower without a peephole optimization. (I don't see shift/unshift as being useful as often, and I don't see the likely increased implementation complexity as good either, if you were to try to make it reasonably efficient.)

* concat
There doesn't seem to be a way to combine ByteArrays together. concat returns a new Array and thus isn't the right name, but I think some similar, mutating functionality (type-dispatching push?) is needed here. (Note also that if it were implemented, ByteArray.prototype.push.apply would take an Array as its second parameter and thus wouldn't work here.) For example, in Mozilla I would find this useful with our stream-listener interface, which reads bytes from a stream by some count at a time. If I want to assemble the bytes into one array (or even must assemble consecutive ByteArrays together because the data being parsed is broken in the middle of a token in the stream's data format), a concatenation method is necessary to avoid meaning-obscuring loops. It might be good to make this method accept Arrays as well, but I'm not entirely sure.

* indexOf, lastIndexOf
Parsing data formats which use some form of lookahead (e.g. reading bytes of an HTTP header looking for CRLF) becomes easier and faster with indexOf. Other data formats with different structures (e.g. the zip format and its end-of-central-directory record) would want lastIndexOf.

* toString
A straight mapping of code points to characters in a string is simple and more useful than "[object ByteArray]" or similar.
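
To make the method list above concrete, here is a rough sketch in plain JavaScript. It is not the proposed ES4 ByteArray: the class name `ByteBufferSketch`, the mutating `append` method (standing in for the "concat"-like operation), the helper `findCRLF`, and the choice to mask values to 0..255 are all assumptions made for illustration.

```javascript
// Hypothetical growable byte buffer; NOT the proposed ES4 ByteArray API.
class ByteBufferSketch {
  constructor(bytes = []) {
    this.bytes = [];
    this.append(bytes);
  }

  get length() { return this.bytes.length; }

  // push/pop mirror Array. Masking to 0..255 is an assumption of this sketch.
  push(...values) {
    for (const v of values) this.bytes.push(v & 0xff);
    return this.bytes.length;
  }

  pop() { return this.bytes.pop(); }

  // Mutating "concat": accepts another sketch buffer or any array-like.
  append(other) {
    const source = other instanceof ByteBufferSketch ? other.bytes : Array.from(other);
    for (const v of source) this.bytes.push(v & 0xff);
    return this;
  }

  indexOf(byte, from = 0) { return this.bytes.indexOf(byte, from); }

  lastIndexOf(byte, from = this.bytes.length - 1) {
    return this.bytes.lastIndexOf(byte, from);
  }

  // Straight mapping of byte values to code points.
  // (Spreading a very large buffer into fromCharCode may hit engine limits.)
  toString() { return String.fromCharCode(...this.bytes); }
}

// Lookahead example: locate the CRLF (13, 10) that ends a header line.
function findCRLF(buf) {
  let i = buf.indexOf(13);
  while (i !== -1) {
    if (i + 1 < buf.length && buf.bytes[i + 1] === 10) return i;
    i = buf.indexOf(13, i + 1);
  }
  return -1;
}

// Data arrives in chunks, with the CRLF split across two of them.
const header = new ByteBufferSketch([72, 111, 115, 116, 58]); // "Host:"
header.append([32, 97, 13]); // " a\r"
header.append([10]);         // "\n"
const crlfAt = findCRLF(header);
```

The point of `append` is that assembling stream chunks needs no hand-written copy loop at the call site, and `findCRLF` only works once the chunks have been joined.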

I can see arguments for some of SpiderMonkey's array iteration extensions (some/every/map/filter/forEach) and for the other standard ES array methods, but I can't come up with a real-world use case for any of them and have thus left them off; someone else can come up with them if desired. Since most of the Array methods are generic anyway, I see no reason you couldn't copy them over using some Array.prototype.*.apply magic. Beyond the methods mentioned here, tho, I really don't think you need much more functionality most of the time.

One last suggestion: ByteArray should support slice notation, with the result referencing the original ByteArray (i.e., mutating the result of a slice of a ByteArray modifies the original) so long as the step is 1 and the slice is not a copying slice (copy = byteArray[:]). I'm not sure how referential ByteArrays should interact with push/pop/"concat"-like creature, either on the new ByteArray or on the original. Perhaps modifying the length of a dependent ByteArray forces a copy, with any elements which may have been removed from the original ByteArray replaced by 0? It'd be a little complicated, but it might work.

Jeff

# Steven Johnson (19 years ago)

On 1/17/07 8:46 PM, "Jeff Walden" <jwalden at MIT.EDU> wrote:

> http://developer.mozilla.org/es4/proposals/bytearray.html
> 
> What was the rationale for disallowing use of array literals for initializing
> a ByteArray?  As described now, there exists no way to seed a ByteArray with
> values other than by using some host functionality to create it (or, of
> course, by manually assigning).

I think the concern was one of typing: given the expression [ 1, 2, 3 ],
what type is it, Array or ByteArray? ( Perhaps we could use an explicit type
annotation, e.g., [1,2,3]:ByteArray ... ) The point being that Array and
ByteArray aren't interchangeable (Array provides much more functionality).

I suppose we could allow array literals to be coerced to ByteArray, though,
and it could probably be optimized at compiletime in many cases.

But yeah, SOME way to do literal initialization would be a good idea.

> ByteArray instances certainly don't need all 30-odd methods included in the AS
> implementation (especially since the thrust of the AS ByteArray is basically
> to act like an input/output stream).  However, ByteArray does need some
> methods to actually be usable; assigning index-by-index is painful and ugly.
> I think it makes sense to copy /some/ of the methods over from Array to
> ByteArray, keeping in mind how ByteArray would actually be used.

[snip] Agreed. The proposal was written in a deliberately minimal form to
try to keep implementation cost low and thus increase the likelihood of
acceptance, but the methods you mention sound worthwhile and of tiny
overhead.

(The AS methods are quite useful, though :-)

> One last suggestion: ByteArray should support slice notation, with the result
> referencing the original ByteArray (i.e., mutating the result of a slice of a
> ByteArray modifies the original) so long as the step is 1 and the slice is not
> a copying slice (copy = byteArray[:]).

Oooh, yeah, that sounds like a good idea.

# Brendan Eich (19 years ago)

On Jan 17, 2007, at 8:46 PM, Jeff Walden wrote:

> I can see arguments for some of SpiderMonkey's array iteration  
> extensions (some/every/map/filter/forEach) and for the other  
> standard ES array methods, but I can't come up with a real-world  
> use case for any of them and have thus left them off; someone else  
> can come up with them if desired.  Since most of the Array methods  
> are generic anyway, I see no reason you couldn't copy them over  
> using some Array.prototype.*.apply magic.  Beyond the methods  
> mentioned here, tho, I really don't think you need much more  
> functionality most of the time.

Note that the Array and String methods that do not constrain their
|this| parameter's type -- most methods on Array.prototype and  
String.prototype, the exceptions are toString, toLocaleString, and  
valueOf -- are available as static methods of the same name, with a  
leading explicit |this|-object parameter: Array.indexOf(arraylike,  
value) returns -1 or the index of the first occurrence of value in  
arraylike.

These methods work without optimization on any array-like object (one  
having a length:uint32 property that limits non-negative integral  
property identifiers, whether dense or sparse).  So they'll work on  
ByteArray out of the box.  But it would be even more convenient if  
ByteArray instances delegated to Array.prototype (or to optimized  
versions of appropriate methods on ByteArray.prototype, or even  
fixture methods declared in class ByteArray).
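
A small sketch of the genericity described above, in standard JavaScript. The static form `Array.indexOf(arraylike, value)` is not assumed to be available in any particular engine, so this uses `Array.prototype.*.call` instead; the sample objects and variable names are made up for illustration, and `Uint8Array` merely stands in for a byte container.

```javascript
// An array-like object: a length property plus non-negative integer keys.
const arraylike = { length: 4, 0: 13, 1: 10, 2: 65, 3: 10 };

// The generic Array methods only need `length` and indexed properties on
// their |this| value, so they can be borrowed with Function.prototype.call.
const first = Array.prototype.indexOf.call(arraylike, 10);
const last = Array.prototype.lastIndexOf.call(arraylike, 10);
const doubled = Array.prototype.map.call(arraylike, (b) => b * 2);

// The same borrowing works on a typed array standing in for a ByteArray.
const typedHit = Array.prototype.indexOf.call(new Uint8Array([1, 2, 3]), 2);
```

Here `first` and `last` are the positions of the first and last 10, and `doubled` is an ordinary Array, since the borrowed `map` does not know about the receiver's type.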

Proposal: make ByteArray a subclass of Array.  ByteArray overrides  
the non-generic toString and toLocaleString methods.  Implementations  
may optimize by overriding the remaining generic methods, or delegate  
to the generic Array implementations.

> One last suggestion: ByteArray should support slice notation, with
> the result referencing the original ByteArray (i.e., mutating the
> result of a slice of a ByteArray modifies the original) so long as
> the step is 1 and the slice is not a copying slice (copy =
> byteArray[:]).

Slicing is generic, it maps onto slice(start[, stop[, step]]) and  
setSlice(value, start[, stop[, step]]) instance methods.

However, dependent slices are a bad idea; mutation via a shared  
reference is a hazard.  Are you prematurely optimizing?
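
To illustrate the hazard with something runnable: JavaScript typed arrays (used here only as a stand-in, not as the ES4 ByteArray) offer both behaviours. `subarray` returns a view that shares storage with the original, while `slice` returns an independent copy.

```javascript
const original = new Uint8Array([1, 2, 3, 4, 5]);

const view = original.subarray(1, 4); // dependent: shares storage
const copy = original.slice(1, 4);    // independent copy

// A write through the dependent view is visible in the original...
view[0] = 99;

// ...so any code holding `original` sees original[1] change underneath it,
// while `copy` keeps the old value.
const seenInOriginal = original[1];
const seenInCopy = copy[0];
```

Whether that sharing is a convenience or a bug depends on who else holds a reference to `original`, which is exactly the concern raised above.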

/be

# Brendan Eich (19 years ago)

On Jan 18, 2007, at 9:53 AM, Steven Johnson wrote:

> I think the concern was one of typing: given the expression [ 1, 2, 3 ],
> what type is it, Array or ByteArray? ( Perhaps we could use an explicit type
> annotation, e.g., [1,2,3]:ByteArray ... ) The point being that Array and
> ByteArray aren't interchangeable (Array provides much more functionality).

See previous message from me about Array methods being almost all  
generic, intentionally so since ES1. Would it break AS3's ByteArray  
to have it inherit from Array?

The current ES4 grammar can't parse ''[1,2,3]:ByteArray'', because it  
wants a structural array type after the '':''. I believe that's a  
bug, because we want to allow abstraction via defined type names, at  
least.  If that bug were fixed, then the question becomes: what is  
the meaning of the expression if the type after the : is a nominal  
one such as ByteArray?  In a definition, e.g. ''let b:ByteArray =  
[1,2,3];'' the meaning of the type annotation is to convert the  
initializer as if by the ''to'' operator
(http://developer.mozilla.org/es4/spec/chapter_6_types.html).

So given a ''function to ByteArray(v) {...}'' in class ByteArray, the  
UI issue Jeff raises would be addressed.  An implementation could  
optimize if it knew the initializer was not referenced elsewhere.
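
The ''to'' operator itself cannot be run outside an ES4 implementation, but the conversion such a method would perform can be sketched as an ordinary JavaScript function. Everything here is an assumption for illustration: the name `toByteArray`, the use of `Uint8Array` as the target type, and the choice to reject out-of-range values rather than wrap them.

```javascript
// Hypothetical converter standing in for ''function to ByteArray(v) {...}''.
function toByteArray(v) {
  // Already the target type: no conversion needed.
  if (v instanceof Uint8Array) return v;

  if (!Array.isArray(v)) {
    throw new TypeError("cannot convert value to a byte array");
  }

  // Assumption: each element must be an integer in 0..255.
  for (const x of v) {
    if (!Number.isInteger(x) || x < 0 || x > 255) {
      throw new RangeError("element out of byte range: " + x);
    }
  }
  return Uint8Array.from(v);
}

// Roughly what ''let b:ByteArray = [1,2,3];'' would ask for.
const b = toByteArray([1, 2, 3]);
```

This always builds the intermediate Array first; skipping it is the optimization mentioned above, which only the implementation can do.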

> I suppose we could allow array literals to be coerced to ByteArray, though,
> and it could probably be optimized at compiletime in many cases.

This all "just works", except for the optimization, if we provide a  
''to'' operator method.

(BTW, it seems verbose to have ''function to C() {...}'' in class C  
in order to customize the ''to'' operator. I mean that the repeated  
class name is unnecessary if ''to'' is a reserved identifier [and it  
seems that it must be reserved in ES4; existing JS on the web uses  
"to" for variable and parameter names, so there's no way around the  
incompatibility here]. If instead, the overriding form were ''static  
function intrinsic::from(v) {...}'' then we wouldn't need the  
repeated typename or the magic ''to'' prefixing it, following  
''function''. On the other hand, ''from'' opposing ''to'' is a little  
obscure.)

/be

# Jeff Dyer (19 years ago)

> The current ES4 grammar can't parse ''[1,2,3]:ByteArray'', because it 
> wants a structural array type after the '':''. I believe that's a 
> bug...
 
It's a bug. Updated grammar posted at: http://compilercompany.com/es4/grammar.pdf

# Steven Johnson (19 years ago)

> See previous message from me about Array methods being almost all
> generic, intentionally so since ES1. Would it break AS3's ByteArray
> to have it inherit from Array?

We'd have to play some games with the internal representation (since
ByteArray has a much more restricted one, obviously), but otherwise I think
it would work. I'd have to review some stuff to be sure.

# Jeff Walden (19 years ago)

Brendan Eich wrote:
> So given a ''function to ByteArray(v) {...}'' in class ByteArray, the UI 
> issue Jeff raises would be addressed.  An implementation could optimize 
> if it knew the initializer was not referenced elsewhere.

This was basically the way I supposed conversion would happen, yes.  A bare array literal would still have type Array unless it was explicitly annotated as ByteArray.  (The optimization to not create the Array was also something I assumed would be possible.)

> (BTW, it seems verbose to have ''function to C() {...}'' in class C in 
> order to customize the ''to'' operator. I mean that the repeated class 
> name is unnecessary if ''to'' is a reserved identifier [and it seems 
> that it must be reserved in ES4; existing JS on the web uses "to" for 
> variable and parameter names, so there's no way around the 
> incompatibility here].

How is this different from |for each (var foo in x)| versus |for (var foo in x)|, where single token lookahead resolves the iterating-keys and iterating-values cases?  In this case if the token after 'to' is an identifier it's a conversion method, and if it's a left-paren it's an ordinary method named 'to'.  The same holds for |var a = x to T;| versus |var a = x;|, and |var v to T = x| versus |var v = x| (can't find this syntax in wiki but didn't look hard -- from http://weblogs.mozillazine.org/roadmap/archives/2005/11/js2.html in case it's actually relevant).  Are there any others I've missed?
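
A toy sketch of that one-token lookahead, in JavaScript. It is not taken from any real ES4 grammar or parser: the function name `classifyFunctionHeader`, the pre-split token array, and the returned object shape are all hypothetical.

```javascript
// `tokens` holds the tokens that follow the `function` keyword.
function classifyFunctionHeader(tokens) {
  const [first, second] = tokens;
  const isIdentifier =
    typeof second === "string" && /^[A-Za-z_$][\w$]*$/.test(second);

  // `function to T(...)`: the token after `to` is an identifier.
  if (first === "to" && isIdentifier) {
    return { kind: "conversion", target: second };
  }
  // `function to(...)` or any other name: an ordinary function.
  return { kind: "ordinary", name: first };
}

const conversion = classifyFunctionHeader(["to", "ByteArray", "(", "v", ")"]);
const plainTo = classifyFunctionHeader(["to", "(", ")"]);
const other = classifyFunctionHeader(["foo", "(", ")"]);
```

Only the token immediately after `to` is inspected, which is the sense in which a single token of lookahead is enough for this case.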

(This is not to say that I think introducing a context keyword is good.  I think I'd want it treated similar to the way let and yield are treated by SpiderMonkey, where using either as an argument or identifier turns off their use in the new ways, as an implementation-dependent option, but where they're still keywords in ES4 code.)

> If instead, the overriding form were ''static 
> function intrinsic::from(v) {...}'' then we wouldn't need the repeated 
> typename or the magic ''to'' prefixing it, following ''function''. On 
> the other hand, ''from'' opposing ''to'' is a little obscure.)

I hesitate to ask here and not start a different thread, but why does class C contain |function to C(v) { ... }| to specify the to operator for |var q = x to T;|, or do I misunderstand?  I'd have expected |function to T(v)| repeated for any T desired, with |function to *(v) { ... }| as a final specifiable fallback.  The way I understand it now, |function to C| is probably going to contain a series of if-else if-else statements for |v is T1|, |v is T2|, etc.

> mutation via a shared reference is a hazard.  Are you prematurely optimizing?

Possibly so in some situations, possibly not in others.  If I'm decoding a zip, have its bytes in a ByteArray, and have the index and size of a large compressed file within it, it would be convenient to be able to pass a slice rather than the ByteArray (presumably ByteArray as an object is passed by reference) and a start and end index (and also have to do this for all byte ranges in the data).  The latter is at least a workaround if ByteArrays are mutably sliced, and it's not entirely intolerable.  At the worst you could create a DependentByteArray class that wrapped a ByteArray instance and define enough operators to make their use transparent, I suspect.
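
A minimal sketch of such a wrapper in plain JavaScript. Since operator overloading isn't available here, explicit `get`/`set` methods stand in for the transparent indexing imagined above; the class name `DependentByteArraySketch`, the `Uint8Array` backing store, and the sample bytes are assumptions for illustration.

```javascript
// Hypothetical wrapper: a window [start, end) onto a shared backing store.
class DependentByteArraySketch {
  constructor(backing, start, end) {
    if (start < 0 || end > backing.length || start > end) {
      throw new RangeError("bad range");
    }
    this.backing = backing;
    this.start = start;
    this.length = end - start;
  }

  check(i) {
    if (!Number.isInteger(i) || i < 0 || i >= this.length) {
      throw new RangeError("index out of range: " + i);
    }
  }

  // Reads and writes are translated into the backing store's coordinates.
  get(i) { this.check(i); return this.backing[this.start + i]; }
  set(i, byte) { this.check(i); this.backing[this.start + i] = byte; }
}

// Pass one object for a byte range instead of (array, start, end).
const zipBytes = new Uint8Array([80, 75, 3, 4, 10, 20, 30, 40]);
const entry = new DependentByteArraySketch(zipBytes, 4, 8);
entry.set(0, 11); // also changes zipBytes[4]
```

The wrapper keeps the sharing explicit in the type, which may soften the shared-mutation concern without copying the data.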

Jeff

# Igor Bukanov (19 years ago)

On 19/01/07, Jeff Walden <jwalden at mit.edu> wrote:
> (This is not to say that I think introducing a context keyword is good.  I think I'd want it treated similar to the way let and yield are treated by SpiderMonkey, where using either as an argument or identifier turns off their use in the new ways, as an implementation-dependent option, but where they're still keywords in ES4 code.)

This was the case in SpiderMonkey for a short period of time during
Firefox 2.0 development. Before the release, let and yield were turned
into unconditional keywords. They require an explicit js1.7 version
setting to be active. See
https://bugzilla.mozilla.org/show_bug.cgi?id=351515 for the reasons.

Regards, Igor