typed array strawman proposal
On 2010-01-26, at 13:43, Vladimir Vukicevic wrote:
Howdy,
At Brendan's request, I've just added a new strawman proposal for ES typed arrays to the wiki. This proposal comes from the WebGL group, which needed a way to efficiently access and manipulate native machine-type arrays; once we came up with a reasonable baseline API, it looked like something that would be generally useful as more interop and performance demands are placed on ES. "Typed arrays" is probably not the best name, but that's an easy bikeshed.
The strawman page is at strawman:typed_arrays and the associated [rough] draft spec is available at people.mozilla.com/~vladimir/jsvec/TypedArray-spec.html
Thoughts & comments appreciated; not sure if this is something that could be considered for the core language, or for a "standard library" of sorts -- feels more like something that should go in the latter.
So, like Lisp "displaced arrays" (www.lispworks.com/documentation/lw50/CLHS/Body/f_mk_ar.htm)?
On Tue, Jan 26, 2010 at 10:43 AM, Vladimir Vukicevic <vladimir at mozilla.com> wrote:
Howdy,
At Brendan's request, I've just added a new strawman proposal for ES typed arrays to the wiki. This proposal comes from the WebGL group, which needed a way to efficiently access and manipulate native machine-type arrays; once we came up with a reasonable baseline API, it looked like something that would be generally useful as more interop and performance demands are placed on ES. "Typed arrays" is probably not the best name, but that's an easy bikeshed.
I've made three similar proposals over in CommonJS.
wiki.commonjs.org/wiki/Binary/B, wiki.commonjs.org/wiki/Binary/D, wiki.commonjs.org/wiki/Binary/E
ArrayBuffer is a strict subset of what I'm calling ByteArray (which incidentally fits the TypeArray pattern you establish in your proposal). I think we could converge these proposals. Yours is most similar to Binary/B. Binary/D adds some bit types and improves the details a bit. Binary/E abandoned the notion that binary types could become natives in CommonJS, removed the bit types, and reduced the API significantly again.
To get closer to your proposal, it would probably be easier to start from yours and add rather than start from one of mine and subtract. I recommend ByteArray for the name. It matches the pattern you establish for Integer Arrays. Secondly, I think "slice" needs to return an immutable fixed-length type, or ought not to be included. Daniel Friesen proposed a "Range" type and a "range" method in Binary/C, which I incorporated in D and E. "range" is like slice but returns a view of the underlying buffer region and supports the relevant subset of the ByteArray API. With "range", there is no expectation of indefinite consistency. I think the various aligned typed arrays are a good idea, but I think CommonJS would be satisfied if we were to agree on the byte array buffer type initially.
Kris Kowal
For this proposal, I would not use the term ByteArray. It may lead to confusion with existing definitions of ByteArray, such as in ActionScript, where a ByteArray is a wrapper around arbitrary binary data. The ActionScript ByteArray APIs are file-oriented, with members like readInt() or writeUTF(). A Google search for ByteArray returns many hits for the ActionScript ByteArray; it is a widely accepted construct for wrapping binary data.
In .NET, a ByteArray is not a distinct object, but just a blob of byte data in the context of data transfers between a device and a PC.
I am unsure about the intent of JS ByteArrays. To me, the proposal looks like an Array implementation, where all elements are guaranteed to be of the same type. If this is the intent, then fine. But if the intent is to offer wrappers around native arrays, which would actually be memory areas of elements of the same type, why stick to the Array notation and semantics? Why not define a Vector object that may share a subset of the Array functionality, but that would be guaranteed to be a fast and memory-saving implementation?
Please enlighten me.
Michael
Rather than an arbitrary subset of sizes (Int32Array, etc.), I would prefer to see some kind of generic ArrayMapping or ArrayVector that takes another array and the size of each cell (position of the array) in bits as an argument.
So.. new ArrayMapping(arrBuf, intBits, intStart, intFinish);
That way you're not imposing whatever specific need you have right now (in terms of cell size) on anyone that would have a slightly different need for the same principle.
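A rough sketch of what I mean, purely illustrative (ArrayMapping is hypothetical; the arguments are just the ones named above):

var buf = new ArrayBuffer(16);                // raw storage, as in the strawman
var cells = new ArrayMapping(buf, 16, 0, 8);  // hypothetical: 8 cells of 16 bits, from index 0
cells[0] = 0x1234;                            // the engine would pick the load/store width at run time
var first = cells[0];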
I don't suppose the fact has to be pointed out that this is already perfectly possible to do? Create a map or vector on an array and use bitwise operators to get whatever size you desire? I'm not sure whether the WebGL group is asking for a new feature, or just something that would be faster than the aforementioned solution...
To be honest, I wouldn't really mind a ByteArray kind of object (I don't even really care about the name), but I do oppose some kind of strict typing. It doesn't seem to fit the language...
Peter
On Jan 27, 2010, at 12:20 AM, Michael Daumling wrote:
I am unsure about the intent of JS ByteArrays. To me, the proposal
looks like an Array implementation, where all elements are
guaranteed to be of the same type. If this is the intent, then fine.
But if the intent is to offer wrappers around native arrays, which
would actually be memory areas of elements of the same type, why
stick to the Array notation and semantics? Why not define a Vector
object that may share a subset of the Array functionality, but that
would be guaranteed to be a fast and memory-saving implementation?
The WebGL group has aimed at the latter design. However, they've used
the "Array" name. I agree it's not ideal given the JS meaning of
"Array", which is much more complicated (properties with arbitrary
string-equated names, extensible .length, truncation via .length,
prototype-based delegation, holes which can be "seen through", the
list goes on).
"Buffer" is a better name, so perhaps ByteBuffer would be best, but
then the Int32Array, etc. names do not refer to the buffer-type name-
part "Array". One could imagine Int32Vector : VectorBuffer but the
latter seems redundant or incorrect (not a buffer of vectors, not a
vector of buffers). But let's not get hung up on names, yet, if the
semantics need attention.
Apart from the name, the indexing notation and .length seem
unassailable, even though .length is fixed.
So naming aside, what you ask for seems like what the WebGL group have
specified: efficient wrappers for native arrays.
On Jan 27, 2010, at 8:16 AM, Peter van der Zee wrote:
new ArrayMapping(arrBuf, intBits, intStart, intFinish);
The WebGL use-case cannot tolerate scaling by a variable "intBits"
element width. It wants constant (compile-time) element size.
Moreover, what you propose is strictly less usable than common machine-
typed views of byte vectors. So if something like the above were the
only API, there would soon be a set of standard wrappers. Let's avoid
requiring that wheel to be reinvented in the ecosystem.
I don't suppose the fact has to be pointed out that this is already
perfectly possible to do? Create a map or vector on an array and use bitwise
operators to get whatever size you desire?
No, JS Arrays are not usable as is. Besides all the problems with
prototypes, extensibility, .length mutability, and holes (holes really
do make it hard to optimize), JS Arrays hold values, values include
numbers, and numbers are (or appear to be) IEEE754 double precision
floating point.
Making JS users write bitwise ops to force things back to int32 or
uint32 (or narrower) does not address the common float (32-bit) case
in graphics hardware and *GL libraries. It also requires great care
and effort on the part of programmers -- and then the optimizing JS
engine implementers have to pattern-match on all the bitwise ops to
make it efficient.
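For concreteness, the bitwise clamping pattern in question looks roughly like this (just a sketch on a plain Array; the variable names are made up):

var someValue = 3.7;
var a = [];                  // ordinary JS Array holding doubles
a[0] = someValue | 0;        // | 0 truncates to int32 (3)
a[1] = (-1) >>> 0;           // >>> 0 reinterprets as uint32 (4294967295)
var x = a[0] | 0;            // reads must be re-clamped as well
// and there is no comparable bitwise idiom that produces a 32-bit float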
But even if that mistaken path were taken, and the bulk of users taxed
heavily along with implementors, and ignoring holes and other
problems, the semantics of doubles remain. That is, if someone fails
to use the right bitwise clamping, they will still be able to read and
write doubles. This too is not tolerable when dealing with graphics
hardware.
On 2010-01-27, at 12:17, Brendan Eich wrote:
On Jan 27, 2010, at 8:16 AM, Peter van der Zee wrote:
new ArrayMapping(arrBuf, intBits, intStart, intFinish);
The WebGL use-case cannot tolerate scaling by a variable "intBits" element width. It wants constant (compile-time) element size.
Really? In these days of the JIT-compiler, when is compile-time? I'd rather see a simple interface like the above, or like Lisp's displaced arrays, and just know that if I pick "normal" byte-sizes and offsets, I'm likely to trigger an optimization that turns my indirect array access into just a few hardware instructions (just a load if I am lucky, otherwise load/shift/mask).
On Jan 27, 2010, at 9:32 AM, P T Withington wrote:
On 2010-01-27, at 12:17, Brendan Eich wrote:
On Jan 27, 2010, at 8:16 AM, Peter van der Zee wrote:
new ArrayMapping(arrBuf, intBits, intStart, intFinish);
The WebGL use-case cannot tolerate scaling by a variable "intBits"
element width. It wants constant (compile-time) element size.
Really?
Yes, really.
In these days of the JIT-compiler, when is compile-time?
It's when the C++ compiler compiles and optimizes the native code that
actually stuffs memory with machine-typed ints, more efficiently than
any JIT. You can't write such code in JS, anyway (unsafe), and if you
could we would still want ahead-of-time compilation.
Ok, say you implemented these types using an unsafe JS dialect, with
AOT compilation. Then you'd still want constants. TraceMonkey can make
such parameters constant on trace, but not all implementations can.
Anyway, we do not want to require exotic techniques. We want to allow
C++ implementations, which require constants to avoid obvious
performance hits for no good reason. Competition will kill any browser
foolish enough to take such hits.
On 2010-01-27, at 13:06, Brendan Eich wrote:
Anyway, we do not want to require exotic techniques. We want to allow C++ implementations, which require constants to avoid obvious performance hits for no good reason. Competition will kill any browser foolish enough to take such hits.
That seems inconsistent with the philosophy that classes can be syntactic sugar on closures that will be magically optimized.
I'm just seconding the suggestion that you could have a flexible syntax that would allow specifying any byte width and offset and optimize the cases where the byte width and offset are known at compile time. Instead of guessing which of the n*m combinations you should cast in stone as built-in types.
On Jan 27, 2010, at 10:15 AM, P T Withington wrote:
On 2010-01-27, at 13:06, Brendan Eich wrote:
Anyway, we do not want to require exotic techniques. We want to
allow C++ implementations, which require constants to avoid obvious
performance hits for no good reason. Competition will kill any
browser foolish enough to take such hits.That seems inconsistent with the philosophy that classes can be
syntactic sugar on closures that will be magically optimized.
That philosophy cannot possibly implement native GPU-oriented fixed-size machine-int-element-typed arrays!
I'm just seconding the suggestion that you could have a flexible
syntax that would allow specifying any byte width and offset and
optimize the cases where the byte width and offset are known at
compile time. Instead of guessing which of the n*m combinations you
should cast in stone as built-in types.
Generality is a disease in design. "It would be nice" is a malediction.
On 2010-01-27, at 13:17, Brendan Eich wrote:
On Jan 27, 2010, at 10:15 AM, P T Withington wrote:
On 2010-01-27, at 13:06, Brendan Eich wrote:
Anyway, we do not want to require exotic techniques. We want to allow C++ implementations, which require constants to avoid obvious performance hits for no good reason. Competition will kill any browser foolish enough to take such hits.
That seems inconsistent with the philosophy that classes can be syntactic sugar on closures that will be magically optimized.
That philosophy cannot possibly implement native GPU-oriented fixed-size machine-int-element-typed arrays!
I must be missing something. You can optimize:
new Int16Array(foo, n)
but not:
new IndirectArray(foo, 'int', 16, 0);
?
On Jan 27, 2010, at 10:36 AM, P T Withington wrote:
On 2010-01-27, at 13:17, Brendan Eich wrote:
On Jan 27, 2010, at 10:15 AM, P T Withington wrote:
On 2010-01-27, at 13:06, Brendan Eich wrote:
Anyway, we do not want to require exotic techniques. We want to
allow C++ implementations, which require constants to avoid
obvious performance hits for no good reason. Competition will
kill any browser foolish enough to take such hits.
That seems inconsistent with the philosophy that classes can be
syntactic sugar on closures that will be magically optimized.
That philosophy cannot possibly implement native GPU-oriented fixed-size machine-int-element-typed arrays!
I must be missing something. You can optimize:
new Int16Array(foo, n)
but not:
new IndirectArray(foo, 'int', 16, 0);
?
Not using classes desugaring to closures, no. I mean "no" to either of
the above. Constant is not the issue -- you changed the argument to
one of desugaring to closures.
Closures for hiding private variables may be optimized pretty
efficiently but there's a separate argument about the ultimate
performance of classes not desugared to closures, rather compiled to
more traditional runtime organizations. But that's yet another argument.
My point is that without machine types in JS, you can desugar and
lambda-code until the cows come home but you'll never bottom out in
the necessary machine instructions.
Only native code can do this currently, whether generated by a C++
compiler or even a crazy pattern-matching compiler that tries to
recognize bitwise ops clamping doubles stored in JS Arrays. The
latter, besides being crazy, does not prevent the doubles from
leaking, and again it is not something the spec should cater to, let
alone mandate.
Since we want to allow C++ implementations, indeed we expect them to
predominate, we need constant machine-int-type widths. We do not want
to add machine int types to JS itself -- I've seen no proposals
lately, and I'm opposed based on complexity and safety (silent
wraparound vs. overflow to double, but not an overflow exception).
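To spell out that safety point with a sketch (plain JS today versus hypothetical machine-int semantics):

var n = 0x7fffffff;      // 2147483647, the int32 maximum
n + 1;                   // in JS today: 2147483648, overflowing into a double
(n + 1) | 0;             // with int32 wraparound semantics: -2147483648, silently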
On Jan 27, 2010, at 10:41 AM, Brendan Eich wrote:
Only native code can do this currently, whether generated by a C++
compiler or even a crazy pattern-matching compiler that tries to
recognize bitwise ops clamping doubles stored in JS Arrays. The
latter, besides being crazy, does not prevent the doubles from
leaking, and again it is not something the spec should cater to, let
alone mandate.
To avoid confusion, I mean "leak" as in "abstraction leak", not
"memory leak".
So long as some crazy JIT pattern matches clichés involving bitwise
ops before stores and after loads involving certain Array instances
(ahem, TraceMonkey does a bit of this for crypto code clichés), you
could argue that you can desugar ("sour") to closures plus bitwise ops
and plain old Arrays. But as noted, nothing guarantees that someone
won't write double into such an Array, and read it back.
This rules out such implementations as conforming to any spec such as
the WebGL "typed arrays" spec. We cannot allow such abstraction leaks.
On 1/26/2010 11:25 PM, Kris Kowal wrote:
On Tue, Jan 26, 2010 at 10:43 AM, Vladimir Vukicevic <vladimir at mozilla.com> wrote:
Howdy,
At Brendan's request, I've just added a new strawman proposal for ES typed arrays to the wiki. This proposal comes from the WebGL group, which needed a way to efficiently access and manipulate native machine-type arrays; once we came up with a reasonable baseline API, it looked like something that would be generally useful as more interop and performance demands are placed on ES. "Typed arrays" is probably not the best name, but that's an easy bikeshed.
I've made three similar proposals over in CommonJS.
wiki.commonjs.org/wiki/Binary/B, wiki.commonjs.org/wiki/Binary/D, wiki.commonjs.org/wiki/Binary/E
ArrayBuffer is a strict subset of what I'm calling ByteArray (which incidentally fits the TypeArray pattern you establish in your proposal). I think we could converge these proposals. Yours is most similar to Binary/B. Binary/D adds some bit types and improves the details a bit. Binary/E abandoned the notion that binary types could become natives in CommonJS, removed the bit types, and reduced the API significantly again.
It's actually somewhat different -- ArrayBuffer is just a raw buffer store with no ability to access its data directly. One of the /Type/Array objects needs to be created to access it, in this case an Int8Array (or Uint8Array, depending on what you mean by "Byte").
We briefly considered making ArrayBuffer and Int8/Uint8Array the same, and letting you layer views on top of *8Array buffers, but it became more of a pain from an API design point of view. Having a separate "opaque buffer" type really simplifies both this API and any methods that want to use it -- if they don't care about the type of data, and just deal with raw binary buffers (e.g. network send/receive, file read, etc.) they can expect an ArrayBuffer. If they have an explicit need for a particular type (e.g. WebGL expecting an array of 16-bit ints), they can require the specific type(s).
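In code, the separation looks roughly like this (a sketch using the strawman's names; which byte ends up first depends on host endianness):

var buf = new ArrayBuffer(16);      // opaque storage; no direct element access
var bytes = new Uint8Array(buf);    // byte-level view, e.g. for network or file code
var shorts = new Int16Array(buf);   // 16-bit view over the same storage
shorts[0] = 0x0102;
// bytes[0] and bytes[1] now hold 1 and 2 in some order, per the host's byte order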
To get closer to your proposal, it would probably be easier to start from yours and add rather than start from one of mine and subtract. I recommend ByteArray for the name. It matches the pattern you establish for Integer Arrays. Secondly, I think "slice" needs to return an immutable fixed-length type, or ought not to be included.
Everything is fixed-length here, though I'm not sure what you mean by
immutable -- ES Arrays' slice returns a mutable array. The issue with
slice() as currently specified on /Type/Array is that it returns a view
of the same underlying ArrayBuffer, so changes in the sliced array are
visible in the original. This is inconsistent with ES arrays, and could
be a source of confusion. Naming this range() as you suggest is
probably much better, and perhaps retaining slice() to make a copy.
However, with the API as written, the current slice() (i.e. range()) is
trivially implemented already (ignoring negative indices and range checking, but...):
new Uint8Array(origArray.buffer, origArray.byteOffset + startIndex, endIndex - startIndex);
And assuming range(), a real slice() would be: new Uint8Array(origArray.slice(start, end));
On Jan 27, 2010, at 10:56 AM, Vladimir Vukicevic wrote:
new Uint8Array(origArray.buffer, origArray.byteOffset + startIndex,
endIndex - startIndex);
And assuming range(), a real slice() would be: new
Uint8Array(origArray.slice(start, end));
Confirmed with Vlad that the last line's bit to the right of the :
should be
new Uint8Array(origArray.range(start, end));
Apologies for not replying to the thread properly; just joined the list.
The typed array proposal Vlad sent out covers the WebGL use case and others which assemble data on the CPU to be handed to, for example, a graphics or audio subsystem.
I've seen some feedback that the proposal misses an important use case of parsing binary data read from the network. In such cases the endianness of the data is defined by the file format. The typed arrays operate on native-endian data (big-endian or little-endian, depending on the host). It is possible, though tedious and inefficient, to use the typed arrays and multiple overlapping views to read and write multi-byte values (ints, floats) with the correct endianness. People have pointed to the java.nio.ByteBuffer class as an example API, in particular its methods to read and write all of the Java primitive types with the appropriate endianness.
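As a concrete example of the tedium, reading a single big-endian 32-bit unsigned value through a byte view looks something like this (a sketch; the helper name is made up):

function readUint32BE(bytes, offset) {   // bytes is a Uint8Array view
  return ((bytes[offset]     << 24) |
          (bytes[offset + 1] << 16) |
          (bytes[offset + 2] <<  8) |
           bytes[offset + 3]) >>> 0;     // >>> 0 forces an unsigned result
}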
This functionality can be achieved with a new view type, without perturbing the existing typed array views in any way. A DataArray class can be added which supports unaligned loads and stores of all of the primitive types, with a well-defined endianness. Here is some prototype IDL:
[ Constructor(in ArrayBuffer buffer,
              optional in unsigned long byteOffset,
              optional in unsigned long length) ]
interface DataArray : TypedArray {
    // Gets the value of the given type at the specified byte offset
    // from the start of the view. There is no alignment constraint;
    // multi-byte values may be fetched from any offset.
    //
    // For multi-byte values, the optional littleEndian argument
    // indicates whether a big-endian or little-endian value should be
    // read. If false or undefined, a big-endian value is read.
    //
    // These methods throw exceptions if they would read beyond the end
    // of the view.
    unsigned byte getUInt8(in unsigned long byteOffset);
    byte getInt8(in unsigned long byteOffset);
    unsigned short getUInt16(in unsigned long byteOffset, [optional] in boolean littleEndian);
    short getInt16(in unsigned long byteOffset, [optional] in boolean littleEndian);
    unsigned long getUInt32(in unsigned long byteOffset, [optional] in boolean littleEndian);
    long getInt32(in unsigned long byteOffset, [optional] in boolean littleEndian);
    unsigned long long getUInt64(in unsigned long byteOffset, [optional] in boolean littleEndian);
    long long getInt64(in unsigned long byteOffset, [optional] in boolean littleEndian);
    float getFloat(in unsigned long byteOffset, [optional] in boolean littleEndian);
    double getDouble(in unsigned long byteOffset, [optional] in boolean littleEndian);

    // Stores a value of the given type at the specified byte offset
    // from the start of the view. There is no alignment constraint;
    // multi-byte values may be stored at any offset.
    //
    // For multi-byte values, the optional littleEndian argument
    // indicates whether the value should be stored in big-endian or
    // little-endian byte order. If false or undefined, the value is
    // stored in big-endian byte order.
    //
    // These methods throw exceptions if they would write beyond the end
    // of the view.
    void setUInt8(in unsigned long byteOffset, in unsigned byte value, [optional] in boolean littleEndian);
    void setInt8(in unsigned long byteOffset, in byte value, [optional] in boolean littleEndian);
    void setUInt16(in unsigned long byteOffset, in unsigned short value, [optional] in boolean littleEndian);
    void setInt16(in unsigned long byteOffset, in short value, [optional] in boolean littleEndian);
    void setUInt32(in unsigned long byteOffset, in unsigned long value, [optional] in boolean littleEndian);
    void setInt32(in unsigned long byteOffset, in long value, [optional] in boolean littleEndian);
    void setUInt64(in unsigned long byteOffset, in unsigned long long value, [optional] in boolean littleEndian);
    void setInt64(in unsigned long byteOffset, in long long value, [optional] in boolean littleEndian);
    void setFloat(in unsigned long byteOffset, in float value, [optional] in boolean littleEndian);
    void setDouble(in unsigned long byteOffset, in double value, [optional] in boolean littleEndian);
};
Note that this view is stateless, and it would be trivial to write a stream-like class wrapping it. The individual accessors would be very easy to intrinsify as a few assembly instructions each.
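For example, a minimal stream-style wrapper over the proposed getters might look like the following sketch (the DataReader name and shape are illustrative, not part of the proposal):

function DataReader(dataArray, littleEndian) {
  this.view = dataArray;              // a DataArray per the IDL above
  this.offset = 0;
  this.littleEndian = !!littleEndian;
}
DataReader.prototype.readUint32 = function () {
  var value = this.view.getUInt32(this.offset, this.littleEndian);
  this.offset += 4;
  return value;
};
DataReader.prototype.readFloat = function () {
  var value = this.view.getFloat(this.offset, this.littleEndian);
  this.offset += 4;
  return value;
};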
I'd like to suggest that something like this either be mentioned in an appendix of the TypedArray proposal, or added to the proposal itself. It will likely cover the majority of the use cases of parsing binary files efficiently. (How to get the file's contents as an ArrayBuffer is a separate question.)
Comments?