Feedback on Binary Data updates
The idea is definitely to subsume typed arrays as completely as possible.
- Array types of fixed length: The current design fixes the length of an ArrayType instance as part of the ArrayType definition, instead of as a parameter to the resulting constructor. I'm not sure I understand the motivation for that.
The idea is that all Types have a known size, and all Data instances are allocated contiguously.
For example, if you could put unsized array types inside of struct types, it wouldn't be clear how to allocate an instance of the struct:
var MyStruct = new StructType({
a: Uint8Array,
b: Uint8Array
});
var s = new MyStruct; // ???
But you're right that this is inconsistent with typed arrays. Maybe this can be remedied by allowing both sized and unsized array types, and simply requiring nested types to be sized.
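The "allow both sized and unsized array types, but require nested types to be sized" idea can be sketched roughly as follows. This is only an illustration in plain JavaScript: `uint8`, `arrayType` and `structType` are hypothetical stand-ins that merely compute byte sizes, not the proposal's actual StructType/ArrayType API.

```javascript
// Hypothetical stand-ins for the proposal's types; they only track sizes.
const uint8 = { byteLength: 1 };

// An array type with no length models an "unsized" array type.
function arrayType(elementType, length) {
  return {
    elementType,
    length,
    byteLength: length === undefined ? undefined : elementType.byteLength * length,
  };
}

// A struct type needs a known size for every field so that instances
// can be allocated contiguously; unsized fields are rejected.
function structType(fields) {
  let offset = 0;
  const layout = {};
  for (const [name, type] of Object.entries(fields)) {
    if (type.byteLength === undefined) {
      throw new TypeError(`field "${name}" has an unsized type`);
    }
    layout[name] = { offset, type };
    offset += type.byteLength;
  }
  return { layout, byteLength: offset };
}

// Sized arrays nest fine; an unsized one is refused at definition time.
const sized = structType({ a: arrayType(uint8, 4), b: arrayType(uint8, 8) });
let unsizedError = null;
try {
  structType({ a: arrayType(uint8) });
} catch (e) {
  unsizedError = e;
}
```

With this split, an unsized array type could still be used on its own as a constructor that takes a length, which is the Typed Arrays usage.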
- Compatibility with Typed Arrays array objects: There are a few divergences between Binary Data arrays and Typed Arrays arrays that look like they could be addressed:
- The constructor difference mentioned above, including support for copy constructors.
I don't know what you mean by copy constructors. Are you talking about being able to construct a type by wrapping it around an existing ArrayBuffer? That doesn't copy, but I do think we should support it, as I said in my preso at the f2f in San Bruno. That's something I intended to add to the wiki page but hadn't gotten to yet.
- Lack of buffer, byteLength, byteOffset, BYTES_PER_ELEMENT. I see these are noted in TODO.
Yep.
I do think there's a case to be made for not exposing the ArrayBuffer for Data objects that were not explicitly constructed on top of an ArrayBuffer. This would hide architecture-specific data that is currently leaked by the Typed Arrays API. It also accommodates the two classes of usage scenario involving binary data:
Scenario 1: I/O
socket.readBuffer(1000, function(buf) {
var s = new MyStruct(buf, 0); // also allow an optional endianness argument
... do some computation on s ...
});
Scenario 2: Pure computation
var s = new MyStruct({ x: 0, y: 0 });
... do some computation on s ...
Scenario 1 comes up when reading files, network sockets, etc; here you have to let the programmer control the endianness and layout/padding. The simplest way to do the latter is simply to assume zero padding, as with Data Views, and then the programmer would have to insert padding bytes where necessary.
Scenario 2 comes up when building internal data structures. Here the system should use whatever padding and endianness is going to be the most efficient for the architecture, but that detail should ideally not be exposed to the programmer. So in that case, we could make the .buffer field censored, by having it be null or an accessor that throws.
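For Scenario 1, the "zero padding, programmer-controlled endianness" model is what DataView already provides today. A minimal sketch, assuming a made-up 6-byte record format (a big-endian uint16 followed by a little-endian uint32); `readRecord` and the field names are hypothetical:

```javascript
// Build a packed 6-byte record by hand, as if it had arrived from a socket.
const buf = new ArrayBuffer(6);
const writer = new DataView(buf);
writer.setUint16(0, 0x0102, false); // bytes 0-1, big-endian
writer.setUint32(2, 1000, true);    // bytes 2-5, little-endian, no padding before it

// Hypothetical reader: layout and endianness are spelled out by the programmer.
function readRecord(buffer, byteOffset) {
  const dv = new DataView(buffer, byteOffset);
  return {
    id: dv.getUint16(0, false),
    length: dv.getUint32(2, true),
  };
}

const rec = readRecord(buf, 0);
const rawBytes = Array.from(new Uint8Array(buf));
```

Note that the uint32 sits at offset 2, which is not 4-byte aligned; that is exactly the kind of layout a natively padded in-memory struct (Scenario 2) would not use.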
- array.set(otherArr, offset) support on the Binary Data arrays
Good catch; looks unproblematic.
- Conversions, see below
- Different prototype chains, additional members like elementType on binary data arrays.
The last item is one of the reasons why it would be nice to pull the Typed Arrays objects into Binary Data, so that they could be augmented to be fully consistent - for example, to expose the elementType.
If we can pull them into the prototype hierarchy, that's cool, but we still have to see. In particular, if we want to close off some of the leaks I describe above, then we may have to retain some distinction.
- Conversions: The rules for conversions of argument values into the primitive value types seem to be different than typical ES conversions and those used by TypedArrays via WebIDL. Why not use ToInt32 and friends for conversion? Current rules appear to be quite strict - throwing on most type mismatches, and also more permissive for some unexpected cases like "0x"-prefixed strings.
Interesting question. I may have followed js-ctypes too blindly on this.
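For reference, here is what the Typed Arrays side of that comparison does today: stores go through ToNumber and then wrap to the element range, so nothing throws. The `strictUint8` function is a hypothetical illustration of a throwing conversion for contrast; it is not taken from the Binary Data draft or from js-ctypes.

```javascript
// What Typed Arrays do today: every store goes through ToNumber and then wraps.
const bytes = new Uint8Array(4);
bytes[0] = 257;    // stored modulo 256
bytes[1] = -1;     // wraps around to the top of the range
bytes[2] = "0x10"; // ToNumber accepts hex-prefixed strings
bytes[3] = "abc";  // ToNumber gives NaN, which is stored as 0

// Hypothetical strict alternative: reject anything that is not an in-range integer.
function strictUint8(value) {
  if (typeof value !== "number" || !Number.isInteger(value) || value < 0 || value > 255) {
    throw new TypeError("not a uint8: " + String(value));
  }
  return value;
}
```

After these stores, `bytes` holds 1, 255, 16 and 0, while `strictUint8` would throw for every one of those inputs.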
- DataView integration with structs: DataView is an important piece of Typed Arrays for reading from heterogeneous binary data sources like files and network protocols, and for controlling endianness of data reads. DataView would seem to benefit from structs, and structs would benefit from DataView. This is another reason to want to spec DataView itself in ES.next. I imagine an additional pair of functions on DataView akin to the following would allow nice interop between DataView and Binary Data "Types"/"Data":
Data getData(Type type, unsigned long byteOffset, optional boolean littleEndian);
void setData(Type type, unsigned long byteOffset, Data value, optional boolean littleEndian);
I agree that this kind of use case is important, and I'm not opposed to DataViews, but I'm not sure the ArrayBuffer approach described above doesn't already handle this, e.g.:
new T(ArrayBuffer buffer, unsigned long byteOffset, optional boolean littleEndian);
- Explicit inclusion of Uint32Array and similar objects: The Uint32Array and similar objects defined in Typed Arrays are the ones that are likely to be the most commonly used in many/most use cases, but these are missing from the ES.next proposal. Including them in the ES.next proposal explicitly, as supersets of the Typed Arrays objects, would avoid users having to manually create them, and help ensure full API consistency.
I'm open to this. I think there's no technical concern, just a question of what's the best "home" for Typed Arrays.
- A lot of meta-objects: The spec defines 14 objects, without yet defining any of the 10 typed arrays objects. Several of the objects only serve as scaffolding for the meta-hierarchy, and don't appear to be objects which users are expected to frequently (or ever) work with. Are the named "Type" and "Data" objects needed in the proposal?
This doesn't really bother me. As you say, users don't really need to work with them; they're mostly there to set up the inheritance of shared methods, and they make for a nicely symmetric meta-class hierarchy. From the user's standpoint, they'll mostly just care about the primitive types, StructType and ArrayType, and then the type and data objects they create.
- Naming: The term "Type" feels somewhat too generic for referring to struct shapes. The previous "block" terminology actually sounded more natural, or at least more scoped.
The reason I eliminated "block" was that it's such a highly-used term for many different things (e.g. block statements, block functions). The terms Type and Data are implicitly scoped to the @binary module, which is one of the benefits of modules: you don't have to explicitly scope every single definition's name to the subject matter at hand.
The idea is definitely to subsume typed arrays as completely as possible.
Great.
The idea is that all Types have a known size, and all Data instances are allocated contiguously.
For example, if you could put unsized array types inside of struct types, it wouldn't be clear how to allocate an instance of the struct:
var MyStruct = new StructType({
a: Uint8Array,
b: Uint8Array
});
var s = new MyStruct; // ???
But you're right that this is inconsistent with typed arrays. Maybe this can be remedied by allowing both sized and unsized array types, and simply requiring nested types to be sized.
I see. That makes sense from the struct type definition perspective. My assumption is that this usage will be a fair bit less common than the use of the array constructor for directly allocating an array. Having both sized and unsized may help, though the two are quite different, and it may be hard to sufficiently distinguish them. I wonder if it is too subtle to have "UInt8Array(5)" be the sized type, and "new UInt8Array(5)" be the allocation of a new array?
I don't know what you mean by copy constructors. Are you talking about being able to construct a type by wrapping it around an existing ArrayBuffer? That doesn't copy, but I do think we should support it, as I said in my preso at the f2f in San Bruno. That's something I intended to add to the wiki page but hadn't gotten to yet.
The following makes a copy of the buffer (and the same works if arr1 is a JS array):
var arr1 = new Uint8Array(10);
arr1[3] = 7;
var arr2 = new Uint8Array(arr1);
arr2[3] === 7; // true: the contents were copied
arr2[4] = 5;
arr1[4] !== 5; // true: the two arrays do not share storage
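To make the distinction with wrapping an existing ArrayBuffer concrete, both behaviors can be checked side by side with today's Typed Arrays API. A small sketch (the variable names `copy` and `alias` are just for illustration):

```javascript
const arr1 = new Uint8Array(10);
arr1[3] = 7;

// Passing a typed array copies its elements into a fresh buffer.
const copy = new Uint8Array(arr1);

// Passing the underlying ArrayBuffer creates a second view on the same storage.
const alias = new Uint8Array(arr1.buffer);

copy[4] = 5;  // not visible through arr1
alias[5] = 9; // visible through arr1
```

Here `arr1[4]` is still 0 after the write through `copy`, while `arr1[5]` is 9 after the write through `alias`.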
I do think there's a case to be made for not exposing the ArrayBuffer for Data objects that were not explicitly constructed on top of an ArrayBuffer. This would hide architecture-specific data that is currently leaked by the Typed Arrays API. It also accommodates the two classes of usage scenario involving binary data: ...
Scenario 1 comes up when reading files, network sockets, etc; here you have to let the programmer control the endianness and layout/padding. The simplest way to do the latter is simply to assume zero padding, as with Data Views, and then the programmer would have to insert padding bytes where necessary.
Scenario 2 comes up when building internal data structures. Here the system should use whatever padding and endianness is going to be the most efficient for the architecture, but that detail should ideally not be exposed to the programmer. So in that case, we could make the .buffer field censored, by having it be null or an accessor that throws.
I see the two use cases, but I am a little concerned about the complexity of trying to support each with different representations of the struct. For example, what happens if I am using the "pure compute" example, and then decide I want to be able to serialize my large in-memory representation up to a binary file on the server?
I agree that this kind of use case is important, and I'm not opposed to DataViews, but I'm not sure the ArrayBuffer approach described above doesn't already handle this, e.g.:
new T(ArrayBuffer buffer, unsigned long byteOffset, optional boolean littleEndian);
Indeed - that does address the use case, and aligns with what you do currently to extract embedded fixed length arrays from the buffer.
I'm open to this. I think there's no technical concern, just a question of what's the best "home" for Typed Arrays.
Right - I think that is the ultimate question. My feeling is that an ES.next spec that only includes what is currently spec'd in Binary Data will feel incomplete for many practical tasks, and would end up effectively taking a dependency on the web platform to provide a complete array story.
This doesn't really bother me. As you say, users don't really need to work with them; they're mostly there to set up the inheritance of shared methods, and they make for a nicely symmetric meta-class hierarchy. From the user's standpoint, they'll mostly just care about the primitive types, StructType and ArrayType, and then the type and data objects they create.
Yeah - I think this is likely not a significant concern. And I don't have a concrete suggestion for simplifying this currently.
The reason I eliminated "block" was that it's such a highly-used term for many different things (e.g. block statements, block functions). The terms Type and Data are implicitly scoped to the @binary module, which is one of the benefits of modules: you don't have to explicitly scope every single definition's name to the subject matter at hand.
The module scoping does help, but the name "type" still feels even more overloaded than "block". I won't bikeshed any further on naming yet though :-)
Luke
I saw that there have been some updates to the Binary Data proposal on the wiki. This is great, I think Binary Data is one of the really important enabling capabilities being added to ES.next, so I'm excited to see progress here.
Below is some feedback on the current draft proposal. The overarching theme of the feedback is the alignment with Typed Arrays. I would still really like to see Binary Data in ES.next subsume (through being a sufficiently compatible superset) Typed Arrays as currently defined and implemented in several browsers and Web APIs, to provide a single consistent and complete binary array model for the web platform. This seems to be within reach. I see that dherman is now an editor on the Typed Arrays spec as well, so I'm hoping that I'll hear that progress is already being made on this :-). I'd also be happy to help with this.
NOTES:
That is, I assumed the intended usage was:
var MyStruct = new StructType({ x: uint8, y: uint8 });
var MyArray = new ArrayType(MyStruct);
var myArray1 = new MyArray(16);
var myArray2 = new MyArray(32);
But in the current design it seems I have to create another constructor object to accomplish this:
var MyStruct = new StructType({ x: uint8, y: uint8 });
var MyArray1 = new ArrayType(MyStruct, 16);
var MyArray2 = new ArrayType(MyStruct, 32);
var myArray1 = new MyArray1();
var myArray2 = new MyArray2();
This doesn't feel like the right split between the meta levels, and makes the code more complex. Notably, I would have also expected that the "Uint8Array" constructor object defined in Typed Arrays could be defined by Binary Data directly, as:
var UInt8Array = new ArrayType(uint8);
This is less natural with the current model for array length.
The last item is one of the reasons why it would be nice to pull the Typed Arrays objects into Binary Data, so that they could be augmented to be fully consistent - for example, to expose the elementType.
Conversions: The rules for conversions of argument values into the primitive value types seem to be different than typical ES conversions and those used by TypedArrays via WebIDL. Why not use ToInt32 and friends for conversion? Current rules appear to be quite strict - throwing on most type mismatches, and also more permissive for some unexpected cases like "0x"-prefixed strings.
DataView integration with structs: DataView is an important piece of Typed Arrays for reading from heterogeneous binary data sources like files and network protocols, and for controlling endianness of data reads. DataView would seem to benefit from structs, and structs would benefit from DataView. This is another reason to want to spec DataView itself in ES.next. I imagine an additional pair of functions on DataView akin to the following would allow nice interop between DataView and Binary Data "Types"/"Data":
Data getData(Type type, unsigned long byteOffset, optional boolean littleEndian);
void setData(Type type, unsigned long byteOffset, Data value, optional boolean littleEndian);
Explicit inclusion of Uint32Array and similar objects: The Uint32Array and similar objects defined in Typed Arrays are the ones that are likely to be the most commonly used in many/most use cases, but these are missing from the ES.next proposal. Including them in the ES.next proposal explicitly, as supersets of the Typed Arrays objects, would avoid users having to manually create them, and help ensure full API consistency.
A lot of meta-objects: The spec defines 14 objects, without yet defining any of the 10 typed arrays objects. Several of the objects only serve as scaffolding for the meta-hierarchy, and don't appear to be objects which users are expected to frequently (or ever) work with. Are the named "Type" and "Data" objects needed in the proposal?
Naming: The term "Type" feels somewhat too generic for referring to struct shapes. The previous "block" terminology actually sounded more natural, or at least more scoped.
Luke