Typed Objects / Binary Data Polyfills
On Sun, Nov 17, 2013 at 10:23 AM, K. Gadd <kg at luminance.org> wrote:
Are there any known-good polyfills for the current draft Typed Objects / Binary Data spec?
I want this, too, and will start working on it soon-ish if nobody else does or already did.
Presently, JSIL has a set of primitives that roughly correspond with a big chunk of the draft specification. I'm interested in seeing whether they can work atop ES6 typed objects, which means either adapting it to sit on top of an existing polyfill, or turning my primitives into a polyfill for the draft spec. If it's useful I might be able to find time for the latter - would having a polyfill like that be useful (assuming a good one doesn't already exist)?
Having an efficient equivalent to the spec in JS VMs is pretty important for JSIL to ever be able to deliver emscripten-level performance (a single emscripten-style fake heap is not an option because .NET relies on garbage collection). If a polyfill (even a partial one) could help move the process along for the spec, that'd be great. If what the process actually needs is some sort of feedback, maybe I could offer that instead. The status of the spec is unclear to me :)
The strawman at [1] is fairly close to what's going to end up in the spec, content-wise. Additionally, the implementation in SpiderMonkey is pretty complete by now, and there are lots of tests[2]. I don't know what the timing for integrating Typed Objects into the spec proper is, cc'ing Niko for that.
Since the strawman is close to the final spec, questions/nitpicks:
I noticed the current spec explicitly provides no control over element alignment/padding. Are there specific reasons behind that? It dramatically reduces the value of typed objects for doing file I/O (aside from the endianness problem, which actually doesn't matter in many of those cases), and it means they can't be used to provide direct compatibility with C struct layouts in specific cases - for example, filling vertex buffers. I understand that there is no desire to provide the full feature set available in C (fixed-size arrays, variable-size structs, etc.) but alignment/padding control is used quite a bit.
DataView has significant performance issues (some due to immature v8/spidermonkey implementations, some due to the design) that make it unsuitable for most of these use cases, even if it's the 'right' way to handle endianness (disputable).
The handwaving that WebGL implementations can 'just introspect' in these cases seems shortsighted considering the reality of WebGL: hundreds of shipped libraries and apps using current WebGL cannot be guaranteed to keep doing the right thing when interacting with typed arrays. If a typed array full of Typed Objects can still be treated like an array full of bytes or float32s, that allows existing WebGL code to keep working, as long as you ensure the layout of the objects is correct. That means people can start incrementally adding uses of Typed Objects to their code right away - and it means they can introduce them based on a polyfill of Typed Objects instead of waiting for the browser to implement both Typed Objects and new WebGL support for Typed Objects.
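To make the "array full of bytes or float32s" point concrete, here is a minimal sketch using only today's typed arrays. The vertex layout (three float32 position components followed by four uint8 colour components) and the writeVertex helper are assumptions for illustration; they are not taken from JSIL or the strawman.

```javascript
// Hypothetical interleaved vertex: x, y, z as float32 (12 bytes),
// then r, g, b, a as uint8 (4 bytes). 16 bytes per vertex.
const VERTEX_BYTES = 16;
const vertexCount = 2;
const buffer = new ArrayBuffer(VERTEX_BYTES * vertexCount);

// Two views over the same bytes: existing code can keep treating
// the storage as float32s or as raw bytes.
const floats = new Float32Array(buffer);
const bytes = new Uint8Array(buffer);

function writeVertex(index, x, y, z, r, g, b, a) {
  const f = index * (VERTEX_BYTES / 4); // float32 index where the vertex starts
  floats[f] = x;
  floats[f + 1] = y;
  floats[f + 2] = z;
  const c = index * VERTEX_BYTES + 12; // byte index where the colour starts
  bytes[c] = r;
  bytes[c + 1] = g;
  bytes[c + 2] = b;
  bytes[c + 3] = a;
}

writeVertex(0, 1, 2, 3, 255, 0, 0, 255);
writeVertex(1, 4, 5, 6, 0, 255, 0, 255);
```

As long as the layout is fixed and known, the same buffer could be handed to code that only understands plain typed arrays.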
My primitives have control over alignment/padding and it doesn't seem to be that hard to implement (in JS, that is) - are there things that make this hard to provide from inside a VM? Being able to add extra padding, at least, would be pretty useful even if alignment has to remain locked to whatever the requirements are.
I see reference types are exposed (string, object, any) - the way this actually works needs to be clearly stated. Is it storing a GC pointer into the buffer? Are there safety concerns if it's overwritten, or loaded from a json blob or something else like that? How big are string/object/any in the internal representation? Does their size depend on whether the running browser is 32-bit or 64-bit?
I'd be open to collaborating on a polyfill of Typed Objects once it's clear how they actually work. We can repurpose JSIL's existing implementation and modify it to get the semantics in the spec.
Typed Objects polyfill lives here: dherman/structs.js. Dave and I work on it; current status is pretty close to the strawman minus handles and cursors (which are a bit controversial at this point and, as far as I understand, are not in the Firefox implementation). The polyfill includes a bunch of tests; I haven't yet run it on the Mozilla tests - will get to it soon hopefully.
I welcome and will be happy to review polyfill patches.
Dmitry
Oh, of course: I completely forgot about that. Thanks for the link!
Nice script indeed. It would be very nice to be able to flag that module for production/performance reasons, so that slower engines on slower hardware are not penalized much when the native implementation is not in place.
Something that acts almost transparently, if that makes sense at all.
We definitely are looking for feedback on the proposal! Please keep it coming. Here are some answers reflecting our current thinking.
On Sun, Nov 17, 2013 at 4:07 PM, K. Gadd <kg at luminance.org> wrote:
Since the strawman is close to the final spec, questions/nitpicks:
I noticed the current spec explicitly provides no control over element alignment/padding. Are there specific reasons behind that? It dramatically reduces the value of typed objects for doing file I/O (aside from the endianness problem, which actually doesn't matter in many of those cases), and it means they can't be used to provide direct compatibility with C struct layouts in specific cases - for example, filling vertex buffers. I understand that there is no desire to provide the full feature set available in C (fixed-size arrays, variable-size structs, etc.) but alignment/padding control is used quite a bit.
DataView has significant performance issues (some due to immature v8/spidermonkey implementations, some due to the design) that make it unsuitable for most of these use cases, even if it's the 'right' way to handle endianness (disputable).
The handwaving that WebGL implementations can 'just introspect' in these cases seems shortsighted considering the reality of WebGL: hundreds of shipped libraries and apps using current WebGL cannot be guaranteed to keep doing the right thing when interacting with typed arrays. If a typed array full of Typed Objects can still be treated like an array full of bytes or float32s, that allows existing WebGL code to keep working, as long as you ensure the layout of the objects is correct. That means people can start incrementally adding uses of Typed Objects to their code right away - and it means they can introduce them based on a polyfill of Typed Objects instead of waiting for the browser to implement both Typed Objects and new WebGL support for Typed Objects.
The idea for alignment/padding is to specify it for typed objects. There is a rule that tells the user how the typed object will be aligned, and that rule is set in stone. If the programmer declares a typed object, she/he knows what the memory layout is. However, we do not (at least in the first version) provide any explicit API for changing the alignment.
The rule is "Each field is padded to reside at a byte offset that is a multiple of the field type’s byte alignment (specified below via the [[ByteAlignment]] internal property). The struct type’s byte length is padded to be a multiple of the largest byte alignment of any of its fields." So every field is at its natural boundary, and the natural boundary of the struct is the largest of the natural boundaries of its field types. This rule is pretty much what C compilers do anyway.
This appears to us a good compromise between API and implementation complexity and expressiveness.
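The quoted rule can be sketched in a few lines of plain JS. The field descriptors below (name, byteLength, byteAlignment) and the computeLayout helper are hypothetical, and the example assumes each scalar's alignment equals its size; the strawman's [[ByteAlignment]] values are the authority.

```javascript
// Round offset up to the next multiple of alignment.
function alignUp(offset, alignment) {
  return Math.ceil(offset / alignment) * alignment;
}

// fields: array of { name, byteLength, byteAlignment } (hypothetical shape).
function computeLayout(fields) {
  let offset = 0;
  let maxAlign = 1;
  const offsets = {};
  for (const { name, byteLength, byteAlignment } of fields) {
    offset = alignUp(offset, byteAlignment); // pad the field to its alignment
    offsets[name] = offset;
    offset += byteLength;
    maxAlign = Math.max(maxAlign, byteAlignment);
  }
  // The struct's length is padded to a multiple of the largest field alignment.
  return { offsets, byteLength: alignUp(offset, maxAlign), byteAlignment: maxAlign };
}

// Assumed sizes: uint8 = 1, float64 = 8, uint16 = 2.
const layout = computeLayout([
  { name: "flag", byteLength: 1, byteAlignment: 1 },
  { name: "value", byteLength: 8, byteAlignment: 8 },
  { name: "count", byteLength: 2, byteAlignment: 2 },
]);
// layout.offsets is { flag: 0, value: 8, count: 16 } and layout.byteLength is 24.
```

Seven bytes of padding follow "flag" and six follow "count", which is what a typical C compiler would produce for the equivalent struct under the same size assumptions.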
BTW, DataView is indeed implemented poorly performance-wise, at least in V8. It is on my short-term list to fix this in V8.
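For readers less familiar with DataView, here is a small sketch of what it offers that naturally aligned typed objects do not: reads and writes at arbitrary byte offsets with explicit endianness. The offsets and values are arbitrary choices for illustration.

```javascript
const buffer = new ArrayBuffer(8);
const view = new DataView(buffer);

view.setUint8(0, 0x7f);
view.setFloat32(1, 1.5, true);    // unaligned byte offset 1, little-endian
view.setUint16(5, 0x1234, false); // big-endian

const value = view.getFloat32(1, true);

// A Float32Array cannot be created at an unaligned offset:
// the constructor throws a RangeError.
let unalignedError = null;
try {
  new Float32Array(buffer, 1, 1);
} catch (e) {
  unalignedError = e;
}
```

This flexibility is why DataView is the usual answer for file formats with packed layouts, and why its performance matters for the use cases discussed here.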
My primitives have control over alignment/padding and it doesn't seem to be that hard to implement (in JS, that is) - are there things that make this hard to provide from inside a VM? Being able to add extra padding, at least, would be pretty useful even if alignment has to remain locked to whatever the requirements are.
I see reference types are exposed (string, object, any) - the way this actually works needs to be clearly stated. Is it storing a GC pointer into the buffer? Are there safety concerns if it's overwritten, or loaded from a json blob or something else like that? How big are string/object/any in the internal representation? Does their size depend on whether the running browser is 32-bit or 64-bit?
Typed objects that have 'string', 'object' or 'any' fields are "non-transparent". It means that there is no way for the programmer to get at their underlying storage. This is enforced by the spec.
Thanks and hope this helps, Dmitry
(Re: minor off-list clarification on non-transparent objects)
I see, it looks like this is the relevant bit of the strawman:
There are three built-in types that are considered opaque: Object, string, and Any. For security, they are not allowed to expose their internal storage since they may contain pointers (see below). A struct or array type is opaque if it contains opaque fields or elements, respectively.
A type that is not opaque is transparent. Overlooked it on my first read-through since it isn't directly referenced elsewhere. That seems like it addresses the problem.
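The opacity rule is recursive, so a short sketch may help. The type descriptors here ({ kind, name, fields, elementType }) are a hypothetical representation invented for this example; they are not the strawman's reflection API.

```javascript
// The three built-in types the strawman lists as opaque.
const OPAQUE_BUILTINS = new Set(["Object", "string", "Any"]);

// type: hypothetical descriptor with kind "builtin", "struct" or "array".
function isOpaque(type) {
  switch (type.kind) {
    case "builtin":
      return OPAQUE_BUILTINS.has(type.name);
    case "struct":
      // A struct is opaque if any of its fields is opaque.
      return Object.values(type.fields).some((field) => isOpaque(field));
    case "array":
      // An array is opaque if its element type is opaque.
      return isOpaque(type.elementType);
    default:
      throw new TypeError("unknown type kind: " + type.kind);
  }
}

const float32 = { kind: "builtin", name: "float32" };
const string = { kind: "builtin", name: "string" };
const Point = { kind: "struct", fields: { x: float32, y: float32 } };
const Label = { kind: "struct", fields: { pos: Point, text: string } };
const Labels = { kind: "array", elementType: Label };
```

Under this sketch Point is transparent, while Label and Labels are opaque because the string field propagates upward.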
Looking at the strawman and the polyfill and the spidermonkey implementation, each one seems to have a different API for arrays. Is this something that will get standardized later? I've seen get, getItem and [] as 3 different ways to read values out of a typed object array; [] seems like an obvious poor choice in terms of being able to polyfill, but I can see how end users would also want typed object arrays to act like actual arrays.
The spidermonkey implementation seems to expose 'fieldNames' in addition to fieldOffsets/fieldTypes for reflection, which seems like a good idea.
If DataView were to also get optimized in SpiderMonkey, that would release a lot of the pressure (use-case wise) for Typed Objects to expose fine-grained control over alignment/padding and it would make it less immediately necessary for them to exist. That's probably a good thing.
What is the intended use scenario when trying to pass a typed object array to WebGL? Pass the array's .buffer where a typed array would normally be passed? Or is it basically required that WebGL be updated to accept typed object arrays? It's not totally clear to me whether this will work or if it's already been figured out.
The elementType property on typed object arrays is a great addition; I'd suggest that normal typed arrays also be updated to expose an elementType. i.e. (new Uint8Array()).elementType === TypedObject.uint8
On Mon, Nov 18, 2013 at 12:07 PM, K. Gadd <kg at luminance.org> wrote:
(Re: minor off-list clarification on non-transparent objects)
I see, it looks like this is the relevant bit of the strawman:
There are three built-in types that are considered opaque: Object, string, and Any. For security, they are not allowed to expose their internal storage since they may contain pointers (see below). A struct or array type is opaque if it contains opaque fields or elements, respectively.
A type that is not opaque is transparent. Overlooked it on my first read-through since it isn't directly referenced elsewhere. That seems like it addresses the problem.
Looking at the strawman and the polyfill and the spidermonkey implementation, each one seems to have a different API for arrays. Is this something that will get standardized later? I've seen get, getItem and [] as 3 different ways to read values out of a typed object array; [] seems like an obvious poor choice in terms of being able to polyfill, but I can see how end users would also want typed object arrays to act like actual arrays.
We will definitely support []. "get"/"set" will not do because of typed array's existing set method that does something completely different (namely, memcpy). We might consider setItem/getItem - in the polyfill, these are considered internal methods.
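As a reminder of why "set" is already taken, this is what the existing typed array method does: it copies a whole source array into the target at an offset, rather than writing one element. The values are arbitrary.

```javascript
const target = new Uint8Array(8);
const source = new Uint8Array([1, 2, 3]);

// Bulk copy: source lands at indices 4, 5 and 6 of target.
target.set(source, 4);

// Single-element writes go through [] instead.
target[0] = 9;

const result = Array.from(target); // [9, 0, 0, 0, 1, 2, 3, 0]
```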
The spidermonkey implementation seems to expose 'fieldNames' in addition to fieldOffsets/fieldTypes for reflection, which seems like a good idea.
Agreed, feel free to file an issue (or a patch!) against the polyfill :)
If DataView were to also get optimized in SpiderMonkey, that would release a lot of the pressure (use-case wise) for Typed Objects to expose fine-grained control over alignment/padding and it would make it less immediately necessary for them to exist. That's probably a good thing.
Agreed, that is partly my motivation for prioritizing this work in V8.
What is the intended use scenario when trying to pass a typed object array to WebGL? Pass the array's .buffer where a typed array would normally be passed? Or is it basically required that WebGL be updated to accept typed object arrays? It's not totally clear to me whether this will work or if it's already been figured out.
I have some experiments lying around that wrap WebGL apis to make them understand typed objects - I'll polish them and post as samples. Fundamentally, the introspection APIs on typed objects should be enough to use typed objects with WebGL, but probably eventually WebGL API will use typed objects directly.
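A rough sketch of what such a wrapper might compute from introspection data follows. Everything in it is an assumption for illustration: the vertexType object only imitates the fieldOffsets/fieldTypes reflection mentioned in this thread, and the glType, components and normalized properties are invented. It is not the wrapper code referred to above.

```javascript
// Hypothetical reflection data for a 16-byte vertex struct.
const vertexType = {
  byteLength: 16,
  fieldOffsets: { position: 0, color: 12 },
  fieldTypes: {
    position: { glType: "FLOAT", components: 3, normalized: false },
    color: { glType: "UNSIGNED_BYTE", components: 4, normalized: true },
  },
};

// Derive one attribute description per field: stride is the struct size,
// offset is the field's byte offset within the struct.
function describeAttributes(structType) {
  return Object.keys(structType.fieldOffsets).map((name) => {
    const fieldType = structType.fieldTypes[name];
    return {
      name,
      size: fieldType.components,
      type: fieldType.glType,
      normalized: fieldType.normalized,
      stride: structType.byteLength,
      offset: structType.fieldOffsets[name],
    };
  });
}

const attributes = describeAttributes(vertexType);
```

With a real WebGL context, each entry would map onto a call of the form gl.vertexAttribPointer(location, a.size, gl[a.type], a.normalized, a.stride, a.offset).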
The elementType property on typed object arrays is a great addition; I'd suggest that normal typed arrays also be updated to expose an elementType. i.e. (new Uint8Array()).elementType === TypedObject.uint8
The idea is that all typed arrays will be special cases of typed objects, so that Uint8Array = new ArrayType(uint8).
Thanks, Dmitry
On Mon, Nov 18, 2013 at 03:07:28AM -0800, K. Gadd wrote:
If DataView were to also get optimized in SpiderMonkey, that would release a lot of the pressure (use-case wise) for Typed Objects to expose fine-grained control over alignment/padding and it would make it less immediately necessary for them to exist. That's probably a good thing.
Optimizing DataView in SM shouldn't be difficult. It hasn't been high on my priority list but it seems fairly straightforward.
Niko
On Sun, Nov 17, 2013 at 10:42:16PM +0100, Dmitry Lomov wrote:
Typed Objects polyfill lives here: dherman/structs.js. Dave and I work on it; current status is pretty close to the strawman minus handles and cursors (which are a bit controversial at this point and, as far as I understand, are not in the Firefox implementation).
One correction:
Handles are implemented in the SpiderMonkey implementation and are being used in the PJS (nee Rivertrail) polyfill: nikomatsakis/pjs-polyfill
My point of view on handles:
Their design is indeed controversial. I summarized the design tradeoffs in a blog post [1]. After that post, Dmitry, Dave, and I basically decided to exclude handles from the typed objects spec but possibly to include them in other specs (such as PJS) that build on typed objects.
As I see it, the reasoning for deferring handles is as follows:
- Handles are not as important for performance as initially imagined. Basically, the original impetus behind handles was to give users an explicit way to avoid the intermediate objects that are created by expressions like array[1].foo[3]. But, at least in the SM implementation, these intermediate objects are typically optimized away once we get into the JIT. Moreover, with an efficient GC, the cost of such intermediate objects may not be that high. Given those facts, the complexity of movable and nullable handles doesn't seem worth it.
- A secondary use case for handles is as a kind of revokable capability into a buffer, but for this to be of use we must make sure we get the details correct. For many advanced APIs, it can be useful to give away a pointer into a (previously unaliased) buffer and then be able to revoke that pointer, hence ensuring that the buffer is again unaliased. Use cases like this might justify movable and nullable handles, even if raw performance does not.
However, in cases like these, the details are crucial. If we were to design handles in isolation, rather than in tandem with the relevant specs, we might wind up with a design that does not provide adequate guarantees. Also -- there may be other ways to achieve those same goals, such as something akin to the current "neutering" of buffers that occurs when a buffer is transferred between workers.
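To illustrate the "revokable capability" idea only, here is a sketch built from a plain closure. It is not the handle design from the strawman or from SpiderMonkey; makeRevocableView and its get/set methods are hypothetical names.

```javascript
// Wrap a typed array so that access can later be withdrawn.
function makeRevocableView(typedArray) {
  let target = typedArray;
  const check = () => {
    if (target === null) throw new TypeError("handle has been revoked");
    return target;
  };
  return {
    handle: {
      get(index) { return check()[index]; },
      set(index, value) { check()[index] = value; },
    },
    // After revoke(), the handle no longer aliases the buffer.
    revoke() { target = null; },
  };
}

const data = new Float32Array([1, 2, 3]);
const { handle, revoke } = makeRevocableView(data);

handle.set(0, 10); // writes through to data
revoke();          // any later handle.get / handle.set throws TypeError
```

A real design would need stronger guarantees than this, for example that no other reference to the underlying array escaped before revocation, which is exactly the kind of detail the paragraph above says must be worked out alongside the consuming spec.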
Niko
And Niko's excellent post on handles is here: smallcultfollowing.com/babysteps/blog/2013/10/18/typed-object-handles Sorry for not referencing it in my reply!
On Sun, Nov 17, 2013 at 02:04:57PM +0100, Till Schneidereit wrote:
The strawman at [1] is fairly close to what's going to end up in the spec, content-wise. Additionally, the implementation in SpiderMonkey is pretty complete by now, and there are lots of tests[2].
Indeed, it's approaching full functionality. For those who may want to experiment, keep in mind that (1) all typed object support is only available in Nightly builds and (2) all globals are contained behind a "TypedObject" meta object. For example, to create a new struct type, you write:
var Point = new TypedObject.StructType({x: TypedObject.float32, ...})
Here are the major features that are not yet landed and their status:
1. Reference types (Bug 898359 -- reviewed, landing very soon)
2. Support for unsized typed arrays (Bug 922115 -- implemented, not yet reviewed)
3. Integrate typed objects and typed arrays (Bug 898356 -- not yet implemented)
Obviously #3 is the big one. I haven't had time to dig into it much yet, there are a number of minor steps along the way, but I don't see any fundamental difficulties. There are also various minor deviations between the spec, the polyfill, and the native SM implementation that will need to be ironed out.
I don't know what the timing for integrating Typed Objects into the spec proper is, cc'ing Niko for that.
Dmitry and I were planning on beginning work on the actual spec language soon. The goal is to advance the typed objects portion of the spec -- which I believe is fairly separable from the rest -- as quickly as possible, taking advantage of the new process.
Niko
if it's about experimenting then
with(TypedObject) {}
would do :P
Any chance there will be a way to bypass most of the stuff for production?
On Mon, Nov 18, 2013 at 11:46:30AM -0800, Andrea Giammarchi wrote:
if it's about experimenting then
with(TypedObject) {}
would do :P
Any chance there will be a way to bypass most of the stuff for production?
Sorry I don't understand the question.
Niko
I believe one of the benefits of having Typed Objects is to have more performant objects and collections of objects, as structs have been in C since basically forever.
In this case, a full-spec polyfill, like the one pointed out in this thread, is very nice to have, but it will inevitably slow everything down in production compared to vanilla JS objects on every non-evergreen device/browser that is unable to optimize these objects/structs.
Accordingly, I wonder if there is any plan to make that polyfill able to skip everything non-essential in production and do just the most essential work, so as not to slow down already slow browsers on already slow devices (Android 2.3 but also FirefoxOS and ZTE ...)
As an example, in 2007 I proposed a "strict version of JavaScript" devpro.it/code/157.html
Explained here: webreflection.blogspot.com/2007/05/javastrict-strict-type-arguments.html
And with a single flag set to false, all checks disappeared from production, so as not to slow down things that are useful for developers only (as a polyfill for StructType would be) but not for browsers unable to optimize those references/objects/statically defined "things".
So my question was: is there any plan to be able to mark that polyfill so that all checks are ignored and just the essential operations are guaranteed, trusting the exposed behavior, without slowing down all non-compatible browsers with all that logic?
Or better: is there any plan to offer a simplified version for production that does not do everything as full-specs native would do?
I hope this is more clear, thanks for any answer.
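A minimal sketch of the kind of flag being asked about follows. It is an assumption about how a polyfill could be structured, not how dherman/structs.js works; makeFloat32Field and the checked option are hypothetical.

```javascript
// Build an accessor for one float32 field stored at byteOffset in buffer.
// With checked: false the validation is never installed, so release code
// pays only for the raw typed array access.
function makeFloat32Field(buffer, byteOffset, { checked }) {
  const view = new Float32Array(buffer, byteOffset, 1);
  const get = () => view[0];
  if (!checked) {
    return { get, set: (value) => { view[0] = value; } };
  }
  return {
    get,
    set: (value) => {
      if (typeof value !== "number") {
        throw new TypeError("expected a number, got " + typeof value);
      }
      view[0] = value;
    },
  };
}

const storage = new ArrayBuffer(8);
const debugField = makeFloat32Field(storage, 0, { checked: true });
const releaseField = makeFloat32Field(storage, 4, { checked: false });

debugField.set(1.5);
releaseField.set("2"); // no check: the typed array coerces the string to 2
```

The trade-off is visible in the last line: the release accessor silently accepts input that the debug accessor would reject, so behaviour differs between builds whenever the checks would have fired.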
just to extra-simplify:
you can build=debug and build=release ... is there any plan to be able to build=release that script? 'cause otherwise I'll spend some time creating a script that does inline analysis and optimizations at runtime for slower devices and/or production.
Thanks for further answers, if any.