The Structured Clone Wars
On Thu, Jul 14, 2011 at 12:46 PM, Mark S. Miller <erights at google.com> wrote:
At the thread "LazyReadCopy experiment and invariant checking for [[Extensible]]=false" on es-discuss, On Wed, Jul 13, 2011 at 10:29 AM, David Bruant <david.bruant at labri.fr>wrote:
Hi,
Recently, I've been thinking about the structured clone algorithm used in postMessage.
Along with Dave Herman <blog.mozilla.com/dherman/2011/05/25/im-worried-about-structured-clone>, I'm worried about structured clone <www.w3.org/TR/html5/common-dom-interfaces.html#safe-passing-of-structured-data>. In order to understand it better before criticizing it, I tried implementing it in ES5 + WeakMaps. My code appears below. In writing it, I noticed some ambiguities in the spec, so I implemented my best guess about what the spec intended.
Aside: Coding this so that it is successfully defensive against changes to primordial bindings proved surprisingly tricky, and the resulting coding patterns quite unpleasant. See the explanatory comment early in the code below. Separately, we should think about how to better support defensive programming for code that must operate in the face of mutable primordials.
Ambiguities:
- When the spec says "If input is an Object object", I assumed it meant 'if the input's [[Class]] is "Object" '.
- By "for each enumerable property in input" combined with "Note: This does not walk the prototype chain.", I assume it meant "for each enumerable own property of input".
- By "the value of the property" combined with "Property descriptors, setters, getters, and analogous features are not copied in this process.", I assume it meant "the result of calling the [[Get]] internal method of input with the property name", even if the enumerable own property is an accessor property.
- By "corresponding to the same underlying data", I assume it meant to imply direct sharing of read/write access, leading to shared state concurrency between otherwise shared-nothing event loops.
- By "add a new property to output having the same name" combined with "in the output it would just have the default state (typically read-write, though that could depend on the scripting environment).", I assume it meant to define the property as writable: true, enumerable: true, configurable: true, rather than to call the internal [[Put]] method, in order to avoid inherited setters.
Note that the current editor's draft (dev.w3.org/html5/spec/Overview.html#safe-passing-of-structured-data) has some changes, and there is some controversy about some of them (www.w3.org/Bugs/Public/show_bug.cgi?id=12101).
Something that isn't clear to me is which primordials are used to set the [[Prototype]] of the generated objects. It isn't covered in the internal structured cloning algorithm itself. Perhaps it is specified where structured clone is invoked.
Hmmm. This revision includes "except if obtaining the value of the property involved executing script". Now imagine writing a predicate for that in JS. Among native-only objects in ES5.1, you can do it by using the original Object.getOwnPropertyDescriptor, to first check if the property is an accessor property. But even in ES5.1, that does not work for non-native (host) objects, since host objects are free to override [[GetOwnProperty]] in ways that don't violate 8.6.2. Running user code during [[GetOwnProperty]] does not violate 8.6.2.
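A best-effort version of such a predicate, reliable only for native ES5.1 objects (a sketch; as just noted, host objects and proxies defeat it):

    // Capture the original reflection function, then ask: could reading
    // this own property run script?
    var gopd = Object.getOwnPropertyDescriptor;
    function readMightRunScript(obj, name) {
      var desc = gopd(obj, name);
      return desc !== undefined && 'get' in desc; // own accessor property
    }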
Since ES6 proxies are intended only to uphold invariants that apply to both native and non-native objects (as recently discussed on es-discuss), proxies may also run user code in response to Object.getOwnPropertyDescriptor. And, by design (also as recently discussed on es-discuss), there's no way to test whether an object is a proxy.
Do we really want a structured clone operation that cannot be implemented in JS? This seems bad.
I think we should remove all the requirements around special handling of functions that can run script. It can definitely cause headaches for implementors, but as long as any script can run during structured cloning, this is a problem we have to deal with anyway.
The fact that scripts can cause us to enter infinite loops, by, for example, creating a getter which creates a new object with a getter which creates a new object, and so on, is really no worse than dealing with something like |while(1) {}| appearing anywhere in a page.
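Jonas's example, sketched:

    // Each read of .next manufactures a fresh object carrying the same
    // getter, so a naive recursive clone never terminates.
    function bottomless() {
      return Object.defineProperty({}, 'next', {
        enumerable: true,
        configurable: true,
        get: bottomless
      });
    }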
However, when I talked to Dave Herman about his concerns about structured clones, he had an entirely different concern. The fact that things like prototype chains (along with any behavior that went with them), getters (you just get a snapshot of what they returned), and setters (along with any side effects they implement) all disappear meant that the clone risks producing something very different from what you started with.
His concerns, and my rebuttal, can be read at blog.mozilla.com/dherman/2011/05/25/im-worried-about-structured-clone
Also, I should say that this is my interpretation of Dave's concerns. Please don't attribute my words to him. And Dave, if you see this, feel free to speak up :)
One possible solution would be to throw if any of the objects have getters/setters or prototypes != Object.prototype. This is obviously a pretty harsh change though.
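A sketch of what that harsh pre-flight check might look like (the helper name is invented; the seen list guards against cycles):

    // Throws unless every object in the graph is a "plain" Object or Array
    // carrying only data properties.
    function assertCloneablePlain(value, seen) {
      if (value === null || typeof value !== 'object') { return; }
      seen = seen || [];
      if (seen.indexOf(value) !== -1) { return; }
      seen.push(value);
      var proto = Object.getPrototypeOf(value);
      if (proto !== Object.prototype && proto !== Array.prototype) {
        throw new TypeError('non-default [[Prototype]]');
      }
      Object.keys(value).forEach(function (name) {
        var desc = Object.getOwnPropertyDescriptor(value, name);
        if ('get' in desc) { throw new TypeError('accessor property: ' + name); }
        assertCloneablePlain(desc.value, seen);
      });
    }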
This loss of getters, setters, and non-default prototypes also happens with the JSON encoder, for what it's worth.
My personal belief is that while this isn't ideal, it's better than the alternatives. But others might disagree.
/ Jonas
On Thu, Jul 14, 2011 at 2:46 PM, Mark S. Miller <erights at google.com> wrote:
Ambiguities:
- When the spec says "If input is an Object object", I assumed it meant 'if the input's [[Class]] is "Object" '.
- By "for each enumerable property in input" combined with "Note: This does not walk the prototype chain.", I assume it meant "for each enumerable own property of input".
- By "the value of the property" combined with "Property descriptors, setters, getters, and analogous features are not copied in this process.", I assume it meant "the result of calling the [[Get]] internal method of input with the property name", even if the enumerable own property is an accessor property.
I wrote Mozilla's initial cut at structured cloning, and I resolved each of these in the same way, except that in 1) I interpreted "If input is an Object object" to exclude proxies. (We could perhaps change it.)
Allen Wirfs-Brock wrote:
Something that isn't clear to me is which primordials are used to set the [[Prototype]] of the generated objects. It isn't covered in the internal structured cloning algorithm itself. Perhaps it is specified where structured clone is invoked.
Consider that in IndexedDB, the "copy" is made at the time the object is put into the database, and then the copy is used in perhaps a completely different browser instance. And in the case of postMessage, the copy is in a totally separate heap that lives on a separate thread. Perhaps this makes it clearer what structured cloning really is.
It's serialization.
Or, it's a spec fiction to explain and codify the Web-visible effects of serialization and deserialization without specifying a serialization format.
We implement a pair of functions, JS_WriteStructuredClone and JS_ReadStructuredClone. The latter requires the caller to specify the global object in which the clone is to be made. That global's primordial prototypes are used. The actual serialization format is not exposed to the web.
Back to Mark S. Miller:
- By "corresponding to the same underlying data", I assume it meant to imply direct sharing of read/write access, leading to shared state concurrency between otherwise shared-nothing event loops.
Blobs and Files are immutable. The File API spec says:
"This interface represents immutable raw data." [emphasis in original] dev.w3.org/2006/webapi/FileAPI/#dfn-Blob
And finally there's the issue raised by David on the es-discuss thread: What should the structured clone algorithm do when encountering a proxy? The algorithm as coded below will successfully "clone" proxies, for some meaning of clone. Is that the clone behavior we wish for proxies?
The structured cloning algorithm should be redefined in terms of the ES object protocol. This seems necessary anyway, for precision.
The appropriate behavior regarding proxies would fall out of that; proxies would not have to be specifically mentioned in the algorithm's spec.
(Every algorithm that mentions proxies, or really any other object type, by name is one broken piece of a Proxy.isProxy implementation.)
2011/7/15 Jason Orendorff <jason.orendorff at gmail.com>
Back to Mark S. Miller:
And finally there's the issue raised by David on the es-discuss thread: What should the structured clone algorithm do when encountering a proxy? The algorithm as coded below will successfully "clone" proxies, for some meaning of clone. Is that the clone behavior we wish for proxies?
The structured cloning algorithm should be redefined in terms of the ES object protocol. This seems necessary anyway, for precision.
The appropriate behavior regarding proxies would fall out of that; proxies would not have to be specifically mentioned in the algorithm's spec.
+1. This also matches with the behavior of JSON.stringify(aProxy): serializing a proxy as data should simply query the object's own properties by calling the appropriate traps (in the case of JSON, this includes intercepting the call to 'toJSON').
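An illustration of that behavior, using the Proxy API as it eventually shipped in ES2015 rather than the 2011 draft API:

    // JSON.stringify just asks the proxy for its own enumerable properties
    // and their values through the ordinary traps; it never needs to know
    // it is looking at a proxy.
    var p = new Proxy({}, {
      ownKeys: function () { return ['x']; },
      getOwnPropertyDescriptor: function () {
        return { value: 42, writable: true, enumerable: true, configurable: true };
      },
      get: function (target, name) { return name === 'x' ? 42 : undefined; }
    });
    JSON.stringify(p); // '{"x":42}'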
(Every algorithm that mentions proxies, or really any other object type, by name is one broken piece of a Proxy.isProxy implementation.)
On Jul 14, 2011, at 9:30 PM, Jason Orendorff wrote:
On Thu, Jul 14, 2011 at 2:46 PM, Mark S. Miller <erights at google.com> wrote:
Allen Wirfs-Brock wrote:
Something that isn't clear to me is which primordials are used to set the [[Prototype]] of the generated objects. It isn't covered in the internal structured cloning algorithm itself. Perhaps it is specified where structured clone is invoked.
Consider that in IndexedDB, the "copy" is made at the time the object is put into the database, and then the copy is used in perhaps a completely different browser instance. And in the case of postMessage, the copy is in a totally separate heap that lives on a separate thread. Perhaps this makes it clearer what structured cloning really is.
It's serialization.
Or, it's a spec fiction to explain and codify the Web-visible effects of serialization and deserialization without specifying a serialization format.
As such, it seems like this may be a poor specification approach. Translation to/from a static serialization format would make clear that there is no sharing of any active object mechanisms such as prototype objects. This is not clear in the current specification. If the specification did use an explicit serialization format in this manner, then certainly a valid optimization in appropriate situations would be for an implementation to eliminate the actual encoding/decoding of the serialized representation and directly generate the target objects. However, by specifying it in terms of such a format you would precisely define the required transformation.
If you didn't want to be bothered with inventing a serialization format solely for specification purposes, you could accomplish the same thing by specifying structured clone as if it were a transformation to/from JSON format.
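In other words, a toy sketch of that specification style (real structured clone would also have to cover Date, RegExp, cycles, and so on, which plain JSON does not):

    // Specify the clone as an explicit encode step followed by a decode
    // step against the target realm's primordials; implementations may
    // fuse the two steps as an optimization.
    function structuredCloneAsSpecified(input) {
      var serialized = JSON.stringify(input); // the static wire format
      return JSON.parse(serialized);          // decoding in the target realm
    }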
We implement a pair of functions, JS_WriteStructuredClone and JS_ReadStructuredClone. The latter requires the caller to specify the global object in which the clone is to be made. That global's primordial prototypes are used. The actual serialization format is not exposed to the web.
What happens when cloning an object that is an "Object object" whose [[Prototype]] is not Object.prototype?
It's serialization.
Or, it's a spec fiction to explain and codify the Web-visible effects of serialization and deserialization without specifying a serialization format.
As such, it seems like this may be a poor specification approach. Translation to/from a static serialization format would make clear that there is no sharing of any active object mechanisms such as prototype objects. This is not clear in the current specification. If the specification did use an explicit serialization format in this manner, then certainly a valid optimization in appropriate situations would be for an implementation to eliminate the actual encoding/decoding of the serialized representation and directly generate the target objects. However, by specifying it in terms of such a format you would precisely define the required transformation.
If you didn't want to be bothered with inventing a serialization format solely for specification purposes, you could accomplish the same thing by specifying structured clone as if it were a transformation to/from JSON format.
JSON alone may not be enough, but it shouldn't be too troublesome to specify a slightly enhanced ES-specific JSON extension that includes serializations for undefined, NaN, Infinity, -Infinity, etc. And naturally, support for Date, RegExp and Function would be a huge boon. If a referencing technique were addressed this could even include a <| equivalent to address the [[Prototype]] issue Allen mentioned.
In this context some of the limitations intentionally imposed in JSON are unnecessary, so why saddle the web platform with them? A more expressive standardized serialization would be useful across the board.
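One hypothetical shape such an ES-specific extension could take (the $tag scheme here is invented for illustration):

    // Tag the values plain JSON can't carry; a matching decoder reverses this.
    function encodeExtended(value) {
      if (value === undefined) { return { $tag: 'undefined' }; }
      if (typeof value === 'number' && !isFinite(value)) {
        return { $tag: 'number', repr: String(value) }; // "NaN", "Infinity", "-Infinity"
      }
      if (value instanceof Date) { return { $tag: 'Date', ms: value.getTime() }; }
      if (value instanceof RegExp) { return { $tag: 'RegExp', src: String(value) }; }
      return value; // everything else is assumed to be plain JSON data
    }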
On Jul 15, 2011, at 8:51 AM, Dean Landolt wrote:
It's serialization.
Or, it's a spec fiction to explain and codify the Web-visible effects of serialization and deserialization without specifying a serialization format.
As such, it seems like this may be a poor specification approach. Translation to/from a static serialization format would make clear that there is no sharing of any active object mechanisms such as prototype objects. This is not clear in the current specification. If the specification did use an explicit serialization format in this manner, then certainly a valid optimization in appropriate situations would be for an implementation to eliminate the actual encoding/decoding of the serialized representation and directly generate the target objects. However, by specifying it in terms of such a format you would precisely define the required transformation.
If you didn't want to be bothered with inventing a serialization format solely for specification purposes, you could accomplish the same thing by specifying structured clone as if it were a transformation to/from JSON format.
JSON alone may not be enough, but it shouldn't be too troublesome to specify a slightly enhanced ES-specific JSON extension that includes serializations for undefined, NaN, Infinity, -Infinity, etc. And naturally, support for Date, RegExp and Function would be a huge boon. If a referencing technique were addressed this could even include a <| equivalent to address the [[Prototype]] issue Allen mentioned.
In this context some of the limitations intentionally imposed in JSON are unnecessary, so why saddle the web platform with them? A more expressive standardized serialization would be useful across the board.
JSON + an appropriate schema is enough. You can define a JSON-encoded schema that deals with undefined, NaN, etc. as well as circular object references, property attributes, and other issues. For example, see allenwb/jsmirrors/blob/master/jsonObjSample.js for a sketch of such a schema.
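Loosely, the flavor of such a schema (this fragment is invented here as illustration, not copied from the jsmirrors sample):

    // Objects get numeric ids so cycles become references; property
    // attributes and non-JSON values travel in-band under $-keys.
    var encodedGraph = {
      root: { $ref: 0 },
      objects: [{
        proto: 'Object.prototype',
        props: {
          self: { value: { $ref: 0 }, writable: true, enumerable: true, configurable: true },
          size: { value: { $special: 'NaN' }, writable: true, enumerable: true, configurable: true }
        }
      }]
    };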
For structured clone use cases, that is all you need.
I'm less convinced that one standardized universal JS object serialization format is such a good idea. There are lots of application-specific issues involved in object serialization, and to create a universal format/serializer/deserializer you have to make the policies that are applied highly parameterized. I think it might be better to leave that problem to library writers.
On Fri, Jul 15, 2011 at 10:22 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
What happens when cloning an object that is an "Object object" whose [[Prototype]] is not Object.prototype?
The original object's [[Prototype]] is entirely ignored.
During deserialization, the target global's initial Object.prototype object is used as the new object's [[Prototype]].
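So, observably (a sketch):

    // The clone keeps own properties but is re-parented to the target
    // realm's initial Object.prototype; inherited behavior is gone.
    var original = Object.create({ greet: function () { return 'hi'; } });
    original.x = 1;
    // After a postMessage round trip, in the receiving realm:
    //   clone.x === 1
    //   Object.getPrototypeOf(clone) === Object.prototype  (the target's)
    //   clone.greet === undefined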
On Fri, Jul 15, 2011 at 1:26 AM, Tom Van Cutsem <tomvc.be at gmail.com> wrote:
2011/7/15 Jason Orendorff <jason.orendorff at gmail.com>
Back to Mark S. Miller:
And finally there's the issue raised by David on the es-discuss thread: What should the structured clone algorithm do when encountering a proxy? The algorithm as coded below will successfully "clone" proxies, for some meaning of clone. Is that the clone behavior we wish for proxies?
The structured cloning algorithm should be redefined in terms of the ES object protocol. This seems necessary anyway, for precision.
The appropriate behavior regarding proxies would fall out of that; proxies would not have to be specifically mentioned in the algorithm's spec.
+1. This also matches with the behavior of JSON.stringify(aProxy): serializing a proxy as data should simply query the object's own properties by calling the appropriate traps (in the case of JSON, this includes intercepting the call to 'toJSON').
Except that you don't want to do that for host objects. Trying to clone a File object by cloning its properties is going to give you an object which is a whole lot less useful as it wouldn't contain any of the file data. Once we define support for cloning ArrayBuffers the same thing will apply to it.
This might in fact be a big hurdle to implementing structured cloning in JavaScript. How would a JS implementation of structured clone determine if an object is a host object which would lose all its useful semantics if cloned, vs. a "plain" JS object which can usefully be cloned?
/ Jonas
On Jul 15, 2011, at 10:00 AM, Jonas Sicking wrote:
Except that you don't want to do that for host objects. Trying to clone a File object by cloning its properties is going to give you an object which is a whole lot less useful as it wouldn't contain any of the file data. Once we define support for cloning ArrayBuffers the same thing will apply to it.
This might in fact be a big hurdle to implementing structured cloning in JavaScript. How would a JS implementation of structured clone determine if an object is a host object which would lose all its useful semantics if cloned, vs. a "plain" JS object which can usefully be cloned?
/ Jonas
And a cloned JS object is a lot less useful if it has lost its original [[Prototype]]. Generalizations about host objects are no more or less valid than generalizations about pure JS objects.
This issue applies to pure JS object graphs or any serialization scheme. Sometimes language-specific physical clones won't capture the desired semantics. (Consider, for example, an object that references a resource by using a symbolic token to access a local resource registry.) That is why the ES5 JSON encoder/decoder includes extension points such as the toJSON method, to enable semantic encodings that differ from the physical object structure.
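For instance (names invented), the toJSON hook lets the symbolic-token object from that example serialize semantically rather than physically:

    // The handle serializes as its registry token, not its live state; the
    // receiving side re-resolves the token against its own local registry.
    function ResourceHandle(token) { this.token = token; }
    ResourceHandle.prototype.toJSON = function () {
      return { $resource: this.token };
    };
    JSON.stringify({ r: new ResourceHandle('cache:42') });
    // '{"r":{"$resource":"cache:42"}}'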
The structured clone algorithm, as currently written, allows the passing of strings, so it is possible to use it to transmit anything that can be encoded within a string. All it needs is an application-specific encoder/decoder. It seems to me the real complication is a desire for some structured clone use cases to avoid serialization and permit sharing via a copy-on-write of a real JS object graph. If you define this sharing in terms of serialization then you probably eliminate some of the language-specific low-level sharing semantic issues. But you are still going to have higher-level semantic issues, such as what it means to serialize a File. It isn't clear to me that there is a general solution to the latter.
On Fri, Jul 15, 2011 at 1:00 PM, Jonas Sicking <jonas at sicking.cc> wrote:
On Fri, Jul 15, 2011 at 1:26 AM, Tom Van Cutsem <tomvc.be at gmail.com> wrote:
2011/7/15 Jason Orendorff <jason.orendorff at gmail.com>
Back to Mark S. Miller:
And finally there's the issue raised by David on the es-discuss thread: What should the structured clone algorithm do when encountering a proxy? The algorithm as coded below will successfully "clone" proxies, for some meaning of clone. Is that the clone behavior we wish for proxies?
The structured cloning algorithm should be redefined in terms of the ES object protocol. This seems necessary anyway, for precision.
The appropriate behavior regarding proxies would fall out of that; proxies would not have to be specifically mentioned in the algorithm's spec.
+1. This also matches with the behavior of JSON.stringify(aProxy): serializing a proxy as data should simply query the object's own properties by calling the appropriate traps (in the case of JSON, this includes intercepting the call to 'toJSON').
Except that you don't want to do that for host objects. Trying to clone a File object by cloning its properties is going to give you an object which is a whole lot less useful as it wouldn't contain any of the file data. Once we define support for cloning ArrayBuffers the same thing will apply to it.
This might in fact be a big hurdle to implementing structured cloning in JavaScript. How would a JS implementation of structured clone determine if an object is a host object which would lose all its useful semantics if cloned, vs. a "plain" JS object which can usefully be cloned?
Through the use of a serializable predicate -- perhaps toJSON, as you recognized in your response to Dave's referenced post. Is it really a problem if host objects don't survive in full across serialization boundaries? As you say, "All APIs that use structured cloning are pretty explicit. Things like Worker.postMessage and IDBObjectStore.put pretty explicitly creates a new copy." If you expect host objects to survive across that boundary you'll quickly learn otherwise, and it won't take long to grok the difference.
Java draws a distinction between marshalling and serialization which might be useful to this discussion:
tools.ietf.org/html/rfc2713#section-2.3
To "marshal" an object means to record its state and codebase(s) insuch a
way that when the marshalled object is "unmarshalled," a copy of the original object is obtained, possibly by automatically loading the class definitions of the object. You can marshal any object that is serializable or remote (that is, implements the java.rmi.Remote interface). Marshalling is like serialization, except marshalling also records codebases.
I agree with the conclusion in Dave's post:
A more adaptable approach might be for ECMAScript to specify "transmittable" data structures.
But the premise suggests the need for marshalling, where we can get by without preserving all this environment information. If you don't like the behavior, toJSON is already metaprogrammable, and as Allen suggests a schema could be used to capture deeper type information -- it could also communicate property descriptor configuration. I'd rather a fully self-describing format exist, but I concede it's unnecessary.
On Fri, Jul 15, 2011 at 10:22 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
On Jul 14, 2011, at 9:30 PM, Jason Orendorff wrote:
Or, it's a spec fiction to explain and codify the Web-visible effects of serialization and deserialization without specifying a serialization format.
As such, it seems like this may be a poor specification approach.
Perhaps. Certainly the current spec language isn't ideal.
This algorithm is in the "Here's a bunch of random stuff" section of the HTML5 standard. Perhaps the ES spec is a better place for it. I'm not sure.
On Jul 15, 2011, at 12:00 PM, Jonas Sicking wrote:
2011/7/15 Jason Orendorff <jason.orendorff at gmail.com>
The structured cloning algorithm should be redefined in terms of the ES object protocol. This seems necessary anyway, for precision.
Except that you don't want to do that for host objects.
I only meant to say that the structured cloning algorithm should be specified in precise language, not that the meaning should be drastically changed. After all, this is a deployed standard, right?
If it were to be done in the style of the ES standard, it would mean offering an extension point, such as a [[Clone]] internal method, which cloneable host objects such as File could implement. (I say [[Clone]], but there are other possibilities.)
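Roughly (a sketch; [[Clone]] would be internal, so an underscored method stands in for it here):

    // Cloneable host objects supply their own clone step; everything else
    // falls through to the generic structured-clone property walk.
    function cloneOneValue(input, memory) {
      if (input !== null && typeof input === 'object' &&
          typeof input.__clone__ === 'function') {
        return input.__clone__(memory); // hypothetical stand-in for [[Clone]]
      }
      // ... otherwise, the generic own-property walk ...
    }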
On Fri, Jul 15, 2011 at 1:30 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
On Jul 15, 2011, at 10:00 AM, Jonas Sicking wrote:
Except that you don't want to do that for host objects. Trying to clone a File object by cloning its properties is going to give you an object which is a whole lot less useful as it wouldn't contain any of the file data. Once we define support for cloning ArrayBuffers the same thing will apply to it.
This might in fact be a big hurdle to implementing structured cloning in JavaScript. How would a JS implementation of structured clone determine if an object is a host object which would lose all its useful semantics if cloned, vs. a "plain" JS object which can usefully be cloned?
/ Jonas
And a cloned JS object is a lot less useful if it has lost its original [[Prototype]].
Didn't you just argue you could communicate this kind of information with a schema? You couldn't share the actual [[Prototype]] anyway. So you'd have to pass the expected behaviors along with the object (this is why a Function serialization would be wonderful, but this could be done in a schema too).
Sure, it won't be terribly efficient, since (without mutable proto or a <| like mechanism in JSON) your worker would have to make another pass over the keys to tack on the appropriate behaviors. There's no benefit to the branding info (again, no shared memory) so I don't really see the problem. Why would this JS object be substantially less useful? It just requires a slightly different paradigm -- but this is to be expected. The only alternatives I can imagine would require some kind of spec. assistance (e.g. a specified schema format or a JSON++), which I gather you were trying to avoid.
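That extra pass might look something like this on the worker side (all names here are hypothetical):

    // The sender tags each bare clone with a $type; the worker re-attaches
    // behavior by re-parenting a fresh object onto a local prototype table.
    var behaviors = {
      Point: { norm: function () { return Math.sqrt(this.x * this.x + this.y * this.y); } }
    };
    onmessage = function (event) {
      var data = event.data; // e.g. { $type: 'Point', x: 3, y: 4 }
      var revived = Object.create(behaviors[data.$type]);
      Object.keys(data).forEach(function (k) { revived[k] = data[k]; });
      revived.norm(); // 5
    };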
Generalizations about host objects are no more or less valid than generalizations about pure JS objects.
This issue applies to pure JS object graphs or any serialization scheme. Sometimes language-specific physical clones won't capture the desired semantics. (Consider, for example, an object that references a resource by using a symbolic token to access a local resource registry.) That is why the ES5 JSON encoder/decoder includes extension points such as the toJSON method, to enable semantic encodings that differ from the physical object structure.
The structured clone algorithm, as currently written, allows the passing of strings, so it is possible to use it to transmit anything that can be encoded within a string. All it needs is an application-specific encoder/decoder. It seems to me the real complication is a desire for some structured clone use cases to avoid serialization and permit sharing via a copy-on-write of a real JS object graph.
There are alternatives to CoW (dherman alluded to safely transferring ownership in his post, for instance).
If you define this sharing in terms of serialization then you probably eliminate some of the language-specific low-level sharing semantic issues. But you are still going to have higher-level semantic issues, such as what it means to serialize a File. It isn't clear to me that there is a general solution to the latter.
Why does it matter what it means to serialize a File? For the use cases in question (IndexedDB and WebWorkers) there are various paths an app could take, why would this have to be spec'ed? What does toJSON do? And does a file handle really need to make it across this serialization boundary and into your IDB store for later retrieval? I suspect not.
On Jul 15, 2011, at 10:56 AM, Dean Landolt wrote:
On Fri, Jul 15, 2011 at 1:30 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
On Jul 15, 2011, at 10:00 AM, Jonas Sicking wrote:
Except that you don't want to do that for host objects. T ...
And a cloned JS object is a lot less useful if it has lost its original [[Prototype]].
Didn't you just argue you could communicate this kind of information with a schema? You couldn't share the actual [[Prototype]] anyway. So you'd have to pass the expected behaviors along with the object (this is why a Function serialization would be wonderful, but this could be done in a schema too).
Sure, it won't be terribly efficient, since (without mutable proto or a <| like mechanism in JSON) your worker would have to make another pass over the keys to tack on the appropriate behaviors. There's no benefit to the branding info (again, no shared memory) so I don't really see the problem. Why would this JS object be substantially less useful? It just requires a slightly different paradigm -- but this is to be expected. The only alternatives I can imagine would require some kind of spec. assistance (e.g. a specified schema format or a JSON++), which I gather you were trying to avoid.
I was only objecting to any argument that starts out by essentially saying host objects have unique requirements. Anything that is an issue for host objects is likely to be an issue for some pure JS application.
Generalizations about host objects are no more or less valid than generalizations about pure JS objects.
This issue applies to pure JS object graphs or any serialization scheme. Sometimes language-specific physical clones won't capture the desired semantics. (Consider, for example, an object that references a resource by using a symbolic token to access a local resource registry.) That is why the ES5 JSON encoder/decoder includes extension points such as the toJSON method, to enable semantic encodings that differ from the physical object structure.
The structured clone algorithm, as currently written, allows the passing of strings, so it is possible to use it to transmit anything that can be encoded within a string. All it needs is an application-specific encoder/decoder. It seems to me the real complication is a desire for some structured clone use cases to avoid serialization and permit sharing via a copy-on-write of a real JS object graph.
There are alternatives to CoW (dherman alluded to safely transferring ownership in his post, for instance).
In either case there are contextual issues such as the [[Prototype]] problem. More generally, if you have a behavior-based object model (methods and accessors are the only public interface), then you really can't CoW or transfer ownership meaningfully and maintain the no-shared-state illusion.
If you define this sharing in terms of serialization then you probably eliminate some of the language-specific low-level sharing semantic issues. But you are still going to have higher-level semantic issues, such as what it means to serialize a File. It isn't clear to me that there is a general solution to the latter.
Why does it matter what it means to serialize a File? For the use cases in question (IndexedDB and WebWorkers) there are various paths an app could take, why would this have to be spec'ed? What does toJSON do? And does a file handle really need to make it across this serialization boundary and into your IDB store for later retrieval? I suspect not.
Again, just an example, and something that structured clone does deal with, even if not in a very precise manner. I was trying to say that there probably isn't a general solution for communicating higher-level semantic information. It needs to be designed on a case-by-case basis.
On Fri, Jul 15, 2011 at 2:50 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
On Jul 15, 2011, at 10:56 AM, Dean Landolt wrote:
On Fri, Jul 15, 2011 at 1:30 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
On Jul 15, 2011, at 10:00 AM, Jonas Sicking wrote:
Except that you don't want to do that for host objects. T
...
And a cloned JS object is a lot less useful if it has lost its original [[Prototype]].
Didn't you just argue you could communicate this kind of information with a schema? You couldn't share the actual [[Prototype]] anyway. So you'd have to pass the expected behaviors along with the object (this is why a Function serialization would be wonderful, but this could be done in a schema too).
Sure, it won't be terribly efficient, since (without mutable proto or a <| like mechanism in JSON) your worker would have to make another pass over the keys to tack on the appropriate behaviors. There's no benefit to the branding info (again, no shared memory) so I don't really see the problem. Why would this JS object be substantially less useful? It just requires a slightly different paradigm -- but this is to be expected. The only alternatives I can imagine would require some kind of spec. assistance (e.g. a specified schema format or a JSON++), which I gather you were trying to avoid.
I was only objecting to any argument that starts out by essentially saying host objects have unique requirements. Anything that is an issue for host objects is likely to be an issue for some pure JS application.
Okay, but what of the assertion itself? Must a cloned JS object maintain its [[Prototype]]? I'm just curious as to why.
Generalizations about host objects are no more or less valid than generalizations about pure JS objects.
This issue applies to pure JS object graphs or any serialization scheme. Sometimes language-specific physical clones won't capture the desired semantics. (Consider, for example, an object that references a resource by using a symbolic token to access a local resource registry.) That is why the ES5 JSON encoder/decoder includes extension points such as the toJSON method, to enable semantic encodings that differ from the physical object structure.
The structured clone algorithm, as currently written, allows the passing of strings, so it is possible to use it to transmit anything that can be encoded within a string. All it needs is an application-specific encoder/decoder. It seems to me the real complication is a desire for some structured clone use cases to avoid serialization and permit sharing via a copy-on-write of a real JS object graph.
There are alternatives to CoW (dherman alluded to safely transferring ownership in his post, for instance).
In either case there are contextual issues such as the [[Prototype]] problem. More generally, if you have a behavior-based object model (methods and accessors are the only public interface), then you really can't CoW or transfer ownership meaningfully and maintain the no-shared-state illusion.
By whose definition of meaningful? IIUC you're asserting that directly sharing context like [[Prototype]] is both important and impossible. I contend that the behavior-based object model can be shared, if only indirectly, by completely detaching it from the main thread's "deeply intertwined, deeply mutable object graph" (to borrow dherman's colorful phrase). This would almost certainly require spec. support but I can think of at least a few ways to do it. If something like this were doable it could open the door for the most efficient structured clone I can think of: no clone at all.
If you define this sharing in terms of serialization then you probably eliminate some of the language-specific low-level sharing semantic issues. But you are still going to have higher-level semantic issues, such as what it means to serialize a File. It isn't clear to me that there is a general solution to the latter.
Why does it matter what it means to serialize a File? For the use cases in question (IndexedDB and WebWorkers) there are various paths an app could take, why would this have to be spec'ed? What does toJSON do? And does a file handle really need to make it across this serialization boundary and into your IDB store for later retrieval? I suspect not.
Again, just an example, and something that structured clone does deal with, even if not in a very precise manner. I was trying to say that there probably isn't a general solution for communicating higher-level semantic information. It needs to be designed on a case-by-case basis.
Indeed, certain applications may require custom handling. But it would be great if there were an easy and obvious default. It's good enough for JSON, which is a very similar use case (especially in the context of IDB).
So I'm still curious: just how important is it to transmit a faithful representation of an object, prototype and all, for the WebWorker use case? I suspect a few compromises can be made to get to an efficient postMessage that could sidestep Structured Clone entirely. Wouldn't this be a more desirable outcome anyway?
On Jul 15, 2011, at 1:45 PM, Dean Landolt wrote:
On Fri, Jul 15, 2011 at 2:50 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
Okay, but what of the assertion itself? Must a cloned JS object maintain its [[Prototype]]? I'm just curious as to why.
If it is a local clone, yes. Essential parts of an object's behavior (and even state) may be defined by the objects along its [[Prototype]] chain. If you eliminate those it isn't behaviorally the same kind of object.
If you are talking about cloning a non-local copy, then it depends upon what you are really trying to accomplish. If you are trying to create a behaviorally equivalent clone in a remote but similar environment, and you are dealing with some sort of built-in object, then maybe you will be satisfied with just connecting to the equivalent built-in prototypes in the remote system (this is essentially what structured clone is doing for well-known JS objects like RegExp and Date). If it is an application-defined object with application-defined prototypes, you may want to first force remote loading of your application so you can connect to it. Or maybe you want to also serialize the [[Prototype]] chain as part of the remote cloning operation, or something else. Maybe all you really want to do is just clone some static application data without any behavioral component at all (essentially what JSON does).
...
In either case there are contextual issues such as the [[Prototype]] problem. More generally, if you have a behavior-based object model (methods and accessors are the only public interface), then you really can't CoW or transfer ownership meaningfully and maintain the no-shared-state illusion.
By whose definition of meaningful? IIUC you're asserting that directly sharing context like [[Prototype]] is both important and impossible. I contend that the behavior-based object model can be shared, if only indirectly, by completely detaching it from the main thread's "deeply intertwined, deeply mutable object graph" (to borrow dherman's colorful phrase). This would almost certainly require spec. support but I can think of at least a few ways to do it. If something like this were doable it could open the door for the most efficient structured clone I can think of: no clone at all.
Perhaps, if you had the concept of immutable (and identity-free?) behavioral specifications, then they could be reified in multiple environments, perhaps sharing an underlying representation. But that isn't really how JavaScript programs are constructed today.
..
Indeed, certain applications may require custom handling. But it would be great if there were an easy and obvious default. It's good enough for JSON, which is a very similar use case (especially in the context of IDB).
So I'm still curious: just how important is it to transmit a faithful representation of an object, prototype and all, for the WebWorker use case? I suspect a few compromises can be made to get to an efficient postMessage that could sidestep Structured Clone entirely. Wouldn't this be a more desirable outcome anyway?
Here's how I'd put it: if JSON is good enough for HTTP server/browser client communications, why isn't it good enough for communicating with a Web worker? It seems we would have better scalability if a task could be fairly transparently assigned to either a local worker or a remote compute server, depending upon local capabilities, etc.
My experience (and I've worked with a lot of different OO languages and environments) is that transparent marshalling of object models for either communications or storage seldom ends up being a good long-term solution. It seems very attractive but leads to problems such as schema evolution issues (particularly when long-term storage is involved). Attractive nuisance; don't do it :-)
On 7/15/11 1:37 PM, Dean Landolt wrote:
Is it really a problem if host objects don't survive in full across serialization boundaries?
Depending on what you mean by "in full", yes.
As you say "All APIs that use structured cloning are pretty explicit. Things like Worker.postMessage and IDBObjectStore.put pretty explicitly creates a new copy." If you expect host objects to survive across that boundary you'll quickly learn otherwise, and it won't take long to grok the difference.
The whole point of structured cloning is to pass across objects in a way that's pretty difficult to do via serialization using existing ES5 reflection facilities.
Java draws a distinction between marshalling and serialization which might be useful to this discussion:
Structured clone is closer to marshalling.
On 14/07/2011 21:46, Mark S. Miller wrote:
And finally there's the issue raised by David on the es-discuss thread: What should the structured clone algorithm do when encountering a proxy? The algorithm as coded below will successfully "clone" proxies, for some meaning of clone. Is that the clone behavior we wish for proxies?
I just wanted to point out that my thread was not about applying the structured clone algorithm to proxies, but about implementing it with proxies (that is, having the JS engine implement a lazy copy using proxies). However, it is indeed a very interesting question to ask.
At the thread "LazyReadCopy experiment and invariant checking for [[Extensible]]=false" on es-discuss, On Wed, Jul 13, 2011 at 10:29 AM, David Bruant <david.bruant at labri.fr>wrote:
Along with Dave Herman <blog.mozilla.com/dherman/2011/05/25/im-worried-about-structured-clone>, I'm worried about structured clone <www.w3.org/TR/html5/common-dom-interfaces.html#safe-passing-of-structured-data>.
In order to understand it better before criticizing it, I tried implementing it in ES5 + WeakMaps. My code appears below. In writing it, I noticed some ambiguities in the spec, so I implemented my best guess about what the spec intended.
Aside: Coding this so that it is successfully defensive against changes to primordial bindings proved surprisingly tricky, and the resulting coding patterns quite unpleasant. See the explanatory comment early in the code below. Separately, we should think about how to better support defensive programming for code that must operate in the face of mutable primordials.
Ambiguities:
Are the above interpretations correct?
Given the access to shared mutability implied by #4, I'm wondering why MessagePorts are passed separately, rather than simply being another special case, like File, in the structured clone algorithm.
I've been advising people to avoid the structured clone algorithm, and send only JSON serializations + MessagePorts through postMessage. It's unclear to me why structured clone wasn't instead defined to be more equivalent to JSON, or to a well chosen subset of JSON. Given that they're going to co-exist, it behooves us to understand their differences better, so that we know when to advise JSON serialization/unserialization around postMessage vs. just using structured clone directly.
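That advice, concretely (a sketch; otherWindow, targetOrigin, state, and port are placeholders, and the ports argument follows the HTML postMessage signature):

    // Send a JSON string plus any ports, bypassing structured clone's
    // object handling entirely; the receiver parses the text itself.
    otherWindow.postMessage(JSON.stringify({ op: 'update', payload: state }),
                            targetOrigin, [port]);
    // Receiver:
    //   var message = JSON.parse(event.data);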
There is a fixed set of data types recognized as special cases by this algorithm. Unlike JSON, there are no extension points for a user-defined abstraction to cause its own instances to effectively be cloned, with behavior, across the boundary. But neither do we gain the advantage of avoiding calls to user code interleaved with the structured clone algorithm, if my resolution of #3 is correct, since these [[Get]] calls can call getters.
In ES6 we intend to reform [[Class]]. Allen's ES6 draft <harmony:specification_drafts> makes a valiant start at this. How would we revise structured clone to account for [[Class]] reform?
And finally there's the issue raised by David on the es-discuss thread: What should the structured clone algorithm do when encountering a proxy? The algorithm as coded below will successfully "clone" proxies, for some meaning of clone. Is that the clone behavior we wish for proxies?
------------- sclone.js ------------------------
var sclone;
(function () { "use strict";
// The following initializations are assumed to capture initial // bindings, so that sclone is insensitive to changes to these // bindings between the creation of the sclone function and calls // to it. Note that {@code call.bind} is only called here during // initialization, so we are insensitive to whether this changes to // something other than the original Function.prototype.bind after // initialization.
var Obj = Object; var WM = WeakMap; var Bool = Boolean; var Num = Number; var Str = String; var Dat = Date; var RE = RegExp; var Err = Error; var TypeErr = TypeError;
var call = Function.prototype.call;
var getValue = call.bind(WeakMap.prototype.get); var setValue = call.bind(WeakMap.prototype.set);
var getClassRE = (/[object (.*)]/); var exec = call.bind(RegExp.prototype.exec); var toClassString = call.bind(Object.prototype.toString); function getClass(obj) { return exec(getClassRE, toClassString(obj))[1]; }
var valueOfBoolean = call.bind(Boolean.prototype.valueOf); var valueOfNumber = call.bind(Number.prototype.valueOf); var valueOfString = call.bind(String.prototype.valueOf); var valueOfDate = call.bind(Date.prototype.valueOf);
var keys = Object.keys; var forEach = call.bind(Array.prototype.forEach);
var defProp = Object.defineProperty;
// Below this line, we should no longer be sensitive to the current // bindings of built-in services we rely on.
sclone = function(input) {
}; })();