Object IDs
On 3/16/07, P T Withington <ptw at pobox.com> wrote:
On 2007-03-16, at 05:23 EDT, Lars T Hansen wrote:
The use case for object IDs seems less clear.
To visually distinguish otherwise similar objects in a debugger, for one.
Granted...
If the language does not support object IDs internally, say in the introspection interface, you have a choice of maintaining a table (which the GC may already be doing), or annotating (but also polluting) the object (which the GC also may already be doing, but in a pollution-free manner). So, while you can simulate object IDs, it should be cleaner and more efficient to provide an introspection interface to them. (I'm not suggesting that hashcode should be the object ID, just that you might want to support object IDs.)
I actually thought the meta-object proposal was flagged "optional", but it's now mandated, otherwise I'd propose adding object IDs there. Still, that'd be the best place for the functionality, as you suggest.
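The table-versus-annotation choice described above can be sketched in modern JavaScript. This is an illustration only: it assumes an engine with WeakMap (standardized in ES2015, long after this thread), and objectId and annotatedId are hypothetical helpers, not part of any ES4 proposal. The side table leaves the object untouched; annotation adds a hidden property and cannot work on a sealed object.

```javascript
// Hypothetical helper: object IDs kept in a side table, so the objects
// themselves are never annotated (or polluted).
const ids = new WeakMap(); // entries do not keep their keys alive
let nextId = 1;

function objectId(obj) {
  if (!ids.has(obj)) {
    ids.set(obj, nextId++); // first request: allocate a fresh ID
  }
  return ids.get(obj);
}

// Hypothetical alternative: stash the ID on the object itself.
// Non-enumerable, but still visible to reflection, and impossible
// once the object has been sealed (made non-extensible).
function annotatedId(obj) {
  if (!Object.prototype.hasOwnProperty.call(obj, "__id")) {
    if (!Object.isExtensible(obj)) return undefined; // cannot annotate
    Object.defineProperty(obj, "__id", { value: nextId++, enumerable: false });
  }
  return obj.__id;
}

const a = { name: "widget" };
const b = { name: "widget" };              // indistinguishable in a debugger
console.log(objectId(a), objectId(b));     // two different IDs
console.log(objectId(a) === objectId(a));  // true: stable across calls
console.log(annotatedId(Object.seal({}))); // undefined: sealed, cannot annotate
```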
On Mar 16, 2007, at 4:23 AM, Lars T Hansen wrote:
IMO we'd be better off following Java here and just spell out (more) clearly that you can't use the hash code as an object ID. The purpose of intrinsic::hashcode is after all to allow fast object-identity hashing. The use case for object IDs seems less clear.
Here's my problem: if one doesn't understand the context behind
intrinsic::hashcode, it looks like a bug:

    var key = new Widget();       // our unique lookup "key"
    var val = new Acme();         // value to reference
    var hash = {};                // our "hash table"
    var keycode = hashcode(key);  // our hash table key
    hash[keycode] = val;          // UNSAFE
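Why the last line is unsafe can be shown with a stand-in. Plain JavaScript has no intrinsic::hashcode, so fakeHashcode below is hypothetical: like the proposed built-in it is stable per object but not unique, and it is deliberately limited to four values so a collision is easy to observe.

```javascript
// Hypothetical stand-in for intrinsic::hashcode: stable per object,
// NOT unique. Only 4 distinct codes, so collisions come quickly.
const codes = new WeakMap();
let counter = 0;
function fakeHashcode(obj) {
  if (!codes.has(obj)) codes.set(obj, counter++ % 4);
  return codes.get(obj);
}

const hash = {};               // the "hash table" from the example
const keys = [];
for (let i = 0; i < 5; i++) {  // five keys, only four hash codes
  const key = { n: i };
  keys.push(key);
  hash[fakeHashcode(key)] = "value for key " + i; // UNSAFE
}

// keys[0] and keys[4] share a hash code, so the second store
// silently overwrote the first.
console.log(hash[fakeHashcode(keys[0])]); // value for key 4
```

A correct table has to keep the original keys and compare them after the hash lookup, which is exactly the collision handling the code above omits.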
I see from the discussion page that this may be tied into some sort
of dictionary datatype? I'm having trouble understanding the use
case for this otherwise. Sorry if I'm missing the point of this
feature, the context available in the docs is either missing or over
my head. :(
The use case that brought me to wondering about this was some vague
brainstorming about a generic data binding architecture. Given that
es4 has sealed objects, the only way to safely reference a dependent
object from its "data source" is to a) create a wrapper object (which
could get expensive) or b) use a lookup table.
But, there's no way in the current spec to use a sealed object as the
key in a table, unless you implement your own lookup mechanism that
accounts for collisions, but then again it seems a bit silly to
implement a hash table in JavaScript, doesn't it?
It's not a particularly strong use case, but my point is that if you
start from the position of wanting object A to reference object B,
but object A could be sealed, then there's no stupid-simple way to
set up the reference. (OTOH, for unsealed objects it's safe to
simply add B as a member of A, provided you DontEnum it, which is
awfully handy and simple.)
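The two options above (a hidden member on unsealed objects, a lookup table otherwise) can be sketched together. This assumes modern JavaScript: Object.seal stands in, approximately, for ES4's sealed objects, WeakMap for the lookup table, and addDependent, dependentsOf, and the "__dependents" name are all hypothetical.

```javascript
// Hypothetical data-binding helper: remember that `source` feeds `dependent`.
const HIDDEN = "__dependents";   // hypothetical property name
const sideTable = new WeakMap(); // lookup table for sealed sources

function addDependent(source, dependent) {
  let list;
  if (Object.prototype.hasOwnProperty.call(source, HIDDEN)) {
    list = source[HIDDEN];                     // already annotated
  } else if (Object.isExtensible(source)) {
    list = [];
    // Non-enumerable, like ES3's DontEnum: invisible to for-in and Object.keys.
    Object.defineProperty(source, HIDDEN, { value: list, enumerable: false });
  } else {
    list = sideTable.get(source);              // sealed: use the lookup table
    if (!list) {
      list = [];
      sideTable.set(source, list);
    }
  }
  list.push(dependent);
}

function dependentsOf(source) {
  if (Object.prototype.hasOwnProperty.call(source, HIDDEN)) return source[HIDDEN];
  return sideTable.get(source) || [];
}

const plain = { value: 1 };
const locked = Object.seal({ value: 2 });
addDependent(plain, "label A");
addDependent(locked, "label B");
console.log(Object.keys(plain));   // only "value": the annotation stays hidden
console.log(dependentsOf(locked)); // found via the side table
```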
On 3/16/07, Neil Mix <nmix at pandora.com> wrote:
On Mar 16, 2007, at 4:23 AM, Lars T Hansen wrote:
IMO we'd be better off following Java here and just spell out (more) clearly that you can't use the hash code as an object ID. The purpose of intrinsic::hashcode is after all to allow fast object-identity hashing. The use case for object IDs seems less clear.
Here's my problem: if one doesn't understand the context behind intrinsic::hashcode, it looks like a bug:

    var key = new Widget();       // our unique lookup "key"
    var val = new Acme();         // value to reference
    var hash = {};                // our "hash table"
    var keycode = hashcode(key);  // our hash table key
    hash[keycode] = val;          // UNSAFE
Sure, but the most I can do is write a clear spec. If a programmer reads guarantees into the spec that are not there (or indeed does not understand the non-guarantees spelled out) then there's not much I can do.
I see from the discussion page that this may be tied into some sort of dictionary datatype?
So far we've failed to put a Dict type into the language; the utility is understood but there's not been a critical push. At this point it may have been back-burnered long enough to be postponed until 5th Edition.
I'm having trouble understanding the use case for this otherwise. Sorry if I'm missing the point of this feature, the context available in the docs is either missing or over my head. :(
It's useful for building good hash tables that can use any object for a key.
The use case that brought me to wondering about this was some vague brainstorming about a generic data binding architecture. Given that es4 has sealed objects, the only way to safely reference a dependent object from its "data source" is to a) create a wrapper object (which could get expensive) or b) use a lookup table.
But, there's no way in the current spec to use a sealed object as the key in a table, unless you implement your own lookup mechanism that accounts for collisions, but then again it seems a bit silly to implement a hash table in JavaScript, doesn't it?
Not particularly. Objects are nice and clean: they map strings to values. Mapping arbitrary values to values is sometimes useful but more rarely needed, in my experience. The language could provide a built-in library (or even syntax support) for it, but ECMAScript is not Java, and the language probably should only provide libraries that have substantial utility across the user base (just my opinion).
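What "objects map strings to values" means in practice: any other key is first converted to a string, so distinct objects used as keys collapse onto a single property. This is ordinary JavaScript behavior; the snippet only uses modern `const` for brevity.

```javascript
// Property names are strings: any other key is converted with ToString first.
const table = {};
const k1 = { id: 1 };
const k2 = { id: 2 };
table[k1] = "first";
table[k2] = "second";             // both keys become "[object Object]"
console.log(table[k1]);           // second: k2 overwrote k1's entry

table[1] = "number";              // the number 1 becomes the name "1"
console.log(table["1"]);          // number
console.log(Object.keys(table).length); // 2
```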
So far we've failed to put a Dict type into the language; the utility is understood but there's not been a critical push. At this point it may have been back-burnered long enough to be postponed until 5th Edition.
It feels more critical than the lambda syntax that is getting so much attention in the other current thread. What is needed in order to push this? I use AS3's Dictionary quite regularly for its weak keys, and I imagine there would be growing need in JS as the number and complexity of ajax apps increases.
Peter
On 3/20/07, Peter Hall <peter.hall at memorphic.com> wrote:
So far we've failed to put a Dict type into the language; the utility is understood but there's not been a critical push. At this point it may have been back-burnered long enough to be postponed until 5th Edition.
It feels more critical than the lambda syntax that is getting so much attention in the other current thread.
Amen to that, though there are several kinds of dictionary types one could discuss, from the simple property-less object type (more primitive than Object) to elaborate dictionary classes.
What is needed in order to push this? I use AS3's Dictionary quite regularly for its weak keys, and I imagine there would be growing need in JS as the number and complexity of ajax apps increases.
We're technically way beyond the deadline for adding something like weak tables (not something one implements on the user level) so I don't know what it would take, really. ByteArray made it in late but it's a very simple proposal.
On 2007-03-20, at 21:09 EDT, Lars T Hansen wrote:
It feels more critical than the lambda syntax that is getting so much attention in the other current thread.
Amen to that, though there are several kinds of dictionary types one could discuss, from the simple property-less object type (more primitive than Object) to elaborate dictionary classes.
The last time this came up, I said:
On 2007-01-05, at 20:14 EST, Brendan Eich wrote:
Another item falling out of the ES4 spec: hashes mapping string to
value where the mapping is not polluted by Object.prototype. A
late "save" may be possible, if anyone can suggest syntax. E.g.,
var hash = #{'foo':1, 'bar':2, 'baz':3}; alert('toString' in hash)
=> false. Eek, yet another attempt to use #.

Since you can't build a pure hash in Javascript, this would be a
highly desirable addition. Naive use of Object for hash has been
the source of a number of subtle bugs in our code. One might even
be so bold as to make Hash the primitive and Object inherit from it?

We would have many uses for Hash in our code base. I have defined
a dictionary class that I use for some cases, but often have had to
trade correctness for performance.

A literal syntax would not be that important if you could have a
constructor with named arguments. (Because the constructor's
arguments property would be a Hash?) It would also be useful to
have a constructor that constructed a new hash from an existing one.

(Should these map value to value, rather than string to value?
E4X (ECMA 357) already introduced QName objects as identifiers, so
one can't pretend all properties are named by strings, if one
believes in E4X.)

value -> value would be a bonus that I would greatly appreciate,
but then won't you need to define a protocol for extending hash-code
computation and ===?
I think having a pollution-free dictionary type would be much more
important than the => syntax shortcut. Having a dictionary that
can map values to values would be a bonus, but also important. You
can't do either of these things efficiently in the language, which is
a good reason to make them built in.
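The pollution problem, and one later way around it, can be sketched as follows. Object.create(null) arrived with ES5, after this thread; it only approximates the proposed #{...} literal (a string-to-value map with no Object.prototype). pureHash is a hypothetical helper that also covers the "new hash from an existing one" constructor mentioned above.

```javascript
// A naive Object-based hash inherits from Object.prototype.
const naive = { foo: 1, bar: 2, baz: 3 };
console.log("toString" in naive);      // true: inherited, a source of bugs

// Hypothetical helper: copy own string keys into a prototype-less object.
// Works for plain objects and for hashes built by pureHash itself.
function pureHash(pairs) {
  const h = Object.create(null);       // no Object.prototype in the chain
  for (const k of Object.keys(pairs)) h[k] = pairs[k];
  return h;
}

const hash = pureHash({ foo: 1, bar: 2, baz: 3 });
console.log("toString" in hash);       // false
console.log("foo" in hash, hash.foo);  // true 1
```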
Yes, the lambda discussion pales in comparison.
On Mar 20, 2007, at 6:34 PM, P T Withington wrote:
I think having a pollution-free dictionary type would be much more important than the => syntax shortcut. Having a dictionary that can map values to values would be a bonus, but also important. You can't do either of these things efficiently in the language, which is a good reason to make them built in.
You're right. I thought so when I brought this up a couple of months
ago, and got "no new features" as the auto-reply. As Lars notes, that
is very likely to be the answer again, but I'm willing to try one
more time.
I would like a value -> value map unpolluted by prototype delegation.
The string -> value map without pollution can be implemented by
converting arbitrary keys to string and using the value -> value map.
As noted previously, objects in ES4 (which introduces Name objects to
represent namespace-qualified property names) and ES3 with E4X (which
calls Name QName) map not just string -> value, but (string, Name) ->
value. o.q::n is not the same as o['q::n'].
The developer.mozilla.org/es4/proposals/catchalls.html
proposal leaves ident untyped, but it can't be of type *. Practical
implementations that I know of would type it (int, string, Name) or
something similar (int might be constrained to fewer than 32 bits in
a tagged word). Catchalls therefore look tempting for building
value -> value maps, but without changing the inside of every ES3
implementation to treat property names as arbitrary values instead of
interned strings and tagged ints, or possibly qualified-name objects,
the key type will be constrained.
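The same constraint shows up in a later analogue of catchalls. JavaScript's Proxy (ES2015) is not the ES4 catchalls proposal, but its "get" trap plays a similar role, and the key it receives has already been converted to a property key (a string or a symbol), so an object key is not seen as itself.

```javascript
// Modern analogue of a catchall getter: a Proxy "get" trap.
const seenKeys = [];
const catchall = new Proxy({}, {
  get(target, key) {
    seenKeys.push(key); // record the key exactly as the trap receives it
    return 42;
  },
});

catchall[{ any: "object" }]; // an object used as a key
catchall[7];                 // a number used as a key
console.log(seenKeys);       // '[object Object]' and '7', both strings
```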
So a value -> value map looks like a very special object, not
something accessed via o.p or o[i] or o.q::n or o.q::[i]. This brings
us back to the Flash Dictionary class
(livedocs.adobe.com/flex/2/langref/flash/utils/Dictionary.html#methodSummary)
from AS3. Note the weakKeys constructor parameter. This class does suffer
prototype-delegated pollution, but does not intern its identifiers, instead
using === to partition values into key equivalence classes (so -0 ===
0 and NaN !== NaN).
Do we want a class that has magic (special-cased) property lookup
rules, that does not delegate to Object.prototype, and that somehow
can be operated upon without the standard methods from
Object.prototype, or overrides for these (e.g., toString)?
I think not. The minimal solution, for which intrinsic::hashcode was
devised, is a class with methods you have to call, say has, get, and
put. No property lookup syntax for testing whether a word is in the
dictionary, and no literal syntax for constructing instances by
writing the key/value pairs in succession. But it could be part of
the standard, so you could count on it everywhere. It could be
written in ES4 if it were acceptable to hold only strong references
to the keys.
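A minimal sketch of such a class, with has, get, and put, strong references to keys, and === key equivalence as described for the AS3 Dictionary. Dict is hypothetical, and since plain JavaScript has no intrinsic::hashcode, a stand-in named hashcode is defined here; it is stable but deliberately not unique, and the fixed bucket count means unrelated keys share buckets, so lookups always confirm with ===.

```javascript
// Hypothetical stand-in for ES4's intrinsic::hashcode: stable for a given
// value, but NOT unique (object codes wrap around after 256).
const objCodes = new WeakMap();
let nextCode = 0;
function hashcode(v) {
  if ((typeof v === "object" && v !== null) || typeof v === "function") {
    if (!objCodes.has(v)) objCodes.set(v, nextCode++ & 0xff);
    return objCodes.get(v);
  }
  // Primitives: a simple string hash over type tag plus string form.
  const s = typeof v + ":" + String(v);
  let h = 0;
  for (let i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) | 0;
  return h;
}

// Minimal value -> value dictionary with has/get/put. It holds strong
// references to its keys and compares them with ===, so -0 and 0 are
// the same key and a NaN key can never be found again.
class Dict {
  constructor(nBuckets = 8) {
    this.buckets = Array.from({ length: nBuckets }, () => []);
    this.size = 0;
  }
  _bucket(key) {
    const n = this.buckets.length;
    return this.buckets[((hashcode(key) % n) + n) % n]; // collisions share a bucket
  }
  _entry(key) {
    return this._bucket(key).find(entry => entry.key === key);
  }
  has(key) {
    return this._entry(key) !== undefined;
  }
  get(key) {
    const entry = this._entry(key);
    return entry ? entry.value : undefined;
  }
  put(key, value) {
    const entry = this._entry(key);
    if (entry) {
      entry.value = value;
    } else {
      this._bucket(key).push({ key, value });
      this.size++;
    }
  }
}

const dict = new Dict();
const widget = {};
dict.put(widget, "acme");
dict.put(0, "zero");
console.log(dict.get(widget), dict.get(-0)); // acme zero
console.log(dict.has(NaN));                  // false
```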
Would this be enough? If so, I think TG1 ought to consider it. It
would be a small addition to the standard library.
Flash Dictionary users who rely on weakKeys = true should pipe up for
the use-cases that require this.
Brendan Eich wrote:
Would this be enough? If so, I think TG1 ought to consider it. It
would be a small addition to the standard library.
This sort of thing -- either in app frameworks or the standard library -- was certainly the purpose of adding intrinsic::hashcode. I hoped to do a whole container library in the process of ES4, but it got shelved.
Flash Dictionary users who rely on weakKeys = true should pipe up for
the use-cases that require this.
I admit to not having considered that scenario. Perhaps a general Weak<T> type would do?
a class with methods you have to call, say has, get, and put. No property lookup syntax for testing whether a word is in the dictionary, and no literal syntax for constructing instances by writing the key/value pairs in succession. But it could be part of the standard, so you could count on it everywhere.
Moving intrinsic::hashcode into this proposed class would assuage any
concerns I have about the hashcode proposal.
On Mar 20, 2007, at 6:30 PM, Lars T Hansen wrote:
Sure, but the most I can do is write a clear spec. If a programmer reads guarantees into the spec that are not there (or indeed does not understand the non-guarantees spelled out) then there's not much I can do.
Developers read documentation? ;P The spec is very clear (thank you
Lars), anyone who reads it and jumps to the wrong conclusion has it
coming to them. I worry more about developers who see the method
listed in reference docs that proliferate on the web (devguru.com,
gotAPI.com, etc), the reliability of which may be questionable. A
globally scoped method named hashcode with a summary of "returns a
hashcode for the given object"... well let's just say that I could
see a younger version of me making an assumption I ought not. ;)
I think the global scoping is my concern, it looks too all-purpose in
that context. Hence scoping to Brendan's proposed class suits me
just fine.
On 3/20/07, Neil Mix <nmix at pandora.com> wrote:
On Mar 20, 2007, at 6:30 PM, Lars T Hansen wrote:
Sure, but the most I can do is write a clear spec. If a programmer reads guarantees into the spec that are not there (or indeed does not understand the non-guarantees spelled out) then there's not much I can do.
Developers read documentation? ;P The spec is very clear (thank you Lars), anyone who reads it and jumps to the wrong conclusion has it coming to them. I worry more about developers who see the method listed in reference docs that proliferate on the web (devguru.com, gotAPI.com, etc), the reliability of which may be questionable. A globally scoped method named hashcode with a summary of "returns a hashcode for the given object"... well let's just say that I could see a younger version of me making an assumption I ought not. ;)
I think the global scoping is my concern, it looks too all-purpose in that context. Hence scoping to Brendan's proposed class suits me just fine.
Do you really think that people will assume that hash means perfect hash? I've never seen it taught or used that way. Maybe it's a common programming mistake that I'm blissfully unaware of?
On Mar 20, 2007, at 10:22 PM, Graydon Hoare wrote:
Flash Dictionary users who rely on weakKeys = true should pipe up for the use-cases that require this.
I admit to not having considered that scenario. Perhaps a general Weak<T> type would do?
In the spirit of intrinsic::hashcode, a WeakRef.<T> built-in type
would give library authors the tools needed to make their own
dictionary classes till the cows come home. But again I would hope
that we could provide one simple, guaranteed-to-be-there, default
implementation.
We still might want WeakRef.<T> for other use-cases.
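JavaScript did later gain a standard WeakRef class (ES2021) with a deref() method. It is not the WeakRef.<T> proposed here, but it allows a sketch of the kind of library such a primitive enables: a cache whose values, not keys, are held weakly. WeakValueCache is hypothetical.

```javascript
// Hypothetical weak-valued cache built on the standard WeakRef class.
// A value may be collected once nothing else references it; stale
// entries are pruned lazily on lookup.
class WeakValueCache {
  constructor() {
    this.map = new Map(); // key -> WeakRef(value)
  }
  set(key, value) {
    this.map.set(key, new WeakRef(value)); // value must be an object
  }
  get(key) {
    const ref = this.map.get(key);
    if (!ref) return undefined;
    const value = ref.deref(); // undefined once the value has been collected
    if (value === undefined) this.map.delete(key);
    return value;
  }
}
```

Whether and when a value disappears is up to the garbage collector, so code using such a cache has to treat every miss as "recompute".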
We still might want WeakRef.<T> for other use-cases.
WeakRef.<T> could be derived from something like the AS3 Dictionary -
though not optimally. However, I think it's worth clarifying here that the AS3 dictionary has weak keys, but not values. Weak values would be nice-to-have, but can also be implemented on the back of a weak-key dictionary.
Peter
FWIW, the standard Flex Framework contains the string "new Dictionary" 19 times, of which 10 are weak keyed.
Peter
On 21/03/07, Brendan Eich <brendan at mozilla.org> wrote:
Catchalls therefore look tempting for building value -> value maps,
It looks like the proposal misses function intrinsic::delete(ident) and perhaps intrinsic::has(ident). The latter is necessary to allow objects with catchalls to be used under with() {}.
but without changing the inside of every ES3 implementation to treat property names as arbitrary values instead of interned strings and tagged ints, or possibly qualified-name objects, the key type will be constrained.
Rhino does not treat property names as interned strings; it does the full string equality test if necessary during the property search.
Igor
On 3/21/07, Peter Hall <peter.hall at memorphic.com> wrote:
FWIW, the standard Flex Framework contains the string "new Dictionary" 19 times, of which 10 are weak keyed.
Here are some examples of those usages:
CSSStyleDeclaration uses it to keep track of clones of itself, so it can update them after changes to itself. The clones are created to be inserted into prototype chains of other CSSStyleDeclarations, which is how styles are inherited to subclasses.
To cache the result of expensive operations (eg HierarchicalCollectionView and DefaultDataDescriptor)
To implement unique ids. (uids are mainly used in list controls and collections).
To implement a simple weak reference. (eg ModuleManager)
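The caching use (the second item above) maps naturally onto a weak-keyed table. This sketch uses JavaScript's WeakMap (ES2015) as a stand-in for a weak-keyed AS3 Dictionary; memoizeByObject is hypothetical. One difference worth noting: a WeakMap cannot be enumerated, whereas an AS3 Dictionary can.

```javascript
// Hypothetical per-object memoizer with weak keys: the cached result
// becomes collectable together with the object it was computed for.
function memoizeByObject(fn) {
  const cache = new WeakMap();
  return function (obj) {
    if (!cache.has(obj)) cache.set(obj, fn(obj)); // compute once per object
    return cache.get(obj);
  };
}

let calls = 0;
const describe = memoizeByObject(node => {
  calls++; // count how often the "expensive" operation really runs
  return "node with " + node.children.length + " children";
});

const root = { children: [1, 2, 3] };
describe(root);
describe(root);
console.log(calls); // 1
```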
Peter
On Mar 21, 2007, at 1:05 AM, Bob Ippolito wrote:
Do you really think that people will assume that hash means perfect hash? I've never seen it taught or used that way. Maybe it's a common programming mistake that I'm blissfully unaware of?
I agree with you in the general case, but context is everything. I
think for many developers JavaScript Objects are synonymous with
hashtables, and therefore it's not unreasonable to expect that a
global hashcode method would generate keys for use in the native
tables. (Consider that the equivalent Java method can be safely used
for object keys in the "native" Java Hashtable.) Within the context
that a developer is likely to learn about this method, there's
nothing that makes it obviously clear that this method is intended
for building your own hash table class, and should be avoided for
general-purpose object keys.
I kinda doubt there will be widespread misuse of this method
(although it's possible), but I also imagine that more than a few
people would see it and think "wtf"? Who knows, maybe I'm wrong.
A side remark about requiring intrinsic::hashcode to be unique. It can be extremely expensive to implement that in Rhino since Java does not require that Object.hashCode returns unique numbers. From java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode() :
Igor Bukanov scripsit:
A side remark about requiring intrinsic::hashcode to be unique. It can be extremely expensive to implement that in Rhino since Java does not require that Object.hashCode returns unique numbers. From java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode()
What's worse, Java encourages class authors to override hashCode, and there is no basicHashCode method (as in Smalltalk) to retrieve the approximation-to-object-identity. (IMHO you ought not to override equals and hashCode unless your class represents values, but Java programmers routinely do so.)
On 21/03/07, John Cowan <cowan at ccil.org> wrote:
What's worse, Java encourages class authors to override hashCode, and there is no basicHashCode method (as in Smalltalk) to retrieve the approximation-to-object-identity.
Java provides System.identityHashCode for that, java.sun.com/javase/6/docs/api/java/lang/System.html#identityHashCode(java.lang.Object)
Igor
I have a question about the feasibility of implementing intrinsic::hashcode (even with collisions) efficiently in an implementation with a copying garbage collector. It seems to me that it would require putting an id in each and every object to ensure that the hashcode stays the same during the lifetime of the object. Or is there some trick to avoid that?
On the other hand, Dictionary can be implemented efficiently even if the GC moves objects, since the GC can rehash the table.
Igor
On 3/21/07, Igor Bukanov <igor at mir2.org> wrote:
I have a question about the feasibility of implementing intrinsic::hashcode (even with collisions) efficiently in an implementation with a copying garbage collector. It seems to me that it would require putting an id in each and every object to ensure that the hashcode stays the same during the lifetime of the object. Or is there some trick to avoid that?
Ole Agesen published a trick a few years back in TAPOS (www.wiley.com/legacy/compbooks/object/index.html). You return the object's address when the hashcode is first obtained and set a bit in the object header. If the object survives a relocating collection and the bit is set, a field is prefixed to the header that records the original address. So you need two free bits in the header in every relocatable object, but that beats needing a full word.
You can also use a system-wide weak table that records objects that have hash codes, I imagine, with rehashes following GC, and it's not obvious this won't be efficient enough in practice (but it's not as elegant).
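The header-bit trick described above can be modeled as a toy. Everything here is hypothetical (HeapObject, identityHash, relocate, numeric "addresses"); it is not how any particular VM is written. One bit records that a hash was handed out; a second records that the object has since moved and carries an extra field holding its original address. Two objects can still end up with the same code (one hashed at an address another later moves into), which is allowed, since hash codes need not be unique.

```javascript
// Toy model of an object header under a copying collector.
class HeapObject {
  constructor(addr) {
    this.addr = addr;
    this.hashedBit = false; // header bit 1: a hash code has been handed out
    this.movedBit = false;  // header bit 2: the extra field below is present
    this.savedAddr = 0;     // the prefixed field; meaningful only if movedBit
  }
}

function identityHash(obj) {
  if (obj.movedBit) return obj.savedAddr; // moved after hashing: old address
  obj.hashedBit = true;                    // not moved yet: use current address
  return obj.addr;
}

// Called by the (simulated) collector when it copies an object.
function relocate(obj, newAddr) {
  if (obj.hashedBit && !obj.movedBit) {
    obj.savedAddr = obj.addr; // grow the header by one field
    obj.movedBit = true;
  }
  obj.addr = newAddr;
}

const o = new HeapObject(0x1000);
const h = identityHash(o);            // the address at first request
relocate(o, 0x2000);
relocate(o, 0x3000);
console.log(identityHash(o) === h);   // true: stable across moves

const quiet = new HeapObject(0x1100);
relocate(quiet, 0x2100);              // never hashed: no extra field needed
console.log(quiet.movedBit);          // false
```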
In the copying collectors I know of, where hashcode is based on
location, hashtables get rehashed on the first access after a gc that
may have moved objects in the table. There are lots of tricks for
minimizing the rehashes. (Ask for details if you want.) OTOH, I am
pretty sure that the Java copying collectors rely on the hash/object-
id being stored in the object.
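The lazy-rehash approach described above can also be modeled as a toy. This is a sketch under stated assumptions, not any real collector: keys carry a numeric addr, collect is a hypothetical copying collection that moves every key and bumps a global count, and AddressTable (hypothetical) remembers the collection it was last hashed under and rehashes on the first access afterwards.

```javascript
// Toy model: hash codes are addresses, so a copying collection
// invalidates every address-keyed table.
let gcCount = 0;
let toSpace = 0x100008; // hypothetical start of the next to-space

// Simulated copying collection: compact all live objects into to-space.
function collect(objects) {
  for (const o of objects) {
    o.addr = toSpace;
    toSpace += 24; // hypothetical fixed object size
  }
  gcCount++;
}

class AddressTable {
  constructor(nBuckets = 16) {
    this.n = nBuckets;
    this.buckets = Array.from({ length: nBuckets }, () => []);
    this.epoch = gcCount; // collection count when last (re)hashed
    this.rehashes = 0;
  }
  _index(key) {
    return key.addr % this.n;
  }
  _checkEpoch() {
    if (this.epoch === gcCount) return; // no GC since last hash: still valid
    const entries = this.buckets.flat();
    this.buckets = Array.from({ length: this.n }, () => []);
    for (const e of entries) this.buckets[this._index(e.key)].push(e);
    this.epoch = gcCount;
    this.rehashes++;
  }
  put(key, value) {
    this._checkEpoch();
    const bucket = this.buckets[this._index(key)];
    const entry = bucket.find(e => e.key === key);
    if (entry) entry.value = value;
    else bucket.push({ key, value });
  }
  get(key) {
    this._checkEpoch();
    const entry = this.buckets[this._index(key)].find(e => e.key === key);
    return entry ? entry.value : undefined;
  }
}

const keys = [{ addr: 0x1000 }, { addr: 0x1018 }];
const table = new AddressTable();
table.put(keys[0], "a");
table.put(keys[1], "b");
collect(keys);                    // both keys move
console.log(table.get(keys[1]));  // b, after one lazy rehash
console.log(table.rehashes);      // 1
```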
On Mar 21, 2007, at 7:27 AM, Igor Bukanov wrote:
On 21/03/07, Brendan Eich <brendan at mozilla.org> wrote:
Catchalls therefore look tempting for building value -> value maps,
It looks like the proposal misses function intrinsic::delete(ident)
Why is this needed? There's no delete hook that can be overridden, so
the delete operator (not a function or method) can be used on any
property, either named by lexical reference, or by obj.prop or obj
[computed_name] or more complicated forms involving ::.
and perhaps intrinsic::has(ident). The latter is necessary to allow objects with catchalls to be used under with() {}.
You're right that without has (not the intrinsic::has that would be
the override-able built-in -- but why do we need that given the in
operator?) one cannot make with (obj) foo work when obj = {get *(id)
{ if (id === "foo") return 42; return undefined }}. The parallel
construct would be obj = {has *(id) {...}}. Is this really necessary
just for with and with-like constructs (E4X's o.(e) filtering
predicate)?
On 22/03/07, Brendan Eich <brendan at mozilla.org> wrote:
On Mar 21, 2007, at 7:27 AM, Igor Bukanov wrote:
It looks like the proposal misses function intrinsic::delete(ident)
Why is this needed? There's no delete hook that can be overridden, so the delete operator (not a function or method) can be used on any property, either named by lexical reference, or by obj.prop or obj [computed_name] or more complicated forms involving ::.
Do you mean that ES4 already allows overriding the delete operator?
You're right that without has (not the intrinsic::has that would be the override-able built-in -- but why do we need that given the in operator?) one cannot make with (obj) foo work when obj = {get *(id) { if (id === "foo") return 42; return undefined }}. The parallel construct would be obj = {has *(id) {...}}. Is this really necessary just for with and with-like constructs (E4X's o.(e) filtering predicate)?
Oh, so "in" can already be overridden as well? Then one does not need has, as the implementation can just call the in implementation during scope lookup to implement the with statement.
Igor
I wanted to mention a few things about the subjects around objectIds/hashCode/weak refs:

First, I would love to be able to define weak references in ES4. I have worked on projects where this would be enormously helpful in reducing memory usage, and there is simply no way (that I know of) to reproduce this capability (it is much more than making something easier, it is making it possible). To me this would be the single greatest feature of ES4, and as far as the spec goes, it does not seem like it would be a difficult addition.

Second, I think it would be nice to have access to unique IDs for objects; that would be helpful in certain situations.

Third, in Java hashCodes must NOT be unique in situations where two separate objects are defined (by the equals method) to be equivalent. For example:

    String str1 = new String("hello");
    String str2 = new String("hello");

With these two variables, str1==str2 should return false and str1.equals(str2) should return true, and therefore the hashCodes of str1 and str2 must be the same (the JavaDocs make this very clear, and this is why the String class in Java overrides the hashCode method instead of using the default implementation that usually returns a unique number corresponding to a memory location). As long as a language allows one to define the meaning of equivalence between separate objects (and those objects can be used as keys in hashes), hashCodes cannot always be unique. Java allows defining the meaning of equivalence by overriding the equals method, and ES4 will allow defining equivalence by overriding the == operator, so I would assume it is necessary that hashCodes not always be unique. I am not sure if this was already articulated, but it didn't seem to be.

Anyway, I apologize if this is too late in the discussion, but I do really hope that weak referencing makes it into the spec.

Kris
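The equals/hashCode contract Kris describes can be sketched in JavaScript, which has no such protocol built in; Point, its equals and hashCode methods, and ValueTable are all hypothetical. The table only works because equal keys are forced to share a hash code, which is exactly why an overridable hash code cannot double as an identity.

```javascript
// Value-like class with its own equality and hash, Java-style.
// Contract: a.equals(b) implies a.hashCode() === b.hashCode().
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  equals(other) {
    return other instanceof Point && other.x === this.x && other.y === this.y;
  }
  hashCode() {
    return (31 * this.x + this.y) | 0; // derived from the value, not identity
  }
}

// Hypothetical table that relies on the contract above.
class ValueTable {
  constructor() {
    this.buckets = new Map(); // hash code -> array of { key, value }
  }
  put(key, value) {
    const h = key.hashCode();
    const bucket = this.buckets.get(h) || [];
    const entry = bucket.find(e => e.key.equals(key));
    if (entry) entry.value = value;
    else bucket.push({ key, value });
    this.buckets.set(h, bucket);
  }
  get(key) {
    const bucket = this.buckets.get(key.hashCode()) || [];
    const entry = bucket.find(e => e.key.equals(key));
    return entry ? entry.value : undefined;
  }
}

const p1 = new Point(1, 2);
const p2 = new Point(1, 2);            // a separate object with an equal value
console.log(p1 === p2, p1.equals(p2)); // false true
const t = new ValueTable();
t.put(p1, "found");
console.log(t.get(p2));                // found: same hash code, equal key
```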
Would you be happy without unique ids being built-in, since you can implement unique ids yourself with weak-keyed dictionaries?
Peter
Yes, that would be better than nothing. However, I think that the lure of unique ids is the idea that the VM surely must be holding some unique number for each object, and exposing them would seem not very difficult and extremely fast. I don't think unique ids are a real big deal though, just tantalizing knowing they are probably there yet inaccessible. Are weak-keyed dictionaries already in the spec? Will there be no singular weak reference objects (like WeakRef<T>)?
Kris
tantalizing knowing they are probably there yet inaccessible.
I think the important word here is "probably". The specification can't mandate this sort of thing about how it should be implemented.
Are weak-keyed dictionaries already in the spec?
I don't think there is anything in there, weak-ref-wise.
Peter
On 3/27/07, Peter Hall <peter.hall at memorphic.com> wrote:
tantalizing knowing they are probably there yet inaccessible.
I think the important word here is "probably". The specification can't mandate this sort of thing about how it should be implemented.
I've done (high-quality) run-time systems for Scheme and ECMAScript, and none of them have had anything like object IDs hidden in the implementation. For non-moving collectors the problem is trivial, but for moving collectors it's a fair amount of work to implement object IDs in a way that's scalable and efficient.
Are weak-keyed dictionaries already in the spec?
I don't think there is anything in there, weak-ref-wise.
There is nothing.
On 27/03/07, Lars T Hansen <lth at acm.org> wrote:
On 3/27/07, Peter Hall <peter.hall at memorphic.com> wrote:
tantalizing knowing they are probably there yet inaccessible.
I think the important word here is "probably". The specification can't mandate this sort of thing about how it should be implemented.
I've done (high-quality) run-time systems for Scheme and ECMAScript, and none of them have had anything like object IDs hidden in the implementation. For non-moving collectors the problem is trivial, but for moving collectors it's a fair amount of work to implement object IDs in a way that's scalable and efficient.
Which again raises the question of whether hashCode needs to be defined in the Object class at all. If the reason for its existence is to allow writing hash tables (both weak and non-weak), then what about just providing Dictionary and WeakDictionary without intrinsic::hashCode?
With copying collectors the overhead of Dictionary and WeakDictionary would be just in extra code to rehash the hash tables after GC. This is better than using extra memory in each and every object, even if that memory is just 2 bits for objects that are not hashed.
Igor