Simple maps/sets: parameterize the comparator?

# Axel Rauschmayer (14 years ago)

harmony:simple_maps_and_sets

Currently, using Object.is() is hard-coded. But one could allow a comparator function being handed in (with Object.is being the default).

# Mark S. Miller (14 years ago)

yes, I would like to extend their constructor in this way. However, I'm not sure how to spec it -- help appreciated. The problem is that the comparator needs to provide both an equivalence operation and a corresponding hash operation. When these agree, all is well. What should we specify happens when passing in a misbehaving comparator? The dilemma is that

  1. detecting misbehavior is expensive,
  2. having a deterministic collection misbehavior in the face of comparator misbehavior seems hard, and
  3. we'd like to avoid yet more under-specification. The similar under-specification of Array.prototype.sort is already bad enough.
# Allen Wirfs-Brock (14 years ago)

On Dec 27, 2011, at 10:20 AM, Mark S. Miller wrote:

Hi Axel, yes, I would like to extend their constructor in this way. However, I'm not sure how to spec it -- help appreciated. The problem is that the comparator needs to provide both an equivalence operation and a corresponding hash operation. When these agree, all is well. What should we specify happens when passing in a misbehaving comparator? The dilemma is that

  1. detecting misbehavior is expensive,

For sort comparison function we simply say that the behavior is undefined for a misbehaving comparison function. It isn't clear that this has cause any significant problems or that it is very likely that we will require detection of misbehaving sort comparison functions.

  1. having a deterministic collection misbehavior in the face of comparator misbehavior seems hard, and

But it isn't clear that this is a requirement.

  1. we'd like to avoid yet more under-specification. The similar under-specification of Array.prototype.sort is already bad enough.

But, I think that the key point here is that if we allow specification of an arbitrary comparator function then we must require that a corresponding hash function is also provided. That seems like a big step down a slippery slope.

We should probably avoid pressures for the build-ins to cover all possible use cases. In ES.next we will should have all the primitives necessary for JS programmers to define their own efficient collection objects, including various kinds of hash-tables. Hopefully some good libraries of such will get written.

# Mark S. Miller (14 years ago)

On Wed, Dec 28, 2011 at 4:23 PM, Allen Wirfs-Brock <allen at wirfs-brock.com>wrote:

On Dec 27, 2011, at 10:20 AM, Mark S. Miller wrote:

Hi Axel, yes, I would like to extend their constructor in this way. However, I'm not sure how to spec it -- help appreciated. The problem is that the comparator needs to provide both an equivalence operation and a corresponding hash operation. When these agree, all is well. What should we specify happens when passing in a misbehaving comparator? The dilemma is that

[responding out of order]

  1. having a deterministic collection misbehavior in the face of comparator misbehavior seems hard, and

But it isn't clear that this is a requirement.

I agree. In fact, I think such detection is likely hard enough that it is effectively a requirement that the spec not obligate an implementation to detect such misbehavior.

  1. detecting misbehavior is expensive,

For sort comparison function we simply say that the behavior is undefined for a misbehaving comparison function. It isn't clear that this has cause any significant problems or that it is very likely that we will require detection of misbehaving sort comparison functions.

Likewise, I expect that detecting misbehavior sort comparison functions is too hard to be practical. However, I strongly disagree that the current underspecification isn't a problem. How undefined is this behavior? (Actually, the language in 15.4.4.11 is "implementation defined", which isn't really any better.) Can [1,2,3].sort(function(){return 1;}) evaluate to the password of the current user? Can it violate same origin restrictions or memory safety? Can it launch the nukes? If so, then none of our reasoning about security (whether conventional or ocap) or safely of any kind is sound. Rather, we get away with such reasoning because we all share an unstated model about the real limits on this misbehavior. Since any sound reasoning about safety or security depends on this shared understanding, it behooves us as authors of the specification to codify these limits.

For example, regarding sorting, leaving aside sparseness, I suspect our shared model is that any sort algorithm implementation itself takes only the following observable steps, but in some arbitrary order and not necessarily terminating:

obj.[Get], where i is an integer && 0 <= i && i < len (recall that len is already the result of array.[Get]) obj.[[Put]](String(i), x), where i as above, and x is only a value previously gotten with a [[Get]] ToNumber(comparefn(x, y)), where x and y are only values previously gotten with a [[Get]]

To adjust for sparseness, the sort function may also invoke

obj.[HasProperty] obj.[HasOwnProperty] // perhaps? I see no mention of it in the algorithm obj.[Delete] obj.[[Put]]('length', String(i))

Notice that this specification does not assume away the possibility of arbitrary side effects or I/O caused by the comparefn or by accessor properties at obj indexes or 'length'.

  1. we'd like to avoid yet more under-specification. The similar under-specification of Array.prototype.sort is already bad enough.

But, I think that the key point here is that if we allow specification of an arbitrary comparator function then we must require that a corresponding hash function is also provided. That seems like a big step down a slippery slope.

We should probably avoid pressures for the build-ins to cover all possible use cases. In ES.next we will should have all the primitives necessary for JS programmers to define their own efficient collection objects, including various kinds of hash-tables. Hopefully some good libraries of such will get written.

I take it you are simply reiterating the desire for a standard system provided high entropy object-identity hash that is guaranteed to correspond to === and is. I agree (though it didn't make it into ES6), but that's the dual of the question Axel is asking.

What do you think of my approach to limiting the possible misbehavior of sort?

# Mark S. Miller (14 years ago)

On Wed, Dec 28, 2011 at 6:48 PM, Mark S. Miller <erights at google.com> wrote: [...]

For example, regarding sorting, leaving aside sparseness, I suspect our shared model is that any sort algorithm implementation itself takes only the following observable steps, but in some arbitrary order and not necessarily terminating:

obj.[Get], where i is an integer && 0 <= i && i < len (recall that len is already the result of array.[Get]) obj.[[Put]](String(i), x), where i as above, and x is only a value previously gotten with a [[Get]] ToNumber(comparefn(x, y)), where x and y are only values previously gotten with a [[Get]]

To adjust for sparseness, the sort function may also invoke

obj.[HasProperty] obj.[HasOwnProperty] // perhaps? I see no mention of it in the algorithm obj.[Delete] obj.[[Put]]('length', String(i))

Notice that this specification does not assume away the possibility of arbitrary side effects or I/O caused by the comparefn or by accessor properties at obj indexes or 'length'.

And a call to sort stops performing any of these observable actions once it completes (whether by returning or throwing).

Are there any implementation freedoms that we actually wish to allow that are outside the above limits? I cannot think of any.

# Brendan Eich (14 years ago)

From: "Mark S. Miller" <erights at google.com>

I take it you are simply reiterating the desire for a standard system provided high entropy object-identity hash that is guaranteed to correspond to === and is. I agree (though it didn't make it into ES6), but that's the dual of the question Axel is asking.

If we really need it, I believe that we can still choose to add a standard Object.hash to go along with Object.is, and not break the complexity budget or aspirational schedule. Just my two cents. We had some consensus in 2010 on how such a hash would avoid leaking addresses or partial address order:

esdiscuss/2010-July/011489

# Mark S. Miller (14 years ago)

On Wed, Dec 28, 2011 at 10:06 PM, Brendan Eich <brendan at mozilla.com> wrote:

From: "Mark S. Miller" <erights at google.com>

I take it you are simply reiterating the desire for a standard system provided high entropy object-identity hash that is guaranteed to correspond to === and is. I agree (though it didn't make it into ES6), but that's the dual of the question Axel is asking.

If we really need it, I believe that we can still choose to add a standard Object.hash to go along with Object.is, and not break the complexity budget or aspirational schedule. Just my two cents. We had some consensus in 2010 on how such a hash would avoid leaking addresses or partial address order:

esdiscuss/2010-July/011489

I'd be happy to see it accepted into ES-next. Part of negotiating the complexity budget is to also beware of precedent setting, i.e., accepting new small items may invite the advancing of other new small items, each of which is well below the complexity budget. But even then, I think I still agree on this one.

A shame no one thought to advance it at the historic May meeting.

# David Bruant (14 years ago)

Le 29/12/2011 05:08, Mark S. Miller a écrit :

On Wed, Dec 28, 2011 at 6:48 PM, Mark S. Miller <erights at google.com <mailto:erights at google.com>> wrote: [...]

For example, regarding sorting, leaving aside sparseness, I
suspect our shared model is that any sort algorithm implementation
itself takes only the following observable steps, but in some
arbitrary order and not necessarily terminating:


obj.[[Get]](String(i)), where i is an integer && 0 <= i && i < len
(recall that len is already the result of array.[[Get]]('length'))
obj.[[Put]](String(i), x), where i as above, and x is only a value
previously gotten with a [[Get]]
ToNumber(comparefn(x, y)), where x and y are only values
previously gotten with a [[Get]]

More explicitely, here: comparefn.[[Call]] with 'this' bound to 'undefined'.

# Axel Rauschmayer (14 years ago)

We should probably avoid pressures for the build-ins to cover all possible use cases. In ES.next we will should have all the primitives necessary for JS programmers to define their own efficient collection objects, including various kinds of hash-tables. Hopefully some good libraries of such will get written.

I take it you are simply reiterating the desire for a standard system provided high entropy object-identity hash that is guaranteed to correspond to === and is.

I’m curious: Is it likely that such libraries will get written by the community? This domain does not seem to enjoy much interest. (I would volunteer, but have neither the necessary expertise/experience nor the time.)

Can we profit from prior work? The following comes to mind:

# Axel Rauschmayer (14 years ago)

I’m not sure that there is much one can do to protect programmers from themselves, given that writing good hash functions is hard. One could provide a test suite.

Random thoughts:

  • Enforce that both equality and hash function are specified. A common Java problem.

  • Provide a hash function to complement Object.is() – Brandon suggested Object.hash().

  • Most common case: two objects are considered equal if (e.g.) two of their properties are equal. How can we help programmers here? Combining equals seems easy, combining hashes harder:

    function hashPerson(p) { return Object.combineHash(Object.hash(p.last), Object.hash(p.first)); // Alternative: Object.hash(p.last, p.first) }

    function equalsPerson(p1, p2) { return Object.is(p1.last, p2.last) && Object.is(p1.first, p2.first) }

  • Internationalization mainly seems to matter for sorting, so it’s probably not an issue here.

# Allen Wirfs-Brock (14 years ago)

On Dec 28, 2011, at 6:48 PM, Mark S. Miller wrote:

...

Likewise, I expect that detecting misbehavior sort comparison functions is too hard to be practical. However, I strongly disagree that the current underspecification isn't a problem. How undefined is this behavior? (Actually, the language in 15.4.4.11 is "implementation defined", which isn't really any better.) Can [1,2,3].sort(function(){return 1;}) evaluate to the password of the current user? Can it violate same origin restrictions or memory safety? Can it launch the nukes? If so, then none of our reasoning about security (whether conventional or ocap) or safely of any kind is sound. Rather, we get away with such reasoning because we all share an unstated model about the real limits on this misbehavior. Since any sound reasoning about safety or security depends on this shared understanding, it behooves us as authors of the specification to codify these limits.

For example, regarding sorting, leaving aside sparseness, I suspect our shared model is that any sort algorithm implementation itself takes only the following observable steps, but in some arbitrary order and not necessarily terminating:

obj.[Get], where i is an integer && 0 <= i && i < len (recall that len is already the result of array.[Get]) obj.[[Put]](String(i), x), where i as above, and x is only a value previously gotten with a [[Get]] ToNumber(comparefn(x, y)), where x and y are only values previously gotten with a [[Get]]

To adjust for sparseness, the sort function may also invoke

obj.[HasProperty] obj.[HasOwnProperty] // perhaps? I see no mention of it in the algorithm obj.[Delete] obj.[[Put]]('length', String(i))

Notice that this specification does not assume away the possibility of arbitrary side effects or I/O caused by the comparefn or by accessor properties at obj indexes or 'length'.

This is very close to how sort is actually specified in ES5.1(See 15.4.4.11 and particularly step 1 of the first algorithm with numbered steps). I'm not sure how this is particularly relevant as we seem to agree that it would be impractical to detect misbehaving sort functions (for which we already have a specification) or hash functions.

  1. we'd like to avoid yet more under-specification. The similar under-specification of Array.prototype.sort is already bad enough.

But, I think that the key point here is that if we allow specification of an arbitrary comparator function then we must require that a corresponding hash function is also provided. That seems like a big step down a slippery slope.

We should probably avoid pressures for the build-ins to cover all possible use cases. In ES.next we will should have all the primitives necessary for JS programmers to define their own efficient collection objects, including various kinds of hash-tables. Hopefully some good libraries of such will get written.

I take it you are simply reiterating the desire for a standard system provided high entropy object-identity hash that is guaranteed to correspond to === and is. I agree (though it didn't make it into ES6), but that's the dual of the question Axel is asking.

Actually, that wasn't my point. The existence of WeakMap/Map implies the existence of an internal object identify hash value. Using them, application level code can associate its own identify associated hash values with arbitrary objects and in some cases might even use private names to decorate objects with such hash values. This isn't as simple and probably slightly less efficient then using a system exposed identify hash value but practical enough for anybody who want to built special has tables that need to make use of identify comparisons.

My real point, was that extending the Map specification to allow for use of arbitrary comparison functions (and hence arbitrary hash functions) is perhaps a step too far because of the issues we were discussing above. I was using the fact that in ES.next application level code can implement efficient hash based data structures as a supporting argument to that conclusion.

Note that I wouldn't object to limited parameterization of Map that allows selection among ==, ===, and is as the comparison function as these all use the same object identify hash function.

# Allen Wirfs-Brock (14 years ago)

On Dec 28, 2011, at 10:14 PM, Mark S. Miller wrote:

On Wed, Dec 28, 2011 at 10:06 PM, Brendan Eich <brendan at mozilla.com> wrote: From: "Mark S. Miller" <erights at google.com>

I take it you are simply reiterating the desire for a standard system provided high entropy object-identity hash that is guaranteed to correspond to === and is. I agree (though it didn't make it into ES6), but that's the dual of the question Axel is asking.

If we really need it, I believe that we can still choose to add a standard Object.hash to go along with Object.is, and not break the complexity budget or aspirational schedule. Just my two cents. We had some consensus in 2010 on how such a hash would avoid leaking addresses or partial address order:

esdiscuss/2010-July/011489

I'd be happy to see it accepted into ES-next. Part of negotiating the complexity budget is to also beware of precedent setting, i.e., accepting new small items may invite the advancing of other new small items, each of which is well below the complexity budget. But even then, I think I still agree on this one.

A shame no one thought to advance it at the historic May meeting.

If there is no controversy about it, then I would like to see it added. However, it isn't sometime I wanted to use committee Karma on if there were any objections.

# Allen Wirfs-Brock (14 years ago)

On Dec 29, 2011, at 6:34 AM, Axel Rauschmayer wrote:

We should probably avoid pressures for the build-ins to cover all possible use cases. In ES.next we will should have all the primitives necessary for JS programmers to define their own efficient collection objects, including various kinds of hash-tables. Hopefully some good libraries of such will get written.

I take it you are simply reiterating the desire for a standard system provided high entropy object-identity hash that is guaranteed to correspond to === and is.

I’m curious: Is it likely that such libraries will get written by the community? This domain does not seem to enjoy much interest. (I would volunteer, but have neither the necessary expertise/experience nor the time.)

Well, If nobody else creates such a library, I will.

I think up to now, JS has not been particularly hospitable to collection libraries, but this will be different in ES.next.

Can we profit from prior work? The following comes to mind:

See my experiments with Smalltalk-80 collections in ES.next at allenwb/ESnext-experiments . However, if I was actually going to use Smalltalk collections as a model I would start with the ANSI Smalltalk collection specification wiki.squeak.org/squeak/uploads/172/standard_v1_9-indexed.pdf section 5.7

# Mark S. Miller (14 years ago)

On Thu, Dec 29, 2011 at 9:48 AM, Allen Wirfs-Brock <allen at wirfs-brock.com>wrote: [...]

For example, regarding sorting, leaving aside sparseness, I suspect our shared model is that any sort algorithm implementation itself takes only the following observable steps, but in some arbitrary order and not necessarily terminating:

obj.[Get], where i is an integer && 0 <= i && i < len (recall that len is already the result of array.[Get]) obj.[[Put]](String(i), x), where i as above, and x is only a value previously gotten with a [[Get]] ToNumber(comparefn(x, y)), where x and y are only values previously gotten with a [[Get]]

To adjust for sparseness, the sort function may also invoke

obj.[HasProperty] obj.[HasOwnProperty] // perhaps? I see no mention of it in the algorithm obj.[Delete] obj.[[Put]]('length', String(i))

Notice that this specification does not assume away the possibility of arbitrary side effects or I/O caused by the comparefn or by accessor properties at obj indexes or 'length'.

This is very close to how sort is actually specified in ES5.1(See 15.4.4.11 and particularly step 1 of the first algorithm with numbered steps). I'm not sure how this is particularly relevant as we seem to agree that it would be impractical to detect misbehaving sort functions (for which we already have a specification) or hash functions.

My point is that we could specify that sort takes only the above actions whether on not comparefn misbehaves (or its other similar preconditions are violated) -- without any need to detect whether comparefn misbehaves (or any of its other preconditions are violated). Sort is only require to terminate and produce a sorted list if all these preconditions hold, as now, but the only allowed "implementation defined behavior" is the repeated taking of the above steps, in any order, possible forever, but without any further such actions once the call to sort completes.

My guess is that all current and anticipated implementations of sort already obey this tighter spec. But by codifying this tighter spec, we repair a hole in efforts to reason formally and precisely about safety and security properties of JS.

If anyone knows or can even imagine a realistic counter example to my guess, please post.

  1. we'd like to avoid yet more under-specification. The similar under-specification of Array.prototype.sort is already bad enough.

But, I think that the key point here is that if we allow specification of an arbitrary comparator function then we must require that a corresponding hash function is also provided. That seems like a big step down a slippery slope.

We should probably avoid pressures for the build-ins to cover all possible use cases. In ES.next we will should have all the primitives necessary for JS programmers to define their own efficient collection objects, including various kinds of hash-tables. Hopefully some good libraries of such will get written.

I take it you are simply reiterating the desire for a standard system provided high entropy object-identity hash that is guaranteed to correspond to === and is. I agree (though it didn't make it into ES6), but that's the dual of the question Axel is asking.

Actually, that wasn't my point. The existence of WeakMap/Map implies the existence of an internal object identify hash value. Using them, application level code can associate its own identify associated hash values with arbitrary objects and in some cases might even use private names to decorate objects with such hash values. This isn't as simple and probably slightly less efficient then using a system exposed identify hash value but practical enough for anybody who want to built special has tables that need to make use of identify comparisons.

My real point, was that extending the Map specification to allow for use of arbitrary comparison functions (and hence arbitrary hash functions) is perhaps a step too far because of the issues we were discussing above. I was using the fact that in ES.next application level code can implement efficient hash based data structures as a supporting argument to that conclusion.

I had missed this point, and it is an excellent one. I agree. Let's keep Map as simple as possible while still providing collection library authors the primitive they need to author great collection libraries.

Note that I wouldn't object to limited parameterization of Map that allows selection among ==, ===, and is as the comparison function as these all use the same object identify hash function.

Since neither or these so-called equality operators even define equivalence classes over their inputs, the cannot be the equality portion of any correct comparator. Please let's omit this extension especially. If our fear of extension is comparator misbehavior, it seems insane to avoid this by allowing only incorrect comparators.

# Axel Rauschmayer (14 years ago)

I’m curious: Is it likely that such libraries will get written by the community? This domain does not seem to enjoy much interest. (I would volunteer, but have neither the necessary expertise/experience nor the time.)

Well, If nobody else creates such a library, I will.

  • Number.MAX_VALUE

This has the potential to be one of the most important parts of ECMAScript.next (whether as part of ECMA-262 or not).

I think up to now, JS has not been particularly hospitable to collection libraries, but this will be different in ES.next.

I might be wrong about this, but: people reinvent different kinds of wheels in different programming languages. And collections don’t seem to appeal to the JS community.

Can we profit from prior work? The following comes to mind:

See my experiments with Smalltalk-80 collections in ES.next at allenwb/ESnext-experiments . However, if I was actually going to use Smalltalk collections as a model I would start with the ANSI Smalltalk collection specification wiki.squeak.org/squeak/uploads/172/standard_v1_9-indexed.pdf section 5.7

Makes sense. There might be some additional insights in what the Lispers are doing. This is more of a hunch than actual knowledge of mine – I simply like their way of thinking about object-orientation.

# Andreas Rossberg (14 years ago)

On 29 December 2011 18:48, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

The existence of WeakMap/Map implies the existence of an internal object identify hash value.

Why? AFAICT, there is nothing currently requiring these collections to be implemented as hash tables in particular. And there may perhaps be reasons for an implementation not to (hash tables tend to be vastly overused anyway). I would prefer not to hard-code that as a requirement.