Iteration in ES4
On Apr 25, 2008, at 12:09 PM, Jason Orendorff wrote:
Here are some more comments on iteration in ES4; still more to come.
Great to have feedback on a spec derived from a pretty old proposal
-- better late than not in time. :-)
=== for-each on Array objects ===
The planned behavior, as far as I can discern it, is like this:
- Properties will be visited in the order they were added.
- Enumerable properties of Array.prototype will be visited. (This will hurt libraries that add extra Array methods there, like Prototype www.prototypejs.org/api/array. There are also more obscure cases.)
You are concerned with for-each-in, I know, but the same concern
arises with for-in. This has come up many times, and there is a
converging (I hope) proposal to add Object.defineProperty or
something named similarly, which allows "DontEnum" properties to be
added to objects, including to standard constructor prototypes.
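To make the concern concrete, here is a minimal sketch. It assumes the API ends up shaped like `Object.defineProperty(obj, name, descriptor)` with an `enumerable` flag (the form later standardized in ES5); the `last` method and the `enumeratedKeys` helper are made up for the example and stand in for a library extension.

```javascript
// Assumption: Object.defineProperty(obj, name, descriptor) with an
// `enumerable` flag. `last` and `enumeratedKeys` are hypothetical names.

function enumeratedKeys(obj) {
  var keys = [];
  for (var key in obj) keys.push(key);
  return keys;
}

var arr = ["a", "b"];

// Plain assignment creates an enumerable property, so for-in sees it.
Array.prototype.last = function () { return this[this.length - 1]; };
var keysWithEnumerableMethod = enumeratedKeys(arr);

// Redefining it as non-enumerable ("DontEnum") hides it from for-in
// while leaving the method callable.
Object.defineProperty(Array.prototype, "last", {
  value: Array.prototype.last,
  enumerable: false,
  writable: true,
  configurable: true
});
var keysWithHiddenMethod = enumeratedKeys(arr);
var lastItem = arr.last();

// Clean up so the sketch does not leak into other code.
delete Array.prototype.last;
```

With the method marked non-enumerable, a for-in loop over `arr` reports only the array's own enumerable properties, which is the behavior library authors extending Array.prototype would want.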
- Non-numeric ("expando") properties will be visited.
Similarly to for-in, which is bound by backward compatibility. One
improvement: the iteration protocol allows swapping in better
behavior. From the proposal:
Array.prototype.iterator::get = iterator::DEFAULT_GET_VALUES;
Array.prototype.iterator::contains = function (v) this.indexOf(v) != -1;
Then one does not have to use for-each-in at all, and Arrays become
much more pleasant to use in a small-world or modular program that
customizes Array.prototype like that.
I think users will find all these details astonishing and undesirable. The first seems especially perverse. No prior standard requires it.
You probably have read 4.2 in
www.ecmascript.org/es4/spec/incompatibilities.pdf but I thought I'd
point to it for the list. The
de-facto standard set by competing browser implementations starting
in 1995 trumps the de-jure standard regarding for-in enumeration
following property creation order, even for arrays.
But for-each-in could do otherwise. That's a fair point, since in E4X
(ECMA-357), for-each-in follows XMLList child index order.
So let's say we make for-each-in special for Array, as it is for
XMLList (and XML, but vacuously) in E4X. Now for-each-in and for-in
differ more substantially than in the former enumerating values and
the latter keys. This could be a good thing, but it might be annoying
if one is rewriting code that does
for (prop in obj) { ... obj[prop] ... }
to look like
for each (value in obj) { ... value ... }
where obj might be an Array. The symmetry between for-each-in and
for-in that E4X half-supports (viz, prototype property enumeration
with shadowing, and deleted-after-loop-starts coherence) is broken.
The latter two are kind of implicitly specified in E4X.
The conversations I've had with E4X principals suggest they did not
intend for-each-in to consider prototype properties at all. But the
spec flatly contradicts that intention in the Semantics section:
The order of enumeration is defined by the object (steps 6 and 6a in
the first algorithm and steps 7 and 7a in the second algorithm). When
e evaluates to a value of type XML or XMLList, properties are
enumerated in ascending sequential order according to their numeric
property names (i.e., document order for XML objects).
The mechanics of enumerating the properties (steps 7 and 7a in the
first algorithm, steps 8 and 8a in the second) is implementation
dependent. Properties of the object being enumerated may be deleted
during enumeration. If a property that has not yet been visited
during enumeration is deleted, then it may not be visited. If new
properties are added to the object being enumerated during
enumeration, the newly added properties are not guaranteed to be
visited in the active enumeration. Enumerating the properties of an
object includes enumerating properties of its prototype and the
prototype of the prototype, and so on, recursively; but a property of
a prototype is not enumerated if it is "shadowed" because some
previous object in the prototype chain has a property with the same
name.
(end of E4X spec citation)
So intent and spec may be out of whack, and we should consider doing
something more aligned with intent in ES4.
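A small sketch of the shadowing rule in the citation above, written with for-in since the same rule applies there. The objects `base` and `derived` are made up for the example, and `Object.create` is assumed to be available for setting up the prototype chain.

```javascript
// Sketch of prototype enumeration with shadowing, using for-in.
// `base` and `derived` are hypothetical objects for illustration only.
var base = { color: "red", size: 1 };
var derived = Object.create(base);
derived.color = "blue"; // shadows base.color

var visited = [];
for (var name in derived) {
  visited.push(name + "=" + derived[name]);
}
// `color` is reported once, with the value from `derived`; `size` is
// still reached through the prototype chain.
```

This is the behavior the spec text describes: prototype properties are enumerated, but a shadowed one is not reported a second time.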
A cost-benefit analysis applies here. The cost of following E4X is
real, e.g. Web pages that use Prototype can't use for-each on Arrays.
This is a bigger problem for for-in, and most Ajax libraries steer
clear of adding properties to any standard constructor prototypes. So
it's a good reminder of a deeper problem, but not as compelling with
for-each-in as with for-in -- and the ship sailed 13 years ago.
Again I'm not sure if rescuing for-each-in is going to pay off, if
the price is loss of symmetry with for-in -- and assuming programmers
can save themselves by customizing via the iteration protocol.
I don't see any offsetting benefit. Note that several ES4 classes will have custom for-each behavior, so ES4's for-each generally won't behave the way it's specified in E4X anyway.
And E4X left order of enumeration up to "the object" anyway. And,
this may bear repeating, E4X is not in ES4. We don't want to break
E4X with a conflicting change from ES3, but where it's incomplete or
buggy, we should not be bound by it.
I like your suggestion to make for-each-in iterate indexed values, in
index order, for Array. The further thought I would appreciate more
feedback on is that for-in is in the same boat. While we can extend
E4X's for-each-in to do what you want for Array, we can't change
for-in even under opt-in versioning without imposing a hidden, and
potentially high, migration tax.
Lars and I have talked about supplying a few standard iterators for
arrays that do follow index order (and are otherwise sane). Then the
programmer would have to hook these up, or call them explicitly on
the right of 'in'. We could even sugar the opt-in, potentially a lot
(a pragma? 'use sane_array_iteration' or something like that). Comments?
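For the sake of discussion, one possible shape for such an index-order iterator is sketched below. The name `indexedValues` is hypothetical, and the sketch uses the `function*` / `for...of` syntax that JavaScript adopted later rather than ES4's own iteration protocol, so treat it as an illustration of the behavior, not of the proposed API.

```javascript
// Hypothetical helper: yields array element values in ascending index
// order, ignoring expando properties and anything on Array.prototype.
// Holes (indexes never assigned) are skipped; that is a choice made
// for this sketch, not something the thread specifies.
function* indexedValues(array) {
  for (var i = 0; i < array.length; i++) {
    if (Object.prototype.hasOwnProperty.call(array, i)) {
      yield array[i];
    }
  }
}

var list = [];
list[2] = "c";         // added first
list[0] = "a";         // added second
list.note = "expando"; // non-numeric property

var values = [];
for (var v of indexedValues(list)) values.push(v);
// values is ["a", "c"]
```

The caller names the iterator explicitly on the right-hand side of the loop, which matches the "call them explicitly" option; hooking it up as the default for Array would be the other option mentioned above.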
=== "Type" suffix on structural type names ===
The proposal defines structural types named IteratorType,
IterableType, and ItemizableType. I think they should be named Iterator, Iterable, and Itemizable.
I agree. The -Type suffix has bothered me, although the reflect::
interface names use it too (for better reasons). I'm guilty, I will
remove it (Dave Herman was going to weigh in on it, and I bet he
agrees).
=== Generator.throw type parameter ===
The Generator class has a third type parameter, the type of exceptions that can be thrown to it. I can't think of a use case where this doesn't feel like the Java "throws" clause, which ES4 otherwise rejects. I think the throw method should accept any value, and the third type parameter should be dropped.
We don't have declared exceptions in ES4 (and won't in ES-anything),
but you made me throw up a little in my mouth by reminding me of them
in Java :-/. I'm ok with this change too, but I'd like Lars and Dave
to sign off too.
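As a rough picture of a throw method that accepts any value, the sketch below uses present-day JavaScript generators as a stand-in for the ES4 Generator class; the `worker` function and its `log` parameter are invented for the example.

```javascript
// Sketch: a generator whose throw() accepts any value, with no declared
// exception type. Uses present-day JavaScript generators, not the ES4
// Generator class. `worker` and `log` are hypothetical names.
function* worker(log) {
  while (true) {
    try {
      yield "ready";
    } catch (thrown) {
      log.push(typeof thrown);
    }
  }
}

var log = [];
var gen = worker(log);
gen.next();                   // run to the first yield
gen.throw("a string");        // caught inside the generator
gen.throw(42);
gen.throw(new Error("boom"));
gen.return();                 // stop the generator
// log is ["string", "number", "object"]
```

Nothing in the generator's signature has to name the kinds of values it may receive, which is the point of dropping the third type parameter.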
=== Generator return-type annotations ===
The proposal doesn't specify how return-type annotations work on generator-functions. I think generator-functions should only accept a return-type annotation that boils down to one of:
// (the default)
Iterator.<X> // I'm just an iterator
Generator.<X, Y> // I'm a coroutine
The run-time type of the generator-iterators produced by these
functions would be, respectively:
Generator.<*, *>
Generator.<X, void> // I'm just an iterator, don't send() me data
Generator.<X, Y> // I'm a coroutine, send() me Ys
I like this too.
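The two annotated cases differ in how the generator is used, which the following sketch tries to show. It is written in present-day JavaScript, which has no type annotations, so the Iterator.<X> versus Generator.<X, Y> distinction appears only in the usage; `next(value)` plays the role that the thread calls send(), and the function names are made up.

```javascript
// "Just an iterator": values flow out only.
function* countTo(limit) {
  for (var i = 1; i <= limit; i++) yield i;
}

// "Coroutine": each yield also receives a value from the caller.
// next(value) here plays the role the thread calls send().
function* runningTotal() {
  var total = 0;
  while (true) {
    var amount = yield total;
    total += amount;
  }
}

var counted = [];
for (var n of countTo(3)) counted.push(n);

var acc = runningTotal();
acc.next();                          // prime: run to the first yield
var afterFive = acc.next(5).value;   // total becomes 5
var afterSeven = acc.next(2).value;  // total becomes 7
```

A consumer of `countTo` never needs to pass anything in, which corresponds to the Generator.<X, void> run-time type; `runningTotal` only makes sense if the caller sends values, matching Generator.<X, Y>.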
Thanks for the great comments!
On Apr 25, 2008, at 2:08 PM, Brendan Eich wrote:
for (prop in obj) { ... obj[prop] ... }
to look like
for each (value in obj) { ... value ... }
where obj might be an Array. The symmetry between for-each-in and for-in that E4X half-supports (viz, prototype property enumeration with shadowing, and deleted-after-loop-starts coherence) is broken.
Just in case this is not well-known, SpiderMonkey starting in Firefox
1.5 supported E4X and made the for-each-in loop work for all object
types, not just XML/XMLList. But not on Array element (indexed
property) values only, in index order -- again property creation
order, and named as well as indexed enumerable properties, are
visited. This shares code with for-in and preserves the equivalence
shown in the rewrite example above.
=== "Type" suffix on structural type names ===
The proposal defines structural types named IteratorType,
IterableType, and ItemizableType. I think they should be named Iterator, Iterable, and Itemizable.
+1
I agree. The -Type suffix has bothered me, although the reflect::
interface names use it too (for better reasons). I'm guilty, I will
remove it (Dave Herman was going to weigh in on it, and I bet he
agrees).
Yes, the -Type suffix makes sense in the reflect:: interfaces, because their instances are types (reflections of types, at least). But IterableType et al don't pass the "is-a" test.
I've updated the iterators/generators proposal page to eliminate the -Type suffix from
ItemizableType, IterableType, ContainerType, and IteratorType.
I'll update the RI shortly.
Brendan Eich wrote:
On Apr 25, 2008, at 2:08 PM, Brendan Eich wrote:
for (prop in obj) { ... obj[prop] ... }
to look like
for each (value in obj) { ... value ... }
where obj might be an Array. The symmetry between for-each-in and for-in that E4X half-supports (viz, prototype property enumeration with shadowing, and deleted-after-loop-starts coherence) is broken.
Just in case this is not well-known, SpiderMonkey starting in Firefox
1.5 supported E4X and made the for-each-in loop work for all object
types, not just XML/XMLList. But not on Array element (indexed
property) values only, in index order -- again property creation
order, and named as well as indexed enumerable properties, are
visited. This shares code with for-in and preserves the equivalence
shown in the rewrite example above.
I'm baffled trying to figure out what you're trying to say in the last paragraph.
Waldemar
On Apr 28, 2008, at 6:04 PM, Waldemar Horwat wrote:
I'm baffled trying to figure out what you're trying to say in the
last paragraph.
Let me try again:
I added for-each-in support for all types when implementing E4X in
SpiderMonkey, not just for XMLList and XML types. But I did not make
for-each-in do anything different given an array object on the right
of 'in' from what the for-in would do if you used the loop variable
to index into the array to get the value produced in the loop
variable by for-each-in.
Does that help?
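The equivalence described above can be sketched as follows. Since for-each-in is not available in most engines, the sketch emulates it with a hypothetical `forEachIn` helper built on for-in; it illustrates the described behavior and is not SpiderMonkey's implementation.

```javascript
// Hypothetical emulation of the behavior described above:
// for-each-in visits exactly what for-in visits, but hands back values.
function forEachIn(obj, callback) {
  for (var prop in obj) {
    callback(obj[prop]);
  }
}

var arr = ["x", "y"];
arr.tag = "expando";

// The for-in form: use the loop variable to index into the array.
var viaForIn = [];
for (var prop in arr) viaForIn.push(arr[prop]);

// The emulated for-each-in form: the loop hands over the value directly.
var viaForEachIn = [];
forEachIn(arr, function (value) { viaForEachIn.push(value); });
// Both arrays hold the same values in the same order, including "expando".
```

Because both loops share one enumeration, the non-indexed `tag` property shows up in each, which is the behavior the earlier part of the thread objects to for arrays.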
Here are some more comments on iteration in ES4; still more to come.
=== for-each on Array objects ===
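The planned behavior, as far as I can discern it, is like this:
- Properties will be visited in the order they were added.
- Enumerable properties of Array.prototype will be visited. (This will hurt libraries that add extra Array methods there, like Prototype www.prototypejs.org/api/array. There are also more obscure cases.)
- Non-numeric ("expando") properties will be visited.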
I think users will find all these details astonishing and undesirable. The first seems especially perverse. No prior standard requires it.
The latter two are kind of implicitly specified in E4X. A cost-benefit analysis applies here. The cost of following E4X is real, e.g. Web pages that use Prototype can't use for-each on Arrays. I don't see any offsetting benefit. Note that several ES4 classes will have custom for-each behavior, so ES4's for-each generally won't behave the way it's specified in E4X anyway.
=== "Type" suffix on structural type names ===
The proposal defines structural types named IteratorType, IterableType, and ItemizableType. I think they should be named Iterator, Iterable, and Itemizable. The proposal says the "Type" suffix is to help lead people who aren't familiar with structural types away from a specific mistake: trying to subclass IteratorType. I doubt this will succeed. People so inclined will reach for their subclassing hammer anyway.
And I think the funny names do a disservice to people trying to learn the language. Things that should be obvious to the point of tautology ("an Iterator object is an iterator") will need explanation ("an IteratorType object is an iterator"). People coming from languages with reflection or metaclassing will be extra confused ("is an IteratorType object an iterator type?"). Please don't do this.
=== Generator.throw type parameter ===
The Generator class has a third type parameter, the type of exceptions that can be thrown to it. I can't think of a use case where this doesn't feel like the Java "throws" clause, which ES4 otherwise rejects. I think the throw method should accept any value, and the third type parameter should be dropped.
=== Generator return-type annotations ===
The proposal doesn't specify how return-type annotations work on generator-functions. I think generator-functions should only accept a return-type annotation that boils down to one of:
// (the default)
Iterator.<X> // I'm just an iterator
Generator.<X, Y> // I'm a coroutine
The run-time type of the generator-iterators produced by these functions would be, respectively:
Generator.<*, *>
Generator.<X, void> // I'm just an iterator, don't send() me data
Generator.<X, Y> // I'm a coroutine, send() me Ys