Array comprehension syntax

# Olav Kjær (15 years ago)

While experimenting with code-completion for ECMAScript I noticed a limitation with array comprehension syntax. The syntax as proposed (and implemented in Mozilla) has the output expression before the loop and filter:

[i.toString() for each (i in [1,2,3]) if (i != 2)]

The problem is that the variable i in the output expression is defined by code to the right of it. Since people usually write code left to right, an editor with autocompletion is not able to provide type hints like member proposals for i. This also limits what hints an editor can give when writing an array literal. E.g. it cannot immediately flag an undefined variable in the first expression in an array literal as a possible typo, because we may be in the process of writing a comprehension, in which case the variable may be defined in the loop, which has not yet been typed. (Generally expressions in ECMAScript never rely on definitions to the right of the expression - comprehensions are the exception as far as I can tell.)

Traditional syntax like:

var x = [];    for each (i in [1,2,3]) if (i != 2) x.push(i.toString());

or

[1,2,3].filter(function(i) { return i != 2}).map(function(i){ return i.toString(); })

do not have the problem, since they read from left to right.
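For readers who want to try the two left-to-right forms: `for each` was a Mozilla extension and is not part of the ECMAScript standard. The following is a minimal sketch in standard JavaScript, assuming `for...of` as a stand-in for `for each` (the names `viaLoop` and `viaChain` are made up for illustration):

```javascript
// Statement form: the loop variable is declared before it is used.
const viaLoop = [];
for (const i of [1, 2, 3]) {
  if (i != 2) viaLoop.push(i.toString());
}

// Method-chain form: each callback names its parameter before its body.
const viaChain = [1, 2, 3]
  .filter(function (i) { return i != 2; })
  .map(function (i) { return i.toString(); });

console.log(viaLoop, viaChain);
```

In both forms an editor has already seen the binding of i by the time `i.toString()` is typed.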

I wonder if it would make sense to turn array comprehension syntax around so that comprehensions are written:

[for each (i in [1,2,3]) if (i!=2) i.toString()]

This also makes it look more familiar since it is closer to the traditional for-statement syntax. I realize this goes against the tradition from most other languages which support list comprehensions, e.g. Python. I admit I like how it reads with the output expression first, but I also think good tooling support is pretty important for the success of a language.

Olav Junker Kjær (www.mascaraengine.com)

# Brendan Eich (15 years ago)

This sounds like a binary trade: follow Python and other precedent, or
help autocompletion tools. I don't buy it, but it is hard to argue on
these terms. Putting the comprehension expression on the right could
help, but JS is dynamic: do you really know the type of i in more
interesting cases than [i.toString() for each (i in [1,2,3])]? That
is a contrived example.

Real comprehensions are not so easy to analyze for likely
autocompletions.

Real comprehensions are short enough that the saved typing is not huge,
in my experience.

# Jeff Walden (15 years ago)

On 13.9.09 11:56 , Olav Kjær wrote:

While experimenting with code-completion for ECMAScript I noticed a limitation with array comprehension syntax. The syntax as proposed (and implemented in Mozilla) has the output expression before the loop and filter:

 [i.toString() for each (i in [1,2,3]) if (i != 2)]

The problem is that the variable i in the output expression is defined by code to the right of it.

How do you address the following function?

function foo() { i = 3; var i; return i; }

Or this one?

function foo() { try { var i = 2; throw i; } catch (e) { return i; } }

Name before declaration is a problem already posed by the existing language syntax, isn't it?
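A small sketch of what those two functions do in standard JavaScript, for readers who want to check the behaviour. The function names below are hypothetical renamings of `foo` so that both can live in one file:

```javascript
// `var i` is hoisted to the top of the function, so the assignment
// in the first statement targets the local i, not a global.
function assignBeforeDeclare() { i = 3; var i; return i; }

// `var` is function-scoped, so the i declared inside the try block
// is still visible inside the catch block.
function declaredInTry() {
  try { var i = 2; throw i; } catch (e) { return i; }
}

console.log(assignBeforeDeclare()); // 3
console.log(declaredInTry());       // 2
```

In both cases a use of i is resolved by a declaration that an editor reading strictly left to right has either not yet seen or would not expect to be in scope.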

# Jason Orendorff (12 years ago)

I have a few proposals and questions about ES6 array comprehensions: people.mozilla.org/~jorendorff/es6-draft.html#sec-11.1.4.2

  1. It seems simpler and more general to accept arbitrary sequences of 'for' and 'if' clauses:

    [x if x !== undefined]
    [g for u of users if u.isAdmin() for g of u.groups()]
    

    In the current draft only [Expr ForClause+ IfClause?] is allowed.

  2. These comprehensions are permitted:

    [EXPR for x of obj if a, b, c]
    [EXPR for x of obj if x = 3]

    The first is surprising to me for two reasons: first, because commas in an array literal usually separate array elements, whereas these are sequencing commas; second, it's odd that commas are not permitted in the EXPR part of the comprehension but are permitted in the if-condition.

    The second is not so surprising, but it occurs to me that unparenthesized assignment in this context will usually be a mistake, so it might be nice to make that a syntax error.

    So I propose changing "if Expression" to "if ConditionalExpression" in the ArrayComprehension construction.

  3. SpiderMonkey already supports this nonstandard syntax:

    [x for each (x in obj)]

    A paren-free ES6 array comprehension could begin with a function call, like this:

    [x for each(x in obj).y of z]

    Currently SpiderMonkey treats 'each' as a keyword when it appears after the 'for' keyword.

    The two syntaxes are distinguishable in all cases, right? It's a little painful to get NotIn right in this case, but we can hack it. As long as the syntax is unambiguous.
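To make proposal 1 concrete, here is a sketch of how the two example comprehensions might be written out as plain loops in standard JavaScript. The sample data is hypothetical (the message only names `users`, `isAdmin` and `groups`), and the nesting shown is one natural left-to-right reading, not text from the draft:

```javascript
// Hypothetical sample data; only the property names come from the example.
const users = [
  { name: "ann", isAdmin: () => true,  groups: () => ["ops", "dev"] },
  { name: "bob", isAdmin: () => false, groups: () => ["sales"] },
  { name: "cy",  isAdmin: () => true,  groups: () => ["dev"] },
];

// [g for u of users if u.isAdmin() for g of u.groups()]
// Read left to right, each clause nests inside the previous one.
const result = [];
for (const u of users) {
  if (u.isAdmin()) {
    for (const g of u.groups()) {
      result.push(g);
    }
  }
}

// [x if x !== undefined] has no for clause: it yields zero or one element.
const x = 42;
const maybe = [];
if (x !== undefined) maybe.push(x);

console.log(result, maybe);
```

With this data, `result` holds the groups of the two admins in order and `maybe` holds the single value 42.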

# Tab Atkins Jr. (12 years ago)

On Fri, Sep 21, 2012 at 4:32 PM, Jason Orendorff <jason.orendorff at gmail.com> wrote:

  1. It seems simpler and more general to accept arbitrary sequences of 'for' and 'if' clauses:

    [x if x !== undefined]
    [g for u of users if u.isAdmin() for g of u.groups()]
    

    In the current draft only [Expr ForClause+ IfClause?] is allowed.

I assume this is meant to match Python's syntax.

  2. These comprehensions are permitted: [EXPR for x of obj if a, b, c] [EXPR for x of obj if x = 3]

    The first is surprising to me for two reasons: first, because commas in an array literal usually separate array elements, whereas these are sequencing commas; second, it's odd that commas are not permitted in the EXPR part of the comprehension but are permitted in the if-condition.

No strong opinion, but I'd be fine with your suggested change. Sequencing commas in an if statement are weird anyway.

The second is not so surprising, but it occurs to me that
unparenthesized assignment in this context will usually be a
mistake, so it might be nice to make that a syntax error.

Again, no strong opinion, but this kind of thing is occasionally useful, despite the hazard.

  3. SpiderMonkey already supports this nonstandard syntax: [x for each (x in obj)] A paren-free ES6 array comprehension could begin with a function call, like this: [x for each(x in obj).y of z] Currently SpiderMonkey treats 'each' as a keyword when it appears after the 'for' keyword.

    The two syntaxes are distinguishable in all cases, right? It's a little painful to get NotIn right in this case, but we can hack it. As long as the syntax is unambiguous.

I assume that they're always distinguishable due to the presence/absence of a space after "each", yes.

# Allen Wirfs-Brock (12 years ago)

On Sep 21, 2012, at 4:32 PM, Jason Orendorff wrote:

...

  2. These comprehensions are permitted: [EXPR for x of obj if a, b, c] [EXPR for x of obj if x = 3]

The first is surprising to me for two reasons: first, because commas in an array literal usually separate array elements, whereas these are sequencing commas; second, it's odd that commas are not permitted in the EXPR part of the comprehension but are permitted in the if-condition.

Allowing comma in the EXPR would permit things like:

[ thisIsALongSubExpression, andThisIsAlsoALongSubExpression, andThisIsAlsoALongSubExpression, andThisIsAlsoALongSubExpression, whichLookLikeElementsOfAnArrayLiteral, butAreReallyJustSubExpressionsWhoseValuesAreDiscarded, x for x of obj ]

This would at least be confusing to human readers, and recursive descent parsers would probably prefer not to have to deal with it.

The notes in the draft also suggest the possibility of using AssignmentExpressions in the for and if parts. I think that would probably be a good thing to do.

The second is not so surprising, but it occurs to me that unparenthesized assignment in this context will usually be a mistake, so it might be nice to make that a syntax error.

Yes, but a parenthesized assignment in that position is also likely to be a mistake, and there isn't any good syntactic way to prevent that.

I could probably go along with either AssignmentExpression or ConditionalExpression here. But I'm a bit uncomfortable about nanny syntax restrictions because they add complexity and make the language seem arbitrarily inconsistent.
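To make the two hazards under discussion concrete, here is a small sketch in standard JavaScript. It uses `Array.prototype.filter` as a stand-in for the comprehension's if clause, which is an assumption made only so the example can run; the variable names are hypothetical:

```javascript
const items = [1, 2, 3];

// Assignment as a condition: `x = 3` evaluates to 3, which is truthy,
// so nothing is filtered out.
const assigned = items.filter(function (x) { return x = 3; });

// Comparison, which is almost certainly what was meant.
const compared = items.filter(function (x) { return x === 3; });

// Sequencing commas: only the last operand decides the outcome.
const a = false, b = false, c = true;
const viaComma = items.filter(function () { return (a, b, c); });

console.log(assigned, compared, viaComma);
```

Wrapping the assignment in parentheses would not change the first result, which is the point made above about a parenthesized assignment being just as likely to be a mistake.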

# Allen Wirfs-Brock (12 years ago)

On Sep 21, 2012, at 4:32 PM, Jason Orendorff wrote:

I have a few proposals and questions about ES6 array comprehensions: people.mozilla.org/~jorendorff/es6-draft.html#sec-11.1.4.2

  1. It seems simpler and more general to accept arbitrary sequences of 'for' and 'if' clauses:

    [x if x !== undefined]
    [g for u of users if u.isAdmin() for g of u.groups()]

In the current draft only [Expr ForClause+ IfClause?] is allowed.

Allowing an arbitrary sequence of |for| and |if| clauses is arguably more complex and harder to understand than the current proposal for a sequence of |for| clauses followed by a single optional |if| clause.

My personal bias, is that comprehensions are just sugar that are best used to express relatively simple and common construction use cases. Complicated sequences of |for| and |if| clauses will be rarely seen and hence less understandable than the equivalent explicit looping expansions.

  2. These comprehensions are permitted: [EXPR for x of obj if a, b, c] [EXPR for x of obj if x = 3]

see separate response

...

  3. SpiderMonkey already supports this nonstandard syntax: [x for each (x in obj)] A paren-free ES6 array comprehension could begin with a function call, like this: [x for each(x in obj).y of z] Currently SpiderMonkey treats 'each' as a keyword when it appears after the 'for' keyword.

The two syntaxes are distinguishable in all cases, right? It's a little painful to get NotIn right in this case, but we can hack it. As long as the syntax is unambiguous.

For-of was introduced to be a more generalizable equivalent to for-each-in. I don't think we want to go back on that design decision, and it isn't clear in the above suggestion what the alternative would be to

[x for x of keys(obj)] // or [x for x of obj.keys()]

Is it [x for keys( x of obj)] ????

There seem to be all sorts of complications for the form you are suggesting. For example, must the function that is called be identified by a PrimaryExpression? Is only one argument allowed? Is the binding declaration of "x" always in the first argument position? Is something like the |of| keyword in a parameter position [...]? Etc.

I find it bad enough that the Expression that references the for clause bound identifiers is to the left of the bindings. I think burying bindings in argument positions would make things even worse.

# Jason Orendorff (12 years ago)

On Sat, Sep 22, 2012 at 11:11 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

  3. SpiderMonkey already supports this nonstandard syntax: [x for each (x in obj)] A paren-free ES6 array comprehension could begin with a function call, like this: [x for each(x in obj).y of z] Currently SpiderMonkey treats 'each' as a keyword when it appears after the 'for' keyword.

[...] There seem to be all sorts of complications for the form you are suggesting.

I'm sorry for the confusion here. I didn't mean to propose adding for-each-in to ES6. That would indeed be awful. I just meant that SpiderMonkey cannot necessarily drop this extension. So if the ES6 syntax and SpiderMonkey's syntax can't coexist, it will be a problem for us.

# Brendan Eich (12 years ago)

Perhaps the thing to do is keep to Python: for+ if? if you get my pidgin-EBNF.

Separate issues:

  • Paren-free heads can run into trouble: esdiscuss/2011-September/016804. But in a comprehension, the subsequent 'for', 'if', or ']' (or ')' for a genexp) helps, so this may not be a problem in fact. We need to work through it, though.

  • We don't need to worry about ECMA-357 (E4X) for-each-in too much here. SpiderMonkey can cope, the standard shouldn't contextually reserve 'each' or anything.

# Brendan Eich (12 years ago)

Jason Orendorff wrote:

I'm sorry for the confusion here. I didn't mean to propose adding for-each-in to ES6. That would indeed be awful. I just meant that SpiderMonkey cannot necessarily drop this extension. So if the ES6 syntax and SpiderMonkey's syntax can't coexist, it will be a problem for us.

Not to bother es-discuss too much, but in case there's interest:

I do hope we can retire 'each' from SpiderMonkey at some point. Supporting it in comprehensions has a cost. Your example:

[x for each(x in obj).y of z]

can't be parsed without error given our current for-each implementation, and backtracking and restarting with 'each' not contextually meaningful sounds "fun".
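Part of what makes this awkward is that `each(x in obj).y`, taken by itself, is an ordinary call expression followed by a property access. A small sketch in standard JavaScript, using a hypothetical `each` function and sample `obj` (neither comes from SpiderMonkey), showing that the expression is valid on its own and that whitespace before the parenthesis does not change its meaning:

```javascript
// Hypothetical helper: any function named `each` would do.
function each(flag) {
  return { y: flag ? ["present"] : ["absent"] };
}

const obj = { a: 1 };
const x = "a";

// `x in obj` is just a boolean argument to the call.
const tight = each(x in obj).y;
const spaced = each (x in obj).y; // same tokens, same result

// The same expression used as the iterable of a for-of loop.
const out = [];
for (const v of each(x in obj).y) out.push(v);

console.log(tight, spaced, out);
```

A parser that also supports for-each-in therefore has to read past the closing parenthesis before it can tell which construct it is looking at.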

I don't expect a lot of SpiderMonkey- (and Rhino-) specific code will break, so making this an error when we support ES6 may be the best way. Pull the bandage off quickly!

# Yusuke Suzuki (12 years ago)
# Jason Orendorff (12 years ago)

On Sat, Sep 22, 2012 at 11:21 AM, Brendan Eich <brendan at mozilla.org> wrote:

Perhaps the thing to do is keep to Python: for+ if? if you get my pidgin-EBNF.

But that isn't Python's syntax. Python's comprehensions are: for (for|if)*

Haskell's comprehensions are: (for|if|let)* This is what I would prefer for JS.

Clojure's comprehensions are: for+ (if|while|let)* Clojure puts the expression at the right, which I like, for the reasons Allen mentioned.

CoffeeScript's comprehensions are: for when? Only a single 'for' clause. It can't be used to flatten an array of arrays.
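For reference, flattening an array of arrays is the kind of job that needs two 'for' clauses. A minimal sketch of the loop expansion in standard JavaScript, with hypothetical names `nested` and `flat`:

```javascript
const nested = [[1, 2], [3], [4, 5]];

// Two for clauses: [x for row of nested for x of row]
const flat = [];
for (const row of nested) {
  for (const x of row) {
    flat.push(x);
  }
}

console.log(flat); // [1, 2, 3, 4, 5]
```

With only a single 'for' clause the inner loop has nowhere to go, so the result would stay one level deep.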

I actually kind of like Allen's argument about not wanting to encourage the use of array comprehensions for complicated use cases. However I'm not sure how that squares with a distaste for nanny syntax restrictions! :)

# Brendan Eich (12 years ago)

Jason Orendorff wrote:

On Sat, Sep 22, 2012 at 11:21 AM, Brendan Eich <brendan at mozilla.org> wrote:

Perhaps the thing to do is keep to Python: for+ if? if you get my pidgin-EBNF.

But that isn't Python's syntax. Python's comprehensions are: for (for|if)*

Oh! I didn't know you could put if in the middle (or I forgot, more likely...).

Haskell's comprehensions are: (for|if|let)* This is what I would prefer for JS.

Clojure's comprehensions are: for+ (if|while|let)* Clojure puts the expression at the right, which I like, for the reasons Allen mentioned.

We could do whatever we like but it will be a pain to parse both old and new in SpiderMonkey!

Actually I think we should not change lightly, not just based on SpiderMonkey, but on SM + Rhino + (Python being closer to JS in community intersection size than Clojure).

CoffeeScript's comprehensions are: for when? Only a single 'for' clause. It can't be used to flatten an array of arrays.

Yeah, not as strong a precedent -- we should treat JS as big brother here, over time.

I actually kind of like Allen's argument about not wanting to encourage the use of array comprehensions for complicated use cases. However I'm not sure how that squares with a distaste for nanny syntax restrictions!

Yup. I thought perhaps Allen left out a "not" or otherwise inverted his meaning, though.

# Allen Wirfs-Brock (12 years ago)

On Sep 22, 2012, at 10:39 AM, Brendan Eich wrote:

Jason Orendorff wrote:

...

Yeah, not as strong a precedent -- we should treat JS as big brother here, over time.

I actually kind of like Allen's argument about not wanting to encourage the use of array comprehensions for complicated use cases. However I'm not sure how that squares with a distaste for nanny syntax restrictions!

Yup. I thought perhaps Allen left out a "not" or otherwise inverted his meaning, though.

I don't think so:

My personal bias, is that comprehensions are just sugar that are best used to express relatively simple and common construction use cases. Complicated sequences of |for| and |if| clauses will be rarely seen and hence less understandable than the equivalent explicit looping expansions.

My personal preference would be:

  1. don't have comprehensions at all (but that's not harmonious, so I'm not actually suggesting it)
  2. a) one or more for clauses followed by a single optional if clause (what is currently in the draft)

# Brendan Eich (12 years ago)

Allen Wirfs-Brock wrote:

On Sep 22, 2012, at 10:39 AM, Brendan Eich wrote:

Jason Orendorff wrote:

... Yeah, not as strong a precedent -- we should treat JS as big brother here, over time.

I actually kind of like Allen's argument about not wanting to encourage the use of array comprehensions for complicated use cases. However I'm not sure how that squares with a distaste for nanny syntax restrictions! Yup. I thought perhaps Allen left out a "not" or otherwise inverted his meaning, though.

I don't think so:

My personal bias, is that comprehensions are just sugar that are best used to express relatively simple and common construction use cases. Complicated sequences of |for| and |if| clauses will be rarely seen and hence less understandable than the equivalent explicit looping expansions.

Ok, but this doesn't help get your solution to the inherent tension between "not using comprehensions for complicated cases" vs. "distaste for nanny syntax restrictions." :-|

  2. a) one or more [for] clauses followed by a single optional if clause (what is currently in the draft)

How is this not a nanny syntax restriction? The desugaring works without issue for either of

[x*y for x in range(XDIM) for y in range(YDIM) if x & 1]
[x*y for x in range(XDIM) if x & 1 for y in range(YDIM)]

but the last has the virtue of skipping the y iteration for even x values.
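The difference in work can be sketched by writing both comprehensions out as loops in standard JavaScript and counting inner-loop steps. This assumes the output expression is the product of x and y, uses 'of' rather than 'in', and introduces a hypothetical `range` helper plus small sample dimensions:

```javascript
// Hypothetical helper mirroring Python's range(n): 0, 1, ..., n - 1.
function range(n) {
  return Array.from({ length: n }, (_, i) => i);
}

const XDIM = 4, YDIM = 3;

// Trailing if: for x ... for y ... if x & 1
let trailingSteps = 0;
const trailing = [];
for (const x of range(XDIM)) {
  for (const y of range(YDIM)) {
    trailingSteps++;
    if (x & 1) trailing.push(x * y);
  }
}

// If in the middle: for x ... if x & 1 ... for y
let middleSteps = 0;
const middle = [];
for (const x of range(XDIM)) {
  if (x & 1) {
    for (const y of range(YDIM)) {
      middleSteps++;
      middle.push(x * y);
    }
  }
}

console.log(trailing, middle);           // same elements, same order
console.log(trailingSteps, middleSteps); // 12 vs 6
```

Both forms produce the same array; the second simply never enters the inner loop when x is even.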

# Allen Wirfs-Brock (12 years ago)

On Sep 22, 2012, at 2:52 PM, Brendan Eich wrote:

Allen Wirfs-Brock wrote:

My personal bias, is that comprehensions are just sugar that are best used to express relatively simple and common construction use cases. Complicated sequences of |for| and |if| clauses will be rarely seen and hence less understandable than the equivalent explicit looping expansions.

Ok, but this doesn't help get your solution to the inherent tension between "not using comprehensions for complicated cases" vs. "distaste for nanny syntax restrictions." :-|

My nanny remark was in regard to Jason's suggestion of requiring a ConditionalExpression rather than an AssignmentExpression in if clauses. eg, forbidding: if x=3

  2. a) one or more [for] clauses followed by a single optional if clause (what is currently in the draft)

Actually, what I meant to say was that my personal preference for ES6 comprehensions, in increasing complexity order, is:

  1. [a non-starter alternative]
  2. a single for clause followed by a single optional if clause
  3. one or more for clauses followed by a single optional if clause (what is in the current draft)
  4. a sequence where each element consists of a for clause and an optional if clause

How is this not a nanny syntax restriction? The desugaring works without issue for either of

[x*y for x in range(XDIM) for y in range(YDIM) if x & 1]
[x*y for x in range(XDIM) if x & 1 for y in range(YDIM)]

but the last has the virtue of skipping the y iteration for even x values.

(don't you need to use of instead of in)

I don't think this is a nanny syntax issue as it doesn't involve trying to make it impossible to express something that is likely erroneous. Instead it is a simplicity issue. "a sequence of for clauses followed by an optional if clause" is strictly simpler to express and understand than "a sequence where each element consists of a for clause and an optional if clause". If use of non-trailing if clause is quite rare, then it may be adding unneeded complexity to the language to support them, since there are already other ways to express the same rare thing.

As all of these comprehension forms are supported in various other languages, it is likely that we could do some data mining and find out occurrence frequencies in real world corpuses. If non-trailing if clauses are rare we probably don't need them?

# Brendan Eich (12 years ago)

Allen Wirfs-Brock wrote:

On Sep 22, 2012, at 2:52 PM, Brendan Eich wrote:

Allen Wirfs-Brock wrote:

My personal bias, is that comprehensions are just sugar that are best used to express relatively simple and common construction use cases. Complicated sequences of |for| and |if| clauses will be rarely seen and hence less understandable than the equivalent explicit looping expansions. Ok, but this doesn't help get your solution to the inherent tension between "not using comprehensions for complicated cases" vs. "distaste for nanny syntax restrictions." :-|

My nanny remark was in regard to Jason's suggestion of requiring a ConditionalExpression rather than an AssignmentExpression in if clauses. eg, forbidding: if x=3

This is far less nanny-ish and better motivated than preventing interleaved 'if's.

The desugaring of comprehensions is straightforward. Nannying over it in one case ('if' in middle) but not another ('if' at end) is inconsistent as well as nanny-ish. The expression grammar in the 'if' condition is a subtler matter.

  2. a) one or more [for] clauses followed by a single optional if clause (what is currently in the draft)

actually what I meant to say was that personal preference for ES6 comprehensions, in increasing complexity order is:

  1. [a non-starter alternative]

It's pretty disharmonious, also not respectful of the champions model, to keep on like this :-(.

  2. a single for clause followed by a single optional if clause

"Two-dimensional" flattened comprehensions are useful and Python supports them, and so do SpiderMonkey and Rhino.

Given 'if' at the end, the two or more 'for' heads mean 'if' in the middle is necessary to avoid unwanted iteration in some cases.

  3. one or more for clauses followed by a single optional if clause (what is in the current draft)

Yes, but your listing preferences or what's current says nothing about why anyone should prefer what you say you prefer!

  4. a sequence where each element consists of a for clause and an optional if clause

What do you mean by "sequence"?

But let's not digress. This is a precedent-free and anti-champion notion.

How is this not a nanny syntax restriction? The desugaring works without issue for either of

[x*y for x in range(XDIM) for y in range(YDIM) if x & 1]
[x*y for x in range(XDIM) if x & 1 for y in range(YDIM)]

but the last has the virtue of skipping the y iteration for even x values.

(don't you need to use of instead of in)

(Yes, this is something I'm going to have to validate in a JS shell and then make paren-free -- I write Python by default for comprehensions.)

I don't think this is a nanny syntax issue as it doesn't involve trying to make it impossible to express something that is likely erroneous.

Neither does a ConditionalExpression rather than Expression for 'if'!

Your argument that one can always write loops out cuts both ways. Use it consistently, please.

Instead it is a simplicity issue. "a sequence of for clauses followed by an optional if clause" is strictly simpler to express and understand than "a sequence where each element consists of a for clause and an optional if clause".

Simpler counting what complexity beans? Either way:

[x*y for x of range(XDIM) for y of range(YDIM) if x & 1]
[x*y for x of range(XDIM) if x & 1 for y of range(YDIM)]

we have two 'for's and one 'if'. Either way the desugaring or mental translation after the comprehension expression on the left is purely left-to-right. Same bean-count.

Specify your beans!

If use of non-trailing if clause is quite rare, then it may be adding unneeded complexity to the language to support them, since there are already other ways to express the same rare thing.

Arguing about rarity is not arguing about complexity. You shifted arguments, but the draft spec has trailing 'if' and that's a sunk-bean-cost (at least by you, the editor -- not by Jason the champion-apparent after me).

As all of these comprehension forms are supported in various other languages, it is likely that we could do some data mining and find out occurrence frequencies in real world corpuses.

Jason already did look at other languages for grammar. Surveying code is hard since much is firewalled. This is not harmonious anyway. Why are you editing anything other than the champion's proposal?

To drop the Harmony process discussion (which I think is critically important), what's the complexity argument? Two 'for's and one 'if', at end or middle does not matter to the desugaring rules.

# Claus Reinke (12 years ago)

  1. yes to (for|if)* - also for generator expressions

  2. Array comprehensions with for/if do not look very readable - the generators and filters seem to merge into one blob of text.

[x*y for x of range(XDIM) if x&1 for y of range(YDIM)]

While syntax highlighting and/or formatting can help
to make the individual items stand out:

[x*y for x of range(XDIM)
        if x&1
        for y of range(YDIM)]

There may be syntactic alternatives that make it easier to
spot and separate the generators and filters. ES syntax
tweaking is notoriously tricky, but perhaps something like

[ x*y : x of range(XDIM); x&1; y of range(YDIM) ]   

Claus
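On the generator-expression side of point 1, the same (for|if)* shape can be sketched today as a generator function in standard JavaScript. The names `range` and `products` are hypothetical, and this is only one possible hand expansion of the comprehension shown above:

```javascript
// Hypothetical lazy range, standing in for the range() used in the thread.
function* range(n) {
  for (let i = 0; i < n; i++) yield i;
}

// (x*y for x of range(XDIM) if x&1 for y of range(YDIM)), written out.
function* products(xdim, ydim) {
  for (const x of range(xdim)) {
    if (x & 1) {
      for (const y of range(ydim)) {
        yield x * y;
      }
    }
  }
}

// Laziness: only as many values are computed as the consumer asks for.
const firstThree = [];
for (const p of products(1000000, 3)) {
  firstThree.push(p);
  if (firstThree.length === 3) break; // stops the generator early
}

console.log(firstThree);          // [0, 1, 2]
console.log([...products(4, 3)]); // [0, 1, 2, 0, 3, 6]
```

Each clause of the comprehension becomes one level of nesting, which is also where the line breaks fall in the formatted version above.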

# Jason Orendorff (12 years ago)

On Sat, Sep 22, 2012 at 11:57 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

actually what I meant to say was that personal preference for ES6 comprehensions, in increasing complexity order is:

  1. [a non-starter alternative]
  2. a single for clause followed by a single optional if clause
  3. one or more for clauses followed by a single optional if clause (what is in the current draft)
  4. a sequence where each element consists of a for clause and an optional if clause

We disagree on what notion of complexity is important here.

The amount of implementation complexity we're talking about here is peanuts. I can implement all the options we've discussed by lunchtime. It's just not an issue. I can fit all the proposals in a tweet using a little BNF:

1. nothing  2. for if?  3. for+ if?  4. (for if?)*
5. for (for|if)*  6. (for|if|let)*

(I include your proposal 4 for completeness, but no one has really proposed that. 5 is Python and 6 is Haskell--the two proposals I favor.)

But what is actually at issue here is how JS should treat the developer who writes this:

books = [book  for author in authors
                 if author.home_state == 'TN'
                   for book in author.books()];

And the options here are (1) run it; (2) throw a SyntaxError. If it throws, the developer will be surprised. They will have to rephrase their thought to fit our syntactic whims; and they will have to add a rule to their mental model of the language. I claim this is the notion of complexity we should worry about.
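For comparison, here is a sketch of the same thought written with standard array methods, which run in current engines. The sample `authors` data is hypothetical; only the property names come from the example above:

```javascript
// Hypothetical sample data shaped like the example above.
const authors = [
  { home_state: "TN", books: () => ["A", "B"] },
  { home_state: "CA", books: () => ["C"] },
  { home_state: "TN", books: () => ["D"] },
];

// filter plays the role of the 'if' clause, flatMap the inner 'for' clause.
const books = authors
  .filter((author) => author.home_state == "TN")
  .flatMap((author) => author.books());

console.log(books); // ["A", "B", "D"]
```

The filter sits between the two iteration steps here as well, which is the ordering the developer in the example reached for.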

# Erik Arvidsson (12 years ago)

That should throw a syntax error since it uses 'in' instead of 'of'.

Other than that I'm in favor of proposal 5 (for (for | if)*).

If we go with the current ES6 draft, it would still have been a syntax error even if you had used 'of' instead of 'in'.

# Brendan Eich (12 years ago)

Erik Arvidsson wrote:

That should throw a syntax error since it uses 'in' instead of 'of'.

LOL! You can tell Jason is a Pythonista (or Pythonist?). I guess I am too, slightly.

Anyway, we'll figure out how to write 'of' not 'in' when the file name ends in .js!

# Jason Orendorff (12 years ago)

Erik Arvidsson wrote:

That should throw a syntax error since it uses 'in' instead of 'of'.

D'oh!

Brendan Eich wrote:

LOL! You can tell Jason is a Pythonista (or Pythonist?). I guess I am too, slightly.

Anyway, we'll figure out how to write 'of' not 'in' when the file name ends in .js!

My secret is out.

I see a few lines of Emacs Lisp in my future. :-|