May 24-26 rough meeting notes

# Waldemar (8 years ago)

A whole bunch of agenda reordering

Test262 report slideshow

Debate about who should host the test262 website. Deferred discussion.

[For the purposes of these notes, Harmony and ES.next are synonyms. We were using "advanced to Harmony" to mean "advanced to ES.next".]

"RegExps match reality": Waldemar: Omitting these error cases was deliberate in ES3. They shouldn't be in the base language any more than getYear or HTML string functions should be. If they go anywhere, it should be in Annex B. Debate about whether these are extensions or conflicts with the spec. MarkM: conflicts should be codified in the spec; extensions should not be normative. Debate about whether these are syntactic or semantic extensions and whether they fall into the chapter 16 exemption. Waldemar: even if they don't currently fall under the chapter 16 extension, we could make them fall under the chapter 16 extension.

Brendan, Allen: In the next spec we'll have an optional normative section of web reality extensions. Put these in that section, along with things like getYear and compile. Consensus reached on this approach for Harmony.

Lookbehind support is promoted to required normative.

Multiline regexps are out of Harmony. There is no currently written-up proposal.

Pragmas: Would rather have these than string literals. Don't require "use" to be a reserved word. Controversy about whether unrecognized pragmas should be ignored (Brendan: both ways bite back), but (I think) that's an unsolvable problem. Consensus on moving these to Harmony.

Versioning: <script type="application/ecmascript;version=6"> (RFC 4329)

use version 6; module LOL { ... } </script>

There are good reasons to have the metadata both externally (in the script tag) and internally (in the script). External versioning allows implementations to avoid fetching a script at all if they won't understand it. Internal versioning helps in the case where the external version is detached.

Brendan's idea: <script-if type=...> ... <script-elif type=...> ... <script-else> ... </script>

Consensus on moving some form of versioning into Harmony. The strawman is a bit light at this time, so no specifics yet.

MemberExpression <| ProtoLiteral syntax: Why can't ProtoLiteral be a variable? Could extend the syntax to allow expressions there, with <| doing a shallow copy. Copy internal properties as well? Not clear what that would mean for some objects such as dates.

Shallow copy is problematic if right side is a proxy. Would need a clone-with-prototype handler.

Waldemar: <| seems too ad-hoc. Would want a more general mechanism that also allows for creating an object that's born sealed.

Brendan: Object literal extension defaults seem ad-hoc. Constants (and maybe methods) shouldn't be configurable.

MarkM: "Mystery Meat" problem here. Not comfortable with Perlish "punctuation soup". Allen: Words are too verbose, which was the feedback from past meetings. Waldemar: Improve the usability. Would prefer to set configurability en masse on the properties of an object rather than having to mark each one. := should go back to being called "const" and should come with the right defaults so that no other modifiers are needed in the common case.

Discussion about dynamic vs. static super lookup. When a method is extracted, "super" used in a . (or []) expression stays bound while "this" is dynamic. Note that a bare "super" not in a . or [] expression means "this".

MarkM: Gave example of why super can't have simple dynamic semantics such as "start from the 2nd prototype". This causes infinite loops in class hierarchies. More complicated semantics where both a "this" and a "super" are passed in every call might be possible.
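[Illustration, not from the meeting.] A minimal sketch of the kind of failure MarkM describes, written in later ECMAScript. `dynamicSuper` is a hypothetical helper, not anything that was proposed; it resolves "super" as "start from the 2nd prototype of this". That works one level deep and recurses without end in a three-level chain, because `this` never changes as the call moves up the hierarchy:

```javascript
// Hypothetical "dynamic super": always start the lookup at the
// prototype's prototype of the receiver, regardless of which method is running.
function dynamicSuper(self, name) {
  return Object.getPrototypeOf(Object.getPrototypeOf(self))[name];
}

const A = { describe() { return "A"; } };

const B = Object.create(A);
B.describe = function () {
  return "B>" + dynamicSuper(this, "describe").call(this);
};

const C = Object.create(B);
C.describe = function () {
  return "C>" + dynamicSuper(this, "describe").call(this);
};

// Two levels: the instance's 2nd prototype is A, so B.describe reaches A.describe.
const twoLevels = Object.create(B).describe();

// Three levels: the instance's 2nd prototype is always B, so B.describe
// keeps finding itself and the engine eventually reports a stack overflow.
let threeLevels;
try {
  threeLevels = Object.create(C).describe();
} catch (e) {
  threeLevels = e;
}
```

Binding "super" statically to the object a method was defined on avoids this, which is the direction the discussion above takes.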

Brendan: super without class is too weak and causes problems in nested functions.

MarkM: super must be in a . or [] expression.

What should super.foo = 42 mean (where super is an lvalue)? What if foo is not an accessor?

Private names: Note that you now can't use the public name gotten from for-in or getOwnPropertyNames to index into the object. Should the name objects be reflectable via getOwnPropertyNames at all? Waldemar: Objects to leaking the presence of private names. Wants no reflection on private names, with proxies doing membranes via the getI/setI approach. Discussion about interaction of shallow cloning with private names. Advanced to Harmony, with the big open issue of reflecting on private names.
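[Illustration, not from the meeting.] The private name objects discussed here were never standardized in this form. ES2015 symbols are used below only as a rough analogue, under that assumption, to show what "not reflected by for-in or getOwnPropertyNames" looks like in practice. Unlike what Waldemar asks for, symbols remain discoverable through a separate reflection call:

```javascript
// A symbol-keyed property stands in for a property keyed by a name object.
const secretName = Symbol("secret");
const holder = { visible: 1, [secretName]: 2 };

// Neither for-in nor getOwnPropertyNames reports the symbol-keyed property.
const forInKeys = [];
for (const key in holder) forInKeys.push(key);
const ownNames = Object.getOwnPropertyNames(holder);

// A dedicated reflection API does report it, so its presence still leaks.
const ownSymbols = Object.getOwnPropertySymbols(holder);
```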

Classes: "constructor" is a contextually reserved word. Waldemar: Classes as an abstraction should have a clear way of (preferably enforceably) documenting the shape of instances: what properties they have, what they are (getters/setters/guard), etc. This proposal is too weak here. Waldemar: A const class must define its instance properties; otherwise the constructor won't be able to create them. MarkM: Instances aren't sealed until the constructor finishes. Waldemar: If it's the local constructor, that won't work. If it's the entire chain of constructors, then that can be indefinitely later, and the derived classes can muck with the unfrozen/unsealed contents of the instance. For example, if the attributes like const don't take effect until all constructors finish then a derived class can alter the value of this class's const instance fields.

MarkM: Wants clarity about scope and time of evaluation. Brendan: class C { get x() {return 42} x = []; static x = []; } Issue about putting a mutable x = [] on the prototype and the instances colliding because they all try to insert elements into the same array rather than each instance having its own array.
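[Illustration, not from the meeting.] The collision Brendan points out can be reproduced with plain constructor functions, without any class syntax. The names `Shared` and `PerInstance` are made up for this sketch:

```javascript
// x lives on the prototype: the array literal is evaluated once,
// so every instance reads and mutates the same array.
function Shared() {}
Shared.prototype.x = [];

// x is created in the constructor: each construction evaluates
// the array literal again, so every instance gets its own array.
function PerInstance() {
  this.x = [];
}

const s1 = new Shared();
const s2 = new Shared();
s1.x.push("from s1"); // also visible through s2.x

const p1 = new PerInstance();
const p2 = new PerInstance();
p1.x.push("from p1"); // p2.x is unaffected
```

This is why the time of evaluation of `x = []` in a class body matters: once per class gives the first behavior, once per instance gives the second.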

Waldemar: Don't like the subversion of const away from its semantics in the rest of the language, where it means write-once-by-definition with a read barrier before the definition. Here a const field can be mutated many times even after the constructor finishes. That won't work for guards.

Brendan: Proposed correction: constructor(x, y) = { x: x|| default_x, y: ... } MarkM: Had earlier declarative instance construction proposal, but Allen convinced him that the imperative style is the most familiar to ECMAScript programmers. Allen: No I didn't. Declarative construction is important. Waldemar: There's someplace you need to specify instance attribute properties, and some of them can only be specified at the time the property is created. Also, need to be able to interleave declarative initialization of instance properties with computation of temporaries so declarative initializations can share work.

Discussion of older, closures-based version of the class proposal, which have a simple declarative syntax: doku.php?id=strawman:classes_with_trait_composition&rev=1299750065 This one is simple but defines properties only on class instances, not on the class constructor or prototype.

Mixed proposal: doku.php?id=strawman:classes_with_trait_composition&rev=1305428366 This one is more complicated because it allows definition of properties both on class instances and on the class constructor and prototype.

MarkM wrote a table comparing the three proposals:

                         Closures        Mixed             Separate

Class properties         none            static in class   static in class
Prototype properties     none            public in class   decl in class
Instance private         lexical capture (in all)
Class private inst       private decl    private in ctor   expando only
Public inst properties   public decl     public in ctor    expando only
Return override          none            none              optional
Constructor code         yes             yes               yes
Class code               no              yes               no

expandos are dynamically created properties: this.foo=42

MarkM's preference is Closures (best), Mixed (middle), Separate (worst).

Group trying to come up with a variant of the Mixed proposal.

Brendan: What's the order of initialization in a class hierarchy? Declarative instance initialization should be able to rely on super instance initialization having run once they call the superconstructor.

Scoped Object Extensions: Waldemar: These use the same mechanisms as early ES4 namespaces. What happens when extensions conflict with each other? Can't be a compile-time error. Run-time error. Extension "objects" (perhaps better called Extension records) are themselves not extendable. Waldemar: What happens when you write to an extension property? Delete one? Proxy one? Peter: Make them immutable. Alex: Won't work. Such things in the prototype will inhibit shadowing. Peter: Make them writable but nonconfigurable. Waldemar: What happens when you have an extension A.count, and then code does A.count++? In the current semantics this creates an invisible non-extension property on A and doesn't work. Peter: You shouldn't be writing to extensions. Waldemar: We've already ruled out the immutable model. Can't change the get rule without also changing the put and related rules. DaveH: Let's look for alternatives. Could this be layered on top of private names or such? Waldemar: No. Debated extensions vs. private. Lexical restriction of extensions is a blessing as well as a curse. If a module extends Tree with a where method, that module can't call a Tree utility in another module that relies on "where". Waldemar: Membranes don't work. The proxy would need to get a hold of the lexical scope. getOwnProperties won't see extension properties because it's a function. Freezing won't work -- won't see extension properties because it's a function. DaveH: Agree that this solves an important problem, but the issues of the implications on the object model are severe. Sam: We'll need to reify GetOwnProperty for proxies to take lexical scopes as arguments. Brendan: Not just proxies. Proxies and put are sub-problems of the larger problem of making a significant change to the metaobject protocol. Luke: There are other things already promoted to Harmony that are as underspecified and cross-cutting as this. Brendan: The issue is that this proposal is so new. 
Waldemar: The issue is that this proposal is not new; over the years we've run up against deep problems when working in this area in the context of namespaces. Debate about the Harmony process. Brendan's summary of consensus: We'll try to get into ES.next but not promoted to Harmony yet. Need to address the problems with semantics and get some code experience.
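[Illustration, not from the meeting.] Alex's objection that immutable properties in the prototype inhibit shadowing refers to existing ES5 behavior, which can be shown without any extension mechanism. The object names and the `tryAssign` helper are invented for this sketch; it does not model scoped extensions themselves:

```javascript
// A read-only data property on the prototype.
const baseProto = {};
Object.defineProperty(baseProto, "count", {
  value: 0,
  writable: false,
  configurable: false,
});

const derived = Object.create(baseProto);

// Ordinary assignment consults the prototype chain first. Because the
// inherited property is non-writable, the assignment is rejected instead
// of creating an own property (a TypeError in strict mode code).
function tryAssign(target) {
  "use strict";
  try {
    target.count = 1;
    return null;
  } catch (e) {
    return e;
  }
}
const assignmentError = tryAssign(derived);
const countAfterAssign = derived.count;
const hasOwnAfterAssign = Object.prototype.hasOwnProperty.call(derived, "count");

// Defining the property directly bypasses the prototype check and does shadow it.
Object.defineProperty(derived, "count", { value: 1, writable: true });
const countAfterDefine = derived.count;
```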

Trademarks/guards: [didn't take detailed notes because I was driving discussion] Very long debate about merits and process. Not promoted to a proposal for Harmony. Dave: Insufficient time to experiment with this before 2013.

Random: First paragraph accepted, second not.

Quasis: Backslash sequences are not interpreted when passing to the quasitag function. The default is join, so: foo=${void 0}\n evaluates to "foo=\n". \u1234 evaluates to "\u1234". $\u1234 evaluates to "\u1234". Allen: Worried about yet another escaping syntax. Waldemar: Also unhappy about confusion of escaping syntax overlap. Example: foo = \$. RegExps containing $ characters would be misinterpreted. Waldemar: Alternate proposal: Use only the regular \ syntax for escaping. An unescaped $ followed by an identifier would turn into a substitution. All other $'s would be passed through literally. Also, it should be up to the quasi parser whether it gets the escaped or raw text (with the default being escaped). The quasi parser function could specify its desire via a property on the function. Debate about what could go into substitution bodies. Waldemar: Grammar is not lexable via the same kind of lexer technology used for the rest of the language; lexer is not a state machine any more. OK with allowing only sequences of dot-separated identifiers.

String formatting: Not on agenda, so not in Harmony. Brendan: We haven't got it together yet. It doesn't address injection attacks.

Classes and privacy revisited:

  • Return allowed?
  • Verbose repetition of public/private prefixes
  • One or two namespaces for instance variables
  • Require a prefix like "public" to define prototype data properties
  • Private proto-properties
  • private(this) vs. alternatives
  • Attribute controls
  • Half-override of get/set pair

Return allowed? No consensus. Return is useful for migrating current code and memoizing but conflicts with inheritance.

Verbose repetition of public/private prefixes: Agreed to solve this by allowing comma-separated declarations: public x=3, y=5;

One or two namespaces: Consensus on two.

Require a prefix like "public" to define prototype data properties: Consensus on yes.

Private proto-properties: Consensus on yes.

Long discussion about private instance variable access syntax. private(expr).x is ok as a long form for an arbitrary expression, but we need a shorthand for private(this).x. No consensus on which shorthand to use. Waldemar: Use just x as the shorthand (every private instance variable x would also join the lexical scope as sugar for private(this).x)?

Attribute controls: Allen has a proposal.

Half-override of get/set pair: Allen has a proposal.

Categorization: No review, no advance: Unicode Concurrency Enumeration Simple module functions

No review, advance: Multiple Globals Maps and Sets Math enhancements Function.toString Name property of functions

Review, advance: String Extras Array Comprehensions Completion Reform Classes Extended Object Literals (the goals were advanced, the syntax used by the strawman was not) Quasis

Review, no advance: Soft Fields Object Change Notifications Deferred Functions Guard Syntax Paren-Free Arrow Function/Block

Completion reform: This is a breaking change. There's no way to specify a language version for eval code, short of a pragma.

Extended Object Literals: Waldemar and others objected to most aspects of the current syntax; it produces a punctuation soup and possibly conflicts with guards.

Object Change Notifications: Luke: We tried and failed to make proxies work for this use case (as well as for copy-on-write). Waldemar: Why can't this be done using proxies? I understand why proxies can't support the API proposed for object change notifications, but why can't they solve the larger user need here? Luke: Too inefficient and would require full membranes to support object identity. Allen: This allows people to break into abstractions by putting observers on objects they don't own. DaveH: Proxies are deliberately less powerful than this in that they don't allow you to attach behaviors to arbitrary objects that you don't own. MarkM: Notification happening synchronously is a security problem. Observers can run at times when code doesn't expect malicious changes. Waldemar: The same argument applies to getters, setters, and proxies. You need to lock things down to guard against those anyway, and the same lockdown would presumably apply to observers. Cormac: Just hand around only the proxied versions of observable objects. No need for membranes. Sam: This would let you observe private names. Luke: No it wouldn't. It would only show alterations to properties whose names you can see via getOwnPropertyNames. MarkM: This could be done by turning the current true/false extensibility property into a tri-state true/false/observe property. Existing properties can be replaced with getter/setter pairs. The extensibility hook would be notified when new properties are created. Waldemar: This won't notify you of property deletes. A delete will delete a configurable getter/setter pair without notifying any observers. MarkM: Is observing deletion needed? Luke: Yes, particularly for arrays. Discussion about deferred vs. immediate notifications. Immediate is preferred. Brendan: This might be a better "watch". Cormac: There is a lot of overlap between this and proxies. Is there a way to do some synthesis? MarkM: This has a lot of open research issues. 
Cormac: Proxies may have flaws that this would fix. MarkM: We should work on this in parallel, so that this informs proxies. Not advanced to Harmony.
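[Illustration, not from the meeting.] A minimal sketch of the accessor-based observation MarkM suggests, and of the gap Waldemar identifies. `observeProperty` is a hypothetical helper written for this note, not part of any proposal, and it covers only the "replace existing properties with getter/setter pairs" half of the idea:

```javascript
// Replace an existing data property with a getter/setter pair that
// reports every write to the supplied callback.
function observeProperty(obj, key, onChange) {
  let value = obj[key];
  Object.defineProperty(obj, key, {
    get() { return value; },
    set(next) {
      const previous = value;
      value = next;
      onChange(key, previous, next);
    },
    enumerable: true,
    configurable: true,
  });
}

const changeLog = [];
const watched = { size: 1 };
observeProperty(watched, "size", (key, previous, next) => {
  changeLog.push([key, previous, next]);
});

watched.size = 2;      // goes through the setter, so the observer is notified
delete watched.size;   // removes the accessor pair; no observer code runs
const stillPresent = "size" in watched;
```

After the delete, the log holds only the write, which is the missing notification Luke says matters for arrays.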

Quasis: $\u0061 is a reference to the variable <a>, not the string constant "a".

Waldemar: Why do we need variables with unguessable names? Resolved: Don't name these variables at all. They're still created at the top-level scope but don't have lexical scope names. Waldemar: What happens when you have an invalid escape sequence so you can't generate a decoded string to pass to the function (even though the function is only interested in the raw string)? Resolved: Modified the proposal to remove the interleaved arguments and instead put both the raw and the decoded strings on the frozen identity object. Decoded strings are missing if they don't decode. Advanced to Harmony.
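[Illustration, not from the meeting.] Quasis later shipped, with different syntax, as tagged template literals. Assuming that lineage, the resolution above (raw and decoded strings both available on a frozen object, decoded strings missing when they don't decode) can be observed in current engines. The tag function `inspect` is made up for this sketch:

```javascript
// A tag function that just records what it was handed.
function inspect(strings, ...values) {
  return {
    cooked: Array.from(strings),   // decoded text segments
    raw: Array.from(strings.raw),  // text segments exactly as written
    frozen: Object.isFrozen(strings),
    values,
  };
}

const who = "x";

// A valid escape: the decoded segment holds a newline character,
// the raw segment holds a backslash followed by "n".
const valid = inspect`foo=${who}\n`;

// An invalid escape (\u not followed by hex digits): in a tagged template
// the decoded segment is undefined, while the raw text is still available.
const invalid = inspect`\unicode and ${who}`;
```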

Deferred functions: Discussion (and lots of confusion) about the semantics of the proposal. The Q class is not part of the proposal. Continuations are meant to be one-shot, possibly with reusing the same continuation object across calls. MarkM: There is also an error-handler in addition to the then-handler. MarkM: Issue with chaining values. Peter: Every time a function that contains an await statement is called, it returns a new Deferred object. The semantics of Deferred are built into the language.

class Deferred {
  then(callback) { this.callbacks.push(callback) }
  constructor() { this.callbacks = []; }
  callback(result) {
    for (var v: this.callbacks) { v(result); }
    this.callbacks = [];
  }
}

Cormac: "then" never returns anything interesting. Waldemar: How do you await multiple things concurrently? A sequence of await statements will wait to launch the second until the first one is done. Peter: Call these multiple things without using await and then use a combinator to wait for all of the results. MarkM: This requires separate turns. Brendan: This is a syntax full of library choices that could be done in other ways that we should not be specifying yet. (Examples: Form of Deferred objects, turn issue.) Deferred implemented using Generators? Need a top-level dispatch function. Also generators don't let you return a value. DaveH and Brendan: Deferred is coexpressive with generators. However, there are policies underneath Deferred (scheduling etc.) that are premature to standardize. Solve this problem using generators as they are. Luke: Concerned that things we're shooting down have more value than things we've adopted. DaveH: Need to keep the pipeline going. Waldemar: How do you simulate return values using generators? DaveH: Use the last yield to pass the return value. The caller will then close the generator. Not advanced to Harmony.
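[Illustration, not from the meeting.] A rough sketch of "Deferred implemented using generators" with a top-level dispatch function, following DaveH's suggestion of using the last yield to carry the return value. It is written in later ECMAScript; `SimpleDeferred` is modeled on the class in these notes, and `spawn` is a hypothetical dispatcher. Everything runs synchronously, so the separate-turns issue MarkM raises is deliberately not modeled:

```javascript
// Modeled on the Deferred class shown in the notes.
class SimpleDeferred {
  constructor() { this.callbacks = []; }
  then(callback) { this.callbacks.push(callback); }
  callback(result) {
    const pending = this.callbacks;
    this.callbacks = [];
    for (const cb of pending) cb(result);
  }
}

// Hypothetical dispatch function: resumes the generator each time a yielded
// SimpleDeferred fires. The first yielded value that is not a SimpleDeferred
// is treated as the result; the generator is then closed.
function spawn(generatorFunction) {
  const outcome = new SimpleDeferred();
  const gen = generatorFunction();
  function step(input) {
    const { value, done } = gen.next(input);
    if (done) return;
    if (value instanceof SimpleDeferred) {
      value.then(step);
    } else {
      gen.return();
      outcome.callback(value);
    }
  }
  step(undefined);
  return outcome;
}

const first = new SimpleDeferred();
const second = new SimpleDeferred();
let total;

spawn(function* () {
  const x = yield first;   // plays the role of "await first"
  const y = yield second;  // not started waiting until first is done
  yield x + y;             // last yield carries the "return value"
}).then((value) => { total = value; });

first.callback(1);
second.callback(2);
```

The sequential waits also show Waldemar's point: the second wait is not reached until the first completes, so concurrent waiting needs a combinator outside the generator.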

Guard Syntax: Technical aspects ok, but some are worried about the effects of this change on the ECMAScript style. Cormac's student Tim will work with DaveH to prototype this this summer. Not advanced to Harmony, pending more user experience.

Paren-free: Compatibility issue with: if (cond) a = b; catch, for, for-in heads must not be parenthesized. MarkM: Cost of incompatibility is too high, particularly for things like for(;;) heads. Would prefer to have full upwards compatibility, including old semantics for for(a in b). Several others: Don't want significant parentheses. MarkM: We must not make it impossible to write code that runs on both old and new syntax. Not advanced to Harmony.

Arrow Function/Block: function f() { a.forEach({| | return 3}); } The return will return out of f. Note also that the implementation of forEach could have a try-finally statement that catches and revokes the return. This kind of cross-function return catching is new. The cage around block lambda is too tight. Luke: Concern about call syntax strangeness. This kind of syntax only works if it's designed holistically. Debate about completion value leaks. Waldemar: Use of | conflicts with common use of trademarks. Alex: Objects to new "little pieces of cleverness". Too many things to teach to people. Not advanced to Harmony.
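[Illustration, not from the meeting.] Block lambdas never became part of the language, so the sketch below only emulates their non-local return with an exception. `withNonLocalReturn` and `revokingForEach` are hypothetical helpers. The point is the interaction noted above: an iteration function with try-finally can intercept the escape and cancel it.

```javascript
// With an ordinary function as the callback, return leaves only the callback.
function f(a) {
  a.forEach(function () { return 3; });
  return "f ran to completion";
}

// Emulation of a return that escapes to the enclosing function:
// `ret(value)` throws a private token that the outer frame catches.
function withNonLocalReturn(body) {
  const token = {};
  try {
    return body((value) => { token.value = value; throw token; });
  } catch (e) {
    if (e === token) return token.value;
    throw e;
  }
}

function g(a) {
  return withNonLocalReturn((ret) => {
    a.forEach(() => ret(3)); // escapes through forEach and out of g
    return "not reached";
  });
}

// An iteration helper whose finally block completes with its own return.
// That completion replaces the in-flight escape, so the caller carries on.
function revokingForEach(a, fn) {
  try {
    a.forEach(fn);
  } finally {
    return;
  }
}

function h(a) {
  return withNonLocalReturn((ret) => {
    revokingForEach(a, () => ret(3));
    return "escape was revoked";
  });
}

const results = [f([1]), g([1]), h([1])];
```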

Next meeting two days Jul 27-28.

# David Bruant (8 years ago)

Le 27/05/2011 01:22, Waldemar a écrit :

Categorization: No review, no advance: Unicode Concurrency Enumeration Simple module functions

No review, advance: Multiple Globals Maps and Sets Math enhancements Function.toString Name property of functions

Review, advance: String Extras Array Comprehensions Completion Reform Classes Extended Object Literals (the goals were advanced, the syntax used by the strawman was not) Quasis

Review, no advance: Soft Fields Object Change Notifications Deferred Functions Guard Syntax Paren-Free Arrow Function/Block

I do not see anything about proxies. If I remember well, a lot of decisions had been delayed from the previous meeting to this one. Did something happen about proxies?

Thanks for these notes and I'm glad to see ECMAScript moving. So exciting :-)

# Brendan Eich (8 years ago)

On May 27, 2011, at 1:39 AM, David Bruant wrote:

I do not see anything about proxies. If I remember well, a lot of decisions had been delayed from the previous meeting to this one. Did something happen about proxies?

Tom kindly agreed to move these issues to the July meeting, since Proxies are in Harmony and ES.next and we needed the time this meeting to go over strawmen.

# Brendan Eich (8 years ago)

On May 26, 2011, at 4:22 PM, Waldemar wrote:

Arrow Function/Block: function f() { a.forEach({| | return 3}); } The return will return out of f. Note also that the implementation of forEach could have a try-finally statement that catches and revokes the return. This kind of cross-function return catching is new.

And some on TC39 <3 this "Tennent sequel" feature, to quote dherman. Others cited the new as potentially too much for average users to grok. No one hated it overtly.

The cage around block lambda is too tight. Luke: Concern about call syntax strangeness. This kind of syntax only works if it's designed holistically. Debate about completion value leaks. Waldemar: Use of | conflicts with common use of trademarks. Alex: Objects to new "little pieces of cleverness". Too many things to teach to people. Not advanced to Harmony.

More was said here that is good feedback for Harmony, no matter what gets into ES.next.

We talked about how shorter function syntax is hard to do well and standardize. The traps include:

We didn't get our act together in time to get shorter function syntax into ES.next at this meeting, which I regard as a personal failure in part. But we will keep trying. It's important, as Alex Russell argued.

I took a straw poll:

Block lambda revival with feedback issues fixed, in favor (whether 2nd or 1st choice): 6 hands up.

Arrow function syntax, with grammar formalism addressed: 8 hands up.

There can be only one (of the above): 9 hands up.

Therefore I'll work on the strawman:arrow_function_syntax strawman (I'll update strawman:block_lambda_revival based on feedback as well, to keep it up to date).

Peter Hallam kindly offered to help come up with a new grammar formalism for the spec that can pass the "Waldemar test" (if that is possible; not as hard as the Turing test). IIRC Peter said he was (had, would) adding arrow support per the strawman to Traceur (code.google.com/p/traceur-compiler). We talked about Narcissus support too, to get more user testing.

User testing would be most helpful in providing negative results, for usability or any other bug (parsing complexity, etc.). To get arrow function syntax into Harmony we still need a non-LR(1) approach, because the LR(1) way parses too broad (or even the wrong) a cover grammar: Expression as the arrow formal parameter list.

So more to do for arrows, but that seems like the winning direction, with block lambdas a close second.

# Jorge (8 years ago)

On 27/05/2011, at 11:01, Brendan Eich wrote:

On May 26, 2011, at 4:22 PM, Waldemar wrote:

Arrow Function/Block: function f() { a.forEach({| | return 3}); } The return will return out of f. Note also that the implementation of forEach could have a try-finally statement that catches and revokes the return. This kind of cross-function return catching is new.

And some on TC39 <3 this "Tennent sequel" feature, to quote dherman. Others cited the new as potentially too much for average users to grok. No one hated it overtly.

It's not that "it's too much to grok", it's that as I like that (blocks) syntax so much, I'd prefer to use it for (shorter) functions (syntax) instead of the (ugly, extraneous, imho) "arrow syntax proposal", not for blocks.

Also, I wonder if in order to make blocks first class, do we need any new syntax ?

function f() { a.forEach({ return 3 }); }

?

# Jorge (8 years ago)

On 27/05/2011, at 11:36, Jorge wrote:

On 27/05/2011, at 11:01, Brendan Eich wrote:

On May 26, 2011, at 4:22 PM, Waldemar wrote:

Arrow Function/Block: function f() { a.forEach({| | return 3}); } The return will return out of f. Note also that the implementation of forEach could have a try-finally statement that catches and revokes the return. This kind of cross-function return catching is new.

And some on TC39 <3 this "Tennent sequel" feature, to quote dherman. Others cited the new as potentially too much for average users to grok. No one hated it overtly.

It's not that "it's too much to grok", it's that as I like that (blocks) syntax so much, I'd prefer to use it for (shorter) functions (syntax) instead of the (ugly, extraneous, imho) "arrow syntax proposal", not for blocks.

Also, I wonder if in order to make blocks first class, do we need any new syntax ?

function f() { a.forEach({ return 3 }); }

?

I have edited (a copy of) the arrow_function_syntax strawman wiki page, to see side-by-side the current function syntax, the arrow syntax and the blocks (applied to functions) syntax:

jorgechamorro.com/blocks.html

# Brendan Eich (8 years ago)

On May 27, 2011, at 2:36 AM, Jorge wrote:

Also, I wonder if in order to make blocks first class, do we need any new syntax ?

function f() { a.forEach({ return 3 }); }

The problem is that a block statement is ambiguous with an object initialiser. See strawman:arrow_function_syntax#grammar_changes in particular the "To enable unparenthesized ObjectLiteral expressions as bodies of arrow functions, without ambiguity with Block bodies, restrict LabelledStatement as follows..." section.

# Rick Waldron (8 years ago)

I have no intention of bike-shedding, but the following example keeps popping up:

Arrow Function/Block: function f() { a.forEach({| | return 3}); }

...And I wonder if forEach was only used to illustrate an example of the block lambda syntax? If this was meant to serve as an actual representation of a real world use case, I would be negligent if I didn't speak up and note that the use of "a.forEach({| | return 3});" is incorrect. a.forEach( callback ) returns undefined per spec.

Forgive me if I've misunderstood the use of "forEach" in this example.

# Brendan Eich (8 years ago)

This example was all about return in block-lambda returning from enclosing function (the function f). That's all. Yeah, it was not useful code ;-).

# Waldemar Horwat (8 years ago)

On 05/27/11 02:01, Brendan Eich wrote:

More was said here that is good feedback for Harmony, no matter what gets into ES.next.

We talked about how shorter function syntax is hard to do well and standardize. The traps include:

We didn't get our act together in time to get shorter function syntax into ES.next at this meeting, which I regard as a personal failure in part. But we will keep trying. It's important, as Alex Russell argued.

I took a straw poll:

Block lambda revival with feedback issues fixed, in favor (whether 2nd or 1st choice): 6 hands up.

Arrow function syntax, with grammar formalism addressed: 8 hands up.

There can be only one (of the above): 9 hands up.

Therefore I'll work on the strawman:arrow_function_syntax strawman (I'll update strawman:block_lambda_revival based on feedback as well, to keep it up to date).

What that poll didn't indicate is the soundness of the reasons for choosing one or the other. "I slightly prefer A to B" is different from "if we choose B then we'll need to spend months coming up with a new formalism for the spec."

Peter Hallam kindly offered to help come up with a new grammar formalism for the spec that can pass the "Waldemar test" (if that is possible; not as hard as the Turing test). IIRC Peter said he was (had, would) adding arrow support per the strawman to Traceur (code.google.com/p/traceur-compiler). We talked about Narcissus support too, to get more user testing.

If we need to come up with a new formalism, that's a very powerful signal that there's something seriously flawed in the design. Even if it happens to work now, it will produce surprises down the road as we try to extend the expression or parameter grammar. The places where the grammar is not LR(1) up in C++ are some of the most frustrating and surprising ones for users to deal with, and C++ does not even have the feedback from the parser to the lexer. Perl does and its grammar is both ambiguous and undecidable as a result. Note that implementations of Perl exist, which in this case simply means that the documented Perl "spec" is not sound or faithful -- all implementations are in fact taking shortcuts not reflected in the spec.

Waldemar

# Brendan Eich (8 years ago)

On May 27, 2011, at 12:27 PM, Waldemar Horwat wrote:

Peter Hallam kindly offered to help come up with a new grammar formalism for the spec that can pass the "Waldemar test" (if that is possible; not as hard as the Turing test). IIRC Peter said he was (had, would) adding arrow support per the strawman to Traceur (code.google.com/p/traceur-compiler). We talked about Narcissus support too, to get more user testing.

If we need to come up with a new formalism, that's a very powerful signal that there's something seriously flawed in the design.

Or the spec.

LR(1) is good, I like it, but all the browser JS implementations, and Rhino, use top-down hand-crafted parsers, even though JS is not LL(1). That is a big disconnect between spec and reality.

As you've shown these can look good but be future hostile or downright buggy, so we need a formalism that permits mechanical checking for ambiguities. We don't want two ways to parse a sentence in the language.

But this does not mean we must stick with LR(1).

Even if it happens to work now, it will produce surprises down the road as we try to extend the expression or parameter grammar. The places where the grammar is not LR(1) up in C++ are some of the most frustrating and surprising ones for users to deal with, and C++ does not even have the feedback from the parser to the lexer. Perl does and its grammar is both ambiguous and undecidable as a result. Note that implementations of Perl exist, which in this case simply means that the documented Perl "spec" is not sound or faithful -- all implementations are in fact taking shortcuts not reflected in the spec.

The problem is we are already cheating.

AssignmentExpression :
    ConditionalExpression
    LeftHandSideExpression = AssignmentExpression
    LeftHandSideExpression AssignmentOperator AssignmentExpression

This produces expressions such as 42 = foo(), which must be handled by semantic specification. Why can't we have a more precise grammar?

Building on this, destructuring assignment parses more of what was formerly rejected by semantic checking: {p: q} = o destructures o.p into q (which must be declared in Harmony -- it is an error if no such q was declared in scope).

We can certainly write semantic rules for destructuring to validate the object literal as an object pattern; ditto arrays. But the LR(1) grammar does not by itself specify the valid sentences in the language, just as it did not all these years for assignment expressions.
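[Editorial illustration, not part of the original message.] The split between what the grammar accepts and what the semantic rules reject can be probed in a current engine. `parses` is a helper written for this note; it reports whether `new Function` accepts a source string, and it assumes an engine that reports invalid assignment targets before running any code:

```javascript
// True if the engine accepts the source text as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

const assignmentChecks = {
  // Fits the production and passes the extra checks.
  plainAssignment: parses("x = foo()"),
  // Fits LeftHandSideExpression = AssignmentExpression, rejected by a later rule.
  literalTarget: parses("42 = foo()"),
  // An object literal on the left, revalidated as an object pattern.
  pattern: parses("({ p: q } = o)"),
  // The same shape, but 42 is not a valid target inside the pattern.
  badPattern: parses("({ p: 42 } = o)"),
};

// The accepted pattern form at work: copies o.p into the declared variable q.
const o = { p: 1 };
let q;
({ p: q } = o);
```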

Now, for arrow functions (you already know this, just reciting for the es-discuss list) we could parse the ArrowFormalParameters : Expression and then write semantics to validate that comma expression as arrow function formal parameters.

Right now, the expression grammar and the formal parameter list grammar are "close". They have already diverged in Harmony due to rest and spread not being lookalikes: spread (harmony:spread) allows ... AssignmentExpression while rest wants only ... Identifier.

But we still can cope: the Expression grammar is a cover grammar for FormalParameterList.
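A rough sketch of the cover-grammar idea for arrow parameters: parse what sits in the parentheses as an ordinary comma expression, then reinterpret the result as a parameter list or reject it. The AST shapes (`Sequence`, `Spread`, `Identifier`) and the name `toArrowParams` are assumptions made for illustration. The last-position requirement for rest is assumed from the rest-parameters proposal.

```javascript
// Hypothetical AST shapes: a parenthesized comma expression is
// { type: "Sequence", expressions: [...] } and a "..." element is
// { type: "Spread", argument: <expression> }.
function toArrowParams(expr) {
  const items = expr.type === "Sequence" ? expr.expressions : [expr];
  return items.map(function (item, i) {
    if (item.type === "Identifier") {
      return { kind: "param", name: item.name };
    }
    // Spread accepts any AssignmentExpression, but rest accepts only an
    // Identifier, and (assumed here) only in last position.
    if (item.type === "Spread" &&
        item.argument.type === "Identifier" &&
        i === items.length - 1) {
      return { kind: "rest", name: item.argument.name };
    }
    throw new SyntaxError("Not a valid formal parameter: " + item.type);
  });
}

// (a, ...rest) reinterpreted as parameters
const params = toArrowParams({
  type: "Sequence",
  expressions: [
    { type: "Identifier", name: "a" },
    { type: "Spread", argument: { type: "Identifier", name: "rest" } },
  ],
});
console.log(params.map(function (p) { return p.kind + ":" + p.name; }));
```

Anything the expression grammar accepts but the parameter grammar does not, such as `(a, b + 1)`, is rejected in the second step.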

Of course, the two sub-grammars may diverge in a way we can't parse via parsing a comma expression within the parentheses that come before the arrow. Guards seem like they will cause the parameter syntax to diverge, unless you can use them in expressions (not in the strawman).

The conclusion I draw from these challenges, some already dealt with non-grammatically by ES1-5, is that we should not make a sacred cow out of LR(1). We should be open to a formalism that is as checkable for ambiguities, and that can cope with the C heritage we already have (assignment expressions), as well as new syntax.

# Waldemar Horwat (8 years ago)

On 05/27/11 16:00, Brendan Eich wrote:

On May 27, 2011, at 12:27 PM, Waldemar Horwat wrote:

Peter Hallam kindly offered to help come up with a new grammar formalism for the spec that can pass the "Waldemar test" (if that is possible; not as hard as the Turing test). IIRC Peter said he was (had, would) adding arrow support per the strawman to Traceur (code.google.com/p/traceur-compiler). We talked about Narcissus support too, to get more user testing.

If we need to come up with a new formalism, that's a very powerful signal that there's something seriously flawed in the design.

Or the spec.

LR(1) is good, I like it, but all the browser JS implementations, and Rhino, use top-down hand-crafted parsers, even though JS is not LL(1). That is a big disconnect between spec and reality.

As you've shown these can look good but be future hostile or downright buggy, so we need a formalism that permits mechanical checking for ambiguities. We don't want two ways to parse a sentence in the language.

But this does not mean we must stick with LR(1).

Even if it happens to work now, it will produce surprises down the road as we try to extend the expression or parameter grammar. The places where the grammar is not LR(1) up in C++ are some of the most frustrating and surprising ones for users to deal with, and C++ does not even have the feedback from the parser to the lexer. Perl does and its grammar is both ambiguous and undecidable as a result. Note that implementations of Perl exist, which in this case simply means that the documented Perl "spec" is not sound or faithful -- all implementations are in fact taking shortcuts not reflected in the spec.

The problem is we are already cheating.

/AssignmentExpression/ :
    /ConditionalExpression/
    /LeftHandSideExpression/ = /AssignmentExpression/
    /LeftHandSideExpression/ /AssignmentOperator/ /AssignmentExpression/

This produces expressions such as 42 = foo(), which must be handled by semantic specification. Why can't we have a more precise grammar?

This is an entirely different issue. The LeftHandSideExpression is still evaluated as an expression; you just don't call GetValue on it. We chose to prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(), but neither situation has much to do with the grammar.

Building on this, destructuring assignment parses more of what was formerly rejected by semantic checking: {p: q} = o destructures o.p into q (which must be declared in Harmony -- it is an error if no such q was declared in scope).

We can certainly write semantic rules for destructuring to validate the object literal as an object pattern; ditto arrays. But then the LR(1) grammar is not by itself specifying the valid sentences in the language, just as it has not all these years for assignment expressions.

Now, for arrow functions (you already know this, just reciting for the es-discuss list) we could parse the /ArrowFormalParameters/ : /Expression/ and then write semantics to validate that comma expression as arrow function formal parameters.

Right now, the expression grammar and the formal parameter list grammar are "close". They have already diverged in Harmony due to rest and spread not being lookalikes: spread (harmony:spread) allows ... /AssignmentExpression/ while rest wants only ... /Identifier/.

But we still can cope: the /Expression/ grammar is a cover grammar for /FormalParameterList/.

Of course, the two sub-grammars may diverge in a way we can't parse via parsing a comma expression within the parentheses that come before the arrow. Guards seem like they will cause the parameter syntax to diverge, unless you can use them in expressions (not in the strawman).

The conclusion I draw from these challenges, some already dealt with non-grammatically by ES1-5, is that we should not make a sacred cow out of LR(1). We should be open to a formalism that is as checkable for ambiguities, and that can cope with the C heritage we already have (assignment expressions), as well as new syntax.

Given that LR(1) is the most general grammar available before you start getting into serious complexity (it subsumes LALR and other commonly studied grammars), there is a big cliff here and I think it's foolish to plan to jump off it without completely understanding the consequences. This is especially true because there are other paths available for compact function syntax that do not involve jumping off that cliff. I realize that C++ and Perl put up with ambiguity, and it seriously bites them. Quick, what's the difference between the following in C++?

int x(int()); int x(-int());

 Waldemar

# Sam Tobin-Hochstadt (8 years ago)

On Fri, May 27, 2011 at 9:20 PM, Waldemar Horwat <waldemar at google.com> wrote:

Given that LR(1) is the most general grammar available before you start getting into serious complexity (it subsumes LALR and other commonly studied grammars),

This is simply begging the question on what is "serious complexity". There are plenty of parsers for widely-used programming languages (other than Perl and C++) which use lookahead greater than 1, for example.

# Brendan Eich (8 years ago)

On May 27, 2011, at 6:20 PM, Waldemar Horwat <waldemar at google.com> wrote:

This produces expressions such as 42 = foo(), which must be handled by semantic specification. Why can't we have a more precise grammar?

This is an entirely different issue. The LeftHandSideExpression is still evaluated as an expression; you just don't call GetValue on it. We chose to prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(), but neither situation has much to do with the grammar.

That dodges the big "cover grammar" vs. Precise Grammar issue before us. It assumes the conclusion that semantics such as Reference internal types are the way to go, because LR(1) can't hack it.

Given that LR(1) is the most general grammar available before you start getting into serious complexity (it subsumes LALR and other commonly studied grammars),

Again, all real JS engines use top-down parsers, including v8.

there is a big cliff here and I think it's foolish to plan to jump off it without completely understanding the consequences.

Agreed. Let's understand them, or use Expression as a cover grammar -- just like LeftHandSideExpression (which produces NewExpression, a real jswtf).

This is especially true because there are other paths available for compact function syntax that do not involve jumping off that cliff.

Such as?

I realize that C++ and Perl put up with ambiguity, and it seriously bites them. Quick, what's the difference between the following in C++?

int x(int()); int x(-int());

Yet C++ is wildly successful inside Google and outside. Trade-offs...

# Brendan Eich (8 years ago)

On May 27, 2011, at 6:42 PM, Brendan Eich wrote:

On May 27, 2011, at 6:20 PM, Waldemar Horwat <waldemar at google.com> wrote:

This produces expressions such as 42 = foo(), which must be handled by semantic specification. Why can't we have a more precise grammar?

This is an entirely different issue. The LeftHandSideExpression is still evaluated as an expression; you just don't call GetValue on it. We chose to prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(), but neither situation has much to do with the grammar.

That dodges the big "cover grammar" vs. Precise Grammar issue before us. It assumes the conclusion that semantics such as Reference internal types are the way to go, because LR(1) can't hack it.

Again, real browser JS engines use top-down parsing and no Reference type, instead specializing the AST for LeftHandSideExpressions to be precise, with early errors per ES5 (and even in ES3 era) for nonsense-LHS-expressions.

This is not an argument to remove Reference types from the spec. Maybe we should if we get a better parsing algorithm and grammar. There is a web of trade-offs.

I realize that C++ and Perl put up with ambiguity, and it seriously bites them. Quick, what's the difference between the following in C++?

int x(int()); int x(-int());

Yet C++ is wildly successful inside Google and outside. Trade-offs...

I hasten to add that I'm not endorsing the undecidable crazyland of C++ syntax. Simply pointing to trade-offs taken differently in other languages that are surely successful in spite of lack of precise-enough LR(1) (cover) grammars.

# Jorge (8 years ago)

On 27/05/2011, at 12:24, Brendan Eich wrote:

On May 27, 2011, at 2:36 AM, Jorge wrote:

Also, I wonder if in order to make blocks first class, do we need any new syntax ?

function f() { a.forEach({ return 3 }); }

The problem is that a block statement is ambiguous with an object initialiser. See strawman:arrow_function_syntax#grammar_changes in particular the "To enable unparenthesized ObjectLiteral expressions as bodies of arrow functions, without ambiguity with Block bodies, restrict LabelledStatement as follows..." section.

As labels are seldom used in JS, perhaps the easiest way to avoid ambiguities would be to forbid blocks from beginning with a label ?

Would it be too much (or too ugly) to require the programmer to disambiguate (only) in this corner case ?

An object: { label: expression }

An error: { label: return 5 }

A block: { ; label: expression }

A block: { noLabelHere ... }
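The four cases above can be read as a decision taken on the first tokens after the opening brace. The following is only a sketch of that rule, assuming a hypothetical, greatly simplified token stream of plain strings; `classifyBrace` is an invented name.

```javascript
// Tokens are plain strings here: a hypothetical, simplified lexer output.
// tokens[0] is always "{".
function classifyBrace(tokens) {
  const isIdent = function (t) {
    return typeof t === "string" && /^[A-Za-z_$][\w$]*$/.test(t);
  };
  // "{ name : ..." commits to an object initialiser, so a block may no
  // longer begin with a label.
  if (isIdent(tokens[1]) && tokens[2] === ":") {
    return "object";
  }
  return "block";
}

console.log(classifyBrace(["{", "label", ":", "expression", "}"]));      // "object"
console.log(classifyBrace(["{", ";", "label", ":", "expression", "}"])); // "block"
console.log(classifyBrace(["{", "return", "3", "}"]));                   // "block"
```

Under this rule `{ label: return 5 }` is classified as an object, and the error arises afterwards because `return 5` is not an expression.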

# Brendan Eich (8 years ago)

We didn't talk about this change. It is yet another migration early-error to consider. It's certainly simpler than a more powerful parsing algorithm than LR(1), but we might want to cross that bridge anyway for arrow functions. If we succeed there, we may not need such an incompatible restriction on labels.

# Peter Michaux (8 years ago)

On Thu, May 26, 2011 at 4:22 PM, Waldemar <waldemar at google.com> wrote:

Versioning: <script type="application/ecmascript;version=6"> (RFC 4329) use version 6; module LOL { ... } </script> There are good reasons to have the metadata both externally (in the script tag) and internally (in the script). External versioning allows implementations to avoid fetching a script at all if they won't understand it. Internal versioning helps in the case where the external version is detached.

Brendan's idea: <script-if type=...> ... <script-elif type=...> ... <script-else> ... </script>

Consensus on moving some form of versioning into Harmony. The strawman is a bit light at this time, so no specifics yet.

A lot of the above looks like HTML. Isn't versioning that depends on HTML out of scope for the ECMAScript standard?

Peter

# Brendan Eich (8 years ago)

On May 28, 2011, at 11:32 AM, Peter Michaux wrote:

On Thu, May 26, 2011 at 4:22 PM, Waldemar <waldemar at google.com> wrote:

Versioning: <script type="application/ecmascript;version=6"> (RFC 4329) use version 6; module LOL { ... } </script> There are good reasons to have the metadata both externally (in the script tag) and internally (in the script). External versioning allows implementations to avoid fetching a script at all if they won't understand it. Internal versioning helps in the case where the external version is detached.

Brendan's idea: <script-if type=...> ... <script-elif type=...> ... <script-else> ... </script>

www.mail-archive.com/[email protected]/msg05005.html had the example I was trying to reconstruct from memory at last week's meeting:

<script-if type="application/ecmascript;version=6">
  // new.js inline-expanded here
</script-if else>
  <script ...>
  </script>
</script-if end>

Consensus on moving some form of versioning into Harmony. The strawman is a bit light at this time, so no specifics yet.

A lot of the above looks like HTML. Isn't versioning that depends on HTML out of scope for the ECMAScript standard?

Yes, so? Call the jusdiction police :-P. We were talking about a "systems" problem, which requires looking across layers and considering the big picture.

At the meeting, Mark Miller suggested we take the idea of <script-if> to the public-script-coord mailing list. I'll do that next week.

# Brendan Eich (8 years ago)

On May 28, 2011, at 12:57 PM, Brendan Eich wrote:

On May 28, 2011, at 11:32 AM, Peter Michaux wrote:

A lot of the above looks like HTML. Isn't versioning that depends on HTML out of scope for the ECMAScript standard?

Yes, so? Call the jusdiction police :-P.

Er, "jurisdiction".

The versioning issue

# Thaddee Tyl (8 years ago)

Date: Sat, 28 May 2011 12:57:04 -0700 From: Brendan Eich <brendan at mozilla.com> www.mail-archive.com/[email protected]/msg05005.html had the example I was trying to reconstruct from memory at last week's meeting:

<script-if type="application/ecmascript;version=6">
  // new.js inline-expanded here
</script-if else>
  <script ...>
  </script>
</script-if end>

Consensus on moving some form of versioning into Harmony.  The strawman is a bit light at this time, so no specifics yet.

A lot of the above looks like HTML. Isn't versioning that depends on HTML out of scope for the ECMAScript standard?

Yes, so? Call the jusdiction police :-P. We were talking about a "systems" problem, which requires looking across layers and considering the big picture.

At the meeting, Mark Miller suggested we take the idea of <script-if> to the public-script-coord mailing list. I'll do that next week.

It really needs to be discussed by the whatwg however.

# Brendan Eich (8 years ago)

On May 29, 2011, at 6:45 AM, Thaddee Tyl wrote:

Consensus on moving some form of versioning into Harmony. The strawman is a bit light at this time, so no specifics yet.

A lot of the above looks like HTML. Isn't versioning that depends on HTML out of scope for the ECMAScript standard?

Yes, so? Call the jusdiction police :-P. We were talking about a "systems" problem, which requires looking across layers and considering the big picture.

At the meeting, Mark Miller suggested we take the idea of <script-if> to the public-script-coord mailing list. I'll do that next week.

It really needs to be discussed by the whatwg however.

Oh really? Why exactly is that? Note that I'm a founder of whatwg.org.

The plan of record that I cited and plan to use regarding this kind of JS/HTML/DOM cross-cutting concern is to mail to public-script-coord at w3.org. That ought to be enough to start engaging the several interested groups.

# Thaddee Tyl (8 years ago)

Don't be upset!

I just believe that new HTML syntax would be better off in the HTML living standard. More people read it, more people contribute to correcting its bugs. Getting involved in it can only be beneficial.

# Brendan Eich (8 years ago)

On May 29, 2011, at 12:55 PM, Thaddee Tyl wrote:

Don't be upset!

Not at all, I'm simply skeptical (and saucy in saying so) about jurisdictional fights this early in thinking creatively about cross-cutting solutions. Seems kind of silly to call process police, doesn't it?

I just believe that new HTML syntax would be better off in the HTML living standard. More people read it, more people contribute to correcting its bugs. Getting involved in it can only be beneficial.

Yes, I agree, and public-script-coord at w3.org is read by all the best HTML5/HTML/W3C/WHATWG people, whatever their w3.org, whatwg.org, or other affiliations.

# Thaddee Tyl (8 years ago)

On Sun, May 29, 2011 at 10:00 PM, Brendan Eich <brendan at mozilla.com> wrote:

On May 29, 2011, at 12:55 PM, Thaddee Tyl wrote:

Don't be upset!

Not at all, I'm simply skeptical (and saucy in saying so) about jurisdictional fights this early in thinking creatively about cross-cutting solutions. Seems kind of silly to call process police, doesn't it?

No fight intended.

I just believe that new HTML syntax would be better off in the HTML living standard. More people read it, more people contribute to correcting its bugs. Getting involved in it can only be beneficial.

Yes, I agree, and public-script-coord at w3.org is read by all the best HTML5/HTML/W3C/WHATWG people, whatever their w3.org, whatwg.org, or other affiliations.

Does it mean that I should not discuss this here?

If not, I believe that, given the fact that browsers will implement ES.next incrementally, we should find a way to allow graceful fallback, rather than version-driven conditionals.

The fact that Harmony features are already batched up makes this easy; maybe we can use a different use-pragma that already defined. Something like an object literal. Object literals are so cool.

 var features = Object.features || {};
 features.es6 = features.es6 || {};
 ...
 if (features.es6.proxies) {
   Object.createHandled = function(proto, objDesc, noSuchMethod) {
     var handler = {
       get: function(rcvr, p) {
         return function() {
           var args = [].slice.call(arguments, 0);
           return noSuchMethod.call(this, p, args);
         };
       }
     };
     var p = Proxy.create(handler, proto);
     return Object.create(p, objDesc);
   };
 } else {
   Object.createHandled = function(proto, objDesc, noSuchMethod) {
     var p = Object.create(proto, objDesc);
     p.noSuchMethod = noSuchMethod;
     return p;
   };
 }

# Brendan Eich (8 years ago)

On May 29, 2011, at 2:58 PM, Thaddee Tyl wrote:

... I believe that, given the fact that browsers will implement ES.next incrementally, we should find a way to allow graceful fallback, rather than version-driven conditionals.

This is trivial for new global object properties, but such additions in the past, even including JSON, have been breaking changes for some web sites. With the module system in ES.next we are probably instead going to add standard modules named by anti-URI module resource locators such as "@name", and thereby avoid polluting the global object, or even the Object constructor.

But say we do add, e.g. Object.createPrivateName(name, visible). That will be detectable using good old "object detection" -- no need for a "features" object as you show.
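For readers unfamiliar with the term, "object detection" means probing for the API itself rather than consulting a separate table of features. A minimal sketch follows; `hasMethod` is an invented helper, `Object.keys` is a real ES5 API, and `Object.createPrivateName` is only the hypothetical name used in this discussion.

```javascript
// Object detection: ask the object whether it has the method, instead of
// looking the feature up in a separate "features" table.
function hasMethod(obj, name) {
  return obj != null && typeof obj[name] === "function";
}

console.log(hasMethod(Object, "keys")); // true in any ES5 engine

// Hypothetical API from the discussion; guard before use.
if (hasMethod(Object, "createPrivateName")) {
  console.log("private names available");
} else {
  console.log("fall back to ordinary string-keyed properties");
}
```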

The fact that Harmony features are already batched up makes this easy; maybe we can use a different use-pragma that already defined. Something like an object literal. Object literals are so cool.

if (features.es6.proxies) {

How is this better or different from any version-driven conditional? It's not, at least not up until the second dot.

Authors will want to object-detect APIs expressed as built-in objects, but other features including new syntax cannot be handled easily or at all that way, and many authors will want the whole pie.

# Thaddee Tyl (8 years ago)

On Mon, May 30, 2011 at 6:55 AM, Brendan Eich <brendan at mozilla.com> wrote:

On May 29, 2011, at 2:58 PM, Thaddee Tyl wrote:

... I believe that, given the fact that browsers will implement ES.next incrementally, we should find a way to allow graceful fallback, rather than version-driven conditionals.

This is trivial for new global object properties, but such additions in the past, even including JSON, have been breaking changes for some web sites.

They have coped with that by using polyfills such as:

if (!this.JSON) { this.JSON = {}; ... }

With the module system in ES.next we are probably instead going to add standard modules named by anti-URI module resource locators such as "@name", and thereby avoid polluting the global object, or even the Object constructor. But say we do add, e.g. Object.createPrivateName(name, visible). That will be detectable using good old "object detection" -- no need for a "features" object as you show.

The fact that Harmony features are already batched up makes this easy; maybe we can use a different use-pragma that already defined. Something like an object literal. Object literals are so cool. if (features.es6.proxies) {

How is this better or different from any version-driven conditional? It's not, at least not up until the second dot.

Authors will want to object-detect APIs expressed as built-in objects, but other features including new syntax cannot be handled easily or at all that way, and many authors will want the whole pie.

/be

In the current Harmony proposals, there are as many "syntax breakers" (with which current js parsers fire a syntax error) as there are "soft breakers" (which allow graceful degradation). Syntax breakers include things like "let", "const", destructuring, parameter default values, rest parameters, spread, modules and generators. Those are the syntax errors. Soft breakers include proxies, weak maps, egal, proper tail calls, binary data, Number, String, Function and Math improvements, typeof null and completion reform. These, on the other hand, can be feature-detected.
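The split between the two groups can be illustrated with a sketch. Soft breakers are visible as ordinary properties, while syntax breakers can only be probed by asking the engine to compile a string, a workaround with real limits: it reveals whether the text parses and nothing more, and the new-syntax code still has to be delivered separately. The helper name `canParse` and the probe strings are assumptions for illustration.

```javascript
// Returns true if the engine can parse `source` as a function body.
// Nothing is executed: the Function constructor only compiles the text.
// Caveat: environments that forbid runtime code generation (for example
// a strict Content Security Policy) make this report false for every input.
function canParse(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

const support = {
  // Syntax breakers: must be probed by compiling a string.
  letDecl: canParse("let x = 1;"),
  destructuring: canParse("var {p: q} = {p: 1};"),
  restParams: canParse("return function (a, ...rest) {};"),
  // Soft breaker: plain property check.
  weakMap: typeof WeakMap === "function",
};

console.log(support);
```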

For most of the soft breakers, people will definitely want to check for implementation backing. For things like proper tail calls, feature detection can mean performance tuning; for typeof null it means that we can make sure that, in all cases, our code won't break.

For the syntax breakers, however, browsers will, indeed, have to implement all those features in one shot. They cannot do it incrementally, or else ES.next-only code may very well break.

What do you think? Should we force browser vendors to implement soft-breakers in one shot too, even though we could let them do it one feature at a time via feature detection?

# Brendan Eich (8 years ago)

On May 30, 2011, at 1:54 AM, Thaddee Tyl wrote:

On Mon, May 30, 2011 at 6:55 AM, Brendan Eich <brendan at mozilla.com> wrote:

On May 29, 2011, at 2:58 PM, Thaddee Tyl wrote:

... I believe that, given the fact that browsers will implement ES.next incrementally, we should find a way to allow graceful fallback, rather than version-driven conditionals.

This is trivial for new global object properties, but such additions in the past, even including JSON, have been breaking changes for some web sites.

They have coped with that by using polyfills such as:

if (!this.JSON) { this.JSON = {}; ... }

That's not what I was referring to. The problem was a few years ago some Facebook JS defined a JSON object with Encode and Decode methods (if memory serves), no stringify or parse. It also detected this.JSON but assumed that property value being truthy meant that its own non-ES5 Encode and Decode methods were present.

Polyfills are great but they arise a posteriori. The hard case is when someone in 2006 defined their own JSON, and it did not match what ES5 would standardize years in the future. Same thing that happened with JSON could happen with the new global and Object properties in ES.next.
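A sketch of the failure mode and a more defensive check: truthiness of a global says nothing about its shape, so detection can test the specific methods it is about to call. The stand-in object below is only loosely modelled on the Encode/Decode case recalled above, and its details are assumed.

```javascript
// Check the methods that will actually be called, not mere truthiness.
function looksLikeES5JSON(candidate) {
  return candidate != null &&
    typeof candidate.parse === "function" &&
    typeof candidate.stringify === "function";
}

// Stand-in for a page-defined object with a different API (details assumed).
const homegrownJSON = { Encode: function () {}, Decode: function () {} };

console.log(Boolean(homegrownJSON));          // true: the naive test passes
console.log(looksLikeES5JSON(homegrownJSON)); // false
console.log(looksLikeES5JSON(JSON));          // true
```

This helps code written after the standard exists; it does nothing for the 2006 page that could not know which names a future edition would claim.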

IIRC the same thing happened with Object.keys, so this is not just about global properties. It's a risk in the pre-module world. With Harmony modules, the requiring code names the module and composes names using lexical scope.

For most of the soft breakers, people will definitely want to check for implementation backing.

It's hard to disagree with "people will definitely". Some surely will.

For things like proper tail calls, feature detection can mean performance tuning;

This is unlikely to be autoconf-tested. Lack of it means a stack overflow exception, which may take a while. I think this is an extremely unlikely thing for developers to want to foist on users, or get away with if they try.

for typeof null it means that we can make sure that, in all cases, our code won't break.

If you mean typeof code can be written to work in both old and new versions, yes -- we covered that after the January meeting, IIRC.
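As a sketch of code that works in both old and new versions: exclude null by explicit comparison rather than by relying on what typeof reports for it. The function name `isObject` is illustrative.

```javascript
// Gives the same answer whether typeof null is "object" (ES5) or "null"
// (the Harmony proposal under discussion), because null is excluded by
// an explicit comparison before typeof is consulted.
function isObject(value) {
  return value !== null &&
    (typeof value === "object" || typeof value === "function");
}

console.log(isObject(null));            // false
console.log(isObject({}));              // true
console.log(isObject(function () {}));  // true
console.log(isObject(42));              // false
```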

For the syntax breakers, however, browsers will, indeed, have to implement all those features in one shot. They cannot do it incrementally, or else ES.next-only code may very well break.

This is a bit overstated. All open-source browsers implement piecewise in their nightly build channels or repositories. Even IE in its platform previews since 9 and on into 10 has produced some new pieces of whole standards for testing. This is a good thing.

In final releases, we'd like all or nothing, but there's no "cannot" -- if you mean "must not", nothing stops a repeat of IE9, which had ES5 support except for strict mode.

What do you think? Should we force browser vendors to implement soft-breakers in one shot too, even though we could let them do it one feature at a time via feature detection?

What makes you think "we" can "force" anyone to do otherwise?

Software doesn't come together in one perfect big-bang integration. Neither do standards. Indeed we need both incremental standards drafting and prototype implementation of drafts in order to produce a better standard. ES.next and even strawmen for Harmony editions after it should be prototyped and tested in nightly builds, platform previews, and the like.

The coordination among vendors is a good topic for TC39 when we get further along, but it's really not productive to try to "force" anything yet. And it would be counterproductive to punish vendors that prototype-implement strawmen in order to assess implementability, usability, and other qualities we want in the next edition of the standard.

# Waldemar Horwat (8 years ago)

On 05/27/11 19:36, Brendan Eich wrote:

On May 27, 2011, at 6:42 PM, Brendan Eich wrote:

On May 27, 2011, at 6:20 PM, Waldemar Horwat <waldemar at google.com> wrote:

This produces expressions such as 42 = foo(), which must be handled by semantic specification. Why can't we have a more precise grammar?

This is an entirely different issue. The LeftHandSideExpression is still evaluated as an expression; you just don't call GetValue on it. We chose to prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(), but neither situation has much to do with the grammar.

That dodges the big "cover grammar" vs. Precise Grammar issue before us. It assumes the conclusion that semantics such as Reference internal types are the way to go, because LR(1) can't hack it.

Again, real browser JS engines use top-down parsing and no Reference type, instead specializing the AST for LeftHandSideExpressions to be precise, with early errors per ES5 (and even in ES3 era) for nonsense-LHS-expressions.

Top-down LL parsers are subsumed by LR parsers. The hierarchy is:

LL(0) < LL(1) < LR(1). LR(0) < SLR(1) < LALR(1) < LR(1).

I realize that C++ and Perl put up with ambiguity, and it seriously bites them. Quick, what's the difference between the following in C++?

int x(int()); int x(-int());

Yet C++ is wildly successful inside Google and outside. Trade-offs...

I hasten to add that I'm not endorsing the undecidable crazyland of C++ syntax. Simply pointing to trade-offs taken differently in other languages that are surely successful in spite of lack of precise-enough LR(1) (cover) grammars.

The lack of a solid grammar means that a decade later C++ compilers are still fixing grammar bugs from the original standard. Here's a fun one that gcc is just getting around now to fixing:

 template <class T> struct A {
     template <class T2> A(T2);
     static A x;
 };
 template<> A<int>::A<int>(A<int>::x);

(Is that last line actually a definition of x or a declaration of the constructor? See www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#147 )

 Waldemar

# Brendan Eich (8 years ago)

On May 31, 2011, at 1:21 PM, Waldemar Horwat wrote:

On 05/27/11 19:36, Brendan Eich wrote:

On May 27, 2011, at 6:42 PM, Brendan Eich wrote:

On May 27, 2011, at 6:20 PM, Waldemar Horwat <waldemar at google.com> wrote:

This produces expressions such as 42 = foo(), which must be handled by semantic specification. Why can't we have a more precise grammar?

This is an entirely different issue. The LeftHandSideExpression is still evaluated as an expression; you just don't call GetValue on it. We chose to prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(), but neither situation has much to do with the grammar.

That dodges the big "cover grammar" vs. Precise Grammar issue before us. It assumes the conclusion that semantics such as Reference internal types are the way to go, because LR(1) can't hack it.

Again, real browser JS engines use top-down parsing and no Reference type, instead specializing the AST for LeftHandSideExpressions to be precise, with early errors per ES5 (and even in ES3 era) for nonsense-LHS-expressions.

Top-down LL parsers are subsumed by LR parsers. The hierarchy is:

LL(0) < LL(1) < LR(1). LR(0) < SLR(1) < LALR(1) < LR(1).

Yes, I know -- but that is not in practice so helpful to implementors, since they have to (at least) tediously refactor to remove left recursion. The reason to use LR(1) in the spec is not to help implementors, as far as I can tell.
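To illustrate the refactoring burden, here is a sketch of how a left-recursive rule gets rewritten for a top-down parser. The toy token format (numbers and operator strings) and the name `parseAdditive` are assumptions; real engines are far more involved.

```javascript
// Left-recursive rule, as an LR grammar would state it:
//   Additive : Additive "+" Primary | Additive "-" Primary | Primary
// A top-down parser cannot begin parseAdditive by calling parseAdditive,
// so the rule is refactored by hand into: Primary (("+" | "-") Primary)*
function parseAdditive(tokens) {
  let pos = 0;

  function parsePrimary() {
    const tok = tokens[pos++];
    if (typeof tok !== "number") {
      throw new SyntaxError("Expected a number, got " + tok);
    }
    return tok;
  }

  let left = parsePrimary();
  while (tokens[pos] === "+" || tokens[pos] === "-") {
    const op = tokens[pos++];
    const right = parsePrimary();
    // Folding into `left` preserves the left associativity that the
    // original left-recursive rule expressed directly.
    left = { op: op, left: left, right: right };
  }
  if (pos !== tokens.length) {
    throw new SyntaxError("Unexpected token " + tokens[pos]);
  }
  return left;
}

// Builds (1 - 2) - 3, not 1 - (2 - 3).
const tree = parseAdditive([1, "-", 2, "-", 3]);
console.log(JSON.stringify(tree));
```

The implementor has to repeat this transformation for every left-recursive production and convince themselves the result still accepts the same language as the spec grammar.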

The reason to use LR(1) that you have cited is to have a validated-unambiguous grammar using a well-known formalism. It's a good reason, but it applies to other approaches.

Is there anything particularly compelling about LR(1) vs. say PEG (which is unambiguous by construction) -- linear time, memory proportional to PDA depth, or other? I have my own thoughts but I'd be interested in yours.

# Waldemar Horwat (8 years ago)

On 05/31/11 13:34, Brendan Eich wrote:

On May 31, 2011, at 1:21 PM, Waldemar Horwat wrote:

On 05/27/11 19:36, Brendan Eich wrote:

On May 27, 2011, at 6:42 PM, Brendan Eich wrote:

On May 27, 2011, at 6:20 PM, Waldemar Horwat <waldemar at google.com> wrote:

This produces expressions such as 42 = foo(), which must be handled by semantic specification. Why can't we have a more precise grammar?

This is an entirely different issue. The LeftHandSideExpression is still evaluated as an expression; you just don't call GetValue on it. We chose to prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(), but neither situation has much to do with the grammar.

That dodges the big "cover grammar" vs. Precise Grammar issue before us. It assumes the conclusion that semantics such as Reference internal types are the way to go, because LR(1) can't hack it.

Again, real browser JS engines use top-down parsing and no Reference type, instead specializing the AST for LeftHandSideExpressions to be precise, with early errors per ES5 (and even in ES3 era) for nonsense-LHS-expressions.

Top-down LL parsers are subsumed by LR parsers. The hierarchy is:

LL(0) < LL(1) < LR(1). LR(0) < SLR(1) < LALR(1) < LR(1).

Yes, I know -- but that is not in practice so helpful to implementors, since they have to (at least) tediously refactor to remove left recursion. The reason to use LR(1) in the spec is not to help implementors, as far as I can tell.

The reason to use LR(1) that you have cited is to have a validated-unambiguous grammar using a well-known formalism. It's a good reason, but it applies to other approaches.

Is there anything particularly compelling about LR(1) vs. say PEG (which is unambiguous by construction) -- linear time, memory proportional to PDA depth, or other? I have my own thoughts but I'd be interested in yours.

Yes. I would not want to use anything like a PEG to standardize a grammar. Here's why:

PEG being unambiguous by construction simply means that it resolves all ambiguities by picking the earliest rule. This turns all rules following the first one into negative rules: X matches Z only if it DOESN'T match a Y or a Q or a B or .... You could pick the same strategy to disambiguate an LR(1) grammar, and it would be equally bad.
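A toy illustration of that point, using hand-rolled PEG-style combinators (the helper names `lit`, `choice`, and `matchesAll` are invented for this sketch and do not come from any parser library): with ordered choice, swapping two alternatives silently changes which inputs are accepted.

```javascript
// A parser takes (input, pos) and returns the new position, or -1 on failure.
function lit(text) {
  return function (input, pos) {
    return input.startsWith(text, pos) ? pos + text.length : -1;
  };
}

// Ordered choice: the first alternative that matches wins. Later ones are
// never tried, so each alternative implicitly means "and none of the
// earlier ones matched".
function choice(...alternatives) {
  return function (input, pos) {
    for (const alt of alternatives) {
      const end = alt(input, pos);
      if (end !== -1) return end;
    }
    return -1;
  };
}

function matchesAll(parser, input) {
  return parser(input, 0) === input.length;
}

const aThenAb = choice(lit("a"), lit("ab")); // "ab" is shadowed by "a"
const abThenA = choice(lit("ab"), lit("a"));

console.log(matchesAll(aThenAb, "ab")); // false
console.log(matchesAll(abThenA, "ab")); // true
```

Neither grammar reports a conflict; the rule order alone decides the outcome.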

Negative rules are the bane of grammars and behind the majority of the problems with the C++ grammar, including the examples I listed earlier. They make a grammar non-understandable because the order of the rules is subtly significant and makes it hard to reason about when an X matches a Z; a language extension might expand the definition of Y to make an X no longer match a Q, and you wouldn't know it just by looking at a grammar with negative rules. In a positive-rule-only grammar you'd discover the problem right away because the grammar wouldn't compile.

Negative rules also interact badly with both semicolon insertion and division-vs-regexp lexer disambiguation. One might naively think that semicolon insertion would be an ideal match for negative rules: You first try to parse

tokens-on-line1
tokens-on-line2

as a single statement and, only if that fails, you move on to parsing it as two statements with a virtual semicolon between them. That, however, doesn't work. Here's a simple counterexample:

a + b
(c) = d

Negative rules would insert a virtual semicolon here because

a + b(c) = d

is not a valid parse. However, the correct ECMAScript behavior is not to insert a semicolon.
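The counterexample can be checked against a real engine with a small sketch. It assumes an environment where `new Function` is available (for example Node.js); the helpers `parses` and `naiveWouldInsert` are hypothetical names, and the second one is only a toy model of the "try one statement first, then fall back to two" strategy, not a real PEG.

```javascript
// Returns true when the engine accepts `src` as a function body.
// new Function only parses the body here; the function is never called.
function parses(src) {
  try {
    new Function(src);
    return true;
  } catch (e) {
    return false;
  }
}

// Toy model of the negative-rule strategy: try both lines as a single
// statement, and insert a virtual semicolon only if that attempt fails.
function naiveWouldInsert(line1, line2) {
  if (parses(line1 + " " + line2)) return false;
  return parses(line1 + ";\n" + line2);
}

const line1 = "a + b";
const line2 = "(c) = d";

// What the fallback strategy would do with the two lines.
console.log("naive strategy inserts a semicolon:", naiveWouldInsert(line1, line2));
// What the engine does when given the two lines with a real line break.
console.log("engine accepts the two-line source:", parses(line1 + "\n" + line2));
```

On an engine that follows the ASI rules, the first line should report true and the second false: the fallback strategy would accept the program by splitting it, while the engine rejects it because the offending token `=` is not preceded by a line terminator.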

 Waldemar
# Waldemar Horwat (8 years ago)

On 05/31/11 14:30, Waldemar Horwat wrote:

On 05/31/11 13:34, Brendan Eich wrote:

On May 31, 2011, at 1:21 PM, Waldemar Horwat wrote:

On 05/27/11 19:36, Brendan Eich wrote:

On May 27, 2011, at 6:42 PM, Brendan Eich wrote:

On May 27, 2011, at 6:20 PM, Waldemar Horwat <waldemar at google.com> wrote:

This produces expressions such as 42 = foo(), which must be handled by semantic specification. Why can't we have a more precise grammar?

This is an entirely different issue. The LeftHandSideExpression is still evaluated as an expression; you just don't call GetValue on it. We chose to prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(), but neither situation has much to do with the grammar.

That dodges the big "cover grammar" vs. Precise Grammar issue before us. It assumes the conclusion that semantics such as Reference internal types are the way to go, because LR(1) can't hack it.

Again, real browser JS engines use top-down parsing and no Reference type, instead specializing the AST for LeftHandSideExpressions to be precise, with early errors per ES5 (and even in ES3 era) for nonsense-LHS-expressions.

Top-down LL parsers are subsumed by LR parsers. The hierarchy is:

LL(0) < LL(1) < LR(1). LR(0) < SLR(1) < LALR(1) < LR(1).

Yes, I know -- but that is not in practice so helpful to implementors, since they have to (at least) tediously refactor to remove left recursion. The reason to use LR(1) in the spec is not to help implementors, as far as I can tell.

The reason to use LR(1) that you have cited is to have a validated-unambiguous grammar using a well-known formalism. It's a good reason, but it applies to other approaches.

Is there anything particularly compelling about LR(1) vs. say PEG (which is unambiguous by construction) -- linear time, memory proportional to PDA depth, or other? I have my own thoughts but I'd be interested in yours.

Yes. I would not want to use anything like a PEG to standardize a grammar. Here's why:

PEG being unambiguous by construction simply means that it resolves all ambiguities by picking the earliest rule. This turns all rules following the first one into negative rules: X matches Z only if it DOESN'T match a Y or a Q or a B or .... You could pick the same strategy to disambiguate an LR(1) grammar, and it would be equally bad.

Negative rules are the bane of grammars and behind the majority of the problems with the C++ grammar, including the examples I listed earlier. They make a grammar non-understandable because the order of the rules is subtly significant and makes it hard to reason about when an X matches a Z; a language extension might expand the definition of Y to make an X no longer match a Q, and you wouldn't know it just by looking at a grammar with negative rules. In a positive-rule-only grammar you'd discover the problem right away because the grammar wouldn't compile.

Negative rules also interact badly with both semicolon insertion and division-vs-regexp lexer disambiguation. One might naively think that semicolon insertion would be an ideal match for negative rules: You first try to parse

tokens-on-line1
tokens-on-line2

as a single statement and, only if that fails, you move on to parsing it as two statements with a virtual semicolon between them. That, however, doesn't work. Here's a simple counterexample:

a + b
(c) = d

Negative rules would insert a virtual semicolon here because

a + b(c) = d

is not a valid parse. However, the correct ECMAScript behavior is not to insert a semicolon.

Waldemar

Typo: "A language extension might expand the definition of Q to make an X no longer match a Z, and you wouldn't know it just by looking at a grammar with negative rules."

 Waldemar
# Brendan Eich (8 years ago)

On May 31, 2011, at 2:30 PM, Waldemar Horwat wrote:

I would not want to use anything like a PEG to standardize a grammar. Here's why:

PEG being unambiguous by construction simply means that it resolves all ambiguities by picking the earliest rule. This turns all rules following the first one into negative rules: X matches Z only if it DOESN'T match a Y or a Q or a B or .... You could pick the same strategy to disambiguate an LR(1) grammar, and it would be equally bad.

Negative rules are the bane of grammars and behind the majority of the problems with the C++ grammar, including the examples I listed earlier. They make a grammar non-understandable because the order of the rules is subtly significant and makes it hard to reason about when an X matches a Z; a language extension might expand the definition of Y to make an X no longer match a Q, and you wouldn't know it just by looking at a grammar with negative rules. In a positive-rule-only grammar you'd discover the problem right away because the grammar wouldn't compile.

Thanks -- you've made this point before and I've agreed with it. It helps to restate and amplify it, I think, because my impression is that not many people "get it".

PEG users may be happy with their JS parsers at any given point in the language's standard version-space, of course.

It still could be that we use LL(1) or another positive-rule-only grammar, of course, but we can hash that out separately.

Negative rules also interact badly with both semicolon insertion and division-vs-regexp lexer disambiguation. One might naively think that semicolon insertion would be an ideal match for negative rules: You first try to parse

tokens-on-line1
tokens-on-line2

Heh; this doesn't pass the first rule of ASI fight-club: there's no insertion if there is no error.

# Waldemar Horwat (8 years ago)

On 05/31/11 14:55, Brendan Eich wrote:

On May 31, 2011, at 2:30 PM, Waldemar Horwat wrote:

I would not want to use anything like a PEG to standardize a grammar. Here's why:

PEG being unambiguous by construction simply means that it resolves all ambiguities by picking the earliest rule. This turns all rules following the first one into negative rules: X matches Z only if it DOESN'T match a Y or a Q or a B or .... You could pick the same strategy to disambiguate an LR(1) grammar, and it would be equally bad.

Negative rules are the bane of grammars and behind the majority of the problems with the C++ grammar, including the examples I listed earlier. They make a grammar non-understandable because the order of the rules is subtly significant and makes it hard to reason about when an X matches a Z; a language extension might expand the definition of Y to make an X no longer match a Q, and you wouldn't know it just by looking at a grammar with negative rules. In a positive-rule-only grammar you'd discover the problem right away because the grammar wouldn't compile.

Thanks -- you've made this point before and I've agreed with it. It helps to restate and amplify it, I think, because my impression is that not many people "get it".

PEG users may be happy with their JS parsers at any given point in the language's standard version-space, of course.

It still could be that we use LL(1) or another positive-rule-only grammar, of course, but we can hash that out separately.

Negative rules also interact badly with both semicolon insertion and division-vs-regexp lexer disambiguation. One might naively think that semicolon insertion would be an ideal match for negative rules: You first try to parse

tokens-on-line1
tokens-on-line2

as a single statement and, only if that fails, you move on to parsing it as two statements with a virtual semicolon between them. That, however, doesn't work. Here's a simple counterexample:

a + b
(c) = d

Negative rules would insert a virtual semicolon here because

a + b(c) = d

is not a valid parse. However, the correct ECMAScript behavior is not to insert a semicolon.

Heh; this doesn't pass the first rule of ASI fight-club: there's no insertion if there is no error.

I don't understand the premise of your comment on ASI. Here there is an error in parsing without a virtual semicolon and no error in parsing with a virtual semicolon, so a PEG-like ASI would erroneously insert one.

 Waldemar
# Brendan Eich (8 years ago)

On May 31, 2011, at 3:59 PM, Waldemar Horwat wrote:

I don't understand the premise of your comment on ASI. Here there is an error in parsing without a virtual semicolon and no error in parsing with a virtual semicolon, so a PEG-like ASI would erroneously insert one.

Sorry, I misread the example! It looked at first like the one in 7.9.2 at the end.

# Kam Kasravi (8 years ago)

On May 31, 2011, at 2:55 PM, Brendan Eich <brendan at mozilla.com> wrote:

On May 31, 2011, at 2:30 PM, Waldemar Horwat wrote:

I would not want to use anything like a PEG to standardize a grammar. Here's why:

PEG being unambiguous by construction simply means that it resolves all ambiguities by picking the earliest rule. This turns all rules following the first one into negative rules: X matches Z only if it DOESN'T match a Y or a Q or a B or .... You could pick the same strategy to disambiguate an LR(1) grammar, and it would be equally bad.

PEG's use of ordered choice provides an opportunity to minimize backtracking, but it still backtracks given a nonterminal where the first ordered choice is incorrect. A PEG must return one parse tree or an error after potentially exhausting all choices (unlike GLR). I believe there are differing motivations to pick a parser depending on your goals: if you're experimenting with the grammar or want a parser to transform an extended grammar, then PEGs make a lot of sense because they closely match the BNF grammar and it's easy to introduce new grammar rules. It's likely PEGs could provide diagnostics related to LR(1) ambiguity; at least with pegjs it looks like this could be built into the algorithm. I understand the motivation to avoid any parser which tolerates ambiguous LR(1) grammars, but PEGs can be great tools given the LR(1) requirement is enforced.

# Brendan Eich (8 years ago)

On May 31, 2011, at 9:08 PM, Kam Kasravi wrote:

On May 31, 2011, at 2:55 PM, Brendan Eich <brendan at mozilla.com> wrote:

On May 31, 2011, at 2:30 PM, Waldemar Horwat wrote:

I would not want to use anything like a PEG to standardize a grammar. Here's why:

PEG being unambiguous by construction simply means that it resolves all ambiguities by picking the earliest rule. This turns all rules following the first one into negative rules: X matches Z only if it DOESN'T match a Y or a Q or a B or .... You could pick the same strategy to disambiguate an LR(1) grammar, and it would be equally bad.

[just noting you are replying to Waldemar's words here, not mine. /be]

PEG's use of ordered choice provides an opportunity to minimize backtracking, but it still backtracks given a nonterminal where the first ordered choice is incorrect. A PEG must return one parse tree or an error after potentially exhausting all choices (unlike GLR).

Yes. The problem with PEGs is not ambiguity (multiple parse trees for one sentence) but the negative-rule future hostility problem that Waldemar cited. That is hard to see at any given instant. It comes up when evolving a language.

I believe there are differing motivations to pick a parser depending on your goals: if you're experimenting with the grammar or want a parser to transform an extended grammar, then PEGs make a lot of sense because they closely match the BNF grammar and it's easy to introduce new grammar rules. It's likely PEGs could provide diagnostics related to LR(1) ambiguity; at least with pegjs it looks like this could be built into the algorithm. I understand the motivation to avoid any parser which tolerates ambiguous LR(1) grammars, but PEGs can be great tools given the LR(1) requirement is enforced.

This matches Tom's testimony.

At this point I'm working under the assumption that ES.next sticks with the LR(1) grammar. First target: destructuring, using an extended Reference type. There are alternatives (thanks to dherman for discussion today about this) but I think this is the minimal patch to ES5.

Arrow function syntax can be handled similarly, provided Expression covers ArrowFormalParameters. But that is a strawman, so it'll go after destructuring.
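As a rough illustration of the cover-grammar idea (parse with Expression first, then refine once `=>` is seen), here is a minimal sketch. The AST node shapes and the function name `toArrowParams` are assumptions made up for this example; they are not taken from the strawman or from any engine.

```javascript
// Hypothetical AST shapes produced by an ordinary Expression parse:
//   { type: "Identifier", name }
//   { type: "Number", value }
//   { type: "Sequence", items: [ ...nodes ] }   // comma expression
//
// After "(a, b)" has been parsed as an Expression and the parser then
// sees "=>", the already-built tree is reinterpreted as a parameter list.
// Anything the cover grammar allowed but a parameter list does not is
// rejected here, in the manner of an early error.
function toArrowParams(expr) {
  const items = expr.type === "Sequence" ? expr.items : [expr];
  return items.map((node) => {
    if (node.type !== "Identifier") {
      throw new SyntaxError("invalid arrow parameter: " + node.type);
    }
    return node.name;
  });
}

const covered = {
  type: "Sequence",
  items: [
    { type: "Identifier", name: "a" },
    { type: "Identifier", name: "b" },
  ],
};
console.log(toArrowParams(covered)); // [ 'a', 'b' ]
```

The same pattern applies to the `42 = foo()` case earlier in the thread: the grammar accepts a broader form, and a later check rejects the trees that make no sense as an assignment target.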

# Kam Kasravi (8 years ago)

On May 31, 2011, at 9:34 PM, Brendan Eich <brendan at mozilla.com> wrote:

On May 31, 2011, at 9:08 PM, Kam Kasravi wrote:

On May 31, 2011, at 2:55 PM, Brendan Eich <brendan at mozilla.com> wrote:

On May 31, 2011, at 2:30 PM, Waldemar Horwat wrote:

I would not want to use anything like a PEG to standardize a grammar. Here's why:

PEG being unambiguous by construction simply means that it resolves all ambiguities by picking the earliest rule. This turns all rules following the first one into negative rules: X matches Z only if it DOESN'T match a Y or a Q or a B or .... You could pick the same strategy to disambiguate an LR(1) grammar, and it would be equally bad.

[just noting you are replying to Waldemar's words here, not mine. /be]

PEG's use of ordered choice provides an opportunity to minimize backtracking, but it still backtracks given a nonterminal where the first ordered choice is incorrect. A PEG must return one parse tree or an error after potentially exhausting all choices (unlike GLR).

Yes. The problem with PEGs is not ambiguity (multiple parse trees for one sentence) but the negative-rule future hostility problem that Waldemar cited. That is hard to see at any given instant. It comes up when evolving a language.

I believe there are differing motivations to pick a parser depending on your goals: if you're experimenting with the grammar or want a parser to transform an extended grammar, then PEGs make a lot of sense because they closely match the BNF grammar and it's easy to introduce new grammar rules. It's likely PEGs could provide diagnostics related to LR(1) ambiguity; at least with pegjs it looks like this could be built into the algorithm. I understand the motivation to avoid any parser which tolerates ambiguous LR(1) grammars, but PEGs can be great tools given the LR(1) requirement is enforced.

This matches Tom's testimony.

At this point I'm working under the assumption that ES.next sticks with the LR(1) grammar. First target: destructuring, using an extended Reference type. There are alternatives (thanks to dherman for discussion today about this) but I think this is the minimal patch to ES5.

Arrow function syntax can be handled similarly, provided Expression covers ArrowFormalParameters. But that is a strawman, so it'll go after destructuring.

There was no suggestion

# Brendan Eich (8 years ago)

On May 31, 2011, at 10:07 PM, Kam Kasravi wrote:

Is it a given that the grammar extensions in the various strawmen are all LR(1)?

See earlier in this thread, where this strawman was mentioned as requiring either a cover grammar for ArrowFormalParameters (namely Expression), plus a restriction against arrow block body being a block-with-labeled-statement-as-first-child -- or else GLR or PEG or something that is not going to fly in the spec.

# Jorge (8 years ago)

On 28/05/2011, at 16:29, Brendan Eich wrote:

On May 28, 2011, at 1:49 AM, Jorge wrote:

On 27/05/2011, at 12:24, Brendan Eich wrote:

On May 27, 2011, at 2:36 AM, Jorge wrote:

Also, I wonder if in order to make blocks first class, do we need any new syntax ?

function f() { a.forEach({ return 3 });

The problem is that a block statement is ambiguous with an object initialiser. See strawman:arrow_function_syntax#grammar_changes in particular the "To enable unparenthesized ObjectLiteral expressions as bodies of arrow functions, without ambiguity with Block bodies, restrict LabelledStatement as follows..." section.

As labels are seldom used in JS, perhaps the easiest way to avoid ambiguities would be to forbid blocks from beginning with a label ?

Would it be too much (or too ugly) to require the programmer to disambiguate (only) in this corner case ?

A block: { noLabelHere ... }

We didn't talk about this change. It is yet another migration early-error to consider.

But it's not very usual to begin a block with a label.

It's certainly simpler than a more powerful parsing algorithm than LR(1),

If you

1.- keep the familiar { block } syntax for first class blocks, and
2.- use {|| ... } for shorter functions syntax and
3.- keep the (obligatory) parens as the call() operator

wouldn't we gain everything in the arrow syntax and block lambdas strawmen, except for paren-freeness ?

And, wouldn't that be easier for the current (proven) parsers, and pose almost zero risks in this respect ?

And, wouldn't that be in line with the already known, much appreciated by many of us, current JS (and C) style ?

{ block }( call ) or {|| ... }( call )

foo bar baz ... wtf ? foo(bar(baz)) ? foo(bar)(baz) ? foo(bar)(baz)() ? meh! This syntax introduces ambiguities !

Do David and Jeremy like it ? Good for them. Do JavaScripters like it ? The least we can say is that it's quite polemical : rails/rails/commit/9f09aeb8273177fc2d09ebdafcc76ee8eb56fe33

but we might want to cross that bridge anyway for arrow functions.

<fwiw>

Arrow syntax is extraneous to JS developers. It's an unnecessary, radical style change. And ugly: there are JS developers that just don't *like* it.

So, why ?

Paren-free(ness) is a fashion: foo bar baz, what's a function, who's calling who ? with which parameters ? Meh! Ambiguities.

</fwiw>

# Brendan Eich (8 years ago)

On Jun 1, 2011, at 10:07 AM, Jorge wrote:

A block: { noLabelHere ... }

We didn't talk about this change. It is yet another migration early-error to consider.

But it's not very usual to begin a block with a label.

You're right, and it can be an ArrowBodyBlock, not a backward-compatible Block, so this is only a refactoring-from-function-to-arrow-syntax tax. Good idea.

It's certainly simpler than a more powerful parsing algorithm than LR(1),

If you

1.- keep the familiar { block } syntax for first class blocks, and
2.- use {|| ... } for shorter functions syntax and
3.- keep the (obligatory) parens as the call() operator

wouldn't we gain everything in the arrow syntax and block lambdas strawmen, except for paren-freeness ?

No, some people object to the cleverness of block-lambdas, the TCP preservation for break, continue, return, and |this|, and the completion-return too.

Paren-free syntax for block-lambdas as control abstractions is controversial, but less so, and IMHO trumped up (since when was C# in its multi-version glory designed holistically at 1.0? LINQ and other innovations have come relatively quickly).

Block-lambdas are more divisive because they introduce novel runtime semantics, akin to throwing uncatchable-except-by-the-VM exceptions.

Arrow function syntax is just syntax, so it is an easier sell, provided the grammar and semantics can be added to the ECMA-262 framework without too much trouble.

And, wouldn't that be easier for the current (proven) parsers, and pose almost zero risks in this respect ?

The parsing problems of arrows are really only in the ECMA-262 spec-space, not in real implementations.

And, wouldn't that be in line with the already known, much appreciated by many of us, current JS (and C) style ?

{ block }( call ) or {|| ... }( call )

Since when can you call a block in JS or C?

A function expression is not a block, it starts with 'function(' at the least.

foo bar baz ... wtf ? foo(bar(baz)) ? foo(bar)(baz) ? foo(bar)(baz)() ? meh! This syntax introduces ambiguities !

Not the way the strawman grammar works. Did you read it?

Here you seem to be inveighing against the paren-free CallWithBlockArguments production, not against arrow function syntax. Yet you switch targets immediately:

Do David and Jeremy like it ? Good for them. Do JavaScripters like it ? The least we can say is that it's quite polemical : rails/rails/commit/9f09aeb8273177fc2d09ebdafcc76ee8eb56fe33

What are you talking about now? Arrows or call-with-block-lambda-argument paren-free calls? The two are entirely separate strawmen, the latter part of block-lambdas.

If by David you mean DHH, didn't he endorse CoffeeScript, which uses arrow function syntax, and not any Ruby-ish block-lambda proposal from me? Please keep your arguments straight!

but we might want to cross that bridge anyway for arrow functions.

<fwiw>

Arrow syntax is extraneous to JS developers. It's an unnecessary, radical style change. And ugly: there are JS developers that just don't *like* it.

People say that about block-lambdas and not just for the non-function-looking syntax -- especially for the semantics, which are novel.

So, why ?

I think the shoe is on the other foot.

Look, I wrote up both strawman:arrow_function_syntax and strawman:block_lambda_revival to give the arrow fans (not just CoffeeScript, it is in many languages and math-y notation systems) and the block-lambda fans (originated in Smalltalk, also in a more layered form in Ruby) both a fair shake.

Don't shoot the messengers who want only one, or neither. And don't shoot me for drafting these. JS's function syntax with mandatory return in braced body is too long. But fixing that is not a simple matter of only shortening function to some ugly and conflicted punctuator. It's not a matter of pretending block-lambdas are blocks and are already callable. It requires careful work to meet several often-conflicting goals and requirements.

Paren-free(ness) is a fashion: foo bar baz, what's a function, who's calling who ? with which parameters ? Meh! Ambiguities.

How about you stop ranting and read the block-lambda grammar.

</fwiw>

If we succeed there, we may not need such an incompatible restriction on labels.

-1

-1 on what, the incompatible change being rejected? It isn't necessary anyway: we can split Block and ArrowBodyBlock or anything like it. So cheer up!