Proposal: Modify automatic semicolon insertion in strict mode
Eric Suen wrote:
Your proposal makes no sense to me. First, I am not sure what your strict mode means, because in strict mode there is no automatic semicolon insertion.
In most of the examples you give there is no automatic semicolon insertion, and your proposal makes things even worse. These are just different coding styles; for example, a long expression could be written as:
a + b
or
a
- b
and
a + b
If this causes a syntax error, I think no one will use JavaScript...
Yes, it will make currently valid coding styles invalid, but that's for the sake of a clean grammar. I modeled my rules off what Ruby allows, and I haven't really heard Ruby authors complaining about those rules. If what you said was true, then Ruby would not be growing in popularity, even if it has a different language genealogy.
I can envision two clean options for strict mode:
- mandating semicolons
- modifying automatic semicolon insertion along the lines of what I propose
I'm fine with either.
Yuh-Ruey Chen wrote:
Most people on this list consider automatic semicolon insertion something of a mistake. However, I think this feature would be fine if it were not so "eager" and thus causing all sorts of subtle errors and hampering language evolution (e.g. the ongoing lambda discussion). By eager, I mean that there are too many cases where automatic semicolon insertion takes place.
There are, but I think it's a bad idea to change semicolon insertion in such a way that currently valid programs are still valid but are parsed differently.
So here's the proposal:
1) Stricter rules for function call and subscript operators
... f
(args) ...
currently parses as:
... f(args) ...;
I propose that this instead parses as:
... f; (args) ...;
Likewise,
... a
[args] ...
should parse as:
... a; [args] ...;
Note that:
... f(
args
) ...
should still parse as:
... f(args) ...;
All of these changes affect the behaviour of valid programs without making them invalid.
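To make the disputed case concrete, here is a small sketch of how a line break before `(` or `[` is treated today. The names `f`, `args`, and `list` are placeholders chosen for illustration, not taken from the proposal.

```javascript
// Illustrative names only; this shows current behaviour, not the proposal.
function f(x) { return "called with " + x; }
var args = 1;
var list = [10, 20, 30];

// No semicolon is inserted after `f`, because `f (args)` is a valid call.
var callResult = f
(args);

// Likewise `list [1]` is a valid subscript, so the line break is ignored.
var subscriptResult = list
[1];

console.log(callResult);      // called with 1
console.log(subscriptResult); // 20
```

Under the proposed rule the same text would be read as `f; (args);` and `list; [1];`. The program would still parse, but `callResult` would hold the function itself and `subscriptResult` the whole array, which is the silent change in meaning objected to above.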
2) Stricter rules for binary operators (including '.' and ',')
... a
BINARY_OP b
currently parses as:
... a BINARY_OP b ...;
I propose that this instead parses as:
... a; BINARY_OP b ...;
and thus results in a syntax error, unless it's a unary operator.
Note that:
... a BINARY_OP
b ...
should still parse as:
... a BINARY_OP b ...;
Since expressions starting with unary operators are usually valid yet erroneous code...
No, it's perfectly commonplace to break a line before a binary operator:
var foo = some_very_long___________________________________expression
    + some_other_expression;
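A minimal sketch of that style, with made-up variable names (`base`, `extra`), contrasting today's parse with what the proposed rule would amount to when the leading operator also has a unary form:

```javascript
// Hypothetical values; only the line-break behaviour matters here.
var base = 40;
var extra = 2;

// Current behaviour: no semicolon is inserted, because `base + extra` is valid.
var total = base
    + extra;

// The proposed reading, written out explicitly with the semicolon in place.
var proposedTotal = base;
+extra; // a unary plus whose value is discarded

console.log(total);         // 42
console.log(proposedTotal); // 40
```

For operators with no unary form (for example `*`), the proposed reading would be a syntax error rather than a silent change.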
3) All expression statements that start with an expression of lesser precedence than an assignment operator should result in an error, or at the very least, a warning. This applies even if the statement does have an effect. For example, all the following lines should generate some warning:
+(a = 10)
+func()
a * func()
[a, b, c]
OTOH, if operator overloading is ever added to the language, this will be trickier, since user-defined operator methods may cause side effects.
This has more merit than 1) or 2). I considered it for Jacaranda, although it didn't make it into the spec.
However, warnings are out of scope for the language specification, unless they are in some informative annex. Also, this proposal doesn't seem to have anything to do with semicolon insertion.
(Optional) To simplify grammar implementation, these new rules apply even if the expressions are nested within group operators, [...]
Since this is all clearly incompatible with existing ES3 code, this requires an opt-in: these new rules should only be followed under strict mode.
That's not sufficient. Strict mode is supposed to be a subset.
Yuh-Ruey Chen wrote:
Eric Suen wrote:
Your proposal makes no sense to me. First, I am not sure what your strict mode means, because in strict mode there is no automatic semicolon insertion.
No, semicolon insertion occurs also in strict mode. Perhaps it shouldn't.
In most of the examples you give there is no automatic semicolon insertion, and your proposal makes things even worse. These are just different coding styles; for example, a long expression could be written as: [...] If this causes a syntax error, I think no one will use JavaScript...
Yes, it will make currently valid coding styles invalid, but that's for the sake of a clean grammar. I modeled my rules off what Ruby allows, and I haven't really heard Ruby authors complaining about those rules. If what you said was true, then Ruby would not be growing in popularity, even if it has a different language genealogy.
That's not a valid argument: Ruby had these rules from the start.
On Dec 7, 2008, at 9:48 PM, David-Sarah Hopwood wrote:
Yuh-Ruey Chen wrote:
Most people on this list consider automatic semicolon insertion something of a mistake. However, I think this feature would be fine if it were not so "eager" and thus causing all sorts of subtle errors and hampering language evolution (e.g. the ongoing lambda discussion). By eager, I mean that there are too many cases where automatic semicolon insertion takes place.
There are, but I think it's a bad idea to change semicolon insertion in such a way that currently valid programs are still valid but are parsed differently.
Indeed, this is a non-starter. No browser-based implementation can
take the chance of breaking content by doing this. Opt-in versioning?
That just complicates things for implementors and users of the language.
I think it is a waste of time to fuss about ASI in the belief that
doing so will make novel syntax easier to add. It surely could, but
changing the meaning of extant programs (or such programs under a new
explicit version selection) is not going to happen. And we should be
skeptical of syntactic innovation that wants it to.
We should keep syntax from growing as randomly as it has in Ruby,
frankly. If ASI helps, great. I earlier plugged eclecticism but that
was to kick off the {|...| ...} lambda syntax bikeshed-thread. We've
reached some conclusions thanks to that thread. Borrowed syntax now
seems strictly less desirable than it did before that discussion.
On Dec 7, 2008, at 9:53 PM, David-Sarah Hopwood wrote:
Yuh-Ruey Chen wrote:
Eric Suen wrote:
Your proposal makes no sense to me. First, I am not sure what your strict mode means, because in strict mode there is no automatic semicolon insertion.
No, semicolon insertion occurs also in strict mode. Perhaps it
shouldn't.
TC39 already agreed it should, for ES3.1. Strict mode has enough
migration-tax that we do not want to risk making it unused in practice
due to lack of ASI. It's hard to know what would happen, but the sense
of the committee was to leave ASI alone in strict mode in order to
make strict mode easier to use in existing code, as well as new code
meant to work in old browsers where the author didn't test carefully.
Such undertesting happens, just as for quirky HTML -- the XHTML utopia
notwithstanding (IE processed text/xhtml using quirky syntax error
correction, guaranteeing non-wellformedness to some degree in the
deployed content). Missing semicolons happen too, in spite of the best
intentions.
Never mind that many JS developers would have our heads for removing
ASI from strict mode, the problem remains that programmers don't know
where they depend on ASI, and if we try to force them to care, we'll
fail -- "they" may not even be around, and the content errors will be
foisted on innocent end users of the incumbent content.
I was pushing for this agreement, so others should give their own
views, but we did in fact reach a decision on this point.
Yuh-Ruey Chen wrote:
Eric Suen wrote:
Your proposal makes no sense to me. First, I am not sure what your strict mode means, because in strict mode there is no automatic semicolon insertion.
No, semicolon insertion occurs also in strict mode. Perhaps it shouldn't.
Oops, I got this from:
www.mozilla.org/js/language/js20-2002-04/core/pragmas.html#strict
Brendan Eich wrote:
On Dec 7, 2008, at 9:53 PM, David-Sarah Hopwood wrote:
No, semicolon insertion occurs also in strict mode. Perhaps it shouldn't.
TC39 already agreed it should, for ES3.1. Strict mode has enough migration-tax that we do not want to risk making it unused in practice due to lack of ASI. It's hard to know what would happen, but the sense of the committee was to leave ASI alone in strict mode in order to make strict mode easier to use in existing code, as well as new code meant to work in old browsers where the author didn't test carefully.
Such undertesting happens, just as for quirky HTML -- the XHTML utopia notwithstanding (IE processed text/xhtml using quirky syntax error correction, guaranteeing non-wellformedness to some degree in the deployed content). Missing semicolons happen too, in spite of the best intentions.
Never mind that many JS developers would have our heads for removing ASI from strict mode, the problem remains that programmers don't know where they depend on ASI, and if we try to force them to care, we'll fail -- "they" may not even be around, and the content errors will be foisted on innocent end users of the incumbent content.
Why? That would only happen if you added "use strict;" to a program fragment without testing that the resulting program still parses correctly, which is an obviously silly thing to do. If it fails to parse, and no programmer who can fix it is available, then don't add the "use strict;". Alternatively, put the source through a filter that adds the semicolons automatically, once and for all. What am I missing here?
On Dec 8, 2008, at 9:48 PM, David-Sarah Hopwood wrote:
Why? That would only happen if you added "use strict;" to a program
fragment without testing that the resulting program still parses correctly,
Parses and executes over all paths correctly, you must mean.
which is an obviously silly thing to do. If it fails to parse, and no
programmer who can fix it is available, then don't add the "use strict;".
Why impose unnecessary work on someone trying to use strict?
Alternatively, put the source through a filter that adds the
semicolons automatically, once and for all. What am I missing here?
What filter? Procured from where? At what cost in time and trouble?
What problem are you solving?
Eric Suen wrote:
Yuh-Ruey Chen wrote:
Eric Suen wrote:
Your proposal makes no sense to me. First, I am not sure what your strict mode means, because in strict mode there is no automatic semicolon insertion.
No, semicolon insertion occurs also in strict mode. Perhaps it shouldn't.
Oops, I got this from:
www.mozilla.org/js/language/js20-2002-04/core/pragmas.html#strict-mode
That page is long out-of-date, and not relevant to ES3.1 or Harmony.
Brendan Eich wrote:
On Dec 8, 2008, at 9:48 PM, David-Sarah Hopwood wrote:
Why? That would only happen if you added "use strict;" to a program fragment without testing that the resulting program still parses correctly,
Parses and executes over all paths correctly, you must mean.
No, I don't. The context was an argument against removing semicolon insertion from strict mode. If you add "use strict;" (to code that is statically known, not to code that is passed to 'eval') and the resulting program still parses, then you know that the lack of semicolon insertion in strict mode made no difference. It isn't necessary to run the code in order to test that.
Of course you should do other testing as well, to account for other differences between non-strict and strict mode, but that's not relevant to semicolon insertion.
which is an obviously silly thing to do. If it fails to parse, and no programmer who can fix it is available, then don't add the "use strict;".
Why impose unnecessary work on someone trying to use strict?
I repeat, not even checking that code still parses after adding "use strict;" is obviously silly, and we should not claim that strict mode is usable without doing that -- whether or not it does semicolon insertion. In other words, the work is necessary.
Alternatively, put the source through a filter that adds the semicolons automatically, once and for all. What am I missing here?
What filter? Procured from where? At what cost in time and trouble?
One that I'd write and open-source, if it were sufficient to win this argument ;-)
On Dec 9, 2008, at 1:00 AM, David-Sarah Hopwood wrote:
Why impose unnecessary work on someone trying to use strict?
I repeat, not even checking that code still parses after adding "use strict;" is obviously silly, and we should not claim that strict mode is usable without doing that -- whether or not it does semicolon insertion. In other words, the work is necessary.
It's more than just making sure the file parses correctly -- it's
adding the semicolons too. Are you assuming that most programs will
parse correctly without semicolon insertion?
Foo.prototype.bar = function() {
  //...
} // <-- whoops, no semicolon
For me that qualifies as "unnecessary work," aka "pedantic." YMMV.
I don't think it's right to ask "what are the reasons someone wouldn't
migrate to strict mode?" Non-migration is the de-facto "winning"
position. The real question is, "why would someone bother to
migrate?" If you have fully functional code, why spend any time
modifying it, introducing risk, while delaying other work? This lies
at the heart of Brendan's concerns about migration tax. Any tax is
going to inhibit migration.
On Dec 9, 2008, at 1:00 AM, David-Sarah Hopwood wrote:
Brendan Eich wrote:
On Dec 8, 2008, at 9:48 PM, David-Sarah Hopwood wrote:
Why? That would only happen if you added "use strict;" to a program fragment without testing that the resulting program still parses correctly,
Parses and executes over all paths correctly, you must mean.
No, I don't. The context was an argument against removing semicolon insertion from strict mode. If you add "use strict;" (to code that is statically known, not to code that is passed to 'eval') and the
resulting program still parses, then you know that the lack of semicolon
insertion in strict mode made no difference. It isn't necessary to run the
code in order to test that.
Wrong:
s = 'c'; b = 42; c = 33; g = 1;
Number.prototype.exec = function(s){var c=this; return eval(s);};
a = b
/c/g.exec(s);
print(a);
Verify by adding a semicolon at the end of the a = b statement
expression and noting the different result.
Of course you should do other testing as well, to account for other differences between non-strict and strict mode, but that's not
relevant to semicolon insertion.
How's the weather there on planet Utopia?
I see that Neil Mix made the obvious point about the game theory here.
Don't put strict mode on the losing side or it will not be used as
much as you would like.
On Dec 9, 2008, at 10:26 AM, Brendan Eich wrote:
Wrong:
s = 'c'; b = 42; c = 33; g = 1;
Number.prototype.exec = function(s){var c=this; return eval(s);};
a = b
/c/g.exec(s);
print(a);
Sorry, I'm wrong -- that doesn't show the problem. ASI only inserts on
an error, and there's no error there. I thought there was a case not
covered by ES3 that would parse yet change meaning without ASI,
compared to with ASI. It looks like there is no such case, so you're
right that only parse-ability needs to be tested.
That still means work for someone, possibly dealing with thousands of
lines of code written by someone else. It's still an unmotivated
migration tax.
Again, what problem are you trying to solve by removing ASI from
strict mode? You didn't answer last time I asked.
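The retracted example can be checked directly. The sketch below is an adaptation, not the original snippet: `var` declarations are added, `print` is replaced by `console.log`, and the second half (with `a2` and `match`, both names invented here) shows the parse that only an explicit semicolon produces.

```javascript
var s = 'c', b = 42, c = 33, g = 1;

// Give numbers an exec() method so that `g.exec(s)` is callable, as in the
// example from the thread. eval(s) evaluates "c", i.e. the local `c` (this).
Number.prototype.exec = function (s) { var c = this; return eval(s); };

// No semicolon after `b`: the text is already valid as b / c / g.exec(s),
// so nothing is inserted and both slashes are division operators.
var a = b
/c/g.exec(s);

// With an explicit semicolon, `/c/g` becomes a regular expression literal.
var a2 = b;
var match = /c/g.exec(s);

delete Number.prototype.exec; // undo the prototype change

console.log(a);        // about 1.27, i.e. 42 / 33 / 1
console.log(a2);       // 42
console.log(match[0]); // c
```

Because no insertion ever happens in the first form, a parser without ASI would read it identically. That is the sense in which only parse-ability needs to be tested.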
Neil Mix wrote:
On Dec 9, 2008, at 1:00 AM, David-Sarah Hopwood wrote:
Why impose unnecessary work on someone trying to use strict?
I repeat, not even checking that code still parses after adding "use strict;" is obviously silly, and we should not claim that strict mode is usable without doing that -- whether or not it does semicolon insertion. In other words, the work is necessary.
It's more than just making sure the file parses correctly -- it's adding the semicolons too.
You've lost the context of my original post, which answered this point:
| Brendan Eich wrote:
| > Never mind that many JS developers would have our heads for removing ASI
| > from strict mode, the problem remains that programmers don't know where
| > they depend on ASI, and if we try to force them to care, we'll fail --
| > "they" may not even be around, and the content errors will be foisted on
| > innocent end users of the incumbent content.
|
| Why? That would only happen if you added "use strict;" to a program fragment
| without testing that the resulting program still parses correctly, which is
| an obviously silly thing to do. If it fails to parse, and no programmer
| who can fix it is available, then don't add the "use strict;".
| Alternatively, put the source through a filter that adds the semicolons
| automatically, once and for all.
I will write the filter, if there is agreement not to do semicolon insertion in strict mode. I'm implementing an ANTLRv3 parser for ES3.1 anyway (starting with Patrick Hulsmeijer's ES3 grammar), so this will not be much extra work.
Are you assuming that most programs will parse correctly without semicolon insertion?
No, I'm not assuming that.
Foo.prototype.bar = function() {
  //...
} // <-- whoops, no semicolon
For me that qualifies as "unnecessary work," aka "pedantic."
I hope it's not too pedantic of me to point out that "unnecessary work" means something quite different from "pedantic". Or maybe it is pedantic, but necessary ;-)
YMMV.
Whether the semicolon is unnecessary depends on what is on the next line after your example. For instance:
Foo.prototype.bar = function() {
  //...
}
(anything)
This will unintentionally apply the function to (anything) and set Foo.prototype.bar to the result. So the semicolon is necessary here.
Since the above code is valid ECMAScript, the filter can't reject it, but it can heuristically warn about code that is likely to have omitted semicolons where they are needed.
In this case, "(anything)" is at the same indentation level as "Foo...", which is a dead giveaway that the programmer intended it to be a new statement. Almost all common coding styles require a continuation of a statement on the previous line to be indented.
Similarly, if a semicolon would be inserted at the end of a line and the next line is indented further to the right, then that is also probably a mistake.
Clearly, this kind of heuristic can't be put in the language spec, because ECMAScript does not use indentation-based syntax and cannot be compatibly changed to do so. It can only be checked by a tool that is not strictly part of the language implementation (although it could be integrated with an IDE, editor, or debugger).
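The hazard described above can be reproduced in a few lines. This is a sketch: the function body and the argument `"anything"` are invented so that the effect is visible.

```javascript
function Foo() {}

// Intended: assign a function. No semicolon follows the closing brace.
Foo.prototype.bar = function () {
  return "result of calling the function";
}
("anything")

// The parenthesised line was taken as an argument list, so the function was
// called immediately and its return value was assigned instead.
console.log(typeof Foo.prototype.bar); // string
```

No semicolon is inserted after the closing brace because the text continues as a valid call expression. Insertion only happens later, before `console`, where the parse would otherwise fail.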
I don't think it's right to ask "what are the reasons someone wouldn't migrate to strict mode?" Non-migration is the de-facto "winning" position. The real question is, "why would someone bother to migrate?"
Isn't that obvious? To provide a better chance of detecting errors in their code. But I suspect that strict mode will be more relevant for new code. If it is used in a large proportion of new code and in heavily relied-on code (libraries like Prototype, jQuery, etc., and internally within browser implementations), then it will have been successful.
Brendan Eich wrote:
On Dec 9, 2008, at 1:00 AM, David-Sarah Hopwood wrote:
Brendan Eich wrote:
On Dec 8, 2008, at 9:48 PM, David-Sarah Hopwood wrote:
Why? That would only happen if you added "use strict;" to a program fragment without testing that the resulting program still parses correctly,
Parses and executes over all paths correctly, you must mean.
No, I don't. The context was an argument against removing semicolon insertion from strict mode. If you add "use strict;" (to code that is statically known, not to code that is passed to 'eval') and the resulting program still parses, then you know that the lack of semicolon insertion in strict mode made no difference. It isn't necessary to run the code in order to test that.
Wrong:
s = 'c'; b = 42; c = 33; g = 1;
Number.prototype.exec = function(s){var c=this; return eval(s);};
a = b
/c/g.exec(s);
print(a);
As you acknowledged in a follow-up, this example is invalid, because it parses in the same way regardless of semicolon insertion. My argument above is correct, and easily provable by observing that ES3 only specifies semicolon insertion in cases that would otherwise be a syntax error.
In fact the probable mistake in the above example is heuristically detectable because "/c/..." is at the same indentation level as "a = b", and if the "/" were intended to be a division, it would be indented. The filter that I'm suggesting could (would, if I write it) detect that and give a warning.
Of course you should do other testing as well, to account for other differences between non-strict and strict mode, but that's not relevant to semicolon insertion.
How's the weather there on planet Utopia?
Please stick to technical arguments.
To answer your question about how removing semicolon insertion from strict mode would help to avoid errors, first note that a programmer who habitually added semicolons after every statement would be less likely to make the error above, or similar errors, such as the one I point out in a follow-up to Neil Mix, or the one that was discussed when we were considering '^' as a prefix for lambda expressions.
So, if strict mode normally required semicolons (even if not in the above specific example), and if the strict mode subset were what is normally taught (or self-taught), then the potential for such bugs would be reduced.
Here is another example that is discussed in the rationale of the Jacaranda specification:
var a = x
    + y;
Suppose that a programmer mistypes + as ++, for example:
var a = x
    ++ y;
That will be parsed as
var a = x; ++ y;
instead of as "var a = x ++ y;", which is a syntax error. In other words, syntax errors are less likely to be detected as such, rather than silently interpreted as something different. This applies to a wide range of possible syntax errors.
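The mistyped `++` case can be tried out as follows. The `parses` helper is an assumption introduced here (it relies on `new Function` throwing a `SyntaxError` for unparseable source); it is not part of the Jacaranda rationale.

```javascript
// Hypothetical helper: reports whether a source string parses as a function body.
function parses(source) {
  try {
    new Function(source); // compiles the source without running it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

var x = 1, y = 5;

// The line break before `++` triggers insertion: var a = x; ++y;
var a = x
    ++ y;

console.log(a, y);                       // 1 6
console.log(parses("var a = x ++ y;"));  // false
console.log(parses("var a = x\n++ y;")); // true
```

On a single line the typo is rejected outright. Split across two lines it is accepted and silently increments `y`.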
Again, the filter I suggested could heuristically detect this mistake, because a semicolon is inserted between two lines where the second is indented relative to the first.
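As a rough illustration of the indentation heuristic, here is a deliberately simplistic sketch. It is not the proposed filter: it does no real parsing, the function names are invented, and it covers only the first heuristic (a line that looks like a continuation but is not indented further than the line before it).

```javascript
// Number of leading whitespace characters on a line.
function indentOf(line) {
  return line.match(/^\s*/)[0].length;
}

// Returns 1-based line numbers that look like accidental continuations:
// the previous line does not end in ';', '{' or ',', and this line starts
// with a token that can continue an expression, yet is not indented further.
function suspiciousContinuations(source) {
  var lines = source.split("\n");
  var warnings = [];
  var prev = -1; // index of the previous non-blank line
  for (var i = 0; i < lines.length; i++) {
    var text = lines[i].trim();
    if (text === "") continue;
    if (prev >= 0) {
      var prevOpen = !/[;{,]$/.test(lines[prev].trim());
      var isComment = /^\/[\/*]/.test(text);
      var startsLikeContinuation = /^[(\[\/+\-]/.test(text) && !isComment;
      var notIndented = indentOf(lines[i]) <= indentOf(lines[prev]);
      if (prevOpen && startsLikeContinuation && notIndented) {
        warnings.push(i + 1);
      }
    }
    prev = i;
  }
  return warnings;
}

var sample = "Foo.prototype.bar = function() {\n  //...\n}\n(anything)";
console.log(suspiciousContinuations(sample)); // reports line 4
```

A line-based check like this would misfire on comments, strings, and unusual layouts, which is why a real tool would need to sit on top of a parser.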
It's more than just making sure the file parses correctly -- it's
adding the semicolons too.
You've lost the context of my original post, which answered this
point:
No, I followed it carefully. I think you have an unrealistic
expectation for uptake of your filter.
I will write the filter, if there is agreement not to do semicolon
insertion in strict mode. I'm implementing an ANTLRv3 parser for ES3.1 anyway (starting with Patrick Hulsmeijer's ES3 grammar), so this will not
be much extra work.
So what? Will you write the scripts that will perform these changes
seamlessly within <insert-source-control-product-here>? Will it be
able to handle SSI macros in my source files? What should I tell my
boss if your code breaks something that was previously working?
You missed my point: any amount of work is a burden. For what
gain? Exactly why is lack-of-semicolon-insertion of value to me? Why
should I devote even one second of my time to converting my source?
Foo.prototype.bar = function() {
  //...
} // <-- whoops, no semicolon
For me that qualifies as "unnecessary work," aka "pedantic."
I hope it's not too pedantic of me to point out that "unnecessary
work" means something quite different from "pedantic". Or maybe it is
pedantic, but necessary ;-)
Requiring me to add a semicolon in a location where my intent is
already clear is both unnecessary and pedantic. :P
Whether the semicolon is unnecessary depends on what is on the next
line after your example. For instance:
Foo.prototype.bar = function() {
  //...
}
(anything)
This will unintentionally apply the function to (anything) and set Foo.prototype.bar to the result. So the semicolon is necessary here.
Yes, I know, which is why I cited that specific example. You're
missing my point: it is extremely common for JS hackers to define
functions like above -- as an expression -- without placing a
semicolon after the closing brace. This is very natural, since JS is
graphically similar to C, Java, etc. Requiring semicolons after the
closing brace goes against the grain of years of hacking for most
coders. You're asking for a change in behavior that is both ingrained
and reflexive.
It's not out-of-line to ask for such changes, but I have yet to see
what your rationale is, other than making parsing easier. (Which has
zero benefit to the coder.) What's the benefit to Joe Hacker that's
worth the cost?
Since the above code is valid ECMAScript, the filter can't reject it, but it can heuristically warn about code that is likely to have
omitted semicolons where they are needed.
If such warnings are common, the warnings become noise and are
ignored. And my point is that these warnings will be extremely
common. So you can generate warnings (lots of noise requiring
pedantic changes to silence), or you can update the code auto-insert
semicolons (with all the source-control and general scare-factor
headaches that come with it). I have a hard time seeing either option
receiving much uptake in the community at large.
I don't think it's right to ask "what are the reasons someone
wouldn't migrate to strict mode?" Non-migration is the de-facto "winning" position. The real question is, "why would someone bother to
migrate?"
Isn't that obvious? To provide a better chance of detecting errors in their code.
In code that's already working? The signal-to-noise ratio is way
off. Besides, exactly how often do JS coders trip up on semicolon
insertion behavior?
But I suspect that strict mode will be more relevant for new code. If it is used in a large proportion of new code and in
heavily relied-on code (libraries like Prototype, jQuery, etc., and internally within browser implementations), then it will have been successful.
So we can agree that strict mode will receive little uptake in legacy
codebases?
Uptake in new code will only occur if strict mode provides more
benefit than annoyance. And I don't think semicolon insertion is a
problem from the end-coder's perspective. I just don't see lots of
people saying, "gee I wish JavaScript didn't have that crazy semicolon
insertion" or "hey, watch out for your line-endings in JavaScript."
It's a difficult problem wrt parsing, to be sure. But it feels like
the desire to abolish semicolon insertion is rooted more in
simplifying parsers than it is in making hackers' lives easier.
On Dec 10, 2008, at 10:05 PM, David-Sarah Hopwood wrote:
Of course you should do other testing as well, to account for other differences between non-strict and strict mode, but that's not
relevant to semicolon insertion.
How's the weather there on planet Utopia?
Please stick to technical arguments.
This is not a technical argument, it's a political one (what's the
common good for the polis). Stop trying to distract from that with
technical digressions.
Your bold claims about a filter being zero-cost and freely available
are not credible. And you haven't produced any evidence of a non-
hypothetical problem with ASI worth solving in strict mode. Yes, there
are gotchas, but they don't seem to bite users -- meanwhile, the web
is full of JS with missing semicolon errors automatically corrected by
ASI.
Truly we could punish such sloppiness and train users who choose
strict mode, if only they'll put up with the training. That's not
credible either. Strict mode could well fail to be used if it is too
pecksniffian or gradgrindian, or whatever the right Dickensian trope
is here (both, probably).
To answer your question about how removing semicolon insertion from strict mode would help to avoid errors, first note that a programmer who habitually added semicolons after every statement would be less likely to make the error above, or similar errors, such as the one I point out in a follow-up to Neil Mix, or the one that was discussed when we were considering '^' as a prefix for lambda expressions.
That's nice, but in practice lack of semicolons is not causing bugs.
Meanwhile, on planet Earth, there's a lot of new and old content
that's missing semicolons, which won't move to strict mode if strict
mode disables ASI.
So, if strict mode normally required semicolons (even if not in the above specific example), and if the strict mode subset were what is normally taught (or self-taught), then the potential for such bugs would be reduced.
The second "if" is a big one.
What is the bug potential? Near zero, as far as I can tell. I welcome
real-world evidence to the contrary.
Here is another example that is discussed in the rationale of the Jacaranda specification:
var a = x
    + y;
Suppose that a programmer mistypes + as ++, for example:
var a = x
    ++ y;
I don't welcome synthetic examples.
Again, what problem are you solving? I hate to add it, but I mean
"real-world problem", one you can show biting deployed code on the
web, or tell a few anecdotes about. Not hypothetical problems from a
spec (ES1 had a few of those too).
On 2008-12-11, at 03:21EST, Brendan Eich wrote:
Again, what problem are you solving? I hate to add it, but I mean
"real-world problem", one you can show biting deployed code on the
web, or tell a few anecdotes about. Not hypothetical problems from a
spec (ES1 had a few of those too).
My name is Joe Hacker, and I have a question.
I am planning on buying into es3.1, and I want to know whether your
semicolon insertion plan is going to prevent me from doing that?
Here are 2 bugs that I hit currently, and an anecdote:
Bug 1:
return
  answerWithSideEffect();
I believe you have fixed that in your campaign proposal. Under the
previous administration's plan, a semi-colon was inserted so this
returned undefined, and never executed the next line, much to my
disappointment.
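A short sketch of Bug 1, assuming the expression sits on the line after `return`. The wrapper function `broken` and the counter `sideEffects` are invented for the demonstration.

```javascript
var sideEffects = 0;

function answerWithSideEffect() {
  sideEffects += 1;
  return 42;
}

function broken() {
  // `return` is a restricted production: a line break directly after it
  // ends the statement, so this is read as `return; answerWithSideEffect();`
  return
    answerWithSideEffect();
}

console.log(broken());    // undefined
console.log(sideEffects); // 0
```

The call on the second line is left behind as unreachable code, so neither the value nor the side effect ever happens.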
Bug 2:
foo = function() {...}
(...)
Under the previous administration, this did not do what I expected,
because I came from a language where semicolons after close braces
were optional. What is your campaign proposal for this issue?
Anecdote 1:
I have a syntax directed editor which pretty much forces me to insert
semicolons, because it does not have a powerful enough analyzer to
figure out where semicolons would be inserted implicitly, and it
relies on semicolons to do its auto-indenting.
How does your campaign plan to address the issue of syntax directed
editors? Will they be burdened with the tax of having to implement a
full parser (that can gracefully recover from incomplete code
fragments) in order to correctly predict implicit semicolons?
On Dec 11, 2008, at 7:33 AM, P T Withington wrote:
On 2008-12-11, at 03:21EST, Brendan Eich wrote:
Again, what problem are you solving? I hate to add it, but I mean
"real-world problem", one you can show biting deployed code on the
web, or tell a few anecdotes about. Not hypothetical problems from
a spec (ES1 had a few of those too).
My name is Joe Hacker, and I have a question.
I am planning on buying into es3.1, and I want to know whether your
semicolon insertion plan is going to prevent me from doing that?
Here are 2 bugs that I hit currently, and an anecdote:
Bug 1:
return
  answerWithSideEffect();
I believe you have fixed that in your campaign proposal. Under the
previous administration's plan, a semi-colon was inserted so this
returned undefined, and never executed the next line, much to my
disappointment.
Glad you mentioned this. The restricted productions for break and
continue to label, and postfix ++ and --, have this potential problem
too, but return\noverlong-expression is the only one that bites. jwz
gave me grief in the ancient days over it. Crock brought it up as the
/casus belli/ behind the desire to remove ASI from 3.1 strict mode when
we were meeting in Oslo this past July.
This is the bath-water to try to throw out, while keeping the ASI baby
in strict mode.
At the Oslo meeting we talked about how the true bug with ASI applied
to return\nexpr was the dead code left in its wake -- the expr; on its
own after the return, unreachable by control flow in the common case.
I proposed analyzing dead code and making it a strict error for return
to orphan an expression statement.
(It's true that sometimes one hacks an early return into code to debug
(binary search a large function for a bug, e.g.), and this would not
be possible in strict mode. The debugger driver who is bisecting would
have to comment out the strict pragma.)
Then at a later meeting (I think the Redmond one in September, which I
missed) it was argued this was too onerous, since ECMA-262 does not
require much compile-time analysis.
There are two general cases (pretend bar() is a really long expression):
1. if (foo) {
     return
       bar();
   }
   baz();
and
2. if (foo)
     return
       bar();
The braced case (1) is clear, because the basic block containing bar()
has only one predecessor, which ends in return. This case can be
detected while parsing -- no control flow graph dominator relation
computation required.
The unbraced case (2) cannot be "corrected" to
if (foo)
return bar();
without analyzing harder (say, trying to prove foo is a constant
truthy value) and even then guessing what is meant. And we should not
guess based on any complicated rules or analyses. So the ES1-3 ASI
rule should apply in this case, resulting in
if (foo)
return;
bar();
Is it really too onerous (upon implementors) for strict mode to make
case (1) an error?
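For concreteness, here is how the two cases behave today under the ES1-3 rule. This is only a sketch: the bodies of bar and baz and the calls array are made up for illustration, and the unbraced function is indented the way the parser actually reads it.

```javascript
var calls = [];
function bar() { calls.push("bar"); return "bar result"; }
function baz() { calls.push("baz"); return "baz result"; }

// Case (1): ASI inserts a semicolon after return, so bar() is dead code
// inside the braced block.
function braced(foo) {
  if (foo) {
    return
      bar();
  }
  return baz();
}

// Case (2): ASI ends the if statement at "return;", so bar() becomes the
// next statement of the function and runs whenever foo is falsy.
function unbraced(foo) {
  if (foo)
    return
  bar();
  return "fell through";
}

var bracedResult = braced(true);      // undefined; bar() never runs
var unbracedTrue = unbraced(true);    // undefined; bar() never runs
var unbracedFalse = unbraced(false);  // "fell through"; bar() runs
```

In case (1) the orphaned bar() can be spotted from the block structure alone. In case (2) it is live code, which is why no error can be raised without guessing at intent.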
Bug 2:
foo = function() {...} (...)
Under the previous administration, this did not do what I expected,
because I came from a language where semicolons after close braces
were optional. What is your campaign proposal for this issue?
I don't propose anything. This is the first non-hypothetical report
I've heard in all the years; sorry it hit you.
While people do leave semicolons off after assignment statements,
including ones with function expressions as the right-hand side
expression, it's rare to start the next statement with (.
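A small sketch of how Bug 2 plays out, assuming the parenthesized text starts the line after the closing brace. The log array and the function body are hypothetical; the report elides both.

```javascript
var log = [];

// No semicolon after the closing brace, and the next line starts with "(".
// "(" is not an offending token, so no semicolon is inserted and the
// function expression is called with the parenthesized value.
var foo = function (arg) {
  log.push("function called with " + typeof arg);
  return "return value";
}
(log.push("parenthesized statement"))

var fooType = typeof foo;  // foo holds the call's result, not the function
```

The argument is evaluated first, then the function runs, and foo ends up bound to whatever the function returned.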
The fact is that C and other languages that do not require ; after a
function definition's closing brace do require it after function
declarations (prototypes), and after other declarations and statements.
Anecdote 1:
I have a syntax directed editor which pretty much forces me to
insert semicolons, because it does not have a powerful enough
analyzer to figure out where semicolons would be inserted
implicitly, and it relies on semicolons to do its auto-indenting.
How does your campaign plan to address the issue of syntax directed
editors? Will they be burdened with the tax of having to implement
a full parser (that can gracefully recover from incomplete code
fragments) in order to correctly predict implicit semicolons?
Yeah, that is the right trade-off. Joe Hacker benefits, Tucker Super-
Hacker works harder. Sorry, but you are l33t and your skilz are better
remunerated than Joe's.
On 2008-12-11, at 13:26EST, Brendan Eich wrote:
Is it really too onerous (upon implementors) for strict mode to make
case (1) an error?
I think Joe the Hacker would be most happy if there were a mode (it
might be called strict) where the parser would say:
"Joe, you wrote two lines of code here that the previous
administration would have inserted a semicolon between without even
consulting you, but which I know could actually go either way. May I
suggest that you either insert the semicolon yourself, move the
newline, or add some parens, so I don't have to worry about what
you might have actually meant?"
Well, it doesn't have to use those exact words, but, rather than
having the rule be "you get a semicolon any time one would fit" or
"you get a semicolon any time one is needed", have the rule be "if a
semicolon would fit, but is not needed, ask the user what they really
meant".
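A very rough sketch of such a check, written as a line-based heuristic rather than a real parser. The function name, the regular expressions, and the warning format are all made up for illustration, and the scan ignores comments, strings, and regular expression literals, so it is an approximation of the rule at best.

```javascript
// Lines starting with one of these characters could continue the line before.
var CONTINUATION_START = /^[(\[+\-*\/,.]/;
// A line consisting of just one of these keywords gets a semicolon inserted.
var RESTRICTED_KEYWORD = /^(return|break|continue)$/;

// Hypothetical helper: report line breaks that "could go either way".
function findAmbiguousLineBreaks(source) {
  var lines = source.split("\n");
  var warnings = [];
  for (var i = 1; i < lines.length; i++) {
    var prev = lines[i - 1].trim();
    var next = lines[i].trim();
    if (prev === "" || next === "") continue;
    if (RESTRICTED_KEYWORD.test(prev) && next.charAt(0) !== "}") {
      // A semicolon is inserted here, but the next line might be the operand.
      warnings.push({ line: i + 1, reason: "follows a bare '" + prev + "'" });
    } else if (!/[;{,]$/.test(prev) && CONTINUATION_START.test(next)) {
      // The previous line could end here, or this line could continue it.
      warnings.push({ line: i + 1, reason: "could continue the previous line" });
    }
  }
  return warnings;
}

var sample = ["a = b", "(c)", "return", "x;", "y = 1;", "z = 2;"].join("\n");
var warnings = findAmbiguousLineBreaks(sample);
```

On the sample, the check flags the line starting with "(" and the line after the bare return, and leaves the explicitly terminated lines alone.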
P T Withington wrote:
On 2008-12-11, at 13:26EST, Brendan Eich wrote:
Is it really too onerous (upon implementors) for strict mode to make case (1) an error?
I think Joe the Hacker would be most happy if there were a mode (it might be called strict) where the parser would say:
"Joe, you wrote two lines of code here that the previous administration would have inserted a semicolon between without even consulting you, but which I know could actually go either way. May I suggest that you either insert the semicolon yourself, move the newline, or add some parens, so I don't have to worry about what you might have actually meant?"
That is precisely what the filter I suggested provides. Of course the same check could be implemented as a compiler warning.
Most people on this list consider automatic semicolon insertion something of a mistake. However, I think this feature would be fine if it were not so "eager" and thus causing all sorts of subtle errors and hampering language evolution (e.g. the ongoing lambda discussion). By eager, I mean that there are too many cases where automatic semicolon insertion takes place.
So here's the proposal:
... f (args) ...
currently parses as:
... f(args) ...;
I propose that this instead parses as:
... f; (args) ...;
Likewise,
... a [args] ...
should parse as:
... a; [args] ...;
Note that:
... f( args ) ...
should still parse as:
... f(args) ...;
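To make the current behaviour concrete, here is a sketch assuming the line break falls immediately before the ( or [. The names table, f, picked, and result are hypothetical.

```javascript
var table = ["zero", "one", "two"];
function f(x) { return "called with " + x; }

// Current rules: the line break is ignored, so this is table[2].
var picked = table
[2]

// Current rules: the line break is ignored, so this is f("args").
var result = f
("args")
```

Under the proposed strict-mode rule, picked would instead be bound to table and result to f, with [2] and ("args") left as separate expression statements.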
... a BINARY_OP b
currently parses as:
... a BINARY_OP b ...;
I propose that this instead parses as:
... a; BINARY_OP b ...;
and thus results in a syntax error, unless BINARY_OP can also be read as a unary operator (e.g. + or -).
Note that:
... a BINARY_OP b ...
should still parse as:
... a BINARY_OP b ...;
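A short sketch of the current behaviour for an operator at the start of a line, assuming that is where the line break sits in the first example. The variable names are hypothetical.

```javascript
var a = 5;
var b = 3;

// Current rules: no semicolon is inserted, so this is a - b.
var difference = a
- b

// Current rules: no semicolon is inserted, so this is a * b.
var product = a
* b
```

Under the proposal, the first statement would end after a, leaving - b as a separate unary expression statement, while the second would be a syntax error because * has no unary form.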
Since expressions starting with unary operators are usually valid yet erroneous code...
+(a = 10) +func() a * func() [a, b, c]
OTOH, if operator overloading is ever added to the language, this will be trickier, since user-defined operator methods may cause side effects.
... ( ... a BINARY_OP b ... ) ...
should still parse as
... ( ... a; BINARY_OP b ... ) ...;
and result in a syntax error. Ditto for [], and {}. I believe the official Ruby interpreter works this way and I haven't heard many complaints about it.
Since this is all clearly incompatible with existing ES3 code, this requires an opt-in: these new rules should only be followed under strict mode.
This is auxiliary to this proposal, but it needs to be stated: There should be a tool that converts ES3 to ES-Harmony in a similar vein to Python's 2to3 tool, that would ease the migration cost of the above and other strict mode changes.