Inner functions and outer 'this' (Re: That hash symbol)
2011/3/26 Claus Reinke <claus.reinke at talk21.com>:
We spent time yesterday at the TC39 meeting not only on shorter syntax but exactly how to support better |this| handling for several distinct use-cases: inner functions that want the outer |this|, callbacks that want a certain |this|, and object methods that want the receiver when called as methods of a given (receiver) object (else possibly a default such as the outer function's |this|).
That reminds me of an old solution looking for this problem.
Back in 1976, Klaus Berkling suggested complementing the lambda calculus with an operator that protects variables from the nearest enclosing binding [1].
The idea is simply that (lexically scoped) variables usually are bound to the next enclosing binding of the same name, while protected (lexically scoped) variables are bound to the next outer enclosing binding of the same name (each protection key skips one level of binding, lexically).
If I may use '#' as a placeholder for a suitable protection key, then this translates to Javascript as
function Outer() {
  var x = "outer";
  function Inner() {
    var x = "inner";
    log(x);    // "inner"
    log(#x);   // "outer"
    log(##x);  // global scope, probably unbound
  }
}
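(For comparison, a minimal sketch of how the same outer-x access has to be spelled today, by renaming at the binding site; the '#' forms above remain hypothetical, and 'log' is defined here just so the snippet runs:)

var log = console.log.bind(console);   // 'log' as used in the example above

function Outer() {
  var x = "outer";
  var outer_x = x;        // manual rename, since '#' does not exist today
  function Inner() {
    var x = "inner";
    log(x);               // "inner"
    log(outer_x);         // "outer" -- what '#x' would give us
    // there is no comparable workaround for '##x' (the global x)
    // short of another rename at the top level
  }
}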
To clarify, if I have the code
foo(#x)
and suddenly I realize that I need to guard it with a condition,
if (bar) { foo(#x) }
this would *not* change the binding of #x since, although I have introduced a new let scoped block, x is not declared in that scope so it is not counted against the stack of #'s.
It is not always easy to count scopes though without knowing a lot of details. E.g. changing
function f() {}
(function () { function () { return f; } })()
requires two hashes
function f() {}
(function () { function f() { return ##f; } })()
since f was introduced both into the body scope, and into the outer scope via the function declaration.
It seems brittle, and it seems that overloading labels to name scopes would provide a less brittle alternative if this really is desired.
myScopeName: {
  let x;
  function f(x) { g(x + myScopeName.x); }
}
The idea is simply that (lexically scoped) variables usually are bound to the next enclosing binding of the same name, while protected (lexically scoped) variables are bound to the next outer enclosing binding of the same name (each protection key skips one level of binding, lexically).
To clarify, if I have the code
foo(#x)
and suddenly I realize that I need to guard it with a condition,
if (bar) { foo(#x) }
this would *not* change the binding of #x since, although I have introduced a new let scoped block, x is not declared in that scope so it is not counted against the stack of #'s.
This would not change the binding of #x, because it does not introduce or remove any bindings for x. Both bindings and protections are specific to variable names.
It is not always easy to count scopes though without knowing a lot of details. E.g. changing
function f() {}
(function () { function () { return f; } })()
requires two hashes
function f() {}
(function () { function f() { return ##f; } })()
since f was introduced both into the body scope, and into the outer scope via the function declaration.
The counting is (usually) just by enclosing binders, not by how many constructs or nested scopes they affect. So, I would count one intervening binding here, and one protection # to keep the binding structure as before:
function f() {} // target this binding
(function () {
  function f() {   // skip this binding
    return #f;
  }
})()
We could have this instead, where there'd be two bindings for f to skip, and two protections to keep the binding structure intact:
function f() {} // target this binding
(function () {
  var f = function f() {   // skip these bindings
    return ##f;
  }
})()
It seems brittle,
Not usually (did the counting of scopes cloud the idea, or am I missing some Javascript-specific problem?) - it has been the basis of soft- and hardware implementations of functional (and functional-logic) languages.
If we can find the binding for an unprotected variable, we can find the binding to skip for a protected variable; one binding and one protection cancel each other; once an unprotected variable has a binding, it is bound.
It is still just the usual lexical scoping in action - the only change is that shadowed bindings can now be reached, by protecting variables from shadowing bindings.
That is all one needs to know about the system at the programmer-level, at least for a language like Javascript, whose operational semantics isn't defined by rewrite rules (*).
Hope this helps, Claus
(*) As a student, I found programming with it very intuitive, even though the language we were given did execute by rewriting (so the protection keys adjusted to dynamically changing program structure, e.g, when a function definition was inlined at a call site);
implementers find it similar to de Bruijn indices (if an implementation renames all variables to be the same, the resulting protection keys correspond directly to indices for stack access), but those are not usually palatable to programmers; en.wikipedia.org/wiki/De_Bruijn_index
when reasoning about programs, or rewriting programs, it means that names are less of a problem, because we can always preserve the binding structure (no more exceptions like "you can wrap code in a function, provided the code does not mention the function name or parameters").
2011/3/26 Claus Reinke <claus.reinke at talk21.com>:
The idea is simply that (lexically scoped) variables usually are bound to the next enclosing binding of the same name, while protected (lexically scoped) variables are bound to the next outer enclosing binding of the same name (each protection key skips one level of binding, lexically).
To clarify, if I have the code
foo(#x)
and suddenly I realize that I need to guard it with a condition,
if (bar) { foo(#x) }
this would *not* change the binding of #x since, although I have introduced a new let scoped block, x is not declared in that scope so it is not counted against the stack of #'s.
This would not change the binding of #x, because it does not introduce or remove any bindings for x. Both bindings and protections are specific to variable names.
It is not always easy to count scopes though without knowing a lot of details. E.g. changing
function f() {}
(function () { function () { return f; } })()
requires two hashes
function f() {}
(function () { function f() { return ##f; } })()
since f was introduced both into the body scope, and into the outer scope via the function declaration.
The counting is (usually) just by enclosing binders, not by how many constructs or nested scopes they affect. So, I would count one intervening binding here, and one protection # to keep the binding structure as before:
function f() {} // target this binding
(function () { function f() { // skip this binding
The line directly above introduces two bindings in two scopes. Did you mean that both of them are skipped, or just one?
return #f; } })()
We could have this instead, where there'd be two bindings for f to skip, and two protections to keep the binding structure intact:
function f() {} // target this binding
(function () {
  var f = function f() {   // skip these bindings
    return ##f;
  }
})()
It seems brittle,
Not usually (did your counting scopes cloud the idea, or am I missing some Javascript-specific problem?) - it has
I have never used such a feature, and thanks for recounting your experiences below. Counting scopes seems brittle to me because changes to intervening scopes can break code that is distant from them. That's, of course, always the case with introducing a variable that may mask.
2011/3/26 Mike Samuel <mikesamuel at gmail.com>:
2011/3/26 Claus Reinke <claus.reinke at talk21.com>:
The idea is simply that (lexically scoped) variables usually are bound to the next enclosing binding of the same name, while protected (lexically scoped) variables are bound to the next outer enclosing binding of the same name (each protection key skips one level of binding, lexically).
To clarify, if I have the code
foo(#x)
and suddenly I realize that I need to guard it with a condition,
if (bar) { foo(#x) }
this would *not* change the binding of #x since, although I have introduced a new let scoped block, x is not declared in that scope so it is not counted against the stack of #'s.
This would not change the binding of #x, because it does not introduce or remove any bindings for x. Both bindings and protections are specific to variable names.
It is not always easy to count scopes though without knowing a lot of details. E.g. changing
function f() {}
(function () { function () { return f; } })()
requires two hashes
function f() {}
(function () { function f() { return ##f; } })()
since f was introduced both into the body scope, and into the outer scope via the function declaration.
The counting is (usually) just by enclosing binders, not by how many constructs or nested scopes they affect. So, I would count one intervening binding here, and one protection # to keep the binding structure as before:
function f() {} // target this binding
(function () { function f() { // skip this binding
The line directly above introduces two bindings in two scopes. Did you mean that both of them are skipped, or just one?
I think I was wrong about a function declaration introducing two bindings.
I thought the function f() {} behaved the same as the var f = function f() { ... } below.
var f = function f() { alert(typeof f); };
alert(typeof f);   // alerts function
var g = f;
f = 3;
alert(typeof f);   // alerts number
g();               // alerts function because the reference to f
                   // inside g does not bind to the global f
This is an interesting idea, never heard of it before. That said, it seems a better start for brainstorming than as an end of it. The previously-mentioned concerns about numbering being fragile seem real to me. Further, how would this interact with eval introducing (or in some systems even removing) lexical bindings? (Or maybe eval doesn't matter if this only applies in a new language mode with no eval-like construct -- I'm not up
function f() {} // target this binding
(function () { function f() { // skip this binding
The line directly above introduces two bindings in two scopes. Did you mean that both of them are skipped, or just one?
This made me feel as if I was missing something, so I went back to the ES spec: by my reading of '13. Function Definition', a FunctionDeclaration for 'f' introduces a binding for 'f' into the scope in which the declaration occurs (in this case, the body of the immediately executed anonymous function). That is one binding.
If we had a named FunctionExpression 'f' instead, that would introduce a binding for 'f' in 'f's body. That would still be one binding.
Since 'f's body is nested in that of the anonymous function, any binding in scope in the outer function's body is also in scope in 'f's body, as long as it isn't shadowed by a more local binding.
Not usually (did your counting scopes cloud the idea, or am I missing some Javascript-specific problem?) - it has
I have never used such a feature, and thanks for recounting your experiences below.
Ok, sorry. It is easy for me to forget that this idea is new to most programmers. Fortunately, in that case, just playing around with the idea will help. I just want to make sure that this isn't hampered by miscommunications.
Counting scopes seems brittle to me because changes to intervening scopes can break code that is distant from them. That's, of course, always the case with introducing a variable that may mask.
Indeed, the problem of shadowing is always there. When one starts rewriting/refactoring programs, that problem becomes more prominent, because one needs to avoid variable capture. And when one starts reasoning about program rewriting, the problem is so central that it needs a solution.
De Bruijn went with abandoning all names, which only works at implementation-level (and presenting users a named form re-introduces the problem); others went with conventions that work only at proof-level (where pesky, but well-understood details can be abstracted away); most went with forcing renaming of bound variables, but this really messes with the programmer-chosen names.
The latter is the issue Javascript programmers are running into with the fixed special name 'this' (with manual renaming a la 'var that = this', to avoid shadowing).
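(As a concrete reminder of that workaround in today's JS - nothing new, just the renaming pattern; 'el' is assumed to be a DOM element:)

function Widget(el) {
  this.count = 0;
  var that = this;               // manual rename of the outer |this|
  el.onclick = function () {
    // inside the handler, |this| is the element, so we go through 'that'
    that.count = that.count + 1;
  };
}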
Berkling's approach was the first that introduced a systematic way of handling binding structure and preventing name capture without interfering with programmer-chosen names. I wouldn't expect Javascript code to use more than '#this' or '##this', but since such expectations never outlive practice, it is good if the system can handle whatever programmers actually do.
Claus
2011/3/26 Jeff Walden <jwalden+es at mit.edu>:
Further, how would this interact with eval introducing (or in some systems even removing) lexical bindings? (Or maybe eval doesn't matter if this only
If, as at esdiscuss/2011-February/012896, harmony is based on ES5 strict, eval cannot add or remove new lexical bindings.
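(A quick runnable illustration of that ES5-strict behaviour - in strict code, a direct eval gets its own variable environment:)

"use strict";
var x = "outer";
eval("var x = 'introduced by eval';");
console.log(x);   // "outer" -- the var inside eval does not leak into the caller's scope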
Further, how would this interact with eval introducing (or in some systems even removing) lexical bindings?
Disclaimer 1: the Berkling-style systems I'm familiar with did not support eval, so I cannot argue from experience here
Disclaimer 2: the reason for this was that unlimited reflection support breaks all equational theories about programs (e.g., toString can distinguish otherwise semantically equivalent programs, so only identical programs can be considered equal if reflection is taken into account), and equational reasoning was core to reduction languages
Eval already breaks lexical scoping, so there is little hope of it having no interactions with extensions of lexical scoping.
One might limit "eval('code')" not to have any effect on the context's lexical scope chain, but that would indeed limit eval's functionality (the eternal conflict between expressiveness and reasoning). Limiting eval/toString to be unaffected by the context's local scope chain would also be possible, but again, that would mean changing a lot of code.
Since all that an implementation or programmer has to go with in the presence of eval is the lexical scope chain at any point of execution, adding protection is probably not going to make matters worse wrt eval. Similarly, toString can already distinguish between codes that differ only by renaming.
Is that sufficiently vague / reassuring?-)
Nevertheless this 35-year-old idea seems fresh in the context of ECMAScript development, and worth thinking about, so thanks for bringing it up.
Always happy to promote a good idea!-)
Claus
The questions about eval look mostly unproblematic to me. In ES5-strict and Harmony, eval is unable to modify its caller's scope. In legacy mode, I imagine the semantics would be pretty straightforward, if problematic; but eval being able to affect its caller's scope is problematic anyway, so it doesn't really bother me.
The bigger issue is that scoping mechanisms in the de Bruijn tradition are brittle for programming. They make sense as intermediate representations or notations for proof frameworks, because they can (sometimes) be easier to reason about formally, but they're fragile in the face of refactoring.
To be fair, your suggestion is more moderate than de Bruijn, although it's not clear whether you're proposing the ability to refer to shadowed bindings of all variables or just |this|. If it's the former, I'm strongly opposed. If it's the latter, well, I guess I'm still pretty opposed, just maybe less strongly. :)
Seriously, the problem you're trying to solve is that |this| is too fragile in the face of refactoring, but you're solving it with a mechanism that's just as sensitive to refactoring. It does make it syntactically simpler to fix than |var self = this|, but the fix is just as brittle to the next refactoring. And people already know how to use |var self = this|, so this would just introduce one more programming pattern you have to teach people for dealing with |this|-capture, but a less robust pattern than the one they already have.
But more broadly, my problem with this suggestion is that it's too drastic a semantic change for the specific problem it's addressing. I much prefer the space we've been exploring that allows for explicit binding of `this'. It's more robust and less disruptive a change to lexical scoping.
Here, can't we use "^" instead of "#"? I don't think it will conflict with the existing bitwise-xor operator. For example:

function Outer() {
  var x = "outer";
  function Inner() {
    var x = "inner";
    log(x);    // "inner"
    log(^x);   // "outer"
    log(^^x);  // global scope, probably unbound
    var save_outer_x = ^x;
    ^^x = "new global value";
  }
}
I am really astonished to hear protection keys being thought of as "brittle" under transformation: that is just the opposite of what they are about!
Executive summary:
- de Bruijn indices are a good assembly language of binding constructs, suitable for automatic transformation, but not suitable for human use
- the classical name-based scheme is suitable for human use, in principle, but is brittle under transformation (many transformations can not be applied without renaming); the more important these transformations become, the less suitable this scheme is, for either humans or machines (because of excessive renaming)
- Berkling's protection keys add flexibility to name-based schemes, *removing* their brittleness under transformation; it combines suitability for automated transformation with readability for humans (in fact, both de Bruijn's and the classical name-based scheme are special cases of Berkling's)
The bigger issue is that scoping mechanisms in the de Bruijn tradition are brittle for programming. They make sense as intermediate representations or notations for proof frameworks, because they can (sometimes) be easier to reason about formally, but they're fragile in the face of refactoring.
Sorry, but you have two directly conflicting statements in one sentence there (unless I misunderstand what you mean by "fragile"?): the point of de Bruijn choosing that representation was to make automatic transformation of scoped terms possible/easy, without breaking proofs. That is the opposite of "fragile" to me.
To be fair, your suggestion is more moderate than de Bruijn, although it's not clear whether you're proposing the ability to refer to shadowed bindings of all variables or just |this|.
Indeed, I wouldn't want to use de Bruijn indices at the programmer level, while Berkling's protection keys enter the frame only when naming alone isn't sufficient.
The problem is wider than 'this', 'this' just seems to be the main use case where the problem surfaces in Javascript practice at the moment.
With scopes and prototypes, Javascript has two separate binding chains, but with prototypes, we can use different prefixes, "(prototype.)*.name", to skip entries in that chain.
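(A rough sketch of that analogy, nothing more - the "(prototype.)*" prefix notation is informal, and Object.getPrototypeOf is used here to walk the chain explicitly:)

function Base() {}
Base.prototype.name = "base";
function Derived() {}
Derived.prototype = Object.create(Base.prototype);
Derived.prototype.name = "derived";          // shadows the entry further up the chain

var d = new Derived();
console.log(d.name);                         // "derived"
console.log(Object.getPrototypeOf(
              Object.getPrototypeOf(d)).name);   // "base" -- one chain entry skipped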
Btw, have you ever wondered whether 'var'-bindings are recursive?
function Outer() { var x = "outer", outer_x = x; function Inner() { var x = [x]; // var bindings are not recursive, are they? log(x + ' - ' + [outer_x]); } Inner(); } Outer();
Will this log '["outer"] - ["outer"]', or '[[[[[[[[...', or something else entirely?
I'm not making proposals yet, I'm just bringing some useful ideas to the attention of those who have to come up with proposals, hoping to make their work easier!-)
Seriously, the problem you're trying to solve is that |this| is too fragile in the face of refactoring, but you're solving it with a mechanism that's just as sensitive to refactoring.
It seems I am still not communicating Berkling's ideas successfully (sorry about that):
his reduction systems were based on refactoring problem descriptions (functional programs) to solution descriptions (values of the functional language). His protection keys and their handling were at the core of every scope-related reduction/refactoring rule in those systems. If protection keys were at all "brittle" in the face of refactoring, none of his systems would ever have worked.
Quite contrary to being "brittle", protection keys introduce the necessary flexibility that allows these systems to avoid breaking the binding structure while automatically rewriting the code.
It does make it syntactically simpler to fix than |var self = this|, but the fix is just as brittle to the next refactoring.
Perhaps it helps to clarify the problem: when we want to transform a program in a way that introduces new bindings where they get in between an existing binding and some of its bound variables (shadowing them), we have to make additional changes to compensate (to keep the original binding structure and program meaning intact). If that is what you mean by "brittle", there is just no way around it.
The only question is how to do the compensation. The 'var self = this' route is in the traditional line, renaming bound variables. For non-'this' use cases, its main drawback is that it changes the names chosen by the programmer. If one does that during automated refactorings, the final program is sometimes not recognizable to the programmer who wrote the initial code (the alternative is for the refactoring tool to balk and ask the programmer to resolve the naming conflict before invoking the refactoring again).
For 'this', the issue is the other way round: the name is not the programmer's choice, the binders for 'this' are implicit (so we cannot rename at the binder), and one cannot rename the variable without losing its special functionality (so we end up with two names instead of one: 'self' is lexically scoped, redirects to 'this', which is more dynamically scoped).
The protection route is non-traditional, yes, but it neatly solves the main issues: instead of renaming bound variables, it just protects variables from getting bound erroneously; instead of having to invent fresh variable names, automatic transformation tools just have to do a little arithmetic.
Claus
I am really astonished to hear protection keys being thought of as "brittle" under transformation: that is just the opposite of what they are about!
Sorry to astonish you. :)
Executive summary:
de Bruijn indices are a good assembly language of binding constructs, suitable for automatic transformation, but not suitable for human use
the classical name-based scheme is suitable for human use, in principle, but is brittle under transformation (many transformations can not be applied without renaming); the more important these transformations become, the less suitable this scheme is, for either humans or machines (because of excessive renaming)
Berkling's protection keys add flexibility to name-based schemes, removing their brittleness under transformation; it combines suitability for automated transformation with readability for humans (in fact, both de Bruijn's and the classical name-based scheme are special cases of Berkling's)
Well, you're just asserting that Berkling's keys remove their brittleness without evidence. And I'm afraid I've seen enough evidence in my experience to make me doubt your claim.
Just take a look at what happens to the definition of alpha-conversion in a language with Berkling's keys. Traditionally, when you alpha-convert a binding x, you can stop the induction whenever you hit an inner binding (shadowing) of x. Not so with your proposal: now you have to keep going -- and decrement an accumulator argument to the algorithm (not exactly obvious!) -- to look for "outer" references within the inner binding.
What does this mean in practice? It means that someone refactoring at an outer level may be unaware that they've affected the index of an inner reference by changing a binding. It means you lose some of the locality that you get with traditional lexical scope.
The bigger issue is that scoping mechanisms in the de Bruijn tradition are brittle for programming. They make sense as intermediate representations or notations for proof frameworks, because they can (sometimes) be easier to reason about formally, but they're fragile in the face of refactoring.
Sorry, but you have two directly conflicting statements in one sentence there (unless I misunderstand what you mean by "fragile"?): the point of de Bruijn choosing that representation was to make automatic transformation of scoped terms possible/easy, without breaking proofs. That is the opposite of "fragile" to me.
No contradiction. A machine is much better at consistently shifting big collections of integer indices without making a mistake. As you acknowledged, de Bruijn notation is not fit for humans.
The problem is wider than 'this', 'this' just seems to be the main use case where the problem surfaces in Javascript practice at the moment.
And the reason for this is that |this| is implicitly bound. Which, as I said in my last email, is a good motivation for considering making it possible to explicitly bind |this|. IOW, there are other solutions than the one you describe that address the problem. And I believe your solution is worse than the problem.
With scopes and prototypes, Javascript has two separate binding chains, but with prototypes, we can use different prefixes, "(prototype.)*.name", to skip entries in that chain.
Prototype chains and lexical environments are two totally different semantic constructs -- only one is used for scope. They shouldn't be confused.
Btw, have you ever wondered whether 'var'-bindings are recursive?
This is a really basic question about JS, and not appropriate for this list.
Seriously, the problem you're trying to solve is that |this| is too fragile in the face of refactoring, but you're solving it with a mechanism that's just as sensitive to refactoring.
It seems I am still not communicating Berkling's ideas successfully (sorry about that):
No, I understand the semantics you describe. I think you just haven't understood what I'm saying. I'm not talking about automatic rewriting. All of these representations of binding structure (names, de Bruijn indices, the "locally nameless" approach, the Berkling representation you describe) are different ways of representing graphs as trees. For the sake of formal systems, they're all equally expressive. The problem is that some of them are more susceptible to human error than others.
If you spend time working with tree manipulation languages like XPath, you find that being able to refer to "parent" links makes your program sensitive to fine-grained changes in the tree structure, which means that refactoring is more likely to end up making mistakes.
If protection keys were at all "brittle" in the face of refactoring, none of his systems would ever have worked.
Nonsense-- there are all sorts of technologies that work and are brittle. Ask me over a beer some time, and I'll name a few. ;)
Quite contrary to being "brittle", protection keys introduce the necessary flexibility that allows these systems to avoid breaking the binding structure while automatically rewriting the code.
I think the key to the misunderstanding here is the word "automatically." Of course you can automatically refactor just about any representation of variable binding. The question with a given representation is whether it's sensitive to small changes introduced by ordinary human refactoring (i.e., the everyday experience of programming) and therefore more error-prone.
On Mar 27, 2011, at 11:13 AM, David Herman wrote:
To be fair, your suggestion is more moderate than de Bruijn, although it's not clear whether you're proposing the ability to refer to shadowed bindings of all variables or just |this|. If it's the former, I'm strongly opposed. If it's the latter, well, I guess I'm still pretty opposed, just maybe less strongly. :)
Dave, I think that applying this solution to the this-scoping issue may be a really good idea.
Claus, thanks for bringing a new (old) idea into the discussion.
I agree with Dave about all the fragility issues he mentioned for the general case. Also, I just don't see the general problem (if it is even a "problem") as one that is important enough to try to "fix" in the language since it is probably rare and it can be avoided by careful naming.
However, the specific case of 'this' is a different matter. 'this' is implicitly rebound in every function scope and the JavaScript programmer has no direct control over the naming and shadowing. The result is that they have to know and use the self renaming pattern.
This issue shows up enough in defining objects via object literal methods and in methods that call higher order functions that it is something that may well be worth addressing in the language. Particularly as we discuss adding additional declarative forms to Harmony that included nested method definitions. The outer 'this' problem is limited enough that we can avoid things like dynamic scoping complications in the solution. Also it is limited enough that I don't believe that the solution imposes refactoring complications.
Here is a sketch of a proposal:
^this is added as a new lexical token of the language. When spoken, it is pronounced as "outer this". In the expression grammar ^this is a primary expression.
It is a syntax error for ^this to appear outside of a function body. It may not occur at the top level of a program.
When evaluated, the value of ^this is the this binding of the function that immediately lexically encloses the function body that contains the ^this. It is an early syntax error if there is no such function. For example:
//at the top level
var self = ^this;    //syntax error, at the top level

function foo() {
   ^this.bar();      //syntax error, no enclosing function
}
The two primary use cases for ^this are exemplified by the following two examples:
MyObj.prototype.addClickHandingForElement = function (elem) {
   elem.addEventListener('click', function (e) {^this.handleClick(this,e)});
}
MyObj.prototype.wrap = function () {
   // create a wrapper object that limits access to the properties of one of my objects
   return {
      name: this.id,                    //fix name of wrapper at creation (uses this of wrap call)
      get foo () {return ^this.foo},    //outer this is this of wrap call
      set bar(val) {^this.bar = val}    //outer this is this of wrap call
   };
}
Note that only one level of outer this access is supported; ^^this would be a syntax error. In the rare cases where somebody really needs deeper access to shadowed this bindings, they can fall back to the self pattern.
I see minimal refactoring hazards here as the outer scope reference is limited to one level, is explicitly marked at the usage site, and only applies to this.
Possible reservations: This use of ^ probably would preclude its use as a short form of the 'return' keyword (for those people who aren't in favor of adding implicit returns). It looks odd to old Smalltalk programmers.
Overall, I really like ^this as a narrow solution to a specific real usage problem. I'm interested in reactions and unless somebody thinks of something that seriously torpedoes it I will probably write it up as a strawman.
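(For reference, a sketch of roughly what those two examples amount to in today's JS using the self pattern - this is just the existing workaround restated, not part of the proposal:)

MyObj.prototype.addClickHandingForElement = function (elem) {
   var self = this;
   elem.addEventListener('click', function (e) {
      self.handleClick(this, e);      // 'this' here is still the clicked element
   });
};

MyObj.prototype.wrap = function () {
   var self = this;
   return {
      name: this.id,                    // evaluated at wrap-call time
      get foo () {return self.foo},     // stands in for ^this.foo
      set bar(val) {self.bar = val}     // stands in for ^this.bar = val
   };
};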
On Mon, Mar 28, 2011 at 10:35, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
Overall, I really like ^this as a narrow solution to a specific real usage problem. I'm interested in reactions and unless somebody thinks of something that seriously torpedoes it I will probably write it up as a strawman.
I like ^this.
We (Chromium/V8) discussed introducing 'self' as a way to get the lexically bound 'this'. The main issue we could think of was that it might be hard for users to know when to use '^this' vs when to use 'this'.
On Mon, Mar 28, 2011 at 10:16 PM, Erik Arvidsson <erik.arvidsson at gmail.com> wrote:
On Mon, Mar 28, 2011 at 10:35, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
Overall, I really like ^this as a narrow solution to a specific real usage problem. I'm interested in reactions and unless somebody thinks of something that seriously torpedoes it I will probably write it up as a strawman.
I like ^this.
We (Chromium/V8) discussed introducing 'self' as a way to get the lexically bound 'this'. The main issue we could think of was that it might be hard for users to know when to use '^this' vs when to use 'this'.
There seem to be a bunch of different but related suggestions here, some of which seem more useful than others:
- |^this| as a new special token that always refers to the lexical meaning of |this|.
- |^this| as a new special way to get the version of |this| bound in the next outer scope. I believe that this is sometimes different from 1 (maybe just at the top level).
- ^ as a general way to move out a scope, but restricted to |this|
- ^ as a general way to refer up the scope chain, for any identifier including |this|.
Personally, I think that a way to name the implicit binding of the receiver would be better than adding more hardcoded names to the standard. The |^this| proposals seem problematically implicit, especially since we had quite reasonable suggestions (at the meeting at least) for naming |this| explicitly instead.
On Mar 28, 2011, at 8:30 PM, Sam Tobin-Hochstadt wrote:
On Mon, Mar 28, 2011 at 10:16 PM, Erik Arvidsson <erik.arvidsson at gmail.com> wrote:
On Mon, Mar 28, 2011 at 10:35, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
Overall, I really like ^this as a narrow solution to a specific real usage problem. I'm interested in reactions and unless somebody thinks of something that seriously torpedoes it I will probably write it up as a strawman.
I like ^this.
We (Chromium/V8) discussed introducing 'self' as a way to get the lexically bound 'this'. The main issue we could think of was that it might be hard for users to know when to use '^this' vs when to use 'this'.
There seem to be a bunch of different but related suggestions here, some of which seem more useful than others:
- |^this| as a new special token that always refers to the lexical meaning of |this|.
not in my proposal
- |^this| as a new special way to get the version of |this| bound in the next outer scope. I believe that this is sometimes different from 1 (maybe just at the top level).
almost, but not quite what I am proposing. ^this refers to the this binding of the function that lexically encloses the containing function. this remains a reserved identifier and is not rebindable by block level declarations. ^this is illegal at the top level or directly in the body of a top level function, because in neither case is there a useful outer this value to reference.
- ^ as a general way to move out a scope, but restricted to |this|
not in my proposal
- ^ as a general way to refer up the scope chain, for any identifier including |this|.
not in my proposal.
(I think I'm the only one to use the syntax ^this in a proposal, so I'm not sure where 1, 3, 4 (at least using ^this syntax) came from.)
Personally, I think that a way to name the implicit binding of the receiver would be better than adding more hardcoded names to the standard. The |^this| proposals seem problematically implicit, especially since we had quite reasonable suggestions (at the meeting at least) for naming |this| explicitly instead.
The reason I really like ^this is it co-exists very nicely with the existing fixed implicit this binding. It addresses the primary scoping issue that arises from that implicit binding. Unlike the explicit this naming forms that have been discussed, it would work in all function definition forms without adding any new header syntax to any of the function definition forms.
On Tue, Mar 29, 2011 at 4:16 AM, Erik Arvidsson <erik.arvidsson at gmail.com>wrote:
We (Chromium/V8) discussed introducing 'self' as a way to get the lexically bound 'this'. The main issue we could think of was that it might be hard for users to know when to use '^this' vs when to use 'this'.
'self' will almost definitely break code and coding habits, whereas '^this' might but is very unlikely to.
On Mar 28, 2011, at 10:35 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
On Mar 27, 2011, at 11:13 AM, David Herman wrote:
To be fair, your suggestion is more moderate than de Bruijn, although it's not clear whether you're proposing the ability to refer to shadowed bindings of all variables or just |this|. If it's the former, I'm strongly opposed. If it's the latter, well, I guess I'm still pretty opposed, just maybe less strongly. :)
Dave, I think that applying this solution to the this-scoping issue may be a really good idea.
Claus, thanks for bringing a new (old) idea into the discussion.
I agree with Dave about all the fragility issues he mentioned for the general case. Also, I just don't see the general problem (if it is even a "problem") as one that is important enough to try to "fix" in the language since it is probably rare and it can be avoided by careful naming.
However, the specific case of 'this' is a different matter. 'this' is implicitly rebound in every function scope and the JavaScript programmer has no direct control over the naming and shadowing. The result is that they have to know and use the self renaming pattern.
This issue shows up enough in defining objects via object literal methods and in methods that call higher order functions that it is something that may well be worth addressing in the language. Particularly as we discuss adding additional declarative forms to Harmony that included nested method definitions. The outer 'this' problem is limited enough that we can avoid things like dynamic scoping complications in the solution. Also it is limited enough that I don't believe that the solution imposes refactoring complications.
Here is a sketch of a proposal:
^this is added as a new lexical token of the language. When spoken, it is pronounced as "outer this". In the expression grammar ^this is a primary expression.
It is a syntax error for ^this to appear outside of a function body. It may not occur at the top level of a program.
When evaluated, the value of ^this is the this binding of the function that immediately lexically encloses the function body that contains the ^this. It is an early syntax error if there is no such function. For example:
//at the top level
var self = ^this;    //syntax error, at the top level

function foo() {
   ^this.bar();      //syntax error, no enclosing function
}
The two primary use cases for ^this are exemplified by the following two examples:
MyObj.prototype.addClickHandingForElement = function (elem) {
   elem.addEventListener('click', function (e) {^this.handleClick(this,e)});
}
This does provide an elegant solution, but perhaps for a somewhat uncommon problem. Today I think most would do the following (if not using self):

   MyObj.prototype.addClickHandingForElement = function (elem) {
      elem.addEventListener('click', this.handleClick.bind(this));
   }

where the event.target would be used in the context of handleClick. I've noticed that CoffeeScript uses => to autobind, but not to disambiguate. One possible alternative would be to have a keyword modifier, similar to your suggested method, called "callback" which autobinds this. From a declarative viewpoint it would be clear which methods within the type were intended to be used in this way, although it would not handle the inner/outer this use case. Your second example below I would have done as:
MyObj.prototype.wrap = function () {
   return (function (wrapper) {
      // create a wrapper object that limits access to the properties of one of my objects
      return {
         name: wrapper.id,                  //fix name of wrapper at creation
         get foo () {return wrapper.foo},   //wrapper is the this of the wrap call
         set bar(val) {wrapper.bar = val}   //wrapper is the this of the wrap call
      };
   })(this);
}
even if ^this were available, since I would find it clearer. At least in my experience I do not run into many scenarios where both this and ^this are accessed in the same context, but I do use/find many scenarios where callbacks should be bound, especially when custom events are used liberally or within Node. In those cases where I need both, I tend to curry the inner this and provide it as a parameter to the outer callback. Somewhat contrived for the first example, since elem is available in the event, but it would look like

   elem.addEventListener('click', this.handleClick.bind(this).curry(elem));

assuming curry was available on the Function prototype.
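(Function.prototype.curry is not a standard method; a minimal sketch of the helper assumed above could look like this - it simply prepends arguments to a function:)

Function.prototype.curry = function () {
   var fn = this;
   var pre = Array.prototype.slice.call(arguments);
   return function () {
      var rest = Array.prototype.slice.call(arguments);
      return fn.apply(this, pre.concat(rest));
   };
};

// so handleClick above would receive (elem, event):
// elem.addEventListener('click', this.handleClick.bind(this).curry(elem));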
On Tue, Mar 29, 2011 at 12:12 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
(I think I'm the only one to use the syntax ^this in a proposal, so I'm not sure where 1, 3, 4 (at least using ^this syntax) came from.)
Well, I wasn't sure if you meant 1 or 2, and other people had suggested 3 and 4 (but I think using # instead of ^).
Personally, I think that a way to name the implicit binding of the receiver would be better than adding more hardcoded names to the standard. The |^this| proposals seem problematically implicit, especially since we had quite reasonable suggestions (at the meeting at least) for naming |this| explicitly instead.
The reason I really like ^this is it co-exists very nicely with the existing fixed implicit this binding. It addresses the primary scoping issue that arises from that implicit binding. Unlike the explicit this naming forms that have been discussed, it would work in all function definition forms without adding any new header syntax to any of the function definition forms.
I agree entirely that it goes with the existing fixed implicit |this| binding -- I just think that cuts the other way. The reason we're having this discussion is that the existing behavior of |this| isn't always what you want, and is hard to get around because of its fixed and implicit nature. I think we should alleviate that problem, not just the worst symptom.
On 2011-03-29, at 08:52, Sam Tobin-Hochstadt wrote:
I agree entirely that it goes with the existing fixed implicit |this| binding -- I just think that cuts the other way. The reason we're having this discussion is that the existing behavior of |this| isn't always what you want, and is hard to get around because of its fixed and implicit nature. I think we should alleviate that problem, not just the worst symptom.
Way back in [...] I raised the |this| problem: When you write a function you can choose the names of all your parameters (for maximum legibility of your code) except the implicit one, where you are forced to accept the name |this|. If you could specify a different name, specifying which implicit binding you meant in the presence of multiples would be simplified.
I won't propose a syntax for specifying an alternative name for |this|, for fear of being taken out to the (bike)shed and getting caned, but I do think it worth considering: why must that implicit parameter have a fixed name?
This is what Sam is referring to -- we've been talking about exactly such a feature. I continue to believe that something like the ^this feature we've been talking about is as likely to introduce bugs as it is to fix bugs. It's like special language support for off-by-one errors.
Dave
PS A propos of nothing, the ^this syntax probably doesn't work because of ASI; try parsing:
x = y
^this.foo()
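(To see the point, here is the same shape with an ordinary object standing in for |this|, just to make it runnable - today no semicolon is inserted, because the second line can continue the expression, so it parses as a bitwise xor:)

var y = 5;
var obj = { foo: function () { return 3; } };

var x = y
^obj.foo();          // parses as: x = y ^ obj.foo()

console.log(x);      // 6 (that is, 5 ^ 3) -- not two separate statements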
On Mar 29, 2011, at 7:26 AM, David Herman wrote:
This is what Sam is referring to -- we've been talking about exactly such a feature.
Sorry if that wasn't clear: at the last face-to-face we talked about allowing you to give your own custom name for the |this|-parameter, so that you could continue to refer to outer-bound |this|. This would use good ol' lexical scope, instead of exotic scope-crawling indices.
On Mar 29, 2011, at 7:26 AM, David Herman wrote:
PS A propos of nothing, the ^this syntax probably doesn't work because of ASI; try parsing:
x = y
^this.foo()
I specified that "^this" was a lexical token, so ASI should work fine. But you would have to say y ^ /*whitespace*/ this.foo() if you wanted xor, which is a breaking syntax change.
I suppose we could use the universal # solution and make it #^this although that is less visually attractive.
I am really astonished to hear protection keys being thought of as "brittle" under transformation: that is just the opposite of what they are about!
Sorry to astonish you. :)
No problem!-) I was just afraid of the idea being misunderstood or being associated with fear-inducing phrasing, either of which can easily derail a discussion in which good ideas can otherwise speak for themselves.
By the time we have agreed to what extent the concerns remain valid or can be alleviated, the rest of the community may well have stampeded away in panic! Not that this group necessarily works that way, of course, just general human nature:-)
- Berkling's protection keys add flexibility to name-based schemes, removing their brittleness under transformation; it combines suitability for automated transformation with readability for humans (in fact, both de Bruijn's and the classical name-based scheme are special cases of Berkling's)
Well, you're just asserting that Berkling's keys remove their brittleness without evidence.
Ah, ok - I have been trying to summarize the idea on its own, using evidence-by-logic rather than by-reference or by-example, to avoid going through the history of Berkling style reduction systems on this list (it has been a while, but I would be happy to explain a bit if that would be of interest/appropriate here[*]).
From your other comments I infer that you understand how the scheme works, but also that you are still lumping together the various related schemes that have been used in this context. The latter might keep you from seeing what is different about Berkling's scheme, and hence how the brittleness is removed, by straightforward technical means (no magic or subjectivity).
Since this seems to be relevant information for a language- specific design/spec discussion list talking about bindings, I'll try to outline the process (even as a rough proof outline, it beats more examples, as far as evidence is concerned). For those of you not interested in reduction: the same issues arise in refactoring. From memory (please excuse any errors):
Church's original lambda-K calculus was meant to pin down the scoping issues that mathematicians and logicians encountered; it had partially defined conversion rules (in case of naming conflicts, the rules were simply not applicable), so one had to choose a suitable sequence of renamings (alpha rule) and conversions (beta rule) to ensure that such conflicts and rule failures were avoided. That is brittle in the strictest sense.

Later presentations of lambda-calculus mixed a bit of renaming into conversion (either in beta reduction or in substitution). That leads to uncontrolled renamings of bound variables, to bypass the conflicts. That is not nice, and it is brittle wrt programmer-chosen names.

Berkling's version just acknowledges that lambda calculus is missing some of the terms that would allow one to represent the binding structures that can arise during conversion. The consequence of having too few terms to represent all intermediate results of reduction is that one either has to avoid the conversion sequences that lead into those holes (leading to partially defined reduction rules), or one has to put the available terms to double use (leading to conflicts between programmer intent and implementation needs, as far as names are concerned). Adding the missing terms gives the manoeuvring room to define the conversion rules fully, without gaps and without having to resort to renaming. So brittleness under transformation, in both senses, is avoided.
De Bruijn's scheme doesn't help there: it has even fewer terms than Church's (only a single term representing all alpha-equivalent ones). Similar shortcomings seemed to apply to most other related schemes I've seen - most of them work technically but fail pragmatically.
Berkling's extra terms do not add error-prone complexity in terms of binding structure, because other terms with equivalent binding structure already exist in Church's version. The extra terms are just designed in such a way that both binding structure and programmer-chosen names can be preserved during program transformations.
And I'm afraid I've seen enough evidence in my experience to make me doubt your claim.
Does the above help? We have to be careful not to confuse "the new thing is different" with "the new thing is brittle". In this case, "the new system has more complex rules" can actually mean "the new system is easier to use in practice", because the new rules have fewer exceptions.
Just take a look at what happens to the definition of alpha- conversion in a language with Berkling's keys. Traditionally, when you alpha-convert a binding x, you can stop the induction whenever you hit an inner binding (shadowing) of x. Not so with your proposal: now you have to keep going -- and decrement an accumulator argument to the algorithm (not exactly obvious!) -- to look for "outer" references within the inner binding.
Indeed. That is a direct consequence of adding the "missing" terms (terms in which parts of the binding structure are expressed through protection keys) - the old conversion rules arise out of the new ones as (premature) optimizations for the case that no protection keys are present at any stage.
But that the rules have to do more work to cover the additional terms does not make the system "brittle" in any way - and it is quite unambiguous what needs to be done.
Also, the shortcuts can be recovered as compile-time optimizations: if your program starts out without protection keys, the old rules still apply; and the new rules only introduce protection keys if the old rules would have failed. So, if you want to stay in the protection-free subset of the language, you can do so. You do not have to pay if you don't use the additional flexibility.
In fact, even the "extra work" is misleading: we could represent the same binding structure by choosing different names, and then the corresponding alpha-conversion/renaming would have to do the same amount of work.
Example: if we enter a new function body, the outer 'this' is not accessible directly, so any renaming of 'this' (if it were feasible) could stop there; but since programmers have adapted to that by renaming 'this' to 'that' or 'self' or .., we wouldn't be renaming 'this', we would be renaming 'that' or 'self' or .., and that renaming would have to 'look for "outer" references within the inner binding'.
It needs a shift of perspective to see the old system as an incomplete realization of the lambda-calculus ideas, but once that is done, it seems clear why and how the old system is brittle (missing terms, partially defined rules) and how the new system is more complete/consistent.
I tend to mention de Bruijn not just because his scheme predates Berkling's but because it helps people to understand how protection keys work. However, since you already understand that, I recommend that you think about the differences to de Bruijn, rather than the similarities; in particular, that shift-of-perspective is a rewarding exercise!
Since it has been so long, allow me to refer to a younger self:-) Section "2.2 lambda-calculi" in [1]. That was just my own way of formalizing Berkling's system as part of the introductory background for my own work.
As such, it lacks formal proofs and details, but Figures 2.2-2.5 spell out what you describe, and Theorem 2.1 outlines how Church's calculus emerges if one omits terms involving protection keys. As an introduction, it is way too brief to touch on the wider world of Berkling's ideas[*], but it has the advantage of being readily available online.
What does this mean in practice? It means that someone refactoring at an outer level may be unaware that they've affected the index of an inner reference by changing a binding.
Then that is not a refactoring, is it?-) I see what you mean, but you already have that problem if you add or remove a binding now. As you said yourself: the binding structure is just represented differently.
It means you lose some of the locality that you get with traditional lexical scope.
<off topic>
It would actually be possible to recover even the bit of locality that you refer to, but that would need a more drastic language change (the non-locality comes because the protection keys are attached directly to variable occurrences, so adjusting protection is non-local in the same sense as traditional substitution is; the remedies for both non-localities work along similar lines: propagate the information step-by-step instead of all-at-once, move representations from meta-language to object-language).
I doubt that this generalization would ever fly with pragmatic programmers, but for language designers and implementers it is interesting: if we make the protection key manipulation explicit in the language (rather than the meta-language, as most formalizations do), we can perform compile-time optimizations on them (similar to compile-time garbage collection or type-checking).
I seem to recall that I once tried to derive the more efficient supercombinator reduction from the more general beta reduction rules (if we make the protection key manipulations explicit instead of pushing them right down to variable occurrences, some of them cancel out, and never need to be pushed into the expression). </off topic>
A machine is much better at consistently shifting big collections of integer indices without making a mistake.
Yes, also at managing scopes during refactoring. I prefer tool support for refactoring (I was part of the team that built such for Haskell98[2]), but given Javascript's nature, it would be hard to prove any refactorings correct. So one will have to resort to a combination of tool support and testing - still preferable to manual refactoring (which most of us are doing so far).
And refactoring tools can profit from having a programming language in which program transformations can be expressed without having to force renamings.
The problem is wider than 'this', 'this' just seems to be the main use case where the problem surfaces in Javascript practice at the moment.
And the reason for this is that |this| is implicitly bound. Which, as I said in my last email, is a good motivation for considering making it possible to explicitly bind |this|.
I'm not against such attempts: thinking of 'this' as implicitly bound is already much nicer than thinking of it as dynamically bound, and being able to make the implicit binding explicit (so that we can choose another name than 'this') would be a nice option to have, even if protection keys had any chance.
However, any changes in that area would also touch the core of what Javascript is about. There is no free lunch: if the core language needs to be changed, that is going to have wide-ranging effects. That does not mean that such a change is not worth the cost.
Btw, have you ever wondered whether 'var'-bindings are recursive?
This is a really basic question about JS, and not appropriate for this list.
Ah, perhaps I was too implicit. Let me make the design issue clearer, and how it relates to our discussion:
function Outer() {
  var prefix = "Outer>";
  function Inner () {
    var prefix = prefix+"Inner>";
    // this does not work;
    // neither the outer nor the inner definition are
    // accessible on the rhs, only the uninitialized
    // inner binding
  }
}
I know definition constructs in which the left-hand side is available on the right-hand side (recursive value definitions in other languages, function definitions in Javascript), and I know definition constructs in which it isn't, but any previous value is (assignment in Javascript, non-recursive definitions in other languages); in all these constructs, either the current or the previous definition is available on the right-hand side;
Javascript's 'var' is the only construct I know (apart from memory-model bugs in Java, perhaps) in which the left-hand side name is forced to be undefined on the right-hand side; which means that neither the current nor the previous definition are available on the right-hand side, because the undefined state shadows both!
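(A quick runnable check of that behaviour, with console.log standing in for log:)

function Outer() {
  var prefix = "Outer>";
  function Inner() {
    var prefix = prefix + "Inner>";
    console.log(prefix);   // "undefinedInner>" -- the inner var is hoisted,
                           // so the rhs sees undefined rather than "Outer>"
  }
  Inner();
}
Outer();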
One could argue that the interim state (bound, but not initialized) should not be exposed in an initializer, and I would agree, but with protection keys, at least, this should work:
var prefix = ^prefix+"Inner>"; // "Outer>Inner>"
I hope that explains why I raised this issue on this list?
.. I'm not talking about automatic rewriting. All of these representations of binding structure (names, de Bruijn indices, the "locally nameless" approach, the Berkling representation you describe) are different ways of representing graphs as trees. For the sake of formal systems, they're all equally expressive. The problem is that some of them are more susceptible to human error than others.
While that is true, I did not find Berkling's system any more error-prone than Church's in practice. As a student, I implemented a small functional language - parser, compiler, runtime system, based on the G-machine papers, in a Berkling-style reduction language, and so did the other students in that course - protection keys were not a problem (that was a semester-long project, following on from another semester-long introduction to functional programming, so we had ample time and opportunity to play with the system).
I only have small-scale experience with non-Berkling lambda calculi implementations, but I always found the automatic renamings somewhere between confusing and annoying (both as a user and as an implementer).
During our project implementing the Haskell Refactorer HaRe[2], I often found the lack of protection keys in Haskell limiting for our refactoring design choices - either we had to initiate renamings that the programmer had not asked for, to make the intended refactoring go through, or we had to abandon the refactoring with an error message indicating the naming conflict, asking the programmer to do a rename first, then try again..
If protection keys were at all "brittle" in the face of refactoring, none of his systems would ever have worked.
Nonsense-- there are all sorts of technologies that work and are brittle. Ask me over a beer some time, and I'll name a few. ;)
I assume some of them have been tested on students or similarly suitable and unsuspecting subjects?-)
Quite contrary to being "brittle", protection keys introduce the necessary flexibility that allows these systems to avoid breaking the binding structure while automatically rewriting the code.
I think the key to the misunderstanding here is the word "automatically." Of course you can automatically refactor just about any representation of variable binding. The question with a given representation is whether it's sensitive to small changes introduced by ordinary human refactoring (i.e., the everyday experience of programming) and therefore more error-prone.
"automatic rewriting" is something humans can do, too: complex tasks can be made less error-prone by having a consistent algorithm to follow, instead of one consisting of simple rules with lots of exceptions.
For reference, and just as a subjective impression: students would get protection key manipulation wrong, but they would also get name capture wrong - the difference seemed to be that explaining the latter took longer than explaining the former. I would be tempted to generalize that key manipulation errors tended to be problems of execution, while name capture errors tended to be problems of understanding, but I have no data to back such intuition.
But the real answer to your question is that protection keys do not introduce new problems here: in closed programs, the same binding structures can be expressed without protection keys, so all the small changes and sensitivities already arise without them.
Of course, protection keys would give Javascript obfuscators a new toy to play with, but they have ample toys already. So, I am only referring to normal use in absence of reflection.
Claus
[1] community.haskell.org/~claus/publications/phd.html
[2] www.cs.kent.ac.uk/projects/refactor-fp ( www.youtube.com/watch?v=4I7VZV7elnY )
[*] Only for those of you who wanted more references or evidence of use:
I learned about Berkling's ideas mostly in Werner Kluge's research group (I only met Berkling when he visited Kluge). Kluge tended to be more interested in how language design influenced paradigm-specific implementations (both abstract and real machines), but he is still the best source of references and explanations about Berkling's work. Kluge retired long ago, but has kept some of his old pages around:

http://www.informatik.uni-kiel.de/inf/Kluge/
http://www.informatik.uni-kiel.de/~wk/papers_func.html

For a very brief history and some references, see the related work section in Kluge's "Abstract Lambda Calculus Machines", 2nd Central European Functional Programming School, CEFP 2007, LNCS 5161 (2008), pp. 112-157 (available online from those urls). That last section is an extract from one of Kluge's textbooks on the subject.

The PI-RED systems were the implementations of the Kiel Reduction Language, based on Berkling's reduction languages, so for reports on usage and implementation of a Berkling-style language, you might find the online versions of the following relevant (available via the second url above):

"A User's Guide for the Reduction System PI-RED"
Kluge, W., Internal Report 9409, Dept. of Computer Science, University of Kiel, Germany, 1994

"Using PI-RED as a teaching tool for functional programming and program execution"
Kluge, W., Rathsack, C., Scholz, S.-B., Proc. Int. Symp. on Functional Programming Languages in Education, Nijmegen, The Netherlands, 1996, LNCS 1022, (1996), pp. 231-250

"PI-RED+ An interactive compiling graph reducer for an applied lambda-calculus"
Gaertner, D., Kluge, W. E., Journal of Functional Programming Vol. 6, No. 5, (1996), pp. 723-756
On Mar 29, 2011, at 7:28 AM, David Herman wrote:
On Mar 29, 2011, at 7:26 AM, David Herman wrote:
This is what Sam is referring to -- we've been talking about exactly such a feature.
Sorry if that wasn't clear: at the last face-to-face we talked about allowing you to give your own custom name for the |this|-parameter, so that you could continue to refer to outer-bound |this|. This would use good ol' lexical scope, instead of exotic scope-crawling indices.
Yes, and there was considerable debate at the meeting on the desirability of the |this| solution (see Waldemar's notes). ^this is an alternative proposal.
We probably need to directly compare the two and some of the objections to each.
Overview of proposals
this| - mandatory explicit naming of implicit this parameter
The original discussion in January added explicit naming and lexical scoping of the implicit "this" parameter to # function declarations. In January we didn't settle on a syntax, but in March we seem to be converging on something like:

#foo(self | arg1, arg2) {...}

In the body of the above function, "self" would be bound to the implicit this parameter of the function, and "this" would resolve to any lexically enclosing declaration of "this", or would be undefined if there were none.
It isn't clear whether or not arbitrary declarations of "this" would be allowed within such functions or whether "this" could only be bound in a "this |" context.
I believe that the intent of the proposal was that if you actually want to use "this" to refer to the implicit parameter you would have to explicitly declare it as: #foo(this | arg1, arg2) {...} and that a declaration such as #foo( arg1, arg2) {...} would not have a local binding for "this" and its implicit parameter would be inaccessible.
In the original discussion the "this|" declaration was only applied to #-functions. All other function declaration forms would continue to reserve the meaning of "this" as a reference to the implicit parameter of the immediately enclosing function.
However, using the "this|" syntax it would probably be possible to extend the already existing function declaration forms: function foo(self | arg1) {}; ({get prop(self |) {} }) new Function("self | arg1", "return 42'); etc. However for backwards compatibility, if the "|" was not present in these legacy forms they would have to fall back to the legacy implicit this binding.
^this - explicit one-level outer this reference
In this proposal "^this" is a lexical token that may occurs as a PrimaryExpession. It is interpreted as a reference to the implicit this parameter of the outer function that immediately encloses the function body containing the "^this" token. In other words, it access the implicit this binding that is lexically one level further out than the current implicitly binding of "this".
It is an early error to use "^this" in any context where there isn't an outer function that encloses the "^this" reference.
"this" remains a reserved word in all function declaration forms that refers to the implicit this parameter. It can not be used as the identifier in any declaration.
Pros and Cons for Each Proposal
this| Pros
- It makes the implicit this parameter explicit.
- Normal lexical scoping rules apply to "this".
- It is similar to Python and perhaps other functional OO languages.
- It allows arbitrarily deep access to the implicit this parameter of outer functions.
- It takes a lexically scoped functional view rather than an OO view of methods.

this| Cons
- It requires a new function declaration form.
- The meaning of "this" will be different in functions declared using various forms.
- Most methods only need the inner implicit this, yet they are forced to choose between: function() {} and #(this|){} // net saving only 4 chars.
- Self-calls are no longer lexically explicit; you have to look at the function header to recognize one.
- Refactoring hazard when moving statements containing self-calls between methods.
- It is a solution that is more general than is needed to address the primary use cases at issue.
- It removes "the this object" from the vocabulary for talking about methods. What is the new terminology for the implicit method parameters?
- It takes a lexically scoped functional view rather than an OO view of methods.

^this Pros
- It doesn't change anything about existing conventions for the implicit "this" parameter or "this" references.
- It works equally well with both new and legacy function declaration forms.
- It addresses the primary use cases for outer this access without adding seldom-needed generality.
- Consistency: "this" always means "this"; self-calls are always lexically explicit.
- It takes a classic OO view of methods.

^this Cons
- Overloads the "^" character; maintaining backwards compat with expressions like foo^this.bits would require parsing hackery. May need to use "#this", "@this", or "#^this" instead.
- It doesn't handle more than one level of function nesting.
- It isn't like Python.
- It preserves the common OO language convention of implicitly naming the "receiver" parameter.
- It takes a classic OO view of methods.
I'll leave it to the reader to weigh the above pros and cons. But I do have a closing statement:
There is a decades long disagreement among designers/users of functional and object-oriented languages. OO proponents think there is something special about the "receiver" of a method call and that "self-calls" have special significance. This perspective pervasively colors OO software designs. Functional proponents (and while I'm happy to represent OO people, I'm reluctant to put specific words into the broad group of functional people) seem to view objects/methods as simply one of many abstractions that can be constructed out of higher-order functions. They see little that is "special" about OO conventions and don't generally apply OO software design techniques.
JavaScript up to this point seems to have done a pretty good job of balancing the OO and functional perspective within the language. However, I think removing the specialness of "this" and the implicit this parameter may be a step too far that breaks that balance.
On 2011-03-29, at 14:19, Allen Wirfs-Brock wrote:
I'll leave it to reader to weigh the above pros and cons. But I do have a closing statement:
There is a decades long disagreement among designers/users of function and object-oriented languages. OO proponents think there is something special about the "receiver" of a method call and that "self-calls" have special significance.
If I had a vote, it would be for a way to explicitly name the -1th argument to a function. And I would wish for it to be available in all function forms, defaulting to using the legacy name this, if not otherwise specified. I believe it not only addresses the issue in this thread, but leaves the way open for generic functions.
[As a user, I infer I fall into your "functional proponent" camp, but I claim to also be an o-o proponent. I just find it much easier to think in generic functions and consider the "distinguished receiver" of message passing as being a degenerate case of that, which has a layer of syntactic sugar to let you express foo(a, b, c) as a.foo(b, c), if you like to think the other way.]
On Mar 29, 2011, at 11:19 AM, Allen Wirfs-Brock wrote:
The original discussion in January added explicit naming and lexical scoping of the implicit "this" parameter to # function declarations. In January we didn't settle on a syntax but in March we seem to be converging on something like: #foo(self | arg1, arg2) {...}
That's what I wrote on the white board last week at the meeting, yes. It's not set in stone but I'm warmest to it (which is why I threw it up! ;-). Alternatives include
#foo(self . arg1, arg2) {...}
where . is a play on how |this| comes from the base object to the left of dot in a property reference expression forming the callee in the call expression, but dot is higher precedence as an operator than comma, and visually light, so this seems less good.
The only "lower precedence" punctuator/operator that comes to mind is semicolon:
#foo(self; arg1, arg2) {...}
but that seems like asking for trouble given ; being otherwise only a statement terminator, modulo ASI. Consider someone using src.split(';') on existing well-formed (perhaps from Function.prototype.toString()) source in a string named src.
in the body of the above function, "self" would be bound to the implicit this parameter of the function and "this" would resolve to any lexically enclosing declarations of "this" or would be undefined if there were none.
That was the key point of the discussion: lexical |this|, treating it as an identifier, to capture |this| from a closure. Exactly what people do today via
function f() {
    var self = this;
    return function () { ... self ... };
}
It isn't clear whether or not arbitrary declarations of "this" would be allowed within such functions or whether "this" could only be bound in a "this |" context.
We did not discuss allowing |this| to be bound otherwise, except
#foo(this = this| arg1, arg2) {...}
which did come up. One could use |self| on the left of = there, but the idea was to support
harmony:parameter_default_values
even for the optional receiver parameter. The default parameter would be used if the function were called via an unqualified name, e.g., f() instead of o.p().
Then Waldemar asked how one omits the receiver when calling via .apply or .call. Obviously passing undefined is defined in ES5 strict as propagating that value, not selecting a default parameter. This question could be answered plausibly, I speculate (not gonna try here).
I believe that the intent of the proposal was that if you actually want to use "this" to refer to the implicit parameter you would have to explicitly declare it as: #foo(this | arg1, arg2) {...} and that a declaration such as #foo( arg1, arg2) {...} would not have a local binding for "this" and its implicit parameter would be inaccessible.
... and therefore the outer |this| binding would be inherited, as is done for a lexically scoped closure upvar.
this| Cons It requires a new function declaration form. The meaning of "this" will be different in functions declared using various forms. Most methods only need the inner implicit this, yet they are forced to choose between: function() {} #(this|){} //net saving only 4 chars.
I wouldn't require the | there, for a small savings. The |this| reserved word as first and only parameter name, in this case only, is enough.
Self-calls are no longer lexically explicit, you have to look at the function header to recognize one
This is true today anyway (even ignoring host objects), for functions bound to a particular this or self using var self=this or Function.prototype.bind.
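For instance (a small sketch; counter and other are just illustrative objects), a call that looks like a self-call can already carry a different receiver today:

var counter = { count: 0, tick: function () { this.count++; } };
var other = { count: 100, tick: counter.tick.bind(counter) };

other.tick();  // looks like a method call on `other`...
               // ...but counter.count is now 1, while other.count is still 100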
Refactoring hazard when moving statements containing self-calls between methods.
Same counter-argument applies here.
It is a solution that is more general than is needed to address the primary use cases at issue.
I don't think we agree on the primary use case. If the primary use case is lexical |this| inherited by inner (sharp-function) from outer (sharp- or old-style) function, then #(arg1, arg2) {... this ...} suffices.
If the goal is to support OO-and-only-OO methods where |this| always has a sane binding, even when the function is not called via o.p() but rather via f() or f.apply(...), then the generality of not only an explicit receiver formal parameter, but a default value for the parameter (which could be lexically-outer |this|) is as far as I can tell necessary.
It remove "the this object" from the vocabulary for talking about methods. What is the new terminology for the implicit method parameters?
This is a con?! :-P
"The |this| object" (with unpronounceable | brackets) is completely confusing and a source of endless frustration. Anyway, see my first counter-"con" above: var self=this and bind already make it uncertain that you always get the receiver, even if you contrive all calls to be of the form o.p().
It may be o is the self-same bound receiver and therefore it doesn't matter, but that's not the crucial case. Where o might be the wrong object (or via f.apply(o)), then the lack of ability to override a lexical or otherwise-bound |this| exists already.
It takes a lexically scoped functional view rather than a OO view of methods.
I say this is a "pro" in the face of no strong OO method support in JS today. If a class proposal wins approval and does strengthen the OO view of methods, well, that proposal can adjust or interact with this one accordingly.
^this Pros It doesn't change anything about existing conventions for the implicit "this" parameter or "this" references. It works equally well with both new and legacy function declaration forms It address the primary use cases for outer this access without adding seldom needed generality
No, I don't think it does. Some of the use-cases want |this| not |^this|, i.e., the syntax matters. The refactoring hazard (I don't see you mention it below) bites.
The other use-case, what Alex Russell calls soft-bind, also wants |this| not |^this|. It simply wants a parameter default value instead of a hard binding to a given receiver object.
Consistency: "this" always means "this"; self-calls are always lexically explicit It takes a classic OO view of methods .
^this Cons Overloads "^" character; maintaining backwards compat with expressions like foo^this.bits would require parsing hackery. May need to use "#this", "@this", or "#^this" instead. It doesn't handle more than one level of function nesting. It isn't like Python It preserves the common OO language convention of implicitly naming the "receiver" parameter It takes a classic OO view of methods.
Leaving out the refactoring hazard seems like a big omission here. Functions get wrapped in lambdas all the time, and one can propagate |this| with some care by several means (var self=this, .bind) already.
But changing every use of the receiver parameter from |this| or |self| to |^this|, and then having to wish for |^^this| or else do more (var grand_self = this) hacks, shows the cost of working around the brittleness. Programmers do not want to distribute the encoding of an incidental or provisional parent-child relationship to every use of the receiver parameter.
I'll leave it to reader to weigh the above pros and cons. But I do have a closing statement:
There is a decades long disagreement among designers/users of function and object-oriented languages. OO proponents think there is something special about the "receiver" of a method call and that "self-calls" have special significance. This perspective pervasively colors OO software designs. Functional proponents (and while I'm happy to represent OO people, I'm reluctant to put specific words into the broad group of functional people) seem to view objects/methods as simply one of many abstractions than can be constructed out of higher-order functions. They see little that is "special" about OO conventions and don't generally apply OO software design techniques.
JavaScript up to this point seems to have done a pretty good job of balancing the OO and functional perspective within the language. However, I think removing the specialness of "this" and the implicit this parameter may be a step too far that breaks that balance.
I'm to blame for all this. I'm not religious either. But I have to disagree about "good job of balancing".
JS's OO commitments are quite weak, or "flexible". Developers get burned by |this| being a parameter computed based on the callee's expression form all the time. This is a specific design decision I made in great haste, and while it allows different patterns or paradigms to be used by convention, programming in the large requires more than fallible, attackable conventions.
Since people already use var self=this and f.bind(o), we can clearly see the active use-cases (at least by source frequency if not dynamic frequency, but the latter could be studied in an instrumented browser engine). I claim these use-cases are not driven by any religious commitment. Rather, often you need to partially apply a function with a var self=this receiver, and it can't possibly work with any other receiver. Is that really "FP" vs. "OO"? I don't think so.
But the need to pass a callback or downward funarg that receives a guaranteed |this|, which most conveniently is often the enclosing function's |this|, is a common use-case. I think we should address it directly, and not turn this into a philosophical balancing act.
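Concretely, the patterns in use today look something like this (a sketch; Widget, button, and addEventListener are illustrative names, not part of any proposal):

function Widget(button) {
    this.clicks = 0;

    // (1) capture the outer |this| under another name
    var self = this;
    button.addEventListener("click", function () { self.clicks++; });

    // (2) hard-bind the callback's |this| to the outer receiver
    button.addEventListener("click", function () { this.clicks++; }.bind(this));
}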
Also, I strongly believe that we need a strong-OO proposal in hand before we can judge the harm that explicit receiver parameterization might do to OO methods.
On Mar 29, 2011, at 1:12 PM, P T Withington wrote:
If I had a vote, it would be for a way to explicitly name the -1th argument to a function. And I would wish for it to be available in all function forms, defaulting to using the legacy name this, if not otherwise specified. I believe it not only addresses the issue in this thread, but leaves the way open for generic functions. [As a user, I infer I fall into your "functional proponent" camp, but I claim to also be an o-o proponent. I just find it much easier to think in generic functions and consider the "distinguished receiver" of message passing as being a degenerate case of that, which has a layer of syntactic sugar to let you express foo(a, b, c) as a.foo(b, c), if you like to think the other way.]
Yes, the generic function "camp", largely coming out of the Lisp world, has always been much more closely aligned with the functional world than with the object-oriented world. To us objectivists a.foo(b,c) really does carry a very different meaning than foo(a,b,c). The OO design process centers on identifying the objects that will make up a system, not the functions that make up the system.
On Tue, Mar 29, 2011 at 2:19 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
JavaScript up to this point seems to have done a pretty good job of balancing the OO and functional perspective within the language. However, I think removing the specialness of "this" and the implicit this parameter may be a step too far that breaks that balance.
I disagree with the idea that this is changing the balance. Brendan has responded at length about the proposals, but I wasn't talking at all about removing the distinguished first argument. For example:
function f(mythis | x, y) { ... }
is very much in the OO tradition of distinguishing the receiver -- it's just using lexical scope for disambiguation.
On Mar 29, 2011, at 3:03 PM, Sam Tobin-Hochstadt wrote:
On Tue, Mar 29, 2011 at 2:19 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
JavaScript up to this point seems to have done a pretty good job of balancing the OO and functional perspective within the language. However, I think removing the specialness of "this" and the implicit this parameter may be a step too far that breaks that balance.
I disagree with the idea that this is changing the balance. Brendan has responded at length about the proposals, but I wasn't talking at all about removing the distinguished first argument. For example:
function f(mythis | x, y) { ... }
is very much in the OO tradition of distinguishing the receiver -- it's just using lexical scope for disambiguation.
(I think you just reinforced the point I was trying to make. You prioritize the use of lexical scoping over a single unambiguous distinguished meaning for "this")
I think what would change the balance would be not continuing to have "this" serve as the implicit default for the name of the distinguished receiver. In particular for #f(x,y) {this.foo(x,y)} when such definitions occur in a method-like context such as method1, method2, and getter below:
function makeChild() {
    return {
        #method1() {this.method2()},   // assume # allowed as concise obj lit method property def
        #method2() {this.getter},
        get #getter() {this.parent},   // implicit return
        parent = this
    }
}
but instead having to say
function makeChild() {
    return {
        #method1(this|) {this.method2()},
        #method2(this|) {this.getter},
        get #getter(this|) {this.parent},   // implicit return
        parent = this
    }
}
or
function Child(parent) {
    this.method1 = #(this|) {this.method2()};
    this.method2 = #(this|) {this.getter};
    this.defineProperty('getter', {get: #getter(this|) {this.parent}});
    this.parent = parent;
}
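For comparison, the effect the first version is meant to abbreviate can be written in today's syntax as follows (a sketch; note that a plain object literal does not bind |this| of its own, so |this| inside it refers to makeChild's receiver):

function makeChild() {
    return {
        method1: function () { this.method2(); },
        method2: function () { return this.getter; },
        get getter() { return this.parent; },
        parent: this   // makeChild's receiver; object literals introduce no |this| binding
    };
}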
Optionally allowing explicit use of an alternative receiver name is probably acceptable as long as there is a default name. The OO tradition is to distinguish the receiver by using a fixed distinguished name, not to provide a distinguished place for arbitrarily naming the receiver. Python and various lisp object systems allow arbitrary receiver naming, but they have arguably never been in the mainstream of object-oriented language design. If you take away the special meaning of "this" and you don't have alternative vocabulary such as Smalltalk does to talk about the "receiver" of a method, then you don't have a simple common way to talk about "the object on which the method was invoked", and that is an essential concept in the OO tradition.
Summarizing the state of proposals from your discussion:
function foo(arg1, arg2) {...} // implicit parameter/receiver 'this'
#foo(this | arg1, arg2) {...} // implicit parameter/receiver 'this'
#foo( arg1, arg2) {...} // no implicit parameter/receiver // any outer 'this' is in scope
It seems that this is painting one problem (control over 'this') twice, (a) by making it possible to name the implicit parameter and (b) by removing the implicit parameter unless named.
As a consequence, the advantage of having less syntactic noise disappears for many use cases, so usage will likely be split between the two forms:
#(this| ..) { .. }
function ( .. ) { .. }
If the idea is to make '#' different from 'function' in avoiding 'this' when possible, then this split makes some sense (as in: "you should not use 'this' here, but we support it in this not-so-handy form if you absolutely need it").
If the idea is to replace 'function' with something more useable, with less syntactic noise, then why not solve it the other way round? Optional naming of implicit parameters already gives us control over what is in scope, so we don't have to drop 'this' by default as well:
#( .. ) { .. } // implicit parameter 'this'
#(_| ..) { .. } // implicit parameter '_', outer 'this' is in scope
That would mean we'd have to write more if we don't want the inner 'this', and it would mean that there always is an implicit parameter (named or not). Would that be better or worse than the current proposal?
The difference between '#' and 'function' would be smaller, making it more likely that the former will be preferred, and less likely that users will trip over missing 'this', but I don't know whether Javascript implementations can drop unused parameters early?
Claus
On Tue, Mar 29, 2011 at 7:08 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
On Mar 29, 2011, at 3:03 PM, Sam Tobin-Hochstadt wrote:
On Tue, Mar 29, 2011 at 2:19 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
JavaScript up to this point seems to have done a pretty good job of balancing the OO and functional perspective within the language. However, I think removing the specialness of "this" and the implicit this parameter may be a step too far that breaks that balance.
I disagree with the idea that this is changing the balance. Brendan has responded at length about the proposals, but I wasn't talking at all about removing the distinguished first argument. For example:
function f(mythis | x, y) { ... }
is very much in the OO tradition of distinguishing the receiver -- it's just using lexical scope for disambiguation.
(I think you just reinforced the point I was trying to make. You prioritize the use of lexical scoping over a single unambiguous distinguished meaning for "this")
I think what would change the balance would be not continuing to have "this" serve as the implicit default for the name of the distinguished receiver. In particular for #f(x,y) {this.foo(x,y)} when such definitions occurs in a method like context such as method1, method2, and getter below:
First, I think that the major problem here is that we're talking about two issues together: #-functions and lexical |this|. I think we should resolve these separately -- there's no reason they have to go together. That's not to say that I think #-functions without an implicit bindings are necessarily a bad idea, just that it's a separate issue.
[snip]
Optionally allowing explicit use of an alternative receiver name is probably acceptable as long as there is a default name. The OO tradition is to distinguish the receiver by using a fixed distinguished name, not to provide a distinguish place for arbitrarily naming the receiver. Python and various lisp object systems allow arbitrary receiver naming, but they have arguably never been in the mainstream of object-oriented language design. If you take away the special meaning of "this" and you don't have alternative vocabulary such as Smalltalk does to talk about the "receiver" of a method then you don't have a simple common way to talk about "the object on which the method was invoked" and that is an essential concept in the OO tradition.
Second, I think this is taking a pretty narrow view of the OO tradition. In particular, writing Python (a fairly popular language, these days :) and Flavors (from the same decade as Smalltalk) out of the mainstream seems pretty extreme.
On 2011-03-29, at 17:52, Allen Wirfs-Brock wrote:
On Mar 29, 2011, at 1:12 PM, P T Withington wrote:
If I had a vote, it would be for a way to explicitly name the -1th argument to a function. And I would wish for it to be available in all function forms, defaulting to using the legacy name this, if not otherwise specified. I believe it not only addresses the issue in this thread, but leaves the way open for generic functions. [As a user, I infer I fall into your "functional proponent" camp, but I claim to also be an o-o proponent. I just find it much easier to think in generic functions and consider the "distinguished receiver" of message passing as being a degenerate case of that, which has a layer of syntactic sugar to let you express foo(a, b, c) as a.foo(b, c), if you like to think the other way.]
Yes, the generic function "camp" largely coming out of the Lisp world has always been much more closely assigned with the functional world than with the object-oriented world. To us objectivists a.foo(b,c) really does carry a very different meaning than foo(a,b,c). The OO design process centers on identify the objects that will make up a system, not the functions that make up the system.
While I appreciate this is a "religious battle", IWBNI there were a solution that allowed alternative religions, rather than only the mainstream one. Hence my vote.
On 2011-03-29, at 17:38, Brendan Eich wrote:
[...]
We did not discuss allowing |this| to be bound otherwise, except
#foo(this = this| arg1, arg2) {...}
Am I right in understanding the above to be an idiom for trampolining the outer this binding into the closure? (With the hazard that someone could override that binding.) Or are you just giving an example, not endorsing a particular idiom?
[...]
The other use-case, what Alex Russell calls soft-bind, also wants |this| not |^this|.
Is there an example of what 'soft-bind' means somewhere?
If I've got this right, the idea of soft bind is that the function distinguishes whether it's called as a function or as a method; if called as a function, it uses the lexical binding of |this|, and if called as a method, it uses the dynamically passed-in binding of |this|.
Note that the .call and .apply methods don't allow you to make this distinction.
See strawman:soft_bind for the method-API proposal.
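Roughly, and only as a sketch of the idea in today's JS (softBind here is a hypothetical helper, not the strawman's API; the default receiver would typically be the lexically enclosing |this|):

function softBind(fn, defaultThis) {
    return function () {
        "use strict"; // keep an unqualified call's |this| as undefined rather than the global object
        var receiver = (this === undefined || this === null) ? defaultThis : this;
        return fn.apply(receiver, arguments);
    };
}

var obj = { name: "obj" };
obj.report = softBind(function () { return this.name; }, { name: "default" });
obj.report();       // "obj"     -- called as a method, the dynamic receiver wins
var f = obj.report;
f();                // "default" -- called as a function, the default receiver is used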
How reasonable would it be to treat 'this' in a similar way that the prototype chain is referenced when resolving identifiers? I realize there would be ambiguity for like-named identifiers, but that problem exists now... In Allen's first example this.handleClick(this,e) would not be resolved with its implicit binding to elem, so 'this' would be bound lexically and the code would just work. The second 'this' would of course default to the implicit binding.
On Mar 31, 2011, at 10:32 AM, Kam Kasravi wrote:
How reasonable would it be to treat 'this' in a similar way that the prototype chain is referenced when resolving identifiers? I realize there would be ambiguity for like-named identifiers, but that problem exists now... In Allen's first example this.handleClick(this,e) would not be resolved with its implicit binding to elem, so 'this' would be bound lexically and the code would just work. The second 'this' would of course default to the implicit binding.
Other languages have tried to integrate inheritance-based binding with lexical binding. See dyla2007.unibe.ch/?download=dyla07-Gilad.pdf and bracha.org/newspeak-modules.pdf for one example and additional references.
There are some complications to this approach and it seems to me that it is unlikely to be something that could be retrofitted in JavaScript.
Thank you Allen for the references and analysis. It's interesting that Gilad argues resolution should be lexical followed by inheritance to avoid 'unanticipated name capture' that may exist within the inheritance chain. Probably even less of an option given backward compatibility requirements.
That reminds me of an old solution looking for this problem.
Back in 1976, Klaus Berkling suggested to complement the lambda calculus with an operator to protect variables from the nearest enclosing binding [1].
The idea is simply that (lexically scoped) variables usually are bound to the next enclosing binding of the same name, while protected (lexically scoped) variables are bound to the next outer enclosing binding of the same name (each protection key skips one level of binding, lexically).
If I may use '#' as a placeholder for a suitable protection key, then this translates to Javascript as
function Outer() {
    var x = "outer";
    function Inner() {
        var x = "inner";
        ...
    }
}
I've seen this in action in a functional language based on his ideas, in the late 80s, and have since been repeatedly surprised by general unawareness of Berkling's ideas. Variants have been rediscovered occasionally - git's "HEAD^" is a recent example:
www.kernel.org/pub/software/scm/git/docs/gittutorial.html
Of course, applying the idea to Javascript's 'this' is complicated by dynamic scoping. Personally, I've found it useful to think of 'this' as an implicit, lexically scoped parameter - then I only have to worry about which applications pass which implicit parameter.
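In today's terms, that view amounts to the following (a small sketch; greet and alice are just illustrative names):

function greet(greeting) { return greeting + ", " + this.name; }
var alice = { name: "Alice", greet: greet };

alice.greet("Hi");        // receiver passed implicitly: |this| = alice
greet.call(alice, "Hi");  // the same call with the implicit parameter made explicit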
I don't know whether that is sufficient to make Berkling's idea applicable here, but I wanted to make sure it is known;-)
Claus www.haskellers.com/user/claus
[1] The original report is hard to come by, the idea was also published again in 1982, items 5/6 in this bibliography:
www.informatik.uni-trier.de/~ley/db/indices/a-tree/b/Berkling:Klaus_J=.html
@techreport{Berkling76,
  author      = {Berkling, K.J.},
  title       = {{A Symmetric Complement to the Lambda Calculus}},
  institution = GMD,
  note        = {ISF-76-7},
  month       = {September},
  year        = 1976,
  abstract    = {"The calculi of Lambda-conversion" introduced by A. Church are complemented by a new operator lambda-bar, which is in some sense the inverse to the lambda-operator. The main advantage of the complemented system is that variables do not have to be renamed. Conversely, any renaming of variables in a formula is possible. Variables may, however, appear with varied numbers of lambda-bars in front of them. Implementations of the lambda calculus representation with the symmetric complement are greatly facilitated. In particular, a renaming of all variables in a formula to the same one is possible. Variables are then distinguished only by the number of preceding lambda-bars. Finally, we give a four symbol representation of the lambda calculus based on the above mentioned freedom in renaming.},
  topics      = {FP - Lambda Calculi}
}