'this' is more complicated than I thought (PropertyReferences break equivalences)
The behavior of References isn't as arbitrary or different from other languages as it might seem.
It's really a way to specify l-values.
When you assign to an object property, e.g., "o.x = 42", the l-value here is the "x" property of the "o" object. We need to capture both.
In other languages, e.g., C or Java, you have the same problem:
int x = something.prop;
x = 10;
is not the same as
something.prop = 10;
In C/C++, you know that the something.prop on the right-hand side of an assignment means something slightly different from the one on the left-hand side.
We have the same thing in ECMAScript:
var x = something.prop;
x = 10;
is not the same as
something.prop = 10;
So far, References as a specification mechanism is just following other languages, and behaving exactly as any seasoned programmer would expect. Try checking your questions against the expected behavior if a Reference is just an l-expression.
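The same r-value/l-value distinction in runnable modern JavaScript; `something` and `prop` are just the illustrative names from the snippet above:

```javascript
// Reading something.prop yields a plain value (r-value); assigning to a
// copy of that value does not write back to the object.
const something = { prop: 1 };

let x = something.prop; // x holds a copy of the current value
x = 10;                 // rebinds x only; something.prop is still 1
const afterCopy = something.prop;

something.prop = 10;    // here something.prop is an l-value: the "prop" slot of the object
const afterAssign = something.prop;
```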
Where it differs is function/method calls. The traditional languages that I compared to do not have functions as first-class values. If a function sits on an object, you can't extract it and call it without such an object.
Well, there are C++ pointers to member functions, but you still have to supply an object when calling through one, as in (obj.*pmf)() (and don't try to guess the size of one; they are probably bigger than whatever you might think is necessary). And they are different from pointers to static functions.
The binding of "this" when calling a Reference value mimics method calls. It does so fine when you treat objects as objects, but not when you try to extract a method from its object (try doing that in Java!). It's a shallow abstraction, but it does work when you play along with it.
A Reference is purely a specification tool that doesn't have to exist in any form inside an actual implementation. If we start exposing it, we would require implementations to take steps they might not need in order to visibly create and pass around such a reference.
If you really need user-level references, you can create them yourself, and just do
var ref = new Reference(object, "prop");
var val = ref.GetValue();
ref.SetValue(val + 10);
ref.SetValue(someFunction);
ref.call(arg1, arg2);
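The snippet above assumes a user-defined `Reference` constructor. A minimal sketch of one follows; the class, its fields, and the method names `GetValue`, `SetValue` and `call` are hypothetical (taken from the snippet, not a built-in API). It stores only the (object, property name) pair and reads the property lazily:

```javascript
// Hypothetical user-level Reference: a (base object, property name) pair.
// Not a built-in; the names mirror the snippet above.
class Reference {
  constructor(base, name) {
    this.base = base;
    this.name = name;
  }
  // Read the property now; a getter would run here, not at construction.
  GetValue() {
    return this.base[this.name];
  }
  // Write through the reference, so the object itself is updated.
  SetValue(value) {
    this.base[this.name] = value;
  }
  // Call the current property value with the base object as 'this',
  // the way obj.prop(...) would.
  call(...args) {
    return Reflect.apply(this.GetValue(), this.base, args);
  }
}

const object = { prop: 32 };
const ref = new Reference(object, "prop");
const val = ref.GetValue();
ref.SetValue(val + 10);
const afterAdd = object.prop; // 42
ref.SetValue(function (a, b) { return this === object ? a + b : NaN; });
const sum = ref.call(1, 2);   // 3, because 'this' is object
```

Reading lazily sidesteps the question of which value a stored Reference should hold: it always sees the property as it is at the moment of use.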
/L
On Mon, 11 Apr 2011 10:55:49 +0200, Claus Reinke <claus.reinke at talk21.com> wrote:
Like most Javascript programmers, I have tended to follow a simple rule for functions using 'this': eta-expand method selections, use .bind, or get into trouble.
Then I got curious about how method calls determine what object to pass as 'this': a method is a function selected from an object, and functions are first-class values, so, by the time they get called, how do we know where they came from?
So I looked into the spec, and things deteriorated from there.
I would be interested in the rationale for the current specification of PropertyReferences, as it seems to invalidate a large class of program equivalences (see below for examples).
Status:
According to the Ecmascript spec (11.2.1), property accessors return not the selected value but a Reference (8.7), which is a combination of the object selected from and the name of the property being selected (the property value is not stored immediately, but selected later when calling GetValue on such a PropertyReference).
Function calls (11.2.3) then construct 'this' from the object in a PropertyReference, and the whole Reference concept, as far as its use for 'this' is concerned, seems finely tuned to mimic a piece of syntax (conserve the base object info just long enough to use it for 'this'), rather than a semantic value (trying to pass around PropertyReferences is likely to end up calling GetValue, losing the reference info).
Question 1: Should a Reference hold on to the current property value?
Currently, there seems to be no way to store a property Reference without GetValue getting called, so there is no window for changing the property value such a Reference refers to behind its back. That would no longer be true if References could be passed around, as language values.
Which value? The one the property (if it existed) had when the reference was created? Or the current one - if the Reference survives for any amount of time, the object property could change its value in the meantime. What if it's a getter property? What if it's a setter property with no getter?
If you make a reference a first-class value, then you probably don't want to make too many assumptions about how it's used. Don't read a value from it unless the user wants to do so.
Question 2: If a Reference allows us to recover 'obj' from 'obj.method', why does this information have to get lost when passing it through a variable binding?
obj.longish()            // correct 'this'
var short = obj.longish; // try to define a shorthand
short();                 // oops, wrong 'this'
This seems to be a very popular mistake - most beginners seem to get burned once. Naively, one thinks of selection losing the information, but that does not seem to be the case. So could this error source be eliminated by passing the Reference as a value, instead of only the value component without the base object, one step further?
Of course it's possible, but personally I prefer to have the "this" object obvious in the call line. That way I know what object the method is being called on. Without it, the loss of context is in the source code, making it harder to read and maintain.
Question 3: It seems that trying to reuse References for 'this' forces early calls to GetValue (because users should not have to call 'obj.method.valueOf()' or 'obj.property.valueOf()' to trigger the delayed selection, and because the property value is not stored in the PropertyReference).
I'm not sure I understand what the problem is here.
This loses information that we would like to hold on to - if we really cannot solve this for References in general, why not store the 'this'-candidate in the function instead (similar to the fairly new '[[boundThis]]')?
Won't work. The same function can be used in many places at the same time. E.g. [obj1.foo, obj2.foo][(Math.random() * 2) | 0]();
That might allow to limit the equivalence breakage with respect to determining 'this'.
Currently, searching for 'Reference' in the spec gives an uneven picture. For instance, the spec claims that
"The Reference Specification Type is used to explain the behaviour of such operators as delete, typeof, and the assignment operators."
However, References are also used to determine the value of 'this' for function calls, and most expressions/operations/variable bindings/function calls cannot pass through References, returning only their values instead. This information is spread over too many pages - it could be summarized in the section on References.
Question 4: (general version of question 2) Why is the origin information in References lost so easily?
It seems that most parts of the spec require GetValue() by default, with few exceptions. What would go wrong if the available information would be passed on instead (isn't it sufficient for the final consumers to call GetValue(), provided that the original property value is stored in the PropertyReference, to avoid interference)?
References are a specification tool. If a Reference survived for an extended amount of time, and visibly so, implementations would have to actually implement something to represent it. As it is now, a reference is found and immediately consumed, which allows implementations to never create it at all, and work directly on the value in r-value contexts, and on the object and property in l-value contexts.
Broken equivalences:
It is not too surprising that eta-conversion does not hold
obj.method <-/-> function() { return obj.method(); }
although usually, the problems are with termination, side-effects, or type errors, while in this case, the problem seems to be context-sensitive: many contexts treat References differently. That can be confusing.
Currently, quite a few code transformations are not valid if the code involves References and might be used in the context of a function call (MemberExpression/CallExpression). Here are some examples:
var x = obj.m; x(); <-/-> obj.m();
obj.m.valueOf() <-/-> obj.m
Why should that work? The valueOf function isn't guaranteed to return anything related to the object it's on.
(function(){ return obj.m; }()) <-/-> obj.m
(x = obj.m) <-/-> obj.m // where x is unused
[obj.m][0] <-/-> obj.m // obvious in hindsight?
                       // but really surprising the first time
{tmp: obj.m}.tmp <-/-> obj.m // this explains the one above
(0,obj.m) <-/-> obj.m
(true && obj.m) <-/-> obj.m
(false || obj.m) <-/-> obj.m
(true ? obj.m : obj.m) <-/-> obj.m // Firefox 3.6.11 wrongly optimizes this one
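The list above can be checked in a current engine with a method that simply returns its 'this'. The names `obj`, `m`, `arr` and `holder` are illustrative; the method is strict so that a lost 'this' shows up as `undefined` rather than the global object. Note that plain parentheses, `(obj.m)()`, keep the Reference:

```javascript
// m reports whatever 'this' it was called with.
const obj = {
  m() {
    "use strict"; // a lost 'this' becomes undefined instead of the global object
    return this;
  },
};

let x; // only used by the assignment-expression case
const arr = [obj.m];
const holder = { tmp: obj.m };

const thisOf = {
  direct: obj.m(),                           // obj
  grouped: (obj.m)(),                        // obj: grouping keeps the Reference
  iife: (function () { return obj.m; }())(), // undefined
  assignment: (x = obj.m)(),                 // undefined
  comma: (0, obj.m)(),                       // undefined
  and: (true && obj.m)(),                    // undefined
  or: (false || obj.m)(),                    // undefined
  conditional: (true ? obj.m : obj.m)(),     // undefined
  array: arr[0](),                           // arr: the array is the new base object
  object: holder.tmp(),                      // holder: likewise
};
```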
I find the result very confusing: not only will a method lose its Reference just by passing it around, but it will pick up a new Reference by passing it through any kind of Object. This picking-up-new-Reference is probably needed for mixins (copying methods from one object to another), but it means that storing naked method References in Arrays is not recommended.
This is exactly correct. Extracting a method from its object will break the connection to the object. Which is kind of expected when you allow any function to be used as both a method and a non-method.
I was not aware that just about any code transformation would be invalidated by the handling of References. If this cannot be fixed, could the specification be more explicit about this, please?
When I think of References as l-values, the current behavior actually becomes the expected one. The only tricky bit is that method calls actually need an l-value to work correctly.
On 4/11/11, Claus Reinke <claus.reinke at talk21.com> wrote:
Like most Javascript programmers, I have tended to follow a simple rule for functions using 'this': eta-expand method selections, use .bind, or get into trouble.
That is unnecessary and inefficient. Instead, I use the following algorithm:
For instance methods, always call with the base object or with call/apply. Don't use 'this' in methods that are to be called as static, so you can use variable shortcuts for those static methods and pass them around.
// DONT DO THIS
var StyleUtils = {
  HAS_COMPUTED_STYLE : (function() { /* ... */ return true; })(),
  getStyle : function(el, name) {
    // FAILED STRATEGY, 'this' in static context.
    if(this.HAS_COMPUTED_STYLE) {
      return "worked";
    }
    return "didn't work";
  }
};
That most JavaScript programmers like to bind every function says more about trends in JavaScript programming than about JavaScript.
Then I got curious about how method calls determine what object to pass as 'this': a method is a function selected from an object, and functions are first-class values, so, by the time they get called, how do we know where they came from?
The base object.
var o = { m : function(){ alert( this == o ); } };
o.m(); // true, o is base object
var f = o.m;
f(); // false.
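Calling f() alerts false because the Reference for f has an environment record as its base (a declarative environment record when f is a local variable; ES3 called this the VariableObject), not an ordinary object. When that happens, the 'this' value is the global object in non-strict code, or undefined in ES5 strict mode code.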
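A runnable version of the example above, returning booleans instead of calling `alert` so it runs outside a browser. The extra strict-mode method `s` is an illustrative addition that exposes the raw 'this' value of a plain call:

```javascript
const o = {
  m: function () { return this === o; },
  // Strict-mode variant that returns the raw 'this' value.
  s: function () { "use strict"; return this; },
};

const viaBase = o.m();  // true: o is the base object of the Reference o.m
const f = o.m;
const viaAlias = f();   // false: the Reference for f has no object base
const g = o.s;
const strictThis = g(); // undefined: strict code does not substitute the global object
```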
On 4/11/11, Claus Reinke <claus.reinke at talk21.com> wrote:
Like most Javascript programmers, I have tended to follow a simple rule for functions using 'this': eta-expand method selections, use .bind, or get into trouble.
That is unnecessary, inefficient, and adds clutter.
That most JavaScript programmers do that says more about trends in JavaScript programming than it does about the language.
Then I got curious about how method calls determine what object to pass as 'this': a method is a function selected from an object, and functions are first-class values, so, by the time they get called, how do we know where they came from?
The base object.
Follow these two rules to greatly reduce 'this' reference confusion.
- For instance methods (such as prototype methods), either qualify the method call with the base object or use call/apply. For example:
var x = new X;
x.m(); // Qualified instance method
// DONT DO THIS
var m = x.m;
m();
- For static methods, write them so that they never use 'this'. Here is an example of a static method, getStyle, that violates that rule and uses 'this':
// DONT DO THIS
var StyleUtils = {
  HAS_COMPUTED_STYLE : (function() { /* ... */ })(),
  getStyle : function(el, name) {
    // Problem: Use of 'this' in static method.
    if(this.HAS_COMPUTED_STYLE) {
    }
  }
};
By never using 'this' in static methods, you can be sure that they can be aliased with a variable and passed around, e.g. var getStyle = StyleUtils.getStyle.
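A sketch of the two styles side by side. `HAS_COMPUTED_STYLE` is hard-coded to `true` here because the original's feature detection is elided, and the method name `getStyleThis` is hypothetical; the static version names `StyleUtils` explicitly instead of relying on 'this':

```javascript
const StyleUtils = {
  // Placeholder: the original computes this by (elided) feature detection.
  HAS_COMPUTED_STYLE: true,

  // 'this'-based version: correct only when called as StyleUtils.getStyleThis(...).
  getStyleThis(el, name) {
    "use strict";
    return this && this.HAS_COMPUTED_STYLE ? "worked" : "didn't work";
  },

  // Static version: refers to the namespace object by name, never to 'this'.
  getStyle(el, name) {
    return StyleUtils.HAS_COMPUTED_STYLE ? "worked" : "didn't work";
  },
};

const getStyleThis = StyleUtils.getStyleThis; // alias loses the base object
const getStyle = StyleUtils.getStyle;         // alias is safe

const results = {
  qualified: StyleUtils.getStyleThis(null, "color"), // "worked"
  aliasedThis: getStyleThis(null, "color"),          // "didn't work"
  aliasedStatic: getStyle(null, "color"),            // "worked"
};
```

The trade-off is that the static version is tied to the name `StyleUtils`; renaming or copying the object means updating the method body.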
The behavior of References isn't as arbitrary or different from other languages as it might seem. It's really a way to specify l-values.
Not arbitrary, but different, and quite drastically so (as far as usage is concerned). Your remarks helped me to pin down the difference (and eliminated two of my questions, thanks!-): References are l-values, but they cannot be used as such, due to forced, implicit conversion curtailing their lifetimes.
The differences between References and general l-values (values that represent locations where other values may be stored) lie in how they may be used and how long they live:
- l-values are first-class values: they can be passed around, assigned to variables, stored in data structures; they happen to support a de-referencing operation, but merely evaluating an l-value does not de-reference it; l-values can be de-referenced explicitly; some languages implicitly coerce l-values into r-values (causing de-reference) depending on usage context (this is where the names come from: values on the left and right hand sides of assignments), but even those languages tend to provide means to control when coercions take place
- References start out as l-values, but don't live long enough to be used as such. They cannot be passed around, stored in data structures, or assigned to variables; any attempt to evaluate them leads to immediate de-reference, no matter whether the usage context expects an l-value or not; there is no way to prevent the implicit de-reference
It is mostly the implicit coercion in evaluation, combined with the early evaluation inherent in Javascript's call-by-value semantics, that breaks those equivalences.
In other languages, e.g., C or Java, you have the same problem:
int x = something.prop;
x = 10;
is not the same as
something.prop = 10;
My C has been buried for too long, but were not l-values one of C's showcases? Something like this
int *x = &(something.prop);
*x = 10;
should work, by making x hold l-values (in case I messed up the syntax beyond recognition: x should be a pointer to int, its value being the address of something.prop, so we can use the r-value of x as an l-value in the second assignment).
C allowed us to be explicit about whether we wanted l-values or r-values, overriding the default conversions when necessary. Some later languages, such as Haskell or Standard ML, dropped the implicit coercions entirely, so all de-references are explicit.
ECMAScript relies on implicit de-reference, but triggers that by every evaluation. So we don't have explicit de-reference, we do not have C's flexibility for explicit de-reference control, and we do not even have C's context-sensitive implicit de-reference.
Which means that things like
(1 ? obj.prop : obj.prop) = 3;
(0, obj.prop) = 2;
will work in C++ (plain C is stricter here: neither the conditional nor the comma operator yields an l-value there), but fail in ES (References are very short-lived l-values - every operation evaluates and de-references them, independent of whether the result is going to be used in an l-value or r-value context).
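On the ES side, this can be checked without aborting the whole script by compiling each statement with `new Function`. The helper name `tryAssign` is hypothetical. Modern engines reject the comma and conditional forms at parse time (a SyntaxError; ES5-era engines reported a ReferenceError), while a merely parenthesized property access is still an acceptable assignment target:

```javascript
// Runs one assignment statement against a fresh object and reports either
// the resulting obj.prop or the name of the error that was thrown.
function tryAssign(source) {
  const obj = { prop: 0 };
  try {
    new Function("obj", source)(obj);
  } catch (e) {
    return e.name; // the assignment target was rejected
  }
  return obj.prop;
}

const plain = tryAssign("obj.prop = 1;");                  // 1
const grouped = tryAssign("(obj.prop) = 2;");              // 2: parentheses keep the Reference
const comma = tryAssign("(0, obj.prop) = 3;");             // an error name
const cond = tryAssign("(1 ? obj.prop : obj.prop) = 4;");  // an error name
```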
Also, in C we can write
x = &(obj.prop); *x = 4;
to express that we want x to hold l-values, and storing l-values in arrays isn't much different
int *a[1] = { &obj.prop }; *(a[0]) = 1;
In ES, we only have the default-to-r-value path. For instance,
[obj.prop][0] = 1;
will not use obj.prop as an l-value, and there does not seem to be a straightforward alternative for programmers who want to work with ES References as l-values.
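The usual workaround is to build the l-value by hand, for example as a pair of closures over the object and the key. This is a sketch; the helper `lvalue` and its `get`/`set` members are hypothetical names, not a language feature:

```javascript
// A hand-made "l-value": closures that read and write object[key].
function lvalue(object, key) {
  return {
    get: () => object[key],
    set: (v) => { object[key] = v; },
  };
}

const obj = { prop: 0 };
const a = [lvalue(obj, "prop")]; // roughly C's: int *a[1] = { &obj.prop };
a[0].set(1);                     // roughly C's: *(a[0]) = 1;
```

Unlike the C version, nothing here is checked or optimized by the language: the "pointer" is an ordinary object, and every de-reference is an explicit call.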
So far, References as a specification mechanism is just following other languages, and behaving exactly as any seasoned programmer would expect.
Does the above explain why a seasoned programmer might reasonably expect differently, because ECMAScript behaves differently from other languages?
The binding of "this" when calling a Reference value mimics method calls. It does so fine when you treat objects as objects, but not when you try to extract a method from its object .. It's a shallow abstraction, but it does work when you play along with it.
I was trying to point out that the mechanism works for the simple case and is known to confuse programmers for other cases. In particular, I am trying to find out whether the current mechanism is a special case of a more complete mechanism, one that works equally well for simple and non-simple cases.
Since PropertyReferences hold the object the method was selected from, all that seems needed is to make References survive evaluation, i.e., make References first-class values.
An alternative would be to preserve context-information during evaluation: if the result of '(?:)' is to be used as an l-value, then evaluation should perhaps not de-reference the l-values in the conditional.
I am less concerned with being able to use '(?:)' on the left hand side of assignments, and more with being able to use equivalences like '(true ? x : x) <--> x', independent of where the expression occurs. Since we can write such conditionals on left hand sides, why not make sure that they actually work there?
Question 1: Should a Reference hold on to the current property value?
Which value? The one the property (if it existed) had when the reference was created? Or the current one - if the Reference survives for any amount of time, the object property could change its value in the meantime.
Thanks. So the answer probably is 'no' - which value we get depends on when we look behind the reference.
I guess I was confused by property accessors not actually accessing the property - once the decision is made to return a Reference instead of the property value, it would only be consistent to keep Reference construction and de-reference separated. As long as we are able to specify when to pass the reference and when to look up the value behind it.
Question 2: If a Reference allows us to recover 'obj' from 'obj.method', why does this information have to get lost when passing it through a variable binding? .. could this error source be eliminated by passing the Reference as a value, instead of only the value component without the base object, one step further?
Of course it's possible, but personally I prefer to have the "this" object obvious in the call line. That way I know what object the method is being called on. Without it, the loss of context is in the source code, making it harder to read and maintain.
I'm afraid you won't get that comfort;-) At the moment, programmers can just write the more complicated
var short = function(x) { return obj.method(x); };
short("hi"); // no 'this' object on the call line
We can try to make this more readable, and we can try to eliminate a common source of bugs, but the rest is between you and your team's coding style and style checker.
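For comparison, the two common ways to get a working shorthand today: eta-expansion and ES5's `Function.prototype.bind`. The object, its `greeting` field and the names `shortEta`/`shortBound` are illustrative:

```javascript
const obj = {
  greeting: "hi",
  method(x) {
    return this.greeting + ", " + x;
  },
};

// Eta-expanded shorthand: obj.method is looked up on every call,
// and the inner call has obj as its base object.
const shortEta = function (x) { return obj.method(x); };

// bind: a new function whose 'this' is fixed to obj, wrapping the
// function that obj.method held when bind was called.
const shortBound = obj.method.bind(obj);

const a = shortEta("there");   // "hi, there"
const b = shortBound("there"); // "hi, there"
```

The two are not interchangeable: the eta-expanded version re-reads `obj.method` each time, so it follows later reassignments of the property, while the bound version keeps the original function.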
Question 3: It seems that trying to reuse References for 'this' forces early calls to GetValue (because users should not have to call 'obj.method.valueOf()' or 'obj.property.valueOf()' to trigger the delayed selection, and because the property value is not stored in the PropertyReference).
I'm not sure I understand what the problem is here.
Probably because the description is a bit confusing. I was trying to understand why References get eliminated early, through calls to GetValue, and was enumerating non-reasons before coming to my question:
This loses information that we would like to hold on to - if we really cannot solve this for References in general, why not store the 'this'-candidate in the function instead (similar to the fairly new '[[boundThis]]')?
Won't work. The same function can be used in many places at the same time.
Ah, good point. If functions were constants, we could make copies (sharing the code, but with different this values). But they aren't, so we need the 'this'-candidates outside the function.
E.g. [obj1.foo, obj2.foo][(Math.random() * 2) | 0]();
Note that, currently, either selection will have that anonymous first array as 'this', not obj1 or obj2. But you were aware of that, right?-)
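A quick check of that remark; `foo`, `obj1` and `obj2` are the illustrative names from the example, and `foo` is strict and returns its 'this':

```javascript
function foo() {
  "use strict";
  return this;
}
const obj1 = { foo };
const obj2 = { foo };

// The array literal is the base object of the element access, so it becomes 'this'.
const picked = [obj1.foo, obj2.foo][(Math.random() * 2) | 0]();

// The same with a named array, so the result can be compared directly.
const candidates = [obj1.foo, obj2.foo];
const pickedNamed = candidates[(Math.random() * 2) | 0]();
```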
Question 4: (general version of question 2) Why is the origin information in References lost so easily?
It seems that most parts of the spec require GetValue() by default, with few exceptions. What would go wrong if the available information would be passed on instead (isn't it sufficient for the final consumers to call GetValue(), provided that the original property value is stored in the PropertyReference, to avoid interference)?
References are a specification tool. If a Reference survived for an extended amount of time, and visibly so, implementations would have to actually implement something to represent it. As it is now, a reference is found and immediately consumed, which allows implementations to never create it at all, and work directly on the value in r-value contexts, and on the object and property in l-value contexts.
Yes, and that is my argument. L-values as first-class values is the common way to handle references, whether it is in C, in Haskell, in ML, .., ever since Strachey documented l-values in 1967, and probably longer than that. Once references become visible to programmers, one might as well support them fully. Eliminating temporary structures is a common implementation optimization, not limited to References, and not a language spec concern.
Broken equivalences: ..
var x = obj.m; x(); <-/-> obj.m();
obj.m.valueOf() <-/-> obj.m
Why should that work? The valueOf function isn't guaranteed to return anything related to the object it's on.
The default valueOf for Function comes from Object, where it is 'ToObject this', which for Object is the input argument without conversion. I think..
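This can be checked directly: functions inherit `valueOf` from `Object.prototype`, so `obj.m.valueOf()` returns the very same function, yet calling that result loses 'this', because the result of a call is a plain value rather than a Reference. The object and method names are illustrative:

```javascript
const obj = {
  m() {
    "use strict";
    return this;
  },
};

// Functions have no valueOf of their own; it comes from Object.prototype.
const inherited = obj.m.valueOf === Object.prototype.valueOf; // true
// Object.prototype.valueOf returns ToObject(this): here, the function itself.
const same = obj.m.valueOf() === obj.m;                      // true
// The call result is not a Reference, so the outer call has no base object.
const thisAfterValueOf = obj.m.valueOf()();                  // undefined
```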
This is exactly correct. Extracting a method from its object will break the connection to the object. Which is kind of expected when you allow any function to be used as both a method and a non-method.
I expected none of these:
- Property access is not extraction.
- Extraction is triggered by constructs that could just as well pass on the property accessor (and probably should, as the alternative leads to runtime errors).
- Extraction alone will not break the connection, only some forms of triggering extractions will do so.
- A new connection is established by trying to pass a property accessor through an Array.
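The last point is the mixin case mentioned earlier: copying a method to another object makes that object the base of later calls. A small illustration with hypothetical names `describe`, `source` and `target`:

```javascript
function describe() {
  "use strict";
  return this.name;
}
const source = { name: "source", describe };
const target = { name: "target" };

// Mixin-style copy: the same function object, now also reachable from target.
target.describe = source.describe;

const fromSource = source.describe(); // "source"
const fromTarget = target.describe(); // "target": new base object, new 'this'
```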
I was not aware that just about any code transformation would be invalidated by the handling of References. If this cannot be fixed, could the specification be more explicit about this, please?
When I think of References as l-values, the current behavior actually becomes the expected one. The only tricky bit is that method calls actually need an l-value to work correctly.
Even the l-value part is unusual, as I've tried to show.
Claus
Like most Javascript programmers, I have tended to follow a simple rule for functions using 'this': eta-expand method selections, use .bind, or get into trouble.
That is unnecessary, inefficient, and adds clutter.
The problem with rules-of-thumb is that most people only have two of those;-) I agree that knowing and understanding all the relevant aspects is better, and reducing the number of details a programmer must worry about is a good language design goal. But when we can't communicate all the details, our rules have to be simple.
The rule above doesn't mention details, and doesn't discuss alternatives, such as the DOM's EventListener interface, but it does cover your suggestions: if the function doesn't use 'this', there's no need to worry, and eta-expansion of method selections (only needed when not directly applied) ensures that method calls are always qualified.
var short = function(..) { return obj.method(..); };
The problem with does-it-use-this is that it implies knowledge about the function definition. Eta-expansion certainly adds clutter (but that is generic clutter which I'd like to get rid of in the general case), but it works whether the function uses 'this' or not, and there is no reason for it to be inefficient.
But it doesn't really matter which rules we follow to work around this hole, the issue is that the language spec leaves this hole for programmers to fall into.
Removing such sources of errors tends to be more useful than collecting extensive documentation about workarounds(*).
Claus
(*) Once, I used a modelling tool with extensive, carefully written and illustrated documentation. Using another tool for the same application domain that didn't need that kind of documentation was an interesting experience.
Like most Javascript programmers, I have tended to follow a simple rule for functions using 'this': eta-expand method selections, use .bind, or get into trouble.
Then I got curious about how method calls determine what object to pass as 'this': a method is a function selected from an object, and functions are first-class values, so, by the time they get called, how do we know where they came from?
So I looked into the spec, and things deteriorated from there.
I would be interested in the rationale for the current specification of PropertyReferences, as it seems to invalidate a large class of program equivalences (see below for examples).
Status:
According to the Ecmascript spec (11.2.1), property accessors return not the selected value but a Reference (8.7), which is a combination of the object selected from and the name of the property being selected (the property value is not stored immediately, but selected later when calling GetValue on such a PropertyReference).
Function calls (11.2.3) then construct 'this' from the object in a PropertyReference, and the whole Reference concept, as far as its use for 'this' is concerned, seems finely tuned to mimic a piece of syntax (conserve the base object info just long enough to use it for 'this'), rather than a semantic value (trying to pass around PropertyReferences is likely to end up calling GetValue, losing the reference info).
Question 1: Should a Reference hold on to the current property value?
Question 2: If a Reference allows us to recover 'obj' from 'obj.method', why does this information have to get lost when passing it through a variable binding?
Question 3: It seems that trying to reuse References for 'this' forces early calls to GetValue (because users should not have to call 'obj.method.valueOf()' or 'obj.property.valueOf()' to trigger the delayed selection, and because the property value is not stored in the PropertyReference).
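Currently, searching for 'Reference' in the spec gives an uneven picture. For instance, the spec claims that
"The Reference Specification Type is used to explain the behaviour of such operators as delete, typeof, and the assignment operators."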
However, References are also used to determine the value of 'this' for function calls, and most expressions/operations/ variable bindings/function calls cannot pass through References, returning only their values instead. This information is spread over too many pages - it could be summarized in the section on References.
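The delete and typeof operators, which the spec names as users of References, show why a Reference carries more than a value. The object and the identifier `someUndeclaredName` are illustrative (the identifier is assumed not to be declared anywhere):

```javascript
const obj = { p: 1 };

// delete needs the (object, name) pair, not just the value 1.
const deleted = delete obj.p;        // true
const stillThere = "p" in obj;       // false

// typeof on an undeclared identifier sees an unresolvable Reference and
// yields "undefined" instead of throwing a ReferenceError.
const t = typeof someUndeclaredName; // "undefined"
```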
Question 4: (general version of question 2) Why is the origin information in References lost so easily?
Broken equivalences:
It is not too surprising that eta-conversion does not hold
obj.method <-/-> function() { return obj.method(); }
although usually, the problems are with termination, side-effects, or type errors, while in this case, the problem seems to be context-sensitive: many contexts treat References differently. That can be confusing.
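Currently, quite a few code transformations are not valid if the code involves References and might be used in the context of a function call (MemberExpression/CallExpression). Here are some examples:
var x = obj.m; x(); <-/-> obj.m();
obj.m.valueOf() <-/-> obj.m
(function(){ return obj.m; }()) <-/-> obj.m
(x = obj.m) <-/-> obj.m // where x is unused
[obj.m][0] <-/-> obj.m // obvious in hindsight?
                       // but really surprising the first time
{tmp: obj.m}.tmp <-/-> obj.m // this explains the one above
(0,obj.m) <-/-> obj.m
(true && obj.m) <-/-> obj.m
(false || obj.m) <-/-> obj.m
(true ? obj.m : obj.m) <-/-> obj.m // Firefox 3.6.11 wrongly optimizes this one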
I find the result very confusing: not only will a method lose its Reference just by passing it around, but it will pick up a new Reference by passing it through any kind of Object. This picking-up-new-Reference is probably needed for mixins (copying methods from one object to another), but it means that storing naked method References in Arrays is not recommended.
I was not aware that just about any code transformation would be invalidated by the handling of References. If this cannot be fixed, could the specification be more explicit about this, please?
Claus