Strategies for standardizing mistakes
On Tue, Oct 13, 2009 at 7:01 PM, David-Sarah Hopwood <david-sarah at jacaranda.org> wrote:
I agree with Maciej. The implementation-defined operations have clear specifications of their parameters. I think that it is highly undesirable to adopt an interpretation in which they can have arbitrary additional inputs depending on the context in which they are used.
If they didn't depend on the context in which they are used, they wouldn't need to be host objects, right? The whole point of the host object is that it knows things about the host (what mode it was loaded in, what privileges the context offers, what the user's preferences are) which aren't within the scope of the language proper.
Mike
-----Original Message----- From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Mike Shaver Sent: Tuesday, October 13, 2009 4:05 PM To: David-Sarah Hopwood Cc: es-discuss at mozilla.org Subject: Re: Strategies for standardizing mistakes
On Tue, Oct 13, 2009 at 7:01 PM, David-Sarah Hopwood <david-sarah at jacaranda.org> wrote:
I agree with Maciej. The implementation-defined operations have clear specifications of their parameters. I think that it is highly undesirable to adopt an interpretation in which they can have arbitrary additional inputs depending on the context in which they are used.
If they didn't depend on the context in which they are used, they wouldn't need to be host objects, right? The whole point of the host object is that it knows things about the host (what mode it was loaded in, what privileges the context offers, what the user's preferences are) which aren't within the scope of the language proper.
I agree with Mike (and hence disagree with Maciej and David-Sarah). The internal methods are not implementation devices, although an actual implementation might use a similar-seeming mechanism. The internal methods are just specification devices used for describing the standard semantics, and their arguments are simply devices used in factoring the pseudo-code of the specification. When the spec says that host objects may use other definitions of the internal methods, it is saying that the semantics are arbitrarily defined by the implementation when host objects appear in such contexts. There is nothing in the specification that limits implementations to using information provided by the pseudo-code arguments in the definition of their semantics. The implementation can do anything, including making use of data or oracles that are not explicitly provided for in the ES specification.
Fundamentally, I think the misunderstanding here is similar to the one Cameron had when drafting the WebIDL spec. Internal methods are not a general-purpose extension mechanism for plugging new functionality into ES. They simply provide a convenient mechanism in the specification for specifying polymorphic semantics. You can use them in defining the semantics of extensions, particularly ones that are tied to specific types of objects. However, they aren't intended to constrain future semantic extensions. That's why ES5 was free to both add new internal methods and change both the definitions and signatures of some previously existing internal methods. They aren't the language; they're just a tool used to specify the language.
On 10/13/2009 04:05 PM, Mike Shaver wrote:
On Tue, Oct 13, 2009 at 7:01 PM, David-Sarah Hopwood <david-sarah at jacaranda.org> wrote:
I agree with Maciej. The implementation-defined operations have clear specifications of their parameters. I think that it is highly undesirable to adopt an interpretation in which they can have arbitrary additional inputs depending on the context in which they are used.
If they didn't depend on the context in which they are used, they wouldn't need to be host objects, right? The whole point of the host object is that it knows things about the host (what mode it was loaded in, what privileges the context offers, what the user's preferences are) which aren't within the scope of the language proper.
I think David-Sarah may have overstated his case when he used the phrase "arbitrary additional inputs depending on the context in which they are used". Of course host objects interact with the host environment, and have access to various kinds of state like privileges and preferences.
There's one specific kind of contextual information that's being looked at askance here: knowledge of the expression surrounding the call that invoked you. Perl lets subroutines check what sort of value their caller is expecting; that hasn't aged well.
It seems to me that ES5 is not capable of forbidding such behavior. ES5 can't forbid implementations from providing JS debugging APIs, nor forbid such APIs from providing functions that inspect a call, nor forbid host objects from using such a function.
One could characterize the difference by saying that Mozilla has "reluctant properties" whereas WebKit has "reluctant values". :)
In other words, in WebKit, 'document.all' has a value --- a value that
can be assigned to other variables, stored in data structures, and so on
without changing its behavior --- but which is hard to get a grip on.
Whereas, in Mozilla, 'document' sort-of-has and sort-of-doesn't-have a
property named 'all', depending on how you look at it.
It could just be organizational bias, but reluctant properties strike me as the more bounded form of insanity.
On Wed, Oct 14, 2009 at 7:40 PM, Jim Blandy <jimb at mozilla.com> wrote:
There's one specific kind of contextual information that's being looked at askance here: knowledge of the expression surrounding the call that invoked you. Perl lets subroutines check what sort of value their caller is expecting; that hasn't aged well.
Our implementation of String.prototype.match checks the context in which it's called, to see if it need bother with the expense of constructing the result array (it needn't, if the match call is being used simply as a test, which isn't unheard of on the web). That optimization aged pretty well, and indeed benchmarks often encourage such context-sensitivity.
Mike
On Oct 14, 2009, at 5:04 PM, Jim Blandy wrote:
One could characterize the difference by saying that Mozilla has
"reluctant properties" whereas WebKit has "reluctant values". :)
In other words, in WebKit, 'document.all' has a value --- a value
that can be assigned to other variables, stored in data structures,
and so on without changing its behavior --- but which is hard to get
a grip on.
I'm not sure "reluctant value" is a good way to summarize the
behavior. The term we use is "object that masquerades as undefined".
The value returned by document.all has a small set of behaviors that
are exhibited by the undefined value, but are not allowed by the spec
to values of object type.
Whereas, in Mozilla, 'document' sort-of-has and sort-of-doesn't-have
a property named 'all', depending on how you look at it.
It could just be organizational bias, but reluctant properties
strike me as the more bounded form of insanity.
Before you conclude that, let's consider the impact of host object
properties that return different values based on their syntactic
context. From the informal operational semantics given for ECMAScript
syntactic constructs, one might conclude that certain source-to-source
program transforms are sound, in the sense that they cannot possibly
alter the behavior of the program. Consider these two functions. I
will use %EXPR% as a metasyntactic variable, indicating any valid
JavaScript expression that can appear as the right-hand side of an
assignment:
    function testFunc() {
        var tmp;
        function nested(e) { tmp = e; }
        nested(%EXPR%);
        return tmp;
    }

    function testFunc() {
        var tmp = %EXPR%;
        return tmp;
    }
You might think this is a valid transformation no matter what %EXPR%
is, even if it involves host objects. But this transformation is
unsound in Mozilla, for example if %EXPR% is document.all. In that
case, the first function will return the all collection and the second
will return undefined. Now, you might think this is kind of obscure,
and not a practically important transform. Sure, source-to-source
transformations could in theory factor out unneeded closures, but
would they really? However, consider this pair of functions:
function testFunc() { var tmp = %EXPR%; return tmp; }
function testFunc() { return %EXPR%; }
You might think these are surely equivalent in all respects.
Converting in either direction seems like a really basic transform,
something that many ECMAScript program rewriters are likely to do. But
again, converting in either direction would be unsound for Mozilla, if
%EXPR% is document.all. The first function would return undefined, the
second the all collection.
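A toy model of this second pair, as a sketch only. It assumes a hypothetical lookup(name, isDetectingAccess) hook in place of whatever the engine really does; hostDocument, allCollection, and the flag are illustrative names, not SpiderMonkey's API. It treats the right-hand side of an assignment as a detecting access, which is the behavior reported for Firefox later in this thread:

```javascript
// Stand-in for the real collection; contents are irrelevant here.
const allCollection = Object.freeze({ length: 0 });

// Hypothetical host object: the (assumed) flag tells it whether the
// surrounding syntax only "detects" the property or actually fetches it.
const hostDocument = {
  lookup(name, isDetectingAccess) {
    if (name !== "all") return undefined;
    return isDetectingAccess ? undefined : allCollection;
  },
};

// Models: function testFunc() { var tmp = document.all; return tmp; }
function viaTemporary(doc) {
  // Assignment right-hand side: modeled as a detecting access.
  const tmp = doc.lookup("all", true);
  return tmp;
}

// Models: function testFunc() { return document.all; }
function viaReturn(doc) {
  // Operand of return: modeled as an ordinary value fetch.
  return doc.lookup("all", false);
}

const viaTemporaryResult = viaTemporary(hostDocument);
const viaReturnResult = viaReturn(hostDocument);

console.log(viaTemporaryResult === undefined);  // true
console.log(viaReturnResult === allCollection); // true
```

The two functions differ only by a temporary variable, yet the model returns different values, which is exactly why a rewriter cannot treat the pair as interchangeable.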
Going beyond what Mozilla does, let's say host object property or
method access can vary in arbitrary ways based on syntactic context.
Then no source-to-source transform is sound, not even changing
whitespace or stripping comments, unless you know the behavior of all
host objects the code will deal with.
It's true that an object which ToBoolean() converts as false violates
some assumptions, as does an object that compares == to undefined or
null. There are some otherwise valid identities for a value known to
be an object that this would break (though that knowledge would have
to come in some way other than 'typeof'). But I would claim this is a
more bounded form of insanity than the idea of expressions that have
different values depending on their surrounding source context. The
former requires just a limited number of additional permitted host
object exceptions. Tools that understand JavaScript object semantics
would just need a finite set of changes to consider the possibility of
objects that masquerade as undefined. The latter, if truly allowed by
the spec, makes source-to-source transformers, even something as
simple as a pretty-printer, potentially unsound. That seems like a
much less bounded form of insanity.
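A rough sketch of why the masquerading approach needs only a finite set of exceptions. It is a toy model, not WebKit's implementation: the WeakSet marker and the toy* helpers are hypothetical stand-ins for the engine's internal ToBoolean, typeof, and == handling, and the choice of exactly these three operations is an assumption drawn from the paragraph above.

```javascript
// Hypothetical marker for objects that masquerade as undefined.
const masqueraders = new WeakSet();

function makeMasquerader(target) {
  masqueraders.add(target);
  return target;
}

// Toy versions of the operations that consult the marker.
// WeakSet.prototype.has returns false for non-object values.
function toyToBoolean(value) {
  return masqueraders.has(value) ? false : Boolean(value);
}

function toyTypeof(value) {
  return masqueraders.has(value) ? "undefined" : typeof value;
}

function toyLooseEqualsUndefined(value) {
  return masqueraders.has(value) || value == undefined;
}

const all = makeMasquerader({ length: 0 });

// Copy the value through a data structure: its behavior travels with it.
const stored = [all][0];

const checks = {
  truthy: toyToBoolean(stored),                      // false
  type: toyTypeof(stored),                           // "undefined"
  equalsUndefined: toyLooseEqualsUndefined(stored),  // true
  stillUsable: stored.length === 0,                  // true
};
console.log(checks);
```

Because the oddity lives in the value rather than in the syntax around it, moving the value into a variable or an array leaves every check unchanged.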
(It's been raised that debugging APIs may have behavior that depends
on the calling context. That may be true, but exposing debugging APIs
directly to normal code would violate important assumptions. For
example, the spec was tweaked to prevent exposing strict callers to
their non-strict callees, but it's commonplace for debugging APIs to
expose the full call stack. So I don't think the existence of
debugging APIs is a good argument that calling-context-sensitive host
objects are permissible in general. Also, looking at calling context
for the sake of performance optimizations does not create these kinds
of problems if observable behavior remains the same, so it's not a
helpful analogy.)
In any case, both the Mozilla and WebKit solutions for undetectable
document.all were pragmatic approaches to a problem with no perfect
solutions. Each has its downsides. I just wanted to explain some of
the less obvious consequences of Mozilla's approach, and of this kind
of mechanism in general.
Maciej
On Wed, Oct 14, 2009 at 7:04 PM, Jim Blandy <jimb at mozilla.com> wrote:
One could characterize the difference by saying that Mozilla has "reluctant properties" whereas WebKit has "reluctant values". :) [...] It could just be organizational bias, but reluctant properties strike me as the more bounded form of insanity.
I agree with Maciej. To summarize his point:
<script>
alert(!function () { return document.all; }());
alert(!document.all);
</script>
Two different results in Firefox. Beta reduction is broken in JavaScript.
(It's broken anyway for expressions containing 'arguments' or 'this', but those are statically visible.)
"Reluctant properties" are, I think, impossible to specify sanely. If they can be specified, I'm still not sure they can be implemented correctly. Our implementation looks at the bytecode; I wouldn't wager that the correspondence between that and the original syntactic context is 100% sound. I sort of doubt that everyone who touches the compiler is even aware of the constraint.
On 10/14/2009 06:36 PM, Mike Shaver wrote:
Our implementation of String.prototype.match checks the context in which it's called, to see if it need bother with the expense of constructing the result array (it needn't, if the match call is being used simply as a test, which isn't unheard of on the web). That optimization aged pretty well, and indeed benchmarks often encourage such context-sensitivity.
It aged well because it obeys the fundamental principle of optimization: "Lying is fine as long as you don't get caught." You haven't changed the meaning of the language. document.all is in a different category, because it's observable.
On 10/15/2009 07:23 AM, Maciej Stachowiak wrote:
The latter, if truly allowed by the spec, makes source-to-source transformers, even something as simple as a pretty-printer, potentially unsound. That seems like a much less bounded form of insanity.
I think this point is well-taken.
In the case of 'eval', ES5 requires an implementation to inspect the context of the call. A direct call to eval runs the code in the call's environment; indirect calls run in the global environment. This makes eval into a pseudo-syntactic form: really, expressions of the form 'eval(...)' are special to the compiler, regardless of eval's binding.
The way Mozilla treats 'document.all' seems analogous.
(It's been raised that debugging APIs may have behavior that depends on the calling context. That may be true, but exposing debugging APIs directly to normal code would violate important assumptions.
Well, my point there was more that approaching the question in terms of whether a given behavior is permitted by the spec doesn't advance the conversation much. For native objects, the spec is powerless to forbid truly horrible things; it's too low a bar.
On 10/15/2009 09:29 AM, Jason Orendorff wrote:
I sort of doubt that everyone who touches the compiler is even aware of the constraint.
/me tries to look inconspicuous
-----Original Message----- From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Jim Blandy ...
In the case of 'eval', ES5 requires an implementation to inspect the context of the call. A direct call to eval runs the code in the call's environment; indirect calls run in the global environment. This makes eval into a pseudo-syntactic form: really, expressions of the form 'eval(...)' are special to the compiler, regardless of eval's binding.
Correct, "indirect eval" is a normal function call while direct eval is essentially a syntactic form and need not actually call the global eval function. This was done to avoid any implication that the global eval function would need to have a mechanism for inspecting its calling context.
The way Mozilla treats 'document.all' seems analogous.
Is the Mozilla document.all optimization contingent upon the occurrence of the text "document.all"? What happens for: var docAll = document.all; if (docAll) alert("IE"); else alert("not IE");
It's the use of the literal name "eval" that makes direct eval a pseudo-syntactic form.
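A minimal sketch of the direct/indirect distinction described above. The names scopeProbe, callDirect, and callIndirect are made up for illustration; the probe is placed on the global object so that code evaluated in the global environment can resolve it.

```javascript
// Illustrative global binding that indirect eval will find.
globalThis.scopeProbe = "global";

function callDirect() {
  var scopeProbe = "local";
  // Direct eval: the call is written with the literal name `eval`,
  // so the code runs in this function's environment.
  return eval("scopeProbe");
}

function callIndirect() {
  var scopeProbe = "local";
  // Indirect eval: the same function reached through another name,
  // so the code runs in the global environment.
  var indirectEval = eval;
  return indirectEval("scopeProbe");
}

const directResult = callDirect();
const indirectResult = callIndirect();

console.log(directResult);   // "local"
console.log(indirectResult); // "global"
```

Renaming the callee is enough to change the result, which is the sense in which the spelling 'eval(...)' acts as syntax and not as an ordinary call.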
On Thu, Oct 15, 2009 at 1:47 PM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:
Is the Mozilla document.all optimization contingent upon the occurrence of the text "document.all"?
No, but it's contingent on the property lookup being the truthiness-testing expression. (I wouldn't call it an optimization; if anything it adds some cost to certain paths.)
What happens for: var docAll = document.all; if (docAll) alert("IE"); else alert("not IE");
"IE"
var doc = document; if (doc.all) alert("IE"); else alert("not IE");
"not IE"
It was something that was intentionally targeted at the common case of the simple feature test, which is almost always "document.all". This is done by the host object implementation, which is given access to limited context ("is this access one that's being used to detect the presence of a property, versus fetching the value?").
Mike
On Oct 15, 2009, at 10:54 AM, Mike Shaver wrote:
On Thu, Oct 15, 2009 at 1:47 PM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:
Is the Mozilla document.all optimization contingent upon the
occurrence of the text "document.all"?
No, but it's contingent on the property lookup being the truthiness-testing expression. (I wouldn't call it an optimization; if anything it adds some cost to certain paths.)
What happens for: var docAll = document.all; if (docAll) alert("IE"); else alert("not IE");
"IE"
Just as a minor point of technical correction - this will actually
alert "not IE" in Firefox because the right-hand side of an assignment
is considered a detecting access. (Just tested to confirm.)
On Thu, Oct 15, 2009 at 2:29 PM, Maciej Stachowiak <mjs at apple.com> wrote:
Just as a minor point of technical correction - this will actually alert "not IE" in Firefox because the right-hand side of an assignment is considered a detecting access. (Just tested to confirm.)
Thank you! I see that I wrote the test backwards when I tested here...
Mike
On Oct 15, 2009, at 11:31 AM, Mike Shaver wrote:
On Thu, Oct 15, 2009 at 2:29 PM, Maciej Stachowiak <mjs at apple.com> wrote:
Just as a minor point of technical correction - this will actually alert "not IE" in Firefox because the right-hand side of an assignment is considered a detecting access. (Just tested to confirm.)
Thank you! I see that I wrote the test backwards when I tested here...
I cited the relevant bugzilla.mozilla.org bugs recently in public-html:
bugzilla.mozilla.org/show_bug.cgi?id=259935#c0
is the place where assignment to a variable came up; it links to:
us.js1.yimg.com/us.yimg.com/lib/g/ylib_dom.js
See also:
bugzilla.mozilla.org/show_bug.cgi?id=253150#c4, bugzilla.mozilla.org/show_bug.cgi?id=253150#c12, bugzilla.mozilla.org/attachment.cgi?id=154617
The masquerades-as-undefined approach works equally well on such
sites, and it works better in theory, except for the nastiness of
violating ECMA-262 rules, which matters not only for conformance
purity but also for any implementations modeled on the spec in ways
that make it a chore to support masquerades-as-undefined.
Anyway, I wanted to cite the bugzilla links for anyone interested in
the history.
-----Original Message----- From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Maciej Stachowiak Sent: Thursday, October 15, 2009 7:23 AM ... On Oct 14, 2009, at 5:04 PM, Jim Blandy wrote:
...
It could just be organizational bias, but reluctant properties strike me as the more bounded form of insanity.
Before you conclude that, let's consider the impact of host object properties that return different values based on their syntactic context. From the informal operational semantics given for ECMAScript syntactic constructs, one might conclude that certain source-to-source program transforms are sound, in the sense that they cannot possibly alter the behavior of the program. Consider these two functions. I will use %EXPR% as a metasyntactic variable, indicating any valid JavaScript expression that can appear as the right-hand side of an assignment:
...
Maciej's thought experiment touches upon the fundamental evil of host objects. In the presence of host objects there is no firm foundation for understanding the semantics of an ECMAScript program. Adding some additional restrictions on host objects only reduces the insanity but doesn't eliminate it. That's why some of us would like to see ECMAScript semantics expanded so that all of the essential characteristics of Web APIs can be directly expressed in the ECMAScript language without having to use the host object loophole (and would like to set any Web API features that can't be supported with rational ES features on the deprecation path).
The argument about which document.all anti-detection semantics most closely complies with the standard's host object exemptions is pointless. The same scenario could just as easily have come about because IE (or any other browser) added a useful native object and some large group of users adopted it as a form of browser detection. You'd have the same problem and probably would have arrived at the same solutions, but wouldn't have the host object exemptions to use as a defense for the hack.
Death to host objects!!
On 10/15/2009 02:18 PM, Allen Wirfs-Brock wrote:
Maciej's thought experiment touches upon the fundamental evil of host objects. In the presence of host objects there is no firm foundation for understanding the semantics of an ECMAScript program. Adding some additional restrictions on host objects only reduces the insanity but doesn't eliminate it. That's why some of us would like to see ECMAScript semantics expanded so that all of the essential characteristics of Web APIs can be directly expressed in the ECMAScript language without having to use the host object loophole (and would like to set any Web API features that can't be supported with rational ES features on the deprecation path).
It's not clear to me how this would win.
Suppose the language as standardized provided ways to:
- define reluctant properties or values that masquerade as undefined, for 'document.all' emulation,
- define catch-all getters and setters sufficient to implement the cases recently brought up in the discussions about WebIDL (I forget the specifics),
- define "split objects" or something similar, to get a global object that behaves the way the web requires, and
- do other wretched but necessary things I haven't learned about yet.
If these features were in the standard, then people would use them in new code. This seems like a recipe for maximizing the spread of these very odd semantics.
And since it's been demonstrated that implementers are willing to introduce contra-standard behavior into their ES engines to be compatible with the de facto web with reasonable performance, I don't understand the argument that delimiting the existing kludges in the standard would somehow constrain future insanity. It never has before.
That is, a language standard alone isn't strong enough socially to contain the malaise.
What might be strong enough is a public discussion process that is responsive enough for innovators but inclusive enough to give ideas a broad vetting. Obviously, not everything can be standardized before anyone implements it, but there should be a public rough vetting process that a reasonable participant can plausibly believe will improve his/her designs. I'm assuming that, if document.all had been discussed before a broader audience, its problems could have been avoided.
If I'm debating straw man arguments, that wasn't my intention. I may have simply misunderstood what people meant; I'm certainly ignorant of the history. But it seems to me that public discussion, even of leading-edge ideas, is the only way to win.
Brendan Eich wrote:
I agree with Maciej. The implementation-defined operations have clear specifications of their parameters. I think that it is highly undesirable to adopt an interpretation in which they can have arbitrary additional inputs depending on the context in which they are used.