ES3.1 questions and issues
On Mar 18, 2009, at 00:43, Mark S. Miller wrote:
- For the identifiers we are already confident will become keywords as of ES-Harmony -- "const" and "let", perhaps "yield" and "lambda" -- would it make sense to make these keywords now in strict code?
I think that would be a very good idea.
Tobie
Some thoughts below
-----Original Message----- From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Mark S. Miller Sent: Tuesday, March 17, 2009 4:43 PM To: es3.x-discuss at mozilla.org; es-discuss Subject: ES3.1 questions and issues
Some ES3.1 questions and issues that some Googlers have asked me to relay:
- At the end of 7.8.5 is the sentence "If the call to new RegExp would generate an error, the error must be reported while scanning the program." Presumably, the intention of this sentence applies only to syntax errors. For example, if the call to new RegExp would generate an out of memory error, we should not imply that this must instead be reported at scanning time.
It presumably means syntax errors and any other static semantic errors that can be detected by statically examining the regular expression text. (I'm not sure whether there are any of the latter.) The above sentence should probably be restated as: "If the call to new RegExp would generate any of the errors specified in 15.10.4.1, the errors must be reported while scanning the program."
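To make the intended distinction concrete, a minimal sketch (assuming the usual literal-versus-constructor semantics):

    // A regular expression *literal* that violates the grammar is an
    // early error: it must be reported while the Program is scanned,
    // before any of its code runs.
    //
    //   var r = /(/;            // SyntaxError at scan time (unbalanced group)
    //
    // The constructor form is an ordinary runtime evaluation, so the
    // same pattern fails only when the call actually executes:
    try {
      var r = new RegExp("(");   // SyntaxError thrown here, at runtime
    } catch (e) {
      // reachable: this is a normal, catchable runtime exception
    }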
- There is general confusion about when "scanning" is, i.e., what temporal distinction is being made here. When I tried to clarify, I found I may be confused as well. Can an implementation postpone scanning a top level function until it is called? Until after earlier top-level definitions and/or statements in its Program have been evaluated?
The "scanning" terminology is a carryover from previous editions. It think a reasonable interpretation of the intended meaning of "reported while scanning" is that all such errors for any particular ECMAScript /Program/ (section 14 meaning) must be reported before any code of that /Program/ is executed. I'm not sure where such a definition should go in the specification.
- What about logical impossibilities like "/$.^/"? May an implementation reject this at scan time? Must it? If not, then what about during construction at runtime?
Only if it is explicitly defined as an error somewhere in the spec. Of course, an implementation would be free to use any available mechanism to issue a non-fatal warning.
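For concreteness, a sketch of the case in question:

    // /$.^/ is well-formed under the grammar, so no early error applies;
    // it is merely unsatisfiable: $ and ^ match positions, while . must
    // consume a (non-line-terminator) character between them.
    var never = /$.^/;
    never.test("anything");   // false for every input string
    // An implementation may warn about this, but the spec does not
    // require a scan-time rejection.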
- I do not know that the verbal agreement (from the ES3.1 phone calls) that I summarize below is adequately captured by the current draft spec:
If a host object provides its own internal [[GetOwnProperty]] method, the mutability implied by the descriptors it returns must be an upper bound on the possible mutations of the described property. For example, if an accessor (getter/setter) property is described as a data property where the getter may return different values over time, then the [[Writable]] attribute must be true. If the attributes may change over time or if the property might disappear, then the [[Configurable]] attribute must be true. If [[Writable]] and [[Configurable]] are both false and the property is described as a data property, then the host is effectively promising that the gotten value will be stable and may validly be cached.
Is this guarantee adequately captured? Where?
It isn't captured. Now that Mark has written the words, I can add them to section 8.6.2.
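To show the consumer-side implication, a minimal sketch (assuming the draft's Object.getOwnPropertyDescriptor; hostObj stands in for any host object with its own [[GetOwnProperty]]):

    // If a host object reports a non-writable, non-configurable data
    // property, the agreement above says its value may validly be cached.
    function stableValue(hostObj, name) {
      var desc = Object.getOwnPropertyDescriptor(hostObj, name);
      if (desc && "value" in desc && !desc.writable && !desc.configurable) {
        return { cacheable: true, value: desc.value }; // stable, per the agreement
      }
      return { cacheable: false };                     // may change; don't cache
    }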
- Annex B lists some aspects of consensus JavaScript as implemented by some major browsers but purposely omitted from the normative spec. However, other aspects of consensus JavaScript are not listed in Annex B. Should they be? __defineGetter__, __defineSetter__, __proto__, nested function definitions, const, non-strict arguments.caller, <function>.caller, <function>.arguments. And now that <function>.toString()'s behavior has reverted to unspecified, do we want to suggest particular behavior in Annex B?
Annex B is a carryover from previous editions. It is my impression (Brendan?) that the functions in Annex B are either ones that existed in Editions 1 and/or 2 but are not in Edition 3, or functions that were implemented in IE and Mozilla prior to initial standardization. Personally, I don't think we should be adding to this section except for items that are actually removed from an Edition of the specification, and I don't think we have any for ES3.1. The things listed above either have better alternatives provided by ES3.1, are implemented by various implementations with irreconcilable differences (so which version would we describe?), do not have consensus regarding their semantics or impact on the existing language, or are hazards that we don't want to encourage. I think any one of these concerns is enough to not include something.
If we had agreement on what <function>.toString should do, we would put it into the spec, not the Annex.
- Array.prototype.sort was a particularly tricky thing to specify, since we do not wish to specify which sorting algorithm an implementation must use, even though the resulting differences are observable by side-effecting comparefn, getters, and setters. Were we to specify any normal sorting algorithm in our normal pseudo-code algorithmic style, the sort() function itself would [[Get]] "length", make some set of [[Get]]s, [[Put]]s, and [[Delete]]s on numerically named properties between 0 and length-1, where values [[Put]] would only be values obtained from previous [[Get]]s, and would make some set of calls to comparefn, using only values obtained from previous [[Get]]s. Each of these operations may behave badly in a number of ways, in which case the specified sort routine will fail to sort. But the only side effects sort would cause would be those brought about by calling these other operations. If any of these operations throw, I would expect sort() not to intercept it but to let it propagate, terminating the sort().
Notice that I said all that without naming a specific sort algorithm. Would it be reasonable to tighten the sort spec, including the sort failure spec, in this way?
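As a small illustration of why the algorithm itself resists specification:

    // A side-effecting comparefn observes every comparison, and
    // different (equally conforming) sorting algorithms compare
    // different pairs in different orders.
    var calls = [];
    [3, 1, 2].sort(function (a, b) {
      calls.push([a, b]);   // side effect: record each comparison
      return a - b;
    });
    // "calls" differs between, say, insertion sort and mergesort, yet
    // both yield [1, 2, 3]. Only the [[Get]]/[[Put]] bounds described
    // above are common to every conforming choice.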
In all other Array.prototype functions (and some other places) where similar possibilities exist, I have a situationally appropriate variation of a sentence such as "The final state of O is unspecified if in the above algorithm any call to the [[ThrowingPut]] or [[Delete]] internal method of O throws an exception." I didn't add one for sort because I thought there were already enough caveats in the sort specification to cover it. Among other things, it says that sort behavior is implementation-defined "If any property of this array whose property name is a nonnegative integer less than len is an accessor property."
The exception propagation issue applies to any operation that might throw that is used in the specification of any section 15 function. I think what is supposed to happen is reasonably covered in the last paragraph of 5.2.
IIUC, [[Delete]] is only included so that a sort algorithm can pack an array with holes in it if it wishes. Is this behavior important? Could we instead tighten the spec to add a requirement that the supposed array be packed? What does sort() do on existing browsers on arrays with holes?
Many of the other Array.prototype functions also explicitly deal with holes in a well-specified manner. While we might debate the utility of this, I don't see any good reason to deprecate it at this time.
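A sketch of the hole behavior in question (assuming the draft's rule that undefined values sort to the end, followed by holes):

    var a = [3, , 1];   // a hole at index 1
    a.sort();
    // Under the draft: a[0] === 1, a[1] === 3, and the hole is packed
    // to the end, so (2 in a) === false -- re-creating it there is
    // exactly what [[Delete]] is for.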
- For the identifiers we are already confident will become keywords as of ES-Harmony -- "const" and "let", perhaps "yield" and "lambda" -- would it make sense to make these keywords now in strict code?
"The best laid plans of mice and men..." I'm not a big fan of this approach. Just look at how much mileage we've gotten out of the future reserved words :-)
On Tue, Mar 17, 2009 at 6:56 PM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:
In all other Array.prototype functions (and some other places) where similar possibilities exist, I have a situationally appropriate variation of a sentence such as "The final state of O is unspecified if in the above algorithm any call to the [[ThrowingPut]] or [[Delete]] internal method of O throws an exception."
Oh. I see that now. Searching, I see it in Array.prototype.{pop,push,reverse,shift,splice,unshift,map}
I could find no other examples.
I think the inclusion of "map" is a mistake. Map does not mutate O.
For the others, I think "unspecified" is way too broad.
- These methods must not modify any frozen properties.
- The only values they may write into these properties should be those that might have been written had the specified algorithm been followed up to the point that the error was thrown. Otherwise, a conforming implementation could react to a failure to pop by writing the global object or your password into the array.
- Is there any reason for this looseness at all? If you simply leave out this qualifier, then the reading of these algorithms consistent with the rest of the spec is that the side effects that happened up to the point of the exception remain, while no further side effects happen. That assumption is pervasive across all other algorithmic descriptions in the spec. I think we should just drop this qualifier.
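To see what the consistent reading gives you, a minimal sketch (assuming the draft's Object.defineProperty and the throwing [[Put]] used by push):

    var o = { length: 0 };
    Object.defineProperty(o, "1", { value: "fixed", writable: false });
    try {
      Array.prototype.push.call(o, "a", "b");
    } catch (e) {
      // The write of "a" to index 0 succeeded before the write to the
      // non-writable index 1 threw. Under the consistent reading, that
      // side effect remains (o[0] === "a") and nothing further happened
      // (o.length is still 0, because push updates length last).
    }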
On Tue, Mar 17, 2009 at 6:56 PM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:
- For the identifiers we are already confident will become keywords as of ES-Harmony -- "const" and "let", perhaps "yield" and "lambda" -- would it make sense to make these keywords now in strict code?
"The best laid plans of mice and men..." I'm not a big fan of this approach. Just look at how much mileage we've gotten out of the future reserved words :-)
Actually, if we look at the list in 7.5.3, at the present time, I would guess that
export interface class const import public
will be used by Harmony. Hedging bets, I remain glad that
abstract static extends super final native package implements
are reserved as well. The remainder
boolean long byte synchronized
char float throws goto private transient
protected volatile double enum int short
I am less keen on. But neither has their reservation been terribly costly. In my judgment, the mileage of the future reserved words approach has been worth its minor cost. YMMV!
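To put the cost in concrete terms, a sketch of what reserving these in strict code would mean:

    "use strict";
    var items = ["a", "b"];
    // var let = items.length;   // would become a SyntaxError if "let"
    //                           // were reserved in strict code
    var count = items.length;    // the fix is simply a rename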
You're correct about map. It had previously used [[ThrowingPut]] rather than [[DefineOwnProperty]].
Over-specification of non-essential requirements is not necessarily beneficial.
I'm reluctant to over-specify these algorithms for these error cases. As section 5.2 says, an implementation may choose to use more efficient algorithms, and the necessity to exactly match a complex state in an error situation makes it much more difficult to create more efficient algorithms. I would sooner have non-error cases run faster than have precisely specified results for improbable error conditions such as applying these functions to an array where some arbitrary set of elements are non-writable or non-deletable. It actually bothers me that the algorithms imply a particular sequence of element accesses and hence imply a particular order of side effects if any accessed elements are accessor properties. It might actually be better if the other array algorithms said something similar to sort's "Perform an implementation-dependent sequence of calls to the [[Get]], [[ThrowingPut]], and [[Delete]]". Nobody should intentionally write code that depends upon the order of side effects of these operations.
Saying the result is unspecified is not an open invitation for implementations to expose the user's password or to do anything else that would not normally be part of one of these algorithms. No rational implementation is going to do something like that. "Unspecified" is more a warning to programmers that if an exception is thrown they should not depend upon the state of the involved objects.
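For concreteness, that ordering is directly observable wherever accessor properties exist (a minimal sketch, assuming the draft's Object.defineProperty):

    var log = [];
    var a = [0, 1, 2, 3];
    Object.defineProperty(a, "1", {
      get: function () { log.push("get 1"); return 1; },
      set: function (v) { log.push("set 1=" + v); },
      configurable: true
    });
    a.reverse();
    // "log" records exactly when reverse() read and wrote index 1,
    // exposing the algorithm's internal ordering -- which is why the
    // step-by-step algorithms imply more than we may intend.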
There's a huge benefit to specifying these keywords now: guaranteeing future compliance of ES 3.1 strict programs. The cost, on the other hand, is fairly small: pick a new variable or function name.
It's always harder to fix these issues in widely distributed programs.
Some of the challenges faced by the Prototype JavaScript Framework involve certain methods specified in ES 3.1 (notably the new Array.prototype methods) which were implemented in Prototype prior to their specification. Doing so causes significant backward-compatibility issues and is best avoided.
OT: For what it's worth, that is also why I would find it useful to specify an Object.prototype.toDebugString method for ES 3.1, even if the specification is loose. It would just prevent programs from accidentally shadowing it.
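A sketch of the shadowing hazard (toDebugString here is the proposed method, not in any current draft):

    // Without a reserved spot in the spec, authors will use the name
    // for their own purposes...
    var point = {
      x: 1,
      y: 2,
      // ...and this innocently shadows any future
      // Object.prototype.toDebugString for this object:
      toDebugString: function () { return "(" + this.x + ", " + this.y + ")"; }
    };
    // Specifying the method now, even loosely, warns authors off the name.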
On Tue, Mar 17, 2009 at 10:43 PM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:
You're correct about map. It had previously used [[ThrowingPut]] rather than [[DefineOwnProperty]].
Actually, that's not the issue. Even when map did [[ThrowingPut]], it was (and is) mutating the newly constructed array, not the object it is iterating over. In the case of exceptional exit, the partially constructed result becomes unreachable, and so is unobservable.
Over-specification of non-essential requirements is not necessarily beneficial.
Certainly. By definition. Bad things are bad. The questions are: how much specification is "over" and which requirements are non-essential?
Saying the result is unspecified is not an open invitation for implementations to expose the user's password or to do anything else that would not normally be part of one of these algorithms. No rational implementation is going to do something like that. "Unspecified" is more a warning to programmers that if an exception is thrown they should not depend upon the state of the involved objects.
Conventional developers seek only functionality, and stay away from edge conditions. Attackers seek opportunities in edge conditions. So defenders must reason about the limits on the damage that might be caused by these edge conditions.
Put another way, conventional developers must code to the intersection semantics of the platforms in question, since a correct program must work across all these platforms. Attackers can seek opportunities in the union semantics, since an attack that works on any platform is still a successful attack. More deterministic specs narrow the gap between these two.
So, in attempting to reason about the security of Caja, ADsafe, WebSandbox, FBJS2, or Jacaranda, we must find some precise codification of your "No rational implementation is going to do something like that" and pray that we got it right. If defenders and implementers read slightly different things into your "something like that", holes will happen. Better to codify this in the spec, as that's what the spec is for: an agreed common understanding to serve as a coordination point for implementers, developers, attackers, and defenders.
-----Original Message----- From: Tobie Langel [mailto:tobie.langel at gmail.com] ... Some of the challenges faced by the Prototype JavaScript Framework involve certain methods specified in ES 3.1 (notably the new Array.prototype methods) which were implemented in Prototype prior to their specification. Doing so causes significant backward-compatibility issues and is best avoided.
I'd be interested in more details on how the new Array.prototype methods cause you problems. With only one or two exceptions, they are the same as the Mozilla "Array extras" methods, so you must already have to deal with the possibility that they are present in some browsers. The new methods are all writable, so if somebody wants to unconditionally install their own replacements they can. We are certainly trying to minimize the negative impacts of these sorts of additions, so if our mitigations are not sufficient it would be very useful to understand why.
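The mitigation described is the familiar install-if-absent pattern (a minimal sketch):

    // Because the new methods are writable, a framework can supply its
    // own version where the built-in is missing, or overwrite it
    // unconditionally if it needs different semantics.
    if (!Array.prototype.map) {
      Array.prototype.map = function (fn, thisArg) {
        var len = this.length >>> 0;
        var out = new Array(len);
        for (var i = 0; i < len; i++) {
          if (i in this) out[i] = fn.call(thisArg, this[i], i, this);
        }
        return out;
      };
    }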
Would it be useful if there were a way to determine whether or not a function was "built-in"?
-----Original Message----- From: Mark S. Miller [mailto:erights at google.com] Sent: Wednesday, March 18, 2009 9:13 AM To: Allen Wirfs-Brock ... So, in attempting to reason about the security of Caja, ADsafe, WebSandbox, FBJS2, or Jacaranda, we must find some precise codification of your "No rational implementation is going to do something like that" and pray that we got it right. If defenders and implementers read slightly different things into your "something like that", holes will happen. Better to codify this in the spec, as that's what the spec is for: an agreed common understanding to serve as a coordination point for implementers, developers, attackers, and defenders.
First of all, implementers, defenders, and everybody else will always read slightly different things into any specification. If you want perfectly identical behavior then you don't want a standard; instead you want a single universally used implementation. That has its own problems---the word "monoculture" comes to mind...
Like all engineering, building a good JavaScript implementation is a matter of making trade-offs among multiple dimensions of requirements and objectives. Security is only one of these dimensions. Implementers must determine, in the context of their overall objectives and practical limitations, the appropriate balance between security, performance, robustness, features, etc. If a standard overspecifies requirements along any of these dimensions, those requirements are likely to simply be ignored by implementations and hence are self-defeating from a standards perspective.
On Mar 18, 2009, at 17:20, Allen Wirfs-Brock wrote:
Would it be useful if there were a way to determine whether or not a function was "built-in"?
That would be useful outside of the security concerns expressed by Mark. I'm thinking about performance and robustness concerns in JS libraries, for example.
On Wed, Mar 18, 2009 at 9:38 AM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:
First of all, implementers, defenders, and everybody else will always read slightly different things into any specification. If you want perfectly identical behavior then you don't want a standard; instead you want a single universally used implementation. That has its own problems---the word "monoculture" comes to mind...
Like all engineering, building a good JavaScript implementation is a matter of making trade-offs among multiple dimensions of requirements and objectives. Security is only one of these dimensions. Implementers must determine, in the context of their overall objectives and practical limitations, the appropriate balance between security, performance, robustness, features, etc. If a standard overspecifies requirements along any of these dimensions, those requirements are likely to simply be ignored by implementations and hence are self-defeating from a standards perspective.
Agreed. I am seeking a good tradeoff that balances these concerns. I am reacting to your
Saying the result is unspecified is not an open invitation for implementations to expose the user's password or to do anything else that would not normally be part of one of these algorithms. No rational implementation is going to do something like that. "Unspecified" is more a warning to programmers that if an exception is thrown they should not depend upon the state of the involved objects.
which does not. It leaves all security concerns out in the cold. A single unqualified "unspecified" in the spec makes the spec useless for reasoning about security. When planning a move in an adversarial game, one must reason not only about what one may do, but also about what one's opponent cannot do.
See again the language I propose for sort. While leaving the choice of sorting algorithm unspecified, I place bounds on the range of side effects it may cause even under uncooperative conditions. If we really wish to leave the precise algorithm unspecified for these others, the same qualifying language could still be applied.
-----Original Message----- From: Mark S. Miller [mailto:erights at google.com] Sent: Wednesday, March 18, 2009 10:27 AM ... See again the language I propose for sort. While leaving the choice of sorting algorithm unspecified, I place bounds on the range of side effects it may cause even under uncooperative conditions. If we really wish to leave the precise algorithm unspecified for these others, the same qualifying language could still be applied.
The only thing that I see in the language in your previous message that may not already be covered is "[if any of the [[Get]], [[Put]], or [[Delete]] operations fail,] the only side effects sort would cause would be those brought about by calling these other operations."
Personally, I think this concept is already implicit in the sort algorithm and most other algorithms in the specification. However, I wouldn't mind making it explicit for the other array methods if we could also make it explicit that implementations may use alternative algorithms that apply [[Get]], [[Put]], or [[Delete]] in possibly different orderings.
Here is my first cut at a generic version of language that might accomplish that:
The above algorithm is intended to describe the result produced by this function in the absence of any error conditions. It is not intended to imply the use of any specific implementation technique. The above algorithm produces its results by using a specific sequence of calls to internal methods and abstract operations. Implementations may use other algorithms that use a different sequencing of these calls as long as an identical result is obtained in the absence of error conditions. If any internal method call invokes a get or set function of an accessor property that has side effects, or if an error occurs, the observable side effects of this function are implementation-defined but are restricted to those that would be produced by some sequence of the internal method and abstract operation calls in the above algorithm.
Excellent! +1! That captures my concern precisely.
On Wed, Mar 18, 2009 at 12:41 PM, John Cowan <cowan at ccil.org> wrote:
All you need to do now is put the above disclaimer in ALL CAPS and add it to each function and method in the standard.
It is not, after all, actually specified anywhere that the result of evaluating 3 + 4 cannot have the side effect of setting the global variable "Ludolf" to 3.141592653.
If your point is that no disclaimer text is needed here at all, as I stated earlier, I would be happy with that. However, the implication would be conformance to the algorithmic spec as written, which Allen objects to as overspecifying. A blanket "anything may happen" disclaimer as in the current draft spec fatally underspecifies. Allen's text here is a nice compromise.
What I'm going to put on the table is that we add the following paragraph to the end of the introduction to section 15:
Functions in this section are generally defined using algorithms that are intended to describe the result produced by each function in the absence of any implicit error conditions. These algorithms are not intended to imply the use of any specific implementation technique. Each algorithm produces its results by using a specific sequence of calls to internal methods and abstract operations. Implementations may use other algorithms that use a different sequencing of these calls as long as an identical result is obtained in the absence of error conditions that are not explicitly handled by the specified algorithm. If any internal method call invokes a get or set function of an accessor property that has side effects, or if an implicit error occurs, the observable side effects of a function are implementation dependent but are restricted to those that would be produced by some sequence of the internal method and abstract operation calls that would be made by the specified algorithm.
We can decide at next week's meeting whether we want to carry through with this change.
Mark Miller wrote:
On Tue, Mar 17, 2009 at 6:56 PM, Allen Wirfs-Brock <Allen.Wirfs-Brock at microsoft.com> wrote:
In all other Array.prototype functions (and some other places) where similar possibilities exist, I have a situationally appropriate variation of a sentence such as "The final state of O is unspecified if in the above algorithm any call to the [[ThrowingPut]] or [[Delete]] internal method of O throws an exception."
Oh. I see that now. Searching, I see it in Array.prototype.{pop,push,reverse,shift,splice,unshift,map}
I could find no other examples.
I think the inclusion of "map" is a mistake. Map does not mutate O.
For the others, I think "unspecified" is way too broad.
- These methods must not modify any frozen properties.
- The only values they may write into these properties should be those that might have been written had the specified algorithm been followed up to the point that the error was thrown. Otherwise, a conforming implementation could react to a failure to pop by writing the global object or your password into the array.
- Is there any reason for this looseness at all? If you simply leave out this qualifier, then the reading of these algorithms consistent with the rest of the spec is that the side effects that happened up to the point of the exception remain, while no further side effects happen. That assumption is pervasive across all other algorithmic descriptions in the spec. I think we should just drop this qualifier.
I agree completely, and particularly with points 1) and 2). There should be very good reasons to make behaviour unspecified or implementation-defined; here there is not.
On Mar 18, 2009, at 11:58 AM, Tobie Langel wrote:
On Mar 18, 2009, at 17:20, Allen Wirfs-Brock wrote:
Would it be useful if there were a way to determine whether or not a function was "built-in"?
That would be useful outside of the security concerns expressed by Mark. I'm thinking about performance and robustness concerns in JS libraries, for example.
Can you give an example?
We're self-hosting native methods that are called built-in by the spec in TraceMonkey, and I know V8 originally self-hosted some Array and String methods. With the right isolation to avoid violating the spec, this is a good thing and it should not be prohibited.
Self-hosting built-ins also should not be the subject of second-guessing versionitis in JS libraries. Shades of coding C idioms for the microarchitecture of certain revisions of the x86!
BTW, by "built-in" I didn't mean to imply anything about how the function was implemented but simply whether or not it was the implementation provided version of a function specified by the standard. It occurred to me that this information might be useful to framework builder as part of a feature detection protocol used to decide whether or not to install framework provided versions of some such methods.
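A sketch of such a protocol; Function.isBuiltIn is hypothetical, purely for illustration:

    // Install a framework's map only when the current one is not the
    // implementation-provided built-in (e.g. it is a stale library
    // version with divergent semantics).
    function installMap(frameworkMap) {
      var current = Array.prototype.map;
      if (!current || !Function.isBuiltIn(current)) {  // hypothetical test
        Array.prototype.map = frameworkMap;
      }
    }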