Strengthening Function.prototype.toString
On Sep 26, 2008, at 12:38 AM, Karl Krukow wrote:
My suggestion is to strengthen the contract to state something like: For any function f and any values x1, x2, ... , xn we have:
eval(f.toString())(x1,x2,...,xn) === f(x1,x2,..., xn).
I would agree with such a requirement, but only in the case where the
eval(f.toString()) is performed in the same scope where f was
originally defined, since toString does not represent the scope chain.
Note however, that this still allows for some quite significant
rewrites of the function body, so it might still not be very
interoperable to rely on parsing and modifying the toString result.
- Maciej
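The scoping caveat above can be illustrated with a minimal sketch. The names definingScope, offset, and addOffset are made up for the illustration. The parentheses around the source text are an assumption of the sketch: they are needed so that the text of an anonymous function parses as an expression.

```javascript
// Hypothetical example: a function with a free variable (`offset`).
function definingScope() {
  var offset = 100;
  var addOffset = function (n) { return n + offset; };
  // Re-evaluated in the scope where the function was defined,
  // so `offset` resolves to the same variable.
  var sameScopeCopy = eval("(" + addOffset.toString() + ")");
  return {
    original: addOffset,
    sameScopeCopy: sameScopeCopy,
    source: addOffset.toString()
  };
}

var made = definingScope();

// Re-evaluated outside the defining scope: `offset` is not visible here.
var elsewhereCopy = eval("(" + made.source + ")");

var sameScopeAgrees = made.sameScopeCopy(1) === made.original(1);

var elsewhereFailed = false;
try {
  elsewhereCopy(1);
} catch (e) {
  elsewhereFailed = e instanceof ReferenceError;
}
```

The copy made in the defining scope agrees with the original, while the copy made elsewhere cannot resolve the free variable at all.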
On 26/09/2008, at 11.13, Maciej Stachowiak wrote:
I would agree with such a requirement, but only in the case where
the eval(f.toString()) is performed in the same scope where f was
originally defined, since toString does not represent the scope chain.
Agreed! The classic variable capture problem ;-)
Note however, that this still allows for some quite significant
rewrites of the function body, so it might still not be very
interoperable to rely on parsing and modifying the toString result.
Yes, indeed the result of toString would be distinct in different
implementations but semantically, I would get the same functions,
which is quite a strong guarantee (I think this was originally the
intended meaning of 'implementation dependent' in ECMAScript 3).
For my example application of partial evaluation, if I know that the
result is parseable as a FunctionDeclaration and I know that semantics
is preserved, then I can go ahead and perform optimizations although
the syntax is different. I would then get different performance gains
on different implementations.
However I would have to move from a form:
(function(a,b) {return a+b;}).specialize({ a: 42})
to
(function(a,b) {return a+b;}).specialize([42])
because
(function(a,b) {return a+b;}).toString()
could rewrite names of local variables and yield e.g.,
"function(v1,v2) {return v1+v2;}"
Thanks for your input,
On Sep 26, 2008, at 3:27 AM, Karl Krukow wrote:
On 26/09/2008, at 11.13, Maciej Stachowiak wrote:
I would agree with such a requirement, but only in the case where the eval(f.toString()) is performed in the same scope where f was originally defined, since toString does not represent the scope chain.
Agreed! The classic variable capture problem ;-)
I prefer to think of it as the classic features of lexical scope and
first-class functions.
If I remember correctly, the reason for keeping it implementation
dependent is so that devices with low memory would not have to keep a
reversible implementation in memory. For example an implementation
might have translated the whole function to machine code.
I know that Opera on mobile phones used to return a string
representation that did not reflect the original.
2008/9/26 Erik Arvidsson <erik.arvidsson at gmail.com>:
I know that Opera on mobile phones used to return a string representation that did not reflect the original.
Yeah. Opera Mobile returned "[ECMAScript code]" or "[ecmascript code]". This was contrary to the ES3 spec (must be parsable as a function definition, IIRC) and also breaks the eval roundtripping by throwing a parse error. Anybody know if those issues have been fixed in more modern versions?
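A script that depends on decompilation could probe for it before relying on it. The following is only a sketch under stated assumptions: sourceRoundTrips and supportsDecompilation are hypothetical helper names, and the two literal strings stand in for the Opera results quoted in this thread.

```javascript
// Hypothetical check: does a source string evaluate to a function
// that behaves like the probe (n -> n + 1)?
function sourceRoundTrips(source) {
  try {
    // Parentheses make the text parse as a function expression.
    var copy = eval("(" + source + ")");
    return typeof copy === "function" && copy(41) === 42;
  } catch (e) {
    // A parse error means the string is not usable source text.
    return false;
  }
}

// Hypothetical feature probe for the current engine.
function supportsDecompilation() {
  var probe = function (n) { return n + 1; };
  return sourceRoundTrips(probe.toString());
}

var onThisEngine = supportsDecompilation();
var operaMobileStyle = sourceRoundTrips("[ECMAScript code]");
var operaHackStyle = sourceRoundTrips("function(){[ecmascript code]}");
```

Both placeholder strings fail the probe because neither parses as a function.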
On Fri, 26 Sep 2008 19:58:27 +0200, liorean <liorean at gmail.com> wrote:
2008/9/26 Erik Arvidsson <erik.arvidsson at gmail.com>:
I know that Opera on mobile phones used to return a string representation that did not reflect the original.
Yeah. Opera Mobile returned "[ECMAScript code]" or "[ecmascript code]". This was contrary to the ES3 spec (must be parsable as a function definition, IIRC) and also breaks the eval roundtripping by throwing a parse error. Anybody know if those issues have been fixed in more modern versions?
No, not consistently across "modern versions". It's not likely to be
properly "fixed" for a while yet. The reason is that on many platforms
where memory is scarce, not enabling JS decompilation helps reduce memory
requirements.
Unfortunately several major libraries (prototype in particular, jQuery to
a minor extent) have started (ab)using decompilation for various purposes.
Pages that rely on this tend to malfunction in Opera on low-memory
devices. Some Opera versions make at least some of these pages work again
by returning "function(){[ecmascript code]}" instead but that's of course
a silly hack..
(It would be great if ES-next could have a look at why these libraries
rely on toString hacks and make it possible to do equivalent things with
nicer language constructs. One of the reasons toString is used is probably
outside the group's scope and caused by typeof bugs in IE: prototype's
isFunction method uses toString in order to reliably detect whether
something is a function reference. Another usage is apparently - from my
memory - to look at the names of the named arguments and do something
magic if the first argument is $super. I'm not up-to-date on whether the
next iteration has an elegant solution for this apparent requirement.)
2008/9/26 Erik Arvidsson <erik.arvidsson at gmail.com>:
I know that Opera on mobile phones used to return a string representation that did not reflect the original.
On Fri, 26 Sep 2008 19:58:27 +0200, liorean <liorean at gmail.com> wrote:
Yeah. Opera Mobile returned "[ECMAScript code]" or "[ecmascript code]". This was contrary to the ES3 spec (must be parsable as a function definition, IIRC) and also breaks the eval roundtripping by throwing a parse error. Anybody know if those issues have been fixed in more modern versions?
2008/9/26 Hallvord R. M. Steen <hallvord at opera.com>:
No, not consistently across "modern versions". It's not likely to be properly "fixed" for a while yet. The reason is that on many platforms where memory is scarce, not enabling JS decompilation helps reduce memory requirements.
You can fix the ES3 spec compliance by simply returning "function(){/decompilation disabled/}" or something like that instead of "[ecmascript code]". You could also fix the eval roundtripping without decompiling by returning, for example, "function(){opera.getFunction('UniqueFunctionID').apply(this,arguments);}".
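The registry idea can be sketched in plain JavaScript. Everything here is hypothetical: functionRegistry, getFunction, and stubSource stand in for whatever an engine would do internally (the message above names opera.getFunction, which is itself only a suggestion). The stub in this sketch adds a return statement, which the string in the message omits, so that the result of the call is passed through.

```javascript
// Hypothetical registry standing in for an engine-internal table.
var functionRegistry = {};
var nextFunctionId = 0;

function getFunction(id) {
  return functionRegistry[id];
}

// Returns stub source text that forwards to the registered function
// instead of reproducing its body.
function stubSource(f) {
  var id = "f" + (nextFunctionId++);
  functionRegistry[id] = f;
  return "function(){return getFunction('" + id + "').apply(this,arguments);}";
}

// A closure whose captured variable `n` never appears in the stub.
function makeAdder(n) {
  return function (x) { return x + n; };
}

var addTen = makeAdder(10);
var revived = eval("(" + stubSource(addTen) + ")");
var revivedResult = revived(5);
```

Because the stub forwards to the original function object, captured variables keep working. The cost, in this sketch at least, is that every registered function stays reachable through the registry.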
On Sep 26, 2008, at 5:32 PM, Hallvord R. M. Steen wrote:
On Fri, 26 Sep 2008 19:58:27 +0200, liorean <liorean at gmail.com> wrote:
2008/9/26 Erik Arvidsson <erik.arvidsson at gmail.com>:
I know that Opera on mobile phones used to return a string representation that did not reflect the original.
Yeah. Opera Mobile returned "[ECMAScript code]" or "[ecmascript code]". This was contrary to the ES3 spec (must be parsable as a function definition, IIRC) and also breaks the eval roundtripping by throwing a parse error. Anybody know if those issues have been fixed in more modern versions?
No, not consistently across "modern versions". It's not likely to be properly "fixed" for a while yet. The reason is that on many platforms where memory is scarce, not enabling JS decompilation helps reduce memory requirements.
Unfortunately several major libraries (prototype in particular, jQuery to a minor extent) have started (ab)using decompilation for various purposes. Pages that rely on this tend to malfunction in Opera on low-memory devices. Some Opera versions make at least some of these pages work again by returning "function(){[ecmascript code]}" instead but that's of course a silly hack..
(It would be great if ES-next could have a look at why these libraries rely on toString hacks and make it possible to do equivalent things with nicer language constructs. One of the reasons toString is used is probably outside the group's scope and caused by typeof bugs in IE: prototype's isFunction method uses toString in order to reliably detect whether something is a function reference. Another usage is apparently - from my
That's what jQuery's isFunction used to do. Afaik, they reverted it to a simple instanceof check just recently. In prototype.js, on the other hand, we tried to work around cross-frame "issues" that arise when using instanceof. Doing a typeof check (of both an object and its call property) seems to be sufficient.
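A sketch of the typeof-based check described above, for illustration only. This is not Prototype's actual source, and the function name isFunction is reused here purely as a label.

```javascript
// Sketch: test both the value and its `call` property with typeof,
// avoiding instanceof (which compares against one frame's Function).
function isFunction(value) {
  return typeof value === "function" && typeof value.call === "function";
}

var checks = [
  isFunction(function () {}), // an ordinary function
  isFunction({}),             // a plain object
  isFunction(null),           // typeof null is "object"
  isFunction("text")          // a string
];
```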
memory - to look at the names of the named arguments and do something magic if the first argument is $super. I'm not up-to-date on whether
the next iteration has an elegant solution for this apparent requirement.)
Latest build does the same thing. That "something magic" is actually
just creating a reference to a "super" method (same-named method of a
parent class). Function decompilation, while non-standard, seems to be
the best solution when implementing such an inheritance mechanism. We
would gladly consider alternative (more compliant) solutions, if any
exist.
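The $super mechanism described above might look roughly like the following. This is a simplified sketch, not Prototype's code: parameterNames and wrapIfSuper are hypothetical names, and the regular expression only copes with plain parameter lists (no comments or other syntax inside the parentheses).

```javascript
// Hypothetical helper: read parameter names out of the decompiled text.
function parameterNames(f) {
  var match = f.toString().match(/^[\s(]*function[^(]*\(([^)]*)\)/);
  if (!match) return [];
  var list = match[1].replace(/\s+/g, "");
  return list === "" ? [] : list.split(",");
}

// Hypothetical helper: if the first parameter is named $super, pass a
// reference to the parent's method as that first argument.
function wrapIfSuper(method, parentMethod) {
  if (parameterNames(method)[0] !== "$super") return method;
  return function () {
    var self = this;
    var $super = function () { return parentMethod.apply(self, arguments); };
    var args = [$super].concat(Array.prototype.slice.call(arguments));
    return method.apply(this, args);
  };
}

var parentSay = function (message) { return "parent:" + message; };
var childSay = function ($super, message) { return $super(message) + "!"; };
var wrapped = wrapIfSuper(childSay, parentSay);
var said = wrapped("hi");
```

The whole scheme depends on toString preserving parameter names, which is exactly what an implementation that renames locals or returns a placeholder would break.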
Karl Krukow wrote:
However I would have to move from a form:
(function(a,b) {return a+b;}).specialize({ a: 42})
to
(function(a,b) {return a+b;}).specialize([42])
Does your example substitute a or b? How to indicate what parameter(s) you want to specialize?
Another thing: your solution relies on keeping around potentially big function bodies in their textual form, and a full-blown parser (+ interpreter, if you want to do optimizations while specializing) written in JavaScript, while there is a parser already in the compiler/interpreter that is a part of a run-time (see eval()). Sounds complicated, resource-hungry, and redundant. Why not implement specialize() at the interpreter/compiler level rather than hacking around with toString()? If we are to ask for features let's ask big! ;-)
Thanks,
Eugene
A summary of the discussion.
Pros:
-- current large implementations already satisfy this. I've tested it
on: Firefox 2, Firefox 3, Safari 3.2.1, IE6, IE7, the latest Opera,
SquirrelFish Extreme, TraceMonkey and V8.
-- It is powerful. When combined with a parser like Crockford's, one
can use some powerful meta-programming techniques commonly only found
in languages like Scheme and LISP.
-- It should not be that hard to implement.
Cons:
-- There are some smaller implementations which do not satisfy this
semantic property.
-- It would perhaps add more complexity to implementations and it is
hard to test true equality of functions so you would have to prove it
correct ;-)
-- It would require implementations to keep a decompilable version of
the function around. This may not be appropriate for implementations
running on memory-constrained devices.
I would really like to stay on track, discussing whether or not it is
a possible addition to 3.1. Are the pros outweighing the cons?
Proposal: To extend section 15.3.4.2. Function.prototype.toString ( ) with something similar to the following paragraph.
For any function f and any values x1, x2, ... , xn, whenever
eval(f.toString()) is called in the same scope f was defined, we have
eval(f.toString())(x1,x2,...,xn) === f(x1,x2,..., xn).
I use toString in a test suite that I built (jspec) to simplify the testing DSL. Effectively, I extract the string contents of the function, and build a new function with parameters representing some methods I want to make available inside, and then call the function with those methods as parameters. It's a little bit of a hack, but protects the global namespace and allows me to keep the DSL simple.
I believe Screw.Unit uses the same technique, as it's based on my original jspec proof-of-concept.
Also, if toString() were guaranteed to return a valid JS representation of the original function, it would be theoretically possible to mutate functions (a la Lisp macros) with a pure-JS JS parser. For the most part, I have assumed that a correct toString() is available on the implementations where I expect people to want to run jspec.
-- Yehuda
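The technique described in this message can be sketched as follows. This is an illustration only, not jspec's or Screw.Unit's code: runWithHelpers, describe, and it are hypothetical names, and the body extraction assumes the first "{" in the text opens the function body.

```javascript
// Sketch: extract the body text of a function, rebuild it with the
// helper names as parameters, and call it with the helpers as arguments.
function runWithHelpers(specBody, helpers) {
  var source = specBody.toString();
  var body = source.slice(source.indexOf("{") + 1, source.lastIndexOf("}"));
  var names = [];
  var values = [];
  for (var name in helpers) {
    if (Object.prototype.hasOwnProperty.call(helpers, name)) {
      names.push(name);
      values.push(helpers[name]);
    }
  }
  // The rebuilt function sees only its parameters and globals,
  // not the closure of the original function.
  var rebuilt = new Function(names.join(","), body);
  return rebuilt.apply(null, values);
}

var recorded = [];
runWithHelpers(function () {
  describe("addition");
  it("adds numbers");
}, {
  describe: function (label) { recorded.push("describe: " + label); },
  it: function (label) { recorded.push("it: " + label); }
});
```

Neither describe nor it is ever defined globally; they exist only as parameters of the rebuilt function, which is how the global namespace stays clean.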
2008/9/26 Juriy Zaytsev <kangax at gmail.com>
On 27/09/2008, at 08.58, Eugene Lazutkin wrote:
Karl Krukow wrote:
However I would have to move from a form:
(function(a,b) {return a+b;}).specialize({ a: 42})
to
(function(a,b) {return a+b;}).specialize([42])
Does your example substitute a or b? How to indicate what parameter(s) you want to specialize?
In this second form, it would specialize the first parameter. To
specialize the second, you would use
(function(a,b) {return a+b;}).specialize([undefined,42])
granted you would be unable to specialize using undefined as a value.
Alternatively you could use a form:
(function(a,b) {return a+b;}).specialize([{1: 42}])
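A positional form along these lines can be sketched without any parsing. Note the assumptions: specializeAt is a hypothetical name, it takes a bare index map such as { 1: 42 } rather than the array-wrapped form shown above, and it performs partial application through a closure rather than the source-level partial evaluation under discussion.

```javascript
// Sketch: fix arguments by position. An index map lets `undefined`
// itself be used as a fixed value, unlike the [undefined, 42] form.
function specializeAt(f, fixedByIndex) {
  var total = f.length; // number of declared parameters
  return function () {
    var args = [];
    var next = 0; // index into the arguments supplied by the caller
    for (var i = 0; i < total; i++) {
      if (Object.prototype.hasOwnProperty.call(fixedByIndex, i)) {
        args.push(fixedByIndex[i]);
      } else {
        args.push(arguments[next++]);
      }
    }
    return f.apply(this, args);
  };
}

var subtract = function (a, b) { return a - b; };
var subtractFrom42 = specializeAt(subtract, { 0: 42 }); // a is fixed
var subtract42 = specializeAt(subtract, { 1: 42 });     // b is fixed

var pair = function (a, b) { return [a, b]; };
var firstIsUndefined = specializeAt(pair, { 0: undefined })("x");
```

Because positions are used instead of names, this form keeps working even if an implementation renames parameters in its toString output.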
Another thing: your solution relies on keeping around potentially big function bodies in their textual form, and a full-blown parser (+ interpreter, if you want to do optimizations while specializing)
written in JavaScript, while there is a parser already in the compiler/interpreter that is a part of a run-time (see eval()).
At least it depends on being able to produce a FunctionDeclaration
which, when eval'ed, is equivalent to the original function. However,
a JIT would have some representation of the function stored already,
otherwise how would it emit code? Now I am theorizing, but wouldn't it
just be adding another 'translation module' which emits code back in
JavaScript?
Also, I am not convinced it is such a big issue, there are
implementations that target also mobile devices which keep textual
representations around. At least I know that V8 keeps the source
around and emits code directly from that without an intermediate
representation. But I am no expert so I will leave it to the
implementers to judge :-)
Sounds complicated, resource-hungry, and redundant. Why not implement specialize() at the interpreter/compiler level rather than hacking around with toString()? If we are to ask for features let's ask
big! ;-)
I'm not convinced that it is 'complicated, resource-hungry or
redundant.' At least I have an early prototype implementation, Jeene (code.google.com/p/jeene),
which I feel is not too bad performance-wise: it is a one-pass
solution and is based on Crockford's efficient Pratt parser.
However, you are right! I would actually like to 'ask big' ;-) If I
could ask for more I would not ask for a specialize function directly
in the environment. 'specialize' is just one application of a much
more powerful concept which is the LISP code/data duality. Instead, I
would ask for e.g., a Function.prototype.representation function that
would return an object representing the abstract syntax tree of the
function, perhaps even annotated with additional information (and
perhaps a dual Function.prototype.compile). This would be even better
as I wouldn't have to do the parsing myself.
Think I can have that? ;-) Thanks for your great reply.
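To make the code/data idea concrete, here is a toy sketch of specializing over a syntax tree. Nothing in it is proposed API: the node shapes ("name", "literal", "binary") and the functions substitute, emit, specializeAst, and compileAst are all invented for the illustration, and only a single return expression with "+" folding is handled.

```javascript
// Hypothetical tree for: function (a, b) { return a + b; }
var addAst = {
  params: ["a", "b"],
  body: {
    type: "binary", operator: "+",
    left: { type: "name", value: "a" },
    right: { type: "name", value: "b" }
  }
};

function has(object, key) {
  return Object.prototype.hasOwnProperty.call(object, key);
}

// Replace bound names with literals; fold "+" when both sides are known.
function substitute(node, bindings) {
  if (node.type === "name" && has(bindings, node.value)) {
    return { type: "literal", value: bindings[node.value] };
  }
  if (node.type === "binary") {
    var left = substitute(node.left, bindings);
    var right = substitute(node.right, bindings);
    if (node.operator === "+" && left.type === "literal" && right.type === "literal") {
      return { type: "literal", value: left.value + right.value };
    }
    return { type: "binary", operator: node.operator, left: left, right: right };
  }
  return node;
}

// Turn a tree back into source text.
function emit(node) {
  if (node.type === "literal") return JSON.stringify(node.value);
  if (node.type === "name") return node.value;
  return "(" + emit(node.left) + " " + node.operator + " " + emit(node.right) + ")";
}

function specializeAst(ast, bindings) {
  var remaining = [];
  for (var i = 0; i < ast.params.length; i++) {
    if (!has(bindings, ast.params[i])) remaining.push(ast.params[i]);
  }
  return { params: remaining, body: substitute(ast.body, bindings) };
}

function compileAst(ast) {
  return new Function(ast.params.join(","), "return " + emit(ast.body) + ";");
}

var specialized = specializeAst(addAst, { a: 42 });
var specializedSource = emit(specialized.body);
var add42 = compileAst(specialized);
var constantThree = compileAst(specializeAst(addAst, { a: 1, b: 2 }));
```

With a tree available, specialization becomes ordinary data manipulation; the parsing step that toString forces on the library disappears.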
Inline.
Karl Krukow wrote:
(function(a,b) {return a+b;}).specialize([42])
Does your example substitute a or b? How to indicate what parameter(s) you want to specialize?
In this second form, it would specialize the first parameter. To
specialize the second, you would use
(function(a,b) {return a+b;}).specialize([undefined,42])
granted you would be unable to specialize using undefined as a value.
Alternatively you could use a form:
(function(a,b) {return a+b;}).specialize([{1: 42}])
My understanding is that you are not proposing this as a part of the standard, but rather showing how someone can implement the functionality, right?
At least it depends on being able to produce a FunctionDeclaration
which, when eval'ed, is equivalent to the original function. However,
a JIT would have some representation of the function stored already,
otherwise how would it emit code? Now I am theorizing, but wouldn't it
just be adding another 'translation module' which emits code back in
JavaScript?
- JIT can use an intermediate byte code that was most probably optimized.
- JIT can use optimized syntax tree notation (I hope we'll go past that).
- No JIT, the code is already (partially) compiled.
To sum it up: I don't see the practical need to keep the code in its source form around.
I'm not convinced that it is 'complicated, resource-hungry or
redundant.' At least I have an early prototype implementation, Jeene (code.google.com/p/jeene ) which I feel is not too bad performance wise: it is a one pass
solution as is based on Crockford's efficient Pratt parser.
My intention was more humble: I wanted to voice my personal opinion on this matter rather than convincing anyone that my opinion is the ultimate truth. And I am sure that your solution works just fine barring small details of the performance testing.
Still, I think requiring implementations to keep the source code around for anything other than debugging is overkill.
However, you are right! I would actually like to 'ask big' ;-) If I
could ask for more I would not ask for a specialize function directly
in the environment. 'specialize' is just one application of a much
more powerful concept which is the LISP code/data duality. Instead, I
would ask for e.g., a Function.prototype.representation function that
would return an object representing the abstract syntax tree of the
function, perhaps even annotated with additional information (and
perhaps a dual Function.prototype.compile). This would be even better
as I wouldn't have to do the parsing myself.
Here we go again: you made another implicit assumption that ASTs are still around (and untransformed) when the code is running. And even one more: AST objects should be the same on all interpreters.
Think I can have that? ;-) Thanks for your great reply.
/Karl
I think that there is one common flaw with both suggestions (the source code and the AST) --- they are too low-level, require too much JavaScript (parser + interpreter, even for AST), don't give much flexibility to implementers, and may prevent possible optimizations. I feel that more high-level (actually medium-level) constructs would give the flexibility for programmers and implementers without compromising the performance. Of course it is all predicated on the interest from users in this functionality, which is still to be seen.
Thanks,
Eugene
On 30/09/2008, at 19.12, Eugene Lazutkin wrote:
Karl Krukow wrote:
Alternatively you could use a form:
(function(a,b) {return a+b;}).specialize([{1: 42}])
My understanding you are not proposing this as a part of the standard, but rather how someone can implement the functionality, right?
Yes, exactly.
- JIT can use an intermediate byte code that was most probably optimized.
- JIT can use optimized syntax tree notation (I hope we'll go past that).
- No JIT, the code is already (partially) compiled.
To sum it up: I don't see the practical need to keep the code in its source form around.
OK, for option 3 I don't know what to do, and I may have misunderstood
you. But for 1 and 2, wouldn't it be possible to produce some source
form from the byte code or ast? I mean the source form doesn't have
to be identical to any original source text, but may be generated on
the fly.
My intention was more humble: I wanted to voice my personal opinion on this matter rather than convincing anyone that my opinion is the ultimate truth. And I am sure that your solution works just fine
barring small details of the performance testing.
Good. Your opinion and discussion are exactly what I was looking for.
Here we go again: you made another implicit assumption that AST are still around (and untransformed) when the code is running. And even
one more: AST objects should be the same on all interpreters.
Yes, it depends on existence of an AST. Different implementations
could map their internal representation to a common standardized
minimal representation which at least wouldn't require AST objects to
be equal.
I think that there is one common flaw with both suggestions (the source code and the AST) --- they are too low-level, require too much JavaScript (parser + interpreter, even for AST), don't give much flexibility to implementers, and may prevent possible optimizations. I feel that more high-level (actually medium-level) constructs would give the flexibility for programmers and implementers without compromising the performance. Of course it is all predicated on the interest from users in this functionality, which is still to be seen.
I respect your opinion.
Hallvord R. M. Steen wrote:
On Fri, 26 Sep 2008 19:58:27 +0200, liorean <liorean at gmail.com> wrote:
2008/9/26 Erik Arvidsson <erik.arvidsson at gmail.com>:
I know that Opera on mobile phones used to return a string representation that did not reflect the original.
Yeah. Opera Mobile returned "[ECMAScript code]" or "[ecmascript code]". This was contrary to the ES3 spec (must be parsable as a function definition, IIRC) and also breaks the eval roundtripping by throwing a parse error. Anybody know if those issues have been fixed in more modern versions?
No, not consistently across "modern versions". It's not likely to be
properly "fixed" for a while yet. The reason is that on many platforms
where memory is scarce, not enabling JS decompilation helps reduce memory
requirements.
Do you keep the original source code of the whole script in memory, or at least somewhere cached? If so, you could store offsets to the function's source within memory/cache in defined functions, and get the function source as a string on demand. It would be slow, but it would fix compatibility, and I hardly think the speed of a relatively little-used feature matters for mobile.
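The offset idea can be sketched in a few lines. This is a toy model under loud assumptions: indexFunctions and sourceOf are hypothetical names, a real engine would record offsets during parsing rather than search with a regular expression, and the pattern below only handles top-level named functions without nested braces.

```javascript
// Stand-in for a script whose text is kept in memory or in a cache.
var scriptSource =
  "function add(a, b) { return a + b; }\n" +
  "function negate(a) { return -a; }\n";

// Record only start/end offsets per function, not copies of the text.
function indexFunctions(source) {
  var table = {};
  var pattern = /function\s+(\w+)\s*\([^)]*\)\s*\{[^{}]*\}/g;
  var match;
  while ((match = pattern.exec(source)) !== null) {
    table[match[1]] = { start: match.index, end: match.index + match[0].length };
  }
  return table;
}

// Produce the source string on demand by slicing the retained script.
function sourceOf(table, source, name) {
  var entry = table[name];
  return source.slice(entry.start, entry.end);
}

var offsets = indexFunctions(scriptSource);
var negateSource = sourceOf(offsets, scriptSource, "negate");
```

The per-function cost is two integers; the real question raised in the thread is whether the script text itself can be kept at all.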
Juriy Zaytsev wrote:
On Sep 26, 2008, at 5:32 PM, Hallvord R. M. Steen wrote:
memory - to look at the names of the named arguments and do something
magic if the first argument is $super. I'm not up-to-date on whether the
next iteration has an elegant solution for this apparent requirement.)
Latest build does the same thing. That "something magic" is actually just creating a reference to a "super" method (same-named method of a parent class). Function decompilation, while non-standard, seems to be the best solution when implementing such inheritance mechanism. We would gladly consider alternative (more compliant) solutions, if any exist.
So as far as I can tell, what we need to discourage usage of func.toString() is:
- An API for function currying/partial evaluation (specializing/binding certain arguments).
- A read-only property on functions that contains the list of parameter names.
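For the first item, ECMAScript 5 later standardized Function.prototype.bind, which fixes leading arguments without any use of toString. A brief sketch follows; it covers only leading arguments, and there is still no standard property listing parameter names (length gives only their count).

```javascript
var add = function (a, b) { return a + b; };

// Fix the first argument to 42; `this` is left unbound (null).
var add42 = add.bind(null, 42);

var sum = add42(1);
var remainingParameters = add42.length; // one declared parameter is left
```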
For any other purpose, I would think that you might as well parse the whole source file rather than just the function, since you would need to keep track of closures. For example, consider this:
function foo() { var x = 10; return function() { return x; }; }
function bar(f) { var x = 20; print(f()); print(eval(f.toString())()); }
bar(foo());
That would print 10, then 20. So as you can see, even in the same scope, eval(f.toString())(...) is not necessarily equal to f(), even if f.toString() is a correct decompilation of f.
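A runnable variant of this example, as a sketch. Two adjustments are assumptions of the sketch rather than part of the message: the results are collected in an array instead of calling print, and the source text is wrapped in parentheses, since the text of an anonymous function is not a valid statement on its own.

```javascript
function foo() {
  var x = 10;
  return function () { return x; };
}

function bar(f) {
  var x = 20; // a different variable that happens to share the name
  var fromOriginal = f();
  // The re-evaluated copy closes over bar's x, not foo's.
  var fromCopy = eval("(" + f.toString() + ")")();
  return [fromOriginal, fromCopy];
}

var results = bar(foo());
```

The copy does not fail; it silently reads a different x, which is arguably worse than an error.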
Inline.
Yuh-Ruey Chen wrote:
So as far as I can tell, what we need to discourage usage of func.toString() is:
- An API for function currying/partial evaluation (specializing/binding certain arguments).
- A read-only property on functions that contains the list of parameter names.
Both points are valid. But #2 is of little utility for variadic functions. Example:
function foo(){ return arguments[5]; };
As you can see it is generally impossible to know how many arguments are actually expected, and what are their names.
Personally I am not against a read-only property for argument names, just pointing out its limited utility.
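The limitation is easy to see in a short sketch (sixthArgument is just an illustrative name):

```javascript
// Declares no parameters, yet depends on the sixth argument.
function sixthArgument() {
  return arguments[5];
}

var declaredParameterCount = sixthArgument.length;
var value = sixthArgument("a", "b", "c", "d", "e", "f");
```

Neither the declared parameter count nor a list of parameter names would reveal that the function expects six arguments.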
For any other purpose, I would think that you might as well parse the whole source file rather than just the function, since you would need to keep track of closures. For example, consider this:
function foo() { var x = 10; return function() { return x; }; }
function bar(f) { var x = 20; print(f()); print(eval(f.toString())()); }
bar(foo());
That would print 10, then 20. So as you can see, even in the same scope, eval(f.toString())(...) is not necessarily equal to f(), even if f.toString() is a correct decompilation of f.
That's a good point about literal comparison of bodies. The other side of this argument is: it is generally impossible to read the function body and compile it back with eval() expecting to get the same functional behavior, because at this point we have no idea about closures. To accomplish that we need a way to manipulate closures/context: extract it from a function somehow, specify it as a parameter to eval(), and so on. I am not sure it is doable/practical.
Thanks,
Eugene
On Sat, 11 Oct 2008 04:52:20 +0200, Yuh-Ruey Chen <maian330 at gmail.com>
wrote:
Yeah. Opera Mobile returned "[ECMAScript code]" or "[ecmascript code]". Anybody know if those issues have been fixed in more modern versions?
No, not consistently across "modern versions". It's not likely to be properly "fixed" for a while yet. The reason is that on many platforms where memory is scarce, not enabling JS decompilation helps reduce
memory requirements.
Do you keep the original source code of the whole script in memory, or at least somewhere cached?
I am not a developer but I think this is configurable and turned off to
save memory on many platforms.
Hello everyone,
I was exploring extending Crockford's Pratt parser to build a partial
evaluator for JavaScript. One would extend Function.prototype with a
specialize function, e.g.
(function(a,b) {return a+b}).specialize({ a: 42 })
would return a function equivalent to:
function(b) {return 42+b}
This technique relies on calling toString on functions. However, the
ECMAScript 3.1 (and 3) specs say the following about
Function.prototype.toString:
15.3.4.2 Function.prototype.toString ( ) An implementation-dependent representation of the function is
returned. This representation has the syntax of a FunctionDeclaration. Note in particular that the use and
placement of white space, line terminators, and semicolons within the representation string is
implementation-dependent...
The phrase: "implementation-dependent" bothers me. It gives no
guarantees that the function returned should bear any relation to the
function on which toString is called. E.g., it would be legal to
always return "function(){}".
The sentence: "Note in particular that the use and placement of white
space, line terminators, and semicolons within the representation string is
implementation-dependent." seems to indicate the intention of
"implementation-dependent" refers only to the "optional/non-semantic"
parts of the definition. It is unclear, however.
Effectively this means that programmers cannot rely on
Function.prototype.toString in general, rendering it useless.
My suggestion is to strengthen the contract to state something like:
For any function f and any values x1, x2, ... , xn we have:
eval(f.toString())(x1,x2,...,xn) === f(x1,x2,..., xn).
Indicating also a semantic match.
How do you feel about this suggestion? Here is my current list of Pros
and Cons:
Pros:
-- current large implementations already satisfy this. I've tested
it on: Firefox 2, Firefox 3, Safari 3.2.1, IE6, IE7, the latest Opera,
SquirrelFish Extreme, TraceMonkey and V8.
-- It is powerful. When combined with a parser like Crockford's, one
can use some powerful meta-programming techniques commonly only found
in languages like Scheme and LISP.
-- It should not be that hard to implement.
Cons:
-- There are some smaller implementations which do not satisfy this
semantic property.
-- It would perhaps add more complexity to implementations and it is
hard to test true equality of functions so you would have to prove it
correct ;-)
Hope to hear from you,
-- Karl
P.S. If you are interested in the partial evaluator project, Jeene, go
to: blog.higher-order.net.