Safe, Closure-free, Serializable functions

# François REMY (12 years ago)

I've been working for some time on the idea of a closure-free, serializable function. A function of this kind would have no access to the parent closures it is defined in, and only limited (read-only) access to the enclosing environment (read-only Math, Object, Array, and so on).

Unlike in other functions, those objects would not be affected by JavaScript running elsewhere on the page, so in a closure-free function Array.prototype.slice is guaranteed to be the native slice function, not some polyfill or replacement.

    function sort(array) as serializable {
        Array.prototype.sort.call(array);
    }

    function sqrt(number) as serializable {
        return Math.sqrt(number);
    }

    function BAD_CODE() as serializable {
        return new XMLHttpRequest(); // xhr not defined in ES6
    }

Trying to export a global variable or modify any of the host environment-defined objects would fail.

    function BAD_CODE(number) as serializable {
        globalNumber = number; // cannot write into the global object
    }

It's also important to note that any Object or Array created inside this context will be passed to the calling context by deep-cloning (or, in the case of environmental objects, by replacing the "safe" Math object with the host Math object). We could maybe use a proxy instead, though deep-cloning seems safer.

This makes sure it's impossible to leak the "safe" objects to the calling context in any way (i.e. the calling code can leak anything to the called code, but not the reverse).

    var Sin = Math.sin;
    Math.sin=function() {};

    function giveMeMath() as serializable {
        return [Math, Math.sin];
    }

    var [m,s] = giveMeMath();
    // s === Sin
    // m === Math
    // m.sin !== Sin

To be honest, those functions are not really meant to expose new objects: even if they need some internally, they should just keep them internally and avoid distributing them. The deep-cloning algorithm is just there for the cases where you want to return multiple values at the end of a function, or when you need an options object.

Still, the fact that they run in a "safe" environment makes them good candidates for further optimization and inlining, so we may end up seeing code written as serializable just to benefit from the performance boost and the safety from external attacks.

    function Point(x,y) as serializable {
        x = +x;
        y = +y;
        return {
            x:x,
            y:y,
            r: Math.sqrt(x*x+y*y),
            t: Math.atan2(y,x)
        };
    }

The arguments, however, could be any object, and those act totally normally. If an argument passed to the function is an Object, the function can reach the "real" Object.prototype by using Object.getPrototypeOf(...).

    window.newXHR = function newXHR(window) as serializable {
        return new this.XMLHttpRequest();
    }

    var xhr = window.newXHR();

However, it's also possible for the calling code not to give such information, by passing only primitive values like strings and numbers. I believe this is the most likely usage of this kind of function, at least from the web platform point of view.

The good thing about those functions is that they can safely be sent over the wire to another thread, or to another web page, because they cannot rely on any state or context.

Formalizing those functions is also an important step towards enabling JS code to run safely deeper into the browser stack, by avoiding any possible use of objects that are not supposed to be interacted with at a given time (the calling code can control exactly what the called function has access to).

A possible use case would be to define arbitrary timing functions for animations:

    function x2(x) as serializable { return x*x; }

    // this is safe because SomeWebAnim knows it will call the function
    // only with numbers, so the code cannot access the DOM while it's
    // still being computed, or because the DOM actually lives in another
    // thread than the animation code.
    SomeWebAnim.timingFunction = x2;

Is there some interest from anyone else in formalizing this for a future version of EcmaScript? Or any comment? Francois

PS: for the purposes of safety, we may want to disallow "eval" and "Function" inside this environment, to make sure the code can be compiled ahead of time in all cases without forcing the use of an interpreter. This could also be left to the author's choice but exposed as a slightly different concept (i.e. compilable + serializable vs. serializable only).

# François REMY (12 years ago)

TLDR ==> The web needs a way to express executable code that does not rely on its parent context, is guaranteed to be side-effect-free, and can be executed safely anywhere (even in a different thread/worker/window/device, or as callback for browser code being executed while the DOM is not ready to be accessed)

I've been working for some time on the idea of a closure-free, serializable function. A function of this kind would have no access to the parent closures it is defined in, and only limited (read-only) access to the enclosing environment (read-only Math, Object, Array, and so on).

Unlike in other functions, those objects would not be affected by JavaScript running elsewhere on the page, so in a closure-free function Array.prototype.slice is guaranteed to be the native slice function, not some polyfill or replacement.

    function sort(array) as serializable {
        Array.prototype.sort.call(array);
    }

    function sqrt(number) as serializable {
        return Math.sqrt(number);
    }

    function BAD_CODE() as serializable {
        return new XMLHttpRequest(); // xhr not defined in ES6
    }

Trying to export a global variable or modify any of the host environment-defined objects would fail.

    function BAD_CODE(number) as serializable {
        globalNumber = number; // cannot write into the global object
    }

    function BAD_CODE() as serializable {
        Object.prototype.doSomething=function() {}; // cannot write into the native objects
    }

It's also important to note that any Object or Array created inside this context will be passed to the calling context by deep-cloning (or, in the case of environmental objects, by replacing the "safe" Math object with the host Math object of the calling code). Objects that can't be cloned (non-serializable functions, for example) are simply transformed into null. We could maybe use a proxy instead, though deep-cloning seems safer.

This makes sure it's impossible to leak the "safe" objects to the calling context in any way (i.e. the calling code can leak anything to the called code, but not the reverse).

    var RealSin = Math.sin;
    Math.sin=function() {};

    function giveMeMath() as serializable {
        return [Math, Math.sin];
    }

    var [m,s] = giveMeMath();
    // s === RealSin
    // m === Math
    // m.sin !== RealSin

    // note that another possibility here
    // would be to have giveMeMath return [null,null]
    // (ie: consider host objects unserializable)
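
The cloning rule described above can be approximated in today's JavaScript. The following is only a minimal sketch under stated assumptions: `cloneOutgoing` is a hypothetical helper name, it treats every function as non-cloneable, and it does not attempt the swap of "safe" environmental objects (such as Math) for their host counterparts.

```javascript
// Hypothetical helper: copy a return value out of the "safe" context.
// Plain objects and arrays are deep-cloned; anything that cannot be
// cloned (functions, in this sketch) becomes null.
function cloneOutgoing(value, seen = new Map()) {
  if (value === null) return null;
  if (typeof value === 'function') return null;   // treated as not serializable
  if (typeof value !== 'object') return value;    // primitives pass through
  if (seen.has(value)) return seen.get(value);    // preserve cycles and sharing
  const copy = Array.isArray(value) ? [] : {};
  seen.set(value, copy);
  for (const key of Object.keys(value)) {
    copy[key] = cloneOutgoing(value[key], seen);
  }
  return copy;
}

const inner = { x: 1, nested: { list: [1, 2, 3] }, callback: function () {} };
const outer = cloneOutgoing(inner);
// outer.nested is a fresh object and outer.callback is null
```

Because the caller only ever sees copies, mutating `outer` cannot affect `inner`, which is the one-way isolation the proposal asks for.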

To be honest, those functions are not really meant to expose new objects: even if they need some internally, they should just keep them internally and avoid distributing them. The deep-cloning algorithm is just there for the cases where you want to return multiple values at the end of a function, or when you need an options object.

Still, the fact that they run in a "safe" environment makes them good candidates for further optimization and inlining, so we may end up seeing code written as serializable just to benefit from the performance boost and the safety from external attacks.

    function Point(x,y) as serializable {
        x = +x;
        y = +y;
        return {
            x:x,
            y:y,
            r: Math.sqrt(x*x+y*y),
            t: Math.atan2(y,x)
        };
    }

The arguments, however, could be any object, and those act totally normally. If an argument passed to the function is an Object, the function can reach the "real" Object.prototype by using Object.getPrototypeOf(...).

    window.newXHR = function newXHR(window) as serializable {
        return new this.XMLHttpRequest();
    }

    var xhr = window.newXHR();

However, it's also possible for the calling code not to give such information, by passing only primitive values like strings and numbers. I believe this is the most likely usage of this kind of function, at least from the web platform point of view.

The good thing about those functions is that they can safely be sent over the wire to another thread, or to another web page, because they cannot rely on any state or context.

Formalizing those functions is also an important step towards enabling JS code to run safely deeper into the browser stack, by avoiding any possible use of objects that are not supposed to be interacted with at a given time (the calling code can control exactly what the called function has access to).

A possible use case would be to define arbitrary timing functions for animations:

    function x2(x) as serializable { return x*x; }

    // this is safe because SomeWebAnim knows it will call the function
    // only with numbers, so the code cannot access the DOM while it's
    // still being computed, or because the DOM actually lives in another
    // thread than the animation code.
    SomeWebAnim.timingFunction = x2;
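
To make the calling convention concrete, here is a rough sketch of how a host in the position of SomeWebAnim might invoke a timing function while handing it nothing but numbers. `sampleTimingFunction` is a hypothetical name, and in today's JavaScript nothing stops the callee from touching globals; the sketch only illustrates the "primitives in, number out" contract.

```javascript
// Hypothetical host-side helper: sample a timing function at evenly
// spaced points, passing it nothing but primitive numbers.
function sampleTimingFunction(timingFunction, steps) {
  const samples = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;                       // primitive number in [0, 1]
    const value = Number(timingFunction(t));   // coerce the result to a number
    if (!Number.isFinite(value)) {
      throw new TypeError('timing function must return a finite number');
    }
    samples.push(value);
  }
  return samples;
}

function x2(x) { return x * x; }
const samples = sampleTimingFunction(x2, 4);
// samples is [0, 0.0625, 0.25, 0.5625, 1]
```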

Is there some interest from anyone else in formalizing this for a future version of EcmaScript? Or any comment? Francois

PS: for the purposes of safety, we may want to disallow "eval" and "Function" inside this environment, to make sure the code can be compiled ahead of time in all cases without forcing the use of an interpreter. This could also be left to the author's choice but exposed as a slightly different concept (i.e. compilable + serializable vs. serializable only).

# Andrea Giammarchi (12 years ago)

similar thread proposing 'use native' for the very similar problem safe closures would like to solve ... I wonder if it wouldn't be good enough to retrieve in a "read-only" fashion all original exposed function prototypes through a constant or read-only global property for the current realm ...

something like window.env.Array.prototype.slice VS window.Array.prototype.slice

the env Proxy will expose unconditionally native constructors with their methods as defined before any possible overwrite ... these are all there by default in any case, could be flagged as "never GC even if overwritten" and retrieved back in a special way.

a global constant like this could easily be exposed by every engine I've seen and be marked, eventually, as the "safe way to retrieve safe behaviors/natives", solving many levels of paranoia around the JS world.
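
Without engine support, the closest approximation today is to capture the natives before any other script runs. This is only a sketch of that idea: the `env` object below is a hypothetical, hand-built stand-in for the proposed read-only global, and it is only as trustworthy as the moment at which it is created.

```javascript
// Hypothetical `env` object, built before any other script can interfere.
const env = Object.freeze({
  slice: Array.prototype.slice,
  sin: Math.sin,
});

// Later, some other script replaces a native.
const originalSlice = Array.prototype.slice;
Array.prototype.slice = function () { return ['tampered']; };

const viaGlobal = [1, 2, 3].slice(1);                      // ['tampered']
const viaEnv = Reflect.apply(env.slice, [1, 2, 3], [1]);   // [2, 3]

Array.prototype.slice = originalSlice;   // restore the native for later code
```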

My 2 cents

# François REMY (12 years ago)

similar thread proposing 'use native' for the very similar problem safe closures would like to solve ... I wonder if it wouldn't be good enough to retrieve in a "read-only" fashion all original exposed function prototypes through a constant or read-only global property for the current realm ...

something like window.env.Array.prototype.slice VS window.Array.prototype.slice

This is not the real issue I want to solve here. The real problem is not that I want to get the barebone ES6 environment, but that I don't want it to be possible at all to access any object that has not been given as an argument by the caller.

As specified in the TLDR, the objective is to create functions that can be serialized or run in contexts where the JS environment is not ready (for example, during a style recalc (animation-timing-function) or a layout) or not available (workers...). This requires a barebone ES environment, because otherwise you can detect the effect of another script mutating the environment objects.

If it also solves the JS "mutated environment paranoia" at the same time, it makes me happy but this is not my objective. As I said, I would settle on making it impossible to return any environment-based function (make them unserializable) if it can make this proposal more robust.

# Mark S. Miller (12 years ago)

Hi François, your goals here have a tremendous overlap with SES. In what ways is SES not satisfactory for these purposes?

The best short-but-accurate summary of SES, sufficient for this question, is research.google.com/pubs/pub40673.html section 2.3.

SES does not remove eval and Function, but rather replaces them with confining equivalents which should be better for your purposes. You can get SES from either code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/ses or code.google.com/p/es-lab/source/browse/trunk/src/ses.

The security of SES is analysed at theory.stanford.edu/~ataly/Papers/sp11.pdf.

# Mark S. Miller (12 years ago)

On Wed, Sep 25, 2013 at 5:32 PM, Mark S. Miller <erights at google.com> wrote:

Hi François, your goals here have a tremendous overlap with SES. In what ways is SES not satisfactory for these purposes?

The best short-but-accurate summary of SES, sufficient for this question, is research.google.com/pubs/pub40673.html section 2.3.

I just looked and this section is indeed short -- shorter than your question ;). Here it is:

2.3 SES: Securing JavaScript

In a memory-safe object language with unforgeable object references (protected pointers) and encapsulated objects, an object reference grants the right to invoke the public interface of the object it designates. A message sent on a reference both exercises this right and grants to the receiving object the right to invoke the passed arguments.

In an object-capability (ocap) language [7], an object can cause effects on the world outside itself only by using the references it holds. Some objects are transitively immutable or powerless [8], while others might cause effects. An object must not be given any powerful references by default; any references it has implicit access to, such as language-provided global variables, must be powerless. Under these rules, granted references are the sole representation of permission.

Secure EcmaScript (SES) is an ocap subset of ES5. SES is lexically scoped, its functions are encapsulated, and only the global variables on its whitelist (including all globals defined by ES5) are accessible. Those globals are unassignable, and all objects transitively reachable from them are immutable, rendering all implicit access powerless.

SES supports defensive consistency [7]. An object is defensively consistent when it can defend its own invariants and provide correct service to its well behaved clients, despite arbitrary or malicious misbehavior by its other clients. SES has a formal semantics supporting automated verification of some security properties of SES code [9]. The code in this paper uses the following functions from the SES library:

  • def(obj) defines a defensible object. To support defensive consistency, the def function makes the properties of its argument read-only, likewise for all objects transitively reachable from there by reading properties. As a result, this subgraph of objects is effectively tamper proof. A tamper-proof record of encapsulated functions hiding lexical variables is a defensible object. In SES, if makePoint called def on the points it returns by saying "return def({...})", it would make defensively consistent points.
  • confine(exprSrc, endowments) enables safe mobile code. The confine function takes the source code string for a SES expression and an endowments record. It evaluates the expression in a new global environment consisting of the SES whitelisted (powerless) global variables and the properties of this endowments record. For example, confine('x + y', {x: 3, y: 6}) returns 9.
  • Nat(allegedNumber) tests whether allegedNumber is indeed a primitive number, and whether it is a non-negative integer (a natural number) within the contiguous range of exactly representable integers in JavaScript. If so, it returns allegedNumber. Otherwise it throws an error.
  • var m = WeakMap() assigns to m a new empty weak map. WeakMaps are an ES6 extension (emulated by SES on ES5 browsers) supporting rights amplification [10]. Ignoring space usage, m is simply an object-identity-keyed table. m.set(obj,val) associates obj’s identity as key with val as value, so m.get(obj) returns val and m.delete(obj) removes this entry. These methods use only obj’s identity without interacting with obj.
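
For readers who want something runnable, the following sketch gives simplified stand-ins for def and Nat as described in the list above. These are not the SES implementations: this `def` only follows own data properties (it does not walk prototypes or accessor functions), and this `Nat` uses Number.isSafeInteger as an approximation of the range of exactly representable integers.

```javascript
// Simplified stand-in for SES def: freeze everything reachable
// through own data properties.
function def(obj, seen = new Set()) {
  if (Object(obj) !== obj || seen.has(obj)) return obj;  // primitive or already visited
  seen.add(obj);
  Object.freeze(obj);
  for (const key of Reflect.ownKeys(obj)) {
    const desc = Object.getOwnPropertyDescriptor(obj, key);
    if ('value' in desc) def(desc.value, seen);
  }
  return obj;
}

// Simplified stand-in for SES Nat: return the argument if it is a
// non-negative safe integer, otherwise throw.
function Nat(allegedNumber) {
  if (typeof allegedNumber !== 'number' ||
      !Number.isSafeInteger(allegedNumber) ||
      allegedNumber < 0) {
    throw new RangeError('not a natural number: ' + String(allegedNumber));
  }
  return allegedNumber;
}

const point = def({ x: 3, y: 4, meta: { tags: ['a'] } });
const count = Nat(42);
```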

# Andrea Giammarchi (12 years ago)

Sorry François, I didn't mean to say those were your goals, but I see your problem as safe evaluation in a "must be safe" local environment, confined closure by closure, which is actually similar to the 'use native' directive proposed in the other thread.

My idea is that functions can be serialized/unserialized without problems:


    function serializable() {
      alert('Hello World');
    }
    serializable.toJSON = function () {
      return '\x01' + this.toString();
    };

    var serialized = JSON.stringify({method:serializable});

    var unserialized = JSON.parse(serialized, function revive(k, v) {
      return k && v[0] === '\x01' ? Function('return ' + v.slice(1))() : v;
    });

so your extra problem would simply be to grant access to local scope variables and clean natives.

# Mark S. Miller (12 years ago)

A concrete example of using SES to serialize and unserialize safe closure-free functions is Bob, serializing the escrowExchange function on line 22 of code.google.com/p/es-lab/source/browse/trunk/src/ses/contract/makeBob.js#22:

    var escrowSrc = ''+escrowExchange;

This works because of harmony:function_to_string. Bob then sends it to the contractHost, who doesn't trust Bob, on line 59 of code.google.com/p/es-lab/source/browse/trunk/src/ses/contract/makeBob.js#59:

    var tokensP = Q(contractHostP).send('setup', escrowSrc);

in ES5 syntax, or

    var tokensP = contractHostP ! setup(escrowSrc);

in strawman ES7 syntax. The contractHost receives this string in its setup method at lines 89 and 90 of code.google.com/p/es-lab/source/browse/trunk/src/ses/contract/makeContractHost.js#89:

  setup: function(contractSrc) {
    contractSrc = ''+contractSrc;

which also coerces it to a string. It then evaluates it on line 95, by calling the previously described confine function from the SES library:

    var contract = confine(contractSrc, {Q: Q});

The contractHost can now treat the unserialized contract as, in your terminology, safe and closure-free.

Bob could of course have serialized any function, or made up any string he wanted. But if the contractSrc expression has any free variables other than the safe globals[1], i.e., if it is not closure-free[1] in your terminology, then these variable references will only result in ReferenceErrors when read or TypeErrors when written. As far as the contractHost is concerned, it can safely assume that this object is closure free[1].

[1] Except for Q, which the contractHost explicitly authorizes the contractSrc to access, in the second argument to confine above.
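
The behaviour described above (endowments are visible, any other free variable fails) can be illustrated outside SES with Node.js's built-in `vm` module. This is only an approximation under stated assumptions: `confineSketch` is a hypothetical name, it requires Node.js, and the Node.js documentation states that `vm` is not a security mechanism, so this is not a substitute for SES's confine.

```javascript
const vm = require('vm');

// Rough stand-in for confine(exprSrc, endowments), for illustration only.
// The expression is evaluated in a fresh global environment whose only
// extra bindings are the properties of the endowments record.
function confineSketch(exprSrc, endowments) {
  const sandbox = Object.assign({}, endowments);
  return vm.runInNewContext('(' + String(exprSrc) + ')', sandbox);
}

const sum = confineSketch('x + y', { x: 3, y: 6 });   // 9

let errorName = null;
try {
  confineSketch('secret + 1', {});   // `secret` is a free variable here
} catch (e) {
  // The error object comes from the other context, so check its name
  // rather than using instanceof.
  errorName = e.name;
}
```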

# François REMY (12 years ago)

Hi François, your goals here have a tremendous overlap with SES. In what ways is SES not satisfactory for these purposes?

Sorry for the late reply, I took some time to investigate SES and its implications. Basically, it seems SES provides the required guards in this case (untouched + read-only globals, direct eval only). I'm not quite sure it has the notion of "stateless", however, which is required to be able to use the function safely.

What I mean by that is that the function should not be able to know in any way whether it has been called before, or cloned, or anything. I'm pretty sure that SES does not give you that (but you can of course use it in a way that will guarantee it).

The second thing I'm not sure SES provides is the "everything here is wiped after the function returns" property. By deep-cloning every object before any operation involving a host object, we create a layer that makes sure the safe execution context cannot pollute the parent context in any way (for example via an object prototype yielding the Object.prototype inside the sandbox, or via a function and its scope).

    function SES_CODE() {
        var x = {value:0};
        return function() { return x.value++; }
    }

    // in this case, the return value in my proposal would be null
    // because we can't clone the function without leaking
    // some of the internal objects of the function.
    //
    // however "return function() as serializable { return 0; }"
    // would have been ok since we can clone the function.

The final remark is that SES seems to be a JavaScript-based sandbox. To fulfill my needs, I need a browser to be able to accept a JS function and know this function can be executed into a sandbox.

    SomeWebAnim.timingFunction = alert; // should throw, we cannot accept this because we can't report the error after

This means the sandbox has to be implemented in C++ because the browser itself does not trust JavaScript to enforce its security.

In conclusion, I believe that a browser-implemented SES, with a membrane that deep-clones or nulls outgoing values to prevent any inside object from reaching the calling code, is exactly what my proposal was all about. It's great to see that some people have already worked on getting the details right.

In my proposal, the "function() as serializable { ... }" syntax is nothing more than a way to create a stateless function that is meant to be executed in such a SES environment, and flagged as such. We could imagine any syntax for that, but I think eval'ing the source code of a function to recreate it is not what we should advocate for this use case.

Do my comments make sense to you?

# Aymeric Vitte (12 years ago)

This is similar to the "Native" thread as Andrea mentioned.

So when is SES coming?

It seems urgent to push it forward. I recently tried CSP, or at least the parts that work today (see [1] and [2]), and unfortunately I don't quite see what it actually secures, today or tomorrow.

Aymeric

[1] lists.w3.org/Archives/Public/public-webappsec/2013Sep/0058.html

[2] lists.w3.org/Archives/Public/public-webappsec/2013Sep/0067.html

On 26/09/2013 02:32, Mark S. Miller wrote:

# David Bruant (12 years ago)

On 26/09/2013 01:29, François REMY wrote:

Hi,

TLDR ==> The web needs a way to express executable code that does not rely on its parent context, is guaranteed to be side-effect-free, and can be executed safely anywhere (even in a different thread/worker/window/device, or as callback for browser code being executed while the DOM is not ready to be accessed)

Why "need"? You don't really expose a use case. You only start with "It's been some time I've been working on this idea of a..."

Also, one thing is not really clear to me. Are you trying to protect the function from its caller/loader or the caller from the function? both?

The good thing about those functions, is that they can safely be sent over the wires to another thread, or to another web page, because they do not possibly rely on any state or context.

This is an interesting use case. I've come across such a use case when doing MongoDB MapReduce [1]. I was writing Node.js code and, unlike with other MongoDB clients, I had the interesting benefit of syntax highlighting for my map and reduce functions:

    // global scope
    var map = '' + function(){
        // map code
    }

(In other languages, you have to put the JS in a string and so don't get syntax highlighting.) Of course, my functions couldn't use any Node.js built-in since they were meant to be serialized and sent to MongoDB. To be honest, I did fine with this trick. The function being global, I had no temptation to use anything from the parent scope... since there was no parent scope.

At scale, maybe I would have needed a more systematic approach to make sure my functions weren't using non-standard globals. But that can be part of my test infrastructure. I'm pretty sure an Esprima-based tool already exists to find free variables.

In the previous paragraph, I loosely used the word "standard". Indeed, there is no standard JavaScript implementation. There are a bunch of implementations with different levels of conformance to existing standards. For instance, at the time I used MongoDB, it was using an old SpiderMonkey that didn't have Object.keys. I learned this the hard way. In that experience, I learned that one of the things you're trying to achieve (move "standard" code from machine to machine) is somewhat undoable. It's a very context-sensitive problem. Now that MongoDB is using v8, maybe I can use Object.keys, but won't be able to use the non-standard-then-yet-convenient destructuring available in SpiderMonkey.

Also, MongoDB provides a non-standard "emit" function to the serialized function. Your proposal prohibits non-ES6 built-ins, thus preventing such an emit function from being provided (arguably, they should pass this function as an argument). I can also easily imagine that when sharing code between two same-version Node.js instances, one may want to be able to use Node.js-specific additional globals.

Is there some interest from anyone else in formalizing this for a future version of EcmaScript? Or any comment?

Given my experience with MongoDB, I'm tempted to ask: do we need this at all? A global function (no parent scope) combined with a tool that finds free variables might be enough to cover this use case.

David

[1] docs.mongodb.org/manual/core/map-reduce

# Bradley Meck (12 years ago)

The only real benefit I can think of is parallel execution. Examples would be a work-stealing scheduler like the one Rust is getting, or a symmetric operation such as mapping an array or a comprehension, which could exploit the fact that the work can be done on multiple threads without issue (for example on very large arrays). This is all currently doable with a "sufficiently clever compiler", but languages such as D implement a "pure" keyword for things such as this.

Just things to think about, Bradley.

# David Bruant (12 years ago)

On 26/09/2013 15:36, Bradley Meck wrote:

The only real benefit I can think of is parallel execution.

MongoDB MapReduce exploits parallelism as much as one can ever hope and just a string generated from var f = '' + function(){...} seems to work just fine. (this actually wasn't true when SpiderMonkey was under the hood for technical reasons, but the design allowed parallelism and I think that now that they're using V8, they do exploit the parallelism)

This has already been happening in the real world for several years with the language as it is. What would be the benefit of changing the language?

# Alex Russell (12 years ago)

It's unclear what your threat model is. What do you want to defend, from who or what, and for how long?

# Aymeric Vitte (12 years ago)

I would like to defend against a potential mitm/code injection and ideally against globals modifications.

Unfortunately I have the ws problem, which screws up the use case a little bit. So, we have the scenario that you see at the beginning of [1]; forget about the unsafe-inline, I thought it could apply to the worker but it does not.

I tried (as an experiment) to apply this case using CSP, and I don't understand very well what the result actually secures; the same goes for [2].

Aymeric

On 26/09/2013 18:14, Alex Russell wrote:

# Bradley Meck (12 years ago)

I was merely stating my opinions on what it could be used for if such things are guaranteed. D uses a keyword for compile time checking, which is available for pure functions by their nature of not having side effects outside of the this variable, arguments, and return value (all modifications of which would need to be clearly defined, we don't want shared state side effects).

# François REMY (12 years ago)

On 26/09/2013 15:36, Bradley Meck wrote:

The only real benefit I can think of is parallel execution.

MongoDB MapReduce exploits parallelism as much as one can ever hope and just a string generated from var f = '' + function(){...} seems to work just fine.

Firstly, this is rather ugly (and not entirely secure: someone could override Function.prototype.toString). Secondly, the point here is not to enable parallelism but to enable safe execution of JS code anywhere, without access to any specific context. The fact that the code runs in a "clean" environment makes it safe and deterministic, and the fact that it cannot possibly leak anything to the outside world makes very powerful optimizations possible.

For instance, you can create any object on the stack and bypass GC.
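
The remark above about Function.prototype.toString is easy to reproduce. The short sketch below (the function and variable names are illustrative only) shows that the `'' + fn` trick returns whatever a replaced toString says, while a reference to the native method captured beforehand still returns the real source.

```javascript
function original() { return 1; }

// Keep a reference to the native method, then replace it the way a
// hostile or careless script might.
const nativeToString = Function.prototype.toString;
Function.prototype.toString = function () {
  return 'function () { return "replaced"; }';
};

const viaConcat = '' + original;                                // the replaced source
const viaNative = Reflect.apply(nativeToString, original, []);  // the real source

Function.prototype.toString = nativeToString;   // restore the native method
```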

This has already been happening in the real world for several years with the language as it is. What would be the benefit of changing the language?

Let's imagine for a second that we have that "animationTimingFunction" and that we want to supply our own function (i.e. not a cubic Bézier or any of the default models). The problem is that the browser cannot call an arbitrary function at any time, because that function may access the DOM, which is not ready at that time, and there's no way to prevent that.

If we have a special kind of function that is guaranteed not to have access to anything more than what was given to it, we can start envisioning using those functions deeper in the browser stack.

For example, we could easily use this kind of function to generate custom layout code, why not?

The second case would be a customizable JavaScript algorithm that does not want to reveal any information about itself. If it receives an arbitrary function, it cannot be sure that the function won't leak information to the outside world through a global variable, even if it actively "re-evals" the string representation of the function.

The last thing is that a "safe" function opens many doors for optimization, like I said before. Here's another example: because you know Math.sin is the default sin function, you can bypass the Math object, the sin function, and directly call the native "sin" function. If you're running in an uncontrolled environment, you don't know. Maybe the user replaced Math with something else, or sin. You need to have guards and safety checks before doing the optimization.

This also opens doors to very straightforward inlining. You can inline the function and not recheck guards after the function has been called, because you know it doesn't have side effects.

If you want your code to run inside the critical parts of a browser, every optimization is welcome! With "safe" functions, you can generate code that is exactly as fast as C++ while not restricting the user to the ASM.js limits.

# Alex Russell (12 years ago)

On Thu, Sep 26, 2013 at 9:56 AM, Aymeric Vitte <vitteaymeric at gmail.com> wrote:

I would like to defend against a potential mitm/code injection and ideally against globals modifications.

Only one of those is a threat (MITM). The other is an effect of something happening (which you may or may not want). Conflating them isn't meaningful.

Unfortunately I have the ws problem, which screws up the use case a little bit. So, we have the scenario that you see at the beginning of [1]; forget about the unsafe-inline, I thought it could apply to the worker but it does not.

Right. Workers are something I've raised with the WebAppSec WG and want to get a solution for (as I'm leading the design of the Service Worker API: slightlyoff/ServiceWorker/blob/master/explainer.md)

# Mike Samuel (12 years ago)

2013/9/26 David Bruant <bruant.d at gmail.com>:

On 26/09/2013 01:29, François REMY wrote:

Hi,

TLDR ==> The web needs a way to express executable code that does not rely on its parent context, is guaranteed to be side-effect-free, and can be executed safely anywhere (even in a different thread/worker/window/device, or as callback for browser code being executed while the DOM is not ready to be accessed)

Why "need"? You don't really expose a use case. You only start with "It's been some time I've been working on this idea of a..."

Also, one thing is not really clear to me. Are you trying to protect the function from its caller/loader or the caller from the function? both?

I can't speak for François but here's one use case:

I used similar things in code.google.com/p/prebake/wiki/YSON

The design principle was that we can get the benefits of both declarative configuration and dynamic scripting languages by running scripts that produce serializable & relocatable data bundles that include pure functions.

That wiki page is part of a build system that tries to get the benefit of static declarative build systems and the flexibility of build shell scripts by using a hacked version of Rhino that allows serialization of some closures and exposes the inferred dependency graph via a web service for inspection by other tools.

# Aymeric Vitte (12 years ago)

Le 26/09/2013 20:14, Alex Russell a écrit :

On Thu, Sep 26, 2013 at 9:56 AM, Aymeric Vitte <vitteaymeric at gmail.com> wrote:

I would like to defend against a potential mitm/code injection and
ideally against globals modifications.

Only one of those is a threat (MITM). The other is an effect of something happening (which you may or may not want). Conflating them isn't meaningful.

I am not "conflating them". The idea is to defend against everything, as far as possible, including physical attacks such as a colleague tampering with your browser while you are away from your office for a while. I don't find that so unlikely to happen, and it is almost impossible to detect.

Aym

# François REMY (12 years ago)

TLDR ==> The web needs a way to express executable code that does not rely on its parent context, is guaranteed to be side-effect-free, and can be executed safely anywhere (even in a different thread/worker/window/device, or as callback for browser code being executed while the DOM is not ready to be accessed)

Why "need"? You don't really expose a use case. You only start with "It's been some time I've been working on this idea of a..."

Also, one thing is not really clear to me. Are you trying to protect the function from its caller/loader or the caller from the function? both?

The use case is at the bottom: I want to be able to use JavaScript functions inside the browser as part of algorithm customization. For example, use some JavaScript function as part of the layout computation of an object, or as the animation timing function.

An arbitrary function can't do the trick here because, when the browser needs this kind of information, it is in a state where it cannot execute arbitrary JS against the DOM, or it could be on another thread where it simply can't access the global variables at all. So, instead of trying to make every DOM API throw in this case, which would be very complex, I propose adding a way to make sure the function can't reach them in the first place.

Then you can check at runtime whether or not a function is "safe" for out-of-context usage. You can send the function as part of a JSON object, by the way, and it could be deep-cloned, unlike a normal function.
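For context, this is what happens today when an ordinary function is put through the two standard copying mechanisms. It is only a sketch of the status quo, not of the proposed feature; `timing` is an arbitrary example function.

```javascript
const timing = function (t) { return t * t; };

// JSON.stringify silently drops function-valued properties.
const asJson = JSON.stringify({ duration: 300, timing: timing });

// structuredClone (the algorithm behind postMessage) rejects functions
// by throwing a DataCloneError.
let cloneError = null;
try {
  structuredClone({ duration: 300, timing: timing });
} catch (err) {
  cloneError = err;
}
```

A "safe" function, having no environment to drag along, is the kind of value that could be copied instead of being dropped or rejected.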

What I am really trying to do is give the caller entire control over what the function can access. This requires isolating the called function completely from the outside environment, with the exception of the given arguments.

# Alex Russell (12 years ago)

"Defend against everything" is a security non sequitur. If you want this group to consider specific areas to lock down, we need to understand specific threat models.


# David Herman (12 years ago)

On Sep 26, 2013, at 1:10 PM, François REMY <francois.remy.dev at outlook.com> wrote:

TLDR ==> The web needs a way to express executable code that does not rely on its parent context, is guaranteed to be side-effect-free, and can be executed safely anywhere (even in a different thread/worker/window/device, or as callback for browser code being executed while the DOM is not ready to be accessed)

Why "need"? You don't really expose a use case. You only start with "It's been some time I've been working on this idea of a..."

Also, one thing is not really clear to me. Are you trying to protect the function from its caller/loader or the caller from the function? both?

The use case is at the bottom: I want to be able to use JavaScript functions inside the browser as part of algorithm customization. For example, use some JavaScript function as part of the layout computation of an object, or as the animation timing function.

This is an area I've put some thought into. However, you've described a mechanism but not really made the case that environment-free closures are the only or best solution. In fact I think they are a bad idea, but I agree with the desideratum that a way to share functions across JS heaps would be highly desirable.

Why shareable functions are desirable

Lots of reasons, such as:

  • more convenient workers or worker-like API's (a la Erlang spawn)
  • ability to provide JS customization for native API's that want to run JS code in parallel or in the background
  • more JS customization (a la the extensible web [1]) for core browser technologies that doesn't compete with potential parallelization of those technologies
  • ability to write parallel abstractions (along the lines of map) that can concisely but safely -- and ideally efficiently -- share data

Why environment-free closures is a bad idea

First, I don't want to introduce the ability to test whether a function is environment-free. This interferes with fundamental information hiding guarantees of closures. (And any API that depends on a closure being environment-free to protect its invariants requires the ability to perform that test.)

Second, I think it's very confusing to have code that is lexically nested but interpreted in a completely different scope:

var x = 42;
spawn(function!() { // imaginary function! syntax
    ... x ... // x is a global variable in some worker
});
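The confusion can be reproduced today, roughly, by rebuilding a nested function from its source text. This is only an approximation of the imaginary `function!` syntax: `new Function` evaluates its body in the global scope, and the name `upvarForDemo` is invented for the example.

```javascript
function demo() {
  const upvarForDemo = 42;

  // Lexically nested: sees the local binding.
  const nested = function () { return typeof upvarForDemo; };

  // Same source text, rebuilt with new Function: the body is now resolved
  // against the global scope, where upvarForDemo does not exist.
  const relocated = new Function('return (' + nested.toString() + ');')();

  return { nested: nested(), relocated: relocated() };
}

const result = demo();
```

The two functions share every character of their source, yet the same identifier means something different in each.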

Third, I don't think it's sufficiently expressive. I'd be much more interested in seeing something along these lines:

  • the ability to define custom serialization logic (as opposed to, or perhaps in extension to, the ad hoc structured clone algorithm)
  • the ability to define serializable functions whose serialization logic lifts through their upvars

This is more expressive than environment-free closures because it allows shared functions to also share their closure data. But it also narrows the observation you can make on a closure from "does it have anything in its environment?" to "is its environment shareable?" This is the real question you want to be asking anyway, and it doesn't violate the abstraction closures provide.

Don't get me wrong, it's also a much harder problem. But I've been starting to get my head around it little by little. It'd require some published protocol for customizable serialization (a standardized @@serialize symbol, or maybe some kind of traits feature) and some static requirements on the shape of the code of a serializable closure. I imagine the serialized upvars would have to be const, and we'd probably need to require them to all be initialized at the time of serialization. (Or something. More research to do here.) To make serialization efficient, you'd want the ability to write directly into the other heap in a fast and space-efficient way, but the typed objects API is very promising here: user code can pick whatever representation format it wants and the code on the other side will be able to interact with the resulting data as an object.
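A published serialization protocol could look something like the following user-land sketch. Everything here is an assumption made for illustration: the `serialize` symbol stands in for a standardized @@serialize, and the `revivers` registry, `send` and `receive` are hypothetical helpers, with JSON text standing in for the other heap.

```javascript
// Hypothetical stand-in for a standardized @@serialize symbol.
const serialize = Symbol('serialize');

class Point {
  constructor(x, y) { this.x = x; this.y = y; }
  // The object picks its own wire representation.
  [serialize]() { return { tag: 'Point', fields: [this.x, this.y] }; }
  norm() { return Math.hypot(this.x, this.y); }
}

// Hypothetical registry the receiving side uses to rebuild values.
const revivers = {
  Point: function (fields) { return new Point(fields[0], fields[1]); },
};

function send(value) {
  const ok = value !== null && typeof value === 'object' &&
             typeof value[serialize] === 'function';
  if (!ok) throw new TypeError('value does not implement the serialize protocol');
  return JSON.stringify(value[serialize]());
}

function receive(wire) {
  const message = JSON.parse(wire);
  return revivers[message.tag](message.fields);
}

const copy = receive(send(new Point(3, 4)));
```

The observation being made is "can this value describe how to share itself?", which does not require looking inside anything the value wants to keep hidden.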

There's also a question of whether you'd want a single "serialize" operation or two separate "clone" vs "move" operations (analogous to the transfer semantics for ArrayBuffers). Tony Arcieri blogged about some similar ideas last year. [2]
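The existing ArrayBuffer behaviour shows the difference between the two operations. This sketch uses the standard `structuredClone` function and its `transfer` option; it says nothing about how a function-level "clone" or "move" would be specified.

```javascript
const original = new ArrayBuffer(8);

// "clone": the receiver gets a copy, the original stays usable.
const cloned = structuredClone(original);
const lengthAfterClone = original.byteLength;

// "move": ownership is transferred and the original is detached.
const moved = structuredClone(original, { transfer: [original] });
const lengthAfterMove = original.byteLength;
```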

At any rate, this is an area I'm very interested in, and so are some others in my group at Mozilla Research.

Dave

[1] extensiblewebmanifesto.org
[2] tonyarcieri.com/2012

# Jake Verbaten (12 years ago)

Another use case for serializable functions is building templating languages that are a subset of javascript.

I'm building a templating language that consists of javascript functions, arrays & object literals.

A serializable function syntax primitive would be useful for ensuring that all maintainers of the template files immediately recognize that the functions embedded in the template are not full functions. Full functions are not allowed as that would inhibit the ability to serialize the template data structure.

For this use case it would actually be preferable to mark a function as referentially transparent instead of serializable.

# Tom Van Cutsem (12 years ago)

2013/9/27 David Herman <dherman at mozilla.com>

Why environment-free closures is a bad idea

[...] Third, I don't think it's sufficiently expressive. I'd be much more interested in seeing something along these lines:

  • the ability to define custom serialization logic (as opposed to, or perhaps in extension to, the ad hoc structured clone algorithm)
  • the ability to define serializable functions whose serialization logic lifts through their upvars

We've found the need for serializable functions in AmbientTalk, which, like JS, encourages a style of programming that makes significant use of lexical nesting. We have found that, with this programming style, requiring serializable functions to be closed is indeed too restrictive. Translated to JS, our solution looks like:

var z = 42;
var f = function(x,y) (z) {
  // x and y are locals, z refers to (a copy of) the upvar
}

Here, f is a pass-by-copy function. The second parameter list captures the upvars that should be serialized together with the function. In the function body, only x, y and z are in scope, and z refers to the original upvar (when called on the original), or to a copied upvar (when called on a deserialized copy of the function).

The benefit of this scheme is that it is immediately clear from the definition of f what variables it depends on, and will get serialized along with f (it's good to be conscious of this for reasons of performance and security). The downside is that if the function body is refactored, the list of captured upvars needs to be kept in-sync (a good IDE can obviously help here).

# François REMY (12 years ago)

We've found the need for serializable functions in AmbientTalk, which,
like JS, encourages a style of programming that makes significant use
of lexical nesting. We have found that this programming style makes
that requiring serializable functions to be closed is indeed too
restrictive.

Translated to JS, our solution looks like:

var z = 42;
var f = function(x,y) (z) {
  // x and y are locals, z refers to (a copy of) the upvar
}

Here, f is a pass-by-copy function. The second parameter list captures
the upvars that should be serialized together with the function.

That seems to be syntactic sugar for something along these lines:

|    var z = 42;
|    var f = function(x,y):safe(z){
|        
|        ...
|        
|    };

becoming

|    var z = 42;
|    var f = function(x,y):safe{
|        const z = 42;
|        
|        ...
|        
|    }

where 42 is actually replaced by the value of 'z' at the time of construction. Did I understand correctly?

I think the idea is good, but in practice this could be emulated by using eval("function(x,y):safe { var z = " + uneval(z) + " ... }")
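A rough emulation in today's JavaScript might look like the sketch below. It is not the proposed `:safe` syntax: `makePortable` and `revive` are hypothetical helpers, JSON.stringify stands in for the non-standard uneval (so only JSON-representable upvars work), and `new Function` is not a sandbox since the revived code can still reach the globals.

```javascript
// Capture the listed upvars by value, next to the function's source text.
function makePortable(upvars, fn) {
  return JSON.stringify({ upvars: upvars, source: fn.toString() });
}

// Rebuild the function; the upvar names become parameters of a wrapper,
// so the body sees the copied values instead of the original bindings.
function revive(wire) {
  const message = JSON.parse(wire);
  const names = Object.keys(message.upvars);
  const values = names.map(function (name) { return message.upvars[name]; });
  const factory = new Function(...names, 'return (' + message.source + ');');
  return factory(...values);
}

let z = 42;
const wire = makePortable({ z: z }, function (x, y) { return (x * y) / z; });

z = 0; // later changes to the original upvar do not reach the copy

const f = revive(wire);
const value = f(6, 7); // computed with the captured z, not the current one
```

As in the explicit upvar list, the set of captured variables has to be written out by hand and kept in sync with the function body.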

Any other kind of non-serialization transfer (like transferring an object to another thread) can be done using standard arguments of the safe function (and, if necessary, a library on top of it); it doesn't have to be dealt with at the safe-function level.

Any kind of special serialization can be handled on the platform separately. We could even define an object with an @unserialize serializable function, which would finally make it possible to properly serialize cyclic graphs of objects:

|    function serializeAsFunction(o){
|        var source = [o]; var destin = ['o']; var instructions = [];
|        var jso=Object.mapPropertiesRecursively(o, function(object, property, path){
|            var v = property.value; if(!v) return;
|            if (typeof v =='object') {
|                var index = source.indexOf(v);
|                if(index !== -1) {
|                    // already seen: record a back-reference instead of recursing
|                    instructions.push("o"+path+"=o"+destin[index]);
|                    return null;
|                } else {
|                    source.push(v);
|                    destin.push(path);
|                }
|            }
|            return v;
|        });
|        return function():safe(o,instructions) {
|            ... execute all instructions in order ...
|            return o;
|        };
|    };
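The same idea can be sketched in runnable form without the hypothetical `:safe` syntax or `Object.mapPropertiesRecursively`. Here the "instructions" are plain data (pairs of key paths) rather than code, and `flattenCycles` / `restoreCycles` are names invented for this example.

```javascript
// Replace every reference to an already-visited object with null and record
// an instruction [pathOfReference, pathOfTarget] to patch it back later.
function flattenCycles(root) {
  const seen = new Map([[root, []]]); // object -> key path from the root
  const instructions = [];

  function walk(value, path) {
    const copy = Array.isArray(value) ? [] : {};
    for (const key of Object.keys(value)) {
      const child = value[key];
      const childPath = path.concat(key);
      if (child !== null && typeof child === 'object') {
        if (seen.has(child)) {
          instructions.push([childPath, seen.get(child)]);
          copy[key] = null;
        } else {
          seen.set(child, childPath);
          copy[key] = walk(child, childPath);
        }
      } else {
        copy[key] = child;
      }
    }
    return copy;
  }

  return { data: walk(root, []), instructions: instructions };
}

// Execute all instructions in order to re-create the original graph shape.
function restoreCycles(message) {
  const data = message.data;
  const resolve = function (path) {
    return path.reduce(function (node, key) { return node[key]; }, data);
  };
  for (const [from, to] of message.instructions) {
    const parent = resolve(from.slice(0, -1));
    parent[from[from.length - 1]] = resolve(to);
  }
  return data;
}

const a = { name: 'a' };
a.self = a;
a.child = { parent: a };

const wire = JSON.stringify(flattenCycles(a)); // plain JSON, no cycles
const b = restoreCycles(JSON.parse(wire));
```

Using data instead of generated code sidesteps the trust question entirely, at the cost of a fixed instruction format; the safe-function version would let the sender choose the reconstruction logic.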

I could obviously already achieve something like that using simple functions, but safe functions are much better because the person executing the code can be sure that, whatever the function does, it does not hold a single pointer to the current environment and therefore cannot do more harm than just eating CPU (and the calling code could set up some kind of "abort in X seconds" system if needed).

This also opens the way to more compressed data messages, since you could for instance define functions for serializing purposes:

|    function():safe {
|        var row = function StockActionUpdate(name,oldPrice,newPrice){
|            return {
|                name: name,
|                oldPrice: oldPrice,
|                newPrice: newPrice,
|                variation: (newPrice-oldPrice)/oldPrice,
|                isUp: (newPrice>oldPrice),
|                isDown: (oldPrice>newPrice),
|                toSpeakAloudString: function():safe{
|                    return this.name+' ('+this.newPrice+')'+(this.isDown?' is down':' is up')+' since yesterday ('+this.oldPrice+')'
|                }
|            }
|        };
|        return {
|            'MSFT':row('Microsoft Corporation', 30, 31),
|            'GOOG':row('Google Inc', 490, 491),
|            'AAPL':row('Apple Inc', 480, 481),
|            ...
|        };
|    }

This is more compact than a traditional JSON channel, and safer than a traditional "eval" code channel. I didn't say "totally safe" because the browser can have a security bug, which even a "safe" function could use, but this is very unlikely in pure JS functions, and you cannot reliably protect your application from browser bugs anyway.

The beauty of all this is that no code coming out of the safe function as a return value can possibly be unsafe, because it cannot possibly have got any reference to the outside world (at least not any you didn't give it).

# Brendan Eich (12 years ago)

Tom Van Cutsem wrote:

var f = function(x,y) (z) {

Tiny bike-shed-ish comment that we need a linking punctuator so that arrow function syntax can be extended likewise:

var f = function (x, y) : (z) {...}

Arrow form showing expression body:

var f = (x, y) : (z) => x*y/z;

An alternative without a punctuator:

var f = (x, y) [z] => x*y/z;

Getting a bit C++1x'ish but it's not bad.

# François REMY (12 years ago)

Tiny bike-shed-ish comment that we need a linking punctuator so that arrow function syntax can be extended likewise:

var f = function (x, y) : (z) {...}

Arrow form showing expression body:

var f = (x, y) : (z) => x*y/z;

FWIW, I like the colon. However, I prefer

|    function(x,y):safe(z) { ... }
and
|    (x,y):safe(z) => ...

because it leaves room for further annotations on functions, and I'm pretty sure we will want some more in the future.

I don't know, crazy things like
|    function(x,y):tail(...)
for tail-recursion-desired functions, or
|    function(x,y):local(...)
for functions that can run on the same callstack because they can only get called during their closure lifetime (they throw otherwise).


The brackets syntax looks reasonable, too, with an optional prelude:

|    function(x,y)[z] { ... }
and
|    (x,y)[z] => ...

Extensible to:
|    function(x,y)[tail(...)]
for tail-recursion-desired functions, or
|    function(x,y)[local(...)]
for functions that can run on the same callstack because they can only get called during their closure lifetime (they throw otherwise).

# Brendan Eich (12 years ago)

Maybe, but:

(1) annotations are user-hostile even in a language like Rust, wherefore type-and-lifetime inference, which you are not proposing.

(2) we need a sane default, if not inference.