JS control-structure abstractions, using tailnest flattening and tailcall optimization

# Claus Reinke (14 years ago)

Dear all,

we have seen examples of how to define control-structure abstractions via block-lambdas (and Smalltalk blocks), including non-local returns out of user-defined loops.

I'd like to provide an example of how to do something like this with JS, using only the proposed syntactic sugar for flat syntactic tailnests and the upcoming support for flat runtime tailcall stacks (for callback chains, the two complement each other).

Building up from small structures, the example aims to implement a variant of "yield". With conventional blocks, this requires the upcoming generators language extension. Block-lambdas do not help here, I think, because yield can suspend and resume at statement level, and even the slightly unusual sequel feature does only support non-local return, no resume (calling a sequel returns to a lexically scoped elsewhere, not to the caller).

If JS had call/cc or delimited continuations, we could implement yield on top of that, but that isn't likely. If we were to rewrite our code in continuation passing style (cps), we could implement call/cc, but even in languages with lightweight function syntax, that is not considered very readable. There are modular variants of continuation passing style that can be made to look very similar to "normal" code, but these become unreadable in JS syntax. Cps without tailcall optimization also quickly overflows the stack.

Which is where tailnests (readability) and tailcalls (efficiency/ useability) come in, addressing the issues that cps in current JS leads to overflows in syntactic and runtime stack nesting.

First, the programming pattern: in plain cps, we'd have nested callback chains feeding intermediate results into the rest of the callback chain (assuming currying for the callback parameter):

operation(..)( function(result) { ..operations..} )

With paren-free right-associative calls (f @< arg) and brace-free definitions (function(arg) => expr) for expression functions, this

becomes:

operation(..) @< function(result) => ..operations..

Tailcall optimization guarantees that the callbacks will not overflow the runtime stack, and tailnest flattening keeps the level of syntactic nesting independent of the chain length.

In modular cps, instead of every operation taking a callback parameter, every operation returns its result in an object implementing a single method 'then' (the ideas are standard, the translation to JS is not). Calling 'then' with a callback feeds a value/intermediate result to the callback.

operation(..).then @< function(result) => ..operations..

(which can be read as a clumsy way to write "let result <- operation in operations", but the ".then" allows us to give different meanings to the variable binding, eg., it could mean "foreach result from operation, do operations", or it could store the continuation and jump elsewhere, as yield)

The most basic operation is 'value', which just wraps a value into a "then-able":

function value(val) => { then : function(cont) => cont(val) }

Since statements represented as then-ables are objects, conditional statements are just conditional expressions:

cond ? t : e

or, if the condition involves then-able statements, too:

cond.then @< function( c ) => c ? t : e

which we can either use directly, or wrap in a function (the then/else-branch arguments are themselves wrapped to delay their evaluation):

function if_(cond) => function(ft) => function(fe) =>
    cond.then @< function( c ) => c ? ft() : fe()

Hence, a while-loop with then-able condition and delayed body:

function while_(cond) => function(body) =>
    if_(cond)
        (function() => body().then @< function() => 
                                while_(cond)(body) )
        (function() => value( null ) )

You can find these examples online [1], so to limit this email, I'll jump right to the interesting bit, which is implementing yield:

"yield_(val)" produces a then-able that behaves a bit unusually: instead of passing a value to continuations passed via ".then()", it swallows all such continuations, storing them for later use (in plain cps, we'd have the whole continuation at hand; in this modular cps variation, we have to collect nested continuations).

To get at the yielded value, we can call "yield_(val).value(cont)", and to restart the "generator", we can call "yield_(val).next()", which applies the internally stored continuations to "val". In code:

function yield_(val) => { then: stack(val) };

function stack(val) => function(c1) =>
  { then:  function(c2)   => stack(val)(comp_then(c2,c1))
  , value: function(cont) => cont(val)
  , next:  function()     => c1(val) }; 

function comp_then(c2,c1) => function(val) => c1(val).then(c2);

Again, the commented code and a couple of example generators can be found online [1]. If you use the desugaring page, you can see why this style isn't popular without syntactic sugar. But the functionality is plain JS, no semantic changes!

The code could be made more readable by providing "let <-"-desugaring to then-ables, but JS is different enough from other languages providing similar features that I want to make sure that this pattern is the one to aim for before proposing that. The code could be shortened further by combining arrow-syntax with "@<", instead of the less ambitious expression functions of the tailnests proposal. And extended object-literals would shorten inline then-ables.

Claus

[1] clausreinke/jstr/tree/master/es-discuss

Examples are in then.js, but if you clone the repo, you can
load them via tailnests.html, desugar into plain JS via jstr, 
and evaluate the resulting code. So you can play with it!-)

Dear all,

we have seen examples of how to define control-structure
abstractions via block-lambdas (and Smalltalk blocks), including
non-local returns out of user-defined loops.

I'd like to provide an example of how to do something like this 
with JS, using only the proposed syntactic sugar for flat syntactic 
tailnests and the upcoming support for flat runtime tailcall stacks 
(for callback chains, the two complement each other).

Building up from small structures, the example aims to implement 
a variant of "yield". With conventional blocks, this requires the 
upcoming generators language extension. Block-lambdas do 
not help here, I think, because yield can suspend and resume at 
statement level, and even the slightly unusual sequel feature 
does only support non-local return, no resume (calling a sequel 
returns to a lexically scoped elsewhere, not to the caller). 

If JS had call/cc or delimited continuations, we could implement 
yield on top of that, but that isn't likely. If we were to rewrite our 
code in continuation passing style (cps), we could implement 
call/cc, but even in languages with lightweight function syntax, 
that is not considered very readable. There are modular variants 
of continuation passing style that can be made to look very similar 
to "normal" code, but these become unreadable in JS syntax. 
Cps without tailcall optimization also quickly overflows the stack.

Which is where tailnests (readability) and tailcalls (efficiency/
useability) come in, addressing the issues that cps in current 
JS leads to overflows in syntactic and runtime stack nesting.

First, the programming pattern: in plain cps, we'd have nested 
callback chains feeding intermediate results into the rest of the
callback chain (assuming currying for the callback parameter):

    operation(..)( function(result) { ..operations..} )

With paren-free right-associative calls (f @< arg) and brace-free
definitions (function(arg) => expr) for expression functions, this
becomes:

    operation(..) @< function(result) => ..operations..

Tailcall optimization guarantees that the callbacks will not 
overflow the runtime stack, and tailnest flattening keeps the
level of syntactic nesting independent of the chain length.

In modular cps, instead of every operation taking a callback
parameter, every operation returns its result in an object 
implementing a single method 'then' (the ideas are standard,
the translation to JS is not). Calling 'then' with a callback feeds 
a value/intermediate result to the callback.

    operation(..).then @< function(result) => ..operations..

(which can be read as a clumsy way to write
"let result <- operation in operations", but the ".then" allows
us to give different meanings to the variable binding, eg., it
could mean "foreach result from operation, do operations", or 
it could store the continuation and jump elsewhere, as yield)

The most basic operation is 'value', which just wraps a value
into a "then-able":

    function value(val) => { then : function(cont) => cont(val) }

Since statements represented as then-ables are objects, 
conditional statements are just conditional expressions:

    cond ? t : e

or, if the condition involves then-able statements, too:

    cond.then @< function( c ) => c ? t : e

which we can either use directly, or wrap in a function (the 
then/else-branch arguments are themselves wrapped to 
delay their evaluation):

    function if_(cond) => function(ft) => function(fe) =>
        cond.then @< function( c ) => c ? ft() : fe()

Hence, a while-loop with then-able condition and delayed body:

    function while_(cond) => function(body) =>
        if_(cond)
            (function() => body().then @< function() => 
                                    while_(cond)(body) )
            (function() => value( null ) )

You can find these examples online [1], so to limit this email,
I'll jump right to the interesting bit, which is implementing yield:

"yield_(val)" produces a then-able that behaves a bit unusually:
instead of passing a value to continuations passed via ".then()",
it swallows all such continuations, storing them for later use
(in plain cps, we'd have the whole continuation at hand; in this
modular cps variation, we have to collect nested continuations).

To get at the yielded value, we can call "yield_(val).value(cont)",
and to restart the "generator", we can call "yield_(val).next()",
which applies the internally stored continuations to "val". In code:

    function yield_(val) => { then: stack(val) };

    function stack(val) => function(c1) =>
      { then:  function(c2)   => stack(val)(comp_then(c2,c1))
      , value: function(cont) => cont(val)
      , next:  function()     => c1(val) }; 

    function comp_then(c2,c1) => function(val) => c1(val).then(c2);

Again, the commented code and a couple of example generators 
can be found online [1]. If you use the desugaring page, you can 
see why this style isn't popular without syntactic sugar. But the
functionality is plain JS, no semantic changes! 

The code could be made more readable by providing 
"let <-"-desugaring to then-ables, but JS is different enough from
other languages providing similar features that I want to make 
sure that this pattern is the one to aim for before proposing that. 
The code could be shortened further by combining  arrow-syntax 
with "@<", instead of the less ambitious expression functions of 
the tailnests proposal. And extended object-literals would shorten
inline then-ables.

Claus

[1] https://github.com/clausreinke/jstr/tree/master/es-discuss

    Examples are in then.js, but if you clone the repo, you can
    load them via tailnests.html, desugar into plain JS via jstr, 
    and evaluate the resulting code. So you can play with it!-)

# Brendan Eich (14 years ago)

On Jul 8, 2011, at 7:34 AM, Claus Reinke wrote:

Tailcall optimization guarantees that the callbacks will not overflow the runtime stack, and tailnest flattening keeps the level of syntactic nesting independent of the chain length.

ES.next has generators and proper tail calls already. I'm hard-pressed to call the syntax extensions you propose 'more readable" than what can be done with generators and libraries such as taskjs (dherman/taskjs). I have to say @< and expression-forms seem less readable in general.

On Jul 8, 2011, at 7:34 AM, Claus Reinke wrote:

> Tailcall optimization guarantees that the callbacks will not overflow the runtime stack, and tailnest flattening keeps the
> level of syntactic nesting independent of the chain length.

ES.next has generators and proper tail calls already. I'm hard-pressed to call the syntax extensions you propose 'more readable" than what can be done with generators and libraries such as taskjs (https://github.com/dherman/taskjs). I have to say @< and expression-forms seem less readable in general.

/be

# Claus Reinke (14 years ago)

Tailcall optimization guarantees that the callbacks will not overflow the runtime stack, and tailnest flattening keeps the level of syntactic nesting independent of the chain length.

ES.next has generators and proper tail calls already. I'm hard-pressed to call the syntax extensions you propose 'more readable" than what can be done with generators and libraries such as taskjs (dherman/taskjs). I have to say @< and expression-forms seem less readable in general.

'more readable' wrt current function definition and call syntax, whose problems you know very well from your own proposals.

The tailnests.html page in the repo allows you to see sugared and desugared source side-by-side, making the readability improvement obvious - the plain ES version only adds noise.

The current ES syntax is so bad that we are looking at extending the semantics of ES (with generators, with block lambdas) just to give programmers features that they should be able to define in ES already (the only necessary semantic change are proper tail calls).

Generators offer a good cost-benefit ratio, and I'm not even against block-lambdas - they both provide useable syntax for interesting use cases. At least as long as the language is still awkwardly biased towards imperative code ("awkward" because this goes against its prototypical/functional nature).

But you cannot go on adding special language features for every use case: block-lambdas have new features, but as far as control structure abstraction go, they do no help to implement yield (I think?), so we add yield as well; and if (the form of) yield we add is not sufficient for the next use case, then we add ..?

I'd like to get out of this loop where every interesting feature requires a language extension, and move on to being able to define new language features in the language. So I think it would be a serious mistake to use special features as an argument against improving the usability of the core language.

By core language improvements, I mean function definition and call improvements to match the tail call optimization semantic improvement. Function definition improvements are what I thought arrow syntax was about, but that seems stuck, so I suggested a less ambitious alternative. Function call improvements are addressed by a single infix application operation (you know what I think about having to discuss every new infix operation in the language committee).

Also, since you mention task.js: task.js uses generators, my example showed how to implement generators - one coin, different sides. However, while the idea of using generators to emulate non-preemptive threads is good, using the same idea to handle callback chains is overkill. Writing

spawn(function() { var result = yield operation(..); ..operations.. })

uses built-in generators as delimited continuations, just to get a handle on the variable binding context, when the same can be achieved more directly, in current ES syntax:

operation(..).then( function(result) { .. operations.. } )

In the former variant, 'spawn' can stitch the continuations together asynchronously, if 'operation' returns something like a promise, and in the latter variant, '.then' can do the same, without using the built-in generator machinery (also, generators are objects with internal state, not easily copied; this could be a disadvantage when trying to extend the control structure pattern to similar use cases).

Yes, agreed, programmers have been trained to read their variable definitions against the left-to-right text flow, so we'll want some additional 'let-then' sugar for the above, and then

let result <- operation(..); ..operations..

would desugar (preferably without 'this'/'arguments') to

operation(..).then( function(result) { .. operations.. } )

(this is also called for by Tennent's correspondence principle: equal rights for declarative and parametric variable bindings).

But that is still just sugar, plus an interface against which the code is desugared. By providing different implementations of that interface (then-ables), we can reuse and augment the code, for instance in asynchronous operations callback chains, or with user-defined generators. The beauty of coding against an interface means that we could hook in different implementations.

In my example, I just wanted to make sure that this interface is sufficient for the kind of control structure abstractions under discussion. This highlighted three issues: abstraction over References is difficult in ES, abstraction just to delay evaluation wants very minimal syntax, and without types, implementations will have to be selected explicitly. But the then-able interface itself seems to be sufficient.

Btw, I wonder what current JS engines can do with the generator- based approach to async code: do you think it is going to be as efficient as direct-style async code? That is, are tracing JITs or other current optimizations good enough to eliminate the built-in-generator overhead?

Claus

>> Tailcall optimization guarantees that the callbacks will not 
>> overflow the runtime stack, and tailnest flattening keeps the
>> level of syntactic nesting independent of the chain length.
> 
> ES.next has generators and proper tail calls already. I'm 
> hard-pressed to call the syntax extensions you propose 
> 'more readable" than what can be done with generators 
> and libraries such as taskjs (https://github.com/dherman/taskjs). 
> I have to say @< and expression-forms seem less readable in 
> general.

'more readable' wrt current function definition and call syntax,
whose problems you know very well from your own proposals.

The tailnests.html page in the repo allows you to see sugared
and desugared source side-by-side, making the readability
improvement obvious - the plain ES version only adds noise.

The current ES *syntax* is so bad that we are looking at extending
the *semantics* of ES (with generators, with block lambdas) just
to give programmers features that they should be able to define
*in ES* already (the only *necessary* semantic change are proper 
tail calls).

Generators offer a good cost-benefit ratio, and I'm not even against 
block-lambdas - they both provide useable syntax for interesting 
use cases. At least as long as the language is still awkwardly biased 
towards imperative code ("awkward" because this goes against its
prototypical/functional nature). 

But you cannot go on adding special language features for every
use case: block-lambdas have new features, but as far as control
structure abstraction go, they do no help to implement yield (I 
think?), so we add yield as well; and if (the form of) yield we add 
is not sufficient for the next use case, then we add ..? 

I'd like to get out of this loop where every interesting feature 
requires a language extension, and move on to being able to
define new language features in the language. So I think it 
would be a serious mistake to use special features as an 
argument against improving the usability of the core language.

By core language improvements, I mean function definition
and call improvements to match the tail call optimization
semantic improvement. Function definition improvements
are what I thought arrow syntax was about, but that seems
stuck, so I suggested a less ambitious alternative. Function
call improvements are addressed by a single infix application
operation (you know what I think about having to discuss
every new infix operation in the language committee).


Also, since you mention task.js: task.js uses generators, my
example showed how to implement generators - one coin,
different sides. However, while the idea of using generators 
to emulate non-preemptive threads is good, using the same
idea to handle callback chains is overkill. Writing

    spawn(function() { var result = yield operation(..); ..operations.. })

uses built-in generators as delimited continuations, just to get 
a handle on the variable binding context, when the same can be 
achieved more directly, in current ES syntax:

    operation(..).then( function(result) { .. operations.. } )

In the former variant, 'spawn' can stitch the continuations together
asynchronously, if 'operation' returns something like a promise, and
in the latter variant, '.then' can do the same, without using the built-in
generator machinery (also, generators are objects with internal state, 
not easily copied; this could be a disadvantage when trying to extend 
the control structure pattern to similar use cases).

Yes, agreed, programmers have been trained to read their
variable definitions against the left-to-right text flow, so we'll
want some additional 'let-then' sugar for the above, and then

    let result <- operation(..); ..operations..

would desugar (preferably without 'this'/'arguments') to

    operation(..).then( function(result) { .. operations.. } )

(this is also called for by Tennent's correspondence principle:
equal rights for declarative and parametric variable bindings).

But that is still just sugar, plus an interface against which the
code is desugared. By providing different implementations of
that interface (then-ables), we can reuse and augment the code,
for instance in asynchronous operations callback chains, or
with user-defined generators. The beauty of coding against an 
interface means that we could hook in different implementations.

In my example, I just wanted to make sure that this interface is
sufficient for the kind of control structure abstractions under
discussion. This highlighted three issues: abstraction over 
References is difficult in ES, abstraction just to delay evaluation 
wants very minimal syntax, and without types, implementations
will have to be selected explicitly. But the then-able interface 
itself seems to be sufficient.

Btw, I wonder what current JS engines can do with the generator-
based approach to async code: do you think it is going to
be as efficient as direct-style async code? That is, are tracing
JITs or other current optimizations good enough to eliminate
the built-in-generator overhead?

Claus