Mark S. Miller (2013-11-11T18:12:50.000Z)
On Mon, Nov 11, 2013 at 9:25 AM, Jason Orendorff <jason.orendorff at gmail.com> wrote:

> JS users do not want RPC systems where one process's memory usage
> depends on getting per-object callbacks from an untrusted peer's GC
> implementation.

Some will. I do. See <http://research.google.com/pubs/pub40673.html>.

Why do you believe manual deallocation decisions will be easier in distributed systems than they are locally? If anything, local manual deallocation should be easier, and even those decisions have proven hard enough that people (except C++ programmers) have turned to local GC.

You are correct that a distributed mutually suspicious system must support manual deallocation as well. Your Erlang example is quite telling: Erlang does have strong cross-process references, the process ids. However, because they are forgeable, processes cannot be garbage collected. The decision to terminate a process is the decision to preemptively terminate service to clients that may still exist. Sometimes this needs to be done, even with GC, because a client causes the service to retain more memory than the service wishes to continue to devote to that client. As the Erlang example also indicates, the natural unit for such preemptive manual deallocation decisions is the vat/worker/process. However, many clients will engage in honest GC to keep their demands on service memory low, and many services will not need to cut such clients off for excessive resource demands.

E/CapTP and Cap'n Proto have an additional form of manual deallocation decision besides vat termination. Between each pair of vats there are both "offline references", which survive partition, and "live references", which last only until partition. Because offline references can be reconnected after a partition, they are not subject to GC. Instead, E provides three hooks for manually deallocating them. From Chapter 17.4 "Persistence" of http://erights.org/talks/thesis/markm-thesis.pdf:

> The operations for making an offline capability provide three options for ending this obligation: It can expire at a chosen future date, giving the association a time-to-live. It can expire when explicitly cancelled, making the association revocable. And it can expire when the hosting vat incarnation crashes, making the association transient. An association which is not transient is durable.

Since vats must be prepared for inter-vat partition, a vat can preemptively induce a partition with a counterparty vat in order to sever the live references between them, forcing reconnection to rely on the offline references, which are subject to the above manual policies. In E/CapTP and Cap'n Proto, the distributed GC governs only these transient live refs, which substantially reduces the pressure to preemptively sever these connections. (The NodeKen system on which Dr. SES will be built does not make this split, forcing it to rely on manual deallocation of vats rather than connections in order to manually reclaim memory. There is a place in the world for each failure model. Here I am arguing only for the CapTP/Cap'n Proto failure model.)

Ultimately, the only principled solution for distributed storage management among mutually suspicious machines is some form of quid pro quo, such as the [market-sweep algorithms](http://e-drexler.com/d/09/00/AgoricsPapers/agoricpapers/ie/ie3.html). But even after 25 years, these still seem premature. Distributed GC + preemptive deallocation for extreme conditions, either of vats or connections, is a great compromise in the meantime.
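
To make the live-reference half of the argument concrete, here is a minimal sketch (in TypeScript, using today's FinalizationRegistry as a stand-in for the post-mortem finalization under discussion) of the per-object callback a client vat might send when one of its local proxies is collected. The names RemoteConnection, ImportTable, and the "drop" message are invented for illustration; they are not CapTP's or Cap'n Proto's actual wire protocol.

```ts
// Sketch only: a client vat participating in distributed GC of *live* references.
type ExportId = number;

interface RemoteConnection {
  // Ask the hosting vat to retire an entry in its export table.
  send(message: { type: "drop"; exportId: ExportId }): void;
}

class ImportTable {
  // When a local proxy becomes unreachable, tell the hosting vat so it can
  // release the corresponding export. The hosting vat's memory thus depends on
  // the client honestly reporting its collections -- the per-object callback
  // Jason objects to, and also exactly what an honest client does to keep its
  // demands on service memory low.
  private registry = new FinalizationRegistry<ExportId>((exportId) => {
    this.conn.send({ type: "drop", exportId });
  });

  constructor(private conn: RemoteConnection) {}

  registerProxy(proxy: object, exportId: ExportId): void {
    this.registry.register(proxy, exportId);
  }
}
```

The point of the compromise above is that a hosting vat is never hostage to a client that withholds these drops: it can still preemptively sever the connection (or terminate the vat), reclaiming every live reference that client held.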
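And here is a hedged sketch of the three expiration hooks for offline references quoted from the thesis above (time-to-live, revocation, transience). OfflineTable, makeOfflineRef, and Expiry are hypothetical names chosen for illustration; the real E operations differ, but the policy shape is the same.

```ts
// Sketch only: manual-deallocation policies for *offline* references,
// which survive partition and therefore cannot be governed by GC.
interface Expiry {
  expiresAt?: number;   // time-to-live: drop the association after this epoch-ms date
  transient?: boolean;  // drop it when this vat incarnation crashes; otherwise durable
}

class OfflineTable {
  private table = new Map<string, { target: object; expiry: Expiry }>();

  // Returns an unguessable identifier that survives partition, plus a revoker
  // (explicit cancellation). Assumes a runtime providing crypto.randomUUID().
  makeOfflineRef(target: object, expiry: Expiry): { swissNum: string; revoke: () => void } {
    const swissNum = crypto.randomUUID();
    this.table.set(swissNum, { target, expiry });
    return { swissNum, revoke: () => this.table.delete(swissNum) };
  }

  // Called on lookup (and periodically): enforce time-to-live.
  lookup(swissNum: string, now: number = Date.now()): object | undefined {
    const entry = this.table.get(swissNum);
    if (!entry) return undefined;
    if (entry.expiry.expiresAt !== undefined && now >= entry.expiry.expiresAt) {
      this.table.delete(swissNum);
      return undefined;
    }
    return entry.target;
  }

  // Called when a vat incarnation restarts after a crash: transient
  // associations do not survive; durable ones do.
  pruneTransient(): void {
    for (const [key, entry] of this.table) {
      if (entry.expiry.transient) this.table.delete(key);
    }
  }
}
```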