Weak Graph

# Jussi Kalliokoski (9 years ago)

Usually my first assumption when I think I need weak references is that I should use WeakMaps instead. Today I came across an interesting use case and was wrong for the first time. However it wasn't weak references I needed either.

I'm trying to come up with a solution to the problem of rendering lists that's often used as a counter argument for using a framework / view library instead of direct DOM manipulation, where - even with React - updating (even appending to) a list is O(n) at best.

My idea for a solution is that the lists are immutable, contain a reference to their parent and a changeset / diff compared to their parent. This would allow rendering the whole list initally, then just applying the diffs on subsequent renders by walking the graph up to the last known ancestor and combining the changesets. This would make subsequent renders O(1) which is a great improvement even with small lists.

The biggest problem is that this will leak memory like crazy; every revision of the list will be preserved. Let's say we have the following implementation:

function WeakGraph () {
    const parentByNode = new WeakMap();

    this.getLineage = function (node, ancestor) {
        const lineage = [];

        let currentNode = node;
        do {
            lineage.push(currentNode);
            if ( !parentByNode.has(currentNode) ) { throw new Error("node
is not a descendant of ancestor"); }
            currentNode = parentByNode.get(currentNode);
        } while ( currentNode !== ancestor );

        return lineage;
    };

    this.addNode = function (node, parent) {
        parentByNode.set(node, parent);
    };
}

It provides the needed interface and the unused child revisions get cleaned up properly. However:

  • This is a complete nightmare for GC performance because of cyclical weak references.
  • Any reference to a child will maintain references to all its parents.

However this doesn't necessarily need to be the case because the stored ancestry is not observable to anything that creates a WeakGraph, except to the oldest ancestor that has a reference elsewhere.

I'm not sure if this use case alone warrants adding a new feature to the language, or if I'm just missing something and it can be implemented with existing constructs or if there should be some other lower level primitive that would allow building a WeakGraph on the user level.

# Isiah Meadows (9 years ago)

Would this be possible with a mixture of weak references and weak collections?

# Jussi Kalliokoski (9 years ago)

On Wed, Nov 4, 2015 at 6:19 PM, Isiah Meadows <isiahmeadows at gmail.com>

wrote:

Would this be possible with a mixture of weak references and weak collections?

I don't think so - the only potential implementation I can think of would make a weak reference to the parent which would allow the parent to be GCed, but then all the nodes between the latest revision and the ancestor would require strong references somewhere in order to be maintained, making the whole thing kinda pointless because you couldn't rely on it working. Also this use case doesn't require making GC observable, which I think is pretty much the only feature of weak references that WeakMaps don't provide.

# Isiah Meadows (9 years ago)

I don't think that's possible with traditional GC. If you can find an existing implementation in a GC language (like Java) that has the behavior you want, I'll be interested. Otherwise, the only way is explicit garbage collection.

# Jason Orendorff (9 years ago)

On Wed, Nov 4, 2015 at 10:09 AM, Jussi Kalliokoski <jussi.kalliokoski at gmail.com> wrote:

I'm trying to come up with a solution to the problem of rendering lists [...] My idea for a solution is that the lists are immutable, contain a reference to their parent and a changeset / diff compared to their parent. [...]

Good problem, interesting idea.

The biggest problem is that this will leak memory like crazy; every revision of the list will be preserved.

OK. Perhaps obviously, the only way around this is to mutate the list, breaking the chain at a point where nobody cares about the rest of it anymore.

The approach you've outlined is to have the GC tell you when to do the mutation, but why is that a good idea? You can do it deterministically in getLineage().

Maybe the concepts here would be clearer if we limited the graph to a single linked list. Then it looks a lot like a stream, in the functional reactive programming sense. Let the user (in this case, the renderer) buffer the diffs as needed; it knows when to reset the list. And no need for fancy data structures: it could just be an Array.

# Steve Fink (9 years ago)

On 11/04/2015 08:09 AM, Jussi Kalliokoski wrote:

It provides the needed interface and the unused child revisions get cleaned up properly. However:

  • This is a complete nightmare for GC performance because of cyclical weak references.

Not necessarily. Current Spidermonkey should handle it ok, using a linear-time algorithm for weakmap marking. (I think an inverted representation would also be good for this? I haven't thought it through.)

  • Any reference to a child will maintain references to all its parents.

However this doesn't necessarily need to be the case because the stored ancestry is not observable to anything that creates a WeakGraph, except to the oldest ancestor that has a reference elsewhere.

I'm not sure if this use case alone warrants adding a new feature to the language, or if I'm just missing something and it can be implemented with existing constructs or if there should be some other lower level primitive that would allow building a WeakGraph on the user level.

The brute-force approach would be to give each node its own weakmap, and add an entry for every ancestor:

this.getLineage() = function (node, ancestor) {
   const lineage = [];
   while (node) {
     lineage.push(node);
     if (node == ancestor)
       return lineage;
     node = node.parentForAncestor.get(ancestor);
   }
}

this.addNode = function (node, parent, ancestor) {
   node.parentForAncestor = new WeakMap();
   for (let key of this.getLineage(parent, ancestor))
     node.parentForAncestor.set(key, parent);
};

...but notice that addNode now requires an ancestor to be specified, and it will only work as far back as that. And of course, this defeats the whole point of the exercise, in that it requires space quadratic in the number of live nodes. And insertion is linear in the distance from the node to the chosen ancestor. Which could still be a win in rare cases, I guess, but my point is only to show that the leak could be "fixed".

I should note that in your example, you have a WeakMap with basically infinite lifetime. That means that keys always hold values live, in which case you might as well store them directly as properties on the key (node.parent = parent).

I also pondered what language primitives would fix this case. The straightforward approach is to expose something tailored specifically to this use case -- call it a WeakList. You cannot look up elements by index. You can call WeakList.prototype.get(begin, end) and it will return an Array of elements from begin..end (where 'begin' and 'end' are actual elements of the list), or undefined if either begin or end is not present in the list. Internally, the implementation would allowed to discard all leading and trailing dead elements. It would be stored as a dag to share space with overlapping lists.

A totally different option is something I'm calling a VagueMap. I should mention here that I don't know of any prior art, nor have I looked for any, so my apologies if this is known by another name. A VagueMap is sort of the dual of a WeakMap, where the value keeps the entry (and key) alive rather than the key keeping the value alive. But to avoid exposing GC behavior, you cannot just look up values by key (since then if you wanted to know when something gets GC'ed, you'd just stick it in a VagueMap under a well-known key and look it up periodically.) Instead, get(key) returns a "vague value", which can only be used in two ways: for equality testing, and as a VagueMap key. VagueMap lookups would treat vague values and their equivalent non-vague values identically. VagueMap would also support has().

Note that VagueMap does not automatically keep its keys live (as in, only the ones with live values will be kept live.) So you still can't iterate over the keys.

This still doesn't fix the original example. The simple replacement of WeakMap with VagueMap will "work", but getLineage() will return an array of vague values, which aren't much use. So I'll need to add another funky feature to VagueMap: if you give it a (non-vague) value, it will hand you back the non-vague, guaranteed live, key. I'll call it VagueMap.prototype.getKey(vkey, value) where vkey is a possibly-vague key that maps to value.

We get back very close to the original code, with an added loop to reify the vague nodes (oh, and I include the ancestor in the lineage -- which means the do/while loop could now easily be a for loop, but I'll stick close to the original):

function WeakGraph () {
     const parentByNode = new VagueMap();

     this.getLineage = function (node, ancestor) {
         const lineage = [];

         let currentNode = node;
         do {
             lineage.push(currentNode);
             if ( !parentByNode.has(currentNode) ) { throw new 
Error("node is not a descendant of ancestor"); }
             currentNode = parentByNode.get(currentNode);
         } while ( currentNode !== ancestor );
         lineage.push(ancestor);

         for (let i = lineage.length; i > 1; i--) {
             lineage[i-2] = parentByNode.getKey(lineage[i-2], lineage[i-1]);
         }

         return lineage;
     };

     this.addNode = function (node, parent) {
         parentByNode.set(node, parent);
     };
}

Given the number of failed attempts I've taken at this today, there's probably some very obvious problem with all of this. But I think this all checks out now?

# Jussi Kalliokoski (9 years ago)

On Fri, 6 Nov 2015 at 17:31 Jason Orendorff <jason.orendorff at gmail.com>

wrote:

On Wed, Nov 4, 2015 at 10:09 AM, Jussi Kalliokoski <jussi.kalliokoski at gmail.com> wrote:

I'm trying to come up with a solution to the problem of rendering lists [...] My idea for a solution is that the lists are immutable, contain a reference to their parent and a changeset / diff compared to their parent. [...]

Good problem, interesting idea.

The biggest problem is that this will leak memory like crazy; every revision of the list will be preserved.

OK. Perhaps obviously, the only way around this is to mutate the list, breaking the chain at a point where nobody cares about the rest of it anymore.

The approach you've outlined is to have the GC tell you when to do the mutation, but why is that a good idea? You can do it deterministically in getLineage().

I'm not sure I follow.

Maybe the concepts here would be clearer if we limited the graph to a single linked list.

Perhaps graph as a concept is too wide for expressing this, but it surely is not a linked list either. It may look like one in some cases though, when there's only one lineage and no branching. However that is not the use case here as when application state gets transformed to a view representation it may have various transformations applied to it, such as sorting, mapping or filtering.

Then it looks a lot like a stream, in the functional reactive programming sense. Let the user (in this case, the renderer) buffer the diffs as needed; it knows when to reset the list. And no need for fancy data structures: it could just be an Array.

That misses the point, to say the least. The idea of React is that you can represent the UI as a declarative function of a snapshot of the state, so the view doesn't have to care about async. With FRP you subscribe/unsubscribe to asynchronous streams which, not unlike the idea here, can be transformed just like normal data structures and forked (to fork an immutable data structure is to just pass the reference). The difference is that streams are an inherently async structure, while what I'm trying to do is not. The idea being not only that the easy case of insert without transforms is O(1), but almost every use case can be further better optimized by knowing the previous state of the data structure.

Consider this: You have a large list of items as an array, unsorted, as your state. The view is a paged listing of the items sorted by different criteria. So basically:

list
  .sort(ascendingBy("something"))
  .slice(FIRST_INDEX_ON_PAGE, LAST_INDEX_ON_PAGE)
  .map(item => <Item item={item} />)

Now something gets added to the middle of the list. Let's look at what we can do with this information at each stage, if we know the previous list and the diff to that:

  • We can perform the sort in linear time because we just have to find where the added item belongs in the list.
  • The slice now has a different diff (only the insert position is different), and
  • if the added item is in view, we can make the insertion index relative to our view and add a remove for the last item in the list.
  • if the added item is before the view, we return an insert for the item coming to view and a remove for the item leaving the view.
  • if the added item is after the view, the diff is empty, so we can stop here.
  • We can map just the diffs at the last stage.

You can implement this with streams, but that will just be an unnecessary abstraction level offering no simplification whatsoever while making the concept needlessly async. Another significant difference between this and FRP is that streams require imperative subscribe / unsubscribe, which is basically just sophisticated reference counting, while having the same issues (user after free -> update after unmount, leaks). What I have in

mind can also be implemented using reference counting, and in fact will be to in my initial version, but having a WeakGraph data structure would make this nasty artifact and source of easy bugs (use after free -> use after

unsubscribe, leaks) go away, just like WeakMap and WeakSet are designed to do for certain other cases.

# Nelo Mitranim (9 years ago)

I was just recently working on the same problem as Jussy: subscribe/unsubscribe pattern for view components, and immutable data tree for efficient updates.

(Working demo here; the subs/unsubs are completely implicit.)

Concluded that weakrefs offer no help with this whatsoever. In fact, they're a red herring. You always want to unsubscribe your listeners deterministically, exactly when the corresponding view is destroyed. Letting them hang around and attempt to update nonexistent views until the next GC phase is no good. In fact, React will give you warnings when you try to update unmounted components. It also provides the "will mount" / "will unmount" lifecycle methods to clean things up.

Pretty sure weakrefs are harmful rather than helpful in this particular use case. But I may have missed the point of the sentiment.

# Jussi Kalliokoski (9 years ago)

On Sun, 8 Nov 2015 at 13:29 Nelo Mitranim <me at mitranim.com> wrote:

I was just recently working on the same problem as Jussy: subscribe/unsubscribe pattern for view components, and immutable data tree for efficient updates.

(Working demo here; the subs/unsubs are completely implicit.)

Concluded that weakrefs offer no help with this whatsoever. In fact, they're a red herring. You always want to unsubscribe your listeners deterministically, exactly when the corresponding view is destroyed. Letting them hang around and attempt to update nonexistent views until the next GC phase is no good. In fact, React will give you warnings when you try to update unmounted components. It also provides the "will mount" / "will unmount" lifecycle methods to clean things up.

Pretty sure weakrefs are harmful rather than helpful in this particular use case. But I may have missed the point of the sentiment.

I think I may have miscommunicated my meaning here. In the idea I've been working on, the view has no subscription (except at top level) that can push data to it, so there's nothing to unsubscribe from. The idea being that the data structures are modeled purely synchronously (current version

  • diff to previous version). The reason for needing reference counting is different from FRP streams: making sure resources can be collected instead of leaving an infinite trail. So basically FRP in this comparison is a push model that needs reference counting to free the leaf nodes whereas the model I have in mind is pull where the reference counting is needed to free the root nodes.
  • Jussi

P.S. Thanks for the demo, will check it out!

# Jason Orendorff (9 years ago)

On Sun, Nov 8, 2015 at 4:45 AM, Jussi Kalliokoski <jussi.kalliokoski at gmail.com> wrote:

Perhaps graph as a concept is too wide for expressing this, but it surely is not a linked list either. It may look like one in some cases though, when there's only one lineage and no branching. However that is not the use case here as when application state gets transformed to a view representation it may have various transformations applied to it, such as sorting, mapping or filtering.

The last sentence here doesn't seem connected to the rest. A linear list of versions can go through a data transformation pipeline of sorts, maps, and filters just fine.

I don't see where lineage or branching comes into any of your examples. I can see how it could, but I think I must be missing something here.

On Fri, 6 Nov 2015 at 17:31 Jason Orendorff <jason.orendorff at gmail.com> wrote:

Then it looks a lot like a stream, in the functional reactive programming sense. Let the user (in this case, the renderer) buffer the diffs as needed; it knows when to reset the list. And no need for fancy data structures: it could just be an Array.

That misses the point, to say the least. The idea of React is that you can represent the UI as a declarative function of a snapshot of the state, so the view doesn't have to care about async. With FRP you subscribe/unsubscribe to asynchronous streams which, not unlike the idea here, can be transformed just like normal data structures and forked (to fork an immutable data structure is to just pass the reference). The difference is that streams are an inherently async structure, [...]

FRP is not inherently async. As a programming model, I'd say it's inherently about user code not having to care about time.

Anyway, my point here is not "you should drop what you're doing and do FRP instead" but rather "if FRP can handle this without new GC primitives, maybe you can too".

while what I'm trying to do is not. The idea being not only that the easy case of insert without transforms is O(1), but almost every use case can be further better optimized by knowing the previous state of the data structure.

Right, the state is a left fold over a series of events. Like in FRP.

You can implement this with streams, but that will just be an unnecessary abstraction level offering no simplification whatsoever while making the concept needlessly async.

I dunno, you sort of lost me I guess. Streams come to mind because of what you're doing: taking a series of events, transforming them, then applying them one by one, as they arrive, to a mutable data sink.

Another significant difference between this and FRP is that streams require imperative subscribe / unsubscribe, which is basically just sophisticated reference counting, while having the same issues (user after free -> update after unmount, leaks).

This is true, and I think it's your strongest point. Furthermore in support of your side here, even without WeakGraph, I think leaks in FRP are more wasteful than they would be in your model, because in FRP the "deltas" are pushed through the system until somebody turns them off, rather than pulled on demand.

Three alternative models:

  1. In Elm, the data flow graph is static. The system manages subscriptions. Drawback: the user can't mutate the graph procedurally.

  2. A system could allow the data flow graph to change, not procedurally exactly, but when a diff is applied that changes the UI. Since the system knows about diffs, it could then automatically manage subscriptions based on what's actually in the UI. Subscribe on insert, unsubscribe on delete. No manual subscription management for the end user; within the framework, you can use reference counting etc.

  3. Implement your system using a "WeakGraph"-like object that keeps strong references to all revisions in a fixed-sized window of recent history. Periodically clear old patches and revisions from this cache. Instead of getLineage(a, b), we have a method diff(a, b) that typically does exactly the same thing, returning an array of patches. But diff may find that the user is trying to diff two versions that are not both in the history. Then it simply does a full diff of the two states, effectively generating a fake lineage. Note that the performance is not amortized constant time: diff() is always fast except when diffing something quite old, which shouldn't happen. And the memory usage of the differ is more predictable than "WeakGraph". Drawback: fake lineages. But so what?