Module-based parallel JS strawman

# Isiah Meadows (7 years ago)

TL;DR: I'm proposing a completely different threading model, supporting lightweight and cooperative multitasking while allowing synchronous, thread-safe communication and inspection. Apologies for the length.


Admittedly, this is a significant departure from the original JavaScript concurrency model, web workers, that's implemented in browsers. But I feel its lower-level, more tightly integrated nature will make it far faster and lighter in practice, while still avoiding some significant footguns and still working with the traditional single-threaded nature of JavaScript.

So here's my idea: add an opt-in, low-level threading mechanism that includes several safety measures for both security and correctness, leverages the evented nature of JavaScript as well as its module system, and remains as light as possible so that the multithreaded experience still feels native and intuitive rather than bolted on.

For more details, see my gist, which is much more complete than this: gist.github.com/isiahmeadows/a01799b4dc9019c55dfcd809450afd24


First, add an import.fork(module) method-like expression, in the vein of Domenic's import(module) proposal and, to a lesser extent, the super() pseudo-method. This expression returns a promise to a thread instance (a usage sketch follows the list). The thread instance will have the following properties:

  • thread.module - This is the resulting imported module.
  • thread.global - The global instance for the thread.
  • thread.close() - Close the thread. This returns a promise resolved when it's done.
  • thread.onClose(handler) - Register a handler for synchronous cleanup, triggered when the thread is closed (but after any remaining atomic calls).
  • thread.active - Whether this thread is currently active (i.e. not closed).
  • thread.closing - Whether this thread is currently closing, but not fully closed.
  • thread.id - A unique integer ID for this thread.
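
As a rough sketch of how spawning might look (the module name "./worker.js" and its doWork export are hypothetical, not part of the proposal):

// main.js (sketch)
const thread = await import.fork("./worker.js")
console.log(thread.id, thread.active)   // unique integer ID; true
thread.module.doWork()                  // exports are used directly, no serialization
thread.onClose(() => console.log("worker closed"))  // synchronous cleanup hook
await thread.close()                    // resolves once the thread has shut down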

The instantiated module is literally the exported module itself, with minimal overhead. It's loaded inside its own separate realm. In the child, there is a global currentThread object with the following methods and properties (a usage sketch follows the list):

  • currentThread.close() - Close the currently running thread after the current tick ends. Or, in other words, make this tick the last.

  • currentThread.onClose(handler) - Register a handler for synchronous cleanup, triggered when the thread is closed (but after any remaining atomic calls).

  • currentThread.active - Whether the current thread is active (i.e. the thread's event loop is still alive).

  • currentThread.closing - Whether the current thread is currently closing, but ticks are still being processed.
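
On the child side, usage might look like this (flushLogs is a hypothetical cleanup function):

// worker.js (sketch)
currentThread.onClose(() => flushLogs())   // synchronous cleanup at shutdown
if (currentThread.closing) {
  // this tick is among the last; avoid scheduling further work
}
currentThread.close()                      // make the current tick the last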


Now there are several safety mechanisms in place to avoid numerous race conditions, and to maintain overall safety of the language.

  • There's now a concept of atomic objects and thread contexts. If an atomic object was created in your thread's context, you may modify it as you wish. If it was created in another thread's context, you can only read it, and attempting to modify it in any way causes a TypeError to be thrown. Consider it a special proxy of sorts.

    • Modules and many global objects are atomic objects, including Object and Object.prototype themselves, as well as SharedArrayBuffer and its prototype.
    • Atomic objects are created with a new keyword, atomic: atomic object literals are written atomic {foo: 1}, atomic functions atomic function foo() {}, and so on (see the sketch after this list).
  • There are two key restrictions/invariants with atomic objects, and attempts to violate them will result in a TypeError:

    1. Atomic objects may only reference other atomic objects, including from their prototype.
    2. Methods defined on atomic classes and shorthand methods on atomic object literals are themselves implicitly atomic.
  • There will be two new methods for dealing with atomic objects: Object.isAtomic(object) to detect atomic objects and Object.isFromThread(object) to detect whether an object is from the current thread.

  • Only atomic objects may escape their thread context, and they themselves are read-only outside it (modifications result in TypeErrors). This is a key safety layer: it allows most current optimizations to persist, and it also heads off a whole class of shared-mutation bugs.

  • When a thread dies, its event loop stops scheduling new tasks but finishes those already scheduled. Once the queue drains, the event loop is closed, the OS thread exits, and the associated data becomes eligible for GC. The thread context survives thread death, because it lives on the heap shared by all threads.

  • Atomic function and class declarations are immutable, to enable some optimizations stated below. Additionally, their instances and prototypes are both atomic, as sugar to avoid a ton of repetition.

  • Calling atomic functions from the same thread context requires no lock, and is a normal function call.

  • Calling atomic functions, atomic getters, or atomic setters from a different thread context waits for the callee's context to finish its current microtask (not tick) before running the call. This avoids numerous race conditions at a lower performance cost than function-level locks would impose, and it allows atomic functions to wrap non-atomic things like context variables.

  • Constant property accesses on atomic objects (e.g. known non-configurable, non-writable properties and const module exports) are done with no lock.

  • Potentially mutable property accesses on atomic objects (i.e. everything else) are synchronized in the same way functions are.

  • The cross-thread scheduling works much like an event loop interleaved within another event loop. There's a thread task queue that is executed with higher priority over the main event loop. The execution is specified in a way that requires no actual locks (the threads wait for everything to run instead).

  • Each thread runs in its own realm, but the ability to expose direct instances makes sharing far easier to do, with a more intuitive interface. Modules are cached per-thread, though.
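
To make the rules above concrete, here's a sketch of the intended semantics (shared and Point are illustrative names, not part of the proposal):

// In thread A:
const shared = atomic {counter: atomic {value: 1}}  // atomic may only reference atomic
Object.isAtomic(shared)       // true
Object.isFromThread(shared)   // true in thread A, false in any other thread

atomic class Point {          // declaration is immutable; methods implicitly atomic
  constructor(x, y) { this.x = x; this.y = y }
  norm() { return Math.hypot(this.x, this.y) }
}

// In thread B, given a reference to `shared`:
shared.counter.value          // read allowed (synchronized, since it's mutable)
shared.counter.value = 2      // TypeError: atomic objects are read-only cross-thread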


In the gist, I also address several performance and security concerns. Some of them are easily mitigated (simultaneous variable read/write is impossible by design), others are more difficult to mitigate (concurrent heap modification), and some are inherent to parallelism itself (SharedArrayBuffer-based cache-timing attacks are already possible with Web Workers).

# Florian Bösch (7 years ago)

On Sun, Nov 6, 2016 at 6:57 AM, Isiah Meadows <isiahmeadows at gmail.com> wrote:

TL;DR: I'm proposing a completely different threading model, supporting lightweight and cooperative multitasking while allowing synchronous, thread-safe communication and inspection. Apologies for the length.

Cooperative multitasking is a mode of operation where a thread once started continues to run until it yields control. What you are describing isn't cooperative multitasking since it lacks the ability to yield. Therefore your suggestion is preemptive multitasking where tasks timeshare CPU time (presumably switching at random/scheduled intervals at the interpreter level).

It's widely acknowledged that preemptive multitasking (however you implement it, be that with interpreter-level granularity, as OS-native threads, etc.) is difficult to control and often results in incorrect code, race conditions, data corruption, mutex proliferation and deadlocks. It also routinely fails to address I/O scheduling concerns efficiently, because it is effectively just trying to timeshare a single strand of processing across disjoint threads of execution. However, preemptive multitasking is an effective strategy for timesharing CPU time if that is the bottleneck. It could be argued, though, that WebWorkers already fill that role more effectively.

Structured coroutines are a concept that allows switching between strands of execution by "yielding" control to another strand. This isn't necessarily related to multitasking, but it can be. If the coroutines are managed by a scheduler, which is trivial to implement and customize in the presence of coroutine functionality, then coroutines can be used to implement cooperative multitasking. It's widely acknowledged that cooperative multitasking is not an effective strategy for CPU timesharing; however, it is an extremely efficient method for all other kinds of scheduling problems (such as I/O scheduling).

Please note that efficiency in scheduling is always a function of how well the scheduler matches the use case in question. Thus it is important to have a low-level concept like coroutines, upon which user code that implements the scheduling can be built (a minimal scheduler is sketched after the list below). Examples of scheduling problems may include:

  • wait until user input arrives (i.e. wait for input)
  • wait until network activity (download or upload) has finished (i.e. wait for xhr)
  • wait until a WebWorker has finished processing (i.e. wait for worker.postMessage)
  • wait for GPU activity (i.e. query results, texImage2D, bufferData, shader compile, etc.)
  • evaluation of a dependency graph
  • evaluation of finite state machines (without resorting to state transition methods)
  • etc.
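
As a sketch, a cooperative scheduler over such strands can be tiny. Here's one using today's generators as a stand-in for true coroutines (all names are illustrative):

const queue = []
function spawn(coro) { queue.push(coro) }  // register a strand of execution
function run() {
  while (queue.length) {
    const coro = queue.shift()
    const {done} = coro.next()             // resume until the next `yield`
    if (!done) queue.push(coro)            // reschedule strands that yielded
  }
}

spawn(function* () { yield; console.log("a") }())
spawn(function* () { console.log("b"); yield }())
run()  // logs "b", then "a"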

I believe it would be much better to introduce true coroutines to JS, than to try to introduce preemptive threading.

# Isiah Meadows (7 years ago)

Ok. I'll admit my terminology is mostly incorrect (this isn't technically cooperative). I also have a few clarifications inline.

On Sun, Nov 6, 2016, 06:03 Florian Bösch <pyalot at gmail.com> wrote:

On Sun, Nov 6, 2016 at 6:57 AM, Isiah Meadows <isiahmeadows at gmail.com> wrote:

TL;DR: I'm proposing a completely different threading model, supporting lightweight and cooperative multitasking while allowing synchronous, thread-safe communication and inspection. Apologies for the length.

Cooperative multitasking is a mode of operation where a thread once started continues to run until it yields control. What you are describing isn't cooperative multitasking since it lacks the ability to yield. Therefore your suggestion is preemptive multitasking where tasks timeshare CPU time (presumably switching at random/scheduled intervals at the interpreter level).

Yeah... This is where I screwed up in my terminology. I'll fix the wording in my gist. I'll note that technically, it's a mixture of preemptive and cooperative multitasking, since the task's thread is in control of when it returns.

It's widely acknowledged that preemptive multitasking (however you implement it, be that with interpreter-level granularity, as OS-native threads, etc.) is difficult to control and often results in incorrect code, race conditions, data corruption, mutex proliferation and deadlocks.

I took special steps to avoid race conditions outside asynchronous code that's already marked "unsafe" (with the atomic keyword). If you look in the section explaining the timing queue, I intentionally specified it in terms of waits, not locks (deadlock is not a problem). Instead of locking for access, you're waiting on a return value after requesting an access.

It also routinely fails to address I/O scheduling concerns efficiently, because it is effectively just trying to timeshare a single strand of processing across disjoint threads of execution. However, preemptive multitasking is an effective strategy for timesharing CPU time if that is the bottleneck. It could be argued, though, that WebWorkers already fill that role more effectively.

Structured coroutines are a concept that allows switching between strands of execution by "yielding" control to another strand. This isn't necessarily related to multitasking, but it can be. If the coroutines are managed by a scheduler, which is trivial to implement and customize in the presence of coroutine functionality, then coroutines can be used to implement cooperative multitasking. It's widely acknowledged that cooperative multitasking is not an effective strategy for CPU timesharing; however, it is an extremely efficient method for all other kinds of scheduling problems (such as I/O scheduling).

My main focus on this was CPU scheduling and time sharing. Most runtimes already shove most outside I/O to another thread, so I wasn't that concerned about that. (You could use some indirection and deferral to implement cooperative multitasking.)

Please note that efficiency in scheduling is always a function of how well the scheduler matches the use case in question. Thus it is important to have a low-level concept like coroutines, upon which user code that implements the scheduling can be built. Examples of scheduling problems may include:

  • wait until user input arrives (i.e. wait for input)

  • wait until network activity (download or upload) has finished (i.e. wait for xhr)

  • wait until a WebWorker has finished processing (i.e. wait for worker.postMessage)

  • wait for GPU activity (i.e. query results, texImage2D, bufferData, shader compile, etc.)

These could be handled by simply returning a promise. Each thread has its own event loop which runs within that thread. So just not doing it synchronously would be the solution.
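
For example (sketch only; fetch stands in here for any async source: input, xhr, a worker, GPU queries):

// Inside a thread: waiting on I/O is just an awaited promise,
// scheduled on that thread's own event loop
async function process(url) {
  const res = await fetch(url)
  return res.status
}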

  • evaluation of a dependency graph
  • evaluation of finite state machines (without resorting to state transition methods)
  • etc.

I believe it would be much better to introduce true coroutines to JS, than to try to introduce preemptive threading.

Cooperative multithreading and coroutines can be done with async iterators. When those get standardized, it'll become easy to use cooperative multitasking. (I need to make well-known symbols identical cross-realm to make it work, though.)

export atomic async function *sync(file) {
  // A truthy value passed to next() resumes the `yield` and triggers
  // updateFile; a falsy value skips the update for that round.
  while (true) {
    if (await yield) await updateFile(file)
  }
}
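
A consumer might drive it like this (sketch; assumes the cross-realm async iteration described above; the file argument is hypothetical):

const it = sync("/some/file")
await it.next()        // prime the generator up to the first `yield`
await it.next(true)    // truthy resumes the yield, so updateFile runs
await it.next(false)   // falsy skips the update this round
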
# Florian Bösch (7 years ago)

On Sun, Nov 6, 2016 at 9:29 PM, Isiah Meadows <isiahmeadows at gmail.com> wrote:

My main focus on this was CPU scheduling and time sharing. Most runtimes already shove most outside I/O to another thread, so I wasn't that concerned about that. (You could use some indirection and deferral to implement cooperative multitasking.)

The problem isn't that the platform can't execute this in another thread. The problem is that you can no longer write synchronous code to express synchronous logic, and this comes at a considerable cost in program complexity.

Please note that efficiency in scheduling is always a function of how well the scheduler matches the use case in question. Thus it is important to have a low-level concept like coroutines, upon which user code that implements the scheduling can be built. Examples of scheduling problems may include:

  • wait until user input arrives (i.e. wait for input)

  • wait until network activity (download or upload) has finished (i.e. wait for xhr)

  • wait until a WebWorker has finished processing (i.e. wait for worker.postMessage)

  • wait for GPU activity (i.e. query results, texImage2D, bufferData, shader compile, etc.)

These could be handled by simply returning a promise. Each thread has its own event loop which runs within that thread. So just not doing it synchronously would be the solution.

Promises are a very bad model for handling cooperative multitasking and quasi-synchronous code (I believe I don't have to illustrate that).

  • evaluation of a dependency graph
  • evaluation of finite state machines (without resorting to state transition methods)
  • etc.

I believe it would be much better to introduce true coroutines to JS, than to try to introduce preemptive threading.

Cooperative multithreading and coroutines can be done with async iterators. When those get standardized, it'll become easy to use cooperative multitasking. (I need to make well-known symbols identical cross-realm to make it work, though.)

export atomic async function *sync(file) {
  while (true) {
    if (await yield) await updateFile(file)
  }
}

More random line noise atop a broken model of coroutines isn't going to fix the lack of proper coroutines (in fact, I think this just exacerbates it).