# Efficient 64 bit arithmetic

I like this a lot.

It looks like it will have a fairly mellow performance cliff in cases where VM optimizations fail for whatever reason, since even in a lower-tier compiler incapable of sophisticated optimizations all you really have to do is make sure that these new math function calls bottom out in something efficient.

This would be even better if there was a story for div and mod. Is there one?

Latest news on "div" AFAIK:

That’s unrelated, since that discussion deals with integer division for the domain of integers that is representable using ES numbers (i.e. IEEE doubles). This discussion, and my question about division, is in to 64-bit arithmetic.

64-bit integers do not fit into ES numbers, and so you need the lo/hi pairs like what Fabrice proposes (or some kind of value objects, which would be a different beast). For the lo/hi pair approach, we logically want a division like:

(lo, hi) = Math.idiv64(lo0, hi0, lo1, hi1)

I am wondering if Fabrice has ideas about how to represent this.

The main reason why I was proposing (hi, lo)-pair based API instead of lower version is to

a) avoid necessity writing low-level looking code in the high-level language which is JS;

b) avoid pattern matching in the JS code to recognize construction built out of low-level 32-bit counterparts.

This is too much like first writing 32-bit assembly and then trying to pattern match it to emit 64-bit one. If we are starting to add low-level intrisics like that maybe it's time to actually agree that we want a bytecode with a full range of numeric types and operations?

Math.H part of the proposal was an awful hack though in attempt to avoid degrading polyfill performance while keeping API high-level and "atomic".

To be honest I think the actual API for high level language like JS should be either value-object or pair based.

Given that ES6 defines typed-arrays it might be possible to make API based on that:

var a = new Int32Array(2), b = new Int32Array(2), c = new Int32Array(2); Math.div64(c, a, b);

But again it has bad performance unless you pool both a, b, c in the right way (pooling is good for unoptimized code, but pooling globally is bad for optimized code).

Lowered version

cl = Math.div64l(al, ah, bl, bh); ch = Math.div64h(al, ah, bl, bh);

should perform well (no allocations, potentially only some boxing if engine requires that) and can be assembled into a single instruction by the optimizer... but this just looks slightly bizarre.

// Vyacheslav Egorov

Note: my proposal is not limited to 64 bit operations. It can be used to efficiently implement arbitrary precision arithmetic.

64 bit integer division and remainder are less critical than the other operations because they are seldom used and slow anyway. Moreover, it is more difficult to define them because they are undefined cases (division by zero and overflow). So not proposing a specific API for them can be acceptable.

If you really want such an API, it is not worth proposing a lowered version because division is slow anyway. You can define for example:

{quo_lo:q0, quo_hi:q1, rem_lo:r0, rem_hi:r1 } = Math.idiv64(lo0, hi0, lo1, hi1)

for signed 64 bit division and

{quo_lo:q0, quo_hi:q1, rem_lo:r0, rem_hi:r1 } = Math.idivu64(lo0, hi0, lo1, hi1)

for unsigned 64 bit division. In case of division by zero or overflow, null is returned. Instead of returning an object, you can use a preallocated Int32Array to return the 4 32 bit numbers but you also need to return a boolean to indicate the error cases.

Fabrice.

Note: my proposal is not limited to 64 bit operations. It can be used to efficiently implement arbitrary precision arithmetic.

True, it is easier to arrive to the efficient and clear code with lowered representation. With (lo, hi)-result API compiler will have to figure more things out on it's own. Though I don't see any issues as both APIs are strictly of the same power:

```
Math.imuluh(a, b) === Math.u64mul(a, 0, b, 0).hi
```

But then again the question would be: does it make sense to let people implement say bigints in pure JavaScript or better give them bigints as part of the language.

If you really want such an API, it is not worth proposing a lowered version because division is slow anyway

This is also a fair point. But having inconsistent APIs is very weird if you ask me.

Thanks for the proposal, I ran it by TC39 yesterday and it advanced to stage 0. This means work getting to stage 1, per:

docs.google.com/document/d/1QbEE0BsO4lvl7NFTn5WXWeiEIBfaVUF7Dk0hpPpPDzU/edit

The committee liked it not only for emulating int64 and uint64 more efficiently (e.g., for Emscripten), even though we intend to support int64 and uint64 value objects, but also for bignum emulation (using 64-bit "bigits" or "big digits", h/t Mark Miller), decimal emulation, etc.

This gist

gist.github.com/BrendanEich/4294d5c212a6d2254703

is based on your original post, but with a name change suggested by several on TC39: imuluh => umulh.

Feedback welcome. Thanks again for suggesting this -- thanks to @mraleph for his earlier proposal too.

On Wed, Sep 24, 2014 at 5:37 PM, Brendan Eich <brendan at mozilla.org> wrote:

Hi Fabrice,

Thanks for the proposal, I ran it by TC39 yesterday and it advanced to stage 0. This means work getting to stage 1, per:

docs.google.com/document/d/1QbEE0BsO4lvl7NFTn5WXWeiEIBfaV UF7Dk0hpPpPDzU/edit

The committee liked it not only for emulating int64 and uint64 more efficiently (e.g., for Emscripten), even though we intend to support int64 and uint64 value objects, but also for bignum emulation (using 64-bit "bigits" or "big digits", h/t Mark Miller), decimal emulation, etc.

AFAIK, coined by dl.acm.org/citation.cfm?id=319861 , though Smalltalk used the technique much earlier.

Regarding 64 bit arithmetic, there is a simpler solution compared to what was proposed in [1]. It is already possible to get the low 32 bit part of the 64 bit operations for addition, subtraction and multiplication. So it is enough to add a few more functions to get the high 32 bit part of 64 bit additions, subtractions and multiplications. Hence I propose to add the following functions:

Math.iaddh(lo0, hi0, lo1, hi1) : return the high 32 bit part of the 64 bit addition of (hi0, lo0) and (hi1, lo1).

Math.isubh(lo0, hi0, lo1, hi1) : return the high 32 bit part of the 64 bit subtraction of (hi0, lo0) and (hi1, lo1).

Math.imulh(a, b) : return the high 32 bit part of the signed 64 bit product of the 32 bit numbers a and b.

Math.imuluh(a, b) : return the high 32 bit part of the unsigned 64 bit product of the 32 bit numbers a and b.

All these functions convert their argument to 32 bit integers. They return a signed 32 bit integer.

With these functions, the 64 bit operations are easy to implement :

(hi_res, lo_res) = (hi0, lo0) + (hi1, lo1) (64 bit addition):

lo_res = (lo0 + lo1) | 0; hi_res = Math.iaddh(lo0, hi0, lo1, hi1);

(hi_res, lo_res) = (hi0, lo0) - (hi1, lo1) (64 bit subtraction):

lo_res = (lo0 - lo1) | 0; hi_res = Math.isubh(lo0, hi0, lo1, hi1);

(hi_res, lo_res) = a * b (signed 64 bit product of 32 bit integers):

lo_res = Math.imul(a, b); hi_res = Math.imulh(a, b);

(hi_res, lo_res) = a * b (unsigned 64 bit product of 32 bit integers):

lo_res = Math.imul(a, b); hi_res = Math.imuluh(a, b);

(hi_res, lo_res) = (hi0, lo0) * (hi1, lo1) (signed 64 bit product of 64 bit integers):

lo_res = Math.imul(lo0, lo1); hi_res = (Math.imulh(lo0, lo1) + Math.imul(lo0, hi1)) | 0; hi_res = (hi_res + Math.imul(lo1, hi0)) | 0;

(hi_res, lo_res) = (hi0, lo0) * (hi1, lo1) (unsigned 64 bit product of 64 bit integers):

lo_res = Math.imul(lo0, lo1); hi_res = (Math.imuluh(lo0, lo1) + Math.imul(lo0, hi1)) | 0; hi_res = (hi_res + Math.imul(lo1, hi0)) | 0;

It is easy for the compiler to optimize the code because only 32 bit integers are used and because the functions have no side effect. Even if the compiler does not remove the duplicate operation for the low 32 bit parts, the overhead is very small on a 32 bit CPU (1 more instruction than the optimal code).

Fabrice.

[1] www.mail-archive.com/es-discuss%40mozilla.org/msg27237.html