proposal for efficient 64-bit arithmetic without value objects

# Vyacheslav Egorov (12 years ago)

There are algorithms that require 64-bit arithmetic; however, JavaScript currently does not provide any efficient way to perform it.

Similar to Math.imul, which was added for the purposes of efficient 32-bit multiplication, I propose extending the Math object with a family of operations Math.<s>64<op>, where <s> is one of {i, u} and <op> is one of { sub, add, mul, div, neg }.

These functions are specified as follows for <op> in { sub, add, mul, div }:

`Math.<s>64<op>`

1. accepts 4 arguments al, ah, bl, bh
2. x' = ToUint32(x), for x in {al, ah, bl, bh }
3. pairs (al', ah') and (bl', bh') are interpreted as 64-bit signed (if <s> == i) or unsigned (if <s> == u) integer values with first member of the pair containing low bits and second - high bits.
4. Result is computed as a standard overflowing operation on 64-bit values and is decomposed into two int32 values (cl', ch') = (al', ah') <op> (bl', bh')
5. A one shot property Math.H is created that returns ch' on the first access and deletes itself.
6. Return cl'

Unary operation is specified in a similar straightforward fashion.

Here is the addition of three unsigned 64-bit integers using the described functions:

var al, ah, bl, bh, cl, ch, dl, dh;

dl = Math.u64add(Math.u64add(al, ah, bl, bh), Math.H, cl, ch);
dh = Math.H;
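
For illustration, here is a minimal polyfill sketch of the numbered steps for u64add. It assumes both halves are returned as unsigned 32-bit numbers, and it uses a stand-in namespace called M64 (a hypothetical name, not part of the proposal) so that the real Math object is not patched. In the proposal itself, both u64add and H would live on Math.

```javascript
// Stand-in for Math; M64 is a hypothetical name used only in this sketch.
var M64 = {};

// Installs a one-shot H: the first read returns `value` and deletes the property.
function setOneShotH(value) {
  Object.defineProperty(M64, "H", {
    configurable: true, // must be configurable so the getter can delete it
    get: function () {
      delete M64.H;
      return value;
    }
  });
}

M64.u64add = function (al, ah, bl, bh) {
  // Step 2: ToUint32 on every argument.
  al >>>= 0; ah >>>= 0; bl >>>= 0; bh >>>= 0;
  // Steps 3-4: wrapping 64-bit add, done as two 32-bit halves with a carry.
  var lo = (al + bl) >>> 0;
  var carry = lo < al ? 1 : 0;
  var hi = (ah + bh + carry) >>> 0;
  // Steps 5-6: publish the high half through H, return the low half.
  setOneShotH(hi);
  return lo;
};

// Adding three values, mirroring the example above:
// (2^32 - 1) + 1 + (2 * 2^32 + 5)
var dl = M64.u64add(M64.u64add(0xFFFFFFFF, 0, 1, 0), M64.H, 5, 2);
var dh = M64.H;
```

With these inputs the inner call carries into the high word, so dl is 5 and dh is 3. The nested form works because JavaScript evaluates arguments left to right: the inner call installs H before the second argument reads it.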

This API is purposefully designed to allow good optimization of the resulting code: e.g. an optimizing compiler that knows the one-shot semantics of the H property can eliminate any memory traffic and keep things in registers, even allocating (dl, dh) to a single 64-bit register on a 64-bit platform.

I think such an API has many advantages:

- can greatly improve performance of numeric algorithms relying on 64-bit math;
- easily polyfillable in the current JavaScript;
- does not depend on any complicated language changes (e.g. value objects);
- simple to implement and optimize on any platform (including 32bit ones);
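
To make the polyfillability point concrete, here is a hedged sketch of the least obvious case, multiplication. The function name mul64 and its [lo, hi] array return are assumptions made for this example only; the sketch sidesteps the question of how the high word is delivered. It relies only on Math.imul and ordinary double arithmetic.

```javascript
// Hypothetical helper: low 64 bits of (ah:al) * (bh:bl), returned as [lo, hi]
// with both halves as unsigned 32-bit numbers. The same bits are correct for
// signed and unsigned inputs, because wrapping multiplication ignores sign.
function mul64(al, ah, bl, bh) {
  al >>>= 0; ah >>>= 0; bl >>>= 0; bh >>>= 0;
  // Split the low words into 16-bit limbs so every partial product stays
  // below 2^32 and is therefore exact in a double.
  var a0 = al & 0xFFFF, a1 = al >>> 16;
  var b0 = bl & 0xFFFF, b1 = bl >>> 16;
  var p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
  // Middle column plus the carry out of the lowest 16 bits (still exact).
  var mid = p01 + p10 + Math.floor(p00 / 0x10000);
  var lo = (((mid & 0xFFFF) << 16) | (p00 & 0xFFFF)) >>> 0;
  // High word of al*bl, plus the cross terms, which only affect the high word.
  var hi = (p11 + Math.floor(mid / 0x10000) +
            Math.imul(al, bh) + Math.imul(ah, bl)) >>> 0;
  return [lo, hi];
}
```

For example, mul64(0xFFFFFFFF, 0, 0xFFFFFFFF, 0) yields [1, 0xFFFFFFFE], i.e. 0xFFFFFFFE00000001.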

# Olov Lassus (12 years ago)

2013/10/30 Vyacheslav Egorov <me at mrale.ph>

> 5. A one shot property Math.H is created that returns ch' on the first access and deletes itself.

Alternative step 5: Math.H is assigned ch'.

Rationale being faster polyfilled execution, in combination with a lack of imagination from my side to come up with a use case where any code would be interested in knowing (at run-time) whether Math.H exists or not (i.e. whether it has already been read). Does such a use case exist?

If all of JSC, Chakra, V8 et al. reliably optimize away most of the overhead of a polyfilled Math.H getter, then perhaps this does not matter.

# Vyacheslav Egorov (12 years ago)

> Rationale being faster polyfilled execution

The main reason for H being one shot is to allow an optimizing compiler to elide updating it in most cases, which eliminates memory traffic.

After thinking about it a bit I propose the following alternative step 5:

Math.H is from the very beginning a non-configurable non-writable accessor property with a getter that returns the hidden inner value and always zeros the inner value.
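
A minimal sketch of this revised step 5, under two assumptions: "non-writable" is read as "the accessor has no setter", and a stand-in object M64 (a hypothetical name) is used instead of patching Math. The operation shown is u64sub, with the high half kept in a closure variable that only the getter can reach.

```javascript
// Hypothetical stand-in for Math, so the sketch does not patch the real global.
var M64 = (function () {
  var hidden = 0; // the "hidden inner value" from the revised step 5
  var ns = {};
  Object.defineProperty(ns, "H", {
    configurable: false, // present from the start, never deleted
    enumerable: false,
    get: function () {
      var h = hidden;
      hidden = 0; // every read zeros the inner value
      return h;
    }
    // No setter: plain assignments to H have no effect on the inner value.
  });
  ns.u64sub = function (al, ah, bl, bh) {
    al >>>= 0; ah >>>= 0; bl >>>= 0; bh >>>= 0;
    var borrow = al < bl ? 1 : 0;
    hidden = (ah - bh - borrow) >>> 0;
    return (al - bl) >>> 0;
  };
  return ns;
})();

// 0 - 1 wraps to 2^64 - 1, so both halves are 0xFFFFFFFF.
var lo = M64.u64sub(0, 0, 1, 0);
var hi = M64.H;    // 0xFFFFFFFF
var again = M64.H; // 0, because the first read reset the inner value
```

Unlike the delete-on-read version, the shape of the object never changes here, which is the property an engine or a polyfill can rely on.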

# Olov Lassus (12 years ago)

2013/10/30 Vyacheslav Egorov <me at mrale.ph>

> The main reason for H being one shot is to allow optimizing compiler elide updating it in most cases to eliminate memory traffic.

Aaah. Thanks for pointing this out - I thought only of the polyfill performance so I neglected this key aspect of your proposal.

> After thinking about it a bit I propose the following alternative step 5:
>
> Math.H is from the very beginning a non-configurable non-writable accessor property with a getter that returns hidden inner value and always zeros inner value.

+1 (for now) :)

# Vyacheslav Egorov (12 years ago)

Some people dislike the "global" state that this proposal introduces. I see three ways of addressing this:

- Returning a {lo, hi} object.
  - Pros: no global state; in combination with destructuring it allows concise code; overhead can still be optimized away.
  - Cons: performance of the polyfill is abysmal on bad and moderately good VMs; requires an allocation sinking pass to optimize away the object allocation.
- Make H a property of the respective operation (e.g. u64mul updates its own property H).
  - Pros: easy to implement, good perf on bad VMs.
  - Cons: still kinda global state.
- Math.<s>64<op> can become Math.createOperator(<s>64, <op>), which returns a function with an H property:
  ```
  var add = Math.createOperator("u64", "add");
  var dl = add(add(al, ah, bl, bh), add.H, cl, ch);
  var dh = add.H;
  ```
  - Pros: no global state, relatively good performance on less advanced VMs, and it can actually be extended(!), e.g. SIMD operations can be exposed as Math.createOperator("simd128", "add").
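
The createOperator variant could be polyfilled roughly as follows. This is a sketch under stated assumptions: createOperator is attached to a local stand-in object M64 (hypothetical) rather than Math, only the ("u64", "add") combination is implemented, and H is a plain data property rather than a one-shot accessor.

```javascript
// Hypothetical polyfill of the createOperator idea; not a real Math API.
var M64 = {
  createOperator: function (type, op) {
    if (type !== "u64" || op !== "add") {
      throw new TypeError("unsupported operator: " + type + " " + op);
    }
    function add(al, ah, bl, bh) {
      al >>>= 0; ah >>>= 0; bl >>>= 0; bh >>>= 0;
      var lo = (al + bl) >>> 0;
      // The high half is stored on this particular function object only.
      add.H = (ah + bh + (lo < al ? 1 : 0)) >>> 0;
      return lo;
    }
    add.H = 0;
    return add;
  }
};

var add = M64.createOperator("u64", "add");
var other = M64.createOperator("u64", "add"); // independent state
var dl = add(add(0xFFFFFFFF, 0, 1, 0), add.H, 5, 2);
var dh = add.H;
```

Each call to createOperator returns a fresh function, so other.H is unaffected by calls to add. That is the sense in which this variant avoids shared global state.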

# Luke Wagner (12 years ago)

Just to be sure, do you agree that the {lo, hi}-returning API and the magic-property API should both be able to achieve equivalent performance on a JS engine that has specifically added and optimized these int64 builtins? I think this is true.

Assuming so, the reason to prefer the rather more awkward magic-property API would be purely because its polyfill is more efficient. This is a tough choice, but it seems like bending the spec for the polyfill is overly conservative in this case. A lot of the use cases I hear for int64 come from crypto and other very specific algorithms which already have implementations in JS. In such cases, it seems like the authors have to write a new version of the algorithm using the new builtins anyway so, if performance was important, they could just keep around the old version and pick which version to call based on whether the new builtins are present. Or they can just wait until the optimization is broadly available before switching.
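
The "pick which version to call" strategy might look like the sketch below. It assumes the {lo, hi}-returning shape for a hypothetical Math.u64add builtin, which no engine ships, and addPortable stands in for whatever implementation a library already has.

```javascript
// Existing pure-JS path that a library would keep around.
function addPortable(al, ah, bl, bh) {
  al >>>= 0; ah >>>= 0; bl >>>= 0; bh >>>= 0;
  var lo = (al + bl) >>> 0;
  var hi = (ah + bh + (lo < al ? 1 : 0)) >>> 0;
  return { lo: lo, hi: hi };
}

// Math.u64add is hypothetical; the check is done once, at load time.
var add64 = (typeof Math.u64add === "function")
  ? function (al, ah, bl, bh) { return Math.u64add(al, ah, bl, bh); }
  : addPortable;

var sum = add64(0xFFFFFFFF, 0, 1, 0);
```

Because the selection happens once, the hot loop always calls the same function and pays no per-call detection cost.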

The other main use case I can think of is large compiled C++ codebases. However, in our experience, C++ codebases tend not to heavily use int64 so the overhead of the polyfill would be less significant.

Are there any other use cases you have in mind that really demand high polyfill performance?

API considerations aside, though, I like the idea of bringing fast 64-bit arithmetic to JS without waiting for value objects.

# Vyacheslav Egorov (12 years ago)

Yes, all API variants I have proposed should result in the equivalent performance, to the best of my knowledge.

I would even say that the {lo, hi} one is easier on VMs for two reasons:

- VMs tend to have some sort of escape analysis / allocation sinking and they can incorporate { lo, hi } support into this pass;
- If VM desires to allocate { lo, hi } value to a single register it might be easier to do that when values are explicitly grouped, VM does not have to rediscover pairing --- it is already there.

You also correctly reasoned that I proposed magic property API for the purposes of faster polyfilling.

So, given the choice between { lo, hi } and the magic-property API, I would prefer { lo, hi } iff I ignore polyfill performance.
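
For comparison, here is a sketch of the { lo, hi } shape with destructuring at the call site. The function u64add below is a local, hypothetical polyfill and not a real Math builtin. It allocates a fresh object on every call, which is the cost an escape analysis or allocation sinking pass would have to remove.

```javascript
// Hypothetical {lo, hi}-returning variant of 64-bit unsigned addition.
function u64add(al, ah, bl, bh) {
  al >>>= 0; ah >>>= 0; bl >>>= 0; bh >>>= 0;
  var lo = (al + bl) >>> 0;
  var hi = (ah + bh + (lo < al ? 1 : 0)) >>> 0;
  return { lo: lo, hi: hi }; // the pairing is explicit in the result
}

// Adding three values; the intermediate pair never escapes this scope.
var t = u64add(0xFFFFFFFF, 0, 1, 0);
var { lo: dl, hi: dh } = u64add(t.lo, t.hi, 5, 2);
```

The two halves travel together through the whole computation, so nothing has to be inferred from the order in which a side channel is read.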

> The other main use case I can think of is large compiled C++ codebases

I have seen crypto libraries (e.g. NaCl) rely heavily on 64-bit arithmetic.

> Are there any other use cases you have in mind that really demand high polyfill performance?

I am interested in the whole number hierarchy actually (int32 - int64 - bigint). But I have no clear idea what to do here.

One possibility would be to allow passing typed arrays alongside primitive numbers into something like Math.big<op>. But this is pretty ugly and probably also results in abysmal polyfill performance.

# David Herman (12 years ago)

On Oct 30, 2013, at 1:56 PM, Luke Wagner <luke at mozilla.com> wrote:

> Just to be sure, do you agree that both the {lo, hi}-returning API and the magic-property API should both be able to achieve equivalent performance on a JS engine that has specifically added and optimized these int64 builtins? I think this is true.
>
> Assuming so, the reason to prefer the rather more awkward magic-property API would be purely because its polyfill is more efficient. This is a tough choice, but it seems like bending the spec for the polyfill is overly conservative in this case. A lot of the use cases I hear for int64 come from crypto and other very specific algorithms which already have implementations in JS. In such cases, it seems like the authors have to write a new version of the algorithm using the new builtins anyway so, if performance was important, they could just keep around the old version and pick which version to call based on whether the new builtins are present. Or they can just wait until the optimization is broadly available before switching.

Agreed -- the magic-property API is pretty astoundingly wacky, and definitely not worth it.

> The other main use case I can think of is large compiled C++ codebases. However, in our experience, C++ codebases tend not to heavily use int64 so the overhead of the polyfill would be less significant.

I'm open to the {lo, hi} version of the API as a stopgap, but since we don't see a big need for it in compiled codebases and it's warm beer for human programmers, then looking at Slava's rationale...

> I think such API has many advantages:
>
> - can greatly improve performance of numeric algorithms relying on 64-bit math;
> - easily polyfillable in the current JavaScript;
> - does not depend on any complicated language changes (e.g. value objects);
> - simple to implement and optimize on any platform (including 32bit ones);

...I'd say it's not particularly necessary for the short-term, and it's definitely not sufficient for the long-term. We can and must do better for programmers than declare {lo, hi} as a realistic API for 64-bit integers. Value types are in JS's future.

> API considerations aside, though, I like the idea of bringing fast 64-bit arithmetic to JS without waiting for value objects.

As I say, I could see doing the less wacky API for short-term, but I don't think it's vital.

# Andreas Rossberg (12 years ago)

Instead of returning a pair, you could also do it C-style:

var ret = {}
add(al, ah, bl, bh, ret)
add(ret.lo, ret.hi, cl, ch, ret)
var dl = ret.lo, dh = ret.hi

This way, it's up to the caller to allocate a suitable return buffer and reuse it. (For asm.js, that would probably require extending the spec to allow a module to pre-allocate one such buffer.)
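
A runnable sketch of this out-parameter style, assuming an unsigned add and a caller-supplied object with lo and hi fields. The function body is a hypothetical polyfill; only the call pattern comes from the snippet above.

```javascript
// Hypothetical out-parameter variant: results are written into `ret`.
function add(al, ah, bl, bh, ret) {
  al >>>= 0; ah >>>= 0; bl >>>= 0; bh >>>= 0;
  var lo = (al + bl) >>> 0;
  ret.hi = (ah + bh + (lo < al ? 1 : 0)) >>> 0;
  ret.lo = lo;
}

var ret = { lo: 0, hi: 0 }; // allocated once by the caller and reused
add(0xFFFFFFFF, 0, 1, 0, ret);
add(ret.lo, ret.hi, 5, 2, ret); // arguments are read before ret is overwritten
var dl = ret.lo, dh = ret.hi;
```

Passing ret.lo and ret.hi as inputs while also passing ret as the output is safe, because the argument values are evaluated before the call writes to the buffer.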

Cleaner than the other hacks, IMO, but still too ugly for an official API.
