typed array filling convenience AND performance

# Florian Bösch (9 years ago)

The use cases:

1) Filling with custom data

When writing WebGL or physics code, or many other things to do with large collections of data, it's not unusual to have to fill arrays with custom data of some kind.

someArray.set([
 x0, y0, 1, 0, 0,
 x1, y0, 0, 1, 0,
 x1, y1, 0, 0, 1,
   x0, y0, 1, 0, 0,
   x1, y1, 0, 1, 0,
   x0, y1, 0, 0, 1
 ], i);

2) Copying in data from another array

Some data resides in another array and needs to be copied in, a pattern emscripten uses frequently.

someArray.set(otherArray.subarray(srcOffset, srcOffset + srcSize), dstOffset)

3) Initializing an existing array with a repeated numerical value

For audio processing, physics and a range of other tasks it's important to initialize an array with the same data.

for(var i=0; i<size; i++){ someArray[i] = 0; }

The problem: Doing all of these things is slow and/or unsuitable for realtime code.

  1. someArray.set from a freshly built list is slow, both because set itself is slow and because constructing the list is slow. It's not realtime friendly because every call allocates a new list that later has to be GCed.
  2. someArray.set from a subarray is slow due to the construction of the new array view, and it's not realtime friendly because that view has to be GCed.
  3. Filling an array one element at a time is slow.

*The test:* jsperf.com/typed-array-fast-filling/4 (screenshot here: codeflow.org/pictures/typed-array-test.png, and attached)

The status quo:

The fastest way to fill an array with custom data across browsers is:

r[i] = x0;
r[i + 1] = y0;
r[i + 2] = 1;
r[i + 3] = 0;
r[i + 4] = 0;
r[i + 5] = x1;
r[i + 6] = y0;

*Things that are not faster:*

  • pushing to a list: ~93% slower
  • a helper function filling from a list: 57-70% slower
  • array.set: ~73% slower
  • a helper function filling from arguments: 65-93% slower
  • asm.js: 69-81% slower (even in firefox)

Suggestions:

  1. Browser engines should get a lot better at arguments handling, so that unsized argument lists can be iterated quickly by native code. Firefox is already pretty good at unboxing a specified argument list (Chrome not so much), but I think the test shows that there's ample room for improvement.
  2. someArray.memcpy: Add a method to typed arrays that can shuffle bytes from array A to array B like so: dst.memcpy(dstOffset, src, srcOffset, size). This is to avoid having to allocate an object to do the job.
  3. someArray.memset: Add a method to typed arrays that can initialize them with a value like so: dst.memset(dstOffset, value, size).
  4. someArray.argcpy: Add a (fast) method to typed arrays that can copy the arguments like so: dst.argcpy(dstOffset, 1, 2, 3, 4)
  5. Drastically improve the set method.

(Naming and semantics don't matter to me, as long as the methods do what's suggested efficiently, conveniently, and fast.)
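
For concreteness, here is a rough sketch of what the proposed methods could look like if self-hosted in plain JS. The names and argument orders follow the suggestions above; the implementations are hypothetical, and a native version would presumably elide the per-element bounds checks:

```javascript
// Hypothetical self-hosted sketches of the proposed methods.
function memcpy(dst, dstOffset, src, srcOffset, size) {
  for (var i = 0; i < size; i++) dst[dstOffset + i] = src[srcOffset + i];
}

function memset(dst, dstOffset, value, size) {
  for (var i = 0; i < size; i++) dst[dstOffset + i] = value;
}

function argcpy(dst, dstOffset) {
  for (var i = 2; i < arguments.length; i++) dst[dstOffset + i - 2] = arguments[i];
}

var a = new Float32Array(8);
memset(a, 0, 1, 8);     // fill all 8 elements with 1
argcpy(a, 0, 9, 8, 7);  // a is now [9, 8, 7, 1, 1, 1, 1, 1]

var b = new Float32Array(8);
memcpy(b, 4, a, 0, 4);  // b is now [0, 0, 0, 0, 9, 8, 7, 1]
console.log(a.join(), b.join());
```

The point of having these as built-ins rather than user-level loops is that none of the calls above allocate anything, so they are safe to use in realtime code.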

Consequence of failure to rectify:

Fast code will be unreadable and unmaintainable. Sophisticated, speed-critical code will not be written in ECMAScript. Emscripten and asm.js, with their hermetic nature, will crowd out ECMAScript-driven development. Other alternatives, such as on-GPU transform & feedback and compute shaders, will be preferred to solve the problem.

# Jussi Kalliokoski (9 years ago)

+1 - especially the lack of something like the proposed memset() has been a massive headache for me.

Semantics (I'm quite nitpicky about them)... I'd prefer the argument order for memcpy() to be (src, dstOffset, srcOffset, size), to be consistent with set(). As for memset(), I'd prefer (value, dstOffset, size), because so far I've actually never needed the offset and size. I'd also prefer memcpy to be named closer to set(), because it's basically the same thing with two extra arguments. I'm not sure what the performance implications would be of just adding the two extra parameters to set() as optional, so maybe someone smarter than me can chime in on whether it's better to add a separate method or to overload the existing set()?

It's worth noting that set() currently gets relatively faster the bigger the data is, and the last I tested (also confirmed by Jukka Jylänki), at around 4k (surprisingly many) elements it becomes faster than manually assigning values.
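
A harness along these lines is a quick way to confirm that both approaches produce the same result, and to time the crossover on your own machine; the function names here are made up, and the ~4k threshold will vary by engine and hardware:

```javascript
// Compare manual element-wise assignment against set() for copying n elements.
function copyManual(dst, src, n) {
  for (var i = 0; i < n; i++) dst[i] = src[i];
}

function copySet(dst, src, n) {
  dst.set(src.subarray(0, n));
}

var n = 4096;
var src = new Float32Array(n);
for (var i = 0; i < n; i++) src[i] = i;

var a = new Float32Array(n);
var b = new Float32Array(n);

console.time("manual"); copyManual(a, src, n); console.timeEnd("manual");
console.time("set");    copySet(b, src, n);    console.timeEnd("set");

var same = true;
for (var j = 0; j < n; j++) if (a[j] !== b[j]) same = false;
console.log(same); // true -- both produce identical contents
```

Note that a single timed call like this is only a rough indicator; a real measurement would warm the functions up and average over many runs.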

# Allen Wirfs-Brock (9 years ago)

Have you looked at the ES6 typed array constructor (people.mozilla.org/~jorendorff/es6-draft.html#sec-properties-of-the-%typedarray%-intrinsic-object) and instance methods (people.mozilla.org/~jorendorff/es6-draft.html#sec-properties-of-the-%typedarrayprototype%-object)? In particular, check out the from, copyWithin, and fill methods.

In general, TC39’s approach in ES6 is for typed arrays to simply be a variant of Array and all of the Array methods (except those that are not applicable to fixed length arrays) are supported for typed arrays. Similarly, if there are common array operations that are still missing we would want them to be added for both Array and typed arrays.
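
The ES6 methods mentioned above map onto the use cases fairly directly; for example:

```javascript
// %TypedArray%.prototype.fill and copyWithin from ES6.
var a = new Float32Array(8);
a.fill(1);              // use case 3: [1, 1, 1, 1, 1, 1, 1, 1]
a.fill(2, 0, 4);        // optional start/end: [2, 2, 2, 2, 1, 1, 1, 1]
a.copyWithin(4, 0, 4);  // copy a[0..4) over a[4..8): every element is now 2

// %TypedArray%.from builds a typed array from any iterable or array-like.
var b = Float32Array.from([1, 2, 3]);
console.log(a.join(), b.join());
```

Note that copyWithin only moves data within a single array; copying between two distinct typed arrays still goes through set().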

Performance optimization is an implementation issue. That's where you should apply performance pressure.

# Adrian Perez de Castro (9 years ago)

On Thu, 30 Oct 2014 09:29:36 +0100, Florian Bösch <pyalot at gmail.com> wrote:

The use cases:

[...]

3) Initializing an existing array with a repeated numerical value

For audio processing, physics and a range of other tasks it's important to initialize an array with the same data.

for(var i=0; i<size; i++){ someArray[i] = 0; }

For this use case there is %TypedArray%.prototype.fill(), see:

people.mozilla.org/~jorendorff/es6-draft.html#sec-%typedarray%.prototype.fill

JavaScript engines are expected to implement it at some point. For example I am implementing this in V8, along with other new typed array methods. The engines should be able to generate quite good code for uses of this function and/or provide optimized versions relying on knowledge of the underlying element type of the typed array they are applied to.

# Jussi Kalliokoski (9 years ago)

Oh, nice! Had completely missed fill() before. Hope to see this land soon. :)

# Florian Bösch (9 years ago)

On Thu, Oct 30, 2014 at 1:41 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

Performance optimization is an implementation issue. That's where you should apply performance pressure.

That's true, but if there's only bad APIs to do certain tasks, it doesn't help.

# Steve Fink (9 years ago)

On 10/30/2014 06:14 AM, Adrian Perez de Castro wrote:

For this use case there is %TypedArray%.prototype.fill(), see:

people.mozilla.org/~jorendorff/es6-draft.html#sec-%typedarray%.prototype.fill

JavaScript engines are expected to implement it at some point. For example I am implementing this in V8, along with other new typed array methods. The engines should be able to generate quite good code for uses of this function and/or provide optimized versions relying on knowledge of the underlying element type of the typed array they are applied to.

I implemented this for Firefox 2 years ago, but never landed it - bugzilla.mozilla.org/show_bug.cgi?id=730880

Now there is %TypedArray%.prototype.fill. But I've become generally skeptical about it as an answer to performance concerns. I would rather see engines hyperoptimize

for(var i=0; i<size; i++){ someArray[i] = 0; }

based on observed type information. Which is not to say that we wouldn't want to make TA#fill fast too, but the above seems more generally useful.

On a related note, I would like to have some way of getting the OS to decommit memory. See bugzilla.mozilla.org/show_bug.cgi?id=855669 (start reading at about comment 22) for our discussion and attempt at this, which looks like it mysteriously trailed off this last March. Theoretically, the above loop could also trigger a decommit, but I think it's too much to expect the engine to guess when that's going to be a good idea. On the other hand, from a spec POV it's unobservable behavior, which makes it weird.

# Filip Pizlo (9 years ago)

On Oct 30, 2014, at 10:44 AM, Steve Fink <sphink at gmail.com> wrote:

I would rather see engines hyperoptimize

for(var i=0; i<size; i++){ someArray[i] = 0; }

based on observed type information. Which is not to say that we wouldn't want to make TA#fill fast too, but the above seems more generally useful.

I agree with this philosophy. I would just point out that both in C and Java, having some way for the programmer to say “fill” (i.e. memset/bzero in C, and I forget what it is in Java) has survived despite the compilers being super mature, probably because idiom recognition on loops like this is too fallible in the general case. So, I’d like to see .fill() be part of the language.

Anyway, I just filed a bug on our end for this: bugs.webkit.org/show_bug.cgi?id=138218

# Florian Bösch (9 years ago)

On Thu, Oct 30, 2014 at 6:44 PM, Steve Fink <sphink at gmail.com> wrote:

Now there is %TypedArray%.prototype.fill. But I've become generally skeptical about it as an answer to performance concerns. I would rather see engines hyperoptimize

for(var i=0; i<size; i++){ someArray[i] = 0; }

based on observed type information. Which is not to say that we wouldn't want to make TA#fill fast too, but the above seems more generally useful.

While useful, it's not a substitute for a convenient and fast method. Also, I presented severe usability issues with that approach when you need more than one value (such as computed values). Those usability issues can be resolved by using set plus a new list for every assignment, which, unfortunately, is also quite slow: for a loop that runs a million times, it creates a million lists.
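
The allocation problem looks roughly like this in practice; one common workaround (sketched here with made-up names) is to reuse a preallocated scratch array instead of building a fresh list on every iteration:

```javascript
// Anti-pattern: a fresh list per iteration means a million short-lived
// allocations for a million iterations, all of which must later be GCed.
function fillNaive(dst, count) {
  for (var i = 0; i < count; i++) {
    dst.set([i, i + 1, i + 2], i * 3);
  }
}

// Workaround: one preallocated scratch array, reused on every iteration.
var scratch = new Float32Array(3);
function fillScratch(dst, count) {
  for (var i = 0; i < count; i++) {
    scratch[0] = i; scratch[1] = i + 1; scratch[2] = i + 2;
    dst.set(scratch, i * 3);
  }
}

var a = new Float32Array(9);
var b = new Float32Array(9);
fillNaive(a, 3);
fillScratch(b, 3);
console.log(a.join());              // 0,1,2,1,2,3,2,3,4
console.log(a.join() === b.join()); // true
```

The scratch-array version avoids the per-iteration garbage, but it still pays the overhead of a set() call per iteration, which is exactly why a fast built-in would help.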

# Katelyn Gadd (9 years ago)

I'd also like to chime in since this is a real problem for performance-sensitive JSIL code: The idea that JS runtimes will just optimize all our performance-sensitive loops is lovely. It also appears to be a fantasy, because we've been writing those loops for years and they're still slow. I did some extensive testing and experimentation with spidermonkey a year or so back, examining how it optimized various forms of a memcpy loop, and the results were pretty dire. V8 didn't seem to fare much better, though it is harder to tell because they don't trivially expose generated native code for functions.

In my testing, even best-case optimized memcpy/memset loops did not produce the kind of efficient code you want. There were always bounds checks and in some cases I saw other overhead. The set/copy definitely weren't vectorized. In the best case I saw bounds checks getting hoisted out of the loop body (great!)

I feel like at this point the claim that we don't need better copy/set APIs is the 'sufficiently smart compiler' trope all over again, though in this case perhaps it is wholly possible and it's just not happening because it isn't important enough to VM implementers.

TypedArray.prototype.fill is perfect for that scenario, so all we need is a copy. It's essential that the copy support the source array being distinct from the destination one (a previous proposal - did it go through? - required the source & destination arrays to be the same, presumably micro-optimizing for emscripten.)

In the short term these new methods could be self-hosted and would at least be no worse than existing hand-authored loops. If the runtime uses the equivalent of Spidermonkey's callsite cloning, the copy/fill loops would be specialized based on element type, which would increase performance. If the VM were to actually generate fully optimal copy/fill code, it would be a sizable improvement. At a bare minimum, having the 'best' copy/fill loop JS for a given runtime in the stdlib is much better than developers having to figure out the best loop type - especially if that type varies across browsers.

The performance of memcpy and memset is extremely important. Off the top of my head, I'm aware of a Google paper where they experimented with using instrumentation and specialized code to improve memcpy performance across some of their key applications and the results were (IMO) quite significant: static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/40679.pdf

In some of their applications a significant chunk of time is spent just inside memcpy - 20% or more - with memset and memcmp adding up to a couple more percentage points. Any optimizations to these functions can pay dividends, obviously. For real-world applications being ported to JS, if memcpy and memset are already at a performance disadvantage vs the 'standard' ones used in native C, this will be more dramatic.

Related: bugzilla.mozilla.org/show_bug.cgi?id=862249

# Andreas Rossberg (9 years ago)

On 31 October 2014 16:40, Katelyn Gadd <kg at luminance.org> wrote:

I'd also like to chime in since this is a real problem for performance-sensitive JSIL code: The idea that JS runtimes will just optimize all our performance-sensitive loops is lovely. It also appears to be a fantasy, because we've been writing those loops for years and they're still slow. I did some extensive testing and experimentation with spidermonkey a year or so back, examining how it optimized various forms of a memcpy loop, and the results were pretty dire. V8 didn't seem to fare much better, though it is harder to tell because they don't trivially expose generated native code for functions.

Hm, just an aside, but I wonder what you mean by "trivially" here. It is only one flag away.

# Isiah Meadows (9 years ago)

From: Andreas Rossberg <rossberg at google.com>

Hm, just an aside, but I wonder what you mean by "trivially" here. It is only one flag away.

Try node --allow-natives-syntax, and take a look at the runtime-*.cc files at v8/v8-git-mirror/tree/master/src/runtime (grep RUNTIME_FUNCTION)

# Brendan Eich (9 years ago)

Steve Fink wrote:

On a related note, I would like to have some way of getting the OS to decommit memory. See bugzilla.mozilla.org/show_bug.cgi?id=855669 (start reading at about comment 22) for our discussion and attempt at this, which looks like it mysteriously trailed off this last March. Theoretically, the above loop could also trigger a decommit, but I think it's too much to expect the engine to guess when that's going to be a good idea. On the other hand, from a spec POV it's unobservable behavior, which makes it weird.

ArrayBuffer.transfer (gist.github.com/andhow/95fb9e49996615764eff) is an ES7 stage 0 proposal, needs to move to stage 1 soon. It enables decommitting memory.

# Steve Fink (9 years ago)

On 11/04/2014 11:08 AM, Brendan Eich wrote:

ArrayBuffer.transfer (gist.github.com/andhow/95fb9e49996615764eff) is an ES7 stage 0 proposal, needs to move to stage 1 soon. It enables decommitting memory.

I'm not sure we're talking about the same thing. I'm talking about what would be madvise(MADV_DONTNEED) on POSIX or VirtualAlloc(MEM_RESET) on Windows. Er... actually, I think it would be MEM_RESET followed by MEM_COMMIT to get the zero-filling while still releasing the physical pages.

Unless there's some tricky way of using ArrayBuffer.transfer to signal that memory can be decommitted, but I don't see it.

# Brendan Eich (9 years ago)

Steve Fink wrote:

I'm not sure we're talking about the same thing. I'm talking about what would be madvise(MADV_DONTNEED) on POSIX or VirtualAlloc(MEM_RESET) on Windows. Er... actually, I think it would be MEM_RESET followed by MEM_COMMIT to get the zero-filling while still releasing the physical pages.

Luke should weigh in. I had in mind the words "either truncated or zero-extended to be newByteLength" from

gist.github.com/andhow/95fb9e49996615764eff

# Luke Wagner (9 years ago)

ArrayBuffer.transfer provides a coarser granularity than the proposed ArrayBuffer.prototype.discard in bug 855669. Page-level madvise(MADV_DONTNEED) is the type of thing you'd want if you were implementing a malloc heap (jemalloc being an example). OTOH, ArrayBuffer.transfer also allows you to immediately release virtual address space, something discard doesn't, so I think these features are complementary.

# Brendan Eich (9 years ago)

Luke Wagner wrote:

the proposed ArrayBuffer.prototype.discard

I'll get this on the November Agenda.

# Isiah Meadows (9 years ago)

These two methods happen to fix one of the two reasons I haven't really dabbled in handwritten asm.js (the other being the lack of string handling).

# Jeff Walden (9 years ago)

On 10/31/2014 08:40 AM, Katelyn Gadd wrote:

In my testing, even best-case optimized memcpy/memset loops did not produce the kind of efficient code you want. There were always bounds checks and in some cases I saw other overhead. The set/copy definitely weren't vectorized. In the best case I saw bounds checks getting hoisted out of the loop body (great!)

It's my impression, across many compilers and languages, that completely correct range analysis (a prerequisite to eliminating many bounds checks) is very, very hard.

When people have used Csmith and similar on gcc and clang, they've found range analysis bugs (or issues in systems producing inputs to range analysis) with ease. These are seasoned compilers where one might think such issues had long since been eliminated.

People find similar issues fuzzing SpiderMonkey. Range analysis has been in SpiderMonkey awhile now, yet the incoming stream of range analysis issues continues unabated. We "trust" range analysis information for DCE purposes and constant folding because a wrong value/behavior is not immediately exploitable. But we've been very hesitant to trust it to eliminate bounds checks. The consequences there are memory corruption and arbitrary code execution. When the range analysis information is known buggy, it'd be folly to trust it to remove bounds checks.

I'd also note C/C++ and similar range analysis code usually has the advantage of working with stable, integer-xor-floating types. JS with its double type requires considerably more care to suss out an implicit integer subtype, when double operations can produce integers, integer operations can produce doubles, and negative zero constantly confounds. Range analysis for those languages is easier than it is for JS, and yet they're still buggy.

I feel like at this point the claim that we don't need better copy/set APIs is the 'sufficiently smart compiler' trope all over again, though in this case perhaps it is wholly possible and it's just not happening because it isn't important enough to VM implementers.

"not important enough" is true, but it requires a big-picture view. Your narrow focus is on typed arrays and native-equivalent code. There's practically a whole web of code dissimilar to that, that also needs optimizing. There are new standard features to implement. There are security issues to fix. There's low-level architecture to get right so that it's possible to build upon basic optimizations and eventually implement bigger ones (such as the vectorization ideas you mention here). We'll get to this eventually, but it has to compete with the other important concerns on the table.

Jeff

P.S. -- For what it's worth, I think we'll care a bit more about this soon. We may need to do bits of this to permit self-hosting of SpiderMonkey's typed array methods without losing performance, and we really want to make that change soon. Don't give up hope!