Proposal: Array.prototype.shuffle (Fisher-Yates)

# John Terenzio (5 years ago)

Many scripting languages, such as Ruby, PHP, and Python (via the random module), have a built-in for properly shuffling an array. I propose adding Array.prototype.shuffle to the ES spec. I think it should act like Array.prototype.reverse, shuffling the array in place and returning a reference to it. It should use the Fisher-Yates algorithm. See en.wikipedia.org/wiki/Fisher–Yates_shuffle

Since this functionality is missing, some less-seasoned developers fall into the trap of using the "sort by random" method, which produces bias. See sroucheray.org/blog/2009/11/array-sort-should-not-be-used-to-shuffle-an-array

In a perfect world, this would use a better random number generator than the existing Math.random, but for now an implementation like the one below would still be a vast improvement over having to implement manually and possibly incorrectly. See bugzilla.mozilla.org/show_bug.cgi?id=322529

Example implementation: gist.github.com/jterenzio/28bb972f3abb7618b14b

I don't think adding this to the spec would cause many compatibility issues and the implementation should be very straightforward. Let me know what you think!

Best, John

# Axel Rauschmayer (5 years ago)

If it was up to me, I’d add most if not all Underscore.js functions to the standard library (possibly via a module). The one function I most often miss is _.range() [1], which can be used for all kinds of things, including for filling an array with values:

Array.range(5).map(() => 'x') [ 'x', 'x', 'x', 'x', 'x' ]

A more principled approach to determine candidates for ES6 inclusion: search large code bases that depend on Underscore and determine how much each of its functions is actually used.

[1] underscorejs.org/#range

# Alexander Lichter (6 months ago)

Hey all!

Though this proposal was created more than five years ago, it's still something that should be standardized.

As the algorithm is known and an implementation wouldn't take ages, bringing in Array.prototype.shuffle is a low-hanging fruit.

Furthermore, it helps to provide a "real" random shuffle mechanism, keeping people away from trying to do implement their own (biased or wrong) algorithm (see stackoverflow.com/a/18650169/3975480 for example).

# Naveen Chawla (6 months ago)

I don't think there's such a thing as "real random" in digital algos, just "pseudo random".

Apart from card games, what's the use case?

(I'm not in TC39 so I can't vote on this)

I think that this comes after toObject(keyCallback), takeWhile, skipWhile, and forEachReturningOriginalArray in terms of usefulness! (and maybe even some others)

# Isiah Meadows (6 months ago)

I think this would be better suited for a library function rather than a language feature. I could see this also being useful also for randomized displays, but that's about it. And I'm not sure what an engine could provide here that a library couldn't - you can't really get much faster than what's in the language (minus bounds checking, but the likely frequent cache misses will eclipse that greatly), and it's not unlocking any real new possibilities.

(Not in TC39, either, BTW.)

Isiah Meadows me at isiahmeadows.com

Looking for web consulting? Or a new website? Send me an email and we can get started. www.isiahmeadows.com

# Tab Atkins Jr. (6 months ago)

On Sun, Apr 29, 2018 at 10:01 AM, Isiah Meadows <isiahmeadows at gmail.com> wrote:

I think this would be better suited for a library function rather than a language feature. I could see this also being useful also for randomized displays, but that's about it. And I'm not sure what an engine could provide here that a library couldn't - you can't really get much faster than what's in the language (minus bounds checking, but the likely frequent cache misses will eclipse that greatly), and it's not unlocking any real new possibilities.

Yes, this is very easily doable in userland, and a built-in will have little if any speed benefit.

The point, tho, is that this is (a) a relatively common operation, and (b) very, very easy to do wrong, as demonstrated by the SO link. Thus it's a reasonable candidate for inclusion as a built-in anyway, to avoid the footgun of trying to do it yourself and screwing it up.

# Alexander Lichter (6 months ago)

On 29.04.2018 18:34, Naveen Chawla wrote:

I don't think there's such a thing as "real random" in digital algos, just "pseudo random".

You are right. I meant 'unbiased' pseudo randomness here.

Apart from card games, what's the use case?

There are a lot of potential use cases. The best that comes into my mind is sampling test data.

On 29.04.2018 19:01, Isiah Meadows wrote:

I think this would be better suited for a library function rather than a language feature. I could see this also being useful also for randomized displays, but that's about it. And I'm not sure what an engine could provide here that a library couldn't - you can't really get much faster than what's in the language (minus bounds checking, but the likely frequent cache misses will eclipse that greatly), and it's not unlocking any real new possibilities.

As Tab Atkins Jr. already pointed out it's not about performance benefits. It's about how error-prone custom shuffle implementations are/can be.

# Isiah Meadows (6 months ago)

BTW, I added this to my list of various proposed array additions (as a weak one) 1.

I did do a little reading up and found that in general, there's a major glitch that makes shuffling very hard to get right (issues even Underscore's _.shuffle doesn't address), specifically that of the size of the random number generator vs how many permutations it can hit. Specifically, if you want any properly unbiased shuffles covering all permutations of arrays larger than 34 entries (and that's not a lot), you can't use any major engines' Math.random (which typically have a max seed+period of 128 bits). You have to roll your own implementation, and you need at least something like xorshift1024* 2 (1024-bit max seed/period size) for up to 170 entries, MT19937 3/WELL19937 4 (seed/period up to 2^19937-1) for up to 2080 entries, or MT44497/WELL44497 for up to 4199 entries. Keep in mind the minimum seed/period size in bits grows roughly ceil(log2(fact(N))) where N is the length of the array 5, and the only way you can guarantee you can even generate all possible permutations of all arrays (ignoring potential bias) is through a true hardware generator (with its potentially infinite number of possibilities). Also, another concern is that the loop's bottleneck is specifically the random number generation, so you can't get too slow without resulting in a very slow shuffle.

If you want anything minimally biased and much larger than that, you'll need a cryptographically secure pseudorandom number generator just to cover the possible states. (This is about where the intersection meets between basic statistics and cryptography, since cipher blocks are frequently that long.) But I'm leaving that as out of scope of that proposal.


Isiah Meadows me at isiahmeadows.com

Looking for web consulting? Or a new website? Send me an email and we can get started. www.isiahmeadows.com

# J Decker (6 months ago)

I can see that's certainly something that can be gotten wrong; the in-place sort is kind-of nice; but you end up hitting most numbers twice, and with a longer array (say a set of 75 bingo balls) you can move the same number multiple times.

which gives numbers at the end a higher chance of being at the start when all is done; and almost guaranteed to not be at the end. I use sha2 as a stream of bits; and when it runs out of bits, invokes a callback to get more salt ( that is optional, although adding Date.now() is sufficient to increase the overall entropy slightly ). This can also be used as a generator procedural content generation, since you can specify the initial salt and subsequent salts, or even copy the state, and fork two streams using slightly different progressive entropy. d3x0r/-/blob/master/org.d3x0r.common salty_random_generator.js

And then to shuffle, for each number in the array, assign a random number; hang in a binary tree from least to most; then gather the tree back out. requires additional memory approximately the 4xsize of the original array tough temporarily. (don't seem to have a JS version, but the C version ports pretty easily) d3x0r/SACK/blob/master/src/utils/cardodds/shuffle.c

This allows any number to result in any position equally. The SHA2 hash generates very good random number characteristics; evaluating with diehard tests; chi-square etc. Allowing more salt to be introduced periodically destroys any overall period (even if it is in the realm of 256 bits).

if you were shuffling a deck of cards (52) you only need random values that are 6-7 bits at a time...

It is slower than say www.pcg-random.org/using-pcg.html or mersenne, both of which I found to be very poor. pcg generates zeros like 70% of the time.

While a good shuffle should be able to start from the initial state and generate a good shuffled result, it is slightly better to progressively shuffle the previous result array into new positions than to start from an initial state and compute a 1-off.

# Tab Atkins Jr. (6 months ago)

On Sun, Apr 29, 2018 at 4:31 PM, J Decker <d3ck0r at gmail.com> wrote:

I can see that's certainly something that can be gotten wrong; the in-place sort is kind-of nice; but you end up hitting most numbers twice, and with a longer array (say a set of 75 bingo balls) you can move the same number multiple times. which gives numbers at the end a higher chance of being at the start when all is done; and almost guaranteed to not be at the end.

This is true of a "naive" in-place shuffle, which draws the "item to be placed at the end" from the full array, rather than from just the unshuffled portion. But the proper Fisher-Yates algorithm does not have this bias, and gives a proper unbiased sort, as shown by a multitude of links across the web, many of which can be found in the SO answers linked from earlier in this thread.

And then to shuffle, for each number in the array, assign a random number; hang in a binary tree from least to most; then gather the tree back out.

While this shuffling method (assign a random number to each value, then sort by that) also works and is unbiased, it takes O(nlogn) time, and may, as you note for your implementation, have significant space costs as well. Fisher-Yates is a faster (linear-time, bounded by the cost of generating N random numbers) and more space-efficient unbiased algorithm.

# Isiah Meadows (6 months ago)

Replying inline, so I can target specific points better.


Isiah Meadows me at isiahmeadows.com

Looking for web consulting? Or a new website? Send me an email and we can get started. www.isiahmeadows.com

On Sun, Apr 29, 2018 at 7:31 PM, J Decker <d3ck0r at gmail.com> wrote:

I can see that's certainly something that can be gotten wrong; the in-place sort is kind-of nice; but you end up hitting most numbers twice, and with a longer array (say a set of 75 bingo balls) you can move the same number multiple times.

I specifically mentioned this (the easy to screw up) in the link I created and posted here.

which gives numbers at the end a higher chance of being at the start when all is done; and almost guaranteed to not be at the end. I use sha2 as a stream of bits; and when it runs out of bits, invokes a callback to get more salt ( that is optional, although adding Date.now() is sufficient to increase the overall entropy slightly ). This can also be used as a generator procedural content generation, since you can specify the initial salt and subsequent salts, or even copy the state, and fork two streams using slightly different progressive entropy. d3x0r/-/blob/master/org.d3x0r.common/salty_random_generator.js

And then to shuffle, for each number in the array, assign a random number; hang in a binary tree from least to most; then gather the tree back out. requires additional memory approximately the 4xsize of the original array tough temporarily. (don't seem to have a JS version, but the C version ports pretty easily) d3x0r/SACK/blob/master/src/utils/cardodds/shuffle.c

For an in-place shuffle, that's still not especially efficient. It's still taking O(n log n) stack space + global space + time, even though it doesn't require real heap allocation.

This allows any number to result in any position equally. The SHA2 hash generates very good random number characteristics; evaluating with diehard tests; chi-square etc. Allowing more salt to be introduced periodically destroys any overall period (even if it is in the realm of 256 bits).

Basing it on a cryptographic primitive is what I was referring to for larger arrays. You need one that powerful for large arrays if you hope to maintain any sort of statistical soundness. However, for small arrays (like a "deck" of cards), it is likely overkill.

if you were shuffling a deck of cards (52) you only need random values that are 6-7 bits at a time...

It is slower than say www.pcg-random.org/using-pcg.html or mersenne, both of which I found to be very poor. pcg generates zeros like 70% of the time.

If it's generating zeros that frequently, don't use it. The website itself looks a bit on the sketchy side to me, but that's just me.* A Mersenne Twister with a large state shouldn't give you too much trouble for simple stuff, and if you take the SFMT variant 1, you shouldn't have speed issues too much. Also, note that I did specify a higher-quality iteration on that, WELL 2, which is similar, but improves on both the statistical properties and speed of the traditional Mersenne Twister.

* If you're not cryptographically secure, don't compare yourself to ones that are, especially trying to make yourself look better than them. Also, it missed a few other high quality generators (like the xorshift128 family and xorshiro128+, two of the most efficient, effective fast PRNGs out there) in its online comparison.

While a good shuffle should be able to start from the initial state and generate a good shuffled result, it is slightly better to progressively shuffle the previous result array into new positions than to start from an initial state and compute a 1-off.

Evidence? (BTW, the Fisher-Yates shuffle is theoretically as biased as the RNG that powers it.)

# J Decker (6 months ago)

While a good shuffle should be able to start from the initial state and generate a good shuffled result, it is slightly better to progressively shuffle the previous result array into new positions than to start from an initial state and compute a 1-off.

Evidence? (BTW, the Fisher-Yates shuffle is theoretically as biased as the RNG that powers it.)

Specifically re-1-off vs progressive shuffling? Not on-hand; it was an evaluation I did 10 years ago; and while the difference wasn't that much, it was notable. the rate for any single ball to be in a specific position took longer (by iteration count) when resetting the array to shuffle to initial conditions (1-N)

Fisher-Yates. is biased by the usage of the number generator; not just the generator. I've been considering how to graphically/visually demonstrate it; but I'v failed so far.

for a list of 10 numbers; and a 'perfect' RNG. For the last number, it's 90% guaranteed to evaluate it's shuffle twice. (10% it won't move) It has a 10% chance to be in the last position, but only a 1.1% chance to be in the second to last position. (1/10 * 1/9 = 1/90 = 0.011...) so it will NOT be the last position 90% but it will NOT be the second to last 98.9% of the time. For an ideal shuffle, every original position will end up in every other position 10% of the time.

# Isiah Meadows (6 months ago)

If you can come up with something better than what I have, feel free to file an issue against my repo. (I have minimal need, but I'd love to have something that works.)

isiahmeadows/array-additions-proposal/issues/new


Just an item of note about algorithm choice: I strongly doubt TC39 reps would be okay with a cryptographic primitive being used in the spec, since it's pretty useless elsewhere, and I'm not sure about the legal aspect (export restrictions, etc.).