Yet another attempt at typed JS data

# Andrea Giammarchi (5 years ago)

As more and more often devs are shifting into TS, or would like to have dynamic typed arrays without a fixed length, and as v8 and likely other engines too optimize a lot for same-kind arrays, I wonder what people here think about this TypedArray rough proposal:

Signature:

new TypedArray(
  type,   // either a "type" to infer, or a shape, or a Class
  length  // optional: if defined it's fixed size
);

About the type

  • if the inferred type is object:
    • if the object is null, throw new InvalidType
    • if the object is a literal, or its __proto__ is either null or `Object.prototype, enforce own properties and respective types
    • if the object is not literal, fallback to object Class (/./ is RegExp, () => {} is Function, etc)
  • if the inferred type is function, fallback to Class
  • if the inferred type is Class, ensure each entry is an instance of such class, or throw InvalidType
  • in every other case, make a typed collection of number, string, boolean, or symbol

About the length

  • if provided, the collection has a fixed, immutable, length, and each initial entry a default value

Use Cases:

const numbers = new TypedArray(0, 5);
// creates [0, 0, 0, 0, 0]
// numbers length is immutable
// numbers accepts only typeof number
// out of bound entries throw

let numbers = new TypedArray(0);
// numbers length is dynamic
// numbers accepts only typeof number
// out of bound entries throw

const shapes = new TypedArray({name: '', age: 0}, 3);
// creates [{name: '', age: 0}, {name: '', age: 0}, {name: '', age: 0}]
// shapes length is immutable
// shapes accepts only objects with `name` and `age` and respective types
// out of bound entries throw

let shapes = new TypedArray({lat: 0, long: 0});
// shapes length is dynamic
// shapes accepts only objects with `lat` and `long` and respective types
// out of bound entries throw

const classes = new TypedArray(HTMLElement, 2);
// creates [Object.create(HTMLElement.prototype),
Object.create(HTMLElement.prototype)]
// classes length is immutable
// classes accepts only instances of HTMLElement
// out of bound entries throw

let classes = new TypedArray(HTMLElement);
// classes length is dynamic
// classes accepts only instances of HTMLElement
// out of bound entries throw

// more options
let strings = new TypedArray('');
let booleans = new TypedArray(true);
let functions = new TypedArray(Function);
let coords = new TypedArray({lat: 0, long: 0});

Thanks in advance for eventual thoughts on this 👋

# Jordan Harband (5 years ago)

the "type" in Typed Arrays refers to the bit size in memory; being able to pass an arbitrary value seems like it would require implementations to calculate the precise (not just the maximum) memory size a single item requires.

# Andrea Giammarchi (5 years ago)

well, this proposal has nothing to do with Typed Arrays, as these are all fixed size indeed ... scratch the TypedArray name, and use ArrayKind instead, any better outcome?

# Scott Rudiger (5 years ago)

StructuredArray or StructArray for short? Just throwing it out there.

# Andrea Giammarchi (5 years ago)

Any name would do, I'm rather interested in the proposal itself, and its ergonomics/logic ;-)

# Bruno Macabeus (5 years ago)

I really don't think that it could be so much useful to optimize, because it could be used only on a specific part of the code, for example:

const foo = (anyType) => {
  // we can't optimize with typed array here
}

const bar = () => {
  const numbers = new TypedArray(0, 5)

  // we could optimize numbers array in this scope

  foo(numbers)
}

Since this limitations, I don't think that it'll be so much useful, and I think that moderns virtual machines already did these optimisations despite there isn't a TypedArray like that.

# Andrea Giammarchi (5 years ago)

having to retroactively add checks like...

we already have typed arrays in JS so I don't think this would be any different

I think that moderns virtual machines already did these optimisations

despite there isn't a TypedArray like that.

It's a bit of a mess to create an Array that is not holed and gets best optimizations [1], and this proposal would like to address that exact case.

[1] v8.dev/blog/elements

# kai zhu (5 years ago)

It's a bit of a mess to create an Array that is not holed and gets best

optimizations [1], and this proposal would like to address that exact case.

could the performance issue be resolved more easily with a simple static-function Array.repeat(<length>, <repeater>)?

let structuredList;
structuredList = Array.repeat(4, function (ii) {
    return {
        index: 2 * ii + 1,
        tags: []
});
/*
structuredList = [
    { index: 1, tags: [] },
    { index: 3, tags: [] },
    { index: 5, tags: [] },
    { index: 7, tags: [] }
];
 */

the only time i can practically enforce the shape of a "StructuredArray" is during element-insertion, and a userland insertion/creation function would be just as effective as a StructuredArray constructor.

enforcing shapes during element deletions and updates are going to be hard and likely just as confusing with StructuredArray as they are with regular Array.

also note that most javascript arrays need to be easily JSON-serialized for message-passing over-the-wire (commonly http) to external systems.

# Jordan Harband (5 years ago)

That already exists - Array.from({ length: 4 }, () => whatever) - I

assume that the hope is to have an array where it is impossible for it to have the wrong "kind" of data, and a userland factory function wouldn't provide that.

# Andrea Giammarchi (5 years ago)

Unfortunately, Array.from({ length: 4 }, () => whatever) produces a holey

array, so that the .repeat(...) idea, if capable of packing elements in a better way, wouldn't be so terrible, as simplification.

Although, the intent of this proposal was to also grant "shapes" or kindness of each entry, same way typed Arrays do, but maybe that would require some better primitive, as in const Shape = Object.defineShape(...) and Object.createShape(Shape) or similar.

# kai zhu (5 years ago)

if you really care about performance AND structured-data, perhaps javascript is not the best tool for the job.

if you goto the wasm-sqlite3 demo @ kaizhu256.github.io/demo-sqljs-csv you can paste the following code into browser's dev-console to ingest a million-row csv and perform queries on it:

(async function () {
    "use strict";
    let csv;
    let ii;
    let randomSelect;
    let result;
    randomSelect = function (list) {
    /*
     * this function will select a random element from list
     */
        return list[Math.floor(list.length * Math.random())];
    };
    csv = "rowid,name,address,random\n";
    ii = 0;
    while (ii < 1000000) {
        csv += (
            ii + 1 + ","
            + randomSelect([
                "Bob", "Jane", "John"
            ]) + " " + randomSelect([
                "Doe", "Smith", "Williams"
            ]) + ","
            + "\"1234 Main St., Los Angeles, CA 90001\","
            + Math.random() + "\n"
        );
        ii += 1;
    }
    console.error(csv.slice(0, 1000));
    // rowid,name,address,random
    // 1,Jane Doe,"1234 Main St., Los Angeles, CA 90001",0.8783498663648375
    // 2,Bob Williams,"1234 Main St., Los Angeles, CA
90001",0.22973214766766303
    // 3,John Doe,"1234 Main St., Los Angeles, CA 90001",0.8658095647533652
    // 4,Jane Smith,"1234 Main St., Los Angeles, CA
90001",0.27730496836028085
    // ...
    // 1000000,Jane Williams,"1234 Main St., Los Angeles, CA
90001",0.43105992922801883
    await window.sqljsTableImport({
        csv,
        tableName: "table1"
    });
    // sqljsTableImport - 945 ms - inserted 12,572 rows - 1 MB
    // sqljsTableImport - 1163 ms - inserted 25,017 rows - 2 MB
    // ...
    // sqljsTableImport - 6242 ms - inserted 997,423 rows - 81 MB
    // sqljsTableImport - 6252 ms - inserted 1,000,000 rows - 81 MB
    result = await window.sqljsExec(
        "SELECT * FROM table1 WHERE\n"
        + "name LIKE 'John'\n"
        + "AND random > 0.5\n"
        + "ORDER BY random DESC\n"
        + "LIMIT 1000;"
    );
    console.error(result.results[0].values);
    // ["961621", "John Doe", "1234 Main St., Los Angeles, CA 90001",
"0.999 ...
    // ["51800", "John  Williams  ", "1234 Main St., Los Angeles, CA
90001", "0.999 ...
    // ["241184", "John  Smith  ", "1234 Main St., Los Angeles, CA 90001",
"0.999 ...
    // ["591592", "John  Williams  ", "1234 Main St., Los Angeles, CA
90001", "0.999 ...
    // ["32403", "John Doe", "1234 Main St., Los Angeles, CA 90001", "0.999
...
    // ["847237", "John  Smith  ", "1234 Main St., Los Angeles, CA 90001",
"0.999 ...
    // ["23195", "John Doe", "1234 Main St., Los Angeles, CA 90001", "0.999
...
    // ["136423", "John  Smith  ", "1234 Main St., Los Angeles, CA 90001",
"0.999 ...
}());
# Bergi (5 years ago)

Hello!

Unfortunately, Array.from({ length: 4 }, () => whatever) produces a holey array

Does it? But really, if the performance difference betweeen HOLEY and PACKED arrays were large enough to be relevant[1], the engine programmers would certainly already have optimised all those trivial cases where an array is filled gradually to produce the more efficient representation.

kind , Bergi

[1]: it probably isn't: stackoverflow.com/questions/54481918/#comment95848513_54485509

# Andrea Giammarchi (5 years ago)

They couldn't even optimize the most obvious case of them all: bugs.chromium.org/p/v8/issues/detail?id=6892

And "javascript is not the best tool for the job" makes no sense when it's the most targeted transpiled language, but fair enough.

And yes, I've used SQLite wasm version too ... as a matter of fact, it's going to be a great lazy-loaded thing for my next project, 'cause it's 1MB overhead, so not something to really promote in the wild, imho 😅

.

# Michael Haufe (5 years ago)

Array(3) // [empty × 3]

Array(3).fill() // [undefined, undefined, undefined]

Array(3).fill('whatever') // ["whatever", "whatever", "whatever"]

# Andrea Giammarchi (5 years ago)

Great, now maybe you also read how it works behind the scene, and debug properly to understand that every array is holey, including the latter one, to date.

v8.dev/blog/elements-kinds

Please, let's assume for a second I knew what I was talking about, when I've said it's a mess to not have holey arrays, thanks.

# Michael Haufe (5 years ago)

Given your history I know better than to assume what you know…

The definition of sparse in the spec (while not explicitly in its own section) is straightforward.

V8’s inability or unwillingness to perform a safe “upcast” internally to an appropriate tag doesn’t seem to provide enough weight to introduce a new construct.

From: Andrea Giammarchi <andrea.giammarchi at gmail.com>

Sent: Monday, February 10, 2020 2:26 PM To: Michael Haufe <tno at thenewobjective.com>

Cc: Bergi <a.d.bergi at web.de>; es-discuss at mozilla.org

Subject: Re: Yet another attempt at typed JS data

Great, now maybe you also read how it works behind the scene, and debug properly to understand that every array is holey, including the latter one, to date.

v8.dev/blog/elements-kinds

Please, let's assume for a second I knew what I was talking about, when I've said it's a mess to not have holey arrays, thanks.

On Mon, Feb 10, 2020 at 9:21 PM Michael Haufe <tno at thenewobjective.com<mailto:tno at thenewobjective.com>> wrote:

Array(3) // [empty × 3]

Array(3).fill() // [undefined, undefined, undefined]

Array(3).fill('whatever') // ["whatever", "whatever", "whatever"]

# Andrea Giammarchi (5 years ago)

Given your history I know better than to assume what you know…

I've no idea what you are talking about, but this should be no venue for these kind of answers.

My history in this thread explained the proposal, the intent, and linked all the facts around it, and before your pointless answer, so please keep your biases for yourself.

Thank you.

# kai zhu (5 years ago)

And yes, I've used SQLite wasm version too ... as a matter of fact, it's

going to be a great lazy-loaded thing for my next project, 'cause it's 1MB overhead, so not something to really promote in the wild, imho 😅

sqlite's homepage claims: "our best guess is that SQLite is the second mostly widely deployed software library, after libz" [1]. whether true or not, i think we can agree its an ubiquitous (and hopefully well-understood) piece of software library.

i know its not a tc39 thing, but perhaps some of its members who are also implementers would consider making wasm-sqlite3 a 3rd-party browser "builtin" (like libz/ffmpeg/libpng, but as sandboxed-wasm exposed to userland) to improve its loading-performance.

-kai

[1] Most Widely Deployed and Used Database Engine www.sqlite.org/mostdeployed.html

# Isiah Meadows (5 years ago)

You have proof of this? That it doesn't produce a dense array in engines?

# Isiah Meadows (5 years ago)

Edit: never mind. In either case, it's an engine problem, not a spec problem.

I wouldn't mind a couple things that could desugar to that making it in (like Array.range), but those all have something special the engine can do to fill it in less than half the time provided appropriate ISA support.

# Andrea Giammarchi (5 years ago)

So ...

On Tue, Feb 11, 2020 at 5:44 PM Isiah Meadows <contact at isiahmeadows.com>

wrote:

You have proof of this? That it doesn't produce a dense array in engines?

Yes, I have d8 traces, after long discussions in twitter too, that shows there's no way to have PACKED_SMI_ELEMENTS, but also a bug in Chromium from 2017, but if you need any other proof, I can figure out which other engine has the same issue, although I am not sure all engines follow same optimizations with the "packed" vs "holey" v8 concept.

# Jordan Harband (5 years ago)

No, Array.from never produces a holey array whatsoever; only ever a dense array.

# Andrea Giammarchi (5 years ago)

Which part of HOLEY_SMI_ELEMENTS makes you think so?

V8 version 8.2.34 d8> %DebugPrint(Array.from({length: 2}, (_, i) => i))

DebugPrint: 0x5ab080c5ded: [JSArray]

  • map: 0x05ab08281869 <Map(HOLEY_SMI_ELEMENTS)> [FastProperties]
  • prototype: 0x05ab08248f7d <JSArray[0]>
  • elements: 0x05ab080c5dfd <FixedArray[2]> [HOLEY_SMI_ELEMENTS]
  • length: 2
  • properties: 0x05ab080406e9 <FixedArray[0]> { #length: 0x05ab081c0165 <AccessorInfo> (const accessor descriptor) }
  • elements: 0x05ab080c5dfd <FixedArray[2]> { 0: 0 1: 1 } 0x5ab08281869: [Map]
  • type: JS_ARRAY_TYPE
  • instance size: 16
  • inobject properties: 0
  • elements kind: HOLEY_SMI_ELEMENTS
  • unused property fields: 0
  • enum length: invalid
  • back pointer: 0x05ab082817f1 <Map(PACKED_SMI_ELEMENTS)>
  • prototype_validity cell: 0x05ab081c0451 <Cell value= 1>
  • instance descriptors #1: 0x05ab08249605 <DescriptorArray[1]>
  • transitions #1: 0x05ab08249639 <TransitionArray[4]>Transition array #1: 0x05ab08042e91 <Symbol: (elements_transition_symbol)>: (transition to

PACKED_DOUBLE_ELEMENTS) -> 0x05ab08281891 <Map(PACKED_DOUBLE_ELEMENTS)>

  • prototype: 0x05ab08248f7d <JSArray[0]>
  • constructor: 0x05ab08248e51 <JSFunction Array (sfi = 0x5ab081cbf85)>
  • dependent code: 0x05ab080401ed <Other heap object (WEAK_FIXED_ARRAY_TYPE)>
  • construction counter: 0

[0, 1]

# Claude Pache (5 years ago)

At some point in the algorithm of Array.from, the newly-created array will indeed be created as sparse (at steps 9/10), but this is usually not observable (that is, unless you are creating an instance of a subclass of Array with very unusual behaviour) as far as the spec is concerned (optimisation is not part of the spec).

If it is important for you to circumvent the v8 limitation in its optimisation process, you should be able to write Array.from(Array(2), _ => undefined): the array will be created as non-sparse at steps 5.a/b.

In any case, this optimisation detail is by very far not a satisfactory argument for introducing the heavy feature you proposed. It could maybe be a conceivable argument for introducing Array.createAsNonSparse(2).

# Andrea Giammarchi (5 years ago)

That indeed worked:

V8 version 8.2.34
d8> %DebugPrint(Array.from(Array(2), (_, i) => i));

DebugPrint: 0x1c67080c5ddd: [JSArray]
 - map: 0x1c67082817f1 <Map(PACKED_SMI_ELEMENTS)> [FastProperties]
 - prototype: 0x1c6708248f7d <JSArray[0]>
 - elements: 0x1c67080c5ded <FixedArray[4]> [PACKED_SMI_ELEMENTS]
 - length: 2
 - properties: 0x1c67080406e9 <FixedArray[0]> {
    #length: 0x1c67081c0165 <AccessorInfo> (const accessor descriptor)
 }
 - elements: 0x1c67080c5ded <FixedArray[4]> {
           0: 0
           1: 1
         2-3: 0x1c6708040385 <the_hole>
 }
0x1c67082817f1: [Map]
 - type: JS_ARRAY_TYPE
 - instance size: 16
 - inobject properties: 0
 - elements kind: PACKED_SMI_ELEMENTS
 - unused property fields: 0
 - enum length: invalid
 - back pointer: 0x1c670804030d <undefined>
 - prototype_validity cell: 0x1c67081c0451 <Cell value= 1>
 - instance descriptors #1: 0x1c6708249605 <DescriptorArray[1]>
 - transitions #1: 0x1c6708249621 <TransitionArray[4]>Transition array #1:
     0x1c6708042e91 <Symbol: (elements_transition_symbol)>: (transition to

HOLEY_SMI_ELEMENTS) -> 0x1c6708281869 <Map(HOLEY_SMI_ELEMENTS)>

 - prototype: 0x1c6708248f7d <JSArray[0]>
 - constructor: 0x1c6708248e51 <JSFunction Array (sfi = 0x1c67081cbf85)>
 - dependent code: 0x1c67080401ed <Other heap object
(WEAK_FIXED_ARRAY_TYPE)>
 - construction counter: 0

[0, 1]

It could maybe be a conceivable argument for introducing Array.createAsNonSparse(2).

My proposal basically boils down, if we don't like the TypedArray concept, to this:

Array.createAsNonSparse(
  length, // pre-defined length of the Array
  optionalValueOrCallback // values per each entry
);

Array.createAsNonSparse(3, 0); // [0, 0, 0] non sparsed
Array.createAsNonSparse(3, i => i * i); // [0, 1, 4] non sparsed

Would this makes more sense? It'd be super handy in many cases, as many developers don't even know what is a holey/sparsed Array, neither how to not create one.

If the second argument is too ambiguous, I wouldn't mind having it as function all the time:

Array.createAsNonSparse(3); // [undefined, undefined, undefined] non sparsed
Array.createAsNonSparse(3, () => 0); // [0, 0, 0] non sparsed

Array.createAsNonSparse(3, i => i * i); // [0, 1, 4] non sparsed

Thanks.