Idiomatic representation of { buffer, bytesRead }

# Domenic Denicola (10 years ago)

While working on lower-level byte streams we're encountering a number of situations that need to return something along the lines of { buffer, bytesRead }. (In this setting "buffer" = ArrayBuffer.) In the most general form the signature ends up being something like:

{ sourceBuffer, offset, bytesDesired } -> { newBuffer, bytesRead }

where sourceBuffer gets detached, then (most likely in some other thread) up to bytesDesired bytes are written into the backing memory starting at offset, then the backing memory is transferred to newBuffer, and bytesRead tells you how many bytes were actually read into the buffer.
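The detach-then-transfer contract can be simulated in a single thread. This is a minimal sketch, not the streams API: `readInto` and its `source` argument are hypothetical, `structuredClone` with a `transfer` list is used only as a convenient way to detach an ArrayBuffer, and the "other thread" is faked with an already-resolved promise.

```javascript
// Hypothetical readInto modelling
// { sourceBuffer, offset, bytesDesired } -> { newBuffer, bytesRead }.
// `source` is an assumed Uint8Array of bytes waiting to be read.
async function readInto(sourceBuffer, offset, bytesDesired, source) {
  // Detach sourceBuffer: its byteLength becomes 0, so the caller cannot
  // observe the memory while the read is in flight.
  const inFlight = structuredClone(sourceBuffer, { transfer: [sourceBuffer] });

  // Stand-in for the write happening elsewhere (e.g. another thread).
  await Promise.resolve();
  const bytesRead = Math.min(bytesDesired, source.length);
  new Uint8Array(inFlight, offset, bytesRead).set(source.subarray(0, bytesRead));

  // Transfer the memory once more so the result is a fresh ArrayBuffer object.
  const newBuffer = structuredClone(inFlight, { transfer: [inFlight] });
  return { newBuffer, bytesRead };
}
```

For example, passing a 16-byte buffer with offset 4, bytesDesired 8 and a 3-byte source leaves the original buffer with byteLength 0 and yields a 16-byte newBuffer with bytesRead 3.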

I was hoping to get opinions on the most idiomatic way to represent this type in JavaScript. So far I can think of a few options:

  1. Just an object literal, probably with names { transferred, bytesRead }
  2. A Uint8Array view onto the new buffer, starting at offset and spanning bytesRead bytes. I.e., return new Uint8Array(result.transferred, input.offset, result.bytesRead);
  3. A DataView view onto the buffer, similar to 2.
  4. An ArrayBuffer with an additional property added!? I.e. result.transferred.bytesRead = result.bytesRead; return result.transferred;
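To make the four shapes concrete, here is a sketch that builds each of them from the same inputs. `toResults` is a hypothetical helper; `transferred`, `offset` and `bytesRead` follow the names used in the list.

```javascript
// Hypothetical helper: build each candidate result shape from one read.
function toResults(transferred, offset, bytesRead) {
  // 1. Plain object literal.
  const asObject = { transferred, bytesRead };
  // 2. Uint8Array covering exactly the bytes that were read.
  const asUint8 = new Uint8Array(transferred, offset, bytesRead);
  // 3. DataView over the same range.
  const asDataView = new DataView(transferred, offset, bytesRead);
  // 4. The ArrayBuffer itself with an ad hoc extra property.
  transferred.bytesRead = bytesRead;
  const asExpando = transferred;
  return { asObject, asUint8, asDataView, asExpando };
}
```

Options 2 and 3 carry all three facts (view.buffer, view.byteOffset, view.byteLength), whereas option 4 as written records only bytesRead, so the caller has to remember the offset separately.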

1 is unambiguous, but a bit awkward, and in general does not compose well if we try to make byte streams a special case of more general streams (which is a goal).

Both 2 and 3 are essentially attempting to smuggle the two pieces of information into one object. 2 takes the "byte" idea literally, whereas 3 uses DataView since it feels more "agnostic." In both cases you can access the underlying buffer using view.buffer so no generality is lost. I would be especially interested in people's opinions on 2 vs 3.

I came up with 4 while composing this email, and it is probably not a great idea. But I think it does work.

I wrote up a specialized version of 2 in some detail under a number of different scenarios at: gist.github.com/domenic/65921459ef7a31ec2839. Of particular interest might be gist.github.com/domenic/65921459ef7a31ec2839#a-two-buffer-pool-for-a-file-stream

# Jason Orendorff (10 years ago)

On Mon, Mar 2, 2015 at 3:45 PM, Domenic Denicola <d at domenic.me> wrote:

> While working on lower-level byte streams we're encountering a number of situations that need to return something along the lines of { buffer, bytesRead }. (In this setting "buffer" = ArrayBuffer.) In the most general form the signature ends up being something like
>
> { sourceBuffer, offset, bytesDesired } -> { newBuffer, bytesRead }

I very much like 2 and 3 because they provide the result type that the user wants anyway. I slightly prefer DataView.

But you can support both, like this:

pull(DataView) -> Promise<DataView>
pull(TypedArrayView) -> Promise<TypedArrayView of the same type>

A view argument conveniently provides just the three pieces of information you need, plus a type.

The lower-level primitive could take an optional fourth argument:

pull(sourceBuffer, offset, bytesDesired, resultConstructor = DataView) -> Promise<resultConstructor>

This could even be generic in resultConstructor, though it's a little awkward because you have to divide by resultConstructor.BYTES_PER_ELEMENT before invoking the constructor.
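A sketch of that lower-level primitive, including the BYTES_PER_ELEMENT division. `FakeByteSource` is a hypothetical stand-in for a real file or socket, and `structuredClone` with a transfer list is assumed as the detaching operation.

```javascript
// Hypothetical byte source used only for illustration; it serves bytes from a
// fixed Uint8Array instead of a real file or socket.
class FakeByteSource {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }

  // Sketch of pull(sourceBuffer, offset, bytesDesired, resultConstructor = DataView).
  async pull(sourceBuffer, offset, bytesDesired, resultConstructor = DataView) {
    // Detach the caller's buffer so it cannot be touched while the read is pending.
    const owned = structuredClone(sourceBuffer, { transfer: [sourceBuffer] });
    await Promise.resolve(); // stand-in for the read completing asynchronously

    const bytesRead = Math.min(bytesDesired, this.bytes.length - this.pos);
    new Uint8Array(owned, offset, bytesRead).set(
      this.bytes.subarray(this.pos, this.pos + bytesRead),
    );
    this.pos += bytesRead;

    // The awkward part: DataView takes a length in bytes, typed arrays take a
    // length in elements. DataView has no BYTES_PER_ELEMENT, so default to 1.
    const elementSize = resultConstructor.BYTES_PER_ELEMENT ?? 1;
    return new resultConstructor(owned, offset, Math.floor(bytesRead / elementSize));
  }
}
```

Two caveats follow from the division: for typed arrays, offset must be a multiple of BYTES_PER_ELEMENT or the constructor throws a RangeError, and a trailing partial element is written into the buffer but left outside the returned view.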

# Domenic Denicola (10 years ago)

Thanks very much for the feedback Jason!

> But you can support both, like this:
>
> pull(DataView) -> Promise<DataView>
> pull(TypedArrayView) -> Promise<TypedArrayView of the same type>
>
> A view argument conveniently provides just the three pieces of information you need, plus a type.

I thought of that. However, I found it a bit strange that passing this function a view onto bytes [256, 512] of a 1024-byte buffer would detach the entire 1024-byte buffer. What do you think?
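The behavior described here follows from transfer being defined on whole ArrayBuffers: there is no way to detach only a sub-range. The sketch below uses `structuredClone` with a transfer list as the detaching operation (an assumption; the function name is hypothetical) and shows that every view over the buffer, not just the one passed in, loses access.

```javascript
// Hypothetical demo: a view covers bytes 256..511 of a 1024-byte buffer, but
// transferring applies to the underlying ArrayBuffer as a whole.
function transferThroughView() {
  const whole = new ArrayBuffer(1024);
  const middle = new Uint8Array(whole, 256, 256); // the region the caller passed in
  const head = new Uint8Array(whole, 0, 256);     // an unrelated view the caller still holds

  // A view-taking pull can only detach view.buffer, i.e. the entire allocation.
  const moved = structuredClone(middle.buffer, { transfer: [middle.buffer] });

  return {
    wholeLength: whole.byteLength, // detached buffers report byteLength 0
    middleLength: middle.length,   // views over a detached buffer report length 0
    headLength: head.length,       // the unrelated view is affected too
    movedLength: moved.byteLength, // all 1024 bytes moved, not just 256
  };
}
```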

# Jason Orendorff (10 years ago)

On Wed, Mar 4, 2015 at 3:06 PM, Domenic Denicola <d at domenic.me> wrote:

> I thought of that. However, I found it a bit strange that passing this function a view onto bytes [256, 512] of a 1024-byte buffer would detach the entire 1024-byte buffer. What do you think?

It's a good point. I figured most callers will have allocated the buffer themselves, most will only have one view into it at a time, and most will ask for the whole thing to be filled; and by "most" I really mean somewhere over 99.9%. Skimming your gist seemed to sort of confirm my hunch, but don't take my word for it -- that's all the research I did. All I know is, I've known about Python's file.readinto() method for at least 15 years and never yet had a need for it.

Would it make it seem less strange if you specified the argument as a "dictionary" with these properties:

{buffer:, byteOffset:, byteLength:, constructor:}

...and then casually mention that DataViews and TypedArrays both happen to quack in just this way?
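A sketch of a pull that depends only on those four properties, so a DataView, any TypedArray, or a plain object literal shaped the same way are all accepted. `pull` here and its `fill` callback are hypothetical; `structuredClone` with a transfer list is assumed as the detaching operation.

```javascript
// Hypothetical pull(view, fill): reads only buffer, byteOffset, byteLength and
// constructor from its argument. `fill` is an assumed callback that writes
// into the given bytes and returns how many it wrote.
async function pull(view, fill) {
  const { buffer, byteOffset, byteLength, constructor: Result } = view;

  // Detach the whole underlying buffer and take ownership of it.
  const owned = structuredClone(buffer, { transfer: [buffer] });
  await Promise.resolve(); // stand-in for an asynchronous fill

  const bytesRead = fill(new Uint8Array(owned, byteOffset, byteLength));
  const elementSize = Result.BYTES_PER_ELEMENT ?? 1; // DataView counts bytes
  return new Result(owned, byteOffset, Math.floor(bytesRead / elementSize));
}
```

For instance, `pull({ buffer, byteOffset: 4, byteLength: 8, constructor: Uint32Array }, fill)` behaves exactly like passing `new Uint32Array(buffer, 4, 2)`.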

# Kevin Smith (10 years ago)

>> But you can support both, like this:
>>
>> pull(DataView) -> Promise<DataView>
>> pull(TypedArrayView) -> Promise<TypedArrayView of the same type>
>>
>> A view argument conveniently provides just the three pieces of information you need, plus a type.
>
> I thought of that. However, I found it a bit strange that passing this function a view onto bytes [256, 512] of a 1024-byte buffer would detach the entire 1024-byte buffer. What do you think?

Not sure if it makes any difference or not, but it's interesting to me that a signature like the above maps very well to async iterators:

source.next(myDataView) -> Promise<DataView>

I implemented a zip/tar utility when validating the async iterator design and basically used that signature (except with Node Buffers instead of DataViews). The basic strategy I used was to construct a pipeline with async iterators:

source -> data pump -> transform -> data pump -> consumer

where each "data pump" would have a pool of buffers, and would pull from the upstream iterator, providing the next unused buffer in its pool as argument to "next". The downstream transform or consumer would then just pull the next filled buffer (view) from the upstream pool.
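A single pump stage can be sketched as follows, using Uint8Array in place of Node Buffers and the Promise<IteratorResult<view>> shape for next. `byteSource` and `dataPump` are hypothetical names, the transform stage is omitted, and round-robin reuse of the pool is a simplifying assumption.

```javascript
// Hypothetical upstream source: next(view) fills the caller's buffer and
// resolves with an IteratorResult whose value covers the bytes written.
function byteSource(bytes) {
  let pos = 0;
  return {
    async next(view) {
      if (pos >= bytes.length) return { value: undefined, done: true };
      const n = Math.min(view.length, bytes.length - pos);
      view.set(bytes.subarray(pos, pos + n));
      pos += n;
      return { value: view.subarray(0, n), done: false };
    },
  };
}

// Hypothetical data pump: owns a small pool of buffers and lends the next one
// to upstream.next(). Downstream simply iterates the pump. With round-robin
// reuse, a consumer must finish with a chunk before the pool wraps around.
function dataPump(upstream, poolSize, bufferSize) {
  const pool = Array.from({ length: poolSize }, () => new Uint8Array(bufferSize));
  let nextIndex = 0;
  return {
    next() {
      const buffer = pool[nextIndex];
      nextIndex = (nextIndex + 1) % pool.length;
      return upstream.next(buffer);
    },
    [Symbol.asyncIterator]() {
      return this;
    },
  };
}
```

A consumer can then write `for await (const chunk of dataPump(byteSource(data), 2, 4)) { ... }`, copying or processing each chunk before asking for the next one.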

# Kevin Smith (10 years ago)

> source.next(myDataView) -> Promise<DataView>

Derp. Should have been:

source.next(dataView) -> Promise<IteratorResult<DataView>>

which you kind of need anyway ; )

# Domenic Denicola (10 years ago)

From: Jason Orendorff [mailto:jason.orendorff at gmail.com]

> Would it make it seem less strange if you specified the argument as a "dictionary" with these properties:
>
> {buffer:, byteOffset:, byteLength:, constructor:}
>
> ...and then casually mention that DataViews and TypedArrays both happen to quack in just this way?

Yes, this in fact makes me very happy :)

# Bergi (10 years ago)

Kevin Smith wrote:

> Should have been:
>
> source.next(dataView) -> Promise<IteratorResult<DataView>>
>
> which you kind of need anyway ; )

Wouldn't .next() rather need to return an IteratorResult<Promise<DataView>>?

Bergi

# Kevin Smith (10 years ago)

> Wouldn't .next() rather need to return an IteratorResult<Promise<DataView>>?

No, because the "done-ness" of the iteration is itself asynchronous. You don't know whether you're done iterating until the promise resolves.
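This can be illustrated with a source that only learns it is exhausted after an asynchronous read. With IteratorResult<Promise<DataView>>, next() would have to report `done` synchronously, before that information exists; with Promise<IteratorResult<DataView>>, `done` is decided when the promise settles. `delayedChunks` and `collect` are hypothetical names, and setTimeout stands in for real I/O.

```javascript
// Hypothetical source whose end is only discovered asynchronously.
function delayedChunks(chunks) {
  let i = 0;
  return {
    next() {
      // The promise wraps the whole IteratorResult, because `done` is not
      // known until the simulated read settles.
      return new Promise((resolve) => {
        setTimeout(() => {
          resolve(i < chunks.length
            ? { value: chunks[i++], done: false }
            : { value: undefined, done: true });
        }, 0);
      });
    },
  };
}

// Consumer: it must await each result before it can check `done`.
async function collect(iterator) {
  const out = [];
  for (let r = await iterator.next(); !r.done; r = await iterator.next()) {
    out.push(r.value);
  }
  return out;
}
```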

zenparsing/async-iteration

# Bergi (10 years ago)

Kevin Smith wrote:

>> Wouldn't .next() rather need to return an IteratorResult<Promise<DataView>>?
>
> No, because the "done-ness" of the iteration is itself asynchronous. You don't know whether you're done iterating until the promise resolves.
>
> zenparsing/async-iteration

Oh, right, I should've read your first post properly - you were talking about future asynchronous generators.

Bergi