Idiomatic representation of { buffer, bytesRead }
On Mon, Mar 2, 2015 at 3:45 PM, Domenic Denicola <d at domenic.me> wrote:
While working on lower-level byte streams we're encountering a number of situations that need to return something along the lines of { buffer, bytesRead }. (In this setting "buffer" = ArrayBuffer.) In the most general form the signature ends up being something like
{ sourceBuffer, offset, bytesDesired } -> { newBuffer, bytesRead }
I very much like 2 and 3 because they provide the result type that the user wants anyway. Slightly prefer DataView.
But you can support both, like this:
pull(DataView) -> Promise<DataView>
pull(TypedArrayView) -> Promise<TypedArrayView of the same type>
A view argument conveniently provides just the three pieces of information you need, plus a type.
The lower-level primitive could take an optional fourth argument:
pull(sourceBuffer, offset, bytesDesired,
resultConstructor=DataView) -> Promise<resultConstructor>
This could even be generic in resultConstructor, though it's a little awkward because you have to divide by resultConstructor.BYTES_PER_ELEMENT before invoking the constructor.
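As a rough illustration of the divide-by-BYTES_PER_ELEMENT step, here is a minimal sketch. The helper name makeResultView is hypothetical and not part of any proposed API; it assumes the read has already completed and only shows how a generic resultConstructor might be invoked.

```javascript
// Hypothetical helper (not from the thread): wrap the bytes just read in a
// view of the requested type.
function makeResultView(buffer, offset, bytesRead, ResultConstructor = DataView) {
  // Typed array constructors take an element count, while DataView takes a
  // byte count and has no BYTES_PER_ELEMENT, so fall back to 1 for DataView.
  const bytesPerElement = ResultConstructor.BYTES_PER_ELEMENT || 1;
  const length = Math.floor(bytesRead / bytesPerElement);
  return new ResultConstructor(buffer, offset, length);
}

const buffer = new ArrayBuffer(16);
const asDataView = makeResultView(buffer, 4, 8);            // covers 8 bytes
const asUint16 = makeResultView(buffer, 4, 8, Uint16Array); // covers 4 elements
```

One wrinkle with the generic form: for typed arrays the offset has to be a multiple of the element size, otherwise the constructor throws a RangeError. DataView has no such restriction.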
Thanks very much for the feedback Jason!
But you can support both, like this:
pull(DataView) -> Promise<DataView>
pull(TypedArrayView) -> Promise<TypedArrayView of the same type>
A view argument conveniently provides just the three pieces of information you need, plus a type.
I thought of that. However, I found it a bit strange that passing this function a view onto bytes [256, 512] of a 1024-byte buffer would detach the entire 1024-byte buffer. What do you think?
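The effect in question can be observed directly. This sketch uses structuredClone with a transfer list as a stand-in for whatever transfer mechanism a stream implementation would actually use; that substitution is an assumption, and it needs a runtime that provides structuredClone.

```javascript
const whole = new ArrayBuffer(1024);
const slice = new Uint8Array(whole, 256, 256); // the view handed to pull()
const other = new Uint8Array(whole, 0, 256);   // an unrelated view on the same buffer

// Stand-in for the transfer step: this detaches all of `whole`, not just the
// 256 bytes that `slice` covers.
const transferred = structuredClone(whole, { transfer: [whole] });

console.log(whole.byteLength);           // 0
console.log(slice.length, other.length); // 0 0
console.log(transferred.byteLength);     // 1024
```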
On Wed, Mar 4, 2015 at 3:06 PM, Domenic Denicola <d at domenic.me> wrote:
I thought of that. However, I found it a bit strange that passing this function a view onto bytes [256, 512] of a 1024-byte buffer would detach the entire 1024-byte buffer. What do you think?
It's a good point. I figured most callers will have allocated the buffer themselves, most will only have one view into it at a time, and most will ask for the whole thing to be filled; and by "most" I really mean somewhere over 99.9%. Skimming your gist seemed to sort of confirm my hunch, but don't take my word for it -- that's all the research I did. All I know is, I've known about Python's file.readinto() method for at least 15 years and never yet had a need for it.
Would it make it seem less strange if you specified the argument as a "dictionary" with these properties:
{buffer:, byteOffset:, byteLength:, constructor:}
...and then casually mention that DataViews and TypedArrays both happen to quack in just this way?
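A small sketch of what that "dictionary" reading looks like in practice. The function name describeRegion is made up for illustration; the point is only that destructuring those four properties works the same for a DataView, a typed array, or a plain object.

```javascript
// Hypothetical consumer that only looks at the four named properties.
function describeRegion({ buffer, byteOffset, byteLength, constructor }) {
  return { buffer, byteOffset, byteLength, constructor };
}

const backing = new ArrayBuffer(1024);

const fromDataView = describeRegion(new DataView(backing, 256, 256));
const fromTypedArray = describeRegion(new Uint8Array(backing, 256, 256));
const fromPlainObject = describeRegion({
  buffer: backing,
  byteOffset: 256,
  byteLength: 256,
  constructor: Uint8Array,
});
```

All three calls describe the same 256-byte region; only the constructor differs between the DataView call and the other two.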
But you can support both, like this:
pull(DataView) -> Promise<DataView>
pull(TypedArrayView) -> Promise<TypedArrayView of the same type>
A view argument conveniently provides just the three pieces of information you need, plus a type.
I thought of that. However, I found it a bit strange that passing this function a view onto bytes [256, 512] of a 1024-byte buffer would detach the entire 1024-byte buffer. What do you think?
Not sure if it makes any difference or not, but it's interesting to me that a signature like the above maps very well to async iterators:
source.next(myDataView) -> Promise<DataView>
I implemented a zip/tar utility when validating the async iterator design and basically used that signature (except with Node Buffers instead of DataViews). The basic strategy I used was to construct a pipeline with async iterators:
source -> data pump -> transform -> data pump -> consumer
where each "data pump" would have a pool of buffers, and would pull from the upstream iterator, providing the next unused buffer in its pool as argument to "next". The downstream transform or consumer would then just pull the next filled buffer (view) from the upstream pool.
source.next(myDataView) -> Promise<DataView>
Derp. Should have been:
source.next(dataView) -> Promise<IteratorResult<DataView>>
which you kind of need anyway ; )
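For concreteness, here is a minimal sketch of a source with that shape feeding a single data pump. It is not the zip/tar utility mentioned above: makeSource, pump, and collect are hypothetical names, the source is an in-memory byte array, and it uses DataViews over small fixed-size buffers.

```javascript
// Hypothetical in-memory source; a real one would read from a file or socket.
function makeSource(bytes) {
  let position = 0;
  return {
    // next(view) fills the caller's view and resolves to an IteratorResult
    // whose value is a DataView over just the bytes that were written.
    async next(view) {
      if (position >= bytes.length) return { value: undefined, done: true };
      const count = Math.min(view.byteLength, bytes.length - position);
      new Uint8Array(view.buffer, view.byteOffset, count)
        .set(bytes.subarray(position, position + count));
      position += count;
      return { value: new DataView(view.buffer, view.byteOffset, count), done: false };
    },
  };
}

// Data pump: pulls from upstream, passing the next buffer from a small pool.
// Buffers are reused, so a consumer must finish with a view before the pump
// cycles back to the buffer behind it.
async function* pump(source, poolSize = 2, bufferSize = 4) {
  const pool = Array.from({ length: poolSize },
    () => new DataView(new ArrayBuffer(bufferSize)));
  for (let i = 0; ; i = (i + 1) % poolSize) {
    const { value, done } = await source.next(pool[i]);
    if (done) return;
    yield value;
  }
}

// Consumer: copies the bytes out of each filled view as it arrives.
async function collect(source) {
  const out = [];
  for await (const view of pump(source)) {
    out.push(...new Uint8Array(view.buffer, view.byteOffset, view.byteLength));
  }
  return out;
}
```

With ten input bytes and the default 4-byte buffers, the pump yields views of 4, 4, and 2 bytes before the source reports done.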
From: Jason Orendorff [mailto:jason.orendorff at gmail.com]
Would it make it seem less strange if you specified the argument as a "dictionary" with these properties:
{buffer:, byteOffset:, byteLength:, constructor:}
...and then casually mention that DataViews and TypedArrays both happen to quack in just this way?
Yes, this in fact makes me very happy :)
Kevin Smith wrote:
Should have been:
source.next(dataView) -> Promise<IteratorResult<DataView>>
which you kind of need anyway ; )
Wouldn't .next() rather need to return an IteratorResult<Promise<DataView>>?
Bergi
Wouldn't .next() rather need to return an IteratorResult<Promise<DataView>>?
No, because the "done-ness" of the iteration is itself asynchronous. You don't know whether you're done iterating until the promise resolves.
Kevin Smith wrote:
Wouldn't .next() rather need to return an IteratorResult<Promise<DataView>>?
No, because the "done-ness" of the iteration is itself asynchronous. You don't know whether you're done iterating until the promise resolves.
Oh, right, I should've read your first post properly - you were talking about future asynchronous generators.
Bergi
While working on lower-level byte streams we're encountering a number of situations that need to return something along the lines of { buffer, bytesRead }. (In this setting "buffer" = ArrayBuffer.) In the most general form the signature ends up being something like
{ sourceBuffer, offset, bytesDesired } -> { newBuffer, bytesRead }
where sourceBuffer gets detached, then (in some other thread, most likely) up to bytesDesired bytes get written in at position offset to the backing memory, then the backing memory gets transferred to newBuffer, and bytesRead tells you how many bytes were actually read into the buffer.
I was hoping to get opinions on the most idiomatic way to represent this type in JavaScript. So far I can think of a few options:
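1. { transferred, bytesRead }
2. A Uint8Array starting at offset and extending bytesRead. I.e., return new Uint8Array(result.transferred, input.offset, result.bytesRead);
3. A DataView starting at offset and extending bytesRead.
4. The transferred ArrayBuffer itself, with bytesRead added as a property. I.e., result.transferred.bytesRead = result.bytesRead; return result.transferred;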
1 is unambiguous, but a bit awkward, and in general does not compose well if we try to make byte streams a special case of more general streams (which is a goal).
Both 2 and 3 are essentially attempting to smuggle the two pieces of information into one object. 2 takes the "byte" idea literally, whereas 3 uses DataView since it feels more "agnostic." In both cases you can access the underlying buffer using view.buffer so no generality is lost. I would be especially interested in people's opinions on 2 vs 3.
4 I just thought up while composing this email and is probably not such a great idea. But, I think it does work.
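To compare the four options side by side, here is a small sketch. The input and result objects are hypothetical stand-ins for the arguments and raw outcome of a read; only the shapes of the return values are meant to be illustrative.

```javascript
// Hypothetical stand-ins for the read's input and its raw result.
const input = { offset: 4 };
const result = { transferred: new ArrayBuffer(16), bytesRead: 8 };

// Option 1: return the pair as a plain object.
const option1 = { transferred: result.transferred, bytesRead: result.bytesRead };

// Option 2: a Uint8Array over exactly the bytes that were read.
const option2 = new Uint8Array(result.transferred, input.offset, result.bytesRead);

// Option 3: a DataView over the same region.
const option3 = new DataView(result.transferred, input.offset, result.bytesRead);

// Option 4: the buffer itself, with bytesRead attached as an extra property.
result.transferred.bytesRead = result.bytesRead;
const option4 = result.transferred;

// Options 2 and 3 still expose the whole underlying buffer.
console.log(option2.buffer === result.transferred); // true
console.log(option3.buffer === result.transferred); // true
```

Note that option 4 loses the offset unless it is attached to the buffer as well, while options 2 and 3 carry it as byteOffset.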
I wrote up a specialized version of 2 in some detail under a number of different scenarios at: gist.github.com/domenic/65921459ef7a31ec2839. Of particular interest might be gist.github.com/domenic/65921459ef7a31ec2839#a-two-buffer-pool-for-a-file-stream