endianness

# Brendan Eich (12 years ago)

Kenneth Russell wrote:

On Tue, Mar 26, 2013 at 4:35 PM, David Herman <dherman at mozilla.com> wrote:

[breaking out a new thread since this is orthogonal to the NaN issue]

While the Khronos spec never specified an endianness, TC39 agreed in May 2012 to make the byte order explicitly little-endian in ES6:

 https://mail.mozilla.org/pipermail/es-discuss/2012-May/022834.html

The de facto reality is that there are essentially no big-endian browsers for developers to test on. Web content is being invited to introduce byte-order dependencies. DataView is usually held up as the counter-argument, as if the existence of a safe alternative API means no one will ever misuse the unsafe one. Even if we don't take into account human nature, Murphy's Law, and the fact that the web is the world's largest fuzz tester, a wholly rational developer may often prefer not to use DataView because it's still easier to read out bytes using [ ] notation instead of DataView methods.

I myself -- possibly the one person in the world who cares most about this issue! -- accidentally created a buggy app that wouldn't work on a big-endian system, because I had no way of testing it:

 https://github.com/dherman/float.js/commit/deb5bf2f5696ce29d9a6c1a6bf7c479a3784fd7b

In summary: we already agreed on TC39 to close this loophole, it's the right thing to do, and concern about potential performance issues on non-existent browsers of non-existent systems should not trump portability and correctly executing existing web content.

I am disappointed that this decision was made without input from both of the editors of the typed array spec and disagree with the statement that it is the right thing to do.

First, my apologies to Dave for forgetting the May decision. I was looking in the wrong place for a record of that decision!

Second, I understand Ken's point of view (old SGI hacker here) but have to question it on the grounds Dave raises: no one is testing, so there are very likely to be latent little-endian dependencies in deployed code today.

So why not codify this? Because it might penalize big-endian systems? The IP protocols use big-endian byte order (owing to the pre-Intel big-endian architectures of IMPs and early Internet machines) and so penalize Intel machines, and we take the hit (and it's small on modern super-scalar CPUs).

Ken, a couple of questions:

  1. Do you know of any big-endian systems with market share that host JS implementations? It would be great to have a list.

  2. Independent of the answer to (1), today's web browsers run on little-endian systems and that has very likely created a de facto standard. If we codify it, we make the small to empty set of big-endian systems pay a price. Why is this the wrong thing for typed arrays but the right thing (in reverse! majority-architectures penalized) for the IP protocol suite?
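To make the byte-order dependency discussed in this message concrete, here is a minimal sketch. It assumes a runtime where typed array views use the host byte order; the sample bytes and variable names are illustrative and do not come from the thread.

```javascript
// Detect the host byte order by aliasing the same bytes with two views.
function nativeIsLittleEndian() {
  const probe = new Uint32Array([0x11223344]);
  const bytes = new Uint8Array(probe.buffer);
  return bytes[0] === 0x44; // lowest-order byte stored first means little-endian
}

// Illustrative bytes: the value 258 (0x00000102) in big-endian order.
const wire = new Uint8Array([0x00, 0x00, 0x01, 0x02]);

// Host-dependent: aliasing the bytes as a Uint32Array uses the host byte order.
const aliased = new Uint32Array(wire.buffer)[0];

// Portable: DataView takes the byte order as an explicit argument.
const view = new DataView(wire.buffer);
const bigEndianValue = view.getUint32(0, false);   // 258 on every host
const littleEndianValue = view.getUint32(0, true); // 0x02010000 on every host
```

On a little-endian host `aliased` equals `littleEndianValue`; on a big-endian host it would equal `bigEndianValue`. Code that silently relies on the former is the kind of latent dependency the message is concerned about.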

# Aymeric Vitte (12 years ago)

I think I would have raised this subject one day too. While working on [1] to [3], I wondered for some time whether I was using typed arrays correctly, because I found it strange to have to create a DataView over a possibly enormous buffer each time just to read 2 bytes, and I wondered what the impact on performance was.

Leaving aside node.js, which "does not care" that its DataView instantiation is ridiculously slow, the defect appears to be present in FF too up to a certain version, but it is corrected in the latest Nightly. So the answer is probably that there is an impact on performance.

It may be worth looking at [3], where I try to compare 3 methods (the former "string" buffers, typed arrays without DataViews, and typed arrays with DataViews).

Or maybe I am still not using this correctly. All the protocols used here are in big-endian order. I did not read everything about the subject or why this decision was made, but the basic reflex would be to add an endianness option to typed arrays.

[1] Ayms/node-Tor

[2] Ayms/iAnonym

[3] Ayms/abstract-tls and Ayms/abstract-tls#performances-

On 27/03/2013 01:18, Brendan Eich wrote:

# Vladimir Vukicevic (12 years ago)

On 4/1/2013 10:24 PM, Kenneth Russell wrote:

On Sun, Mar 31, 2013 at 1:42 PM, Kevin Gadd <kevin.gadd at gmail.com> wrote:

One could also argue that people using typed arrays to alias and munge individual values should be using DataView instead. If it performs poorly, that can hopefully be addressed in the JS runtimes (the way it's specified doesn't seem to prevent it from being efficient).

Agreed. DataView's methods are all simple and should be easy to optimize. Because they include a conditional byte swap, they can't run quite as fast as the typed arrays' accessors -- but they shouldn't need to.

Side note: in theory they could run extremely fast, as long as the endianness parameter was a constant. Since you generally know the format of the data coming over the network, it should be except in rare cases. The implementation knows the native endianness and can just choose the right load/store variant when generating JIT code for the call. However, they can also read from unaligned offsets, and so that will slow things down a bit. Either way -- they could be much faster than they are now.
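The two properties mentioned in this side note, a byte-order argument that is a constant at the call site and reads from unaligned offsets, can be illustrated with a small sketch. The buffer contents are made up for the example, and nothing here shows what a JIT actually generates.

```javascript
const buffer = new ArrayBuffer(8);
const bytes = new Uint8Array(buffer);
bytes.set([0xff, 0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x00]);

const view = new DataView(buffer);

// Unaligned read at byte offset 1; the littleEndian flag is a literal constant.
const value = view.getUint32(1, true); // bytes 78 56 34 12 -> 0x12345678

// A Uint32Array view, by contrast, cannot start at an unaligned byte offset.
let alignmentError = null;
try {
  new Uint32Array(buffer, 1, 1);
} catch (e) {
  alignmentError = e; // RangeError: the offset must be a multiple of 4
}
```

Because the flag is a literal `true`, an implementation could in principle pick the load variant once when compiling the call, which is the optimization the note describes.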

# Aymeric Vitte (12 years ago)

On 02/04/2013 04:24, Kenneth Russell wrote:

Agreed. DataView's methods are all simple and should be easy to optimize. Because they include a conditional byte swap, they can't run quite as fast as the typed arrays' accessors -- but they shouldn't need to. DataView was designed to support file and network I/O, where throughput is limited by the disk or network connection. The typed array views were designed for in-memory assembly of data to be submitted to the graphics card, sound card, etc., and must run as fast as possible.

When you are streaming things, what's the correct use of DataViews?

That is, are you supposed to create a DataView each time you want to read some bytes (which can be optimized or whatever, but still has some cost)?

Maybe it's outside the scope of this discussion, and I have already provided examples, but I still suspect that I am using it wrongly, or that ArrayBuffers are better suited to WebGL (i.e. static buffer manipulation) than to network streaming (i.e. dynamic buffer manipulation).

I am probably wrong, but in that case I would really like to know what the correct use is.

# Kenneth Russell (12 years ago)

On Tue, Apr 2, 2013 at 3:03 PM, Aymeric Vitte <vitteaymeric at gmail.com> wrote:

On 02/04/2013 04:24, Kenneth Russell wrote:

Agreed. DataView's methods are all simple and should be easy to optimize. Because they include a conditional byte swap, they can't run quite as fast as the typed arrays' accessors -- but they shouldn't need to. DataView was designed to support file and network I/O, where throughput is limited by the disk or network connection. The typed array views were designed for in-memory assembly of data to be submitted to the graphics card, sound card, etc., and must run as fast as possible.

When you are streaming things, what's the correct use of DataViews?

That is, are you supposed to create a DataView each time you want to read some bytes (which can be optimized or whatever, but still has some cost)?

Maybe it's outside the scope of this discussion, and I have already provided examples, but I still suspect that I am using it wrongly, or that ArrayBuffers are better suited to WebGL (i.e. static buffer manipulation) than to network streaming (i.e. dynamic buffer manipulation).

I am probably wrong, but in that case I would really like to know what the correct use is.

If I understand your question, then the correct use of DataView is: upon receiving an ArrayBuffer, create a DataView referring to it. When iterating down the contents of the ArrayBuffer, continue to use the same DataView instance, just incrementing the offset. In abstract-tls/lib/abstract-tls.js there are some operations which create a new DataView just to read or write a single element; this isn't the correct usage. www.html5rocks.com/en/tutorials/webgl/typed_arrays may be a useful reference.
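The usage described above, one DataView per ArrayBuffer with an advancing offset, might look like the following sketch. The record layout (a big-endian uint16 length followed by a big-endian uint32 id) and the name `readRecords` are hypothetical, chosen only to have something to iterate over.

```javascript
// Hypothetical record format: big-endian uint16 length, then big-endian uint32 id.
function readRecords(buffer) {
  const view = new DataView(buffer); // created once for the whole ArrayBuffer
  const records = [];
  let offset = 0;
  while (offset + 6 <= view.byteLength) {
    const length = view.getUint16(offset, false);
    offset += 2;
    const id = view.getUint32(offset, false);
    offset += 4;
    records.push({ length, id });
  }
  return records;
}

// Two sample records, written out by hand.
const sample = new Uint8Array([
  0x00, 0x10, 0x00, 0x00, 0x00, 0x01,
  0x00, 0x20, 0x00, 0x00, 0x00, 0x02,
]).buffer;
const records = readRecords(sample);
```

The point is only that the `new DataView(...)` call sits outside the loop; each element read is a method call on the same instance.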

If you're handling streaming data then presumably you're receiving multiple ArrayBuffers, one after the other. You should create one DataView per buffer. The only challenge is properly handling the boundary from one buffer to the next, if the boundary is within an element like a uint16 or uint32.
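One way to handle the boundary case is to carry the trailing bytes of a split element over to the next buffer. The sketch below assumes a stream made up purely of big-endian uint16 values; the class `Uint16StreamReader` is hypothetical and is not taken from abstract-tls or any library.

```javascript
// Hypothetical helper: decodes big-endian uint16 values from a stream that
// arrives as a sequence of ArrayBuffers of arbitrary size.
class Uint16StreamReader {
  constructor() {
    this.leftover = new Uint8Array(0); // bytes of an element split by a boundary
  }

  push(buffer) {
    let chunk = new Uint8Array(buffer);
    if (this.leftover.length > 0) {
      // Prepend the carried-over bytes so the split element becomes contiguous.
      const joined = new Uint8Array(this.leftover.length + chunk.length);
      joined.set(this.leftover, 0);
      joined.set(chunk, this.leftover.length);
      chunk = joined;
    }
    // One DataView per (possibly joined) chunk.
    const view = new DataView(chunk.buffer, chunk.byteOffset, chunk.byteLength);
    const values = [];
    let offset = 0;
    while (offset + 2 <= view.byteLength) {
      values.push(view.getUint16(offset, false));
      offset += 2;
    }
    // Keep any trailing partial element for the next call.
    this.leftover = chunk.slice(offset);
    return values;
  }
}

const reader = new Uint16StreamReader();
const first = reader.push(new Uint8Array([0x00, 0x01, 0x00]).buffer);  // one byte is carried over
const second = reader.push(new Uint8Array([0x02, 0x00, 0x03]).buffer); // completes the split element
```

Copying the whole chunk when bytes are carried over keeps the sketch short; a more careful version would copy only the few bytes needed to complete the split element.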

# Kevin Gadd (12 years ago)

Is there a reason why DataView wasn't specified as static methods that take an ArrayBuffer in the first place? That would solve the problem of figuring out when/how often to create DataView instances, and eliminate the garbage created by using DataViews.

# Kenneth Russell (12 years ago)

Yes: API consistency. An ArrayBuffer is opaque; to work with the data it contains, instantiate a view.

# Aymeric Vitte (12 years ago)

So my suspicion was correct. I am using DataView methods like this (Ayms/abstract-tls/blob/master/lib/abstract-tls.js#l280-332 here, and on other projects) for historical reasons: I started from node.js's Buffer methods and then switched to ArrayBuffers with node.js-like methods, because in some cases I needed to use both at the same time or to be able to switch easily from one to the other.

So the use is not correct in theory. Performance now seems to be OK (except in node.js), so maybe most implementations instantiate a DataView once for a given ArrayBuffer and keep a reference to it for subsequent calls?
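Whether engines cache anything internally is not established in this thread; what is certain is that each `new DataView(buffer)` expression returns a distinct object. A userland way to keep node.js-like helper methods while reusing one DataView per ArrayBuffer is sketched below. The cache and the helper names (`viewFor`, `readUInt16BE`, `writeUInt16BE`) are illustrative assumptions, not the abstract-tls code.

```javascript
// Hypothetical userland cache: one DataView per ArrayBuffer, created lazily.
// A WeakMap lets the cached view be collected together with its buffer.
const viewCache = new WeakMap();

function viewFor(buffer) {
  let view = viewCache.get(buffer);
  if (view === undefined) {
    view = new DataView(buffer);
    viewCache.set(buffer, view);
  }
  return view;
}

// node.js-style helper names, chosen for illustration only.
function readUInt16BE(buffer, offset) {
  return viewFor(buffer).getUint16(offset, false);
}

function writeUInt16BE(buffer, value, offset) {
  viewFor(buffer).setUint16(offset, value, false);
}

const buf = new ArrayBuffer(4);
writeUInt16BE(buf, 0x1234, 2);
const roundTrip = readUInt16BE(buf, 2);
const sameView = viewFor(buf) === viewFor(buf); // the cached instance is reused
```

This keeps the call sites unchanged while moving the DataView allocation out of the per-element path, at the cost of one WeakMap lookup per call.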

On 03/04/2013 00:52, Kenneth Russell wrote: