Float denormal issue in JavaScript processor node in Web Audio API

# Stéphane Letz (11 years ago)

Using JavaScript (and more specifically asm.js) code in the context of ScriptProcessor nodes in the Web Audio API has shown a very problematic issue related to denormalized float numbers.

We have ported our audio C++ code (audio nodes generated using the Faust audio DSP language: faust.grame.fr) as JS nodes in the Web Audio API using emscripten. We get very high CPU load on Intel processors as soon as the computation produces denormal float numbers.

When the same code runs in a C++ context, we typically solve the problem by adding a "flush denormals to zero" macro before the audio computation, since audio code can live perfectly well without strict IEEE754 compliance. This is a very common solution that is used everywhere when developing native audio code.

The problem has been seen on all recent browsers we tested: Firefox, Chrome and WebKit on OSX. It can be tested on the following page, which contains a physical model of a piano string, compiled to asm.js using emscripten and run as a Web Audio API ScriptProcessorNode. If you hit the "gate" button, a sound is played. After a few seconds, when the note has decayed to silence, the CPU load rises to 100% (as seen in Activity Monitor on OSX).
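To illustrate why a decaying note ends up in denormal territory, here is a minimal sketch, not the Faust piano model itself. The functions `isSubnormal` and `countSubnormalSamples` are hypothetical names, and the feedback loop `y[n] = feedback * y[n-1]` is only a stand-in for a resonator tail. It counts how many samples of a decaying signal fall strictly between zero and the smallest normal double.

```javascript
// Smallest positive normal double (2^-1022). Nonzero values below it are subnormal.
const MIN_NORMAL = 2 ** -1022;

// True when x is a nonzero value too small to be represented as a normal double.
function isSubnormal(x) {
  return x !== 0 && Math.abs(x) < MIN_NORMAL;
}

// Hypothetical stand-in for a decaying resonator: y[n] = feedback * y[n-1].
// Returns how many of the generated samples are subnormal.
function countSubnormalSamples(feedback, numSamples) {
  let y = 1.0;
  let subnormalCount = 0;
  for (let n = 0; n < numSamples; n++) {
    y *= feedback;
    if (isSubnormal(y)) {
      subnormalCount++;
    }
  }
  return subnormalCount;
}

console.log(countSubnormalSamples(0.5, 2000));
```

With a feedback coefficient closer to 1, as is typical in audio code, the signal would presumably stay in the subnormal range for far more samples before reaching zero.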

We filed a bug report on Firefox, which has generated a lot of feedback: bugzilla.mozilla.org/show_bug.cgi?id=1027624

Our understanding is that having the possibility to control the "flush denormal to zero" operation is not purely a Web Audio API specification/implementation issue, but has to be treated at the JavaScript language level. We would be happy to get feedback on the subject.

Best

Stéphane Letz

# Benjamin Bouvier (11 years ago)

I would like to emphasize this topic, as it actually matters a lot in signal processing in general. When one is applying operations like filters, feedback loops and so on, one isn't interested in very low values, as they can't be perceived by the eyes or the ears anyways.

Ideally, we would define zones of code in which we run with the FTZ (flush to zero) processor flag applied, making it clear we're not following IEEE754 arithmetic anymore. A few ideas have already been thrown around in the bug:

[1] A function annotation "use flush_denormals_to_zero"

  • Math.flushDenormalToZero(v) (returns 0 if v is denormal, otherwise v)
  • flush-to-zero versions of JS SIMD intrinsics (or just make the intrinsics flush-to-zero by default)

[2] Set FTZ in loops surrounding SIMD code.

[3] Have a wrapper function withFTZ(f), where f is another function, such that we set FTZ, we call f() and then we unset FTZ at the end. This method sounds easy to optimize in the JITs and avoids the issue of interleaving code using FTZ with code that doesn't.
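As a rough sketch of what the Math.flushDenormalToZero(v) idea could look like in use, the following emulates it in plain JavaScript and applies it to the state of a feedback filter. Everything here is an assumption for illustration: `flushDenormalToZero` is a user-land stand-in rather than a real built-in, `onePole` is a hypothetical filter, and an engine-provided version would presumably map to something much cheaper than a comparison.

```javascript
// Smallest positive normal double (2^-1022).
const MIN_NORMAL = 2 ** -1022;

// Hypothetical user-land stand-in for the proposed Math.flushDenormalToZero(v).
// NaN and infinities fail the comparison and are returned unchanged.
// Simplification: subnormals of either sign (and -0) come back as +0.
function flushDenormalToZero(v) {
  return Math.abs(v) < MIN_NORMAL ? 0 : v;
}

// One-pole smoothing filter: y[n] = y[n-1] + a * (x[n] - y[n-1]).
// The state is flushed on every sample so it can never linger as a subnormal.
function onePole(input, a) {
  const output = new Float64Array(input.length);
  let state = 0;
  for (let i = 0; i < input.length; i++) {
    state = flushDenormalToZero(state + a * (input[i] - state));
    output[i] = state;
  }
  return output;
}

// Feed a single impulse followed by silence and let the filter decay.
const impulse = new Float64Array(2000);
impulse[0] = 1;
const decayed = onePole(impulse, 0.5);
console.log(decayed[decayed.length - 1]);
```

The point of flushing the state rather than the output is that the slow path is triggered by the values fed back into the next iteration.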

Out of the wild, I am wondering whether having a FTZ flag on a Realm could have all operations executed within this realm use flushing denormals arithmetic. That is just a (maybe crazy) idea and I am not sure this fits in the Realms scope.

Do you have any other better ideas or opinions about these? Could this be brought up to the next TC39 meeting?

Cheers, Benjamin

[1] bugzilla.mozilla.org/show_bug.cgi?id=1027624#c13

[2] bugzilla.mozilla.org/show_bug.cgi?id=1027624#c17

[3] bugzilla.mozilla.org/show_bug.cgi?id=1027624#c30

# Allen Wirfs-Brock (11 years ago)

On Jul 4, 2014, at 8:19 AM, Benjamin Bouvier wrote:

Hi,

I would like to emphasize this topic, as it actually matters a lot in signal processing in general. When one is applying operations like filters, feedback loops and so on, one isn't interested in very low values, as they can't be perceived by the eyes or the ears anyways.

Ideally, we would define zones of code in which we run with the FTZ (flush to zero) processor flag applied, making it clear we're not following IEEE754 arithmetic anymore. A few ideas have been already thrown in the bug: [1] A function annotation "use flush_denormals_to_zero"

  • Math.flushDenormalToZero(v) (returns 0 if v is denormal, otherwise v)
  • flush-to-zero versions of JS SIMD intrinsics (or just make the intrinsics flush-to-zero by default)

Based upon various other conversations, it appears to me that the "Math.flushDenormalToZero" function is the most feasible of the various alternatives. Perhaps called: Math.denormz (thanks to dherman for the name)

Code generators like Emscripten could wrap math operators with it, much as they currently do with Math.fround. Manually, you would code things like Math.denormz(small / big) if you want to ensure a zero result.

It's also been suggested we may want a function that is equivalent to Math.denormz(Math.fround(x)). Perhaps: Math.fdzround (name also due to dherman)
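A minimal sketch of how such a combined function might behave, under stated assumptions: the name `fdzround` is used here for a plain user-land function, not a real Math method, and the thread does not say which subnormal threshold the combined operation would use. This sketch assumes the single-precision one (2^-126), since the value has already been rounded to float32.

```javascript
// Smallest positive normal float32 (2^-126), assumed here as the flush threshold.
const MIN_NORMAL_F32 = 2 ** -126;

// Hypothetical stand-in for the suggested Math.fdzround(x):
// round to float32 first, then flush single-precision subnormals to zero.
function fdzround(x) {
  const rounded = Math.fround(x);
  return Math.abs(rounded) < MIN_NORMAL_F32 ? 0 : rounded;
}

// A code generator could wrap each float32 operation the way it wraps
// operations with Math.fround today.
function mulF32(a, b) {
  return fdzround(a * b);
}

console.log(Math.fround(1e-40) !== 0); // float32 subnormal survives fround alone
console.log(mulF32(1e-20, 1e-20));     // product is a float32 subnormal, so it is flushed
```

Note that a threshold of 2^-1022 (the double-precision one) would leave float32 subnormals untouched, which is why the choice matters for code compiled with single-precision semantics.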

The other alternatives that expose the FTZ flag in various ways seem very undesirable from an overall language perspective. If a language like JS is going to have control of this sort of mode then it really needs to be lexically scoped in some manner. And it probably needs to be part of the state that is captured by closures. Overall that seems like a very expensive feature for a limited use case.

[2] Set FTZ in loops surrounding SIMD code. [3] Have a wrapper function withFTZ(f), where f is another function, such that we set FTZ, we call f() and then we unset FTZ at the end. This method sounds easy to optimize in the JITs and avoids the issue of interleaving code using FTZ with code that doesn't.

Out of the wild, I am wondering whether having a FTZ flag on a Realm could have all operations executed within this realm use flushing denormals arithmetic. That is just a (maybe crazy) idea and I am not sure this fits in the Realms scope.

Realms aren't the equivalent of processes or even threads, they aren't a container of an execution context. A flag like this would probably have to be associated with each function and would have to be saved/restored across each call/return/unwind. This would be a significant change to the runtime behavior. Hard to justify for a limited use case.

Do you have any other better ideas or opinions about these? Could this be brought up to the next TC39 meeting?

I've added to the July meeting agenda.

# JF Bastien (11 years ago)

Here are a few thoughts about denorms (a.k.a. subnormals as of 2008) from a discussion a few months ago with John McCutchan, Dave Herman, Luke Wagner and Dan Gohman.

A few facts to start off with:

  • The current SIMD proposal doesn't specify how denormals behave.
  • ECMA-262 specifies denormal behavior in 8.5: "The remaining 9007199254740990 (that is, 2^53-2) values are denormalised, having the form [...]".
  • Most CPUs support denormals, but they're often slow (think 10x–100x slower than a single FP instruction).
  • Other hardware like GPUs don't support denormals.
  • ARM NEON doesn't support denormals, and instead flushes to zero, which means that all SIMD operations aren't denormalized whereas all scalar operations are.
  • A64 does support denormals, but they're not necessarily fast.
  • Most CPUs allow setting denormals-are-zero and/or flush-to-zero as a floating-point state, affecting SIMD too, but that instruction is often slow or serializes the FP pipeline.
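The ECMA-262 count quoted in the list above can be checked directly from the binary64 layout: a denormal has an all-zero exponent field and a nonzero 52-bit fraction, for either sign. The sketch below verifies the arithmetic and classifies values by inspecting their bits; `isSubnormalBits` is a hypothetical helper name.

```javascript
// Nonzero 52-bit fractions (2^52 - 1 of them) times two signs.
const subnormalCount = 2n * (2n ** 52n - 1n);
console.log(subnormalCount === 9007199254740990n);

// Classify a double by its bit pattern: exponent field 0 and fraction nonzero.
function isSubnormalBits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);            // DataView writes big-endian by default
  const hi = view.getUint32(0);     // sign, 11 exponent bits, top 20 fraction bits
  const lo = view.getUint32(4);     // low 32 fraction bits
  const exponent = (hi >>> 20) & 0x7ff;
  const fractionIsZero = (hi & 0xfffff) === 0 && lo === 0;
  return exponent === 0 && !fractionIsZero;
}

console.log(isSubnormalBits(Number.MIN_VALUE)); // 2^-1074, the smallest denormal
console.log(isSubnormalBits(2 ** -1022));       // the smallest normal value
```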

My opinion is that denormals aren't something people are asking for. They're actually quite a surprise to most people, who learn about them the hard way when their application slows to a crawl, and then they learn about DAZ/FTZ and everything is fine again.

I think the current state of hardware makes it prohibitive to mandate that denormals be supported outright for scalars, and makes it impossible to mandate denormals for SIMD since ARMv7 is a major CPU ISA that doesn't support denormals for SIMD (pure scalar would therefore have to be used). There's a further issue with adding SIMD in JS where the temporary polyfills will use scalar instructions, so if denormals exist in scalar but not SIMD then the polyfills are slightly wrong. I think it's also quite prohibitive to change the CPU's FP state to have DAZ/FTZ on/off between scalar and SIMD, and allowing the user to annotate their source with FTZ-on/FTZ-off will lead to surprising performance pitfalls on some hardware, and will lead to weird coding where scalar FP and SIMD can't be mixed.

My conclusion is therefore that denormals should be left as unspecified for both scalar and SIMD. Change the spec for scalars, and leave as-is for SIMD.

Realistically, implementations will set DAZ/FTZ for the foreseeable future, and if people ever clamor for them then denormals can be brought back, assuming the hardware landscape is better in the future.

FWIW there's a similar problem with NaNs, which AFAIK are left as unspecified.

JF

# Mark S. Miller (11 years ago)

On Thu, Jul 10, 2014 at 9:54 AM, JF Bastien <jfb at chromium.org> wrote:

Here are a few thoughts about denorms (a.k.a. subnormals as of 2008) from a discussion a few months ago with John McCutchan, Dave Herman, Luke Wagner and Dan Gohman.

A few facts to start off with:

  • The current SIMD proposal doesn't specify how denormals behave.
  • ECMA-262 specifies denormal behavior in 8.5: "The remaining 9007199254740990 (that is, 2^53-2) values are denormalised, having the form [...]".
  • Most CPUs support denormals, but they're often slow (think 10x–100x slower than a single FP instruction).
  • Other hardware like GPUs don't support denormals.

This triggered an odd thought. Rather than proceeding under the assumption that

a) 64bit with denormals
b) 64bit without denormals
c) 32bit with denormals
d) 32bit without denormals

are all needed, what if we assume that only #a and #d are needed enough to bother with. These are the two extremes:

a) gimme all the precision that IEEE double precision was specced to provide, and that normal non-GPU hardware was built to provide

d) Damn the precision; full speed ahead. Gimme the full speed that all stock FPUs can provide, whether on CPUs or GPUs, at whatever cost in precision that lowest common denominator demands.

This doesn't address the question of how we should meet these needs. But it might simplify the question. So: Are there compelling enough use cases for #b and #c that we should care about them?

# Jens Nockert (11 years ago)

On 10 Jul 2014, at 18:54, JF Bastien <jfb at chromium.org> wrote:

Other hardware like GPUs don't support denormals.

Not true, AMD and nVidia GPUs support denormals. Anything that supports double-precision in OpenCL will support denormals.

My opinion is that denormals aren't something people are asking for. They're actually quite a surprise to most people, who learn about them the hard way when their application slows to a crawl, and then they learn about DAZ/FTZ and everything is fine again.

I think programmers are surprised when denormals are slow (but only DSP-type code tends to produce these issues.)

On the other hand, I think most programmers would be surprised if x - y = 0 didn’t mean x == y too.
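A small sketch of the property at stake: with gradual underflow, the difference of two distinct doubles is never zero, whereas a flushed subtraction can return zero for unequal operands. The `flush` helper is a hypothetical software emulation of flush-to-zero, not an existing API.

```javascript
// Smallest positive normal double (2^-1022).
const MIN_NORMAL = 2 ** -1022;

// Hypothetical software emulation of a flush-to-zero result.
function flush(v) {
  return Math.abs(v) < MIN_NORMAL ? 0 : v;
}

const x = 1.5 * MIN_NORMAL; // normal
const y = MIN_NORMAL;       // normal, and different from x

const exactDifference = x - y;          // 2^-1023, only representable as a subnormal
const flushedDifference = flush(x - y); // what flush-to-zero arithmetic would produce

console.log(x === y);                  // false
console.log(exactDifference === 0);    // false: IEEE 754 keeps x - y nonzero
console.log(flushedDifference === 0);  // true: the difference vanishes although x !== y
```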

I think the current state of hardware makes it prohibitive to mandate that denormals be supported outright for scalars, and makes it impossible to mandate denormals for SIMD since ARMv7 is a major CPU ISA that doesn't support denormals for SIMD (pure scalar would therefore have to be used).

Sure, allow flush-to-zero for SIMD. It doesn’t break old code.

There's a further issue with adding SIMD in JS where the temporary polyfills will use scalar instructions, so if denormals exist in scalar but not SIMD then the polyfills are slightly wrong. I think it's also quite prohibitive to change the CPU's FP state to have DAZ/FTZ on/off between scalar and SIMD, and allowing the user to annotate their source with FTZ-on/FTZ-off will lead to surprising performance pitfalls on some hardware, and will lead to weird coding where scalar FP and SIMD can't be mixed.

Sure, if you use SSE2 for scalar arithmetic, it shares state with any SIMD extension to JS. So if we really care about the cost of changing the state, we need to use the same state for SIMD and scalar code. (But on modern x86, denormals are relatively fast)

My conclusion is therefore that denormals should be left as unspecified for both scalar and SIMD. Change the spec for scalars, and leave as-is for SIMD.

Realistically, implementations will set DAZ/FTZ for the foreseeable future, and if people ever clamor for them then denormals can be brought back, assuming the hardware landscape is better in the future.

Please, please don’t try to break the web, denormals/gradual underflow change behaviour in a lot of simulations and root-finding problems, there’s already code out there that depends on denormals.

# JF Bastien (11 years ago)

So: Are there compelling enough use cases for #b and #c that we should care about them?

I don't think so, but Jens seems to disagree. Simulations and root-finding problems in my experience use smaller precision numbers (e.g. i16, f16 or f32) to hillclimb close to the solution faster, and then use bigger precision numbers (f64) to nail down the actual solution. Using denormals here kind of defeats the purpose.

Not true, AMD and nVidia GPUs support denormals. Anything that supports double-precision in OpenCL will support denormals.

You're correct, I wasn't precise in this: denormal support is new to GPUs, and is also fraught with peril. It's a pretty similar situation to that on ARM.

I think programmers are surprised when denormals are slow (but only DSP-type code tends to produce these issues.)

My experience, and what I've heard from other people, has been that denormals show up unexpectedly much more often than just in DSP applications.

On the other hand, I think most programmers would be surprised if x - y = 0 didn’t mean x == y too.

This is floating point. People will also be surprised when (x / y) * y != x. I think this is a strawman argument: transcendentals' lack of precision is much more surprising than this IMHO.

Sure, allow flush-to-zero for SIMD. It doesn’t break old code.

The case I'm making is that mandating FTZ for SIMD but keeping denormal support for scalars is a bad idea, because it implies changing FP state back and forth, which has significant performance implications and dissuades folks from writing intuitive code that mixes SIMD and scalar FP operations. It's also not much of an option when SIMD is polyfilled.

# JF Bastien (10 years ago)

Were denormals discussed at the TC39 meeting? I can't seem to find them in the meeting notes.

# Allen Wirfs-Brock (10 years ago)

We didn't get to them at the July meeting. I'll put them on the agenda for Sept.

The likely proposal will be to provide a Math.denormz(x) function and perhaps also Math.fdzround(x)

# JF Bastien (10 years ago)

On Tue, Aug 12, 2014 at 9:29 AM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

We didn't get to them at the July meeting. I'll put them on the agenda for Sept.

Thanks.

The likely proposal will be to provide a Math.denormz(x) function and perhaps also Math.fdzround(x)

I'd be interested in the details: as I said on July 10th I'm not sure that's the best solution, but I may have missed something.