Re: [linux-audio-dev] Denormal numbers


Subject: Re: [linux-audio-dev] Denormal numbers
From: Simon Jenkins (sjenkins_AT_blueyonder.co.uk)
Date: Sun Aug 03 2003 - 21:47:31 EEST


  Simon Jenkins wrote:

> [...] If it still doesn't work after they are applied then you could try
> -ffloat-store. The compiler manual says that programs which rely on the
> exact storage format of IEEE floats should use this option, but I am
> unable to break the macro in a way that this fixes.

Unfortunately -ffloat-store slows code down a *lot*.

More unfortunately, I can't prove that the macro will always function
correctly without it. I still can't actually break it, but things I have
read whilst searching the web have given me definite cause for concern
that it could fail.

I might mail a compiler list about this one.

> I'm hoping to post some code in the next day or two which prevents
> denormal values from being generated in the first place.
>
Here's where I've got to so far. Comments are welcome.

(Note: This might, or might not, suffer from the same problems that
FLUSH_TO_ZERO might or might not suffer from :))

/* === benormal.h - Copyright (C) 2003 Simon Jenkins === */

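#include <stdint.h>
#include <string.h>

#define FLOAT_EXP_MASK (0x7F800000)

/* Reinterpret the bits of a float as a 32-bit unsigned integer. Copying
   with memcpy avoids reading a float through an (unsigned int *), which
   is undefined behaviour in C (it breaks the strict-aliasing rule that
   GCC relies on from -O2 upwards). */
static inline uint32_t FloatAsBits( float x )
{
    uint32_t bits;
    memcpy( &bits, &x, sizeof bits );
    return bits;
}
#define FLOAT_AS_BITS(x) FloatAsBits(x)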

/*============================================================================
  Function: FlushMultiplyQuick
==============================================================================
  Purpose: Returns the product of its parameters, but flushes the result to
            zero if there is suspicion (not proof!) that the result might have
            been a denormal float.

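  Method: The result is flushed to zero if the magnitude of either parameter
            is below 2**-63, i.e. if neither of the top two bits of its
            biased exponent (mask 0x60000000) is set. Since
            2**-63 * 2**-63 = 2**-126, the smallest normal float, two
            parameters that pass this test cannot produce a denormal.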

  Comments: This never produces a denormal, but sometimes flushes a result
            that would not have been denormal.
============================================================================*/
static inline float FlushMultiplyQuick( float a, float b )
{
    return ( ( FLOAT_AS_BITS(a) & 0x60000000 )
             && ( FLOAT_AS_BITS(b) & 0x60000000 ) )
           ? (a * b) : 0.0f;
}

/*============================================================================
  Function: FlushMultiplyQuickAsym
==============================================================================
  Purpose: Returns the product of its parameters, but flushes the result to
            zero if there is suspicion (not proof!) that the result might have
            been a denormal float.

  Method: The result is flushed to zero if the magnitude of the first
            parameter is below 2**-63.

  Comments: Faster than FlushMultiplyQuick, but might produce a denormal if
            the second parameter is non-zero with magnitude < 2**-63. Never
            produces a denormal if the second parameter has magnitude
            >= 2**-63.
============================================================================*/
static inline float FlushMultiplyQuickAsym( float a, float b )
{
    return ( FLOAT_AS_BITS(a) & 0x60000000 ) ? (a * b) : 0.0f;
}

/*============================================================================
  Function: FlushMultiply
==============================================================================
  Purpose: Returns the product of its parameters, but flushes the result to
            zero if there is suspicion (not proof!) that the result might have
            been a denormal float.

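  Method: Computes a lower bound for the exponent of the result by adding
            the exponents of the parameters. Flushes the result if the lower
            bound suggests a possible denormal result. (Masking with
            FLOAT_EXP_MASK leaves each biased exponent E shifted left by 23,
            and 0x3F800000 is 127 << 23, so the test is Ea + Eb > 127.)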

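  Comments: A bit slower than the other methods, but much less likely to flush
            a non-denormal result. (A non-denormal will only be flushed if the
            mantissas of the parameters would have "saved" an otherwise
            denormal result by having a product >= 2.) Never produces a
            denormal result unless one of the parameters is itself a
            denormal: a denormal's exponent field is 0, so the exponent sum
            is then no longer a lower bound for the result.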
============================================================================*/
static inline float FlushMultiply( float a, float b )
{
    return ( ( FLOAT_AS_BITS(a) & FLOAT_EXP_MASK )
             + ( FLOAT_AS_BITS(b) & FLOAT_EXP_MASK ) > 0x3F800000 )
           ? (a * b) : 0.0f;
}

/*=== end of file ===*/

Simon Jenkins
(Bristol, UK)



This archive was generated by hypermail 2b28 : Sun Aug 03 2003 - 20:50:01 EEST