Re: [linux-audio-dev] Traps in floating point code


Subject: Re: [linux-audio-dev] Traps in floating point code
From: Benno Senoner (sbenno_AT_gardena.net)
Date: Thu Jul 01 2004 - 23:38:03 EEST


Ruben van Royen wrote:

>First of all, I was not yet talking about vectorizing your code, which is
>often hard, especially for a compiler. But SSE can be used on scalars as
>well (as you probably know).
>The fact is that the Intel Pentium 4 optimization guide says that SSE code is
>generally as fast as or faster than regular FP code, and truncation to
>integer in particular is faster. Also, denormals (which started all of this)
>can be handled faster by SSE math by turning on a mode flag that makes input
>denormals behave as zeros. This is of course not IEEE compliant, but it is
>exactly what you were doing in your code.
>
>
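
For reference, a minimal sketch of the mode flag Ruben mentions, assuming
an x86 CPU with SSE and a compiler that ships <xmmintrin.h>. The function
names and the MXCSR_DAZ constant are made up for illustration. The flag
lives in the MXCSR register, so it only affects SSE math (not x87 code)
and it is set per thread. The DAZ bit should only be set on CPUs that
support it.

```c
#include <xmmintrin.h>

/* MXCSR bit 6 is DAZ (denormals-are-zero): denormal inputs are read as 0.
   Hypothetical constant name, defined here to avoid needing <pmmintrin.h>. */
#define MXCSR_DAZ 0x0040u

/* Hypothetical helper: switch on both flush modes for the calling thread. */
void enable_denormal_flush(void)
{
    /* FTZ (MXCSR bit 15): denormal results are flushed to zero. */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    /* DAZ: denormal inputs are treated as zero. */
    _mm_setcsr(_mm_getcsr() | MXCSR_DAZ);
}

/* Hypothetical helper: report whether both modes are currently active. */
int denormal_flush_enabled(void)
{
    return _MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON
        && (_mm_getcsr() & MXCSR_DAZ) != 0;
}
```

An audio thread would call enable_denormal_flush() once at startup,
before it runs any DSP code.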

I agree, in theory slowdowns should not occur, but what I found strange
is that even Intel's own compiler, icc, produced bad performance
when compiling the resampling code with SSE/SSE2 math and vectorization on.
If the compiler were smart, it would not have used SSE/SSE2 in that
section of code, but apparently icc is still not good at spotting
those problems.
The problem for a C programmer is that, since he assumes the compiler
does a good job of optimizing, he will usually not be able to work out
easily why the SSE optimizations slowed down certain routines.
Then a dilemma can occur: say 50% of the CPU time is spent in
function1() and 50% in function2(),
but when you activate SSE, function1() speeds up 40% while function2()
slows down 30%.
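
To put rough numbers on that, here is a small sketch. It assumes the
percentages are changes in throughput (40% faster means a speed factor
of 1.4, 30% slower means 0.7); that reading is my assumption, and the
function name is made up for illustration.

```c
/* Relative run time after a change, with the original run time
   normalised to 1.0.
   share1, share2: fraction of time originally spent in each function.
   speed1, speed2: throughput factor of each function after the change
                   (1.4 = 40% faster, 0.7 = 30% slower, 1.0 = unchanged). */
double relative_runtime(double share1, double speed1,
                        double share2, double speed2)
{
    return share1 / speed1 + share2 / speed2;
}
```

Under that reading, relative_runtime(0.5, 1.4, 0.5, 0.7) comes out at
about 1.07, so the program as a whole gets roughly 7% slower. With
function2() left at its original speed (factor 1.0) the result is about
0.86.
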
If it were possible to tell the compiler not to use SSE in function2(),
the app would benefit from SSE, but with SSE switched on for the
whole program, as in the above case, it does not.
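
Later GCC releases document a per-function target attribute that looks
like it covers this wish. A rough sketch, assuming such a GCC on x86; the
X87_MATH macro and both function bodies are hypothetical stand-ins, and
on other compilers the macro expands to nothing. Whether it actually
helps would still have to be measured.

```c
#include <stddef.h>

/* Assumption: a GCC release that supports target("fpmath=...") on x86.
   Elsewhere the macro is empty and the file-wide settings apply. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__i386__) || defined(__x86_64__))
#define X87_MATH __attribute__((target("fpmath=387")))
#else
#define X87_MATH
#endif

/* Stand-in for function1(): built with the file-wide FP settings,
   e.g. -msse2 -mfpmath=sse. */
void function1(float *buf, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        buf[i] *= gain;
}

/* Stand-in for function2(): asks the compiler for x87 math only. */
X87_MATH float function2(const float *buf, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += buf[i];
    return acc;
}
```

The older way to get the same effect is to move function2() into its
own source file and compile that file with different flags.
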
Usually optimal C code can only be produced if the programmer knows both
the CPU and the compiler well, and often
this requires long, painful trial-and-error sessions, analysis of the asm
code generated by the compiler, etc.
OK, there are profilers available, but they don't automagically solve all
the optimization problems.

cheers,
Benno
http://www.linuxsampler.org

>The reasons for SSE code being slower than FP code could be:
> The addition is pipelined in the FP unit, but not in the SSE unit.
> Incorrect alignment might incur a higher penalty for SSE.
>
>Ruben
>
>
>
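
On the alignment point, a small sketch of what the check could look like
with SSE intrinsics, assuming <xmmintrin.h> is available. Both function
names are made up for illustration. _mm_load_ps requires a 16-byte
aligned address, while _mm_loadu_ps accepts any address and may be
slower on older CPUs.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: true if p sits on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical example: sum n floats four at a time, using aligned
   loads when the buffer allows it and unaligned loads otherwise. */
float sum_floats(const float *p, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;

    if (is_aligned16(p)) {
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_load_ps(p + i));
    } else {
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    }

    /* Add up the four lanes, then any leftover elements. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++)
        total += p[i];
    return total;
}
```

Timing both paths on the same data is one way to see how much the
misaligned case really costs on a given CPU.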


