[linux-audio-dev] Intel C Compiler & RedHat 8.0 , Pentium 4 FPU performance


Subject: [linux-audio-dev] Intel C Compiler & RedHat 8.0 , Pentium 4 FPU performance
From: Benno Senoner (sbenno_AT_gardena.net)
Date: Tue Nov 12 2002 - 14:41:35 EET


Hi,
does anyone know whether it is possible to make the free Intel C Compiler
work on Red Hat 8.0?
It used to work on RH 7.3, but Jussi L. reported failure on RH 8.0 too.

I am curious about the Intel compiler's efficiency because I am
currently running some resampling/mixing benchmarks using linear
and cubic interpolation. For example, with cubic interpolation, which
involves a few MULs/ADDs per sample, I get something like:
Celeron (PII-class) 71 cycles/sample, Athlon 61 cycles/sample,
Pentium 4 80 cycles/sample (using gcc 3.2).
It seems the P4's FPU sucks quite a bit, since for some ops it needs more
cycles per instruction than the old Celeron.
I heard Intel's compiler is able to use SIMD to speed up FPU ops, so I
would like to see what is achievable given good quality code
that is targeted at the P4.
The innermost resampling/mixing loop is very short anyway, so in that
case one could probably use a P4-specific asm version (or import the
asm generated by icc) in order to achieve maximum performance.
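
To make "a few MULs/ADDs per sample" concrete, here is a rough sketch of
what such a cubic-interpolation resampling/mixing loop can look like. It
is not the benchmark code: the function names, the choice of the 4-point
Hermite (Catmull-Rom) polynomial and the double-precision position
counter are all assumptions made for illustration.

```c
#include <stddef.h>

/* 4-point, 3rd-order Hermite (Catmull-Rom) interpolation.
 * y0..y3 are four consecutive input samples; t in [0,1) is the
 * fractional position between y1 and y2. (Illustrative choice of
 * cubic; other cubic interpolators exist.) */
static float cubic_interp(float y0, float y1, float y2, float y3, float t)
{
    float c0 = y1;
    float c1 = 0.5f * (y2 - y0);
    float c2 = y0 - 2.5f * y1 + 2.0f * y2 - 0.5f * y3;
    float c3 = 0.5f * (y3 - y0) + 1.5f * (y1 - y2);
    /* Horner form: 3 multiplies and 3 adds once c0..c3 are known. */
    return ((c3 * t + c2) * t + c1) * t + c0;
}

/* Resample `in` at `pitch` input samples per output sample, scale by
 * `gain` and ADD the result into `out` (mixing).
 * Returns the number of output samples written. */
static size_t resample_mix_cubic(const float *in, size_t in_len,
                                 float *out, size_t out_len,
                                 double pitch, float gain)
{
    double pos = 1.0;          /* start at 1 so that in[i - 1] exists */
    size_t n;

    for (n = 0; n < out_len; n++) {
        size_t i = (size_t)pos;            /* integer part of position */
        float t;

        if (i + 2 >= in_len)               /* need in[i + 2] as well */
            break;
        t = (float)(pos - (double)i);      /* fractional part */
        out[n] += gain * cubic_interp(in[i - 1], in[i],
                                      in[i + 1], in[i + 2], t);
        pos += pitch;
    }
    return n;
}
```

Feeding it a linear ramp is a quick sanity check, since this polynomial
reproduces straight lines exactly.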

But the first tests seem to indicate (at least to me) that the Athlon is
20-30% faster at resampling/mixing, so I guess an Athlon
machine will deliver more voices than a P4 running at the same
frequency (or Pentium-frequency-equivalence index).
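
As a back-of-the-envelope sketch of how cycles/sample translates into
voices: the helper below (a hypothetical name) simply divides the CPU
clock by the cycles needed per second of one voice. The 1 GHz clock and
44100 Hz sample rate in the usage note are assumed figures, and the
result is only an upper bound that ignores everything outside the inner
loop (disk streaming, envelopes, cache misses and so on).

```c
/* Upper bound on simultaneous voices: clock cycles available per second
 * divided by the cycles one voice consumes per second.
 * Returns 0 if either divisor input is 0. */
static unsigned long max_voices(unsigned long long clock_hz,
                                unsigned long cycles_per_sample,
                                unsigned long sample_rate_hz)
{
    unsigned long long per_voice =
        (unsigned long long)cycles_per_sample * sample_rate_hz;

    if (per_voice == 0)
        return 0;
    return (unsigned long)(clock_hz / per_voice);
}
```

With the numbers above and an assumed 1 GHz clock at 44100 Hz,
max_voices(1000000000ULL, 61, 44100) gives 371 for the Athlon and
max_voices(1000000000ULL, 80, 44100) gives 283 for the P4.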

Currently I am working only in the floating point domain, but Juan L. has
been telling me about the wonders of integer math and pointed me to the
routines found in http://modplug-xmms.sourceforge.net/, but I am unable
to compile the package on Red Hat 8.0 (am I a masochist for insisting on
that distro? :-) ).
On the other hand, Steve Harris says that on modern CPUs floating point
ops are more heavily accelerated than integer ones, and since integer math
involves a lot of shifting, you might end up with longer execution times
than with the floating point version.
(FISTL takes 6 cycles, but that is not much compared to the 70-80 cycles
of cubic interpolation, and doing everything in the FP domain saves
you a lot of hassle.)
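
For comparison, this is roughly what the integer approach looks like,
and where the shifting comes from. It is only a sketch under my own
assumptions: 16.16 fixed-point position, Q15 gain, linear (not cubic)
interpolation and invented names. It is not taken from the modplug
sources.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAC_BITS 16
#define FRAC_ONE  (1u << FRAC_BITS)      /* 1.0 in 16.16 fixed point */
#define FRAC_MASK (FRAC_ONE - 1u)

/* Linear-interpolating resampler/mixer using only integer math.
 * `step` is the playback rate in 16.16 fixed point (FRAC_ONE == 1.0).
 * `gain_q15` is the volume in Q15, 0..32768 (32768 == 1.0).
 * Results are ADDED into the 32-bit mix buffer `out`.
 * Note: a 32-bit 16.16 position wraps after 65536 input samples, and
 * the right shifts assume arithmetic shifting of negative values,
 * which mainstream compilers provide. */
static size_t resample_mix_linear_fixed(const int16_t *in, size_t in_len,
                                        int32_t *out, size_t out_len,
                                        uint32_t step, int32_t gain_q15)
{
    uint32_t pos = 0;
    size_t n;

    for (n = 0; n < out_len; n++) {
        size_t  i = pos >> FRAC_BITS;              /* integer part */
        int32_t frac, a, b, s;

        if (i + 1 >= in_len)                       /* need in[i + 1] */
            break;
        frac = (int32_t)(pos & FRAC_MASK);         /* fractional part */
        a = in[i];
        b = in[i + 1];
        /* a + (b - a) * frac / 65536, with a 64-bit intermediate */
        s = a + (int32_t)(((int64_t)(b - a) * frac) >> FRAC_BITS);
        out[n] += (s * gain_q15) >> 15;            /* apply Q15 gain */
        pos += step;
    }
    return n;
}
```

Every multiply needs a shift afterwards to bring the result back to the
right scale, and overflow has to be watched at each step, which is
exactly the kind of hassle the FP version avoids.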

I will publish the benchmark and the results once I have
implemented more test cases.
Anyway, it is interesting to learn how sucky the x86 architecture can be
(the hard way, of course :-) ).

PS: regarding the optimization options Steve H. suggested
-O6 -fomit-frame-pointer -fstrength-reduce -funroll-loops
-fmove-all-movables -ffast-math -mcpu=i686 -march=i686

Can gcc 3.2 target architectures higher than the PII?
(I mean generating P3/P4-specific code?)

thoughts ?
Benno

-- 
http://linuxsampler.sourceforge.net
Building a professional grade software sampler for Linux.
Please help us designing and developing it.


