Subject: Re: [linux-audio-dev] intel signal processing lib
From: Jussi Laako (jussi.laako_AT_kolumbus.fi)
Date: Thu Nov 08 2001 - 01:39:08 EET
Steve Harris wrote:
>
> I looked at it, but I thought it was closed source and I didn't think
> people would be up for that. I'll give it another look though.
>
> FFTW is known not to be that fast, but it is cross platform. It's easy to
> beat with SSE instructions on a PIII.
The Athlon/700 in these tests is the old Slot-A model with 512 kB of cache.
The PIII is a piece of crap because it gets lots of cache misses in strange
places, which produces strange results. This is because the PII/PIII cache
doesn't handle floats properly (see some hardware test sites for details).
OK, here are some numbers from my tests:
--- 8< --- Using Intel's SPL (MSVC6/NT4): --- 8< ---
Athlon 700:
71 us / 1024 point complex FFT (single precision)
83 us / 1024 point complex FFT (double precision)
39 us / 1024 point real FFT (single precision)
419 us / 8192 point real FFT (single precision)
K6 233:
1587 us / 1024 point complex FFT (single precision)
1659 us / 1024 point complex FFT (double precision)
840 us / 1024 point real FFT (single precision)
9595 us / 8192 point real FFT (single precision)
Pentium III 700:
732 us / 1024 point complex FFT (single precision)
6164 us / 1024 point complex FFT (double precision)
863 us / 1024 point real FFT (single precision)
13587 us / 8192 point real FFT (single precision)
Celeron 533:
8163 us / 1024 point complex FFT (single precision)
8437 us / 1024 point complex FFT (double precision)
4322 us / 1024 point real FFT (single precision)
47265 us / 8192 point real FFT (single precision)
Pentium II 350:
12521 us / 1024 point complex FFT (single precision)
12933 us / 1024 point complex FFT (double precision)
6624 us / 1024 point real FFT (single precision)
72505 us / 8192 point real FFT (single precision)
--- 8< --- FFTW: --- 8< ---
Athlon 700:
Complex:
SPEED TEST: n = 1024, FFTW_FORWARD, out of place, specific
time for one fft: 424.216695 us (414.274116 ns/point)
"mflops" = 5 (n log2 n) / (t in microseconds) = 120.693034
Real:
SPEED TEST: n = 1024, FFTW_FORWARD, out of place, specific
time for one fft: 129.722125 us (126.681763 ns/point)
"mflops" = 5/2 (n log2 n) / (t in microseconds) = 197.344902
PIII/700:
Complex:
SPEED TEST: n = 1024, FFTW_FORWARD, out of place, specific
time for one fft: 610.025409 us (595.727939 ns/point)
"mflops" = 5 (n log2 n) / (t in microseconds) = 83.930930
Real:
SPEED TEST: n = 1024, FFTW_FORWARD, out of place, specific
time for one fft: 187.183407 us (182.796296 ns/point)
"mflops" = 5/2 (n log2 n) / (t in microseconds) = 136.764259
--- 8< --- libDSP (radix-4): --- 8< ---
Athlon/1000:
224 us / 1024 point complex FFT (single precision)
139 us / 1024 point complex FFT (double precision)
41 us / 1024 point real FFT (single precision)
410 us / 8192 point real FFT (single precision)
PIII/866:
1623 us / 1024 point complex FFT (single precision)
167 us / 1024 point complex FFT (double precision)
57 us / 1024 point real FFT (single precision)
599 us / 8192 point real FFT (single precision)
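As a cross-check on the figures above, here is a minimal Python sketch (not part of the original tests). It recomputes the "mflops" and ns/point values printed by the FFTW speed test from the formula shown in that output, and then expresses the 1024 point timings as FFTW time divided by SPL time for the two CPUs that appear in both listings. The function names are made up for illustration. The FFTW output does not state its precision, so pairing it with the single precision SPL rows is an assumption. The "mflops" value is a nominal scaling of the timing, not a count of the operations actually executed.

```python
import math


def fftw_mflops(n, t_us, real=False):
    """Nominal "mflops" as printed by the FFTW speed test output:
    5 (n log2 n) / t for complex, 5/2 (n log2 n) / t for real, t in us."""
    scale = 2.5 if real else 5.0
    return scale * n * math.log2(n) / t_us


def ns_per_point(n, t_us):
    """Time per point in nanoseconds for an n point transform taking t_us."""
    return t_us * 1000.0 / n


def time_ratio(fftw_us, spl_us):
    """FFTW time divided by SPL time; above 1 means SPL was faster."""
    return fftw_us / spl_us


# (FFTW us, SPL single precision us) for n = 1024, copied from the listings.
# Assumption: the FFTW rows are comparable to the single precision SPL rows.
TIMINGS_US = {
    ("Athlon 700", "complex"): (424.216695, 71.0),
    ("Athlon 700", "real"): (129.722125, 39.0),
    ("PIII 700", "complex"): (610.025409, 732.0),
    ("PIII 700", "real"): (187.183407, 863.0),
}

if __name__ == "__main__":
    for (cpu, kind), (fftw_us, spl_us) in TIMINGS_US.items():
        mflops = fftw_mflops(1024, fftw_us, real=(kind == "real"))
        print(f"{cpu:10s} {kind:7s} {mflops:8.2f} mflops, "
              f"{ns_per_point(1024, fftw_us):7.2f} ns/point, "
              f"FFTW/SPL time ratio {time_ratio(fftw_us, spl_us):.2f}")
```

With the numbers as posted, the ratio comes out at about 6 for complex and 3.3 for real on the Athlon 700, and below 1 for both on the PIII 700.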
- Jussi Laako
-- 
PGP key fingerprint: 161D 6FED 6A92 39E2 EB5B 39DD A4DE 63EB C216 1E4B
Available at PGP keyservers