Re: [linux-audio-dev] intel signal processing lib


Subject: Re: [linux-audio-dev] intel signal processing lib
From: Jussi Laako (jussi.laako_AT_kolumbus.fi)
Date: Thu Nov 08 2001 - 01:39:08 EET


Steve Harris wrote:
>
> I looked at it, but I thought it was closed source and I didn't think
> people would be up for that. I'll give it another look though.
>
> FFTW is known not to be that fast, but it is cross-platform. It's easy to
> beat with SSE instructions on a PIII.

The Athlon/700 in these tests is the old Slot-A model with 512 kB cache.

The PIII is a piece of crap because it has lots of cache misses in strange
places, which produces strange results. This is because the PII/PIII cache
doesn't handle floats properly (see some hardware test sites for details).

OK, here are some numbers from my tests:

--- 8< --- Using Intel's SPL (MSVC6/NT4): --- 8< ---

Athlon 700:
71 us / 1024 point complex FFT (single precision)
83 us / 1024 point complex FFT (double precision)
39 us / 1024 point real FFT (single precision)
419 us / 8192 point real FFT (single precision)

K6 233:
1587 us / 1024 point complex FFT (single precision)
1659 us / 1024 point complex FFT (double precision)
840 us / 1024 point real FFT (single precision)
9595 us / 8192 point real FFT (single precision)

Pentium III 700:
732 us / 1024 point complex FFT (single precision)
6164 us / 1024 point complex FFT (double precision)
863 us / 1024 point real FFT (single precision)
13587 us / 8192 point real FFT (single precision)

Celeron 533:
8163 us / 1024 point complex FFT (single precision)
8437 us / 1024 point complex FFT (double precision)
4322 us / 1024 point real FFT (single precision)
47265 us / 8192 point real FFT (single precision)

Pentium II 350:
12521 us / 1024 point complex FFT (single precision)
12933 us / 1024 point complex FFT (double precision)
6624 us / 1024 point real FFT (single precision)
72505 us / 8192 point real FFT (single precision)

--- 8< --- FFTW: --- 8< ---

Athlon 700:
Complex:
  SPEED TEST: n = 1024, FFTW_FORWARD, out of place, specific
  time for one fft: 424.216695 us (414.274116 ns/point)
  "mflops" = 5 (n log2 n) / (t in microseconds) = 120.693034
Real:
  SPEED TEST: n = 1024, FFTW_FORWARD, out of place, specific
  time for one fft: 129.722125 us (126.681763 ns/point)
  "mflops" = 5/2 (n log2 n) / (t in microseconds) = 197.344902

PIII/700:
Complex:
  SPEED TEST: n = 1024, FFTW_FORWARD, out of place, specific
  time for one fft: 610.025409 us (595.727939 ns/point)
  "mflops" = 5 (n log2 n) / (t in microseconds) = 83.930930
Real:
  SPEED TEST: n = 1024, FFTW_FORWARD, out of place, specific
  time for one fft: 187.183407 us (182.796296 ns/point)
  "mflops" = 5/2 (n log2 n) / (t in microseconds) = 136.764259

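The "mflops" and ns/point figures in the FFTW output follow directly from the
formulas it prints, so the same metric can be computed for the plain "us / FFT"
timings in the other listings. A minimal sketch in Python, assuming only the
formulas shown in the output above; the function names are hypothetical and not
part of FFTW or any other library mentioned here:

```python
import math


def fftw_mflops(n, t_us, real=False):
    """Nominal "mflops" as printed by the FFTW speed test.

    Complex transform: 5 * n * log2(n) / t
    Real transform:    5/2 * n * log2(n) / t
    where t is the time for one FFT in microseconds.
    """
    scale = 2.5 if real else 5.0
    return scale * n * math.log2(n) / t_us


def ns_per_point(n, t_us):
    """Time per input point in nanoseconds, given microseconds per FFT."""
    return t_us * 1000.0 / n


if __name__ == "__main__":
    # Athlon 700 FFTW timings taken from the listing above.
    print("complex mflops:", fftw_mflops(1024, 424.216695))
    print("real mflops:   ", fftw_mflops(1024, 129.722125, real=True))
    print("complex ns/pt: ", ns_per_point(1024, 424.216695))

    # The same metric applied to a plain timing, here the 71 us
    # single-precision complex FFT reported for the Athlon 700 with SPL.
    print("SPL complex mflops:", fftw_mflops(1024, 71.0))
```

Note that this is a scaled speed figure based on a nominal operation count, not
a measurement of the floating point operations a given library really performs,
so it is only useful for comparing timings of the same transform size and type.
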
--- 8< --- libDSP (radix-4): --- 8< ---

Athlon/1000:
224 us / 1024 point complex FFT (single precision)
139 us / 1024 point complex FFT (double precision)
41 us / 1024 point real FFT (single precision)
410 us / 8192 point real FFT (single precision)

PIII/866:
1623 us / 1024 point complex FFT (single precision)
167 us / 1024 point complex FFT (double precision)
57 us / 1024 point real FFT (single precision)
599 us / 8192 point real FFT (single precision)

 - Jussi Laako

-- 
PGP key fingerprint: 161D 6FED 6A92 39E2 EB5B  39DD A4DE 63EB C216 1E4B
Available at PGP keyservers



This archive was generated by hypermail 2b28 : Thu Nov 08 2001 - 01:35:47 EET