Re: [LAD] vectorization

From: Jens M Andreasen <jens.andreasen@email-addr-hidden>
Date: Wed May 07 2008 - 07:00:38 EEST

On Wed, 2008-05-07 at 01:45 +0300, Jussi Laako wrote:
> Fons Adriaensen wrote:
> > Which will determine performance for every algorithm that
> >
> > - is working on a data set that is larger than the cache,
> > - does not produce multiple results from the same inputs.
-<snip>-
> There are several use cases where the data set is rather small and is
> used in several subsequent loops, so the cache can help.
>
> After profiling, I've identified a number of algorithms that
> benefit significantly from handwritten vectorized asm.
>

One thing I wonder about is what the pattern of addition in Fons's
application really looks like. I assume fftA * fftB is a precalculated,
windowed impulse response multiplied by a signal? The accumulate step
suggests that the output fftD has been touched before. That implies
either that there could be more variables to work on at once, or that
the vectors would still be in the cache if the order of addition were
changed.
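To make the reordering idea concrete, here is a minimal C sketch. It
assumes that fftD accumulates a sum over k of fftA[k] * fftB[k], as in a
partitioned convolution; the partition layout, the array-of-pointers
arguments and the names cmac_per_term / cmac_fused are hypothetical, not
taken from Fons's code. The first version streams fftD through memory
once per term; the second loads and stores each bin of fftD only once
and keeps the running sum in a register.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical layout: nparts spectra a[k] and b[k], each n bins long. */

/* One pass per term: d[] is read and written once for every k,
   so with large n it is pulled from memory nparts times. */
void cmac_per_term(float complex *restrict d,
                   const float complex *const *a,
                   const float complex *const *b,
                   size_t nparts, size_t n)
{
    for (size_t k = 0; k < nparts; k++)
        for (size_t i = 0; i < n; i++)
            d[i] += a[k][i] * b[k][i];
}

/* Fused: each bin of d[] is loaded once, all terms are summed in a
   local accumulator, and the result is stored once. The terms are
   added in the same order as above, so results match exactly. */
void cmac_fused(float complex *restrict d,
                const float complex *const *a,
                const float complex *const *b,
                size_t nparts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float complex acc = d[i];
        for (size_t k = 0; k < nparts; k++)
            acc += a[k][i] * b[k][i];
        d[i] = acc;
    }
}
```

The fused form saves (nparts - 1) round trips of fftD through memory,
at the cost of reading nparts input streams at once; whether that wins
depends on how many streams the hardware prefetcher can track.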

/j

PS: Your fastest result is the one where the data floods the cache:
N=(1024*1024), n=1000, gcc, clock: 8410 ms (_Complex). Is that a typo?
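For reference, a figure like that could be produced by a harness along
these lines. This is only a sketch: it assumes N is the number of bins
per vector, n is the number of passes, and the loop is a plain
d[i] += a[i] * b[i] on single-precision _Complex data; the function name
time_cmac_ms and the checksum argument are hypothetical, not Fons's
actual benchmark.

```c
#include <complex.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Assumed parameters: N = bins per vector, n = number of passes.
   Returns CPU time in ms as measured by clock(), or -1.0 if allocation
   fails. The sum of the real parts of d[] is written to *checksum so
   the timed loop has an observable result and is not optimized away. */
double time_cmac_ms(size_t N, int n, double *checksum)
{
    float complex *a = malloc(N * sizeof *a);
    float complex *b = malloc(N * sizeof *b);
    float complex *d = malloc(N * sizeof *d);
    if (!a || !b || !d) {
        free(a); free(b); free(d);
        return -1.0;
    }
    for (size_t i = 0; i < N; i++) {
        a[i] = 1.0f;  /* trivial test data */
        b[i] = 1.0f;
        d[i] = 0.0f;
    }

    clock_t t0 = clock();
    for (int r = 0; r < n; r++)        /* n passes over the whole data set */
        for (size_t i = 0; i < N; i++)
            d[i] += a[i] * b[i];       /* complex multiply-accumulate */
    clock_t t1 = clock();

    double sum = 0.0;
    for (size_t i = 0; i < N; i++)
        sum += crealf(d[i]);
    *checksum = sum;

    free(a); free(b); free(d);
    return 1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

With N = 1024*1024 single-precision complex values, each array is
8 MiB, so the three arrays together (24 MiB) are much larger than
typical CPU caches, which is why that timing being the fastest looks
surprising.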

_______________________________________________
Linux-audio-dev mailing list
Linux-audio-dev@email-addr-hidden
http://lists.linuxaudio.org/mailman/listinfo/linux-audio-dev
Received on Wed May 7 08:15:02 2008

This archive was generated by hypermail 2.1.8 : Wed May 07 2008 - 08:15:02 EEST