Re: [LAD] vectorization

From: Jussi Laako <jussi@email-addr-hidden>
Date: Wed Apr 23 2008 - 22:05:09 EEST

Fons Adriaensen wrote:
> I tried out vectorizing the complex multiply-and-accumulate loop in
> zita-convolver. For long convolutions and certainly if you have
>
> The results are very marginal, about 5% relative speed increase
> even in cases where the MAC operations largely outnumber any

For me, the complex MAC operation written for SSE3 practically doubled
the speed for double precision and more than doubled it for single
precision, compared to the "-march=i686 -O3 -ffast-math" case (the code
has to run on practically all x86 platforms).

Prior to SSE3, there was no nice way to do complex multiplication on
SSE. Now it can be done in three instructions for two single precision
complex numbers.

Still, one of the most elegant is E3DNow on AMD, which can do a single
precision complex multiply in four instructions.

These instruction counts are for the calculation itself. In addition, it
of course needs the load and store operations, where SSE3 requires a few
extra instructions compared to E3DNow.
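
As a rough illustration of the SSE3 case, here is a minimal sketch of a
single precision complex MAC using compiler intrinsics. It is not the code
I measured: the function name cmac_f32, the interleaved (re, im) layout and
the even-count requirement are assumptions made for the example. One
plausible way to read the "three instructions" is the two multiplies plus
ADDSUBPS; the duplicate and shuffle steps would then count as part of the
load overhead. A scalar fallback is included so the sketch still builds
when SSE3 is not enabled.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE3__
#include <pmmintrin.h>   /* SSE3: _mm_moveldup_ps, _mm_movehdup_ps, _mm_addsub_ps */
#endif

/* acc[k] += a[k] * b[k] for n complex numbers stored interleaved as
 * (re, im) pairs. Hypothetical helper for illustration; n must be even
 * because the SSE3 path handles two complex numbers per iteration. */
static void cmac_f32(float *acc, const float *a, const float *b, size_t n)
{
    size_t i;

    assert(n % 2 == 0);
#ifdef __SSE3__
    for (i = 0; i < 2 * n; i += 4) {
        __m128 va  = _mm_loadu_ps(a + i);    /* ar0 ai0 ar1 ai1 */
        __m128 vb  = _mm_loadu_ps(b + i);    /* br0 bi0 br1 bi1 */
        __m128 are = _mm_moveldup_ps(va);    /* ar0 ar0 ar1 ar1 */
        __m128 aim = _mm_movehdup_ps(va);    /* ai0 ai0 ai1 ai1 */
        /* swap re/im within each complex number: bi0 br0 bi1 br1 */
        __m128 bsw = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(2, 3, 0, 1));

        /* the calculation itself: two multiplies and one addsub */
        __m128 t0   = _mm_mul_ps(are, vb);   /* ar*br  ar*bi ... */
        __m128 t1   = _mm_mul_ps(aim, bsw);  /* ai*bi  ai*br ... */
        __m128 prod = _mm_addsub_ps(t0, t1); /* ar*br-ai*bi  ar*bi+ai*br ... */

        /* accumulate into the output buffer */
        _mm_storeu_ps(acc + i, _mm_add_ps(_mm_loadu_ps(acc + i), prod));
    }
#else
    /* portable scalar fallback, one complex number per iteration */
    for (i = 0; i < 2 * n; i += 2) {
        float ar = a[i], ai = a[i + 1];
        float br = b[i], bi = b[i + 1];
        acc[i]     += ar * br - ai * bi;
        acc[i + 1] += ar * bi + ai * br;
    }
#endif
}
```

With GCC, building with -msse3 selects the intrinsic path; without it the
scalar loop is used. Unaligned loads and stores are used here only to keep
the sketch safe for arbitrary buffers.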

BR,

        - Jussi
Received on Thu Apr 24 00:15:01 2008
