Re: [LAD] vectorization

From: Jens M Andreasen <jens.andreasen@email-addr-hidden>
Date: Mon May 05 2008 - 08:19:23 EEST

I believe your declaration looks something like this:

float // array of complex
   ffta[N][2] __attribute__ ((aligned(16))),
   fftb[N][2] __attribute__ ((aligned(16))),
   data[N][2] __attribute__ ((aligned(16)));

.. right?

If so, I can get icc's auto-vectorizer to kick in with this formulation
of the complex multiply-add:

// CC=icc -O3 -msse3

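void cmadd(void)
{
   /* View the [N][2] complex arrays as flat float arrays:
      even index = real part, odd index = imaginary part. */
   float *A = (float*) ffta;
   float *B = (float*) fftb;
   float *D = (float*) data;
   int i;
   for (i = 0; i < N*2; i += 2)
   {
      /* D += A * B (complex product, accumulated) */
      D[i]   += A[i] * B[i]   - A[i+1] * B[i+1];   /* real */
      D[i+1] += A[i] * B[i+1] + A[i+1] * B[i];     /* imag */
   }
}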

No luck with gcc though :-/
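If gcc won't vectorize it on its own, one option is to spell out the same complex MAC with SSE3 intrinsics. This is a sketch, not something tested against the code in this thread: `cmadd_sse3` is a hypothetical name, and it assumes the interleaved re/im layout above, an even number of complex values, and 16-byte aligned pointers. `_mm_moveldup_ps`/`_mm_movehdup_ps` broadcast the real and imaginary parts of B, and `_mm_addsub_ps` subtracts in the even lanes and adds in the odd ones, which is exactly the sign pattern of a complex product. The `target("sse3")` attribute lets gcc/clang compile it without a global `-msse3` (x86 only).

```c
#include <assert.h>
#include <stdint.h>
#include <pmmintrin.h>  /* SSE3: _mm_addsub_ps, _mm_moveldup_ps, _mm_movehdup_ps */

/* Hypothetical helper: D[k] += A[k] * B[k] for n interleaved complex
   values (re, im, re, im, ...), two complex values per SSE register.
   Assumes n is even and all pointers are 16-byte aligned. */
__attribute__((target("sse3")))
void cmadd_sse3(float *D, const float *A, const float *B, int n)
{
    assert(n % 2 == 0);
    assert(((uintptr_t)A | (uintptr_t)B | (uintptr_t)D) % 16 == 0);
    for (int i = 0; i < 2 * n; i += 4) {
        __m128 a  = _mm_load_ps(A + i);        /* ar0 ai0 ar1 ai1 */
        __m128 b  = _mm_load_ps(B + i);        /* br0 bi0 br1 bi1 */
        __m128 br = _mm_moveldup_ps(b);        /* br0 br0 br1 br1 */
        __m128 bi = _mm_movehdup_ps(b);        /* bi0 bi0 bi1 bi1 */
        __m128 as = _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1)); /* ai0 ar0 ai1 ar1 */
        /* even lanes: ar*br - ai*bi (real), odd lanes: ai*br + ar*bi (imag) */
        __m128 p  = _mm_addsub_ps(_mm_mul_ps(a, br), _mm_mul_ps(as, bi));
        _mm_store_ps(D + i, _mm_add_ps(_mm_load_ps(D + i), p));
    }
}
```

With the aligned declarations above it would be called as `cmadd_sse3((float*) data, (float*) ffta, (float*) fftb, N);`.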

/j

On Wed, 2008-04-23 at 09:59 +0200, Fons Adriaensen wrote:

> I tried out vectorizing the complex multiply-and-accumulate loop in
> zita-convolver. For long convolutions, and certainly if you have a
> convolution matrix, the MAC operations dominate the FFT and IFFT
> ones.
>
> This requires a permutation of the complex arrays as used by
> FFTW after each FFT and before each IFFT. In each block of 4
> complex values
>
> x1 y1 x2 y2 x3 y3 x4 y4
>
> swap y1 with x3 and y2 with x4 to get
>
> x1 x3 x2 x4 y1 y3 y2 y4
>
> which can be handled by the vector operations.
>
> The gain is very marginal: about a 5% relative speed increase,
> even in cases where the MAC operations far outnumber all the
> others. Bypassing the permutations to get an idea of their cost
> didn't change anything.
>
> I'm somewhat surprised by this...
>

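On the permutation quoted above: a plain-C sketch of that layout, assuming the number of complex values is a multiple of 4. `perm4` and `cmadd_perm` are hypothetical names, not zita-convolver's actual code. The two swaps are disjoint, so the permutation is its own inverse and the same routine serves after the FFT and before the IFFT. In the permuted layout each group of 8 floats holds 4 real parts followed by the 4 matching imaginary parts (in complex order 1 3 2 4, but every operand is permuted the same way, so the pairs still line up), and the inner loop is plain element-wise arithmetic with no shuffles.

```c
#include <assert.h>

/* Hypothetical helper: in each block of 4 interleaved complex values
 *   x1 y1 x2 y2 x3 y3 x4 y4
 * swap y1 <-> x3 and y2 <-> x4, giving
 *   x1 x3 x2 x4 y1 y3 y2 y4
 * The swaps are disjoint, so calling it twice restores the input.
 * n is the number of complex values (must be a multiple of 4). */
void perm4(float *v, int n)
{
    assert(n % 4 == 0);
    for (int k = 0; k < 2 * n; k += 8) {
        float t;
        t = v[k + 1]; v[k + 1] = v[k + 4]; v[k + 4] = t;  /* y1 <-> x3 */
        t = v[k + 3]; v[k + 3] = v[k + 6]; v[k + 6] = t;  /* y2 <-> x4 */
    }
}

/* Hypothetical helper: D += A * B on permuted data. In each block of
 * 8 floats, v[k+j] is a real part and v[k+4+j] its imaginary part. */
void cmadd_perm(float *D, const float *A, const float *B, int n)
{
    assert(n % 4 == 0);
    for (int k = 0; k < 2 * n; k += 8) {
        /* these 4 iterations correspond to one 4-wide vector per operand */
        for (int j = 0; j < 4; j++) {
            float ar = A[k + j], ai = A[k + 4 + j];
            float br = B[k + j], bi = B[k + 4 + j];
            D[k + j]     += ar * br - ai * bi;   /* real */
            D[k + 4 + j] += ar * bi + ai * br;   /* imag */
        }
    }
}
```

In this sketch each FFT output is permuted once with `perm4`, `cmadd_perm` accumulates into a permuted result buffer, and `perm4` is applied to that buffer once before the IFFT.
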
-- 
_______________________________________________
Linux-audio-dev mailing list
Linux-audio-dev@email-addr-hidden
http://lists.linuxaudio.org/mailman/listinfo/linux-audio-dev
Received on Mon May 5 12:15:01 2008