Re: [LAD] vectorization

From: Jens M Andreasen <jens.andreasen@email-addr-hidden>
Date: Tue May 06 2008 - 10:21:09 EEST

On Tue, 2008-05-06 at 00:24 +0200, Fons Adriaensen wrote:
> After each iteration, call an empty function, separately compiled,
> that takes all three vectors as arguments (and _not_ as const *
> of course). No more tricks. The overhead is peanuts compared
> to the calculation.

You mean like this:

  // defined in empty.c as return 0;
  extern int empty(void*a,void*b,void*d);

And then call it at the end of iteration:

      for (j = 0; j < n; ++j)
      {
         for (i = 0;i < N; ++i)
            cxD[i]+= cxA[i]*cxB[i];

         empty(&cxA,&cxB,&cxD);
      }
      fprintf (stderr,"> clock: %ld ms %s\n",(long)((clock()-clk)/1000),s);
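
For anyone who wants to try the idea without a second source file, here is a rough single-file sketch. It is not the benchmark from this thread: it swaps the separately compiled empty() for a call through a volatile function pointer, which should keep the call just as opaque to the optimizer, and it uses the float array[N][2] layout. The names opaque_empty, fill and run_passes are made up for the illustration.

```c
#include <assert.h>
#include <time.h>

#define N 1024

/* Interleaved re/im pairs, as in the "float array[N][2]" variant. */
static float cxA[N][2], cxB[N][2], cxD[N][2];

/* Stand-in for the function that the post keeps in empty.c. */
int empty(void *a, void *b, void *d)
{
    (void)a; (void)b; (void)d;
    return 0;
}

/* Assumption: calling through a volatile pointer keeps the call opaque
   within one file; the post uses separate compilation instead. */
int (*volatile opaque_empty)(void *, void *, void *) = empty;

/* Hypothetical helper: set every element of A and B, clear D. */
void fill(float are, float aim, float bre, float bim)
{
    for (int i = 0; i < N; ++i) {
        cxA[i][0] = are;  cxA[i][1] = aim;
        cxB[i][0] = bre;  cxB[i][1] = bim;
        cxD[i][0] = 0.0f; cxD[i][1] = 0.0f;
    }
}

/* Hypothetical helper: n passes of D += A * B, returns elapsed CPU ms. */
long run_passes(long n)
{
    assert(n >= 0);
    clock_t clk = clock();
    for (long j = 0; j < n; ++j) {
        for (int i = 0; i < N; ++i) {
            /* complex multiply-accumulate on re/im pairs */
            cxD[i][0] += cxA[i][0] * cxB[i][0] - cxA[i][1] * cxB[i][1];
            cxD[i][1] += cxA[i][0] * cxB[i][1] + cxA[i][1] * cxB[i][0];
        }
        /* the compiler must assume all three arrays may have changed */
        opaque_empty(cxA, cxB, cxD);
    }
    return (long)((double)(clock() - clk) * 1000.0 / CLOCKS_PER_SEC);
}
```

Calling fill(1, 2, 3, 4) and then run_passes(3) leaves every element of cxD at (-15, 30), since (1+2i)(3+4i) = -5+10i. Dividing by CLOCKS_PER_SEC instead of a fixed 1000 avoids assuming a particular clock resolution.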
      
Well, that certainly did level out everything. For n = 1000:

> clock: 64680 ms (_Complex)
> clock: 61990 ms (cvec_t)
> clock: 71060 ms (original float array[N][2])

This measures the terrible latency I have between main memory and cache.
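
A back-of-the-envelope check of the working set, assuming the float array[N][2] layout with 4-byte floats (the other two variants may differ) and a made-up helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch assumption: 4-byte float, so one re/im pair is 8 bytes. */
static_assert(sizeof(float) == 4, "sketch assumes 4-byte float");

/* Hypothetical helper: bytes touched by one pass over the three
   arrays (A, B and D) of n_elems re/im pairs each. */
size_t working_set_bytes(size_t n_elems)
{
    return 3 * n_elems * sizeof(float[2]);
}
```

Under those assumptions, N = 1024 * 1024 gives 25165824 bytes (24 MiB) per pass, while an N of 1024 would give 24576 bytes (24 KiB), which is small enough to stay in cache on most machines.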

Changing N back from (1024 * 1024) to
#define N 1024

.. and then increasing n to a million again (this should be safe now?),
so we can pretend not to be limited by PC100, yields with icc:

> clock: 16510 ms (_Complex)
> clock: 6090 ms (cvec_t)
> clock: 12800 ms (original float array[N][2])

.. and with gcc:

> clock: 13820 ms (_Complex)
> clock: 6330 ms (cvec_t)
> clock: 13420 ms (original float array[N][2])

Very even I would say.

Received on Tue May 6 16:15:01 2008
