Re: [LAD] vectorization

From: Jussi Laako <jussi@email-addr-hidden>
Date: Wed May 07 2008 - 09:17:29 EEST

Jens M Andreasen wrote:
> PS: Your fastest calculation is when the data floods the cache:
> N=(1024*1024), n=1000, gcc, clock: 8410 ms (_Complex). Is that a typo?

Nope, that's the actual result. I just verified the settings, recompiled,
and reran it, and it's still:
> clock: 8390 ms (_Complex)
> clock: 9310 ms (cvec_t)
> clock: 8480 ms (original float array[N][2])
> clock: 10550 ms (asm on float array)

Fast memory bus + prefetch is a really good thing...

I also have a vectorized float array copy, and it's significantly faster
than memcpy(). While memcpy() stays under 1 GB/s, the vectorized version
can reach around 90% of the theoretical memory bandwidth for large copies.
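
For illustration only, here is a minimal sketch of what a vectorized
float copy could look like. It is not the code measured above: the
function name copy_floats() is hypothetical, and the use of SSE
non-temporal stores (_mm_stream_ps) is an assumption about one common
way to keep large copies from polluting the cache. On targets without
SSE it falls back to a plain loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper (an assumption, not the code from this post):
 * copy n floats from src to dst. The buffers must not overlap. */
static void copy_floats(float *dst, const float *src, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Copy single floats until dst is 16-byte aligned, because
     * _mm_stream_ps requires an aligned destination. */
    while (i < n && ((uintptr_t)(dst + i) & 15u) != 0) {
        dst[i] = src[i];
        i++;
    }
    /* Main loop: four floats per iteration, unaligned load from src,
     * non-temporal (cache-bypassing) store to dst. */
    for (; i + 4 <= n; i += 4)
        _mm_stream_ps(dst + i, _mm_loadu_ps(src + i));
    /* Order the streamed stores before any later stores. */
    _mm_sfence();
#endif
    /* Remaining tail, or the whole copy when SSE is not available. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Whether something like this beats memcpy() depends on the libc, the
CPU and the copy size, so it needs to be measured on the target
machine. Non-temporal stores tend to help only when the copy is much
larger than the cache.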

        - Jussi
Received on Wed May 7 12:15:02 2008