linux-audio-dev: Re: [LAD] GCC Vector extensions

From: Robin Gareus <robin@email-addr-hidden>
Date: Mon Jul 25 2011 - 15:45:39 EEST

On 07/25/2011 12:04 PM, Maurizio De Cecco wrote:
> Short summary of my initial post: I found that using the gcc vector
> extensions caused a 2x slowdown with gcc and a 4x speedup with clang.
>
> I ran more tests on Mac OS and Ubuntu, isolating a small code example,
> and I found the origin of the problem, even if I do not know exactly
> what is happening.
>
> My original test used float vectors of size 8; the gcc vector
> extension documentation says that if the vector size does not match the
> hardware vector size, the code is synthesized from narrower operations.
>
> With a vector size of 8 I found the above results under Mac OS X, using
> clang and gcc 4.2, and under Ubuntu 11.04, using clang and gcc 4.5.2.
>
> When I move to a vector size of 4, things get better: clang slows down
> by around 2x with respect to size 8, and gcc obtains the same result.
> The interesting point is that gcc reaches essentially the same speed with
> and without the vector extensions, which probably means the compiler is
> good enough at vectorizing the code on its own, at least in the test
> cases I used.
>
> I include the code, results and scripts to run the tests in a small zip
> in case anybody wants to run other tests; the test code performs an
> arbitrary vector computation (essentially 100 million multiply-adds),
> starting from a seed given as an argument.
>
> The code is modeled on the way jMax computes, i.e. one vector
> operation at a time on vectors passed by pointer, and it is not
> designed to be the fastest possible implementation of this computation.
>
> Thanks for the help,
> Maurizio
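
For readers who have not used the extension, here is a minimal sketch of the two vector types being compared and of the "one vector operation at a time, operands passed by pointer" style described above. The names v4sf, v8sf, madd_v4 and madd_v8 are illustrative only and are not taken from the attached test code or from jMax. The assumption is a GCC or clang compiler on x86: a 16-byte float vector maps onto one SSE register, while a 32-byte one only maps onto a single register when AVX is enabled; otherwise the compiler has to synthesize each operation from narrower ones.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* GCC/clang vector extension types; vector_size is given in bytes. */
typedef float v4sf __attribute__((vector_size(16)));  /* 4 floats: one 128-bit SSE register */
typedef float v8sf __attribute__((vector_size(32)));  /* 8 floats: synthesized from narrower ops unless AVX is enabled */

static_assert(sizeof(v4sf) == 4 * sizeof(float), "v4sf holds 4 floats");
static_assert(sizeof(v8sf) == 8 * sizeof(float), "v8sf holds 8 floats");

/* Illustrative kernel in the style described in the post: one vector
   operation at a time, operands passed by pointer.
   Computes out[i] = a[i] * b[i] + c[i] element-wise over n vectors. */
static void madd_v4(v4sf *out, const v4sf *a, const v4sf *b,
                    const v4sf *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}

/* Same kernel on 8-wide vectors; only the element count changes. */
static void madd_v8(v8sf *out, const v8sf *a, const v8sf *b,
                    const v8sf *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}
```

Compiling such a file with `gcc -O2 -S` and comparing the code emitted for the two kernels is one way to see how the 8-wide version is lowered on a given target.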

Thanks for coming up with and sharing the tests!

I've just run them on a native i686 GNU/Linux system (a 32-bit 1.66 GHz
Core Duo, vs. your 2.7 GHz Core 2 Duo). The results are pretty much
consistent with yours: everything is about 3x slower, except gcc/size8,
which is only 1.8x slower, and clang/size8, which is 6x slower! It might
have to do with 32 vs. 64 bit, but I don't have an explanation; I would
need to look into the asm output.
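
Before digging into the asm, one cheap thing to check is what the compiler believes the target's SIMD width is, since that decides whether an 8-float vector fits in one register. The sketch below relies on the __AVX__ and __SSE__ macros that GCC and clang predefine according to the -m/-march flags; the helper names are hypothetical. On 32-bit x86 the default target typically does not enable SSE at all unless -msse or a suitable -march is given (distribution defaults vary), whereas x86-64 always includes SSE2. Whether that explains the 32 vs. 64 bit difference here is only a hypothesis.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* printf */

/* Hypothetical helper: width in bytes of the widest SIMD register the
   compiler may use for float vectors, judged from predefined macros. */
static int native_simd_bytes(void)
{
#if defined(__AVX__)
    return 32;  /* 256-bit YMM: an 8-float vector fits in one register */
#elif defined(__SSE__)
    return 16;  /* 128-bit XMM: an 8-float vector must be split in two */
#else
    return 0;   /* no SSE/AVX macro: e.g. 32-bit x86 without -msse, or a non-x86 target */
#endif
}

static_assert(sizeof(void *) == 4 || sizeof(void *) == 8,
              "expected a 32- or 64-bit target");

/* Hypothetical helper: print pointer size and SIMD width for this build. */
static void report_target(void)
{
    int simd = native_simd_bytes();
    printf("pointer size: %zu bits\n", sizeof(void *) * 8);
    printf("native SIMD width: %d bytes (%d floats)\n",
           simd, simd / (int)sizeof(float));
}
```

Building this once with the default flags and once with, say, `-msse2` or `-march=native` shows whether the compiler was generating vector code for the test machine at all.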

BTW, I'm pretty impressed that some tests run faster under VMware
virtualization than on native OS X. Then again, it's no big surprise
either (and it may also be related to Ubuntu using a newer version of
gcc).

So are you now considering using some #ifdef to select float/4 instead of
double/8 vectors in jMax, or just changing all of them?
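
An #ifdef approach could look roughly like the sketch below: the vector width becomes a compile-time constant, defaulting to 8 floats only when AVX makes that a single native register. JMAX_VEC_FLOATS, vec_t and block_madd are hypothetical names for illustration, not existing jMax options or functions, and the sketch assumes GCC or clang.

```c
#include <assert.h>  /* assert, static_assert (C11) */
#include <stddef.h>  /* size_t */

/* Hypothetical build switch (not a real jMax option): override with
   e.g. -DJMAX_VEC_FLOATS=8. Default to 8 floats only with AVX. */
#ifndef JMAX_VEC_FLOATS
#  if defined(__AVX__)
#    define JMAX_VEC_FLOATS 8
#  else
#    define JMAX_VEC_FLOATS 4
#  endif
#endif

typedef float vec_t
    __attribute__((vector_size(JMAX_VEC_FLOATS * sizeof(float))));

static_assert(sizeof(vec_t) == JMAX_VEC_FLOATS * sizeof(float),
              "vec_t holds JMAX_VEC_FLOATS floats");

/* Multiply-add over nsamples floats stored as vec_t, so the same source
   builds 4- or 8-wide code depending on the switch above. */
static void block_madd(vec_t *out, const vec_t *a, const vec_t *b,
                       const vec_t *c, size_t nsamples)
{
    assert(nsamples % JMAX_VEC_FLOATS == 0);  /* caller pads to whole vectors */
    for (size_t i = 0; i < nsamples / JMAX_VEC_FLOATS; i++)
        out[i] = a[i] * b[i] + c[i];
}
```

Keeping the width in one macro means the DSP code itself does not change between the two variants; the trade-off is that buffer sizes must stay multiples of the widest width you intend to build with.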

ciao,
robin

PS.
#clang --version
clang version 1.1 (Debian 2.7-3)
Target: i386-pc-linux-gnu
Thread model: posix
#gcc --version
gcc (Debian 4.6.1-4) 4.6.1

_______________________________________________
Linux-audio-dev mailing list
Linux-audio-dev@email-addr-hidden
http://lists.linuxaudio.org/listinfo/linux-audio-dev

text/plain attachment: results-debian-IntelCoreDuo.1.66Ghz.txt

Received on Mon Jul 25 16:15:04 2011
