Re: [LAU] optimizing jackd build

From: Mike Taht <mike.taht@email-addr-hidden>
Date: Mon Apr 09 2007 - 18:26:10 EEST

On 4/9/07, Tim Blechmann <tim@email-addr-hidden> wrote:
>
> > Hand written assembler is still many orders faster than what gcc is
> > capable of doing. In Ardour peak computation (for both metering and
> > waveform displaying) is written in SSE (the first part in pure assembly,
> > the second in a C-level abstraction which is almost 1:1 assembly). Both
> > functions are more than 20x faster in raw performance than what gcc 4.1
> > can do.
>
> btw, is there a reason, why ardour is using assembler code instead of
> compiler intrinsics?

Two issues - one of the core concepts of jack et al is the idea of a
run-time-defined samples/period. The compiler has no idea that a typical
routine is always called with some multiple of 64 samples, so it can't
unroll well.
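
A minimal sketch of how a routine could hand that knowledge to the compiler, assuming a GCC or Clang recent enough to provide __builtin_unreachable() and __builtin_assume_aligned() (newer than the gcc 4.1 discussed in this thread). The function apply_gain and its caller contract are hypothetical, not taken from JACK or Ardour, and whether the optimizer actually uses the hint varies by compiler and version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical gain routine; name and signature are illustrative only.
   Assumed caller contract: nframes is a multiple of 64 and buf is
   16-byte aligned. */
void apply_gain(float *buf, size_t nframes, float gain)
{
    /* Checked in debug builds, compiled out with -DNDEBUG. */
    assert(nframes % 64 == 0);

    /* Hint: tell the optimizer the "not a multiple of 64" case never
       happens, so it may skip remainder loops after vectorizing. */
    if (nframes % 64 != 0)
        __builtin_unreachable();

    /* Hint: promise 16-byte alignment so aligned loads/stores can be used. */
    float *p = __builtin_assume_aligned(buf, 16);

    for (size_t i = 0; i < nframes; i++)
        p[i] *= gain;
}
```

Breaking the contract (an odd frame count, an unaligned buffer) is undefined behaviour in this sketch, which is the price of the hint.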

Secondly - the compiler intrinsics for SSE1,2,3,4 basically suck. You can,
fairly effectively, use the _mm_whatever abstractions, but as soon as you
get into type casting you get into a world of hurt and the compiler
generates very inefficient code.
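
For readers unfamiliar with what "type casting" means at the intrinsics level, here is a small sketch, assuming SSE2 and the standard <emmintrin.h> header. It reinterprets a float vector as integers to clear the sign bits. The helper name abs4 is hypothetical, and this only illustrates the kind of code in question; it makes no claim about what any particular gcc version generates for it.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics, x86 only */

/* Hypothetical helper: out[k] = |in[k]| for k = 0..3, done by masking
   off the IEEE sign bit in the integer domain. */
void abs4(const float *in, float *out)
{
    assert(in != 0 && out != 0);

    __m128  v    = _mm_loadu_ps(in);             /* unaligned load of 4 floats */
    __m128i bits = _mm_castps_si128(v);          /* reinterpret, no conversion */
    __m128i mask = _mm_set1_epi32(0x7fffffff);   /* every bit except the sign  */

    bits = _mm_and_si128(bits, mask);            /* clear the sign bits        */
    _mm_storeu_ps(out, _mm_castsi128_ps(bits));  /* reinterpret back and store */
}
```

The cast intrinsics are meant to be free reinterpretations of the same register, as opposed to _mm_cvtps_epi32, which performs a real float-to-int conversion.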

> beside that, if ardour is using a fixed block size, using compile-time

Would be nice, but not enough hardware can run at low samples/period and
there are always situations where you want to run at more.

> loop unrolling would be another point, where one could gain speed (iirc,
> the micro-benchmarks i did for pnpd/nova indicated an additional
> performance boost around 40%) ...

Consistently memory aligning things is an issue on x86.

Since the compiler can't figure it out (and it would be nice if there were
some compiler intrinsic that said "this routine is nearly always called with
some multiple of 32 bytes"), the hand-unrolled routines (more every day)
basically have to:

- loop normally until you have alignment (hopefully just a test and branch)
- on some arches, doing loops in 64-byte quantities is a bigger win than 16,
  so loop with 16-byte quantities until you can do 64
- then do 64-byte quantities for a while
- then back to 16
- then back to 4
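
The steps above can be sketched with SSE intrinsics as follows. This is an illustration under stated assumptions, not Ardour's actual peak routine: the names peak_abs and max_abs4 are hypothetical, the buffer is assumed to hold 4-byte floats, and NaN handling is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics, x86 only */

/* One 16-byte step: fold |buf[0..3]| into the running per-lane maximum.
   buf must be 16-byte aligned. */
static inline __m128 max_abs4(__m128 vmax, const float *buf)
{
    const __m128 sign = _mm_set1_ps(-0.0f);  /* only the sign bits set */
    return _mm_max_ps(vmax, _mm_andnot_ps(sign, _mm_load_ps(buf)));
}

/* Hypothetical helper: largest absolute sample value in buf[0..n). */
float peak_abs(const float *buf, size_t n)
{
    assert(buf != NULL);
    float peak = 0.0f;
    size_t i = 0;

    /* 1. 4 bytes at a time until the pointer is 16-byte aligned. */
    for (; i < n && ((uintptr_t)(buf + i) & 15) != 0; i++) {
        float a = buf[i] < 0.0f ? -buf[i] : buf[i];
        if (a > peak) peak = a;
    }

    __m128 vmax = _mm_setzero_ps();

    /* 2. 16 bytes at a time until the pointer is 64-byte aligned. */
    for (; n - i >= 4 && ((uintptr_t)(buf + i) & 63) != 0; i += 4)
        vmax = max_abs4(vmax, buf + i);

    /* 3. 64 bytes (four vectors, 16 floats) at a time for the bulk. */
    for (; n - i >= 16; i += 16) {
        vmax = max_abs4(vmax, buf + i);
        vmax = max_abs4(vmax, buf + i + 4);
        vmax = max_abs4(vmax, buf + i + 8);
        vmax = max_abs4(vmax, buf + i + 12);
    }

    /* 4. Back to 16 bytes at a time. */
    for (; n - i >= 4; i += 4)
        vmax = max_abs4(vmax, buf + i);

    /* Fold the four vector lanes into the scalar result. */
    float lanes[4];
    _mm_storeu_ps(lanes, vmax);
    for (int k = 0; k < 4; k++)
        if (lanes[k] > peak) peak = lanes[k];

    /* 5. Back to 4 bytes at a time for the tail. */
    for (; i < n; i++) {
        float a = buf[i] < 0.0f ? -buf[i] : buf[i];
        if (a > peak) peak = a;
    }
    return peak;
}
```

When the buffer is already 64-byte aligned and the length is a multiple of 16 floats, steps 1, 2, 4 and 5 each cost one failed loop test, which matches the "just a test and branch" hope above.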

It's a pretty easy pattern once you get used to it, but it pays to oprofile
first, have the best algorithm second, then... SSE like crazy. :)

> tim
>
> --
> tim@email-addr-hidden ICQ: 96771783
> http://tim.klingt.org
>
> After one look at this planet any visitor from outer space would say
> "I want to see the manager."
> William S. Burroughs
>
> _______________________________________________
> Linux-audio-user mailing list
> Linux-audio-user@email-addr-hidden
> http://lists.linuxaudio.org/mailman/listinfo.cgi/linux-audio-user
>
>
>

-- 
Mike Taht
PostCards From the Bleeding Edge
http://the-edge.blogspot.com

Received on Mon Apr 9 20:15:12 2007