Re: [LAD] vectorization

From: Jens M Andreasen <jens.andreasen@email-addr-hidden>
Date: Wed Apr 16 2008 - 11:14:33 EEST

On Wed, 2008-04-16 at 09:19 +0200, Christian Schoenebeck wrote:

> But if you're totally sceptical, ...

Yes, it bothers me if the compiler can beat "hand-optimized" ASM :-D
It turns out that I get different results on consecutive runs of the
same binary, look:

Benchmarking mixdown (WITH coeff):
pure C++ : 380 ms
ASM SSE : 150 ms
GCC vector extensions : 140 ms <-- best

Benchmarking mixdown (WITH coeff):
pure C++ : 390 ms
ASM SSE : 140 ms <-- best
GCC vector extensions : 160 ms
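
With 10-20 ms of run-to-run jitter, a single timing per variant cannot separate the ASM and vector-extension versions. One common way to steady such numbers is to repeat each measurement and keep the fastest run. The sketch below is only an illustration: the benchmark harness used for the figures above is not shown in this thread, and the helper name best_of_ms is hypothetical.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper (not part of the benchmark discussed here):
// call fn() `runs` times and return the fastest wall-clock time in
// milliseconds. The minimum is less sensitive to scheduler and cache
// noise than a single measurement. Returns +infinity if runs <= 0.
template <typename F>
double best_of_ms(F&& fn, int runs)
{
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < runs; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::milli> elapsed = t1 - t0;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

Each variant would then be timed as, say, best_of_ms([&] { /* run one mixdown pass */ }, 10), and only the minima compared.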

The results above were measured with the ASM slightly unrolled to match GCC's -funroll-loops option:

--8<-----------------------------------------
x86_sse_mix_buffers_with_gain --
...

.MBWG_SSE:

        cmp $8, %ecx #; we know it's not zero, but if it's not >=8, then
        jnge .MBWG_NONALIGN #; we jump straight to the "normal" code

        #; copy gain to fill %xmm1
        movss 20(%ebp), %xmm1
        shufps $0x00, %xmm1, %xmm1

.MBWG_SSELOOP:

        movaps (%esi), %xmm0 #; source => xmm0
        addl $16, %esi #; src+=4
        movaps (%esi), %xmm2 #; ++source => xmm2
        addl $16, %esi #; src+=4

        mulps %xmm1, %xmm0 #; apply gain to source
        mulps %xmm1, %xmm2 #; apply gain to source

        addps (%edi), %xmm0 #; mix with destination
        movaps %xmm0, (%edi) #; copy result to destination
        addl $16, %edi #; dst+=4

        addps (%edi), %xmm2 #; mix with destination
        movaps %xmm2, (%edi) #; copy result to destination
        addl $16, %edi #; dst+=4

        subl $8, %ecx #; nframes-=8

        cmp $8, %ecx
        jge .MBWG_SSELOOP

        cmp $0, %ecx
        je .MBWG_END

        #; if there are remaining frames, the nonalign code will do nicely
        #; for the rest 1-7 frames.
...
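
For comparison, the same two-vectors-per-iteration loop can be sketched with GCC vector extensions. This is not the code that was benchmarked above (that source is not quoted in this message). The function name mix_buffers_with_gain_vec is made up for illustration, and memcpy is used for loads and stores so the sketch does not depend on 16-byte alignment, whereas the ASM above uses movaps and therefore does.

```cpp
#include <cstddef>
#include <cstring>

// GCC/Clang vector extension: four packed floats in 16 bytes.
typedef float v4sf __attribute__((vector_size(16)));

// Hypothetical name; computes dst[i] += src[i] * gain for nframes floats.
void mix_buffers_with_gain_vec(float* dst, const float* src,
                               std::size_t nframes, float gain)
{
    // Broadcast gain into all four lanes (the movss + shufps $0x00 step).
    const v4sf g = {gain, gain, gain, gain};

    std::size_t i = 0;
    // Main loop: 8 frames per iteration, as in the unrolled ASM.
    for (; i + 8 <= nframes; i += 8) {
        v4sf s0, s1, d0, d1;
        // memcpy keeps the loads/stores alignment-agnostic (assumption:
        // the compiler turns these into plain vector moves).
        std::memcpy(&s0, src + i,     sizeof s0);
        std::memcpy(&s1, src + i + 4, sizeof s1);
        std::memcpy(&d0, dst + i,     sizeof d0);
        std::memcpy(&d1, dst + i + 4, sizeof d1);

        d0 += s0 * g;   // apply gain, mix with destination
        d1 += s1 * g;

        std::memcpy(dst + i,     &d0, sizeof d0);
        std::memcpy(dst + i + 4, &d1, sizeof d1);
    }

    // Scalar tail for the remaining 0-7 frames.
    for (; i < nframes; ++i)
        dst[i] += src[i] * gain;
}
```

Whether the compiler emits aligned or unaligned moves for this depends on the GCC version and flags, which may be one more source of difference between the two variants.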

_______________________________________________
Linux-audio-dev mailing list
Linux-audio-dev@email-addr-hidden
http://lists.linuxaudio.org/mailman/listinfo/linux-audio-dev