Re: [LAD] vectorization

From: Jens M Andreasen <jens.andreasen@email-addr-hidden>
Date: Wed Apr 16 2008 - 03:10:20 EEST

On Tue, 2008-04-15 at 19:45 +0200, Christian Schoenebeck wrote:
> Yeah, I'm respawning this topic ...
>

There is something funny with this benchmark. If we compare your
numbers:

> Benchmarking mixdown (WITH coeff):
> pure C++ : 890 ms
> ASM SSE : 300 ms
> GCC vector extensions : 230 ms
>

.. to mine (on a 1.1 GHz Celeron):

  Benchmarking mixdown (WITH coeff):
  pure C++ : 390 ms
  ASM SSE : 170 ms
  GCC vector extensions : 140 ms

.. there is definitely a similar pattern showing up, BUT the loops
appear to interfere with each other, as you can see when I comment out
everything but the ASM:

  Benchmarking mixdown (WITH coeff):
  ASM SSE : 160 ms <-- faster?

.. or leave in C++ as well:

  Benchmarking mixdown (WITH coeff):
  pure C++ : 400 ms <-- slower?
  ASM SSE : 170 ms

.. or take out only the ASM:

  Benchmarking mixdown (WITH coeff):
  pure C++ : 380 ms <-- faster?
  GCC vector extensions : 160 ms <-- slower?

Methinks it is very difficult to predict what -O3 will or will not do.
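
One way to reduce this kind of cross-talk is to time each variant through
its own non-inlined function and a function pointer, so the optimizer
cannot merge or reorder the loops across variants. The sketch below is
only an illustration and not the benchmark from this thread:
mix_with_gain_scalar, MixFn and time_variant_ms are hypothetical names,
the buffer contents and gain are arbitrary, and the noinline attribute
assumes GCC or Clang.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one benchmarked variant (not the thread's code).
// noinline (GCC/Clang attribute) keeps -O3 from folding it into the caller.
__attribute__((noinline))
void mix_with_gain_scalar(float* dst, const float* src,
                          std::size_t nframes, float gain) {
    for (std::size_t i = 0; i < nframes; ++i)
        dst[i] += src[i] * gain;
}

// Every variant is called through the same pointer type.
using MixFn = void (*)(float*, const float*, std::size_t, float);

// Run one variant `runs` times on its own fresh buffers and return the
// elapsed wall-clock time in milliseconds.
double time_variant_ms(MixFn fn, std::size_t nframes, int runs) {
    std::vector<float> src(nframes, 0.5f);
    std::vector<float> dst(nframes, 0.0f);

    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < runs; ++r)
        fn(dst.data(), src.data(), nframes, 0.25f);
    const auto t1 = std::chrono::steady_clock::now();

    // Read one result through a volatile so the work stays observable.
    volatile float sink = dst.empty() ? 0.0f : dst[0];
    (void)sink;

    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling time_variant_ms once per variant, ideally from separate runs of
the program, gives numbers that depend less on what else happens to be
compiled into the same function.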

mvh // Jens M Andreasen
g++ (GCC) 4.2.2 20071128 (prerelease) (4.2.2-3.1mdv2008.0)

BTW: I slightly reordered the instructions in
x86_sse_mix_buffers_with_gain for speed (the src increment, marked
below, now comes right after the load):

.MBWG_SSELOOP:

        movaps (%esi), %xmm0 #; source => xmm0
        addl $16, %esi #; src+=4 //////////////
        mulps %xmm1, %xmm0 #; apply gain to source
        addps (%edi), %xmm0 #; mix with destination
        movaps %xmm0, (%edi) #; copy result to destination

        subl $4, %ecx #; nframes-=4
        addl $16, %edi #; dst+=4
        
        cmp $4, %ecx
        jge .MBWG_SSELOOP
[...]
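
For readers who do not read AT&T syntax, the loop above computes
dst[i] += src[i] * gain, four frames per pass. A rough C++ equivalent
could look like the sketch below. The function name and the scalar tail
loop are assumptions for illustration; the code elided with [...] above
may handle leftover frames and alignment differently.

```cpp
#include <cstddef>

// Hypothetical C++ counterpart of the SSE loop: mix src into dst with gain.
void mix_buffers_with_gain(float* dst, const float* src,
                           std::size_t nframes, float gain) {
    std::size_t i = 0;

    // Main loop: four frames per pass, like one trip through .MBWG_SSELOOP.
    for (; i + 4 <= nframes; i += 4) {
        for (std::size_t k = 0; k < 4; ++k) {
            // mulps: apply gain to source; addps: mix with destination;
            // movaps: store the result back to the destination.
            dst[i + k] = dst[i + k] + src[i + k] * gain;
        }
    }

    // Assumed tail handling for the last nframes % 4 frames.
    for (; i < nframes; ++i)
        dst[i] = dst[i] + src[i] * gain;
}
```

With -O3, GCC may or may not turn the inner loop into the same mulps and
addps sequence, which is exactly the unpredictability discussed above.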

_______________________________________________
Linux-audio-dev mailing list
Linux-audio-dev@email-addr-hidden
http://lists.linuxaudio.org/mailman/listinfo/linux-audio-dev
Received on Wed Apr 16 04:15:02 2008
