[linux-audio-dev] real-life 3dnow! results


Subject: [linux-audio-dev] real-life 3dnow! results
From: est_AT_hyperreal.org
Date: Sat Aug 28 1999 - 00:40:55 EDT


How well does 3dnow measure up to its theoretical speedups?

Here's a real-life measure of application impact.

I took one of the loops I use in oolaboola to convert float data to
little-endian 16-bit data and wrote a 3dnow function to do the same.
Neither loop does clipping.

The little-endian-specific C++ loop (compiled with g++ -O2
-funroll-loops -ffast-math) was taking 12-13.5% of the run time in some
profiling runs I was doing on my k6-2/350. The 3dnow version takes
1.12-1.9%.
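
A rough way to turn those profile shares into a speedup figure for the
loop itself: a loop that takes share p of a run costs p/(1-p) relative
to everything else. The sketch below assumes the rest of the program
cost the same in both sets of runs, which the numbers above do not
establish. The helper names are made up for illustration.

```cpp
#include <cassert>

// Cost of the loop relative to the rest of the run, given the loop's
// share of total profile time (0 <= share < 1).
double relative_cost(double share) {
    assert(share >= 0.0 && share < 1.0);
    return share / (1.0 - share);
}

// Estimated speedup of the loop alone, ASSUMING the non-loop work took
// the same time in the "before" and "after" profiling runs.
double estimated_speedup(double share_before, double share_after) {
    return relative_cost(share_before) / relative_cost(share_after);
}
```

Under that assumption, estimated_speedup(0.12, 0.019) comes out at
roughly 7 and estimated_speedup(0.135, 0.0112) at nearly 14.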

This may be a commentary on gcc's floating point abilities. However,
gcc is a tool most of us use. I think it may also be a commentary on
how horrible the regular x86 fp architecture is.

The fact that 3dnow is fun enough to code that this lisper wrote x86
assembly for the first time in his life this month is also an
important datum. :)

The C++ loop is as follows:

  for (size_t i = 0; i < nfloats; i++)
    *out16++ = static_cast<int16_t>(*in++ * 32766);
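
For anyone who wants to try the loop outside oolaboola, here is a
minimal self-contained wrapper around it. The function name
convert_scalar and its signature are assumptions for illustration, not
oolaboola's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scale floats in -1.0..1.0 to 16-bit samples. No clipping, as in the
// loop above. The cast truncates toward zero. Storing through an
// int16_t pointer gives little-endian bytes only on a little-endian
// host such as x86.
void convert_scalar(const float *in, std::size_t nfloats,
                    std::int16_t *out16) {
    assert(in != nullptr && out16 != nullptr);
    for (std::size_t i = 0; i < nfloats; i++)
        *out16++ = static_cast<std::int16_t>(*in++ * 32766);
}
```

A quick check is to feed it 0.0, 1.0, -1.0 and 0.5 and look for 0,
32766, -32766 and 16383 in the output buffer.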

I've appended the 3dnow routine. It took under half an hour to get
into its present shape (I'd put 3dnow down for a couple of weeks). :)

Note that the routine hasn't had any loop-interleaving or instruction
scheduling done on it. That would probably speed it up. Also note
that the same approach will work well for conversion to 24- and 32-bit
quantities.
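
Since pf2id already produces 32-bit integers, wider output mostly
changes the scale factor and the store. As a scalar sketch of the
24-bit case: the scale 8388607 (2^23 - 1), the packed 3-byte layout and
the name convert_le24 are all assumptions here, not anything taken from
oolaboola.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Convert floats in -1.0..1.0 to packed little-endian 24-bit samples
// (3 bytes each, least significant byte first). No clipping.
void convert_le24(const float *in, std::size_t nfloats,
                  std::uint8_t *out) {
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = 0; i < nfloats; i++) {
        // 8388607 = 2^23 - 1 is exactly representable as a float.
        std::int32_t s = static_cast<std::int32_t>(in[i] * 8388607.0f);
        // Reinterpret as unsigned to get the two's complement bits.
        std::uint32_t u = static_cast<std::uint32_t>(s);
        *out++ = static_cast<std::uint8_t>(u & 0xff);
        *out++ = static_cast<std::uint8_t>((u >> 8) & 0xff);
        *out++ = static_cast<std::uint8_t>((u >> 16) & 0xff);
    }
}
```

Writing the bytes out one at a time keeps the output little-endian
regardless of the host byte order.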

Clipping should be easy to add since 3dnow has parallel max and min
operations. :)
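
In scalar C++ the clipping step would look something like the sketch
below: a max followed by a min ahead of the scale, which is the shape a
pfmax/pfmin pair would take in the 3dnow routine. The function name and
the choice to clamp to -1.0..1.0 before scaling are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Clamp each input to -1.0..1.0, then scale to a 16-bit sample.
// NaN input is not handled.
void convert_clipped(const float *in, std::size_t nfloats,
                     std::int16_t *out16) {
    assert(in != nullptr && out16 != nullptr);
    for (std::size_t i = 0; i < nfloats; i++) {
        // max then min: the scalar counterpart of pfmax then pfmin
        float x = std::min(std::max(in[i], -1.0f), 1.0f);
        out16[i] = static_cast<std::int16_t>(x * 32766);
    }
}
```

Out-of-range input such as 2.0 or -3.0 then lands on 32766 or -32766
instead of wrapping around.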

I'd like experienced x86 asm hackers to help me with two things: 1)
Improve the existing code and 2) Provide me with a pure x86 fp
equivalent to test it against.

Once I find/write an autoconf test for 3dnow, things like this could
perhaps go into Erik's library. :)
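
Whatever form the autoconf test takes, it will probably end up running
a CPUID query along these lines. This is only a sketch: it assumes GCC
or Clang on x86 and their <cpuid.h> helper, and the function names are
made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

// AMD reports 3DNow! in bit 31 of EDX for CPUID function 0x80000001.
bool edx_has_3dnow(std::uint32_t edx) {
    return ((edx >> 31) & 1u) != 0;
}

// Returns false when the extended CPUID function is unavailable or
// when the code is not built for x86.
bool cpu_has_3dnow() {
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_has_3dnow(edx);
#else
    return false;
#endif
}
```

A configure-time check could compile and run a small program that
returns success only when cpu_has_3dnow() is true, though a check on
the build machine says nothing about the machine the binary ends up on.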

Digital filters now take over half oola's cycles and seem like a
worthy target for the future.

Eric

.bss
.data
        .align 4
fint16: .single 32766,32766

.text
        .align 32
# this converts 4*n floats (in range -1.0..1.0) to le16 values;
# each pass through the loop handles two pairs of floats
# extern "C" void convert2_3dnow(const float *in, size_t n, int16_t *out);
.globl convert2_3dnow
convert2_3dnow:
        pushl %ebp
        pushl %eax
        pushl %ecx
        pushl %ebx

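        # four pushes (16 bytes) plus the return address put the
        # first argument at 20(%esp)
        movl 20(%esp),%eax      # in
        movl 24(%esp),%ecx      # n (loop count)
        movl 28(%esp),%ebp      # out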

        # grab some scratch space
        subl $8,%esp

        femms

        movq fint16,%mm2
myloop:
        movq (%eax),%mm0        # load two floats
        pfmul %mm2,%mm0         # scale both by 32766
        pf2id %mm0,%mm1         # convert to two 32-bit ints (truncating)
        movq %mm1,(%esp)        # spill to scratch space
        movw (%esp),%bx         # low 16 bits of first int
        movw %bx,(%ebp)
        movw 4(%esp),%bx        # low 16 bits of second int
        movw %bx,2(%ebp)
        addl $4,%ebp
        addl $8,%eax

        # for now, just do this once more
        # later, interleave and schedule
        movq (%eax),%mm0
        pfmul %mm2,%mm0
        pf2id %mm0,%mm1
        movq %mm1,(%esp)
        movw (%esp),%bx
        movw %bx,(%ebp)
        movw 4(%esp),%bx
        movw %bx,2(%ebp)
        addl $4,%ebp
        addl $8,%eax

        loop myloop             # decrements %ecx; n must be nonzero

        femms

        addl $8,%esp

        popl %ebx
        popl %ecx
        popl %eax
        popl %ebp

        ret



This archive was generated by hypermail 2b28 : Fri Mar 10 2000 - 07:25:53 EST