Re: [linux-audio-dev] getting paranoid: speed diff of using & instead of % in ALSA

Subject: Re: [linux-audio-dev] getting paranoid: speed diff of using & instead of % in ALSA
From: David Olofson (david_AT_gardena.net)
Date: Sat Mar 11 2000 - 21:15:11 EST


On Fri, 10 Mar 2000, Benno Senoner wrote:
> Hi folks,
>
> I measured how many u += a % b operations my Celeron 366
> is able to perform in one sec.
>
> (an example of the % operation would be to use it in boundary clipping
> code)
>
> about 9 million ops/sec, disappointing
> (it uses the idivl instruction).

Ouch... Integer division is one of the slowest instructions in the
set on most architectures. :-( Whenever possible, use multiplications
instead, since they're a lot faster. (A multiplier is easier to
implement in hardware, and most current processors do it with a
partial or full multiplier array.)

> when I use & (and) instead of %, the speed is boosted
> to 150 million ops/sec!, roughly a factor of 16!
>
> But to use & instead of %, b in the
> a % b operation needs to be a power of 2.

You *can* use this good old trick together with a multiplication...
:-)

To get back to the original range, you need to do another
multiplication, but as the div is so incredibly slow, this may
*still* be faster on some CPUs. I'm not sure about the more recent
Intel cores...
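
One way to read the multiplication idea above is the classic "multiply by
a precomputed reciprocal" scheme. The sketch below is only an illustration
of that reading, not necessarily the exact trick meant here: the names
mod_ctx, mod_ctx_init and mod_by_mul are hypothetical, and it assumes
32-bit unsigned operands with a divisor that changes rarely (so the one
real division in the setup is paid once). The first multiplication
estimates the quotient, the second one gets back to the original range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a divisor plus its precomputed reciprocal. */
typedef struct {
    uint32_t b;      /* the divisor (e.g. a buffer size), must be nonzero */
    uint64_t recip;  /* floor(2^32 / b), computed once with a real division */
} mod_ctx;

/* Setup: the only place where an actual division is executed. */
static mod_ctx mod_ctx_init(uint32_t b)
{
    assert(b != 0);
    mod_ctx ctx = { b, (UINT64_C(1) << 32) / b };
    return ctx;
}

/* Computes a % b with two multiplications, a shift and one compare. */
static uint32_t mod_by_mul(uint32_t a, const mod_ctx *ctx)
{
    /* First multiplication: quotient estimate, either a/b or a/b - 1. */
    uint64_t q = ((uint64_t)a * ctx->recip) >> 32;

    /* Second multiplication: back to the original range, 0 <= r < 2*b. */
    uint64_t r = (uint64_t)a - q * ctx->b;

    /* Correct the case where the estimate was one too small. */
    if (r >= ctx->b)
        r -= ctx->b;
    return (uint32_t)r;
}
```

Calling mod_ctx_init() once per divisor and mod_by_mul() in the inner loop
keeps idivl out of the hot path. Whether that actually beats a plain % is
something to measure on the CPU in question.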

As for the optimization: when the divisor is a constant power of
two, most compilers optimize this nicely on their own. If variables
are involved, you still have to convert the fragment size into a bit
mask yourself. I've seen some kernel drivers that do this, but as you
say it's not really worth the effort...
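
For the variable case, converting a size into a bit mask could look
roughly like the sketch below. This is a minimal illustration, not code
from ALSA or any driver; is_power_of_two, size_to_mask and wrap_index are
hypothetical names, and it assumes unsigned 32-bit positions and a size
that really is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* A power of two has exactly one bit set, so n & (n - 1) clears it to 0. */
static int is_power_of_two(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Done once, when the size is configured: 1024 -> 0x3ff, and so on. */
static uint32_t size_to_mask(uint32_t size)
{
    assert(is_power_of_two(size));
    return size - 1;
}

/* Done per sample/fragment: same result as pos % size, without a div. */
static uint32_t wrap_index(uint32_t pos, uint32_t mask)
{
    return pos & mask;
}
```

The mask is computed once at setup time, so the inner loop only sees the
cheap &. For sizes that are not a power of two, this approach does not
apply and % (or a compare-and-subtract) is still needed.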

> PS: getting too paranoid sometimes , eh ? :-)

You can never be too paranoid with drivers and system code in
general purpose OSes... When looking for this kind of thing, you may
find performance killers that no one ever thought of, and that *will*
eventually kill the performance of a real life application. This is
a small amount of code with many users, so careful optimization pays.

Just look at the Linux kernel - the lowlatency patches wouldn't have
been nearly as effective if the kernel weren't already very well
optimized for speed. Kernels with "sloppy" code *have* to be
preemptive to get the same kind of performance - but then they just
get even slower in the average case...

//David

.- M u C o S --------------------------------. .- David Olofson ------.
| A Free/Open Multimedia | | Audio Hacker |
| Plugin and Integration Standard | | Linux Advocate |
`------------> http://www.linuxdj.com/mucos -' | Open Source Advocate |
.- A u d i a l i t y ------------------------. | Singer |
| Rock Solid Low Latency Signal Processing | | Songwriter |
`---> http://www.angelfire.com/or/audiality -' `-> david_AT_linuxdj.com -'


This archive was generated by hypermail 2b28 : Sun Mar 12 2000 - 09:14:06 EST