Re: [linux-audio-dev] optimization / restrictions


Subject: Re: [linux-audio-dev] optimization / restrictions
From: D. Stimits (stimits_AT_idcomm.com)
Date: Fri May 18 2001 - 10:21:29 EEST


n++k wrote:
>
> [Steve Harris <S.W.Harris_AT_ecs.soton.ac.uk>]
> |
> | Known nominal 0 dB value (e.g. 1.0f)
> | Minimum sample rate (e.g. 44.1 kHz)
> | Guaranteed 2^n block size
>
> Somewhere (in a diskmag) I read Tammo Hinrichs (writing about
> his softsynth system)
>
> "A bad idea, however, is to make your buffer sizes a power of two.
> The times when ASM coders used AND operations to mask out the buffer
> offsets are over. Those one or two cycles for a compare operation
> don't hurt. So, there's no reason for using power-of-two buffer sizes
> except you may be used to it. And in fact, it's even better if
> you don't. I won't go into too much detail here, but if you know
> how a cache tag RAM works, you might realize that the CPU can
> manage the cache better if the buffers start at "weird" addresses,
> especially if you use multiple buffers at a time (eg. in the
> same loop). Just make the buffer addresses a multiple of 32,
> don't make their sizes a power of two (or leave some space
> between the buffers, even one dword is enough) and you're set."
>
> I personally wondered what he meant here about the cache operation
> of the CPU. Does anybody know more about that and/or care to comment?
>
> --
> n
> ++k

On current x86 CPUs a cache line is 4 times the width of the data bus,
which works out to 32 bytes (something I recently discovered can make an
enormous difference in speed when used right). Invalidating any byte in
a line invalidates the whole line, so if two unrelated values share the
same 32 bytes, the cached copies of both are invalidated whenever either
one is. If your buffer starts on a 32-byte boundary and its size is an
exact multiple of 32, your data and no other data will occupy whole
lines, so only a change to your own data can invalidate them. I suspect
he didn't mean that a power of two is bad in itself, just that a
power-of-two size that is not a multiple of 32 leaves you open to other
data invalidating your cache lines. On SMP it gets worse, since all the
CPUs snoop each other's caches to stay synchronized when they share
data. The actual cache line (or cache block) size will be different on
architectures other than x86.
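
A minimal sketch of the alignment idea, in C. It assumes the 32-byte
line size discussed above and a POSIX system that provides
posix_memalign(); the names CACHE_LINE, round_up_to_line() and
alloc_line_aligned() are made up for illustration and do not come from
any existing API.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 32-byte cache lines, as in this post.
 * Other CPUs use other sizes, so treat this as a tunable. */
#define CACHE_LINE 32

/* Round a byte count up to the next multiple of CACHE_LINE,
 * so the buffer always ends on a line boundary. */
static size_t round_up_to_line(size_t bytes)
{
    return (bytes + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
}

/* Allocate a float buffer that starts on a cache-line boundary and
 * occupies whole lines only, so no unrelated data shares its lines.
 * Returns NULL on failure; release the buffer with free(). */
static float *alloc_line_aligned(size_t nframes)
{
    void *p = NULL;
    size_t bytes = round_up_to_line(nframes * sizeof(float));

    if (posix_memalign(&p, CACHE_LINE, bytes) != 0)
        return NULL;

    /* Sanity check: the start address is a multiple of the line size. */
    assert((uintptr_t)p % CACHE_LINE == 0);
    return (float *)p;
}
```

With 4-byte floats, a 100-frame buffer needs 400 bytes, which this
rounds up to 416 (13 lines of 32 bytes), so the last line is not shared
with whatever the allocator places next.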

D. Stimits, stimits_AT_idcomm.com



This archive was generated by hypermail 2b28 : Fri May 18 2001 - 10:48:53 EEST