On Mon, May 11, 2009 at 9:58 AM, Jens M Andreasen
<jens.andreasen@email-addr-hidden> wrote:
>
> On Mon, 2009-05-11 at 08:53 -0400, Paul Davis wrote:
>
>> However, notice that far more important from a performance perspective
>> is that power-of-2 buffer sizes permit buffers to be cache line
>> aligned, which as far as we know (its never been carefully measured)
>> greatly outweighs the kinds of concerns you are mentioning.
>
> a cache aligned buffer of 96 would easily fit in the same space as one
> of 128, only with lower latency.
I think you're missing the forest for the trees.
1) the question is not how to fit a single set of N samples into cache
memory. It's how to fit *all* the samples to be processed in a given
"cycle" into cache memory. Wasting 25% of cache memory on each buffer
isn't conducive to this.
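The 25% figure follows directly from the sizes under discussion. A minimal
sketch (assuming 32-bit float samples and, hypothetically, 64 buffers live
per cycle; neither number is from the thread) of the footprint cost of
padding each 96-sample buffer out to a 128-sample power-of-2 slot:

```python
SAMPLE_BYTES = 4   # float32 samples (assumption)

def footprint(n_buffers, samples_per_buffer, slot_samples=None):
    """Total bytes that must stay hot per cycle.

    If slot_samples is given, each buffer is padded out to that
    slot size (e.g. rounded up to the next power of two)."""
    slot = slot_samples if slot_samples is not None else samples_per_buffer
    return n_buffers * slot * SAMPLE_BYTES

packed = footprint(64, 96)        # 96-sample buffers, tightly packed
padded = footprint(64, 96, 128)   # same buffers in 128-sample slots
waste = 1 - packed / padded       # fraction of cache footprint wasted

print(packed, padded, waste)      # 24576 32768 0.25
```

So every 96-sample buffer forced into a 128-sample slot gives up a quarter
of its slot, and that waste scales with the number of buffers in flight.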
2) CUDA and the rest are very, very powerful processing engines. But
they continue (for now) to be targeted towards tasks where
parallelism, not low latency, is considered most important. You
simply can't get the data to and from the GPU fast enough for use in,
for example, live FX. This may change, and when it does it will be
quite an interesting development. But it has not changed yet, and I've
yet to see Nvidia or Intel talk about this goal in any way. They have
a *huge* number of use cases that can benefit from using a GPU already
(and the potential is really quite substantial, I think) - but there
is no incentive on their part to focus on a very niche case that wants
the processing power *and* wants very low latency.
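To put numbers on why the transfer latency matters: at a given sample
rate, one buffer period is the entire time budget for the host-to-GPU
copy, the kernel, and the copy back. A rough sketch (assuming a 48 kHz
sample rate, which the thread doesn't specify):

```python
RATE = 48000  # Hz (assumed; not stated in the thread)

def period_ms(frames, rate=RATE):
    """Length of one processing cycle, in milliseconds."""
    return frames / rate * 1000

for frames in (64, 96, 128):
    print(f"{frames:4d} frames -> {period_ms(frames):.2f} ms per cycle")
```

A 96-frame buffer at 48 kHz gives a 2 ms cycle; the full round trip to
the GPU and back has to fit inside that, alongside everything else the
cycle does, for live FX to be feasible.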
_______________________________________________
Linux-audio-dev mailing list
Linux-audio-dev@email-addr-hidden
http://lists.linuxaudio.org/mailman/listinfo/linux-audio-dev
Received on Mon May 11 20:15:02 2009
This archive was generated by hypermail 2.1.8 : Mon May 11 2009 - 20:15:02 EEST