Re: [LAD] FLTK vs GTKmm

From: Jens M Andreasen <jens.andreasen@email-addr-hidden>
Date: Tue Aug 11 2009 - 21:56:56 EEST

On Tue, 2009-08-11 at 19:54 +0200, Fons Adriaensen wrote:
> On Tue, Aug 11, 2009 at 06:50:50PM +0200, Jens M Andreasen wrote:
>
> > That would be four warps
> > independently working their way through the variously sized sample
> > blocks, each thread executing serial code that looks very much the same
> > as jconv itself, including the threading.
>
> Note that the algorithm implemented by libzita-convolver (used by
> jconv) when used in real-time mode relies on regular scheduling
> (i.e. being called from a Jack process callback) and carefully
> set thread priorities.
>

The priorities are always even .. and then again not necessarily.

Say warp A (or "process" A) must do four smaller workloads while warp B
is doing one bigger workload? The way to go would then be for warp B to
call __syncthreads() when 25% of its work is done, thus ensuring that
warp A will be given all of the GPU until it has caught up at the end of
its first workload and also calls __syncthreads(), which gives warp B the
green light to continue. This is under the assumption that warp A hasn't
already done its part and is waiting for B to catch up.

Repeat the procedure at 50% and 75%.
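
A rough host-side sketch of that lockstep, not actual GPU code: two Python
threads stand in for warps A and B, and threading.Barrier stands in for
__syncthreads(). The names run_lockstep and worker are made up for
illustration. On the GPU, __syncthreads() is a barrier for all threads of one
thread block, so the scheme assumes both warps live in the same block and that
every thread reaches each barrier.

```python
import threading


def run_lockstep(num_phases=4):
    """Two workers meet at a barrier after each phase.

    Worker "A" finishes one small workload per phase; worker "B" finishes
    one quarter of its big workload per phase. Returns the completion log.
    """
    barrier = threading.Barrier(2)  # analogy for __syncthreads() with two warps
    log = []                        # (worker, phase) in completion order
    lock = threading.Lock()

    def worker(name):
        for phase in range(num_phases):
            # Placeholder for the real work done in this phase.
            with lock:
                log.append((name, phase))
            # Whoever arrives first waits here until the other catches up.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Whichever worker finishes a phase first simply blocks, so neither can run
more than one phase ahead of the other.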

> How to structure a convolution engine to run on a graphics
> processor would very much depend on where you want the I/O.

Locally on the card, for use by other parts of the complex, unless a
routing directive has them read from or written to those arrays that are
transferred back and forth between the GPU and host at each kernel launch.

> > How much jconv would something like a 300Mhz Pentium Pro buy me? (Just
> > to get a hunch if this would be a possibility at all)
>
> Almost impossible to tell without trying. It also depends
> in a very complex way on the configuration - the ratios
> will not be the same on all machines.
>

I found a measurement of ~1 sec for a 128K FFT on a PPro @ 200 MHz.
Would that be helpful for a guesstimate?

The thing is also that, although the first thing one might come to think
of is a nice convolution reverb with a decay of two seconds, having
instead 32 shorter impulses - all different - opens up another universe.
You could have an increasing delay in front of each of them, giving an
illusion that they are all parts of the same (huge) impulse response, or
you could use keyboard triggers and routing to play them like an
instrument.
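
To illustrate the delay idea with a minimal sketch (plain Python, naive
time-domain convolution, nothing like a real-time engine): if each short
impulse is delayed by the combined length of the ones before it, the summed
outputs equal one convolution with the concatenated impulse. The function
names convolve and delayed_sum and the toy data are assumptions for the
example only.

```python
def convolve(x, h):
    """Naive direct-form convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def delayed_sum(x, impulses):
    """Convolve x with each short impulse and sum the results, delaying
    each one by the combined length of the impulses before it."""
    total = sum(len(h) for h in impulses)
    out = [0.0] * (len(x) + total - 1)
    delay = 0
    for h in impulses:
        for n, v in enumerate(convolve(x, h)):
            out[delay + n] += v
        delay += len(h)  # increasing delay in front of the next impulse
    return out


# Toy data: three short impulses, all different.
signal = [1.0, 2.0, 3.0]
short_impulses = [[1.0, 0.5], [0.25], [2.0, 0.0, 1.0]]
huge_impulse = [s for h in short_impulses for s in h]  # concatenation
```

With these values, delayed_sum(signal, short_impulses) and
convolve(signal, huge_impulse) give the same samples.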

Still, 500ms would be really very useful and 32 convolutions is mmm ..
perhaps a little overkill. There might be ways for two or four threads
to share one load. IIRC library routines for SSE-enabled FFT exist
which could be more or less copied verbatim across four adjacent
threads.

Received on Wed Aug 12 00:15:05 2009