Re: [linux-audio-dev] Realtime convolution - a threading problem?

From: Florian Schmidt <mista.tapas@email-addr-hidden>
Date: Thu Apr 07 2005 - 02:33:01 EEST

On Wed, 06 Apr 2005 18:34:28 +0200
"Sune Mai" <sunemai@email-addr-hidden> wrote:

[snip

> 2: Another approach is to split the FIR in blocks of different sizes, as can
> be seen in the bottom figure of this page:
> http://www.music.miami.edu/programs/mue/Research/jvandekieft/jvchapter2.htm
> This approach has an obvious advantage in that it allows "zero" latency,
> while still retaining some of the computational efficiency of the
> high-latency FFT based method. As far as I can see, the complexity of this
> algorithm scales as (ln_2(N))^2 for zero-latency. This is not as good as the
> high latency efficiency of ln_2(N) - but still far better than using method
> 1. with low latency.
>
> As far as I know, the programs I've come across so far (most notably
> BruteFIR and Jack_convolve) uses method 1. to get low latency. This leads to
> my first question:
> Has anybody implemented method 2. in a jack application, or, more generally,
> under GPL?. If not, are there any reasons that I'm not aware of, why one
> should not choose algorithm 2 instead of 1?

I _heard_ that in the US the algorithm 2 is patented. I haven't really
looked into it, since my time is limited ATM. If it is not, i might have
a shot at it at some unspecified time in the future.

> The only difficulty I can see with method 2 is the scheduling of the
> different sub tasks. One way to get smooth processor load, is to do the FFT
> of the different block-sizes in different threads. This involves that the
> computation of the smallest blocks (triggered by the jack callback
> mechanism), should somehow preemt the thread doing the double sized blocks -
> and this thread should in turn preemt the thread doing the quadruple size
> FFT-blocks etc. According to the FFTW documentation the FFT algorithm is
> thread-safe, and can be used in this way.

Since the convolution is simply multiplication in the frequency domain,
it should be possible to break up the computation for the bigger chunks
into smaller pieces (except for the FFT (and IFFT) of the input and
the result). This would eliminate the need for using threads. But since
the FFT's aren't the bottlenecks in the partitioned convolution (the
memory access is), this might still work. I'm not sure though that
there's much to gain though (maybe i haven't quite grokked Algorithm 2
yet. will need to do some more thinking)

Other possibly worthy ways of reducing the cpu load would be different
ways of "tail extensions". The two techniques i can see there are

- Using convolution only for the "early reflections" and using classic
dsp based reverb tails (i don't know much about how these work though)..

- Making use of the fact the tail is typically not having much variation
except for exponentially dropping volume. Thus one could use a
relatively short (and relatively early) chunk of the tail over and over
just fitted to the right volume with a gain factor.. This would not
reduce the number of multiplications, but it would help with memory
access..

I'm not sure, if the latter method can be automated and hidden behind a
simple UI (tough to get any simpler than jack_convolve's current
Interface :)) or if it would require heavy parameter tweaking by the end
users..

Flo

-- 
Palimm Palimm!
http://affenbande.org/~tapas/
Received on Thu Apr 7 04:15:09 2005

This archive was generated by hypermail 2.1.8 : Thu Apr 07 2005 - 04:15:10 EEST