[linux-audio-dev] Realtime convolution - a threading problem?

From: Sune Mai <sunemai@email-addr-hidden>
Date: Wed Apr 06 2005 - 19:34:28 EEST

Hello! I just joined the LAD-list to get some feedback on the following
topic:

Convolution is really nice - I use it all the time to get perfect reverb,
nice guitar amp-sound etc. The only serious drawback is the high
input-output latency of simple FFT based algorithms. I'd like to see if it
is possible to improve on this situation without sacrificing to much
efficiency.

If the impulse response (the FIR) is N samples long, the complexity of a
naive (non FFT) implementation scales as N, i.e. it uses N operations per
sample of input. A simple (single block overlap-add) scales as ln_2(N),
since FFT has complexity N ln_2(N).

If one wants both computational efficiency and low latency, one can do (at
least...?) two things:
1: Split the FIR in smaller blocks. This is fairly simple, but has a serious
drawback: As the latency, L, goes down, the efficiency of the algorithm
approaches the naive non-FFT implementation. So the computational cost
scales as N for "zero" (L<<N) latency.
2: Another approach is to split the FIR in blocks of different sizes, as can
be seen in the bottom figure of this page:
http://www.music.miami.edu/programs/mue/Research/jvandekieft/jvchapter2.htm
This approach has an obvious advantage in that it allows "zero" latency,
while still retaining some of the computational efficiency of the
high-latency FFT based method. As far as I can see, the complexity of this
algorithm scales as (ln_2(N))^2 for zero-latency. This is not as good as the
high latency efficiency of ln_2(N) - but still far better than using method
1. with low latency.

As far as I know, the programs I've come across so far (most notably
BruteFIR and Jack_convolve) uses method 1. to get low latency. This leads to
my first question:
Has anybody implemented method 2. in a jack application, or, more generally,
under GPL?. If not, are there any reasons that I'm not aware of, why one
should not choose algorithm 2 instead of 1?

The only difficulty I can see with method 2 is the scheduling of the
different sub tasks. One way to get smooth processor load, is to do the FFT
of the different block-sizes in different threads. This involves that the
computation of the smallest blocks (triggered by the jack callback
mechanism), should somehow preemt the thread doing the double sized blocks -
and this thread should in turn preemt the thread doing the quadruple size
FFT-blocks etc. According to the FFTW documentation the FFT algorithm is
thread-safe, and can be used in this way.

I don't have much experience with threads - and none of the resources I've
found seems to deal with this kind of realtime problem. Is the above a
stupid way to deal with the scheduling problem? And are there any resources
you know of, that describes this kind of problem in detail?

\Sune Mai

_________________________________________________________________
Is your PC infected? Get a FREE online computer virus scan from McAfee®
Security. http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963
Received on Wed Apr 6 20:15:09 2005

This archive was generated by hypermail 2.1.8 : Wed Apr 06 2005 - 20:15:09 EEST