[linux-audio-dev] Arbitrary bufsizes in plugins requiring power of 2 bufsizes, Was: jack_convolve-0.0.10, libconvolve-0.0.3 released

From: Benno Senoner <sbenno@email-addr-hidden>
Date: Wed Jun 29 2005 - 14:20:31 EEST

My suggestion is to handle buffering in the convolution plugin and
accept any buffer size from the host.

I'd do it without threading to ensure the lowest possible latency.

For example:

assume we run convolution at 512 samples.

use a ringbuffer structure (eg like RingBuffer.h in LinuxSampler).
http://cvs.linuxsampler.org/cgi-bin/viewcvs.cgi/*checkout*/linuxsampler/src/common/RingBuffer.h?rev=1.6&content-type=text/plain

(the process() in the example below is just a pseudo plugin API, input
and output are mono)

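process(float *input, float *output, int numframes) {

  // fast path: host block size matches the convolver, no buffering at all
  // (assumes the host does not mix 512 frame calls with other sizes)
  if(numframes == 512) {
    convolve(input, output, 512);
    return;
  }

  ringbuffer->write(input, numframes);

  // convolve every complete 512 frame block that has accumulated
  while(ringbuffer->read_space() >= 512) {
    ringbuffer->read(temp_buf, 512);
    convolve(temp_buf, temp_out, 512); // does convolution and writes to temp_out
    out_ringbuffer->write(temp_out, 512);
  }

  // out_ringbuffer is a second ringbuffer, pre-filled with 512 frames of
  // silence at init time, so numframes frames are always available here
  // (until the first block has been convolved the host just gets silence)
  out_ringbuffer->read(output, numframes);

}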

This approach has the advantage that if the host supplies the
convolution plugin with 512 frames then
the added latency due to buffering is zero since
if(numframes == 512) then it calls convolve() and returns without
messing with ringbuffers.
 
Otherwise, with the above approach both the number of frames used in the
convolver and number of frames supplied by the host can
be completely arbitrary.
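
To make the buffering idea concrete, here is a minimal self-contained
sketch. It is not libconvolve or LinuxSampler code: BlockAdapter and its
members are hypothetical names, std::deque stands in for a preallocated
ringbuffer (a real RT plugin must not allocate in process()), and the
zero-latency fast path for matching block sizes is left out. The output
queue is pre-filled with one block of silence, which is what allows
arbitrary host sizes without ever running out of output frames.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical adapter (not part of libconvolve): the host may call process()
// with any numframes, convolve() only ever sees blocks of `block` frames.
class BlockAdapter {
public:
    using Convolve = std::function<void(const float*, float*, std::size_t)>;

    BlockAdapter(std::size_t block, Convolve convolve)
        : block_(block), convolve_(std::move(convolve)),
          in_buf_(block), out_buf_(block),
          out_fifo_(block, 0.0f) {}  // pre-fill one block of silence (the latency)

    void process(const float* input, float* output, std::size_t numframes) {
        in_fifo_.insert(in_fifo_.end(), input, input + numframes);

        // Convolve every complete block that has accumulated so far.
        while (in_fifo_.size() >= block_) {
            const auto n = static_cast<std::ptrdiff_t>(block_);
            std::copy(in_fifo_.begin(), in_fifo_.begin() + n, in_buf_.begin());
            in_fifo_.erase(in_fifo_.begin(), in_fifo_.begin() + n);
            convolve_(in_buf_.data(), out_buf_.data(), block_);
            out_fifo_.insert(out_fifo_.end(), out_buf_.begin(), out_buf_.end());
        }

        // Thanks to the pre-fill there are always enough frames queued here.
        assert(out_fifo_.size() >= numframes);
        const auto n = static_cast<std::ptrdiff_t>(numframes);
        std::copy(out_fifo_.begin(), out_fifo_.begin() + n, output);
        out_fifo_.erase(out_fifo_.begin(), out_fifo_.begin() + n);
    }

private:
    std::size_t block_;
    Convolve convolve_;
    std::vector<float> in_buf_, out_buf_;   // scratch blocks handed to convolve()
    std::deque<float> in_fifo_, out_fifo_;  // stand-ins for preallocated ringbuffers
};
```

With an identity "convolution" and a host that sends 300 frames per call,
the output is simply the input delayed by 512 frames, with no gaps.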

Drawbacks of the approach:
for CPU hungry plugins (and convolution IS CPU hungry, especially at
low buffer sizes) CPU spikes can cause xruns and other bad stuff, since
the host runs the process() callbacks in RT mode.
Assume a 512 frames convolution takes 80% of the CPU on a certain
machine.
At 44.1kHz 512 frames = 11.6msec. 80% of 11.6msec = 9.3msec.
If we run the above code in a host environment that uses eg 256 frames
(5.8msec buffers),
the first time process() is called the >=512 condition is not satisfied
and thus a 0 filled buffer is returned (silence).
At the second process() call the >=512 condition is satisfied (there
are exactly 512 frames in the buffer)
and the convolve() function is called, eating 9.3msec of CPU.
Since 9.3msec > 5.8msec ... sh*t happens ... XRUN.

If numframes supplied by the host is a multiple of 512 then there are
no CPU spike problems.
For example if the host supplies 1024 frames, the above code would call
convolve() 2 times outputting 1024 frames (eating 2x9.3msec out of the
23.2msec available).
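
The arithmetic above can be checked with a small helper. This is only a
back-of-the-envelope sketch: estimate_spike and SpikeEstimate are made-up
names, cpu_fraction is assumed to mean "time spent in one convolve() call
divided by the duration of one convolver block", and nothing else running
in the callback is taken into account.

```cpp
#include <cassert>

// Duration of `frames` frames in milliseconds at the given sample rate.
double block_ms(int frames, double sample_rate) {
    assert(frames > 0 && sample_rate > 0.0);
    return 1000.0 * frames / sample_rate;
}

struct SpikeEstimate {
    double host_period_ms;     // time the host gives one process() call
    double worst_callback_ms;  // CPU time of the busiest process() call
    bool xrun_expected;        // true if the busiest call overruns the period
};

// Hypothetical estimate: the busiest callback runs convolve() at most
// ceil(host_frames / conv_frames) times.
SpikeEstimate estimate_spike(int host_frames, int conv_frames,
                             double cpu_fraction, double sample_rate) {
    const int max_blocks = (host_frames + conv_frames - 1) / conv_frames;
    const double one_convolve_ms = cpu_fraction * block_ms(conv_frames, sample_rate);
    SpikeEstimate e;
    e.host_period_ms = block_ms(host_frames, sample_rate);
    e.worst_callback_ms = max_blocks * one_convolve_ms;
    e.xrun_expected = e.worst_callback_ms > e.host_period_ms;
    return e;
}
```

estimate_spike(256, 512, 0.8, 44100.0) reports an expected xrun, while
estimate_spike(1024, 512, 0.8, 44100.0) does not.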
It would be a bit inefficient because if the plugin knows that the host
supplies at least 1024 frames
then you could run the convolution at 1024 achieving greater efficiency.

If the host guarantees that it always supplies the same number of frames
then the convolver could adjust
its internal framesize to achieve optimal CPU usage.

If not then a scheme like the above one is unavoidable.

Just out of curiosity, does anyone know what the current status is of the
variable/fixed buffer size scenarios
supplied to plugins by hosts on various plugin platforms like VST, AU etc ?

The above code does some memory bouncing (only when numframes supplied
by the host does not match
the number of frames used in the convolver):
 it first copies the input to the ringbuffer's own buffer and then back
to temp_buf. So some memory bandwidth is wasted but I think as long as
you don't run hundreds of convolution plugins
(impossible on today's machines) the added overhead is negligible since
convolution is so CPU heavy.

I think with an approach like the above you achieve the best of both
worlds, no added latency if the host calls
the plugin with numframes = power of 2 (matching the internal
convolver's buffer size), and some added latency
if the host does not use powers of 2.

Regarding the CPU spikes, if the convolver uses less than 50% of CPU
then you can run the host with half the convolver's numframes without
getting XRUNs.
eg if the convolver uses 40% CPU at 512 frames, then in a host running
at 256 frames the convolver will
still use an average of 40% CPU but it will experience 80% CPU spikes
(eg 80% 0% 80% 0% etc ...).
This is not so good: if we want to add another plugin that has a
constant CPU usage of 40%,
which would lead to an average 80% CPU usage, we can't because during
the 80% CPU spike we have
only 20% of CPU headroom left.
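
The spike pattern can be reproduced with a toy model. callback_loads is a
hypothetical helper, and it assumes each convolve() call costs
cpu_fraction times the duration of one convolver block; the result is the
load of each host callback as a fraction of the host period.

```cpp
#include <cassert>
#include <vector>

// Per-callback CPU load (1.0 = the whole host period) when a convolver
// working on conv_frames blocks is fed host_frames frames per callback.
std::vector<double> callback_loads(int host_frames, int conv_frames,
                                   double cpu_fraction, int callbacks) {
    assert(host_frames > 0 && conv_frames > 0 && callbacks >= 0);
    std::vector<double> loads;
    int queued = 0;  // frames waiting in the input ringbuffer
    for (int i = 0; i < callbacks; ++i) {
        queued += host_frames;
        const int blocks = queued / conv_frames;  // convolve() calls in this callback
        queued %= conv_frames;
        loads.push_back(blocks * cpu_fraction * conv_frames / host_frames);
    }
    return loads;
}
```

For callback_loads(256, 512, 0.4, 4) the loads alternate between 0 and
0.8 and average out to 0.4, so a second plugin with a constant 40% load
would fit on average but not during the spikes.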

Florian, since we would like to add convolution to LinuxSampler over
time it would be cool if you could add the above
ideas to libconvolve so that one can use the lib without worrying about
supplying the right buffer sizes etc, and
in plugin host environments it would be handy too since we don't always
know what the host will do.

cheers,
Benno
http://www.linuxsampler.org

Florian Schmidt wrote

>
>Or should the plugin do this internally and simply report to the host
>that it needs a fixed buffer size (which then corresponds to the audio
>system's buffer size).. Are dssi/ladspa's allowed to do threading?
>Without i wouldn't know how to do it. And even if it were allowed to do
>threading, how would the dssi know which priorities to use, etc (on a RP
>kernel it should have prio higher than i.e. hd and net irq's, but lower
>than the jack audio thread).
>
>Plus i wonder whether the (then fixed) buffer size should be user
>configurable in any way or would the plugin simply report "16k frames is
>what i want" :) Sometimes it does make sense to use it in realtime mode
>(with the same buffer size as the audio system), if you have the cpu
>power or the responses are short enough.
>
>Regards,
>Flo
>
>
>
Received on Thu Jul 7 16:16:09 2005
