Re: [linux-audio-dev] lock-free data structures

Subject: Re: [linux-audio-dev] lock-free data structures
From: Benno Senoner (sbenno_AT_gardena.net)
Date: Fri Jun 18 2004 - 07:24:04 EEST


Joshua Haberman wrote:

>
> At the moment I am favoring an approach that uses multiple discrete
> buffers and passing pointers to them, rather than pushing data on a
> ring buffer. One major reason is that it allows you to send the data
> to multiple places. Say you want to send a captured buffer to both
> the disk thread (for recording) and the GUI thread (for displaying a
> VU meter). With ring buffers you would have to copy this data to two
> separate ring buffers, one for each thread that will read it. With
> multiple buffers, you can just send the same pointer to both threads.

Yes, you would need to copy buffers, but messing around with slow GUIs
is always a pain, so in that case I'd choose to copy.
Why? Let me give an example.
Assume I want to show an FFT or an oscilloscope that displays incoming
audio data (or data coming from a disk track).
As long as we display only one FFT/oscilloscope, the overhead caused by
the memory copy is low.
Of course, if we have to copy 100 stereo tracks (at 44.1 kHz, 16-bit
stereo that is 100 x 176.4 KB/sec) you get about 17 MB/sec, which can
add a measurable overhead.
But the question is: did you ever try to open 100 FFT meters at the
same time?
I guess that would make the memory copy overhead look minuscule compared
to the cost of the graphics output.

> Also, I have a strategy in mind for gracefully degrading when the GUI
> thread can't pull data from the FIFO fast enough (again, for VU meters
> or some other kind of visualization). Instead of queuing up more and
> more unconsumed buffers, I will give the FIFO an upper limit, a number
> of buffers past which it will discard the oldest buffers. It's not
> immediately clear to me how you could implement this same strategy
> with a ring buffer.

In the case of FFT/oscilloscope GUIs (or any kind of GUI that displays
audio data) I'd do the following:

RingBuffer audio_ringbuffer;
RingBuffer gui_ringbuffer;
float block[BLOCK_SIZE];

/* disk thread (write_space()/write() stand in for whatever the
   ring buffer API actually provides) */
while (1) {
    size_t n = read_block_from_disk(block);      /* read data */
    audio_ringbuffer.write(block, n);
    /* mirror the block into the GUI ringbuffer only if there is
       enough free space for the whole block */
    if (gui_ringbuffer.write_space() >= n)
        gui_ringbuffer.write(block, n);          /* one extra memcpy */
    /* else: drop this block, the GUI thread is lagging behind */
}

In that case you have one single memory copy for mirroring the data
you've put into the audio ringbuffer to the GUI ringbuffer.
And in case the GUI ringbuffer fills up completely (because the GUI
thread is unable to process all the data it gets), you simply discard
the current data block.
Visually this means the following: as long as the machine can cope with
the load, the GUI thread will show the FFT/oscilloscope smoothly; when
it cannot keep up, it will simply leave out some samples and not show
them. This will probably cause some jumps in the oscilloscope window
(less so in the FFT, because you have to lowpass the FFT movements
anyway).
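
On the reader side, the GUI thread can simply drain whatever has
accumulated since its last redraw and draw only the newest block.
A minimal sketch, assuming the same hypothetical RingBuffer API as
above (read_space()/read(), update_fft_display() and
wait_for_next_redraw() are made-up names for illustration):

float block[BLOCK_SIZE];

/* GUI thread */
while (gui_running) {
    int got_block = 0;
    /* drain the FIFO: each read overwrites the previous block, so
       when the GUI lags behind, only the most recent block survives
       to be displayed -- this is the graceful degradation above */
    while (gui_ringbuffer.read_space() >= BLOCK_SIZE) {
        gui_ringbuffer.read(block, BLOCK_SIZE);
        got_block = 1;
    }
    if (got_block)
        update_fft_display(block, BLOCK_SIZE);
    wait_for_next_redraw();   /* e.g. a 20-30 Hz GUI timer */
}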

With lists of buffer pointers, what happens if the display thread
stalls for a long time? You have to keep track of many buffers which
cannot be actively used by the disk streaming thread / audio thread,
so you waste a lot of memory.
Put 10-20 of those FFT/oscilloscopes in the game and I think it becomes
a bit of a pain to deal with all those lists of buffers.

In my ringbuffer case above, for example, I'd use rather small
ringbuffers for the GUI threads, since GUIs are not considered realtime
objects and graceful degradation is a thing you have to live with.

>
> Also, though I'm not sure yet if this matters for what I'm doing, your
> ring buffer can only handle one reader and one writer. The algorithms
> we are implementing can handle multiple readers and writers (though we
> are writing single-reader/single-writer versions also, since they are
> simpler and more efficient).

The question is: in what cases do we need multiple-reader /
multiple-writer lock-free FIFOs when dealing with audio apps?
In both HDR apps like ardour and disk-based samplers like LinuxSampler,
you have multiple tracks that are streamed from/to disk and a single
synchronous audio clock which causes an audio buffer to be read/written
after each audio IRQ.
So even if you used a multi-threaded approach where every disk track is
associated with a thread, you would still need a single synchronization
point, alias the audio thread, which reads from (or writes to) the disk
buffers.
Perhaps a multiple-writer / single-reader FIFO would be OK in that
case, but you would still need to deal with varispeed issues. And even
if you don't support varispeed but use multiple disk threads, you would
need to "packetize" the data blocks and label them so that the audio
thread knows what to do with them (e.g. which disk track the data
packet belongs to, its length, etc.).
This would certainly make the whole affair much more complex than the
approach LinuxSampler uses, which associates a single-writer /
single-reader ringbuffer with each disk track.
Currently we use only one disk thread, but in the case of multiple
disks, for example, it makes sense to fire up a thread for each
physical disk so that you maximize the concurrency of reads/disk seeks.
Adding that to our codebase would be quite easy, since the streaming
code would basically remain identical; you'd only need to add more
threads plus a table that tells the audio thread which disk thread to
issue commands to when starting certain audio files (e.g. thread1
handles /disk1/..., thread2 handles /disk2/..., so when the sample (or
audio file) /disk2/testfile needs to be triggered, you send that
command to thread2). A minimal sketch of such a dispatch table follows.
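
Something like this (all names are made up for illustration; this is
not actual LinuxSampler code, just a sketch of the table idea):

#include <string.h>

/* Hypothetical per-disk dispatch: each physical disk gets its own
   thread, and the audio thread picks the right command FIFO by
   matching the file path against the disk's mount prefix. */
struct DiskThread {
    const char *mount_prefix;   /* e.g. "/disk1/" */
    RingBuffer *command_fifo;   /* single writer (audio thread),
                                   single reader (this disk thread) */
};

struct DiskThread disk_threads[NUM_DISKS];

/* audio thread: find the disk thread responsible for a file */
struct DiskThread *thread_for_file(const char *path) {
    for (int i = 0; i < NUM_DISKS; i++) {
        size_t len = strlen(disk_threads[i].mount_prefix);
        if (strncmp(path, disk_threads[i].mount_prefix, len) == 0)
            return &disk_threads[i];
    }
    return &disk_threads[0];    /* fallback: default disk thread */
}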

I'm still interested in a case (even a hypothetical one) where
multiple-writer/single-reader or multiple-writer/multiple-reader FIFOs
would work much better than the single-writer/single-reader approach.

>
> One final thing: I'm not sure if this is a problem in practice, but
> most (if not all) processors do not guarantee that memory reads and
> writes are performed in order, as observed by other processors and by
> I/O devices. This could theoretically lead to a problem where the
> stores that write data to the ring buffer are not performed until
> *after* the store that updates your write pointer. Your reader thread
> could get old data rather than the data you just wrote. To guarantee
> that the loads and stores are observed in the required order, you have
> to use memory barriers (again, in theory). See:
>
> http://en.wikipedia.org/wiki/Memory_barrier
>
> (Ross wrote that and we're both still learning about this stuff, so
> don't take that article as the word of God).

Our RingBuffer simply updates a read_ptr and a write_ptr, so the worst
case that can happen is that the read_ptr still points to the old
location even though the writer has already written the data (assume an
SMP system).
This is not harmful, because the reader will get that data during the
next cycle.
The only requirement is the atomicity of the operations when reading
the read_ptr or writing the write_ptr, but this is guaranteed by the
atomic*() macros; plus, on x86 and PPC these translate to normal
loads/stores anyway.
And as long as you encapsulate all the lock-free buffer accesses in
functions/classes, even if future CPUs required memory barriers you
could add them easily without breaking your codebase.
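
To make that encapsulation concrete, here is a minimal
single-writer/single-reader sketch using C++ std::atomic (my own
illustration, not the actual LinuxSampler code); the acquire/release
orderings mark exactly where the barriers would go on CPUs that need
them, and on x86 they compile down to the plain loads/stores mentioned
above:

#include <atomic>
#include <cstddef>

/* Minimal single-writer/single-reader ring buffer sketch.  The
   acquire/release pairs are the "memory barriers" of the article
   quoted above; on x86 they cost nothing (plain MOVs). */
template <size_t N>
class SpscRingBuffer {
    float buf[N];
    std::atomic<size_t> read_ptr{0};
    std::atomic<size_t> write_ptr{0};
public:
    bool push(float v) {                       /* writer thread only */
        size_t w = write_ptr.load(std::memory_order_relaxed);
        size_t next = (w + 1) % N;
        if (next == read_ptr.load(std::memory_order_acquire))
            return false;                      /* buffer full */
        buf[w] = v;                            /* write the data first... */
        write_ptr.store(next, std::memory_order_release); /* ...then publish */
        return true;
    }
    bool pop(float &v) {                       /* reader thread only */
        size_t r = read_ptr.load(std::memory_order_relaxed);
        if (r == write_ptr.load(std::memory_order_acquire))
            return false;                      /* buffer empty */
        v = buf[r];
        read_ptr.store((r + 1) % N, std::memory_order_release);
        return true;
    }
};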

cheers,
Benno
http://www.linuxsampler.org

