Re: [linux-audio-dev] lock-free data structures


Subject: Re: [linux-audio-dev] lock-free data structures
From: Benno Senoner (sbenno_AT_gardena.net)
Date: Thu Jun 17 2004 - 20:50:42 EEST

Next message: Paul Davis: "Re: [linux-audio-dev] Multithreaded programming for a poll model?"
Previous message: Fons Adriaensen: "Re: [linux-audio-dev] Multithreaded programming for a poll model?"
In reply to: Paul Davis: "Re: [linux-audio-dev] lock-free data structures"
Next in thread: Tim Goetze: "Re: [linux-audio-dev] lock-free data structures"
Reply: Benno Senoner: "Re: [linux-audio-dev] lock-free data structures"

Paul Davis wrote:

>>One thing I am still looking to learn more about is how to adjust
>>thread priorities and such to make sure that your threads are run often
>>enough (especially the disk thread), and how to decide how big your
>>disk buffers need to be.
>>
>>
>
>4 years ago, Benno and I measured this and concluded that under some
>circumstances it was possible to have a small-multi-second delay in
>disk access. Ardour uses 5 second disk buffers. With buffers of this
>size, the scheduling priority of the disk thread is not really relevant.
>
>We also determined that 256kB seemed to be the optimal i/o block size
>for ext2. Whether this is true for ext3/reiserfs/xfs and others i do
>not know.
>
>
I recently ran more tests (I'll release the benchmarking code when it's
cleaned up a bit). It seems that on decently fast disks,
like 7200rpm IDE disks (my Maxtor 80GB IDE disk does up to 40MB/sec
sustained), if there is a lot of seeking
(which HDR apps like ardour and disk based sampling apps like
LinuxSampler do), you need to read more per request,
like 512KB - 1MB at a time, because the disk seek time is relatively high
(12-14msec) compared to the transfer rate (40MB/sec).
This means that if the buffers are too small you get lots of disk seeks
between reads and you lose performance.
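
The seek-versus-transfer trade-off can be put into a back-of-the-envelope
model. This is only a sketch, not part of the benchmark code: it assumes
every read costs exactly one seek plus the raw transfer time of the chunk,
and the function name is made up for illustration.

```cpp
#include <cassert>

// Effective throughput if each read of chunk_kb costs one seek plus the
// time needed to transfer the chunk at the disk's sustained rate.
// Assumed model only; real disks, schedulers and caches will deviate.
double effective_mb_per_sec(double chunk_kb, double seek_ms, double disk_mb_per_sec)
{
    assert(chunk_kb > 0 && seek_ms >= 0 && disk_mb_per_sec > 0);
    double chunk_mb = chunk_kb / 1024.0;
    double seconds_per_read = seek_ms / 1000.0 + chunk_mb / disk_mb_per_sec;
    return chunk_mb / seconds_per_read;
}
```

With a 13 msec seek and 40MB/sec sustained, this model gives roughly
8 MB/sec for 128KB reads and roughly 26 MB/sec for 1MB reads, because the
fixed seek cost is spread over more data as the chunk grows.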

My benchmark does the following: it creates 4 files of 750MB each (3GB total),
then seeks around randomly in all 4 files simultaneously and reads 2
bytes after each seek (just to ensure that the
disk scheduling algorithm will not fool us). Of course there is still
the file cache that could inflate the numbers, but if the
file sizes are big enough (so that they don't fit in RAM) and provided
you do thousands of seeks, you get quite realistic
values that reflect those achieved by an HDR or disk sampling app.
For example, these are the numbers for my Maxtor IDE 40GB 7200rpm:

seeks/sec=75.5 average seek time=13.2 msec

In the streaming test I read a chunk of X KB from each of the 4 files in turn:
while(1) { file1.read() ; file2.read() ; file3.read() ; file4.read(); }

This causes the disk head to seek between the files after each read,
just like an HDR app does
when reading tracks. These are the numbers I get (read speed).

streaming with 128 KB buffers ....
performance: 8.50 MB/sec stereo voices at 44.1kHz = 50.53
required memory for buffering: 6.32 MB

streaming with 256 KB buffers ....
performance: 14.43 MB/sec stereo voices at 44.1kHz = 85.79
required memory for buffering: 21.45 MB

streaming with 512 KB buffers ....
performance: 21.10 MB/sec stereo voices at 44.1kHz = 125.41
required memory for buffering: 62.71 MB

streaming with 1024 KB buffers ....
performance: 28.49 MB/sec stereo voices at 44.1kHz = 169.38
required memory for buffering: 169.38 MB

streaming with 2048 KB buffers ....
performance: 32.88 MB/sec stereo voices at 44.1kHz = 195.44
required memory for buffering: 390.87 MB

As you can see, the performance increase between 512KB and 1MB is still very
big, around 35%, so reading 256KB at a time is definitely too little
these days.
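
For reference, the "stereo voices" and "required memory" columns can be
reproduced from the measured read speed. The sketch below assumes 16-bit
samples and one buffer of the given size per voice; both assumptions are
inferred from the figures (they match within rounding), and the function
names are made up.

```cpp
#include <cassert>

// Bytes per second consumed by one stereo voice at 44.1kHz.
// 16-bit (2 byte) samples are an assumption that reproduces the figures above.
const double kBytesPerVoiceSec = 44100.0 * 2 /* channels */ * 2 /* bytes per sample */;

// Number of stereo voices a given read speed (in MB/sec) can sustain.
double stereo_voices(double mb_per_sec)
{
    assert(mb_per_sec >= 0);
    return mb_per_sec * 1024.0 * 1024.0 / kBytesPerVoiceSec;
}

// Memory in MB needed if every voice gets one buffer of buffer_kb.
double buffer_memory_mb(double voices, double buffer_kb)
{
    assert(voices >= 0 && buffer_kb >= 0);
    return voices * buffer_kb / 1024.0;
}
```

For the 128 KB case, stereo_voices(8.50) comes out at about 50.5 and
buffer_memory_mb(50.53, 128) at about 6.3 MB. Since the memory grows with
both the buffer size and the voice count, going from 512KB to 1MB buffers
buys about 35% more voices at well over twice the RAM.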

Paul, does ardour allow the user to specify the size of the per-track
buffers used for disk streaming? If not, perhaps you should add this as
an option, since it's handy for the user
to be able to increase the default values to achieve the optimal
track count.
For example, with large RAID arrays the gap between seek time
and raw disk transfer speed
gets even bigger, so even bigger buffers are needed to achieve the max
track count.
(Keep in mind I'm not familiar with the ardour codebase nor with its
advanced settings, so
my question might be redundant in case ardour already supports it.)

Joshua:
if you want an easy to use lock-free FIFO template in C++, look at
RingBuffer.h in the LinuxSampler CVS.
We use this template heavily.
For example, we set up a large ringbuffer for streaming the audio from disk
(one ringbuffer for each voice).
The disk thread reads directly into the ringbuffer and the audio thread
fetches the data
in a lock-free way. This ensures zero-copy operation. Plus we added a
wrap space: a section of the beginning of the buffer is replicated after the
official upper bound, so that the
audio thread can read a bit past the end and still get the correct audio
data (as if the buffer were linear).
This speeds up the audio interpolation, since for the audio thread it's
like reading from a linear segment,
with no nasty if() checks etc. The replication is all done when the disk thread
writes the data to the ringbuffer (only from time to time, so there is no
practical CPU overhead).
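
To make the wrap space idea concrete, here is a minimal single-producer /
single-consumer sketch. It is not the RingBuffer.h from the LinuxSampler
CVS: the class and method names are invented, and C++11 std::atomic stands
in for the atomic_*() macros. It assumes exactly one writer thread and one
reader thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class WrapRingBuffer {
public:
    // size must be a power of two; wrap is the number of elements from the
    // start of the buffer that are mirrored after its end.
    WrapRingBuffer(std::size_t size, std::size_t wrap)
        : size_(size), wrap_(wrap), buf_(size + wrap), write_(0), read_(0)
    {
        assert(size > 0 && (size & (size - 1)) == 0 && wrap <= size);
    }

    // Producer side (e.g. disk thread). Returns how many elements were written.
    std::size_t write(const T* src, std::size_t n)
    {
        std::size_t w = write_.load(std::memory_order_relaxed);
        std::size_t r = read_.load(std::memory_order_acquire);
        std::size_t free_slots = size_ - (w - r);
        if (n > free_slots) n = free_slots;
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t idx = (w + i) & (size_ - 1);
            buf_[idx] = src[i];
            if (idx < wrap_) buf_[size_ + idx] = src[i];  // mirror into wrap space
        }
        write_.store(w + n, std::memory_order_release);   // publish after copying
        return n;
    }

    // Consumer side (e.g. audio thread): elements written but not yet consumed.
    std::size_t available() const
    {
        return write_.load(std::memory_order_acquire)
             - read_.load(std::memory_order_relaxed);
    }

    // How many elements can be read linearly from read_ptr() without wrapping.
    std::size_t linear_readable() const
    {
        std::size_t idx = read_.load(std::memory_order_relaxed) & (size_ - 1);
        std::size_t linear = size_ - idx + wrap_;
        std::size_t avail = available();
        return avail < linear ? avail : linear;
    }

    const T* read_ptr() const
    {
        return &buf_[read_.load(std::memory_order_relaxed) & (size_ - 1)];
    }

    // Mark n elements as consumed so the producer may overwrite them.
    void advance(std::size_t n)
    {
        assert(n <= available());
        read_.store(read_.load(std::memory_order_relaxed) + n,
                    std::memory_order_release);
    }

private:
    std::size_t size_, wrap_;
    std::vector<T> buf_;
    std::atomic<std::size_t> write_, read_;
};
```

The reader can index read_ptr()[0 .. linear_readable()-1] as a plain array
even when that range crosses the end of the buffer, which is what keeps the
if() checks out of an interpolation loop. The price is one extra store in
the producer for elements that land in the first wrap slots.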

But we don't use the lockfree ringbuffer only for audio: we use it
(since it's a template you can create ringbuffers
of any kind of struct) to send commands between the midi thread (note
on/off etc) and the audio thread and
to send commands to the disk thread (start/stop streams).
Works really well and the resulting code is clean too.
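
As an illustration of passing commands between threads this way, a minimal
sketch follows. The Command struct, its fields and the SpscQueue class are
hypothetical and not taken from the LinuxSampler sources; C++11 std::atomic
is used in place of the atomic_*() macros, and one producer thread and one
consumer thread are assumed.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical command record sent from the MIDI thread to the audio thread.
struct Command {
    enum Type { NoteOn, NoteOff, StartStream, StopStream } type;
    int key;
    int velocity;
};

// Single-producer / single-consumer queue; N must be a power of two.
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
public:
    // Producer side. Returns false instead of blocking when the queue is full.
    bool push(const T& item)
    {
        std::size_t w = write_.load(std::memory_order_relaxed);
        if (w - read_.load(std::memory_order_acquire) == N) return false;
        buf_[w & (N - 1)] = item;
        write_.store(w + 1, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns false when there is nothing to read.
    bool pop(T& out)
    {
        std::size_t r = read_.load(std::memory_order_relaxed);
        if (write_.load(std::memory_order_acquire) == r) return false;
        out = buf_[r & (N - 1)];
        read_.store(r + 1, std::memory_order_release);
        return true;
    }

private:
    T buf_[N];
    std::atomic<std::size_t> write_{0};
    std::atomic<std::size_t> read_{0};
};
```

Neither side ever waits on the other: the MIDI thread calls push() and
moves on, and the audio thread drains the queue with pop() at the start of
each period.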

The RingBuffer class uses atomic_*() macros, so it is safe on any
architecture. On most architectures 32bit word accesses
are atomic anyway, so the macros simply translate to load and store ops.
Afaik SPARC SMP is one of the only
archs that needs special care (it can access only 24 bits atomically,
thus your ringbuffers are limited to
16 million elements).

PS: about disk streaming benchmarks ... I ported my benchmark to win32
as well and ran some tests there too.
The irony is that with buffered I/O, when lots of disk seeks occur, you
get really sucky performance, as low
as 30% of the normal sustained disk transfer speed.
It seems that the read-ahead algorithm reads too much and gets the disk
head scheduling wrong
(I used Win XP, so I assume it has the most performant file I/O in the
Windows family).
Using direct I/O (without buffering) you get decent performance, but you
lose the benefits of the file cache.
For example, in the case of a disk based sampler where you often hit the
same notes (and thus the same audio files) over short
periods of time, the cache can save you lots of disk accesses. The Linux
file cache does an excellent job here.
I guess those Windows based disk samplers all implemented their own file
cache, while on Linux the OS does all
the work for you :)

cheers,
Benno
http://www.linuxsampler.org


This archive was generated by hypermail 2b28 : Fri Jun 18 2004 - 02:13:43 EEST