Subject: Re: [linux-audio-dev] Re: lock-free structures ... Re: MidiShare Linux
From: Benno Senoner (sbenno_AT_gardena.net)
Date: Sat Jul 22 2000 - 00:23:31 EEST
On Fri, 21 Jul 2000, David Olofson wrote:
> >
> > I think that applications running as plugins, will need to use these data
> > structures heavily so we have to solve the SMP/UP issues in advance.
>
> That is, one plugin <-> one direct application connection? Well, yes,
> but what says it makes sense to pass one event at a time all the way
> through the IPC layers, when the application thread isn't even
> running SCHED_FIFO?
>
(...)
>
> *Exactly* what kinds of connections are you concerned about?
>
> > As for providing ringbuffer code in a library, I am a bit against it,
> > because the compiler can't inline the code, making it much slower.
> > (I am thinking about the case where you access these datastructures
> > heavily, thousand times per second (as in disksampler) )
>
> Is it really required that all data is fed *directly* from the disk
> butler into the FIFOs, one transaction at a time...?
Now I explain:
the disk butler writes as much data as it can, but at most 64-128 KByte
at a time, in order to keep the track latency low.
(E.g.: you use big ringbuffers and there are many streams active; if
there were no limit on the max read size, it would take a looong time to
refill all stream buffers with at least a bit of data, and you would
risk a dropout in the last streams, since data is pulled from all
buffers simultaneously by the audio thread.)
So as you see, the disk thread makes very infrequent use of the
ringbuffer code, because it reads large blocks of data at a time.
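The refill policy could be sketched roughly like this (a minimal sketch;
`stream_t`, `disk_butler_pass` and the cap constant are illustrative
names, not EVO's actual code):

```c
#include <stddef.h>

#define MAX_READ (64 * 1024)   /* cap per stream per pass, as described above */

typedef struct {
    int fd;                    /* file being streamed from disk */
    unsigned int free_bytes;   /* space left in this stream's ringbuffer */
    /* ... ringbuffer pointers, file offset, etc. */
} stream_t;

/* How much to read for one stream in one pass: fill its buffer, but
   never more than MAX_READ, so every stream gets some data soon. */
static inline unsigned int cap_read(unsigned int free_bytes)
{
    return free_bytes > MAX_READ ? MAX_READ : free_bytes;
}

/* One pass of the disk butler: visit all active streams round-robin and
   top each one up by a bounded amount, so no single starving stream can
   monopolize the disk thread while the others run dry. */
void disk_butler_pass(stream_t *streams, size_t nstreams)
{
    for (size_t i = 0; i < nstreams; i++) {
        unsigned int want = cap_read(streams[i].free_bytes);
        if (want > 0) {
            /* read() up to 'want' bytes from streams[i].fd and push
               them into the stream's ringbuffer with one bulk write */
        }
    }
}
```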
But the audio thread is another story: when it is reading the streamed
data from disk and you run it with buffer sizes of 32 samples/fragment
(128-byte fragments (stereo)), it accesses the ringbuffer structure
every 32 samples, in practice once every ~700usec per voice.
Play 50 voices and it will access a ringbuffer every 14usec (700/50).
Of course the DSP stuff will outweigh the ringbuffer access code by a
large amount (I guess by a factor of at least 30, even doing simple
playback without interpolation), since you process 32 samples at a time.
But you see, keeping the code small and lean (and inlined :-) ) can
save you some CPU, which can be used to do useful DSP stuff rather than
useless data shuffling / function calling.
got the point ?
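An inlined ringbuffer header in this spirit could look like the sketch
below (hypothetical names, not EVO's actual API; single producer, single
consumer; plain UP version with no barriers shown; power-of-two size so
the free-running indices wrap with a cheap mask instead of a modulo):

```c
#include <stddef.h>

#define RB_SIZE 1024            /* must be a power of two */
#define RB_MASK (RB_SIZE - 1)

typedef struct {
    float buf[RB_SIZE];
    volatile unsigned int write_idx;  /* owned by producer (disk thread) */
    volatile unsigned int read_idx;   /* owned by consumer (audio thread) */
} ringbuf_t;

/* Frames available for reading (unsigned wraparound is well defined). */
static inline unsigned int rb_read_space(const ringbuf_t *rb)
{
    return rb->write_idx - rb->read_idx;
}

/* Frames available for writing. */
static inline unsigned int rb_write_space(const ringbuf_t *rb)
{
    return RB_SIZE - rb_read_space(rb);
}

/* Producer: copy n frames in (caller checked rb_write_space first). */
static inline void rb_write(ringbuf_t *rb, const float *src, unsigned int n)
{
    unsigned int i;
    for (i = 0; i < n; i++)
        rb->buf[(rb->write_idx + i) & RB_MASK] = src[i];
    rb->write_idx += n;         /* publish after the data is in place */
}

/* Consumer: copy n frames out (caller checked rb_read_space first). */
static inline void rb_read(ringbuf_t *rb, float *dst, unsigned int n)
{
    unsigned int i;
    for (i = 0; i < n; i++)
        dst[i] = rb->buf[(rb->read_idx + i) & RB_MASK];
    rb->read_idx += n;
}
```

Since everything lives in the header as static inline functions, the
compiler can fold the index arithmetic straight into the audio loop,
which is exactly the point about not hiding this behind a library call.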
>
> Putting it a different way; what is the point in these extreme access
> frequencies? With an event system, you could fill in the data for all
> FIFOs, then build a list of events that describe what you have done,
> and then pass that list to the audio thread using one single API
> call. How's that for minimal sync overhead?
I am talking about the softsampler case: you want to hear the note as
soon as possible after a MIDI note-on, thus the only way is to use very
short fragmentsizes.
We all know that lowering the fragmentsize increases the setup overhead
quite a bit, but 32 is still a good tradeoff, since it delivers very
good latencies while keeping the overhead down to a max of a few %
compared to running with bigger fragmentsizes.
So you cannot use the event trick and schedule into the future.
We want the data out of the DAC as soon as we press the key.
>
> As to plugins, they simply ignore all this, and assume that their
> targets run in the same thread. If this is not the case, the host
> gets to figure out how and how frequently to pass events between
> threads. (Something like once every cycle of the thread with the
> highest cycle rate should be safe, I think... Keep in mind that you
> cannot work reliably with events timestamped less than two or three
> cycles in the future anyway!)
The problem is that the threads run totally in parallel.
Assume EVO comes as a plugin for the rtsoundserver:
the soundserver runs 3 threads and the plugin provides
the 3 callbacks:
EVO_audio()
EVO_midi()
EVO_disk()
Currently EVO (am I allowed to call the disksampler EVO? :-) ) runs 3
threads which all communicate via lock-free FIFOs.
The MIDI thread can send events at any time, and on an SMP box both
threads really do run simultaneously, so events can come in at any
moment.
So we need fast and safe lock-free FIFOs in order to squeeze the
maximum out of the app.
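What the SMP case actually changes inside such a FIFO can be sketched
with C11 atomics (in 2000 this was done with per-arch atomic_read/set
macros and barrier instructions; all names here are hypothetical): the
producer has to publish the data before the new write index becomes
visible, and the consumer has to observe them in that order.

```c
#include <stdatomic.h>

#define SMP_RB_SIZE 1024   /* power of two */
#define SMP_RB_MASK (SMP_RB_SIZE - 1)

typedef struct {
    float buf[SMP_RB_SIZE];
    atomic_uint write_idx;
    atomic_uint read_idx;
} smp_rb_t;

static inline unsigned int smp_rb_read_space(smp_rb_t *rb)
{
    /* acquire: every frame stored before the producer's release
       store below is guaranteed to be visible to us */
    return atomic_load_explicit(&rb->write_idx, memory_order_acquire)
         - atomic_load_explicit(&rb->read_idx, memory_order_relaxed);
}

static inline void smp_rb_write(smp_rb_t *rb, const float *src, unsigned int n)
{
    unsigned int w = atomic_load_explicit(&rb->write_idx,
                                          memory_order_relaxed);
    for (unsigned int i = 0; i < n; i++)
        rb->buf[(w + i) & SMP_RB_MASK] = src[i];
    /* release: publish the data before the new index is seen */
    atomic_store_explicit(&rb->write_idx, w + n, memory_order_release);
}

static inline void smp_rb_read(smp_rb_t *rb, float *dst, unsigned int n)
{
    unsigned int r = atomic_load_explicit(&rb->read_idx,
                                          memory_order_relaxed);
    for (unsigned int i = 0; i < n; i++)
        dst[i] = rb->buf[(r + i) & SMP_RB_MASK];
    atomic_store_explicit(&rb->read_idx, r + n, memory_order_release);
}
```

On a UP x86 box the acquire loads and release stores compile down to
plain moves, so the cost of the SMP-safe version there is essentially
zero; on weakly ordered SMP hardware they become the barriers that make
the FIFO safe, which is exactly the SMP/UP distinction being discussed.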
But those poor SMP SPARC folks will hopefully take the time to compile
the app (instead of searching for a flawed UP binary), since we are
shipping a GPLed app. :-)
As said: giving away the speed advantages and binary uniformity of
plugins and audio apps, only to avoid the SMP/UP distinction (e.g. the
only "disturbing" issue for them is that they have to recompile) for a
handful of SMP SPARC owners, is IMHO not the way to go.
And I hardly believe that commercial Linux audio plugins for SMP SPARCs
will ever be available.
(If there are, and the manufacturer is lazy, then it will only ship the
SMP version (which will work on both), or alternatively it will ship
both UP and SMP versions.)
> > Fortunately atomic_read /set macros on PPC remain the same on both SMP/UP,
> > so at least on the 95% (or more) boxes in circulation, we will not face this
> > issue.
>
> Good, but I still think it would be cool if application/plugin level
> code would never see it *at all*, while we still get a simpler,
> faster and more flexible API.
At the source-level API the plugin sees only the ringbuffer.h-like
structures.
The problems are at the binary level (e.g. the UP version misbehaving
on SMP boxes such as SPARC).
It's like complaining that an SMP-compiled Linux kernel runs slower on
a UP box than the UP version (because some locks become NOPs).
As you said, we don't live in a perfect world, and I prefer installing
SMP-optimized versions of the app on SMP hardware rather than hurting
the performance of millions of UP users because we use non-inlined
functions.
Benno.
This archive was generated by hypermail 2b28 : Sat Jul 22 2000 - 00:47:28 EEST