Subject: Re: [linux-audio-dev] Re: lock-free structures ... Re: MidiShare Linux
From: David Olofson (david_AT_gardena.net)
Date: Sat Jul 22 2000 - 00:35:32 EEST
On Fri, 21 Jul 2000, Benno Senoner wrote:
> On Fri, 21 Jul 2000, David Olofson wrote:
[...]
> > Is it really required that all data is fed *directly* from the disk
> > butler into the FIFOs, one transaction at a time...?
>
[..]
>
> So as you see, the disk thread makes very infrequent use of the ringbuffer
> code, because it reads large data sets at a time.
Ok.
> But the audio thread, when it is reading the streamed data from disk,
> and you run it with buffer sizes of 32 samples/fragment
> (128-byte fragments, stereo), then it accesses the ringbuffer structure
> every 32 samples, in practice once every 700usec per voice.
> Play 50 voices and it will access the ringbuffer every 14usec (700/50).
This is one reason to keep the thread safe stuff at the "low
frequency" side. However, it's hard to do it in a significantly
different way in this case, since you still *have* to deliver
"events" from the streaming channels individually.
> Of course the DSP stuff will outweigh the ringbuffer access code
> by a large amount (I guess by at least a factor of 30, even for simple
> playback without interpolation) (e.g. you process 32 samples at a time).
Probably, yes. :-)
> But you see, keeping the code small and lean (and inlined :-) ) can
> save you some CPU, which can be used to do useful DSP stuff rather
> than useless data shuffling / function calling.
> got the point ?
Yep, I can see why you do it this way on this level. Other
alternatives are possible, but I can't think of one that gives any
significant performance advantage right now.
However, this is fine for the *engine*, but this architecture and
system dependent sync stuff should definitely be kept out of plugin
APIs - or we'll have lots of "fun" with closed source plugins...
No problem if the plugins are called from the hard-coded, built-in
mixer of EVO, since then they won't have to deal with the FIFOs
directly. Is this where the plugin API will be inserted, or will the
entire voice mixers be pluggable as well?
> > Putting it a different way; what is the point in these extreme access
> > frequencies? With an event system, you could fill in the data for all
> > FIFOs, then build a list of events that describe what you have done,
> > and then pass that list to the audio thread using one single API
> > call. How's that for minimal sync overhead?
>
> I am talking about the softsampler case: you want to hear the note
> as soon as possible after a MIDI note-on,
Not quite. Jitter is almost worse than latency in some cases.
> So you can not use the event trick and schedule in the future.
> We want the data out of the DAC as soon we press the key.
True, if the event scheduling is to be considered hard real
time. However, viewing the MIDI events as *soft* real time allows
timestamping them according to the average latency of a single
fragment rather than the full buffering latency. This will replace

	evenly distributed 0..fragment_latency jitter
	+ occasional peak jitter

with

	fixed fragment_latency delay
	+ occasional peak jitter

That is, average latency increases by about half a fragment period,
but you also practically get rid of all jitter.
I'd definitely prefer one fragment of fixed latency (for all
practical reasons) to evenly distributed jitter with a full P-P
amplitude of the same amount.
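To make it concrete, here's roughly what that timestamping amounts to
(pure sketch; the Event struct and the 'now_frames'/'fragment_size'
names are made up, not EVO code):

// Sketch: trading 0..fragment_latency jitter for a fixed
// one-fragment delay. All names are hypothetical.
struct Event {
    unsigned when;          // audio frame at which the event takes effect
    int      type, data;
};

// Called when a "live" MIDI event arrives. Instead of firing it ASAP
// (somewhere inside the fragment currently being rendered, which gives
// 0..fragment_latency jitter), stamp it for the start of the next
// fragment, which gives a fixed one-fragment delay instead.
Event stamp_live_event(int type, int data,
                       unsigned now_frames, unsigned fragment_size)
{
    Event e;
    unsigned current_fragment = now_frames / fragment_size;
    e.when = (current_fragment + 1) * fragment_size;
    e.type = type;
    e.data = data;
    return e;
}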
One will have to live with the occasional kernel latency peaks
breaking through when playing true real time via MIDI. They seem to
be so rare with 2.2.x/lowlatency that it's pretty unlikely you'll
ever notice hitting one with a critical MIDI event...
For final recordings of very demanding material, you should of course
use a sequencer that can send timestamped events directly to the
sampler. That is, more or less off-line rendering that happens to be
monitored with very low latency.
> > As to plugins, they simply ignore all this, and assume that their
> > targets run in the same thread. If this is not the case, the host
> > gets to figure out how and how frequently to pass events between
> > threads. (Something like once every cycle of the thread with the
> > highest cycle rate should be safe, I think... Keep in mind that you
> > cannot work reliably with events timestamped less than two or three
> > cycles in the future anyway!)
>
>
> The problem is that the threads run totally in parallel:
> assume EVO comes as a plugin for the rtsoundserver:
>
> the soundserver runs 3 threads and the plugin provides
> the 3 callbacks:
> EVO_audio()
> EVO_midi()
> EVO_disk()
>
> currently EVO (am I allowed to call disksampler EVO ? :-) ),
> runs 3 threads which all communicate via lock-free fifo:
> The MIDI thread could send events at any time, and on an SMP box
> both threads will run simultaneously, so events can come in at any time.
Ok, three plugins in one, actually, running in different threads, so
yes, the lock-free stuff *is* needed for communication between the
modules. However, there's still no point in feeding MIDI events
directly to the audio thread, one event at a time. (See above - do
you really want J ms P-P amplitude jitter to lower the *average*
latency by J/2? I don't.)
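Something like this is what I have in mind - one batch per cycle,
passed through a single FIFO write, instead of one FIFO transaction
per event. (All names are hypothetical; this is a sketch, not a
proposed API.)

// Sketch: collect events into a batch, then hand the whole batch to
// the audio thread through *one* lock-free FIFO transaction per cycle.
struct Event { unsigned when; int type, data; };

struct EventBatch {
    enum { MAX_EVENTS = 256 };
    unsigned count;
    Event    events[MAX_EVENTS];
};

// Sender side (e.g. the MIDI thread), once per cycle:
//
//   EventBatch *b = batch_pool.get();        // pre-allocated, RT safe
//   b->count = 0;
//   while (have_more_input() && b->count < EventBatch::MAX_EVENTS)
//       b->events[b->count++] = next_input_event();
//   batch_fifo.write(b);                     // ONE lock-free transaction
//
// Audio thread, at the top of each cycle:
//
//   EventBatch *b;
//   while (batch_fifo.read(b)) {             // usually zero or one batch
//       for (unsigned i = 0; i < b->count; ++i)
//           dispatch_to_voice(b->events[i]); // plain single-threaded code
//       batch_pool.put(b);                   // recycle; never free() in RT
//   }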
> So we need fast and safe lock-free fifos in order to squeeze out the
> maximum from the app.
Yeah, maximum *theoretical* performance, not best actual, useful
performance.
Oh, BTW, this is a totally pointless discussion, unless either
1) the MIDI hardware timestamps the MIDI data, or
2) the MIDI thread runs at *higher* priority than the audio thread.
My softsynth hack does the same thing as a two-thread solution with
MIDI at lower prio than audio would; first thing in each audio cycle
it reads and processes all MIDI events that have arrived since the
last check. (I'm using non-blocking MIDI I/O instead of an extra
thread.) This is obviously *not* the correct or perfect way; it's
just simple and fast, and it still works pretty well, with the lowest
possible complexity.
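In case it isn't obvious, the "hack" boils down to something like this
(plain POSIX non-blocking I/O; the device name and the
parse_midi_byte()/render helpers are placeholders, not my actual code):

// Sketch: poll the MIDI device at the top of every audio cycle.
#include <fcntl.h>
#include <unistd.h>

void parse_midi_byte(unsigned char byte);    // hypothetical MIDI parser

int open_midi(void)
{
    // O_NONBLOCK so read() can never stall the audio thread.
    return open("/dev/midi", O_RDONLY | O_NONBLOCK);
}

void poll_midi(int fd)
{
    unsigned char buf[256];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0)
            break;                           // nothing more right now (EAGAIN)
        for (ssize_t i = 0; i < n; ++i)
            parse_midi_byte(buf[i]);         // feed the running-status parser
    }
}

// Audio cycle:
//   poll_midi(midi_fd);     // grab everything since the last cycle
//   render_fragment();      // then do all the DSP work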
At insignificant DSP CPU load, it behaves identically to your
"deliver ASAP" model - all your voices will check their events and
do their processing in about the same time as my synth checks all
events and then does all processing; the outside world will not see
the difference, since the total time is, in this example, insignificant.
Under heavy DSP load, there is one difference: my synth still behaves
*exactly* the same way as in the previous example, while it starts to
show that your voices actually *do not execute at the same time*!
This shows up as a rather pointless dependency between voice number
and latency.
> But those poor SMP SPARC folks will hopefully take the time to compile
> the app (instead of searching for a flawed UP binary), since we are shipping
> a GPLed app. :-)
Ok, no problem with software that's more or less considered to be
system software - I'm more worried about the "3rd party plugins" for
it.
> As said: giving away the speed advantages and binary uniformity of plugins and
> audio apps, only to avoid the SMP/UP distinction (e.g. the only "disturbing"
> issue for them is that they have to recompile) for a handful of SMP SPARC
> owners is IMHO not the way to go.
It's not as simple as that, unless the API stops at simple FIFO style
point-to-point connections. Anything that lets multiple senders
connect to one receiver will require operations that are way beyond
what can realistically be done with a low level lock-free FIFO. What
kind of plugin and audio app API are we talking about here? Anything
beyond a very basic encapsulation of lock-free RT IPC?
> And I hardly believe that there will someday be commercial Linux audio plugins
> available for SMP SPARCs.
> (if there are, and the manufacturer is lazy, then they will only ship the SMP
> version (which will work on both), or alternatively ship both UP and SMP versions)
Ok, if this was the only problem... (Maybe it is, but then I think
we're talking about two quite different things.)
> > > Fortunately atomic_read/set macros on PPC remain the same on both SMP/UP,
> > > so at least on 95% (or more) of the boxes in circulation, we will not face
> > > this issue.
> >
> > Good, but I still think it would be cool if application/plugin level
> > code would never see it *at all*, while we still get a simpler,
> > faster and more flexible API.
>
> At the source level API, the plugin sees only the ringbuffer.h-like structures.
> The problems are on the binary level (e.g. the UP version running on problematic
> SMP boxes, e.g. SPARC).
>
> It's like complaining that an SMP compiled linux kernel runs slower on an
> UP box than the UP version (because some locks become NOPs).
That's a kernel - not lots of independently managed applications.
> As you said we don't live in a perfect world, and I prefer installing
> SMP-optimized versions of the app on SMP hw rather than hitting the
> performance of millions of UP users because we use non inlined functions.
Of course - It's just that I'm still not convinced that "messing"
with these lock-free constructs for every single operation is the
right way to go for these rtsoundserver plugins. Do they really gain
anything from having more than one critical transfer point per
connection?
As to software that can be considered system software or drivers, and
that *needs* to do it this way (it always has to be done at some point
in the communication between two threads), this binary level thing
is a non-issue - especially if most of this software will belong to
the same project. For that kind of code, sure, it doesn't get much
faster or nicer than your lock-free FIFO template, as far as I can
see. :-)
//David
.- M u C o S --------------------------------. .- David Olofson ------.
| A Free/Open Multimedia | | Audio Hacker |
| Plugin and Integration Standard | | Linux Advocate |
`------------> http://www.linuxdj.com/mucos -' | Open Source Advocate |
.- A u d i a l i t y ------------------------. | Singer |
| Rock Solid Low Latency Signal Processing | | Songwriter |
`---> http://www.angelfire.com/or/audiality -' `-> david_AT_linuxdj.com -'