Re: [linux-audio-dev] Re: lock-free structures ... Re: MidiShare Linux


Subject: Re: [linux-audio-dev] Re: lock-free structures ... Re: MidiShare Linux
From: Benno Senoner (sbenno_AT_gardena.net)
Date: Sat Jul 22 2000 - 12:10:25 EEST


On Fri, 21 Jul 2000, David Olofson wrote:
(....)
> However, this is fine for the *engine*, but this architecture and
> system dependent sync stuff should definitely be kept out of plugin
> APIs - or we'll have lots of "fun" with closed source plugins...
> No problem if the plugins are called from the hard-coded, built-in
> mixer of EVO, since then they won't have to deal with the FIFOs
> directly. Is this where the plugin API will be inserted, or will the
> entire voice mixers be pluggable as well?

I think for the beginning, EVO should support LADSPA FX plugins
which can be used for the send FXes, like reverb and chorus.
Since only very few instances will be active
(in most cases one per FX, since all voices will feed data to them
via the FX send busses), the plugin model is very suitable and
does not cause any noticeable increase in CPU usage compared to
the "inlined FXes model".
But for filters, since they will get heavily accessed by potentially
all voices, I prefer to keep them inlined for now, but we will see
if a pluggable model will still deliver acceptable performance.
The problem might be the FX parameters.
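
To make the send FX case concrete, hosting one LADSPA instance per bus
could look roughly like this (a sketch with error handling and
control-port setup left out; the port indices are assumptions, a real
host must look them up in the descriptor):

#include <dlfcn.h>
#include <ladspa.h>

/* load the first plugin from a .so and create one instance for the send bus */
LADSPA_Handle load_send_fx(const char *so_path, unsigned long sample_rate,
                           const LADSPA_Descriptor **desc_out)
{
    void *lib = dlopen(so_path, RTLD_NOW);
    LADSPA_Descriptor_Function get_desc =
        (LADSPA_Descriptor_Function) dlsym(lib, "ladspa_descriptor");
    const LADSPA_Descriptor *desc = get_desc(0);
    LADSPA_Handle h = desc->instantiate(desc, sample_rate);
    if (desc->activate)
        desc->activate(h);
    *desc_out = desc;
    return h;
}

/* per fragment: all voices have already been mixed into send_bus[],
   the plugin renders into fx_out[], which gets added to the master mix */
void run_send_fx(const LADSPA_Descriptor *desc, LADSPA_Handle h,
                 LADSPA_Data *send_bus, LADSPA_Data *fx_out,
                 unsigned long frames)
{
    desc->connect_port(h, 0, send_bus);   /* assumed: port 0 = audio input */
    desc->connect_port(h, 1, fx_out);     /* assumed: port 1 = audio output */
    desc->run(h, frames);
}

Since only one or two such instances run per FX bus, the per-fragment
call overhead is negligible compared to inlining the effect.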

(...)

>
> True, if the event scheduling is to be considered hard real
> time. However, viewing the MIDI events as *soft* real time allows
> timestamping them according to the average latency of a single
> fragment rather than the full buffering latency. This will replace
>
> evenly distributed 0..fragment_latency jitter
> + occasional peaks jitter
>
> with
> fixed fragment_latency delay
> + occasional peaks jitter
>
> That is, average latency increases by about half a fragment period,
> but you also practically get rid of all jitter.
>
> I'd definitely prefer one fragment of fixed latency (for all
> practical reasons) to evenly distributed jitter with a full P-P
> amplitude of the same amount.

Yes, I thought about this issue: I measured the mean time between a
note-on command (3 bytes) being received by the MIDI thread and the
voice being rendered into the fragment buffer:
about 400 usecs on average (measured over several thousand MIDI events).
This is close to the theoretical 725 usecs (fragment latency) / 2 = 362 usecs.
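
For illustration, something like the following could implement the fixed
one-fragment delay you describe (just a sketch; all names are made up and
the real code will differ):

/* Sketch: trade ~0.5 fragment of extra average latency for (almost) zero
   jitter by stamping every incoming event for the next fragment boundary.
   render_voices() and apply_event() are placeholders. */

struct TimedEvent {
    long long play_frame;              /* absolute audio frame to apply it at */
    int note, velocity;
};

const int FRAGMENT_SAMPLES = 32;       /* ~725 usec at 44.1 kHz */
extern long long current_fragment_start;  /* maintained by the audio thread */
extern void render_voices(float *buf, int n);
extern void apply_event(const TimedEvent &ev);

/* MIDI thread: give the event a constant one-fragment delay */
TimedEvent stamp_event(int note, int velocity)
{
    TimedEvent ev;
    ev.note = note;
    ev.velocity = velocity;
    ev.play_frame = current_fragment_start + FRAGMENT_SAMPLES;
    return ev;
}

/* Audio thread: split the fragment at the stamped offsets, so the note
   starts on the exact sample instead of at the fragment start.
   (Assumes ev[] holds only the events that fall inside this fragment,
   sorted by play_frame.) */
void render_with_events(float *buf, long long fragment_start,
                        const TimedEvent *ev, int n_events)
{
    int pos = 0;
    for (int i = 0; i < n_events; i++) {
        int offset = (int)(ev[i].play_frame - fragment_start);
        if (offset > pos) {
            render_voices(buf + pos, offset - pos);
            pos = offset;
        }
        apply_event(ev[i]);
    }
    if (pos < FRAGMENT_SAMPLES)
        render_voices(buf + pos, FRAGMENT_SAMPLES - pos);
}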

As you pointed out, as long as you do not run the MIDI thread at higher
priority than the audio thread, you can't preempt the audio thread, plus
the audio thread would need to check for new commands at every iteration
(see the sketch below), or the MIDI thread could set samples_to_event < 0
so that the event processing gets triggered.
I am not convinced that running the MIDI thread at higher priority is safe
from an RT POV, because the MIDI thread could do lots of weird stuff
(eating CPU), leading to potential dropouts.
But I think we should keep this feature in mind because it may help to
improve note timing even further, even if the MIDI wire speed sucks; we
never know, if a new protocol comes out (mLAN??), then we will be prepared!
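
For reference, the per-fragment check on the audio side is roughly this
(a sketch only; the Command layout and the helper functions are
placeholders, not the actual EVO code):

/* One audio fragment iteration. The lock-free command FIFO is assumed
   to offer a non-blocking read that returns false when it is empty. */

struct Command { int type, channel, note, velocity; };

extern bool command_fifo_read(Command &out);     /* non-blocking */
extern void dispatch_command(const Command &c);  /* start/stop voices etc. */
extern void render_fragment(float *buf, int n);
extern void write_to_device(const float *buf, int n);

const int FRAGMENT_SAMPLES = 32;                 /* ~725 usec at 44.1 kHz */

void audio_fragment_iteration(float *fragment_buffer)
{
    Command cmd;

    /* drain everything the MIDI thread queued since the last fragment */
    while (command_fifo_read(cmd))
        dispatch_command(cmd);

    render_fragment(fragment_buffer, FRAGMENT_SAMPLES);

    /* the blocking device write releases the CPU until the next fragment
       is due - that is when the lower priority MIDI thread gets to run */
    write_to_device(fragment_buffer, FRAGMENT_SAMPLES);
}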

And as for the jitter: when we play chords, each note-on adds 1.1 msec of
latency, so when playing a 3-finger chord (C-E-G) two times, it could be
that the first time the notes arrive in the sequence C-E-G and the second
time as G-C-E; this is IMHO way above the mean 350 usec jitter.

E.g. the first time G will arrive after 3 msecs and the second time after
1 msec, a delta of 2 msecs, which is a limitation of the MIDI protocol itself.
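
(For reference, a back-of-the-envelope figure, assuming standard 31250
baud MIDI with 10 bits on the wire per byte and no running status:

    3 bytes/note-on * 10 bits/byte / 31250 bits/sec = 0.96 msec per note-on

so the third note of a chord unavoidably trails the first by roughly
2 msec, no matter how fast the receiving side is.)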

> My softsynth hack does the same thing as a two-thread solution with
> MIDI at lower prio than audio would; first thing in each audio cycle
> it reads and processes all MIDI events that have arrived since the
> last check. (I'm using non-blocking MIDI I/O instead of an extra
> thread.) This is *not* the correct or perfect way obviously, it's
> just simple and fast, and still works pretty well, with the lowest
> possible complexity.
>
> At insignificant DSP CPU load, it behaves identically to your
> "deliver ASAP" model - all your voices will check their events and
> do their processing in about the same time as my synth checks all
> events, and then does all processing; the outside world will not see
> the difference, since the total time is, in this example, insignificant.
>
> Under heavy DSP load, there is one difference: my synth still behaves
> *exactly* the same way as in the previous example, while it starts to
> show that your voices actually *do not execute at the same time*!
> This shows as a rather pointless dependency between voice # and
> latency.

Nope: my model will behave the same way in both low and high DSP load
situations (leaving out weird things like loads over 95%, which would
hurt your softsynth hack too).

The audio thread releases the CPU every 700 usecs, thus the MIDI thread
will have the chance to run for 700 * (1 - cpu_load) usecs
(where cpu_load goes from 0 to 1, 1 = 100%).

Within this time it will easily be able to fetch all MIDI bytes from the
MIDI FIFO and, if a complete command has arrived, send a note-on command
to the audio thread via the lock-free FIFO; see the sketch below.
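
Roughly like this (sketch only; running status and non-3-byte messages
are ignored, the helper names are made up, and Command is the same kind
of struct as in the fragment sketch above):

struct Command { int type, channel, note, velocity; };

extern bool midi_fifo_read_byte(unsigned char &b);  /* raw bytes from the device */
extern bool command_fifo_write(const Command &c);   /* lock-free, towards audio */

void midi_thread_iteration()
{
    static unsigned char msg[3];
    static int have = 0;
    unsigned char b;

    while (midi_fifo_read_byte(b)) {
        if (b & 0x80) {                     /* status byte starts a new message */
            msg[0] = b;
            have = 1;
        } else if (have > 0 && have < 3) {
            msg[have++] = b;
        }
        if (have == 3) {                    /* e.g. note-on: status, note, velocity */
            Command c;
            c.type     = msg[0] & 0xf0;
            c.channel  = msg[0] & 0x0f;
            c.note     = msg[1];
            c.velocity = msg[2];
            command_fifo_write(c);          /* never blocks, never touches a lock */
            have = 0;
        }
    }
}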

This is already prepared for your sensor-thread model: if you can ensure
that the MIDI thread will not chew away more than 700 usecs per run, then
you can run it at higher priority than the audio thread and trigger new
event processing by setting a flag so that the audio thread processes it
immediately.
This may become VERY useful on SMP hardware, because we would effectively
run audio and MIDI simultaneously, delivering almost instant response to
events.

So my question: how many Windows softsynths provide this almost
jitter-free, accurate event delivery?
Windows is timing jitter personified, so I think implementing the scheme
you mentioned there would not improve the situation, since the gained
accuracy would drown in the high OS jitter noise floor.
But Linux is a different beast, thanks to lowlatency.

>
> One will have to live with the occasional kernel latency peaks
> breaking through when playing true real time via MIDI. Seems like
> they're so few with 2.2.x/lowlatency that it's practically unlikely
> that you'll ever notice hitting one with a critical MIDI event...

Agreed; see how few spikes the latency diagrams contain, plus they are
all below the MIDI event latency (1 msec in most cases), so kernel
latency is not the limiting factor.

>
> For final recordings of very demanding material, you should of course
> use a sequencer that can send timestamped events directly to the
> sampler. That is, more or less off line rendering, that happens to be
> monitored with very low latency.

Yes, and since the sequencer will run within the rtsoundserver model, you
will get sample accurate output AND 2.1 msec latency (yes, actually lower
than material played live, because sending a "MIDI event" from the
sequencer to the synth takes virtually no time compared to the 1.1 msec
MIDI note-on time, since we are only moving a few bytes in RAM).

> > The problem is that the threads run totally in parallel:
> > assume EVO comes as a plugin for the rtsoundserver:
> >
> > the soundserver runs 3 threads and the plugin provides
> > the 3 callbacks:
> > EVO_audio()
> > EVO_midi()
> > EVO_disk()
> >
> > currently EVO (am I allowed to call disksampler EVO ? :-) ),
> > runs 3 threads which all communicate via lock-free fifo:
> > The MIDI thread could send events at any time and on a SMP box
> > both threads will run simultaneously, so events can come in at any time.
>
> Ok, three plugins in one, actually, running in different threads, so
> yes, the lock-free stuff *is* needed for communication between the
> modules. However, there's still no point in feeding MIDI events
> directly to the audio thread, one event at a time. (See above - do
> you really want J ms P-P amplitude jitter to lower the *average*
> latency by J/2? I don't.)

The MIDI thread can feed as many events as it wants in one rush
(I use the same ringbuffers for commands as for audio samples;
this is one of the advantages of templates!)
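
Something like this, roughly (a sketch only, not the actual ringbuffer.h;
one writer thread, one reader thread, SIZE must be a power of two, and
the barrier macros here are the UP versions - see further down for the
SMP variant):

/* Minimal single-reader/single-writer lock-free FIFO template.
   UP build: a compiler barrier is enough to keep the data store and the
   index update in order; on SMP a real memory barrier is needed. */
#define rb_wmb() __asm__ __volatile__ ("" : : : "memory")
#define rb_rmb() __asm__ __volatile__ ("" : : : "memory")

template<class T, unsigned SIZE>
class RingBuffer {
public:
    RingBuffer() : read_idx(0), write_idx(0) {}

    /* producer side only (e.g. the MIDI thread) */
    bool write(const T &item) {
        if (write_idx - read_idx == SIZE)
            return false;                    /* full: report, never block */
        buf[write_idx & (SIZE - 1)] = item;
        rb_wmb();                            /* data must be visible before the index */
        write_idx = write_idx + 1;
        return true;
    }

    /* consumer side only (e.g. the audio thread) */
    bool read(T &item) {
        if (read_idx == write_idx)
            return false;                    /* empty */
        rb_rmb();                            /* index before data */
        item = buf[read_idx & (SIZE - 1)];
        read_idx = read_idx + 1;
        return true;
    }

private:
    volatile unsigned read_idx;
    volatile unsigned write_idx;
    T buf[SIZE];
};

The same template works for blocks of audio samples and for parsed MIDI
command structs, which is exactly the point about templates above.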

>
> > As said: giving away the speed advantages and binary uniformity of plugins and
> > audio apps, only to avoid the SMP/UP distinctions (eg the only "disturbing"
> > issue for them is that they have to recompile) for a handful of SMP SPARC
> > owners is IMHO not the way to go.
>
> It's not as simple as that, unless the API stops at simple FIFO style
> point-to-point connections. Anything that lets multiple senders
> connect to one receiver will require operations that are way beyond
> what can realistically be done with a low level lock-free FIFO. What
> kind of plugin and audio app API are we talking about here? Anything
> beyond a very basic encapsulation of lock-free RT IPC?

The goal of this API should be to allow multithreaded realtime
applications like EVO to run as plugins within the soundserver.

The soundserver would do the following:
- load the .so file using a low priority non-RT thread so as not to disrupt
  audio output (can VST / DirectX claim this feature? I guess not, at least
  from reading the messages on the vst-plugins mailing list); see the
  sketch below

- call EVO's init routines (still within the low priority thread)
- when everything is ready to run, add the audio, MIDI and disk I/O
  callbacks to the rtsoundserver's callback scheduler
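
For the loading step, something as simple as this would do (a sketch,
assuming dlopen() and a hypothetical EVO_init entry point; the names are
not a defined API):

#include <dlfcn.h>

extern void register_callbacks(void *plugin_lib);  /* hands the audio/midi/disk
                                                       callbacks to the RT scheduler */

/* Runs in a normal (non-RT) thread, so disk I/O and page faults during
   loading can never stall the SCHED_FIFO audio thread. */
void *load_plugin(const char *path)
{
    void *lib = dlopen(path, RTLD_NOW);      /* may block - harmless here */
    if (!lib)
        return 0;

    /* hypothetical init entry point exported by the plugin */
    int (*init)() = (int (*)()) dlsym(lib, "EVO_init");
    if (init && init() == 0)
        register_callbacks(lib);             /* only now does it go realtime */

    return lib;
}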

The API itself does not assist with inter-thread communication, since that
is done using the ringbuffer.h structures.

The API should:

- allow plugin loading / unloading just like LADSPA

- manage all audio/MIDI input and output devices
   for audio, the clients will get pointers to the audio data, with
   information about format, channels and fragment sizes.

- MIDI data could probably be handled by a userspace MidiShare server
   implementation which would call all the involved callbacks.
   (I haven't looked at the MidiShare model yet, so I can't say if it would
    fit in this context, but my common sense says yes;
    MidiShare folks, tell us!)

- execute the plugins using simple loops which do nothing more than
   calling several functions in sequence,
   like EVO_process_audio(); Cubase_process_audio(); (see the sketch below)
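
Roughly like this (sketch only; the callback type and the registration
mechanism are invented for illustration):

/* The rtsoundserver's audio "scheduler": nothing more than calling the
   registered process callbacks back to back, once per fragment, inside
   the single SCHED_FIFO audio thread. */

typedef void (*AudioCallback)(float *buf, int frames);

const int MAX_CLIENTS = 16;
AudioCallback audio_clients[MAX_CLIENTS];   /* e.g. { EVO_process_audio,
                                                      Cubase_process_audio } */
int n_audio_clients = 0;

void run_audio_fragment(float *fragment, int frames)
{
    /* Because the callbacks run strictly one after another in the same
       thread, plugin and server can hand audio over via plain shared
       buffers here - no lock-free FIFO needed for this particular path. */
    for (int i = 0; i < n_audio_clients; i++)
        audio_clients[i](fragment, frames);
}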

We could assist developers by writing documents about how to use lock-free
structures and what rules they have to obey, provide source code etc., but
solving the SMP/UP SPARC issue (e.g. SPARCs would require UP/SMP optimized
plugins/apps) in an "elegant" way, by providing the lock-free structures
in a dynamic lib, would sometimes cause an unacceptable performance hit.

So just choose:
a) excellent speed but on certain (VERY FEW) architectures you would
   need to use SMP/UP optimized versions of the plugin.
b) sucky speed (in some cases) but one binary runs on both SMP/UP

I think most would choose a), because in the audio world latency, number
of voices/FXes and processing speed outweigh by far the hassle of
installing different versions of your software depending on whether you
run SMP or UP boxes.
(This could be done completely by a GUI installer, so the end user would
never see this feature; plus, for plugin developers, a recompile with
__SMP__ enabled does not take THAT much time and requires no changes in
the code; see the sketch below.)
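
To make it concrete, the only SMP-dependent part would look roughly like
this on x86 (a sketch in the spirit of the kernel headers; the rb_wmb() /
rb_rmb() macros from the FIFO sketch above would simply become):

#ifdef __SMP__
/* reader and writer may really run at the same time on two CPUs,
   so a real bus-locking barrier is needed */
#define rb_wmb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
#define rb_rmb() rb_wmb()
#else
/* UP: the CPU cannot race with itself, a compiler barrier is enough */
#define rb_wmb() __asm__ __volatile__ ("" : : : "memory")
#define rb_rmb() __asm__ __volatile__ ("" : : : "memory")
#endif

So the difference between the UP and the SMP binary is literally one
#define: with the structures inlined from a header you only pay for the
locked instruction on the SMP build, while a shared dynamic lib would
have to either always pay it or go through a function call.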

So from both a developer and an end-user POV this is almost a non-issue,
but it can buy us quite some speed in some cases.
Why waste this possibility?

>
> > As you said we don't live in a perfect world, and I prefer installing
> > SMP-optimized versions of the app on SMP hw rather than hitting the
> > performance of millions of UP users because we use non inlined functions.
>
> Of course - It's just that I'm still not convinced that "messing"
> with these lock-free constructs for every single operation is the
> right way to go for these rtsoundserver plugins. Do they really gain
> anything from having more than one critical transfer point per
> connection?

Perhaps there is a misunderstanding between us;
can you explain what exactly you mean?

Developers of apps will have to use their own lock-free communication
between the threads; the API will not assist them, because of the
performance hit that could arise from frequent access to these structures,
as in EVO's audio thread when it reads data from the diskstream buffers.
I am sure that cases similar to EVO exist, so we should keep speed at the
highest possible level.
Data from/to the rtsoundserver will not require lock-free FIFOs, as simple
shared memory can be used, since we are sure that we are the only plugin
running at that time. (This is one of the advantages of manual scheduling
over true threads.)

The fun will start when EVO runs on an 8-way IA64 box and the
rtsoundserver runs the audio processing in 8 parallel threads which all
render voices like crazy, which then get mixed together by one master and
sent to the audio device.

In the next few years we will see these monsters, and I want to be
prepared to fully exploit their capabilities.

Benno.


