Re: [linux-audio-dev] Re: lock-free structures ... Re: MidiShare Linux


Subject: Re: [linux-audio-dev] Re: lock-free structures ... Re: MidiShare Linux
From: David Olofson (david_AT_gardena.net)
Date: Wed Jul 26 2000 - 06:04:04 EEST


On Sat, 22 Jul 2000, Benno Senoner wrote:
[...]
> I am not convinced that running the MIDI thread at higher priority is safe
> from an RT POV, because the MIDI thread could do lots of weird stuff (eating
> CPU), leading to potential dropouts.

If it runs at lower prio, the MIDI plugins might as well run in the
audio thread, but...

> But I think we should keep this feature in mind, because it may help to
> improve note timing even further, even if the MIDI wire speed sucks. We never
> know; if a new protocol comes out (mLAN??), we will be prepared!

...it's probably a good idea to be ready for doing it the other way
around.

But, while worrying about this...

> And as for the jitter: when we play chords, each note-on adds 1.1 msec of
> latency, so when playing a 3-finger chord (C-E-G) two times, it could be that
> the first time the notes arrive in the sequence C-E-G, and the 2nd time as
> G-C-E. This is IMHO way above the mean 350 usec jitter.

...you don't seem all that worried about making the possible
improvement mentioned above less useful, at the cost of
architecture-dependent sync code and overhead in all plugin code. ;-)

> > My softsynth hack does the same thing as a two-thread solution with
> > MIDI at lower prio than audio would; first thing in each audio cycle
> > it reads and processes all MIDI events that have arrived since the
> > last check. (I'm using non-blocking MIDI I/O instead of an extra
> > thread.) This is *not* the correct or perfect way obviously, it's
> > just simple and fast, and still works pretty well, with the lowest
> > possible complexity.
> >
> > At insignificant DSP CPU load, it behaves identically to your
> > "deliver ASAP" model - all your voices will check their events and
> > do their processing in about the same time as my synth checks all
> > events and then does all processing; the outside world will not see
> > the difference, since the total time is, in this example, insignificant.
> >
> > Under heavy DSP load, there is one difference: my synth still behaves
> > *exactly* the same way as in the previous example, while it starts to
> > show that your voices actually *do not execute at the same time*!
> > This shows as a rather pointless dependency between voice # and
> > latency.
>
> Nope: my model will behave the same way in both low and high
> DSP load situations:

Yep; this is because the priority relation makes the two threads
emulate the behaviour of my single threaded model - albeit with some
more overhead due to context switching.

(I expected the separate MIDI thread to actually warrant its own
existence, despite the IPC issues it generates... :-)
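
To make this concrete, the single threaded model amounts to something
like this - just a sketch; the device path and the helper functions
are made up, and it's not the actual synth code:

    /* Drain non-blocking MIDI input at the top of each audio cycle. */
    #include <fcntl.h>
    #include <unistd.h>

    extern void parse_midi_bytes(const unsigned char *buf, ssize_t len);
    extern void render_voices(float *out, int frames);

    static int midi_fd = -1;

    void init_midi()
    {
        midi_fd = open("/dev/midi", O_RDONLY | O_NONBLOCK);
    }

    void audio_cycle(float *out, int frames)
    {
        unsigned char buf[256];
        ssize_t len;

        /* Everything that arrived since the last cycle; read()
         * returns -1 (EAGAIN) when there is nothing left. */
        while ((len = read(midi_fd, buf, sizeof buf)) > 0)
            parse_midi_bytes(buf, len);

        render_voices(out, frames);
    }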

[...]
> It is prepared for your sensor-thread model: if you can ensure that the MIDI
> thread will not chew away more than 700 usec per run, then you can run it
> at higher priority than the audio thread and trigger new event processing by
> setting a flag, so that the audio thread will process it immediately.
> This may become VERY useful on SMP hardware, because we would effectively
> run audio and MIDI simultaneously, delivering almost instant response to
> events.

Well, just one problem: how do you cut below the latency caused by
the buffering without risking audio reliability? That is, even if we
get the audio thread to act in no time at all, it's still working
with buffers that are queued behind one or two buffers in the driver
before they start to play... The MIDI->audio thread communication
cannot affect the *total* latency; only where in the buffer currently
being processed the events will take effect.
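
(To put a number on it: with two 128 frame fragments at 44.1 kHz, the
driver is already 2 * 128 / 44100 s = ~5.8 ms ahead of anything the
audio thread writes, no matter how fast the MIDI thread wakes it up.
The figures are just an example, of course.)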

> So my question: how many Windows soft synths provide this almost jitter-free,
> accurate event delivery?
> Windows is THE timing jitter in person, so I think implementing the scheme
> you mentioned will not improve the situation, since the gained accuracy will
> drown in the OS's high jitter noise floor.
> But Linux is a different beast, thanks to lowlatency.

As long as we deal with MIDI, there's not much point in going beyond
the single threaded model anyway. There is simply not enough useful
information left to preserve on this level. Besides, a proper MIDI
interface should be used if you really have a MIDI controller that
delivers timing as accurate as the MIDI spec allows.

So my question is: why use a separate MIDI thread at all, after
ruling out as pointless the very thing it's meant for?

I'm even beginning to hesitate when it comes to "Is it a good idea
to prepare for running the MIDI thread at higher priority than the
audio thread for any other reasons than maintaining low jitter with
large fragment sizes?" It doesn't make any sense unless you really do
the game SFX trick on a mmap()ed audio buffer. (The same trick used
by some Windows softsynths to achieve low latency and extremely high
jitter on MIDI without audio dropouts.)
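
For reference, the trick amounts to something like this - a rough
sketch with made-up names, ignoring channel interleaving, clipping
and the tuning of the safety margin:

    /* Mix a newly triggered sound directly into the mmap()ed DMA
     * buffer, just ahead of the hardware play pointer, instead of
     * waiting for the next regular cycle. */
    enum { SAFETY_FRAMES = 64 };    /* stay safely ahead of the DMA */

    void trigger_sfx_now(short *dma_buf, int buf_frames, int hw_play_pos,
                         const short *sfx, int sfx_frames)
    {
        int pos = (hw_play_pos + SAFETY_FRAMES) % buf_frames;
        int i;
        for (i = 0; i < sfx_frames; ++i) {
            dma_buf[pos] += sfx[i];          /* mix in place */
            pos = (pos + 1) % buf_frames;
        }
    }

Latency drops to SAFETY_FRAMES, but the insertion point depends on
when you happened to look at the play pointer - hence the extreme
jitter.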

The point: Don't get into multithreading trouble - especially not
embedded in API macros - unless it really buys you something. As far as
I can tell, it's only making things worse in this case. (With the
possible exception of average jitter, which is just a useless figure
anyway in this case, especially with a distribution similar to that of
white noise.)

[...]
> > For final recordings of very demanding material, you should of course
> > use a sequencer that can send timestamped events directly to the
> > sampler. That is, more or less off line rendering, that happens to be
> > monitored with very low latency.
>
> Yes, and since the sequencer will run within the rtsoundserver model,
> you will get sample accurate output AND 2.1 msec latency
> (yes, actually lower than stuff played live, because sending a
> "MIDI event" from the sequencer to the synth takes virtually no time
> (compared to the 1.1 msec MIDI note-on time), since we are only moving
> a few bytes in RAM).

Yep, interactive applications (phrase sequencers, arpeggiators etc)
can cut the application->audio latency quite a bit here. :-)

> > > The problem is that the threads run totally in parallel:
> > > assume EVO comes as a plugin for the rtsoundserver:
> > >
> > > the soundserver runs 3 threads and the plugin provides
> > > the 3 callbacks:
> > > EVO_audio()
> > > EVO_midi()
> > > EVO_disk()
> > >
> > > currently EVO (am I allowed to call disksampler EVO? :-) ),
> > > runs 3 threads which all communicate via lock-free FIFOs:
> > > The MIDI thread could send events at any time and on a SMP box
> > > both threads will run simultaneously, so events can come in at any time.
> >
> > Ok, three plugins in one, actually, running in different threads, so
> > yes, the lock-free stuff *is* needed for communication between the
> > modules. However, there's still no point in feeding MIDI events
> > directly to the audio thread, one event at a time. (See above - do
> > you really want J ms P-P amplitude jitter to lower the *average*
> > latency by J/2? I don't.)
>
> The MIDI thread can feed as many events as it wants within one rush
> (I use the same ringbuffers for commands as for audio samples;
> this is one of the advantages of templates!)

Yep, but what is the MIDI thread going to block on in that case? At
the very least, it'd need a timer for that. (Rate: something like the
cycle rate of the audio thread.) This is not trivial code to add for
a slight performance increase, and I doubt most plugin developers
would even know what all this is about - and thus simply ignore it,
doing it the obvious way.
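
Roughly, such a cycle based MIDI thread would look like this
(hypothetical names; the timer is hand-waved):

    struct Fifo;                 /* the lock-free FIFO, defined elsewhere */
    struct Event { unsigned char data[4]; };
    enum { MAX_BATCH = 64 };

    extern Fifo midi_fifo;
    extern void wait_for_cycle_timer();  /* ~ the audio thread's cycle rate */
    extern int  drain_midi_input(Event *out, int max);  /* non-blocking reads */
    extern void fifo_write_block(Fifo *f, const Event *ev, int n);

    void midi_thread()
    {
        Event batch[MAX_BATCH];
        for (;;) {
            wait_for_cycle_timer();
            int n = drain_midi_input(batch, MAX_BATCH);
            if (n > 0)
                fifo_write_block(&midi_fifo, batch, n);  /* ONE IPC op */
        }
    }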

And *if* anyone bothers with that kind of "optimization", it
renders the arch-dependent lock-free FIFOs totally pointless on the
API level, as the MIDI thread is actually operating in cycles.

Besides, there's more to it than just replacing lock-free FIFOs with
ordinary, non-thread safe FIFOs (which would basically just be
wrecking a nice design suitable for non-cycle based IPC) - completely
different methods can be used easily and safely if the multithreading
issues are out of the way.

[...]
> We could assist developers by writing documents about how to use lock-free
> structures and what rules they will have to obey, provide source code etc.,
> but solving the SMP/UP SPARC issue (e.g. SPARCs would require UP/SMP
> optimized plugins/apps) in an "elegant" way by providing the lock-free
> structures in a dynamic lib would sometimes cause an unacceptable performance
> hit.

> So just choose:
> a) excellent speed, but on certain (VERY FEW) architectures you would
> need to use SMP/UP optimized versions of the plugin.
> b) sucky speed (in some cases), but one binary runs on both SMP/UP.

My point is not that of reducing the overhead of IPC, but to

        1) keep it away from the API, and most importantly

        2) to avoid using it more frequently than required.

2) is what my resistance against lock-free FIFOs directly in the API
is all about; why try to do supercomputer style IPC when we have very
deterministic behaviour in all critical tasks, and less critical
timing requirements on communication with non-RT tasks?

The only valid reason I can see is that providing a very basic IPC
mechanism frees us from developing a complete API. That may be a good
reason, and it might be what settles all this.

I just wanted to point out that from a technical, and perhaps in
particular a performance POV, lock-free FIFOs are a very good solution
for problems slightly different from the one at hand. Or, perhaps
we should focus on cycle-wise *transactions* rather than individual
operations?
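
To illustrate what I mean by transactions: the writer stages a whole
cycle's worth of events in a private part of a shared ring, then
publishes them all with a single index update, so only one operation
per cycle has to be thread safe. A sketch (using std::atomic for
brevity; nothing here is proposed API):

    #include <atomic>
    #include <cstddef>

    template<class T, std::size_t N>        /* N: a power of two */
    struct CycleFifo {
        T buf[N];
        std::atomic<std::size_t> head{0};   /* advanced by writer only */
        std::atomic<std::size_t> tail{0};   /* advanced by reader only */

        /* Writer: stage a whole cycle locally, then commit once. */
        bool commit_cycle(const T *ev, std::size_t n)
        {
            std::size_t h = head.load(std::memory_order_relaxed);
            std::size_t t = tail.load(std::memory_order_acquire);
            if (N - (h - t) < n)
                return false;                    /* not enough room */
            for (std::size_t i = 0; i < n; ++i)
                buf[(h + i) & (N - 1)] = ev[i];  /* plain stores */
            head.store(h + n, std::memory_order_release); /* ONE sync */
            return true;
        }

        /* Reader (audio thread): grab everything committed so far. */
        std::size_t drain(T *out, std::size_t max)
        {
            std::size_t t = tail.load(std::memory_order_relaxed);
            std::size_t h = head.load(std::memory_order_acquire);
            std::size_t n = h - t;
            if (n > max)
                n = max;
            for (std::size_t i = 0; i < n; ++i)
                out[i] = buf[(t + i) & (N - 1)];
            tail.store(t + n, std::memory_order_release);
            return n;
        }
    };

The FIFO is still lock-free, but it is used *transactionally* - one
release/acquire pair per cycle, no matter how many events went in.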

> I think most would choose a), because in the audio world,
> latency, number of voices/FXes and processing speed outweigh by far
> the hassle of installing different versions of your software depending on
> whether you run SMP or UP boxes.

Exactly. This is why I'd prefer using the FIFOs for the *IPC*, while
building the API around a smarter protocol that minimizes the number
of operations that need to be thread safe. It has the positive side
effect of the API being able to keep the sensitive low level IPC
stuff a bit farther away from the application and plugin developers,
without performance penalty.

> So both from a developer and enduser POV this is almost a non-issue,
> but it can buy us quite some speed in some cases.
> Why waste this possibility?

Why waste the possibility of hiding these issues completely from
developers *and* users, get better performance, less end-user install
problems, improve the chances of applications being able to
cooperate, and of getting an API that generates plugins that are
binary compatible across single/multithread, UP, SMP etc.?

Oh well, we need a usable API to implement that way first! :-)

> > > As you said we don't live in a perfect world, and I prefer installing
> > > SMP-optimized versions of the app on SMP hw rather than hitting the
> > > performance of millions of UP users because we use non inlined functions.
> >
> > Of course - It's just that I'm still not convinced that "messing"
> > with these lock-free constructs for every single operation is the
> > right way to go for these rtsoundserver plugins. Do they really gain
> > anything from having more than one critical transfer point per
> > connection?
>
> Perhaps there is a misunderstanding between us,
> can you explain what you exactly mean ?

I think I've mentioned most of it above.

Anyway, as an example, we can go back to this MIDI vs. audio thread
thing; unless the timing is significantly improved by running the
MIDI thread at higher prio, why do so, and force it to send every
event via IPC? The alternative would be to have the API "buffer" the
MIDI thread's output, and not pass it via lock-free mechanisms more
frequently than once every audio thread cycle. If the host was smart,
it would just run both MIDI and audio plugins in the same thread (*),
and do away with the IPC stuff altogether.

(*) This, of course, is assuming that we're dealing with a real
    *plugin* API, as opposed to a RT multithreading API.
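
In code, such a smart host's cycle could be as dumb as this (the
callback signatures are my guesses, of course - the real EVO
callbacks could look entirely different):

    struct Event { unsigned char data[4]; };
    enum { MAX_EVENTS = 256 };

    extern int  EVO_midi(Event *out, int max);  /* collect this cycle's events */
    extern void EVO_audio(float *out, int frames,
                          const Event *ev, int n);

    void host_cycle(float *out, int frames)
    {
        /* MIDI "plugin" and audio plugin run back to back in the
         * same RT thread; the event "buffer" is just a local array. */
        Event ev[MAX_EVENTS];
        int n = EVO_midi(ev, MAX_EVENTS);
        EVO_audio(out, frames, ev, n);
        /* EVO_disk() stays in its own, non-RT thread as before. */
    }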

> developers of apps will have to use their own lock-free communication between
> the threads; the API will not assist them, because of the performance hit
> that could arise from frequent access to these structures, as EVO's
> audio thread does when reading data from the diskstream buffers.
> I am sure that cases similar to EVO exist, thus keep speed at the
> highest possible level.
> Data from/to the rtsoundserver will not require lock-free FIFOs, as simple
> shared mem could be used, since we are sure that we are the only
> plugin running at a time. (This is one of the advantages of manual scheduling
> over true threads.)
>
> The fun will start when EVO runs on an 8-way IA64 box and
> the rtsoundserver lets the audio processing run within 8 parallel
> threads, which all render voices like crazy and then get mixed
> together by one master and sent to the audio device.
>
> In the next years we will see these monsters, and I want to be prepared
> to fully exploit their capabilities.

You probably have to rely on processing net dependencies and
spinlocks rather than lock-free FIFOs here, as all CPUs will have to
work like one, solid unit. The alternative is to use lock-free FIFOs
between the threads, and assume that the buffering can be set up for
that to work reliably. OTOH, the latter may actually be the only way,
as long as we're not using RTL, so we're basically back at square
one: the best way to communicate data between RT threads running
cycle based processing at fixed rates.
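
The spinlock part could be as simple as an atomic counter used as a
barrier - a sketch with made-up names; note that a real design would
need a generation counter to be reusable across cycles:

    #include <atomic>

    extern void render_my_voices(int worker_id);  /* per-CPU voice DSP */
    extern void mix_and_output();                 /* master mixdown */

    std::atomic<int> done_count{0};

    void worker(int id)
    {
        render_my_voices(id);
        done_count.fetch_add(1, std::memory_order_release);
    }

    void master(int n_workers)
    {
        /* Spin until every worker has finished its voice block;
         * fine for sub-millisecond waits on dedicated CPUs. */
        while (done_count.load(std::memory_order_acquire) < n_workers)
            ;
        mix_and_output();
        done_count.store(0, std::memory_order_relaxed);
    }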

Finally, we're probably talking about two different levels of APIs
here; your FIFOs are about as efficient as you can get a generic, RT
safe IPC layer, while I'm thinking more along the lines of running
plugins within one or more threads that schedule the plugins
manually. My concerns are mostly about integrating the IPC with the
plugin API used internally inside threads/hosts - it would be nice
if it could be done, but I'm afraid it would have a performance
impact on applications that run many plugins in the same thread.

//David

.- M u C o S --------------------------------. .- David Olofson ------.
| A Free/Open Multimedia                     | | Audio Hacker         |
| Plugin and Integration Standard            | | Linux Advocate       |
`------------> http://www.linuxdj.com/mucos -' | Open Source Advocate |
.- A u d i a l i t y ------------------------. | Singer               |
| Rock Solid Low Latency Signal Processing   | | Songwriter           |
`---> http://www.angelfire.com/or/audiality -' `-> david_AT_linuxdj.com -'


