Re: [linux-audio-dev] LADSPA hard_rt_capable


Subject: Re: [linux-audio-dev] LADSPA hard_rt_capable
From: David Olofson (david_AT_gardena.net)
Date: Fri Dec 15 2000 - 08:16:25 EET


On Wednesday 13 December 2000 01:16, Jack O'Quin wrote:
> I've been thinking about the requirements of realtime audio,
> including the many interesting comments on this list. Clearly, you
> folks understand it very well, but it's still easy to underestimate
> the complexity of realtime programming. The main difficulty is
> that "realtime" is a global attribute of the entire system, which
> any component can mess up by being careless in subtle ways.
>
> So, I've come to the unwelcome conclusion that in many cases,
> realtime audio probably requires at least *two* different-priority
> threads. Maybe three. And, that is not even considering SMP.
>
> In support of this idea, I note that Csound implements three
> different cyclical rates at which variables can change their
> values. Here is an excerpt from the Csound manual:
>
> csound> There are four possible rates:
> csound>
> csound> 1) once only, at orchestra setup time (effectively a
> csound> permanent assignment);
> csound> 2) once at the beginning of each note (at initialization
> csound> (init) time: I-rate);
> csound> 3) once every performance-time control loop (perf time
> csound> control rate, or K-rate);
> csound> 4) once each sound sample of every control loop (perf time
> csound> audio rate, or A-rate).
>
> Borrowing this terminology only for purposes of discourse, I
> observe that most existing Linux realtime audio seems to run at
> either I-rate (various MIDI control events, GUI commands), or
> A-rate (hard disk recording, mixing). I'm speaking somewhat
> loosely, here. There may be some plugins that would like to use
> something in between, like the Csound K-rate. Perhaps MIDI
> continuous controllers are in this category. I don't know.
> Perhaps not.
>
> I have observed discussions between I-rate (not "irate")
> programmers and A-rate programmers about whether MIDI is really a
> realtime protocol. Clearly, it is. Responses at least in the low
> millisecond range are essential in many cases for keyboard control
> input, for example. Some would probably argue for even faster
> response times. But, I believe that Ardour has to deal with much
> tighter response requirements. Handling complex MIDI requests
> while simultaneously reading or writing 24 (or more) channels of
> digital audio to disk is a very difficult challenge.
>
> In many sophisticated realtime systems there is a hierarchy of
> priorities representing different guaranteed latencies.

Yes, but streaming audio systems don't quite work like that.

Actually, no RT systems do - all you achieve with more threads and
more priority levels is more complexity - and *possibly* the ability
to solve *some* problems better *most of the time*. Basically, big
thread hierarchies are the way to practically non-deterministic code.

> Code runs
> at a high priority when it needs to respond with low latency.

Yes, but guess who gets to figure out how to guarantee that certain
sequences of external events with high internal priorities don't
cause lower priority threads to miss deadlines...? ;-)

And, keeping this audio related; the audio thread is usually higher
latency, lower priority, but absolutely *hard* real time! Don't
*ever* make the audio thread finish late, or you'll get audio
drop-outs.

The alternative is to give MIDI event processing (not necessarily
MIDI capturing and timestamping!) lower priority, basically giving
MIDI higher worst case latency, in order to avoid having to figure
out how many MIDI events you may process in a certain amount of time,
and what to do if you run out of time...

Alternatively, you can just calculate the simplified absolute worst
case scenario (ie all threads want to do their work at the "wrong"
time), and set the CPU time margin according to that. That's
relatively simple, and guaranteed to work, as long as you get the
maths and the code right. No hidden complexity. (Oh, *do* remember to
make sure that you take the maximum input event density [ie MIDI
speed] into account! It might make your calculations look very
pessimistic, but it really doesn't get better than that if you really
need to be safe...)
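To make that worst-case arithmetic concrete, here's a rough sketch (all cost figures and function names are invented for illustration, and the maximum MIDI event density is simplified to 3-byte messages at full wire speed):

```python
# Sketch of a worst-case CPU budget check for one audio cycle.
# All cost figures are invented; measure your own engine/plugins.

MIDI_WIRE_BYTES_PER_SEC = 31250 / 10          # 31250 baud, 10 bits/byte
MAX_EVENTS_PER_SEC = MIDI_WIRE_BYTES_PER_SEC / 3  # assume 3-byte messages

def worst_case_cycle_time(buffer_frames, sample_rate,
                          dsp_cost_per_frame, cost_per_event):
    """Worst-case CPU time (seconds) for one cycle, assuming the
    maximum possible MIDI event density lands in this very cycle."""
    period = buffer_frames / sample_rate
    max_events = MAX_EVENTS_PER_SEC * period
    return buffer_frames * dsp_cost_per_frame + max_events * cost_per_event

def fits_with_margin(buffer_frames, sample_rate, dsp_cost_per_frame,
                     cost_per_event, margin=0.5):
    """True if the worst case uses at most `margin` of the cycle period."""
    period = buffer_frames / sample_rate
    worst = worst_case_cycle_time(buffer_frames, sample_rate,
                                  dsp_cost_per_frame, cost_per_event)
    return worst <= margin * period

# Example: 64 frames at 48 kHz, 5 us/frame DSP, 50 us per MIDI event.
safe = fits_with_margin(64, 48000, 5e-6, 50e-6)
```

The point is exactly the one made above: the event-density term makes the budget look pessimistic, but it's the only way to *guarantee* the deadline.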

> This, in turn, places a requirement on every program running at or
> above that priority to keep its worst-case pathlength within some
> constrained limit. Other realtime programs, running at lower
> priorities, can be allowed much longer pathlengths because their
> guaranteed response is not so tight,

"Tight" could be confusing in this context - actually, in the
softsynth example, the lower prio thread (audio) has a *longer* code
path per cycle, and higher acceptable latency, BUT it has a much
harder deadline than the MIDI thread. It just *mustn't* be missed,
while it's not the end of the world if a MIDI event should be
slightly late.

> and because their lower
> priority allows them to run without affecting higher-priority
> activities.
>
> I like to visualize this phenomenon as a "realtime response
> pyramid". High-priority tasks at the top of the pyramid must
> execute in a short time, symbolized by the pyramid's narrow width
> near its peak. Low-priority tasks nearer the pyramid's base, can
> run much longer, if necessary.

Yes, but don't forget that you also have to consider how the top of
the pyramid affects the lower sections... Especially since we have
this special case that the lower priority threads are hard real time,
while the higher prio ones are not as hard - they should have lower
average latency, but higher worst case latencies can be accepted.

Fortunately, we have an advantage over "normal" RT systems in that
most of the data that requires heavy processing is array based, and
allows processing to be chopped up into suitably sized portions, even
on the fly. Thus, we can usually get away with defining a maximum
acceptable latency (say 3 ms), a maximum acceptable average event
response jitter (1 ms), and a maximum response latency (3 ms). Now,
just make sure the whole system runs at least one cycle per "max
average response jitter unit" (1 ms), set up buffering to get the
desired maximum latency, and then make sure that there are no latency
peaks that cause buffer underruns.

In softsynth terms; use a buffer size that results in <1 ms playback
time per buffer, use 3 buffers total, check MIDI once per buffer, and
make sure you don't get buffer underruns.
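In code, that buffer sizing works out roughly like this (44.1 kHz and power-of-two buffer sizes are assumptions for the example, not requirements):

```python
# Sketch of the softsynth buffer sizing above: find the largest
# power-of-two buffer that plays back in under 1 ms at 44.1 kHz,
# then use three such buffers for the worst-case output latency.

sample_rate = 44100
max_buffer_time = 0.001  # 1 ms playback time per buffer

frames = 1
while (frames * 2) / sample_rate < max_buffer_time:
    frames *= 2

buffer_time = frames / sample_rate   # seconds per buffer
total_latency = 3 * buffer_time      # 3 buffers total

# At 44.1 kHz this lands on 32-frame buffers (~0.73 ms each),
# giving roughly 2.2 ms worst-case latency with triple buffering.
```

Checking MIDI once per buffer then bounds the average event response jitter by one buffer time, which is what the 1 ms figure above is about.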

If you want better timing accuracy for the MIDI events, you need to
check MIDI in a separate thread running at higher priority than the
audio thread, and timestamp events there.

If you need lower MIDI in -> audio out latency, you can eliminate
approximately one buffer by lowering the CPU load to virtually zero -
or by switching to RTLinux or RTAI, to practically eliminate the
scheduling latency peaks.

> On Mon, 11 Dec 2000, Steve Harris wrote:
> > > I've a few plugins that can't be marked as HARD_RT_CAPABLE
> > > because their cycle consumption varies too much when you change
> > > parameters (ie. they use parameter watching and only rebuild
> > > tables if they need to) otherwise they would be too slow with
> > > small chunk sizes. But this means that they can't be flagged as
> > > RT_CAPABLE, even though they don't use malloc or anything nasty
> > > like that.
> > >
> > > Is there any advantage to allowing them to flag that they have
> > > unpredictable CPU consumption, but are otherwise safe? I'm not
> > > sure if that helps the host.
>
> Using Csound terminology, the problem Steve describes is a case of
> A-rate code being modulated due to I-rate (or maybe K-rate) events.
> This makes the single HARD_RT_CAPABLE flag of LADSPA seem overly
> simplified.
>
> I realize there's probably nothing LADSPA, itself, can do about
> this. But, it seems that relatively sophisticated hosts may wish to
> implement the realtime response pyramid I've been describing.
> Perhaps some already do.
>
> For this, it would be helpful for plugins to describe their
> realtime properties in terms of priority classifications. Maybe,
> Steve could mark his plugin as A_RATE_CAPABLE, but with a parameter
> modification routine that is only I_RATE_CAPABLE, for example.
> Perhaps that should be handled by creating a separate, but related
> plugin. I don't know.

That is, classify plugins by the balance between how much the
buffer size and individual events affect the execution time...? (What
I mean is, "high quality RT" plugins would have very similar
execution times no matter what, whereas "lower quality RT" plugins
have higher peaks, and thus don't work in threads with very low
I/O latency.)

Yes, that makes sense, but it can't be done on Linux/lowlatency,
unless there's a *big* difference between the classes; ie 2 ms for
the "high quality" thread and 20 ms for the "low quality" thread. Due
to the CPU load, lack of timesharing and the high worst case
latencies, smaller differences will only lead to complex interference
phenomena that would make it very hard to guarantee that the lower
priority thread can meet its deadlines.

Now, obviously, this is - at least in theory - different on SMP
systems, where the threads don't necessarily compete for CPU time
when they want to run at the same time...

> Benno Senoner <sbenno_AT_gardena.net> replied:
> > Hi, yes these plugins are a bit a dilemma, especially when it
> > comes to small block sizes. How does one know for example that a
> > plug can work with a block size of X on a CPU with Y Mhz during
> > worst case scenarios ? (user/audiomation software varies
> > parameters like mad (and thus causing recalculation of tables
> > during each run() cycle) In LADSPA it will probably not make
> > sense, but on MAIA we could implement a system where these
> > calculations are performed in a lower priority thread and where
> > the results are delivered in an atomic way to the run() callback.
> > Otherwise [snip...]
>
> Benno seems to be thinking along similar lines. I don't know
> enough to comment on his MAIA and LADSPA comparison, but I agree
> with his idea of recalculating tables in a lower priority thread.

Also note that such recalculation *can* be considered firm or even
soft RT. If they are *inherently* non-deterministic, this separation
would actually mean that "hard RT broken" plugins become "hard RT
plugins with soft RT event response." The latter would be usable for
all kinds of RT audio stuff, while the former would simply be useless
for serious RT work, as they could cause audio drop-outs.
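A minimal sketch of that "deliver results atomically to run()" idea (names invented, and a Python reference assignment standing in for what would really be an atomic pointer swap in C): the worker builds the new table entirely off-line, then publishes it in one step, so run() always sees either the old table or the complete new one, never a half-built mix.

```python
import threading

class TableHolder:
    """Holds the current lookup table; publishing is one assignment."""
    def __init__(self, table):
        self._table = table

    def publish(self, new_table):
        self._table = new_table   # single reference swap: the atomic handoff

    def current(self):
        return self._table

def recalc_worker(holder, param):
    # Runs at lower priority; may take "ages" without disturbing run().
    new_table = [param * i for i in range(8)]  # stand-in for heavy work
    holder.publish(new_table)

def run(holder, frames):
    table = holder.current()      # grab ONE consistent table per cycle
    return [table[i % len(table)] for i in range(frames)]

holder = TableHolder([0] * 8)
worker = threading.Thread(target=recalc_worker, args=(holder, 2))
worker.start()
worker.join()
out = run(holder, 4)
```

The audio thread never waits: a late recalculation just means one more cycle with the old table - soft RT event response on top of hard RT audio, exactly the split described above.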

> Essentially, I'm arguing that the extra complexity of describing a
> few distinct realtime priority classifications is worthwhile,
> because they actually simplify the truly difficult issue in
> realtime systems: spelling out very clearly what the realtime
> characteristics must be for every component.

I think it would be more interesting (and useful) to separate the
parts of plugins, and then use the classification on the *parts*,
rather than a fixed assembly of parts. (Which is currently everything
in run(), and some other things.)

Fundamental parts:
        Instantiation
        Entering "standby" mode
        run()
        Responding to certain events/control changes
        Exiting "standby" mode
        Destruction

Now, the classification is still interesting, but I'm not sure
there's much practical use for something very detailed right now.
(Linux/lowlatency scheduling latency, high CPU load in audio threads
etc.) OTOH, it's better to have some extra info than missing info! :-)

Anyway, I'm thinking about a two dimensional system:

        RT class: None, Soft, Firm, Hard
        Scalability: None, Low, Exact, High

An RT class on "None" means that the operation could take "ages" -
the plugin might load a file, do some raytracing or whatever.

"Soft" means that the operation is usually performed in an amount of
time that a user clicking a button with the mouse would perceive as
"almost zero latency", ie a few ms; at most 100 ms or so. Most stuff
will probably go here.

"Firm" is like "Soft", but the upper limit is somewhere around the
time it takes an "ultra low latency" engine to complete one buffer
cycle, ie around 1 ms.

"Hard" means that the operation takes at most as long as it takes on
average to process one sample with the plugin. This is the only class
of operations that can safely be used inside the audio thread of a
lowlatency RT application - but you still have to take care not to
overload the CPU, as not even these operations take zero time!

As for the Scalability axis:

"None" - this takes the same amount of time no matter what, ie it
doesn't scale with buffer size.

"Low" means that the operation takes less time for smaller buffers,
but not half the time for half the buffer size.

"Exact" indicates that the operation scales practically linear with
buffer size.

"High" means that halving the buffer size results in *less* than half
the time for carrying out the operation.

Now, perhaps one should use figures instead of the Scalability
classes, and perhaps even more factors - but how many plugin hackers
would care to calculate all that properly...?

> This is not a fully thought out proposal, just some ideas I've been
> kicking around. I'm open to comments and criticism.

Some more ideas above. Don't know if I'm making any sense at all...

//David

.- M A I A -------------------------------------------------.
| Multimedia Application Integration Architecture |
| A Free/Open Source Plugin API for Professional Multimedia |
`----------------------> http://www.linuxaudiodev.com/maia -'
.- David Olofson -------------------------------------------.
| Audio Hacker - Open Source Advocate - Singer - Songwriter |
`--------------------------------------> david_AT_linuxdj.com -'



This archive was generated by hypermail 2b28 : Fri Dec 15 2000 - 09:05:55 EET