Subject: Re: [linux-audio-dev] MuCoS, Glame, API Issues
From: David Olofson (david_AT_gardena.net)
Date: Sat Mar 11 2000 - 18:59:27 EST
On Fri, 10 Mar 2000, Richard Guenther wrote:
[...]
> > In low latency real time systems, the only reliable (and simple
> > enough to actually realize) way is to run many plugins together in
> > one or a few threads. The most common approach is to have a single
> > engine thread that calls plugins one by one in a strictly controlled
> > order, as defined by the net description. This means less thread
> > switching overhead and no plugin/plugin CPU time interference.
>
> Ok, so it's kind of "software scheduling". Obviously if you have n
> processors available in your system you would have n engine threads? Well,
> the scheduling problem will be quite complex in this case considering that
> the network is a directed (acyclic?) graph. It's NP-hard, isn't it?
No, not normally. In general, this kind of system runs plugins one
by one in an order based on input/output port dependencies (a simple
topological ordering; see the sketch after the list below). Now, if
you have multiple CPUs, there are basically two different ways to go:
* Split the net across the CPUs, with as few dependencies as
  possible between the threads.
    + Very efficient for some nets, where the CPUs can run
      independently most of the time, and only sync once for
      input and output data.
    - Very inefficient when there are lots of inter-thread
      dependencies within a single buffer cycle.
* Use only one thread, and run it interleaved on all CPUs.
  (That is, when one CPU is in the middle of processing one
  set of input buffers, the next takes over the plugins that
  have been run so far, and starts to process the next set
  of input buffers. This is slightly similar to double and
  triple buffering of video displays.)
    + Allows chains of plugins to execute in parallel.
    - Requires all plugin state data and buffers to move
      from CPU to CPU, which can be a serious cache killer
      with some plugins.
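To make the single engine thread idea concrete, here is a rough C
sketch; the names (engine_plugin, process() and so on) are made up
for the example, not taken from MuCoS, Glame or any existing API:

/* One plugin node in the net. This is hypothetical - just enough
 * structure to show what the engine loop boils down to. */
typedef struct engine_plugin
{
    void (*process)(struct engine_plugin *p, unsigned long frames);
    void *state;        /* plugin-private data */
    float **inputs;     /* point at buffers owned by upstream plugins */
    float **outputs;
} engine_plugin;

/* 'plugins' is assumed to be sorted so that every plugin comes
 * after all plugins whose output buffers it reads - that is, a
 * topological order of the (acyclic) net. */
void engine_run_cycle(engine_plugin *plugins[], int count,
                      unsigned long frames)
{
    int i;
    for (i = 0; i < count; i++)
        plugins[i]->process(plugins[i], frames);
}

Building that order from the port connections is an ordinary
topological sort - roughly O(plugins + connections) - so the
scheduling itself is not NP-hard for an acyclic net; the tricky part
is only deciding how to split the net over multiple CPUs.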
> > > > > - filters can be combined to filternetworks, every filter runs as a new
> > > > > thread(we're using pthreads)
> > > >
> > > > Although this allows a great deal of flexibility, it's definitely a
> > > > no-no for low latency audio in anything but RTOS environments like
> > > > QNX or RTLinux. There is too much overhead involved, there is a risk
> > > > of your engine being stalled by some kernel operation (even on
> > > > lowlatency kernels!), since each switch involves the kernel directly,
> > > > there are problems with scheduling multiple RT threads in low latency
> > > > settings (no preemptive/timesharing switching among SCHED_FIFO
> > > > threads), and some other things...
> > >
> > > RT threads are bad.
> >
> > No. They are *required* for certain tasks.
>
> Ok, let's say for glame they're not an issue (right now) :)
Ok. :-) Low latency hard real time processing, without having to
build my own DSP boards or use proprietary solutions, is the main
reason why I got into this.
> > > They're messy in case you get more than one of them.
> >
> > It's just a different kind of programming. Not always trivial, but
> > you can't avoid it and still get the job done, unfortunately.
> >
> > > As for the latency, on a P100 I get a latency caused by the thread/socket
> > > overhead (on a modestly loaded system) of ~0.06ms using 3 threads, ~0.15ms
> > > using 10 threads, with 2000 threads the latency (measured by "pinging"
> > > through a linear loop of connected filters) goes up to ~10ms - this is not
> > > too bad (it's a P100!!). A quick test on a dual PIII-550 shows (it has a
> > > load of 1 at the moment) ~5ms using 256 threads (the 3 threads case is
> > > ~0.05ms) - so the latency is not strictly CPU bound. An IRIX box w/ 4 CPUs
> > > shows ~0.2ms for 3 threads and ~8ms for 80 threads (cannot go up with the
> > > # of threads, seems to be limited per user).
> >
> > We're talking about *peak* latency here. Drop-outs are not acceptable
> > in a real time audio system, and the worst case scheduling latency of
> > the kernel can appear as frequently as once per kernel call, if you have
> > enough bad luck. (Which happens all the time in a real system.)
>
> Well to prevent dropouts we do buffering. Of course the maximum buffer
> size determines the "peak" latency in the net.
Buffering = monitor latency, or even noticeable "control fiddling ->
output delay".
And, if you don't have deterministic scheduling, you can *never*
guarantee that you won't have a drop-out, even (in theory) with one
hour of buffering...
[...]
> > It's not the average overhead. It's the fact that you may be stalled
> > by some other part of the system (kernel) holding a spinlock - and
> > miss your deadline, even though there is unused CPU time.
>
> Of course, but the kernel does not provide hard RT either, so you may get
> stalled for too long anyway, whether you are RT or not.
Not true. The lowlatency kernels won't stall you for more than a few
hundred µs worst case, as long as you don't get stuck waiting for some
spinlock belonging to a slower subsystem.
If you need better performance, there is always RTLinux, which even
prevents interrupts from being disabled. Peak latencies of less than
10 µs have been reported on Celeron-based production systems.
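(For reference, this is roughly what a hard RT engine thread does on
plain Linux before entering its loop - standard POSIX calls, nothing
RTLinux-specific; error handling and the audio loop itself are left
out, and the function name is just for illustration:)

#include <sched.h>
#include <sys/mman.h>

/* Give the calling thread SCHED_FIFO priority and lock all memory,
 * so that neither timesharing scheduling nor page faults can stall
 * the engine. Needs root (or equivalent) privileges. */
int engine_go_realtime(void)
{
    struct sched_param param;

    param.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if (sched_setscheduler(0, SCHED_FIFO, &param) < 0)
        return -1;
    if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0)
        return -1;
    return 0;
}

After that, the only thing that can delay the thread is the kernel
itself - which is exactly what the lowlatency patch addresses.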
> > > > Anyway, sorry for the harsh reply - and for being incomprehensible,
> > > > if that's the case. That's just because we've been discussing all
> > > > forms of plugin API issues for some months around here, and
> > > > personally, I'm very interested in hard real time issues, as that is
> > > ^^^^^^^^^^^^^^
> > > Linux is not hard real time. It's soft real time.
> >
> > Standard Linux, yes. With the lowlatency patch (and hopefully 2.4),
> > this is no longer the case.
>
> Err? Linux is not hard realtime - even with the lowlatency patch. Hard
> realtime is about ensuring a maximum latency for everything.
No. Not even QNX (which claims to be a hard RTOS) and other complex
RTOSes can do that. Some operations are inherently too nondeterministic
to be called hard RT in real life situations, but these RTOSes are
still capable of performing them.
A real time *system* (not operating system) must be capable of doing
its hard RT tasks in time. It doesn't matter whether the system has
non-RT subsystems or not, as the hard RT tasks do not depend on them.
(As for "hard" in "hard real time"; this means that the system should
*never* miss a deadline, as opposed to a "soft real time" system,
which can be allowed to miss a deadline occasionally. It has no other
meaning.)
> (Of course if
> you set this latency to 1 hour, linux is hard realtime ;)) Linux with the
> lowlatency patch can only guarantee that f.i. you get <10us latency with
> a high probability - this is not hard realtime.
This is not true. Granted, there is no such thing as a system that
can't be made to fail one way or another, but the test data on
Benno's site is not faked...
http://www.gardena.net/benno/linux/audio/
> > > And soft is relative here, too.
> >
> > Soft vs. hard can never be relative. A missed deadline is a missed
> > deadline, period. (Ok, allow for hardware failures and the like.
> > There is no such thing as a true guarantee in the real world.)
> >
> > > Audio is not hard real time stuff either
> >
> > It is indeed. Drop-outs during live concerts may cost you your job
> > and reputation. (That's why most live systems are still analog - not
> > even dedicated digital systems are reliable enough!)
> >
> > > - until you want
> > > ~1 sample latency from audio input to output. There is buffering!
> >
> > Buffering has nothing to do with it! The definition of a hard real
> > time system is that it can guarantee a correct response within a
> > certain maximum time frame. That time frame might well be 2 hours,
> > but it's still a hard real time system.
>
> Hehe :) There's the 2 hours... - make it say 1ms and linux will not meet
> the criteria for hard realtime.
So? There is no such thing as a "required maximum latency" that an
OS has to be capable of handling to be a real RTOS.
Aren't some kinds of life supporting systems hard real time, even
though they don't have to react faster than within a minute?
The definition requires an RTOS to be capable of guaranteeing a
_maximum latency_. Figures are not of interest to the definition. If
there is a figure, the system is hard RT, and you can design around
that figure, or choose a higher-performance solution. If there is no
figure at all, you're basically screwed from the start, as you cannot
guarantee anything, no matter how much buffering you use.
//David
.- M u C o S --------------------------------. .- David Olofson ------.
| A Free/Open Multimedia | | Audio Hacker |
| Plugin and Integration Standard | | Linux Advocate |
`------------> http://www.linuxdj.com/mucos -' | Open Source Advocate |
.- A u d i a l i t y ------------------------. | Singer |
| Rock Solid Low Latency Signal Processing | | Songwriter |
`---> http://www.angelfire.com/or/audiality -' `-> david_AT_linuxdj.com -'