Re: [linux-audio-dev] Realtime


Subject: Re: [linux-audio-dev] Realtime
From: David Olofson (david_AT_gardena.net)
Date: Sat Jul 08 2000 - 06:49:11 EEST


On Wed, 28 Jun 2000, yodaiken_AT_fsmlabs.com wrote:
> On Tue, Jun 27, 2000 at 10:22:26PM +0200, Benno Senoner wrote:
> > Hi,
> > on this list, we (mainly David Olofson) investigated using RT Linux for low
> > latency audio for some time, but came to the conclusion that it is not the way
> > to go.
> >
> > here some of the reasons:
> >
> > I am speaking about multimedia apps in general which comprise audio, MIDI and
> > video and possibly other fields.
> >
> >
> > - RTLinux will not spread well enough on the desktops thus by requiring
> > RTLinux for your realtime-audio-app, you are lowering the potential
> > target userbase in a huge way.
>
> RTLinux is now available as RPM. I agree that it would be easier to have
> it on distributions, but it's not a very hard problem to include
> a working kernel with a CD.
>
> > (One solution would be to convince redhat, suse, debian, mandrake and
> > all the others to ship RTLinux instead of Linux but that would be hopeless
> > as well)
>
> Some of them are about ready.
>
> >
> > - multimedia apps need to interact with multimedia devices (soundcard,
> > framebuffer, MIDI interface etc.), thus using RTLinux would require a complete
> > rewrite of all these drivers.
> > no chance.
>
> I don't see why it would require a "complete rewrite".

It doesn't in my experience; after doing some basic groundwork, it's
basically a search/replace operation, and 90% of the porting is done.
The trickiest part (not very tricky, actually) is fixing the
fiddling with Linux structs that some drivers do. The 2.4 interface
seems to have encapsulated most of this in macros that could be
replaced to generate RTL-compatible code.

(My DPI was aiming to have the same driver code handle both RTL and
user space requests at run time to allow sharing of cards with
multiple resources. An interface layer that forwards user space calls
via RTL and the posixio interface could be an alternative - that
way, the drivers can be compiled with RTL code only, but still be
used from user space as well as from RTL threads.)

> > - memory usage: audio (and multimedia apps) need tons of memory,
> > often dynamically allocated during program execution.
> > David told me that RTLinux can only manage a certain amount, fixed at module
> > insertion or so.
>
> That's not really correct. It is quite easy for RTL to ask Linux kernel or
> user applications to allocate more memory. I will send you an example
> if you like.

Exactly; the issue is actually dynamic allocation in general in a
system that may swap to disk - and that hits SCHED_FIFO in user space
as well. Only init and exit code may deal with that level of memory
allocation (possibly preallocating a heap for a private RT memory
manager), and that's equal for SCHED_FIFO and RTL. As to the amount,
I might have mentioned the 128k kmalloc() limit, but that's not
really an issue either, not with recent kernels anyway.
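To make the preallocation pattern above concrete, here's a minimal sketch of an init-time pool with a non-blocking RT-side allocator. The `rt_pool_*` names and sizes are invented for illustration; real code would also mlock() the region and probably use a proper free-list rather than a bump pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical RT pool: all memory is obtained (and would be
 * locked) in init code, never inside an RT thread. */
typedef struct {
    char  *base;
    size_t size;
    size_t used;
} rt_pool;

/* Init/exit context only: may sleep, may fail. */
static int rt_pool_init(rt_pool *p, size_t size)
{
    p->base = malloc(size);     /* + mlock(p->base, size) in real code */
    if (!p->base)
        return -1;
    p->size = size;
    p->used = 0;
    return 0;
}

/* RT context: never blocks; returns NULL instead of sleeping. */
static void *rt_pool_alloc(rt_pool *p, size_t n)
{
    n = (n + 7) & ~(size_t)7;   /* round up to 8-byte alignment */
    if (p->used + n > p->size)
        return NULL;            /* out of pool: fail, don't wait */
    void *mem = p->base + p->used;
    p->used += n;
    return mem;
}

/* Init/exit context only. */
static void rt_pool_destroy(rt_pool *p)
{
    free(p->base);
    p->base = NULL;
}
```

The key property is that `rt_pool_alloc()` can never suspend the caller, which is exactly what an RTL thread (or a SCHED_FIFO one) needs.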

> > Thus making the efficient managing of large amounts of memory quite difficult.
> > (eg if I want to load a 100MB WAV file, I want to alloc the memory just before
> > loading, then load the file, do my RT stuff on it, and then free() it , so
> > that other apps can take advantage of the 100MB of RAM)
> > (correct me if I am wrong)
>
> You are. RT components cannot directly alloc memory because there is
> a chance in such allocation that there is not enough memory and the
> requestor will suspend. But there are many methods of allocating via
> Linux.

One way would be for the RTL threads to treat non-preallocated RAM
the same way as a direct-from-disk sampler would treat sample data:
Make sure the starting points are always loaded and locked, and then
stream the rest as needed. Just keep in mind that it might actually be
the HD that has to get to work swapping to make that RAM available,
so the preload and buffering needs to be calculated the same way as
for the sampler...
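The preload math is the same as for a streaming sampler: you must keep enough frames locked to survive the worst-case delay before the disk (or the swapper) delivers more data. A trivial sizing helper, with purely illustrative numbers:

```c
#include <assert.h>

/* Frames that must be preloaded and locked: the amount consumed
 * while waiting out the worst-case disk/swap latency, times a
 * safety margin. All figures are example assumptions. */
static unsigned preload_frames(unsigned sample_rate_hz,
                               unsigned worst_case_ms,
                               unsigned safety_factor)
{
    return (sample_rate_hz * worst_case_ms / 1000) * safety_factor;
}
```

For example, at 44100 Hz with a 100 ms worst case and a 2x margin, you'd lock 8820 frames per stream before letting the RT side run.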

> > - memory protection : in userspace I have full memory protection that means
> > in the case of a bug I get a segfault at maximum but no crash.
> > I don't know if RTLinux provides memory protection right now, but last I
> > checked there was not support for it.
>
> RTLinux modules are like drivers -- if you can accept drivers in kernel
> space, you should not have a problem with RT components.
> But if protected components are necessary, we can do that too.

The problem is that there is no way to ensure that audio plugins have
the same quality as Linux kernel production code. Although the
host/engine can be seen as a driver, and maintained in pretty
much the same way, plugins are closer to applications than they are to
drivers, and will most probably tend to crash very frequently by
Linux kernel standards.

> > - under RTLinux you have to take special measures in order to utilize the
> > FPU; since multimedia apps make heavy use of FPU instructions, this can
> > become a problem.
>
> To use fpu in a RTLinux thread you must write the following code
>
> pthread_setfp(thread); /* mark the thread as using the FP */
>
>
> >
> > - developing (and debugging especially) an application for RTLinux is not as
> > easy as writing a simple userspace application.
>
> Yes. This is absolutely correct -- but realtime is harder than nonrealtime.

Yep... Especially when the bug you're after lies somewhere in the real
time interaction between your threads and IRQs. The good ol' debug
message methods come in handy! :-)

Oh, BTW, the other day I hacked a little tool that provides kernel
code with a simple TSC timestamped debug message API, which produces
data that is displayed as text and/or graphics by a user space
application. (Using GGI for the graphics, as it's pretty fast and
simple, and runs on X, svgalib, fbdev etc, and because I was playing
with GGI for other reasons anyway. :-)

Not specifically for RTL, but since it uses lock-free FIFOs, all
that's needed is changing the spinlocks in the API calls to RTL
spinlocks. (They deal with the multiple writer issue, in case more
than one thread sends debug messages.)
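The core of such a tool is a fixed-size, lock-free FIFO between the kernel/RT side and the user space display. Here is a minimal single-writer/single-reader sketch of that idea (the real tool adds a spinlock on the write side for multiple writers, and TSC timestamps per message; both are omitted, and all names are invented):

```c
#include <string.h>

/* Minimal single-writer/single-reader lock-free FIFO of
 * fixed-size debug messages. The writer only advances `head`,
 * the reader only advances `tail`, so no lock is needed for
 * the one-writer case. */
#define MSG_SIZE   64
#define FIFO_SLOTS 256            /* must be a power of two */

typedef struct {
    char msg[FIFO_SLOTS][MSG_SIZE];
    volatile unsigned head;       /* writer-owned index */
    volatile unsigned tail;       /* reader-owned index */
} dbg_fifo;

/* Writer side (RT/kernel context): never blocks; drops on full. */
static int fifo_put(dbg_fifo *f, const char *text)
{
    unsigned next = (f->head + 1) & (FIFO_SLOTS - 1);
    if (next == f->tail)
        return -1;                /* full: drop the message */
    strncpy(f->msg[f->head], text, MSG_SIZE - 1);
    f->msg[f->head][MSG_SIZE - 1] = '\0';
    f->head = next;               /* publish only after the copy */
    return 0;
}

/* Reader side (user space display): never blocks; -1 when empty. */
static int fifo_get(dbg_fifo *f, char *out)
{
    if (f->tail == f->head)
        return -1;                /* empty */
    memcpy(out, f->msg[f->tail], MSG_SIZE);
    f->tail = (f->tail + 1) & (FIFO_SLOTS - 1);
    return 0;
}
```

Dropping on overflow rather than blocking is what keeps the debug path safe to call from hard RT code.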

I'll release it under the GPL if there's interest. It'll probably
evolve quite a bit more, as I'm also using the tiny gfx/visualization
lib for various experiments, tuning and analyzing strange kinds of
regulators and other stuff.

> > Plus speaking from a developer POV, we should not let it happen
> > that complete and complex synthesizer/samplers run within
> > the processor's Ring 0 , as it happens under Windows right now.
> > (they are forced to do so in order to get some realtimeness out
> > from that so called "OS")
>
> The interesting, at least to me, question is whether one can structure
> a synth/sampler as a relatively simple RT component and a more
> complex user space component that work together.

One *has* to do it that way anyway, whether running the engine under
RTL or as a SCHED_FIFO thread.

Most existing Linux audio applications have faulty designs in this
respect, with the result that various GUI toolkit phenomena
interfere with the audio processing, making the applications unable to
take full advantage of Linux/lowlatency.

That is, the problem is very real already with Linux/lowlatency, and
these applications have to be fixed, or simply not considered as true
real time applications, therefore not being candidates for addition
of RTL support.

The step to RTL should be rather small for any true SCHED_FIFO real
time apps, as there will only be "real time compatible" code in
correctly implemented RT engines. The only problem lies in providing
the required hardware drivers in RTL versions, and providing a few
low level library function replacements that lots of user space code
tends to use. (Most of the code for the latter has probably been
written already by the RTL community.)

> > So yes, there might be some very specialized applications which would
> > require VERY low latencies (down to the sample level or so) requiring
> > and RTOS, but the other 99.99% of cases can be solved with the userspace
> > solution, easing life of all involved parties.
> >
> > Victor, your thoughts ?
>
> Well, I am very interested in the question of whether the RTLinux model
> of breaking a RT application into tiny RT parts and user parts can work
> in the audio domain. Speaking with the authority that comes from
> knowing absolutely nothing about the field, it seems to me that at least
> some apps can be structured as
> RT part:
> do{
> samples
> computes using tables shared with user code
> outputs to device and to shared memory
> check commands, if any, from user app
> }

Yep, that's pretty much it. Actually, most plugin APIs for RT plugins
are based on the idea of processing an array of samples at a time
(much like array optimized math libraries do it), which fits
perfectly into this design as well. It's not viable to process one
sample at a time at audio sample rates, as the overhead would leave
virtually no CPU power for the actual processing. This may well change
within one or two CPU (or rather, system RAM) generations, though!
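The block-based idea is simple enough to show in a few lines. This is a made-up example callback, not any particular plugin API: the host hands each plugin a whole buffer per call, so the per-call overhead is amortized over `frames` samples instead of being paid once per sample:

```c
/* Example block-processing plugin callback (invented signature):
 * apply a gain to `frames` samples in place. One function call
 * per block, one cheap inner loop per call. */
static void gain_process(float *buf, unsigned frames, float gain)
{
    for (unsigned i = 0; i < frames; i++)
        buf[i] *= gain;
}
```

A host running at 44100 Hz with 256-frame blocks makes ~172 calls per second per plugin instead of 44100, which is where the overhead saving comes from.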

As to applications fitting within the RTL model: just as in any control
system, any hardware that needs to be controlled in "RTL class" real
time *has* to be run via an RTL driver for RTL threads to make sense.
For some reason, this seems to be hard to grasp for some developers
not familiar with control engineering, but it's inevitable
nevertheless. Designing hard RT applications according to the hard RT
rules (i.e. the laws of nature; time in particular) is not optional, so
if you want hard RT, the RTL model probably has to fit rather well
automatically.

The only "problem" I can see is the interface between the two parts,
but I actually see the somewhat greater distance between
SCHED_OTHER and RTL threads compared to SCHED_OTHER/SCHED_FIFO as a
design advantage: It forces you to do the right thing and design a
real interface! This is pretty likely to pay off later on, when you
need to extend your system with new features.

> So if there is some way we can help to make this or a better way of using
> RTLinux convenient for developers, I'd like to do so.
> And if Linux kernel user programs offer a better avenue, that's not
> something that will make me upset.

It looks like Linux/lowlatency is more than good enough even for high
end audio processing, and it's definitely good enough for video
(significantly lower frame rate requirements than audio), but you
never know where users will put Linux to work once they realize it
can do 3 ms input->output latency. "What if it could do 1.5? Then I
could run two machines in series and still do real time monitoring!"

What currently looks like the strongest motivation for using RTL for
audio is SMP and cluster systems. When running multiple
plugins on an SMP system, you basically have two ways to go:

1) Pass the output audio data from one CPU to the next one;

   in ---> Plug A --> Plug B ---> Plug C --> Plug D ---> out
          |_________________|  ^  |__________________|
                 CPU 1         |         CPU 2
                               |
        Troublesome sync point, where you get an extra
        dose of the dreaded IPC/scheduling latency...
        This is where RTL comes in: to cut this latency
        to a minimum, to sustain a low total latency.

2) Interleave the time axis instead of splitting up the net;

   in ---> Plug A --> Plug B --> Plug C --> Plug D ---> out
          |_______________________________________|
                    CPU 1; odd buffers
                    CPU 2; even buffers

   Here, you only need to make sure that each plugin a CPU
   is about to execute isn't running on the other CPU. (If it
   is, that means your initial state data doesn't exist
   yet.) As the two CPUs will be running 180 degrees out of
   phase, there will only be a maximum of 50% overlap, and
   that only at 100% total CPU load. Provided you have more
   plugins than CPUs, and that no single plugin uses up more
   than (100/num_of_CPUs)% of the total power, the
   plugin_return - next_plugin_start times will be well on
   the plus side, so all you need is a simple spinlock, in
   case something should temporarily disturb the timing.

   The drawback with this method is that the plugin state
   data will have to constantly migrate from the cache of one
   CPU to the next. For plugins with little state data (like
   IIR filters with only one accumulator variable per 6 dB
   filter), this is a non-issue, but reverbs and other
   effects that may use a lot more instance data than they
   use audio data will suffer painfully, especially on
   systems with slow RAM<->cache transfers.

   Systems with dedicated, high speed CPU cache->cache
   transfer logic would be heaven! :-) (IIRC, some high end
   architectures support that kind of thing.)
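The interleaving scheme above boils down to two tiny pieces: assigning buffer n of the chain to CPU (n mod num_cpus), and a per-plugin "busy" flag a CPU checks before starting its buffer. A sketch, with invented names, and with the flag handling shown non-atomically (real code would use an RTL spinlock or an atomic test-and-set):

```c
/* Buffer n of the whole chain runs on CPU (n % num_cpus);
 * with two CPUs this gives the odd/even split described above. */
static unsigned cpu_for_buffer(unsigned buffer_index, unsigned num_cpus)
{
    return buffer_index % num_cpus;
}

/* Per-plugin guard: a CPU may only start a plugin once the
 * previous buffer has left it, so its state data is valid. */
typedef struct {
    volatile int busy;   /* set while some CPU runs this plugin */
} plugin_state;

/* Returns 1 if the plugin was claimed; 0 means spin and retry.
 * (Illustrative only: a real SMP version needs an atomic
 * test-and-set here, not a plain read-then-write.) */
static int try_claim(plugin_state *p)
{
    if (p->busy)
        return 0;
    p->busy = 1;
    return 1;
}

static void release(plugin_state *p)
{
    p->busy = 0;
}
```

With the CPUs 180 degrees out of phase, `try_claim()` should almost always succeed on the first attempt; the spin path only triggers when something disturbs the timing.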

The point: the second alternative would work well with lowlatency, as
it doesn't really do active synchronization between the CPUs. (When
one plugin has been executed on one CPU, it'll take a while before
it's scheduled on the next CPU.) However, as mentioned above,
this alternative has drawbacks, and it may not work well in all
situations, and then there is only one way to go: make sure that
*buffers* can migrate directly from one CPU to the next with minimum
latency. Statistically, every other plugin execution will start with
a block because the plugin that should provide the input data is
still running on the previous CPU... RTL might be the only way to
keep the sum of all those wake_up() --> plugin execution delays at an
acceptable level.

//David

.- M u C o S --------------------------------. .- David Olofson ------.
| A Free/Open Multimedia | | Audio Hacker |
| Plugin and Integration Standard | | Linux Advocate |
`------------> http://www.linuxdj.com/mucos -' | Open Source Advocate |
.- A u d i a l i t y ------------------------. | Singer |
| Rock Solid Low Latency Signal Processing | | Songwriter |
`---> http://www.angelfire.com/or/audiality -' `-> david_AT_linuxdj.com -'



This archive was generated by hypermail 2b28 : Sat Jul 08 2000 - 12:06:44 EEST