Re: [linux-audio-dev] Re: Plug-in API progress?


Subject: Re: [linux-audio-dev] Re: Plug-in API progress?
From: David Olofson (audiality_AT_swipnet.se)
Date: Fri Sep 24 1999 - 17:42:06 EDT


On Fri, 24 Sep 1999, Paul Barton-Davis wrote:
> >...but clustering was just what I had in mind, which would mean that the
> >engine, as you defined it, would be distributed over multiple machines.
> >
> >And I'm not suggesting that the whole cluster should be involved in _low
> >latency_ processing - 50 - 100 ms latency can be perfectly fine in many
> >situations, and is certainly a lot nicer than off-line processing from the end
> >user POV.
> >
> >Also, I'm not thinking TCP/IP here. A (Beowulf class) cluster is a pretty
> >specialized form of network anyway, and I'd use drivers ported to RTLinux,
> >emulating the shared memory style IPC used on "real" supercomputers. That's a
> >very big difference, and the _hardware_ isn't really the problem here. People
> >are successfully using standard ethernet cards for real time streaming already.
>
> Sorry, this is wrong. I spent several years running those "real"
> supercomputers (Sequent, KSRs, nCube, etc.), and it's not true that
> there isn't a hardware problem. Real time streaming is a completely
> different problem - it's fundamentally bandwidth related.

There's a problem with applications where threads will block waiting for data
to get through all the time... That is, applications that do not map well to
parallel processing. But do audio engines really belong in that class?

> For
> event-driven (and by event, I refer not to your proposed event system,
> but to MIDI and X and timers) stuff, the individual message latency is
> a fundamental problem.

Of course, and I'm obviously being very unclear if I seem to suggest
that's not the case...

> When I worked in the CS dept at UofWashington,
> there was a graduate student working on this exact problem. It's very,
> very hard with regular networking hardware to get the latency for a
> single message low enough to provide a shared-memory like
> environment.

Yes, since applications designed for shared memory machines expect virtually
*no* latency. I'd say it's nothing but a design error to build a new
application like that, if it's supposed to run on systems with "shared memory"
latency. (Unless it's dictated by the nature of the problem.)

> There have been some really cool tricks that have enabled
> Beowulf to take off, but a 100ms delay in response to a MIDI NoteOn is
> going to make you the laughing stock of AES :)

Yep, and I'm not suggesting running soft synths hooked up to input devices on the
cluster. This is what the RT node with the audio cards is for. You don't play
*everything* at once using real time control, do you? (And if you do, you're
probably better off using the nodes as stand-alone synths with digital audio
interfaces for the communication.)

(BTW, I don't think I should have mentioned shared memory in this context -
emulating a shared memory environment isn't the right way to use ethernet like
hardware. Guess I wasn't awake enough to separate this discussion from that on
my event system...)

> Beowulf is a really
> cool and fabulous system, but its success is entirely in areas where
> the workload can be divided into reasonably large portions that don't
> require very much intermediate synchronization.

I'd say signal processing sub nets fit that description pretty well. (See above
digital audio interface suggestion, BTW - perhaps even ethernet hardware can
do that with RTL drivers...)

> Even on the KSR, which
> had a *much* faster inter-processor bus than ethernet, people doing
> heavy numeric processing that did not have this characteristic
> (i.e. there was a lot of read/write activity on the mythical "shared
> memory" that actually translated into invalidations of the local
> processor caches) found that their performance sucked. They had to
> switch to a NUMA model to get things to really fly, which was hardly
> the point of the KSR.

Of course. Who would expect otherwise?

> So, I don't think that clusters are viable for real-time audio
> generation when you want sub-100ms event latency. They *are* fantastic
> as rendering farms, the way that ILM uses them, for example. It's easy
> to imagine some very impressive audio generation taking place on a
> cluster, but without any input devices to "disturb" the computation.

Exactly, but who's defining anything worse than 100 ms as non-real time now? :-)

True, I'd sure like to do *everything* with sub 1 ms latency, but that's just
not going to happen. (Ok, perhaps on a quad Athlon 800 or some SMP Alpha
monster. But you always find something that needs more power sooner or later,
right? ;-)

Different solutions for different problems - and I can't see a valid reason not
to use multiple kinds of solutions together. I mean, you *do* use normal hard
drives for recording, even if the engine requires far lower latencies than those
can handle, don't you? That doesn't make sub 5 ms processing latency pointless;
nor does the 1-3 *seconds* range disk->engine latency make 5 ms latency
processing impossible or useless.

[...]
> >Which would cause most network load and latency; GUI<->X communication, or
> >GUI<->engine communication?
>
> That's a good question. I actually don't know. It depends on the nature
> of the X stuff. X sucks for some kinds of event streams, particularly
> those involving lots of bitmaps going over the server. But for mouse
> and key events, it's pretty damn efficient. If the server has its
> pixmaps already loaded up so that knob twiddling doesn't involve any of
> the costly stuff, then I wouldn't be surprised if X communication cost
> no more than a custom designed (*and* debugged!) GUI<->engine communication.

Yes, that's pretty likely. X is rather impressive sometimes. (Apart from being
ages ahead of certain M$ "inventions" long before those existed...)

[...]
> >process(...**inputs, ...**outputs, ...**events, samples)
> >{
> >    int current_sample = 0;
> >    int current_event = 0;
> >    int count;
> >    while(current_sample < samples) {
> >        /* process until next event should take effect */
> >        count = events[current_event]->time - current_sample;
> >        current_sample = events[current_event]->time;
> >        while(count--) {
> >            process one sample;
> >        }
> >        /* handle event... */
> >        .
> >        .
> >        .
> >        ++current_event;
> >    }
> >}
>
> how can this work? Let's suppose that there are no events pending. You
> just call process(..., 64) to generate the next 64 samples. Someone
> causes a MIDI CC message that is supposed to alter how the plugin
> works. This is presumably queued in `events', but how is the plugin to
> know to look for it? It will be queued up after the terminator event
> in the `events' "list" you pass in, so it can't see it during this pass.

You can't see *anything* that happens during the execution of the process()
function anyway, since in the normal case, it will execute in a fraction of the
time it takes to play the buffer. You could approximate the entire process()
call as an atomic operation with insignificant execution time, when viewing a
complete processing net. If there are 10 plug-ins, each one would only be open
for event changes during 10% of the total processing time (100% CPU load...),
which means it's quite pointless to care about any news while executing the
process() call. Plug-ins are not executing in parallel...

> so in this case, you've got a 64 sample event latency. to get this
> down to 1, you've got to tell the plugin to only generate a single
> sample.

True. That's a law of nature. Time...

> the way that Quasimodo (+SuperCollider +Csound) would handle this is
> that the thread handling MIDI input would cause a callback to run. The
> callback would fiddle with the parameters of the plugin (without
> talking to the plugin, or queuing anything up anywhere), and if the
> plugin is running, it will simply use the new value.

...and that happens only a fraction of the times you get an event. And unless
you're running only one plug-in that takes nearly 100% of the CPU power, the
resulting effect on the output will not have much to do with the actual timing
of the input event.

> perhaps i'm missing something here, but it seems to me that you're
> proposing a polling system with an event latency equivalent to the
> number of samples generated per control-cycle/call to process(). This
> view seems to be reinforced by the following:
>
> >Ok, to put it simply: I build a structured description of what I want the
> >plug-in to do, in the form of one or more events in a shared memory buffer. As
> >the engine does its event routing for the whole processing net each turn,
> >the events will get processed by the receiving plug-in during the next buffer
> >period.

Yes.

> right, exactly - "during the next buffer period". so your event
> latency is bounded by the control cycle size/buffer period.

Yes, but it does *not* quantize events to that resolution. The latency is
fixed, with sample accuracy.

And, as you're really keen on cutting the latencies no matter what effect it
has on the jitter ;-) - my system does allow events to be sent to plug-ins
asynchronously while other plug-ins are executing. With 10% of the CPU in each
plug-in, that adds an average of 5% latency compared to your system, as my
system doesn't accept events while the receiving plug-in is in the process()
function. But that means you have to send the events with time == 0, unless you
want a jitter phenomenon caused by some events getting handled in the current
engine loop, and others in the next, but using the same position in the
buffer...

> >The only difference between "polling" once for each engine loop, and
> >just altering the values directly in the DSP code's variables is that
> >you get the same real time deadline for all plug-ins with the
> >"polling" system. The resolution (leaving out event time stamps) can
> >be one buffer in both cases, but the timing accuracy is independent
> >of the latency with the event system.
>
> directly altering the DSP code's variables doesn't change the
> real-time deadline for all plugins. they are running without any
> knowledge that their parameters are being played with. they just know
> they are supposed to generate X samples and return. Quasimodo doesn't
> want plugins to know the real time. they use DSP time, which is
> constant during an entire control cycle. that is, the timestamp during
> execution of the first plugin is *always guaranteed* to be the same as
> that during the execution of the last plugin for any given cycle. it
> is extremely rare that a plugin ever needs to know the "real" time.

Hmm... To me, this sounds like you're actually talking about my system.
Certainly, your plug-ins don't know about real time in the normal case - but
how do you guarantee that the time stamp is the same for all plug-ins during
one cycle, if "events" are allowed to take effect in the middle of the cycle?
*That* makes real time look different depending on when in the cycle a plug-in
is executed... Or: if all plug-ins are working on buffers with the same start
time stamp, they must execute in parallel - or the plug-ins will get different
offsets between real time and DSP time.

//David

 ·A·U·D·I·A·L·I·T·Y·    P r o f e s s i o n a l   L i n u x   A u d i o
- - ------------------------------------------------------------- - -
    ·Rock Solid                                      David Olofson:
    ·Low Latency    www.angelfire.com/or/audiality   ·Audio Hacker
    ·Plug-Ins       audiality_AT_swipnet.se          ·Linux Advocate
    ·Open Source                                     ·Singer/Composer



This archive was generated by hypermail 2b28 : Fri Mar 10 2000 - 07:27:12 EST