Re: [linux-audio-dev] Problems and Solutions: events v. signals


Subject: Re: [linux-audio-dev] Problems and Solutions: events v. signals
From: David Olofson (audiality_AT_swipnet.se)
Date: Wed Jan 12 2000 - 23:48:50 EST


On Wed, 12 Jan 2000, Roger Larsson wrote:
> Ok, I was not clear.
> I suggest only one event type == signal update.

Ah, something like this?

struct event_t {
        struct event_t *next;
        int time;       /* Time stamp */
        short channel;  /* Channel index */
        void *data;     /* New data pointer */
};

That would be a little faster to decode than my events, but it kills
the flexibility...

However, what if some signals are actually special, in that they
deliver structures that look something like my Universal Unlimited
Events...? If the raw event decoding speed is very different, this
could move the performance hit to where it belongs: where the
advanced stuff is actually used.

I just don't see how this could make enough difference to justify
the performance hit on the Advanced Signal Events (which will have to
be used quite a lot anyway, I'm afraid...), and the extra complexity
that such a two-level protocol implies.
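
For illustration, the two-level idea might look something like this.
(A sketch only - the field names are made up, and this is not actual
MuCoS code.)

struct generic_event_t {
        struct generic_event_t *next;
        int time;               /* Time stamp */
        short channel;          /* Channel index */
        short type;             /* Event type tag */
        union {
                void *data;     /* Signal update: new buffer */
                float value;    /* "Advanced" stuff: control change etc. */
        } arg;
};

Decoding is then a single switch on the type tag, and the cheap
signal-update case can be the first, most predictable branch.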

> > > Observation:
> > > What is the difference between a signal and an event?
> > > Not much!
> > > Events happens rarely.
> > > There might be a lot more event destinations than signal/connectors.
> >
> > IMHO, an event tells something about an "unusual" change, while a
> > signal is a continuous stream of samples, approximating a continuous
> > signal. You never know when you're going to receive an event (unless
> > you peek in the buffer ;-), but you do know that there is signal data
> > to process on all channels until the time of the next event.
>
> Ok, but let's take a pitch control.
> Should moving it result in events or a sampled signal?
> Usually nothing happens, then suddenly it really is used.
> When would you then create events? Every time it's not the same as
> the previous value? Or when the difference is bigger than some delta
> D?
> With sampling you decide the frequency once!
> The same amount of data is sent all the time, and plugins get the
> same amount of data all the time - no risk of overload due to
> external activity!

That's an advantage, provided the external activity really hits many
plugins at once. However, the resolution problem is still there. You
*can't* use any fixed rate lower than the required event timing
resolution. Which may mean fractions of samples!

> The externally sampled data can be stored and played back later with the
> same result.

The same is true for events. The only difference is that with events,
you can have µs timing accuracy without having the sequencer
(un)compress gigabytes of data... :-)
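
Back-of-the-envelope, assuming a single control sampled as 32 bit
floats at full audio rate (the figures are only illustrative):

#include <stdio.h>

int main(void)
{
        /* One control signal, sampled at 44.1 kHz for one hour: */
        double bytes = 44100.0 * sizeof(float) * 3600.0;
        printf("%.0f MB per control, per hour\n",
               bytes / (1024.0 * 1024.0));
        /* Prints ~606 MB - versus a handful of events at the points
           where the control actually changes. */
        return 0;
}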

> Sampling once per fragment would allow cheap processing - for the
> plugin it would almost look like constant values: check and assign
> once.

But it's still more data to move around, while it disables the whole
idea of sub-fragment timing accuracy - the original reason why I
decided on events rather than something like the (very efficient)
system used in Quasimodo.

> Problem:
> * complex net with many rarely changed inputs might waste a lot of
> work.
> * how to handle a keyboard? one output signal per key?

Yeah, that's a problem. That kind of data maps perfectly to events -
that's why nearly all digital systems work that way...
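
For comparison, a key press in event form could be as small as this
(hypothetical sketch - a per-key signal scheme would need 128
always-on channels instead, while here nothing at all is sent while
no keys move):

struct key_event_t {
        struct key_event_t *next;
        int time;       /* Frame the key went down/up */
        short channel;  /* Instrument/part index */
        short key;      /* Key number, e.g. 0..127 */
        short velocity; /* 0 = release */
};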

> > > Solution: [to original problem]
> > > All events can be treated as an input signal with per fragment varying
> > > sample rate, if everything that an event might change is assigned to
> > > a signal/connector of its own.
> > > This way all parameters used by a plugin may be signals too :-)
> >
> > Cool in theory, but there are some serious problems IRL...
>
> I have used an image processing system that does this, WiT by
> LogicalVision (http://www.logicalvision.com), and it is more than
> cool - it is close to necessary!

Ok, if you mean connection flexibility, I get the point. :-) But this
is no different from my event system (the new one with channels, that
is :-) - it's even easy to make plugins accept mixed data types on
the same input channel! Just handle the different events for the data
types you want to support, as sketched below. This works for audio
channels as well, of course.
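
A minimal sketch of that decode logic (the event and type names are
made up for illustration):

enum { EV_AUDIO_BUFFER, EV_CONTROL_CHANGE };

struct event {
        struct event *next;
        int time;
        short channel;
        short type;
        union { void *data; float value; } arg;
};

static void handle_event(struct event *ev)
{
        switch (ev->type) {
        case EV_AUDIO_BUFFER:
                /* Process the audio buffer in ev->arg.data... */
                break;
        case EV_CONTROL_CHANGE:
                /* Apply ev->arg.value to the addressed control... */
                break;
        default:
                break;  /* Type not supported: just skip it. */
        }
}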

> > > normal signal event chain:
> > > {timeN, signal#A, *buffA1} {timeN+time(buffA1), signal#A,*buffA2}
> > >
> > > event, parameter change, signal chain:
> > > {timeM, param#P, value1} {timeN, param#P, value2}
> > >
> > > If 'time' is treated as described earlier in "P&S: time" there is no
> > > longer any need to distinguish between signals and parameters. They
> > > just come with different time formats.
> >
> > ...and these time formats have to be handled, by hosts as well as
> > plugins and clients. Great fun having to keep track of N signals that
> > can assume any sample rate the engine considers suitable, right? :-)
> >
>
> Not really!
>
> The engine can only choose among those combinations that the plugin
> accepts!

Yes, I kind of suspected that - the plugins would be of M$ DLL
dimensions otherwise! ;-)

> Thus, the plugin may require that all signals (in/out) use the same
> frequency.

Yes; anything but that, or simple integer multiples of it, would be
hard to handle... And how does that map onto accurate timing, easy
programming and making efficient implementations possible, both on
the plugin and engine side? (There are people who want to run the
plugins "off-line" as well!) Is it a good idea to *require* any
sequencer-like plugin or client to have a signal data compressor? To
me, that seems to defeat most of our goals for little gain... What am
I missing here?

> > > Note:
> > > * a signal that is event driven can normally be driven with a
> > > constant, and only during fragments where something happens does it
> > > change to a sampled signal format (even if it looks sampled all the
> > > time):
> > > {durationA1, variable#A, *buffA1} {durationA2, variable#A, *buffA2}
> >
> > Who tells you that the signal has changed its rate? Do you have to
> > check all ports on every call to find out?
> >
>
> The sample rate is a part of each signal fragment.

Ok, that goes into the struct I hacked above...?
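
Presumably something like this - my struct from above, extended with
assumed fields:

struct event_t {
        struct event_t *next;
        int time;       /* Time stamp */
        short channel;  /* Channel index */
        float rate;     /* Sample rate of this fragment */
        int frames;     /* Frames in this fragment */
        void *data;     /* New data pointer */
};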

> > > * The plugin _may_ choose a different implementation depending on
> > > input 'times'.
> >
> > Is that a runtime choice, or what exactly are you referring to
> > here...?
> >
>
> When you get a new fragment, you check if it has the same frequency
> as the previous one; if true, continue with the current
> implementation. If false, update the plugin instance's process
> pointer with a better version, and call it.

Works, but gets very messy with just a few signals supporting multiple
sample/frame rates...
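
For reference, the pointer swap itself is simple enough (a sketch
with made-up names - the mess is in writing one process variant per
supported rate combination):

struct instance {
        float last_rate;
        void (*process)(struct instance *, float *, int);
};

static void process_44k(struct instance *i, float *buf, int n) { /* ... */ }
static void process_48k(struct instance *i, float *buf, int n) { /* ... */ }

static void run(struct instance *i, float rate, float *buf, int n)
{
        if (rate != i->last_rate) {     /* Frequency changed? */
                i->process = (rate == 48000.0f) ? process_48k
                                                : process_44k;
                i->last_rate = rate;
        }
        i->process(i, buf, n);          /* Call the current variant. */
}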

> > > * Several outputs can not be connected directly to one input.
> >
> > This is good from the engine POV, but it does mean that plugins get
> > multiple input ports. That situation is a complexity nightmare right
> > where things should be kept simple, especially if these ports can
> > have different sample rates.
> >
> You have to store a pointer to the buffer anyway before returning.
> => all buffer pointers need to be storable in the instance.
> => they are the signals!

Yep.

> When a signal has no remaining samples you check your event buffer
> for events for that signal. Update the buffer pointers accordingly.
> Check changes in sample frequency. Continue processing.

This works well as long as you can check *only* the timestamp of the
next event, and run the whole loop until then, expecting all signals
to have enough data left. Forget about checking each signal -
conditionals are *expensive*, and even though branch prediction is
improving, this isn't going to change any time soon. (Even the most
advanced architectures have this problem, because they need
pipelines for parallelization of execution.)
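
A sketch of that pattern, assuming fragment-relative timestamps
(0..frames-1) and a time-sorted event list (the names are made up):

struct event { struct event *next; int time; /* ... */ };

static void process(struct event *ev, float *out, int frames)
{
        int pos = 0;
        while (pos < frames) {
                /* Run the tight loop up to the next event (or the end). */
                int end = (ev && ev->time < frames) ? ev->time : frames;
                for (; pos < end; ++pos)
                        out[pos] = 0.0f;        /* Real DSP goes here. */
                /* Apply all events that fall on this position. */
                while (ev && ev->time <= pos) {
                        /* ...decode and apply the event... */
                        ev = ev->next;
                }
        }
}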

> In fact I am planning this slightly differently.
>
> When a signal fragment ends (several would end at the same time
> if the same fragment size is used) the plugin returns. It has
> processed a fragment. Signals are not stored in the instance
> they are shared with the engine.
>
> The code should look something like this (inlining done):
>
> void float_add_process(this, connectors)
> {
> // if allowing frequency changes
> float frequency = connectors.signals[1].frequency;
> if (frequency != connectors.signals[2].frequency ||
> frequency != connectors.signals[3].frequency) {
> float_add_dispatch(this, connectors);
> return;
> }
>
> int remaining = connectors.signals[1].remaining;
> assert(remaining == connectors.signals[2].remaining);
> assert(remaining == connectors.signals[3].remaining);
>
> float *pA = connectors.signals[1].pBuffer;
> float *pB = connectors.signals[2].pBuffer;
> float *pR = connectors.signals[3].pBuffer;
>
> while (remaining--) { // or some SMD(sic? = MMX2) instruction
                                        ^^^
SIMD: Single Instruction, Multiple Data.
SMD is a Surface Mount Device - you'll see many of those in your
machine... :-)

> *pR++ = *pA++ + *pB++;
> }
>
> connectors.signals[1].remaining = 0;
> connectors.signals[2].remaining = 0;
> connectors.signals[3].remaining = 0;
> }
>
> // The engine then checks signals:
> // - outputs with nothing remaining are full and can be
> // forwarded to the next plugin.
> // - plugins with something on all signals can be run.

Hmm... Back at square one - nothing much left but a buffer for each
signal and each plugin call. This would work pretty well where it
fits in, but it doesn't look much like the flexible plugin/integration
API people want to see - it's more like inventing yet another plugin
API...
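
The engine side check from the quoted comment could be as simple as
this (the struct layout is assumed, for illustration):

struct signal { int remaining; /* Buffer pointer, rate, ... */ };

static int plugin_runnable(struct signal *sigs, int nsigs)
{
        int i;
        for (i = 0; i < nsigs; ++i)
                if (sigs[i].remaining == 0)
                        return 0;       /* An input has no data left. */
        return 1;                       /* Something on all signals: run. */
}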

> Let's see - what is the overhead:
> Suppose very short fragments: remaining < 10
> * indirect call
> R0 = read stack relative #this
> R0 = read R0 relative #common.process
> call R0
> => always done
> 3 instructions,
> 2 reads
> 1 indirect call (expensive)
>
> * The frequency check is not needed if this implementation only
> accepts equal frequencies.
> * Get remaining (R4):
> R0 = read stack relative #connectors
> R4 = read R0 relative #(signals+k*sizeof(signal)+remaining)
> => equals 2 instructions, and 2 memory references
> * Get buffer pointers (R1..R3)
> R0 already read
> R1..R3 = read R0 relative #(signals+k*sizeof(signal)+pBuffer)
> => gives 3 instructions, and 3 memory references
>
> * Processing: always done, no _extra_ cost
> test R4
> jump if zero to end // should never happen
> again:
> R5 = read R1 relative; inc R1
> R6 = read R2 relative; inc R2
> R6 = R5 + R6
> write R3 relative = R6; inc R3
> dec R4
> jump if not zero to again
> end:
> => gives remaining*(6..9) instructions,
> remaining*2 reads,
> remaining writes
> remaining jumps
>
> * Write back data
> R0 already read (unless used)
> 3*write R0 relative #(signals+k*sizeof(signal)+remaining)
> => gives 3 (or 4) instructions
>
> * Cache and chipset utilization can be improved by changing the
> connector layout to connector.signal.remaining[k];
> that gives only one cache miss, and allows write combining.
>
> => 2+3+4=9 instructions overhead compared to 6 * fragment size
> instructions processing.
> => with fragment sizes of two, overhead and processing are
> comparable...

Good, but is it worth it? It could perhaps fit inside the RTLinux
engine I planned from the start, but even being David "No Latency"
Olofson, I think this is a bit too minimalistic, considering the CPU
power we have these days. I'd agree that MuCoS may not be the perfect
choice for modular softsynth setups with lots of ultra short feedback
loops, or where less than 0.5 ms latency is the standard setting, but
for all other cases, the flexibility of generic events *will* be
needed, possibly even for the average audio plugin, but definitely
for video plugins and other weird beasts. I think the event system
will handle most tasks nicely, and without too much overhead... (I'd
still like to hack some more prototype code before betting on that,
though! :-)

> > > * One output can be connected to several inputs.
> >
> > Good - no data copying.
> >
> > > Remaining problems:
> > > Will be expensive to check every fragment if there are
> > > lots of channels/signals...
> >
> > Yep, very expensive, I'm afraid. Some smart optimization will be
> > needed just to find out which channels to check at the start of each
> > plugin call, or client "cycle".
> >
>
> Maybe not that expensive...

Depends on how many channels. Unless the "advanced event" signals are
used, you'll need *lots* of channels for some plugins.

> You only need to check the one expiring.
> And you can specify that your plugin does not handle frequency
> changes, or even that it cannot handle differing frequencies across
> signals.

That would make life a lot easier for the plugin hacker, but I'm
afraid it'll lead to lots of plugins that have very poor accuracy,
due to the cost of anything but a few hardcoded sample rates.

And hardcoding is a Bad Thing(TM) in most cases.

It belongs inside heavily optimized *implementations*, and shouldn't
be the programming style encouraged by an API.

What I'll do next is have a serious look at the total overhead of
handling events in my system... I've cleaned out a few performance
killing designs already, but obviously, it can never get fast enough!
I still want that flexibility, though...

//David

.- M u C o S -------------------. .- A u d i a l i t y ----------------.
| A Free/Open Multimedia        | | Rock Solid, Hard Real Time,        |
| Plugin & Integration Standard | | Low Latency Signal Processing      |
`------> www.linuxdj.com/mucos -' `--> www.angelfire.com/or/audiality -'
.- D a v i d  O l o f s o n ------------------------------------------.
| Audio Hacker, Linux Advocate, Open Source Advocate, Singer/Composer |
`------------------------------------------> audiality_AT_swipnet.se -'


