Re: [linux-audio-dev] XAP spec - early scribbles


Subject: Re: [linux-audio-dev] XAP spec - early scribbles
From: Tim Hockin (thockin@hockin.org)
Date: Thu Feb 06 2003 - 00:41:44 EET


> Well, you know my opinion about what controls plugins.... ;-)

how about:

  * Host
        The program responsible for loading and connecting Plugins, and for
        providing resources to Plugins. Hosts are generally what the user
        interacts with.

> > * Control:
> > A knob, button, slider, or virtual thing that modifies behavior of
>
> I think it might be a bit confusing to drag GUI widgets and stuff in

  * Control:
        An abstract object that modifies some aspect of the Plugin. A
        control can be like a hardware knob, button, slider, or it can be a
        virtual thing, like a filename. Controls can be master (e.g. master
        volume), per-Channel (e.g. channel pressure) or per-Voice (e.g.
        aftertouch).
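
Something like this, maybe (sketch only - all names hypothetical, nothing
final):

    /* One way a Control's metadata could look. */
    typedef enum {
        XAP_CTRL_MASTER,        /* e.g. master volume */
        XAP_CTRL_PER_CHANNEL,   /* e.g. channel pressure */
        XAP_CTRL_PER_VOICE      /* e.g. aftertouch */
    } XAP_control_scope;

    typedef struct {
        const char        *name;       /* "Cutoff", "Sample File", ... */
        XAP_control_scope  scope;      /* master/Channel/Voice */
        int                is_output;  /* does the Plugin emit this? */
    } XAP_control_desc;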

> > * EventQueue
> > A control input or output.
>
> Confusing. Sounds too much as if an EventQueue would be more or less

  * EventQueue
        The mechanism by which Events are passed to or from Plugins.

> > Plugins may internally have as many
> > EventQueues as they deem necessary. The Host will ask the
> > Plugin for the EventQueue for each Control.
>
> Yep. The effect of this might be cool enough to deserve a comment;

it is commented in the details section :)

> There will never be "output EventQueues", since you're always sending
> directly to the connected inputs, as far as plugins are concerned.
> Whether you're actually sending directly to another plugin, or to a
> host managed sort/merge proxy queue is pretty irrelevant to the API
> and plugins, since it's a host implementation thing.

Well, there is an output queue; it just might be someone else's input queue.

> > All XAP audio data is processed in 32-bit floating point form.
> > Values are normalized between -1.0 and 1.0, with 0.0 being silence.
>
> I think "normalized" is the wrong term here, since it can't be more
> than a 0 dB reference. There's no useful way to actually *normalize*
> audio data.

I've always heard it called normalized, and it jibes with what I know. It
CAN be more than 0dB - that is one big plus of float data. You can way
exceed 0dB. If you don't lower it before the final output, you'll get a
digital clip, but you can run your whole chain REALLY hot, then reduce the
gain before output, and still be OK.
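
To make the headroom point concrete (sketch only):

    /* Float samples may exceed [-1.0, 1.0] mid-chain with no harm. */
    float s = 0.9f;
    s *= 4.0f;   /* 3.6 - way over the 0dB reference, nothing lost */
    s *= 0.25f;  /* master gain before the output: back to 0.9 */
    /* Only the final conversion to fixed point would actually clip. */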

> All I can think of right now is that worker callback API I proposed
> for file I/O, heavy waveform rendering, dynamic memory allocations
> and other inherently non RT safe stuff. You'd make a function call to
> the host telling it to call a specific function from some suitable
> thread (which could mean it's just called directly), and then you get
> an event from the host when the function returns.

That is in my notes, just not in here, yet.
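
Roughly what I have pictured (hypothetical names - none of this is in the
spec yet):

    /* The Plugin asks the Host to run 'func' from some suitable
     * non-RT context (which could mean calling it directly).  When
     * 'func' returns, the Host delivers a completion event to the
     * Plugin's master EventQueue. */
    typedef void (*XAP_worker_func)(void *plugin_data, void *work_data);

    int xap_host_schedule_work(XAP_host *host,
                               XAP_worker_func func,
                               void *plugin_data,
                               void *work_data);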

> Rather similar to MIDI channels, but that's about as much as we seem
> to agree on so far. This is closely related to I/O layouts and the
> metadata to describe those, and we have some work left to do in that
> area.

yes - I've intentionally left that blank for now - I want to get all my
notes coherent before opening YET ANOTHER can of worms :)

> And connecting an audio port means handing it a buffer pointer, among
> other things? That is, buffer pointers aren't changed frequently, and
> never silently.
>
> I *think* this should do, since it does not conflict with reuse of
> buffers. Hosts can just figure out optimal buffer usage patterns when
> building nets. Right?

That is how I have envisioned it - does this need special annotation?
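
The way I picture it (hypothetical, LADSPA-style signature):

    /* Connecting an audio Port hands the Plugin a buffer pointer.
     * The pointer stays valid until the Host reconnects the Port;
     * it is never changed silently. */
    void (*connect_port)(XAP_plugin *plugin, int port, float *buffer);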

> > (*) The descriptor for wrapper Plugins may change upon loading of
> > a wrapped Plugin. This is a special case, and the Host must be
> > aware of it. //FIXME: how?
>
> That's messy stuff... It means that descriptors become somewhat
> dynamic data belonging to an *instance*, rather than static data
> related to a class. This kind of suggests that that third state of
> PTAF is a pretty good idea, since it automatically allows each plugin
> instance to mutate and become an instance of another class - which is
> exactly what we need here.

yes - this is marked FIXME: for a very good reason. It is a feature we
need, and we need to design around it, but we need a solid foundation first
:)

> 4, if you count "CLASS" as a state - or whatever we should do to
> handle this "mutating plugins" stuff...

Let's discuss how to do wrapper plugins in a separate thread. The
implications are deep. :)

> > 2.2.1 Create
> >
> > A Plugin is instantiated via its descriptor's create() method.
> > This method receives two key pieces of information, which the
> > Plugin will use throughout its lifetime. The first is a pointer
> > to the Host structure (see below), and the second is the Host
> > sample rate. If the Host wants to change either of these, all
> > Plugins must be re-created.
>
> I'm afraid this won't work for all hosts. There are situations where
> you might want to run a plugin under a fake copy of the host, in
> another thread, to perform non-RT-safe operations without disrupting
> the rest of the network.

> This is hairy stuff, though, and I'm thinking that it would be nicer
> if all plugins were *required* to be RT safe - even if they do it by
> sending unsafe operations off as worker callbacks. That won't
> guarantee real time response to the offending events, but it'll keep
> plugins from stalling the audio thread.

How does this apply to the host struct, though? Do we need to change the
host struct? hmmm

> > 2.4 Port Setup
> [...disable etc...]
>
> Nice. Can be used for "optimized silence", of course - which leads to
> a question: How about *outputs*? Plugins that keep track of whether
> or not they're actually generating audio would need a way to mark
> each output as "audible" or "silent", before returning from each
> process/run() call.

Yes - we have a few options here. This is something that can save MASSIVE
CPU, and is really needed.

If a chain of plugins starts with a voiced instrument, we have a clue when
there is nothing coming down the chain (instruments need to confirm it, of
course). Each plugin in the chain might eventually be silent, with silent
input. Silence needs no processing.

We can track silence per Port, or just per Plugin (simpler, I think).

Tail size is a good indicator - if we know that the plugin feeding the
current plugin has output silence, then we need only wait for this plugin's
tail to end, and then it will be silent too. This relies on plugins getting
their tail sizes correct. A tail can be 0, finite, or infinite. Infinite
tails always need processing.

We can also do a call-and-answer style - the host tells the plugin that its
feed is silent. When the plugin has died down and is also silent, it
responds to the host that it is silent. Then the host can hit the bypass
switch (is bypass optional?), remove it from the net, or just stop
processing it.
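
Roughly, the pieces could look like this (names made up, just to
illustrate):

    /* Tail: 0 frames, N frames, or infinite - infinite tails always
     * need processing. */
    #define XAP_TAIL_INFINITE (-1L)
    long (*get_tail_size)(XAP_plugin *p);

    /* Call and answer: the Host flags an input silent; the Plugin
     * answers (e.g. via an event) once its tail has died out. */
    void (*input_is_silent)(XAP_plugin *p, int port);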

> > In addition, there is an
> > EventQueue for each Channel and a master Queue for the Plugin.
>
> Why make this explicit? Is there *any* case where you would send
> events to a specific Queue, rather than some target you're connected
> to?
>
> Either way, I think it's a bad idea to enforce that these per-channel
> and master Queues should actually be physical Queues. The problem
> with it is that it forces plugins to sort/merge and/or dispatch
> events internally, unless their internal structure happens to match
> the master + channels layout.

I didn't mean that they are separate queues - the host will ask the plugin
for an event target for the master and each channel. It may be the same
Queue. Should we define EventTarget as a term and use it?

  * EventTarget:
        A tuple of an EventQueue and a cookie. Plugins have an EventTarget
        for each Control, as well as a master and per-Channel EventTarget.
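
In C, that could be as simple as (sketch, using the XAP_target name from
the quote below):

    /* Neither field is useful on its own; together they form the
     * complete address of one input Control. */
    typedef struct {
        XAP_event_queue *queue;   /* where Events get delivered */
        unsigned long    cookie;  /* identifies the Control to the
                                   * receiver; opaque to the sender */
    } XAP_target;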

> > The Host queries the Plugin for EventQueues via the
> > get_input_queue() method. In order to allow sharing of an
>
> I would prefer calling it get_control_target() or something, given
> that what you actually get is not a Queue, but rather an "atom"
> containing a Queue pointer and a cookie. Each of the two fields of
> this "atom" (XAP_target) is completely useless on it's own, but
> together, they form the complete address of an input Control.

> > Controls may also output Events. The Host will set up output
> > Controls with the set_output_queue() method.
>
> How about connect_control_output() or similar? Again because queues
> have very little relevance on this level, and because "connect" is

Adjusted to get/set_event_target(). Good?
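
So the pair would be roughly (sketch):

    /* Host asks where to send Events for an input Control: */
    XAP_target (*get_event_target)(XAP_plugin *p, int control);

    /* Host tells the Plugin where an output Control should send: */
    void (*set_event_target)(XAP_plugin *p, int control, XAP_target t);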

> Another thought: Would plugins want to know when their inputs get
> connected and disconnected?

That is why these are methods - a guaranteed trap for connecting ins and
outs, just like Ports.

> It must be explicit that you're supposed to process events for an
> audio frame *before* processing the audio data for that frame. If it
> isn't, plugins will behave differently in ways that might break some
> hosts.

added

> More importantly, when do you stop processing events? There are two
> alternatives:
>
> 1) Process only events that affect audio frames that
> you're supposed to process during this block.

Which (non-obviously) means 0 frames = no work.

> 2) Process as many events you can, without skipping
> into the next block.

Process all events up to and including 'now'. For a 0-frame run(), all
timestamps are 'now'. The Plugin must process all events for 'now'.
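
In run() pseudocode, that rule comes out something like this (sketch -
timestamps relative to the block start, queue helpers hypothetical):

    void run(XAP_plugin *p, unsigned frames)
    {
        unsigned pos = 0;
        XAP_event *ev = peek_event(p);

        if (frames == 0) {
            /* 0-frame run: every pending event is 'now'; handle
             * them all, render nothing. */
            while (ev) {
                handle_event(p, pop_event(p));
                ev = peek_event(p);
            }
            return;
        }

        while (pos < frames) {
            /* Events for a frame are handled BEFORE that frame's
             * audio. */
            while (ev && ev->timestamp <= pos) {
                handle_event(p, pop_event(p));
                ev = peek_event(p);
            }
            /* Render up to the next event, or the end of the block. */
            unsigned end = (ev && ev->timestamp < frames)
                               ? ev->timestamp : frames;
            render_audio(p, pos, end);
            pos = end;
        }
    }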

> > Plugins can assume that VVIDs are at least global per-Plugin.
> > The host will not activate the same VVID in different Channels at
> > the same time.
>
> Well, as usual, I'd just like to whine some about it being assumed
> that the sequencer is an integral part of the host. ;-)
>
> It could be an event processor or a phrase sequencer you're listening
> to. Not that you'd care when you're a plugin in the receiving end,
> but things get weird if you're about to implement a plugin that
> outputs synth control events, while in the API docs, it appears that
> only hosts can do such things...

ok, ok - I'll reword it. How should it be worded? Assume an arpeggiator or
something that outputs VOICE. Where does it get VVIDs? Does it have to ask
the host to reserve a bunch for it? How?

Good feedback - looking for more - specifically, help me flesh out the Table
of Contents so all the details get in there...

Tim

