Re: [linux-audio-dev] more on XAP Virtual Voice ID system


Subject: Re: [linux-audio-dev] more on XAP Virtual Voice ID system
From: David Olofson (david_AT_olofson.net)
Date: Wed Jan 08 2003 - 22:36:08 EET


On Wednesday 08 January 2003 09.09, Tim Hockin wrote:
[...]
> > Ok, but I don't see the advantage of this, vs explicitly
> > assigning preallocated VVIDs to new voices. All I see is a rather
> > significant performance hit when looking up voices.
>
> Where a perf hit?

You get a 32 bit integer value, and you have to look up the voice
object it refers to. No matter how you do it, it's going to be more
expensive than grabbing the pointer from an array indexed by the
value.

> > Just grab a new VVID and start playing. The synth will decide
> > when a physical voice should be used, just as it decides what
> > exactly to do with that physical voice.
>
> So how does a synth tell the host how it gets activated?

It doesn't, because the host/sender doesn't really care. A controller
or sequencer is just supposed to deliver events. It shouldn't care
more about voice activation than it does about the exact relation
between events and audio output.

> A
> VOICE_ON event tells the host and the user 'we are allocating a
> VVID to use'. It also tells the synth.

Yes, but why call it "VOICE_ON" when that's not what it means?

> If the synth wants to not
> play anything for Velocity < 0.5, then it should just not play
> anything. Just because a Voice is silent, doesn't mean it is not
> active.

Right.

For implementation reasons, I'm claiming that it makes a lot of
sense to assume that synths know what to do when they receive the
first event to a VVID that doesn't have a voice. I don't see why the
case "this VVID has no voice" should be something that the
host/sender *has* to worry about. (Especially since the whole concept
is irrelevant to monophonic synths. These will ignore VVIDs anyway.)

> This is a separate discussion entirely from VVIDs.

Yes; VVIDs are just a means of addressing voices. Voice allocation is
a synth implementation issue. That's *exactly* why I don't like the
idea of mixing these two things up on the API level.

> > With continuous velocity, it is no longer obvious when the synth
> > should actually start playing. Consequently, it seems like wasted
> > code to have the host/sender "guess" when the synth might want
> > to allocate or free voices, since the synth may ignore that
> > information anyway. This is why the explicit note on/off logic
> > seems broken to me.
>
> _Your_ logic seems broken to me :) If you have a continuous
> controller for Velocity, you have one voice. So you want a new
> voice, you use a new VVID. How do you standardize this interface so
> a host can present a UI that makes sense?

This VVID thing is just the same thing as MIDI pitch - except that
VVIDs don't double as note pitch. When you want to control a specific
note in MIDI, you address it using the MIDI pitch of that note. The
only difference with VVIDs is that the VVID you use for a particular
note does not imply a specific pitch.

As to the UI, that's entirely up to the application designer. If you
want it to look and act like a traditional MIDI sequencer, just use
one VVID for each pitch in the MIDI scale, and address Voice Controls
by MIDI pitch.

> If VOICE_ON doesn't make sense for some synth, then it still makes
> sense for the user.

Why? (Unless the user is a tracker die-hard.)

VOICE_ON, assuming that there can be continuous velocity synths, has
no corresponding MIDI event, and doesn't really mean anything to the
user, so I don't see why it would make sense to any user. It's an API
thing entirely.

> > > Block start:
> > > time X: voice(-1, ALLOC) /* a new voice is coming */
> > > time X: velocity(-1, 100) /* set init controls */
> > > time X: voice(-1, ON) /* start the voice */
> > > time X: (plugin sends host 'voice -1 = 16')
> > > time Y: voice(-2, ALLOC)
> > > time Y: velocity(-2, 66)
> > > time Y: voice(-2, ON)
> > > time Y: (plugin sends host 'voice -2 = 17')
> > >
> > > From then out the host uses the plugin-allocated voice-ids. We
> > > get a large (all negative numbers) namespace for new notes per
> > > block.
> >
> > Short term VVIDs, basically. (Which means there will be voice
> > marking, LUTs or similar internally in synths.)
>
> What is LUT?

Look-Up Table. (So you can find objects without searching.)

> What is voice-marking?

What I'm doing in Audiality; sender hands the synth a Voice ID, and
the synth puts that in the voice it allocates. When further events
referring to that Voice ID are received, the synth searches the
voices for the Voice ID, and then (if a voice is found) performs the
requested action on that voice.
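
To make the difference between the two lookup strategies concrete,
here is a minimal C sketch. All names (voice_t, find_by_id,
find_by_vvid, the table sizes) are made up for illustration; this is
not Audiality's actual code, nor any real XAP header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VOICES 32   /* hypothetical polyphony limit */
#define MAX_VVIDS  256  /* hypothetical size of the lookup table */

typedef struct {
    uint32_t id;      /* ID the sender handed us (voice marking) */
    int      active;  /* nonzero while the voice is playing */
} voice_t;

static voice_t  voices[MAX_VOICES];
static voice_t *vvid_table[MAX_VVIDS];  /* LUT: VVID -> voice, or NULL */

/* Voice marking: the ID is arbitrary, so every event costs a search
 * over the voices (or a hash lookup). */
static voice_t *find_by_id(uint32_t id)
{
    for (size_t i = 0; i < MAX_VOICES; ++i)
        if (voices[i].active && voices[i].id == id)
            return &voices[i];
    return NULL;  /* no such voice: the event is dropped */
}

/* LUT: the VVID is an index into a preallocated table, so finding
 * the voice is a single array read. */
static voice_t *find_by_vvid(uint32_t vvid)
{
    assert(vvid < MAX_VVIDS);
    return vvid_table[vvid];
}
```

The search is bounded by the polyphony, so it's not a disaster, but
it is paid on every event, whereas the table read is constant time.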

> The negative VVIDs are valid
> for the duration of the block, after which they use their new
> names. It seems simple to me.

It's less simple than what I'm doing in Audiality, and pretty much
only succeeds in providing half a solution to the main problem with
that system ("When can I safely reuse a voice ID?"), while
introducing another, more serious problem: The host/sender
eventually(*) gets another ID, which will either have the same
problem as the negative ID, or (much worse) will be a direct voice
index, preventing the synth from performing voice stealing.

(*) When does the sender get the "real" voice ID? Well, forget about
    the next block, unless the controller/sender and the synth are
    running on the same CPU, or possibly the same SMP machine.

Also, keep in mind that any feedback of this kind requires a real
connection in the reverse direction. This makes the API and hosts
more complex - and I still can't see any benefit, really.

> > > We get plugin-specific voice-ids (no hashing/translating).
> >
> > Actually, you *always* need to do some sort of translation if you
> > have anything but actual voice indices. Also note that there must
> > be
>
> Because the plugin can allocate them, the plugin need not hash or
> translate. It can be a direct index.

So, how do you perform voice stealing?

You have to tell the host/sender when a voice index becomes invalid,
and you have to make damn sure the timing is right, so you don't end
up getting some events for the "old" voice with the same ID. You can
forget about "unstealing" voices, of course, as the host/sender will
have to stop sending events to a voice when you tell it it has been
stolen.

It's possible to get right (I think), but again, I don't see the
point in going down this path.

Why not just leave voice allocation to synths, the way it's been done
in professional MIDI gear all the time? It works! The only thing
we're really adding here is the ability to separate voices from pitch
- that's all.

> > a way to assign voice IDs to non-voices (ie NULL voices) or
> > similar, when running out of physical voices.
>
> if voice-ids are allocated by the plugin, there is no NULL voice.
> If you run out of physical voices you steal a voice or you send
> back a failure for the positive voice id.

Yeah - and what do you do with the stray events you'll get because of
the feedback latency? (This is an issue with one block latency
already, and can only get worse in SMP hosts and other distributed
systems.)

> > You can never return an in-use voice ID, unless the sender is
> > supposed to check every returned voice ID. Better return an
> > invalid voice ID or something...
>
> Host:
>     send voice_on for temp vid -1
>     run
>     read events
>     find a voice-id -1 => new_vid
>     if (new_vid < 0) {
>         /* crap, that voice failed - handle it */
>     } else {
>         if (hash_lookup(plug->voices, new_vid)) {
>             /* woops, plugin stole that voice - handle it */
>         }
>         hash_insert(plug->voices, new_vid, something)
>     }
>
> If the plugin wants to steal a voice, do so. If it wants to reject
> new voices, do so. It is simple, easy to code and to understand.

Well, it looks a whole lot more complicated than VVIDs to me. :-)

(And regardless; hashing is definitely more expensive than grabbing a
pointer from an array.)

> > Well, it's an interesting idea, but it has exactly the same
> > problem as VVIDs, and doesn't solve any of the problems with
> > VVIDs. The fact
>
> It has none of the problems of VVIDs.

Probably not, if I understand the above correctly.

> The only problem is that it requires dialog.

That's a rather serious problem, OTOH...

> * no carving of a VVID namespace for controller plugins

No, plugins have to do that in real time instead.

> * the plugin and the host always agree on the active list of voices

I don't see how this is possible. The one block latency means there
will be occasional events that must be detected and thrown away, one
way or another.

Maybe it's sufficient to have synths mark newly stolen voices, so
they ignore directly addressed events until it's known that no more
events for the old context will arrive? (Problem is, when do you
know that? If the latency is allowed to be more than one block,
synths will have to await a "voice renamed" acknowledge event from
the host/sender...)
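
A rough C sketch of what that marking could look like, assuming a
plugin-allocated voice index scheme as in the quoted proposal. The
names (steal_voice, handle_rename_ack, awaiting_ack) are hypothetical,
and the new context's own events, which would arrive under a
temporary ID, are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VOICES 4  /* hypothetical polyphony limit */

typedef struct {
    bool  awaiting_ack;  /* stolen; stale events may still arrive */
    float velocity;
} voice_t;

static voice_t voices[MAX_VOICES];

/* The synth steals voice idx for a new context and marks it. */
static void steal_voice(int idx)
{
    assert(idx >= 0 && idx < MAX_VOICES);
    voices[idx].awaiting_ack = true;
    voices[idx].velocity = 0.0f;
}

/* The host/sender acknowledges the "voice renamed" notification;
 * from here on, direct events belong to the new context. */
static void handle_rename_ack(int idx)
{
    assert(idx >= 0 && idx < MAX_VOICES);
    voices[idx].awaiting_ack = false;
}

/* Directly addressed control event. Returns false if it was dropped
 * because it may have been meant for the old context. */
static bool handle_velocity(int idx, float value)
{
    assert(idx >= 0 && idx < MAX_VOICES);
    if (voices[idx].awaiting_ack)
        return false;
    voices[idx].velocity = value;
    return true;
}
```

Note that this only works because of the acknowledge event, that is,
yet another message that depends on the reverse connection.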

> * host sends voice_off no release
>   - plugin puts the VID in the free-list immediately
> - host never sends voice_off
>   - plugin puts the VID in the free-list whenever it finishes
>   - plugin can alert the host or not
> - host sends events or voice_off too late
>   - plugin recognizes that the voice is off and ignores events
> - host sends voice_off with a long release
>   - plugin puts the VID in the free-list as soon as possible
> - host overruns plugin's max poly
>   - plugin chooses a VID and stops it, returns that VID (steals
>     the voice) or plugin rejects new voice
>
> what am I missing?

Well, this should work, but I think the two issues (the feedback
requirement, and the inability to cope with more than one block of
feedback latency) are a bit too serious.

> > search" and/or hashing), and it doesn't really buy us much,
> > compared to the wrapping 32 bit VVID idea.
>
> With a large VVID pool we still need:
>
> host:
> /* be sure we can make a new vvid */
> if (cur_poly == max_poly) {
> find an eligible vvid
> tell the plugin it can re-use the voice on this vvid
> (voice_off?)

No, it has nothing to do with VOICE_OFF. We're only saying we won't
talk about that voice any more. The synth may do whatever it likes.
(Obviously, it's generally a good idea to trigger the release before
doing this.)

> } else {
> cur_poly++;
> }

What does cur_poly have to do with anything? (Sequencers and the like
shouldn't care about polyphony or voice allocation, IMNSHO.)

> /* find a vvid that is not currently playing */
> do {
>     this_vvid = vvid_next++;
> } while (vvid_is_active(this_vvid));

Again, this is voice allocation. Leave this to the synth.

Anyway, what you describe above has very little in common with my
idea of VVIDs. The whole point with VVIDs is that they're *virtual* -
and that obviously means they have no fixed relation to physical
voices. The very reason why they were suggested at all was to
eliminate two-way communication between senders and synths.

The approach I have in mind is this:

        SENDER:
                pool = xap_alloc_vvids(<as many as we need>);

                Start voice context:
                        * //(if this can happen at all)
                          if(we're out of VVIDs)
                          {
                                //Add some to our local pool
                                pool += xap_alloc_vvids(<some>);
                          }
                        * vvid = get_vvid()
                        * xap_send(synth, DETACH_VVID, vvid)

                Send Voice Control event:
                        * xap_send(synth, VOICE_<whatever>, vvid, ...)

                Stop using voice context:
                        * free_vvid(vvid)
                        * (Synth doesn't care about this.)

pool and get/free_vvid() are local to the sender. It could be as
simple as a base index for the 128 VVIDs needed to cover the range of
MIDI pitch values for a MIDI channel. In that case, get_vvid() would
be
        int get_vvid(int pitch)
        {
                return vvid_base + pitch;
        }

and free_vvid() would be a NOP.
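
For a sender that is not tied to MIDI pitch, the local pool could be
a simple free list. The following is only a sketch under assumptions:
vvid_pool_t, pool_init() and POOL_SIZE are invented names, and the
base value stands in for whatever range the host hands out when the
VVIDs are allocated.

```c
#include <assert.h>

#define POOL_SIZE 8  /* hypothetical number of VVIDs we asked for */

typedef struct {
    int free_list[POOL_SIZE];  /* stack of VVIDs not currently in use */
    int n_free;
} vvid_pool_t;

/* Fill the pool with the range base .. base + POOL_SIZE - 1. */
static void pool_init(vvid_pool_t *p, int base)
{
    /* Pushed in reverse, so VVIDs are handed out in ascending order. */
    for (int i = 0; i < POOL_SIZE; ++i)
        p->free_list[i] = base + POOL_SIZE - 1 - i;
    p->n_free = POOL_SIZE;
}

/* Returns a free VVID, or -1 if the pool is empty, in which case
 * the sender would have to ask the host for more. */
static int get_vvid(vvid_pool_t *p)
{
    if (p->n_free == 0)
        return -1;
    return p->free_list[--p->n_free];
}

/* Purely local bookkeeping; nothing is sent to the synth. */
static void free_vvid(vvid_pool_t *p, int vvid)
{
    assert(p->n_free < POOL_SIZE);
    p->free_list[p->n_free++] = vvid;
}
```

Either way, none of this is visible to the synth; it only ever sees
VVIDs in events.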

        SYNTH:
                Handle Voice Control event:
                        if(!vvid_table[ev->vvid])
                                vvid_table[ev->vvid] = alloc_voice();
                        ...
                        Handle event for (SYNTH_voice *)(vvid_table[ev->vvid])
                        ...

                Handle DETACH_VVID, "Voice Finished" or "Voice Stolen":
                        vvid_table[ev->vvid] = NULL;

Note that the vvid_table entries could be integers, void * or
whatever; the point is just that the space is for the *synth* to use
as it sees fit.

Also note that this example assumes that alloc_voice() returns a
dummy voice if allocation fails. (Most synths will probably just
steal a voice, though.)
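
Spelled out as compilable C, the synth side might look roughly like
the sketch below. The type and function names (voice_t,
handle_velocity, handle_detach, voice_finished) and the table sizes
are assumptions for the sake of the example, and voice stealing is
left out in favour of the dummy voice.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VVIDS  16  /* hypothetical size of the VVID table */
#define MAX_VOICES 2   /* hypothetical polyphony limit */

typedef struct {
    int   in_use;
    int   vvid;      /* which table entry this voice was allocated for */
    float velocity;
} voice_t;

static voice_t  voices[MAX_VOICES];
static voice_t  dummy_voice;            /* absorbs events if allocation fails */
static voice_t *vvid_table[MAX_VVIDS];  /* NULL: this VVID has no voice */

static voice_t *alloc_voice(int vvid)
{
    for (size_t i = 0; i < MAX_VOICES; ++i)
        if (!voices[i].in_use) {
            voices[i].in_use = 1;
            voices[i].vvid = vvid;
            voices[i].velocity = 0.0f;
            return &voices[i];
        }
    return &dummy_voice;  /* most synths would steal a voice instead */
}

/* Any Voice Control event: allocate on first use, then apply. */
static void handle_velocity(int vvid, float value)
{
    assert(vvid >= 0 && vvid < MAX_VVIDS);
    if (!vvid_table[vvid])
        vvid_table[vvid] = alloc_voice(vvid);
    vvid_table[vvid]->velocity = value;
}

/* DETACH_VVID: the sender starts a new context on this VVID. The
 * old voice, if any, keeps running until it finishes by itself. */
static void handle_detach(int vvid)
{
    assert(vvid >= 0 && vvid < MAX_VVIDS);
    vvid_table[vvid] = NULL;
}

/* "Voice Finished": free the voice, and clear the table entry if it
 * still refers to this voice. */
static void voice_finished(voice_t *v)
{
    if (vvid_table[v->vvid] == v)
        vvid_table[v->vvid] = NULL;
    v->in_use = 0;
}
```

Note that there is no message going back to the sender anywhere in
this; a detached voice is freed when the synth decides it is done.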

//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`---------------------------> http://olofson.net/audiality -'
   --- http://olofson.net --- http://www.reologica.se ---



This archive was generated by hypermail 2b28 : Wed Jan 08 2003 - 22:36:05 EET