Re: [linux-audio-dev] mucos client/server implementation issues, I need opinions


Subject: Re: [linux-audio-dev] mucos client/server implementation issues, I need opinions
From: Roger Larsson (roger.larsson_AT_norran.net)
Date: Mon Dec 20 1999 - 06:54:29 EST


Benno Senoner wrote:
>
> On Sun, 19 Dec 1999, Roger Larsson wrote:
> > > The API should be full-duplex, and allow a tree-like client/server
> > > structure. That means every client can be the "server" of other clients.
> > >
> > ok,
> > if by full-duplex you mean to/from audio board(s)/HDs...
>
> By full-duplex I mean that every process can talk to every other in both
> directions; that means that every client can read and write data to its
> "server". That leads to a flexible environment, since you can pre-process
> and post-process data across a tree of interdependent clients.
> Not always needed, but sometimes very useful.
>
ok
 
> > > [ - - - ]
> >
> > by pipelined do you mean - using a pipe?
> > I have been playing with that idea for communication in the large.
> > (Not plugin -> plugin, but engines to engines)
>
> no, by pipelined approach I don't mean using pipes (Unix FIFOs etc.),
> but using one or more intermediate buffers in order to parallelize
> operations. That means that during a simple client/server communication
> the client doesn't wait for the server to sample its audio fragment,
> but uses the previously buffered fragment (in the startup case the
> intermediate buffer is zero-filled).
> It increases latency by one or more fragments but makes the
> buffer interdependency in a tree-like structure quite complex.

Does it really (have to) increase latency by one fragment???

The process that samples data will take its time, but does that mean
it is what has to drive the other processes to take another
turn? (I use my own terms, since I am not really sure what Benno's
"server" stands for.)

unit->A->B->C->unit

A samples
B waits
C waits

A has got a fragment.
A writes to shared mem and notifies the engine.

engine swaps buffers between A and B
engine can now run A and B (A with empty buffer and B with data)

A samples
B calculates
C waits

B finishes its calculation and notifies the engine.

engine swaps buffers between B and C
engine can now run C (C has data, but B's buffer is empty)

A still samples; the second buffer is not full yet.
B waits
C outputs the first buffer
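
(To make the swap step above concrete, here is a rough C sketch of
what the engine could do; "struct plugin" and the field names are my
own invention, not an existing API.)

struct plugin {
        float *in;      /* buffer the plugin reads from */
        float *out;     /* buffer the plugin writes to */
        int ready;      /* set when the plugin has finished its fragment */
};

/* When A signals "fragment done", hand A's filled output buffer to B
   and give B's consumed input buffer back to A - a pointer swap, no
   copying. Afterwards both A and B are runnable again. */
void swap_buffers(struct plugin *a, struct plugin *b)
{
        float *tmp = a->out;
        a->out = b->in;         /* A gets an empty buffer back */
        b->in = tmp;            /* B gets the fragment A just filled */
        a->ready = b->ready = 0;
}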

A has a 'processing' time of 'tfA' for every buffer
C has a 'processing' time of 'tfC' for every buffer.
B has a processing time of tB(n) since the processing time
  might vary depending on data received.

Note: tfA == tfC

And all works well as long as tB(n) <= max(tB(0..n-1)).

Since C started at tfA + tB(0),
it requires more data at every
 tfA + tB(0) + n*tfC

C would have to wait for data, with a slip, if some tB(x) gets
bigger than tB(0). After that slip C will need data at
 tfA + tB(x) + n*tfC

But any approach will have the same problem, and it has
to be dealt with.
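
(A concrete example, for illustration: with 1024-frame fragments at
44.1 kHz, tfA = tfC is about 23 ms. If tB(0) was 5 ms and some later
tB(x) takes 9 ms, C slips once by 4 ms and then runs on the new
schedule tfA + tB(x) + n*tfC.)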

[ Even a simple

while (1) {
        read(fd_in, buf, sizeof buf);    /* get one fragment */
        process(buf, sizeof buf);        /* data-dependent work */
        write(fd_out, buf, sizeof buf);  /* deliver it */
}

will get into serious trouble if process() takes a different
time to run depending on the processed data. ]

As you can see, this is my trial plugin API again, running at a
larger scale :->

This is a user(!) process (let's call it U) running an engine
with plugins to support communication with an audio app
(let's call it Q :-) written by someone else. Depending
on how well-behaved that app is, there are several alternative
plugins (a rough interface sketch follows the list):

1) Communicating via a loopback device that presents itself as
   /dev/dsp and/or /dev/audio for Q,
   using the loopback-device-plugin in U.

2) Communicating via an alternative device, configuring Q to
   use it,
   using the alternative-device-plugin in U.

3) Communicating via generic plugin support in Q, having
   a script that uses a generic-peer-plugin,
   using the generic-peer-plugin in U.

4) Communicating via any of Q's supported external interfaces,
   using a special-Q-interface-plugin in U.

5) Loading Q's calculation part like any other plugin,
   using the Q-plugin in U.
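
(To show the shape of it, here is a rough sketch of what such a
communication-plugin interface in U could look like - all the names
are my own invention, just for illustration:)

struct comm_plugin {
        const char *name;                  /* e.g. "loopback-device-plugin" */
        int (*open_link)(const char *target);        /* attach to Q */
        int (*read_frag)(void *buf, int len);        /* one fragment from Q */
        int (*write_frag)(const void *buf, int len); /* one fragment to Q */
        void (*close_link)(void);
};

The engine only deals in fragments; which of the alternatives 1-5 sits
behind a given comm_plugin would be invisible to it.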

> > > [communication application to application deleted]
> > >
> > > In the above example we have about 5-6 context switches per processing cycle:
> > >
> > > server -> client2 -> client1 -> client2 -> client3 -> etc. ....
> >
> > But shouldn't we be able to do a lot better than that?
>
> you missed the point .. keep reading below :-)

Yes, I realized that a little too late, after turning my computer off
and almost asleep in my bed :-)

>
> >
> > Suppose you have a server/engine that loads plugins as shared libraries
> > (DLL)
> > and makes a table of the script/drawn scheme/... of how the plugins are
> > connected, then it can call each plugin JIT, no context switches! no
> > need
> > to do 'wait_for_...', no need to know that the memory is shared - it can
> > be local, you only get the pointer.
> >
> > Or it could be a simple engine that only handles one plugin, that is
> > loaded by
> > specifying a command line option - then you get one process per plugin.
> >
> > Or it could be a multi-instance-multithreaded-engine, a mix between the
> > two -
> > you may start as many engines as you like. In your user process, in a
> > daemon,
> > as a kernel thread, as an RT-thread.
>
> The goal of the client/server implementation is to provide
> inter-application communication, not inter-plugin communication.
>
> Of course we run an app which hosts 20 plugins in one single
> thread (or, in order to take advantage of SMP, in 2-4 threads).

And if your computer has 128 processors...
The engine needs to be configurable to be able to USE your hardware...

>
> But my goal is to let separate audio apps communicate with each other
> in real time, with as little latency as possible.
>
> Assume you have a softsynth which doesn't give you the possibility
> to run as a plugin of another app.
> Assume you want to record the softsynth output in your HD recording app.
>
> With my proposed client/server model, the softsynth sees a virtual
> audio device it can write the data to, and the HD recorder
> records from this audio device with as low latency as possible.
>

Yes, but the softsynth will need to support a "virtual audio device".
If it is as easy as opening "/dev/virtual_dsp" instead of
"/dev/dsp", then any softsynth programmer will support it in weeks.
But suppose there were a lot more to do...

If there is a driver that intercepts the standard /dev/dsp and loops
it back into shared memory, that would be OK. But you get one copy
that you cannot get rid of. (Only one copy is quite good, BTW :-)
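
(A user-level alternative to a kernel driver would be an LD_PRELOAD
wrapper like the one Benno mentions below; a minimal sketch, assuming
a /dev/virtual_dsp loopback actually exists:)

/* compile: gcc -shared -fPIC -o wrap.so wrap.c -ldl
   run:     LD_PRELOAD=./wrap.so some_audio_app       */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
#include <sys/types.h>

int open(const char *path, int flags, ...)
{
        static int (*real_open)(const char *, int, ...);
        mode_t mode = 0;
        va_list ap;

        if (!real_open)
                real_open = (int (*)(const char *, int, ...))
                            dlsym(RTLD_NEXT, "open");
        if (flags & O_CREAT) {          /* mode is only present with O_CREAT */
                va_start(ap, flags);
                mode = va_arg(ap, mode_t);
                va_end(ap);
        }
        if (strcmp(path, "/dev/dsp") == 0)   /* redirect to the loopback */
                path = "/dev/virtual_dsp";   /* device name is an assumption */
        return real_open(path, flags, mode);
}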

But I still think that a pipe-file approach could have its merits, if
you want to connect two non-shared-memory-aware applications.
(The 'ioctl's will be different. Or not, with a special pipe-fs...)

You basically have these choices (as a writer of an audio app):
- Do nothing
- Add support for writing to different types of files (devices, pipes, ...)
- And/or add support for external audio plugins (MuCoS plugins)
- And/or add support for client/server with shared mem (Benno's)
- Rewrite your whole app to fit perfectly.

I think I have ordered the options from easy to hard, but that will
of course depend on the application itself.

Note: there is also the choice (as an audio app integrator):
- Write a plugin that can communicate with the other application
  in the way it supports.

> >
> > The problem I have with your code are:
> > * that it requires one thread per plugin-instance 'while(1) {...}'
>
> Yes, this is because you misinterpreted it as the client/server running
> every plugin in a separate thread.
>
> No, just as Paul pointed out, it is a mix of both:
>
> assume there are a few monolithic apps running:
> softsynth, HD recorder, FX rack.
> Each app runs a set of plugins in its own thread,
> but the 3 apps communicate through the client/server API,
> in "real time".
>
> This approach even lets you integrate old binary-only apps (through
> an LD_PRELOAD wrapper) into MuCoS.
>
> Of course the recommendation is to use as few threads
> as possible.
> That means if an app can act as a plugin, the "master" should
> run the "plugin" in its thread in order to minimize scheduling overhead.
> But sometimes this isn't possible or desired,
> and here the client/server approach comes in handy.
>
> I hope that the concepts are a bit clearer.
> (sorry for my superficial explanations :-) )
>
> Anyone who disagrees with my ideas?
>

What do you think of my idea of using an 'engine' and communication plugins?
With an app that supports shared memory, the plugin only has to
notify the client app about the address of the 'connectors'.
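
(To make that concrete: a 'connector' could be as simple as the struct
below, placed in a System V shared memory segment. The layout and the
FRAGSIZE value are purely my assumptions; error handling is omitted.)

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define FRAGSIZE 4096                   /* samples per fragment, assumed */

struct connector {
        volatile int ready[2];          /* writer marks a fragment complete */
        short buf[2][FRAGSIZE];         /* two fragments, so filling and
                                           reading can overlap */
};

/* The plugin creates the segment and tells the client app the key;
   the client then only needs shmget()/shmat() to find the connector. */
struct connector *attach_connector(key_t key)
{
        int id = shmget(key, sizeof(struct connector), IPC_CREAT | 0600);
        return (struct connector *) shmat(id, 0, 0);
}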

> > (Note that Ingo has recently patched a latency bug in this area...)
>
> do you have the BH (bottom half) patch?
> Maybe this will even cut down latency further, because it might
> eliminate the peaks (of fragmentsize length).

The one I was thinking of is this: when sending a signal to another,
higher-priority process, it was not checked whether the sending
process needed to reschedule afterwards. Rescheduling was done at the
next timer interrupt.

>
> Benno.

--
Home page:
  http://www.norran.net/nra02596/


