Re: [linux-audio-dev] mucos client/server implementation issues, I need opinions


Subject: Re: [linux-audio-dev] mucos client/server implementation issues, I need opinions
From: Roger Larsson (roger.larsson_AT_norran.net)
Date: Sat Dec 18 1999 - 19:07:31 EST


Benno Senoner wrote:
>
> Hi,
> I am still seeking the best way to implement an efficient
> low latency client/server API.
>
> Of course getting as little latency as possible while using minimal
> system resources is the #1 requirement.
>
ok!

> The API should be fullduplex, and allow a tree-like client/server
> structure. That means every client can be the "server" of other clients.
>
ok,
if by fullduplex you mean to/from audio board(s)/HDs...
 
> My first design was to let the clients be woken up by the server, but
> without the server waiting for the clients.
> I opted for this because it saves some syscalls on the server side
> (wait for message or semaphore).
>
> This assumes that, since the server plays one audio fragment at a time,
> the clients must have a round-trip time which is < fragment time.
>
> Looking at the latency tests, I came to the conclusion that we often
> need more than 2 audio fragments in order to get reliable performance,
> because the scheduler doesn't always guarantee that our process is
> rescheduled in a time < fragment time.
>
> For example, using 3x128 audio buffers, sometimes 2 full buffers are in
> use, making the above approach unusable.
>
> Another issue is whether to use a pipelined approach or not, that is,
> introduce additional latency by adding buffers (but no additional CPU
> load, because there is no memory ping-pong copying).
> The pipelined approach has the advantage that you can parallelize (run
> on multiple CPUs) sequential datapaths, since the audio is a data
> stream. (Parallel datapaths can be parallelized anyway, without
> pipelining.) It is very simple to implement in the case of a single
> server and many clients at the same level.

By pipelined do you mean - using a pipe?
I have been playing with that idea for communication in the large.
(Not plugin -> plugin, but engine to engine.)

I am currently looking at the kernel code.
[There might be an SMP read/write race there with a half-full buffer,
 unless it is locked at a higher level (2.2.13).
 PIPE_LOCK is tested in one place, but incremented in another...
 both read and write then update PIPE_LEN...]

There is an issue with data copying: from user memory to the kernel and
later from the kernel back to user memory.
But I think it should be OK when not used between every plugin.
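
Just to make the idea concrete, here is a minimal sketch of pushing
fixed-size fragments from one engine to another through an ordinary
pipe(2). The names (FRAGMENT_FRAMES, run_*_engine) are invented for
illustration, and the double user->kernel->user copy is exactly the
cost mentioned above:

#include <stdio.h>
#include <unistd.h>

#define FRAGMENT_FRAMES 128        /* one fragment = 128 mono float samples */

static void run_source_engine(int wr_fd)        /* hypothetical producer */
{
    float buf[FRAGMENT_FRAMES] = { 0 };
    int i;

    for (i = 0; i < 1000; i++) {
        /* ... fill buf with one processed fragment ... */
        if (write(wr_fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            perror("write");
            break;
        }
    }
    close(wr_fd);
}

static void run_sink_engine(int rd_fd)          /* hypothetical consumer */
{
    float buf[FRAGMENT_FRAMES];

    while (read(rd_fd, buf, sizeof buf) == (ssize_t)sizeof buf) {
        /* ... mix buf into the output stream ... */
    }
    close(rd_fd);
}

int main(void)
{
    int fds[2];

    if (pipe(fds) < 0) { perror("pipe"); return 1; }

    if (fork() == 0) {             /* child: the producing engine */
        close(fds[0]);
        run_source_engine(fds[1]);
        _exit(0);
    }
    close(fds[1]);                 /* parent: the consuming engine */
    run_sink_engine(fds[0]);
    return 0;
}

Since one fragment here (512 bytes) is smaller than PIPE_BUF, each
write and read stays atomic with respect to the other end.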

There is also the possibility of introducing a new file system device,
pipe_isochronous (I mean stuff like speech, video, ...).
By blocking on write until a reading process enters, and requiring
that the data sizes are the same, you should be able to copy directly
from user process to user process. I am not sure whether this would be
a win or not, but you would not need to do some of the file system checks.

> for example:
>
> client1 <---->-+
>                +---<----> server
> client2 <---->-+
>
> But IMHO this flat design is not flexible enough for us.
>
> (David wants to feed his softsynth output into quasimodo and then send
> the result to the mucos server, where at this moment an external mp3
> player is sending its output too.)

Ahh, each client/plugin has only two pipes... That would be a problem...
But what if the engine assigns the pipes?

>
> client1 <-------> client2 <------>--+
>                                     +--<-----> server
> client3 <------------------------>--+
>
> I think in order to avoid complex pipeline dependencies,
> it would be better to use the non-pipelined approach.
> I say this because the number of parallel datapaths in a common DAW
> environment is often much greater than the number of CPUs.
>
Very true today. But since processor manufacturers are struggling to
make good use of all their silicon area, things might change (cache
takes a lot of space and may not be the most efficient use of that
space; hit rates can only improve marginally):

- multiple CPUs on one chip
- multiple register files on one CPU chip, allowing a task switch while
  waiting for a cache miss...
- ...
- all of the above combined
  [if I remember correctly, SUN is designing a new architecture with
   four execution units (4*full(fp+int)) times four register files
   times four processors, and it can expand to any number of
   processors/register files]

> For example the data flow could be the following: (we assume that the
> server and all clients do full-duplex audio)
>
> SERVER:
> -------
> while(1)
> {
>   read() from soundcard into shared mem  // clients are able to read this data
>   wakeup_clients()    // only direct clients, that means only client2 and client1
>   wait_for_clients()  // wait until client2 and client3 finish the processing
>   process()           // mixdown etc.
>   write() to soundcard
> }
> -----
>
> CLIENT3:
> -------
> while(1)
> {
> wait_for_server()
> process_data_from_shmem()
> write_data_to_shmem()
> wakeup_server()
> }
>
> [more code deleted]
>
> Would such an approach be acceptable for you?
> Any thoughts, ideas?
>
> To allow audio data over the network, we can simply introduce
> an intermediate "client" between the server and the networked
> clients, which takes care of intermediate buffering in order to
> overcome the network latencies.
>
> In the above example we have about 5-6 context switches per processing cycle:
>
> server -> client2 -> client1 -> client2 -> client3 -> etc. ....

But shouldn't we be able to do a lot better than that?

Suppose you have a server/engine that loads plugins as shared libraries
(DLLs) and builds a table, from the script/drawn scheme/..., of how the
plugins are connected. Then it can call each plugin JIT: no context
switches, no need to do 'wait_for_...', no need to know that the memory
is shared - it can be local, you only get the pointer.
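
Roughly what I have in mind, as a sketch only (the process(in, out,
nframes) entry point and the two-buffer swap are an invented interface
here, not part of any existing API):

#include <dlfcn.h>
#include <stdio.h>

#define FRAMES 128
#define MAX_PLUGINS 16

typedef void (*process_fn)(const float *in, float *out, int nframes);

int main(int argc, char **argv)
{
    process_fn chain[MAX_PLUGINS];
    int nplugins = 0, i;
    float a[FRAMES] = { 0 }, b[FRAMES];
    float *in = a, *out = b, *tmp;

    /* Load every plugin .so named on the command line, in order. */
    for (i = 1; i < argc && nplugins < MAX_PLUGINS; i++) {
        void *handle = dlopen(argv[i], RTLD_NOW);
        if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }
        chain[nplugins] = (process_fn)dlsym(handle, "process");
        if (!chain[nplugins]) { fprintf(stderr, "%s\n", dlerror()); return 1; }
        nplugins++;
    }

    for (;;) {
        /* ... read one fragment from the soundcard into 'in' ... */
        for (i = 0; i < nplugins; i++) {
            chain[i](in, out, FRAMES);          /* plain call, no IPC   */
            tmp = in; in = out; out = tmp;      /* swap, don't copy     */
        }
        /* ... write 'in' (the last plugin's output) to the soundcard ... */
    }
    return 0;
}

(Link with -ldl.) The whole chain runs in one thread, so there are no
context switches between plugins at all.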

Or it could be a simple engine that only handles one plugin, loaded by
specifying a command line option - then you get one process per plugin.

Or it could be a multi-instance, multithreaded engine, a mix between the
two - you may start as many engines as you like: in your user process,
in a daemon, as a kernel thread, as an RT thread.

Each engine handles zero to (well, almost) infinitely many threads
(workers).

Workers run zero to infinitely many tightly connected plugins, and you
will always have some plugins that make sense to connect tightly. But
this could be handled by the engine.
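
In code, the layering I am picturing is something like this (all
structure and field names are invented here, just to illustrate
engine -> workers -> plugin chains):

#include <pthread.h>

struct plugin {
    void (*process)(const float *in, float *out, int nframes);
    struct plugin *next;        /* next plugin in this worker's tight chain */
};

struct worker {
    pthread_t      thread;      /* one thread per worker, not per plugin    */
    struct plugin *chain;       /* plugins it runs back-to-back             */
};

struct engine {
    struct worker *workers;     /* zero to (almost) infinitely many workers */
    int            nworkers;
};

/* Each worker thread runs its whole chain with ordinary function calls;
   only worker-to-worker (or engine-to-engine) boundaries need IPC.
   The engine would start it with pthread_create(&w->thread, NULL,
   worker_main, w). */
static void *worker_main(void *arg)
{
    struct worker *w = arg;
    float in[128] = { 0 }, out[128];
    struct plugin *p;

    for (;;) {
        for (p = w->chain; p; p = p->next)
            p->process(in, out, 128);
        /* ... hand the result on to the engine / the next worker ... */
    }
    return NULL;
}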

The problems I have with your code are:
* it requires one thread per plugin instance, 'while(1) {...}';

* clients have to request services from the engine all the time,
  instead of the engine scheduling things so that data is available
  when a plugin runs and ensuring that plugins are not run in an
  interfering way. [repetitive]

== Something about "trial API" ==

Yes, it is still very possible to run that code on a multiprocessor
computer where each plugin instance has its own process, all executing
concurrently (since my trial uses double buffering). But that is up to
the engine/user to specify - it is not a question for each plugin.

I am currently cleaning up the code even more, and since the engine is
still missing it is hard for outsiders to see what it would look like;
at times it is even hard for me :-).

It could in fact be self-powered. Upon returning from a 'process' call
the worker could check whether a 'connection' has used up its current
signal space (read or written to) and forward it to where it should be
used next (forwarding is done by swapping buffer pointers, or signal
pointers). If you need to communicate externally, it is handled by
another plugin, see 'plugin_charStdin/out' (renames are in the
pipeline/TODO; it could nowadays be used to read from any file, even
devices :-)
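
The pointer swap itself is trivial; something along these lines (a
sketch with invented names, not the trial API itself):

struct connection {
    float *produce_buf;     /* the producing plugin writes into this buffer */
    float *consume_buf;     /* the consuming plugin reads from this buffer  */
};

/* Forward a finished buffer downstream by exchanging pointers: no sample
   data is copied, the old consume buffer simply becomes the new produce
   buffer (double buffering). */
static void forward(struct connection *c)
{
    float *tmp     = c->consume_buf;
    c->consume_buf = c->produce_buf;
    c->produce_buf = tmp;
}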

BTW, I am considering moving away from events as a special signal type
that can affect the whole module and its different signals as well. See
'plugin_offset' and its offset. (plugin_offset is basically an add/mix.)

Instead I am looking at using different sample frequencies to handle
stuff like that; 'offset' could be a signal with a sample frequency
where:
* a constant value / parameter has zero sample frequency;
* a UI knob is sampled at at least 2 Hz, maybe 10 Hz (100 ms);
* a signal from another audio plugin is probably at least 8000 Hz.
But note that all of these signal types can be generated by another
plugin (or rather by several other plugin types).
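
As a data structure it could be as simple as this (field names invented
here, just to show that only the rate distinguishes a parameter from a
knob from an audio signal):

struct signal {
    float  rate;        /* samples per second:                            */
                        /*   0      -> constant value / parameter         */
                        /*   2..10  -> UI knob (updated every 100-500 ms) */
                        /*   8000+  -> audio-rate signal                  */
    float *samples;     /* rate == 0: a single value, else a buffer       */
    int    nsamples;    /* number of valid samples in the buffer          */
};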

Note: a plugin may alter its 'process' call for efficient handling of
the different types of signals, and then only needs to check whether it
still gets that type of signal.

(A plugin may convert MIDI messages to a tone signal + envelope signal,
for as many signal pairs as you like, to get polyphonic signals. This
could be in the table, but the ...)

> Of course it remains to be determined how low we can get latencies in
> order to get 100% reliability.
> I'm currently experimenting with sem*() and msg*(), but msg*() seems to
> give better results than sem*() - very strange.
> But until I have latency graphs I won't make any claims.

What about signals? You only need to wait. Then when the signal arrives
you check that it is the expected signal - that is all. The wait has
been cancelled and the program continues.
(Note that Ingo has recently patched a latency bug in this area...)
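
For example, something like this (just an illustration using SIGUSR1
and sigwaitinfo(); I have not measured it):

#include <signal.h>
#include <stdio.h>

int main(void)
{
    sigset_t set;
    siginfo_t info;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_BLOCK, &set, NULL);  /* block it so we can wait synchronously */

    for (;;) {
        if (sigwaitinfo(&set, &info) < 0) {   /* sleep until a signal arrives */
            perror("sigwaitinfo");
            break;
        }
        if (info.si_signo != SIGUSR1)         /* check it is the expected one */
            continue;
        /* ... process the next fragment ... */
    }
    return 0;
}

The server side would then just kill(client_pid, SIGUSR1) when a
fragment is ready for the client.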

/RogerL

--
Home page:
  http://www.norran.net/nra02596/


