[linux-audio-dev] mucos client/server implementation issues , I need opinions


Subject: [linux-audio-dev] mucos client/server implementation issues , I need opinions
From: Benno Senoner (sbenno_AT_gardena.net)
Date: Fri Dec 17 1999 - 07:49:51 EST


Hi,
I am still seeking the best way to implement an efficient
low-latency client/server API.

Of course, getting as little latency as possible while using minimal
system resources is the #1 requirement.

The API should be full-duplex and allow a tree-like client/server
structure. That means every client can be the "server" of other clients.

My first design was to let the clients be woken up by the server, but
without the server waiting for the clients.
I opted for this because it saves some syscalls on the server side
(waiting for a message or a semaphore).

This assumes that, since the server plays one audio fragment at a time,
the clients must have a round-trip time which is less than one fragment time.

Looking at the latency tests, I came to the conclusion that we often
need more than 2 audio fragments in order to get reliable performance,
because the scheduler doesn't always guarantee that our process is
rescheduled within one fragment time.

For example, using 3x128 audio buffers, sometimes 2 full buffers are used up,
making the above approach unusable.
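
To put rough numbers on this, here is a small back-of-the-envelope sketch
(assuming 44.1 kHz; the real sample rate and fragment size may differ):

#include <stdio.h>

int main(void)
{
  double rate = 44100.0;          /* assumed sample rate            */
  int frames_per_fragment = 128;  /* the "3x128" buffers from above */
  int fragments = 3;

  double fragment_ms = 1000.0 * frames_per_fragment / rate;
  printf("one fragment = %.2f ms\n", fragment_ms);            /* ~2.9 ms */
  printf("%d fragments = %.2f ms of worst-case buffering\n",
         fragments, fragments * fragment_ms);                  /* ~8.7 ms */
  return 0;
}

So a scheduling delay of little more than ~2.9 ms is already enough to eat
a second fragment of the 3x128 budget.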

Another issue is whether to use a pipelined approach or not,
that is, introducing additional latency by adding buffers
(but not introducing additional CPU load, because there is no memory
ping-pong copying).
The pipelined approach has the advantage that you can parallelize (run on
multiple CPUs) sequential datapaths, since the audio is a data stream.
(Parallel datapaths can be parallelized anyway, without pipelining.)
It is very simple to implement in the case of a single server and many
clients at the same level, for example:

client1 <---->--+
                +---<----> server
client2 <---->--+

But IMHO this flat design is not flexible enough for us.

(David wants to feed his softsynth output into quasimodo and then send
the result to the mucos server, to which at the same time an external
mp3 player is sending its output too.)

client1 <------> client2 <------>--+
                                   +--<-----> server
client3 <------------------------->+

I think that in order to avoid complex pipeline dependencies,
it would be better to use the non-pipelined approach.
I say this because the number of parallel datapaths in a common DAW environment
is often much greater than the number of CPUs.

For example, the data flow could be the following (we assume that the server
and the clients all do full-duplex audio):

SERVER:
-------
while(1)
{
  read() from soundcard into shared mem // clients are able to read this data
  wakeup_clients() // only direct clients, that means only client2 and client3
  wait_for_clients() // wait until client2 and client3 finish their processing
  process() // mixdown etc.
  write() to soundcard
}
-----

CLIENT3:
-------
while(1)
{
  wait_for_server()
  process_data_from_shmem()
  write_data_to_shmem()
  wakeup_server()
}

CLIENT2:
---------
while(1)
{
  wait_for_server()
  pre_process_data() // i.e. does some preprocessing of client1's input data
  wakeup_client1() // wakes up client1 since client2 is the "server" of client1
  wait_for_client1() // waits until client1 finishes its processing
  read_from_client1_shmem()
  post_process_data() // postprocess data returned by client1
  write_to_server_shm()
  wakeup_server()
}

CLIENT1: (just like client3, but uses client2 as its "server")
-------
while(1)
{
  wait_for_client2()
  process_data_from_shmem()
  write_data_to_shmem()
  wakeup_client2()
}
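
The wakeup_*() / wait_for_*() calls above are still only placeholders; a
minimal sketch of how they could be done with SysV semaphores (the names
here are made up, this is not actual mucos code):

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* "wakeup": increment the peer's semaphore so its blocked semop() returns */
static void wake_peer(int semid, int semnum)
{
  struct sembuf op = { semnum, +1, 0 };
  semop(semid, &op, 1);
}

/* "wait": block until the peer has incremented our semaphore */
static void wait_peer(int semid, int semnum)
{
  struct sembuf op = { semnum, -1, 0 };
  semop(semid, &op, 1);
}

/* e.g. a server and a client could share a 2-semaphore set:
 *   int semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
 *   server: wake_peer(semid, 0); wait_peer(semid, 1);
 *   client: wait_peer(semid, 0); ...process...; wake_peer(semid, 1);
 */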

Would such an approach be acceptable to you?
Any thoughts, ideas?

To allow audio data over the network, we can simply introduce
an intermediate "client" between the server and the networked
clients, which takes care of intermediate buffering in order to overcome
the network latencies.
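
Such an intermediate client could look roughly like this (only a sketch, all
names and sizes are my own assumptions, and the sender/receiver network
threads are left out):

#include <string.h>

#define FRAG_BYTES 512   /* assumed fragment size in bytes       */
#define NET_SLACK  4     /* extra fragments of network buffering */

static char to_net[NET_SLACK][FRAG_BYTES];    /* filled here, drained by a sender thread   */
static char from_net[NET_SLACK][FRAG_BYTES];  /* filled by a receiver thread, drained here */
static int  to_net_w, from_net_r;

/* called once per server cycle, inside the usual
   wait_for_server() ... wakeup_server() pattern */
void net_client_cycle(const char *server_out_shm, char *server_in_shm)
{
  /* queue the server's fresh fragment for the (non-realtime) sender thread */
  memcpy(to_net[to_net_w], server_out_shm, FRAG_BYTES);
  to_net_w = (to_net_w + 1) % NET_SLACK;

  /* hand the server whatever the receiver thread has already delivered;
     if the network is late we return stale data instead of blocking the
     server's realtime cycle */
  memcpy(server_in_shm, from_net[from_net_r], FRAG_BYTES);
  from_net_r = (from_net_r + 1) % NET_SLACK;
}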

In the above example we have about 5-6 context switches per processing cycle:

server -> client2 -> client1 -> client2 -> client3 -> etc. ....

Of course it remains to be determined how low we can get the latencies while
still getting 100% reliability.
I'm currently experimenting with sem*() and msg*(), but msg*() seems to give
better results than sem*(), which is very strange.
But until I have latency graphs I won't make any claims.
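
For comparison, the msg*() version of the same wake/wait pair would look
roughly like this (again only a sketch with made-up names, not the exact
benchmark code):

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct wakeup_msg { long mtype; };  /* no payload, the type itself is the signal */

/* "wakeup": send a zero-byte message of the peer's type */
static void msg_wake(int msqid, long type)
{
  struct wakeup_msg m = { type };
  msgsnd(msqid, &m, 0, 0);
}

/* "wait": block in msgrcv() until a message of our type arrives */
static void msg_wait(int msqid, long type)
{
  struct wakeup_msg m;
  msgrcv(msqid, &m, 0, type, 0);
}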

Benno.

