[linux-audio-dev] Fullduplex audio client/server implementation


Subject: [linux-audio-dev] Fullduplex audio client/server implementation
From: Benno Senoner (sbenno_AT_gardena.net)
Date: Mon Sep 27 1999 - 05:19:19 EDT


Hi,

I'm still thinking about an ideal audio-server implementation (userspace).
I'm trying to introduce as few sync points as possible between client and server.
What about the following implementation? Please read it carefully; I think it is close to optimal.

(we assume blocking read/write to /dev/dsp)

SERVER:
-------
while(1)
{
  read() from /dev/dsp into a shared mem input buffer
  for(i=0;i<NUM_CLIENTS;i++)
  {
    if(client[i]->shmem.ready) // a client sets this flag in its shmem buffer
                               // when its output fragment is ready
    {
      read audio data from client i via shmem
      mix audio data ( client[i]->shmem.outbuf ) into the output buffer
    }
  }
  write() output buffer to /dev/dsp
  for(i=0;i<NUM_CLIENTS;i++)
  {
    client[i]->shmem.ready=0;
    wakeup client i (through a semaphore or message)
  }
}
-----
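
To make the server side concrete, here is a minimal C sketch of the loop above. It is only an illustration, not the actual implementation: it assumes POSIX shared memory with process-shared semaphores for the wakeup (the exact IPC mechanism is left open above), 16-bit samples, and made-up FRAGMENT_BYTES / NUM_CLIENTS values and struct layout. The shm_open()/mmap()/sem_init() setup, the /dev/dsp ioctls and all error handling are omitted.

-----
#include <semaphore.h>
#include <string.h>
#include <unistd.h>

#define FRAGMENT_BYTES 4096   /* assumed fragment size (bytes) */
#define NUM_CLIENTS    4      /* assumed number of clients     */

/* Per-client region in shared memory (layout is an assumption).
 * "volatile" is a simplification; real code needs proper atomics/barriers. */
struct client_shm {
    volatile int ready;                       /* set by client when outbuf is valid */
    sem_t        wakeup;                      /* posted by server once per cycle    */
    short        outbuf[FRAGMENT_BYTES / 2];  /* client's processed fragment        */
};

struct server_shm {
    short             inbuf[FRAGMENT_BYTES / 2];  /* last fragment read from the ADC */
    struct client_shm client[NUM_CLIENTS];
};

void server_loop(int dsp_fd, struct server_shm *shm)
{
    short outbuf[FRAGMENT_BYTES / 2];
    int i, j;

    /* Preload the DAC with two zero fragments (see the walkthrough below). */
    memset(outbuf, 0, sizeof outbuf);
    write(dsp_fd, outbuf, sizeof outbuf);
    write(dsp_fd, outbuf, sizeof outbuf);

    for (;;) {
        /* Blocking read paces the whole cycle: one fragment per iteration. */
        read(dsp_fd, shm->inbuf, FRAGMENT_BYTES);

        memset(outbuf, 0, sizeof outbuf);
        for (i = 0; i < NUM_CLIENTS; i++) {
            if (!shm->client[i].ready)        /* late client: skip it, never wait */
                continue;
            for (j = 0; j < FRAGMENT_BYTES / 2; j++)
                outbuf[j] += shm->client[i].outbuf[j];   /* naive mix, no clipping */
        }

        write(dsp_fd, outbuf, FRAGMENT_BYTES);

        /* The single sync point per cycle: clear the flags, wake every client. */
        for (i = 0; i < NUM_CLIENTS; i++) {
            shm->client[i].ready = 0;
            sem_post(&shm->client[i].wakeup);
        }
    }
}
-----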

CLIENT:
-------
clear client's audio output buffer
shmem.ready=1;
wait for wakeup from server ( through msgrcv() or a semaphore)
while(1)
{
  do DSP computations using the server's shared audio input buffer as input
  write results to shmem.outbuf
  shmem.ready=1;
  wait for wakeup from server
}
------
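
The matching client side, under the same assumptions (the struct definitions would normally live in a shared header; process_fragment() is just a placeholder for the client's own DSP code):

-----
#include <semaphore.h>
#include <string.h>

#define FRAGMENT_BYTES 4096
#define NUM_CLIENTS    4

struct client_shm {                           /* same assumed layout as above */
    volatile int ready;
    sem_t        wakeup;
    short        outbuf[FRAGMENT_BYTES / 2];
};

struct server_shm {
    short             inbuf[FRAGMENT_BYTES / 2];
    struct client_shm client[NUM_CLIENTS];
};

/* Placeholder for the client's own DSP computations. */
extern void process_fragment(const short *in, short *out, int nsamples);

void client_loop(struct server_shm *shm, int my_slot)
{
    struct client_shm *me = &shm->client[my_slot];

    /* First cycle: hand the server a silent fragment and wait to be paced. */
    memset(me->outbuf, 0, sizeof me->outbuf);
    me->ready = 1;
    sem_wait(&me->wakeup);

    for (;;) {
        /* Work on the fragment the server captured in its previous cycle. */
        process_fragment(shm->inbuf, me->outbuf, FRAGMENT_BYTES / 2);
        me->ready = 1;           /* tell the server this fragment is usable */
        sem_wait(&me->wakeup);   /* the one blocking wait per DSP cycle     */
    }
}
-----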

Note that the server gets the processed fragment from the client one iteration later, since the client does its own computations while the server blocks on the read() from /dev/dsp.

The only drawback of this approach is that the minimum input-to-output latency you get is 3 fragments.
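
To put a rough number on that (the sample rate, format and fragment size below are just example figures, not part of the design):

-----
#include <stdio.h>

int main(void)
{
    const int    rate            = 44100;   /* assumed sample rate     */
    const int    bytes_per_frame = 2 * 2;   /* assumed: 16-bit stereo  */
    const int    fragment_bytes  = 512;     /* assumed fragment size   */
    const double frag_ms = 1000.0 * fragment_bytes / (rate * bytes_per_frame);

    printf("one fragment         : %.2f ms\n", frag_ms);        /* ~2.9 ms */
    printf("3 fragments I/O path : %.2f ms\n", 3.0 * frag_ms);  /* ~8.7 ms */
    return 0;
}
-----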

Compare this with a monolithic implementation (no client/server):

while(1)
{
  read() from /dev/dsp
  do_processing()
  write() to /dev/dsp
}

With such a loop you can get a minimum of 2 fragments of latency, but if you want to use up to 100% of the CPU, you can't afford even a minimal amount of jitter, since after do_processing() there are only very few bytes left in the audio buffer. That means you need the 3rd fragment anyway.

The difference between my audio server implementation and the monolithic implementation above is that, if we assume we want 3 fragments of I/O latency, in the audio server case we have to preload the output buffer with only 2 fragments, while in the monolithic case we have to preload it with 3 fragments.
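
As a sketch of what that means for the monolithic case (fragment size and do_processing() are placeholders, and the /dev/dsp fragment/format ioctls and most error handling are omitted):

-----
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define FRAGMENT_BYTES 4096   /* assumed fragment size */

/* Placeholder for the real DSP code (here: pass audio through unchanged). */
static void do_processing(short *buf, int nsamples)
{
    (void)buf; (void)nsamples;
}

int main(void)
{
    short buf[FRAGMENT_BYTES / 2];
    int   fd = open("/dev/dsp", O_RDWR);   /* fragment/format ioctls omitted */

    if (fd < 0) {
        perror("/dev/dsp");
        return 1;
    }

    /* Preload three zero fragments: two for the base latency plus one as the
     * safety margin against scheduling jitter. */
    memset(buf, 0, sizeof buf);
    write(fd, buf, sizeof buf);
    write(fd, buf, sizeof buf);
    write(fd, buf, sizeof buf);

    for (;;) {
        read(fd, buf, sizeof buf);                /* blocking ADC read  */
        do_processing(buf, FRAGMENT_BYTES / 2);   /* in-place DSP       */
        write(fd, buf, sizeof buf);               /* blocking DAC write */
    }
}
-----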

Look at the audio server example: we assume we want 3 fragments of I/O latency, so we need only a 2-fragment buffer for output.

Legend:
W: write buffer
R: read buffer
[          ]  a fragment
[ 00000000 ]  a fragment filled with 0's
[ ________ ]  an empty fragment

1) server writes 2 fragments to the output buffer

W [ 00000000 ] [ 00000000 ]
R [ ________ ] [ ________ ]

server enters main loop:

2) server does a blocking read() from /dev/dsp into the shared mem buffer; by now the first output fragment has been played and the read buffer contains exactly 1 fragment of audio data

W [ 00000000 ] [ ________ ]
R [ ________ ] [ 11111111 ]

3) server reads the processed data from the client, but since this is the first loop iteration we get a 0-filled buffer. The server now writes the client data to the output buffer (we assume only 1 client here, but the behaviour is the same with multiple clients, except that you have to mix the clients' data before writing to the output buffer)

W [ 00000000 ] [ 00000000 ]
R [ ________ ] [ ________ ]

4) is the same step as 2) ( read() from /dev/dsp ... etc. )

W [ ________ ] [ 00000000 ]
R [ ________ ] [ 22222222 ]

5) server reads the "11111111" fragment from the client and writes it to the output buffer.

W [ 11111111 ] [ 00000000 ]
R [ ________ ] [ ________ ]

Now our first fragment read from the ADC is finally in the audio output buffer. We see that it takes 3 fragments until the "11111111" fragment gets into the output buffer: we have to play the first 2 zero-filled fragments, plus the first 0-filled fragment from the client, since the client gets the ADC data from the server one fragment later.

But we also see that there is always at least 1 fragment in the output buffer, which means the scheduling jitter can be as high as one fragment's play time.

This audio client-server model seems quite optimal to me: there is only 1 sync point per DSP cycle between server and clients (= less scheduling overhead), no additional memory copying is involved since everything goes through shared memory buffers, and you can get as low as 3 fragments of effective I/O audio latency, which is the practical minimum in a system where you have to take scheduling jitter into account.

Of course the server doesn't wait for clients, so a client that is too slow (perhaps because it doesn't run SCHED_FIFO due to lack of permissions, etc.) can't ruin the behaviour of the other audio streams. The audible effect will be that only the audio stream produced by the slow client is affected by "audio drop-outs", not the rest of the streams.

Comments?

regards, Benno.


