Re: [linux-audio-dev] Audio over Ethernet / Livewire


Subject: Re: [linux-audio-dev] Audio over Ethernet / Livewire
From: Benno Senoner (sbenno_AT_gardena.net)
Date: Tue Jun 22 2004 - 09:20:24 EEST


Audio traffic has a constant data rate:
e.g. 44.1 kHz, 16-bit stereo is 44100 * 2 channels * 2 bytes = 176400 bytes/sec.
Since audio cards work with audio fragments (or periods) of N frames,
it is natural to send the audio over the network in packets of that size
(or multiples of it).
UDP is the natural choice because of the low latency.
Of course there can be packet loss, but if you go TCP/IP or implement
your own retransmission scheme you lose the low-latency characteristics.
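
To make this concrete, here is a minimal sketch of the transport setup I
have in mind (plain POSIX sockets, error handling omitted; the function
name is just for illustration, not part of any existing code):

----
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

int open_audio_socket(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    /* Non-blocking: send/recv return immediately instead of waiting,
       so the real-time audio thread never stalls on network I/O. */
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
    return fd;
}
----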

On a local LAN, on a non-congested network, if the hardware is not
broken, the packet loss ratio is basically zero, and AFAIK most of the
low-latency audio-over-IP protocols are implemented with this
assumption in mind. So I think we should go that route too.
You cannot drive the network to the point of congestion anyway: that
would not even work with TCP, which would prevent errors but cause so
much delay that the audio streams would stutter.

Regarding MIDI data over the network: MIDI is event based, and in
theory there is no upper limit on how many MIDI events per time unit
one might want to send.

But in practice standard MIDI (over a 31.25 kbit/sec serial link) is
limited to 31250 / 10 bits per byte = 3125, about 3000 bytes/sec.

This is enough in most cases, but we all know that big MIDI setups
need multiple MIDI interfaces (most professional MIDI expanders
provide 2 MIDI input connectors, to achieve better timing in high
track-count setups and to provide more than 16-way multitimbrality).

So far so good.
I think one of the simplest approaches would be to send audio packets
with embedded MIDI data over the network.
As we know, the more information we pack into a single data packet and
the fewer handshakes the network devices do, the higher the probability
that the data flow is reliable and fast.

Let's do some math: the Ethernet frame limits a UDP packet's payload
to about 1500 bytes (1472 bytes once the IP and UDP headers are
subtracted). That way it's guaranteed that when we send a block of
data it gets delivered atomically, without running into fragmentation
issues. (Of course if you send it across several routers where the MTU
is < 1500 then it's another story, but since we are talking LAN
(NIC -> switch -> NIC), fragmentation is not an issue.)

Assume the above stereo stream, but now we want to transmit floats
(jackd uses floats):
sizeof(float) (=4) * 44100 * 2 = 352800 bytes/sec

Assume we send the stereo stream 128 frames per packet:
128 * 2 * sizeof(float) = 1024 bytes, i.e. 2.9 msec worth of audio data.
We still have over 400 bytes left (1472 - 1024 = 448) for our custom
header (e.g. id of the remote jack client, jack port number etc) and
for MIDI data.
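
In code form the packet budget could be written down like this (the
1472 figure assumes IPv4, i.e. a 20-byte IP header plus an 8-byte UDP
header):

----
/* Packet budget for one 128-frame stereo fragment. */
#define FRAMES_PER_PACKET 128
#define CHANNELS            2
#define AUDIO_BYTES  (FRAMES_PER_PACKET * CHANNELS * sizeof(float)) /* 1024 */
#define UDP_PAYLOAD  1472  /* 1500-byte Ethernet MTU - 20 (IP) - 8 (UDP) */
#define SPARE_BYTES  (UDP_PAYLOAD - AUDIO_BYTES)  /* 448 for header + MIDI */
----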

So my proposal would be to limit the number of MIDI events per audio
fragment.
Let's take a simplified MIDI encoding and see what we can fit into
those 400 bytes.
Since we want sample-accurate MIDI triggering (which traditional MIDI
over serial does not provide) we could do the following:
a MIDI command is usually not longer than 3 bytes (let's forget about
sysex etc for now).
We could divide the 400 bytes into 100 MIDI events, each consisting of:
1 byte timestamp relative to the start of the audio fragment (0-255);
this would limit the fragment size to max 256 frames
3 bytes of MIDI payload
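
A hypothetical on-wire layout, just to illustrate the idea (all field
names are made up, and a real implementation would serialize the fields
explicitly rather than sending a raw struct, because of padding and
byte order):

----
#include <stdint.h>

/* One sample-accurate MIDI event: 1-byte timestamp + 3-byte message. */
struct net_midi_event {
    uint8_t frame_offset;  /* 0-255, offset into the audio fragment */
    uint8_t midi[3];       /* status byte + up to 2 data bytes (no sysex) */
};

struct net_audio_packet {
    uint16_t client_id;    /* which remote jack client */
    uint16_t port_no;      /* jack port number */
    uint32_t fragment_no;  /* running counter, lets the receiver spot losses */
    uint8_t  n_events;     /* how many of the event slots are used */
    struct net_midi_event events[100];  /* 400 bytes of MIDI */
    float    audio[128 * 2];            /* 1024 bytes, one stereo fragment */
};
----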

This means we could have up to 100 MIDI events (note on/off,
controller, program change etc) per 2.9 msec audio fragment!
This is a LOT of events.
Just take traditional MIDI over serial: a NOTE-ON takes about 1 msec
to transmit (3 bytes * 320 usec/byte), which means in 2.9 msec you can
barely send a NOTE-ON and a NOTE-OFF before you have consumed almost
the full channel capacity.

In the case of serial MIDI you achieve a maximum of about 1000
events/sec (for 3-byte MIDI events).

In our case we send about 344 packets/sec (44100 / 128), which
multiplied by 100 MIDI events gives us 34400 events/sec, and they are
SAMPLE ACCURATE!
Basically it would be like having the equivalent of 34 MIDI interfaces.

Such a (bidirectional) audio/MIDI stream would consume about 500
KByte/sec in each direction (344 packets/sec * ~1470 bytes), which
means a 100 Mbit LAN could run 10-15 of those at the same time without
losing data.

basically it would work as follows:

client PC (has an audio card) <----> jack server PC (runs jackd and jack
clients like samplers, softsynths, HDR apps etc, no audio card).

the client PC would run a jack network client which does the client PC
<---> server PC communication

the server PC has a special (not yet implemented) jackd input/output
driver which receives/sends the audio data from/to the network; apart
from that, the jack clients residing on the server PC don't know
anything about the network, to them it looks like a standard jack server.

The "clock" to the jackd residing on server PC is given by the client PC.

client PC:
-----
jack_process_callback() {
  send_to_network(local_jack_input_port); // non blocking
  fetch_packet_from_local_queue_and_check_if_new_packet_arrived(local_jack_output_port); // non blocking
}

----
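
For illustration, send_to_network() could boil down to something like
this (assuming the socket and packet struct sketched above;
gather_pending_midi() is a made-up helper):

----
#include <string.h>
#include <sys/socket.h>
#include <jack/jack.h>

/* Assumes 'sock' is the UDP socket from open_audio_socket(), already
   connect()ed to the peer. */
extern int sock;

static void send_to_network(jack_port_t *port, jack_nframes_t nframes)
{
    struct net_audio_packet pkt;
    float *buf = jack_port_get_buffer(port, nframes);

    memcpy(pkt.audio, buf, nframes * sizeof(float)); /* one channel per port */
    pkt.n_events = gather_pending_midi(pkt.events);  /* hypothetical helper */

    /* MSG_DONTWAIT: if the packet can't go out right now we drop it
       rather than stall the process callback. Low latency wins. */
    send(sock, &pkt, sizeof(pkt), MSG_DONTWAIT);
}
----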

server PC: (I'm not yet familiar with the jackd driver API, so the
naming of the functions will be wrong, but you'll hopefully understand
what I'm trying to explain)
----
while(1) {
  receive_audio_input_data_from_network(); // blocking
  call process() of local jack clients
  send_audio_output_data_to_network(); // non blocking
}
----
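
The blocking receive on the server side could be as simple as this
(again just a sketch, assuming a blocking UDP socket on the server and
the packet struct from above):

----
#include <sys/socket.h>

/* A plain blocking recv() paces the whole server-side cycle, so the
   client PC's audio card effectively becomes jackd's clock. */
static void receive_audio_input_data_from_network(int sock,
                                                  struct net_audio_packet *pkt)
{
    /* Blocks until the client PC sends the next fragment. */
    recv(sock, pkt, sizeof(*pkt), 0);
}
----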

On the client PC we send out the data packet in non-blocking mode, and
the next call fetches the next audio fragment from a local queue.

Of course during the first iteration the queue does not contain
anything, so we prefill it with 1-2 fragments worth of silence. 1
fragment is the minimum needed to work (since fetch_packet... will be
called only a few usecs after send_to_network, because it's
non-blocking). We can prefill with more fragments, which will of course
increase the end-to-end latency but will at the same time help to
eliminate network jitter problems.

I think with the above approach end-to-end latencies of < 8-10 msec
can be achieved, which means you can play a cluster of
samplers/softsynths live, with the rendered audio arriving in real
time at your client box containing the audio card.
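
The prefill itself would be trivial, something like this (the queue
type and queue_push() are made up):

----
/* Prefill the receive queue with silence so the very first
   process() callback finds something to hand to jack. */
#define PREFILL_FRAGMENTS 2  /* 1 = minimum; more = extra jitter headroom */

static void prefill_queue(struct fragment_queue *q)
{
    static const float silence[128 * 2];  /* zero-initialized by C */
    int i;

    for (i = 0; i < PREFILL_FRAGMENTS; i++)
        queue_push(q, silence, sizeof(silence));
}
----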

The audiocard-less PCs will all stay in sync with the PC containing the audio card thanks to the blocking receive_audio...() call.

Do you see particular flaws in this proposal? (Of course it would need
to be adapted and made more flexible so that it can deal with an
arbitrary number of jack audio/MIDI ports.) It's the simplest approach
I can think of, and as said above, adding all sorts of error
correction, synchronization etc would buy us almost nothing and would
probably never achieve the same low latency that this system can
deliver.

Of course we will not know before we actually turn ideas into working
code. But if my approach fails, then I'll be glad to offer a few beers
at the next ZKM :) (Steve H.? :))

cheers, Benno

Hans Fugal wrote:

>>Plus considering that midi over jack is being implemented too you would
>>have both midi and audio over ethernet through jack, available to any
>>jack client without the application needing to be changed.
>
>That would be convenient, yes. But at the implementation level there is
>quite a bit of difference between MIDI traffic and audio traffic. MIDI
>is much less forgiving of errors, or much more if you know which errors
>to make. You can do a lot of smart things when doing MIDI over a network
>that you can't do with audio, a la MWPP or whatever it's called now.
>
>That said, if you've got the bandwidth and latency issues worked out for
>audio, MIDI should be a piece of cake and you may not need to worry
>about the smart things you can do with it.
>
>FWIW, I've implemented basic MIDI over TCP/IP at [1], which is loosely
>based on MWPP and needs some TLC, but already outperforms aseqnet.
>
>1. http://hans.fugal.net/music/nmidi-0.1.0.tar.gz


