[linux-audio-dev] Re: high level laaga


Subject: [linux-audio-dev] Re: high level laaga
From: Paul Davis (pbd_AT_Op.Net)
Date: Fri May 11 2001 - 17:02:33 EEST


--------
For now, I'll limit my comments to one factual issue.

>> Kai's site describes the problems with using IPC for the
>> interconnection. So we need something else.
>
>the theoretical extra cost is 2*(N-1) context switches (where N is the
>process count). I'd like to know whether this really breaks everything.

This is not true. A context switch forces the Translation
Lookaside Buffer (TLB) to be flushed because the address space
changes. Memory references that follow the switch are therefore
dramatically slower than they would have been had they been executed
without the switch. This is why a stream of instructions executed
without any context switches completes faster than the sum of its
running periods when context switches are interleaved.

And unfortunately, this cost doesn't decrease very much as
processor speeds increase. This is part of the reason for the interest
in CS circles in using 64-bit processors with a single address space
(making memory protection and addressing orthogonal, rather than
combining them as most multitasking 32-bit operating systems do).

So, putting the same code in N different processes and executing it
via context switches is more than 2*(N-1) context switches slower than
running it without any context switches. How much depends on the TLB,
the processor and the nature of the code (particularly how much memory
is referenced).

See http://euclid.nmu.edu/~benchmark/index.php?page=context for an
example of this effect.
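A minimal sketch of how one could measure this for oneself (my own
code, not taken from that page): a parent and child bounce one byte
through a pair of pipes, so each round trip forces two context
switches. The function name and round count are my own choices, and
the result will vary heavily with the TLB, cache and scheduler, which
is exactly the point.

```python
import os
import time

def pingpong_usec_per_switch(rounds=10000):
    """Estimate the effective cost of one context switch, in microseconds."""
    p2c_r, p2c_w = os.pipe()   # parent -> child
    c2p_r, c2p_w = os.pipe()   # child -> parent
    pid = os.fork()
    if pid == 0:
        # Child: echo every byte straight back, then exit.
        os.close(p2c_w)
        os.close(c2p_r)
        for _ in range(rounds):
            os.write(c2p_w, os.read(p2c_r, 1))
        os._exit(0)
    os.close(p2c_r)
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    # One round trip = two switches (parent -> child -> parent).
    return elapsed / (2 * rounds) * 1e6

if __name__ == "__main__":
    print("approx usec/switch:", pingpong_usec_per_switch())
```

Note that this measures the *effective* cost (switch plus the memory
slowdown that follows), which is what matters for audio deadlines.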

Returning to the fixed and mostly predictable costs of a context
switch: they currently take between 100 and 1000 instructions, so
their wall-clock time will decrease as processor speeds increase. On
a 1GHz machine, the switch itself takes about 1 usec. From the
page above, we can predict effective times (after taking the TLB
effect into account) on the order of 10-50 usecs for a typical audio
process. If there are just 3 processes, say the engine, an HDR system
and a soft-synth, and we assume an effective switch cost of 25 usecs,
that's 100 usecs of switch overhead.

That's 7.7% of the time available for computing a 1.3ms audio
fragment. Make it a more interesting system (say, 4 components), and
you've got 15% of the available time spent context switching. Make it
a system somewhat like the one I sense you imagining (lots of small
components, not much LADSPA-inlining-ness), and that number might rise
to over 50%.
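The back-of-envelope arithmetic above, written out (my own helper, not
anyone's API): 2*(N-1) switches per cycle, an assumed effective switch
cost, and the fragment duration give the fraction of each audio period
lost to switching.

```python
def switch_overhead_fraction(n_processes, usec_per_switch, fragment_usec):
    """Fraction of one audio fragment spent context switching."""
    switches = 2 * (n_processes - 1)   # there and back for each extra process
    return switches * usec_per_switch / fragment_usec

# The three-process example: engine, HDR system and soft-synth,
# 25 usec per effective switch, 1.3 ms (1300 usec) fragment.
print(switch_overhead_fraction(3, 25, 1300))  # about 0.077, i.e. 7.7%
```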

I am hoping that I've demonstrated that context-switching is not
scalable for low-latency work, and has noticeable costs even in the
simplest possible case. Is my hope misplaced?

>> Establishing connections: alsa-lib offers one model for this,
>> based on the shared memory IPC mechanisms.
>> Such a solution cannot scale to setups with
>> lots of clients running at low latencies because
>> of the overhead of context switching and the
>> associated memory/cache performance loss.
>
>True for context switching, but I don't see the difference concerning
>memory/cache. That is related to how a component is written, not to
>whether it runs in the same process space.

See comments above.

--p



This archive was generated by hypermail 2b28 : Fri May 11 2001 - 17:16:58 EEST