Re: [linux-audio-dev] Performance and Elegance? (Was: High Cost of IPC)


Subject: Re: [linux-audio-dev] Performance and Elegance? (Was: High Cost of IPC)
From: John Regehr (regehr_AT_cs.utah.edu)
Date: Wed May 16 2001 - 09:23:37 EEST


Paul Davis says:

> this is an area of OS performance that has been under study for 20-30
> years. I was a systems programmer for the research group at UW CS&E
> that had a dozen or so grad students and 4 senior faculty working on
> the general issues. there is a very interesting paper at UW CS&E that,
> despite being, oh, 10 years old(?) now, is still totally relevant. I
> don't recall the exact title, but to paraphrase the general message:
> processors have gotten faster, but applications are spending more real
> clock time than ever dealing with the kernel in one way or another, so
> why is this?

I don't remember the exact title either, but it's something to the
effect of "why aren't operating systems getting faster as fast as
hardware?" by Ousterhout. The answer, IIRC, is pretty simple: the OS
must move a fair amount of data around and doesn't have much locality,
so it's often limited by the memory subsystem and can't take advantage
of modern clock speeds.

There's another great paper by Mogul and Borg from the same time period
about the effect of cache misses on context switch performance. The
gist is that the cost of cache misses can easily dominate the cost of a
context switch. And things have gotten a lot worse since then! DRAM is
still slow and chip clocks are through the roof. I want to write a
followup paper about this, and about writing schedulers that try really
hard to avoid unnecessary context switches while still giving real-time
guarantees. This is especially important on multiprocessors where
migrating data between caches sucks performance. I read somewhere that
the relative difference in speed between a fast P4 and main memory is
about the same as the difference between an 8086 and a hard drive.

I recently (for better or for worse) wanted to demonstrate that it's
okay to add a few microseconds to thread context switch time spent in
the kernel, so I ran some experiments on how the performance of threads
with different working set sizes is impacted by context switches.
This figure shows some data:

http://www.cs.utah.edu/~regehr/papers/diss/doc-wwwch10.html#x27-18000010.4

The result is that while a context switch always involves a few
microseconds in the kernel, on a 500 MHz PIII it can take an app about
4ms to reestablish its working set in the cache after a context switch
when the cache is cold and the working set is nearly the size of the
cache. This means that the cache cost is close to three orders of
magnitude bigger than the nominal context switch cost. Ugly!

Of course, if threads have small working sets or they are thrashing the
cache anyway then this penalty would be much smaller. Ironically,
however, the best-tuned applications, the ones whose working sets just
fit into the cache, are most heavily penalized by context switches.

Food for thought...

John Regehr



This archive was generated by hypermail 2b28 : Wed May 16 2001 - 09:42:11 EEST