Jack O'Quin wrote:
> On Sun, Oct 19, 2008 at 2:55 AM, Paul Davis <paul@email-addr-hidden> wrote:
>
>> i don't know whether to shoot myself or eat another couple of the
>> oft-promised hats.
>
> Don't beat yourself up too badly. Multiple threads accessing shared memory
> is *very* tricky. Even smart people (like you) get it wrong sometimes.
I agree with Jack. I remember a friend of mine once qualifying multi-threading
of "kernel quantization". This is complex but also challenging :)
Although your understanding of these matters is far beyond mine, I think that I
have a good professional experience of coding methodology. And to me the problem
here is indeed methodological: there is a lack of unit tests in JACK.
>> so the next question is how best to prevent it. as far as i can see we
>> have a couple of proposals:
>>
>> 1) fons' design, which never actually wraps readptr or writeptr, but
>> masks the address used to access the data buffer
>>
>> 2) removing the intermediate state's visibility
>>
>> i admit to preferring (2) even though i know that with a 64 bit index,
>> not wrapping the ptrs is not really a problem.
>
> I also prefer (2).
>
>> however, it is not totally clear to me how to prevent an optimizing
>> compiler from doing the wrong thing here. unlike the claims made by
>> someone involved with portaudio, i believe that it is correct to declare
>> the read_ptr and write_ptr volatile, so that the compiler knows that it
>> cannot try to be clever about it accesses of the "other" ptr (i.e.
>> read_ptr for the writing thread, and vice versa). maybe the comment on
>> the use of volatile was based on some idea that i thought it made the
>> variables thread safe, which is and never was the case.
>>
>> i suspect that the safest code looks like:
>>
>> size_t tmp = (ptr + incr) & mask;
>> barrier();
>> ptr = tmp;
>>
>> but i am not sure whether barrier needs to be read, write or both.
>>
>> i think that the simpler code:
>>
>> ptr = (ptr + incr) & mask;
>>
>> is subject to potential compiler and/or processor "optimization" that
>> might reduce it back to the problem case of two ops without an
>> intermediate load/store location. the volatile declaration ought to
>> prevent the compiler from doing this, and i don't see why a processor
>> would do this, ever, but clearly i've already been deeply wrong about
>> this. does anybody know for certain?
>
> It is best to avoid assumptions about what some future compiler may
> consider an "optimization". If the register pressure is high at some
> point in the program, it may decide to store some value just to free up
> register space for other variables.
>
> The "volatile" declaration should remove any need for the compiler
> barrier() statement, AFAICT. Note that barrier() is a compiler
> directive, and has no effect on the CPU's ability to reorder cache
> operations in an SMP memory hierarchy.
>
> Here is a fairly clear and complete description of the memory barrier
> issues: http://lxr.linux.no/linux/Documentation/memory-barriers.txt
>
> As best I can tell, both ring buffer threads require "general" memory
> barriers, because they both read and write (different) shared data. In
> Linux kernel terms, they'd need to use smp_mb(), since the problems
> they address are multiprocessor-only. But, since JACK is not compiled
> differently for UP and SMP, the full mb() seems more appropriate.
>
> That stuff makes my head hurt.
Maybe that is because your approach is too theoretical. I've read many theories
about whether memory barriers are needed or not in lock-free ring buffers, and I
think it might lead nowhere without experimenting, that is: write test cases.
Until now, I just couldn't write a test that fail because of the lack of memory
barrier on x86. However, I think I found another bug in Jack ringbuffer, by
writing another test.
It's a bit of a weird test, I call it "bit circle": I start 16 "node" threads.
Each node :
- communicate with its adjacent node through 2 ring buffers
- is responsible for shifting an integer by one bit
- send the shifted result to the next node through a ring buffer
- checks that the value read from previous node is correct.
On my Debian Quad Core and Mac OS X Core Duo boxes, this test fails with Jack's
ringbuffer even with my patch applied. But it succeeds with Portaudio (both with
and without memory barriers), and also with Fons' implementation.
I wish someone could eventually run my test suite on a PowerPC, especially SMP.
The usage is unchanged:
svn co http://svn.samalyse.com/misc/rbtest
cd rbtest
make test
The run time is now shorter, about 2 minutes. Below is the output. You can see
that jack (test-bit-circle-jack) fails the bit circle test, even when patched
(test-bit-circle-jack-fix1). "|" is printed when a node thread starts, and "-"
when a node has read 1000 values, to ensure data is really flowing.
The test-int-array-* tests are the same as my original test. All *-lfq tests use
Fons's Lfq ringbuffer (modified to use char instead of uint32_t) as backend.
./run-tests.sh bit-circle 512 jack jack-fix1 portaudio portaudio-nobarrier lfq
test-bit-circle-jack - starting (5s max) - buffer size: 512
|| FAILURE: 2 != 514
test-bit-circle-jack-fix1 - starting (5s max) - buffer size: 512
|| FAILURE: 2 != 514
test-bit-circle-portaudio - starting (5s max) - buffer size: 512
||-|-|-||-|-|-|-|-||--|||--|---- SUCCESS
test-bit-circle-portaudio-nobarrier - starting (5s max) - buffer size: 512
|||--|-|||---||-||-|-|--||--|--- SUCCESS
test-bit-circle-lfq - starting (5s max) - buffer size: 512
|||--|||-|||-|||--|||-------|--- SUCCESS
./run-tests.sh int-array 16 jack jack-fix1 portaudio portaudio-nobarrier lfq
test-int-array-jack - starting (20s max) - array/buffer size: 8/16
[reader started] [writer started] [flowing] 52572 != 52568 at offset 0
FAILURE in chunk 91822
test-int-array-jack-fix1 - starting (20s max) - array/buffer size: 8/16
[reader started] [writer started] [flowing] SUCCESS
test-int-array-portaudio - starting (20s max) - array/buffer size: 8/16
[reader started] [writer started] [flowing] SUCCESS
test-int-array-portaudio-nobarrier - starting (20s max) - array/buffer size: 8/16
[reader started] [writer started] [flowing] SUCCESS
test-int-array-lfq - starting (20s max) - array/buffer size: 8/16
[writer started] [reader started] [flowing] SUCCESS
-- Olivier Guilyardi / Samalyse _______________________________________________ Linux-audio-dev mailing list Linux-audio-dev@email-addr-hidden http://lists.linuxaudio.org/mailman/listinfo/linux-audio-devReceived on Mon Oct 20 20:15:03 2008
This archive was generated by hypermail 2.1.8 : Mon Oct 20 2008 - 20:15:03 EEST