Re: [Freebob-devel] [linux-audio-dev] ieee1394 deadlock on RT kernels

From: Pieter Palmers <pieterp@email-addr-hidden>
Date: Mon Jun 26 2006 - 23:35:09 EEST

Lee Revell wrote:
> On Mon, 2006-06-26 at 21:44 +0200, Pieter Palmers wrote:
>> Lee Revell wrote:
>>> On Mon, 2006-06-26 at 21:05 +0200, Pieter Palmers wrote:
>>>> Lee Revell wrote:
>>>>> On Mon, 2006-06-26 at 16:51 +0200, Pieter Palmers wrote:
>>>>>>
>>>>>> Of course. My monday-morning bad temper is over by now, and I hope I
>>>>>> didn't transfer it to any of you. I'll provide the panic, one way or
>>>>>> another.
>>>>>>
>>>>> Can you reproduce the problem on a non-RT kernel?
>>>>>
>>>> No, it only occurs with RT kernels, and only with those configured for
>>>> PREEMPT_RT. If I use PREEMPT_DESKTOP, there is no problem (with
>>>> threaded IRQs etc.; I only switched the preemption level in the
>>>> kernel config).
>>>>
>>>> I've uploaded the photos of the panic here:
>>>> http://freebob.sourceforge.net/old/img_3378.jpg (without flash)
>>>> http://freebob.sourceforge.net/old/img_3377.jpg (with flash)
>>>>
>>>> both are of suboptimal quality unfortunately, but all info is readable
>>>> on one or the other.
>>> Can you add debug printk's before and after tasklet_kill() in
>>> ohci1394_unregister_iso_tasklet to see where it locks up?
>>>
>> That's the first thing I did: the printk before tasklet_kill succeeds,
>> the one right after the tasklet_kill doesn't.
>
> OK that's what I suspected.
>
> It seems that the -rt patch changes tasklet_kill:
>
> Unpatched 2.6.17:
>
> void tasklet_kill(struct tasklet_struct *t)
> {
>         if (in_interrupt())
>                 printk("Attempt to kill tasklet from interrupt\n");
>
>         while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
>                 do
>                         yield();
>                 while (test_bit(TASKLET_STATE_SCHED, &t->state));
>         }
>         tasklet_unlock_wait(t);
>         clear_bit(TASKLET_STATE_SCHED, &t->state);
> }
>
> 2.6.17-rt:
>
> void tasklet_kill(struct tasklet_struct *t)
> {
>         if (in_interrupt())
>                 printk("Attempt to kill tasklet from interrupt\n");
>
>         while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
>                 do
>                         msleep(1);
>                 while (test_bit(TASKLET_STATE_SCHED, &t->state));
>         }
>         tasklet_unlock_wait(t);
>         clear_bit(TASKLET_STATE_SCHED, &t->state);
> }
>
> You should ask Ingo & the other -rt developers what the intent of this
> change was. Obviously it loops forever waiting for the state bit to
> change.
>

Because you are not allowed to yield() in an RT context?

I wish I had been a little more detailed in my initial mail, as it
would have saved us some time and communication trouble (on my part,
that is). I had already spotted the msleep() change in the patch and
tried reverting it. That gives you a nice new panic message,
something like 'BUG: yield()'ing in ...'.

I'm wondering why a kernel that is patched but not configured for
'complete preemption' works fine. The change is present there too, so
it probably has something to do with the msleep() implementation.

Another strange thing is: why doesn't the tasklet finish, so that it can
be 'unscheduled'? I have my IRQ priorities higher than any other RT
threads, so I would expect that the tasklet can finish. Or is
tasklet_kill not preemptible? That would be very strange, as I would
expect that busy waiting on something in a non-preemptible code path on
a single-CPU system always deadlocks.
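
The expectation in the paragraph above can be illustrated with a toy
user-space model. This is only a sketch under stated assumptions, not
kernel code: it pretends there is one CPU with strict fixed-priority
scheduling, and that one tick of CPU time is enough for the tasklet
thread to run and clear the scheduled bit. All names in it (struct
model, wait_mode, run_tick, simulate_kill_wait) are hypothetical and do
not exist in the kernel. It does not explain the hang reported in this
thread; it only shows when a wait loop of this shape can or cannot make
progress.

```c
#include <stdbool.h>

/* Hypothetical model of a wait loop shaped like tasklet_kill().
 * Assumption: a single CPU with strict fixed-priority scheduling. */

enum wait_mode {
    WAIT_SPIN,   /* waiter stays runnable and never gives up the CPU */
    WAIT_SLEEP   /* waiter blocks for one tick, like a short sleep */
};

struct model {
    bool sched_bit;     /* stands in for TASKLET_STATE_SCHED */
    int  waiter_prio;   /* priority of the thread doing the wait */
    int  tasklet_prio;  /* priority of the thread that runs the tasklet */
};

/* One tick of the single CPU. The tasklet thread gets the CPU only if
 * the waiter is blocked or the tasklet thread has strictly higher
 * priority. When it runs, it clears the scheduled bit. */
static void run_tick(struct model *m, bool waiter_runnable)
{
    bool tasklet_runs = !waiter_runnable ||
                        m->tasklet_prio > m->waiter_prio;

    if (tasklet_runs)
        m->sched_bit = false;
}

/* Returns the tick on which the bit was seen clear, or -1 if it was
 * still set after max_ticks (the model's stand-in for "loops forever"). */
int simulate_kill_wait(struct model m, enum wait_mode mode, int max_ticks)
{
    for (int tick = 1; tick <= max_ticks; tick++) {
        run_tick(&m, mode == WAIT_SPIN);
        if (!m.sched_bit)
            return tick;
    }
    return -1;
}
```

In this model, a spinning waiter at priority 50 never lets a tasklet
thread at priority 40 run, so simulate_kill_wait() returns -1. The same
waiter in WAIT_SLEEP mode, or a tasklet thread with the higher priority,
finishes on the first tick. The second case corresponds to the setup
described above, with IRQ priorities above all other RT threads.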

Greets,

Pieter
Received on Tue Jun 27 00:15:04 2006
