[linux-audio-dev] Re: [Jackit-devel] hangs with 2.4.20, jack and clients... one tiny patch later


Subject: [linux-audio-dev] Re: [Jackit-devel] hangs with 2.4.20, jack and clients... one tiny patch later
From: Fernando Pablo Lopez-Lezcano (nando@ccrma.Stanford.EDU)
Date: Tue Jan 21 2003 - 07:14:18 EET


> > >The hang is not happening in any of these processes, as far as I can
> > >tell. It is not always in exactly the same place, although it happens
> > >always around the same area. I can see time suddenly jumping up, there
> > >are xruns printed (huge underruns, not the usual stuff - I assume this
> > >is before rt_monitor degrades the priorities and things return to
> > >normal). This usually is right after ardour's process function returns.
> > >So I have to see which process is actually interrupting all of this and
> > >hanging the whole thing.
> > >
> > >Very confused at this point.
> >
> > can you check the signalled/awake/complete timestamps in the client
> > struct/debug output? these tell you whether/when:
> >
> > (1) jackd woke the client with write()
> > (2) the client woke up from poll
> > (3) the client wrote to the next fifo
> >
> > these are 3 critical steps that tell you whether or not the hang
> > happens between the two processes, or within one of them. each
> > situation is drastically different from the other and we need to know
> > which it is.
>
> If I understand things correctly, the problem seems to be happening in
> the alsa_driver_process function (or in alsa itself). Here is a list of
> what happens just before the hang, with some comments. Please correct me
> if I'm wrong:

[very long printout of trace deleted]
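
For anyone following along, the three-step check suggested in the quoted
message above can be sketched roughly as below. This is only an
illustration in Python, not jack code: the field names (signalled_at,
awake_at, finished_at), the use of None for "step never happened" and the
budget_usecs threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientTimes:
    # Hypothetical field names; all values in microseconds.
    signalled_at: Optional[int]  # (1) jackd woke the client with write()
    awake_at: Optional[int]      # (2) the client woke up from poll
    finished_at: Optional[int]   # (3) the client wrote to the next fifo


def locate_stall(t: ClientTimes, budget_usecs: int) -> str:
    """Classify where the time went for one client in one cycle."""
    if t.signalled_at is None:
        return "not signalled"
    if t.awake_at is None:
        # Signalled but never woke: the stall is between the two processes.
        return "between processes"
    if t.finished_at is None:
        # Woke up but never passed the baton: the stall is inside the client.
        return "inside client"
    wake_delay = t.awake_at - t.signalled_at
    run_time = t.finished_at - t.awake_at
    if wake_delay > budget_usecs and wake_delay >= run_time:
        return "between processes"
    if run_time > budget_usecs:
        return "inside client"
    return "ok"


if __name__ == "__main__":
    # Made-up numbers, purely to show the call.
    print(locate_stall(ClientTimes(1000, 1020, 1500), budget_usecs=2900))
```

The point of splitting it this way is the one made in the quote: a long gap
between (1) and (2) points at the scheduler or the fifo, while a long gap
between (2) and (3) points at the client's own process function.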

Just by chance I stumbled on this message:

  http://www.redhat.com/mailing-lists/ext3-users/msg08990.html

(I was looking at latency issues on a 3ware controller. No matter what
I do I get 12-18 ms hits. Google for vm.bdflush and look at the
second link.)

"This patch fixes an inefficiency and potential system lockup in the 2.4
kernel's ext3 filesystem. The problem has been present since
2.4.20-pre5."

Aha!! pre5 is when I started having problems!
Later, Andrew tells us:

"Unless task A and task B happen to both have realtime scheduling policy
- if they do then kjournald will never run. The state is never cleared
and your box locks up."

The problem always happens with realtime scheduling :-)
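
To make the starvation argument concrete, here is a toy model in Python. It
is a sketch under heavy assumptions, not the kernel scheduler and not the
ext3 code: three tasks, a strict "highest priority always wins" rule, and
round-robin among tasks of equal priority to stand in for tasks A and B
yielding to each other. The function name and the priority numbers are
invented for the example.

```python
from typing import Optional


def ticks_until_journal_runs(waiter_priority: int,
                             journal_priority: int,
                             max_ticks: int = 1000) -> Optional[int]:
    """Return the tick at which kjournald first runs, or None if it never
    gets the CPU within max_ticks.

    Toy rule: on every tick the highest-priority task runs; tasks with the
    same priority take turns (this models A and B yielding while they wait
    for kjournald to clear the state).
    """
    tasks = [
        ("A", waiter_priority),
        ("B", waiter_priority),
        ("kjournald", journal_priority),
    ]
    top = max(priority for _, priority in tasks)
    candidates = [name for name, priority in tasks if priority == top]
    for tick in range(max_ticks):
        running = candidates[tick % len(candidates)]
        if running == "kjournald":
            return tick  # kjournald ran, so the state gets cleared
    return None  # kjournald was starved for the whole run


if __name__ == "__main__":
    # A and B at a realtime-like priority above kjournald: it never runs.
    print(ticks_until_journal_runs(waiter_priority=10, journal_priority=0))
    # Everyone at the same priority: kjournald gets its turn.
    print(ticks_until_journal_runs(waiter_priority=0, journal_priority=0))
```

In the first call A and B just keep handing the CPU to each other, which is
the lockup described in the quote. In the second call kjournald gets a turn
after A and B, which is a crude stand-in for what happens when nobody is
running with a realtime policy.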

So I patched the kernel, and I've been running jack+ardour SCHED_FIFO
for more than an hour, even while stressing the disk with a nice tar
(previously it would die within a couple of minutes at most). I would
hate to have to post in half an hour saying that it locked up again, so
this time I will _not_ say the problem is solved :-)

Try it out.
-- Fernando



This archive was generated by hypermail 2b28 : Tue Jan 21 2003 - 07:18:16 EET