Re: [linux-audio-dev] Watchdogs?

Subject: Re: [linux-audio-dev] Watchdogs?
From: Paul Davis (pbd_AT_Op.Net)
Date: Tue Mar 19 2002 - 20:03:20 EET


>Just wondering if there's any consensus on the use of
>watchdogs. I just locked up my box playing with yet another
>fun freshly-downloaded soundapp from sourceforge...
>(amsynth in this case, but I've seen lockups many many times
>before). Let's face it, it ain't hard to freeze the box and
>kill the keyboard when you're running as root with realtime
>priority. The "Magic SysRq" keys are useless if you get
>no response from the keyboard...
>
>It's not as scary as it used to be now that I use journalling
>filesystems, so I can reboot real quick and not corrupt
>my data. But still, it sucks to have to power down to get
>out of anything.
>
>So, what do we do? Watchdog? In hardware or software?
>Seems intuitive to me that a hardware watchdog would be the
>most reliable, but I haven't looked into it. What's a good
>one, and what do they cost?

the kernel already uses the h/w watchdog timer, but it's not made
available to user space - it just catches kernel lockups.

SCHED_FIFO lockups are different, and can't be handled by the kernel,
since they are not "an error" - it's just an application with "a lot of
work to do and permission to take as long as it needs".
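For anyone unsure where that "permission" comes from: a minimal sketch, assuming POSIX threads on Linux, of a thread asking for SCHED_FIFO for itself. The function name make_current_thread_fifo is hypothetical. Once the request succeeds, the thread runs until it blocks or yields, and only a higher-priority realtime thread can preempt it, which is why a busy loop in such a thread looks like a frozen box.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper: ask for SCHED_FIFO at `prio` for the calling thread.
 * Returns 0 on success, or an errno value (typically EPERM when the process
 * lacks root or realtime privileges). */
int make_current_thread_fifo(int prio)
{
    struct sched_param sp;
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    /* Clamp into the valid SCHED_FIFO range (1..99 on Linux). */
    if (prio < lo)
        prio = lo;
    if (prio > hi)
        prio = hi;

    memset(&sp, 0, sizeof sp);
    sp.sched_priority = prio;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}
```

Without privileges the call simply fails with EPERM, so an app can fall back to normal scheduling instead of refusing to start.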

jack has just this morning seen the addition of a watchdog thread that
runs SCHED_FIFO and at higher priority than the rest of a jack
system. as long as the kernel is still running, the watchdog can kill
any SCHED_FIFO runaway within jack. it checks every 5 seconds to make
sure that progress is being made ...
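A rough sketch of the same idea in C with POSIX threads. This is not jack's actual code; the names note_progress, progress_made, start_watchdog and struct watchdog_args are hypothetical. The realtime worker bumps a counter every cycle; a SCHED_FIFO thread one priority step above the workers wakes every interval_sec seconds and, if the counter hasn't moved, kills the chosen process.

```c
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical heartbeat: the realtime worker increments this once per
 * processing cycle (e.g. once per audio period). */
static atomic_ulong progress_counter;

/* Called by the worker thread after each cycle it completes. */
void note_progress(void)
{
    atomic_fetch_add(&progress_counter, 1);
}

/* Returns 1 if the counter moved since the previous call, 0 if it stalled.
 * A stall is all the watchdog can observe: it cannot tell a bug from a
 * thread that is merely very busy. */
int progress_made(unsigned long *last_seen)
{
    unsigned long now = atomic_load(&progress_counter);
    int moved = (now != *last_seen);

    *last_seen = now;
    return moved;
}

struct watchdog_args {
    unsigned int interval_sec; /* 5 in the jack description */
    pid_t victim;              /* hypothetical: process to kill on a stall */
};

static void *watchdog_thread(void *p)
{
    struct watchdog_args *args = p;
    unsigned long last_seen = atomic_load(&progress_counter);

    for (;;) {
        sleep(args->interval_sec);
        if (!progress_made(&last_seen)) {
            fprintf(stderr, "watchdog: no progress in %u s, killing pid %d\n",
                    args->interval_sec, (int)args->victim);
            kill(args->victim, SIGKILL);
            return NULL;
        }
    }
}

/* Start the watchdog as SCHED_FIFO one step above the workers' priority.
 * Workers must stay below the maximum, or the watchdog cannot outrank them.
 * Returns 0 or an errno value (EPERM without realtime privileges). */
int start_watchdog(pthread_t *tid, struct watchdog_args *args, int worker_prio)
{
    pthread_attr_t attr;
    struct sched_param sp = { 0 };
    int max = sched_get_priority_max(SCHED_FIFO);
    int err;

    sp.sched_priority = worker_prio < max ? worker_prio + 1 : max;
    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);
    err = pthread_create(tid, &attr, watchdog_thread, args);
    pthread_attr_destroy(&attr);
    return err;
}
```

In this sketch a worker would call note_progress() once per period, and the app would call start_watchdog() with interval_sec = 5 and victim = getpid() after creating its realtime threads. Since the check only sees "no progress", a thread that is legitimately busy for longer than the interval gets killed too.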

this idea has been discussed here quite a bit. however, it's much, much
harder to do this *between* different processes. again, there is no
way to tell a SCHED_FIFO thread that has gone wrong from one that's "just
very busy", and you can't even reliably identify either condition. the
only kind of thread that could (a SCHED_FIFO thread with higher
priority) will keep running anyway, even though the rest of the system
appears to have locked up.

BTW, my impression is that if magic SysRq doesn't work, you've got
more than a SCHED_FIFO hang - you've got a full-scale kernel panic or
deadlock.

IMHO, app writers should be using their own watchdogs if they allow
SCHED_FIFO. And of course i have to add that jack will take care of
all this for you :)

--p


This archive was generated by hypermail 2b28 : Tue Mar 19 2002 - 19:50:23 EET