WATCHDOG(4) - Device Drivers Manual
NAME
watchdog - hardware timers/counters for quick crash recovery
DESCRIPTION
Hardware watchdog timers are devices that reboot the machine when it hangs. The kernel resets the watchdog timer at regular intervals; if the kernel halts, the timer expires and the hardware resets the machine. Watchdog timers may instead be reset from userland, so that a failure of process scheduling also causes a reboot; see watchdogd(8) for further information.
A number of hardware watchdogs are supported, and all are configured using sysctl(8) under the kern.watchdog name:
kern.watchdog.auto
When enabled, the kernel automatically resets (‘tickles’) the watchdog timer and disables it at system shutdown time.
kern.watchdog.period
The timeout in seconds. Setting it to zero disables the watchdog timer.
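As a minimal sketch of how these two variables might be set from a script, the following Python helper builds the sysctl(8) command lines for the watchdog and runs them. The names watchdog_commands and apply are illustrative only and are not part of any OpenBSD interface; the sketch assumes it runs as root on a machine with a supported watchdog device.

```python
import subprocess


def watchdog_commands(period, auto=True):
    """Return the sysctl(8) argument lists that configure the watchdog.

    period: timeout in seconds; zero disables the watchdog timer.
    auto:   True lets the kernel tickle the timer itself.
    """
    if period < 0:
        raise ValueError("period must be zero or a positive number of seconds")
    return [
        ["sysctl", "kern.watchdog.auto=%d" % (1 if auto else 0)],
        ["sysctl", "kern.watchdog.period=%d" % period],
    ]


def apply(commands, run=subprocess.run):
    """Run each command in order, raising if sysctl(8) reports an error."""
    for argv in commands:
        run(argv, check=True)
```

Calling apply(watchdog_commands(30)) would ask for a 30 second timeout tickled by the kernel, and apply(watchdog_commands(0)) would disable the timer again. The run parameter exists only so the helper can be exercised without touching a real system.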
In situations where the machine provides vital services that are not handled entirely in kernel space, e.g. mail exchange, it may be desirable to reboot the machine if process scheduling fails. This is done by setting kern.watchdog.auto to zero and running a process which repeatedly sets kern.watchdog.period to the desired timeout value. If process scheduling then fails, the process resetting the timer is no longer run, the timer expires, and the machine is rebooted. Note that when kern.watchdog.auto is set to zero, the kernel will not automatically disable an enabled watchdog at system shutdown time.
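The userland scheme described above can be sketched as a small loop. This is an illustration of the mechanism only, not a replacement for watchdogd(8); the function names, the 30 second period, the 10 second interval, and the rounds parameter (used to bound the loop for testing) are all assumptions made for the example.

```python
import subprocess
import time


def tickle(period, run=subprocess.run):
    """Re-arm the watchdog by writing the timeout to kern.watchdog.period."""
    run(["sysctl", "kern.watchdog.period=%d" % period],
        check=True, stdout=subprocess.DEVNULL)


def tickle_forever(period=30, interval=10, run=subprocess.run,
                   sleep=time.sleep, rounds=None):
    """Tickle the watchdog every `interval` seconds.

    If this process stops being scheduled, the timer is no longer reset
    and the hardware reboots the machine after `period` seconds.
    """
    if interval >= period:
        raise ValueError("interval must be shorter than period")
    # Stop the kernel from tickling the timer on its own.
    run(["sysctl", "kern.watchdog.auto=0"],
        check=True, stdout=subprocess.DEVNULL)
    done = 0
    while rounds is None or done < rounds:
        tickle(period, run)
        sleep(interval)
        done += 1
```

Keeping the interval well below the period leaves room for a delayed wakeup without triggering a reboot. Because kern.watchdog.auto is zero here, the timer stays armed at shutdown unless the period is set back to zero first.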
Watchdog timers should be used in high-availability environments where getting machines up and running quickly after a crash is more important than determining the cause of the crash. A watchdog timer enables a crashed machine to autonomously attempt to recover quickly after a system failure.
Note that this also means that it is unwise to combine watchdog timers with ddb(4): while the machine sits in the debugger, the watchdog timeout may not be reset before it expires, so the machine will be rebooted before any debugging can be done. In other words, for mission-critical machines, disable ddb(4) by adding “ddb.panic=0” to sysctl.conf(5), since this gives the kernel the chance to perform a crash dump and reboot. Simply setting the watchdog will lose the debug trace of what went wrong.
SEE ALSO
ddb(4), sysctl.conf(5), config(8), sysctl(8), watchdogd(8)
BUGS
For systems with multiple watchdog timers available, only a single one can be used at a time. There is currently no way of selecting which device is used; the first discovered by the kernel is selected.
OpenBSD 7.5 - May 21, 2009