Watchdog

From Thomas-Krenn-Wiki

A watchdog enables an automatic reboot when a server crashes. An activated watchdog runs a countdown and reboots the system when the countdown expires. Each time the OS (typically a daemon running in the OS) writes to the watchdog device, the countdown starts again. If these writes stop (e.g. due to a crash), the countdown expires and the watchdog reboots the system as intended.
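The countdown logic can be illustrated with a small model. The following C sketch is a simplified simulation written for this explanation, not code from the kernel or from any driver. The function name `watchdog_reboot_time` is hypothetical, and it assumes the write accesses ("pings") are given as sorted timestamps in seconds.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified watchdog model (illustration only, not kernel code).
 * The watchdog is armed at time 0 with a countdown of `timeout` seconds.
 * Each ping, i.e. a write to the watchdog device at time pings[i], restarts
 * the countdown. pings[] must be sorted in ascending order.
 * Returns the time at which the watchdog reboots the system, or -1 if the
 * countdown does not expire up to and including `horizon`.
 */
long watchdog_reboot_time(long timeout, const long *pings, size_t n,
                          long horizon)
{
    long deadline = timeout;    /* current expiry time of the countdown */

    assert(timeout > 0);
    for (size_t i = 0; i < n; i++) {
        if (pings[i] >= deadline)
            break;              /* ping came too late: reboot already happened */
        deadline = pings[i] + timeout;  /* ping restarts the countdown */
    }
    return deadline <= horizon ? deadline : -1;
}
```

For a 60-second timeout and pings at 10, 20 and 30 seconds, the function returns 90: once the writes stop, the reboot follows 60 seconds after the last ping.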

Watchdog chips of some Thomas-Krenn servers

System         Watchdog chip (via Super I/O chip)   Linux kernel module   Linux kernel support since   Notes
LES v3         Fintek F81803                        f71808e_wdt           5.4                          kernelnewbies.org, Commit
LES v4         ITE IT8613E                          -                     not yet in the kernel
LES plus v2    ITE IT8784E-I                        it87_wdt              5.10                         kernelnewbies.org, Commit

Tests of the watchdog function

Below are some test results of different Thomas-Krenn servers.

LES v3

The following watchdog options can be set in the BIOS under Advanced -> Super IO Configuration:

  • WatchDog Reset Timer (Enabled/Disabled)
  • WatchDog Reset Timer Value
  • WatchDog Reset Timer Unit (Sec./Min.)
  • WatchDog Wake-up Time (Enabled/Disabled)

In this example (with Ubuntu 20.04.3 installed), we enable the watchdog reset timer and set its value to 5 minutes. If Linux is then booted without any further configuration, the system restarts every 5 minutes, because nothing writes to the watchdog and the countdown expires.
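As a worked example of these BIOS values: 5 minutes correspond to 300 seconds, so an unattended system would be reset about 12 times per hour. The following C sketch does this arithmetic. The enum `wdt_unit` and both function names are hypothetical, and the assumption that the timer is re-armed immediately after each reset is a simplification.

```c
#include <assert.h>

/* Hypothetical encoding of the BIOS option "WatchDog Reset Timer Unit". */
enum wdt_unit { WDT_UNIT_SEC, WDT_UNIT_MIN };

/* Convert "WatchDog Reset Timer Value" plus its unit into seconds. */
unsigned long bios_timeout_seconds(unsigned long value, enum wdt_unit unit)
{
    assert(unit == WDT_UNIT_SEC || unit == WDT_UNIT_MIN);
    return unit == WDT_UNIT_MIN ? value * 60UL : value;
}

/*
 * Number of watchdog resets within `period` seconds when nothing writes to
 * the watchdog. Simplification: assumes the timer is re-armed immediately
 * after each reset, so a reset happens every `timeout_s` seconds.
 */
unsigned long resets_in_period(unsigned long timeout_s, unsigned long period)
{
    assert(timeout_s > 0);
    return period / timeout_s;
}
```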

After booting Ubuntu 20.04.3 with Linux kernel 5.11, load the watchdog module f71808e_wdt and check the output of dmesg:

tk@ubuntu:~$ sudo modprobe f71808e_wdt
tk@ubuntu:~$ dmesg | tail -n 1
[  185.557090] f71808e_wdt: Found f81803 watchdog chip, revision 16

The wdctl command then shows details of the watchdog:

tk@ubuntu:~$ sudo wdctl 
Device:        /dev/watchdog
Identity:      f81803 watchdog [version 0]
Timeout:       60 seconds
FLAG           DESCRIPTION                   STATUS BOOT-STATUS
CARDRESET      Card previously reset the CPU      0           0
KEEPALIVEPING  Keep alive ping reply              0           0
MAGICCLOSE     Supports magic close char          0           0

Loading the module overrides the BIOS setting (5 min.) with the driver's timeout (60 seconds). In addition, the watchdog is now deactivated: only a write to the watchdog device /dev/watchdog activates it again.
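The FLAG column lists capabilities that the driver reports as option bits. In user space these bits are the WDIOF_* constants from <linux/watchdog.h>, which also appear in the `options` field of `struct watchdog_info` returned by the WDIOC_GETSUPPORT ioctl. The following C sketch decodes such a bitmask into the names wdctl prints. It is a minimal illustration, not wdctl's code: the function name `decode_wdt_options` is hypothetical, and the names are emitted in ascending bit order, which differs from wdctl's display order.

```c
#include <assert.h>
#include <string.h>
#include <linux/watchdog.h>     /* WDIOF_* option bits */

/* Option bits and the names wdctl shows for them in the FLAG column. */
static const struct {
    unsigned int bit;
    const char *name;
} wdt_flags[] = {
    { WDIOF_OVERHEAT,      "OVERHEAT" },
    { WDIOF_FANFAULT,      "FANFAULT" },
    { WDIOF_EXTERN1,       "EXTERN1" },
    { WDIOF_EXTERN2,       "EXTERN2" },
    { WDIOF_POWERUNDER,    "POWERUNDER" },
    { WDIOF_CARDRESET,     "CARDRESET" },
    { WDIOF_POWEROVER,     "POWEROVER" },
    { WDIOF_SETTIMEOUT,    "SETTIMEOUT" },
    { WDIOF_MAGICCLOSE,    "MAGICCLOSE" },
    { WDIOF_PRETIMEOUT,    "PRETIMEOUT" },
    { WDIOF_ALARMONLY,     "ALARMONLY" },
    { WDIOF_KEEPALIVEPING, "KEEPALIVEPING" },
};

/*
 * Write the names of all bits set in `options` into buf, separated by
 * spaces, in ascending bit order. Stops early if buf is too small.
 */
void decode_wdt_options(unsigned int options, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof wdt_flags / sizeof wdt_flags[0]; i++) {
        const char *name = wdt_flags[i].name;
        size_t name_len = strlen(name);
        size_t sep = used > 0 ? 1 : 0;

        if (!(options & wdt_flags[i].bit))
            continue;
        if (used + sep + name_len + 1 > len)
            break;                      /* not enough room left */
        if (sep)
            buf[used++] = ' ';
        memcpy(buf + used, name, name_len + 1);  /* includes the NUL */
        used += name_len;
    }
}
```

For the f81803 chip, the bits CARDRESET, MAGICCLOSE and KEEPALIVEPING produce exactly the three rows of the wdctl table.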

To test the watchdog functionality, run the following command as root, press the Enter key twice and then wait:

root@ubuntu:~# cat >> /dev/watchdog 


After the timeout expires (60 seconds in this example), a reboot occurs.

For regular use of the watchdog, install the watchdog package and make sure that the f71808e_wdt module and the watchdog daemon start automatically at system boot. The daemon writes to the watchdog device regularly, which prevents the watchdog from rebooting the system. If the system crashes, these writes stop and the watchdog reboots the system once the timeout has expired.
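The core of such a keepalive daemon can be sketched in a few lines of C. This is a minimal sketch for illustration, not the implementation of the watchdog package; the function name `watchdog_keepalive` and its parameters are hypothetical. It assumes the standard Linux watchdog behaviour: opening the device arms the watchdog, every write restarts the countdown, and for drivers with MAGICCLOSE, writing the character 'V' before closing stops the watchdog (unless the driver was loaded with nowayout=1). Without the magic character, such a driver keeps the countdown running after the device is closed, which is why the cat test above still ends in a reboot.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Minimal keepalive loop (sketch). Opens the watchdog device at `path`,
 * which arms the watchdog, and writes one byte every `interval_s` seconds,
 * `pings` times in total. The interval must stay well below the watchdog
 * timeout. If `magic_close` is set, 'V' is written before closing so that a
 * driver supporting MAGICCLOSE stops the timer instead of rebooting later.
 * Returns 0 on success, -1 on error.
 */
int watchdog_keepalive(const char *path, unsigned int pings,
                       unsigned int interval_s, int magic_close)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    for (unsigned int i = 0; i < pings; i++) {
        if (write(fd, "\n", 1) != 1)    /* any write restarts the countdown */
            goto fail;
        if (i + 1 < pings)
            sleep(interval_s);
    }
    if (magic_close && write(fd, "V", 1) != 1)
        goto fail;
    return close(fd);

fail:
    close(fd);
    return -1;
}
```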

LES v4

So far, the it87_wdt module in the Linux kernel does not support the ITE IT8613E installed in this system (tested on Debian 12 with a 6.5 kernel):

tk@debian12:~$ uname -a
Linux debian12 6.5.0-rc3-dirty #1 SMP PREEMPT_DYNAMIC Thu Jul 27 13:35:58 CEST 2023 x86_64 GNU/Linux
tk@debian12:~$ sudo modprobe it87_wdt
modprobe: ERROR: could not insert 'it87_wdt': No such device
tk@debian12:~$ sudo dmesg | tail -n 1
[  270.984823] it87_wdt: Unknown Chip found, Chip 8613 Revision 000c

The following patch adds support for the IT8613 chip:

--- a/drivers/watchdog/it87_wdt.c	2023-09-25 15:03:52.986033681 +0200
+++ b/drivers/watchdog/it87_wdt.c	2023-09-25 15:03:36.046415573 +0200
@@ -13,9 +13,9 @@
  *		    http://www.ite.com.tw/
  *
  *	Support of the watchdog timers, which are available on
- *	IT8607, IT8620, IT8622, IT8625, IT8628, IT8655, IT8665, IT8686,
- *	IT8702, IT8712, IT8716, IT8718, IT8720, IT8721, IT8726, IT8728,
- *	IT8772, IT8783 and IT8784.
+ *	IT8607, IT8613, IT8620, IT8622, IT8625, IT8628, IT8655, IT8665,
+ *	IT8686, IT8702, IT8712, IT8716, IT8718, IT8720, IT8721, IT8726,
+ *	IT8728, IT8772, IT8783 and IT8784.
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -50,6 +50,7 @@
 /* Chip Id numbers */
 #define NO_DEV_ID	0xffff
 #define IT8607_ID	0x8607
+#define IT8613_ID	0x8613
 #define IT8620_ID	0x8620
 #define IT8622_ID	0x8622
 #define IT8625_ID	0x8625
@@ -277,6 +278,7 @@
 		max_units = 65535;
 		break;
 	case IT8607_ID:
+	case IT8613_ID:
 	case IT8620_ID:
 	case IT8622_ID:
 	case IT8625_ID:
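Why an unknown chip ID leads to the modprobe error can be illustrated with a heavily simplified C sketch of the check that the patch extends. It is not the driver's code: the function `it87_check_chip` is hypothetical, only three of the IDs are included, and the message formats are inferred from the dmesg lines shown in this article. Conceptually, the driver reads the chip ID from the Super I/O chip, accepts it only if a matching `case` label exists, and otherwise fails with -ENODEV, which modprobe reports as "No such device".

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* A subset of the chip IDs from drivers/watchdog/it87_wdt.c. */
#define IT8607_ID 0x8607
#define IT8613_ID 0x8613    /* added by the patch */
#define IT8620_ID 0x8620

/*
 * Simplified chip check (sketch, not the driver's code). Writes a log
 * message into msg and returns 0 for a supported chip or -ENODEV otherwise.
 */
int it87_check_chip(unsigned int chip_id, unsigned int chip_rev,
                    char *msg, size_t msg_len)
{
    assert(msg != NULL && msg_len > 0);
    switch (chip_id) {
    case IT8607_ID:
    case IT8613_ID:         /* without this label, 0x8613 is "unknown" */
    case IT8620_ID:
        snprintf(msg, msg_len, "Chip IT%04x revision %u initialized.",
                 chip_id, chip_rev);
        return 0;
    default:
        snprintf(msg, msg_len, "Unknown Chip found, Chip %04x Revision %04x",
                 chip_id, chip_rev);
        return -ENODEV;
    }
}
```

The logs suggest that the revision is printed in hexadecimal in the error message and in decimal in the success message, which is why the same IT8613 chip appears as "Revision 000c" before the patch and as "revision 12" after it.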

With this patch, the IT8613 chip is supported:

tk@debian12:~$ uname -a
Linux debian12 6.6.0-rc2-dirty #2 SMP PREEMPT_DYNAMIC Fri Sep 22 12:20:27 CEST 2023 x86_64 GNU/Linux
tk@debian12:~$ sudo modprobe it87_wdt
tk@debian12:~$ sudo dmesg | tail -n 1
[   72.801650] it87_wdt: Chip IT8613 revision 12 initialized. timeout=60 sec (nowayout=0 testmode=0)
tk@debian12:~$ sudo wdctl /dev/watchdog1 
Device:        /dev/watchdog1
Identity:      IT87 WDT [version 1]
Timeout:       60 seconds
Pre-timeout:    0 seconds
FLAG           DESCRIPTION               STATUS BOOT-STATUS
KEEPALIVEPING  Keep alive ping reply          1           0
MAGICCLOSE     Supports magic close char      0           0
PRETIMEOUT     Pretimeout (in seconds)        1           0
SETTIMEOUT     Set timeout (in seconds)       0           0
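Opening /dev/watchdog1 arms the watchdog, so tools that only want to inspect it can read the sysfs attributes instead. On kernels built with CONFIG_WATCHDOG_SYSFS these are files such as identity and timeout in /sys/class/watchdog/watchdog1/. The following C sketch reads such attributes; the function names are hypothetical, and the directory layout is an assumption that should be checked on the target system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the sysfs attribute `attr` of a watchdog device directory, e.g.
 * dev_dir = "/sys/class/watchdog/watchdog1", attr = "identity", into buf
 * with the trailing newline removed. Reading sysfs does not open the
 * watchdog device, so it does not arm the watchdog.
 * Returns 0 on success, -1 on error.
 */
int wdt_read_attr(const char *dev_dir, const char *attr, char *buf, size_t len)
{
    char path[256];
    FILE *f;
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(path, sizeof path, "%s/%s", dev_dir, attr);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                      /* path too long */
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';     /* strip the trailing newline */
    return 0;
}

/* Read a numeric attribute such as "timeout" (seconds). -1 on error. */
long wdt_read_long(const char *dev_dir, const char *attr)
{
    char buf[32];
    long value;

    if (wdt_read_attr(dev_dir, attr, buf, sizeof buf) != 0)
        return -1;
    if (sscanf(buf, "%ld", &value) != 1)
        return -1;
    return value;
}
```

On the test system, wdt_read_long("/sys/class/watchdog/watchdog1", "timeout") should then match the 60 seconds that wdctl reports.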

To test the watchdog functionality, run the following command as root, press the Enter key twice and then wait:

tk@debian12:~$ sudo su -
root@debian12:~# cat >> /dev/watchdog1


In our tests, after 60 seconds the watchdog executes a hard reboot.

System with IT8659

Test with 6.7.0-rc1

Without a patch, kernel 6.7.0-rc1 does not recognize the IT8659 chip:

tk@lmde:~$ uname -a
Linux lmde 6.7.0-rc1 #1 SMP PREEMPT_DYNAMIC Thu Nov 16 09:54:38 CET 2023 x86_64 GNU/Linux
tk@lmde:~$ sudo modprobe it87_wdt
modprobe: ERROR: could not insert 'it87_wdt': No such device
tk@lmde:~$ sudo dmesg | tail -1
[   41.018799] it87_wdt: Unknown Chip found, Chip 8659 Revision 0007
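The chip ID needed for a patch can be taken directly from this message: it87_wdt prints the ID and the revision in hexadecimal, so "Chip 8659" corresponds to the constant 0x8659. The following C sketch extracts both values from such a log line; the function name `parse_unknown_chip` is hypothetical, and the expected message format is inferred from the dmesg output in this article.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract chip ID and revision from an it87_wdt "Unknown Chip found"
 * message, e.g. "it87_wdt: Unknown Chip found, Chip 8659 Revision 0007".
 * Both numbers are parsed as hexadecimal.
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_unknown_chip(const char *line, unsigned int *chip, unsigned int *rev)
{
    const char *p;

    assert(line != NULL && chip != NULL && rev != NULL);
    p = strstr(line, "Unknown Chip found, ");
    if (p == NULL)
        return -1;
    if (sscanf(p, "Unknown Chip found, Chip %x Revision %x", chip, rev) != 2)
        return -1;
    return 0;
}
```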

Patch

The following patch adds support for the IT8659 chip:

--- a/drivers/watchdog/it87_wdt.c	2023-11-21 11:36:18.548640180 +0100
+++ b/drivers/watchdog/it87_wdt.c	2023-11-21 11:38:11.994805918 +0100
@@ -56,6 +56,7 @@
 #define IT8625_ID	0x8625
 #define IT8628_ID	0x8628
 #define IT8655_ID	0x8655
+#define IT8659_ID	0x8659
 #define IT8665_ID	0x8665
 #define IT8686_ID	0x8686
 #define IT8702_ID	0x8702
@@ -284,6 +285,7 @@
 	case IT8625_ID:
 	case IT8628_ID:
 	case IT8655_ID:
+	case IT8659_ID:
 	case IT8665_ID:
 	case IT8686_ID:
 	case IT8718_ID:

Test with Patch

tk@lmde:~$ sudo modprobe it87_wdt
tk@lmde:~$ sudo dmesg | tail -1
[   19.933522] it87_wdt: Chip IT8659 revision 7 initialized. timeout=60 sec (nowayout=0 testmode=0)



Author: Werner Fischer

Werner Fischer, working in the Knowledge Transfer team at Thomas-Krenn, completed his studies of Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at many conferences like LinuxTag, OSMC, OSDC, LinuxCon, and author for various IT magazines. In his spare time he enjoys playing the piano and training for a good result at the annual Linz marathon relay.

