Analyze System Freeze
To help analyze system crashes or a server that no longer reacts to any input (system freeze), systems with Supermicro X10 motherboards and X11 motherboards (Socket 3647) offer a function for reading CPU status and log information. This wiki article shows how to save this information so that a problem can be analyzed in more detail.
Usage scenarios
If a system crashes during operation or seems to freeze (i.e. no longer responds to any input), status and log information can be requested from the CPU via the IPMI web interface. This information makes it easier to pin down the exact cause of the problem and thus helps to resolve it permanently.
The function to read out this status and log information is supported by server systems with the following Supermicro motherboards:
- Supermicro X10 motherboards with IPMI firmware version 3.60 or later (status as of 13 November 2017: version 3.62 is currently under test at Thomas-Krenn)
- Supermicro X11 Socket 3647 motherboards
Supermicro X11 Socket 1151 motherboards do not offer this feature.
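The firmware requirement for X10 boards can be checked with a small comparison. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes the firmware version is available as a dotted string with a two-digit minor part (such as "3.62"), as written in this article.

```python
# Minimum IPMI firmware version for X10 boards, as stated in this article.
MIN_X10_FIRMWARE = (3, 60)


def parse_version(text):
    """Turn a dotted version string such as '3.62' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def x10_firmware_is_sufficient(firmware_version):
    """Return True if the given X10 IPMI firmware version is 3.60 or later.

    Assumption: the minor part is always written with two digits,
    so '3.60' is parsed as (3, 60) and not as (3, 6).
    """
    return parse_version(firmware_version) >= MIN_X10_FIRMWARE


if __name__ == "__main__":
    for version in ("3.45", "3.60", "3.62"):
        print(version, x10_firmware_is_sufficient(version))
```

Comparing tuples of integers instead of raw strings avoids ordering mistakes such as "3.100" sorting before "3.60".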
Read out Trouble Shooting Information
Important note: Always save the trouble shooting information while the server is still switched on and in the frozen state. If you reboot, the CPU trouble shooting information will be lost.
To save the information, follow these steps:
Note: If the menu under Miscellaneous shows only the entry undefined instead of Trouble Shooting, the IPMI firmware has just been updated and the web browser has not been restarted since. In this case, close the web browser, clear its cache, and then log on to the IPMI web interface again. The menu item and the function are then available as described above.
Contents of the Trouble Shooting file
The saved text file has the following structure:
start time: Mon Nov 13 14:10:05 2017
CPUID
57 01 00 40 c3 06 03 00
MicroCode
57 01 00 40 1d 00 00 00
SRC_LOG CPU0
57 01 00 90 00 00 00 00
IERR_LOG CPU0
57 01 00 90 00 00 00 00
MCERR_LOG CPU0
57 01 00 90 00 00 00 00
57 01 00 90 00 00 00 00
57 01 00 90 00 00 00 00
57 01 00 90 00 00 00 00
57 01 00 90 00 00 00 00
57 01 00 40 03 00 00 00
CPU0 Data
end time: Mon Nov 13 14:10:05 2017
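To compare several saved files, it can help to split the text into its labeled parts first. The following is a minimal sketch that assumes the file always looks like the example above: a start time, an end time, and labels such as CPUID or MCERR_LOG CPU0 followed by two-digit hex bytes. The function name and the result layout are hypothetical, and the sketch does not interpret the bytes themselves.

```python
import re

# Matches "start time: Mon Nov 13 14:10:05 2017" and the "end time:" variant.
TIME_RE = re.compile(
    r"(start|end) time:\s*"
    r"(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})"
)
# A data token is exactly two hex digits, for example "57" or "c3".
HEX_BYTE_RE = re.compile(r"[0-9a-fA-F]{2}")


def parse_troubleshooting_dump(text):
    """Split a saved dump into timestamps and labeled byte sections."""
    result = {"start": None, "end": None, "sections": []}
    for match in TIME_RE.finditer(text):
        result[match.group(1)] = " ".join(match.group(2).split())

    # Remove the timestamps so that their digits are not read as data bytes.
    body = TIME_RE.sub(" ", text)

    label_words = []
    section = None
    for token in body.split():
        if HEX_BYTE_RE.fullmatch(token):
            # The first byte after one or more label words opens a new section.
            if label_words or section is None:
                section = {"label": " ".join(label_words), "data": []}
                result["sections"].append(section)
                label_words = []
            section["data"].append(int(token, 16))
        else:
            label_words.append(token)

    # A trailing label without data (such as "CPU0 Data") is kept as well.
    if label_words:
        result["sections"].append({"label": " ".join(label_words), "data": []})
    return result


if __name__ == "__main__":
    sample = (
        "start time: Mon Nov 13 14:10:05 2017 "
        "CPUID 57 01 00 40 c3 06 03 00 "
        "MicroCode 57 01 00 40 1d 00 00 00 "
        "end time: Mon Nov 13 14:10:05 2017"
    )
    parsed = parse_troubleshooting_dump(sample)
    for part in parsed["sections"]:
        print(part["label"], len(part["data"]), "bytes")
```

Because the sketch works on whitespace-separated tokens, it does not depend on where the line breaks fall in the file. One limitation: a label word that consists of exactly two hex digits would be mistaken for a data byte.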
Further Information
- Platform-Level Error Handling Strategies for Intel Systems (www.intel.com)
Author: Werner Fischer
Werner Fischer, working in the Knowledge Transfer team at Thomas-Krenn, completed his studies of Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at many conferences like LinuxTag, OSMC, OSDC, LinuxCon, and author for various IT magazines. In his spare time he enjoys playing the piano and training for a good result at the annual Linz marathon relay.