Machine Check Exception
The number of transistors in current CPUs and memory chips is constantly increasing, and hardware bus systems keep getting faster. Both trends increase the probability of individual bit errors. Modern chips detect such bit errors and can correct some, but not all, of them (for example via ECC). When a chip detects such an error, this is called a Machine Check (MC).
(The information in this article is taken from Machine check handling on Linux by Andi Kleen[1].)
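To illustrate how a chip can correct some bit errors but not others, here is a minimal sketch of a Hamming(7,4) code in Python. It is not taken from the cited paper, and real ECC memory uses wider codes with an additional parity bit (SECDED); the function names are chosen for this example only.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data_bits, syndrome) after correcting at most one flipped bit.

    The syndrome is the 1-based position of the flipped bit, or 0 if the
    codeword is consistent.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a single bit error at position 5
    data, position = hamming74_correct(word)
    print(data, position)
```

A single flipped bit is located and repaired. Two flipped bits produce a wrong "correction" with this plain code, which is why practical ECC schemes add an extra parity bit so that double-bit errors are at least detected and can be reported as uncorrectable.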
What is a Machine Check
There are two kinds of Machine Checks:
- Machine Check Exception (MCE): This occurs when the hardware detects an error but cannot correct it. The CPU interrupts the currently running program and calls a special exception handler.
- Silent Machine Check: The hardware can correct the error, but logs it in internal registers. These registers can be read out later by the operating system or the firmware.
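The difference between the two kinds can be illustrated with the status word that x86 CPUs log for each machine check bank (IA32_MCi_STATUS). The following Python sketch decodes a few flag bits as documented in the Intel Software Developer's Manual; the function name and the sample values are assumptions made for this example, and the mapping of the UC bit to "exception" versus "silent" is a simplification.

```python
# Flag bits of the 64-bit IA32_MCi_STATUS register (Intel SDM)
VAL = 1 << 63    # register contains a valid error record
OVER = 1 << 62   # an earlier error was overwritten
UC = 1 << 61     # error was NOT corrected by the hardware
EN = 1 << 60     # error reporting was enabled for this error
MISCV = 1 << 59  # the MISC register holds additional information
ADDRV = 1 << 58  # the ADDR register holds a valid address
PCC = 1 << 57    # processor context may be corrupted


def classify_status(status):
    """Classify a raw status value (hypothetical helper for illustration)."""
    if not status & VAL:
        return "no error logged"
    if status & UC:
        if status & PCC:
            return "uncorrected, processor context corrupt"
        return "uncorrected"
    return "corrected"


def mca_error_code(status):
    """Return the architectural MCA error code (bits 15:0)."""
    return status & 0xFFFF


if __name__ == "__main__":
    # Sample values are made up, not taken from a real log.
    for sample in (0, VAL | EN | ADDRV | 0x0151, VAL | UC | EN | PCC | 0x0800):
        print(hex(sample), classify_status(sample), hex(mca_error_code(sample)))
```

A record with the UC bit clear corresponds to an error the hardware corrected and merely logged, while a set UC bit marks an error that could not be corrected.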
Possible causes of Machine Checks are:
- CPU
- PCI IO
- Memory (RAM)
- Caches
- Internal buses
- Software bugs in drivers (when PCI IO errors are enabled, machine checks can also be triggered by driver bugs[2])
Observed Machine Check Exceptions
We have observed MCEs in the following cases:
- Linux error message Machine Check Exception
- SR2500 Critical Interrupt and Bus Uncorrectable error in the SEL
References
- [1] Andi Kleen: Machine check handling on Linux, August 2004, SUSE Labs, additional files available at http://www.halobates.de/
- [2] Andi Kleen: Machine check handling on Linux, August 2004, SUSE Labs, page 2
More information
- http://en.wikipedia.org/wiki/Machine_Check_Exception
- http://www.admin-blog.com/archives/167-Machine-Check-Exception-unter-Linux-automatisiert-verarbeiten.html
- http://kbase.redhat.com/faq/docs/DOC-3864
- Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide, Chapter 5.4.3 Machine-Check Exceptions (for an overview, see also http://www.intel.com/products/processor/manuals/)
- Video Experiences of a x86 maintainer, Andi Kleen, Intel Open Source Technology Center, Feb 2009
- http://blog.incase.de/index.php/cpu-feature-flags-and-their-meanings/, http://boincfaq.mundayweb.com/index.php?language=1&view=176 (MCE CPU Flag)
- http://www.intel.com/software/products/documentation/vlin/mergedprojects/analyzer_ec/mergedprojects/reference_olh/mergedprojects/instructions/instruct32_hh/vc46.htm
- http://lkml.org/lkml/2009/4/29/89 (current developments concerning MCE for Linux Kernel 2.6.31)
- http://adminwiki.de/index.php/Auto_MCELog-Check
Author: Werner Fischer. Werner Fischer works in the Knowledge Transfer team at Thomas-Krenn and completed his studies of Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at conferences such as LinuxTag, OSMC, OSDC, and LinuxCon, and writes for various IT magazines. In his spare time he enjoys playing the piano and training for a good result at the annual Linz marathon relay.
Translator: Alina Ranzinger. Alina has been working at Thomas-Krenn.AG since 2024. After her training as a multilingual business assistant, she took up her position as assistant to Product Management, where she is responsible for translating texts and for organising the department.


