LSI RAID Monitoring Plugin

From Thomas-Krenn-Wiki

The LSI RAID Monitoring Plugin can be used to monitor RAID sets on LSI RAID controllers. The plugin is written in Perl and uses the command line tool storcli to interact with the controller.

Current Version

The current check_lsi_raid version can be obtained from the Thomas Krenn git server (http://git.thomas-krenn.com/check_lsi_raid.git).

Functionalities

The README file shipped with the plugin lists all available checks.

Mailing List

Questions are answered via the TK-Monitoring-User-Mailing-List:

tk-monitoring-plugins-user@lists.thomas-krenn.com

The archive can be found at:

http://lists.thomas-krenn.com/pipermail/tk-monitoring-plugins-user/

Prerequisites

The following sections describe in detail how to install the requirements:

  • On the monitored server
    • Install the check_lsi_raid plugin
    • Install storcli
    • Add a sudoers entry that allows the user nagios to call storcli without a sudo password
    • If NRPE is used: add an NRPE command definition
  • On the Icinga server
    • Add a service definition (via NRPE) or, for a local check, a command and a service definition

Installation

Manually

To install the plugin manually, clone the repository and copy the file check_lsi_raid into the directory /usr/lib/nagios/plugins:

:~$ git clone http://git.thomas-krenn.com/check_lsi_raid.git
Cloning into 'check_lsi_raid'...
:~$ cd check_lsi_raid/
:~/check_lsi_raid$ ls
check_lsi_raid   check_lsi_raid.html README
:~/check_lsi_raid$ sudo cp check_lsi_raid /usr/lib/nagios/plugins/
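The copy step can also be done with install(1) from GNU coreutils, which sets the executable bit explicitly instead of relying on the permissions of the cloned file. A minimal sketch, assuming bash and coreutils; the function name install_plugin is hypothetical and not part of check_lsi_raid:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of check_lsi_raid): copy the plugin into a
# plugin directory and make it executable (mode 0755).
install_plugin() {
  local src="$1"
  local dest_dir="${2:-/usr/lib/nagios/plugins}"   # default path from this article

  [ -f "$src" ] || { echo "plugin not found: $src" >&2; return 1; }
  install -d "$dest_dir" || return 1               # create the directory if missing
  install -m 0755 "$src" "$dest_dir/" || return 1  # copy with explicit permissions
  printf '%s\n' "$dest_dir/$(basename "$src")"     # print the installed path
}
```

Run as root from inside the cloned repository, install_plugin check_lsi_raid would place the plugin at /usr/lib/nagios/plugins/check_lsi_raid.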

The LSI Storage Command Line Tool (StorCLI) can be obtained from the LSI website for the corresponding controller and operating system: http://www.lsi.com/Search/Pages/downloads.aspx?k=Latest%20StorCLI
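Depending on how StorCLI was installed, the binary ends up in different places: this article uses /usr/sbin/storcli (Ubuntu package) and /opt/MegaRAID/storcli/storcli64 (example output below). A minimal sketch that prints the first executable candidate, assuming bash; find_storcli is a hypothetical helper and the candidate list only contains the two paths mentioned in this article:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first executable storcli binary found.
# Default candidates are the two paths that appear in this article.
find_storcli() {
  local candidate
  if [ "$#" -eq 0 ]; then
    set -- /usr/sbin/storcli /opt/MegaRAID/storcli/storcli64
  fi
  for candidate in "$@"; do
    if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # no candidate is an executable file
}
```

The result can be passed to the plugin's -p option, for example check_lsi_raid -p "$(find_storcli)". The same path must then be used in the sudoers entry described below.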

TK Ubuntu-Repository

After setting up the Thomas Krenn Ubuntu repository (see Thomas Krenn Ubuntu Repo), the package nagios-plugins-thomas-krenn installs check_lsi_raid:

:~$ sudo apt-get install nagios-plugins-thomas-krenn
[...]
Suggested packages:
  arcconf storcli freeipmi-tools libipc-run-perl
The following NEW packages will be installed:
  nagios-plugins-thomas-krenn
0 upgraded, 1 newly installed, 0 to remove and 79 not upgraded.
Need to get 0 B/25.2 kB of archives.
After this operation, 127 kB of additional disk space will be used.
Selecting previously unselected package nagios-plugins-thomas-krenn.
(Reading database ... 68398 files and directories currently installed.)
Unpacking nagios-plugins-thomas-krenn (from .../nagios-plugins-thomas-krenn_0.3-1_all.deb) ...
Setting up nagios-plugins-thomas-krenn (0.3-1) ...

The suggested package storcli must also be installed:

:~$ sudo apt-get install storcli
[...]
The following NEW packages will be installed:
  storcli
0 upgraded, 1 newly installed, 0 to remove and 72 not upgraded.
Need to get 1,385 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://archive.thomas-krenn.com/packages/ precise/optional storcli amd64 1.03.11-1 [1,385 kB]
Fetched 1,385 kB in 0s (1,779 kB/s)
Selecting previously unselected package storcli.
(Reading database ... 64459 files and directories currently installed.)
Unpacking storcli (from .../storcli_1.03.11-1_amd64.deb) ...
Setting up storcli (1.03.11-1) ...

Configuration

The plugin can monitor a remote host via NRPE as well as the local host. In both cases the plugin has to be installed on the monitored server.

Via NRPE

On the Icinga Server

The service definition specifies the command (check_lsi_raid) that is called via NRPE on the remote host. The plugin parameters are configured separately in the NRPE configuration on the monitored server:

define service {
    service_description           lsi-raid-nrpe
    display_name                  LSI RAID
    use                           generic-service
    host_name                     test
    check_command                 check_nrpe_1arg!check_lsi_raid
}

Attention: To use the Call-Home-Service, some templates have to be created (cf. Using Call-Home-Service with Icinga or Nagios). In that case, the service definition must contain the line use thomas-krenn-service instead of use generic-service.

Using TKmon to configure check_lsi_raid

The LSI RAID plugin is already integrated in the TKmon service catalogue. It is sufficient to select "LSI RAID via NRPE" when adding a service to a host.

On the monitored Server

The user nagios must be able to call storcli via sudo without providing a password. Therefore, create the following sudoers definition and restrict its permissions to 440:

:~$ sudo vi /etc/sudoers.d/check_lsi_raid 
nagios ALL=(root)NOPASSWD:/usr/sbin/storcli
:~$ sudo chmod 440 /etc/sudoers.d/check_lsi_raid

If the definition is correct, calling storcli via sudo as user nagios does not ask for a password:

:~$ sudo su nagios --shell /bin/bash
:~$ sudo /usr/sbin/storcli -V

      Storage Command Line Tool  Ver 1.03.11 Jan 30, 2013

    (c)Copyright 2012, LSI Corporation, All Rights Reserved.

Exit Code: 0x00

An NRPE configuration file specifies the command line that is executed when check_lsi_raid is called via NRPE. Its name (check_lsi_raid) must match the command used in the service definition on the Icinga server:

:~$ sudo vi /etc/nagios/nrpe.d/raid.cfg
command[check_lsi_raid]=/usr/lib/nagios/plugins/check_lsi_raid -C 0 -p /usr/sbin/storcli
:~$ sudo service nagios-nrpe-server restart

Finally, a test on the Icinga server verifies that the definitions are correct:

:~$ /usr/lib/nagios/plugins/check_nrpe -H 10.0.0.2 -c check_lsi_raid
OK (CTR, LD, PD, CV)|CV_Temperature=27;70;85 ROC_Temperature=62;80;90
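The output follows the usual plugin format: a status text, then performance data after "|" in the form label=value;warn;crit, and the exit code carries the state. A minimal sketch that splits the performance data and maps the exit code to a state name, assuming bash and the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); both function names are hypothetical, and quoted labels containing spaces are not handled:

```shell
#!/usr/bin/env bash
# Map a plugin exit code to its state name (0 OK, 1 WARNING, 2 CRITICAL,
# 3 UNKNOWN; any other value is also shown as UNKNOWN here).
state_name() {
  case "$1" in
    0) echo OK ;;
    1) echo WARNING ;;
    2) echo CRITICAL ;;
    *) echo UNKNOWN ;;
  esac
}

# Hypothetical helper: print one line per metric from the perfdata part
# (after "|") of a plugin output line.
parse_perfdata() {
  local line="$1" perf item label value warn crit rest
  local -a items
  case "$line" in
    *"|"*) perf="${line#*"|"}" ;;
    *) return 0 ;;                         # no performance data
  esac
  read -r -a items <<< "$perf"            # metrics are separated by spaces
  for item in "${items[@]}"; do
    label="${item%%=*}"
    IFS=';' read -r value warn crit rest <<< "${item#*=}"
    printf '%s value=%s warn=%s crit=%s\n' "$label" "$value" "$warn" "$crit"
  done
}
```

Since check_nrpe passes on the remote plugin's exit code, out=$(/usr/lib/nagios/plugins/check_nrpe -H 10.0.0.2 -c check_lsi_raid); rc=$? followed by state_name "$rc" and parse_perfdata "$out" would report the state and the CV and ROC temperatures with their warning and critical thresholds.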

Check local RAID

A local check is useful if the Icinga server itself uses an LSI RAID controller.

Manually

A local check requires the plugin, storcli and the sudoers entry. As a first step, create an Icinga command definition:

:~$ sudo vi /etc/nagios-plugins/config/check_lsi_raid.cfg
define command {
        command_name    check_lsi_raid
        command_line    /usr/lib/nagios/plugins/check_lsi_raid -C '$ARG1$' -p '$ARG2$'
}

A service definition then uses this command, passing the controller number (0) and the storcli path as arguments:

define service{
        use                             generic-service
        host_name                       tkmon
        service_description             lsi-raid
        check_command                   check_lsi_raid!0!/usr/sbin/storcli
}
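In check_command, the arguments are separated by "!" and replace $ARG1$, $ARG2$, ... in command_line. The following is a simplified model of that substitution, not Icinga's actual implementation, assuming bash; expand_check_command is a hypothetical helper that ignores all other macros:

```shell
#!/usr/bin/env bash
# Simplified model of argument macros: split a check_command such as
# "check_lsi_raid!0!/usr/sbin/storcli" at "!" and substitute $ARG1$,
# $ARG2$, ... in the command_line template.
expand_check_command() {
  local spec="$1" template="$2" i
  local -a parts
  IFS='!' read -r -a parts <<< "$spec"     # parts[0] is the command name
  for (( i = 1; i < ${#parts[@]}; i++ )); do
    # Quoted pattern and replacement: match "$ARGi$" literally
    template=${template//"\$ARG${i}\$"/"${parts[i]}"}
  done
  printf '%s\n' "$template"
}
```

Applied to the definitions above, check_lsi_raid!0!/usr/sbin/storcli expands to /usr/lib/nagios/plugins/check_lsi_raid -C '0' -p '/usr/sbin/storcli', i.e. the same call as in the NRPE configuration once the shell removes the quotes.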

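Example Output

The following verbose (-vvv) output shows a degraded RAID 1 (c0/v0) while the drive in slot 2 (c0/e252/s2) is being rebuilt: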

$ sudo ./check_lsi_raid -p /opt/MegaRAID/storcli/storcli64 -vvv
Critical (CTR Warn, LD Crit, PD Warn) [c0/v0_State = Critical (Dgrd)][c0/e252/s2_State = Critical (Rbld)][CTR_Degraded_drives = Warning (1)]
[c0/e252/s2_Rebuild = Warning (4)]|CV_Temperature=24;70;85 ROC_Temperature=58;80;90
Used storcli commands:
- /usr/bin/sudo /opt/MegaRAID/storcli/storcli64 /c0 /cv show status
- /usr/bin/sudo /opt/MegaRAID/storcli/storcli64 adpallinfo a0
- /usr/bin/sudo /opt/MegaRAID/storcli/storcli64 /c0/vall show all
- /usr/bin/sudo /opt/MegaRAID/storcli/storcli64 /c0/vall show init
- /usr/bin/sudo /opt/MegaRAID/storcli/storcli64 /c0/eall/sall show all
- /usr/bin/sudo /opt/MegaRAID/storcli/storcli64 /c0/eall/sall show initialization
- /usr/bin/sudo /opt/MegaRAID/storcli/storcli64 /c0/eall/sall show rebuild
Critical sensors:
	- c0/v0_State (Dgrd)
	- c0/e252/s2_State (Rbld)
Warning sensors:
	- CTR_Degraded_drives (1)
	- c0/e252/s2_Rebuild (4)
CTR information:
	- LSI MegaRAID SAS 9271-4i:
		- Serial No=SV30900638
		- FW Package Build=23.28.0-0010
		- Mfg. Date=02/23/13
		- Revision No=07B
		- BIOS Version=5.46.02.0_4.16.08.00_0x06060900
		- FW Version=3.400.05-3175
		- ROC temperature=58  degree Celcius
LD information:
	- c0/v0:
		- Access=RW
		- Cache=RWBD
		- Consist=Yes
		- DG/VD=0/0
		- Size=74.0
		- State=Dgrd
		- TYPE=RAID1
		- ld=c0/v0
		- sCC=-
PD information:
	- c0/e252/s1:
		- BBM Error Count=0
		- DG=-
		- DID=6
		- Drive Temperature=N/A
		- EID:Slt=252:1
		- Intf=SATA
		- Med=SSD
		- Media Error Count=0
		- Model=INTEL SSDSC2BB080G4
		- Other Error Count=0
		- PI=N
		- Predictive Failure Count=0
		- S.M.A.R.T alert flagged by drive=No
		- SED=N
		- SeSz=512B
		- Shield Counter=0
		- Size=74.0GB
		- Sp=U
		- State=UGood
		- pd=c0/e252/s1
	- c0/e252/s2:
		- BBM Error Count=0
		- DG=0
		- DID=5
		- Drive Temperature=0C (32.00 F)
		- EID:Slt=252:2
		- Intf=SATA
		- Med=SSD
		- Media Error Count=0
		- Model=INTEL SSDSC2BB080G4
		- Other Error Count=0
		- PI=N
		- Predictive Failure Count=0
		- S.M.A.R.T alert flagged by drive=No
		- SED=N
		- SeSz=512B
		- Shield Counter=0
		- Size=74.0GB
		- Sp=U
		- State=Rbld
		- pd=c0/e252/s2
		- rebuild=4
	- c0/e252/s3:
		- BBM Error Count=0
		- DG=0
		- DID=4
		- Drive Temperature=N/A
		- EID:Slt=252:3
		- Intf=SATA
		- Med=SSD
		- Media Error Count=0
		- Model=INTEL SSDSC2BB080G4
		- Other Error Count=0
		- PI=N
		- Predictive Failure Count=0
		- S.M.A.R.T alert flagged by drive=No
		- SED=N
		- SeSz=512B
		- Shield Counter=0
		- Size=74.0GB
		- Sp=U
		- State=Onln
		- pd=c0/e252/s3
CV information:
		- CV_Replacement_required=No
		- CV_Status=OK
		- CV_Temperature=24
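The status line lists every sensor that is not OK in square brackets, e.g. [c0/v0_State = Critical (Dgrd)]. A minimal sketch that prints these entries one per line, for instance to include them in a notification, assuming bash; list_sensor_states is a hypothetical helper and relies only on the bracket format shown in the example above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the bracketed sensor states from
# check_lsi_raid output, one per line.
list_sensor_states() {
  local text="$1" re='\[([^]]*)]'
  while [[ $text =~ $re ]]; do
    printf '%s\n' "${BASH_REMATCH[1]}"
    text=${text#*"${BASH_REMATCH[0]}"}     # drop everything up to this match
  done
}
```

For the output above, list_sensor_states "$out" | grep '= Critical' would show only c0/v0_State and c0/e252/s2_State.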

Author: Georg Schönberger

Related articles

SMART Attributes Monitoring Plugin
Call-Home-Service
Call-Home-Service FAQs