RAID Consistency Check

From Thomas-Krenn-Wiki

RAID controllers provide a variety of options for testing the consistency of a RAID set. The objective of such tests is the early detection of parity and block errors.

During such a check, all associated blocks on the member hard disks are read. If individual read errors (bad blocks) occur during the check and sufficient redundant data is available, the affected blocks can be rewritten with the correct data (for example by means of 3ware’s Dynamic Sector Repair feature). When the data is rewritten to the affected block, the hard disk’s firmware replaces the defective sector with a free spare sector (reserve sector). Additional information about this can be found on Wikipedia at http://en.wikipedia.org/wiki/Bad_sector
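
The repair step described above can be illustrated with a small simulation. This is only a sketch for a two-way mirror (RAID 1): the lists stand in for disks, None marks an unreadable block, and the function name repair_mirror is hypothetical. Real controllers perform this in firmware, and the sector remapping itself is done by the disk.

```python
def repair_mirror(primary, mirror):
    """Rewrite unreadable blocks (None) from the other half of a two-way mirror.

    Returns a list of (disk name, block index) pairs that were repaired.
    Blocks that are unreadable on both disks cannot be repaired and stay None.
    """
    repaired = []
    for index, (a, b) in enumerate(zip(primary, mirror)):
        if a is None and b is not None:
            primary[index] = b  # rewrite from the redundant copy
            repaired.append(("primary", index))
        elif b is None and a is not None:
            mirror[index] = a
            repaired.append(("mirror", index))
    return repaired


# Hypothetical example data: one bad block on each disk.
primary = [b"A", None, b"C"]
mirror = [b"A", b"B", None]
repaired = repair_mirror(primary, mirror)
print(repaired)
```

After the call, both lists hold the same three blocks again, because each bad block had a readable copy on the other disk.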

3ware RAID Controller

3ware provides the verify feature for testing RAID arrays. While a verify is running, the performance of the RAID array is reduced. The effect on performance can be configured by varying the Background Task Rate setting.

  • 3ware recommends running a verify process at least weekly.
  • If a RAID unit has not yet been initialized, starting the verify feature automatically runs the initialization instead of the test. After the initialization has completed, the verify feature must be started again. The initialization status of a RAID unit can be determined by issuing the tw_cli /c0/u0 show all command from the command line interface (CLI).
  • Additional, detailed information can be found in the sections About Initialization and About Verification of the RAID Controller User Guide from 3ware.

Example:

[root@testserver ~]# tw_cli /c0/u0 start verify
Sending start verify message to /c0/u0 ... Done.

[root@testserver ~]# tw_cli /c0/u0 show all
/c0/u0 status = VERIFYING
/c0/u0 is not rebuilding, its current state is VERIFYING
/c0/u0 is verifying with percent completion = 1
/c0/u0 is initialized.
/c0/u0 volume(s) = 1
/c0/u0 name =                      
/c0/u0 serial number = 5ND29QM781616400022D 
/c0/u0 Storsave Policy = protect   
/c0/u0 Command Queuing Policy = off       

Unit     UnitType  Status         %Cmpl  Port  Stripe  Size(GB)  Blocks
-----------------------------------------------------------------------
u0       RAID-1    VERIFYING      1      -     -       232.82    488259584   
u0-0     DISK      OK             -      p1    -       232.82    488259584   
u0-1     DISK      OK             -      p0    -       232.82    488259584   

Parameter index does not exist 

[root@testserver ~]#
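
The status lines in the output above can also be evaluated by a script, for example in a monitoring job. The following is a minimal sketch that only relies on the three lines visible in the example. The function name parse_unit_state is hypothetical, and the wording tw_cli prints for an uninitialized unit is not shown here, so the initialized flag simply checks for the exact phrase "is initialized".

```python
import re

# Status lines copied from the example output above.
SAMPLE = """\
/c0/u0 status = VERIFYING
/c0/u0 is not rebuilding, its current state is VERIFYING
/c0/u0 is verifying with percent completion = 1
/c0/u0 is initialized.
"""


def parse_unit_state(output):
    """Extract status, verify progress and initialization flag from the text."""
    status = re.search(r"status = (\S+)", output)
    percent = re.search(r"percent completion = (\d+)", output)
    return {
        "status": status.group(1) if status else None,
        "percent": int(percent.group(1)) if percent else None,
        # Assumption: an initialized unit is reported with exactly this phrase.
        "initialized": re.search(r"\bis initialized\b", output) is not None,
    }


state = parse_unit_state(SAMPLE)
print(state)
```

On a live system the text could be obtained with subprocess.run(["tw_cli", "/c0/u0", "show", "all"], capture_output=True, text=True).stdout instead of the SAMPLE string.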

Areca RAID Controller

The Consistency Check feature tests the consistency of Areca RAID arrays. The feature can check RAID3, RAID5 and RAID6 level RAID sets. In doing so, all associated data blocks will be read, parity will be calculated, the stored parity read, and finally the calculated and stored parity values will be compared. Naturally, the performance of the RAID set will be affected by this process.
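
The comparison of calculated and stored parity can be sketched as follows. This is a simplified illustration that assumes plain XOR parity as used by RAID 3 and RAID 5. RAID 6 keeps a second, differently computed syndrome that is not modeled here. The function names and the stripe data are hypothetical.

```python
def xor_parity(data_blocks):
    """Calculate the XOR parity of equally sized data blocks."""
    parity = bytearray(len(data_blocks[0]))
    for block in data_blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def find_inconsistent_stripes(stripes):
    """Return the indices of stripes whose stored parity differs from the calculated one.

    stripes is a sequence of (data_blocks, stored_parity) tuples.
    """
    return [
        index
        for index, (data_blocks, stored_parity) in enumerate(stripes)
        if xor_parity(data_blocks) != stored_parity
    ]


# Hypothetical example: the second stripe carries a stale parity block.
stripes = [
    ([b"\x0f\xf0", b"\x33\x33"], b"\x3c\xc3"),
    ([b"\x01\x02", b"\x04\x08"], b"\x00\x00"),
]
bad = find_inconsistent_stripes(stripes)
print(bad)
```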

Areca also recommends performing this type of check at least weekly.

Detailed information can be found in the Consistency Check section of the user manual for the Areca RAID controller as well as in the CLI user guide.

Example:

[root@testserver ~]# cli64 vsf check vol=1

Consistency checks that are in progress can also be interrupted (for example, during performance bottlenecks).

[root@testserver ~]# cli64 vsf stopcheck

Practical Experience

RAID tests were performed on an Intel SR2500 server (BIOS: 94, BMC: 64, FRUSDER: 47) with the Areca ARC-1210-4x SATA controller (BIOS: v1.21, Firmware: 1.46), including a battery backup module, and three hard disks (ST3200826AS). A RAID 5 volume of 20 gigabytes was tested. The check took 5 minutes and 25 seconds, which corresponds to a test throughput of roughly 63 megabytes per second.
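
The throughput figure can be reproduced with a short calculation. The sketch assumes that one gigabyte is counted as 1024 megabytes, which is the conversion that leads to roughly 63 MB/s. The function name is hypothetical.

```python
def throughput_mb_per_s(size_gb, minutes, seconds):
    """Average throughput of a check over a volume of size_gb gigabytes."""
    size_mb = size_gb * 1024  # assumption: 1 GB = 1024 MB
    duration_s = minutes * 60 + seconds
    return size_mb / duration_s


rate = throughput_mb_per_s(20, 5, 25)  # 20480 MB in 325 s
print(f"{rate:.1f} MB/s")  # prints "63.0 MB/s"
```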

Adaptec RAID Controller

With the Adaptec controller, the verify feature tests the disk media.

According to the Adaptec CLI Users Guide v 5.20, the following logical drive tasks are available:

  • verify_fix (Verify with fix) - verifies the disk media and repairs the disk if bad data is found
  • verify - verifies the disk media

Example:

[root@testserver root]# ./arcconf TASK

 Usage: TASK START <Controller#> LOGICALDRIVE <LogicalDrive#> <task> [noprompt]
 Usage: TASK STOP  <Controller#> LOGICALDRIVE <LogicalDrive#> 

 Usage: TASK START <Controller#> DEVICE <Channel# ID#> <task> [noprompt]
 Usage: TASK STOP  <Controller#> DEVICE <Channel# ID#> 
 ======================================================

Performs a task on a logical or physical device

    Task           : Task to be started or performed.

    LogicalDrive#  : logical device ID on which task is to be performed
    Logical Tasks  : verify_fix (Verify with fix)
                     verify
                     clear

    Channel# ID#   : The Channel and ID of the physical device on which task is to be
                     performed.  Optionally ALL indicates all ready drives for initialize
                     task only (ex. ARCCONF TASK START 1 DEVICE ALL INITIALIZE).
    Physical Tasks : verify_fix
                     verify
                     clear
                     initialize
                     secureerase
[root@testserver root]# ./arcconf TASK START 1 LOGICALDRIVE 0 verify
Controllers found: 1
Verify of a Logical Device is a long process.

Are you sure you want to continue?
Press y, then ENTER to continue or press ENTER to abort: y


Command completed successfully.
[root@testserver root]# ./arcconf GETSTATUS 1
Controllers found: 1
Logical device Task:
   Logical device                 : 0
   Task ID                        : 101
   Current operation              : Verify
   Status                         : In Progress
   Priority                       : High
   Percentage complete            : 1


Command completed successfully.
[root@testserver root]#
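
The GETSTATUS output above consists of "key : value" lines, so it can be turned into a dictionary for scripted progress checks. This is a minimal sketch based only on the output shown. The function name parse_getstatus is hypothetical, and with several tasks running at once, later values would overwrite earlier ones.

```python
# Output copied from the GETSTATUS example above.
SAMPLE = """\
Controllers found: 1
Logical device Task:
   Logical device                 : 0
   Task ID                        : 101
   Current operation              : Verify
   Status                         : In Progress
   Priority                       : High
   Percentage complete            : 1


Command completed successfully.
"""


def parse_getstatus(output):
    """Collect all 'key : value' lines; lines without a value are skipped."""
    fields = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        if separator and value.strip():
            fields[key.strip()] = value.strip()
    return fields


fields = parse_getstatus(SAMPLE)
print(fields["Current operation"], fields["Status"], fields["Percentage complete"])
```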

The following document contains additional, detailed information:

Service and Maintenance of Adaptec RAID Solutions

Hard Disks without RAID Controller

Hard disks that are managed directly by the main board (without a RAID controller) can be tested for block errors with the help of SMART tools. For example, smartmontools can be used under Linux. Wikipedia provides a comparison of additional SMART tools.

The following example starts an extended (long) self-test using smartmontools under Linux.

[root@testserver ~]# smartctl -t long /dev/sda
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 115 minutes for test to complete.
Test will complete after Fri Apr 11 12:34:29 2008

Use smartctl -X to abort test.
[root@testserver ~]#

After the test has completed, the result can be retrieved from the self-test log.

[root@testserver ~]# smartctl -l selftest /dev/sda
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       112         -

[root@testserver ~]#
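
The self-test log shown above can be checked automatically, for example to raise an alert when a test did not complete without error. The sketch below assumes the column layout of the example, where columns are separated by at least two spaces. The function name parse_selftest_log is hypothetical.

```python
import re

# Log table copied from the example output above.
SAMPLE = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       112         -
"""


def parse_selftest_log(output):
    """Return one dictionary per '# n' entry of the self-test log table."""
    entries = []
    for line in output.splitlines():
        if not line.startswith("#"):
            continue
        # Assumption: columns are separated by two or more spaces.
        columns = re.split(r"\s{2,}", line.strip())
        if len(columns) != 6:
            continue
        num, description, status, remaining, lifetime, lba = columns
        entries.append({
            "num": int(num.lstrip("# ")),
            "description": description,
            "status": status,
            "remaining_percent": int(remaining.rstrip("%")),
            "lifetime_hours": int(lifetime),
            "lba_of_first_error": None if lba == "-" else lba,
        })
    return entries


entries = parse_selftest_log(SAMPLE)
all_ok = all(entry["status"] == "Completed without error" for entry in entries)
print(entries, all_ok)
```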

If there are errors, correction does not require much effort. Additional information about this can be found at the Smartmontools Project: Bad Block HowTo (see also Analysis of a hard disk with bad blocks using smartctl).

Note: With some SATA hard disks, executing smartctl may also require one of the following flags: -d ata or -d sat (regarding this, see also http://smartmontools.sourceforge.net/#testinghelp).
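
When the test is started from a script, the optional device type flag from the note above can be added to the command line as needed. This is a small sketch. The helper name smartctl_long_test_cmd is hypothetical and it only builds the argument list, it does not run smartctl.

```python
def smartctl_long_test_cmd(device, device_type=None):
    """Build the argument list for starting an extended self-test.

    device_type may be None, "ata" or "sat" (the two values named in the note).
    """
    cmd = ["smartctl"]
    if device_type is not None:
        if device_type not in ("ata", "sat"):
            raise ValueError("device_type must be 'ata' or 'sat'")
        cmd += ["-d", device_type]
    cmd += ["-t", "long", device]
    return cmd


print(smartctl_long_test_cmd("/dev/sda"))
print(smartctl_long_test_cmd("/dev/sda", "sat"))
```

The resulting list can be passed to subprocess.run to start the test.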



Author: Werner Fischer

Werner Fischer, working in the Knowledge Transfer team at Thomas-Krenn, completed his studies of Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at many conferences like LinuxTag, OSMC, OSDC, LinuxCon, and author for various IT magazines. In his spare time he enjoys playing the piano and training for a good result at the annual Linz marathon relay.

