Battery Backup Unit (BBU/BBM) Maintenance for RAID Controllers

Modern RAID controllers have integrated caches to increase performance. Without corresponding protective mechanisms, the content of these caches would be lost when a power failure occurs. For that reason, the cache content is often protected by a BBU or BBM (depending on the manufacturer, the term Battery Backup Unit (BBU) or Battery Backup Module (BBM) is used). However, regular maintenance is required so that the BBU will actually work during a power failure. Without such maintenance, a power failure may in the worst case lead to complete data loss.

Note: RAID controllers that do not use a BBU to protect the cache, but instead copy the content of the cache to flash memory in the event of a power failure (e.g. Adaptec ZMCP or LSI CacheVault), do not require special cache protection maintenance.

BBU & BBM Maintenance Basics

BBUs always consist of two components: a battery and the accompanying electronics.

The battery will be completely charged when it is first placed in service. Through self-discharge, however, the battery loses part of its stored energy over time. For that reason, it is recharged periodically.

Loss of Capacity

Over the course of time, the battery will lose some capacity (that is, the maximum amount of energy it can store will decrease). This behavior is well known from notebook batteries: a new notebook with a new battery might run for three hours on battery power, whereas after three years of use a fully charged battery might last only 40 minutes.

RAID controller manufacturers generally indicate a useful life of up to five years for BBU batteries. The actual life depends on several factors (ambient temperature, number of charge/discharge cycles and so forth). If a battery has only minimal capacity left after several years, it will only be able to protect the cache content for a few minutes during a power failure, even if it has been fully charged. The battery is therefore an expendable component, and its status should be checked periodically. If its capacity has dropped to a minimal level, either the battery or the BBU should be replaced to avoid data loss during a power failure. (Note: The battery itself can be replaced in 3ware controllers; for Adaptec and Areca controllers, the battery is soldered to the electronics, which forces replacement of the entire BBU.)

Backup Duration

Even a new battery with high capacity can only retain the cache content for a limited interval (typically 72 hours). If the power failure were to last several days, the cache content might be lost despite a new battery.

Examples

3ware RAID Controllers

3ware provides the ability to perform a so-called "battery test" with their RAID controllers[1]. This test serves to determine the precise capacity of the battery and thereby determine an estimated value for the potential backup duration during a power failure.

The test aims to produce the most precise estimate possible. To do so, the battery is first fully charged, then completely discharged, and finally automatically recharged in full. The entire process typically takes eight to twelve hours. 3ware recommends performing this test every four weeks.

Important note: During the entire test and the subsequent recharging of the battery, the RAID controller’s cache will be deactivated. Because this may reduce performance, the test should only be performed at times of minimal load.
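The four-week interval can be checked automatically. The following Python sketch assumes that the date of the last test is available in the DD-Mon-YYYY form that appears in the LastCapTest column of the tw_cli /c0 show output. The function name capacity_test_due and the TEST_INTERVAL constant are illustrative choices, not part of any 3ware tool.

```python
from datetime import date, datetime, timedelta

# Interval recommended by 3ware for the battery test (four weeks).
TEST_INTERVAL = timedelta(weeks=4)


def capacity_test_due(last_cap_test: str, today: date) -> bool:
    """Return True if the last battery test is more than four weeks old.

    last_cap_test is assumed to look like '06-Apr-2009'. The month
    abbreviation is parsed with the default (English) locale.
    """
    last = datetime.strptime(last_cap_test, "%d-%b-%Y").date()
    return today - last > TEST_INTERVAL


if __name__ == "__main__":
    print(capacity_test_due("06-Apr-2009", date(2009, 4, 20)))  # False (14 days)
    print(capacity_test_due("06-Apr-2009", date(2009, 6, 1)))   # True (56 days)
```

Because the test deactivates the cache for several hours, such a check is better suited to raising a reminder than to starting the test on its own.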

The status of the BBU can be requested through the 3ware command line interface (CLI), for example:

root@testserver:~# tw_cli /c0 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK             -       -       -       34.4482   ON     OFF   
u1    SPARE     OK             -       -       -       34.4684   -      OFF   

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     34.47 GB    72303840      WD-WMANT1051720    
p1     OK               u0     34.47 GB    72303840      WD-WMANT1051894    
p2     OK               u1     34.47 GB    72303840      WD-WMAKH1083404    
p3     NOT-PRESENT      -      -           -             -

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    06-Apr-2009 

root@testserver:~#
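For monitoring purposes, the BBU row of this output can be read by a script. The following Python sketch parses the table exactly as it is printed above; it assumes that no field contains spaces and that the data row follows the dashed separator line. The function names parse_bbu_row and bbu_looks_healthy are hypothetical, and treating BBUReady = Yes together with Status = OK as "healthy" is an assumption made for this example.

```python
# Excerpt of the `tw_cli /c0 show` output shown above.
SAMPLE = """\
Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    06-Apr-2009
"""


def parse_bbu_row(output: str) -> dict:
    """Return the BBU table row as a dict keyed by the column names."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.split()[:2] == ["Name", "OnlineState"]:
            header = line.split()
            if i + 2 >= len(lines):
                break  # header found, but no data row follows
            # lines[i + 1] is the dashed separator, lines[i + 2] the data row.
            values = lines[i + 2].split()
            return dict(zip(header, values))
    raise ValueError("no BBU table found in output")


def bbu_looks_healthy(row: dict) -> bool:
    """Assumed health rule: the BBU is ready and reports status OK."""
    return row.get("BBUReady") == "Yes" and row.get("Status") == "OK"


if __name__ == "__main__":
    row = parse_bbu_row(SAMPLE)
    print(row["Hours"], row["LastCapTest"], bbu_looks_healthy(row))
```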

You will find additional information about the potential BBU status of 3ware RAID controllers in the 3ware BBU States and Their Effects on Cache Settings article.

Adaptec RAID Controllers

With Adaptec, the battery state can also be queried. For this, the following options are available:

  • Request through the Adaptec CLI, arcconf
  • Request through the Adaptec Storage Manager (ASM)
  • Request through the RAID controller’s BIOS

As long as the battery capacity can retain the cache content for at least 24 hours during a power failure, the RAID controller’s cache will remain in write-back mode (that is, active). At lower capacity levels, the cache will be switched to write-through mode, unless it has been permanently set to write-back mode regardless of the BBU state.
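The 24-hour rule can be expressed as a small check. This Python sketch assumes that the remaining time is given as text in the form arcconf prints it (for example "3 days, 7 hours, 16 minutes"). The function names and the forced_write_back parameter are hypothetical; the code only models the behavior described in the text and does not query or configure the controller.

```python
import re

# Minimum backup time (in hours) for write-back mode, as stated in the text.
WRITE_BACK_MIN_HOURS = 24

_HOURS_PER_UNIT = {"day": 24.0, "hour": 1.0, "minute": 1.0 / 60.0}


def remaining_hours(text: str) -> float:
    """Convert e.g. '3 days, 7 hours, 16 minutes' into a number of hours."""
    total = 0.0
    for number, unit in re.findall(r"(\d+)\s*(day|hour|minute)s?", text):
        total += int(number) * _HOURS_PER_UNIT[unit]
    return total


def expected_cache_mode(time_remaining: str, forced_write_back: bool = False) -> str:
    """Return the cache mode the text describes for a given remaining time."""
    if forced_write_back:
        # Write-back was set permanently, regardless of the BBU state.
        return "write-back"
    if remaining_hours(time_remaining) >= WRITE_BACK_MIN_HOURS:
        return "write-back"
    return "write-through"


if __name__ == "__main__":
    print(expected_cache_mode("3 days, 7 hours, 16 minutes"))
    print(expected_cache_mode("20 hours, 5 minutes"))
```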

Ideal State Status

Requesting Status through the Adaptec CLI

In the output of arcconf GETCONFIG 1 AD, the final lines (the section below Controller Battery Information) are the relevant ones.

linux-k3oa:~ # /usr/StorMan/arcconf GETCONFIG 1 AD
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
   Controller Status                        : Optimal
   Channel description                      : SAS/SATA
   Controller Model                         : Adaptec 5805
   Controller Serial Number                 : 8C35109557F
   Physical Slot                            : 6
   Temperature                              : 70 C/ 158 F (Normal)
   Installed memory                         : 512 MB
   Copyback                                 : Disabled
   Background consistency check             : Disabled
   Automatic Failover                       : Enabled
   Global task priority                     : High
   Performance Mode                         : Default/Dynamic
   Stayawake period                         : Disabled
   Spinup limit internal drives             : 0
   Spinup limit external drives             : 0
   Defunct disk drive count                 : 0
   Logical devices/Failed/Degraded          : 2/0/0
   --------------------------------------------------------
   Controller Version Information
   --------------------------------------------------------
   BIOS                                     : 5.2-0 (16343)
   Firmware                                 : 5.2-0 (16343)
   Driver                                   : 1.1-5 (2456)
   Boot Flash                               : 5.2-0 (16343)
   --------------------------------------------------------
   Controller Battery Information
   --------------------------------------------------------
   Status                                   : Optimal
   Over temperature                         : No
   Capacity remaining                       : 99 percent
   Time remaining (at current draw)         : 3 days, 7 hours, 16 minutes


Command completed successfully.
linux-k3oa:~ #
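To evaluate this report in a script, only the battery section needs to be extracted. The following Python sketch works on the text layout shown above and assumes that the battery section ends at the first blank line; other firmware or arcconf versions may format the report differently. The function name parse_battery_info is hypothetical.

```python
# Battery section of the `arcconf GETCONFIG 1 AD` output shown above.
SAMPLE = """\
   --------------------------------------------------------
   Controller Battery Information
   --------------------------------------------------------
   Status                                   : Optimal
   Over temperature                         : No
   Capacity remaining                       : 99 percent
   Time remaining (at current draw)         : 3 days, 7 hours, 16 minutes


Command completed successfully.
"""


def parse_battery_info(output: str) -> dict:
    """Return the key/value pairs below 'Controller Battery Information'."""
    info = {}
    in_section = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped == "Controller Battery Information":
            in_section = True
            continue
        if not in_section:
            continue
        if not stripped and info:
            break  # a blank line after the values ends the section
        if ":" in stripped:
            key, value = stripped.split(":", 1)
            info[key.strip()] = value.strip()
    return info


if __name__ == "__main__":
    battery = parse_battery_info(SAMPLE)
    capacity = int(battery["Capacity remaining"].split()[0])
    print(battery["Status"], capacity, battery["Time remaining (at current draw)"])
```

A monitoring check could then alert when the status differs from Optimal or when the capacity falls below a site-specific limit.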

Request through the Adaptec Storage Manager (ASM)

[Screenshot: BBU status in the Adaptec Storage Manager (Adaptec-bbu-status-asm.png)]

Request through the RAID controller’s BIOS

[Screenshot: BBU status in the RAID controller’s BIOS (Adaptec-bbu-status-bios.png)]

Charging State Status

Compared with the system above, the time remaining is lower because the battery is not yet completely charged.

linux-kfqr:~ # /usr/StorMan/arcconf GETCONFIG 1 AD
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
   Controller Status                        : Optimal
   Channel description                      : SAS/SATA
   Controller Model                         : Adaptec 5805
   Controller Serial Number                 : 8C3510954C9
   Physical Slot                            : 6
   Temperature                              : 71 C/ 159 F (Normal)
   Installed memory                         : 512 MB
   Copyback                                 : Disabled
   Background consistency check             : Disabled
   Automatic Failover                       : Enabled
   Global task priority                     : High
   Performance Mode                         : Default/Dynamic
   Stayawake period                         : Disabled
   Spinup limit internal drives             : 0
   Spinup limit external drives             : 0
   Defunct disk drive count                 : 0
   Logical devices/Failed/Degraded          : 2/0/0
   --------------------------------------------------------
   Controller Version Information
   --------------------------------------------------------
   BIOS                                     : 5.2-0 (16343)
   Firmware                                 : 5.2-0 (16343)
   Driver                                   : 1.1-5 (2456)
   Boot Flash                               : 5.2-0 (16343)
   --------------------------------------------------------
   Controller Battery Information
   --------------------------------------------------------
   Status                                   : Charging
   Over temperature                         : No
   Capacity remaining                       : 73 percent
   Time remaining (at current draw)         : 2 days, 10 hours, 57 minutes


Command completed successfully.
linux-kfqr:~ # 

Other Status States

Additional potential status states include:

  • Not Installed
  • Failed

Areca RAID Controllers

Areca also offers the ability to request the state through the CLI.

[root@testserver ~]# ./cli64 hw info
Physical Hardware Information
The Hardware Monitor Information
===========================================
Fan#1 Speed (RPM)   : 2673
Battery Status      : 100%
HDD #1  Temp.       : 0
HDD #2  Temp.       : 0
HDD #3  Temp.       : 0
HDD #4  Temp.       : 0
===========================================
GuiErrMsg<0x00>: Success.
[root@testserver ~]#
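The battery charge level can be read from this output with a short script. The Python sketch below assumes the "Battery Status : NN%" line format shown above and returns None when no percentage is reported; the function name areca_battery_percent is hypothetical.

```python
import re
from typing import Optional

# Excerpt of the `cli64 hw info` output shown above.
SAMPLE = """\
The Hardware Monitor Information
===========================================
Fan#1 Speed (RPM)   : 2673
Battery Status      : 100%
HDD #1  Temp.       : 0
===========================================
"""


def areca_battery_percent(output: str) -> Optional[int]:
    """Return the battery charge in percent, or None if no value is found."""
    match = re.search(r"^Battery Status\s*:\s*(\d+)%", output, re.MULTILINE)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(areca_battery_percent(SAMPLE))  # 100
```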

Areca describes the following approach to checking the proper functionality of the BBM in their documentation[2]. However, we recommend this approach only for test systems; for production systems, we strongly recommend replacing the battery in case of doubt.

  1. Write a large file (5 gigabytes, for example).
  2. Once the write process has completed, pull the plug immediately.
  3. Check the BBM status. The BBM should beep every couple of seconds.
  4. Restart the system and open the controller’s BIOS using the Tab or F6 key.
  5. Check the controller’s event log from the controller’s BIOS. An entry indicating a controller boot-up with power recovered should appear in the log.

As noted above, we recommend against this method of testing for production systems.

References

  1. 3ware SAS/SATA RAID Software User Guide page 203 (section Testing Battery Capacity)
  2. Areca SATA RAID Cards USER Manual page 144 (the Battery Functionality Test Procedure section)


Author: Werner Fischer

Werner Fischer, working in the Web Operations & Knowledge Transfer team at Thomas-Krenn, completed his studies of Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at many conferences like LinuxTag, OSMC, OSDC, LinuxCon, and author for various IT magazines. In his spare time he enjoys playing the piano and training for a good result at the annual Linz marathon relay.


Related articles

Creating an onboard Intel RAID
Querying RAID Status
RAID Consistency Check