Battery Backup Unit (BBU/BBM) Maintenance for RAID Controllers
Please note: This article refers to older software or hardware components, or is no longer maintained for other reasons. The page is no longer updated and remains available in the archive for reference purposes only.
Modern RAID controllers have integrated caches for increasing performance. Without corresponding protective mechanisms, the content of these caches would be lost when a power failure occurs. For that reason, the cache content is often protected by a BBU or BBM (depending on the manufacturer, either the term Battery Backup Unit (BBU) or Battery Backup Module (BBM) is used). However, proper maintenance is required so that the BBU will actually work during a power failure. Without such maintenance, a power failure may, in the worst case, lead to complete data loss.
Note: RAID controllers that do not use a BBU to protect the cache, but instead copy the content of the cache to flash memory in the event of a power failure (e.g. Adaptec ZMCP or LSI CacheVault), do not require special cache protection maintenance.
BBU & BBM Maintenance Basics
BBUs always consist of two components:
- electronics for controlling and communicating with RAID controllers
- a rechargeable battery
The battery will be completely charged when it is first placed in service. Through self-discharge, however, the battery will lose part of its stored energy over time. For that reason, it will be periodically re-charged.
Loss of Capacity
Over the course of time, the battery will lose some capacity (thus, the maximum storable amount of energy will decrease). This behavior is familiar from notebook batteries: a new notebook with a new battery might run for three hours on battery power, whereas after three years of use the same battery, fully charged, might last only 40 minutes.
RAID controller manufacturers generally indicate a useful period of up to five years for BBU batteries. The actual life will depend upon several factors (environmental temperature, number of charge/discharge cycles and so forth). If a battery has only a minimal capacity after several years, it will only be able to protect the cache content for a few minutes during a power failure (even if the battery has been fully charged). Thus, the battery is an expendable component. Its status should be checked periodically. If its capacity has reached a minimal state, either the battery or the BBU should be replaced to avoid data loss during a power failure. (Note: The battery itself can be replaced in 3ware controllers; for Adaptec and Areca controllers, the battery is soldered to the electronics, which forces replacement of the entire BBU.)
Backup Duration
Even a new battery with high capacity can only retain the cache content for a limited interval (typically 72 hours). If the power failure were to last several days, the cache content might be lost despite a new battery.
Examples
3ware RAID Controllers
3ware provides the ability to perform a so-called "battery test" with their RAID controllers[1]. This test serves to determine the precise capacity of the battery and thereby determine an estimated value for the potential backup duration during a power failure.
The objective of this test is the determination of the most precise estimated value possible. For this, the battery will first be fully charged. After that, a complete discharge cycle will be started. At the end of this test, the battery will automatically be completely re-charged. The entire process typically takes eight to twelve hours. 3ware recommends performing this test every four weeks.
Important note: During the entire test and the subsequent re-charging of the battery, the RAID controller’s cache will be deactivated. Because this can reduce performance, the test should only be performed at times when the load is minimal.
For example, the status of the BBU can be requested through the 3ware command line interface (CLI).
root@testserver:~# tw_cli /c0 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK             -       -       -       34.4482   ON     OFF
u1    SPARE     OK             -       -       -       34.4684   -      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     34.47 GB    72303840      WD-WMANT1051720
p1     OK               u0     34.47 GB    72303840      WD-WMANT1051894
p2     OK               u1     34.47 GB    72303840      WD-WMAKH1083404
p3     NOT-PRESENT      -      -           -             -

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours    LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255      06-Apr-2009

root@testserver:~#
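The `bbu` row of the output above can also be evaluated by a script, for example to flag a capacity test that is older than the four-week interval mentioned earlier. The following is a minimal sketch, not a 3ware tool: the function names `parse_bbu_row` and `capacity_test_overdue` are hypothetical, the column list is copied from the sample output, and the sketch assumes that `LastCapTest` always has the `DD-Mon-YYYY` form shown there with English month abbreviations.

```python
from datetime import date, datetime, timedelta

# Column names as they appear in the BBU table of the sample output.
BBU_COLUMNS = ["Name", "OnlineState", "BBUReady", "Status",
               "Volt", "Temp", "Hours", "LastCapTest"]


def parse_bbu_row(output: str) -> dict:
    """Return the 'bbu' row of the controller summary as a dict (hypothetical helper)."""
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "bbu" and len(fields) == len(BBU_COLUMNS):
            return dict(zip(BBU_COLUMNS, fields))
    raise ValueError("no bbu row found in output")


def capacity_test_overdue(row: dict, today: date, max_age_days: int = 28) -> bool:
    """True if the last capacity test is more than max_age_days old.

    Assumption: LastCapTest looks like '06-Apr-2009' (as in the sample).
    """
    last = datetime.strptime(row["LastCapTest"], "%d-%b-%Y").date()
    return today - last > timedelta(days=max_age_days)


# Sample text copied from the BBU part of the output shown above.
SAMPLE = """\
Name  OnlineState  BBUReady  Status    Volt     Temp     Hours    LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255      06-Apr-2009
"""

row = parse_bbu_row(SAMPLE)
print(row["Status"], row["Hours"])                          # OK 255
print(capacity_test_overdue(row, today=date(2009, 5, 20)))  # True (44 days old)
```

In practice the text would come from running the CLI (for instance via `subprocess.run`) rather than from a pasted sample; the sample keeps the sketch independent of installed hardware.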
You will find additional information about the potential BBU status of 3ware RAID controllers in the 3ware BBU States and Their Effects on Cache Settings article.
Adaptec RAID Controllers
With Adaptec, the battery state can also be queried. For this, the following options are available:
- Request through the Adaptec CLI, arcconf
- Request through the Adaptec Storage Manager (ASM)
- Request through the RAID controller’s BIOS
As long as the battery capacity can retain the cache content for at least 24 hours during a power failure, the RAID controller’s cache will remain in write-back mode (thus, active). At lower capacity levels, the cache will be placed in write-through mode, unless write-back mode has been permanently enabled, meaning without regard for the BBU state.
Ideal State Status
Requesting Status through the Adaptec CLI
In the report from arcconf GETCONFIG 1 AD, the final lines (the section below Controller Battery Information) are the relevant ones.
linux-k3oa:~ # /usr/StorMan/arcconf GETCONFIG 1 AD
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status                        : Optimal
Channel description                      : SAS/SATA
Controller Model                         : Adaptec 5805
Controller Serial Number                 : 8C35109557F
Physical Slot                            : 6
Temperature                              : 70 C/ 158 F (Normal)
Installed memory                         : 512 MB
Copyback                                 : Disabled
Background consistency check             : Disabled
Automatic Failover                       : Enabled
Global task priority                     : High
Performance Mode                         : Default/Dynamic
Stayawake period                         : Disabled
Spinup limit internal drives             : 0
Spinup limit external drives             : 0
Defunct disk drive count                 : 0
Logical devices/Failed/Degraded          : 2/0/0
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS                                     : 5.2-0 (16343)
Firmware                                 : 5.2-0 (16343)
Driver                                   : 1.1-5 (2456)
Boot Flash                               : 5.2-0 (16343)
--------------------------------------------------------
Controller Battery Information
--------------------------------------------------------
Status                                   : Optimal
Over temperature                         : No
Capacity remaining                       : 99 percent
Time remaining (at current draw)         : 3 days, 7 hours, 16 minutes

Command completed successfully.
linux-k3oa:~ #
Request through the Adaptec Storage Manager (ASM)
Request through the RAID controller’s BIOS
Charging State Status
Compared with the system above, the time remaining is lower because the battery is not yet fully charged.
linux-kfqr:~ # /usr/StorMan/arcconf GETCONFIG 1 AD
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status                        : Optimal
Channel description                      : SAS/SATA
Controller Model                         : Adaptec 5805
Controller Serial Number                 : 8C3510954C9
Physical Slot                            : 6
Temperature                              : 71 C/ 159 F (Normal)
Installed memory                         : 512 MB
Copyback                                 : Disabled
Background consistency check             : Disabled
Automatic Failover                       : Enabled
Global task priority                     : High
Performance Mode                         : Default/Dynamic
Stayawake period                         : Disabled
Spinup limit internal drives             : 0
Spinup limit external drives             : 0
Defunct disk drive count                 : 0
Logical devices/Failed/Degraded          : 2/0/0
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS                                     : 5.2-0 (16343)
Firmware                                 : 5.2-0 (16343)
Driver                                   : 1.1-5 (2456)
Boot Flash                               : 5.2-0 (16343)
--------------------------------------------------------
Controller Battery Information
--------------------------------------------------------
Status                                   : Charging
Over temperature                         : No
Capacity remaining                       : 73 percent
Time remaining (at current draw)         : 2 days, 10 hours, 57 minutes

Command completed successfully.
linux-kfqr:~ #
Other Status States
Additional potential status states include:
- Not Installed
- Failed
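For monitoring purposes, the battery section of the arcconf report can be extracted and compared against the 24-hour threshold described above. The sketch below is an illustration under assumptions, not part of the Adaptec tooling: the helper names (`parse_battery_section`, `remaining_hours`, `meets_backup_threshold`) are hypothetical, the field labels are taken from the sample reports in this article, and treating the states Failed and Not Installed as "threshold not met" is an assumption of this sketch rather than documented controller behavior.

```python
import re

BACKUP_THRESHOLD_HOURS = 24  # threshold named in the text above


def parse_battery_section(output: str) -> dict:
    """Collect 'key : value' pairs that follow the battery heading."""
    info = {}
    in_section = False
    for line in output.splitlines():
        if "Controller Battery Information" in line:
            in_section = True
            continue
        if in_section and " : " in line:
            key, value = line.split(" : ", 1)
            info[key.strip()] = value.strip()
    return info


def remaining_hours(text: str) -> float:
    """Convert e.g. '2 days, 10 hours, 57 minutes' to hours."""
    units = {"day": 24.0, "hour": 1.0, "minute": 1.0 / 60.0}
    total = 0.0
    for amount, unit in re.findall(r"(\d+)\s+(day|hour|minute)s?", text):
        total += int(amount) * units[unit]
    return total


def meets_backup_threshold(info: dict) -> bool:
    """Apply the 24-hour rule; Failed / Not Installed never pass (assumption)."""
    if info.get("Status") in ("Failed", "Not Installed"):
        return False
    hours = remaining_hours(info.get("Time remaining (at current draw)", ""))
    return hours >= BACKUP_THRESHOLD_HOURS


# Battery section copied from the "Charging" report above.
SAMPLE = """\
Controller Battery Information
--------------------------------------------------------
Status                                   : Charging
Over temperature                         : No
Capacity remaining                       : 73 percent
Time remaining (at current draw)         : 2 days, 10 hours, 57 minutes

Command completed successfully.
"""

info = parse_battery_section(SAMPLE)
print(info["Status"])                 # Charging
print(meets_backup_threshold(info))   # True
```

The check only mirrors the rule as stated in this article; whether the controller actually runs in write-back mode should still be confirmed from the controller itself.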
Areca RAID Controllers
Areca also offers the ability to request the state through the CLI.
[root@testserver ~]# ./cli64 hw info
Physical Hardware Information
The Hardware Monitor Information
===========================================
Fan#1 Speed (RPM) : 2673
Battery Status    : 100%
HDD #1 Temp.      : 0
HDD #2 Temp.      : 0
HDD #3 Temp.      : 0
HDD #4 Temp.      : 0
===========================================
GuiErrMsg<0x00>: Success.
[root@testserver ~]#
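The battery charge level in this output can be read by a script as well. The following sketch assumes the `Battery Status : NN%` line format shown above; the function name `battery_percent` is hypothetical, and any other status text the controller might print is simply reported as `None` rather than interpreted.

```python
import re


def battery_percent(output: str):
    """Return the battery charge in percent, or None if no percentage is shown."""
    match = re.search(r"^\s*Battery Status\s*:\s*(\d+)%", output, re.MULTILINE)
    return int(match.group(1)) if match else None


# Excerpt copied from the sample output above.
SAMPLE = """\
The Hardware Monitor Information
===========================================
Fan#1 Speed (RPM) : 2673
Battery Status    : 100%
HDD #1 Temp.      : 0
===========================================
"""

print(battery_percent(SAMPLE))  # 100
```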
In their documentation[2], Areca describes the following approach to checking that the BBM functions properly. However, we recommend this approach only for test systems. For production systems, we strongly recommend replacing the battery in case of doubt.
- Write a large file, 5 gigabytes for example
- Once the write process has completed, pull the plug immediately.
- Check the BBM status. It should beep every couple of seconds.
- Re-start the system and open the controller’s BIOS using the Tab or F6 keys.
- Check the controller’s event log from the controller’s BIOS. An entry indicating that the controller booted up with power recovered should appear in the log.
As noted above, we recommend against this method of testing for production systems.
References
- [1] 3ware SAS/SATA RAID Software User Guide, page 203 (section Testing Battery Capacity)
- [2] Areca SATA RAID Cards User Manual, page 144 (section Battery Functionality Test Procedure)
Author: Werner Fischer
Werner Fischer, working in the Knowledge Transfer team at Thomas-Krenn, completed his studies of Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at many conferences like LinuxTag, OSMC, OSDC, LinuxCon, and author for various IT magazines. In his spare time he enjoys playing the piano and training for a good result at the annual Linz marathon relay.