Linux Software RAID
Linux Software RAID (often called mdraid or MD/RAID) makes it possible to use RAID without a hardware RAID controller. The storage media (hard disks, SSDs and so forth) are simply connected to the computer as individual drives, for example directly to the SATA ports on the motherboard.
In contrast to software RAID, hardware RAID controllers generally have a built-in cache (often 512 MB or 1 GB), which can be protected by a BBU (battery backup unit) or ZMCP (zero maintenance cache protection). With both hardware and software RAID arrays, the write caches of the hard disks should be disabled to avoid data loss during power failures. SSDs with integrated capacitors (such as the Intel 320 Series), which write the contents of their cache to flash memory during a power failure, are an exception to this.
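A minimal sketch of how the write cache of SATA hard disks can be disabled with hdparm, where `-W0` switches write caching off and `-W` without a value queries the current setting. The helper name `disable_write_cache` and the `DRY_RUN` variable are hypothetical; with `DRY_RUN=1` the commands are only printed. Depending on the drive and distribution, the setting may be lost after a power cycle and would then have to be reapplied at boot.

```shell
# Hypothetical helper: switch off the volatile write cache of each given disk.
# With DRY_RUN=1 the hdparm commands are printed instead of executed.
disable_write_cache() {
    for dev in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "hdparm -W0 $dev"
        else
            hdparm -W0 "$dev" || return 1
        fi
    done
}
```

For example, `disable_write_cache /dev/sda /dev/sdb` followed by `hdparm -W /dev/sda` to verify the result.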
- 1 Functional Approach
- 2 RAID Superblock
- 3 Creating a RAID Array
- 4 Creating a RAID 1
- 5 Deleting a RAID Array
- 6 Roadmap
- 7 References
- 8 Additional Information
Linux Software RAID supports, among others, the following RAID levels:
- RAID 0
- RAID 1
- RAID 4
- RAID 5
- RAID 6
- RAID 10
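The RAID levels differ in how much of the raw capacity remains usable. As a rough orientation, for n members of equal size s: RAID 0 yields n*s, RAID 1 yields s, RAID 4 and RAID 5 yield (n-1)*s, RAID 6 yields (n-2)*s, and RAID 10 yields n*s/2, assuming md's default near-2 layout. The sketch below encodes these formulas; the function name `usable_capacity` is hypothetical, and superblock and data offset overhead is ignored.

```shell
# usable_capacity LEVEL N SIZE
# Prints the usable capacity (in the unit of SIZE) of an array built from
# N equally sized members. RAID 10 assumes the default near-2 layout.
usable_capacity() {
    level=$1 n=$2 size=$3
    case $level in
        0)   echo $(( n * size )) ;;
        1)   echo "$size" ;;
        4|5) echo $(( (n - 1) * size )) ;;
        6)   echo $(( (n - 2) * size )) ;;
        10)  echo $(( n * size / 2 )) ;;
        *)   echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
}
```

For example, `usable_capacity 5 4 2000` prints 6000: four 2,000 GB disks in a RAID 5 provide roughly 6,000 GB.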
RAID Superblock
Linux Software RAID stores all of the necessary information about a RAID array in a superblock. Where this superblock is located on the device depends on the metadata version.
Superblock Metadata Version 0.90
The version 0.90 superblock is 4,096 bytes long and is written to a 64 KiB-aligned block at the end of the device. Depending on the device size, it starts at least 64 KiB and less than 128 KiB before the end of the device. To calculate the address of the superblock, round the device size down to a multiple of 64 KiB and then subtract 64 KiB from the result.
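The calculation can be written as a small shell function. A sketch, assuming the device size is given in bytes; the name `sb090_offset` is hypothetical. For the 10 GiB partitions used later in this article (10,737,418,240 bytes, already a multiple of 64 KiB), the superblock starts exactly 64 KiB before the end of the partition.

```shell
# sb090_offset DEVICE_SIZE_BYTES
# Prints the byte offset of the 0.90 superblock: round the device size down
# to a multiple of 64 KiB, then subtract 64 KiB.
sb090_offset() {
    size=$1
    block=65536    # 64 KiB
    echo $(( (size / block) * block - block ))
}
```

For example, `sb090_offset 10737418240` prints 10737352704.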
Version 0.90 Metadata Limitations:
- A maximum of 28 devices per array
- A maximum size of 2 TiB per device
- No support for bad block management
Superblock Metadata Version 1.*
The position of the superblock depends on the version of the metadata:
- Version 1.0: The superblock is located near the end of the device (at least 8 KiB and less than 12 KiB before the end).
- Version 1.1: The superblock is located at the beginning of the device.
- Version 1.2: The superblock is located 4 KiB after the beginning of the device.
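The placement rules can be sketched as follows, with offsets in 512-byte sectors. For version 1.0, this sketch assumes the rule described in the Linux Raid Wiki (at least 8 KiB and less than 12 KiB before the end, aligned to 4 KiB) and implements it as "subtract 8 KiB, then round down to a multiple of 4 KiB"; the function name `sb1_offset` is hypothetical. For version 1.2, the result of 8 sectors matches the "Super Offset : 8 sectors" reported by mdadm -E in the alignment example.

```shell
# sb1_offset VERSION DEVICE_SECTORS
# Prints the superblock offset in 512-byte sectors for 1.x metadata.
sb1_offset() {
    version=$1 sectors=$2
    case $version in
        1.0) echo $(( (sectors - 16) / 8 * 8 )) ;;  # end minus 8 KiB, rounded down to 4 KiB
        1.1) echo 0 ;;                              # start of the device
        1.2) echo 8 ;;                              # 4 KiB after the start
        *)   echo "unknown metadata version: $version" >&2; return 1 ;;
    esac
}
```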
Creating a RAID Array
The following example shows how to create a RAID 1 array on a Fedora 15 live system.
The software RAID array will span /dev/sda1 and /dev/sdb1. Both partitions have the partition type Linux raid autodetect (fd):
[root@localhost ~]# fdisk -l /dev/sda
Disk /dev/sda: 120.0 GB, 120034123776 bytes
139 heads, 49 sectors/track, 34421 cylinders, total 234441648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x2d0f2eb3
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048    20973567    10485760   fd  Linux raid autodetect
[root@localhost ~]# fdisk -l /dev/sdb
Disk /dev/sdb: 120.0 GB, 120034123776 bytes
139 heads, 49 sectors/track, 34421 cylinders, total 234441648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe69ef1f5
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048    20973567    10485760   fd  Linux raid autodetect
[root@localhost ~]#
Creating a RAID 1
[root@localhost ~]# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm: Note: this array has metadata at the start and may not be suitable as a boot device. If you plan to store '/boot' on this device please ensure that your boot-loader understands md/v1.x metadata, or use --metadata=0.90
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
[root@localhost ~]#
The progress of the initialization can be checked via the proc file system or with mdadm:
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1 sda1
      10484664 blocks super 1.2 [2/2] [UU]
      [========>............]  resync = 42.3% (4440832/10484664) finish=0.4min speed=201856K/sec
unused devices: <none>
[root@localhost ~]#
[root@localhost ~]# mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Tue Jul 26 07:49:50 2011
     Raid Level : raid1
     Array Size : 10484664 (10.00 GiB 10.74 GB)
  Used Dev Size : 10484664 (10.00 GiB 10.74 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent
    Update Time : Tue Jul 26 07:50:23 2011
          State : active, resyncing
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
 Rebuild Status : 62% complete
           Name : localhost.localdomain:0  (local to host localhost.localdomain)
           UUID : 3a8605c3:bf0bc5b3:823c9212:7b935117
         Events : 11
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
[root@localhost ~]#
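For scripts, the resync progress can be extracted from /proc/mdstat. A minimal sketch with awk, assuming the usual mdstat layout in which the progress appears as `resync = NN.N%` or `recovery = NN.N%` on a line following the `mdX : ...` line; the function name `resync_progress` is hypothetical.

```shell
# resync_progress: reads /proc/mdstat-formatted text on stdin and prints
# "<array> <percent>" for every array that is currently resyncing or recovering.
resync_progress() {
    awk '
        /^md[0-9]+ : / { array = $1 }    # remember the current array name
        {
            for (i = 2; i < NF; i++)
                if ($i == "=" && ($(i - 1) == "resync" || $(i - 1) == "recovery")) {
                    pct = $(i + 1)
                    sub(/%$/, "", pct)   # strip the trailing percent sign
                    print array, pct
                }
        }
    '
}
```

Called as `resync_progress < /proc/mdstat`, it would print `md0 42.3` for the state shown above.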
Testing the Alignment
The example uses version 1.2 metadata. The superblock is therefore located near the beginning of the device, followed by the actual data, which starts at a 1 MiB boundary (Data Offset: 2048 sectors of 512 bytes each):
[root@localhost ~]# mdadm -E /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 3a8605c3:bf0bc5b3:823c9212:7b935117
           Name : localhost.localdomain:0  (local to host localhost.localdomain)
  Creation Time : Tue Jul 26 07:49:50 2011
     Raid Level : raid1
   Raid Devices : 2
 Avail Dev Size : 20969472 (10.00 GiB 10.74 GB)
     Array Size : 20969328 (10.00 GiB 10.74 GB)
  Used Dev Size : 20969328 (10.00 GiB 10.74 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 10384215:18a75991:4f09b97b:1960b8cd
    Update Time : Tue Jul 26 07:50:43 2011
       Checksum : ea435554 - correct
         Events : 18
    Device Role : Active device 0
    Array State : AA ('A' == active, '.' == missing)
[root@localhost ~]#
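The absolute position of the array data on the disk is the partition start sector plus the data offset. A minimal sketch that checks whether this position falls on a 1 MiB boundary, assuming 512-byte sectors; the function name `data_start_aligned` is hypothetical. With the values from the example (partition start 2048 from fdisk, Data Offset 2048 from mdadm -E), the data starts at sector 4096, i.e. at 2 MiB. A partition starting at sector 63, as created by older partitioning tools, would not be aligned.

```shell
# data_start_aligned PART_START_SECTOR DATA_OFFSET_SECTORS
# Prints the absolute byte offset of the array data on the disk and returns
# 0 if it lies on a 1 MiB boundary (512-byte sectors assumed), 1 otherwise.
data_start_aligned() {
    start_bytes=$(( ($1 + $2) * 512 ))
    echo "$start_bytes"
    [ $(( start_bytes % 1048576 )) -eq 0 ]
}
```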
The default size of the data offset depends on the mdadm version:
- Note: the current development version of mdadm allows the size of the data offset to be specified manually (for --create and --grow, but not for --add): Add --data-offset flag for Create and Grow
- since mdadm-3.2.5: 128 MiB Data Offset (262144 sectors), if possible: super1: fix choice of data_offset. (14.05.2012): While it is nice to set a high data_offset to leave plenty of head room it is much more important to leave enough space to allow of the data of the array. So after we check that sb->size is still available, only reduce the 'reserved', don't increase it. This fixes a bug where --adding a spare fails because it does not have enough space in it.
- since mdadm-3.2.4: 128 MiB Data Offset (262144 sectors) super1: leave more space in front of data by default. (04.04.2012): The kernel is growing the ability to avoid the need for a backup file during reshape by being able to change the data offset. For this to be useful we need plenty of free space before the data so the data offset can be reduced. So for v1.1 and v1.2 metadata make the default data_offset much larger. Aim for 128Meg, but keep a power of 2 and don't use more than 0.1% of each device. Don't change v1.0 as that is used when the data_offset is required to be zero.
- since mdadm-3.1.2: 1 MiB Data Offset (2048 sectors) super1: encourage data alignment on 1Meg boundary (03.03.2010): For 1.1 and 1.2 metadata where data_offset is not zero, it is important to align the data_offset to underlying block size. We don't currently have access to the particular device in avail_size so just try to force to a 1Meg boundary. Also default 1.x metadata to 1.2 as documented. (see also Re: Mixing mdadm versions)
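The default rule described in the mdadm-3.2.4 commit message can be illustrated as follows. This is a simplified sketch, not mdadm's actual code: it starts at 128 MiB (262144 sectors), keeps a power of two, and halves the value while it exceeds about 0.1% of the device. The 1 MiB (2048-sector) lower bound and the function name `default_data_offset` are assumptions of this sketch. The real mdadm also reserves space for the superblock and bitmap, and later versions adjust the value further (see the 3.2.5 entry), so actual offsets can differ.

```shell
# default_data_offset DEVICE_SECTORS
# Simplified illustration of the mdadm >= 3.2.4 rule: aim for 128 MiB,
# keep a power of two, use no more than about 0.1% of the device.
# The 2048-sector (1 MiB) floor is an assumption of this sketch.
default_data_offset() {
    dev_sectors=$1
    offset=262144    # 128 MiB in 512-byte sectors
    while [ $(( offset * 1000 )) -gt "$dev_sectors" ] && [ "$offset" -gt 2048 ]; do
        offset=$(( offset / 2 ))
    done
    echo "$offset"
}
```

For a 10 GiB member (20971520 sectors) the sketch yields 16384 sectors (8 MiB); for the 120 GB SSDs (234441648 sectors) it yields 131072 sectors (64 MiB).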
Adjusting the Sync Rate
A RAID volume can be used immediately after creation, even while it is still synchronizing. However, I/O to the volume during this time reduces the synchronization rate.
In the following example, a RAID 1 array spans two entire SSDs (/dev/sda and /dev/sdb, without partitions). Synchronization starts at roughly 200 MB/s and drops to about 2.5 MB/s as soon as data is written to the file system on the array:
[root@localhost ~]# dd if=/dev/urandom of=/mnt/testfile-1-1G bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 115.365 s, 9.3 MB/s
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb sda
      117219728 blocks super 1.2 [2/2] [UU]
      [============>........]  resync = 63.3% (74208384/117219728) finish=279.5min speed=2564K/sec
unused devices: <none>
[root@localhost ~]#
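The finish estimate corresponds approximately to the remaining 1 KiB blocks divided by the current speed in KiB/s. A minimal sketch of this arithmetic; the function name `resync_eta_seconds` is hypothetical, and the kernel computes its own estimate from the recent throughput, so the values may differ slightly. For the values above, (117219728 - 74208384) / 2564 gives 16775 seconds, about 280 minutes, in line with finish=279.5min.

```shell
# resync_eta_seconds DONE_BLOCKS TOTAL_BLOCKS SPEED_KIB_PER_S
# Prints the remaining resync time in whole seconds
# (remaining 1 KiB blocks divided by the current speed).
resync_eta_seconds() {
    done_blocks=$1 total=$2 speed=$3
    [ "$speed" -gt 0 ] || return 1
    echo $(( (total - done_blocks) / speed ))
}
```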
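The synchronization can be accelerated by manually raising the minimum sync rate (speed_limit_min, in KiB/s). While the array is busy with other I/O, md throttles the resync down towards this minimum: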
[root@localhost ~]# cat /proc/sys/dev/raid/speed_limit_max
200000
[root@localhost ~]# cat /proc/sys/dev/raid/speed_limit_min
1000
[root@localhost ~]# echo 100000 > /proc/sys/dev/raid/speed_limit_min
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb sda
      117219728 blocks super 1.2 [2/2] [UU]
      [============>........]  resync = 64.2% (75326528/117219728) finish=41.9min speed=16623K/sec
unused devices: <none>
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb sda
      117219728 blocks super 1.2 [2/2] [UU]
      [=============>.......]  resync = 66.3% (77803456/117219728) finish=7.4min speed=88551K/sec
unused devices: <none>
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb sda
      117219728 blocks super 1.2 [2/2] [UU]
      [=============>.......]  resync = 66.4% (77938688/117219728) finish=6.4min speed=101045K/sec
unused devices: <none>
[root@localhost ~]#
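The values in speed_limit_min and speed_limit_max are given in KiB/s and apply to all md arrays; per-array limits are available in sysfs as /sys/block/mdX/md/sync_speed_min and sync_speed_max. Values written to /proc are lost on reboot; to make them persistent, the corresponding sysctl keys (for example dev.raid.speed_limit_min) can be set in /etc/sysctl.conf. A minimal sketch of a helper that sets both limits and refuses a minimum above the maximum; the function name `set_md_speed_limits` and its optional directory argument (so that it can be tried on a scratch directory) are hypothetical.

```shell
# set_md_speed_limits MIN_KIB MAX_KIB [DIR]
# Writes the system-wide md resync limits in KiB/s.
# DIR defaults to /proc/sys/dev/raid; another directory can be passed for testing.
set_md_speed_limits() {
    min=$1 max=$2 dir=${3:-/proc/sys/dev/raid}
    if [ "$min" -gt "$max" ]; then
        echo "speed_limit_min ($min) must not exceed speed_limit_max ($max)" >&2
        return 1
    fi
    echo "$max" > "$dir/speed_limit_max" || return 1
    echo "$min" > "$dir/speed_limit_min"
}
```

For the example above, `set_md_speed_limits 100000 200000` would have the same effect as the echo command.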
Deleting a RAID Array
If a RAID volume is no longer required, it can be deactivated using the following command:
[root@localhost ~]# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
[root@localhost ~]#
The following commands delete the superblocks of the individual member devices (in this case /dev/sda1 and /dev/sdb1 from the example above). Afterwards, these partitions can be reused for new RAID arrays.
[root@localhost ~]# mdadm --zero-superblock /dev/sda1
[root@localhost ~]# mdadm --zero-superblock /dev/sdb1
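The two steps can be combined into a small helper. A sketch, assuming the file system on the array has already been unmounted (mdadm --stop fails while the array is in use); `md_teardown`, `md_run` and `DRY_RUN` are hypothetical names, and with `DRY_RUN=1` the commands are only printed. Zeroing a superblock cannot be undone, so the device list should be double-checked before running it for real.

```shell
# Run a command, or only print it when DRY_RUN=1.
md_run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# md_teardown ARRAY MEMBER...
# Stops the array, then erases the md superblock on every member device.
md_teardown() {
    array=$1
    shift
    md_run mdadm --stop "$array" || return 1
    for member in "$@"; do
        md_run mdadm --zero-superblock "$member" || return 1
    done
}
```

For the example above: `md_teardown /dev/md0 /dev/sda1 /dev/sdb1`.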
Roadmap
Neil Brown published a roadmap for MD/RAID for 2011 on his blog:
- MD/RAID road-map 2011 (neil.brown.name)
Support for the ATA TRIM feature of SSDs (discard support in Linux Software RAID) is discussed periodically. However, as of the end of June 2011, this feature was still at the end of the list of planned features:
- Re: Software RAID and TRIM (Neil Brown, linux-raid Mailing List), see also Complete discussion regarding software RAID and TRIM from June & July of 2011
References
- mdadm (en.wikipedia.org)
- ALERT: md/raid6 data corruption risk. (lkml.org, Neil Brown, 18.08.2014)
- RAID superblock formats - The version-0.90 Superblock Format (Linux Raid Wiki)
- does 3.1 offer (2): Storage and File Systems: Software RAID and Device Mapper (heise Open Kernel Log)
- RAID superblock formats - Sub-versions of the version-1 superblock (Linux Raid Wiki)
- SSDs vs. md/sync_speed_(min|max) (Lutz Vieweg, linux-raid mailing list, 18.07.2011)
Additional Information
- The Software-RAID HOWTO
- Linux Raid Wiki (raid.wiki.kernel.org)
- RAID Setup (raid.wiki.kernel.org)
- Workshop – Setting up a software RAID array under Linux (tecchannel.de, 17.04.2011)
- Quick HOWTO : Ch26 : Linux Software RAID (linuxhomenetworking.com)
- linux-raid Mailing List
- Ubuntu server installation using a software RAID array