VDO - Virtual Data Optimizer

From Thomas-Krenn-Wiki

VDO (Virtual Data Optimizer) is a Linux device mapper target that provides inline deduplication, compression and thin provisioning at the block level. VDO is managed via LVM and can be integrated into existing storage stacks.

Central functions

VDO offers three central functions:

  • deduplication
  • compression
  • thin provisioning

VDO is fully transparent to applications. All three functions are invisible to the applications that use the storage; they read and write blocks exactly as if VDO were not present.

Deduplication

Deduplication reduces the required storage by eliminating redundant copies of identical blocks.

If the same data is written several times, inline deduplication recognizes the existing data. Instead of writing the data again, only a reference to the original block is saved.

Example:

  • A picture (Auto.jpg, file size 5 MB) has already been saved.
  • This picture is now used in a presentation and a PDF is generated. The PDF is 5 MB larger than it would be without the picture.
  • However, the physical storage actually occupied does not grow as a result of inserting the picture. Deduplication only writes references to the existing data blocks of the image.


VDO maintains a mapping of logical block addresses to physical block addresses on the storage layer below VDO. After deduplication, several logical block addresses can be mapped to the same physical block address. These are referred to as shared blocks.


If a shared block is overwritten, a new physical block is assigned to store the new block data.
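
The effect of block-level deduplication can be approximated from user space. The following sketch splits a file into 4 KB blocks, hashes each block and compares the total number of blocks with the number of distinct hashes. The function name count_blocks and the use of sha256sum are assumptions made for illustration only; VDO uses its own index and fingerprints internally, and the sketch is meant for small test files.

```shell
# Sketch: count total and distinct 4 KB blocks in a file.
# Assumption: sha256sum stands in for VDO's internal block fingerprint.
count_blocks() {
    local file tmpdir total unique
    file="$1"
    tmpdir=$(mktemp -d) || return 1
    # Split the file into 4096-byte pieces; the last piece may be shorter.
    split -b 4096 -a 8 "$file" "$tmpdir/blk."
    # $(( ... )) strips any padding that wc adds on some systems.
    total=$(( $(find "$tmpdir" -type f | wc -l) ))
    unique=$(( $(find "$tmpdir" -type f -exec sha256sum {} + \
        | awk '{ print $1 }' | sort -u | wc -l) ))
    rm -rf "$tmpdir"
    echo "total=$total unique=$unique"
}
```

Calling count_blocks with a file name prints both counters. The difference between them is the number of blocks that a block-level deduplication could, in principle, replace with references.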

Compression

VDO uses lossless compression for further data reduction. Individual blocks are shrunk with the help of encoding algorithms.

VDO compresses blocks with the LZ4 algorithm[1] and, where possible, packs them so that several compressed blocks fit into a single 4 KB block on the underlying storage.

LZ4 offers a good compromise between speed and compression ratio. As a rule, it has a lower (i.e. worse) compression ratio than the similar LZO algorithm. On the other hand, LZ4 has both higher compression and decompression speeds. In addition to VDO, LZ4 is also used, for example, by OpenZFS[2] and SquashFS[3].

Thin provisioning

Thin provisioning manages the mapping of the logical block addresses presented by VDO to the addresses of the underlying storage. Blocks that consist entirely of zeros are eliminated in the process.
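
To get a feeling for how much of a file consists of zero blocks, the blocks can be inspected from user space. The following is a minimal sketch, not VDO's own mechanism: the function name count_zero_blocks is hypothetical, and the 4 KB block size is taken from the compression section above. It reads one block at a time, so it is only suitable for small test files.

```shell
# Sketch: count the 4 KB blocks of a file that contain only zero bytes.
count_zero_blocks() {
    local file size blocks i zero nonzero
    file="$1"
    size=$(( $(wc -c < "$file") ))
    # Round up so that a shorter final block is inspected as well.
    blocks=$(( (size + 4095) / 4096 ))
    i=0
    zero=0
    while [ "$i" -lt "$blocks" ]; do
        # Read block i and count the bytes left after deleting all NUL bytes.
        nonzero=$(( $(dd if="$file" bs=4096 skip="$i" count=1 2>/dev/null \
            | tr -d '\000' | wc -c) ))
        if [ "$nonzero" -eq 0 ]; then
            zero=$(( zero + 1 ))
        fi
        i=$(( i + 1 ))
    done
    echo "$zero"
}
```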

Terms

Slab

The physical storage of a VDO volume is divided into a series of slabs:[4]

  • Every slab is a contiguous area of the physical storage.
  • All slabs of a volume have the same size (any power of 2 from 128 MB to 32 GB).
  • The default size of a slab is 2 GB to facilitate the evaluation of VDO on smaller test systems.
  • A single VDO volume can have up to 8,192 slabs. In the default configuration with 2 GB slabs, the maximum permitted physical storage is therefore 16 TB.
  • When using 32 GB slabs, the maximum permitted physical storage is 256 TB.
  • VDO always reserves at least one entire slab for metadata, so the reserved slab cannot be used to store user data.
  • The slab size has no influence on the performance of the VDO volume.

Feature                               Value
Slab size                             128 MB, 256 MB, 512 MB, 1 GB, 2 GB, 4 GB, 8 GB, 16 GB, 32 GB
Number of slabs                       2 to 8,192
Maximum permitted physical storage    256 TB (8,192 slabs of 32 GB each)
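
The relationship between slab size and maximum physical storage is simple arithmetic. The following sketch reproduces the figures above; the function names is_valid_slab_mb and max_physical_tb are hypothetical helpers and not part of any VDO or LVM tool. Sizes are given in MB and interpreted in binary units.

```shell
# Return success if the argument (in MB) is a valid slab size:
# a power of 2 between 128 MB and 32 GB (32768 MB).
is_valid_slab_mb() {
    local n
    n="$1"
    [ "$n" -ge 128 ] && [ "$n" -le 32768 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

# Print the maximum physical storage in TB for a slab size in MB,
# based on the limit of 8192 slabs per volume.
max_physical_tb() {
    local slab_mb
    slab_mb="$1"
    # MB -> TB: divide by 1024 twice.
    echo $(( slab_mb * 8192 / 1024 / 1024 ))
}

max_physical_tb 2048     # default slab size of 2 GB
max_physical_tb 32768    # largest slab size of 32 GB
```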

Universal deduplication service (UDS)

The deduplication index (called UDS) is used for finding duplicate data.

VDO was developed to identify duplicate data efficiently. To do so, it exploits some common characteristics of duplicate data that are known from empirical observations:[5]

  1. In most data sets that contain a significant amount of duplicates, the duplicates are usually localized in time. If a duplicate is discovered, it is more likely that further duplicates will be discovered and that these duplicates were written at approximately the same time. For this reason, the records in the index are stored in chronological order.
  2. New data is more likely to duplicate recent data than older data, so looking further into the past yields diminishing returns. When the index is full, the oldest records are therefore deleted to make room for new records.

The ultimate goal of deduplication is to reduce storage space. Because there is a trade-off between the storage space saved and the resources spent to achieve those savings, VDO does not try to find every last duplicate. It is sufficient to remove most redundancies.

Metadata requirements

Each VDO volume reserves 3 GB of storage space for metadata, or more depending on the configuration.[6] It is therefore worth checking that the space saved by deduplication and compression is not cancelled out by the metadata requirements. An estimate of the space saving for a specific data set can be calculated using the vdo estimator tool, which is available at the following address:
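
A rough break-even check can be done with integer arithmetic. The sketch below is a deliberately simplified model, not a replacement for the estimator tool: the function name net_saving_gb is hypothetical, the expected saving percentage has to be supplied by the user, and the metadata reserve defaults to the 3 GB minimum mentioned above.

```shell
# Sketch: net space saving in GB after subtracting the metadata reserve.
#   $1  amount of data in GB
#   $2  expected saving from deduplication and compression in percent
#   $3  metadata reserve in GB (assumption: defaults to the 3 GB minimum)
net_saving_gb() {
    local data_gb saving_percent metadata_gb
    data_gb="$1"
    saving_percent="$2"
    metadata_gb="${3:-3}"
    echo $(( data_gb * saving_percent / 100 - metadata_gb ))
}

net_saving_gb 10 20     # small data set
net_saving_gb 100 20    # larger data set with the same saving rate
```

A negative result means that, under these assumptions, the metadata would consume more space than data reduction saves.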

Placement of LVM-VDO in the storage stack

LVM-VDO in use on a virtualization server. If several VMs with the same operating system (and therefore the same data) are created, this redundant data only occupies physical space once. Image source: Red Hat

Like other device mapper targets, VDO fits into the Linux storage stack.

Because VDO performs deduplication, compression and thin provisioning, all layers built on top of VDO benefit from these functions automatically.

To minimize negative effects (encrypted data, for instance, can be neither deduplicated nor compressed), the following targets should only be used below VDO:

  • DM Crypt
  • Software RAID (LVM or MD RAID)

Since LVM-VDO presents its deduplicated storage as a regular logical volume (LV), it can be used with standard file systems and also as an iSCSI/FC/NVMe target.

Creation of LVM-VDO volume

The following example shows the creation of a virtual volume with 100 GB (with 40 GB of physical storage actually available).

lvcreate

With lvcreate --type vdo, a VDO volume can be created:

tk@lmde:~$ sudo lvcreate --type vdo --name vdo-test --size 40G --virtualsize 100G vg-sata
    The VDO volume can address 36.00 GB in 18 data slabs, each 2.00 GB.
    It can grow to address at most 16.00 TB of physical storage in 8192 slabs.
    If a larger maximum size might be needed, use bigger slabs.
  Logical volume "vdo-test" created.
tk@lmde:~$ lsblk 
NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                           8:0    0 223,6G  0 disk 
└─sda1                        8:1    0 223,6G  0 part 
  └─vg--sata-vpool0_vdata   254:0    0    40G  0 lvm  
    └─vg--sata-vpool0-vpool 254:1    0   100G  0 lvm  
      └─vg--sata-vdo--test  254:2    0   100G  0 lvm  
nvme0n1                     259:0    0 111,8G  0 disk 
├─nvme0n1p1                 259:1    0   286M  0 part /boot/efi
├─nvme0n1p2                 259:2    0   8,2G  0 part [SWAP]
└─nvme0n1p3                 259:3    0 103,3G  0 part /

Creation of file system

Without further options, creating an ext4 file system takes about 19 seconds on the test system:

tk@lmde:~$ time sudo mkfs.ext4 /dev/vg-sata/vdo-test 
mke2fs 1.47.0 (5-Feb-2023)
Discarding device blocks: done                            
Creating filesystem with 26214400 4k blocks and 6553600 inodes
Filesystem UUID: 801ce445-1bd4-4414-b41f-e694fa0704ce
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done   

real	0m18,591s
user	0m0,010s
sys	0m0,001s

The reason for this long duration is the limited performance of discard/TRIM operations.[7] With the option mkfs.ext4 -E nodiscard, the creation takes less than 1 second.

Statistics

The following example shows that only a few additional physical blocks are occupied when copying data:

tk@lmde:~$ sudo mount /dev/mapper/vg--sata-vdo--test /mnt
tk@lmde:~$ sudo dd if=/dev/urandom of=/mnt/testfile-1.bin bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 2,03686 s, 527 MB/s
tk@lmde:~$ sync
tk@lmde:~$ sudo vdostats 
Device                1k-blocks      Used Available Use% Space saving%
vg--sata-vpool0-vpool  41943040   5274020  36669020  13%           67%
tk@lmde:~$ sudo cp /mnt/testfile-1.bin /mnt/testfile-2.bin 
tk@lmde:~$ sudo cp /mnt/testfile-1.bin /mnt/testfile-3.bin 
tk@lmde:~$ sudo cp /mnt/testfile-1.bin /mnt/testfile-4.bin 
tk@lmde:~$ sudo cp /mnt/testfile-1.bin /mnt/testfile-5.bin 
tk@lmde:~$ sync
tk@lmde:~$ sudo vdostats 
Device                1k-blocks      Used Available Use% Space saving%
vg--sata-vpool0-vpool  41943040   5279324  36663716  13%           85%
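
The difference between the two vdostats runs can be worked out directly. The following sketch uses the values from the output above; the helper names vdo_used_kb and used_delta_kb are hypothetical, and the assumption that "Used" is the third column is based solely on the output format shown here.

```shell
# Print the "Used" column (in 1k-blocks) for one device from vdostats
# output read on standard input. Assumption: "Used" is the third column.
vdo_used_kb() {
    awk -v dev="$1" '$1 == dev { print $3 }'
}

# Print the difference between two "Used" values.
used_delta_kb() {
    echo $(( $2 - $1 ))
}

before=5274020   # Used after writing testfile-1.bin
after=5279324    # Used after creating four additional copies
echo "additional 1k-blocks used: $(used_delta_kb "$before" "$after")"
```

On a live system, the value can be read with sudo vdostats | vdo_used_kb vg--sata-vpool0-vpool. In this example, the four copies add 4 GiB of logical data but only about 5 MB of physical usage.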

History

VDO was originally developed by Permabit Technology as proprietary kernel modules and userspace tools. Red Hat bought the company and technology in July 2017 and relicensed the software under the GPL.[8] Until then, it was used as the out-of-tree module kvdo. The kernel module was finally integrated as dm-vdo in Linux kernel 6.9.

More information

  • vdo (github.com/dm-vdo/vdo)

References

  1. LZ4 (compression algorithm) (en.wikipedia.org)
  2. OpenZFS Compression and Encryption (klarasystems.com)
  3. SquashFS (docs.kernel.org)
  4. Slab size in VDO (docs.redhat.com)
  5. Design of dm-vdo (docs.kernel.org/admin-guide)
  6. dm-vdo (docs.kernel.org/admin-guide)
  7. man 7 lvmvdo (man7.org): "The performance of TRIM/Discard operations is slow for large volumes of VDO type. Please try to avoid sending discard requests unless necessary because it might take considerable amount of time to finish the discard operation."
  8. Red Hat Acquires Permabit Assets, Eases Barriers to Cloud Portability with Data Deduplication Technology (redhat.com, 31.07.2017)


Author: Werner Fischer

Werner Fischer, working in the Knowledge Transfer team at Thomas-Krenn, completed his studies of Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at many conferences like LinuxTag, OSMC, OSDC, LinuxCon, and author for various IT magazines. In his spare time he enjoys playing the piano and training for a good result at the annual Linz marathon relay.

