VDO - Virtual Data Optimizer
VDO (Virtual Data Optimizer) is a Linux device mapper target that provides inline deduplication, compression and thin provisioning at the block level. VDO is managed via LVM and can be integrated into any existing storage stack.
Central functions
VDO offers three central functions:
- deduplication
- compression
- thin provisioning
VDO is fully transparent to applications. All three functions are invisible to applications that use the storage: they read and write blocks exactly as if VDO were not present.
Deduplication
Deduplication reduces the required storage by removing copies of duplicate blocks.
If data is saved several times, inline deduplication recognizes the existing data. Instead of writing the data again, only a reference to the original block is saved.
Example:
- A picture (Auto.jpg, 5 MB file size) has already been saved.
- This picture is now inserted into a presentation and a PDF is generated. The PDF is 5 MB larger than before.
- However, the physical storage that is actually occupied does not increase by inserting the picture: deduplication only writes references to the existing data blocks of the image.
VDO maintains a mapping of logical block addresses to physical block addresses on the storage layer beneath VDO. After deduplication, several logical block addresses can be mapped to the same physical block address; these are referred to as shared blocks.
If a shared block is overwritten, a new physical block is assigned to store the new block data.
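The shared-block behavior described above can be sketched in a few lines of Python. This is a hypothetical model for illustration only, not VDO's actual data structures: logical block addresses (LBAs) map to physical block addresses (PBAs), identical data shares one PBA via a reference count, and overwriting a shared block allocates a new PBA.

```python
# Hypothetical model of a deduplicating block map (illustration only).
import hashlib

class BlockMap:
    def __init__(self):
        self.lba_to_pba = {}      # logical -> physical address
        self.pba_refcount = {}    # physical address -> number of references
        self.fingerprints = {}    # content hash -> physical address
        self.next_pba = 0

    def write(self, lba, data):
        self._release(lba)
        digest = hashlib.sha256(data).digest()
        pba = self.fingerprints.get(digest)
        if pba is None:                 # new data: allocate a physical block
            pba = self.next_pba
            self.next_pba += 1
            self.fingerprints[digest] = pba
            self.pba_refcount[pba] = 0
        self.lba_to_pba[lba] = pba      # duplicate data: only add a reference
        self.pba_refcount[pba] += 1

    def _release(self, lba):
        pba = self.lba_to_pba.pop(lba, None)
        if pba is not None:
            self.pba_refcount[pba] -= 1

    def physical_blocks_used(self):
        return sum(1 for n in self.pba_refcount.values() if n > 0)

m = BlockMap()
m.write(0, b"picture-block")
m.write(1, b"picture-block")     # duplicate: shared block, no new PBA
m.write(2, b"other-block")
print(m.physical_blocks_used())  # 2 physical blocks for 3 logical blocks
m.write(1, b"changed-block")     # overwrite of a shared block -> new PBA
print(m.physical_blocks_used())  # 3
```

Note how the overwrite of logical block 1 does not modify the shared physical block in place; the remaining reference from logical block 0 stays intact.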
Compression
VDO uses lossless data compression for further data reduction; individual blocks are shrunk with the help of coding algorithms.
In VDO, blocks are compressed with the LZ4 algorithm[1] and, where possible, packed so that several compressed blocks fit into a single 4 KB block on the underlying storage.
LZ4 offers a good compromise between speed and compression ratio. As a rule, it has a lower (i.e. worse) compression ratio than the similar LZO algorithm; on the other hand, LZ4 offers both higher compression and higher decompression speeds. Besides VDO, LZ4 is also used, for example, by OpenZFS[2] and SquashFS[3].
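The packing idea can be illustrated with a short Python sketch: each 4 KB logical block is compressed individually, and the compressed results are bin-packed into 4 KB physical slots. This is a simplified model, not VDO's actual packing logic, and it uses zlib as a stand-in for LZ4 because the Python standard library does not include LZ4.

```python
# Simplified model of per-block compression plus 4 KB packing
# (zlib stands in for LZ4; illustration only).
import zlib

BLOCK_SIZE = 4096

def pack_blocks(blocks):
    """Compress each block; greedily bin-pack results into 4 KB slots."""
    slots = []                        # each slot holds <= BLOCK_SIZE bytes
    for block in blocks:
        compressed = zlib.compress(block)
        if len(compressed) >= BLOCK_SIZE:
            compressed = block        # incompressible: store uncompressed
        for slot in slots:
            if sum(len(c) for c in slot) + len(compressed) <= BLOCK_SIZE:
                slot.append(compressed)
                break
        else:
            slots.append([compressed])
    return slots

# Eight highly compressible 4 KB blocks fit into far fewer physical blocks.
blocks = [bytes([i]) * BLOCK_SIZE for i in range(8)]
slots = pack_blocks(blocks)
print(f"{len(blocks)} logical blocks -> {len(slots)} physical blocks")
```

Incompressible blocks (e.g. already-compressed media) fall back to uncompressed storage and occupy one physical block each, which is also how VDO behaves for data that does not compress well.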
Thin provisioning
Thin provisioning manages the assignment of the logical block addresses presented by VDO to the addresses of the underlying storage. At the same time, blocks consisting entirely of zeros are eliminated.
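The two effects, on-demand allocation and zero-block elimination, can be sketched as follows. This is a hypothetical model for illustration, not VDO's implementation: physical blocks are only allocated when non-zero data is written, and all-zero blocks consume no physical space at all.

```python
# Hypothetical model of thin provisioning with zero-block elimination.
BLOCK_SIZE = 4096
ZERO_BLOCK = bytes(BLOCK_SIZE)

class ThinVolume:
    def __init__(self, logical_blocks):
        self.logical_blocks = logical_blocks
        self.store = {}                 # lba -> data, allocated on demand

    def write(self, lba, data):
        if data == ZERO_BLOCK:
            self.store.pop(lba, None)   # zero blocks are eliminated
        else:
            self.store[lba] = data

    def read(self, lba):
        # unallocated or zero-eliminated blocks read back as zeros
        return self.store.get(lba, ZERO_BLOCK)

    def physical_blocks_used(self):
        return len(self.store)

vol = ThinVolume(logical_blocks=1000)   # 1000 logical blocks presented
vol.write(0, b"a" * BLOCK_SIZE)
vol.write(999, ZERO_BLOCK)              # writing zeros allocates nothing
print(vol.physical_blocks_used())       # 1
```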
Terms
Slab
The physical storage of a VDO volume is divided into a series of slabs:[4]
- Every slab is a contiguous area of the physical storage.
- All slabs of a volume have the same size (any power of 2 between 128 MB and 32 GB).
- The default size of a slab is 2 GB, to facilitate evaluating VDO on smaller test systems.
- A single VDO volume can have up to 8,192 slabs. In the default configuration with 2 GB slabs, the maximum permitted physical storage is therefore 16 TB.
- When using 32 GB slabs, the maximum permitted physical storage is 256 TB.
- VDO always reserves at least one entire slab for metadata, so the reserved slab cannot be used to store user data.
- The slab size has no influence on the performance of the VDO volume.
| feature | value |
|---|---|
| size of a slab | 128 MB, 256 MB, 512 MB, 1 GB, 2 GB, 4 GB, 8 GB, 16 GB, 32 GB |
| number of slabs | 2 to 8,192 |
| maximum permitted physical storage | 256 TB (for 8,192 slabs of 32 GB each) |
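The limits in the table follow directly from the slab arithmetic. A short sketch (the constant and conversion are assumptions based on the figures above):

```python
# Maximum physical storage = slab size * maximum slab count (8,192).
MAX_SLABS = 8192
GB = 2**30
TB = 2**40

def max_physical_storage(slab_size_bytes):
    return MAX_SLABS * slab_size_bytes

# Default 2 GB slabs -> 16 TB; maximum 32 GB slabs -> 256 TB.
print(max_physical_storage(2 * GB) // TB, "TB")    # 16 TB
print(max_physical_storage(32 * GB) // TB, "TB")   # 256 TB
```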
Universal deduplication service (UDS)
The deduplication index (called UDS) is used to find duplicate data.
VDO was developed to identify duplicate data efficiently. It exploits some common properties of duplicate data (from empirical observations):[5]
- In most data sets that contain a significant amount of duplicates, the duplicates are localized in time: once a duplicate is discovered, further duplicates are likely to follow, and these duplicates were probably written at approximately the same time. For this reason, the records in the index are stored in chronological order.
- The second insight is that new data is more likely to duplicate recent data than older data, and that looking further into the past yields less and less. When the index is full, the oldest records should therefore be deleted to make room for new ones.
The index design is a compromise between the storage space saved through deduplication and the effort required to find it. It is therefore sufficient to remove most, not all, of the redundancies.
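The two observations above can be sketched as a chronological index with oldest-first eviction. This is a hypothetical model of the idea, not the real UDS format:

```python
# Hypothetical chronological deduplication index (illustration only):
# fingerprints are kept in insertion order; when the index is full,
# the oldest records are evicted first.
from collections import OrderedDict
import hashlib

class ChronoIndex:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()          # fingerprint -> block address

    def lookup_or_insert(self, data, address):
        fp = hashlib.sha256(data).digest()
        if fp in self.entries:
            return self.entries[fp]           # duplicate: existing address
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest record
        self.entries[fp] = address
        return None                           # new data

idx = ChronoIndex(capacity=2)
assert idx.lookup_or_insert(b"A", 10) is None    # new data, indexed
assert idx.lookup_or_insert(b"A", 11) == 10      # duplicate found
idx.lookup_or_insert(b"B", 20)
idx.lookup_or_insert(b"C", 30)                   # index full: "A" evicted
assert idx.lookup_or_insert(b"A", 40) is None    # "A" no longer indexed
```

Losing an old fingerprint (as with "A" above) only costs a missed deduplication opportunity, never data, which is why evicting the oldest records is an acceptable trade-off.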
Metadata requirements
Each VDO volume reserves 3 GB of storage space for metadata, or more depending on the configuration.[6] It is worth checking whether the space saved by deduplication and compression is not outweighed by this metadata requirement. The savings for a specific data set can be estimated with the vdo estimator tool.
Placement of LVM-VDO in storage stack

Like other device mapper targets, VDO fits into the Linux Storage Stack.
Since VDO performs deduplication, compression and thin provisioning, these functions automatically apply to all layers stacked on top of VDO.
To minimize negative effects, the following targets should only be used below VDO:
- DM Crypt
- Software RAID (LVM or MD RAID)
Since LVM-VDO represents its deduplicated storage as a regular logical volume (LV), it can be used with standard file systems and also as an iSCSI/FC/NVMe target.
Creation of LVM-VDO volume
The following example shows the creation of a 100 GB virtual volume (with only 40 GB of physical storage actually available).
lvcreate
With lvcreate --type vdo, a VDO volume can be created:
tk@lmde:~$ sudo lvcreate --type vdo --name vdo-test --size 40G --virtualsize 100G vg-sata
The VDO volume can address 36.00 GB in 18 data slabs, each 2.00 GB.
It can grow to address at most 16.00 TB of physical storage in 8192 slabs.
If a larger maximum size might be needed, use bigger slabs.
Logical volume "vdo-test" created.
tk@lmde:~$ lsblk
NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                           8:0    0 223,6G  0 disk
└─sda1                        8:1    0 223,6G  0 part
  └─vg--sata-vpool0_vdata   254:0    0    40G  0 lvm
    └─vg--sata-vpool0-vpool 254:1    0   100G  0 lvm
      └─vg--sata-vdo--test  254:2    0   100G  0 lvm
nvme0n1                     259:0    0 111,8G  0 disk
├─nvme0n1p1                 259:1    0   286M  0 part /boot/efi
├─nvme0n1p2                 259:2    0   8,2G  0 part [SWAP]
└─nvme0n1p3                 259:3    0 103,3G  0 part /
Creation of file system
Without further configuration, the creation of an ext4 file system takes 19 seconds on the test system:
tk@lmde:~$ time sudo mkfs.ext4 /dev/vg-sata/vdo-test
mke2fs 1.47.0 (5-Feb-2023)
Discarding device blocks: done
Creating filesystem with 26214400 4k blocks and 6553600 inodes
Filesystem UUID: 801ce445-1bd4-4414-b41f-e694fa0704ce
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done

real	0m18,591s
user	0m0,010s
sys	0m0,001s
The reason for this long duration is the limited performance of discard/TRIM operations.[7] When using the option mkfs.ext4 -E nodiscard, the creation takes less than one second.
Statistics
The following example shows that only a few additional physical blocks are occupied when copying data:
tk@lmde:~$ sudo mount /dev/mapper/vg--sata-vdo--test /mnt
tk@lmde:~$ sudo dd if=/dev/urandom of=/mnt/testfile-1.bin bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 2,03686 s, 527 MB/s
tk@lmde:~$ sync
tk@lmde:~$ sudo vdostats
Device                1k-blocks    Used Available Use% Space saving%
vg--sata-vpool0-vpool  41943040 5274020  36669020  13%           67%
tk@lmde:~$ sudo cp /mnt/testfile-1.bin /mnt/testfile-2.bin
tk@lmde:~$ sudo cp /mnt/testfile-1.bin /mnt/testfile-3.bin
tk@lmde:~$ sudo cp /mnt/testfile-1.bin /mnt/testfile-4.bin
tk@lmde:~$ sudo cp /mnt/testfile-1.bin /mnt/testfile-5.bin
tk@lmde:~$ sync
tk@lmde:~$ sudo vdostats
Device                1k-blocks    Used Available Use% Space saving%
vg--sata-vpool0-vpool  41943040 5279324  36663716  13%           85%
History
VDO was originally developed by Permabit Technology as a proprietary kernel module and userspace tools. Red Hat acquired the company and the technology in July 2017 and relicensed the software under the GPL.[8] The kernel module, previously available as the out-of-tree module kvdo, was finally integrated into the Linux kernel as dm-vdo in version 6.9.
More information
- vdo (github.com/dm-vdo/vdo)
References
- ↑ LZ4 (compression algorithm) (en.wikipedia.org)
- ↑ OpenZFS Compression and Encryption (klarasystems.com)
- ↑ SquashFS (docs.kernel.org)
- ↑ Slab size in VDO (docs.redhat.com)
- ↑ Design of dm-vdo (docs.kernel.org/admin-guide)
- ↑ dm-vdo (docs.kernel.org/admin-guide)
- ↑ man 7 lvmvdo (man7.org): "The performance of TRIM/Discard operations is slow for large volumes of VDO type. Please try to avoid sending discard requests unless necessary because it might take considerable amount of time to finish the discard operation."
- ↑ Red Hat Acquires Permabit Assets, Eases Barriers to Cloud Portability with Data Deduplication Technology (redhat.com, 31.07.2017)
Author: Werner Fischer
Werner Fischer, working in the Knowledge Transfer team at Thomas-Krenn, completed his studies of Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at conferences such as LinuxTag, OSMC, OSDC and LinuxCon, and an author for various IT magazines. In his spare time he enjoys playing the piano and training for a good result at the annual Linz marathon relay.