QEMU drive cache option


Quick Emulator (QEMU) is an open-source emulator that emulates virtual hardware. QEMU can be used in combination with KVM and is used this way in Proxmox. This article explains the QEMU drive option "cache". The configuration of a QEMU drive involves creating a block driver node (the backend) as well as a guest device (the frontend).

cache option

The option "cache" controls how the host cache is used for the access on block data.

The "cache" connection is a connection that determines the three options cache.direct, cache.no-flush (both as in -blockdev) as well as cache.writeback (-device). The five cache modes writeback, none, writethrough, directsync and unsafe correspond to the following settings:[1]

writeback ("Write back" in the Proxmox VE GUI): cache.writeback=on, cache.direct=off, cache.no-flush=off
  Data integrity after a sudden host failure:[2] high, if the guest operating system/file system regularly flushes the cache contents to disk. Modern file systems normally do this automatically; see also Ext4 Write Barriers. The host pagecache is used.
  Write performance: high

none ("No cache"): cache.writeback=on, cache.direct=on, cache.no-flush=off
  Data integrity after a sudden host failure: the host pagecache is not used, but data may still be stored in other caches (e.g. the HDD cache). The guest operating system must therefore still perform regular flushes.

writethrough ("Write through"): cache.writeback=off, cache.direct=off, cache.no-flush=off
  Data integrity after a sudden host failure: very high, since QEMU in writethrough mode never keeps data in a write cache, but calls fsync() after every write operation. The host pagecache is used.
  Write performance: low, as fsync() is called after every write operation

directsync ("Direct sync"): cache.writeback=off, cache.direct=on, cache.no-flush=off
  Data integrity after a sudden host failure: the host pagecache is not used.

unsafe ("Write back (unsafe)"): cache.writeback=on, cache.direct=off, cache.no-flush=on
  Data integrity after a sudden host failure: no guarantee of data integrity, high risk of data loss! This caching mode should only be used for temporary data whose loss is not a problem. It can be helpful for speeding up guest installations, but in production environments you should absolutely switch to another caching mode.[3] The host pagecache is used.
  Write performance: highest

The default mode in QEMU is cache=writeback. Proxmox VE uses none (no cache) by default.
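
As a minimal sketch (disk.raw is a placeholder image), the mode is selected via the cache shortcut on the QEMU command line:

# QEMU default (writeback) and the Proxmox VE default (none):
qemu-system-x86_64 -m 2G -drive file=disk.raw,format=raw,cache=writeback
qemu-system-x86_64 -m 2G -drive file=disk.raw,format=raw,cache=none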

cache.writeback

The mode cache.writeback=on is used by default. Data writes are reported as completed as soon as the data is present in the host pagecache. This is safe as long as your guest operating system makes sure to correctly flush the disk caches where needed. If your guest operating system does not handle volatile disk write caches correctly and your host crashes or loses power, the guest may experience data corruption.

For such guests, you should consider using cache.writeback=off. This means that the host pagecache is still used to read and write data, but the write notification is only sent to the guest after QEMU has made sure that each write has been flushed to the disk. Be aware that this has a major impact on performance.[4]
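
As a minimal sketch of how cache.writeback=off is spelled out without the cache shortcut (disk.raw and the node names are placeholders): the -blockdev options configure the backend, while the write-cache property of the guest device corresponds to cache.writeback. With all three options off, this is the writethrough combination from the table above:

qemu-system-x86_64 -m 2G \
  -blockdev node-name=file0,driver=file,filename=disk.raw \
  -blockdev node-name=disk0,driver=raw,file=file0,cache.direct=off,cache.no-flush=off \
  -device virtio-blk-pci,drive=disk0,write-cache=off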

Whether the guest sees a volatile write cache can be checked from within the guest; the command depends on the disk type (setting in the Proxmox VE GUI):

IDE, SATA: hdparm -I /dev/sdc | grep "Write cache"
  Result with cache.writeback=on: "* Write cache"; with cache.writeback=off: "Write cache" (no asterisk in front)
  hdparm issues an ATA IDENTIFY DEVICE command to query the setting.

VirtIO Block: cat /sys/block/vda/queue/write_cache
  Result with cache.writeback=on: "write back"; with cache.writeback=off: "write through"
  The VirtIO block driver exposes this information via sysfs.

SCSI: sdparm --get=WCE /dev/sda
  Result with cache.writeback=on: "WCE = 1"; with cache.writeback=off: "WCE = 0"
  sdparm issues a SCSI MODE SENSE command to query the setting. WCE stands for "Writeback Cache Enable".
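
To reproduce these outputs, the cache mode of a disk can also be changed with the Proxmox VE CLI instead of the GUI. A sketch, assuming VM ID 100 and volume local-lvm:vm-100-disk-0 (both placeholders):

# change the cache mode of an existing disk; takes effect after the VM is restarted:
qm set 100 --scsi0 local-lvm:vm-100-disk-0,cache=writethrough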

cache.direct

cache.direct=on bypasses the host pagecache. QEMU attempts to perform the disk I/O directly to the guest's memory. QEMU may still perform an internal copy of the data.
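
At the file level, cache.direct=on roughly corresponds to opening the image with O_DIRECT. A small host-side illustration with dd (testfile is a placeholder):

# write 100 MiB with O_DIRECT, bypassing the host pagecache ...
dd if=/dev/zero of=testfile bs=1M count=100 oflag=direct
# ... versus the same write going through the pagecache:
dd if=/dev/zero of=testfile bs=1M count=100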

cache.no-flush

cache.no-flush=on tells QEMU that it never needs to write data to the disk, but can instead keep everything in the cache (attention - risk of data loss! - only use this when data integrity does not matter, for example in pure tests). If anything goes wrong, for example if your host loses power or the disk is accidentally disconnected, your image will most probably be rendered unusable.
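
A minimal sketch of a scratch disk with flushes explicitly disabled via -blockdev (scratch.raw and the node names are placeholders); the cache=unsafe shortcut sets the same cache.no-flush option:

qemu-system-x86_64 -m 2G \
  -blockdev node-name=file0,driver=file,filename=scratch.raw \
  -blockdev node-name=disk0,driver=raw,file=file0,cache.no-flush=on \
  -device virtio-blk-pci,drive=disk0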

Tests

The following tests were performed on an lvmthin volume. lvmthin volumes only support raw images (no qcow2).[5] For background information on cache=none with qcow2, refer to the article "Understanding QCOW2 Risks with QEMU cache=none in Proxmox".[6]

Using the diskchecker.pl Perl script, we tested whether data is lost during a power outage. Information on this script can be found at: http://brad.livejournal.com/2116715.html
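
The test setup follows the description on the linked page: the script runs in listen mode on a second machine and records which writes were acknowledged, the create step runs in the VM under test, the VM is powered off abruptly during the create phase, and the file is verified after the restart. A sketch (172.16.0.112 is the listening machine, as in the runs below; flags as documented by the script):

# on the second machine:
./diskchecker.pl -l
# in the VM under test; cut the power while this is running:
./diskchecker.pl -s 172.16.0.112 create test_file 1000
# after the restart:
./diskchecker.pl -s 172.16.0.112 verify test_file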

Debian 13 VM with raw:

  Cache setting        | diskchecker.pl result
  Default (no cache)   | Total errors: 0
  Write back (unsafe)  | Total errors: 2786
  Write through        | Total errors: 0
  Write back           | Total errors: 0
  Direct sync          | Total errors: 0

Default (No cache)

tk@debian13-1:~$ ./diskchecker.pl -s 172.16.0.112 create test_file 1000
  diskchecker: running 1 sec, 0.46% coverage of 1000 MB (293 writes; 293/s)
  diskchecker: running 2 sec, 1.06% coverage of 1000 MB (684 writes; 342/s)
  diskchecker: running 3 sec, 1.70% coverage of 1000 MB (1099 writes; 366/s)
  diskchecker: running 4 sec, 2.33% coverage of 1000 MB (1507 writes; 376/s)
  diskchecker: running 5 sec, 2.93% coverage of 1000 MB (1907 writes; 381/s)
  diskchecker: running 6 sec, 3.58% coverage of 1000 MB (2326 writes; 387/s)
  diskchecker: running 7 sec, 4.20% coverage of 1000 MB (2739 writes; 391/s)
  diskchecker: running 8 sec, 4.81% coverage of 1000 MB (3152 writes; 394/s)
  diskchecker: running 9 sec, 5.40% coverage of 1000 MB (3553 writes; 394/s)
  diskchecker: running 10 sec, 6.03% coverage of 1000 MB (3976 writes; 397/s)
[...]
tk@debian13-1:~$ ./diskchecker.pl -s 172.16.0.112 verify test_file 
 verifying: 0.03%
 verifying: 46.22%
 verifying: 100.00%
Total errors: 0
tk@debian13-1:~$

Write back (unsafe)

tk@debian13-1:~$ ./diskchecker.pl -s 172.16.0.112 create test_file_unsafe 1000
  diskchecker: running 1 sec, 0.45% coverage of 1000 MB (286 writes; 286/s)
  diskchecker: running 2 sec, 1.22% coverage of 1000 MB (782 writes; 391/s)
  diskchecker: running 3 sec, 2.06% coverage of 1000 MB (1328 writes; 442/s)
  diskchecker: running 4 sec, 2.81% coverage of 1000 MB (1820 writes; 455/s)
  diskchecker: running 5 sec, 3.55% coverage of 1000 MB (2306 writes; 461/s)
  diskchecker: running 6 sec, 4.25% coverage of 1000 MB (2778 writes; 463/s)
  diskchecker: running 7 sec, 5.04% coverage of 1000 MB (3302 writes; 471/s)
  diskchecker: running 8 sec, 5.85% coverage of 1000 MB (3863 writes; 482/s)
  diskchecker: running 9 sec, 6.73% coverage of 1000 MB (4466 writes; 496/s)
  diskchecker: running 10 sec, 7.50% coverage of 1000 MB (4984 writes; 498/s)
[...]
tk@debian13-1:~$ ./diskchecker.pl -s 172.16.0.112 verify test_file_unsafe 
 verifying: 0.02%
  Error at page 26, 4 seconds before end.
  Error at page 34, 6 seconds before end.
  Error at page 49, 3 seconds before end.
  Error at page 123, 0 seconds before end.
  Error at page 166, 0 seconds before end.
  Error at page 167, 6 seconds before end.
  Error at page 169, 1 seconds before end.
  Error at page 199, 1 seconds before end.
  Error at page 200, 2 seconds before end.
  Error at page 214, 4 seconds before end.
[...]
 verifying: 100.00%
Total errors: 2786
Histogram of seconds before end:
     0  420
     1  478
     2  339
     3  285
     4  303
     5  351
     6  354
     7  256
tk@debian13-1:~$

Write through

tk@debian13-1:~$ ./diskchecker.pl -s 172.16.0.112 create test_file_writethrough 1000
  diskchecker: running 1 sec, 0.36% coverage of 1000 MB (234 writes; 234/s)
  diskchecker: running 2 sec, 0.92% coverage of 1000 MB (587 writes; 293/s)
  diskchecker: running 3 sec, 1.45% coverage of 1000 MB (929 writes; 309/s)
  diskchecker: running 4 sec, 1.97% coverage of 1000 MB (1268 writes; 317/s)
  diskchecker: running 5 sec, 2.49% coverage of 1000 MB (1612 writes; 322/s)
  diskchecker: running 6 sec, 3.02% coverage of 1000 MB (1957 writes; 326/s)
  diskchecker: running 7 sec, 3.53% coverage of 1000 MB (2295 writes; 327/s)
  diskchecker: running 8 sec, 4.07% coverage of 1000 MB (2650 writes; 331/s)
  diskchecker: running 9 sec, 4.57% coverage of 1000 MB (2989 writes; 332/s)
  diskchecker: running 10 sec, 5.07% coverage of 1000 MB (3327 writes; 332/s)
[...]
tk@debian13-1:~$ ./diskchecker.pl -s 172.16.0.112 verify test_file_writethrough 
 verifying: 0.02%
 verifying: 56.67%
 verifying: 100.00%
Total errors: 0
tk@debian13-1:~$ 

Write back

tk@debian13-1:~$ ./diskchecker.pl -s 172.16.0.112 create test_file_writeback 1000
  diskchecker: running 1 sec, 0.45% coverage of 1000 MB (289 writes; 289/s)
  diskchecker: running 2 sec, 1.03% coverage of 1000 MB (671 writes; 335/s)
  diskchecker: running 3 sec, 1.62% coverage of 1000 MB (1051 writes; 350/s)
  diskchecker: running 4 sec, 2.21% coverage of 1000 MB (1438 writes; 359/s)
  diskchecker: running 5 sec, 2.82% coverage of 1000 MB (1831 writes; 366/s)
  diskchecker: running 6 sec, 3.40% coverage of 1000 MB (2224 writes; 370/s)
  diskchecker: running 7 sec, 3.98% coverage of 1000 MB (2611 writes; 373/s)
  diskchecker: running 8 sec, 4.56% coverage of 1000 MB (3005 writes; 375/s)
  diskchecker: running 9 sec, 5.15% coverage of 1000 MB (3397 writes; 377/s)
  diskchecker: running 10 sec, 5.73% coverage of 1000 MB (3788 writes; 378/s)
[...]
tk@debian13-1:~$ ./diskchecker.pl -s 172.16.0.112 verify test_file_writeback 
 verifying: 0.03%
 verifying: 47.70%
 verifying: 100.00%
Total errors: 0
tk@debian13-1:~$ 

Direct sync

tk@debian13-1:~$ ./diskchecker.pl -s 172.16.0.112 create test_file_directsync 1000
  diskchecker: running 1 sec, 0.38% coverage of 1000 MB (246 writes; 246/s)
  diskchecker: running 2 sec, 0.90% coverage of 1000 MB (579 writes; 289/s)
  diskchecker: running 3 sec, 1.49% coverage of 1000 MB (960 writes; 320/s)
  diskchecker: running 4 sec, 2.01% coverage of 1000 MB (1302 writes; 325/s)
  diskchecker: running 5 sec, 2.53% coverage of 1000 MB (1643 writes; 328/s)
  diskchecker: running 6 sec, 3.08% coverage of 1000 MB (2006 writes; 334/s)
  diskchecker: running 7 sec, 3.61% coverage of 1000 MB (2363 writes; 337/s)
  diskchecker: running 8 sec, 4.16% coverage of 1000 MB (2721 writes; 340/s)
  diskchecker: running 9 sec, 4.74% coverage of 1000 MB (3108 writes; 345/s)
  diskchecker: running 10 sec, 5.26% coverage of 1000 MB (3463 writes; 346/s)
[...]
tk@debian13-1:~$ ./diskchecker.pl -s 172.16.0.112 verify test_file_directsync 
 verifying: 0.01%
 verifying: 57.56%
 verifying: 100.00%
Total errors: 0
tk@debian13-1:~$ 


References

  1. QEMU User Documentation (Manpage) - Block device options (www.qemu.org/docs) cache is “none”, “writeback”, “unsafe”, “directsync” or “writethrough” and controls how the host cache is used to access block data. This is a shortcut that sets the cache.direct and cache.no-flush options (as in -blockdev), and additionally cache.writeback, which provides a default for the write-cache option of block guest devices (as in -device).
  2. libvirt: Use 'writeback' QEMU cache mode when 'none' is not viable (opendev.org/openstack/nova, 04.05.2019) The thing that makes 'writethrough' so safe against host crashes is that it never keeps data in a "write cache", but it calls fsync() after _every_ write. This is also what makes it horribly slow. But 'cache=none' doesn't do this and therefore doesn't provide this kind of safety. The guest OS must explicitly flush the cache in the right places to make sure data is safe on the disk. And OSes do that.
    So if 'cache=none' is safe enough for you, then 'cache=writeback' should be safe enough for you, too -- because both of them have the boolean 'cache.writeback=on'. The difference is only in 'cache.direct', but 'cache.direct=on' only bypasses the host kernel page cache and data could still sit in other caches that could be present between QEMU and the disk (such as commonly a volatile write cache on the disk itself).
  3. https://opendev.org/openstack/nova/src/commit/18a7dcb6e816e26e405259e981498f6a7bc71608/nova/conf/libvirt.py#L724
  4. QEMU User Documentation (Manpage) - Block device options (www.qemu.org/docs) By default, the cache.writeback=on mode is used. It will report data writes as completed as soon as the data is present in the host page cache. This is safe as long as your guest OS makes sure to correctly flush disk caches where needed. If your guest OS does not handle volatile disk write caches correctly and your host crashes or loses power, then the guest may experience data corruption.
    For such guests, you should consider using cache.writeback=off. This means that the host page cache will be used to read and write data, but write notification will be sent to the guest only after QEMU has made sure to flush each write to the disk. Be aware that this has a major impact on performance.
  5. https://forum.proxmox.com/threads/proxmox-disks-changed-from-qcow-to-raw-after-migration.96007/#post-417101
  6. Understanding QCOW2 Risks with QEMU cache=none in Proxmox (kb.blockbridge.com)


Author: Werner Fischer

Werner Fischer, working in the Knowledge Transfer team at Thomas-Krenn, completed his studies of Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at many conferences like LinuxTag, OSMC, OSDC, LinuxCon, and author for various IT magazines. In his spare time he enjoys playing the piano and training for a good result at the annual Linz marathon relay.


Translator: Alina Ranzinger

Alina has been working at Thomas-Krenn.AG since 2024. After her training as a multilingual business assistant, she joined Product Management as an assistant and is responsible for translating texts and for the organisation of the department.


Related articles

Create a Debian VM in Proxmox VE
Creation of SSH key under Windows
Proxmox VE Support-Subscriptions