Ceph - increase maximum recovery & backfilling speed
If a Ceph recovery / backfill is running due to a node or disk failure, you may want it to proceed at maximum speed. Ceph provides two options for this, set per OSD (disk), which allow each OSD to accept and process more backfill requests in parallel.
Important: Changing these parameters can significantly increase the load on the system and lead to undesirable behavior such as degraded performance (compute, storage).
Relevant parameters
The relevant parameters can be found in the Ceph documentation.[1]
osd_max_backfills
The maximum number of backfills allowed to or from a single OSD. Note that this is applied separately for read and write operations.

osd_recovery_max_active
The number of active recovery requests per OSD at one time. More requests will accelerate recovery, but they place an increased load on the cluster. This value is only used if it is non-zero. Normally it is 0, which means that the hdd or ssd values (below) are used, depending on the type of the primary device backing the OSD.

osd_recovery_max_active_hdd
The number of active recovery requests per OSD at one time, if the primary device is rotational. Default: 3

osd_recovery_max_active_ssd
The number of active recovery requests per OSD at one time, if the primary device is non-rotational (i.e., an SSD). Default: 10
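How osd_recovery_max_active interacts with the hdd/ssd variants can be sketched as a small shell function. This is only an illustration of the documented behavior described above, not Ceph source code:

```shell
#!/bin/sh
# Sketch of how the effective recovery limit resolves, per the parameter
# descriptions above: a non-zero osd_recovery_max_active wins; otherwise
# the hdd or ssd default applies, depending on the primary device type.
effective_recovery_max_active() {
    max_active=$1    # osd_recovery_max_active
    hdd_value=$2     # osd_recovery_max_active_hdd
    ssd_value=$3     # osd_recovery_max_active_ssd
    device_type=$4   # "hdd" or "ssd"

    if [ "$max_active" -ne 0 ]; then
        echo "$max_active"
    elif [ "$device_type" = "hdd" ]; then
        echo "$hdd_value"
    else
        echo "$ssd_value"
    fi
}

effective_recovery_max_active 0 3 10 ssd   # -> 10 (default, SSD)
effective_recovery_max_active 0 3 10 hdd   # -> 3  (default, HDD)
effective_recovery_max_active 4 3 10 hdd   # -> 4  (explicit value wins)
```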
Find out current parameters
As an example, OSD.1 is used here; the following commands must be executed on the Ceph node where OSD.1 is located. If you are not sure, you can list all OSDs using ceph osd tree. In the output below you can see that OSD.1 is located on the host PMX1.
root@PMX1:~# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1         13.09845  root default
-3          4.36615      host PMX1
 0   nvme   0.72769          osd.0      up   1.00000  1.00000
 1   nvme   0.72769          osd.1      up   1.00000  1.00000
 2   nvme   0.72769          osd.2      up   1.00000  1.00000
 3   nvme   0.72769          osd.3      up   1.00000  1.00000
 4   nvme   0.72769          osd.4      up   1.00000  1.00000
 5   nvme   0.72769          osd.5      up   1.00000  1.00000
-5          4.36615      host PMX2
 6   nvme   0.72769          osd.6      up   1.00000  1.00000
 7   nvme   0.72769          osd.7      up   1.00000  1.00000
 8   nvme   0.72769          osd.8      up   1.00000  1.00000
 9   nvme   0.72769          osd.9      up   1.00000  1.00000
10   nvme   0.72769          osd.10     up   1.00000  1.00000
11   nvme   0.72769          osd.11     up   1.00000  1.00000
-7          4.36615      host PMX3
12   nvme   0.72769          osd.12     up   1.00000  1.00000
13   nvme   0.72769          osd.13     up   1.00000  1.00000
14   nvme   0.72769          osd.14     up   1.00000  1.00000
15   nvme   0.72769          osd.15     up   1.00000  1.00000
16   nvme   0.72769          osd.16     up   1.00000  1.00000
17   nvme   0.72769          osd.17     up   1.00000  1.00000
root@PMX1:~# ceph daemon osd.1 config get osd_max_backfills
{
    "osd_max_backfills": "1"
}
root@PMX1:~# ceph daemon osd.1 config get osd_recovery_max_active
{
    "osd_recovery_max_active": "0"
}
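If you want the bare number instead of the JSON object (e.g. for scripting), the value can be extracted with sed. The sample string below is the output shown above, so the parsing step itself works without a live cluster; on a Ceph node you would pipe the command output instead:

```shell
#!/bin/sh
# Sketch: extract the numeric value from the JSON printed by
# `ceph daemon osd.1 config get osd_max_backfills`.
sample='{ "osd_max_backfills": "1" }'
value=$(printf '%s' "$sample" | sed -n 's/.*"osd_max_backfills": *"\([0-9]*\)".*/\1/p')
echo "$value"   # -> 1
```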
Set new parameters
Using the commands below, you can adjust the parameters accordingly. Suitable values vary by system; for example, a system with NVMe SSDs can handle a higher osd_max_backfills than a system with HDDs. These commands change the parameters on all OSDs in the cluster.
ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
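Rather than hard-coding the two calls, a small helper can compose them. This sketch (the helper name is ours, not a Ceph command) only prints the commands so you can review them first; executing them still requires a running cluster:

```shell
#!/bin/sh
# Sketch: compose the `ceph tell` call for an option/value pair.
# Printing instead of executing lets you inspect the calls before running them.
injectargs_cmd() {
    printf "ceph tell 'osd.*' injectargs '--%s %s'\n" "$1" "$2"
}

injectargs_cmd osd-max-backfills 16
injectargs_cmd osd-recovery-max-active 4
```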
Reset parameters
If necessary, you may want to restore the default parameters after the recovery has completed. For this purpose, use the values noted in the section "Find out current parameters".
ceph tell 'osd.*' injectargs '--osd-max-backfills 1'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 0'
References
[1] OSD Config Reference - Backfilling (docs.ceph.com)
Author: Jonas Sterr. Jonas Sterr has been working for Thomas-Krenn for several years. Originally employed as a trainee in technical support and then in hosting (formerly Filoo), he now mainly deals with the topics of storage (SDS / Huawei / NetApp), virtualization (VMware, Proxmox, Hyper-V) and networking (switches, firewalls) in product management at Thomas-Krenn.AG in Freyung.