Ceph - increase maximum recovery & backfilling speed

From Thomas-Krenn-Wiki

This article explains how to speed up a Ceph recovery / backfilling after the failure of a node or a drive. Ceph provides two options, set on the OSDs (the daemons managing the drives), that allow several backfill requests to be accepted at the same time.

Important: Changing these parameters can increase the load on the system significantly and can lead to unwanted behaviour, for example degraded compute or storage performance for client workloads.

Relevant parameters

The relevant parameters can be found in the Ceph documentation.

## osd_max_backfills

The maximum number of backfills allowed to or from a single OSD. Note that this is applied separately for read and write operations.

## osd_recovery_max_active

The number of active recovery requests per OSD at one time. More requests will accelerate recovery, but the requests place an increased load on the cluster.

This value is only used if it is non-zero. Normally it is 0, which means that the hdd or ssd values (below) are used, depending on the type of the primary device backing the OSD.

## osd_recovery_max_active_hdd

The number of active recovery requests per OSD at one time, if the primary device is rotational.
default: 3

## osd_recovery_max_active_ssd

The number of active recovery requests per OSD at one time, if the primary device is non-rotational (i.e., an SSD).
default: 10
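Since osd_recovery_max_active defaults to 0, it is the device-class-specific values above that normally take effect. On a Ceph release with the central config database (Nautilus or later), they can be read back as follows (a sketch; the reported values depend on your release and any overrides):

```shell
# Read the device-class-specific recovery defaults from the monitor's
# central config database (available since Ceph Nautilus).
ceph config get osd osd_recovery_max_active_hdd   # typically 3
ceph config get osd osd_recovery_max_active_ssd   # typically 10

# With osd_recovery_max_active left at 0, an OSD backed by a rotational
# device uses the _hdd value; an OSD on flash uses the _ssd value.
```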

Finding out current parameters

OSD.1 is used as the example here. The following commands have to be executed on the Ceph node on which OSD.1 is located. If you are not sure, you can display a list of all OSDs with ceph osd tree:

root@PMX1:~# ceph osd tree

ID  CLASS  WEIGHT    TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         13.09845  root default                             
-3          4.36615      host PMX1                            
 0   nvme   0.72769          osd.0       up   1.00000  1.00000
 1   nvme   0.72769          osd.1       up   1.00000  1.00000
 2   nvme   0.72769          osd.2       up   1.00000  1.00000
 3   nvme   0.72769          osd.3       up   1.00000  1.00000
 4   nvme   0.72769          osd.4       up   1.00000  1.00000
 5   nvme   0.72769          osd.5       up   1.00000  1.00000
-5          4.36615      host PMX2                            
 6   nvme   0.72769          osd.6       up   1.00000  1.00000
 7   nvme   0.72769          osd.7       up   1.00000  1.00000
 8   nvme   0.72769          osd.8       up   1.00000  1.00000
 9   nvme   0.72769          osd.9       up   1.00000  1.00000
10   nvme   0.72769          osd.10      up   1.00000  1.00000
11   nvme   0.72769          osd.11      up   1.00000  1.00000
-7          4.36615      host PMX3                            
12   nvme   0.72769          osd.12      up   1.00000  1.00000
13   nvme   0.72769          osd.13      up   1.00000  1.00000
14   nvme   0.72769          osd.14      up   1.00000  1.00000
15   nvme   0.72769          osd.15      up   1.00000  1.00000
16   nvme   0.72769          osd.16      up   1.00000  1.00000
17   nvme   0.72769          osd.17      up   1.00000  1.00000

Here you can see that the OSD.1 is located on the host PMX1.

root@PMX1:~# ceph daemon osd.1 config get osd_max_backfills
{
    "osd_max_backfills": "1"
}
root@PMX1:~# ceph daemon osd.1 config get osd_recovery_max_active
{
    "osd_recovery_max_active": "0"
}
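ceph daemon uses the local admin socket and therefore only works on the node hosting the OSD. To check a setting on every OSD from a single node, ceph tell can be used instead, since it reaches the daemons over the network (a sketch, assuming all OSDs are up):

```shell
# Query osd_max_backfills on all OSDs from one node.
# "ceph osd ls" prints all OSD ids; "ceph tell" also reaches remote daemons.
for id in $(ceph osd ls); do
    echo -n "osd.${id}: "
    ceph tell "osd.${id}" config get osd_max_backfills
done
```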

Setting new parameters

The following commands can be used to adjust the parameters. Suitable values vary depending on the system: a system with NVMe drives can, for example, handle a higher osd_max_backfills value than a system with HDDs. These commands change the parameters on every available OSD in the cluster.

ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
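Note that values applied with injectargs are runtime-only and are lost when an OSD restarts. On Ceph Nautilus and later, the central config database offers a persistent alternative (a sketch using the same example values):

```shell
# Alternative (Ceph Nautilus and later): store the values in the central
# config database so they also survive OSD restarts.
ceph config set osd osd_max_backfills 16
ceph config set osd osd_recovery_max_active 4
```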

Resetting parameters

To restore the defaults after a successful recovery, use the values recorded in the section "Finding out current parameters".

ceph tell 'osd.*' injectargs '--osd-max-backfills 1'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 0'
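If the values were instead made persistent via ceph config set (available since Nautilus), the overrides should be removed so the OSDs fall back to the built-in defaults:

```shell
# Remove any persistent overrides from the central config database;
# the OSDs then revert to their built-in defaults.
ceph config rm osd osd_max_backfills
ceph config rm osd osd_recovery_max_active
```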

Author: Jonas Sterr

Jonas Sterr has been working for Thomas-Krenn for several years. Originally employed as a trainee in technical support and then in hosting (formerly Filoo), Mr. Sterr now mainly deals with the topics of storage (SDS / Huawei / Netapp), virtualization (VMware, Proxmox, HyperV) and network (switches, firewalls) in product management at Thomas-Krenn.AG in Freyung.


Translator: Alina Ranzinger

Alina has been working at Thomas-Krenn.AG since 2024. After her training as a multilingual business assistant, she joined the Product Management team as an assistant and is responsible for translating texts and for organising the department.


Related articles

Ceph: a password is required command=nvme error
Change hostname in a productive Proxmox Ceph HCI cluster
Proxmox GUI remove old Ceph Health Warnings