Determine the optimal nearfull ratio in the Proxmox Ceph 3-node cluster

From Thomas-Krenn-Wiki
OSD overview in a 3-node Ceph cluster.

Every Ceph cluster has threshold values to avoid data loss. They warn about and protect against overfilling the cluster, and their default values are usually sufficient.

Ceph clusters with 3 hosts, however, represent a special case. This article explains how to determine safe threshold values for Ceph clusters with 3 hosts using an example.

Configuration of cluster

3-node Ceph cluster at 83% fill level.

We use a Ceph cluster with 3 hosts in a virtual environment on Proxmox VE. For such a Ceph cluster, 12 OSDs are recommended as the minimum number[1]:

Hosts: 3
OSDs: 12 (4 per host)
Capacity per OSD: 16 GiB
Replica size (osd_pool_default_size): 3
Replica minimum (osd_pool_default_min_size): 2

For this, we have installed a "monitor" and a "manager" on every host.

Capacity

In total, we have the following storage capacity in our example cluster:

  • Total capacity: 192 GiB (minus overhead due to metadata)
  • Net capacity: 64 GiB (minus overhead due to metadata)
  • Net capacity per OSD: 5.33 GiB (minus overhead due to metadata)
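As a quick sanity check, the net figures follow directly from the replication factor. A minimal Python sketch (values taken from the table above, not part of any cluster setup):

```python
# Capacity arithmetic for the example cluster (before metadata overhead).
osds = 12
capacity_per_osd_gib = 16
replica_size = 3

total_gib = osds * capacity_per_osd_gib  # raw capacity across all OSDs
net_gib = total_gib / replica_size       # usable capacity with 3 replicas
net_per_osd_gib = net_gib / osds         # usable share per OSD

print(total_gib, net_gib, round(net_per_osd_gib, 2))  # 192 64.0 5.33
```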

Using the default values, the following threshold values for the fill level of the cluster are also obtained based on the total capacity:

  • mon_osd_nearfull_ratio (85%):
    • Cluster: 163.2 GiB
    • OSD: 13.6 GiB
  • mon_osd_backfillfull_ratio (90%):
    • Cluster: 172.8 GiB
    • OSD: 14.4 GiB
  • mon_osd_full_ratio (95%):
    • Cluster: 182.4 GiB
    • OSD: 15.2 GiB
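Each threshold is simply its ratio applied to the raw capacity, per cluster and per OSD. A short Python sketch reproducing the figures above:

```python
# Threshold fill levels derived from the default ratios and raw capacity.
total_gib = 192  # raw cluster capacity
osd_gib = 16     # raw capacity of a single OSD

ratios = {
    "mon_osd_nearfull_ratio": 0.85,
    "mon_osd_backfillfull_ratio": 0.90,
    "mon_osd_full_ratio": 0.95,
}

for name, ratio in ratios.items():
    # e.g. nearfull: cluster 163.2 GiB, OSD 13.6 GiB
    print(f"{name}: cluster {total_gib * ratio:.1f} GiB, OSD {osd_gib * ratio:.1f} GiB")
```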

The threshold values are verified per OSD and displayed accordingly in the dashboard.

Determine suitable threshold values

An OSD has failed. Due to backfillfull_ratio, no more PGs are recreated.

To determine good thresholds for our cluster, we first calculate what happens with the default values in various failure scenarios. Then, we calculate better threshold values and set them in the cluster.

Failure scenarios

We fill the test cluster to the point where individual OSDs already reach the mon_osd_nearfull_ratio threshold value of 85% (13.6 GiB).

Host fails

If a host fails, no recovery of the PGs is initiated because no third host is available as a fault domain.

The data redundancy of the cluster is reduced.

Note: With more than 3 hosts, the PGs of a 3-fold replicated pool would be restored on the remaining hosts.

OSD fails

The cluster starts a recovery. The placement groups (PGs) of the failed OSD are recreated on the remaining OSDs of the same host, since the host still exists as the third failure domain. The following formula gives the fill level that these OSDs would then reach:

Allocation_new = OSDs / (OSDs − 1) × Allocation_old

In our case, with 13.6 GiB utilization and 3 remaining OSDs on the host, this means:

18.13 GiB = 4/3 × 13.6 GiB

The maximum capacity of the OSDs would be exceeded. A safety mechanism prevents this: as soon as mon_osd_backfillfull_ratio is reached, Ceph stops the recreation.
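The redistribution can be checked numerically. A minimal sketch, assuming 4 OSDs per host that are each filled to the nearfull level of 13.6 GiB when one of them fails:

```python
# Fill level of the remaining OSDs after one of 4 OSDs on a host fails.
osds_per_host = 4
allocation_old_gib = 13.6  # per-OSD fill level at the nearfull threshold
osd_capacity_gib = 16      # raw capacity of a single OSD

# Allocation_new = OSDs / (OSDs - 1) * Allocation_old
allocation_new = osds_per_host / (osds_per_host - 1) * allocation_old_gib

print(round(allocation_new, 2))           # 18.13
print(allocation_new > osd_capacity_gib)  # True: the OSD capacity would be exceeded
```

Since 18.13 GiB exceeds the 16 GiB capacity of an OSD, the recovery necessarily runs into the backfillfull safety stop.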

The PGs are not recreated completely until storage capacity is released or the cluster is expanded. As there are replicas on other hosts, no data will be lost.

The data redundancy of the cluster, however, is reduced.

Adjusting threshold values

To guarantee that a complete recreation is possible in case of an OSD failure, the threshold values should be adapted.

Optimal threshold values for cluster with 3 hosts

The right threshold values ensure that a recovery does not exceed the backfill threshold during an OSD failure. This means:

Backfillfull_OSD > Allocation_new

The configuration of mon_osd_nearfull_ratio is the most important one. If we keep the other values the same, the optimal value in our case is:

nearfull < 0.9 × 3/4 = 0.675

Here, "0.9" stands for an OSD that is 90% full (the backfillfull ratio), and 3/4 is the inverse of the redistribution factor 4/3. A mon_osd_nearfull_ratio of 0.67 or less is therefore recommended.
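The same bound can be derived programmatically by solving Backfillfull_OSD > Allocation_new for the nearfull ratio, a minimal sketch using this cluster's values:

```python
# Largest nearfull ratio that still lets a recovery finish below backfillfull.
backfillfull = 0.9
osds_per_host = 4

# Condition: nearfull * OSDs/(OSDs-1) < backfillfull
# =>        nearfull < backfillfull * (OSDs-1)/OSDs
nearfull_max = backfillfull * (osds_per_host - 1) / osds_per_host

print(round(nearfull_max, 3))  # 0.675
```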

Set threshold values in Ceph

The threshold values can be adapted with the following commands[2]:

  • mon_osd_nearfull_ratio:
root@pve0:~# ceph osd set-nearfull-ratio .67
  • mon_osd_backfillfull_ratio:
root@pve0:~# ceph osd set-backfillfull-ratio .9
  • mon_osd_full_ratio:
root@pve0:~# ceph osd set-full-ratio .95

References

  1. Minimale OSDs in PVE Ceph-Clustern (pve.proxmox.com, 16.12.2025)
  2. No free drive space (docs.ceph.com, 16.12.2025)


Author: Stefan Bohn

Stefan Bohn has been employed at Thomas-Krenn.AG since 2020. Originally based in PreSales as a consultant for IT solutions, he moved to Product Management in 2022. There he dedicates himself to knowledge transfer and also drives the Thomas-Krenn Wiki.

Translator: Alina Ranzinger

Alina has been working at Thomas-Krenn.AG since 2024. After her training as a multilingual business assistant, she joined Product Management as an assistant and is responsible for translating texts and organising the department.


Related articles

Ceph - increase maximum recovery & backfilling speed
Ceph Performance Guide - Sizing & Testing
Change hostname in a productive Proxmox Ceph HCI cluster