Ceph Recovery Stop in the event of node failure

From Thomas-Krenn-Wiki

In a Ceph cluster with a storage utilization of 75 per cent or more, there is a danger that one or more OSDs fill up when a node fails. This happens because Ceph automatically redistributes the Placement Groups (PGs) of the failed node to the remaining OSDs. If individual OSDs reach 95 per cent or more of their capacity, Ceph marks them as full and blocks write operations. As a result, the Ceph cluster and the virtual machines stored on it are no longer functional.
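The effect can be illustrated with a rough estimate. The following sketch assumes equally sized nodes, evenly distributed data, and that the data of the failed node is spread evenly over the remaining nodes. Real CRUSH placement is less even, so individual OSDs can fill up earlier than the average suggests. The function name `utilization_after_node_failure` is hypothetical.

```python
def utilization_after_node_failure(utilization: float, nodes: int, failed: int = 1) -> float:
    """Estimate the average utilization of the surviving nodes after recovery.

    Assumes all nodes have the same capacity, data is spread evenly, and the
    data of the failed nodes is redistributed evenly over the survivors.
    """
    if not 0 <= utilization <= 1:
        raise ValueError("utilization must be a fraction between 0 and 1")
    if not 0 <= failed < nodes:
        raise ValueError("at least one node must survive")
    # The amount of stored data stays the same but has to fit on fewer
    # nodes, so the utilization scales by nodes / (nodes - failed).
    return utilization * nodes / (nodes - failed)


FULL_RATIO = 0.95  # Ceph's default mon_osd_full_ratio

for n in (4, 5, 6):
    u = utilization_after_node_failure(0.75, n)
    status = "reaches" if u >= FULL_RATIO else "stays below"
    print(f"{n} nodes at 75 %: about {u:.1%} after one node fails ({status} 95 %)")
```

With four nodes the average alone already exceeds the full ratio. With five nodes the average stays just below it, but because PGs are never distributed perfectly evenly, individual OSDs can still cross the threshold.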

In this article, you will learn how to avoid this scenario and what risks and restrictions this involves.

Effects of automatic recovery

When a node fails, Ceph automatically starts to recover its PGs on the other nodes without manual intervention. If storage utilization is high, this can lead to:

  • individual OSDs reaching a utilization of over 95 per cent and being marked as full
  • the entire Ceph storage system becoming unusable
  • VMs no longer responding across the whole cluster.
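Ceph compares the utilization of each OSD with three cluster-wide ratios. By default these are 0.85 (`mon_osd_nearfull_ratio`, health warning), 0.90 (`mon_osd_backfillfull_ratio`, no further backfill onto the OSD) and 0.95 (`mon_osd_full_ratio`, client writes are blocked). The following sketch classifies utilization values against these defaults. The function `classify_osd` and the sample values are hypothetical, boundary handling is simplified, and a running cluster may use different ratios (they are listed in the output of `ceph osd dump`).

```python
# Default thresholds; a running cluster may be configured differently.
NEARFULL_RATIO = 0.85
BACKFILLFULL_RATIO = 0.90
FULL_RATIO = 0.95


def classify_osd(utilization: float) -> str:
    """Map an OSD utilization (0.0 to 1.0) to the state Ceph would report."""
    if utilization >= FULL_RATIO:
        return "full"          # client writes are blocked
    if utilization >= BACKFILLFULL_RATIO:
        return "backfillfull"  # no further backfill onto this OSD
    if utilization >= NEARFULL_RATIO:
        return "nearfull"      # health warning only
    return "ok"


# Hypothetical per-OSD utilization while recovery is running
osd_usage = {"osd.0": 0.78, "osd.1": 0.87, "osd.2": 0.92, "osd.3": 0.96}
for osd, usage in osd_usage.items():
    print(f"{osd}: {usage:.0%} -> {classify_osd(usage)}")
```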

This can be prevented by setting the norecover flag.

Setting norecover manually

To work around the problem, set the norecover flag under "Manage Global Flags" in the Ceph OSD view. This prevents Ceph from automatically starting to recover the PGs, so the PGs of the failed node are not shifted to other OSDs and the remaining OSDs do not overflow. The remaining storage stays functional.
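Besides the GUI, the flag can be set on the command line of a Ceph node with `ceph osd set norecover` and cleared again with `ceph osd unset norecover`. The following Python sketch wraps these two commands. It assumes that the `ceph` CLI and an admin keyring are available on the host; the helper names are hypothetical. With `dry_run=True` (the default) the command is only returned, not executed.

```python
import subprocess


def norecover_command(enable: bool) -> list[str]:
    """Build the ceph CLI call that sets or clears the norecover flag."""
    action = "set" if enable else "unset"
    return ["ceph", "osd", action, "norecover"]


def apply_norecover(enable: bool, dry_run: bool = True) -> list[str]:
    """Set or clear norecover; with dry_run=True nothing is executed."""
    cmd = norecover_command(enable)
    if not dry_run:
        # Assumes the ceph CLI and an admin keyring exist on this host;
        # raises CalledProcessError if the command fails.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(apply_norecover(True)))  # ceph osd set norecover
```

Whether the flag is active can be checked in the `flags` line of `ceph osd dump` or in the health section of `ceph -s`. Once the failed node is back online, the flag is cleared again, for example with `apply_norecover(False, dry_run=False)`.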

Implementation

The following picture gallery shows how to activate the norecover flag.

Important: After the failure of a node, you have 10 minutes to activate the flag norecover!
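The 10-minute window corresponds to Ceph's default `mon_osd_down_out_interval` of 600 seconds: OSDs that stay down longer than this are marked out, and their PGs are then mapped to other OSDs. The value configured on a cluster can be queried with `ceph config get mon mon_osd_down_out_interval`. The small sketch below computes the time remaining for a given failure time; the function name and timestamps are hypothetical, and the interval is the assumed default.

```python
from datetime import datetime, timedelta, timezone

# Ceph default for mon_osd_down_out_interval; may differ per cluster
DOWN_OUT_INTERVAL = timedelta(seconds=600)


def time_left_to_set_norecover(failed_at: datetime, now: datetime) -> timedelta:
    """Time remaining until the down OSDs are marked out (never negative)."""
    deadline = failed_at + DOWN_OUT_INTERVAL
    return max(deadline - now, timedelta(0))


failed_at = datetime(2024, 5, 6, 14, 0, 0, tzinfo=timezone.utc)  # example time
now = datetime(2024, 5, 6, 14, 7, 30, tzinfo=timezone.utc)
print(time_left_to_set_norecover(failed_at, now))  # 0:02:30
```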

Risks and restrictions

Using the norecover flag involves risks, in particular the danger of data loss:

  • If another OSD fails while the flag is set, data loss can occur.
  • If two OSDs fail simultaneously in this state, critical data loss will occur.
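Why additional failures are so critical can be seen from the replica count. The sketch assumes a replicated pool with `size=3` and `min_size=2` (the Proxmox VE defaults), in which each affected PG already lost one copy with the failed node and is not being recovered, and in which the additional failures hit OSDs holding the same PG. A PG keeps serving I/O while at least `min_size` copies are available, blocks I/O below that, and its data is lost when no copy remains. The function name is hypothetical; erasure-coded pools behave differently.

```python
def pg_state(size: int, min_size: int, lost_copies: int) -> str:
    """Rough state of a replicated PG after losing some of its copies."""
    remaining = size - lost_copies
    if remaining <= 0:
        return "data lost"               # no copy of the PG is left
    if remaining < min_size:
        return "inactive (I/O blocked)"  # fewer copies than min_size
    if remaining < size:
        return "degraded"                # still serving I/O, less redundancy
    return "healthy"


# size=3, min_size=2: one copy is already gone with the failed node
for extra_failures in range(3):
    lost = 1 + extra_failures
    print(f"{extra_failures} additional OSD failure(s): {pg_state(3, 2, lost)}")
```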


Translator: Alina Ranzinger

Alina has been working at Thomas-Krenn.AG since 2024. After completing her training as a multilingual business assistant, she joined Product Management as an assistant and is responsible for translating texts and organising the department.


Related articles

Determine the optimal nearfull ratio in the Proxmox Ceph 3-node cluster
Monitoring a Proxmox VE Ceph host with checkmk
Proxmox GUI remove old Ceph Health Warnings