Monitoring Proxmox VE Ceph Hosts with checkmk
checkmk is a tool for monitoring server systems, and it can also monitor Proxmox VE Ceph hosts. This article describes how to install and set up checkmk for exactly this purpose.
Requirements
To monitor a Proxmox VE Ceph cluster with checkmk, the following components are required:
- a Linux server installation (Debian preferred)
- a complete Docker installation[1]
- a ready-to-use checkmk-RAW container[2][3]
- a Proxmox Ceph HCI Cluster
- a Proxmox VE host that has been created and registered in checkmk
Installation and launch of checkmk
Once these requirements are met, the setup of checkmk on the Ceph hosts can begin.
Agent mk_ceph
The first step is to obtain the mk_ceph agent plugin. This requires a few steps in the checkmk user interface:
- First, open Setup in the menu, switch to Agents, and select Linux
- Now right-click mk_ceph and select "copy link" to copy the plugin path: http://10.2.1.179:8006/cmk/check_mk/agents/plugins/mk_ceph
Agent installation on Ceph hosts
To install the plugin, run the following commands on each Ceph host:
cd /usr/lib/check_mk_agent/plugins
wget http://10.2.1.179:8006/cmk/check_mk/agents/plugins/mk_ceph
The file must be executable; otherwise the checks will not work. Use the following command to set the permission:
chmod +x mk_ceph
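The download and the permission change above can be combined into one step that also verifies the result. The following is a minimal sketch, not part of checkmk: the function name install_mk_ceph is hypothetical, and it assumes that wget is installed and that the agent's plugin directory already exists on the Ceph host.

```shell
# Sketch: fetch the mk_ceph plugin and mark it executable.
# install_mk_ceph is a hypothetical helper name, not a checkmk command.
install_mk_ceph() (
    plugin_url=$1
    plugin_dir=${2:-/usr/lib/check_mk_agent/plugins}
    target="$plugin_dir/mk_ceph"

    # Stop early if the agent's plugin directory is missing.
    [ -d "$plugin_dir" ] || { echo "missing directory: $plugin_dir" >&2; exit 1; }

    # Download to the final location, then set the executable bit.
    wget -q -O "$target" "$plugin_url" || exit 1
    chmod +x "$target" || exit 1

    # Confirm the result so that a silent failure is not overlooked.
    [ -x "$target" ]
)

# Example call on a Ceph host (URL as copied from the checkmk interface):
# install_mk_ceph http://10.2.1.179:8006/cmk/check_mk/agents/plugins/mk_ceph
```

The function body runs in a subshell, so its variables do not leak into the calling shell and a failure only ends the function, not the session.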
Adjustment of configuration file on Ceph hosts
The following settings in the file /usr/lib/check_mk_agent/plugins/mk_ceph have to be adjusted:
USER=client.admin
KEYRING=/etc/pve/priv/ceph.client.admin.keyring
The rest of the file may remain unchanged.
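When several Ceph hosts need the same adjustment, the edit can be scripted. The sketch below assumes that the plugin file contains lines beginning with USER= and KEYRING=, as the settings above suggest; the function name set_ceph_credentials is hypothetical, and the values must not contain the characters | or &, which have a special meaning in the sed expressions.

```shell
# Sketch: rewrite the USER= and KEYRING= lines of the plugin file in place.
# set_ceph_credentials is a hypothetical helper name.
set_ceph_credentials() (
    plugin_file=$1
    ceph_user=$2
    keyring=$3

    tmp_file=$(mktemp) || exit 1

    # Replace only lines that start with USER= or KEYRING=; all other
    # lines pass through unchanged.
    sed -e "s|^USER=.*|USER=$ceph_user|" \
        -e "s|^KEYRING=.*|KEYRING=$keyring|" \
        "$plugin_file" > "$tmp_file" || { rm -f "$tmp_file"; exit 1; }

    # Write the content back with cat so that the original file keeps
    # its permissions (including the executable bit).
    cat "$tmp_file" > "$plugin_file"
    status=$?
    rm -f "$tmp_file"
    exit "$status"
)

# Example call on a Ceph host:
# set_ceph_credentials /usr/lib/check_mk_agent/plugins/mk_ceph \
#     client.admin /etc/pve/priv/ceph.client.admin.keyring
```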
Activation of Ceph checks
To activate the checks, run a service discovery for the host again in the checkmk user interface. After this, the checks are functional.
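If the discovery does not find any Ceph services, it can help to confirm on the Ceph host that the agent output contains Ceph sections at all. The following sketch filters agent output for section headers of the form <<<name>>>. The function name list_ceph_sections is hypothetical, and the assumption that the plugin's section names start with "ceph" should be checked against the installed plugin version.

```shell
# Sketch: print the names of all agent sections that start with "ceph".
# list_ceph_sections is a hypothetical helper name; it reads agent
# output from standard input.
list_ceph_sections() {
    # Keep only header lines such as <<<ceph_status>>>, then strip the
    # leading <<< and everything from the first ":" or ">" onwards.
    grep '^<<<ceph' | sed -e 's/^<<<//' -e 's/[:>].*$//'
}

# Example call on a Ceph host:
# check_mk_agent | list_ceph_sections
```

An empty result suggests that the plugin is not executable or that the Ceph credentials in the plugin file are not correct.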
MON OSD Ratio check
We recommend monitoring mon_osd_fullratio and setting a suitable nearfull-ratio.
Digression - mon_osd_fullratio
The mon_osd_fullratio configuration parameter in Ceph defines a threshold, as a percentage of capacity, for how full an OSD drive may become. Once this threshold is reached, no new data is written to this OSD. This prevents the drive from being filled completely.
In Proxmox VE, mon_osd_fullratio is set to 95 percent by default.
If an OSD in a Ceph pool reaches this threshold, Ceph switches the full OSD and the associated pool to read-only mode. This prevents data loss. It is important to monitor the pool and to react early to avoid such a scenario.
Therefore, we recommend expanding the pool with additional drives once the occupancy rate reaches 60%.
The data is then redistributed, which reduces the percentage occupancy of the individual OSDs. This keeps the pool available and avoids downtime.
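The 60% recommendation can be checked with a small filter. The sketch below assumes, purely for illustration, that the input consists of lines with an OSD name and its occupancy in percent; such values could be taken from the %USE column of ceph osd df, but the extraction is not shown here because the column layout can differ between Ceph releases. The function name osds_at_or_above is hypothetical.

```shell
# Sketch: list the OSDs whose occupancy has reached a given percentage.
# Input format (an assumption for this example): "<osd-name> <percent>".
osds_at_or_above() {
    # Adding 0 forces a numeric comparison in awk.
    awk -v limit="$1" '$2 + 0 >= limit + 0 { print $1 }'
}

# Example call with hand-written sample values:
# printf 'osd.0 45.2\nosd.1 61.7\n' | osds_at_or_above 60
```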
The demand and the expected occupancy rate of the OSDs should be considered when planning the hosts, including failure scenarios. We are pleased to help you with that. By setting a nearfull-ratio, you receive a warning at a fixed percentage. This allows you to act before any restriction applies and prevents your system from losing functionality.
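A rough estimate can illustrate why failure scenarios matter for planning. The sketch below uses a deliberately simplified model that is an assumption of this example and not a Ceph formula: all hosts have the same capacity, the data is spread evenly, and after the loss of one host Ceph recovers all data onto the remaining hosts. Under these assumptions, the occupancy rises by the factor hosts / (hosts - 1). The function name usage_after_host_loss is hypothetical.

```shell
# Sketch: estimate the occupancy (in percent) after losing one host.
# Simplified model: equal hosts, even data distribution, full recovery
# onto the remaining hosts.
usage_after_host_loss() {
    awk -v used="$1" -v hosts="$2" 'BEGIN {
        # At least two hosts are needed for any data to be moved.
        if (hosts < 2) exit 1
        printf "%.1f\n", used * hosts / (hosts - 1)
    }'
}

# Example: four hosts at 60 percent occupancy
# usage_after_host_loss 60 4    # prints 80.0
```

In this model, a four-host cluster that is 60 percent full ends up at 80 percent after one host fails, which is already above the nearfull ratio of 0.75 shown in this article.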
The current OSD ratios can be displayed as follows:
root@PMX4:~# ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.75
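For use in scripts, a single ratio can be extracted from this output. The following sketch assumes the "name value" line format shown above; the function name get_ratio is hypothetical.

```shell
# Sketch: print the value of one ratio from "ceph osd dump" output.
# get_ratio is a hypothetical helper name; it reads from standard input.
get_ratio() {
    # Compare the first field exactly, so that "full_ratio" does not
    # also match "backfillfull_ratio" or "nearfull_ratio".
    awk -v key="$1" '$1 == key { print $2 }'
}

# Example call on a Ceph host:
# ceph osd dump | get_ratio nearfull_ratio
```

The exact field comparison is the reason for using awk here: a plain grep for full_ratio would return all three lines, because every ratio name contains that string.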
After this, a suitable value has to be set:
root@PMX4:~# ceph osd set-nearfull-ratio 0.6
osd set-nearfull-ratio 0.6
If an OSD reaches an occupancy rate of 60 percent, a Ceph warning is triggered. This warning also appears in the Ceph health check in checkmk and warns of potentially imminent downtime.
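Before changing the ratio, it can be useful to check that the new value is plausible. Ceph expects the thresholds in ascending order (nearfull below backfillfull below full) and reports a health warning otherwise. The sketch below checks only the first part of that ordering; the function name nearfull_ratio_ok is hypothetical.

```shell
# Sketch: succeed only if the new nearfull ratio is greater than 0 and
# below the current backfillfull ratio.
nearfull_ratio_ok() {
    # awk performs the floating-point comparison; the exit status is 0
    # when the condition holds and 1 otherwise.
    awk -v new="$1" -v backfill="$2" 'BEGIN { exit !(new > 0 && new < backfill) }'
}

# Example with the values from this article:
# nearfull_ratio_ok 0.6 0.9 && ceph osd set-nearfull-ratio 0.6
```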
References
- [1] Debian installation of Docker (docs.docker.com)
- [2] checkmk-RAW Container Download (checkmk.com)
- [3] Installation instructions for checkmk-RAW Container in Docker (checkmk.com)
Author: Jonas Sterr
Jonas Sterr has been working for Thomas-Krenn for several years. Originally employed as a trainee in technical support and then in hosting (formerly Filoo), Mr. Sterr now mainly deals with the topics of storage (SDS / Huawei / Netapp), virtualization (VMware, Proxmox, HyperV) and network (switches, firewalls) in product management at Thomas-Krenn.AG in Freyung.
Translator: Alina Ranzinger
Alina has been working at Thomas-Krenn.AG since 2024. After her training as a multilingual business assistant, she took a position as assistant in Product Management, where she is responsible for translating texts and for the organisation of the department.


