Boot-Device Replacement - Change of Proxmox ZFS Mirror Disk

These instructions describe how to replace a boot device in a Proxmox VE (PVE) host system with a ZFS mirror. The replacement becomes necessary when a disk in the mirror is damaged or has failed. This article explains what to do after a system disk failure, covers the options for the different boot loaders (GRUB or systemd-boot), and shows how to replace a disk in a PVE system so that the mirror is fully online and healthy again and the redundancy of the operating system is restored.

Recommendation - Test environment

All of the described steps can be performed in advance in a test environment. To do this, create a virtual machine in Proxmox VE and install PVE on a ZFS RAID-1.

This way, you gain experience with the process and lower the risk to your production system.
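
If you want to set up such a test VM from the command line, the following is a minimal sketch using the qm tool on an existing PVE host. The VM ID 900, the storage name local-lvm, and the ISO file name are examples and must be adapted to your environment:

root@pve:~# qm create 900 --name pve-zfs-test --memory 4096 --cores 2 \
  --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci \
  --scsi0 local-lvm:32 --scsi1 local-lvm:32 \
  --ide2 local:iso/proxmox-ve_8.1-1.iso,media=cdrom --boot order=ide2
root@pve:~# qm start 900

During installation, select both virtual disks and choose ZFS (RAID1) as the target file system.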

Instructions

This section briefly summarizes the commands required to replace the disk and their effect on your system.

Attention: It is highly recommended to read the detailed instructions below if this is your first time performing this procedure or if you are still inexperienced in administering Proxmox VE systems!

  1. Identify the faulty storage device and replace it.
  2. Copy the partition layout of the healthy disk to the replacement disk (and randomize the GUIDs).
  3. Find the disk ID (by-id) of the new ZFS partition (type "Solaris /usr & Apple ZFS").
  4. Replace the failed device with the new ZFS partition in the zpool.
  5. Finalize the exchange with proxmox-boot-tool.

Once you have completed all the steps correctly, your ZFS pool will be online and healthy again, and the system will boot from either disk.
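
For experienced administrators, the whole procedure condenses to the following command sequence (a sketch only: the device names, IDs, and the pool name rpool are placeholders that must match your system, and the proxmox-boot-tool steps apply to systems booting via proxmox-boot-tool):

# sgdisk /dev/<healthy-disk> -R /dev/<new-disk>
# sgdisk -G /dev/<new-disk>
# zpool replace -f rpool <old-zfs-partition> /dev/disk/by-id/<new-disk-id>-part3
# proxmox-boot-tool format /dev/disk/by-id/<new-disk-id>-part2
# proxmox-boot-tool init /dev/disk/by-id/<new-disk-id>-part2
# proxmox-boot-tool clean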

Identify the failed disk

First, you need to identify the failed disk and its name in your PVE system. This can be done, for example, with the lsblk command.

Before failure

After installation, two boot disks are available (sda and sdb):

root@pve-virtual-01:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0   100G  0 disk
├─sda1   8:1    0  1007K  0 part
├─sda2   8:2    0   512M  0 part
└─sda3   8:3    0  99.5G  0 part
sdb      8:16   0   100G  0 disk
├─sdb1   8:17   0  1007K  0 part
├─sdb2   8:18   0   512M  0 part
└─sdb3   8:19   0  99.5G  0 part

After failure

In this test scenario, the sda hard drive has failed. It is missing from the lsblk output after the failure:

root@pve-virtual-01:~# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sdb      8:16   0  100G  0 disk
├─sdb1   8:17   0 1007K  0 part
├─sdb2   8:18   0  512M  0 part
└─sdb3   8:19   0 99.5G  0 part

Important: When replacing a storage device, the device names may change. Always verify the exact name again immediately after replacing the damaged storage device with a new one! Specifying the wrong name could irreparably damage your system in the following steps.
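
If you also need to physically identify which drive to pull from the server, it can help to list the model and serial numbers and match them against the drive labels. This is a quick check with lsblk; in virtual test environments the serial field may be empty:

root@pve-virtual-01:~# lsblk -o NAME,SIZE,MODEL,SERIAL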

Copy partition layout

This information can always be found in the current Proxmox VE documentation.[1]

# sgdisk <healthy bootable device> -R <new device>
# sgdisk -G <new device>
# zpool replace -f <pool> <old zfs partition> <new zfs partition>

The new disk is present in the system but does not yet have a partition layout. However, we need the partitions so that the system can boot properly and so that ZFS gets two technically identical disks:

root@pve-virtual-01:~# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  100G  0 disk
sdb      8:16   0  100G  0 disk
├─sdb1   8:17   0 1007K  0 part
├─sdb2   8:18   0  512M  0 part
└─sdb3   8:19   0 99.5G  0 part

We now copy the partition layout from the healthy data carrier to the new data carrier.

Please note once again that you MUST first identify the correct device name with lsblk. Otherwise, you risk ending up with a broken Proxmox VE system, as the partition layout of the "new" disk might accidentally be replicated onto the "healthy" disk. The system would then no longer boot at all, forcing you to perform a clean install.
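
As an additional safeguard, you can print the partition tables of both devices before copying anything. The healthy disk should show the three existing partitions, while the new, empty disk should show none (a quick plausibility check using sgdisk's print option):

root@pve-virtual-01:~# sgdisk --print /dev/sdb
root@pve-virtual-01:~# sgdisk --print /dev/sda

Once you are certain which device is which, copy the layout: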
root@pve-virtual-01:~# sgdisk /dev/sdb -R /dev/sda
The operation has completed successfully.

root@pve-virtual-01:~# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  100G  0 disk
├─sda1   8:1    0 1007K  0 part
├─sda2   8:2    0  512M  0 part
└─sda3   8:3    0 99.5G  0 part
sdb      8:16   0  100G  0 disk
├─sdb1   8:17   0 1007K  0 part
├─sdb2   8:18   0  512M  0 part
└─sdb3   8:19   0 99.5G  0 part

Since we copied the layout to the new disk, both disks and their partitions now have identical GUIDs. That is why we still need to randomize them on the new disk:

root@pve-virtual-01:~# sgdisk -G /dev/sda
The operation has completed successfully.
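
You can verify that the partition table and partition GUIDs of the two disks now differ, for example by listing them with lsblk (PTUUID and PARTUUID are standard lsblk output columns):

root@pve-virtual-01:~# lsblk -o NAME,SIZE,PTUUID,PARTUUID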

Find out disk ID (by-id)

To replace the device correctly in ZFS, we need to determine the ID of the new disk. In this case, we are looking for the disk associated with the device name sda.

root@pve:~# ls -l /dev/disk/by-id/*

lrwxrwxrwx 1 root root  9 Mar 16 15:11 /dev/disk/by-id/ata-QEMU_DVD-ROM_QM00003 -> ../../sr0
lrwxrwxrwx 1 root root  9 Mar 16 15:11 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0 -> ../../sda
lrwxrwxrwx 1 root root 10 Mar 16 15:11 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Mar 16 15:11 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Mar 16 15:11 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Mar 16 15:11 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 -> ../../sdb
lrwxrwxrwx 1 root root 10 Mar 16 15:11 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Mar 16 15:11 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Mar 16 15:11 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part3 -> ../../sdb3
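
If the listing on a real system is long, you can narrow it down to the new disk by filtering for its device name (sda is the example name from above):

root@pve-virtual-01:~# ls -l /dev/disk/by-id/ | grep sda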

PVE always creates dedicated partitions for boot, EFI, and ZFS on storage devices:

root@pve:~# fdisk -l /dev/sda

Disk /dev/sda: 32 GiB, 34359738368 bytes, 67108864 sectors
Disk model: QEMU HARDDISK
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 7D6423DE-2A9C-4B8D-A272-C7B28E1452D9

Device       Start      End  Sectors  Size Type
/dev/sda1       34     2047     2014 1007K BIOS boot
/dev/sda2     2048  1050623  1048576  512M EFI System
/dev/sda3  1050624 67108830 66058207 31.5G Solaris /usr & Apple ZFS

In this case, we need the device ID (by-id) of the /dev/sda3 partition, which is as follows:

/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part3
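
To double-check that this ID really points to the intended partition, you can resolve the symlink:

root@pve-virtual-01:~# readlink -f /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part3
/dev/sda3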

Replace the ZFS disk

First, we verify the ZFS pool status. Here, we see that the failed hard drive is no longer available in the system and previously had the ID /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part3.

root@pve-virtual-01:~# zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        rpool                                           DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            15467202543801207082                        UNAVAIL      0     0     0  was /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part3
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part3  ONLINE       0     0     0

Now we replace the disk in the pool.

Important: In this example, the new hard drive happens to have the same ID as the old one. This will generally not be the case in your scenario. Use the following syntax:

root@pve-virtual-01:~# zpool replace -f rpool /dev/disk/by-id/<ID-OLD-DISK> /dev/disk/by-id/<ID-NEW-DISK>

In our example:

root@pve-virtual-01:~# zpool replace -f rpool /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part3 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part3
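
With the small test disks used here, the resilver finishes almost immediately; on production-sized disks it can take considerably longer. You can follow the progress until the pool reports ONLINE again, for example with watch (optional, repeatedly running zpool status works just as well):

root@pve-virtual-01:~# watch -n 10 zpool status rpool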

Once resilvering has finished, the replacement at the ZFS level is complete. You can now verify that the RAID-1 (ZFS mirror) is online and healthy again:

root@pve-virtual-01:~# zpool status -v
  pool: rpool
 state: ONLINE
  scan: resilvered 998M in 0 days 00:00:08 with 0 errors on Tue Mar 16 12:16:34 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        rpool                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part3  ONLINE       0     0     0
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part3  ONLINE       0     0     0

Finalize disk exchange

Finally, a few steps remain to ensure that the system is fully functional and still boots reliably after a reboot.

Now, you have to perform the following steps with the proxmox-boot-tool:

  • First, find out the disk ID as described in the section Find out disk ID (by-id). This time, however, you need the ID of the second partition, since it is always used for the EFI system.
  • Once you have found the ID (in this case /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part2), you can execute the following commands:
root@pve:~# proxmox-boot-tool format /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part2

UUID="" SIZE="536870912" FSTYPE="" PARTTYPE="c12a7328-f81f-11d2-ba4b-00a0c93ec93b" PKNAME="sdb" MOUNTPOINT=""
Formatting '/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part2' as vfat..
mkfs.fat 4.2 (2021-01-31)
Done.


root@pve:~# proxmox-boot-tool init /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part2

Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
UUID="FD52-5CAE" SIZE="536870912" FSTYPE="vfat" PARTTYPE="c12a7328-f81f-11d2-ba4b-00a0c93ec93b" PKNAME="sdb" MOUNTPOINT=""
Mounting '/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part2' on '/var/tmp/espmounts/FD52-5CAE'.
Installing systemd-boot..
Created "/var/tmp/espmounts/FD52-5CAE/EFI/systemd".
Created "/var/tmp/espmounts/FD52-5CAE/EFI/BOOT".
Created "/var/tmp/espmounts/FD52-5CAE/loader".
Created "/var/tmp/espmounts/FD52-5CAE/loader/entries".
Created "/var/tmp/espmounts/FD52-5CAE/EFI/Linux".
Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi" to "/var/tmp/espmounts/FD52-5CAE/EFI/systemd/systemd-bootx64.efi".
Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi" to "/var/tmp/espmounts/FD52-5CAE/EFI/BOOT/BOOTX64.EFI".
Random seed file /var/tmp/espmounts/FD52-5CAE/loader/random-seed successfully written (512 bytes).
Not installing system token, since we are running in a virtualized environment.
Created EFI boot entry "Linux Boot Manager".
Configuring systemd-boot..
Unmounting '/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part2'.
Adding '/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-part2' to list of synced ESPs..
Refreshing kernels and initrds..
Running hook script 'proxmox-auto-removal'..
Running hook script 'zz-proxmox-boot'..
Copying and configuring kernels on /dev/disk/by-uuid/5D2E-4BFB
        Copying kernel and creating boot-entry for 5.15.30-2-pve
WARN: /dev/disk/by-uuid/5D2F-103F does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
Copying and configuring kernels on /dev/disk/by-uuid/FD52-5CAE
        Copying kernel and creating boot-entry for 5.15.30-2-pve

root@pve:~# proxmox-boot-tool status

Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi
5D2E-4BFB is configured with: uefi (versions: 5.15.30-2-pve)
WARN: /dev/disk/by-uuid/5D2F-103F does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
FD52-5CAE is configured with: uefi (versions: 5.15.30-2-pve)


root@pve:~# proxmox-boot-tool refresh

Running hook script 'proxmox-auto-removal'..
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
Copying and configuring kernels on /dev/disk/by-uuid/5D2E-4BFB
        Copying kernel and creating boot-entry for 5.15.30-2-pve
WARN: /dev/disk/by-uuid/5D2F-103F does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
Copying and configuring kernels on /dev/disk/by-uuid/FD52-5CAE
        Copying kernel and creating boot-entry for 5.15.30-2-pve


root@pve:~# proxmox-boot-tool clean

Checking whether ESP '5D2E-4BFB' exists.. Found!
Checking whether ESP '5D2F-103F' exists.. Not found!
Checking whether ESP 'FD52-5CAE' exists.. Found!
Sorting and removing duplicate ESPs..


root@pve:~# proxmox-boot-tool status

Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi
5D2E-4BFB is configured with: uefi (versions: 5.15.30-2-pve)
FD52-5CAE is configured with: uefi (versions: 5.15.30-2-pve)
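
Note that the steps shown above apply to systems that boot via proxmox-boot-tool (with systemd-boot or a proxmox-boot-tool managed GRUB). On older installations that boot with legacy GRUB directly from the disk, the Proxmox VE documentation instead uses grub-install to make the replacement disk bootable, roughly as follows (replace the device name with your new disk):

root@pve:~# grub-install /dev/sda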

After a reboot, the system starts successfully and is fully redundant and bootable from either disk again.

References

  1. ZFS on Linux (pve.proxmox.com), section "Changing a failed device" in subchapter 3.8.5.


Author: Jonas Sterr

Jonas Sterr has been working for Thomas-Krenn for several years. Originally employed as a trainee in technical support and later in hosting (formerly Filoo), Mr. Sterr now mainly deals with storage (SDS / Huawei / NetApp), virtualization (VMware, Proxmox, Hyper-V) and networking (switches, firewalls) in product management at Thomas-Krenn.AG in Freyung.


Translator: Alina Ranzinger

Alina has been working at Thomas-Krenn.AG since 2024. After her training as a multilingual business assistant, she joined Product Management as an assistant and is responsible for translating texts and organising the department.


Related articles

Install Proxmox VE
Linkspeed configuration of Broadcom network cards
ZFS cannot import rpool no such pool available - fix Proxmox boot problem