Veeam Replica Failover and Failback functionality

From Thomas-Krenn-Wiki

Veeam Backup & Replication is a backup solution with a wide range of functions. It was developed specifically for virtualized VMware vSphere and Microsoft Hyper-V environments. This article explains in detail how failover and failback work and describes the different failover and failback options supported by Veeam Backup & Replication.


Replica failover

Sequence of a failover process

A failover is the switch from a production VM on the source host to its replica VM on the target host. During failover, Veeam Backup & Replication brings up a fully functional VM on the target host within seconds, in the state of the restore point you select. Users can then access the services and applications they need with minimal disruption. Failing over does not affect the state of the original VM on the source host in any way. If you need to test whether the replica VM and its restore points can be recovered, you can therefore fail over while the original VM keeps running. After all necessary tests have been performed, you can undo the failover and return to normal operation. As an alternative test method, Veeam Backup & Replication also offers SureReplica technology; for more information, see SureReplica.

It is recommended to use Veeam Backup & Replication for all failover scenarios. Avoid powering on a replica VM manually: this can disrupt subsequent replication runs or lead to the loss of important data.

Failover functionality

A failover runs through the following steps:

  1. Veeam Backup & Replication rolls the replica VM back to the selected restore point. To do this, it reverts the replica to the corresponding snapshot in the snapshot chain.
  2. Veeam Backup & Replication powers on the replica VM, and the state of the replica changes from Normal to Failover. If you fail over for testing or DR simulation while the original VM still exists and is running, the original VM remains powered on. Note that all replication runs for the original VM will fail until the replica VM is returned to its normal, passive state.
  3. All changes made to the replica VM while it runs in the Failover state are written to the delta file of the snapshot (restore point) to which it was rolled back.

Veeam Backup & Replication treats failover as a temporary state that must be finalized. While the replica VM is in the Failover state, you can undo the failover, fail back to the original VM, or perform a permanent failover. In a real disaster, once you have tested the replica VM and confirmed that it runs stably, finalize the process with a permanent failover.
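The replica states and the finalizing operations described in this article can be pictured as a small state machine. The following Python sketch is purely illustrative: the enum, the operation names such as `undo_failover`, and the `apply` helper are hypothetical and not part of any Veeam API, and `PERMANENT` simply marks a VM that is no longer managed as a replica.

```python
from enum import Enum


class ReplicaState(Enum):
    """Replica states named in this article (hypothetical model, not a Veeam type)."""
    NORMAL = "Normal"
    FAILOVER = "Failover"
    FAILBACK = "Failback"
    PERMANENT = "Permanent failover"  # no longer treated as a replica


# For each state: illustrative operation name -> resulting state.
TRANSITIONS = {
    ReplicaState.NORMAL: {"failover": ReplicaState.FAILOVER},
    ReplicaState.FAILOVER: {
        "undo_failover": ReplicaState.NORMAL,
        "failback": ReplicaState.FAILBACK,
        "permanent_failover": ReplicaState.PERMANENT,
    },
    ReplicaState.FAILBACK: {
        "commit_failback": ReplicaState.NORMAL,
        "undo_failback": ReplicaState.FAILOVER,
    },
    ReplicaState.PERMANENT: {},
}


def apply(state: ReplicaState, operation: str) -> ReplicaState:
    """Return the state after `operation`, or raise if it is not allowed in `state`."""
    allowed = TRANSITIONS[state]
    if operation not in allowed:
        raise ValueError(f"{operation!r} is not possible in state {state.value}")
    return allowed[operation]


def replication_allowed(state: ReplicaState) -> bool:
    """Per the article, replication of the original VM only proceeds in the Normal state."""
    return state is ReplicaState.NORMAL
```

For example, `apply(ReplicaState.NORMAL, "failover")` yields `FAILOVER`, while `apply(ReplicaState.NORMAL, "failback")` raises, because a failback is only possible from the Failover state.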

Replica failback

Replica failback functionality

Failback lets you resume operation of the production VM once it is available again. During failback you switch from the replica VM back to the original VM: I/O and processes move from the target host back to the production host, and normal operation resumes.

Once the source host is back in working order, you can switch from the replica VM to the original VM on the source host. If the source host is unavailable, you can restore the original VM to a new location and fail back to it there.

Failback functions

Veeam Backup & Replication offers three failback options:

  • You can fail back to the VM in its original location on the source host.
  • You can fail back to a VM that has already been restored from a backup to a new host.
  • You can fail back to a new location by transferring all replica files to that location.

The first two options reduce recovery time and network utilization, because Veeam Backup & Replication only needs to transfer the differences between the original VM and the replica VM. The third option is intended for cases where the original VM cannot be used and the VM cannot be restored from a backup before failback.
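The trade-off between the three options can be summarized as a small decision helper. This Python sketch is hypothetical: the function names and the preference order (original VM first, then a VM restored from backup, then a new location) are assumptions for illustration; the article only states that the first two options transfer differences and the third transfers all replica files.

```python
from enum import Enum


class FailbackTarget(Enum):
    ORIGINAL_LOCATION = "original VM on the source host"
    RESTORED_VM = "VM already restored from a backup to a new host"
    NEW_LOCATION = "new location (all replica files are transferred)"


def choose_failback_target(original_vm_usable: bool, restored_vm_available: bool) -> FailbackTarget:
    """Prefer the options that only need the delta (assumed preference order)."""
    if original_vm_usable:
        return FailbackTarget.ORIGINAL_LOCATION
    if restored_vm_available:
        return FailbackTarget.RESTORED_VM
    return FailbackTarget.NEW_LOCATION


def transfers_only_delta(target: FailbackTarget) -> bool:
    """The first two options sync differences; the third copies every replica file."""
    return target is not FailbackTarget.NEW_LOCATION
```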

Failback functionality

If you want to fail back to the original VM, Veeam Backup & Replication orchestrates the following actions:

  1. If the original VM is still running, Veeam Backup & Replication powers it off.
  2. A failback snapshot of the original VM is created.
  3. Veeam Backup & Replication calculates the differences between the disks of the original VM and the disks of the replica VM in the Failover state. This delta determines which data must be transferred from the replica VM to the original VM to bring the original VM in sync with the replica.
  4. Veeam Backup & Replication transfers only this delta to the original VM. The transferred data is written to the delta file of the original VM's failback snapshot.
  5. Veeam Backup & Replication powers off the replica VM. It stays powered off until you commit the failback or switch back to the replica VM with Undo Failback.
  6. Veeam Backup & Replication creates a failback snapshot for the replica VM. This snapshot acts as a new restore point and preserves the state of the replica before failback, so you can use it to return to the pre-failback state.
  7. Veeam Backup & Replication recalculates the delta between the replica VM and the original VM and transfers the changed data to the original VM. This second synchronization pass captures last-minute changes made to the replica while the failback was in progress.
  8. Veeam Backup & Replication removes the failback snapshot of the original VM. Changes written to the snapshot delta file are merged into the original VM's disk files.
  9. The state of the replica VM changes from Failover to Failback. Veeam Backup & Replication temporarily suspends replication of the original VM.
  10. When you fail back to an original VM that was restored to a different location, Veeam Backup & Replication updates the ID of the original VM in its configuration database: the ID of the original VM is replaced with the ID of the restored VM.
  11. If you selected the corresponding option, Veeam Backup & Replication automatically powers on the restored original VM on the target host.
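The two synchronization passes in steps 3, 4 and 7 can be illustrated with a block-level comparison. The article does not say how Veeam Backup & Replication computes the differences, so this Python sketch assumes a simple stand-in: disks are byte strings of equal size, split into fixed-size blocks and compared by SHA-256 hash; `BLOCK_SIZE` and all function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size for illustration only


def block_digests(disk: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Hash each fixed-size block of a disk image."""
    return [hashlib.sha256(disk[i:i + block_size]).digest()
            for i in range(0, len(disk), block_size)]


def changed_blocks(source: bytes, target: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks that differ between source (replica) and target (original)."""
    if len(source) != len(target):
        raise ValueError("disks must have the same size")
    pairs = zip(block_digests(source, block_size), block_digests(target, block_size))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


def sync(source: bytes, target: bytes, block_size: int = BLOCK_SIZE) -> tuple[bytes, int]:
    """Copy only the changed blocks from source to target; return new target and block count."""
    out = bytearray(target)
    changed = changed_blocks(source, target, block_size)
    for i in changed:
        start = i * block_size
        out[start:start + block_size] = source[start:start + block_size]
    return bytes(out), len(changed)


def failback_sync(original: bytes, replica_running: bytes, replica_powered_off: bytes):
    """First pass while the replica still runs (steps 3-4), second pass after it is off (step 7)."""
    synced, first_pass = sync(replica_running, original)
    synced, second_pass = sync(replica_powered_off, synced)
    return synced, first_pass, second_pass
```

The second pass is cheap because it only has to pick up the few blocks that changed after the first pass, which is why the replica can be powered off for a short window only.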

Failback to a new location

When you fail back to a completely new location, Veeam Backup & Replication performs the following steps:

  1. Veeam Backup & Replication transfers all files of the replica VM and stores them on the primary storage at the target site.
  2. Veeam Backup & Replication registers a new VM on the target host.
  3. The VM is started automatically if the appropriate option is selected.

Veeam Backup & Replication treats failback as a temporary state that must be finalized. After you have tested the restored original VM and confirmed that it works without problems, commit the failback. Alternatively, you can undo the failback, which puts the replica VM back into the Failover state; from there you can complete the process by switching permanently to the replica VM.

Permanent failover

Permanent failover functionality

A permanent failover commits (confirms) the failover operation. Perform it only if you are switching permanently from the original VM to the replica VM and want to use the replica as the new production VM. After a permanent failover, the replica VM is no longer treated as a replica and takes on the role of the production VM.

This scenario makes sense if the original VM and the replica VM are in the same location and have nearly identical resources. In that case, users should not experience increased latency.

Procedure of a permanent failover

  1. Veeam Backup & Replication removes the restore points (snapshots) of the replica VM from the snapshot chain and deletes the associated files from the datastore. Changes written to the snapshot delta file are merged into the virtual disks of the replica VM to bring it up to date.
  2. Veeam Backup & Replication removes the replica VM from the list of replicas in the Veeam Backup & Replication console.
  3. To protect the replica VM from data corruption after the permanent failover, Veeam Backup & Replication adjusts the replication job and adds the original VM to its list of exclusions. When the replication job runs, the original VM is skipped, so no data is written to the replica VM, which is now the production VM.
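The three steps above can be traced in a toy model. This Python sketch is hypothetical throughout: a disk is modelled as a dict of block index to data, the delta holds the writes made in the Failover state, and `ReplicaVM`, `ReplicationJob` and `permanent_failover` are illustrative names, not Veeam objects.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaVM:
    name: str
    base_disk: dict[int, bytes]       # disk content at the restore point (block -> data)
    restore_points: list[str]         # snapshot names in the chain
    delta: dict[int, bytes] = field(default_factory=dict)  # writes made in Failover state
    is_replica: bool = True


@dataclass
class ReplicationJob:
    vms: list[str]
    exclusions: set[str] = field(default_factory=set)

    def vms_to_process(self) -> list[str]:
        """VMs the job would replicate on its next run."""
        return [vm for vm in self.vms if vm not in self.exclusions]


def permanent_failover(replica: ReplicaVM, original_vm: str, job: ReplicationJob) -> None:
    # Step 1: merge the delta into the disks and drop the restore points.
    replica.base_disk.update(replica.delta)
    replica.delta.clear()
    replica.restore_points.clear()
    # Step 2: the VM is no longer listed as a replica.
    replica.is_replica = False
    # Step 3: exclude the original VM so the job no longer writes to the new production VM.
    job.exclusions.add(original_vm)
```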

Failover plan

Failover plan functionality

If you have multiple VMs running interdependent applications, you must fail them over one after another, as a group. To automate this process, you can create a failover plan.

In a failover plan you define the order in which VMs are processed and optional delays between them. A delay is the time Veeam Backup & Replication waits before starting the failover of the next VM in the sequence. It ensures that VMs such as a DNS server are fully booted by the time a dependent VM starts. The delay can be set individually for each VM in the failover plan, except for the last VM in the sequence.
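The effect of the per-VM delays can be worked out as start offsets. This Python sketch assumes that each delay is counted from the start of the previous VM's failover and ignores how long the failover itself takes; `PlanEntry` and `start_offsets` are hypothetical names, and the VM names and delays in the test are made-up examples.

```python
from dataclasses import dataclass


@dataclass
class PlanEntry:
    vm: str
    delay_after_s: int = 0  # wait before starting the next VM; unused for the last VM


def start_offsets(plan: list[PlanEntry]) -> dict[str, int]:
    """Seconds after plan start at which each VM's failover begins (assumed timing model)."""
    offsets: dict[str, int] = {}
    t = 0
    for i, entry in enumerate(plan):
        offsets[entry.vm] = t
        if i < len(plan) - 1:  # the last VM has no successor, so its delay is ignored
            t += entry.delay_after_s
    return offsets
```

With a DNS server first (60 s delay) and a database second (120 s delay), an application VM in third place would start 180 s after the plan begins.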

The failover plan is created in advance. If the primary VM group goes offline, you can start the corresponding failover plan manually. When you start it, you can fail over to the latest state or select a different point in time. Veeam Backup & Replication then attempts to power on the replicas from the restore points closest to your selection.
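Selecting "the restore point closest to your selection" amounts to a minimum over time differences. This Python sketch is an illustration under that reading of the text; `pick_restore_point` is a hypothetical helper, and passing `None` stands for "latest state".

```python
from datetime import datetime
from typing import Optional


def pick_restore_point(restore_points: list[datetime],
                       wanted: Optional[datetime] = None) -> datetime:
    """Latest restore point if no time is given, otherwise the one nearest to `wanted`."""
    if not restore_points:
        raise ValueError("no restore points available")
    if wanted is None:
        return max(restore_points)
    return min(restore_points, key=lambda rp: abs(rp - wanted))
```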

Failover process

The failover process is performed as follows:

  1. For each VM in the plan, Veeam Backup & Replication identifies the corresponding replica VM. If the replica of a failed VM is already in the Failover or Failback state, that VM is skipped.
  2. The replica VMs are started in the order and with the delays defined in the failover plan.
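The skip rule in step 1 is a filter that keeps the plan order. In this Python sketch the replica states are plain strings matching the state names used in this article, and `vms_to_fail_over` is a hypothetical helper; VMs missing from the state map are assumed to be in the Normal state.

```python
def vms_to_fail_over(plan_order: list[str], replica_state: dict[str, str]) -> list[str]:
    """Keep the plan order but skip VMs whose replica is already in Failover or Failback."""
    busy = {"Failover", "Failback"}
    return [vm for vm in plan_order if replica_state.get(vm, "Normal") not in busy]
```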

Finalization of failover plans

A failover is a temporary state that has yet to be finalized. The possible finalizing steps for a group failover are similar to a regular failover: undo failover, permanent failover, or failback.

If you choose permanent failover or failback, you must perform it for each VM individually. Undo failover, however, can be performed for the whole group with the Undo Failover Plan option.

If you undo the failover operation, the system switches back to the previous production VM and discards all changes made to the replica VM. When you undo a group failover, Veeam Backup & Replication takes the list of VMs processed during the last execution of the failover plan and makes them the production VMs again. VMs that have already been switched back in the meantime, for example manually by a user with Undo Failover, are skipped when Undo Failover Plan runs.

Veeam Backup & Replication performs Undo Failover for at most 5 VMs simultaneously to reduce the load on the infrastructure. For example, if the failover plan contains 10 VMs, the undo operation is performed for the first 5 VMs in the list, followed by a 10-second pause, and then for the next 5 VMs.
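The batching rule (at most 5 VMs at a time, 10 seconds between batches) can be sketched with a thread pool. This Python example is illustrative only: `undo_one` is a hypothetical callback standing in for the real undo operation, and the assumption that each batch finishes before the pause starts is not stated in the article. The `sleep` parameter is injectable so the schedule can be checked without waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

BATCH_SIZE = 5      # VMs processed at the same time (value from the article)
PAUSE_SECONDS = 10  # pause between batches (value from the article)


def batches(vms: list[str], size: int = BATCH_SIZE) -> list[list[str]]:
    """Split the VM list into consecutive groups of at most `size` VMs."""
    return [vms[i:i + size] for i in range(0, len(vms), size)]


def undo_failover_plan(vms: list[str],
                       undo_one: Callable[[str], None],
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Undo failover batch by batch; VMs within one batch are processed concurrently."""
    groups = batches(vms)
    for n, group in enumerate(groups):
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            list(pool.map(undo_one, group))  # wait until the whole batch is done
        if n < len(groups) - 1:  # no pause after the last batch
            sleep(PAUSE_SECONDS)
```

For 10 VMs this yields two batches of 5 and exactly one 10-second pause; for 12 VMs, batches of 5, 5 and 2 with two pauses.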

Planned failover

Planned failover functionality

If you know that your production VMs are about to go offline, you can proactively move their workloads to the corresponding replicas. A planned failover is a smooth, manual switch from a production VM to its replica with minimal disruption. It is well suited for data center migrations, maintenance, or software upgrades of the production VMs. A planned failover is also advisable if you know of an impending disaster that would take down the production environment.

Start of a planned failover

When you start a planned failover, Veeam Backup & Replication performs the following steps:

  1. An incremental replication run is triggered immediately to transfer the changes that have not yet been replicated to the replica.
  2. The production VM is shut down.
  3. Another incremental replication run synchronizes the latest changes with the replica VM.
  4. The production VM fails over to the replica VM.
  5. The replica VM is started.
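The value of this sequence lies in its order: the first incremental run moves most of the changes while the production VM is still running, so the second run, which happens during downtime, only has to transfer what changed in between. This Python sketch only encodes the order; the four callables are hypothetical hooks, not Veeam functions.

```python
from typing import Callable


def planned_failover(replicate_incremental: Callable[[], None],
                     shut_down_source: Callable[[], None],
                     fail_over: Callable[[], None],
                     power_on_replica: Callable[[], None]) -> None:
    """Run the planned-failover steps in the documented order (hooks are hypothetical)."""
    replicate_incremental()  # 1. copy changes not yet replicated while the VM still runs
    shut_down_source()       # 2. stop the production VM so no new writes occur
    replicate_incremental()  # 3. copy the last changes made before shutdown
    fail_over()              # 4. switch from the production VM to the replica
    power_on_replica()       # 5. start the replica VM
```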

During a planned failover, Veeam Backup & Replication creates two restore points that are not deleted after the failover. They appear in the list of restore points for this VM and can be used for rollback. Once your primary host is back online, you can roll back. The options for finalizing a planned failover are the same as for an unplanned failover: undo failover, permanent failover, or failback.

Undo failover

Undo failover functionality

To return a replica VM to its state before the failover, you can undo the failover operation.

When you undo the failover operation you switch back from the replicated VM to the original VM. Veeam Backup & Replication discards all changes to the replica VM since the failover. You can use the Undo Failover scenario if you have switched to the replicated VM for testing and troubleshooting purposes and want to return to the normal operation mode of the original VM.

Undo failover procedure

  1. Veeam Backup & Replication reverts the replica VM to its state before the failover. To do this, it powers off the replica VM and reverts it to the latest snapshot in the snapshot chain. Changes written to the snapshot delta file while the VM was in the Failover state are discarded.
  2. The state of the replica VM is set back to Normal, and Veeam Backup & Replication resumes replication of the original VM from the source host.
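Why undo failover loses all changes made in the Failover state becomes clear if the replica is modelled as a snapshot plus a delta of writes. This Python sketch is a simplified, hypothetical model: `Replica` and its methods are illustrative, disk blocks are dict entries, and reads see the delta first and fall back to the snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Simplified replica: snapshot blocks plus a delta of writes made in Failover state."""
    snapshot: dict[int, bytes]
    delta: dict[int, bytes] = field(default_factory=dict)
    state: str = "Normal"
    powered_on: bool = False

    def read(self, block: int) -> bytes:
        # Writes made during failover shadow the snapshot content.
        return self.delta.get(block, self.snapshot[block])

    def write(self, block: int, data: bytes) -> None:
        self.delta[block] = data

    def failover(self) -> None:
        self.state, self.powered_on = "Failover", True

    def undo_failover(self) -> None:
        # Step 1: power off and revert to the snapshot; the delta is discarded.
        self.powered_on = False
        self.delta.clear()
        # Step 2: back to Normal, so replication of the original VM can resume.
        self.state = "Normal"
```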

Commit failback

Commit failback scheme

To complete the recovery of the original VM, you must commit the failback operation. Committing confirms the switch back to the former production VM. Veeam Backup & Replication then returns to normal operation mode and continues replication as configured.

A commit failback is performed as follows:

  1. Veeam Backup & Replication changes the status of the replica from Failback to Normal.
  2. Further steps depend on the location where the VM failback was performed:
    1. If you failed back to a VM in a new location, Veeam Backup & Replication also adjusts the replication job and adds the former production VM to the list of exclusions. The VM restored to the new location takes over the role of the production VM and is added to the replication job in place of the excluded VM. When the replication job runs, Veeam Backup & Replication processes the newly restored VM instead of the former production VM.
    2. If you failed back to the original location, the replication job is not adjusted. When the job runs, Veeam Backup & Replication processes the original VM as before.
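The job adjustment in step 2 depends only on whether the failback went to a new location. This Python sketch models it with a hypothetical `ReplicationJob` and `commit_failback` helper; passing `restored_vm=None` stands for a failback to the original location, and the VM names in the test are made-up examples.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReplicationJob:
    vms: list[str]
    exclusions: set[str] = field(default_factory=set)

    def vms_to_process(self) -> list[str]:
        """VMs the job would replicate on its next run."""
        return [vm for vm in self.vms if vm not in self.exclusions]


def commit_failback(job: ReplicationJob, original_vm: str,
                    restored_vm: Optional[str] = None) -> str:
    """Return the new replica state; adjust the job only for a failback to a new location."""
    new_state = "Normal"  # step 1: Failback -> Normal
    if restored_vm is not None:
        job.exclusions.add(original_vm)  # the former production VM is no longer replicated
        job.vms.append(restored_vm)      # the restored VM takes its place in the job
    return new_state
```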

The Failback Commit process does not delete the Failback Snapshot, which stores the state of a VM replica before the failback. Veeam Backup & Replication uses this snapshot as an additional restore point for replication. With the pre-failback snapshot, Veeam Backup & Replication needs to transfer fewer changes and therefore puts less load on the network when replication resumes.

Undo failback

Undo failback functionality

If the former production VM does not work as expected after the failback operation, you can undo the operation and return to the replica VM.

An Undo Failback is performed as follows:

  1. Veeam Backup & Replication deletes the failback protection snapshot created for the replicated VM.
  2. Veeam Backup & Replication starts the replica VM and changes the state of the replica from failback to failover.
