In an ideal world, if part of your hybrid cloud platform goes down, processing only slows and then recovers automatically as other segments of the platform take up the load (or, in the case of a public cloud, as the workload migrates to other availability zones). In reality, it's difficult to achieve data consistency with hybrid cloud backup and disaster recovery.
WAN data transfers between cloud platforms can take a long time, especially bulk transfers. For example, a storage system with two local replicas can complete a write operation in a few milliseconds, while a system that must also write a third replica across the WAN can take over 10 seconds.
One common way to address this issue is to make the remote replica eventually consistent. But this leaves a window, ranging from minutes to hours, in which the data is not in sync. Hybrid cloud backup and disaster recovery scenarios rely on remote replicas. The cloud provider that hosts the still-operational segments of the hybrid cloud must ensure data consistency there, but for the on-premises private cloud segment, that responsibility could fall to you.
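To make that window concrete, here is a minimal Python sketch, assuming an in-memory key-value store: a write is acknowledged once both local replicas have it, and the remote replica is brought up to date later from a queue. The class and replica names are hypothetical, and the queue simply stands in for the WAN link.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


class AsyncRemoteStore:
    """Hypothetical store: two synchronous local replicas and one
    eventually consistent remote replica."""

    def __init__(self):
        self.local = [Replica("local-a"), Replica("local-b")]
        self.remote = Replica("remote")
        self.pending = deque()  # writes not yet applied to the remote replica

    def write(self, key, value):
        for replica in self.local:
            replica.data[key] = value      # synchronous: the ack waits for these
        self.pending.append((key, value))  # asynchronous: shipped over the WAN later
        return "ack"

    def replicate(self, max_items=None):
        """Apply up to max_items queued writes (all if None) to the remote replica."""
        applied = 0
        while self.pending and (max_items is None or applied < max_items):
            key, value = self.pending.popleft()
            self.remote.data[key] = value
            applied += 1
        return applied

    def in_sync(self):
        """True once the remote replica has caught up with the local ones."""
        return not self.pending and self.remote.data == self.local[0].data


store = AsyncRemoteStore()
store.write("order-1", "paid")
store.write("order-2", "new")
print(store.in_sync())       # False: the inconsistency window is open
store.replicate(max_items=1)
print(store.remote.data)     # {'order-1': 'paid'}
store.replicate()
print(store.in_sync())       # True
```

Every acknowledged write sits in `pending` until `replicate` runs; if the primary site failed during that interval, those writes would exist only locally.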
Best practices to ensure data consistency
Data management gets more complicated with hybrid cloud backup and disaster recovery because users can store data on either a public or a private cloud. One common issue, for example, is how to keep backup and archive copies of data sets from diverging from the primary data.
To do this, you need to know which data sets have changed and what the new data is. A write journal can serve this purpose, but it is vulnerable to an outage: if the site holding it goes down, the journal likely won't have been transmitted to another part of the hybrid cloud platform. However, a solid design and frequent transmission can reduce the recovery point objective (RPO), the maximum window of recent data, measured in time, that you can afford to lose.
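The trade-off between a journal and how often it is shipped can be sketched in a few lines of Python. This is a minimal illustration, assuming an in-process journal, a caller-supplied `send` function standing in for the WAN transfer, and an injectable clock so the example is repeatable; `WriteJournal` and its methods are hypothetical names, not a specific product's API.

```python
import time
from dataclasses import dataclass


@dataclass
class JournalEntry:
    seq: int
    timestamp: float
    key: str
    value: str


class WriteJournal:
    """Append-only change journal that is shipped in batches to another site."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.entries = []
        self.shipped_upto = 0  # highest seq the remote site has received

    def record(self, key, value):
        entry = JournalEntry(len(self.entries) + 1, self.clock(), key, value)
        self.entries.append(entry)
        return entry.seq

    def unshipped(self):
        return [e for e in self.entries if e.seq > self.shipped_upto]

    def ship(self, send):
        """Pass all unshipped entries to `send`, a stand-in for the WAN transfer.
        The cursor only advances if `send` returns without raising."""
        batch = self.unshipped()
        if batch:
            send(batch)
            self.shipped_upto = batch[-1].seq
        return len(batch)

    def exposure_seconds(self):
        """Age of the oldest unshipped entry: how much recent history would be
        lost if this site failed now. Shipping more often keeps it small."""
        pending = self.unshipped()
        return self.clock() - pending[0].timestamp if pending else 0.0


now = [100.0]  # fake clock for a repeatable example
journal = WriteJournal(clock=lambda: now[0])
journal.record("customers", "row 17 updated")
now[0] = 105.0
journal.record("orders", "row 3 inserted")
now[0] = 110.0
print(journal.exposure_seconds())  # 10.0
remote = []
journal.ship(remote.extend)
print(journal.exposure_seconds())  # 0.0
```

In this model, the achievable RPO is bounded by the shipping interval plus the transfer time, which is why frequent transmission matters.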
Hosting a journal server in a colocation facility across town, for instance, can provide RPOs measured in seconds. The colocation provider likely uses uninterruptible power supply systems and maintains WAN links, so the data is well protected. There are different ways to create this journal; one is to have the colocation server forward updates to the public cloud. To protect against hackers and ransomware, also consider software with a continuous backup option, which complements the colocation journal approach.
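The colocation relay can be modeled as two hops: the on-premises site hands each update to the colocation journal before acknowledging it, and the colocation server forwards its backlog to the public cloud on its own schedule. The sketch below is a hypothetical in-memory model, assuming a plain dictionary as the public cloud copy; `ColoJournalServer` and `OnPremWriter` are illustrative names only.

```python
from collections import deque


class ColoJournalServer:
    """Hypothetical journal relay in a nearby colocation facility."""

    def __init__(self):
        self.log = []          # copy of every update received
        self.outbox = deque()  # updates not yet forwarded to the public cloud

    def append(self, seq, key, value):
        self.log.append((seq, key, value))
        self.outbox.append((seq, key, value))

    def forward(self, cloud_store, max_items=None):
        """Forward queued updates (all if max_items is None) to the cloud copy, in order."""
        sent = 0
        while self.outbox and (max_items is None or sent < max_items):
            _seq, key, value = self.outbox.popleft()
            cloud_store[key] = value
            sent += 1
        return sent


class OnPremWriter:
    """On-premises site that treats the colo journal as a synchronous hop."""

    def __init__(self, colo):
        self.colo = colo
        self.data = {}
        self.seq = 0

    def write(self, key, value):
        self.seq += 1
        self.data[key] = value
        # The write counts as acknowledged only after the colo server holds it.
        self.colo.append(self.seq, key, value)
        return self.seq


colo = ColoJournalServer()
site = OnPremWriter(colo)
site.write("invoice-9", "issued")
site.write("invoice-10", "issued")
cloud = {}
colo.forward(cloud, max_items=1)  # only part of the backlog has reached the cloud
site = None                       # simulate losing the on-premises site
colo.forward(cloud)               # the colo server still holds the rest
print(cloud)  # {'invoice-9': 'issued', 'invoice-10': 'issued'}
```

Because the colocation copy survives the loss of the primary site, the slower forwarding hop to the public cloud no longer determines how much acknowledged data is lost.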
When you host data on a public cloud, the problem goes deeper. Make sure the cloud service provider takes data protection measures similar to yours. Cloud providers are not very open about their internal infrastructure, but it's crucial to know how they keep replicated data consistent. For example, the way a provider syncs data between its own availability zones can differ considerably from the way that data is synced and accessed across a hybrid environment.
The cloud service provider likely has highly available journaling servers. In hybrid environments, however, you might need a software tool to forward journals to other clouds or to private segments. This problem will likely disappear as hybrid clouds gain momentum and object stores become able to bridge multiple segments. In the meantime, you may be stuck keeping an extra replica of volatile files in a different public cloud zone.
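A journal-forwarding tool of the kind described above typically has to track, for each destination, how far that destination has caught up. The following is a minimal sketch, assuming one in-memory journal and dictionaries standing in for the other clouds or private segments; the target names and the `JournalFanOut` class are hypothetical.

```python
class JournalFanOut:
    """Forward one journal to several targets, each with its own cursor, so a
    slow or unreachable segment does not hold back the others."""

    def __init__(self, targets):
        self.journal = []                            # list of (seq, key, value)
        self.targets = targets                       # name -> dict acting as a replica
        self.cursor = {name: 0 for name in targets}  # last seq applied per target

    def append(self, key, value):
        seq = len(self.journal) + 1
        self.journal.append((seq, key, value))
        return seq

    def forward(self, name):
        """Apply every entry the named target has not yet seen, in order."""
        store = self.targets[name]
        applied = 0
        # seq == index + 1, so slicing at the cursor yields entries with seq > cursor.
        for seq, key, value in self.journal[self.cursor[name]:]:
            store[key] = value
            self.cursor[name] = seq
            applied += 1
        return applied

    def lag(self, name):
        """Number of journal entries the named target is behind."""
        return len(self.journal) - self.cursor[name]


fan = JournalFanOut({"public-zone-b": {}, "private": {}})
fan.append("config", "v2")
fan.append("config", "v3")
fan.forward("public-zone-b")
print(fan.lag("public-zone-b"), fan.lag("private"))  # 0 2
fan.forward("private")
print(fan.targets["private"])  # {'config': 'v3'}
```

Per-target cursors mean an outage in one segment only delays that segment; when it returns, `forward` resumes from where it stopped.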
Take preventative measures
Lock mechanisms stop multiple users from updating the same data in different cloud segments at the same time, which would otherwise produce an indeterminate result. To create a single, consistent data set, take a point-in-time snapshot of the relevant storage in all cloud segments while the locks are held.
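The lock-then-snapshot sequence can be sketched as follows. This is a single-process illustration, assuming each segment's storage is a dictionary guarded by a `threading.Lock`; a real hybrid deployment would need a distributed lock and the storage platform's own snapshot feature, so `Segment` and `consistent_snapshot` are hypothetical stand-ins.

```python
import copy
import threading


class Segment:
    """Hypothetical cloud segment: a named data set guarded by a lock."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.lock = threading.Lock()

    def write(self, key, value):
        with self.lock:
            self.data[key] = value


def consistent_snapshot(segments):
    """Take a point-in-time copy of every segment while writes are blocked.

    Locks are acquired in a fixed (name-sorted) order so two concurrent
    callers cannot deadlock by grabbing them in opposite orders.
    """
    ordered = sorted(segments, key=lambda s: s.name)
    acquired = []
    try:
        for seg in ordered:
            seg.lock.acquire()
            acquired.append(seg)
        return {seg.name: copy.deepcopy(seg.data) for seg in ordered}
    finally:
        for seg in reversed(acquired):
            seg.lock.release()


segments = [Segment("public", {"x": 1}), Segment("private", {"y": 2})]
snap = consistent_snapshot(segments)
segments[0].write("x", 99)  # later writes do not alter the snapshot
print(snap)  # {'private': {'y': 2}, 'public': {'x': 1}}
```

Holding every lock at once is what makes the copies mutually consistent: no segment can change between the moment the first and the last copy is taken.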
Normally, snapshots record all changes in the sequence in which they occur, but you can also recover a snapshot from the main branch and apply changes to it that don't go directly into the main branch. Such a temporary version can serve for development work or as a backup target. After you apply a point-in-time snapshot, spawn a fully recovered version and apply the journal changes to it. You can use the locks to check consistency, though it's not required.
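Spawning a recovered version amounts to copying the snapshot and replaying, in sequence order, only the journal entries written after it was taken. The sketch below assumes the snapshot is a dictionary tagged with the journal sequence number current when it was taken, and that the journal is a list of `(seq, key, value)` tuples; `recover` is a hypothetical helper, not a particular product's API.

```python
def recover(snapshot, snapshot_seq, journal):
    """Spawn a recovered copy: start from a point-in-time snapshot, then
    replay journal entries newer than the snapshot, in sequence order.

    The snapshot itself is left untouched, so it can still serve as the
    branch point for other temporary versions (development, backup).
    """
    recovered = dict(snapshot)
    for seq, key, value in sorted(journal, key=lambda entry: entry[0]):
        if seq > snapshot_seq:  # older entries are already in the snapshot
            recovered[key] = value
    return recovered


snapshot = {"a": "1", "b": "1"}  # taken when the journal was at seq 1
journal = [(3, "b", "3"), (1, "a", "1"), (2, "a", "2"), (4, "c", "4")]
print(recover(snapshot, 1, journal))  # {'a': '2', 'b': '3', 'c': '4'}
print(snapshot)                       # {'a': '1', 'b': '1'}
```

Sorting by sequence number matters because journal entries may arrive out of order across segments; replaying them in order ensures the last write to each key wins.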
The recovered snapshot can also be used for the backup. You can create it in any of the cloud segments, but remember that cloud VMs differ considerably in compute performance and storage throughput, both between instance types and between cloud service providers. Without some care, the backup job will run slowly or, if the choice of segment changes, may overrun its backup window.
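A quick back-of-the-envelope check shows why the choice of segment matters: backup duration is roughly data size divided by sustained throughput. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and a constant throughput; the 20 TB data set, throughput figures and 8-hour window are illustrative numbers, not measurements from any provider.

```python
def backup_hours(dataset_tb, throughput_mb_s):
    """Hours to copy `dataset_tb` terabytes at a sustained `throughput_mb_s`
    megabytes per second (decimal units: 1 TB = 1,000,000 MB)."""
    return dataset_tb * 1_000_000 / throughput_mb_s / 3600


def fits_window(dataset_tb, throughput_mb_s, window_hours):
    """True if the backup finishes within the allotted window."""
    return backup_hours(dataset_tb, throughput_mb_s) <= window_hours


# Same 20 TB recovered snapshot, backed up from two segments with different storage speeds.
print(round(backup_hours(20, 500), 2))  # 11.11
print(fits_window(20, 500, 8))          # False
print(fits_window(20, 1000, 8))         # True
```

Halving throughput doubles the run time, so a backup that fits comfortably in one segment can overrun the window if the job moves to a slower one.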
With these approaches, it should be possible to keep RPO very low for hybrid cloud backup and disaster recovery. Automatic recovery isn't available yet, but it is likely on the horizon for every major cloud service provider.