In cloud bursting, IT teams size their private cloud deployments to support average workloads, and then use the public cloud to handle load spikes. There are several roadblocks, however, to building an efficient cloud bursting architecture, and one of the biggest is the wide area network.
The deployment of high-speed links in wide area networks lags far behind that in local area networks. As a result, file transfers between private and public clouds are often slow, which severely limits how quickly organizations can perform cloud bursting.
During the cloud bursting process, data must move to the public cloud before processing can start. Setting up a new VM takes only a few seconds, and Docker containers can shorten that time even more. Compute capacity is therefore available almost immediately, which makes data movement the bottleneck: if data transfers between a private and public cloud are slow, a cloud bursting architecture loses much of its value.
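To make the bottleneck concrete, here is a rough back-of-the-envelope sketch. The function name, the data size, the link speeds and the 80% link efficiency are all illustrative assumptions, not measurements from any real deployment.

```python
def transfer_seconds(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the time to move size_gb (decimal gigabytes) over a WAN link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed actually achieved.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, efficiency in (0, 1]")
    bits_to_move = size_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits_to_move / usable_bits_per_second


# Hypothetical 500 GB working set:
slow = transfer_seconds(500, 100)      # 100 Mbps WAN link: about 50,000 s (roughly 13.9 hours)
fast = transfer_seconds(500, 10_000)   # 10 Gbps link: about 500 s
print(f"{slow / 3600:.1f} hours vs. {fast / 60:.1f} minutes")
```

Under these assumed numbers, a VM that boots in seconds would sit idle for hours waiting for its data, which is the gap that pre-positioned data is meant to close.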
In most enterprise data centers, the majority of data is stable and unchanging. However, users need to access some of that data frequently. Pricing data, for example, may change monthly, but be accessed every second. In addition, different types of data receive different updates. Database synchronization is usually done on a record-by-record basis, but webpages, for example, are updated as a folder entity, where all the files change at once.
Data deduplication in storage appliances helps overcome these challenges. This service eliminates all but one copy of a data object, replacing the other copies with pointers to that single copy. IT teams can then use replication or erasure coding to ensure the integrity of that single copy. This saves space and also makes updates easier, since only one file and a list of pointers receive the change.
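The pointer idea can be illustrated with a minimal in-memory sketch. The `DedupStore` class and its method names are hypothetical, and identifying identical objects by SHA-256 hash is one common approach assumed here for illustration; it is not a description of how any particular storage appliance works.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store: one stored copy per distinct content."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}    # content hash -> the single stored copy
        self._pointers: dict[str, str] = {}   # object name -> content hash

    def put(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self._blobs.setdefault(digest, data)
        # Every name is just a pointer to the shared copy.
        self._pointers[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        return self._blobs[self._pointers[name]]

    def stored_copies(self) -> int:
        return len(self._blobs)


store = DedupStore()
price_list = b"widget,9.99\ngadget,19.99\n"
store.put("sales/prices.csv", price_list)
store.put("web/prices.csv", price_list)   # same content, so only a pointer is added
print(store.stored_copies())              # 1
```

The sketch leaves out replication, erasure coding and cleanup of copies that no longer have pointers, all of which a real system would need.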
Data duplication for a cloud bursting architecture
Deduplication is an excellent practice, but cloud bursting requires planned data duplication. The concept is simple: IT teams pre-position copies of the data that is needed in the public cloud, as well as in the private cloud.
If data never changed, this would be easy to do; you could just copy all the files and pay the monthly storage fees. But real-world data changes, and file duplication requires data synchronization before cloud bursting can start. Some data must remain in lockstep between the two cloud environments.
IT teams should categorize data based on the level of synchronicity it requires. Tightly bound data incurs higher transaction latency because each update makes a round trip between clouds. If data synchronization is looser -- and only syncs about once a month -- usage in both clouds is easier to manage. Try to find ways to move data from a tight to a loose synchronization level to increase performance.
To move to a looser synchronization model for databases, keep a single, small list that flags each changed record. Send this list from the private cloud to the public cloud every minute, so the public cloud database knows which records to look up for the latest data.
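One possible reading of this changed-record list is sketched below. The class names, the in-process "fetch" from the private database and the explicit sync call are all assumptions made for illustration; a real deployment would send the list over the WAN on a timer and would also need to handle deletions and failures.

```python
class PrivateDatabase:
    """Authoritative copy in the private cloud; tracks which keys changed."""

    def __init__(self) -> None:
        self.records: dict[str, object] = {}
        self._changed: set[str] = set()

    def write(self, key: str, value: object) -> None:
        self.records[key] = value
        self._changed.add(key)          # flag the record in the small list

    def drain_changed_keys(self) -> set[str]:
        """Return the flagged keys and start a fresh list."""
        changed, self._changed = self._changed, set()
        return changed


class PublicReplica:
    """Pre-positioned copy in the public cloud."""

    def __init__(self, snapshot: dict[str, object], origin: PrivateDatabase) -> None:
        self._records = dict(snapshot)
        self._stale: set[str] = set()
        self._origin = origin

    def apply_change_list(self, keys: set[str]) -> None:
        self._stale |= keys             # only keys cross the WAN, not full records

    def read(self, key: str) -> object:
        if key in self._stale:
            # One cross-cloud lookup, only for records known to have changed.
            self._records[key] = self._origin.records[key]
            self._stale.discard(key)
        return self._records[key]


private = PrivateDatabase()
private.write("sku-1", 9.99)
replica = PublicReplica(private.records, private)
private.drain_changed_keys()            # the snapshot already includes sku-1

private.write("sku-1", 10.49)
replica.apply_change_list(private.drain_changed_keys())   # the once-a-minute sync
print(replica.read("sku-1"))            # 10.49
```

The trade-off in this sketch is that a read in the public cloud can be stale for up to one sync interval, in exchange for avoiding a round trip on every transaction.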
Another option is to shard the data structure. This works well with databases, but can be applied to any data set. By preselecting the range of data that will burst to the public cloud -- for example, client names starting with R through Z -- you can minimize data synchronization when starting to burst. Keep a snapshot of recent data in the public cloud, so the sync difference remains small.
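As a minimal sketch of the preselected range, the routing rule for the R-through-Z example might look like the following. The function names and the choice to key on the first letter of the client name are assumptions for illustration only.

```python
def bursts_to_public(client_name: str, first: str = "R", last: str = "Z") -> bool:
    """True if the client's initial falls in the range preselected for bursting."""
    initial = client_name.strip()[:1].upper()
    return bool(initial) and first <= initial <= last


def route(client_name: str) -> str:
    """Pick the cloud that owns this client's shard during a burst."""
    return "public" if bursts_to_public(client_name) else "private"


for name in ["Adams", "Ramirez", "young", "Quinn"]:
    print(name, "->", route(name))
# Adams -> private
# Ramirez -> public
# young -> public
# Quinn -> private
```

Because the range is fixed in advance, only the records for that shard need a recent snapshot in the public cloud, which keeps the synchronization difference small when a burst begins.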
Images and applications are slow-changing, so copies should be in both clouds. For hypervisors, this can lead to a lot of traffic at the start of the cloud bursting process, and it is one reason containers share a common image and applications.
There aren't many automated tools for planned data duplication in a cloud bursting architecture, but that's going to change within the next year. In the meantime, solid data management will accelerate hybrid cloud operations.