Many enterprises are attracted to hybrid cloud architectures, but they don't realize that the process of hybridization introduces a critical problem with data placement and movement.
Hybrid clouds are designed to have at least some workload elements extend across private data centers and public clouds. Enterprises need to learn where to store hybrid data, how to access it and how to ensure that their strategy doesn't break the bank -- or the application.
Data movement and placement strategies
Wherever you house data for hybrid cloud apps, it is likely to cross cloud boundaries, which can trigger cloud and network charges. Data movement between the cloud and the data center may be priced by data volume and can add thousands of dollars per month to your cloud bill.
A typical hybrid application will use the cloud as a front-end transaction source and the data center for actual transaction processing. There are then two basic approaches to hybrid data hosting:
- Host the data in the cloud and provide access to that data from data center components of the application.
- Host the data in the data center and provide the cloud components with access.
It may also be possible to use a mixture of the approaches.
Most public clouds will charge for data storage and also for egress traffic. Ingress traffic is normally free, so you want the cloud to receive more data from the data center than it sends to on-premises systems. If an application supplies more data to the user than it receives, it's best to host that data in the data center.
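The asymmetry between ingress and egress pricing can be sketched with a few lines of arithmetic. The example below is only an illustration: the `TransferPricing` class, the per-GB rates and the traffic volumes are all assumptions, not any provider's actual prices, and storage charges are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferPricing:
    """Hypothetical per-GB prices; substitute your provider's published rates."""
    egress_per_gb: float = 0.09   # assumed charge for data leaving the cloud
    ingress_per_gb: float = 0.0   # ingress is normally free


def monthly_transfer_cost(gb_in: float, gb_out: float, pricing: TransferPricing) -> float:
    """Return the monthly charge for traffic crossing the cloud boundary."""
    return gb_in * pricing.ingress_per_gb + gb_out * pricing.egress_per_gb


pricing = TransferPricing()

# Placement A: data lives in the cloud, so the data center pulls results out.
cloud_hosted = monthly_transfer_cost(gb_in=500, gb_out=20_000, pricing=pricing)

# Placement B: data lives in the data center, so the cloud mostly receives data.
data_center_hosted = monthly_transfer_cost(gb_in=20_000, gb_out=500, pricing=pricing)

print(f"cloud-hosted: ${cloud_hosted:,.2f}/month")
print(f"data-center-hosted: ${data_center_hosted:,.2f}/month")
```

With the same total traffic, the placement that keeps the heavy flow pointed into the cloud is the cheaper one under these assumed rates.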
One general rule in hybrid data modeling is that you don't run disk-level I/O operations across a cloud boundary, because it will generate a lot of chargeable traffic. So, when considering a cloud database, don't presume you're modeling disk-level reads and writes.
Instead, you're sending queries to, and receiving results from, an architected database management system. This eliminates the need to drag every database record across a cloud interface to determine whether it fits particular criteria.
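To make the difference concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for a database on the far side of a cloud boundary. The `accounts` table, its columns and the row counts are invented for illustration; the point is only how many records each approach moves.

```python
import sqlite3

# In-memory SQLite stands in for a remote database management system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, region TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO accounts (region, balance) VALUES (?, ?)",
    [("east" if i % 4 == 0 else "west", float(i)) for i in range(1000)],
)

# Record-level access: every row crosses the boundary, then is filtered locally.
all_rows = conn.execute(
    "SELECT id, region, balance FROM accounts ORDER BY id"
).fetchall()
local_matches = [r for r in all_rows if r[1] == "east" and r[2] > 900]

# Query-result access: the database applies the criteria and returns only matches.
remote_matches = conn.execute(
    "SELECT id, region, balance FROM accounts "
    "WHERE region = ? AND balance > ? ORDER BY id",
    ("east", 900),
).fetchall()

print(f"rows moved by record-level access: {len(all_rows)}")
print(f"rows moved by query-result access: {len(remote_matches)}")
```

Both approaches produce the same answer, but the second moves only the matching rows across the boundary.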
The specific character of this query-result exchange is the first issue you need to address in your hybrid data planning. How your cloud provider prices data and data movement is also a factor.
Database location considerations
The problem with hosting hybrid data in the data center is that no matter how you scale cloud front-end components, you're relying on a link to a single data center's database management system for transaction validation and handling. It also constrains you from backing up your data center components in the cloud, since the data access gateway would also be in the data center and would presumably be compromised if those components fail.
In the vast majority of cases, enterprises run analytics, regulatory reporting and business planning programs in their data centers. It's not feasible for those businesses to move the database to the cloud because it would raise egress traffic charges significantly. However, if hybrid cloud apps use databases that are modestly sized, they can be run in parallel in both environments. Just make sure those databases are updated synchronously.
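One simple way to keep two modestly sized copies in step is to apply each write to both and commit only if both succeed. The sketch below assumes two SQLite connections standing in for the on-premises and cloud databases, and a hypothetical `accounts` table; it is not a true distributed transaction (there is no two-phase commit), so a failure between the two commits could still leave the copies out of step.

```python
import sqlite3

SCHEMA = "CREATE TABLE accounts (account_number TEXT PRIMARY KEY, balance REAL)"
UPSERT = "INSERT OR REPLACE INTO accounts (account_number, balance) VALUES (?, ?)"


def write_both(on_prem: sqlite3.Connection, cloud: sqlite3.Connection,
               account_number: str, balance: float) -> None:
    """Apply the same write to both copies; roll both back if either fails."""
    try:
        on_prem.execute(UPSERT, (account_number, balance))
        cloud.execute(UPSERT, (account_number, balance))
    except sqlite3.Error:
        on_prem.rollback()
        cloud.rollback()
        raise
    # Commit only after both writes were accepted.
    on_prem.commit()
    cloud.commit()


# Two in-memory databases stand in for the two environments.
on_prem_db = sqlite3.connect(":memory:")
cloud_db = sqlite3.connect(":memory:")
on_prem_db.execute(SCHEMA)
cloud_db.execute(SCHEMA)

write_both(on_prem_db, cloud_db, "ACCT-1001", 250.0)
```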
Sometimes, the cloud front end of the applications may only need some validating database lookups, such as an account number. If this is the case, consider maintaining a summary database in the cloud that contains only crucial database elements. Summary databases may also be helpful if enterprises use a cloud analytics application on data normally stored on premises. Often, historical analytics doesn't require either the latest updates or the full database, and anything that reduces the number of records or the size of the data per record will reduce storage costs.
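A summary database for validation lookups might be built along these lines. This is a sketch under stated assumptions: the `accounts` and `account_summary` tables, their columns, and the helper names `build_summary` and `is_valid_account` are all hypothetical, and SQLite stands in for both the full on-premises database and the cloud copy.

```python
import sqlite3


def build_summary(full_db: sqlite3.Connection, summary_db: sqlite3.Connection) -> int:
    """Copy only the fields the cloud front end needs; return the row count."""
    summary_db.execute(
        "CREATE TABLE IF NOT EXISTS account_summary "
        "(account_number TEXT PRIMARY KEY, active INTEGER NOT NULL)"
    )
    rows = full_db.execute("SELECT account_number, active FROM accounts").fetchall()
    summary_db.executemany(
        "INSERT OR REPLACE INTO account_summary VALUES (?, ?)", rows
    )
    summary_db.commit()
    return len(rows)


def is_valid_account(summary_db: sqlite3.Connection, account_number: str) -> bool:
    """Validate an account number against the summary copy only."""
    row = summary_db.execute(
        "SELECT active FROM account_summary WHERE account_number = ?",
        (account_number,),
    ).fetchone()
    return row is not None and row[0] == 1


# The full database carries fields the front end never needs for validation.
full = sqlite3.connect(":memory:")
full.execute(
    "CREATE TABLE accounts (account_number TEXT PRIMARY KEY, active INTEGER, "
    "holder_name TEXT, address TEXT, history TEXT)"
)
full.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?, ?, ?)",
    [("A-1", 1, "Example Holder", "1 Example St", "long history"),
     ("A-2", 0, "Closed Holder", "2 Example St", "long history")],
)

summary = sqlite3.connect(":memory:")
copied = build_summary(full, summary)
```

The front end then calls `is_valid_account(summary, "A-1")` without touching the on-premises database at all.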
You need to think about all your company's data uses when you plan hybrid cloud database hosting. Many businesses forget that regular report generation and ad hoc analytics often generate more database usage than transaction processing.
The request-result ratio is different for report and analytics queries than it is for transactional access. And because of the egress fees, users are somewhat penalized for activities that pull lots of data out of the cloud versus those that generate volume into the cloud. So while it's free to push transactional data into the cloud, costs can add up if users pull data on premises from those small ad hoc queries that generate big results.
It's particularly important to ensure that report and analytics applications that access cloud databases run in the cloud to avoid egress charges.
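The effect of the request-result ratio can be sketched as follows. The function name `billable_egress_bytes`, the byte counts and the simplifying assumption that only traffic leaving the cloud is charged are all illustrative; real pricing has tiers and exceptions this ignores.

```python
def billable_egress_bytes(request_bytes: int, result_bytes: int,
                          db_in_cloud: bool, caller_in_cloud: bool) -> int:
    """Bytes that leave the cloud for one query, assuming only egress is charged."""
    if db_in_cloud and not caller_in_cloud:
        return result_bytes    # the result set leaves the cloud
    if caller_in_cloud and not db_in_cloud:
        return request_bytes   # only the query itself leaves the cloud
    return 0                   # both sides are in the same environment


# A small ad hoc query (200 bytes) that returns a large result (50 MB).
request, result = 200, 50_000_000
ratio = result / request

# Same query against a cloud database, run from two different places.
on_prem_report = billable_egress_bytes(request, result,
                                       db_in_cloud=True, caller_in_cloud=False)
in_cloud_report = billable_egress_bytes(request, result,
                                        db_in_cloud=True, caller_in_cloud=True)

print(f"result/request ratio: {ratio:,.0f}")
print(f"egress when report runs on premises: {on_prem_report:,} bytes")
print(f"egress when report runs in the cloud: {in_cloud_report:,} bytes")
```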
The more traffic you move between environments, the more load you place on the connection between your VPN and the cloud. That capacity will cost you too, and in some cases the cost for specialized connectivity to the cloud will rival that of egress traffic. Look at all your costs when you plan your data hosting and plot its flows.
Unless cloud providers and network operators offer to move your data traffic at zero cost, your cloud application business case may rest on the way you handle data.