This content is part of the Essential Guide: What developers need to know about cloud app integration

Stampede to cloud presents data integration problems

Alan Earls explains why the rapid adoption of the cloud is creating data integration problems, and what to do about it.

The rapid adoption of the cloud by many organizations, for a growing array of use cases, is perhaps the biggest change in IT in a generation. Not surprisingly, that transition comes with plenty of challenges.

Among the largest: data integration problems. Determining how to allow multiple application programs to share data in cloud-based networks, either directly or through third-party software, has become increasingly burdensome.

For instance, James Elwood, CTO of Geezeo Inc., a Tolland, Conn.-based software services provider for banks and credit unions, ran into difficulties when his company started using cloud networking with Amazon Web Services in 2009. "We needed to build a quasi-tunnel or switching tunnel architecture between our applications and customer data centers," he said.

The wild card was that Geezeo was purely in Amazon Elastic Compute Cloud (EC2). "Without the infrastructure, being able to set up multiple [Internet Protocol Security (IPsec)] tunnels to various data centers and the ability to keep them separate cleanly is a bit of a technical challenge, as well as an ongoing management challenge," Elwood said.

Geezeo looked into the costs of buying a traditional data center, installing physical Cisco networking devices and creating a way to bridge its infrastructure cleanly over to Amazon EC2. Then the company spent two months cobbling together solutions that involved 15 hours of employee time each week for monitoring and maintenance -- which company officials considered too costly.

Geezeo eventually found a non-traditional data center product: VNS3 from CohesiveFT. VNS3 directs and manages traffic with the insight and control Geezeo needed, Elwood said, noting that whenever IPsec tunnels go down, the company isn't gathering transactional data. VNS3 Manager allowed the core team to "manage all of their connections from a single control point and offer reliable, manageable and secure tunnels," he added.
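To make the "single control point" idea concrete, here is a minimal Python sketch of a check that flags tunnels that are down or have gone quiet. It is not VNS3's API or Geezeo's tooling: the `Tunnel` record, the tunnel names and the 300-second silence threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tunnel:
    name: str                 # hypothetical label for a customer data center link
    is_up: bool               # result of the most recent health probe
    seconds_since_data: int   # time since transactional data last arrived


def tunnels_needing_attention(tunnels, max_silence=300):
    """Return names of tunnels that are down or silent for too long.

    max_silence is an assumed threshold in seconds, not a vendor default.
    """
    return [
        t.name
        for t in tunnels
        if not t.is_up or t.seconds_since_data > max_silence
    ]


# Illustrative status snapshot gathered in one place.
tunnels = [
    Tunnel("bank-a", True, 12),
    Tunnel("credit-union-b", False, 900),
    Tunnel("bank-c", True, 1200),
]
alerts = tunnels_needing_attention(tunnels)
```

A tunnel that reports as up but has delivered no data for a long stretch is flagged too, since in Elwood's account a silent tunnel has the same effect as a broken one: no transactional data is gathered.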

Challenges and opportunities

In the past, integration architecture approaches such as the enterprise service bus and message brokering services managed to pull together disparate systems for various integration scenarios, said Jiten Patil, principal cloud expert and technology consultant for Persistent Systems Ltd., a global technology services company. However, in the evolving cloud era, cloud integration services are responsible for making devices, cloud applications, on-premises systems, hybrid and social solutions work seamlessly to deliver heterogeneous business use cases. "Cloud integration is replacing on-premise integration middleware and moving integration scenarios to cloud-based multi-tenant and self-service [Software as a Service (SaaS)] applications," he said. Furthermore, he noted, by pre-provisioning integration services such as connectors to various endpoints, it is helping drive the emergence of integration Platform as a Service (iPaaS).

Joe McKendrick, analyst at McKendrick & Associates, points out that the idea of having many applications sharing one set of data has always been the holy grail of IT. It was the idea behind relational databases when they first came to market in the early 1980s, namely, to have one repository of data which many applications can access. "Relational databases were partially successful in accomplishing this ... because there are two issues that keep getting in the way: organizational silos and data variety," said McKendrick. These challenges will also vex cloud-based data integration unless the organization makes a strong commitment to an enterprise data architecture.

Solving data integration problems requires that the business be put in charge of the data, McKendrick said. "The first step in any cloud process is going to be deciding which data sets will be going to the cloud, and which will remain within their current locations," he noted. In addition, it's important -- probably doubly important in the cloud -- to assure the data's trustworthiness. "Trust is the foundation of data management and analytics. Everyone needs to have complete trust in the information being captured and presented," he said.

Eight tips for successful cloud integration

Jiten Patil of Persistent Systems offers the following suggestions for cloud integration:

1. Roadmap future business requirements as well as IT transformation requirements.

2. Develop process integration scenarios.

3. Document and rate the specific complexity involved in transforming SaaS data into on-premises systems.

4. Map applications and services to integrate.

5. Evaluate the needs for custom integration requirements and application program interfaces to be developed or managed for legacy systems.

6. Create a strategy for data integration and cleansing.

7. Identify a fully scalable integration platform that has built-in connectors to required systems and also has the capability to extend beyond them.

8. Test a subset of a business process through the cloud integration scenario.

McKendrick also pitches the idea that a move to the cloud means it's a good time to consider a move to master data management (MDM). "MDM is essential to any data integration project, as it establishes a single master 'gold copy' of data, versus separate, siloed data sets," he said.
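As a rough illustration of what a "gold copy" means in practice, the Python sketch below merges two siloed records for the same customer into one master record. The record layout, the field names and the survivorship rule (the newest non-empty value wins) are assumptions made for this example; real MDM products apply far richer matching and governance rules.

```python
from datetime import date

# Hypothetical records for one customer held in two separate silos.
crm_record = {
    "customer_id": "C-100",
    "email": "pat@example.com",
    "phone": "",
    "updated": date(2014, 3, 1),
}
billing_record = {
    "customer_id": "C-100",
    "email": "",
    "phone": "555-0100",
    "updated": date(2014, 6, 15),
}


def build_gold_copy(records):
    """Merge silo records into a single master record.

    Assumed survivorship rule: for each field, keep the non-empty
    value from the most recently updated record.
    """
    gold = {}
    # Process oldest first so newer non-empty values overwrite older ones.
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record.items():
            if field == "updated":
                continue  # bookkeeping field, not part of the master record
            if value not in ("", None):
                gold[field] = value
    return gold


gold = build_gold_copy([crm_record, billing_record])
```

Here the master record ends up with the email from the CRM silo and the phone number from the billing silo, which neither source held on its own.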

Similarly, according to McKendrick, a move to the cloud also means a natural move to Data as a Service.

The advantages of cloud integration are twofold, according to McKendrick. First, business end users are able to get at information they need on a moment's notice. Second, IT and data management staffs will see a boost in their own productivity, because they aren't consumed by writing scripts or code to achieve manual integration.

"You don't have to tear up your existing data center and databases to achieve greater integration," McKendrick emphasized. What is needed, however, is an architectural approach to integration, one that addresses requirements across the enterprise. "The practice to date has been 'one-off'-type data integration efforts, done with manual scripting, on a project-by-project basis," he said. The problem with that approach: It doesn't allow for predicting who is going to need which data source or predetermining where the next data source will come from. "Instead, we need to make it easy -- in almost a self-service way -- for decision-makers to identify the data source they need and be able to bring it into the decision-making environment through a well-architected flow, with no need to go through IT and set up reporting or dashboards," he said.

McKendrick says companies that are putting their data out with third-party cloud vendors need to take a hard look at the contractual terms. Scrutiny needs to go beyond just security from hackers. For example, what happens to your data after your contract is terminated? How long does the vendor hang on to it? What about an outage or data loss? "Most vendors will state right in their contract they're not liable for lost data. You need to make sure you have either a secondary cloud or an on-premises backup site," he said.

"It's all about architecture," McKendrick continued, adding that companies want to avoid what he called "JBOD -- just a bunch of data" in an architecture sprawled across their organizations. Instead, "you want a flexible architecture that accommodates any and all changes, such as new data sources being added, older ones being removed, or interfaces changing," he said.

Deploying enterprise applications in the cloud, whether moving a legacy application or launching a "greenfield" project, requires understanding how the cloud environment differs from the physical co-located or hosted state, said Michael Higgins, manager of enterprise solutions architecture at CloudSigma, an Infrastructure as a Service (IaaS) provider.

Companies must be willing to review architectures that were created for hardware solutions, Higgins said. "The cloud has a rich set of features to offer that are not available in a typical hardware environment," he noted. Companies should understand the differences between "burst" and long-term commitment billing models and add cost control as a key metric in project planning. "Solutions that can accept single, simple failures or which can self-heal -- roll back, fail over, 'round-robin,' etc. -- work amazingly well in the cloud," he added.
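The kind of design Higgins describes, one that tolerates a single failed node, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than anything CloudSigma prescribes: the `RoundRobinPool` class, the node names and the `fake_send` function are hypothetical stand-ins for real service calls.

```python
from itertools import cycle


class RoundRobinPool:
    """Rotate requests across endpoints and fail over on a connection error."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._ring = cycle(self.endpoints)

    def call(self, request, send):
        """Try each endpoint at most once, starting with the next in rotation."""
        last_error = None
        for _ in range(len(self.endpoints)):
            endpoint = next(self._ring)
            try:
                return send(endpoint, request)
            except ConnectionError as err:
                # A single, simple failure: move on to the next endpoint.
                last_error = err
        raise RuntimeError("all endpoints failed") from last_error


def fake_send(endpoint, request):
    """Hypothetical transport in which one node is permanently down."""
    if endpoint == "node-b":
        raise ConnectionError(f"{endpoint} is down")
    return f"{endpoint} handled {request}"


pool = RoundRobinPool(["node-a", "node-b", "node-c"])
results = [pool.call(f"job-{i}", fake_send) for i in range(3)]
```

With `node-b` down, the second job is quietly picked up by `node-c` and the caller never sees the failure. Only when every endpoint fails does the pool raise an error.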
