The novelty of the cloud can disguise some profound problems, many of which involve cloud workflow: the movement of information between application components. Foremost among these is the risk that cloud connectivity poses to application quality of experience and availability. Data paths in the cloud are outside business control, and when applications are split between the cloud and the data center, network path issues can undermine the whole business case if they're not addressed. To get things right, architects need to think about component aggregation in cloud planning, identify workflows that cross critical cloud boundaries and manage delay budgets carefully when planning cloud workflows.
One of the major issues in modern cloud application design is balancing Agile-driven componentization with the need to create efficient applications. Traditionally, architects are taught that if they build applications from an inventory of small reusable components, they can accelerate development and even reduce application lifecycle management (ALM) issues. The problem is that when applications are broken down into pieces and linked with enterprise service buses (ESBs) and Business Process Execution Language (BPEL), deploying some of those pieces to the cloud means work flows constantly into and out of the cloud as it moves among the distributed components.
One easy way to address this is to integrate multiple software components into a single service where the components are always used together in an application. The components are still separate at the development level, so there are ALM benefits in efficiency and agility, but only one service or cloud workflow exchange is needed to invoke the group.
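As a rough illustration of that aggregation idea, the Python sketch below composes three components into one service entry point. The component names (validate_order, price_order, reserve_stock) and the aggregate helper are hypothetical, invented here for illustration; the point is only that the caller makes one exchange while the components stay separate in the code base.

```python
from typing import Callable, Dict, List

Component = Callable[[Dict], Dict]

# Hypothetical components, developed and versioned separately.
def validate_order(order: Dict) -> Dict:
    order = dict(order)
    order["valid"] = order.get("quantity", 0) > 0
    return order

def price_order(order: Dict) -> Dict:
    order = dict(order)
    order["total"] = order["quantity"] * order["unit_price"]
    return order

def reserve_stock(order: Dict) -> Dict:
    order = dict(order)
    order["reserved"] = order["valid"]
    return order

def aggregate(components: List[Component]) -> Component:
    """Wrap components that are always used together behind one service call."""
    def service(request: Dict) -> Dict:
        result = request
        for component in components:
            result = component(result)  # in-process hand-off, no network hop
        return result
    return service

# One cloud workflow exchange now invokes the whole group.
order_service = aggregate([validate_order, price_order, reserve_stock])
response = order_service({"quantity": 3, "unit_price": 10.0})
```

Deployed this way, the three steps cost one boundary crossing instead of three, at the price of scaling and placing them as a unit.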
Next, look at the relationship between workflows and cloud boundaries. Workflows are movements of work from a service bus to a component set. The workflow processing is often retained in the data center when the cloud is used for some components. As a result, work continually crosses the cloud border, which is usually where control of quality of service is the most limited.
To minimize these crossovers for critical flows, architects first have to identify such flows. That means plotting the trajectory of work not just as a mass flow, but also in terms of how many times a given path is followed, how changes in the path delay work, and how reliability affects the flows and changes worker quality of experience (QoE). The paths with the largest QoE impact, both for delay sensitivity reasons and because they're used repetitively, should be examined to reduce the number of times they cross into and out of the cloud.
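One way to make that identification concrete is to score each path and sort. The sketch below is a simplification under stated assumptions: the WorkflowPath fields, the 0-to-1 delay sensitivity scale and the multiplicative score are all illustrative choices, not a standard metric, and the sample figures are made up.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WorkflowPath:
    name: str
    crossings: int            # cloud boundary crossings per execution
    executions_per_hour: int  # how often the path is followed
    delay_sensitivity: float  # assumed scale: 0.0 (batch) to 1.0 (interactive)

def qoe_impact(path: WorkflowPath) -> float:
    """Illustrative score: crossings weighted by frequency and sensitivity."""
    return path.crossings * path.executions_per_hour * path.delay_sensitivity

def rank_paths(paths: List[WorkflowPath]) -> List[WorkflowPath]:
    """Return paths ordered from largest to smallest estimated QoE impact."""
    return sorted(paths, key=qoe_impact, reverse=True)

# Made-up sample data for three workflow paths.
paths = [
    WorkflowPath("nightly-report", crossings=6, executions_per_hour=1, delay_sensitivity=0.1),
    WorkflowPath("order-entry", crossings=4, executions_per_hour=900, delay_sensitivity=0.9),
    WorkflowPath("address-lookup", crossings=2, executions_per_hour=1200, delay_sensitivity=0.7),
]

for path in rank_paths(paths):
    print(f"{path.name}: {qoe_impact(path):.0f}")
```

With these numbers the nightly report has the most crossings per run but lands at the bottom of the list, because it runs rarely and nobody is waiting on it.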
The failure to quantify performance factors related to QoE is one of the worst mistakes a software architect can make in application design. When managing cloud applications and cloud workflows, it's important to start with an understanding of the expected response time for a given application or workflow. From there, architects can derive specific guidance on what has to be done to make a workflow ready for the cloud.
Nearly all transactional applications have two limit points for response time, one representing the time interval at which users notice and complain about response and the other representing the point where worker productivity is altered enough to jeopardize the application's business utility. Operations management and enterprise architects can often provide guidance on where these points are, as well as suggest the conditions under which they can be exceeded. Providing a design goal to work toward is the starting point for cloud workflow planning.
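The two limit points can be treated as thresholds that sort any measured response time into one of three bands. The following is a minimal sketch; the function name, the band labels and the 2-second and 5-second figures are assumptions for illustration, since the real limits come from operations management and enterprise architects.

```python
def classify_response(response_s: float, notice_s: float, utility_s: float) -> str:
    """Place a response time relative to the two limit points.

    notice_s:  point at which users notice and complain
    utility_s: point at which productivity loss threatens business utility
    """
    if notice_s >= utility_s:
        raise ValueError("the notice limit must be below the utility limit")
    if response_s < notice_s:
        return "acceptable"
    if response_s < utility_s:
        return "noticeable"
    return "unacceptable"

# Hypothetical limits: users complain at 2 s, productivity suffers at 5 s.
NOTICE_S, UTILITY_S = 2.0, 5.0
for measured in (1.2, 3.5, 6.0):
    print(measured, classify_response(measured, NOTICE_S, UTILITY_S))
```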
The second step is to determine the process time of the application, which includes the time needed to execute components in the workflow and normal queuing delays. If this is subtracted from the design goal response time, the remainder is the delay budget available for cloud-induced workflow delays. Testing can establish the "normal" time needed to direct work to a cloud component over the cloud network boundary. If a given workflow's sum of delays exceeds that budget, architects will need to optimize it in some way.
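The arithmetic is simple enough to sketch directly. In the example below, the function names and all of the millisecond figures are hypothetical, and it assumes every boundary crossing costs the same tested "normal" delay, which real networks will only approximate.

```python
from typing import Tuple

def delay_budget(design_goal_ms: int, process_time_ms: int) -> int:
    """Budget left for cloud-induced delay: design goal minus process time."""
    return design_goal_ms - process_time_ms

def check_workflow(design_goal_ms: int, process_time_ms: int,
                   crossings: int, per_crossing_ms: int) -> Tuple[bool, int]:
    """Return (fits_budget, slack_ms); negative slack means the budget is exceeded."""
    budget = delay_budget(design_goal_ms, process_time_ms)
    cloud_delay = crossings * per_crossing_ms  # assumes uniform crossing cost
    slack = budget - cloud_delay
    return slack >= 0, slack

# Hypothetical figures: 2,000 ms design goal, 1,200 ms process time,
# 150 ms measured per boundary crossing.
print(delay_budget(2000, 1200))           # 800
print(check_workflow(2000, 1200, 6, 150))  # (False, -100)
print(check_workflow(2000, 1200, 4, 150))  # (True, 200)
```

In this made-up case, six crossings overrun the budget by 100 ms, while cutting the path to four crossings leaves 200 ms of slack.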
Minimizing border crossings for the workflows identified as critical is a matter of component placement and application design. Some of the principles of front-end cloud components feeding on-premises transaction processing can be applied here. But that won't work if the cloud is acting as a backup or overflow resource. If the delay budget and QoE goals won't permit degradation of performance during cloud bursting periods, one possible solution is to use a cloud-based workflow manager and a premises manager. Pass work between the managers so each can control its own components. Cloud services like Amazon Web Services' (AWS) Simple Workflow Service can sequence multiple components in the cloud as a subordinate to the main workflow or service bus engine. IT teams that use a cloud workflow engine may want to consider bursting a collection of components at a time rather than single components.
It never hurts to carefully test cloud networking performance (into and out of the cloud and among cloud components) to determine how much of an issue workflow optimization will be. Architects may also be able to prevent performance degradation by using geographic clustering of hosts to ensure their components don't move too far from the data center. Direct, private paths between the data center and the cloud (provided by AWS Direct Connect or AWS Data Pipeline, among others) and special intra-cloud handling (Data Pipeline again) can improve quality of service where it's difficult to tune workflows to cloud conditions.
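A basic test harness for this needs little more than a timer and a summary. The sketch below times any callable and reports the median and a nearest-rank 95th percentile; the measure and summarize helpers are illustrative names, and the callable passed in would be whatever request actually crosses the boundary being tested (none is assumed here).

```python
import math
import statistics
import time
from typing import Callable, Dict, List

def measure(call: Callable[[], object], runs: int = 20) -> List[float]:
    """Time repeated invocations of `call`, returning samples in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def summarize(samples_ms: List[float]) -> Dict[str, float]:
    """Report the median and the nearest-rank 95th percentile of the samples."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))
    return {"median_ms": statistics.median(ordered), "p95_ms": ordered[rank - 1]}

# Placeholder workload; replace the lambda with a real call across the boundary.
print(summarize(measure(lambda: sum(range(1000)), runs=20)))
```

The 95th percentile matters as much as the median here, because a delay budget that holds on average can still be blown by the slow tail.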
Where the cloud is going to be on the front end and feed a large number of transactions into the data center, cloud-based real-time flow support (provided by services such as Amazon Kinesis) can be coupled to front-end processes to aggregate transactions and perform some basic pre-edit and processing. Using these tools may require architects to rethink application design around the Web services of their cloud provider.
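The aggregation and pre-edit pattern can be sketched without committing to any particular streaming service. The FrontEndAggregator class below is hypothetical and runs in-process; it does not use the Amazon Kinesis API. It rejects malformed transactions at the front end and forwards the rest in batches, so one boundary crossing carries several transactions.

```python
from typing import Callable, Dict, List

REQUIRED_FIELDS = ("account", "amount")  # assumed pre-edit rule for illustration

class FrontEndAggregator:
    """Pre-edit transactions in the cloud and forward them in batches."""

    def __init__(self, forward: Callable[[List[Dict]], None], batch_size: int = 3):
        self.forward = forward        # sends one batch across the boundary
        self.batch_size = batch_size
        self.pending: List[Dict] = []

    def submit(self, txn: Dict) -> bool:
        """Accept a transaction if it passes pre-edit; flush when the batch fills."""
        if any(field not in txn for field in REQUIRED_FIELDS):
            return False              # rejected before it reaches the data center
        self.pending.append(txn)
        if len(self.pending) >= self.batch_size:
            self.flush()
        return True

    def flush(self) -> None:
        """Forward whatever is pending as a single batch."""
        if self.pending:
            self.forward(list(self.pending))
            self.pending.clear()

# Stand-in for the data center endpoint: just record each batch received.
received: List[List[Dict]] = []
aggregator = FrontEndAggregator(received.append, batch_size=2)
aggregator.submit({"account": "A1", "amount": 10})
aggregator.submit({"amount": 5})                  # fails pre-edit
aggregator.submit({"account": "B2", "amount": 7})
aggregator.submit({"account": "C3", "amount": 3})
aggregator.flush()
```

A production version would also flush on a timer so that a partially filled batch does not sit waiting and eat into the delay budget.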
Application performance has to meet operational standards or business activity itself can be threatened. Elastic resource pools will always generate some performance variability risk, but with proper attention to these issues and remedies, an architect can keep applications running within the QoE range set by business requirements.