Achieve better performance with cloud microservices and apps

Cloud microservices can be powerful tools, but nothing is perfect. Users have reported performance issues, so how do you fix them? Tom Nolle has the answers.

The cloud and microservices have proven to be incredibly powerful tools for structuring optimal IT responsiveness to business conditions, but some users have also reported unexpected major performance problems. To ensure your use of the cloud for microservices or applications works as expected, know the major sources of performance issues, design and host applications and workflows to reduce your risk, and know what to monitor to spot problems before they become serious.

Cloud performance problems arise from three sources. First, the cloud hosting platform itself may be inadequate for the application, either because it's under-configured or because it's shared with other users and overloaded. Second, the network connections used to pass work among components or to reach microservices can introduce delay. Finally, the data resources that a component requires may not be located where access is easy and efficient, so users of the component are slowed down whenever data is read or written. While none of these sources may seem significant on its own, together they can add up to a major problem.

The reason is that delay accumulates. A cloud application that uses multiple microservices is essentially a chain of processing points. The user's perceived response time and quality of experience are set by the sum of the delays encountered along this process chain. In some cases, a single user transaction can spawn dozens of separate activities, all of which have to be complete for the user to receive a response. This is especially true for database access in any form, because a component might access a database hundreds of times to support a single request. If there's a problem with that, the QoE will quickly decline to far below acceptable levels.
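The arithmetic behind this is easy to sketch. The following Python example is only an illustration: the function name, the hop delays and the database round-trip times are hypothetical numbers chosen for this sketch, not measurements. It adds the delay of each processing point in a sequential chain to the cost of repeated database accesses.

```python
def chain_response_ms(hop_delays_ms, db_calls, db_round_trip_ms):
    """Response time of a sequential chain: the sum of the per-hop delays
    plus one round trip for every database access."""
    return sum(hop_delays_ms) + db_calls * db_round_trip_ms

# Hypothetical delays (in milliseconds) for four microservices used in sequence.
hops = [5, 8, 12, 6]

# The same 200 database accesses, first against a nearby database,
# then against one that sits across a slower network connection.
nearby = chain_response_ms(hops, db_calls=200, db_round_trip_ms=0.5)
distant = chain_response_ms(hops, db_calls=200, db_round_trip_ms=20)

print(f"nearby database:  {nearby} ms")
print(f"distant database: {distant} ms")
```

With these assumed figures, the hops themselves contribute only 31 ms, and nearly all of the difference between the two totals comes from the repeated database round trips.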

As is often the case with componentization and cloud hosting, controlling the three factors that can influence performance starts with an understanding of the workflow. Applications should be designed so that intercomponent paths that carry a lot of traffic don't have to cross a network connection. If two or more microservices are used sequentially in high-traffic workflows, consider combining them so that work passes between them efficiently.

This kind of design decision can also work wonders in assuring efficient cloud hosting. Cloud microservices that are combined into a single application element should be scaled horizontally and replaced as a unit, and you should make sure that combining the logic actually prevents them from becoming separated by long network "distances" if something fails or if multiple instances are created for load balancing.

Deciding on cloud microservices

Pay particular attention to any workflows that cross a cloud boundary, whether between public clouds or between a public cloud provider and your own data center. These boundary connections will usually have lower network speed and more delay, particularly if the public internet is used to make the connection. Databases used in the cloud must often be stored in the data center for cost and compliance reasons. This means that applications should be designed to send query transactions to the database, not to read individual records across the boundary. It may be wise to colocate the update component with the database being updated.
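The difference between reading individual records and issuing a query transaction can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a recommended implementation: the orders table, the customer names and both function names are hypothetical, and an in-memory database stands in for a data center database that would really sit across a cloud boundary. Each call to execute represents one round trip over that boundary.

```python
import sqlite3

# In-memory stand-in for a database hosted in the data center (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "acme" if i % 2 == 0 else "globex", 10.0 * i) for i in range(1, 101)],
)

def total_by_record_reads(conn, ids):
    """Read every record individually and filter in the component.
    Each read would be a separate round trip across the boundary."""
    round_trips = 0
    total = 0.0
    for order_id in ids:
        row = conn.execute(
            "SELECT customer, total FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        round_trips += 1
        if row[0] == "acme":
            total += row[1]
    return total, round_trips

def total_by_query(conn):
    """Let the database do the work and return one result in one round trip."""
    row = conn.execute(
        "SELECT SUM(total) FROM orders WHERE customer = ?", ("acme",)
    ).fetchone()
    return row[0], 1

print(total_by_record_reads(conn, range(1, 101)))
print(total_by_query(conn))
```

Both functions compute the same total, but the first crosses the boundary once per record while the second crosses it once per request.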

Workflow analysis will also help you identify the cloud microservices that are in the main data path used most often. You may want to consider dedicated hosting and special infrastructure-as-a-service hosting configurations to ensure that the performance of these components is the best it can be. Components that are used less can be hosted via basic cloud services. Also consider using any hosting location controls your cloud provider offers to locate the back-end components physically close to the point of connection to your data center, if you're using data center resources, including databases, for part of the workflow.

There are some general rules in networking that should be observed in your application deployments. First, packet loss and delay tend to increase with distance, so avoid having key workflows transit large distances, either between components in the cloud or between users and the cloud front-end application components and services. Second, most applications will adapt to network delay more gracefully than to packet loss, so if you can obtain a service-level agreement (SLA) from a network or cloud provider for network connectivity, try to minimize packet loss in the SLA.

It's important to structure your cloud applications to optimize performance. However, it's just as important to know how to determine that your optimizing measures aren't working and have a remedy in mind. Network connections within the cloud will be out of your control, but those between cloud providers or between users or data centers and the cloud may be something you contract separately. Sometimes just making sure that your users access the internet through the best ISP makes a difference, and sometimes you can provide for backup connectivity or even multiple network paths. In all these cases, you'll need to get good data on application performance.

Good data begins with response time, which means measuring end-to-end delay and packet loss for the entire workflow. This information can sometimes be obtained from standard software at the client end, but if that's not the case, then your client applications should be written to timestamp inquiries and responses to measure it. Diagnostic processes will always start with identifying an end-to-end response time or packet loss problem.
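A client that timestamps its own inquiries and responses might look something like the following Python sketch. The names timed_request, summarize and fake_service are hypothetical, and fake_service merely sleeps to stand in for a real call to a cloud application. The sketch measures response time only; packet loss has to be measured at the network level.

```python
import time

def timed_request(send_request, payload):
    """Timestamp the inquiry and the response, then return the response
    together with the elapsed time in milliseconds."""
    sent_at = time.perf_counter()
    response = send_request(payload)
    received_at = time.perf_counter()
    return response, (received_at - sent_at) * 1000.0

def summarize(samples_ms):
    """Reduce a list of response times to a few figures worth tracking."""
    ordered = sorted(samples_ms)
    return {
        "count": len(ordered),
        "mean_ms": sum(ordered) / len(ordered),
        "worst_ms": ordered[-1],
    }

def fake_service(payload):
    """Hypothetical stand-in for a real request to the application."""
    time.sleep(0.01)
    return {"echo": payload}

samples = []
for i in range(5):
    response, elapsed_ms = timed_request(fake_service, {"order": i})
    samples.append(elapsed_ms)

print(summarize(samples))
```

Tracking the worst case alongside the mean matters here, because a few very slow responses can hurt users' perceived quality of experience even when the average looks healthy.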

Determining where the problem is occurring can be tricky, but if you can have work elements timestamped by everything that handles them, you can quickly see where additional packet delay is accumulating. If this isn't possible, then recognize that an overloaded component or network connection will back up traffic, and that will lead to congestion in the earlier components in the flow. Similarly, traffic beyond a congestion point will fall because less work is getting through, so simply looking at the throughput of each component can let you localize a problem.
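Both localization techniques can be sketched in a few lines of Python. Everything in this example is assumed for illustration: the component names, the timestamps and the throughput figures are invented, and the timestamp approach only works if the components' clocks are synchronized closely enough to compare.

```python
def slowest_hop(stamps):
    """stamps: (component, timestamp_ms) pairs in the order the work element
    was handled. Returns the pair of components with the largest gap."""
    gaps = [
        (stamps[i + 1][1] - stamps[i][1], stamps[i][0], stamps[i + 1][0])
        for i in range(len(stamps) - 1)
    ]
    delay, source, target = max(gaps)
    return source, target, delay

def largest_throughput_drop(throughput):
    """throughput: (component, requests_per_second) pairs in flow order.
    Returns the adjacent pair where throughput falls the most."""
    drops = [
        (throughput[i][1] - throughput[i + 1][1], throughput[i][0], throughput[i + 1][0])
        for i in range(len(throughput) - 1)
    ]
    drop, source, target = max(drops)
    return source, target, drop

# Hypothetical timestamps (ms) collected as one work element moves through the flow.
stamps = [("gateway", 0), ("orders", 12), ("inventory", 310), ("billing", 325)]
print(slowest_hop(stamps))

# Hypothetical throughput (requests per second) observed at each component.
throughput = [("gateway", 500), ("orders", 495), ("inventory", 210), ("billing", 208)]
print(largest_throughput_drop(throughput))
```

In this invented data, both views point to the same place: the delay builds up and the throughput falls between the orders and inventory components, so that component or the connection feeding it is where to look first.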

Any distributed-component model for applications is more performance-sensitive than a monolithic application, but with care, you can preserve the benefits of distributed cloud microservices by controlling the performance issues and build better apps in the long run.
