Find your cloud computing and big data recipe for success

Supporting big data and high-performance computing in the cloud introduces a number of challenges. Fortunately, larger VM instances and GPUs offer relief.

Cloud users are beginning to embrace big data to solve operational problems for high-performance computing and other narrow market segments. However, since cloud computing environments have become so diverse, users need to be careful about directing big data and high-performance computing tasks to the right cloud environment.

Big data processing requires larger instances, which give virtual machines access to more DRAM, more compute cycles and better disk I/O. Large instances have evolved over the last couple of years to include local instance storage. This type of disk storage persists only for the life of the instance, and it provides higher-performance I/O than standard networked storage. With SSDs as alternatives to slower hard drive instance storage, large VM performance has increased.

All the major cloud providers, including Amazon Web Services, offer large instances, which is creating an IT arms race. Large instances have also grown to span complete cores and multicore configurations. They can support in-memory database operations and drive small and midsize Hadoop tasks.

Still, cloud computing and big data pose various scalability and performance challenges -- and one way to address them is including GPU processing in the instance. These large instances with GPUs are already being used in the video production and big data markets. Adobe has adopted this approach by trading expensive workstations for an editing service suite that runs in the cloud. The leader in GPU design, Nvidia, is building clustered, powerful instances that can focus on major-scale issues, such as scientific computing.

Here's a breakdown of different large instance approaches, mapped to their most common use cases:

  • Low- to mid-tier big data analytics: Large CPU instances with HDD or SSD local storage
  • High-end analytics: GPU instances with SSD local storage
  • Scientific computing: Large instances or GPU instances, depending on the application available
  • Oil and gas/geology: Application-dependent, but GPU instances, if possible
  • Weather prediction: Application-dependent, but GPU instances, if possible
  • Video editing: Application-dependent, but GPU instances, if possible. Consider software as a service (SaaS) offerings, such as Adobe, as an alternative.
  • High-end computer gaming: Usually SaaS, but large instances also fit the bill well
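The mapping above can be sketched as a simple lookup table, for example as a first-pass filter in a provisioning script. This is only an illustrative sketch: the use-case keys, the `Recommendation` type and the `recommend` function are hypothetical names, and the entries paraphrase the list rather than any provider's actual instance catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    """A coarse instance recommendation for one workload category."""
    instance: str
    note: str = ""


# Hypothetical table; keys are made up, values paraphrase the list above.
RECOMMENDATIONS = {
    "low_mid_analytics": Recommendation(
        "large CPU instance", "HDD or SSD local storage"),
    "high_end_analytics": Recommendation(
        "GPU instance", "SSD local storage"),
    "scientific_computing": Recommendation(
        "large or GPU instance", "depends on the application available"),
    "oil_gas_geology": Recommendation(
        "GPU instance if possible", "application-dependent"),
    "weather_prediction": Recommendation(
        "GPU instance if possible", "application-dependent"),
    "video_editing": Recommendation(
        "GPU instance if possible",
        "application-dependent; consider SaaS as an alternative"),
    "high_end_gaming": Recommendation(
        "SaaS", "large instances also fit"),
}


def recommend(use_case: str) -> Recommendation:
    """Return the recommendation for a known use case, or raise ValueError."""
    try:
        return RECOMMENDATIONS[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


if __name__ == "__main__":
    for name in ("high_end_analytics", "video_editing"):
        rec = recommend(name)
        print(f"{name}: {rec.instance} ({rec.note})")
```

A table like this only narrows the choice. As the list notes, the final decision for most of these workloads still depends on which applications are available for a given instance type.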

The future of big data, HPC in cloud

Even though cloud is widely embraced in the enterprise, high-performance computing (HPC) environments have been slow to adopt it. This is mostly because many HPC applications are heavily fine-tuned and because a wide variety of operating systems are deployed.

The good news is that HPC progress is rapid, with very clear benefits to users. For example, orchestration allows supercomputer power to be fragmented into small chunks that are more cost-effective for researchers with limited budgets.

In the near future, we will see server architectures that boost memory performance, such as the Hybrid Memory Cube, which aims for terabyte-per-second bandwidth. Instance storage is expected to evolve rapidly, as well. Intel and Micron launched 3D XPoint ("cross point") memory, which brings non-volatility onto DRAM subsystems at much higher speeds than flash. These technologies hint at the likely evolution of large instances in the next few years.

As another example, the healthcare market will likely move towards a cloud-based approach for automated image analysis. Of course, the cloud instance used will be determined by available software. But GPUs appear to be a good way to accelerate image analysis, which should include a large element of parallel processing.

From real-time advertising to faster clinical research, large instances will impact our daily lives. This is the leading edge of IT, and there are still many architectures and techniques to discover. In the meantime, the evolution of raw horsepower will rapidly continue.

About the author
Jim O'Reilly was vice president of engineering at Germane Systems, where he created ruggedized servers and storage for the U.S. submarine fleet. 
