Gordon E. Moore, co-founder of Intel, observed that the number of transistors on an integrated circuit doubled roughly every two years. Later dubbed “Moore’s law,” his observation has come to stand for the general upward trend in computing power. Over the past few years, however, new content outlets, social media in particular, have caused an explosion of unstructured data that has outpaced Moore’s law by a wide margin.
There is no single super-tool to tackle the growing volume of big data, according to Ben Butler, senior solutions marketing manager of big data at AWS. In his session at the AWS Summit in San Francisco, Butler advocated instead for a network of solutions (AWS solutions, to be specific) that leverage the flexibility, capacity and cost-effectiveness of the cloud.
Last week, Butler hosted another session at an AWS Summit in New York. His talk drilled down into AWS solutions a bit further, offering specific use cases from different industries.
Big data has been used for fraud detection, clickstream analysis and ad targeting, to name a few applications. One of the more exciting use cases is gene sequencing. This analysis of genetic variation can be used for disease research, personalized medicine and molecular testing. It is, in short, a tool that contributes to our understanding of disease and could be instrumental to the evolution of healthcare.
The sudden influx of big data has put pressure on on-premises systems that, just a few years ago, could store, analyze and share data without much trouble.
“DNA sequencing is scaling faster than Moore’s Law, so processing the sequence data is an increasingly significant barrier,” said Alex Dickinson, VP of strategic initiatives at Illumina, a genetic research company. Dickinson said that cloud computing was the best solution to this processing bottleneck.
All of Illumina’s raw data is streamed from its sequencing instruments over the Internet to AWS, Dickinson explained. “There the data undergoes intensive processing to assemble final genomes from that raw data. It is then stored on AWS and made available to researchers for further analysis.” In other words, most of the big data lifecycle, from processing through storage and sharing, takes place on AWS.
Dickinson cited three reasons for selecting Amazon over other cloud providers. One, AWS has large instances that can handle big loads of raw data. Two, AWS has sites all over the globe. Three, AWS has competitive pricing.
Whether big data researchers choose AWS or not, the cloud is certainly the next frontier for processing massive datasets. In Illumina’s case, it is removing computational constraints and, by extension, generating more opportunities for scientific insight. As Dickinson put it, “the cloud enables raw instrument data to be transformed into disruptive healthcare discoveries.”