Data currency demands differing storage strategies

Storage strategies should handle data differently depending on its age, a characteristic known as data currency. Know when your data transitions from new and hot to old and cold.

The need to access data diminishes over time. This spectrum of data currency, from new and "hot" to old and "cold," is best served by a combination of different storage practices and technologies.

Speaking to a packed conference room at the Cloud Computing Expo in New York, Scott Cleland, senior director of product marketing at Western Digital's HGST business unit, said organizations should consider data currency when devising an enterprise storage strategy.

As an example, Cleland discussed an approach that encompasses a consolidated big data platform for hot-to-warm files up to five years old and a so-called active archive for older cool-to-cold files. An active archive should be thought of as a store of data that is too valuable to discard, but which is accessed only occasionally. Traditionally, files more than a decade old were often relegated to offline tape storage, a technology that today garners little mind share, but which remains widely used.
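The age-based tiering Cleland described can be sketched as a simple policy function. This is an illustrative sketch only; the five-year and ten-year cutoffs come from the figures mentioned above, while the tier names and function signature are assumptions, not anything from the talk.

```python
from datetime import datetime, timedelta
from typing import Optional

# Illustrative thresholds based on the ages cited in the article:
# files up to five years old stay on the consolidated big data platform;
# older files move to an active archive; files over a decade old were
# traditionally candidates for offline tape.
HOT_WARM_LIMIT = timedelta(days=5 * 365)
ARCHIVE_LIMIT = timedelta(days=10 * 365)

def storage_tier(last_accessed: datetime, now: Optional[datetime] = None) -> str:
    """Return a storage tier name for a file based on its age."""
    now = now or datetime.now()
    age = now - last_accessed
    if age <= HOT_WARM_LIMIT:
        return "big-data-platform"   # hot-to-warm data, frequently accessed
    if age <= ARCHIVE_LIMIT:
        return "active-archive"      # cool-to-cold data, occasionally accessed
    return "offline-tape"            # decade-old data, rarely accessed
```

In practice such a policy would key off access frequency and business value as well as raw age, but the sketch captures the hot-to-cold spectrum the talk describes.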

A complicating yet crucial factor in managing data currency, one that Cleland said is often overlooked, is that data almost always exists in multiple copies.

"We are always working with multiple copies of data, even though we often don't realize it," Cleland said. "We have copies for backup and disaster recovery, copies used by different corporate departments, and more copies around the globe to ensure fast response times from any user location."

These multiple copies not only strain storage budgets, but they also add complexity to the process of analysis, he said.

Though the challenge of keeping multiple copies of gigantic databases and other files fully synchronized was beyond the scope of Cleland's discussion, he cited areas where data duplication can evolve unintentionally. These include analysis clusters, data warehouses, silos that arise from line-of-business departmental use, disaster-recovery snapshots, and overlaps in public- and private-cloud application implementations.

Data currency is growing increasingly important because the rate at which organizations now amass data is unprecedented, Cleland said. "Data continues to be created at an exponential rate and total capacity of storage hardware being shipped is not able to keep up." Cleland cited research from IDC forecasting that by 2020 the total amount of data created and replicated annually will surpass 40 zettabytes, while so-called capacity shipments will fall shy of 10 zettabytes.

Speaking directly to developers of cloud and mobile applications, Cleland said that storage hardware is only as good as the software tools that surround it. "Developers need to understand that it is first important to get all your data in one place, then do a deep analytic run on that data."

The move to real-time streaming analytics is forcing changes to the way we think about storage and is one driver behind the push toward object storage, Cleland said.

Next Steps

What is the role of data redundancy?

Tips for integrating on-premises and cloud storage

Is flash memory the future of archival storage?
