Big data (also spelled Big Data) is a general term for the voluminous amount of unstructured and semi-structured data a company creates, data that would take too much time and cost too much money to load into a relational database for analysis. Although big data doesn't refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data.
A primary goal of analyzing big data is to discover repeatable business patterns. It’s generally accepted that unstructured data, most of it located in text files, accounts for at least 80% of an organization’s data. If left unmanaged, the sheer volume of unstructured data generated each year within an enterprise can be costly to store. Unmanaged data can also pose a liability if information cannot be located in the event of a compliance audit or lawsuit.
Big data analytics is often associated with cloud computing because analyzing large data sets quickly generally requires a framework such as MapReduce to distribute the work among tens, hundreds or even thousands of computers.
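To make the idea of distributing work concrete, here is a minimal sketch of the MapReduce pattern in Python, using a word count as the task. It is an illustration only: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a local thread pool stands in for the cluster of computers a real framework would coordinate.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map step: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group every emitted value under its key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a total."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Assumption: each worker thread plays the role of one machine.
    # A real framework would ship chunks to separate computers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    documents = ["big data big", "data patterns"]
    print(word_count(documents, workers=2))
```

Each chunk is mapped independently of the others, which is why the work can be spread across many machines. Only the shuffle step needs to see the output of every mapper.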