Big data is an evolving term that describes any voluminous amount of structured, semistructured and unstructured data that has the potential to be mined for information.
Google Cloud Dataproc is a managed service within the Google Cloud Platform for processing large datasets, such as those used in big data initiatives. Dataproc is built on open source platforms including Apache Hadoop, Spark and Pig. The service is primarily used by data scientists, business decision-makers and researchers.
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
MapReduce is a core component of the Apache Hadoop software framework.
Microsoft Azure Data Lake is a highly scalable public cloud service that allows developers, scientists, business professionals and other Microsoft customers to gain insight from large, complex data sets.