Big data is an evolving term for large volumes of structured, semi-structured and unstructured data that can be mined for information.
Google Cloud Dataproc is a managed service within the Google Cloud Platform for processing large datasets, such as those used in big data initiatives. Dataproc is built on open source platforms including Apache Hadoop, Spark and Pig. The service is primarily used by data scientists, business decision-makers and researchers.
Hadoop is an open source, Java-based framework that supports the processing of large data sets across clusters of computers in a distributed computing environment. It is a top-level project of the Apache Software Foundation.
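MapReduce is a programming model and software framework that allows developers to write programs that process massive amounts of data, including unstructured data, in parallel across a distributed cluster of processors or stand-alone computers. A job splits its input into chunks that are handled by independent map tasks, then groups the intermediate results by key and combines them in reduce tasks.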
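As a rough illustration of the MapReduce model, the Python sketch below counts words in three phases (map, shuffle, reduce) inside a single process. The function names are hypothetical and are not part of Hadoop or any other library. A real framework would run the map and reduce calls on different nodes of a cluster, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine every value emitted for one key into a single total."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence (a cluster would parallelize them)."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Each string stands in for one input split stored on a different node.
counts = word_count(["big data is big", "data is mined"])
print(counts)
```

Because each call to map_phase depends only on its own split, and each call to reduce_phase depends only on its own key, both phases can be distributed across machines without coordination beyond the shuffle step.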
Microsoft Azure Data Lake is a highly scalable data storage and analytics service hosted in Azure, Microsoft's public cloud. The service is largely intended for big data storage and analysis.