Big data


 "Big data" (Big data) research organization Gartner gave such a definition. "Big data" requires new processing models to have stronger decision-making power, insight and discovery, and process optimization capabilities to adapt to the massive, high growth rate and diversified information assets.

The McKinsey Global Institute defines it as a data collection so large in scale that it greatly exceeds the capabilities of traditional database software tools for acquisition, storage, management, and analysis, with four characteristics: massive data scale, fast data flow, diverse data types, and low value density. [3]
The strategic significance of big data technology lies not in holding huge amounts of information but in processing that meaningful data professionally. In other words, if big data is compared to an industry, the key to this industry's profitability lies in improving the "processing capability" of data and adding value to data through that "processing". [4]
From a technical point of view, big data and cloud computing are as inseparable as the two sides of a coin. Big data cannot be processed on a single computer; a distributed architecture must be adopted. Its defining characteristic is distributed mining of massive data, which in turn relies on the distributed processing, distributed databases, cloud storage, and virtualization technology of cloud computing. [1]
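As a minimal sketch of this distributed style of processing, the example below splits a dataset into partitions and lets separate worker processes compute partial results that are then merged into a global statistic. The partition contents, the statistic chosen (a mean), and the use of Python's process pool in place of a real cluster are illustrative assumptions, not the mechanism of any particular big data system.

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical dataset, pre-split into partitions the way a distributed
# file system or distributed database would spread data across nodes.
partitions = [range(0, 1_000_000), range(1_000_000, 2_000_000)]

def local_sum_and_count(partition):
    # Each worker "mines" only its own partition and returns a partial result.
    total, n = 0, 0
    for value in partition:
        total += value
        n += 1
    return total, n

if __name__ == "__main__":
    # Worker processes stand in for the nodes of a cluster.
    with ProcessPoolExecutor() as executor:
        partials = list(executor.map(local_sum_and_count, partitions))

    # Merge the partial results into a global statistic (here, the mean).
    total = sum(t for t, _ in partials)
    count = sum(n for _, n in partials)
    print(total / count)
```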
With the advent of the cloud era, big data has attracted more and more attention. Analysts observe that "big data" is usually used to describe the large volumes of unstructured and semi-structured data a company creates, data that would take too much time and money to load into a relational database for analysis. Big data analysis is often associated with cloud computing because real-time analysis of large data sets requires a framework like MapReduce to distribute the work across tens, hundreds, or even thousands of computers, as the sketch below illustrates.
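The following is a minimal, single-machine sketch of the MapReduce programming model: a map phase emits key-value pairs for each chunk of input, a shuffle groups the pairs by key, and a reduce phase aggregates each group. The sample documents, function names, and use of a local process pool in place of a cluster are assumptions made purely for illustration.

```python
from collections import defaultdict
from multiprocessing import Pool

# Hypothetical input: each "document" stands in for a chunk of data that
# would live on a separate machine in a real cluster.
documents = [
    "big data requires new processing models",
    "cloud computing enables distributed processing of big data",
    "data mining extracts value from massive data",
]

def map_phase(document):
    # Map: emit (word, 1) pairs for every word in one chunk.
    return [(word, 1) for word in document.split()]

def reduce_phase(grouped):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

if __name__ == "__main__":
    # The pool stands in for the cluster; each worker maps one chunk.
    with Pool() as pool:
        mapped = pool.map(map_phase, documents)

    # Shuffle: group the intermediate pairs by key (the word).
    grouped = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            grouped[word].append(count)

    print(reduce_phase(grouped))  # e.g. {'big': 2, 'data': 4, ...}
```

In a real deployment the shuffle step is also distributed, but the three-phase structure is the same.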

Big data requires special technology to process large amounts of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.