Big data
"Big data" (Big data) research organization Gartner gave such a definition. "Big data" requires new processing models to have stronger decision-making power, insight and discovery, and process optimization capabilities to adapt to the massive, high growth rate and diversified information assets.
Big data requires special technologies to process large volumes of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
The strategic significance of big data technology lies not in possessing huge volumes of data but in professionally processing the data that is meaningful. In other words, if big data is compared to an industry, the key to that industry's profitability lies in improving the "processing capability" of data and achieving "value-added" data through such "processing". [4]
From a technical point of view, big data and cloud computing are as inseparable as the two sides of a coin. Big data cannot be processed by a single computer; a distributed architecture must be adopted. Its distinguishing feature is distributed data mining over massive data sets, which in turn relies on the distributed processing, distributed databases, cloud storage, and virtualization technologies of cloud computing. [1]
With the advent of the cloud era, big data has attracted more and more attention. Analysts typically use the term to describe the large volumes of unstructured and semi-structured data a company creates, data that would cost too much time and money to load into a relational database for analysis. Big data analysis is often associated with cloud computing because real-time analysis of large data sets requires a framework like MapReduce to distribute work across tens, hundreds, or even thousands of computers.
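The MapReduce model mentioned above can be illustrated with a minimal single-process sketch. This is only an illustration of the map, shuffle, and reduce phases using a word-count example (the classic MapReduce demonstration); the function names and the two sample documents are assumptions for this sketch, and in a real cluster each map and reduce task would run on a different machine rather than in one Python process.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for each word in one document."""
    return [(word, 1) for word in document.split()]

def shuffle(mapped_outputs):
    """Shuffle: group all emitted counts by their key (the word)."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for word, count in pairs:
            groups[word].append(count)
    return groups

def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# Hypothetical input corpus; on a cluster, each document's map task
# would be dispatched to a separate worker node.
documents = ["big data needs big tools", "data tools scale out"]
counts = reduce_phase(shuffle([map_phase(d) for d in documents]))
print(counts)  # → {'big': 2, 'data': 2, 'needs': 1, 'tools': 2, 'scale': 1, 'out': 1}
```

The point of the structure is that the map and reduce functions are pure and per-key, so the framework is free to run them in parallel on as many machines as the data requires.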