Big data


 "Big data" (Big data) research organization Gartner gave such a definition. "Big data" requires new processing models to have stronger decision-making power, insight and discovery, and process optimization capabilities to adapt to the massive, high growth rate and diversified information assets.

The McKinsey Global Institute defines it as a collection of data so large that it greatly exceeds the capabilities of traditional database software tools for acquisition, storage, management, and analysis, with four defining characteristics: massive data volume, rapid data flow, diverse data types, and low value density. [3]
The strategic significance of big data technology lies not in holding vast amounts of information but in the specialized processing of meaningful data. In other words, if big data is viewed as an industry, the key to making that industry profitable is improving the "processing capability" of data, so that "processing" turns the data into added value. [4]
From a technical point of view, big data and cloud computing are as inseparable as the two sides of a coin. Big data cannot be processed on a single computer; it requires a distributed architecture, and its hallmark is distributed data mining over massive data sets. This in turn depends on cloud computing's distributed processing, distributed databases, cloud storage, and virtualization technology. [1]
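To make the partitioning idea behind such a distributed architecture concrete, here is a minimal single-machine sketch in Python. It is illustrative only: the node names, record layout, and hash routing are assumptions for this example rather than any particular system's API. Records are hash-partitioned across simulated nodes so that each node only stores, and later mines, its own slice of the data.

from collections import defaultdict

# Stand-ins for separate machines; the names are hypothetical for this sketch.
NODES = ["node-0", "node-1", "node-2"]

def partition(records, nodes):
    # Route each (key, value) record to a node by hashing its key, so every
    # node only has to hold its own shard of the data.
    shards = defaultdict(list)
    for key, value in records:
        shards[nodes[hash(key) % len(nodes)]].append((key, value))
    return shards

# A few made-up records keyed by user id.
records = [("user42", 3), ("user7", 1), ("user42", 5), ("user99", 2)]
for node, shard in partition(records, NODES).items():
    # On a real cluster each shard would live on a different machine;
    # here we only print which node would own which records.
    print(node, shard)

In a real deployment each shard would sit on a separate machine, and the routing function would need to stay stable as nodes join or leave, which is why production systems typically use consistent hashing rather than a plain modulus.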
With the advent of the cloud era, big data has attracted more and more attention. Analysts typically use the term to describe the large volumes of unstructured and semi-structured data a company creates, data that would cost too much time and money to load into a relational database for analysis. Big data analysis is often tied to cloud computing because real-time analysis of large data sets requires a framework such as MapReduce to distribute the work across tens, hundreds, or even thousands of computers.
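As an illustration of the MapReduce pattern just described, the following toy word count is a sketch only: the standard-library multiprocessing.Pool stands in for a cluster of machines, the shuffle step groups intermediate (word, count) pairs by key, and the reduce step sums them. The input lines and worker count are invented for the example.

from collections import defaultdict
from multiprocessing import Pool

def map_words(line):
    # Map step: emit a (word, 1) pair for every word in one line of input.
    return [(word.lower(), 1) for word in line.split()]

def reduce_counts(item):
    # Reduce step: sum every count emitted for one word.
    word, counts = item
    return word, sum(counts)

if __name__ == "__main__":
    lines = [
        "big data needs distributed processing",
        "cloud computing makes big data analysis practical",
    ]
    with Pool(processes=4) as pool:
        mapped = pool.map(map_words, lines)          # map phase, run in parallel workers
        shuffled = defaultdict(list)                 # shuffle: group pairs by word
        for pairs in mapped:
            for word, count in pairs:
                shuffled[word].append(count)
        results = pool.map(reduce_counts, list(shuffled.items()))  # reduce phase
    print(dict(results))

A real framework such as Hadoop or Spark adds what this sketch omits: moving the computation to where the data is stored, re-running failed tasks, and spilling the shuffle to disk when it does not fit in memory.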

Big data requires special technologies to process large volumes of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
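As a rough illustration of how an MPP-style database from the list above evaluates an aggregate, the sketch below has every partition compute a partial (sum, count) locally and a coordinator merge the partials into the final average. The partition contents and worker layout are invented for the example.

# Rows as they might be stored across three worker partitions; values are made up.
partitions = [
    [12.0, 7.5, 3.2],
    [8.8, 14.1],
    [5.0, 9.9, 11.3, 2.2],
]

def local_aggregate(rows):
    # Runs on each worker in a real MPP engine: a partial (sum, count) for its rows.
    return sum(rows), len(rows)

partials = [local_aggregate(rows) for rows in partitions]  # done in parallel on a cluster
total, count = map(sum, zip(*partials))                    # coordinator merges the partials
print("global average:", total / count)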