The system stores and distributes structured, semi-structured, and unstructured data across multiple servers. Its declarative dataflow language, ECL, uses an "apply schema on read" method: the structure of stored data is inferred when the data is queried, not when it is stored. Quantcast File System was available at about the same time.
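As an illustration of "apply schema on read", here is a minimal Python sketch. It is not ECL and does not reflect how HPCC implements the idea. Records are kept as raw JSON strings with no schema enforced at write time, and each query supplies its own schema, applied only when the data is read. The names `RAW_STORE`, `read_with_schema`, and the field names are hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical raw store: records are kept exactly as they arrived, no schema at write time.
RAW_STORE: list[str] = [
    '{"id": "1", "name": "Ada", "age": "36"}',
    '{"id": "2", "name": "Grace"}',
    '{"id": "3", "name": "Alan", "age": "41", "city": "London"}',
]

# A schema chosen by the reader at query time: field name -> converter function.
Schema = dict[str, Callable[[str], Any]]


def read_with_schema(raw_records: list[str], schema: Schema) -> list[dict[str, Any]]:
    """Parse each stored record and project it onto `schema` only when queried."""
    rows = []
    for line in raw_records:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            # A missing field becomes None instead of failing the whole read.
            row[field] = convert(value) if value is not None else None
        rows.append(row)
    return rows


# Two queries interpret the same stored bytes in different ways.
people = read_with_schema(RAW_STORE, {"name": str, "age": int})
ids = read_with_schema(RAW_STORE, {"id": int})
```

The design trade-off is that writes stay cheap and flexible, while every read pays the cost of parsing and type conversion.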
Until 2008, Teradata systems stored only structured relational data.
Since then, Teradata has added unstructured data types, including XML, JSON, and Avro. A company that is now part of LexisNexis Group developed a C++-based distributed file-sharing framework for data storage and querying. The two platforms were merged into HPCC (High-Performance Computing Cluster) Systems, and in 2011 HPCC was open-sourced under the Apache v2.0 License.
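A distributed framework for data storage and query has to decide which server holds each record. The Python sketch below assumes simple hash partitioning, a generic technique, and is not a description of how HPCC actually distributes data. The cluster size `NUM_NODES` and the functions `partition_for`, `distribute`, and `lookup` are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Optional

NUM_NODES = 3  # hypothetical cluster size


def partition_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node index with a stable hash."""
    # Unlike the built-in hash() for str, a SHA-256 digest is identical in every process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(records: dict[str, str], num_nodes: int) -> dict[int, dict[str, str]]:
    """Place every (key, record) pair on exactly one node."""
    nodes: dict[int, dict[str, str]] = defaultdict(dict)
    for key, record in records.items():
        nodes[partition_for(key, num_nodes)][key] = record
    return dict(nodes)


def lookup(nodes: dict[int, dict[str, str]], key: str, num_nodes: int) -> Optional[str]:
    """Recompute the partition so a point query touches a single node."""
    return nodes.get(partition_for(key, num_nodes), {}).get(key)


records = {f"user-{i}": f"payload {i}" for i in range(10)}
nodes = distribute(records, NUM_NODES)
```

Because the partition is a pure function of the key, any server can route a query without a central index. The cost is that changing `NUM_NODES` moves most keys, which is why production systems often use consistent hashing or range partitioning instead.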
Data must be processed with advanced tools (analytics and algorithms) to reveal meaningful information.
For example, to manage a factory one must consider both visible and invisible issues with various components.
In 2004, Google published a paper on a process called MapReduce that uses a similar architecture.
The MapReduce concept provides a parallel processing model, and an associated implementation was released to process huge amounts of data. This type of architecture inserts data into a parallel DBMS that uses the MapReduce and Hadoop frameworks. Such frameworks aim to make the processing power transparent to the end user by placing a front-end application server in front of it, which helps analysts use the resulting insights effectively.
Big data draws on text, images, audio, and video, and it can fill in missing pieces through data fusion.
Such systems generally prefer direct-attached storage (DAS) in its various forms, from solid-state drives (SSDs) to high-capacity SATA disks inside the parallel processing nodes.
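The MapReduce model splits work into a map step, a shuffle that groups intermediate results by key, and a reduce step. Below is a single-process Python sketch of the classic word-count example. It only illustrates the data flow and is not Google's or Hadoop's implementation; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine every value collected for one key."""
    return key, sum(values)


def map_reduce(documents: list[str]) -> dict[str, int]:
    # Each map call depends only on its own split, so a cluster could run them on different nodes.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(intermediate)
    # Each reduce call depends only on one key's values, so reducers can also run in parallel.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = map_reduce(["big data big clusters", "compute moves to data"])
```

Here `counts` maps each word to its total, for example `"big"` and `"data"` to 2. In a real framework, the shuffle is the expensive network step, which is one reason nodes with fast local storage are favored.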