This chapter was focused on comparing some common data models with each other. Common data models include (not limited to): Relational, document, graph-based.
Evolution of data models
Hierarchical and network models
In the early years, network and hierarchical models were the main approaches for data storage and querying.
Hierarchical model
In 1970s, the most popular database was IBMβs Information Management System (IMD). It used a simple hierarchical modelβvery similar to JSON model (later used by document databases).
- It worked really well for one-to-many relationships; but many-to-many relationships difficult and didnβt support joins
- Problems of β60s and β70s were very similar to problems devs faced with using document databases
Deep-dive into hierarchical model
Store βββ Electronics βββ Smartphone β βββ Brand: "Apple" β βββ Price: 999 βββ Laptop βββ Brand: "Dell" βββ Price: 1200 βββ Clothing βββ Jacket β βββ Brand: "North Face" β βββ Color: "Red" βββ T-shirt βββ Brand: "Nike" βββ Size: "Large"It is organized in a tree structure with parent-child relationships.
- Structure of each item is rigid
- Querying for particular attribute requires traversing through the tree
- To access price β
Store β Electronics β Smartphone β Price- Very node must have exactly one parent (except the root node)
- Records are linked with help of pointers
There various solutions proposed to solve the limitations of the hierarchical model. The two prominent ones were (1) relational model (2) network model
Network model
This is a generalization of the hierarchical model.
Deep-dive into network model
Categories: βββ Electronics βββ Clothing βββ Special Offers Products: βββ Smartphone βββ Jacket βββ T-shirt βββ Laptop Attributes: βββ Brand: "Apple" βββ Brand: "Nike" βββ Color: "Red" βββ Size: "Large" βββ Price: 999 βββ Price: 1200 Relationships (Network): βββ Electronics β Smartphone β Brand: "Apple" β Price: 999 βββ Clothing β Jacket β Brand: "North Face" β Color: "Red" βββ Clothing β T-shirt β Brand: "Nike" β Size: "Large" βββ Electronics β Laptop β Brand: "Dell" β Price: 1200 βββ Special Offers β Smartphone βββ Special Offers β JacketThe links between records are pointers. Similar to hierarchical model, accessing a record requires traversing from root to the target record. This is called an access pattern.
For both hierarchical and network models, any change in the data model (such as adding a new access path) required updates to the access path, which meant modifying the application code.
SQL and NoSQL models
Relational model β data is organized into relations (tables), where each relation is an unordered collection of tuples (rows).
It included a new way of querying data: SQL is a declarative query language, whereas IMS queried using imperative code.
When SQL was defined, it was very similar to relational algebra:
select * from animals where family = 'Sharks';| Declarative query | Imperative query |
|---|---|
| Specify the ops to perform and its order | Specify the pattern, conditions and transformation; but not how to achieve the goal. DB systemβs query optimizer decides which indexes and join methods to use, and in which order to execute various parts of the query. |
Driving forces for NoSQL Databases
- A need for greater scalability than relational databases can easily achieve, including very large datasets or very high write throughput
- A widespread preference for free and open source software over commercial database products
- Specialized query operations that are not well supported by the relational model
- Frustration with the restrictiveness of relational schemas, and a desire for a more dynamic and expressive data model
- Not Only SQL
For a data structure that is self-contained, a JSON representation can be quite appropriate. Doc-oriented databases like MongoDB, CouchDB support this model.
The JSON representation has better locality than the multi-table schema.
Several differences when comparing relational databases to document databases - fault-tolerance properties and handling of concurrency.
- Doc model - schema flexibility, better performance due to locality
- Relational model - support for joins, many-to-one and many-to-many relationships
Schema flexibility in the document model
Document databases are called schemaless, which is misleading. More accurate term is schema-on-read.
Schema-on-write - traditional approach of relational databases.
Schema-on-read is similar to dynamic type checking in programming languages (vs) Schema-on-write is similar to static type checking.
Data locality for queries β Storage locality
- storing the entire model in a single document is useful if the entire model is required (for example, to render a page).
- But reading small portion of it, can waste memory as the entire document is read into memory
- The idea of grouping related data together for locality is not limited to the document model
- Column-family concept in the Bigtable data model; this is used in Cassandra and HBase, has a similar purpose of managing locality.
Quick summary
- Data was represented as hierarchical model which wasnβt good for representing many-to-many relationships.
- Relational model was invented to solve that problem. However, it doesnβt fit well in all the model structures.
- NoSQL data stores were introduced; diverged into two directions:
- Document databases β self-contained documents and relationships are rare
- Graph databases β opposite of document; targeting use cases where everything is related to everything
Similarity b/w graph & document β typically donβt enforce a schema; question of whether the schema is explicit (enforced on write) or implicit (handled on read)
Different models has its own query language β SQL, MapReduce, MongoDB, Cypher, SPARQL, and Datalog.