On the most fundamental level, a database needs to do two things: when you give it some data, it should store the data, and when you ask it again later, it should give the data back to you.
This chapter focuses on two families of storage engines: log-structured engines and page-oriented engines.
Data Structures That Power Your Database
One of the simplest data structures for a database is a log: an append-only sequence of records. It can be implemented in a few lines of code:
#!/bin/bash
db_set(){
  echo "$1,$2" >> database
}
db_get(){
  grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
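A quick usage example (the key and JSON values here are just illustrative):

```bash
$ db_set 42 '{"name":"San Francisco","attractions":["Golden Gate Bridge"]}'
$ db_set 42 '{"name":"San Francisco","attractions":["Exploratorium"]}'
$ db_get 42
{"name":"San Francisco","attractions":["Exploratorium"]}
```

Updating a key simply appends a new record; the `tail -n 1` in db_get makes the most recent value win.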
- SET: appends the key-value pair to the data file. This operation is quite efficient
    - The file system stores metadata for the file, including its size and a pointer to the end of the file (EOF)
    - Therefore, the time complexity of appending to a file is O(1)
    - Practical considerations:
        - Extremely large files: on a fragmented file system, locating the EOF might have additional overhead
        - If multiple processes are appending to the file simultaneously, the file system may introduce sync overhead (e.g. locking)
        - $ But the theoretical time complexity remains O(1)
- GET: reads through each line in the data file to find the key
    - Time complexity of this operation is O(n), as it requires sequentially scanning every line in the data file to find a match for the key
    - ! As the data file grows, this becomes quite inefficient.
To improve read performance, a different data structure is needed: an index.
An index is an additional structure derived from the primary data; it helps locate specific data quickly.
- Indexes don't affect the contents of the database; they only affect the performance of queries
- This additional structure adds overhead on writes; the fastest possible write is simply appending to a file
- ~ Trade-off: Well-chosen indexes speed up read queries, but every index slows down writes.
Hash Indexes
Key-value data is one of the most common kinds of data in applications, and it is usually handled with a hash map (hash table). The simplest possible indexing strategy for the log above is this (sketched in code right after the list):
- Keep an in-memory hash map where every key is mapped to a byte offset in the data file
- Write: appending a new key-value pair also updates the hash map to record the offset of the record just written
- Read: the hash map is used to find the offset in the data file, seek to that location, and read the value
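A minimal sketch of this idea on top of the bash log above. It is illustrative only (not Bitcask's actual format) and assumes bash 4+ for `declare -A` plus GNU `stat`:

```bash
declare -A index   # in-memory hash map: key -> byte offset of the record in the data file

db_set_indexed(){
  # remember the offset at which this record will start, then append it
  index["$1"]=$(stat -c %s database 2>/dev/null || echo 0)
  echo "$1,$2" >> database
}

db_get_indexed(){
  local offset=${index["$1"]}
  [ -z "$offset" ] && return 1     # key not in the index
  # seek straight to the recorded offset instead of scanning the whole file
  tail -c +$((offset + 1)) database | head -n 1 | sed -e "s/^$1,//"
}
```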
This is the approach used in Bitcask, the default storage engine in Riak. Since we only ever append, the log grows without bound and would eventually use up all the disk space. The solution:
- Close a segment file when it reaches a maximum size and make subsequent writes to a new segment
- $ Compaction: throws away duplicate keys in the log and retains only the most recent update for each key (see the toy sketch after this list)
    - As compaction makes segments smaller, several segments are often merged together when performing compaction
- Compaction + merging happen in a background thread
    - While this is going on, read and write requests continue to be served from the old segment files
    - Each segment file has its own in-memory hash table; to look up a key, we first check the most recent segment's hash map, and if the key isn't present we check the second-most-recent segment, and so on
    - Once merging is complete, read requests switch to the new merged segment
    - The old segment files are then deleted
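A toy illustration of compaction on a single segment in the same `key,value` text format (real engines do this on binary segment files in a background thread; `tac` is from GNU coreutils):

```bash
compact_segment(){
  # scan newest-first, keep only the first (i.e. most recent) record per key,
  # then restore the original order and swap the compacted file in
  tac "$1" | awk -F, '!seen[$1]++' | tac > "$1.compacted"
  mv "$1.compacted" "$1"
}
```

Usage would be something like `compact_segment database` once the segment is closed for writing.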
This is a high-level view of hash indexes; a real implementation has to deal with several additional issues:
- File format: a binary format is typically used rather than a plain-text one
- Deleting records: a tombstone is a special record appended to the data file to indicate that a key has been deleted (sketched below)
    - During the merging process, the tombstone tells the merge to discard any previous values of that key
- Crash recovery: the in-memory hash maps are lost on restart and have to be rebuilt
- Partially written records: the database may crash halfway through appending a record
- Concurrency control: writes must be appended to the log in a strictly sequential order
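A sketch of the tombstone idea in the same toy format. The `__tombstone__` marker is made up for illustration; a real format would use a dedicated deletion record rather than a magic string:

```bash
db_delete(){
  # append a tombstone; compaction/merging will later drop the key entirely
  echo "$1,__tombstone__" >> database
}

db_get(){
  local value
  value=$(grep "^$1," database | sed -e "s/^$1,//" | tail -n 1)
  # treat a tombstoned (or missing) key as "not found"
  if [ -z "$value" ] || [ "$value" = "__tombstone__" ]; then
    return 1
  fi
  echo "$value"
}
```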
So why use an append-only log at all?
- Appends and segment merges are sequential write operations, generally much faster than random writes
- This is especially preferable on spinning disks, and to some extent on flash-based SSDs
- Concurrency and crash recovery are much simpler with append-only files
The hash table index also has some limitations:
- The hash table must fit in memory, so it doesn't work well with a very large number of keys
- Range queries are not efficient; there is no easy way to scan over a range of keys in sorted order
Appendix
Bash notes
- grep (global regex): command-line utility to search plain text using regular expressions
- sed (stream editor): command-line utility to perform basic text transformations
    - -e is the script or command to execute (example below)
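For instance, in `db_get` above the two are combined like this (42 is just an example key):

```bash
# keep only lines whose key is 42, then strip the "42," prefix to leave just the value
grep "^42," database | sed -e "s/^42,//"
```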