On the most fundamental level, a database needs to do two things: when you give it some data, it should store the data, and when you ask it again later, it should give the data back to you.
This chapter focuses on two families of storage engines: log-structured engines and page-oriented engines.
Data Structures That Power Your Database
One of the simplest data structures for a database is a log: an append-only sequence of records. It can be implemented in a few lines of code:
#!/bin/bash
db_set(){
  echo "$1,$2" >> database
}
db_get(){
  grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
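A quick usage example (the key and JSON values here are just illustrative):

```bash
$ db_set 42 '{"name":"San Francisco","attractions":["Golden Gate Bridge"]}'
$ db_set 42 '{"name":"San Francisco","attractions":["Exploratorium"]}'
$ db_get 42
{"name":"San Francisco","attractions":["Exploratorium"]}
```

Updating a key simply appends a new record; the `tail -n 1` in db_get makes the most recent value win.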
- SET: appends the key-value pair to the data file. This operation is quite efficient
    - The file system stores metadata for the file, including its size and a pointer to the end of the file (EOF)
    - Therefore, the time complexity of appending to a file is O(1)
    - Practical considerations:
        - Extremely large files: on a fragmented file system, locating the EOF might have additional overhead
        - If multiple processes are appending to the file simultaneously, the file system may introduce sync overhead (e.g. locking)
        - $ But the theoretical time complexity remains O(1)
- GET: reads through each line in the data file to find the key
    - Time complexity of this operation is O(n), as it requires sequentially scanning every line in the data file to find a match for the key
    - ! As the data file grows, this becomes quite inefficient.
To improve read performance, a different data structure is needed: an index.
An index is an additional structure derived from the primary data; it helps locate specific data quickly.
- Indexes don't affect the contents of the database; they only affect the performance of queries
- This additional structure adds overhead on writes; the fastest possible write is simply appending to a file
- ~ Trade-off: Well-chosen indexes speed up read queries, but every index slows down writes.
Hash Indexes
Key-value data is one of the most common kinds of data in applications, and it is usually handled with a hash map (hash table). The simplest possible indexing strategy for the log above is this (sketched in code right after the list):
- Keep an in-memory hash map where every key is mapped to a byte offset in the data file
- Write: appending a new key-value pair also updates the hash map to record the offset of the record just written
- Read: the hash map is used to find the offset in the data file, seek to that location, and read the value
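A minimal sketch of this idea on top of the bash log above. It is illustrative only (not Bitcask's actual format) and assumes bash 4+ for `declare -A` plus GNU `stat`:

```bash
declare -A index   # in-memory hash map: key -> byte offset of the record in the data file

db_set_indexed(){
  # remember the offset at which this record will start, then append it
  index["$1"]=$(stat -c %s database 2>/dev/null || echo 0)
  echo "$1,$2" >> database
}

db_get_indexed(){
  local offset=${index["$1"]}
  [ -z "$offset" ] && return 1     # key not in the index
  # seek straight to the recorded offset instead of scanning the whole file
  tail -c +$((offset + 1)) database | head -n 1 | sed -e "s/^$1,//"
}
```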
This is the approach used in Bitcask, the default storage engine in Riak. Since we only ever append, the log grows without bound and would eventually use up all the disk space. The solution:
- Close a segment file when it reaches a maximum size and make subsequent writes to a new segment
- $ Compaction: throws away duplicate keys in the log and retains only the most recent update for each key (see the toy sketch after this list)
    - As compaction makes segments smaller, several segments are often merged together when performing compaction
- Compaction + merging happen in a background thread
    - While this is going on, read and write requests continue to be served from the old segment files
    - Each segment file has its own in-memory hash table; to look up a key, we first check the most recent segment's hash map, and if the key isn't present we check the second-most-recent segment, and so on
    - Once merging is complete, read requests switch to the new merged segment
    - The old segment files are then deleted
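A toy illustration of compaction on a single segment in the same `key,value` text format (real engines do this on binary segment files in a background thread; `tac` is from GNU coreutils):

```bash
compact_segment(){
  # scan newest-first, keep only the first (i.e. most recent) record per key,
  # then restore the original order and swap the compacted file in
  tac "$1" | awk -F, '!seen[$1]++' | tac > "$1.compacted"
  mv "$1.compacted" "$1"
}
```

Usage would be something like `compact_segment database` once the segment is closed for writing.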
This is a high-level view of hash indexes; a real implementation has to deal with several additional issues:
- File format: a binary format is typically used rather than a plain-text one
- Deleting records: a tombstone is a special record appended to the data file to indicate that a key has been deleted (sketched below)
    - During the merging process, the tombstone tells the merge to discard any previous values of that key
- Crash recovery: the in-memory hash maps are lost on restart and have to be rebuilt
- Partially written records: the database may crash halfway through appending a record
- Concurrency control: writes must be appended to the log in a strictly sequential order
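A sketch of the tombstone idea in the same toy format. The `__tombstone__` marker is made up for illustration; a real format would use a dedicated deletion record rather than a magic string:

```bash
db_delete(){
  # append a tombstone; compaction/merging will later drop the key entirely
  echo "$1,__tombstone__" >> database
}

db_get(){
  local value
  value=$(grep "^$1," database | sed -e "s/^$1,//" | tail -n 1)
  # treat a tombstoned (or missing) key as "not found"
  if [ -z "$value" ] || [ "$value" = "__tombstone__" ]; then
    return 1
  fi
  echo "$value"
}
```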
So why use an append-only log at all?
- Appends and segment merges are sequential write operations, generally much faster than random writes
- This is especially preferable on spinning disks, and to some extent on flash-based SSDs
- Concurrency and crash recovery are much simpler with append-only files
The hash table index also has some limitations:
- The hash table must fit in memory, so it doesn't work well with a very large number of keys
- Range queries are not efficient; there is no easy way to scan over a range of keys in sorted order
Appendix
Bash notes
- grep (global regex): command-line utility to search plain text using regular expressions
- sed (stream editor): command-line utility to perform basic text transformations
    - -e is the script or command to execute (example below)
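For instance, in `db_get` above the two are combined like this (42 is just an example key):

```bash
# keep only lines whose key is 42, then strip the "42," prefix to leave just the value
grep "^42," database | sed -e "s/^42,//"
```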