Context
These are just my notes from one of the lectures in Martin Kleppmann’s Distributed Systems series. If you’re interested, the full lecture playlist is here.
1. Physical clocks - count the number of seconds elapsed
- Quartz based clocks
- One second is measured by counting the oscillations of a quartz crystal that resonates at a known frequency.
- Crystals are tested and tuned to operate accurately at room temperature, so the measurement drifts when the temperature changes.
- Furthermore, imperfections in the crystal cause additional drift in the measurement.
- Atomic clocks - these use an isotope of caesium and are very precise. However, they are quite heavy and expensive, so quartz-based clocks are generally used in modern computer systems.
- Greenwich Mean Time (GMT) was initially based on observing the sun (sunrise and sunset). However, this is quite inaccurate and makes it hard to stay consistent across the globe.
- This led to UTC (Coordinated Universal Time), which is based on atomic clocks. Computers commonly represent it as the number of seconds elapsed since 1 Jan 1970 (Unix time).
- For practical reasons, we use UTC with adjustments (leap seconds) to account for variations in the Earth's rotation.
- This is particularly important in distributed systems, where we rely on time for things such as ordering events and scheduling cron jobs. Therefore, systems typically use a quartz-based clock and automatically correct for drift by synchronising it with a network time server using the Network Time Protocol (NTP).
How does it work?
Let’s say:
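- A (sends at $t_1$) → B (receives at $t_2$)
- B (sends at $t_3$) → A (receives at $t_4$)
Because there is no central source of truth to measure the network latency, we calculate the total (round-trip) network latency as:
$$\delta = (t_4 - t_1) - (t_3 - t_2)$$
In essence, we are calculating latency without ever comparing the two clocks directly: $t_1$ and $t_4$ are measured by A, $t_2$ and $t_3$ are measured by B. Each difference uses a single clock, so the offset between the clocks cancels out.
- $(t_4 - t_1)$ is the round-trip time from A's perspective
- $(t_3 - t_2)$ is the time taken by B to respond
Assuming network latency is symmetrical, the time on B's clock at the moment A receives the response would be:
$$t_3 + \frac{\delta}{2}$$
Node A synchronises its time with B by calculating the difference between that estimate and its own receive time $t_4$:
$$\theta = t_3 + \frac{\delta}{2} - t_4 = \frac{(t_2 - t_1) + (t_3 - t_4)}{2}$$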
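To make the arithmetic concrete, here is a minimal Python sketch of the calculation. It assumes the usual four timestamps of one request/response exchange: `t1` (A sends) and `t4` (A receives) read from A's clock, `t2` (B receives) and `t3` (B sends) read from B's clock. The names `NtpSample`, `round_trip_delay` and `clock_offset` are made up for illustration and are not part of any NTP library, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class NtpSample:
    """Four timestamps from one request/response exchange (hypothetical type)."""
    t1: float  # request sent, read from A's clock
    t2: float  # request received, read from B's clock
    t3: float  # response sent, read from B's clock
    t4: float  # response received, read from A's clock


def round_trip_delay(s: NtpSample) -> float:
    # Round trip seen by A, minus the time B spent preparing its response.
    return (s.t4 - s.t1) - (s.t3 - s.t2)


def clock_offset(s: NtpSample) -> float:
    # Estimate of B's clock when the response arrives, minus A's clock reading.
    # Assumes the latency is the same in both directions.
    return s.t3 + round_trip_delay(s) / 2 - s.t4


# Invented numbers: B's clock is 59 s ahead of A's, one-way latency is 1 s,
# and B takes 1 s to respond.
sample = NtpSample(t1=100.0, t2=160.0, t3=161.0, t4=103.0)
print(round_trip_delay(sample))  # 2.0
print(clock_offset(sample))      # 59.0
```

A positive offset means A's clock is behind B's. This covers a single exchange only; a real client would presumably combine several samples before adjusting its clock.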
2. Causality and happens-before
- Happens-before
- Given that we can sync clocks across the system within a manageable error margin, we can reason about some basic orderings of events.
- Event A happened before Event B, if:
- Both events happened on the same node and A occurred before B in that node's local order
- Event A is the sending of a message and event B is the receipt of that same message
- Happens-before is denoted by the → operator
- We can say that if A → B then T(A) < T(B), but not vice versa: T(A) < T(B) does not by itself mean A → B
- But it's hard to determine the happens-before relationship when there are network delays. Suppose A sends a message to B and C; B and C might not receive it at the same time. If B then sends a message to A and C, B's message could reach C before A's original message does.
- This calls for a different kind of clock, one that is logical rather than physical:
- Lamport clock
- Vector clock
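The happens-before rules and the reordering scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard definition (local order on a node, send before receive of the same message, plus transitivity, which the notes above do not spell out), and the function name `happens_before` and the event names are hypothetical.

```python
def happens_before(node_events, messages):
    """Return every (a, b) pair of events for which a -> b holds.

    node_events: dict mapping node name -> list of events in local order
    messages:    list of (send_event, receive_event) pairs
    """
    # Direct edges: consecutive events on one node, plus send -> receive.
    edges = {}
    for events in node_events.values():
        for earlier, later in zip(events, events[1:]):
            edges.setdefault(earlier, set()).add(later)
    for send, receive in messages:
        edges.setdefault(send, set()).add(receive)

    def reachable(start):
        # Follow edges transitively: a -> c and c -> b gives a -> b.
        seen, stack = set(), [start]
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    all_events = [e for events in node_events.values() for e in events]
    return {(a, b) for a in all_events for b in reachable(a)}


# A sends m1 to B and C. B receives m1, then sends m2 to A and C.
# At C, m2 happens to arrive before m1.
node_events = {
    "A": ["send_m1", "recv_m2_A"],
    "B": ["recv_m1_B", "send_m2"],
    "C": ["recv_m2_C", "recv_m1_C"],
}
messages = [
    ("send_m1", "recv_m1_B"),
    ("send_m1", "recv_m1_C"),
    ("send_m2", "recv_m2_A"),
    ("send_m2", "recv_m2_C"),
]
hb = happens_before(node_events, messages)

print(("send_m1", "send_m2") in hb)      # True: m1 was sent before m2
print(("recv_m2_A", "recv_m2_C") in hb)  # False
print(("recv_m2_C", "recv_m2_A") in hb)  # False, so these two are concurrent
```

Here `send_m1 → send_m2` holds, yet node C sees m2 first. That mismatch between causal order and delivery order is the kind of problem logical clocks are meant to address.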