Thinking Distributed Systems 02 - System models, order, and time
Published on: 2026-01-20 Tags: Think Distributed Systems, Models, Systems, Asynchronous, synchronous, clocks

2 System models, order, and time (23)

For the next chapter we need to think about different models and systems: synchronous vs. asynchronous, and physical time vs. logical time. Within a synchronous system we can know whether a message was sent and received, but within an asynchronous system we may never know, because we know little about when, or even if, it was sent.


2.1 System models (23)

We need to talk about the assumptions we make within any model that we use. Here is a basic version; it does not include every possible assumption, nor does it shed much light on the specific model we are using.


One system model | Another system model
Components may not fail | Components may fail
Messages may not get lost | Messages may get lost
Clocks are perfectly synced | Clocks are not synced

You must always keep in mind that something you can model within one framework may not work within another. Know what your system is and then base the model on that.

2.1.1 Theory and practice (24)

Theoretical Models - are used to discuss what is impossible, such as bounds and the number of components.
Practical Models - are used to discuss what is possible: what might be needed to implement a system, and even cost can come into this.


2.1.2 Synchronous distributed systems (25)

Synchronous distributed systems - are systems in which every component has access to a perfectly synchronized clock. Timings have upper and lower bounds, so in the end all components operate within some time constraint.


2.1.3 Asynchronous distributed systems (25)

Asynchronous models come in two variants, which differ in how much they assume about time:
• Models with no notion of time
• Models with a weak notion of time


No Notion of Time
One of the most important consequences is that there are no timeouts for events, because a component has no way to determine whether an event took “too long”.


Weak Notion of Time
Components can use timeouts, but messages may still be delayed arbitrarily, so an expired timeout does not prove that the other side failed.
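
To make that concrete, here is a minimal sketch in Python. The `slow_responder` function and the delay values are made up for illustration; the point is only that the timeout fires even though the reply is merely late, not lost.

```python
import queue
import threading
import time


def slow_responder(replies: queue.Queue, delay_s: float) -> None:
    """Hypothetical component that answers correctly, just late."""
    time.sleep(delay_s)
    replies.put("pong")


replies: queue.Queue = queue.Queue()
threading.Thread(target=slow_responder, args=(replies, 0.3), daemon=True).start()

# Wait far less than the responder needs, so the timeout expires.
try:
    replies.get(timeout=0.05)
    timed_out = False
except queue.Empty:
    timed_out = True

# The component was slow, not dead: the reply still arrives later.
late_reply = replies.get(timeout=5)
print(timed_out, late_reply)
```

From the caller's point of view, "slow" and "failed" look identical at the moment the timeout expires.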


2.1.4 Partially synchronous systems (26)

Keep in mind that no real system fits entirely within one of these models. Whether a system looks synchronous or asynchronous also depends on the scale you are looking at. Your home internet is an example: it normally meets all expectations, but there will be times when there are timeouts.


2.1.5 Component and network behavior (27)

Components and networks are frequently characterized by their failure behavior.


Component Failure - failures within a component
crash-stop - A component stops at an arbitrary moment and ceases to exist.
omission failure - A component takes a break: it stops responding for a while and then carries on.
crash-recovery - This is another kind of crash: the component stops at an arbitrary moment and may restart later, but it might lose state during the crash.
byzantine failure - A component acts outside of its normal bounds, i.e. anything could happen.


Network Failure - the network deviates from its normal behavior

message reordering - Messages are sent or received out of order.
message duplication - A message is sent or received more than once.
message loss - A message is sent and accepted by the network but never delivered to the other component.
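
As a rough illustration, each network failure can be pictured as a transformation of the list of messages that was sent. The helper names below (`reorder`, `duplicate`, `lose`) are my own and only sketch the idea; a real network applies these faults unpredictably.

```python
def reorder(messages):
    """Message reordering: the same messages arrive in another order (here: reversed)."""
    return list(reversed(messages))


def duplicate(messages, index):
    """Message duplication: the message at `index` is delivered twice."""
    return messages[: index + 1] + [messages[index]] + messages[index + 1:]


def lose(messages, index):
    """Message loss: the message at `index` is never delivered."""
    return messages[:index] + messages[index + 1:]


sent = ["m1", "m2", "m3"]
print(reorder(sent))       # ['m3', 'm2', 'm1']
print(duplicate(sent, 0))  # ['m1', 'm1', 'm2', 'm3']
print(lose(sent, 1))       # ['m1', 'm3']
```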


2.1.6 Realistic system models (30)

With all this in mind, you will be better able to choose the model you should be using, while understanding that no model can capture everything about the system it is trying to describe.


2.2 Order and time (30)

Collaboration and coordination refer to the management of dependencies between the steps of components. To manage them, we need a way to determine the correct order for a set of actions.


Assume that you have 2 proposers, P1 and P2, and 2 acceptors, A1 and A2
• A1 = A2 = 0
• P1 or P2 will broadcast (+2) or (x2)


So if both broadcast (+2), the order doesn’t matter because the result will always be the same, but with different broadcasts, (x2) and (+2), the order does matter. We could get values of A1 = 2 (x then +) and A2 = 4 (+ then x).
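
The arithmetic is easy to check with a few lines of Python. This is just a sketch of the example above; `OPS` and `apply_in_order` are names I picked for illustration.

```python
from functools import reduce

# The two operations a proposer may broadcast.
OPS = {"+2": lambda v: v + 2, "x2": lambda v: v * 2}


def apply_in_order(initial, order):
    """Apply the named operations to `initial`, one after another."""
    return reduce(lambda value, name: OPS[name](value), order, initial)


a1 = apply_in_order(0, ["x2", "+2"])    # (0 * 2) + 2 = 2
a2 = apply_in_order(0, ["+2", "x2"])    # (0 + 2) * 2 = 4
same = apply_in_order(0, ["+2", "+2"])  # 4, whichever (+2) arrives first
print(a1, a2, same)
```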


If we want them to always be the same, we might introduce a coordinator C that not only takes in the requests from P1 and P2 but also orders them. When C sends out the requests, A1 and A2 can see the number assigned to each request and, if they happen to get request 2 before request 1, wait for request 1. This ensures that every acceptor applies the requests in the same order no matter what. Keep in mind that C has to keep internal state about incoming requests so it can keep its, let’s call it index, updated after each request.
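
Here is one way the coordinator idea could look in code. It is a sketch under my own assumptions: the class names, the `pending` buffer, and starting the index at 1 are illustrative choices, not something prescribed by the book.

```python
OPS = {"+2": lambda v: v + 2, "x2": lambda v: v * 2}


class Coordinator:
    """Assigns an increasing index to every request it takes in."""

    def __init__(self):
        self.next_index = 1

    def order(self, op):
        stamped = (self.next_index, op)
        self.next_index += 1
        return stamped


class Acceptor:
    """Applies requests strictly in index order, buffering early arrivals."""

    def __init__(self):
        self.value = 0
        self.expected = 1
        self.pending = {}

    def receive(self, index, op):
        self.pending[index] = op
        # Apply every request that is now next in line.
        while self.expected in self.pending:
            self.value = OPS[self.pending.pop(self.expected)](self.value)
            self.expected += 1


c = Coordinator()
r1 = c.order("+2")  # request 1
r2 = c.order("x2")  # request 2

a1, a2 = Acceptor(), Acceptor()
a1.receive(*r1); a1.receive(*r2)  # arrives in order
a2.receive(*r2); a2.receive(*r1)  # arrives out of order, request 2 waits
print(a1.value, a2.value)  # 4 4
```

The cost of this design is that C becomes a single point that every request has to pass through.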


2.2.1 The happened-before relationship (32)

Let’s explore the order of events, its relationship to physical time and logical time, and the idea of the happened-before relationship.
intracomponent - Events a and b happen in the same component and a occurs before b; then a happened before b.
intercomponent - Events a and b happen in different components; if a is the sending of a message and b is the corresponding receipt, then a happened before b.


Situation | Notation
a happened before b. | a -> b
b happened before a. | b -> a
a and b are concurrent | a || b = b || a

transitivity - Happened-before is transitive: if a happened before b and b happened before c, then a -> b -> c and so a -> c.
causality - Causal refers to the fact that an event a potentially influenced an event b. We do not know whether it in fact did influence b, only that it could have.
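
One way to play with these definitions is to write the direct happened-before pairs down as a graph and let transitivity do the rest. The sketch below is my own illustration: the event names (`p1`, `q2`, ...) and the `edges` dictionary are made up, with `p2` being a send and `q2` the matching receive.

```python
def happened_before(edges, a, b):
    """True if b can be reached from a by following direct edges (transitivity)."""
    stack, seen = [a], set()
    while stack:
        current = stack.pop()
        for nxt in edges.get(current, ()):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def concurrent(edges, a, b):
    """a || b: neither event happened before the other."""
    return a != b and not happened_before(edges, a, b) and not happened_before(edges, b, a)


# Component P: p1 -> p2 -> p3, component Q: q1 -> q2 -> q3 (intracomponent).
# p2 sends a message that q2 receives (intercomponent).
edges = {"p1": ["p2"], "p2": ["p3", "q2"], "q1": ["q2"], "q2": ["q3"]}

print(happened_before(edges, "p1", "q3"))  # True: p1 -> p2 -> q2 -> q3
print(happened_before(edges, "q1", "p3"))  # False
print(concurrent(edges, "p3", "q3"))       # True
```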


Think about events as having a start and an end, and you might find a better way of looking at which events can be causally related to each other. If event a starts and stops before b starts, then a could have a causal influence on b. But if b starts before a ends, then it can’t.


2.2.2 Time and clocks (35)

Clock Consistency states that if event a happened before event b, then the timestamp of a is less than the timestamp of b.
• a -> b => C(a) < C(b)

2.2.3 Physical time and physical clocks (35)

Physical clocks are physical devices that use the physical time to timestamp events. Keep in mind that there will be skew and drift because physical clocks are neither perfect nor synchronized.
Clock Skew - The difference in time between two clocks in a distributed system. A time difference of 5 min can be compounded as you progress through the system.
Clock Drift - The difference in the frequency of ticks between 2 clocks. The difference starts at 5 minutes and stays at 5 minutes as time progresses.


To me this feels like it says the opposite of the images:
• (skew) Figure 2.16 shows both clocks starting at the same time, and then on each tick one moves 5 min and the other moves 10 min.
• (drift) Figure 2.17 shows the clocks 5 min apart at the start, and then each tick moves both of them 5 min, maintaining the 5 min separation.
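
A small simulation helps when trying to sort this out. The sketch below takes the two definitions at face value: skew is the difference between two clock readings at one moment, and drift is a difference in tick rate. `SimClock`, `offset_min`, and `rate` are names I made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class SimClock:
    offset_min: float  # reading when true time is 0
    rate: float        # clock minutes per true minute; 1.0 ticks perfectly

    def read(self, true_min: float) -> float:
        return self.offset_min + self.rate * true_min


def skew(a: SimClock, b: SimClock, true_min: float) -> float:
    """Difference between the two readings at one moment."""
    return b.read(true_min) - a.read(true_min)


reference = SimClock(offset_min=0, rate=1.0)
offset_only = SimClock(offset_min=5, rate=1.0)  # 5 min off, same tick rate
fast = SimClock(offset_min=0, rate=2.0)         # same start, ticks twice as fast

constant = [skew(reference, offset_only, t) for t in (0, 10, 20)]
growing = [skew(reference, fast, t) for t in (0, 10, 20)]
print(constant)  # [5.0, 5.0, 5.0]
print(growing)   # [0.0, 10.0, 20.0]
```

With equal tick rates the gap stays at 5 min; with different tick rates the gap keeps growing even though the clocks started together.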


Mitigation
There are ways to mitigate these issues with protocols; the Network Time Protocol (NTP) is widely used. NTP works by having one component, called the NTP server, broadcast its clock reading to the other components, known as the NTP clients.
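
The description above is the simplified picture. As a hedged illustration of how a client can turn a server reading into a correction, here is the usual four-timestamp estimate used by NTP-style request/response exchanges. The function name, the variable names, and the millisecond values are my own and only show the arithmetic, not a real NTP client.

```python
def estimate_offset_and_delay(t0, t1, t2, t3):
    """
    t0: client clock when the request is sent
    t1: server clock when the request arrives
    t2: server clock when the reply is sent
    t3: client clock when the reply arrives
    Assumes the network delay is roughly the same in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    round_trip_delay = (t3 - t0) - (t2 - t1)
    return offset, round_trip_delay


# Made-up exchange in milliseconds: the client is 5000 ms behind the server,
# each network hop takes 100 ms, and the server needs 10 ms to answer.
offset, delay = estimate_offset_and_delay(t0=100_000, t1=105_100, t2=105_110, t3=100_210)
print(offset, delay)  # 5000.0 200
```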


Time-of-day Clock Vs. Monotonic Clock
Even adjusting the time within a component can cause problems so hardware and software provide 2 types of clocks.
time-of-day clocks - provide a timestamp as close as possible to wall-clock time but may move backwards due to clock synchronization.
monotonic clocks - provide a timestamp that is independent of the wall clock. They guarantee that time will not move backwards and are consistent when compared to themselves.


Here is a bit of code that you might see


// Time-of-day clock: timestamps can be compared across components (intercomponent)  
var timestamp1 = System.TimeOfDay  
var timestamp2 = System.TimeOfDay  
// Monotonic clock: timestamps can only be compared within one component (intracomponent)  
var timestamp3 = System.Monotonic  
var timestamp4 = System.Monotonic  
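
In Python, for instance, the two kinds of clock roughly correspond to `time.time()` and `time.monotonic()` from the standard library. The snippet below is a small sketch of the usual advice: measure durations with the monotonic clock, and use the time-of-day clock only when you need a wall-clock timestamp.

```python
import time

wall_start = time.time()       # time-of-day: seconds since the epoch, may be adjusted
mono_start = time.monotonic()  # monotonic: arbitrary starting point, never goes backwards

time.sleep(0.05)

# Safe: a monotonic difference taken within the same process is never negative.
elapsed = time.monotonic() - mono_start

# Risky: if the wall clock is adjusted between the two readings,
# this difference can be wrong or even negative.
wall_elapsed = time.time() - wall_start

print(f"monotonic: {elapsed:.3f}s, wall clock: {wall_elapsed:.3f}s")
```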

2.2.4 Logical time and logical clocks (37)

Logical time allows events to be ordered by their happened-before relationships. One example is the Lamport clock: every time an event occurs, it is tagged with the component’s logical clock time, and every time a component receives a request, it sets its logical clock to the max of its own logical clock and the received Lamport clock value, plus 1.
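
A minimal Lamport clock fits in a few lines. This is a sketch rather than a complete implementation; the class and method names are my own, and I assume that the counter is incremented on every local event and on every send.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event: advance the clock and return the timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event too; the timestamp travels with the message."""
        return self.tick()

    def receive(self, received):
        """On receive: max of own clock and the received value, plus 1."""
        self.time = max(self.time, received) + 1
        return self.time


p, q = LamportClock(), LamportClock()
a = p.tick()      # event a on P           -> 1
b = p.send()      # P sends a message      -> 2
q.tick()          # unrelated event on Q   -> 1
c = q.receive(b)  # Q receives P's message -> max(1, 2) + 1 = 3
print(a, b, c)    # 1 2 3
```

Since a -> b -> c, the timestamps satisfy C(a) < C(b) < C(c), which is the clock consistency condition.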


This is not the only way that you can keep track of time, however. Apache Kafka is a messaging and streaming platform that organizes data into topics and then into partitions. Each message in a partition is assigned an offset that indicates its position within that partition. This ensures that everything within a partition is ordered, but you can’t ensure order across partitions (interpartition).
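
To see why offsets only order messages within a partition, here is a toy model in plain Python. It does not use the Kafka client API; `ToyTopic` and its `append` method are hypothetical names that just mimic the idea of per-partition offsets.

```python
class ToyTopic:
    """Toy stand-in for a topic: one append-only list per partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append to one partition and return the message's offset there."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1


topic = ToyTopic(num_partitions=2)
offset_a = topic.append(0, "a")  # partition 0, offset 0
offset_b = topic.append(0, "b")  # partition 0, offset 1
offset_c = topic.append(1, "c")  # partition 1, offset 0

# Within partition 0 the offsets tell us "a" comes before "b".
print(offset_a < offset_b)   # True
# "a" and "c" both have offset 0, so offsets say nothing across partitions.
print(offset_a == offset_c)  # True
```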


2.2.5 Physical clocks vs. logical clocks (38)

Physical clocks are meant to keep order within a component while logical clocks are meant to keep order within the system.