
Thinking Distributed Systems 07 - Partitioning
Published on: 2026-02-12 Tags: Side Project, Libraries, Deployment, Think Distributed Systems, Models, Systems

7 Partitioning (90)

For any large system you will eventually need to take the data and spread it across more than one node. This is partitioning.

7.1 Encyclopedias and volumes (90)

We will see that there are a few ways to decide how to partition a set of data. The book opens with the idea of an encyclopedia that breaks knowledge up into 26 volumes, one per letter of the alphabet, with each volume holding every entry that begins with that letter. This works well if you have an even distribution of entries across letters and no cross-references, but in the real world one volume will be bigger than another, and you will need to consult more than one volume to get a full picture of any topic.


- Uneven distribution: Not all letters in the English language have an equal number
  of corresponding entries. The letter E may have a bulky volume of entries,
  whereas the letter X may have a slim volume.
- Uneven demand: For this reason, we may end up reaching for the volume with
  the letter E more often than the one with the letter X.
- Cross-references: Some entries refer to other entries to provide a comprehensive
  picture. A detailed entry about partitioning, found under P, may reference replication,
  found under R, necessitating access to two volumes.

7.2 Thinking in partitions (92)

The term partitioning refers to representing a single logical object by multiple, disjoint physical objects; the nodes are those physical objects. Something to keep in mind here is that "big data" is relative to the node or structure holding it. "Big data" for a MacBook is far less than for a NAS.

7.3 The mechanics of partitioning and balancing (93)

For the rest of the chapter we will think about data as a key-value store. Say we have the data set {x=1, y=2, z=3}. With replication, the same key can live on more than one node: replication puts a single piece of the data set (x=1) on more than one node, while the partitions together hold the entire data set.
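As a minimal sketch of that distinction (the node names here are made up for illustration), modeling each node as a dict:

```python
# Partitioning: {x=1, y=2, z=3} is split across nodes;
# each node holds a disjoint piece of the data set.
partitioned = {
    "node1": {"x": 1},
    "node2": {"y": 2},
    "node3": {"z": 3},
}

# Replication: one piece of the data set (x=1) is copied
# onto more than one node.
replicated = {
    "node1": {"x": 1},
    "node2": {"x": 1},  # same item, second copy
}
```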


This might seem trivial, but one of the biggest things to account for is the access and write pattern of any given partitioned system. Do you need to access posts based on the time they were made? Then a partition based on the timestamp might be helpful. What about searching for a user's posts? Then partition based on the user ID. Then think about a site like Twitter: most posts shown come from a user's follows, with an algorithm deciding what should be at the top.
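As a sketch of how the access pattern drives the choice of partition key (the post schema here is hypothetical):

```python
def partition_by_month(post):
    # Time-based key: all posts from the same month land together,
    # so "posts from February" touches a single partition.
    return post["created"][:7]  # e.g. "2026-02"

def partition_by_user(post, num_partitions=5):
    # User-based key: all of one user's posts land together,
    # so "posts by user 42" touches a single partition.
    return post["user_id"] % num_partitions
```

Reads that match the partition key hit one partition; reads that cut across it (say, a time-range query on the user-partitioned layout) have to fan out to every partition.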


With that said, there is never a one-and-done partitioning scheme; you will eventually need to (re)partition a set of data.

7.4 (Re)partitioning (94)

Partitioning refers to the assignment of data items to partitions. Repartitioning refers to the reassignment of previously assigned data items to different partitions.

7.4.1 Types of partitioning (94)

There are two different dimensions of partitioning.


Static and Dynamic Partitioning


- Static partitioning: The number of partitions is fixed and cannot be changed
  online. Any changes to the number of partitions must be done offline because
  the change is considered an administrative operation of the system.
- Dynamic partitioning: The number of partitions is variable and can be changed
  online. Changing the number of partitions is a normal operation of the system.
  In other words, the system is elastic.

Horizontal and Vertical Partitioning


- Horizontal partitioning: Refers to splitting a set of data across rows, by key.
  Each partition stores all the data for its users, but only some of the users.
- Vertical partitioning: Refers to splitting a set of data across columns,
  for all the users. One partition stores all the first names, another all the last names.

With this being said, it might be beneficial to combine both types. It might make sense to first create a vertical partition with a user's text data (first name, last name, etc.), then another vertical partition for larger data types like a profile picture.


Now we can horizontally partition each of those vertical partitions into key-value pairs of data.
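A small sketch of that combination, with a made-up user record: columns are split vertically into a text store and a blob store, and each store is then split horizontally by user ID.

```python
users = {
    1: {"first": "Ada", "last": "Lovelace", "avatar": b"<bytes>"},
    2: {"first": "Alan", "last": "Turing", "avatar": b"<bytes>"},
}

NUM_PARTITIONS = 2

# Vertical split: small text columns in one store, large blobs in another.
text_parts = {i: {} for i in range(NUM_PARTITIONS)}
blob_parts = {i: {} for i in range(NUM_PARTITIONS)}

for user_id, row in users.items():
    # Horizontal split: the user ID decides which partition a row lands in.
    p = user_id % NUM_PARTITIONS
    text_parts[p][user_id] = {"first": row["first"], "last": row["last"]}
    blob_parts[p][user_id] = {"avatar": row["avatar"]}
```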

7.4.2 Data item to partition assignment strategies (97)

With those types out of the way, we need to talk about how to decide where to store the data and how to retrieve it. Let's get some vocabulary out of the way. Variance refers to the degree to which items are uniformly distributed across partitions. Relocation refers to the number of items that must move when the number of partitions changes.


Item-Based Assignment Strategy


Item-based assignment places an item based solely on a characteristic of the data itself. This might be a hash of the key, or simply the key mod the number of partitions.

partitions = {i: {} for i in range(5)}

def placement(key):
  # Simple placement function that works with integer keys.
  return key % 5

def place(key, value):
  # Calculate the partition ID with the placement function.
  partition = placement(key)
  # Add the item to the appropriate partition.
  partitions[partition][key] = value

def get(key):
  # Calculate the partition ID with the placement function.
  partition = placement(key)
  # Retrieve the item from the appropriate partition.
  return partitions[partition].get(key)

Directory-Based Assignment Strategy


Directory-based assignment takes the item and uses a directory (a lookup table) to find the right partition for placement or retrieval. This can cause issues, as the directory is another step on every access and can become the bottleneck of the system.

import random

partitions = {i: {} for i in range(5)}
assignment = {}

def place(key, value):
  # Instead of calculating the partition,
  # randomly choose one and update the directory.
  partition = random.randint(0, len(partitions) - 1)
  # Add the key to the directory.
  assignment[key] = partition
  # Add the item to the appropriate partition.
  partitions[partition][key] = value

def get(key):
  # Retrieve the partition ID from the directory.
  partition = assignment.get(key)
  # Retrieve the item from the appropriate partition.
  if partition is not None:
    return partitions[partition].get(key)
  else:
    return None

In the end, if you need less control over placement, item-based assignment may be enough, but finer control comes with a directory-based system.

7.5 Common item-based assignment strategies (99)

The book now goes into detail about how one might use an item-based strategy on the English words in the system dictionary. This is a pretty good indicator of how the distribution works out for each style. The code below reads the data; then we can go into strategies for partitioning. Each strategy is followed by the distribution it produces.

partitions = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}
def placement(word):
  ...

with open('/usr/share/dict/words') as f:

  words = [word.lower() for word in f.read().split()]

  for word in words:
    partitions[placement(word)] += 1

Now let's go through the different strategies.

7.5.1 Range partitioning (100)

The first strategy just uses key ranges: predefine a set of ranges, one per partition (the encyclopedia approach).

def placement(key):
  if key[0] in ['a', 'b', 'c', 'd', 'e']:
    return 0
  if key[0] in ['f', 'g', 'h', 'i', 'j']:
    return 1
  if key[0] in ['k', 'l', 'm', 'n', 'o']:
    return 2
  if key[0] in ['p', 'q', 'r', 's', 't']:
    return 3
  if key[0] in ['u', 'v', 'w', 'x', 'y', 'z']:
    return 4

0: 67730
1: 33203
2: 35829
3: 73432
4: 25782

7.5.2 Hash partitioning (100)

This one uses a hash function that turns the word into an integer we can do modular arithmetic on. (Note that Python salts string hashes per process, so the exact counts vary between runs unless PYTHONHASHSEED is fixed.)

def placement(key):
  return hash(key) % 5

0: 47219
1: 47187
2: 47362
3: 47175
4: 47033

7.6 Repartitioning (101)

With these strategies in place, we can see what happens if we need to repartition the data set by adding another partition. Here is the code to repartition. For each strategy, we count the items that remain in the same partition and the items that have to move.

same = 0
diff = 0

def placement_5(word):
  ...

def placement_6(word):
  ...

with open('/usr/share/dict/words') as f:

  words = [word.lower() for word in f.read().split()]

  for word in words:

    if placement_5(word) == placement_6(word):
      same += 1
    else:
      diff += 1

7.6.1 Range partitioning (101)

This one adds another partition. Most of the ranges now cover only 4 letters instead of the 5 we had before.

def placement_6(word):
  if word[0] in ['a', 'b', 'c', 'd']:
    return 0
  if word[0] in ['e', 'f', 'g', 'h']:
    return 1
  if word[0] in ['i', 'j', 'k', 'l']:
    return 2
  if word[0] in ['m', 'n', 'o', 'p']:
    return 3
  if word[0] in ['q', 'r', 's', 't']:
    return 4
  if word[0] in ['u', 'v', 'w', 'x', 'y', 'z']:
    return 5

Same: 114790
Diff: 121186

7.6.2 Hash partitioning (102)

This one will still use the hash but now it will be mod 6.

def placement_5(word):
  return hash(word) % 5

def placement_6(word):
  return hash(word) % 6

Same: 39443
Diff: 196533

7.7 Consistent hashing (103)

We can see from the above that although hash partitioning helps minimize variance, both strategies are very bad at relocation. There is another strategy we can use. Consistent hashing is designed to reduce the relocation for any given key-value pair: given n items and m partitions, only about n/m items need to be relocated when a partition is added.

from uhashring import HashRing

def placement_5(word):
  return HashRing(nodes=[0, 1, 2, 3, 4]).get_node(word)

def placement_6(word):
  return HashRing(nodes=[0, 1, 2, 3, 4, 5]).get_node(word)

0: 49084
1: 47753
2: 48172
3: 47144
4: 43823

Same: 192243
Diff: 43733
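uhashring hides the mechanics, so here is a from-scratch sketch of a hash ring (an illustration of the technique, not the library's actual implementation). Each node gets many virtual points on the ring; a key belongs to the first node point clockwise from its hash, so adding a node only claims the keys that fall just before its new points.

```python
import bisect
import hashlib

def _hash(value):
    # Stable hash so placement does not change between runs.
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replicas=100):
        # Several virtual points per node smooth out the
        # distribution across the ring (lower variance).
        self._points = sorted(
            (_hash(f"{node}-{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def get_node(self, key):
        # Walk clockwise to the first point at or after the key's
        # hash, wrapping around the end of the ring.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._keys)
        return self._points[idx][1]
```

Growing the ring from m to m+1 nodes should leave roughly m/(m+1) of the keys on their original node, which matches the Same/Diff numbers above far better than plain mod hashing.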

7.8 (Re)balancing and overpartitioning (103)

Balancing refers to the assignment of partitions to nodes. Rebalancing refers to the reassignment of previously assigned partitions to different nodes. The book talks about a few strategies for this. One is overpartitioning: have more partitions than nodes so you have room to scale up. Another is to keep just one partition per node.


With the first, you can add nodes to help with demand or decommission nodes when demand drops. Keep in mind that with this strategy you can only scale up to the current number of partitions.


With the second, you are limited in the number of nodes you can have, far fewer than in the first example.


Both of these strategies have drawbacks, and with either you must still think about access time for the data and the distributed nature of the tasks.
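The overpartitioning idea can be sketched in a few lines (partition count and node names are made up here): whole partitions are handed out round-robin, so rebalancing moves partitions between nodes without any item ever changing partition.

```python
NUM_PARTITIONS = 12

def balance(nodes):
    # Round-robin whole partitions across the current set of nodes.
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}

three_nodes = balance(["n1", "n2", "n3"])        # 4 partitions per node
four_nodes = balance(["n1", "n2", "n3", "n4"])   # 3 partitions per node
# The ceiling on elasticity: with 12 partitions, a 13th node
# would have nothing to host.
```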