Rand Score
Introduction#
The Rand score is metric that shows us the similarity between 2 data clusterings or one clustering and the ground truth.
It is formally defined by this formula:
If you're unsure what TP, TN, FP and FN are, you can read more about them here.
Here's a meme if you just want a quick recap.
In the context of clustering, here's what they mean:
TP: Same class and Same cluster FP: Same class and Different cluster TN: Different class and Same cluster FN: Different class and Different cluster

The value of Rand score is between 0 and 1, 0 indicating that both clusters do not match at all and 1 indicating that the clustering is completely same as the other.
Another form of Rand score is:
- a is the number of pairs of points that are in the same cluster both clustering algorithims
- b is the times the pair of points are in different clusters belong in different clustering algorithims.
- āæCā is the total number of unordered pairs.
Let's understand this better with an example.
Example#
We have data of 5 flowers and we cluster them using 2 different clustering algorithms, here are the results.
Clustering 1: {1,1,1,2,2} Clustering 2: {1,1,2,2,3}
Now we will calculate the Rand score for both clustering algorithms, starting with counting the number of unordered pairs.
We have 5 flowers in the dataset, let's call them a, b, c, d, e.
Dataset: {A, B, C, D, E}
The 10 unordered pairs are: {A, B}, {A, C}, {A, D}, {A, E}, {B, C}, {B, D}, {B, E}, {C, D}, {C, E}, {D, E}, this is āæCā part of the equation.
The unordered pairs that are in the same in both clusters is {A, B} = 1, this is 'a'.
Now we find 'b' which includes {A, D}, {A, E}, {B, D}, {B, E} and {C, E}, these are unordered pairs that belong to different clusters across both clustering methods = 5
Plugging in all these values we get: