
Homogeneity Score

The problem

The homogeneity score measures whether each cluster contains only members of a single class. The metric ranges from 0 to 1, and higher is better.

It describes how closely the output of a clustering algorithm agrees with the ground-truth class labels, so it can only be computed when those labels are available.

Formally defined by this formula:

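$$h = 1 - \frac{H(C|K)}{H(C)}$$

where

$$H(C|K) = -\sum_{c,k} \frac{n_{c,k}}{N} \log \frac{n_{c,k}}{n_k}, \qquad H(C) = -\sum_{c} \frac{n_c}{N} \log \frac{n_c}{N}$$

Here $N$ is the total number of data points, $n_c$ is the number of points in class $c$, $n_k$ is the number of points in cluster $k$, and $n_{c,k}$ is the number of points of class $c$ assigned to cluster $k$. When $H(C) = 0$ (only one class is present), $h$ is defined to be 1.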
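The formula can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names `entropy`, `conditional_entropy`, and `homogeneity` are hypothetical, the natural logarithm is assumed (the base cancels in the ratio), and the score is set to 1 when only one class is present.

```python
import math
from collections import Counter


def entropy(classes):
    """H(C): entropy of the ground-truth class labels."""
    n = len(classes)
    return -sum((cnt / n) * math.log(cnt / n) for cnt in Counter(classes).values())


def conditional_entropy(classes, clusters):
    """H(C|K): entropy of the classes given the cluster assignments."""
    n = len(classes)
    n_k = Counter(clusters)                 # points per cluster
    n_ck = Counter(zip(classes, clusters))  # points per (class, cluster) pair
    return -sum((cnt / n) * math.log(cnt / n_k[k]) for (_, k), cnt in n_ck.items())


def homogeneity(classes, clusters):
    """h = 1 - H(C|K) / H(C), with h = 1 when H(C) is zero (assumed convention)."""
    h_c = entropy(classes)
    if h_c == 0.0:
        return 1.0
    return 1.0 - conditional_entropy(classes, clusters) / h_c


# Made-up toy labels: cluster 0 mixes both classes, cluster 1 is pure.
true_classes = [0, 0, 1, 1]
predicted_clusters = [0, 0, 0, 1]
print(homogeneity(true_classes, predicted_clusters))
```

Because one of the two clusters mixes both classes, the printed score falls strictly between 0 and 1.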

If each data point is in its own cluster, then every cluster trivially contains a single class and the homogeneity score is 1 (the highest value).


Similarly, if all data points are in one cluster and the data contains more than one class, then the homogeneity score is 0 (the lowest value).
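Both extremes can be checked against the definition $h = 1 - H(C|K)/H(C)$. The short derivation below assumes the data contains at least two classes, so that $H(C) > 0$, and writes $n_c$ for the number of points in class $c$.

One cluster per data point: every cluster has $n_k = 1$, and its single point belongs to exactly one class, so $n_{c,k} = n_k = 1$ for every non-empty pair $(c, k)$:

$$H(C|K) = -\sum_{c,k} \frac{1}{N} \log \frac{1}{1} = 0 \quad\Rightarrow\quad h = 1 - \frac{0}{H(C)} = 1$$

All data points in a single cluster: that cluster has $n_k = N$ and $n_{c,k} = n_c$, so the conditional entropy collapses to the class entropy:

$$H(C|K) = -\sum_{c} \frac{n_c}{N} \log \frac{n_c}{N} = H(C) \quad\Rightarrow\quad h = 1 - \frac{H(C)}{H(C)} = 0$$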
