K-Means, Gaussian Mixtures, and the Expectation-Maximization Algorithm
K-Means Algorithm
The K-Means algorithm is a specific example of the Expectation-Maximization (EM) algorithm. In K-Means, there are two main steps:
- Expectation step - we freeze the current cluster centers, or “prototypes,” and assign each datapoint to the cluster whose center is nearest, which minimizes the sum of squared distances between each datapoint and the center of its assigned cluster.
- Maximization step - we freeze the cluster memberships and update each cluster center to the mean of the datapoints assigned to it.

We iterate between these two steps until the cluster memberships no longer change; a minimal sketch of this loop appears below.
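To make the two steps concrete, here is a minimal NumPy sketch of the loop. The function name `kmeans` and the random choice of initial centers are illustration choices of ours, not part of the original example:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """A bare-bones K-Means on an (n_points, n_features) array X."""
    rng = np.random.default_rng(seed)
    # Initialize the prototypes by picking k distinct datapoints at random
    # (an assumption of this sketch, not the only possible initialization).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iters):
        # E-step: freeze the centers and assign each point to its nearest one,
        # minimizing the squared distance to its assigned cluster center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        new_labels = dists.argmin(axis=1)
        # Stop once the cluster memberships no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # M-step: freeze memberships and move each center to the mean
        # of the datapoints currently assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```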
We can think of K-Means as a “hard clustering” algorithm, where each datapoint belongs to exactly one cluster.
We show an example of using the K-Means algorithm for image compression, where we reduce the number of colors in an image. Given a color image, our goal is to learn an encoding that represents the image using only 10 colors.
Initially, all 10 colors start with the same RGB value, (0.5, 0.5, 0.5), but K-Means quickly adjusts the cluster centers into distinct colors. This allows us to produce an image that looks similar to the original but uses far fewer colors.
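As a sketch of how this compression might be wired up, we can reshape the image into a list of RGB points, cluster them with the `kmeans` function above, and replace each pixel with its cluster’s prototype color. The `image` array below is a random stand-in for a real photo, and unlike the animation (which starts all 10 centers at (0.5, 0.5, 0.5)) this sketch keeps the random initialization from the previous block:

```python
import numpy as np

# Stand-in for a real photo: a hypothetical (H, W, 3) RGB image in [0, 1].
# In practice you could load one, e.g. with matplotlib's plt.imread("photo.png").
rng = np.random.default_rng(1)
image = rng.random((64, 64, 3))

pixels = image.reshape(-1, 3)           # treat each pixel as a point in RGB space
centers, labels = kmeans(pixels, k=10)  # kmeans from the sketch above; 10 prototype colors
compressed = centers[labels].reshape(image.shape)  # replace each pixel by its cluster's color
```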
In the animation below, we see how this happens using the EM-style iteration discussed above. The compressed image changes over time as the clusters and their centers (red stars) move to optimize the colors.