
Mastering Expectation Maximization: A Step-by-Step Guide for CS6601 Assignment 5

Learn the core concepts of Expectation Maximization with practical examples from clustering and GMMs, tailored for CS6601 Assignment 5.

Introduction to Expectation Maximization

Expectation Maximization (EM) is a powerful algorithm used for finding maximum likelihood estimates in models with latent variables. In CS6601 Assignment 5, you will implement EM for Gaussian Mixture Models (GMMs) and k-Means clustering. This tutorial breaks down the key components without giving away the solution, helping you understand the underlying math and vectorization techniques.

Why EM Matters in 2026

With the rise of AI applications like personalized recommendations and autonomous systems, EM is widely used for clustering and density estimation. For example, just as Spotify groups songs into genres using latent features, EM can discover hidden patterns in data. This assignment gives you hands-on experience with a fundamental unsupervised learning tool.

Part 1: k-Means Clustering (19 Points)

k-Means can be viewed as a hard-assignment limit of EM in which every cluster shares the same spherical covariance. You'll implement the standard Lloyd's algorithm: initialize centroids, assign each point to its nearest centroid, and recompute each centroid as the mean of its assigned points, repeating until the assignments stop changing. Key challenge: vectorize the assignment step using broadcasting to avoid Python loops. numpy.linalg.norm with an axis argument handles the distance computation efficiently.

Vectorization Tips

Instead of iterating over points, compute a distance matrix D where D[i, j] is the distance from point i to centroid j, then use np.argmin(D, axis=1) to get the assignments. The arithmetic is still O(n*k*d), but moving it from Python loops into numpy's compiled routines makes it dramatically faster in practice.
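The vectorized assignment step described above can be sketched as follows (the function name and toy data are illustrative, not part of the assignment API):

```python
import numpy as np

def assign_clusters(X, centroids):
    """Vectorized assignment step: X is (n, d), centroids is (k, d).

    Broadcasting (n, 1, d) against (1, k, d) yields an (n, k, d)
    difference array; norming over the last axis gives the (n, k)
    distance matrix D, and argmin over axis=1 picks each point's
    nearest centroid.
    """
    D = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(D, axis=1)

# Toy usage: two obvious clusters around (0, 0) and (10, 10).
X = np.array([[0.0, 0.1], [0.2, 0.0], [10.0, 9.9], [9.8, 10.1]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
print(assign_clusters(X, centroids))  # → [0 0 1 1]
```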

Part 2: Gaussian Mixture Model (48 Points)

GMM assumes data is generated from a mixture of several Gaussian distributions with unknown parameters. You'll implement the EM algorithm: E-step computes responsibilities (posterior probabilities), M-step updates means, covariances, and mixing coefficients.

E-Step: Responsibility Calculation

For each point i and component k, compute r[i,k] = pi_k * N(x_i | mu_k, Sigma_k) / (sum_j pi_j * N(x_i | mu_j, Sigma_j)). Use the multivariate normal PDF from scipy, or implement it manually with np.linalg.det and np.linalg.solve for stability. Avoid singular covariance matrices by adding a small diagonal term.
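One way the E-step can be sketched, working entirely in log space to avoid underflow (the function name and parameter layout here are assumptions for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def e_step(X, pis, mus, sigmas):
    """Illustrative E-step: returns the (n, k) responsibility matrix.

    log r[i, j] = log pi_j + log N(x_i | mu_j, Sigma_j), normalized per
    row with logsumexp so tiny densities never underflow to zero.
    """
    n, k = X.shape[0], len(pis)
    log_r = np.zeros((n, k))
    for j in range(k):
        log_r[:, j] = np.log(pis[j]) + multivariate_normal.logpdf(X, mus[j], sigmas[j])
    log_r -= logsumexp(log_r, axis=1, keepdims=True)  # normalize each row
    return np.exp(log_r)
```

Each row of the result sums to 1, since the responsibilities for a point form a posterior distribution over components.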

M-Step: Parameter Updates

Update mixing coefficients: pi_k = sum_i r[i,k] / N. Means: mu_k = sum_i r[i,k] * x_i / sum_i r[i,k]. Covariances: Sigma_k = sum_i r[i,k] * (x_i - mu_k)(x_i - mu_k)^T / sum_i r[i,k]. Vectorize with matrix multiplication: letting N_k = sum_i r[i,k], the mean update becomes mu_k = (r[:,k] @ X) / N_k.
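The three updates above can be sketched in vectorized form (a minimal sketch; the function name, argument layout, and the regularization constant are assumptions):

```python
import numpy as np

def m_step(X, r, reg=1e-6):
    """Illustrative M-step: X is (n, d) data, r is (n, k) responsibilities."""
    n, d = X.shape
    Nk = r.sum(axis=0)                    # effective counts per component, (k,)
    pis = Nk / n                          # mixing coefficients
    mus = (r.T @ X) / Nk[:, None]         # (k, d) weighted means
    sigmas = np.empty((len(Nk), d, d))
    for j in range(len(Nk)):
        diff = X - mus[j]                 # (n, d) centered data
        # sum_i r[i,j] * diff_i diff_i^T, plus a small diagonal to keep
        # the covariance non-singular
        sigmas[j] = (r[:, j] * diff.T) @ diff / Nk[j] + reg * np.eye(d)
    return pis, mus, sigmas
```

Note that the covariance accumulation uses a single matrix product per component rather than a loop over points.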

Part 3: Model Performance Improvements (20 Points)

To improve convergence, consider initialization strategies like k-Means++ or multiple restarts. Also, implement a convergence check based on log-likelihood change. In 2026, many AI startups use similar techniques for real-time clustering of user behavior.
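A log-likelihood-based convergence check can be sketched as a generic loop (the function names, tolerance, and iteration cap here are illustrative, not part of the assignment API):

```python
import numpy as np

def run_until_converged(log_likelihood_fn, update_fn, params, tol=1e-6, max_iter=200):
    """Repeat an EM-style update until the log-likelihood improves by
    less than tol between iterations, or max_iter is reached."""
    prev_ll = -np.inf
    ll = prev_ll
    for _ in range(max_iter):
        params = update_fn(params)
        ll = log_likelihood_fn(params)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return params, ll
```

Since EM never decreases the log-likelihood, a small positive change is a standard stopping signal; multiple restarts then keep the run with the highest final log-likelihood.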

Part 4: Bayesian Information Criterion (12 Points)

BIC helps select the number of components: BIC = -2 * log_likelihood + p * log(N), where p is the number of free parameters in the model (note this is distinct from K, the number of components). A lower BIC indicates a better trade-off between fit and complexity. You'll compute BIC for several values of K and choose the one that minimizes it. This is crucial for avoiding overfitting in applications like customer segmentation.
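For a full-covariance GMM the parameter count can be worked out directly, as in this sketch (the function name is an assumption; check your assignment's exact parameter-counting convention):

```python
import numpy as np

def gmm_bic(log_likelihood, n_components, d, n):
    """BIC for a GMM with K components on n points in d dimensions.

    Free parameters: (K - 1) mixing weights (they sum to 1), K*d mean
    entries, and K * d*(d+1)/2 entries per symmetric covariance matrix.
    """
    K = n_components
    p = (K - 1) + K * d + K * d * (d + 1) // 2
    return -2.0 * log_likelihood + p * np.log(n)
```

A model with one more component must buy a sizable log-likelihood gain to offset its extra parameters, which is exactly the overfitting penalty you want.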

Common Pitfalls

  • Not vectorizing: Loops cause timeout (40 min limit). Always use numpy operations.
  • Numerical instability: Use log-sum-exp trick for responsibilities to avoid underflow.
  • Covariance singularity: Add small regularization (1e-6 * I) to covariance matrices.
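The log-sum-exp trick mentioned above can be sketched for the 1-D case as follows (the naive computation underflows exactly where the stable one succeeds):

```python
import numpy as np

def log_sum_exp(log_vals):
    """Stable log(sum(exp(v))) for a 1-D array of log-values.

    Shifting by the max before exponentiating keeps at least one
    exponential equal to 1, so the sum never underflows to zero.
    """
    m = np.max(log_vals)
    return m + np.log(np.sum(np.exp(log_vals - m)))
```

For example, with two log-densities of -1000 the naive np.log(np.sum(np.exp(v))) returns -inf, while the stable version correctly returns -1000 + log(2).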

Final Tips

Test your implementation with the provided mixture_tests.py. Ensure your submission file includes all imports. Remember to set your best submission as Active on Gradescope. Good luck!