Picture each land-cover class (forest, water, city) as a fuzzy cloud floating in spectral space. Max Likelihood asks the question:

“Which cloud is this pixel most likely sitting inside?”

Crucially, it considers each cloud’s shape (covariance), not just its center. Min Distance looks only at the center: it’s like deciding which city you’re closest to without checking whose borders you’re actually inside.

That’s why ML usually wins on accuracy: it knows that a tightly grouped class only claims pixels close to its center, while a wide, loose class lets in more variety.
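To make the cloud picture concrete, here is a minimal sketch with two made-up classes in a 2-band space. The class statistics, the pixel value, and the helper names (`euclidean_distance`, `gaussian_log_likelihood`) are all illustrative assumptions, not values from real imagery. The pixel sits slightly closer to the center of the tight "water" cloud, yet it falls far outside that cloud's spread and well inside the loose "city" cloud, so the two rules disagree.

```python
import numpy as np

# Hypothetical 2-band class statistics (invented numbers for illustration).
classes = {
    "water": {"mean": np.array([0.0, 0.0]), "cov": np.eye(2) * 0.25},  # tight cloud
    "city": {"mean": np.array([4.0, 0.0]), "cov": np.eye(2) * 4.0},    # wide, loose cloud
}
pixel = np.array([1.8, 0.0])


def euclidean_distance(x, stats):
    """Min Distance: only the class center matters."""
    return float(np.linalg.norm(x - stats["mean"]))


def gaussian_log_likelihood(x, stats):
    """Max Likelihood: log of the multivariate Gaussian density (mean AND covariance)."""
    d = x - stats["mean"]
    _, logdet = np.linalg.slogdet(stats["cov"])
    maha_sq = d @ np.linalg.solve(stats["cov"], d)  # squared Mahalanobis distance
    return float(-0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha_sq))


min_distance_label = min(classes, key=lambda c: euclidean_distance(pixel, classes[c]))
max_likelihood_label = max(classes, key=lambda c: gaussian_log_likelihood(pixel, classes[c]))

print("Min Distance picks:  ", min_distance_label)    # water (1.8 vs 2.2 from the centers)
print("Max Likelihood picks:", max_likelihood_label)  # city (pixel is ~3.6 std devs from water)
```

The pixel is 1.8 units from water's center and 2.2 from city's, so Min Distance says water. Measured in each cloud's own spread, though, it is about 3.6 standard deviations from water and only about 1.1 from city, which is why the likelihood favors city.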

🔬 Science / formula

For each pixel, compute the probability that it belongs to each class, then assign it to the most likely class.

Key assumption: each class is multivariate Gaussian (the bands are jointly normally distributed within the class).
Uses the class mean AND covariance matrix — that’s how it accounts for class shape, not just position.
Strength: most accurate of the four decision rules when classes are well-sampled and bands are roughly normal.
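Weakness: needs enough training samples per class to estimate the covariance (more pixels than bands, or the matrix cannot be inverted); degrades on non-Gaussian classes, such as a class made of several separate spectral clusters.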
Bayesian variant (Hord 1982): supply per-class prior probabilities instead of assuming equal priors.
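The points above can be sketched as a small classifier. This is an illustrative sketch under stated assumptions, not the implementation of any particular remote-sensing package: the function names (`fit_class_stats`, `classify`), the synthetic training data, and the class labels are all hypothetical. The score per class is the log prior, minus half the log-determinant of the covariance, minus half the squared Mahalanobis distance; constants shared by every class are dropped because they do not change the winner.

```python
import numpy as np


def fit_class_stats(training):
    """Estimate (mean, covariance) per class.

    training: dict mapping label -> array of shape (n_samples, n_bands).
    """
    stats = {}
    for label, samples in training.items():
        samples = np.asarray(samples, dtype=float)
        n_samples, n_bands = samples.shape
        if n_samples <= n_bands:
            # Too few samples: the covariance estimate would be singular.
            raise ValueError(f"class {label!r} needs more than {n_bands} samples")
        cov = np.atleast_2d(np.cov(samples, rowvar=False))
        stats[label] = (samples.mean(axis=0), cov)
    return stats


def classify(pixels, stats, priors=None):
    """Assign each pixel to the class with the highest Gaussian log score.

    priors: optional dict label -> prior probability (Bayesian variant).
            If omitted, all classes are treated as equally likely.
    """
    labels = list(stats)
    pixels = np.atleast_2d(np.asarray(pixels, dtype=float))
    scores = np.empty((len(pixels), len(labels)))
    for j, label in enumerate(labels):
        mean, cov = stats[label]
        diff = pixels - mean
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every pixel to this class mean.
        maha_sq = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
        prior = 1.0 if priors is None else priors[label]
        scores[:, j] = np.log(prior) - 0.5 * logdet - 0.5 * maha_sq
    return [labels[i] for i in scores.argmax(axis=1)]


# Synthetic 2-band training pixels (invented for illustration only).
rng = np.random.default_rng(0)
training = {
    "forest": rng.multivariate_normal([30.0, 80.0], [[4.0, 1.0], [1.0, 9.0]], size=200),
    "water": rng.multivariate_normal([10.0, 5.0], [[1.0, 0.0], [0.0, 1.0]], size=200),
}
stats = fit_class_stats(training)
print(classify([[30.0, 80.0], [10.0, 5.0]], stats))
print(classify([[30.0, 80.0]], stats, priors={"forest": 0.2, "water": 0.8}))
```

Passing `priors` only shifts each class score by a constant, so it matters mostly for pixels near a class boundary; a pixel sitting squarely inside one cloud keeps its label.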

💡

ML uses **mean + covariance**, not just the mean. Min Distance ignores shape (just the mean). Mahalanobis adds shape. Max Likelihood adds a term for each cloud’s overall size and, in the Bayesian variant, priors on top. The full discriminant formula lives in the long-form review; the *concept* lives here.

classification/ml-discriminant