Statistical distances
How do we measure the distance between probability distributions?
Methods
Metrics
- Total variation distance
- Wasserstein distance (aka Kantorovich-Rubinstein metric, aka Earth mover’s distance)
  - Includes TV distance as a very special case (for the distance \(d(x,y) = 1_{x \neq y}\))
- Hellinger distance (cf. Bhattacharyya distance)
- Kolmogorov-Smirnov distance
  - Lecture notes by Sourav Chatterjee
- Lévy-Prokhorov metric
- Energy distance (doi)
Divergences (non-metric)
- KL divergence (aka relative entropy)
- Rényi divergence (aka \(\alpha\)-divergence) (notes)
- \(\chi^2\)-distance
Literature
General
- Rachev, 1991: Probability metrics and the stability of stochastic models
  - Villani: “For the taxonomy of probability metrics and their history, the unavoidable reference is the monograph by Rachev, which lists dozens and dozens of metrics together with their main properties and applications. (Many of them are variants… of the Wasserstein and Lévy-Prokhorov metrics.)”
- Basu, Shioya, Park, 2011: Statistical inference: The minimum distance approach (doi), Chapter 2: Statistical distances
- Gibbs & Su, 2002: On choosing and bounding probability metrics (doi, arxiv)
  - A good place to start
  - The relationship diagram in Figure 1 is especially helpful
- Cha, 2007: Comprehensive survey on distance/similarity measures between PDFs (pdf)
- Basseville, 1989: Distance measures for signal processing and pattern recognition (doi)
- Basseville, 2013: Divergence measures for statistical data processing (doi)
Wasserstein distance
The literature on Wasserstein distances is simply enormous, spanning probability, statistics, optimal transport, and other branches of pure and applied mathematics. Here are a few choice references:
- Villani, 2003: Topics in optimal transportation, Chapter 7: The metric side of optimal transportation
- Villani, 2009: Optimal transport: Old and new, Chapter 6: The Wasserstein distances
For more references, see optimal transport.