Measures of dependence
How can the dependence between two random variables be measured quantitatively?
Methods
- Classical
- Pearson correlation (this and several of the measures below are computed in a short code sketch after this list)
- Spearman’s, Kendall’s, Hoeffding’s
- RV coefficient
- Information-theoretic
- Mutual information (MI)
- Maximal information coefficient (MIC)
- Hyped after its publication in Science (summary)
- Criticized by Simon & Tibshirani for lack of power compared to dCov
- Mutual dependence (arxiv)
- Distance/kernel-based
- Distance covariance (dCov)
- Method of Heller, Heller, & Gorfine (HHG) (arxiv)
- Hilbert-Schmidt independence criterion (HSIC)
- Correlation-based
- Rényi’s maximal correlation
- Alternating conditional expectations (ACE) by Breiman & Friedman
- Randomized dependence coefficient (arxiv)
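To make a few of these concrete, here is a minimal Python sketch (assuming numpy, scipy, and scikit-learn are available); the dCov and HSIC estimators are hand-rolled directly from their standard definitions rather than taken from a dedicated package, and the example data has a nonlinear, non-monotone dependence that the classical measures miss:

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import mutual_info_regression

def distance_covariance(x, y):
    """Sample distance covariance (Szekely-Rizzo V-statistic) for 1-d samples."""
    a = np.abs(x[:, None] - x[None, :])                  # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()    # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return np.sqrt((A * B).mean())                       # sqrt of (1/n^2) sum_jk A_jk B_jk

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimator with Gaussian kernels of bandwidth sigma."""
    n = len(x)
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / sigma) ** 2)
    L = np.exp(-0.5 * ((y[:, None] - y[None, :]) / sigma) ** 2)
    H = np.eye(n) - np.ones((n, n)) / n                  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = x ** 2 + 0.05 * rng.normal(size=500)                 # nonlinear, non-monotone dependence

print("Pearson :", stats.pearsonr(x, y)[0])              # near 0: misses the dependence
print("Spearman:", stats.spearmanr(x, y)[0])             # near 0: rank-based, needs monotonicity
print("Kendall :", stats.kendalltau(x, y)[0])            # near 0, same reason
print("MI      :", mutual_info_regression(x[:, None], y)[0])  # clearly positive
print("dCov    :", distance_covariance(x, y))            # clearly positive
print("HSIC    :", hsic(x, y))                           # clearly positive
```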
Many methods can be obtained by using a statistical distance to compare the joint distribution with the product of the marginal distributions (the MI and dCov cases are written out after this list), e.g.
- MI from KL divergence
- Mutual dependence from Hellinger distance
- Distance covariance from weighted \(L^2\) metric on characteristic functions
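Written out for the first and third cases (the Hellinger-distance construction for mutual dependence is analogous), with \(p\) and \(q\) the dimensions of \(X\) and \(Y\) and \(\varphi\) denoting characteristic functions:
\[
I(X;Y) = D_{\mathrm{KL}}\!\big(P_{XY} \,\|\, P_X \otimes P_Y\big)
       = \int p_{XY}(x,y)\,\log\frac{p_{XY}(x,y)}{p_X(x)\,p_Y(y)}\,dx\,dy
\]
\[
\mathrm{dCov}^2(X,Y) = \int \big|\varphi_{X,Y}(t,s) - \varphi_X(t)\,\varphi_Y(s)\big|^2\, w(t,s)\,dt\,ds,
\qquad w(t,s) \propto |t|^{-(1+p)}\,|s|^{-(1+q)}
\]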
Literature
Surveys
- Josse & Holmes, 2016: Measuring multivariate association and beyond (doi, arxiv)
- Focuses on the RV coefficient and distance covariance, but see Sec. 4 for other measures
- de Siqueira Santos et al., 2014: A comparative study of statistical methods used to identify dependencies between gene expression signals (doi)
- Wagner & Eckhoff, 2015: Technical privacy metrics: a systematic survey (arxiv)
- On privacy (including differential privacy), not dependence, but there is much overlap