Skip to main content

Computational Efficiency and Robust Statistics

Ilias Diakonikolas ( University of Southern California )

Suppose we would like to estimate (learn) the mean of a high-dimensional Gaussian distribution. It is easy to see that the sample mean is an accurate estimator of the mean. Now suppose that an adversary corrupts a small fraction of our samples. Can we still recover the true mean, up to a small error? A moment's thought reveals that the sample mean is not a good estimator in this noisy setting. 

It turns out that the concept of the Tukey median is a robust estimator of the mean in the noisy setting. Unfortunately, it is NP-hard to compute, even approximately. This prompts the following question: Can we reconcile robustness and computational efficiency in high-dimensional distribution estimation?

In this talk, I will describe a general approach for efficiently detecting and correcting corruptions in high dimensions. This yields the first computationally efficient algorithms for robustly learning fundamental families of high-dimensional structured distributions, including Gaussians, discrete product distributions, and mixtures thereof (under some natural conditions). Time permitting, I will mention generalizations of the approach to robust estimation of Bayesian networks, and other families of graphical models.

The talk will be based on joint works with (different subsets of) G. Kamath, D. Kane, J. Li, A. Moitra, and A. Stewart.

 

 

Share this: