Estimating the unseen: a sublinear-sample estimator of distributions
Greg Valiant (UC Berkeley)
Info
|
Date |
24th May 2011 (week 4, Trinity Term 2011) |
|
Time |
11:30 |
|
Place |
147 |
Abstract
In joint work with Paul Valiant, we introduce a new approach to
characterizing the unobserved
portion
of a distribution, which provides sublinear-sample additive estimators
for a class of properties that
includes entropy, distribution
support size, and various distance metrics between pairs of
distributions (including
L1 distance).
Together with our new matching lower bounds (which I
will only have time to touch upon), this settles
the longstanding
question of the sample complexities of these estimation problems (up
to constant factors). Our
algorithm estimates these properties up to
an arbitrarily small additive constant, using O(n log n) samples
(where
n is a bound on the support size, or in the case of estimating
the support size, 1/n is a lower bound the probability
of any element
of the domain). Previously, no explicit sublinear-sample algorithms
for any of these estimation
tasks were known. Additionally, our algorithm
runs in time linear in the number of samples used.
If time
permits, I will also briefly discuss the black magic of
Stein's method--an extremely versatile approach to proving central
limit theorems, which plays a key role in our lower-bounds.
Further info
|
Related series |
|
