Skip to main content

Approximating k-Means

Ernest van Wijland ( IRIF, Université Paris-Cité )

Among clustering objectives, k-Means is arguably the most well-known. Given a collection of points in a metric space, the goal is to partition them into k clusters, each with an associated center, so as to minimize the sum of squared distances of points to their cluster centers.
In this seminar, we go over our recent work, describing a polynomial-time 5.83-approximation algorithm for k-Means in general metrics.

A natural approach for k-Means is to leverage Lagrangian Multiplier Preserving (LMP) approximations for the facility location problem.
The previous best results for k-Means build upon an adaptation of an LMP 3-approximation for facility location with metric connection costs in [Jain, Vazirani - J.ACM'01] based on a primal-dual method, rather than on the improved LMP greedy 2-approximation for the same problem in [Jain, Mahdian, Markakis, Saberi, Vazirani - J.ACM'03]. The barrier to using the improved LMP algorithm was that no adaptation of this algorithm and its analysis to the case of squared metric connection costs was known (since squared distances violate triangle inequality). We show how to overcome this barrier by providing such an adaptation and a way to apply it to k-Means.

This is joint work with Moses Charikar, Vincent Cohen-Addad, Ruiquan Gao, Fabrizio Grandoni and Euiwoong Lee.