Coresets and Sketches for High Dimensional Subspace Approximation Problems
Cited by 12 (7 self)
We consider the problem of approximating a set P of n points in R^d by a j-dimensional subspace under the ℓ_p measure, in which we wish to minimize the sum of ℓ_p distances from each point of P to this subspace. More generally, the F_q(ℓ_p)-subspace approximation problem asks for a j-subspace that minimizes the sum of q-th powers of ℓ_p-distances to this subspace, up to a multiplicative factor of (1 + ε). We develop techniques for subspace approximation, regression, and matrix approximation that can be used to deal with massive data sets in high-dimensional spaces. In particular, we develop coresets and sketches, i.e., small space representations that approximate the input point set P with respect to the subspace approximation problem. Our results are:
• A dimensionality reduction method that can be applied to F_q(ℓ_p)-clustering and shape fitting problems, such as those in [8, 15].
• The first strong coreset for F_1(ℓ_2)-subspace approximation in high-dimensional spaces, i.e., of size polynomial in the dimension of the space. This coreset approximates the distances to any j-subspace (not just the optimal one).
• A (1 + ε)-approximation algorithm for the j-dimensional F_1(ℓ_2)-subspace approximation problem with running time nd(j/ε)^O(1) + (n + d)2^poly(j/ε).
• A streaming algorithm that maintains a coreset for the F_1(ℓ_2)-subspace approximation problem and uses a space log n
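To make the objective concrete, the following minimal sketch evaluates the F_q(ℓ_2) cost of a given candidate subspace. It is an illustration only and not the coreset or sketching construction from the paper. The function names `dist_to_subspace` and `subspace_cost` are hypothetical, and the sketch assumes the subspace passes through the origin and is described by an orthonormal basis.

```python
import math

def dist_to_subspace(point, basis):
    """Euclidean distance from `point` to the linear subspace spanned by
    `basis` (assumption: `basis` is a list of orthonormal vectors)."""
    residual = list(point)
    for b in basis:
        # Coefficient of the point along this basis direction.
        coeff = sum(x * y for x, y in zip(point, b))
        # Remove that component; what is left is orthogonal to the subspace.
        residual = [r - coeff * y for r, y in zip(residual, b)]
    return math.sqrt(sum(r * r for r in residual))

def subspace_cost(points, basis, q=1.0):
    """F_q(l_2) cost: sum over all points of (distance to the subspace)^q."""
    return sum(dist_to_subspace(p, basis) ** q for p in points)

points = [(3.0, 4.0, 0.0), (1.0, 0.0, 2.0), (0.0, 0.0, 5.0)]
x_axis = [(1.0, 0.0, 0.0)]  # a 1-dimensional subspace of R^3

print(subspace_cost(points, x_axis, q=1))  # distances 4, 2, 5 -> 11.0
print(subspace_cost(points, x_axis, q=2))  # squares 16, 4, 25 -> 45.0
```

A strong coreset in the sense described above would be a small weighted point set whose cost, computed this way, stays within a factor (1 ± ε) of the cost of P for every j-subspace, not only for the optimal one.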
Coresets for Discrete Integration and Clustering
In Proceedings of FSTTCS, 2006
Cited by 7 (1 self)
The problem received the title of "Buridan's sheep." The biological code was taken from a young merino sheep, by the Casparo-Karpov method, at a moment when the sheep was between two feeding troughs full of mixed fodder. This code, along with additional data about sheep in general, was fed into CODD. The machine was required: a) to predict which trough the merino would choose, and b) to give the psychophysiological basis for this choice. The Mystery of the Hind Leg, Arkady and Boris Strugatsky

Given a set P of n points on the real line and a (potentially infinite) family of functions F, we investigate the problem of finding a small (weighted) subset S ⊆ P, such that for any f ∈ F, we have that f(P) is a (1 ± ε)-approximation to f(S). Here, f(Q) = ∑_{q∈Q} w(q)f(q) denotes the weighted discrete integral of f over the point set Q, where w(q) is the weight assigned to the point q. We study this problem, and provide tight bounds on the size of S for several families of functions. As an application, we present some coreset constructions for clustering.
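The definition of the weighted discrete integral can be illustrated with a short sketch. The helper names `weighted_integral` and `approximates` are hypothetical, the subset S below is hand-picked for the example rather than produced by the paper's construction, and the check covers only the functions passed in, not an entire family.

```python
def weighted_integral(f, weighted_points):
    """f(Q) = sum over q in Q of w(q) * f(q).
    `weighted_points` is a list of (point, weight) pairs."""
    return sum(w * f(q) for q, w in weighted_points)

def approximates(f, P, S, eps):
    """True if f(P) lies within a factor (1 +/- eps) of f(S)."""
    full = weighted_integral(f, P)
    small = weighted_integral(f, S)
    return (1 - eps) * small <= full <= (1 + eps) * small

# Four unit-weight points on the real line, and a toy weighted subset.
P = [(1, 1.0), (2, 1.0), (3, 1.0), (4, 1.0)]
S = [(2, 2.0), (3, 2.0)]

def identity(x):
    return x

def square(x):
    return x * x

print(weighted_integral(identity, P), weighted_integral(identity, S))  # 10.0 10.0
print(weighted_integral(square, P), weighted_integral(square, S))      # 30.0 26.0
print(approximates(square, P, S, eps=0.2))  # True:  30 <= 1.2 * 26
print(approximates(square, P, S, eps=0.1))  # False: 30 >  1.1 * 26
```

The example shows why the family of functions matters: the same subset reproduces the integral of the identity exactly but only roughly approximates the integral of the square.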
Relative ε-Approximations in Geometry
2007
Cited by 6 (3 self)
We re-examine relative ε-approximations, previously studied in [Pol86, Hau92, LLS01, CKMS06], and their relation to certain geometric problems. We give a simple constructive proof of their existence in general range spaces with finite VC-dimension, and of a sharp bound on their size, close to the best known one. We then give a construction of smaller-size relative ε-approximations for range spaces that involve points and halfspaces in two and higher dimensions. The planar construction is based on a new structure, spanning trees with small relative crossing number, which we believe to be of independent interest. We also consider applications of the new structures for approximate range counting and related problems.
Geometric optimization and sums of algebraic functions
2009
Cited by 4 (0 self)
We present a new optimization technique that yields the first FPTAS for several geometric problems. These problems reduce to optimizing a sum of nonnegative, constant description-complexity algebraic functions. We first give an FPTAS for optimizing such a sum of algebraic functions, and then we apply it to several geometric optimization problems. We obtain the first FPTAS for two fundamental geometric shape matching problems in fixed dimension: maximizing the volume of overlap of two polyhedra under rigid motions, and minimizing their symmetric difference. We obtain the first FPTAS for other problems in fixed dimension, such as computing an optimal ray in a weighted subdivision, finding the largest axially symmetric subset of a polyhedron, and computing minimum-area hulls.
A Near-Linear Algorithm for Projective Clustering Integer Points
2012
Cited by 3 (0 self)
We consider the problem of projective clustering in Euclidean spaces of non-fixed dimension. Here, we are given a set P of n points in R^m and integers j ≥ 1, k ≥ 0, and the goal is to find j k-subspaces so that the sum of the distances of each point in P to the nearest subspace is minimized. Observe that this is a shape fitting problem where we wish to find the best fit in the L1 sense. Here we will treat the number j of subspaces we want to fit and the dimension k of each of them as constants. We consider instances of projective clustering where the point coordinates are integers of magnitude polynomial in m and n. Our main result is a randomized algorithm that for any ε > 0 runs in time O(mn polylog(mn)) and outputs a solution that with high probability is within (1 + ε) of the optimal solution. To obtain this result, we show that the fixed dimensional version of the above projective clustering problem has a small coreset. We do that by observing that in a fairly general sense, shape fitting problems that have small coresets in the L∞ setting also have small coresets in the L1 setting, and then exploiting an existing construction for the L∞ setting. This observation seems to be quite useful for other shape fitting problems as well, as we demonstrate by constructing the first “regular” coreset for the circle fitting problem in the plane.
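The projective clustering objective can be written down directly. The sketch below only evaluates the L1 cost of a given candidate solution and does not implement the randomized algorithm or the coreset from the paper. It assumes each k-subspace is an affine flat given as a hypothetical pair (anchor point, list of orthonormal directions); the function names are likewise illustrative.

```python
import math

def dist_to_flat(point, flat):
    """Distance from `point` to the flat anchor + span(directions).
    Assumption: `directions` is an orthonormal list (empty when k = 0)."""
    anchor, directions = flat
    offset = [x - a for x, a in zip(point, anchor)]
    residual = list(offset)
    for d in directions:
        coeff = sum(x * y for x, y in zip(offset, d))
        residual = [r - coeff * y for r, y in zip(residual, d)]
    return math.sqrt(sum(r * r for r in residual))

def projective_clustering_cost(points, flats):
    """Sum over all points of the distance to the nearest flat."""
    return sum(min(dist_to_flat(p, f) for f in flats) for p in points)

# Integer points in the plane and j = 2 lines (k = 1): y = 0 and y = 10.
points = [(1, 1), (5, -2), (3, 9), (7, 12)]
flats = [
    ((0, 0), [(1, 0)]),
    ((0, 10), [(1, 0)]),
]

print(projective_clustering_cost(points, flats))  # 1 + 2 + 1 + 2 -> 6.0
```

Replacing the sum with a maximum over the points would give the L∞ version of the same shape fitting problem, which is the setting the abstract refers to when it transfers coresets from L∞ to L1.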
Sampling Based Dimension Reduction for Subspace Approximation
We give a randomized bicriteria algorithm for the problem of finding a k-dimensional subspace that minimizes the L_p-error for given points, i.e., the p-th root of the sum of p-th powers of distances to given points, for any p ≥ 1. Our algorithm runs in time Õ(mn · k^3 (k/ε)^(p+1)) and produces a subset of size Õ(k^2 (k/ε)^(p+1)) from the given points such that, with high probability, the span of these points gives a (1 + ε)-approximation to the optimal k-dimensional subspace. We also show a dimension reduction type of result for this problem where we can efficiently find a subset of size Õ(k^(p+3) + (k/ε)^(p+2)) such that, with high probability, their span contains a k-dimensional subspace that gives a (1 + ε)-approximation to the optimum. We prove similar results for the corresponding projective clustering problem where we need to find multiple k-dimensional subspaces.
On the Least Trimmed Squares Estimator
2007
The linear least trimmed squares (LTS) estimator is a statistical technique for estimating the line (or generally hyperplane) of fit for a set of points. It was proposed by Rousseeuw as a robust alternative to the classical least squares estimator. Given a set of n points in R^d, in classical least squares the objective is to find a linear model (that is, a nonvertical hyperplane) that minimizes the sum of squared residuals. In LTS the objective is to minimize the sum of the smallest 50% of the squared residuals. LTS is a robust estimator with a 50% breakdown point, which means that the estimator is insensitive to corruption due to outliers, provided that the outliers constitute less than 50% of the set. LTS is closely related to the well-known LMS estimator, in which the objective is to minimize the median squared residual, and LTA, in which the objective is to minimize the sum of the smallest 50% of the absolute residuals. LTS has the advantage of being statistically more efficient than LMS. Unfortunately, the computational complexity of LTS is less well understood than that of LMS. In this paper we present new algorithms, both exact and approximate, for computing the LTS estimator. We also present hardness results for exact and approximate LTS and LTA.
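The difference between the objectives can be seen by evaluating them for one fixed candidate line. This sketch only scores a given line in the plane; it is not one of the exact or approximate algorithms from the paper. The function names are hypothetical, and taking h as half the points rounded up is an assumption standing in for "the smallest 50%".

```python
def residuals(points, slope, intercept):
    """Signed residuals y - (slope * x + intercept) for points (x, y)."""
    return [y - (slope * x + intercept) for x, y in points]

def least_squares_cost(points, slope, intercept):
    """Classical objective: sum of all squared residuals."""
    return sum(r * r for r in residuals(points, slope, intercept))

def lts_cost(points, slope, intercept, h=None):
    """LTS objective: sum of the h smallest squared residuals."""
    squared = sorted(r * r for r in residuals(points, slope, intercept))
    if h is None:
        h = (len(squared) + 1) // 2  # assumption: half the points, rounded up
    return sum(squared[:h])

def lta_cost(points, slope, intercept, h=None):
    """LTA objective: sum of the h smallest absolute residuals."""
    absolute = sorted(abs(r) for r in residuals(points, slope, intercept))
    if h is None:
        h = (len(absolute) + 1) // 2  # same assumption as in lts_cost
    return sum(absolute[:h])

# Four points on the line y = 2x plus one gross outlier.
points = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 100)]

print(least_squares_cost(points, 2, 0))  # 8464, dominated by the outlier
print(lts_cost(points, 2, 0))            # 0, the outlier is trimmed away
print(lta_cost(points, 2, 0))            # 0
```

For the line y = 2x the least squares cost is driven entirely by the single outlier, while the trimmed objectives ignore it, which is the robustness property described above.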