Results 1  10
of
188
An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions
 ACMSIAM SYMPOSIUM ON DISCRETE ALGORITHMS
, 1994
"... Consider a set S of n data points in real ddimensional space, R d , where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess S into a data structure, so that given any query point q 2 R d , the closest point of S to q can be reported quickly. Given any po ..."
Abstract

Cited by 984 (32 self)
 Add to MetaCart
Consider a set S of n data points in real ddimensional space, R d , where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess S into a data structure, so that given any query point q 2 R d , the closest point of S to q can be reported quickly. Given any positive real ffl, a data point p is a (1 + ffl)approximate nearest neighbor of q if its distance from q is within a factor of (1 + ffl) of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of n points in R d in O(dn log n) time and O(dn) space, so that given a query point q 2 R d , and ffl ? 0, a (1 + ffl)approximate nearest neighbor of q can be computed in O(c d;ffl log n) time, where c d;ffl d d1 + 6d=ffle d is a factor depending only on dimension and ffl. In general, we show that given an integer k 1, (1 + ffl)approximations to the k nearest neighbors of q can be computed in additional O(kd log n) time.
Topic Detection and Tracking Pilot Study Final Report
 IN PROCEEDINGS OF THE DARPA BROADCAST NEWS TRANSCRIPTION AND UNDERSTANDING WORKSHOP
, 1998
"... Topic Detection and Tracking (TDT) is a DARPAsponsored initiative to investigate the state of the art in finding and following new events in a stream of broadcast news stories. The TDT problem consists of three major tasks: (1) segmenting a stream of data, especially recognized speech, into distinc ..."
Abstract

Cited by 313 (34 self)
 Add to MetaCart
Topic Detection and Tracking (TDT) is a DARPAsponsored initiative to investigate the state of the art in finding and following new events in a stream of broadcast news stories. The TDT problem consists of three major tasks: (1) segmenting a stream of data, especially recognized speech, into distinct stories; (2) identifying those news stories that are the first to discuss a new event occurring in the news; and (3) given a small number of sample news stories about an event, finding all following stories in the stream.
The Pilot Study ran from September 1996 through October 1997. The primary participants were DARPA, Carnegie Mellon University, Dragon Systems, and the University of Massachusetts at Amherst. This report summarizes the findings of the pilot study.
The TDT work continues in a new project involving larger training and test corpora, more active participants, and a more broadly defined notion of "topic" than was used in the pilot study.
Approximation Algorithms for Projective Clustering
 Proceedings of the ACM SIGMOD International Conference on Management of data, Philadelphia
, 2000
"... We consider the following two instances of the projective clustering problem: Given a set S of n points in R d and an integer k ? 0; cover S by k hyperstrips (resp. hypercylinders) so that the maximum width of a hyperstrip (resp., the maximum diameter of a hypercylinder) is minimized. Let w ..."
Abstract

Cited by 302 (22 self)
 Add to MetaCart
We consider the following two instances of the projective clustering problem: Given a set S of n points in R d and an integer k ? 0; cover S by k hyperstrips (resp. hypercylinders) so that the maximum width of a hyperstrip (resp., the maximum diameter of a hypercylinder) is minimized. Let w be the smallest value so that S can be covered by k hyperstrips (resp. hypercylinders), each of width (resp. diameter) at most w : In the plane, the two problems are equivalent. It is NPHard to compute k planar strips of width even at most Cw ; for any constant C ? 0 [50]. This paper contains four main results related to projective clustering: (i) For d = 2, we present a randomized algorithm that computes O(k log k) strips of width at most 6w that cover S. Its expected running time is O(nk 2 log 4 n) if k 2 log k n; it also works for larger values of k, but then the expected running time is O(n 2=3 k 8=3 log 4 n). We also propose another algorithm that computes a c...
A Study on Retrospective and OnLine Event Detection
, 1998
"... This paper investigates the use and extension of text retrieval and clustering techniques for event detection. The task is to automatically detect novel events from a temporallyordered stream of news stories, either retrospectively or as the stories arrive. We applied hierarchical and nonhierarchi ..."
Abstract

Cited by 193 (9 self)
 Add to MetaCart
(Show Context)
This paper investigates the use and extension of text retrieval and clustering techniques for event detection. The task is to automatically detect novel events from a temporallyordered stream of news stories, either retrospectively or as the stories arrive. We applied hierarchical and nonhierarchical document clustering algorithms to a corpus of 15,836 stories, focusing on the exploitation of both content and temporal information. We found the resulting cluster hierarchies highly informative for retrospective detection of previously unidentified events, effectively supporting both queryfree and querydriven retrieval. We also found that temporal distribution patterns of document clusters provide useful information for improvement in both retrospective detection and online detection of novel events. In an evaluation using manually labelled events to judge the systemdetected events, we obtained a result of 82% in the F1 measure for retrospective detection, and a F1 value of 42% for...
Incremental Clustering and Dynamic Information Retrieval
, 1997
"... Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering which is based on a careful analysis of the requirements of the information retri ..."
Abstract

Cited by 191 (4 self)
 Add to MetaCart
(Show Context)
Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering which is based on a careful analysis of the requirements of the information retrieval application, and which should also be useful in other applications. The goal is to efficiently maintain clusters of small diameter as new points are inserted. We analyze several natural greedy algorithms and demonstrate that they perform poorly. We propose new deterministic and randomized incremental clustering algorithms which have a provably good performance. We complement our positive results with lower bounds on the performance of incremental algorithms. Finally, we consider the dual clustering problem where the clusters are of fixed diameter, and the goal is to minimize the number of clusters.
Improved fast Gauss transform and efficient kernel density estimation
 In ICCV
, 2003
"... Evaluating sums of multivariate Gaussians is a common computational task in computer vision and pattern recognition, including in the general and powerful kernel density estimation technique. The quadratic computational complexity of the summation is a significant barrier to the scalability of this ..."
Abstract

Cited by 154 (8 self)
 Add to MetaCart
(Show Context)
Evaluating sums of multivariate Gaussians is a common computational task in computer vision and pattern recognition, including in the general and powerful kernel density estimation technique. The quadratic computational complexity of the summation is a significant barrier to the scalability of this algorithm to practical applications. The fast Gauss transform (FGT) has successfully accelerated the kernel density estimation to linear running time for lowdimensional problems. Unfortunately, the cost of a direct extension of the FGT to higherdimensional problems grows exponentially with dimension, making it impractical for dimensions above 3. We develop an improved fast Gauss transform to efficiently estimate sums of Gaussians in higher dimensions, where a new multivariate expansion scheme and an adaptive space subdivision technique dramatically improve the performance. The improved FGT has been applied to the mean shift algorithm achieving linear computational complexity. Experimental results demonstrate the efficiency and effectiveness of our algorithm. 1
Fast construction of nets in lowdimensional metrics and their applications
 SIAM Journal on Computing
, 2006
"... We present a near linear time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This datastructure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, wellseparated pair decomposition, s ..."
Abstract

Cited by 130 (14 self)
 Add to MetaCart
(Show Context)
We present a near linear time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This datastructure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, wellseparated pair decomposition, spanner construction, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near linear and the space being used is linear. 1
NCApproximation Schemes for NP and PSPACEHard Problems for Geometric Graphs
, 1997
"... We present NC approximation schemes for a number of graph problems when restricted to geometric graphs including unit disk graphs and graphs drawn in a civilized manner. Our approximation schemes exhibit the same time versus performance tradeoff as the best known approximation schemes for planar gr ..."
Abstract

Cited by 116 (1 self)
 Add to MetaCart
We present NC approximation schemes for a number of graph problems when restricted to geometric graphs including unit disk graphs and graphs drawn in a civilized manner. Our approximation schemes exhibit the same time versus performance tradeoff as the best known approximation schemes for planar graphs. We also define the concept of precision unit disk graphs and show that for such graphs the approximation schemes have a better time versus performance tradeoff than the approximation schemes for arbitrary unit disk graphs. Moreover, compared to unit disk graphs, we show that for precision unit disk graphs, many more graph problems have efficient approximation schemes. Our NC approximation schemes can also be extended to obtain efficient NC approximation schemes for several PSPACEhard problems on unit disk graphs specified using a restricted version of the hierarchical specification language of Bentley, Ottmann and Widmayer. The approximation schemes for hierarchically specified un...
Efficient algorithms for geometric optimization
 ACM Comput. Surv
, 1998
"... We review the recent progress in the design of efficient algorithms for various problems in geometric optimization. We present several techniques used to attack these problems, such as parametric searching, geometric alternatives to parametric searching, pruneandsearch techniques for linear progra ..."
Abstract

Cited by 114 (10 self)
 Add to MetaCart
We review the recent progress in the design of efficient algorithms for various problems in geometric optimization. We present several techniques used to attack these problems, such as parametric searching, geometric alternatives to parametric searching, pruneandsearch techniques for linear programming and related problems, and LPtype problems and their efficient solution. We then describe a variety of applications of these and other techniques to numerous problems in geometric optimization, including facility location, proximity problems, statistical estimators and metrology, placement and intersection of polygons and polyhedra, and ray shooting and other querytype problems.
Applications of weighted voronoi diagrams and randomization to variancebased kclustering (Extended Abstract)
"... In this paper we consider the kclustering problem for a set S of n points pi = (xi) in the ddimensional space with variancebased errors as clustering criteria, motivated from the color quantization problem of computing a color lookup table for frame buffer display. As the intercluster criterion ..."
Abstract

Cited by 106 (4 self)
 Add to MetaCart
In this paper we consider the kclustering problem for a set S of n points pi = (xi) in the ddimensional space with variancebased errors as clustering criteria, motivated from the color quantization problem of computing a color lookup table for frame buffer display. As the intercluster criterion to minimize, the sum of intracluster errors over every cluster is used, and as the intracluster criterion of a cluster Sj, jSjjff\Gamma 1 X pi2Sj kxi \Gamma _x(Sj)k2 is considered, where k \Delta k is the L2 norm and _x(Sj) is the centroid of points in Sj, i.e., (1=jSjj) Ppi2Sj xi. The cases of ff = 1; 2 correspond to the sum of squared errors and the allpairs sum of squared errors, respectively. The kclustering problem under the criterion with ff = 1; 2 are treated in a unified manner by characterizing the optimum solution to the kclustering problem by the ordinary Euclidean Voronoi diagram and the weighted Voronoi diagram with both multiplicative and additive weights. With this framework, the problem is related to the generalized primary shutter function for the Voronoi diagrams. The primary shutter function is shown to be O(nO(kd)), which implies that, for fixed k, this clustering problem can be solved in a polynomial time. For the problem with the most typical intracluster criterion of the sum of squared errors, we also present an efficient randomized algorithm which, roughly speaking, finds an fflapproximate 2clustering in O(n(1=ffl)d) time, which is quite practical and may be used to real largescale problems such as the color quantization problem.