Image retrieval: Current techniques, promising directions and open issues
 Journal of Visual Communication and Image Representation
, 1999
Cited by 354 (11 self)
This paper provides a comprehensive survey of the technical achievements in the research area of image retrieval, especially content-based image retrieval, an area that has been so active and prosperous in the past few years. The survey includes 100+ papers covering the research aspects of image feature representation and extraction, multidimensional indexing, and system design, three of the fundamental bases of content-based image retrieval. Furthermore, based on the state-of-the-art technology available now and the demand from real-world applications, open research issues are identified and future promising research directions are suggested. © 1999 Academic Press
On Clusterings: Good, Bad and Spectral
, 2000
Cited by 257 (12 self)
We motivate and develop a natural bicriteria measure for assessing the quality of a clustering which avoids the drawbacks of existing measures. A simple recursive heuristic has polylogarithmic worst-case guarantees under the new measure. The main result of the paper is the analysis of a popular spectral algorithm. One variant of spectral clustering turns out to have effective worst-case guarantees.
Approximation Algorithms for Projective Clustering
 Proceedings of the ACM SIGMOD International Conference on Management of data, Philadelphia
, 2000
Cited by 247 (21 self)
We consider the following two instances of the projective clustering problem: Given a set $S$ of $n$ points in $\mathbb{R}^d$ and an integer $k > 0$, cover $S$ by $k$ hyper-strips (resp. hyper-cylinders) so that the maximum width of a hyper-strip (resp., the maximum diameter of a hyper-cylinder) is minimized. Let $w$ be the smallest value so that $S$ can be covered by $k$ hyper-strips (resp. hyper-cylinders), each of width (resp. diameter) at most $w$. In the plane, the two problems are equivalent. It is NP-hard to compute $k$ planar strips of width even at most $Cw$, for any constant $C > 0$ [50]. This paper contains four main results related to projective clustering: (i) For $d = 2$, we present a randomized algorithm that computes $O(k \log k)$ strips of width at most $6w$ that cover $S$. Its expected running time is $O(nk^2 \log^4 n)$ if $k^2 \log k \le n$; it also works for larger values of $k$, but then the expected running time is $O(n^{2/3} k^{8/3} \log^4 n)$. We also propose another algorithm that computes a c...
Clustering data streams: Theory and practice
 IEEE TKDE
, 2003
Cited by 106 (2 self)
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams. Index Terms: clustering, data streams, approximation algorithms.
Hierarchical Document Clustering Using Frequent Itemsets
 In Proc. SIAM International Conference on Data Mining 2003 (SDM 2003)
, 2003
Cited by 83 (2 self)
A major challenge in document clustering is the extremely high dimensionality. For example, the vocabulary for a document set can easily be thousands of words. On the other hand, each document often contains only a small fraction of the words in the vocabulary. These features require special handling. Another requirement is hierarchical clustering, where clustered documents can be browsed according to the increasing specificity of topics. In this paper, we propose to use the notion of frequent itemsets, which comes from association rule mining, for document clustering. The intuition of our clustering criterion is that each cluster is identified by some common words, called frequent itemsets, for the documents in the cluster. Frequent itemsets are also used to produce a hierarchical topic tree for clusters. By focusing on frequent items, the dimensionality of the document set is drastically reduced. We show that this method outperforms the best existing methods in terms of both clustering accuracy and scalability.
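The idea of identifying a cluster by the words its documents share can be sketched as follows. This is a brute-force illustration only, not the paper's algorithm: the function name `frequent_itemsets`, the `min_support` threshold, the cap of two words per itemset, and the toy documents are all assumptions made for the example.

```python
from itertools import combinations

def frequent_itemsets(docs, min_support, max_size=2):
    """Map each frequent word set to the indices of the documents containing it.

    docs is a list of word sets; a word set is frequent when the fraction of
    documents containing all of its words is at least min_support.
    """
    n = len(docs)
    vocab = sorted(set().union(*docs))
    result = {}
    for size in range(1, max_size + 1):
        for items in combinations(vocab, size):
            # Documents that contain every word of the candidate itemset.
            cover = [i for i, d in enumerate(docs) if set(items) <= d]
            if len(cover) / n >= min_support:
                result[items] = cover
    return result

# Hypothetical toy corpus: two documents about streams, two about images.
docs = [
    {"stream", "cluster", "median"},
    {"stream", "cluster", "pass"},
    {"image", "retrieval", "feature"},
    {"image", "retrieval", "index"},
]
clusters = frequent_itemsets(docs, min_support=0.5)
```

Each key of `clusters` is a candidate cluster label, and its value is the set of documents that label covers. Only words that occur in enough documents survive, which is where the reduction in dimensionality comes from.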
Better Streaming Algorithms for Clustering Problems
, 2003
Cited by 71 (1 self)
We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k-median problem which produces a constant factor approximation in one pass using storage space $O(k \,\mathrm{polylog}\, n)$. This is a significant improvement of the previous best algorithm, which yielded a $2^{O(1/\epsilon)}$ approximation using $O(n^{\epsilon})$ space. Next we give a streaming algorithm for the k-median problem with an arbitrary distance function. We also study algorithms for clustering problems with outliers in the streaming model. Here, we give bicriterion guarantees, producing constant factor approximations by increasing the allowed fraction of outliers slightly.
Evolutionary Spectral Clustering by Incorporating Temporal Smoothness
, 2007
Cited by 62 (7 self)
Evolutionary clustering is an emerging research area essential to important applications such as clustering dynamic Web and blog contents and clustering data streams. In evolutionary clustering, a good clustering result should fit the current data well while not deviating too dramatically from the recent history. To fulfill this dual purpose, a measure of temporal smoothness is integrated in the overall measure of clustering quality. In this paper, we propose two frameworks that incorporate temporal smoothness in evolutionary spectral clustering. For both frameworks, we start with intuitions gained from the well-known k-means clustering problem, and then propose and solve corresponding cost functions for the evolutionary spectral clustering problems. Our solutions to the evolutionary spectral clustering problems provide more stable and consistent clustering results that are less sensitive to short-term noise while at the same time adapting to long-term cluster drifts. Furthermore, we demonstrate that our methods provide the optimal solutions to the relaxed versions of the corresponding evolutionary k-means clustering problems. Performance experiments over a number of real and synthetic data sets illustrate that our evolutionary spectral clustering methods provide more robust clustering results that are not sensitive to noise and can adapt to data drifts.
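One way to read "temporal smoothness integrated in the overall measure of clustering quality" is as a weighted sum of a snapshot cost and a temporal cost. The sketch below uses k-means costs for both terms. The weight `alpha`, the function names, and the choice of temporal cost (the current centroids evaluated on the previous snapshot) are assumptions for illustration and are not taken from the abstract.

```python
def kmeans_cost(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in points
    )

def evolutionary_cost(current, previous, centroids, alpha=0.8):
    """Weighted sum of snapshot cost and an assumed temporal cost."""
    # Snapshot cost: how well the centroids fit the current data.
    snapshot = kmeans_cost(current, centroids)
    # Temporal cost (assumed form): how well the same centroids fit
    # the previous snapshot, penalizing abrupt changes.
    temporal = kmeans_cost(previous, centroids)
    return alpha * snapshot + (1 - alpha) * temporal

# Hypothetical data: the points drift slightly between two time steps.
previous = [(0.0, 0.0), (10.0, 0.0)]
current = [(1.0, 0.0), (9.0, 0.0)]
centroids = [(1.0, 0.0), (9.0, 0.0)]
cost = evolutionary_cost(current, previous, centroids, alpha=0.5)
```

With `alpha` close to 1 the clustering tracks the current data; lowering it trades fit for stability across time steps.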
Exact and Approximation Algorithms for Clustering
, 1997
Cited by 57 (5 self)
In this paper we present an $n^{O(k^{1-1/d})}$ time algorithm for solving the k-center problem in $\mathbb{R}^d$, under $L_\infty$ and $L_2$ metrics. The algorithm extends to other metrics, and can be used to solve the discrete k-center problem, as well. We also describe a simple $(1+\varepsilon)$-approximation algorithm for the k-center problem, with running time $O(n \log k) + (k/\varepsilon)^{O(k^{1-1/d})}$. Finally, we present an $n^{O(k^{1-1/d})}$ time algorithm for solving the $L$-capacitated k-center problem, provided that $L = \Omega(n/k^{1-1/d})$ or $L = O(1)$. We conclude with a simple approximation algorithm for the $L$-capacitated k-center problem. The work on this paper was partially supported by a National Science Foundation Grant CCR9301259, by an Army Research Office MURI grant DAAH049610013, by a Sloan fellowship, by an NYI award and matching funds from Xerox Corporation, and by a grant from the U.S.-Israeli Binational Science Foundation. Department of Computer Science, Box ...
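For orientation, the k-center objective (choose k centers so that the largest distance from any point to its nearest center is as small as possible) can be illustrated with the classic greedy farthest-first heuristic, a standard 2-approximation in metric spaces. This is not the exact or $(1+\varepsilon)$-approximation algorithm described in the abstract. The function name `greedy_k_center`, the Euclidean metric, and the sample points are assumptions, and the sketch expects `k` to be at most the number of points.

```python
import math

def greedy_k_center(points, k):
    """Farthest-first traversal: returns (centers, covering radius)."""
    # Start from an arbitrary point; here, the first one.
    centers = [points[0]]
    # dist[i] is the distance from points[i] to its nearest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Pick the point that is currently farthest from all centers.
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers, max(dist)

# Hypothetical points on a line, forming three natural groups.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
centers, radius = greedy_k_center(points, 3)
```

The returned radius is the k-center cost of the chosen centers: every point lies within that distance of some center.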
Reverse Nearest Neighbor Aggregates Over Data Streams
, 2001
Cited by 38 (2 self)
Reverse Nearest Neighbor (RNN) queries have been studied for finite, stored data sets and are of interest for decision support.
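For readers unfamiliar with the query type: the reverse nearest neighbors of a query point q are the data points that have q as their nearest neighbor. A brute-force sketch for a small stored set is shown here. The function name `reverse_nearest_neighbors`, the Euclidean metric, and the sample points are illustrative assumptions, and this is not the streaming method the paper studies.

```python
import math

def reverse_nearest_neighbors(points, q):
    """Return the data points whose nearest neighbor is the query point q."""
    result = []
    for i, p in enumerate(points):
        d_q = math.dist(p, q)
        # p qualifies if no other data point is strictly closer to p than q is.
        if all(
            math.dist(p, other) >= d_q
            for j, other in enumerate(points)
            if j != i
        ):
            result.append(p)
    return result

# Hypothetical stored set: two points near the query, two far away.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
rnn = reverse_nearest_neighbors(points, (0.4, 0.0))
```

The two distant points are each other's nearest neighbors, so they are not influenced by the query. That asymmetry is what distinguishes a reverse nearest neighbor query from an ordinary nearest neighbor query.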