Results 1-10 of 50
A density-based algorithm for discovering clusters in large spatial databases with noise
, 1996
Cited by 1538 (67 self)
Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.
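The density-based notion of a cluster described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `dbscan`, the brute-force neighborhood scan, and the choice to expose both a radius `eps` and a neighbor threshold `min_pts` as explicit arguments are assumptions made for the example.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    Hypothetical helper: eps is the neighborhood radius and min_pts the
    number of neighbors (including the point itself) that makes a point
    a core point. Neighborhoods are found by a brute-force scan.
    """
    labels = [None] * len(points)

    def region(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region(j)
            if len(reachable) >= min_pts:
                seeds.extend(reachable)  # j is a core point, keep expanding
        cluster += 1
    return labels
```

Because clusters grow by chaining dense neighborhoods rather than by distance to a center, the labels can follow arbitrarily shaped groups, and isolated points remain labeled as noise.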
A Simple Algorithm for Nearest Neighbor Search in High Dimensions
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
Cited by 149 (1 self)
Abstract—The problem of finding the closest point in high-dimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as k-d tree and R-tree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a user-specified distance ε. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance ε. The use of projection search combined with a novel data structure dramatically improves performance in high dimensions. A complexity analysis is presented which helps to automatically determine ε in structured problems. A comprehensive set of benchmarks clearly shows the superiority of the proposed algorithm for a variety of structured and unstructured search problems. Object recognition is demonstrated as an example application. The simplicity of the algorithm makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent. A C++ implementation of our algorithm is available upon request to search@cs.columbia.edu/CAVE/.
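The idea of searching only within a bounded distance by projecting onto each axis can be sketched as follows. This is a simplified illustration, not the paper's data structure: the class name `ProjectionIndex`, the per-dimension sorted lists, and the set intersection used to trim candidates are all assumptions of the example.

```python
from bisect import bisect_left, bisect_right
from math import dist


class ProjectionIndex:
    """Hypothetical index: one sorted coordinate list per dimension."""

    def __init__(self, points):
        self.points = list(points)
        dims = len(self.points[0]) if self.points else 0
        self.order = []
        for d in range(dims):
            ids = sorted(range(len(self.points)), key=lambda i: self.points[i][d])
            coords = [self.points[i][d] for i in ids]
            self.order.append((coords, ids))

    def nearest_within(self, query, eps):
        """Return the index of the closest point within eps, or None."""
        candidates = None
        for d, (coords, ids) in enumerate(self.order):
            # Keep only points whose d-th coordinate lies in [q - eps, q + eps].
            lo = bisect_left(coords, query[d] - eps)
            hi = bisect_right(coords, query[d] + eps)
            slab = set(ids[lo:hi])
            candidates = slab if candidates is None else candidates & slab
            if not candidates:
                return None
        if candidates is None:
            return None  # empty index
        # The survivors lie in a hypercube of side 2 * eps; check them exactly.
        best = min(candidates, key=lambda i: dist(self.points[i], query))
        return best if dist(self.points[best], query) <= eps else None
```

The final exact check matters because a point can sit in a corner of the hypercube and still be farther than the radius from the query.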
Closest Pair Queries in Spatial Databases
 In Proceedings of the ACM SIGMOD Conference on Management of Data
, 2000
Cited by 80 (10 self)
This paper addresses the problem of finding the K closest pairs between two spatial data sets, where each set is stored in a structure belonging to the R-tree family. Five different algorithms (four recursive and one iterative) are presented for solving this problem. The case of 1 closest pair is treated as a special case. An extensive study, based on experiments performed with synthetic as well as with real point data sets, is presented. A wide range of values for the basic parameters affecting the performance of the algorithms, especially the effect of overlap between the two data sets, is explored. Moreover, an algorithmic as well as an experimental comparison with existing incremental algorithms addressing the same problem is presented. In most settings, the new algorithms proposed clearly outperform the existing ones.
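To make the query itself concrete, the following brute-force baseline returns the K closest pairs between two point sets. It only defines the expected result and is not one of the R-tree algorithms from the paper; the function name `k_closest_pairs` is hypothetical.

```python
from heapq import nsmallest
from itertools import product
from math import dist


def k_closest_pairs(set_p, set_q, k):
    """Return the k pairs (distance, p, q) with the smallest distance.

    Examines every pair, so the cost grows with len(set_p) * len(set_q);
    an index-based method would prune most of these pairs.
    """
    pairs = ((dist(p, q), p, q) for p, q in product(set_p, set_q))
    return nsmallest(k, pairs)
```

Setting `k` to 1 gives the single closest pair, the special case mentioned in the abstract.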
Indexing the Solution Space: A New Technique for Nearest Neighbor Search in High-Dimensional Space
 IEEE Transactions On Knowledge And Data Engineering
, 2000
Cited by 34 (1 self)
Abstract—Similarity search in multimedia databases requires an efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state-of-the-art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor search which corresponds to a computation of the Voronoi cell of each data point. In a second step, we store conservative approximations of the Voronoi cells in an index structure efficient for high-dimensional data spaces. As a result, nearest neighbor search corresponds to a simple point query on the index structure. Although our technique is based on a precomputation of the solution space, it is dynamic, i.e., it supports insertions of new data points. An extensive experimental evaluation of our technique demonstrates the high efficiency for uniformly distributed as well as real data. We obtained a significant reduction of the search time compared to nearest neighbor search in other index structures such as the X-tree. Index Terms—Nearest neighbor search, high-dimensional indexing, efficient query processing, spatial databases, Voronoi diagrams.
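The principle of precomputing the solution space so that a nearest-neighbor search becomes a point query can be shown in one dimension, where each Voronoi cell is simply an interval between midpoints. This is a deliberately reduced sketch: the paper works with approximated cells in high-dimensional space, and the function names `build_cells` and `nearest` are assumptions of the example.

```python
from bisect import bisect_right


def build_cells(values):
    """Precompute 1-D Voronoi cells as the midpoints between sorted sites."""
    sites = sorted(set(values))
    boundaries = [(a + b) / 2 for a, b in zip(sites, sites[1:])]
    return sites, boundaries


def nearest(sites, boundaries, q):
    """Point query: locate the cell containing q and return its site."""
    return sites[bisect_right(boundaries, q)]
```

All distance comparisons happen at build time; a query only locates the interval that contains it.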
Fast indexing and visualization of metric datasets using slim-trees
 IEEE Transactions on Knowledge and Data Engineering (TKDE)
, 2002
Cited by 30 (9 self)
Abstract—Many recent database applications must deal with similarity queries. For such applications, it is important to measure the similarity between two objects using the distance between them. Focusing on this problem, this paper proposes the Slim-tree, a new dynamic tree for organizing metric data sets in pages of fixed size. The Slim-tree uses the triangle inequality to prune distance calculations needed to answer similarity queries over objects in metric spaces. The proposed insertion algorithm uses new policies to select the nodes where incoming objects are stored. When a node overflows, the Slim-tree uses a Minimal Spanning Tree to help with the split. The new insertion algorithm leads to a tree with high storage utilization and improved query performance. The Slim-tree is the first metric access method to tackle the problem of overlap between nodes in metric spaces and to propose a technique to minimize it. The proposed "fat-factor" is a way to quantify whether a given tree can be improved and also to compare two trees. We show how to use the fat-factor to achieve accurate estimates of the search performance and also how to improve the performance of a metric tree through the proposed "Slim-down" algorithm. This paper also presents a new tool in the arsenal of resources of Slim-tree aimed at visualizing it. Visualization is a powerful tool for interactive data mining and for the visual tracking of the behavior of a tree under updates. Finally, we present a formula to estimate the number of disk accesses in range queries. Results from experiments with real and synthetic data sets show that the new algorithms of the Slim-tree lead to performance improvements. These results show that the Slim-tree outperforms the M-tree by up to 200 percent for range queries. For insertion and split, the Minimal-Spanning-Tree-based algorithm achieves up to 40 times faster insertions. We observed improvements up to 40 percent in range queries after applying the ...
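The way the triangle inequality prunes distance calculations in a range query can be sketched with a single reference object. This is a flat illustration rather than the Slim-tree itself: the function `range_query`, the single `pivot`, and the precomputed list `pivot_dists` are assumptions of the example.

```python
from math import dist


def range_query(objects, pivot, pivot_dists, query, radius, metric=dist):
    """Return (hits, computed) for a range query around `query`.

    pivot_dists[i] must hold metric(pivot, objects[i]), computed in advance.
    By the triangle inequality, |d(q, pivot) - d(pivot, o)| <= d(q, o), so an
    object whose lower bound exceeds the radius is skipped without computing
    its distance to the query.
    """
    d_qp = metric(query, pivot)
    hits, computed = [], 1  # one distance computed so far: query to pivot
    for obj, d_po in zip(objects, pivot_dists):
        if abs(d_qp - d_po) > radius:
            continue  # pruned: cannot be within the radius
        computed += 1
        if metric(query, obj) <= radius:
            hits.append(obj)
    return hits, computed
```

The returned counter makes the saving visible: it reports how many distances were actually evaluated, which is the cost a metric access method tries to reduce.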
Next-Generation Content Representation, Creation and Searching for New Media Applications in Education
, 1998
Cited by 27 (1 self)
Content creation, editing, and searching are extremely time-consuming tasks that often require substantial training and experience, especially when high-quality audio and video are involved. "New media" represents a new paradigm for multimedia information representation and processing, in which the emphasis is placed on the actual content. It thus brings the tasks of content creation and searching much closer to actual users and enables them to be active producers of audiovisual information rather than passive recipients. We discuss the state of the art and present next-generation techniques for content representation, searching, creation, and editing. We discuss our experiences in developing a Web-based distributed compressed video editing and searching system (WebClip), a media representation language (Flavor) and an object-based video authoring system (Zest) based on it, and large image/video search engines for the World-Wide Web (WebSEEk and VideoQ). We also present a case study of new media applications based on specific planned multimedia education experiments with the above systems in several K-12 schools in Manhattan.
Continuous Nearest Neighbor Queries over Sliding Windows
 IEEE Transactions on Knowledge and Data Engineering (TKDE)
, 2007
Cited by 15 (3 self)
Abstract—This paper studies continuous monitoring of nearest neighbor (NN) queries over sliding window streams. According to this model, data points continuously stream in the system, and they are considered valid only while they belong to a sliding window that contains 1) the W most recent arrivals (count-based) or 2) the arrivals within a fixed interval W covering the most recent time stamps (time-based). The task of the query processor is to constantly maintain the result of long-running NN queries among the valid data. We present two processing techniques that apply to both count-based and time-based windows. The first one adapts conceptual partitioning, the best existing method for continuous NN monitoring over update streams, to the sliding window model. The second technique reduces the problem to skyline maintenance in the distance-time space and precomputes the future changes in the NN set. We analyze the performance of both algorithms and extend them to variations of NN search. Finally, we compare their efficiency through a comprehensive experimental evaluation. The skyline-based algorithm achieves lower CPU cost, at the expense of slightly larger space overhead. Index Terms—Location-dependent and sensitive, spatial databases, query processing, nearest neighbors, data streams, sliding windows.
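The reduction to skyline maintenance in the distance-time space can be illustrated for the simplest case: one query, a single nearest neighbor, and a count-based window. The sketch below is an assumption-laden simplification and not the paper's algorithm; the class name `SlidingWindowNN` and its interface are hypothetical.

```python
from collections import deque
from math import dist


class SlidingWindowNN:
    """Hypothetical 1-NN monitor over the `window` most recent arrivals."""

    def __init__(self, query, window):
        self.query = query
        self.window = window
        self.clock = 0
        # Skyline entries (arrival, distance, point), ordered by arrival.
        # Distances strictly increase from front to back.
        self.skyline = deque()

    def insert(self, point):
        """Process one arrival and return the current nearest neighbor."""
        self.clock += 1
        d = dist(point, self.query)
        # Older points that are not closer can never be the NN again:
        # the new point is at least as close and expires later.
        while self.skyline and self.skyline[-1][1] >= d:
            self.skyline.pop()
        self.skyline.append((self.clock, d, point))
        # Drop entries that have left the window.
        while self.skyline[0][0] <= self.clock - self.window:
            self.skyline.popleft()
        return self.skyline[0][2]
```

The entries behind the front of the deque are the precomputed future results: when the current nearest neighbor expires, the next entry takes over without rescanning the window.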
Clustering Techniques for Large Data Sets: From the Past to the Future
 Proc. Int’l Conf. Knowledge Discovery and Data Mining, ACM
, 1999
Spatial Priority Search: An Access Technique for Scaleless Maps
 In Proceedings of ACM SIGMOD International Conference on Management of Data
, 1991
Cited by 14 (0 self)
In geographic information systems, an important goal is the maintenance of seamless, scaleless maps. The amount of detail desired on a map decreases with decreasing scale. Cartographic techniques called generalization define the representations of geographic objects, depending on the scale. While generalization as a whole is considered an art, simple automatic generalization techniques exist for simple geometric objects. For polygonal lines and polygons, simplification techniques assign priorities to points. A map at a desired scale is then obtained by ignoring all points of sufficiently low priority. This implies that a geometric object appears on a map only if its priority is high enough, and also that an object is represented only by those of its defining points that have sufficiently high priority. The efficiency of retrieving a map of some area at a certain scale ideally should only depend on the amount of data retrieved. In this paper, we present algorithms and a fully adaptive...
Aggregate Processing of Planar Points
 In Extending Database Technology
, 2002
Cited by 13 (5 self)
Aggregate window queries return summarized information about objects that fall inside a query rectangle (e.g., the number of objects instead of their concrete ids). Traditional approaches for processing such queries usually retrieve considerable extra information, thus compromising the processing cost.
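An aggregate window query that returns a count without retrieving the objects themselves can be sketched with a table of prefix counts. This is one generic way to answer such queries, not the structure proposed in the paper; it assumes points with integer coordinates on a small grid, and the function names `build_prefix_counts` and `count_in_window` are hypothetical.

```python
def build_prefix_counts(points, width, height):
    """prefix[y][x] = number of points with px < x and py < y.

    Assumes integer coordinates with 0 <= px < width and 0 <= py < height.
    """
    prefix = [[0] * (width + 1) for _ in range(height + 1)]
    for px, py in points:
        prefix[py + 1][px + 1] += 1
    for y in range(1, height + 1):
        for x in range(1, width + 1):
            prefix[y][x] += (
                prefix[y - 1][x] + prefix[y][x - 1] - prefix[y - 1][x - 1]
            )
    return prefix


def count_in_window(prefix, x1, y1, x2, y2):
    """Count points with x1 <= px <= x2 and y1 <= py <= y2 (inclusive)."""
    return (
        prefix[y2 + 1][x2 + 1]
        - prefix[y1][x2 + 1]
        - prefix[y2 + 1][x1]
        + prefix[y1][x1]
    )
```

Each query reads four table cells regardless of how many objects fall inside the rectangle, which is the contrast with approaches that first retrieve the qualifying objects and then summarize them.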