Exact and Approximation Algorithms for Clustering
, 1997
Abstract

Cited by 59 (5 self)
In this paper we present an $n^{O(k^{1-1/d})}$-time algorithm for solving the $k$-center problem in $\mathbb{R}^d$, under $L_\infty$ and $L_2$ metrics. The algorithm extends to other metrics, and can be used to solve the discrete $k$-center problem, as well. We also describe a simple $(1+\varepsilon)$-approximation algorithm for the $k$-center problem, with running time $O(n \log k) + (k/\varepsilon)^{O(k^{1-1/d})}$. Finally, we present an $n^{O(k^{1-1/d})}$-time algorithm for solving the $L$-capacitated $k$-center problem, provided that $L = \Omega(n/k^{1-1/d})$ or $L = O(1)$. We conclude with a simple approximation algorithm for the $L$-capacitated $k$-center problem.
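For orientation only: the $k$-center objective in the abstract above (choose $k$ centers so that the largest distance from any point to its nearest center is as small as possible) can be illustrated with the classic farthest-first greedy heuristic, a well-known 2-approximation. This is not the algorithm of the paper; the function name, the Euclidean metric, and the sample points are illustrative assumptions, and the sketch assumes 1 <= k <= len(points).

```python
import math

def farthest_first_centers(points, k):
    """Greedy farthest-first traversal: pick k centers from `points`.

    Returns the chosen centers and the resulting covering radius
    (the largest distance from any point to its nearest center).
    """
    centers = [points[0]]  # arbitrary first center
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers, max(dist)

# Two well-separated groups of three points each (hypothetical data).
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, radius = farthest_first_centers(points, 2)
print(centers, radius)
```

With these six points and k = 2, the heuristic places one center in each group, so the covering radius is bounded by the diameter of a single group.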
SP-GiST: An extensible database index for supporting space partitioning trees
 J. Intell. Inf. Syst
Abstract

Cited by 23 (9 self)
Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the kD-tree, the trie, the quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A new extensible index structure, termed SP-GiST, is presented that supports this class of data structures, mainly the class of space-partitioning unbalanced trees. Simple method implementations are provided that demonstrate how SP-GiST can behave as a kD-tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is presented as well as performance studies of SP-GiST's various tuning parameters. Keywords: space-partitioning trees, spatial databases, extensible index, generalized search trees, clustering
An extensible index for spatial databases
 In Statistical and Scientific Database Management
, 2001
Abstract

Cited by 10 (7 self)
Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the kD-tree, the trie, the quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A new extensible index structure, termed SP-GiST, is presented that supports this class of data structures, mainly the class of space-partitioning unbalanced trees. Simple method implementations are provided that demonstrate how SP-GiST can behave as a kD-tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is presented as well as performance studies of SP-GiST's various tuning parameters.
Space-Partitioning Trees in PostgreSQL: Realization and Performance
 In Proc. of the 22nd International Conference on Data Engineering (ICDE’06)
, 2006
Abstract

Cited by 4 (1 self)
Many evolving database applications warrant the use of non-traditional indexing mechanisms beyond B+-trees and hash tables. SP-GiST is an extensible indexing framework that broadens the class of supported indexes to include disk-based versions of a wide variety of space-partitioning trees, e.g., disk-based trie variants, quadtree variants, and kd-trees. This paper presents a serious attempt at implementing and realizing SP-GiST-based indexes inside PostgreSQL. Several index types are realized inside PostgreSQL, facilitated by rapid SP-GiST instantiations. Challenges, experiences, and performance issues are addressed in the paper. Performance comparisons are conducted from within PostgreSQL to compare the update and search performance of SP-GiST-based indexes against the B+-tree and the R-tree for string, point, and line segment data sets. Interesting results that highlight the potential performance gains of SP-GiST-based indexes are presented in the paper.
CPS-tree: A compact partitioned suffix tree for disk-based indexing on large genome sequences
 In Proceedings of the International Conference on Data Engineering
, 2007
Abstract

Cited by 3 (0 self)
The suffix tree is an important data structure for indexing a long sequence (like a genome sequence) or a concatenation of sequences. It finds many applications in practice, especially in the domain of bioinformatics. The suffix tree allows for efficient pattern search with time independent of the sequence length. However, the performance of a disk-based suffix tree is a concern, as it is slowed down significantly by poorly localized access, which results in high disk I/O. The focus of this paper is to design an I/O-efficient and Compact Partitioned Suffix tree representation (CPS-tree) on disk. We show that representing a suffix tree using the CPS-tree has several advantages. First, our representation allows us to visit any node in the suffix tree by accessing at most $\log n$ pages of the tree, where $n$ is the length of the sequence. Second, our storage scheme improves the access pattern and reduces the number of page faults, resulting in efficient search retrieval and efficient tree traversal operations. Third, by bit packing, our index is compact. Experimental results show that the CPS-tree outperforms other indexes on disk. When fully loaded into main memory, the CPS-tree is still efficient. Hence, we expect the CPS-tree to be a good disk-based representation of the suffix tree, with potential use in practical applications.
A framework for supporting the class of space-partitioning trees
, 2001
Abstract

Cited by 2 (2 self)
Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the kD-tree, the trie, the quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A new extensible index structure, termed SP-GiST, is presented that supports this class of data structures, mainly the class of space-partitioning unbalanced trees. Simple method implementations are provided that demonstrate how SP-GiST can behave as a kD-tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is presented as well as performance studies of SP-GiST's various tuning parameters. Keywords: SP-GiST, space-partitioning trees, GiST, spatial tree indexes, access methods, clustering.
Data Replication for External Searching in Static Tree Structures
 In Proceedings of the Ninth ACM International Conference on Information and Knowledge Management
, 2000
Abstract

Cited by 1 (1 self)
This paper explores the use of data replication to improve external searching in static tree structures. We present general and efficient mappings from the nodes of a tree $T$ to blocks of size $B$ when nodes of $T$ can be replicated. The amount of replication is controlled, and block utilization and block number are optimized. We consider total node replication (measuring the total space used) and individual node replication (measuring the replication of individual nodes). For an arbitrary tree $T$ of size $N$ and height $h$, we show that by using at most $\frac{3}{2}N$ space one can achieve a block number proportional to the optimal block number of $\lceil h/B \rceil$. We show that when every node can be replicated only a constant number of times, no significant reduction in the block number may be possible. Our work also shows that generating mappings in which all but one block contain exactly $B$ nodes increases the block number by at most 2.
Clustering in Trees: Optimizing . . .
 Journal of Graph Algorithms and Applications
, 2000
Abstract
This paper considers partitioning the vertices of an $n$-vertex tree into $p$ disjoint sets $C_1, C_2, \ldots, C_p$, called clusters, so that the number of vertices in a cluster and the number of subtrees in a cluster are minimized. For this NP-hard problem we present greedy heuristics which differ in (i) how subtrees are identified (using either a best-fit, good-fit, or first-fit selection criterion), (ii) whether clusters are filled one at a time or simultaneously, and (iii) how much cluster sizes can differ from the ideal size of $c$ vertices per cluster, $n = cp$. The last criterion is controlled by a constant $\alpha$, $0 \le \alpha < 1$, such that cluster $C_i$ satisfies $(1 - \frac{\alpha}{2})c \le |C_i| \le c(1+\alpha)$, $1 \le i \le p$. For algorithms resulting from combinations of these criteria we develop worst-case bounds on the number of subtrees in a cluster in terms of $c$, $\alpha$, and the maximum degree of a vertex. We present experimental results which give insight into how parameters $c$, $\alpha$, and the maximum degree of a vertex impact the...
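As a small worked illustration of the size constraint in the abstract above, the sketch below computes the admissible range $(1 - \alpha/2)c \le |C_i| \le c(1+\alpha)$ for an ideal cluster size $c = n/p$ and checks candidate cluster sizes against it. The helper names, the symbol name `alpha` for the slack constant, and the sample values are assumptions made for illustration; the paper's greedy heuristics are not reproduced here.

```python
def size_bounds(c, alpha):
    """Return (lower, upper) bounds on a cluster size for ideal size c."""
    if not 0 <= alpha < 1:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    return (1 - alpha / 2) * c, c * (1 + alpha)

def sizes_ok(cluster_sizes, c, alpha):
    """Check that every cluster size lies within the admissible range."""
    lower, upper = size_bounds(c, alpha)
    return all(lower <= s <= upper for s in cluster_sizes)

# Hypothetical instance: n = 40 vertices split into p = 4 clusters.
n, p = 40, 4
c = n // p                                      # ideal size: 10 vertices
lower, upper = size_bounds(c, 0.5)              # 7.5 and 15.0
balanced = sizes_ok([10, 8, 12, 10], c, 0.5)    # all sizes within [7.5, 15]
skewed = sizes_ok([5, 5, 15, 15], c, 0.5)       # size 5 falls below 7.5
print(lower, upper, balanced, skewed)
```

Note that the range is asymmetric: a cluster may exceed the ideal size by a factor of up to $1+\alpha$ but may fall short only by a factor of $1-\alpha/2$.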
Search-Optimized Persistent Suffix Tree Storage for Biological Applications
Abstract
The suffix tree is a well-known and popular indexing structure for various sequence processing problems arising in biological data management. However, unlike traditional indexing structures, suffix trees are orders of magnitude larger than the underlying data. Moreover, their construction and search algorithms are extremely inefficient when implemented directly on disk. Recently, we have shown that it is possible to significantly speed up the on-disk construction of suffix trees through a careful choice of buffering policy and physical representation of suffix tree nodes. In this paper, we explore the problem of performing efficient searches on disk-resident suffix trees. Specifically, we investigate the gains that can be achieved through customized node-to-page layout strategies. Through detailed experimentation on real-life genomic sequences, we demonstrate that a new layout, called Stellar, provides significantly improved search performance. The key feature of Stellar is that in addition to standard root-to-leaf lookup queries, it also supports complex sequence search algorithms that exploit the suffix link feature of suffix trees. These results are encouraging with regard to the ultimate objective of seamlessly integrating sequence processing in database engines.
Applications of Clustering Problems
, 1997
Abstract
Introduction. Clustering algorithms have been employed in many fields of human knowledge that require finding a "natural association" among some specific data. The way the term "natural association" is defined depends on the field and the particular application that uses it, and can vary considerably. For example, in the life sciences, the data to be clustered consist of life forms, and the clusters themselves are species, or subspecies of a given species. The need for clustering algorithms is determined by the scientists' wish to study various characteristics of the life forms. [5] mentions a situation where mammals are clustered by similarities in the proportion of three chemicals in their milk. The "natural association" of the animals depends on the percentages of these chemicals in their milk. A number of other domains use clustering algorithms to extract information by grouping a large collection of objects, such as the medical sciences (clustering of symptoms yields