Results 1  10
of
48
ModelBased Clustering, Discriminant Analysis, and Density Estimation
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2000
"... Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little ..."
Abstract

Cited by 431 (29 self)
 Add to MetaCart
Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as \How many clusters are there?", "Which clustering method should be used?" and \How should outliers be handled?". We outline a general methodology for modelbased clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, mineeld detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology, a...
Maximal Vector Computation in Large Data Sets
 IN VLDB
, 2005
"... Finding the maximals in a collection of vectors is relevant to many applications. The maximal set is related to the convex hull  and hence, linear optimization  and nearest neighbors. The maximal vector problem has resurfaced with the advent of skyline queries for relational databases and skyl ..."
Abstract

Cited by 81 (1 self)
 Add to MetaCart
Finding the maximals in a collection of vectors is relevant to many applications. The maximal set is related to the convex hull  and hence, linear optimization  and nearest neighbors. The maximal vector problem has resurfaced with the advent of skyline queries for relational databases and skyline algorithms that are external and relationally well behaved. The initial
Randomized Competitive Algorithms for the List Update Problem
 Algorithmica
, 1992
"... We prove upper and lower bounds on the competitiveness of randomized algorithms for the list update problem of Sleator and Tarjan. We give a simple and elegant randomized algorithm that is more competitive than the best previous randomized algorithm due to Irani. Our algorithm uses randomness only d ..."
Abstract

Cited by 44 (2 self)
 Add to MetaCart
(Show Context)
We prove upper and lower bounds on the competitiveness of randomized algorithms for the list update problem of Sleator and Tarjan. We give a simple and elegant randomized algorithm that is more competitive than the best previous randomized algorithm due to Irani. Our algorithm uses randomness only during an initialization phase, and from then on runs completely deterministically. It is the first randomized competitive algorithm with this property to beat the deterministic lower bound. We generalize our approach to a model in which access costs are fixed but update costs are scaled by an arbitrary constant d. We prove lower bounds for deterministic list update algorithms and for randomized algorithms against oblivious and adaptive online adversaries. In particular, we show that for this problem adaptive online and adaptive offline adversaries are equally powerful. 1 Introduction Recently much attention has been given to competitive analysis of online algorithms [7, 20, 22, 25]. Ro...
Refreshing the sky: the compressed skycube with efficient support for frequent updates
 In SIGMOD
, 2006
"... The skyline query is important in many applications such as multicriteria decision making, data mining, and userpreference queries. Given a set of ddimensional objects, the skyline query finds the objects that are not dominated by others. In practice, different users may be interested in different ..."
Abstract

Cited by 39 (0 self)
 Add to MetaCart
(Show Context)
The skyline query is important in many applications such as multicriteria decision making, data mining, and userpreference queries. Given a set of ddimensional objects, the skyline query finds the objects that are not dominated by others. In practice, different users may be interested in different dimensions of the data, and issue queries on any subset of d dimensions. This paper focuses on supporting concurrent and unpredictable subspace skyline queries in frequent updated databases. Simply to compute and store the skyline objects of every subspace in a skycube will incur expensive update cost. In this paper, we investigate the important issue of updating the skycube in a dynamic environment. To balance the query cost and update cost, we propose a new structure, the compressed skycube, which concisely represents the complete skycube. We thoroughly explore the properties of the compressed skycube and provide an efficient objectaware update scheme. Experimental results show that the compressed skycube is both query and update efficient. 1.
Orthogonal Range Searching on the RAM, Revisited
, 2011
"... We present a number of new results on one of the most extensively studied topics in computational geometry, orthogonal range searching. All our results are in the standard word RAM model: 1. We present two data structures for 2d orthogonal range emptiness. The first achieves O(n lg lg n) space and ..."
Abstract

Cited by 38 (8 self)
 Add to MetaCart
(Show Context)
We present a number of new results on one of the most extensively studied topics in computational geometry, orthogonal range searching. All our results are in the standard word RAM model: 1. We present two data structures for 2d orthogonal range emptiness. The first achieves O(n lg lg n) space and O(lg lg n) query time, assuming that the n given points are in rank space. This improves the previous results by Alstrup, Brodal, and Rauhe (FOCS’00), with O(n lg ε n) space and O(lg lg n) query time, or with O(n lg lg n) space and O(lg 2 lg n) query time. Our second data structure uses O(n) space and answers queries in O(lg ε n) time. The best previous O(n)space data structure, due to Nekrich (WADS’07), answers queries in O(lg n / lg lg n) time. 2. We give a data structure for 3d orthogonal range reporting with O(n lg 1+ε n) space and O(lg lg n+ k) query time for points in rank space, for any constant ε> 0. This improves the previous results by Afshani (ESA’08), Karpinski and Nekrich (COCOON’09), and Chan (SODA’11), with O(n lg 3 n) space and O(lg lg n + k) query time, or with O(n lg 1+ε n) space and O(lg 2 lg n + k) query time. Consequently, we obtain improved upper bounds for orthogonal range reporting in all constant dimensions above 3.
Algorithms and Analyses for Maximal Vector Computation
"... The maximal vector problem is to identify the maximals over a collection of vectors. This arises in many contexts and, as such, has been well studied. The problem recently gained renewed attention with skyline queries for relational databases and with work to develop skyline algorithms that are exte ..."
Abstract

Cited by 35 (0 self)
 Add to MetaCart
(Show Context)
The maximal vector problem is to identify the maximals over a collection of vectors. This arises in many contexts and, as such, has been well studied. The problem recently gained renewed attention with skyline queries for relational databases and with work to develop skyline algorithms that are external and relationally well behaved. While many algorithms have been proposed, how they perform has been unclear. We study the performance of, and design choices behind, these algorithms. We prove runtime bounds based on the number of vectors n and the dimensionality k. Early algorithms based on divideandconquer established seemingly good average and worstcase asymptotic runtimes. In fact, the problem can be solved in O(n) averagecase (holding k as fixed). We prove, however, that the performance is quite bad with respect to k. We demonstrate that the more recent skyline algorithms are better behaved, and can also achieve O(kn) averagecase. While k matters for these, in practice, its effect vanishes in the asymptotic. We introduce a new external algorithm, LESS, that is more efficient and better behaved. We evaluate LESS’s effectiveness and improvement over the field, and prove that its averagecase running time is O(kn). 1
Scalable Skyline Computation Using Objectbased Space Partitioning
"... The skyline operator returns from a set of multidimensional objects a subset of superior objects that are not dominated by others. This operation is considered very important in multiobjective analysis of large datasets. Although a large number of skyline methods have been proposed, the majority o ..."
Abstract

Cited by 23 (2 self)
 Add to MetaCart
The skyline operator returns from a set of multidimensional objects a subset of superior objects that are not dominated by others. This operation is considered very important in multiobjective analysis of large datasets. Although a large number of skyline methods have been proposed, the majority of them focuses on minimizing the I/O cost. However, in high dimensional spaces, the problem can easily become CPUbound due to the large number of computations required for comparing objects with current skyline points while scanning the database. Based on this observation, we propose a dynamic indexing technique for skyline points that can be integrated into stateoftheart sortbased skyline algorithms to boost their computational performance. The new indexing and dominance checking approach is supported by a theoretical analysis, while our experiments show that it scales well with the input size and dimensionality not only because unnecessary dominance checks are avoided but also because it allows efficient dominance checking with the help of bitwise operations.
SelfOrganizing Data Structures
 In
, 1998
"... . We survey results on selforganizing data structures for the search problem and concentrate on two very popular structures: the unsorted linear list, and the binary search tree. For the problem of maintaining unsorted lists, also known as the list update problem, we present results on the competit ..."
Abstract

Cited by 21 (0 self)
 Add to MetaCart
(Show Context)
. We survey results on selforganizing data structures for the search problem and concentrate on two very popular structures: the unsorted linear list, and the binary search tree. For the problem of maintaining unsorted lists, also known as the list update problem, we present results on the competitiveness achieved by deterministic and randomized online algorithms. For binary search trees, we present results for both online and offline algorithms. Selforganizing data structures can be used to build very effective data compression schemes. We summarize theoretical and experimental results. 1 Introduction This paper surveys results in the design and analysis of selforganizing data structures for the search problem. The general search problem in pointer data structures can be phrased as follows. The elements of a set are stored in a collection of nodes. Each node also contains O(1) pointers to other nodes and additional state data which can be used for navigation and selforganizati...
More OutputSensitive Geometric Algorithms (Extended Abstract)
 In Proc. 35th Annu. IEEE Sympos. Found. Comput. Sci
, 1994
"... A simple idea for speeding up the computation of extrema of a partially ordered set turns out to have a number of interesting applications in geometric algorithms; the resulting algorithms generally replace an appearance of the input size n in the running time by an output size A n. In particular, ..."
Abstract

Cited by 20 (0 self)
 Add to MetaCart
A simple idea for speeding up the computation of extrema of a partially ordered set turns out to have a number of interesting applications in geometric algorithms; the resulting algorithms generally replace an appearance of the input size n in the running time by an output size A n. In particular, the A coordinatewise minima of a set of n points in R d can be found by an algorithm needing O(nA) time. Given n points uniformly distributed in the unit square, the algorithm needs n + O(n 5=8 ) point comparisons on average. Given a set of n points in R d , another algorithm can find its A extreme points in O(nA) time. Thinning for nearestneighbor classification can be done in time O(n log n) P i A i n i , finding the A i irredundant points among n i points for each class i, where n = P i n i is the total number of input points. This sharpens a more obvious O(n 3 ) algorithm, which is also given here. Another algorithm is given that needs O(n) space to compute the convex ...