Results 1  10
of
544,619
ModelBased Clustering, Discriminant Analysis, and Density Estimation
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2000
"... Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little ..."
Abstract

Cited by 557 (28 self)
 Add to MetaCart
for modelbased clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, mineeld detection, cluster
Approximate Signal Processing
, 1997
"... It is increasingly important to structure signal processing algorithms and systems to allow for trading off between the accuracy of results and the utilization of resources in their implementation. In any particular context, there are typically a variety of heuristic approaches to managing these tra ..."
Abstract

Cited by 516 (2 self)
 Add to MetaCart
these tradeoffs. One of the objectives of this paper is to suggest that there is the potential for developing a more formal approach, including utilizing current research in Computer Science on Approximate Processing and one of its central concepts, Incremental Refinement. Toward this end, we first summarize a
The space complexity of approximating the frequency moments
 JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1996
"... The frequency moments of a sequence containing mi elements of type i, for 1 ≤ i ≤ n, are the numbers Fk = �n i=1 mki. We consider the space complexity of randomized algorithms that approximate the numbers Fk, when the elements of the sequence are given one by one and cannot be stored. Surprisingly, ..."
Abstract

Cited by 855 (12 self)
 Add to MetaCart
The frequency moments of a sequence containing mi elements of type i, for 1 ≤ i ≤ n, are the numbers Fk = �n i=1 mki. We consider the space complexity of randomized algorithms that approximate the numbers Fk, when the elements of the sequence are given one by one and cannot be stored. Surprisingly
OPTICS: Ordering Points To Identify the Clustering Structure
, 1999
"... Cluster analysis is a primary method for database mining. It is either used as a standalone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of ..."
Abstract

Cited by 511 (49 self)
 Add to MetaCart
Cluster analysis is a primary method for database mining. It is either used as a standalone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all
PseudoRandom Generation from OneWay Functions
 PROC. 20TH STOC
, 1988
"... Pseudorandom generators are fundamental to many theoretical and applied aspects of computing. We show howto construct a pseudorandom generator from any oneway function. Since it is easy to construct a oneway function from a pseudorandom generator, this result shows that there is a pseudorandom gene ..."
Abstract

Cited by 887 (22 self)
 Add to MetaCart
Pseudorandom generators are fundamental to many theoretical and applied aspects of computing. We show howto construct a pseudorandom generator from any oneway function. Since it is easy to construct a oneway function from a pseudorandom generator, this result shows that there is a pseudorandom
A Guided Tour to Approximate String Matching
 ACM COMPUTING SURVEYS
, 1999
"... We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining t ..."
Abstract

Cited by 584 (38 self)
 Add to MetaCart
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems.
Estimating the Support of a HighDimensional Distribution
, 1999
"... Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propo ..."
Abstract

Cited by 766 (29 self)
 Add to MetaCart
propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length
GPFS: A SharedDisk File System for Large Computing Clusters
 In Proceedings of the 2002 Conference on File and Storage Technologies (FAST
, 2002
"... GPFS is IBM's parallel, shareddisk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community ove ..."
Abstract

Cited by 518 (3 self)
 Add to MetaCart
GPFS is IBM's parallel, shareddisk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community
Results 1  10
of
544,619