Results 1 - 10 of 556
Approximate Clustering of Fingerprint Vectors with Missing Values
 In Proc. 11th Computing: The Australasian Theory Symposium (CATS), volume 41 of CRPIT, 2005
"... We study the problem of clustering fingerprints with at most p missing values (CMV(p) for short) naturally arising in oligonucleotide fingerprinting, which is an efficient method for characterizing DNA clone libraries. ..."
Cited by 2 (0 self)
Parallel Preconditioning with Sparse Approximate Inverses
 SIAM J. Sci. Comput., 1996
"... A parallel preconditioner is presented for the solution of general sparse linear systems of equations. A sparse approximate inverse is computed explicitly, and then applied as a preconditioner to an iterative method. The computation of the preconditioner is inherently parallel, and its application o ..."
Cited by 226 (10 self)
only requires a matrix-vector product. The sparsity pattern of the approximate inverse is not imposed a priori but captured automatically. This keeps the amount of work and the number of nonzero entries in the preconditioner to a minimum. Rigorous bounds on the clustering of the eigenvalues
Practical privacy: the SuLQ framework
 In PODS ’05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 2005
"... We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping ..."
Cited by 223 (35 self)
analysis to real-valued functions f and arbitrary row types, as a consequence greatly improving the bounds on noise required for privacy. Second, we examine the computational power of the SuLQ primitive. We show that it is very powerful indeed, in that slightly noisy versions of the following computations
Clustering binary fingerprint vectors with missing values for DNA array data analysis
 Journal of Computational Biology, 2003
"... Oligonucleotide fingerprinting is a powerful DNA array based method to characterize cDNA and ribosomal RNA gene (rDNA) libraries and has many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the metho ..."
Cited by 10 (1 self)
in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most of the existing approaches to clustering use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are much more emphasized
Clustering aggregation
 in ICDE 2005, 2005
"... We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with the given clusterings. This problem, clustering aggregation, appears naturally in various contexts. For example, clustering categorical data is an instance of the problem: each cat ..."
Cited by 109 (1 self)
for handling missing values. We give a formal statement of the clustering-aggregation problem, we discuss related work, and we suggest a number of algorithms. For several of the methods we provide theoretical guarantees on the quality of the solutions. We also show how sampling can be used to scale
Performance Evaluation of Some Clustering Algorithms and Validity Indices
 IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
"... In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely Davies-Bouldin index, Dunn’s index, Calinski-Harabasz index, and a recently de ..."
Cited by 110 (2 self)
developed index I. Based on a relation between the index I and Dunn’s index, a lower bound of the value of the former is theoretically estimated in order to get a unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering
Fuzzy K-means clustering with missing values
 Proc AMIA Symp, 2001
"... Fuzzy K-means clustering algorithm is a popular approach for exploring the structure of a set of patterns, especially when the clusters are overlapping or fuzzy. However, the fuzzy K-means clustering algorithm cannot be applied when the data contain missing values. In many cases, the number of patte ..."
Cited by 6 (0 self)
A local search approximation algorithm for k-means clustering
2004
"... In k-means clustering we are given a set of n data points in d-dimensional space ℜ^d and an integer k, and the problem is to determine a set of k points in ℜ^d, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are kno ..."
Cited by 113 (1 self)
fixed number of swaps achieves an approximation factor of at least (9 − ε) in all sufficiently high dimensions. Thus, our approximation factor is almost tight for algorithms based on performing a fixed number of swaps. To establish the practical value of the heuristic, we present an empirical study
Distributed optimization in sensor networks
 In 3rd Int. Symp. on Information Processing in Sensor Networks (IPSN’04), 2004
"... Wireless sensor networks are capable of collecting an enormous amount of data over space and time. Often, the ultimate objective is to derive an estimate of a parameter or function from these data. This paper investigates a general class of distributed algorithms for “in-network” data processing, e ..."
Cited by 141 (2 self)
that for a broad class of estimation problems the distributed algorithms converge to within an ɛ-ball around the globally optimal value. Furthermore, bounds on the number of incremental steps required for a particular level of accuracy provide insight into the tradeoff between estimation performance
Random Sampling for Histogram Construction: How much is enough?
1998
"... Random sampling is a standard technique for constructing (approximate) histograms for query optimization. However, any real implementation in commercial products requires solving the hard problem of determining "How much sampling is enough?" We address this critical question in the context ..."
Cited by 126 (14 self)
establishing an optimal bound on the amount of sampling required for prespecified error bounds. We also describe an adaptive page sampling algorithm which achieves greater efficiency by using all values in a sampled page but adjusts the amount of sampling depending on clustering of values in pages. Next, we