Applications of Resampling Methods to Estimate the Number of Clusters and to Improve the Accuracy of a Clustering Method
, 2001
Cited by 111 (0 self)
The burgeoning field of genomics, and in particular microarray experiments, has revived interest in both discriminant and cluster analysis by raising new methodological and computational challenges. The present paper discusses applications of resampling methods to problems in cluster analysis. A resampling method, known as bagging in discriminant analysis, is applied to increase clustering accuracy and to assess the confidence of cluster assignments for individual observations. A novel prediction-based resampling method is also proposed to estimate the number of clusters, if any, in a dataset. The performance of the proposed and existing methods is compared using simulated data and gene expression data from four recently published cancer microarray studies.
Consensus clustering -- A resampling-based method for class discovery and visualization of gene expression microarray data
- MACHINE LEARNING, FUNCTIONAL GENOMICS SPECIAL ISSUE
, 2003
Resampling Method For Unsupervised Estimation Of Cluster Validity
- Neural Computation
, 2001
Cited by 56 (3 self)
We introduce a method for validation of results obtained by clustering analysis of data. The method is based on resampling the available data. A figure of merit that measures the stability of clustering solutions against resampling is introduced. Clusters which are stable against resampling give rise to local maxima of this figure of merit. This is presented first for a one-dimensional data set, for which an analytic approximation for the figure of merit is derived and compared with numerical measurements. Next, the applicability of the method is demonstrated for higher dimensional data, including gene microarray expression data.
Dynamic Profiling of Online Auctions Using Curve Clustering”, Working
, 2003
Cited by 8 (6 self)
Electronic commerce, and in particular online auctions, has surged in popularity in recent years. While auction theory has long been studied from a game-theory perspective, the electronic implementation of the auction mechanism poses new and challenging research questions. Although the body of empirical research on online auctions is growing, these data have rarely been treated from a modern statistical point of view. In this work, we present a new source of rich auction data and introduce an innovative way of modelling and analyzing online bidding behavior. In particular, we use functional data analysis to investigate online auction dynamics. We describe the structure of such data and suggest suitable methods, including data smoothing and curve clustering, that allow one to profile online auctions and display different bidding behavior. We illustrate the methods on a set of eBay auction data and tie our results to the existing literature on online auctions.
Key words and phrases: functional data analysis, smoothing, penalized splines, clustering, unsupervised
A novel approach for clustering proteomics data using Bayesian fast Fourier transform
- Bioinformatics
, 2005
T.: Consensus clustering
- Machine Learning 52 (2003) 91–118 Functional Genomics Special Issue
Cited by 1 (0 self)
A resampling-based method for class discovery and visualization of gene expression microarray data
An a contrario approach to hierarchical clustering validity assessment
, 1647
In this paper we present a method to detect natural groups in a data set, based on hierarchical clustering. A measure of the meaningfulness of clusters, derived from a background model assuming no class structure in the data, provides a way to compare clusters, and leads to a cluster validity criterion. This criterion is applied to every cluster in the nested structure. While all clusters passing the validity test are meaningful in themselves, the set of all of them will probably provide a redundant data representation. By selecting a subset of the meaningful clusters, a good data representation, which also discards outliers, can be achieved. The strategy we propose combines a new merging criterion (also derived from the background model) with a selection of local maxima of the meaningfulness with respect to inclusion, in the nested hierarchical structure.
IJDAR DOI 10.1007/s10032-009-0089-5 ORIGINAL PAPER
When searching for blogs on a specific topic, information seekers prefer blogs that place a central focus on that topic over blogs whose mention of the topic is diffuse or incidental. In order to present users with better blog feed search results, we developed a measure of topical consistency that is able to capture whether or not a blog is topically focused. The measure, called the coherence score, is inspired by the genetics literature and captures the tightness of the clustering structure of a data set relative to a background collection. In a set of experiments on synthetic data, the coherence score is shown to provide a faithful reflection of topic clustering structure. The properties that make the coherence score more appropriate than lexical cohesion, a common measure of topical structure, are discussed. Retrieval experiments show that integrating the coherence score as a prior in a language modeling-based approach to blog feed search improves retrieval effectiveness. The coherence score must, however, be used judiciously in order to avoid boosting the ranking of irrelevant but topically focused blogs. To this end, we experiment with a series of weighting schemes that adjust the contribution of the coherence score according ...
This paper is a revised and extended version of [19].

