A Clustering Algorithm based on Graph Connectivity
 Information Processing Letters
, 1999
Cited by 99 (3 self)
We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques.
Cluster Analysis for Gene Expression Data: A Survey
 IEEE Transactions on Knowledge and Data Engineering
, 2004
Cited by 81 (4 self)
DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than the points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and new algorithms have recently been proposed specifically for gene expression data. These clustering algorithms have proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. In particular, we divide cluster analysis for gene expression data into three categories. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches. We also discuss the problem of cluster validation in three aspects and review various methods to assess the quality and reliability of clustering results. Finally, we conclude this paper and suggest the promising trends in this field.
Index Terms—Microarray technology, gene expression data, clustering.
An Algorithm for Clustering cDNAs for Gene Expression Analysis
 In RECOMB99: Proceedings of the Third Annual International Conference on Computational Molecular Biology
, 1999
Cited by 45 (4 self)
We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques. A similarity graph is defined and clusters in that graph correspond to highly connected subgraphs. A polynomial algorithm to compute them efficiently is presented. Our algorithm produces a clustering with some provably good properties. The application that motivated this study was gene expression analysis, where a collection of cDNAs must be clustered based on their oligonucleotide fingerprints. The algorithm has been tested intensively on simulated libraries and was shown to outperform extant methods. It demonstrated robustness to high noise levels. In a blind test on real cDNA fingerprint data the algorithm obtained very good results. Utilizing the results of the algorithm would have saved over 70% of the cDNA sequencing cost on that data set. 1 Introduction Cluster analysis seeks grouping of data elements into subsets, so that elements in the same subset are in some sense more cl...
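The abstract above says that clusters correspond to highly connected subgraphs of a similarity graph but does not spell out the procedure. The following is a minimal sketch under stated assumptions, not the authors' implementation: a graph on n vertices is treated as highly connected when its minimum cut exceeds n/2, and any other graph is split along a minimum cut and processed recursively. The minimum cut is computed with the Stoer-Wagner algorithm, which is a choice made here for illustration. The names `min_cut`, `subgraph`, `hcs`, and `graph_from_edges` are hypothetical, and edges are unweighted.

```python
from itertools import combinations


def min_cut(adj):
    """Stoer-Wagner global minimum cut of an undirected graph.

    adj maps each vertex to a dict {neighbor: weight}.
    Returns (cut_weight, side), where side is one shore of the cut.
    """
    g = {u: dict(nbrs) for u, nbrs in adj.items()}  # working copy
    merged = {u: {u} for u in g}  # original vertices inside each super-vertex
    best_weight, best_side = float("inf"), set()
    while len(g) > 1:
        # One phase: repeatedly add the vertex most tightly connected to the grown set.
        start = next(iter(g))
        conn = {v: g[start].get(v, 0) for v in g if v != start}
        order = [start]
        while conn:
            nxt = max(conn, key=conn.get)
            last_weight = conn.pop(nxt)
            order.append(nxt)
            for v, w in g[nxt].items():
                if v in conn:
                    conn[v] += w
        s, t = order[-2], order[-1]
        # The cut separating t from everything else is the "cut of the phase".
        if last_weight < best_weight:
            best_weight, best_side = last_weight, set(merged[t])
        # Merge t into s, summing parallel edge weights.
        for v, w in g[t].items():
            if v != s:
                g[s][v] = g[s].get(v, 0) + w
                g[v][s] = g[s][v]
            del g[v][t]
        del g[t]
        merged[s] |= merged.pop(t)
    return best_weight, best_side


def subgraph(adj, keep):
    """Induced subgraph on the vertex set `keep`."""
    return {u: {v: w for v, w in adj[u].items() if v in keep} for u in keep}


def hcs(adj):
    """Recursively split until every part is highly connected (assumed: cut > n/2)."""
    n = len(adj)
    if n == 1:
        return [set(adj)]  # singletons are returned as their own groups here
    cut, side = min_cut(adj)
    if cut > n / 2:
        return [set(adj)]
    other = set(adj) - side
    return hcs(subgraph(adj, side)) + hcs(subgraph(adj, other))


def graph_from_edges(edges):
    """Build an unweighted adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {})[v] = 1
        adj.setdefault(v, {})[u] = 1
    return adj


# Illustrative input: two 4-cliques joined by a single bridge edge (3, 4).
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
clusters = hcs(graph_from_edges(edges))
```

In this toy input the bridge is the only weak link, so the sketch separates the two cliques. How singletons are handled afterwards is left open here, since the abstract does not describe it.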
Seer: Predictive File Hoarding for Disconnected Mobile Operation
, 1997
Dissertation by Geoffrey H. Kuenning, Doctor of Philosophy in Computer Science, University of California, Los Angeles, 1997. Professor Gerald J. Popek, Cochair; Professor Wesley W. Chu, Cochair. Because of the limited storage space available on portable computers, disconnected mobile users must restrict their work to a subset of the files available on their network. The list of files needed to accomplish useful work is large, nonintuitive, and constantly changing. Selecting a subset by hand is difficult, time-consuming, and error-prone, suggesting that an automated solution is desirable. Our thesis is that it is possible and practical to automate the process of choosing files to be stored on a portable computer. To validate this thesis, we conducted a preliminary study in a live business environment, which demonstrated that the approach was feasible. We then developed a new metric, semantic distance...
Bioinformatics
, 2003
Selection of significant genes via expression patterns is an important problem in microarray experiments. Owing to small sample size and the large number of variables (genes), the selection process can be unstable. This paper proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables to specialize the model to a regression setting and use a Bayesian mixture prior to perform the variable selection. We control the size of the model by assigning a prior distribution over the dimension (number of significant genes) of the model. The posterior distributions of the parameters are not in explicit form, so we use a combination of truncated sampling and Markov Chain Monte Carlo (MCMC) based computation techniques to simulate the parameters from the posteriors. The Bayesian model is flexible enough to identify significant genes as well as to perform future predictions. The method is applied to cancer classification via cDNA microarrays where the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify a set of significant genes. The method is also applied successfully to the leukemia data.
POPULATIONS WHEN CLASS MEMBERSHIP IS UNKNOWN: DEFINING AND DEVELOPING THE LATENT CLASSIFICATION DIFFERENTIAL CHANGE MODEL
, 2005
By Kenneth Kelley III. Standard methods for analyzing change generally assume that the population of interest is homogeneous or that heterogeneity is known. When a population consists of unknown subpopulations, the parameters within each of the latent classes may be unique to that particular class. In such a situation the results of standard techniques for analyzing change are misleading, because such methods ignore unobserved heterogeneity and treat the population as if it were homogeneous. The growth mixture model (GMM; Muthén, 2001a; Muthén, 2001b; Muthén, 2002) partly addresses the problem of unknown heterogeneity because the parameters of the GMM are conditional on latent class membership. However, the GMM is necessarily restricted to models of change linear in their parameters (such as polynomial change models). The latent classification
Invariant Hierarchical Clustering Schemes
Summary. A general parametric scheme of hierarchical clustering procedures with invariance under monotone transformations of similarity values and invariance under numeration of objects is described. This scheme consists of two steps: correction of the given similarity values between objects, and transitive closure of the resulting valued relation. Some theoretical properties of the considered scheme are studied. Different parametric classes of clustering procedures from this scheme, based on perceptions like “keep similarity classes,” “break bridges between clusters,” etc., are considered. Several examples illustrate the application of the proposed clustering procedures to the analysis of similarity structures of data.
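The second step named in the summary, transitive closure of a valued relation, can be illustrated with a small sketch. It assumes the max-min form of closure (the closed similarity of two objects is the best "weakest link" over all chains connecting them), which the summary does not state explicitly, and it omits the similarity-correction step entirely. The function names `maxmin_closure` and `clusters_at` and the example matrix are hypothetical.

```python
def maxmin_closure(sim):
    """Max-min transitive closure of a symmetric similarity matrix.

    Uses a Floyd-Warshall style pass: a chain through k is as strong as its
    weakest link, and each pair keeps the strongest chain found.
    """
    n = len(sim)
    c = [row[:] for row in sim]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                c[i][j] = max(c[i][j], min(c[i][k], c[k][j]))
    return c


def clusters_at(closure, level):
    """Group objects whose closed similarity is at least `level`."""
    n = len(closure)
    seen, groups = set(), []
    for i in range(n):
        if i not in seen:
            group = {j for j in range(n) if closure[i][j] >= level}
            seen |= group
            groups.append(group)
    return groups


# Illustrative similarity values only.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.6, 0.1],
    [0.2, 0.6, 1.0, 0.3],
    [0.1, 0.1, 0.3, 1.0],
]
closure = maxmin_closure(sim)
groups = clusters_at(closure, 0.5)

# Because only max and min are used, an increasing transformation of the
# similarities (squaring, on [0, 1]) commutes with the closure.
squared = [[x * x for x in row] for row in sim]
closure_of_squared = maxmin_closure(squared)
```

Cutting the closed relation at successive levels yields a nested hierarchy of groups, and the final lines show why such a procedure depends only on the ordering of similarity values, not on their magnitudes.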
A Study of Bioinspired Algorithm to Data Clustering using Different Distance Measures
Data mining is the process of extracting previously unknown and valid information from large databases. Clustering is an important data analysis and data mining method. It is the unsupervised classification of objects into clusters such that objects from the same cluster are similar and objects from different clusters are dissimilar. Data clustering is a difficult unsupervised learning problem because many factors, such as distance measures, criterion functions, and initial conditions, come into play. Many algorithms have been proposed in the literature. However, some traditional algorithms have drawbacks such as sensitivity to initialization and a tendency to become trapped in local optima. Recently, bioinspired algorithms such as ant colony optimization (ACO) and particle swarm optimization (PSO) have found success in solving clustering problems. These algorithms, which are global optimization techniques, have also been used in several other real-life applications. Distance-based algorithms have been studied for clustering problems. This paper studies the particle swarm optimization algorithm for data clustering using different distance measures, including Euclidean, Manhattan, and Chebyshev, on well-known real-life benchmark medical data sets and an artificially generated data set. The PSO-based clustering algorithm using the Chebyshev distance measure achieves a better fitness value than those using the Euclidean and Manhattan distance measures.
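The three distance measures named in the abstract are standard and can be written down directly. The fitness function below is an assumption made for illustration, since the abstract does not define the one used in the paper: it scores a candidate set of centroids (what a PSO particle would encode) by the mean distance from each point to its nearest centroid. The names `fitness`, `points`, and `centroids` are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Chessboard distance: largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def fitness(points, centroids, dist):
    """Assumed fitness: mean distance of each point to its nearest centroid.

    Lower is better. A PSO particle would encode `centroids`; the swarm
    update itself is not shown here.
    """
    return sum(min(dist(p, c) for c in centroids) for p in points) / len(points)


# Illustrative data only.
points = [(0.0, 0.0), (3.0, 4.0)]
centroids = [(0.0, 0.0)]
scores = {d.__name__: fitness(points, centroids, d)
          for d in (euclidean, manhattan, chebyshev)}
```

Note that fitness values computed under different measures are on different scales, so comparing them directly across measures needs care.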