Results 1 - 10 of 13
A Clustering Algorithm based on Graph Connectivity
 Information Processing Letters
, 1999
Cited by 102 (3 self)
We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques.
Cluster Analysis for Gene Expression Data: A Survey
 IEEE Transactions on Knowledge and Data Engineering
, 2004
Cited by 84 (4 self)
Abstract—DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which are essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than to the points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and new algorithms have recently been proposed specifically for gene expression data. These clustering algorithms have proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. In particular, we divide cluster analysis for gene expression data into three categories. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches. We also discuss the problem of cluster validation in three aspects and review various methods to assess the quality and reliability of clustering results. Finally, we conclude this paper and suggest promising trends in this field.
Index Terms—Microarray technology, gene expression data, clustering.
An Algorithm for Clustering cDNAs for Gene Expression Analysis
 In RECOMB99: Proceedings of the Third Annual International Conference on Computational Molecular Biology
, 1999
Cited by 45 (4 self)
We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques. A similarity graph is defined and clusters in that graph correspond to highly connected subgraphs. A polynomial algorithm to compute them efficiently is presented. Our algorithm produces a clustering with some provably good properties. The application that motivated this study was gene expression analysis, where a collection of cDNAs must be clustered based on their oligonucleotide fingerprints. The algorithm has been tested intensively on simulated libraries and was shown to outperform extant methods. It demonstrated robustness to high noise levels. In a blind test on real cDNA fingerprint data the algorithm obtained very good results. Utilizing the results of the algorithm would have saved over 70% of the cDNA sequencing cost on that data set. 1 Introduction Cluster analysis seeks grouping of data elements into subsets, so that elements in the same subset are in some sense more cl...
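The abstract above equates clusters with highly connected subgraphs of a similarity graph but does not spell out the procedure. The following is only a rough sketch of one way such an idea can be realized, under assumptions that are not taken from the abstract: a graph on n vertices is treated as "highly connected" when its minimum edge cut exceeds n/2, the graph is split recursively along a minimum cut otherwise, and the minimum cut is found with the Stoer-Wagner algorithm. The helper names (`build`, `subgraph`, `min_cut`, `hcs`) are hypothetical.

```python
def build(edges):
    """Build an undirected, unit-weight adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {})[v] = 1
        adj.setdefault(v, {})[u] = 1
    return adj


def subgraph(adj, keep):
    """Induced subgraph on the vertex set `keep`."""
    return {u: {v: w for v, w in adj[u].items() if v in keep} for u in keep}


def min_cut(adj):
    """Stoer-Wagner global minimum cut.

    Returns (cut_weight, vertices_on_one_side). Needs at least two vertices.
    """
    g = {u: dict(nb) for u, nb in adj.items()}   # working copy that gets contracted
    groups = {u: {u} for u in g}                 # original vertices inside each supernode
    best_w, best_side = float("inf"), set()
    while len(g) > 1:
        # One "minimum cut phase": grow a set by repeatedly adding the
        # vertex most tightly connected to it.
        start = next(iter(g))
        order = [start]
        w = {v: g[start].get(v, 0) for v in g if v != start}
        while w:
            nxt = max(w, key=w.get)
            cut_of_phase = w.pop(nxt)
            order.append(nxt)
            for v, wt in g[nxt].items():
                if v in w:
                    w[v] += wt
        s, t = order[-2], order[-1]
        if cut_of_phase < best_w:
            best_w, best_side = cut_of_phase, set(groups[t])
        # Contract t into s.
        for v, wt in g[t].items():
            if v != s:
                g[s][v] = g[s].get(v, 0) + wt
                g[v][s] = g[s][v]
            del g[v][t]
        del g[t]
        groups[s] |= groups.pop(t)
    return best_w, best_side


def hcs(adj):
    """Recursively split until every part is highly connected (assumed rule: cut > n/2)."""
    n = len(adj)
    if n <= 1:
        return [set(adj)]
    cut, side = min_cut(adj)
    if cut > n / 2:
        return [set(adj)]
    other = set(adj) - side
    return hcs(subgraph(adj, side)) + hcs(subgraph(adj, other))


# Two 4-cliques joined by a single bridge edge (3, 4).
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
clusters = hcs(build(clique_a + clique_b + [(3, 4)]))
```

On this toy graph the bridge is the only cut of weight 1, so the sketch separates the two cliques, each of which has minimum cut 3 > 4/2 and is kept whole.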
Open Access
Cited by 1 (0 self)
Full open access to this and thousands of other papers at
A Study of Bioinspired Algorithm to Data Clustering using Different Distance Measures
Data mining is the process of extracting previously unknown and valid information from large databases. Clustering is an important data analysis and data mining method. It is the unsupervised classification of objects into clusters such that objects from the same cluster are similar and objects from different clusters are dissimilar. Data clustering is a difficult unsupervised learning problem because many factors, such as distance measures, criterion functions, and initial conditions, come into play. Many algorithms have been proposed in the literature. However, some traditional algorithms have drawbacks, such as being sensitive to initialization and easily trapped in local optima. Recently, bioinspired algorithms such as ant colony algorithms (ACO) and particle swarm optimization algorithms (PSO) have found success in solving clustering problems. These algorithms have also been used in several other real-life applications. They are global optimization techniques. Distance-based algorithms have been studied for clustering problems. This paper provides a study of the particle swarm optimization algorithm applied to data clustering using different distance measures, including Euclidean, Manhattan, and Chebyshev, on well-known real-life benchmark medical data sets and an artificially generated data set. The PSO-based clustering algorithm using the Chebyshev distance measure achieves a better fitness value than those using the Euclidean and Manhattan distance measures.
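For reference, the three distance measures named in this abstract can be written down directly. The sketch below is illustrative only: the `fitness` function (sum of each point's distance to its nearest centroid) is an assumed, commonly used clustering objective, not necessarily the fitness function of the paper, and all names are hypothetical.

```python
import math


def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def fitness(points, centroids, dist):
    """Assumed objective: total distance from each point to its nearest centroid."""
    return sum(min(dist(p, c) for c in centroids) for p in points)


points = [(3, 4), (10, 10)]
centroids = [(0, 0), (10, 10)]
scores = {d.__name__: fitness(points, centroids, d)
          for d in (euclidean, manhattan, chebyshev)}
# scores == {"euclidean": 5.0, "manhattan": 7, "chebyshev": 4}
```

Because the Chebyshev distance between two points is never larger than the Euclidean distance, which is never larger than the Manhattan distance, raw fitness values computed under different measures are on different scales and are not directly comparable without care.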
POPULATIONS WHEN CLASS MEMBERSHIP IS UNKNOWN: DEFINING AND DEVELOPING THE LATENT CLASSIFICATION DIFFERENTIAL CHANGE MODEL
, 2005
by Kenneth Kelley III Standard methods for analyzing change generally assume that the population of interest is homogeneous or that heterogeneity is known. When a population consists of unknown subpopulations, the parameters within each of the latent classes may be unique to that particular class. In such a situation the results of standard techniques for analyzing change are misleading, because such methods ignore unobserved heterogeneity and treat the population as if it were homogeneous. The growth mixture model (GMM; Muthén, 2001a; Muthén, 2001b; Muthén, 2002) partly addresses the problem of unknown heterogeneity because the parameters of the GMM are conditional on latent class membership. However, the GMM is necessarily restricted to models of change linear in their parameters (such as polynomial change models). The latent classification
Invariant Hierarchical Clustering Schemes
Summary. A general parametric scheme of hierarchical clustering procedures with invariance under monotone transformations of similarity values and invariance under numeration of objects is described. This scheme consists of two steps: correction of the given similarity values between objects and transitive closure of the obtained valued relation. Some theoretical properties of the considered scheme are studied. Different parametric classes of clustering procedures from this scheme, based on perceptions like “keep similarity classes,” “break bridges between clusters,” etc., are considered. Several examples are used to illustrate the application of the proposed clustering procedures to the analysis of similarity structures of data.
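The second step of the scheme, transitive closure of a valued relation, can be illustrated with a small sketch. It assumes the max-min form of transitivity (the abstract does not state which form is used) and omits the correction step entirely; the function name `maxmin_closure` is hypothetical.

```python
def maxmin_closure(sim):
    """Max-min transitive closure of a square similarity matrix.

    After closure, T[i][j] >= min(T[i][k], T[k][j]) for all i, j, k.
    Computed with a Warshall-style triple loop.
    """
    n = len(sim)
    t = [row[:] for row in sim]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                t[i][j] = max(t[i][j], min(t[i][k], t[k][j]))
    return t


sim = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
]
closed = maxmin_closure(sim)
# closed[0][2] rises from 0.2 to 0.6 via the path 0 -> 1 -> 2.


def square(x):
    """An increasing transformation on [0, 1]."""
    return x * x


# Because only max and min are used, applying a monotone increasing
# transformation before or after the closure gives the same matrix.
before = maxmin_closure([[square(x) for x in row] for row in sim])
after = [[square(x) for x in row] for row in closed]
```

Thresholding the closed matrix at decreasing levels yields nested partitions, which is one way a hierarchy of clusters can be read off such a relation.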
Fall Term
, 2004
Any reasonably large group of individuals, families, states, and parties exhibits the phenomenon of subgroup formation within the group, such that the members of each subgroup have a strong connection or bond with each other. The reasons for the formation of these subgroups, which we call alliances, differ in different situations, such as kinship and friendship (in the case of individuals), common economic interests (for both individuals and states), common political interests, and geographical proximity. This structure of alliances is not only prevalent in social networks, but it is also an important characteristic of similarity networks of natural and unnatural objects. (A similarity network defines the links between two objects based on their similarities.) Discovery of such structure in a data set is called clustering or unsupervised learning, and the ability to do it automatically is desirable for many applications in the areas of pattern recognition, computer vision, artificial intelligence, behavioral and social sciences, life sciences, earth sciences, medicine, and information theory. In this dissertation, we study a graph theoretical model of alliances where an alliance of the vertices of a graph is a set of vertices in the graph, such that every vertex in the set