Results 1 - 10
of
26
Cluster Analysis for Gene Expression Data: A Survey
- IEEE Transactions on Knowledge and Data Engineering
, 2004
"... Abstract—DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity f ..."
Abstract
-
Cited by 48 (3 self)
- Add to MetaCart
Abstract—DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increases the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than the points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and also new algorithms have recently been proposed specifically aiming at gene expression data. These clustering algorithms have been proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. In particular, we divide cluster analysis for gene expression data into three categories. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches. We also discuss the problem of cluster validation in three aspects and review various methods to assess the quality and reliability of clustering results. Finally, we conclude this paper and suggest the promising trends in this field. Index Terms—Microarray technology, gene expression data, clustering.
CLICK and EXPANDER: a system for clustering and visualizing gene expression data
- Bioinformatics
, 2003
"... Motivation: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar exp ..."
Abstract
-
Cited by 42 (6 self)
- Add to MetaCart
Motivation: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns. Results: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all those applications it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms.
Clustering objects on subsets of attributes
- Journal of the Royal Statistical Society
, 2004
"... Proofs subject to correction. Not to be reproduced without permission. Confidential until read to the Society. Contributions to the discussion must not exceed 400 words. Contributions longer than 400 words will be cut by the editor. ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
Proofs subject to correction. Not to be reproduced without permission. Confidential until read to the Society. Contributions to the discussion must not exceed 400 words. Contributions longer than 400 words will be cut by the editor.
Pattern Recognition Techniques in Microarray Data Analysis: A Survey. Annals of the New York Academy of Sciences
- of Sciences, techniques in Bioinformatics and Medical Informatics
, 2002
"... analysis Abstract: Recent development of technologies (e.g. microarray technology) that are capable of producing massive amounts of genetic data has highlighted the need for new pattern recognition techniques that can mine and discover “biologically meaningful ” knowledge in large data sets. Many re ..."
Abstract
-
Cited by 21 (0 self)
- Add to MetaCart
analysis Abstract: Recent development of technologies (e.g. microarray technology) that are capable of producing massive amounts of genetic data has highlighted the need for new pattern recognition techniques that can mine and discover “biologically meaningful ” knowledge in large data sets. Many researchers have begun an endeavor in this direction to devise such datamining techniques. As such, there is a need for survey articles that periodically review and summarize the work that has been done in the area. This article presents one such survey. The first portion of the paper is meant to provide the basic biology (mostly for non-biologists) that is required in such a project. This part is only meant to be a starting point for those experts in the technical fields who wish to embark on this new area of bioinformatics. The second portion of the paper is a survey of various data mining techniques that have been used in mining microarray data for biological knowledge and information (such as sequence information). This survey is not meant to be treated as complete in any form, as the area is currently one of the most active, and the body of research is very large. Furthermore, the applications of the techniques mentioned here are not meant to be taken as the most significant applications of the techniques, but simply as some examples among many. Molecular Genome Biology
Functional Bioinformatics of Microarray Data: From Expression to Regulation
, 2002
"... Microarrays are a powerful technique to monitor the expression of thousands of genes in a single experiment. From series of such experiments, it is possible identify the mechanisms that govern the activation of genes in an organism. Short DNA patterns (called binding sites) in or around the genes se ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
Microarrays are a powerful technique to monitor the expression of thousands of genes in a single experiment. From series of such experiments, it is possible identify the mechanisms that govern the activation of genes in an organism. Short DNA patterns (called binding sites) in or around the genes serve as switches that control gene expression. As a result similar patterns of expression can correspond to similar binding site patterns. We integrate clustering of coexpressed genes with the discovery of binding motifs. We overview several important clustering techniques and present a clustering algorithm (called adaptive quality-based clustering), which we have developed to address several shortcomings of existing methods. We overview the dierent techniques for motif nding, in particular the technique of Gibbs sampling, and we present several extension of this technique in our Motif Sampler. Finally, we present an integrated web tool called INCLUSive (http://www.esat.kuleuven.ac.be/ ~dna/BioI/Software.html) that allows the easy analysis of microarray data for motif nding.
Gene Clustering Based on Clusterwide Mutual Information
- of the school of engineering. Michel Verleysen was born in 1965 in Belgium. He received the M.S. and Ph.D. degrees in electrical engineering from the Université catholique de Louvain (Belgium) in 1987 and 1992, respectively. He was an Invited Professor at
, 2004
"... Cluster analysis of gene-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and constructing gene regulatory networks. The motivation for considering mutual information is its capacity to measure a ge ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
Cluster analysis of gene-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and constructing gene regulatory networks. The motivation for considering mutual information is its capacity to measure a general dependence among gene random variables. We propose a novel clustering strategy based on minimizing mutual information among gene clusters. Simulated annealing is employed to solve the optimization problem. Bootstrap techniques are employed to get more accurate estimates of mutual information when the data sample size is small. Moreover, we propose to combine the mutual information criterion and traditional distance criteria such as the Euclidean distance and the fuzzy membership metric in designing the clustering algorithm. The performances of the new clustering methods are compared with those of some existing methods, using both synthesized data and experimental data. It is seen that the clustering algorithm based on a combined metric of mutual information and fuzzy membership achieves the best performance. The supplemental material is available at www.gspsnap.tamu.edu/gspweb/zxb/glioma_zxb.
Analysis of Microarray Gene Expression Data
- in ‘Handbook of Statistical Genetics’, 2nd edn
, 2003
"... This article reviews the methods utilized in processing and analysis of gene expression data generated using DNA microarrays. This type of experiment allows to determine relative levels of mRNA abundance in a set of tissues or cell populations for thousands of genes simultaneously. Naturally, suc ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
This article reviews the methods utilized in processing and analysis of gene expression data generated using DNA microarrays. This type of experiment allows to determine relative levels of mRNA abundance in a set of tissues or cell populations for thousands of genes simultaneously. Naturally, such an experiment requires computational and statistical analysis techniques. At the outset of the processing pipeline, the computational procedures are largely determined by the technology and experimental setup that are used. Subsequently, as more reliable intensity values for genes emerge, pattern discovery methods come into play. The most striking peculiarity of this kind of data is that one usually obtains measurements for thousands of genes for only a much smaller number of conditions. This is at the root of several of the statistical questions discussed here.
A Mixed Factors Model for Dimension Reduction and Extraction of a Group Structure in Gene Expression Data
"... When we cluster tissue samples on the basis of genes, the number of observations to be grouped is much smaller than the dimension of feature vector. In such a case, the applicability of conventional model-based clustering is limited since the high dimensionality of feature vector leads to overfittin ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
When we cluster tissue samples on the basis of genes, the number of observations to be grouped is much smaller than the dimension of feature vector. In such a case, the applicability of conventional model-based clustering is limited since the high dimensionality of feature vector leads to overfitting during the density estimation process. To overcome such difficulty, we attempt a methodological extension of the factor analysis. Our approach enables us not only to prevent from the occurrence of overfitting, but also to handle the issues of clustering, data compression and extracting a set of genes to be relevant to explain the group structure. The potential usefulness are demonstrated with the application to the leukemia dataset. 1.
Statistical methods for microarray assays
- J Appl Genet
, 2002
"... Abstract: The paper shortly reviews statistical methods used in the area of DNA microarray studies. All stages of the experiment are taken into account: planning, data collection, data preprocessing, analysis and validation. Among the methods of data analysis, the algorithms for estimating different ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Abstract: The paper shortly reviews statistical methods used in the area of DNA microarray studies. All stages of the experiment are taken into account: planning, data collection, data preprocessing, analysis and validation. Among the methods of data analysis, the algorithms for estimating differential expression, multivariate approaches, clustering methods, as well as classification and discrimination are reviewed. The need is stressed for routine statistical data processing protocols and for the search of links of microarray data analysis with quantitative genetic models. Key words: data analysis, data collection, DNA microarrays, planning experiments, statistical methods, validation.
Pattern Recognition Techniques in Microarray Data Analysis
- Proc Int Conf on Mathematics and Engineering Techniques in Medicine and Biological Sciences 2002 (METMBS’02), Las Vegas
, 2002
"... Abstract: The explosion in the production rate of expression level data has highlighted the need for automated techniques that help scientists analyze, understand, and cluster (sequence) the enormous amount of data that is being produced. Example of such problems are analyzing gene expression level ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
Abstract: The explosion in the production rate of expression level data has highlighted the need for automated techniques that help scientists analyze, understand, and cluster (sequence) the enormous amount of data that is being produced. Example of such problems are analyzing gene expression level data produced by microarray technology on the genomic scale, sequencing genes on the genomic scale, sequencing proteins and amino acids, etc. In this paper we review some of the pattern recognition techniques that have been used in this area.

