
A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes

by P Baldi, AD Long
Venue: Bioinformatics

Results 1 - 10 of 117

Linear models and empirical Bayes methods for assessing differential expression in microarray experiments

by Gordon K. Smyth - Stat. Appl. Genet. Mol. Biol., 2004
Cited by 201 (3 self)

Cluster Analysis for Gene Expression Data: A Survey

by Daxin Jiang, Chun Tang, Aidong Zhang - IEEE Transactions on Knowledge and Data Engineering, 2004
Cited by 48 (3 self)
Abstract—DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increases the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than the points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and also new algorithms have recently been proposed specifically aiming at gene expression data. These clustering algorithms have been proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. In particular, we divide cluster analysis for gene expression data into three categories. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches. We also discuss the problem of cluster validation in three aspects and review various methods to assess the quality and reliability of clustering results. Finally, we conclude this paper and suggest the promising trends in this field. 
Index Terms—Microarray technology, gene expression data, clustering.
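
The abstract above describes clustering as partitioning data so that points within a group are more similar to each other than to points in other groups. A minimal sketch of that idea is shown below using k-means, chosen here only for illustration; the survey covers many algorithms and this is not claimed to be the one it recommends. The function name `kmeans`, the first-k initialisation, and the toy `profiles` are all assumptions.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Partition points into k groups by alternating assignment and centroid update."""
    # Assumption: initialise centroids with the first k points (deterministic, for illustration).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical expression profiles over three conditions:
# genes 0 and 2 rise, genes 1 and 3 fall.
profiles = [(0.1, 1.0, 2.1), (2.0, 1.0, 0.0), (0.0, 1.1, 2.0), (2.1, 0.9, 0.1)]
labels, centroids = kmeans(profiles, k=2)
```

With these toy profiles the two rising genes should share one label and the two falling genes the other. Real analyses would typically use a different distance (for example, correlation-based) and a less naive initialisation.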

Fundamentals of cDNA Microarray Data Analysis

by Yuk Fai Leung, Duccio Cavalieri - Trends Genet, 2003
Cited by 32 (0 self)
genomics research. The multi-step, data-intensive nature of this technology has created an unprecedented informatics and analytical challenge. It is important to understand the crucial steps that can affect the outcome of the analysis. In this review, we provide an overview of the contemporary trend on various main analysis steps in the microarray data analysis process, which includes experimental design, data standardization, image acquisition and analysis, normalization, statistical significance inference, exploratory data analysis, class prediction and pathway analysis, as well as various considerations relevant to their implementation. The development of microarray technology has been phenomenal in the past few years. It has become a standard tool in many genomics research laboratories. The reason for this popularity is that microarrays have revolutionized the approach to biological research. Instead of working on a gene-by-gene basis, scientists can now study tens of thousands of genes at once. Unfortunately, they are often daunted and confused by the complexity of data analyses. Although it is advisable to collaborate with statisticians and mathematicians on performing a proper data analysis, it is crucial to understand the fundamentals of data analysis. In this review, we explain these fundamentals step-by-step (Figure 1; Table 1). Instead of discussing any particular analysis software, we focus primarily on the rationale behind the analysis processes and the key factors that affect the quality of the result. For a compilation of current microarray analysis software see a recent article [1] and author’s website

Use of within-array replicate spots for assessing differential expression in microarray experiments

by Gordon K. Smyth, Joëlle Michaud, Hamish S. Scott - Bioinformatics, 2005
Cited by 30 (0 self)
Motivation. Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. Usual practice is to average the duplicate or triplicate results for each probe before assessing differential expression. This loses valuable information about gene-wise variability. Results. A method is proposed for extracting more information from within-array replicate spots in microarray experiments by estimating the strength of the correlation between them. The method involves fitting separate linear models to the expression data for each gene but with a common value for the between-replicate correlation. The method greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes. The method may be combined with empirical Bayes methods for moderating the genewise variances between genes. The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spiked-in RNA. Comparing results for calibration and ratio control spots shows that the common correlation method results in substantially better discrimination of differentially expressed genes from those which are not. The spike-in experiment also confirms that the results may be further improved by empirical Bayes smoothing of the variances when the sample size is small. Availability. The methodology is implemented in the limma software package for R, available from the CRAN repository
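
The abstract mentions moderating the genewise variances with empirical Bayes methods. One commonly used form of such moderation is a weighted average of each gene's sample variance and a prior variance, weighted by their degrees of freedom. The sketch below shows only that shrinkage step; it is not the common-correlation method of the paper, and the names `moderated_variance`, `prior_var`, and `prior_df` are assumptions. In an empirical Bayes analysis the prior values would be estimated from all genes, whereas here they are simply supplied.

```python
def moderated_variance(sample_var, df, prior_var, prior_df):
    """Shrink a genewise sample variance toward a prior variance.

    The result is a degrees-of-freedom-weighted average, so genes with few
    replicates (small df) are pulled more strongly toward prior_var.
    """
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)


# Hypothetical gene with an implausibly small variance from 3 replicates (df = 2).
shrunk = moderated_variance(sample_var=0.0, df=2, prior_var=1.0, prior_df=2)
```

Setting `prior_df` to 0 returns the ordinary sample variance, which makes the effect of the prior easy to check.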

Optimal Sample Size for Multiple Testing: the Case of Gene Expression Microarrays

by Peter Müller, Giovanni Parmigiani, Christian Robert, Judith Rousseau - Journal of the American Statistical Association, 2004
Cited by 30 (1 self)
We consider the choice of an optimal sample size for multiple comparison problems. The motivating application is the choice of the number of microarray experiments to be carried out when learning about differential gene expression. However, the approach is valid in any application that involves multiple comparison in a large number of hypothesis tests.

Pattern Recognition Techniques in Microarray Data Analysis: A Survey

by Faramarz Valafar - Annals of the New York Academy of Sciences, Techniques in Bioinformatics and Medical Informatics, 2002
Cited by 21 (0 self)
Abstract: Recent development of technologies (e.g. microarray technology) that are capable of producing massive amounts of genetic data has highlighted the need for new pattern recognition techniques that can mine and discover “biologically meaningful” knowledge in large data sets. Many researchers have begun an endeavor in this direction to devise such data mining techniques. As such, there is a need for survey articles that periodically review and summarize the work that has been done in the area. This article presents one such survey. The first portion of the paper is meant to provide the basic biology (mostly for non-biologists) that is required in such a project. This part is only meant to be a starting point for those experts in the technical fields who wish to embark on this new area of bioinformatics. The second portion of the paper is a survey of various data mining techniques that have been used in mining microarray data for biological knowledge and information (such as sequence information). This survey is not meant to be treated as complete in any form, as the area is currently one of the most active, and the body of research is very large. Furthermore, the applications of the techniques mentioned here are not meant to be taken as the most significant applications of the techniques, but simply as some examples among many.

Practical Approaches to Analyzing Results of Microarray Experiments

by Naftali Kaminski, Nir Friedman
Cited by 19 (5 self)
this article we provide a practically oriented review focusing on methods for analysis of large-scale gene expression data in the research laboratory. We describe the various common clustering methods and outline our approach to using them. We discuss methods for scoring genes for their relevance, focusing on the statistical meaning of microarray results, especially with regard to the problem of multiple testing. We also deal with the problem of adding biologic meaning to the results of microarray experiments and describe advanced tools that represent different but valid directions in providing automated solutions to this problem. The tools and approaches described and discussed here should provide the reader with a preliminary understanding of the analysis of the results of microarray experiments. The practical focus of this review should remove the mystery behind the analysis of microarray experiments, thus leading to more productive and efficient use of the technology. Microarray technology is rapidly becoming a standard technique used in research laboratories all across the world. In essence, all the variants of the technology allow simultaneous profiling of the expression levels of tens of thousands of genes, potentially whole genomes in a single experiment (1--3). This unique power provides scientists with an opportunity to look at the transcriptional profile of biologic systems, processes, and diseases in an unbiased fashion. The relative ease (despite the prohibitive cost) of performing microarray experiments in molecular laboratory settings, combined with the potential power of the technology, have captured the imagination of scientists in academic and industry research institutes. This combination of ease of use with unforeseen power also appealed to adminis...

Bayesian robust inference for differential gene expression in microarrays with multiple samples

by Raphael Gottardo, Adrian E. Raftery, Ka Yee Yeung, Roger E. Bumgarner - Biometrics, 2006
Cited by 17 (3 self)
We consider the problem of identifying differentially expressed genes under different conditions using cDNA microarrays. Standard statistical methods cannot be used because typically there are thousands of genes and few replicates. Because of the many steps involved in the experimental process, from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an outlying data value could occur because of scratches or dust on the surface, imperfections in the glass, or imperfections in the array production. We develop a robust Bayesian hierarchical model for testing for differential expression. Outliers are modeled explicitly using a t-distribution. The model includes an exchangeable prior for the variances which allow different variances for the genes but still shrink extreme empirical variances. Our model can be used for testing for differentially expressed genes among multiple samples, and can distinguish between the different possible patterns of differential expression when there are three or more samples. Parameter estimation is carried out using a novel version of Markov Chain Monte Carlo that is appropriate when the model puts mass on subspaces of the full parameter space. The method is illustrated using two publicly available

Next station in microarray data analysis: GEPAS

by David Montaner, Joaquín Tárraga, Jaime Huerta-Cepas, Jordi Burguet, Juan M. Vaquerizas, Lucía Conde, Pablo Minguez, Javier Vera, Sach Mukherjee, Joan Valls, Miguel A. G. Pujana, Eva Alloza, Javier Herrero, Fátima Al-Shahrour, Joaquín Dopazo - Nucleic Acids Res, 2006
Cited by 16 (9 self)
The Gene Expression Profile Analysis Suite (GEPAS) has been running for more than four years. During this time it has evolved to keep pace with the new interests and trends in the still changing world of microarray data analysis. GEPAS has been designed to provide an intuitive although powerful web-based interface that offers diverse analysis options from the early step of preprocessing (normalization of Affymetrix and two-colour microarray experiments and other preprocessing options), to the final step of the functional annotation of the experiment (using Gene Ontology, pathways, PubMed abstracts etc.), and include different possibilities for clustering, gene selection, class prediction and array-comparative genomic hybridization management. GEPAS is extensively used by researchers of many countries and its records indicate an average usage rate of 400 experiments per day. The web-based pipeline for microarray gene expression data, GEPAS, is available at

Distribution Patterns of Over-Represented k-mers in Non-Coding Yeast DNA

by Steven Hampson, Dennis Kibler, Pierre Baldi
Cited by 14 (3 self)
Motivation: Over-represented k-mers in genomic DNA regions are often of particular biological interest. For example, over-represented k-mers in co-regulated families of genes are associated with the DNA binding sites of transcription factors. To measure over-representation, we introduce a statistical background model based on single-mismatches, and apply it to the pooled 500bp ORF upstream regions of yeast. More importantly, we investigate the context and spatial distribution of overrepresented k-mers in yeast upstream regions.
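
The abstract above measures over-representation of k-mers against a background model. As a rough sketch of the general idea, the code below counts k-mers and compares each observed count with the count expected if bases were independent with the sequence's own base frequencies. This simple background is an assumption made for illustration and is not the single-mismatch model the authors introduce; the function names `kmer_counts` and `over_representation` are likewise hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def over_representation(seq, k):
    """Return observed/expected ratios for each k-mer seen in seq.

    Expected counts assume independent bases drawn with the base
    frequencies of seq itself (an illustrative background only).
    """
    counts = kmer_counts(seq, k)
    base_freq = {base: n / len(seq) for base, n in Counter(seq).items()}
    positions = len(seq) - k + 1
    ratios = {}
    for kmer, observed in counts.items():
        expected = positions
        for base in kmer:
            expected *= base_freq[base]
        ratios[kmer] = observed / expected
    return ratios


# Hypothetical toy sequence; real input would be pooled upstream regions.
counts = kmer_counts("ACGACG", 3)
ratios = over_representation("ACGACG", 3)
```

A ratio well above 1 flags a k-mer as more frequent than this naive background predicts; a proper analysis would also need a significance measure and a more realistic background.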