## Linear models and empirical Bayes methods for assessing differential expression in microarray experiments (2004)

Venue: Stat. Appl. Genet. Mol. Biol.

Citations: 1319 (24 self)

### Citations

2481 | Significance analysis of microarrays applied to the ionizing radiation response
- Tusher, Tibshirani, et al.
- 2001
Citation Context ...statistics. The idea of using a t-statistic with a Bayesian adjusted denominator was also proposed by Baldi and Long (2001) who developed the useful cyberT program. Their work was limited though to two-sample control versus treatment designs and their model did not distinguish between differentially and non-differentially expressed genes. They also did not develop consistent estimators for the hyperparameters. The degrees of freedom associated with the prior distribution of the variances was set to a default value while the prior variance was simply equated to locally pooled sample variances. Tusher et al (2001), Efron et al (2001) and Broberg (2003) have used t statistics with offset standard deviations. This is similar in principle to the moderated t-statistics used here but the offset t-statistics are not motivated by a model and do not have an associated distributional theory. Tusher et al (2001) estimated the offset by minimizing a coefficient of variation while Efron et al (2001) used a percentile of the distribution of sample standard deviations. Broberg (2003) considered the two sample problem and proposed a computationally intensive method of determining the offset by minimizing a combinatio... |

774 | Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection
- Li, Wong
- 2001
Citation Context ...a set of n microarrays yielding a response vector $y_g^T = (y_{g1}, \ldots, y_{gn})$ for the $g$th gene. The responses will usually be log-ratios for two-color data or log-intensities for single channel data, although other transformations are possible. The responses are assumed to be suitably normalized to remove dye-bias and other technological artifacts; see for example Huber et al (2002) or Smyth and Speed (2003). In the case of high density oligonucleotide arrays, the probes are assumed to have been normalized to produce an expression summary, represented here as $y_{gi}$, for each gene on each array as in Li and Wong (2001) or Irizarry et al (2003). We assume that $E(y_g) = X\alpha_g$ where $X$ is a design matrix of full column rank and $\alpha_g$ is a coefficient vector. We assume $\mathrm{var}(y_g) = W_g \sigma_g^2$ where $W_g$ is a known non-negative definite weight matrix. The vector $y_g$ may contain missing values and the matrix $W_g$ may contain diagonal weights which are zero. Certain contrasts of the coefficients are assumed to be of biological interest and these are defined by $\beta_g = C^T \alpha_g$. We assume that it is of interest to test whether individual contrast values $\beta_{gj}$ are equal to zero. For example, with design (d) above the experimenter might want... |
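
The per-gene linear model quoted in this context, with expectation Xα_g and contrasts β_g = C^T α_g, can be illustrated with a small sketch. It is not the paper's or Limma's implementation: it assumes unit weights (W_g = I), no missing values, a made-up two-group design and invented log-intensities, and the helper names `fit_gene` and `contrasts` are hypothetical.

```python
# Sketch only: ordinary least squares for a single gene, followed by a
# contrast of the fitted coefficients. Unit weights are assumed and the
# data are invented for illustration.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a x = b for a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_gene(X, y):
    """Return alpha_hat from the normal equations X'X alpha = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * v for x, v in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

def contrasts(C, alpha):
    """Return beta = C^T alpha, with C stored as coefficients x contrasts."""
    return [sum(c * a for c, a in zip(col, alpha)) for col in transpose(C)]

# Hypothetical design: three arrays per group, one column per group mean.
X = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
y = [1.0, 1.2, 0.8, 2.0, 2.3, 1.7]   # invented log-intensities for one gene
C = [[-1], [1]]                       # contrast: group 2 minus group 1

alpha_hat = fit_gene(X, y)            # estimated group means
beta_hat = contrasts(C, alpha_hat)    # estimated difference between groups
```

In a real analysis the same design matrix would be reused for every gene, which is what makes the gene-wise fits cheap to compute in bulk.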

772 | Limma: linear models for microarray data. In Bioinformatics and Computational Biology Solutions using R
- Smyth
- 2005
Citation Context ...proved test statistics. In many gene discovery experiments for which microarrays are used the primary aim is to rank the genes in order of evidence against $H_0$ rather than to assign absolute p-values (Smyth et al, 2003). This is because only a limited number of genes may be followed up for further study regardless of the number which are significant. Even when the above distributional assumptions fail for a given d... |

492 | Empirical Bayes analysis of a microarray experiment
- Efron, Tibshirani, et al.
- 2001
Citation Context ...alysis of microarray experiments for the amount of multiple testing, perhaps by controlling the familywise error rate or the false discovery rate, even though this reduces the power available to detect changes in expression for individual genes (Ge et al, 2002). On the other hand, the parallel nature of the inference in microarrays allows some compensating possibilities for borrowing information from the ensemble of genes which can assist in inference about each gene individually. One way that this can be done is through the application of Bayes or empirical Bayes methods (Efron, 2001, 2003). Efron et al (2001) used a non-parametric empirical Bayes approach for the analysis of factorial data with high density oligonucleotide microarray data. This approach has much potential but can be difficult to apply in practical situations especially by less experienced practitioners. Lonnstedt and Speed (2002), considering replicated two-color microarray experiments, took instead a parametric empirical Bayes approach using a simple mixture of normal models and a conjugate prior and derived a pleasingly simple expression for the posterior odds of differential expression for each gene. The posterior odds express... |

491 | A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes
- Baldi, Long
- 2001
Citation Context ...the posterior odds of reducing the number of hyperparameters which need to be estimated under the hierarchical model; in particular, knowledge of the non-null prior for the fold changes is not required. The moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom. The moderated t inferential approach extends to accommodate tests involving two or more contrasts through the use of moderated F-statistics. The idea of using a t-statistic with a Bayesian adjusted denominator was also proposed by Baldi and Long (2001) who developed the useful cyberT program. Their work was limited though to two-sample control versus treatment designs and their model did not distinguish between differentially and non-differentially expressed genes. They also did not develop consistent estimators for the hyperparameters. The degrees of freedom associated with the prior distribution of the variances was set to a default value while the prior variance was simply equated to locally pooled sample variances. Tusher et al (2001), Efron et al (2001) and Broberg (2003) have used t statistics with offset standard deviations. This is ... |
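
As a rough illustration of the "Bayesian adjusted denominator" and "augmented degrees of freedom" mentioned in this context, the moderated t-statistic is commonly written as an ordinary t-statistic whose sample variance is replaced by a posterior variance shrunk toward a prior value. The sketch below assumes the prior variance `s0_sq` and prior degrees of freedom `d0` are simply supplied; estimating them from the ensemble of genes is not shown, and every number is invented.

```python
import math

def moderated_t(beta_hat, s2, df_resid, v, s0_sq, d0):
    """Moderated t for one gene and one contrast (sketch, hyperparameters given).

    beta_hat : estimated contrast
    s2       : residual sample variance for the gene
    df_resid : residual degrees of freedom
    v        : unscaled variance factor of the contrast estimate
    s0_sq,d0 : assumed prior variance and prior degrees of freedom
    """
    # Posterior variance: weighted average of the prior and sample variances.
    s2_post = (d0 * s0_sq + df_resid * s2) / (d0 + df_resid)
    t_mod = beta_hat / math.sqrt(s2_post * v)
    # The reference t-distribution has the augmented degrees of freedom.
    return t_mod, d0 + df_resid

def ordinary_t(beta_hat, s2, v):
    """Ordinary t-statistic, for comparison."""
    return beta_hat / math.sqrt(s2 * v)

# Invented example: a gene with an unusually small sample variance.
# v = 1/3 + 1/3 corresponds to a difference of two group means of size 3.
t_mod, df_total = moderated_t(beta_hat=1.0, s2=0.01, df_resid=4,
                              v=2 / 3, s0_sq=0.05, d0=4)
t_ord = ordinary_t(beta_hat=1.0, s2=0.01, v=2 / 3)
```

With these made-up values the small sample variance is pulled up toward the prior, so the moderated statistic is less extreme than the ordinary one while being referred to a t-distribution with more degrees of freedom.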

470 | Summaries of Affymetrix GeneChip probe level data
- Irizarry, Bolstad, et al.
Citation Context ... yielding a response vector $y_g^T = (y_{g1}, \ldots, y_{gn})$ for the $g$th gene. The responses will usually be log-ratios for two-color data or log-intensities for single channel data, although other transformations are possible. The responses are assumed to be suitably normalized to remove dye-bias and other technological artifacts; see for example Huber et al (2002) or Smyth and Speed (2003). In the case of high density oligonucleotide arrays, the probes are assumed to have been normalized to produce an expression summary, represented here as $y_{gi}$, for each gene on each array as in Li and Wong (2001) or Irizarry et al (2003). We assume that $E(y_g) = X\alpha_g$ where $X$ is a design matrix of full column rank and $\alpha_g$ is a coefficient vector. We assume $\mathrm{var}(y_g) = W_g \sigma_g^2$ where $W_g$ is a known non-negative definite weight matrix. The vector $y_g$ may contain missing values and the matrix $W_g$ may contain diagonal weights which are zero. Certain contrasts of the coefficients are assumed to be of biological interest and these are defined by $\beta_g = C^T \alpha_g$. We assume that it is of interest to test whether individual contrast values $\beta_{gj}$ are equal to zero. For example, with design (d) above the experimenter might want to make all the pairwise... |

360 | Analysis of variance for gene expression microarray data.
- Kerr, Martin, et al.
- 2001
Citation Context ...) give a review of test statistics for differential expression for microarray experiments. Newton et al (2001), Newton and Kendziorski (2003) and Kendziorski et al (2003) have considered empirical Bayes models for expression based on gamma and log-normal distributions. Other authors have used Bayesian methods for other purposes in microarray data analysis. Ibrahim et al (2002) for example propose Bayesian models with correlated priors to model gene expression and to classify between normal and tumor tissues. Other approaches to linear models for microarray data analysis have been described by Kerr et al (2000), Jin et al (2001), Wolfinger et al (2001), Chu et al (2002), Yang and Speed (2003) and Lonnstedt et al (2003). Kerr et al (2000) propose a single linear model for an entire microarray experiment whereas in this paper a separate linear model is fitted for each gene. The single linear model approach assumes all equal variances across genes whereas the current paper is designed to accommodate different variances. Jin et al (2001) and Wolfinger et al (2001) fit separate models for each gene but model the individual channels of two color microarray data requiring the use of mixed linear models to... |

289 | Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18 Suppl 1 - Huber, Heydebreck, et al. - 2002 |

265 | On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data.
- Newton, Kendziorski, et al.
- 2001
Citation Context ...odel and do not have an associated distributional theory. Tusher et al (2001) estimated the offset by minimizing a coefficient of variation while Efron et al (2001) used a percentile of the distribution of sample standard deviations. Broberg (2003) considered the two sample problem and proposed a computationally intensive method of determining the offset by minimizing a combination of estimated false positive and false negative rates over a grid of significance levels and offsets. Cui and Churchill (2003) give a review of test statistics for differential expression for microarray experiments. Newton et al (2001), Newton and Kendziorski (2003) and Kendziorski et al (2003) have considered empirical Bayes models for expression based on gamma and log-normal distributions. Other authors have used Bayesian methods for other purposes in microarray data analysis. Ibrahim et al (2002) for example propose Bayesian models with correlated priors to model gene expression and to classify between normal and tumor tissues. Other approaches to linear models for microarray data analysis have been described by Kerr et al (2000), Jin et al (2001), Wolfinger et al (2001), Chu et al (2002), Yang and Speed (2003) and Lonn... |

242 | Normalization of cDNA microarray data
- Smyth, Speed
- 2003
Citation Context ...ations are absent. For such microarrays, design matrices can be formed exactly as in classical linear model practice from the biological factors underlying the experimental layout. In general we assume that we have a set of n microarrays yielding a response vector $y_g^T = (y_{g1}, \ldots, y_{gn})$ for the $g$th gene. The responses will usually be log-ratios for two-color data or log-intensities for single channel data, although other transformations are possible. The responses are assumed to be suitably normalized to remove dye-bias and other technological artifacts; see for example Huber et al (2002) or Smyth and Speed (2003). In the case of high density oligonucleotide arrays, the probes are assumed to have been normalized to produce an expression summary, represented here as $y_{gi}$, for each gene on each array as in Li and Wong (2001) or Irizarry et al (2003). We assume that $E(y_g) = X\alpha_g$ where $X$ is a design matrix of full column rank and $\alpha_g$ is a coefficient vector. We assume $\mathrm{var}(y_g) = W_g \sigma_g^2$ where $W_g$ is a known non-negative definite weight matrix. The vector $y_g$ may contain missing values and the matrix $W_g$ may contain diagonal weights which are zero. Certain contrasts of the coefficients are assumed to be of biologic... |

217 | Assessing gene significance from cDNA microarray expression data via mixed models.
- Wolfinger, Gibson, et al.
- 2001
Citation Context ...r differential expression for microarray experiments. Newton et al (2001), Newton and Kendziorski (2003) and Kendziorski et al (2003) have considered empirical Bayes models for expression based on gamma and log-normal distributions. Other authors have used Bayesian methods for other purposes in microarray data analysis. Ibrahim et al (2002) for example propose Bayesian models with correlated priors to model gene expression and to classify between normal and tumor tissues. Other approaches to linear models for microarray data analysis have been described by Kerr et al (2000), Jin et al (2001), Wolfinger et al (2001), Chu et al (2002), Yang and Speed (2003) and Lonnstedt et al (2003). Kerr et al (2000) propose a single linear model for an entire microarray experiment whereas in this paper a separate linear model is fitted for each gene. The single linear model approach assumes all equal variances across genes whereas the current paper is designed to accommodate different variances. Jin et al (2001) and Wolfinger et al (2001) fit separate models for each gene but model the individual channels of two color microarray data requiring the use of mixed linear models to accommodate the correlation between obser... |

209 | Replicated microarray data. - Lonnstedt, Speed - 2002 |

195 | Statistical tests for differential expression in cDNA microarray experiments. Genome Biology 4
- Cui, Churchill
- 2003
Citation Context ... similar in principle to the moderated t-statistics used here but the offset t-statistics are not motivated by a model and do not have an associated distributional theory. Tusher et al (2001) estimated the offset by minimizing a coefficient of variation while Efron et al (2001) used a percentile of the distribution of sample standard deviations. Broberg (2003) considered the two sample problem and proposed a computationally intensive method of determining the offset by minimizing a combination of estimated false positive and false negative rates over a grid of significance levels and offsets. Cui and Churchill (2003) give a review of test statistics for differential expression for microarray experiments. Newton et al (2001), Newton and Kendziorski (2003) and Kendziorski et al (2003) have considered empirical Bayes models for expression based on gamma and log-normal distributions. Other authors have used Bayesian methods for other purposes in microarray data analysis. Ibrahim et al (2002) for example propose Bayesian models with correlated priors to model gene expression and to classify between normal and tumor tissues. Other approaches to linear models for microarray data analysis have been described by K... |

161 | Experimental design for gene expression microarrays. - Kerr, Churchill - 2001 |

140 | The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster.
- Jin, Riley, et al.
- 2001
Citation Context ...test statistics for differential expression for microarray experiments. Newton et al (2001), Newton and Kendziorski (2003) and Kendziorski et al (2003) have considered empirical Bayes models for expression based on gamma and log-normal distributions. Other authors have used Bayesian methods for other purposes in microarray data analysis. Ibrahim et al (2002) for example propose Bayesian models with correlated priors to model gene expression and to classify between normal and tumor tissues. Other approaches to linear models for microarray data analysis have been described by Kerr et al (2000), Jin et al (2001), Wolfinger et al (2001), Chu et al (2002), Yang and Speed (2003) and Lonnstedt et al (2003). Kerr et al (2000) propose a single linear model for an entire microarray experiment whereas in this paper a separate linear model is fitted for each gene. The single linear model approach assumes all equal variances across genes whereas the current paper is designed to accommodate different variances. Jin et al (2001) and Wolfinger et al (2001) fit separate models for each gene but model the individual channels of two color microarray data requiring the use of mixed linear models to accommodate the c... |

118 | Normalization of cDNA microarray data. Methods - Smyth, Speed - 2003 |

115 | On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine.
- Kendziorski, Newton, et al.
- 2003
Citation Context ...y. Tusher et al (2001) estimated the offset by minimizing a coefficient of variation while Efron et al (2001) used a percentile of the distribution of sample standard deviations. Broberg (2003) considered the two sample problem and proposed a computationally intensive method of determining the offset by minimizing a combination of estimated false positive and false negative rates over a grid of significance levels and offsets. Cui and Churchill (2003) give a review of test statistics for differential expression for microarray experiments. Newton et al (2001), Newton and Kendziorski (2003) and Kendziorski et al (2003) have considered empirical Bayes models for expression based on gamma and log-normal distributions. Other authors have used Bayesian methods for other purposes in microarray data analysis. Ibrahim et al (2002) for example propose Bayesian models with correlated priors to model gene expression and to classify between normal and tumor tissues. Other approaches to linear models for microarray data analysis have been described by Kerr et al (2000), Jin et al (2001), Wolfinger et al (2001), Chu et al (2002), Yang and Speed (2003) and Lonnstedt et al (2003). Kerr et al (2000) propose a single linea... |

85 | A systematic statistical linear modeling approach to oligonucleotide array experiments.
- Chu, Weir, et al.
- 2002
Citation Context ...n for microarray experiments. Newton et al (2001), Newton and Kendziorski (2003) and Kendziorski et al (2003) have considered empirical Bayes models for expression based on gamma and log-normal distributions. Other authors have used Bayesian methods for other purposes in microarray data analysis. Ibrahim et al (2002) for example propose Bayesian models with correlated priors to model gene expression and to classify between normal and tumor tissues. Other approaches to linear models for microarray data analysis have been described by Kerr et al (2000), Jin et al (2001), Wolfinger et al (2001), Chu et al (2002), Yang and Speed (2003) and Lonnstedt et al (2003). Kerr et al (2000) propose a single linear model for an entire microarray experiment whereas in this paper a separate linear model is fitted for each gene. The single linear model approach assumes all equal variances across genes whereas the current paper is designed to accommodate different variances. Jin et al (2001) and Wolfinger et al (2001) fit separate models for each gene but model the individual channels of two color microarray data requiring the use of mixed linear models to accommodate the correlation between observations on the sam... |

82 | Statistical Issues in cDNA Microarray Data Analysis
- Yang, Speed
- 2003
Citation Context ...proved test statistics. In many gene discovery experiments for which microarrays are used the primary aim is to rank the genes in order of evidence against $H_0$ rather than to assign absolute p-values (Smyth et al, 2003). This is because only a limited number of genes may be followed up for further study regardless of the number which are significant. Even when the above distributional assumptions fail for a given d... |

76 | Microarray expression profiling identifies genes with altered expression in HDL deficient mice.
- Callow, Dudoit, et al.
- 2000
Citation Context ...ownregulated as expected. Several of the other genes are closely related to ApoAI. The top eight genes here have been confirmed to be differentially expressed in the knockout versus the control line (Callow et al, 2000). For these data the top eight genes stand out clearly from the other genes and all methods clearly separate these genes from the ... [Table 4: Top 15 genes from the ApoAI data] ... |

73 | Estimating the proportion of true null hypotheses, with application to DNA microarray data.
- Langaas, Lindqvist, et al.
- 2006
Citation Context ... $B_{gj}$. Even when not on the boundary, the estimator for $p_j$ is likely to be sensitive to the particular form of the prior distribution assumed for $\beta_{gj}$ and possibly also to dependence between the genes (Ferkingstad et al, 2003). A practical strategy to bypass these problems is to set the $p_j$ to values chosen by the user, perhaps $p_j = 0.01$ or some other small value. Since $v_{0j}^{1/2} \sigma_g$ is the standard deviation of the log-fold... |

53 | Statistical methods for ranking differentially expressed genes.
- Broberg
- 2003
Citation Context ...with a Bayesian adjusted denominator was also proposed by Baldi and Long (2001) who developed the useful cyberT program. Their work was limited though to two-sample control versus treatment designs and their model did not distinguish between differentially and non-differentially expressed genes. They also did not develop consistent estimators for the hyperparameters. The degrees of freedom associated with the prior distribution of the variances was set to a default value while the prior variance was simply equated to locally pooled sample variances. Tusher et al (2001), Efron et al (2001) and Broberg (2003) have used t statistics with offset standard deviations. This is similar in principle to the moderated t-statistics used here but the offset t-statistics are not motivated by a model and do not have an associated distributional theory. Tusher et al (2001) estimated the offset by minimizing a coefficient of variation while Efron et al (2001) used a percentile of the distribution of sample standard deviations. Broberg (2003) considered the two sample problem and proposed a computationally intensive method of determining the offset by minimizing a combination of estimated false positive and false... |

39 | The analysis of gene expression data: methods and software. - Parmigiani, Garrett, et al. - 2003 |

36 | Spot User’s Guide.
- Buckley
- 2000
Citation Context ...e microarrays used in this experiment were printed with 8448 probes (spots) including 768 control spots. The hybridized microarrays were scanned with an Axon scanner and SPOT image analysis software (Buckley, 2000) was used to capture red and green intensities for each spot. The data was normalized using print-tip loess normalization and between arrays scale normalization using the LIMMA package (Smyth, 2003).... |

36 | Bayesian models for gene expression with DNA microarray data.
- Ibrahim, Chen, et al.
- 2002
Citation Context ...sample problem and proposed a computationally intensive method of determining the offset by minimizing a combination of estimated false positive and false negative rates over a grid of significance levels and offsets. Cui and Churchill (2003) give a review of test statistics for differential expression for microarray experiments. Newton et al (2001), Newton and Kendziorski (2003) and Kendziorski et al (2003) have considered empirical Bayes models for expression based on gamma and log-normal distributions. Other authors have used Bayesian methods for other purposes in microarray data analysis. Ibrahim et al (2002) for example propose Bayesian models with correlated priors to model gene expression and to classify between normal and tumor tissues. Other approaches to linear models for microarray data analysis have been described by Kerr et al (2000), Jin et al (2001), Wolfinger et al (2001), Chu et al (2002), Yang and Speed (2003) and Lonnstedt et al (2003). Kerr et al (2000) propose a single linear model for an entire microarray experiment whereas in this paper a separate linear model is fitted for each gene. The single linear model approach assumes all equal variances across genes whereas the current ... |

34 | Design and analysis of comparative microarray experiments. In
- Yang, Speed
- 2003
Citation Context ...xperiments. Newton et al (2001), Newton and Kendziorski (2003) and Kendziorski et al (2003) have considered empirical Bayes models for expression based on gamma and log-normal distributions. Other authors have used Bayesian methods for other purposes in microarray data analysis. Ibrahim et al (2002) for example propose Bayesian models with correlated priors to model gene expression and to classify between normal and tumor tissues. Other approaches to linear models for microarray data analysis have been described by Kerr et al (2000), Jin et al (2001), Wolfinger et al (2001), Chu et al (2002), Yang and Speed (2003) and Lonnstedt et al (2003). Kerr et al (2000) propose a single linear model for an entire microarray experiment whereas in this paper a separate linear model is fitted for each gene. The single linear model approach assumes all equal variances across genes whereas the current paper is designed to accommodate different variances. Jin et al (2001) and Wolfinger et al (2001) fit separate models for each gene but model the individual channels of two color microarray data requiring the use of mixed linear models to accommodate the correlation between observations on the same spot. Chu et al (2002... |

31 | Bioconductor R packages for exploratory analysis and normalization of cDNA microarray data
- Dudoit, Yang
- 2003
Citation Context ...in Table 2 are not affected by the prior limits on $v_0 s_0^2$ discussed in Section 6. Section 9 (Data Examples), 9.1 Swirl: Consider the Swirl data set which is distributed as part of the marrayInput package for R (Dudoit and Yang, 2003). The experiment was carried out using zebrafish as a model organism to study the early development in vertebrates. Swirl is a point mutant in the BMP2 gene that affects the dorsal/ventral body axis.... |

25 | Statistical issues in microarray data analysis. In: Functional Genomics: Methods
- Smyth, Yang, et al.
- 2003
Citation Context ...proved test statistics. In many gene discovery experiments for which microarrays are used the primary aim is to rank the genes in order of evidence against $H_0$ rather than to assign absolute p-values (Smyth et al, 2003). This is because only a limited number of genes may be followed up for further study regardless of the number which are significant. Even when the above distributional assumptions fail for a given d... |

20 | Robbins, empirical Bayes and microarrays. - Efron - 2003 |

20 | Parametric empirical Bayes methods for microarrays. In: The analysis of gene expression data: methods and software.
- Newton, Kendziorski
- 2003
Citation Context ...an associated distributional theory. Tusher et al (2001) estimated the offset by minimizing a coefficient of variation while Efron et al (2001) used a percentile of the distribution of sample standard deviations. Broberg (2003) considered the two sample problem and proposed a computationally intensive method of determining the offset by minimizing a combination of estimated false positive and false negative rates over a grid of significance levels and offsets. Cui and Churchill (2003) give a review of test statistics for differential expression for microarray experiments. Newton et al (2001), Newton and Kendziorski (2003) and Kendziorski et al (2003) have considered empirical Bayes models for expression based on gamma and log-normal distributions. Other authors have used Bayesian methods for other purposes in microarray data analysis. Ibrahim et al (2002) for example propose Bayesian models with correlated priors to model gene expression and to classify between normal and tumor tissues. Other approaches to linear models for microarray data analysis have been described by Kerr et al (2000), Jin et al (2001), Wolfinger et al (2001), Chu et al (2002), Yang and Speed (2003) and Lonnstedt et al (2003). Kerr et al ... |

12 | Microarray analysis of two interacting treatments: a linear model and trends in expression over time. - Lonnstedt, Grant, et al. - 2003 |

4 | Bioconductor: a software development project
- Gentleman, Bates, et al.
- 2003
Citation Context ...istics and posterior odds, are implemented in the software package Limma for the R computing environment (Smyth et al, 2003). Limma is part of the Bioconductor project at http://www.bioconductor.org (Gentleman et al, 2003). The Limma software has been tested on a wide range of microarray data sets from many different facilities and has been used routinely at the author’s institution since the middle of 2002. Colophon ... |

1 | Experimental design for gene expression - Biostatistics - 2000 |

1 | Normalization of cDNA microarray data. Methods 31, 265–273. - Smyth, Speed - 2003 |

1 | Replicated Microarray Data. Licentiate Thesis, - Lonnstedt - 2001 |