Results 1–10 of 59
Identifying cancer driver genes in tumor genome sequencing studies
Bioinformatics, 2011
Abstract

Cited by 33 (0 self)
Motivation: Major tumor sequencing projects have been conducted in the past few years to identify genes that contain "driver" somatic mutations in tumor samples. These genes have been defined as those for which the non-silent mutation rate is significantly greater than a background mutation rate estimated from silent mutations. Several methods have been used for estimating the background mutation rate. Results: We propose a new method for identifying cancer driver genes that we believe provides improved accuracy. The new method accounts for the functional impact of mutations on proteins, variation in background mutation rate among tumors, and the redundancy of the genetic code. We reanalyzed sequence data for 623 candidate genes in 188 non-small cell lung tumors using the new method. We found several important genes, such as PTEN, that were not deemed significant by the previous method. At the same time, we determined that some genes previously reported as drivers were not significant in the new analysis because mutations in these genes occurred mainly in tumors with large background mutation rates. Availability: The software is available at http://linus.nci.nih.gov/Data/YounA/software.zip Contact: rsimon@mail.nih.gov
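The baseline test this abstract describes, comparing a gene's non-silent mutation count against a background rate estimated from silent mutations, can be sketched as a one-sided binomial test. This is a toy illustration of that general idea only, not the authors' refined method; all counts and rates below are invented.

```python
import math

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of the lower tail."""
    cdf = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1.0 - cdf

# Toy example: a gene covered over 50,000 sequenced bases across tumors,
# with 12 observed non-silent mutations; a background (silent-derived)
# per-base mutation rate of 1e-4 implies about 5 expected mutations.
covered_bases = 50_000
nonsilent = 12
background_rate = 1e-4

p_value = binom_sf(nonsilent, covered_bases, background_rate)
print(f"one-sided binomial p-value: {p_value:.3g}")
```

A small p-value suggests the non-silent rate exceeds the background, the criterion the abstract says earlier studies used to nominate driver genes.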
Prior Information and Uncertainty in Inverse Problems
2001
Abstract

Cited by 30 (5 self)
Solving any inverse problem requires understanding the uncertainties in the data to know what it means to fit the data. We also need methods to incorporate data-independent prior information to eliminate unreasonable models that fit the data. Both of these issues involve subtle choices that may significantly influence the results of inverse calculations. The specification of prior information is especially controversial. How does one quantify information? What does it mean to know something about a parameter a priori? In this tutorial we discuss Bayesian and frequentist methodologies that can be used to incorporate information into inverse calculations. In particular we show that apparently conservative Bayesian choices, such as representing interval constraints by uniform probabilities (as is commonly done when using genetic algorithms, for example) may lead to artificially small uncertainties. We also describe tools from statistical decision theory that can be used to...
Accounting for Phylogenetic Uncertainty in Biogeography: A Bayesian Approach to Dispersal-Vicariance Analysis of the Thrushes (Aves: Turdus)
Abstract

Cited by 28 (1 self)
Abstract. — The phylogeny of the thrushes (Aves: Turdus) has been difficult to reconstruct due to short internal branches and lack of node support for certain parts of the tree. Reconstructing the biogeographic history of this group is further complicated by the fact that current implementations of biogeographic methods, such as dispersal-vicariance analysis (DIVA; Ronquist, 1997), require a fully resolved tree. Here, we apply a Bayesian approach to dispersal-vicariance analysis that accounts for phylogenetic uncertainty and allows a more accurate analysis of the biogeographic history of lineages. Specifically, ancestral area reconstructions can be presented as marginal distributions, thus displaying the underlying topological uncertainty. Moreover, if there are multiple optimal solutions for a single node on a certain tree, integrating over the posterior distribution of trees often reveals a preference for a narrower set of solutions. We find that despite the uncertainty in tree topology, ancestral area reconstructions indicate that the Turdus clade originated in the eastern Palearctic during the Late Miocene. This was followed by an early dispersal to Africa from where a worldwide radiation took place. The uncertainty in tree topology and short branch lengths seem to indicate that this radiation took place within a limited time span during the Late Pliocene.
Learning to be Bayesian without supervision
In Adv. Neural Information Processing Systems (NIPS*06), 2007
Abstract

Cited by 25 (6 self)
Bayesian estimators are defined in terms of the posterior distribution. Typically, this is written as the product of the likelihood function and a prior probability density, both of which are assumed to be known. But in many situations, the prior density is not known, and is difficult to learn from data since one does not have access to uncorrupted samples of the variable being estimated. We show that for a wide variety of observation models, the Bayes least squares (BLS) estimator may be formulated without explicit reference to the prior. Specifically, we derive a direct expression for the estimator, and a related expression for the mean squared estimation error, both in terms of the density of the observed measurements. Each of these prior-free formulations allows us to approximate the estimator given a sufficient amount of observed data. We use the first form to develop practical nonparametric approximations of BLS estimators for several different observation processes, and the second form to develop a parametric family of estimators for use in the additive Gaussian noise case. We examine the empirical performance of these estimators as a function of the amount of observed data.
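For additive Gaussian noise, the prior-free BLS estimator described above takes the form x̂(y) = y + σ² d/dy log p(y), where p is the density of the observed measurements (Miyasawa's identity). The sketch below is not the paper's implementation: it estimates the score from the corrupted samples alone using a Gaussian kernel density estimate with an arbitrarily chosen bandwidth, and the Gaussian prior is used only to simulate data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 2_000, 1.0

# True values x are never shown to the estimator; only y is observed.
x = rng.normal(0.0, 1.0, n)
y = x + sigma * rng.normal(0.0, 1.0, n)

# Gaussian kernel density estimate of p(y) and its derivative p'(y).
h = 0.3                                   # bandwidth (an arbitrary choice)
diff = y[:, None] - y[None, :]            # pairwise differences y_i - y_j
k = np.exp(-0.5 * (diff / h) ** 2)        # kernel values (normalization cancels in dp/p)
p = k.mean(axis=1)
dp = (-diff / h**2 * k).mean(axis=1)

# Prior-free BLS formula for additive Gaussian noise:
#   x_hat(y) = y + sigma^2 * d/dy log p(y)
x_hat = y + sigma**2 * dp / p

mse_raw = np.mean((y - x) ** 2)
mse_bls = np.mean((x_hat - x) ** 2)
print(f"raw MSE {mse_raw:.3f}  vs  prior-free BLS MSE {mse_bls:.3f}")
```

Even though the prior is never referenced, the density-based estimator recovers most of the shrinkage that the true Bayes estimator would apply.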
Using Loss Functions for DIF Detection: An Empirical Bayes Approach
Abstract

Cited by 6 (2 self)
We investigated a DIF flagging method based on loss functions. The approach builds on earlier research that involved the development of an empirical Bayes (EB) enhancement to Mantel-Haenszel (MH) DIF analysis. The posterior distribution of DIF parameters was estimated and used to obtain the posterior expected loss for the proposed approach and for competing classification rules. Under reasonable assumptions about the relative seriousness of Type I and Type II errors, the loss-function-based DIF detection rule was found to perform better than the commonly used "A," "B," and "C" DIF classification system, especially in small samples. The results of a Mantel-Haenszel (MH; 1959) analysis of differential item functioning (DIF) typically include an index of the magnitude of DIF, along with an estimated standard error (see Holland & Thayer, 1988). Decisions about whether to discard items or flag them for review are typically based on the statistical significance of the MH chi-square or the magnitude of the MH odds ratio estimate. An approach to DIF classification that incorporates both these criteria is the system developed by Educational Testing Service (ETS) for categorizing DIF as negligible ("A"), slight to moderate ("B"), or moderate to severe ("C"). In this study, we explore an alternative DIF detection method based on loss functions. A decision is made to identify an item as a potential DIF item if the expected loss associated with failing to flag the item is greater than that associated with flagging the item. The work was a spin-off of earlier research (Zwick, Thayer, & Lewis, 1997; in press) in which we used a Bayesian variant of the ETS DIF classification system to estimate the probabilities that the true DIF for
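The decision rule in this abstract, flag an item when the expected loss of not flagging exceeds the expected loss of flagging, can be sketched directly from posterior draws of the DIF parameter. Everything below (the posterior, the practical-importance threshold, and the loss values) is invented for illustration; it is not ETS's loss specification.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior draws of an item's true DIF magnitude, e.g. from an
# empirical Bayes Mantel-Haenszel analysis.  All numbers are illustrative.
posterior = rng.normal(loc=-0.8, scale=0.4, size=10_000)

threshold = 0.5      # |DIF| beyond this is treated as practically important
cost_type1 = 1.0     # loss from flagging an item that is actually fine
cost_type2 = 4.0     # loss from failing to flag a genuinely biased item

p_dif = np.mean(np.abs(posterior) > threshold)  # posterior P(item has DIF)

loss_flag = cost_type1 * (1 - p_dif)  # expected loss if we flag the item
loss_keep = cost_type2 * p_dif        # expected loss if we do not flag it

flag = loss_keep > loss_flag
print(f"P(DIF)={p_dif:.2f}  flag loss={loss_flag:.2f}  "
      f"keep loss={loss_keep:.2f}  -> flag={flag}")
```

Because the Type II loss is weighted more heavily here, an item with substantial posterior probability of DIF gets flagged even when a significance test on the point estimate might not reject.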
Least Squares Estimation Without Priors or Supervision
2011
Abstract

Cited by 5 (1 self)
Selection of an optimal estimator typically relies on either supervised training samples (pairs of measurements and their associated true values) or a prior probability model for the true values. Here, we consider the problem of obtaining a least squares estimator given a measurement process with known statistics (i.e., a likelihood function) and a set of unsupervised measurements, each arising from a corresponding true value drawn randomly from an unknown distribution. We develop a general expression for a nonparametric empirical Bayes least squares (NEBLS) estimator, which expresses the optimal least squares estimator in terms of the measurement density, with no explicit reference to the unknown (prior) density. We study the conditions under which such estimators exist and derive specific forms for a variety of different measurement processes. We further show that each of these NEBLS estimators may be used to express the mean squared estimation error as an expectation over the measurement density alone, thus generalizing Stein’s unbiased
Empirical Bayes modeling, computation, and accuracy
Abstract

Cited by 4 (0 self)
This article is intended as an expositional overview of empirical Bayes modeling methodology, presented in a simplified framework that reduces technical difficulties. The two principal empirical Bayes approaches, called f-modeling and g-modeling here, are described and compared. A series of computational formulas is developed to assess the frequentist accuracy of empirical Bayes applications. Several examples, both artificial and genuine, show the strengths and limitations of the two methodologies.
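A classic illustration of the f-modeling approach (modeling the marginal density of the observations directly) is Robbins' formula for Poisson counts: E[θ | x] = (x + 1) f(x + 1) / f(x), where f is the marginal frequency of counts. The sketch below simulates compound Poisson data and plugs in empirical frequencies; the Gamma prior is used only to generate data and is never seen by the estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Simulated compound data: theta_i from an unknown prior, x_i ~ Poisson(theta_i).
# Only the counts x_i are observed.
theta = rng.gamma(shape=3.0, scale=1.0, size=n)
x = rng.poisson(theta)

# f-modeling: estimate the marginal frequencies f(x) from the data and plug
# them into Robbins' formula  E[theta | x] = (x + 1) f(x + 1) / f(x).
counts = np.bincount(x, minlength=x.max() + 2)
f = counts / n
robbins = (x + 1) * f[x + 1] / f[x]

mse_mle = np.mean((x - theta) ** 2)       # estimate theta by x itself
mse_eb = np.mean((robbins - theta) ** 2)  # Robbins empirical Bayes estimate
print(f"MLE MSE {mse_mle:.3f}  vs  Robbins EB MSE {mse_eb:.3f}")
```

The frequency-ratio estimator is noisy in the sparse upper tail of the counts (a known weakness of raw f-modeling), but on average it shrinks the counts toward the prior mean and substantially reduces squared error.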
Empirical Bayes least squares estimation without an explicit prior
Tech. Rep., New York University, 2007
Empirical Bayes Analysis of Families of Survival Curves: Applications to the Analysis of Degree Attainment
Abstract

Cited by 2 (1 self)
We present a novel approach to the empirical Bayes analysis of aggregated survival data from different groups of subjects. The method is based on a contingency table representation of the data and employs transformations to permit the use of normal priors. In contrast to the case of a single survival curve, the empirical Bayes analysis of families of such curves leads to estimates which offer a qualitative improvement over classical estimates based on the ratio of occurrence to exposure rates. This method is illustrated with data on the attainment of the doctoral degree from three major universities. Survival analysis methods are statistical techniques that are used to model the time until the occurrence of some event. The terms failure time analysis, lifetime data analysis, and event history analysis are also used. Although survival analysis has its origins in medical research, where the events of interest are typically the deaths of individuals, the methods have been gaining popularity in other fields. Two examples from the field of education are an analysis of PhD attainment at Stanford University (1983) and an analysis of teachers' career patterns in Michigan public schools (Murnane, Singer, & Willett, 1988). Further references to educational applications of survival analysis are provided by Willett and Singer (1991). In survival analysis, we wish to estimate S(t), the probability that an event will take more than t units of time to occur. S(t) is called the survival function and is defined as S(t) = P(T > t) = 1 − F(t), where T is the event time and F its cumulative distribution function. The authors thank Dorothy Thayer, who wrote the survival analysis program described here, Kaling Chan and Thomas Florek, who provided statistical programming, Liz Brophy, who prepared the figures and the manuscript, and Nick Longford, Neal Thomas, and Paul Rosenbaum, who provided insightful comments.
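The occurrence/exposure quantities mentioned above lend themselves to a simple life-table estimate of S(t): multiply together the per-period survival fractions 1 − d_j/n_j. A minimal sketch with invented yearly degree-attainment counts (not the article's data):

```python
# Life-table (actuarial) sketch of a survival function S(t) = P(T > t),
# estimated from grouped occurrence/exposure counts.  The yearly numbers
# below are invented for illustration.
events  = [5, 12, 30, 45, 25]        # d_j: degrees attained in year j
at_risk = [200, 190, 170, 130, 80]   # n_j: students still enrolled entering year j

surv = 1.0
S = []
for d, n in zip(events, at_risk):
    surv *= 1.0 - d / n              # S(t) = prod_{j<=t} (1 - d_j / n_j)
    S.append(surv)

for year, s in enumerate(S, start=1):
    print(f"P(no degree after year {year}) = {s:.3f}")
```

Each step multiplies in the fraction of at-risk students who did not attain the degree that year, so the estimate is monotonically decreasing, as any survival function must be.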
Learning least squares estimators without assumed priors or supervision
2009
Abstract

Cited by 2 (1 self)
The two standard methods of obtaining a least-squares optimal estimator are (1) Bayesian estimation, in which one assumes a prior distribution on the true values and combines this with a model of the measurement process to obtain an optimal estimator, and (2) supervised regression, in which one optimizes a parametric estimator over a training set containing pairs of corrupted measurements and their associated true values. But many real-world systems do not have access to either supervised training examples or a prior model. Here, we study the problem of obtaining an optimal estimator given a measurement process with known statistics, and a set of corrupted measurements of random values drawn from an unknown prior. We develop a general form of nonparametric empirical Bayesian estimator that is written as a direct function of the measurement density, with no explicit reference to the prior. We study the observation conditions under which such “prior-free” estimators may be obtained, and we derive specific forms for a variety of different corruption processes. Each of these prior-free estimators may also be used to express the mean squared estimation error as an expectation over the measurement density, thus generalizing Stein’s unbiased risk estimator (SURE), which provides such an expression for the additive Gaussian noise case. Minimizing this expression over measurement samples provides an “unsupervised
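The final point in this abstract, minimizing a SURE-style risk expression over the measurements to tune an estimator without supervision, can be illustrated with the classical Gaussian-noise SURE (the special case the abstract cites) applied to a soft-threshold denoiser. The sparse signal and all parameters below are invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 20_000, 1.0

# Sparse signal: roughly 10% large coefficients, the rest exactly zero.
x = np.where(rng.random(n) < 0.1, 4.0, 0.0)
y = x + sigma * rng.normal(size=n)

def soft(y, t):
    """Soft-threshold denoiser with threshold t."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def sure(y, t, sigma):
    """Stein's unbiased risk estimate of ||soft(y,t) - x||^2 from y alone."""
    return (-n * sigma**2
            + np.sum(np.minimum(y**2, t**2))
            + 2 * sigma**2 * np.sum(np.abs(y) > t))

ts = np.linspace(0.0, 4.0, 81)
risks = np.array([sure(y, t, sigma) for t in ts])
t_star = ts[np.argmin(risks)]        # unsupervised threshold choice

true_mse = np.mean((soft(y, t_star) - x) ** 2)
print(f"SURE-chosen threshold {t_star:.2f}, true MSE {true_mse:.3f} "
      f"(raw MSE ~ {sigma**2:.1f})")
```

The risk estimate uses only the corrupted measurements, yet the threshold it selects tracks the one that minimizes the true (normally unobservable) squared error.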