Results 1–10 of 94
Bayesian measures of model complexity and fit
 Journal of the Royal Statistical Society, Series B
, 2002
"... [Read before The Royal Statistical Society at a meeting organized by the Research ..."
Abstract

Cited by 435 (4 self)
[Read before The Royal Statistical Society at a meeting organized by the Research
Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models
 Syst. Biol
, 2004
"... Abstract. — What does die posterior probability of a phylogenetic tree mean? This simulation study shows that Bayesian posterior probabilities have the meaning that is typically ascribed to them; the pt>sterkir probability ot'a tree is the probability that the tree is corwct, assuming th> ..."
Abstract

Cited by 93 (6 self)
Abstract. — What does the posterior probability of a phylogenetic tree mean? This simulation study shows that Bayesian posterior probabilities have the meaning that is typically ascribed to them; the posterior probability of a tree is the probability that the tree is correct, assuming that the model is correct. At the same time, the Bayesian method can be sensitive to model misspecification, and the sensitivity of the Bayesian method appears to be greater than the sensitivity of the nonparametric bootstrap method (using maximum likelihood to estimate trees). Although the estimates of phylogeny obtained by use of the method of maximum likelihood or the Bayesian method are likely to be similar, the assessment of the uncertainty of inferred trees via either bootstrapping (for maximum likelihood estimates) or posterior probabilities (for Bayesian estimates) is not likely to be the same. We suggest that the Bayesian method be implemented with the most complex models of those currently available, as this should reduce the chance that the method will concentrate too much probability on too few trees. [Bayesian estimation; Markov chain Monte Carlo; posterior probability; prior probability.] Quantifying the uncertainty of a phylogenetic estimate is at least as important a goal as obtaining the phylogenetic estimate itself. Measures of phylogenetic reliability not only point out what parts of a tree can be trusted when interpreting the evolution of a group, but can guide
How Many Genes Are Needed for a Discriminant Microarray Data Analysis
 Proc. Critical Assessment of Techniques for Microarray Data Mining Workshop
, 2000
"... The analysis of the leukemia data from Whitehead/MIT group is a discriminant analysis (also called a supervised learning). Among thousands of genes whose expression levels are measured, not all are needed for discriminant analysis: a gene may either not contribute to the separation of two types of t ..."
Abstract

Cited by 50 (2 self)
The analysis of the leukemia data from the Whitehead/MIT group is a discriminant analysis (also called supervised learning). Among thousands of genes whose expression levels are measured, not all are needed for discriminant analysis: a gene may either not contribute to the separation of two types of tissues/cancers, or it may be redundant because it is highly correlated with other genes. There are two theoretical frameworks in which variable selection (or gene selection in our case) can be addressed. The first is model selection, and the second is model averaging. We have carried out model selection using the Akaike information criterion and the Bayesian information criterion with logistic regression (discrimination, prediction, or classification) to determine the number of genes that provide the best model. These model selection criteria set upper limits of 2225 and 1213 genes for this data set with 38 samples, and the best model consists of only one (no. 4847, zyxin) or two genes. We have also carried out model averaging over the best single-gene logistic predictors using three different weights: maximized likelihood, prediction rate on the training set, and equal weight. We have observed that the performance of most of these weighted predictors on the testing set is gradually reduced as more genes are included, but a clear cutoff that separates good and bad prediction performance is not found.
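The single-gene AIC screening described in this abstract can be sketched in a few lines. The sketch below is a toy reconstruction on synthetic data, not the authors' code: the sample size of 38 matches the abstract, but the number of genes, the informative-gene shift, and the simple gradient-ascent fitter are assumptions for illustration.

```python
import numpy as np

def fit_logistic_1d(x, y, iters=500, lr=0.1):
    """Fit P(y=1) = sigmoid(a + b*x) by gradient ascent; return params and log-likelihood."""
    a = b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(a + b * x)))
        a += lr * np.mean(y - p)           # gradient of mean log-likelihood w.r.t. a
        b += lr * np.mean((y - p) * x)     # gradient of mean log-likelihood w.r.t. b
    p = np.clip(1.0 / (1.0 + np.exp(-(a + b * x))), 1e-12, 1 - 1e-12)
    return (a, b), float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def aic(loglik, k):
    return 2 * k - 2 * loglik              # AIC = 2k - 2 log L

rng = np.random.default_rng(0)
n, n_genes = 38, 5                         # 38 samples, as in the leukemia data
y = rng.integers(0, 2, n)                  # two tissue classes (synthetic labels)
X = rng.normal(size=(n, n_genes))
X[:, 0] += 2.0 * y                         # gene 0 carries signal (an assumption)

# Score every single-gene logistic model by AIC (k = 2: intercept + slope).
scores = sorted((aic(fit_logistic_1d(X[:, g], y)[1], 2), g) for g in range(n_genes))
print("genes ranked by AIC:", [g for _, g in scores])
```

Swapping `aic` for BIC, which penalizes by k log n instead of 2k, gives a stricter criterion; that is consistent with the tighter gene bound (1213 vs. 2225) the abstract reports for BIC.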
Algorithm for finding optimal gene sets in microarray prediction. http://stravinsky.ucsc.edu/josh/gesses
, 2006
"... Motivation: Microarray data has been recently been shown to be efficacious in distinguishing closely related cell types that often appear in the diagnosis of cancer. It is useful to determine the minimum number of genes needed to do such a diagnosis both for clinical use and to determine the importa ..."
Abstract

Cited by 30 (0 self)
Motivation: Microarray data has recently been shown to be efficacious in distinguishing closely related cell types that often appear in the diagnosis of cancer. It is useful to determine the minimum number of genes needed to do such a diagnosis, both for clinical use and to determine the importance of specific genes for cancer. Here a replication algorithm is used for this purpose. It evolves an ensemble of predictors, all using different combinations of genes, to generate a set of optimal predictors. Results: We apply this method to the leukemia data of the Whitehead/MIT group, which attempts to differentially diagnose two kinds of leukemia, and also to the data of Khan et al. to distinguish four different kinds of childhood cancers. In the latter case we were able to reduce the number of genes needed from 96 down to 15, while at the same time being able to perfectly classify all of their test data.
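A minimal sketch of the replicate-and-mutate idea: a population of gene subsets is scored and the fitter half replicates with single-gene mutations. Nothing here reproduces the authors' actual predictor; the nearest-centroid classifier, synthetic data, and all parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_genes, subset_size = 60, 40, 3
y = rng.integers(0, 2, n)                  # two cancer classes (synthetic)
X = rng.normal(size=(n, n_genes))
X[:, :3] += 1.5 * y[:, None]               # genes 0-2 carry signal (an assumption)

def accuracy(genes):
    """Nearest-centroid training accuracy using only the chosen genes."""
    Z = X[:, genes]
    c0, c1 = Z[y == 0].mean(0), Z[y == 1].mean(0)
    pred = np.linalg.norm(Z - c1, axis=1) < np.linalg.norm(Z - c0, axis=1)
    return float(np.mean(pred == y))

# Replication: keep a population of subsets, copy the fitter half with mutation.
pop = [rng.choice(n_genes, subset_size, replace=False) for _ in range(30)]
for _ in range(40):                        # 40 generations
    pop.sort(key=accuracy, reverse=True)
    survivors = pop[:15]
    children = []
    for s in survivors:
        child = s.copy()
        child[rng.integers(subset_size)] = rng.integers(n_genes)  # mutate one gene
        children.append(child)
    pop = survivors + children

best = max(pop, key=accuracy)
print("best subset:", sorted(int(g) for g in best), "accuracy:", accuracy(best))
```

The design point of interest is that the ensemble, not any single run, defines the answer: genes that recur across many surviving subsets are the candidates for a minimal diagnostic panel.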
Assessing the fit of site-occupancy models
 J. Agric. Biol. Environ. Stat
, 2004
"... Few species are likely to be so evident that they will always be detected at a site when present. Recently a model has been developed that enables estimation of the proportion of area occupied, when the target species is not detected with certainty. Here we apply this modeling approach to data colle ..."
Abstract

Cited by 22 (1 self)
Few species are likely to be so evident that they will always be detected at a site when present. Recently a model has been developed that enables estimation of the proportion of area occupied when the target species is not detected with certainty. Here we apply this modeling approach to data collected on terrestrial salamanders in the Plethodon glutinosus complex in the Great Smoky Mountains National Park, USA, and wish to address the question “how accurately does the fitted model represent the data?” The goodness-of-fit of the model needs to be assessed in order to make accurate inferences. This article presents a method where a simple Pearson chi-square statistic is calculated and a parametric bootstrap procedure is used to determine whether the observed statistic is unusually large. We found evidence that the most global model considered provides a poor fit to the data, and hence estimated an overdispersion factor to adjust model selection procedures and inflate standard errors. Two hypothetical datasets with known assumption violations are also analyzed, illustrating that the method may be used to guide researchers in making appropriate inferences. The results of a simulation study are presented to provide a broader view of the method's properties.
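The Pearson chi-square plus parametric-bootstrap recipe generalizes beyond occupancy models. The sketch below applies it to a deliberately simplified stand-in (a constant-probability binomial detection model rather than the paper's occupancy model), with all data synthetic; it also estimates an overdispersion factor as the observed statistic divided by its mean bootstrap value.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)
S, K = 100, 5                              # sites, visits per site (synthetic)
counts = rng.binomial(K, 0.4, size=S)      # observed detections per site

def pearson_x2(counts, p):
    """Pearson X^2 comparing the histogram of per-site detection counts
    with Binomial(K, p) expectations."""
    obs = np.bincount(counts, minlength=K + 1)
    exp = np.array([S * comb(K, k) * p**k * (1 - p) ** (K - k) for k in range(K + 1)])
    return float(np.sum((obs - exp) ** 2 / exp))

p_hat = counts.mean() / K                  # MLE of detection probability
x2_obs = pearson_x2(counts, p_hat)

# Parametric bootstrap: simulate from the fitted model, refit, recompute X^2,
# and ask whether the observed statistic is unusually large.
boot = np.array([
    pearson_x2(sim, sim.mean() / K)
    for sim in (rng.binomial(K, p_hat, size=S) for _ in range(1000))
])
p_value = float(np.mean(boot >= x2_obs))
c_hat = x2_obs / boot.mean()               # overdispersion factor estimate
print(f"X^2 = {x2_obs:.2f}, bootstrap p = {p_value:.3f}, c-hat = {c_hat:.2f}")
```

Because the bootstrap refits the model to each simulated dataset, the reference distribution automatically accounts for parameter estimation, which a fixed chi-square reference distribution would not.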
Cultural evolution in laboratory microsocieties including traditions of rule giving and rule following
 Evolution and Human Behavior
, 2004
"... Experiments may contribute to understanding the basic processes of cultural evolution. We drew features from previous laboratory research with small groups in which traditions arose during several generations. Groups of four participants chose by consensus between solving anagrams printed on red car ..."
Abstract

Cited by 19 (2 self)
Experiments may contribute to understanding the basic processes of cultural evolution. We drew features from previous laboratory research with small groups in which traditions arose during several generations. Groups of four participants chose by consensus between solving anagrams printed on red cards and on blue cards. Payoffs for the choices differed. After 12 min, the participant who had been in the experiment the longest was removed and replaced with a naïve person. These replacements, each of which marked the end of a generation, continued for 10–15 generations, at which time the day’s session ended. Timeout duration, which determined whether the group earned more by choosing red or blue, and which was fixed for a day’s session, was varied across three conditions to equal 1, 2, or 3 min. The groups developed choice traditions that tended toward maximizing earnings. The stronger the dependence between choice and earnings, the stronger was the tradition. Once a choice tradition evolved, groups passed it on by instructing newcomers, using some combination of accurate information, mythology, and coercion. Among verbal traditions, frequency of mythology varied directly
Missing the forest for the trees: Phylogenetic compression and its implications for inferring complex evolutionary histories
 Syst. Biol
, 2005
"... Abstract. — Phylogenetic tree reconstruction is difficult in the presence of lateral gene transfer and other processes generating conflicting signals. We develop a new approach to this problem using ideas borrowed from algorithmic information theory. It selects the hypothesis that simultaneously min ..."
Abstract

Cited by 19 (3 self)
Abstract. — Phylogenetic tree reconstruction is difficult in the presence of lateral gene transfer and other processes generating conflicting signals. We develop a new approach to this problem using ideas borrowed from algorithmic information theory. It selects the hypothesis that simultaneously minimizes the descriptive complexity of the tree(s) plus the data when encoded using those tree(s). In practice this is the hypothesis that can compress the data the most. We show not only that phylogenetic compression is an efficient method for encoding most phylogenetic data sets and is more efficient than compression schemes designed for single sequences, but also that it provides a clear information theoretic rule for determining when a collection of conflicting trees is a better explanation of the data than a single tree. By casting the parsimony problem in this more general framework, we also conclude that the so-called total-evidence tree—the tree constructed from all the data simultaneously—is not always the most economical explanation of the data. [Compression; information; Kolmogorov complexity; phylogenetics; total evidence.] Recombination, lateral gene transfer, hybridization, and other biological processes generate conflict between phylogenetic trees constructed from different loci or different partitions of a sequence data set. Lateral gene transfer in bacteria provides many examples (Kurland
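As a toy illustration of the minimum-description-length rule the abstract describes, a general-purpose compressor can stand in for descriptive complexity: a hypothesis that captures shared structure yields a shorter total code. This does not reproduce the authors' phylogenetic encoder; the sequences and the per-hypothesis overhead are invented for illustration.

```python
import zlib

def clen(b: bytes) -> int:
    """Compressed length as a practical stand-in for descriptive complexity."""
    return len(zlib.compress(b, 9))

# Two toy loci: locus B shares most of its signal with locus A.
locus_a = b"ACGTACGTAC" * 50
locus_b = b"ACGTACGTAC" * 45 + b"TTTTTTTTTT" * 5

# One-hypothesis encoding: both loci under a single shared description.
joint = clen(locus_a + locus_b)
# Two-hypothesis encoding: each locus described separately, plus a fixed
# per-hypothesis overhead (a crude proxy for encoding an extra tree).
overhead = 20
separate = clen(locus_a) + clen(locus_b) + overhead

print("single description:", joint, "bytes; two descriptions:", separate, "bytes")
print("prefer", "one hypothesis" if joint <= separate else "two hypotheses")
```

The decision rule is the point: add a second tree only when the bytes it saves in encoding the data exceed the bytes it costs to describe the tree itself.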
Evolutionary Theory and the Reality of Macro Probabilities
"... Evolutionary theory is awash with probabilities. For example, natural selection is said to occur when there is variation in fitness, and fitness is standardly decomposed into two components, viability and fertility, each of which is understood probabilistically. With respect to viability, a fertiliz ..."
Abstract

Cited by 19 (2 self)
Evolutionary theory is awash with probabilities. For example, natural selection is said to occur when there is variation in fitness, and fitness is standardly decomposed into two components, viability and fertility, each of which is understood probabilistically. With respect to viability, a fertilized egg is said to have a certain chance of surviving to reproductive age; with respect to fertility, an adult is said to have an expected number of offspring. There is more to evolutionary theory than the theory of natural selection, and here too one finds probabilistic concepts aplenty. When there is no selection, the theory of neutral evolution says that a gene’s chance of eventually reaching fixation is 1/(2N), where N is the number of organisms in the generation of the diploid population to which the gene belongs. The evolutionary consequences of mutation are likewise conceptualized in terms of the probability per unit time a gene has of changing from one state to another. The examples just mentioned are all “forward-directed” probabilities; they describe the probability of later events, conditional on earlier events. However, evolutionary theory also uses “backwards probabilities” that describe the probability of a cause conditional on its effects; for example, coalescence theory allows one to calculate the expected number of generations in the past that the genes in the present generation find their most recent common
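The 1/(2N) fixation probability quoted above is easy to check by simulation. This is a standard Wright-Fisher sketch, not tied to any particular source in the text; the population size and trial count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10                       # diploid population size -> 2N = 20 gene copies
trials, fixed = 5000, 0

for _ in range(trials):
    count = 1                # one new neutral mutant among the 2N copies
    while 0 < count < 2 * N:
        # Wright-Fisher: next generation is a binomial draw at the current frequency
        count = rng.binomial(2 * N, count / (2 * N))
    fixed += count == 2 * N  # the mutant either fixed (2N copies) or was lost (0)

print(f"simulated fixation prob: {fixed / trials:.4f}  (theory 1/(2N) = {1 / (2 * N):.4f})")
```

The simulated frequency should hover near 0.05 for N = 10, matching the neutral-theory claim that a new mutant's fixation probability equals its initial frequency, 1/(2N).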
Error distribution for gene expression data
 Statistical Applications in Genetics and Molecular Biology
, 2005
"... Copyright c©2005 by the authors. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, bepre ..."
Abstract

Cited by 18 (0 self)
Copyright © 2005 by the authors. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, bepress, which has been given certain exclusive rights by the author. Statistical Applications in Genetics and Molecular Biology is produced by Berkeley Electronic Press (bepress).