Results 1-10 of 1,052
A sequential importance sampling algorithm for generating random graphs with prescribed degrees
, 2006
Cited by 46 (1 self)
Random graphs with a given degree sequence are a useful model capturing several features absent in the classical Erdős-Rényi model, such as dependent edges and non-binomial degrees. In this paper, we use a characterization due to Erdős and Gallai to develop a sequential algorithm for generating a random labeled graph with a given degree sequence. The algorithm is easy to implement and allows surprisingly efficient sequential importance sampling. Applications are given, including simulating a biological network and estimating the number of graphs with a given degree sequence.
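The Erdős-Gallai characterization mentioned in this abstract can be checked directly. The sketch below is only an illustration of that classical test, not the paper's sequential sampling algorithm; the function name `is_graphical` is a hypothetical choice made here.

```python
def is_graphical(degrees):
    """Erdős-Gallai test: can `degrees` be realized by a simple undirected graph?

    Illustrative helper (name is an assumption, not from the paper).
    """
    d = sorted(degrees, reverse=True)
    n = len(d)
    # Degrees must be non-negative and the degree sum must be even.
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    # For each k, the k largest degrees must satisfy
    # sum_{i<=k} d_i <= k(k-1) + sum_{i>k} min(d_i, k).
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True
```

For example, `is_graphical([3, 3, 3, 3])` holds (the complete graph on four vertices), while `[3, 3, 1, 1]` fails the inequality at k = 2.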
Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biol Rev Camb Philos Soc 85:935–956
Cited by 44 (3 self)
Repeatability (more precisely the common measure of repeatability, the intraclass correlation coefficient, ICC) is an important index for quantifying the accuracy of measurements and the constancy of phenotypes. It is the proportion of phenotypic variation that can be attributed to between-subject (or between-group) variation. As a consequence, the non-repeatable fraction of phenotypic variation is the sum of measurement error and phenotypic flexibility. There are several ways to estimate repeatability for Gaussian data, but there are no formal agreements on how repeatability should be calculated for non-Gaussian data (e.g. binary, proportion and count data). In addition to point estimates, appropriate uncertainty estimates (standard errors and confidence intervals) and statistical significance for repeatability estimates are required regardless of the types of data. We review the methods for calculating repeatability and the associated statistics for Gaussian and non-Gaussian data. For Gaussian data, we present three common approaches for estimating repeatability: correlation-based, analysis of variance (ANOVA)-based and linear mixed-effects model (LMM)-based methods, while for non-Gaussian data, we focus on generalised linear mixed-effects models (GLMM) that allow the estimation of repeatability on the original and on the underlying latent scale. We also address a number of methods for calculating standard errors, confidence intervals and statistical significance; the most accurate and recommended methods are parametric bootstrapping, randomisation tests and Bayesian approaches. We advocate the use of LMM
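As a rough illustration of the ANOVA-based approach named in this abstract, the sketch below computes repeatability for a balanced one-way design as R = (MS_among - MS_within) / (MS_among + (k - 1) MS_within), where k is the number of measurements per subject. The function name `anova_repeatability` and the restriction to balanced data are assumptions made here for brevity; the paper covers more general cases.

```python
def anova_repeatability(groups):
    """ANOVA-based repeatability (ICC) for a balanced one-way design.

    `groups` is a list of equally long lists, one per subject, each holding
    that subject's repeated measurements. Hypothetical helper for illustration.
    """
    a = len(groups)                      # number of subjects
    k = len(groups[0]) if groups else 0  # measurements per subject
    if a < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 subjects with the same number (>= 2) of measurements")
    grand = sum(sum(g) for g in groups) / (a * k)
    means = [sum(g) / k for g in groups]
    # Among-subject and within-subject mean squares.
    ms_among = k * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (a * (k - 1))
    denom = ms_among + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("no variation in the data")
    return (ms_among - ms_within) / denom
```

With no within-subject variation, as in `[[1, 1], [2, 2], [3, 3]]`, the estimate is 1. The ANOVA estimator can go negative when subjects do not differ, as in `[[1, 3], [1, 3]]`.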
A toolbox for k-centroids cluster analysis
 Computational Statistics and Data Analysis
, 2006
Cited by 43 (17 self)
A methodological and computational framework for centroid-based partitioning cluster analysis using arbitrary distance or similarity measures is presented. The power of high-level statistical computing environments like R enables data analysts to easily try out various distance measures with only minimal programming effort. A new variant of centroid neighborhood graphs is introduced which gives insight into the relationships between adjacent clusters. Artificial examples and a case study from marketing research are used to demonstrate the influence of distance measures on partitions and the usage of neighborhood graphs.
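The idea of centroid-based partitioning with an arbitrary distance can be sketched as a generic loop that takes the distance and the centroid computation as parameters. This is a minimal sketch in Python rather than R, and it is not the toolbox's own interface; `k_centroids`, `manhattan`, and `columnwise_median` are hypothetical names, and the caller supplies the initial centroids.

```python
from statistics import median

def manhattan(x, c):
    """L1 distance between a point and a centroid."""
    return sum(abs(a - b) for a, b in zip(x, c))

def columnwise_median(points):
    """Centroid that minimizes total L1 distance: the per-coordinate median."""
    return tuple(median(col) for col in zip(*points))

def k_centroids(points, init_centroids, dist, centroid, max_iter=100):
    """Alternate between assigning points and recomputing centroids.

    `dist` and `centroid` are user-supplied, so swapping the distance
    measure only means passing a different pair of functions.
    """
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the chosen distance.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
        # Update step: recompute each non-empty cluster's centroid.
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = centroid(members)
    return labels, centroids
```

Passing `manhattan` with `columnwise_median` gives a k-medians partition. A different distance would normally be paired with the centroid function that minimizes it.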
Random-Set Methods Identify Distinct Aspects of the Enrichment Signal in Gene-Set Analysis, The Annals of Applied Statistics
, 2007
Cited by 43 (3 self)
A prespecified set of genes may be enriched, to varying degrees, for genes that have altered expression levels relative to two or more states of a cell. Knowing the enrichment of gene sets defined by functional categories, such as gene ontology (GO) annotations, is valuable for analyzing the biological signals in microarray expression data. A common approach to measuring enrichment is by cross-classifying genes according to membership in a functional category and membership on a selected list of significantly altered genes. A small Fisher’s exact test p-value, for example, in this 2 × 2 table is indicative of enrichment. Other category analysis methods retain the quantitative gene-level scores and measure significance by referring a category-level statistic to a permutation distribution associated with the original differential expression problem. We describe a class of random-set scoring methods that measure distinct components of the enrichment signal. The class includes Fisher’s test based on selected genes and also tests that average gene-level evidence across the category. Averaging and selection methods are compared empirically using Affymetrix data on expression in nasopharyngeal cancer tissue, and theoretically using a location model of differential expression. We find that each method has a domain of superiority in the state space of enrichment problems, and that both methods have benefits in practice. Our analysis also addresses two problems related to multiple-category inference, namely, that equally enriched categories are not detected with equal probability if they are of different sizes, and also that there is dependence among category statistics owing to shared genes. Random-set enrichment calculations do not require Monte Carlo for implementation. They are made available in the R package allez.
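The selection-based approach described in this abstract, a one-sided Fisher's exact test on the 2 × 2 cross-classification, reduces to a hypergeometric tail probability. The sketch below illustrates that calculation only; it is not the allez package's implementation, and the function and argument names are assumptions made here.

```python
from math import comb

def enrichment_pvalue(n_genes, n_category, n_selected, n_overlap):
    """One-sided Fisher's exact test p-value for enrichment.

    Probability of seeing at least `n_overlap` category genes on a selected
    list of size `n_selected`, when the list is drawn at random from
    `n_genes` genes of which `n_category` belong to the category.
    Hypothetical helper for illustration.
    """
    upper = min(n_category, n_selected)  # largest possible overlap
    total = comb(n_genes, n_selected)    # number of equally likely lists
    tail = sum(
        comb(n_category, x) * comb(n_genes - n_category, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total
```

For instance, with 10 genes, a category of 3 and a selected list of 3, a complete overlap has probability 1/120, which a small p-value threshold would flag as enrichment.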
Time series analysis via mechanistic models. In review; prepublished at arxiv.org/abs/0802.0021
, 2008
Cited by 36 (10 self)
The purpose of time series analysis via mechanistic models is to reconcile the known or hypothesized structure of a dynamical system with observations collected over time. We develop a framework for constructing nonlinear mechanistic models and carrying out inference. Our framework permits the consideration of implicit dynamic models, meaning statistical models for stochastic dynamical systems which are specified by a simulation algorithm to generate sample paths. Inference procedures that operate on implicit models are said to have the plug-and-play property. Our work builds on recently developed plug-and-play inference methodology for partially observed Markov models. We introduce a class of implicitly specified Markov chains with stochastic transition rates, and we demonstrate its applicability to open problems in statistical inference for biological systems. As one example, these models are shown to give a fresh perspective on measles transmission dynamics. As a second example, we present a mechanistic analysis of cholera incidence data, involving interaction between two competing strains of the pathogen Vibrio cholerae.
How to Effectively Use Topic Models for Software Engineering Tasks? An Approach based on Genetic Algorithms
 in ICSE, 2013
Cited by 31 (9 self)
Information Retrieval (IR) methods, and in particular topic models, have recently been used to support essential software engineering (SE) tasks, by enabling software textual retrieval and analysis. In all these approaches, topic models have been used on software artifacts in a similar manner as they were used on natural language documents (e.g., using the same settings and parameters) because the underlying assumption was that source code and natural language documents are similar. However, applying topic models to software data using the same settings as for natural language text did not always produce the expected results. Recent research investigated this assumption and showed that source code is much more repetitive and predictable than natural language text. Our paper builds on this new fundamental finding and proposes a novel solution to adapt, configure and effectively use a topic modeling technique, namely Latent Dirichlet Allocation (LDA), to achieve better (acceptable) performance across various SE tasks. Our paper introduces a novel solution called LDA-GA, which uses Genetic Algorithms (GA) to determine a near-optimal configuration for LDA in the context of three different SE tasks: (1) traceability link recovery, (2) feature location, and (3) software artifact labeling. The results of our empirical studies demonstrate that LDA-GA is able to identify robust LDA configurations, which lead to a higher accuracy on all the datasets for these SE tasks as compared to previously published results, heuristics, and the results of a combinatorial search.
Interpretable classifiers for FMRI improve prediction of purchases
 IEEE Trans. Neural Systems and Rehabilitation Engineering
Cited by 21 (3 self)
Despite growing interest in applying machine learning to neuroimaging analyses, few studies have gone beyond classifying sensory input to directly predicting behavioral output. With spatial resolution on the order of millimeters and temporal resolution on the order of seconds, functional magnetic resonance imaging (fMRI) is a promising technology for such applications. However, fMRI data’s low signal-to-noise ratio, high dimensionality, and extensive spatiotemporal correlations present formidable analytic challenges. Here, we apply different machine-learning algorithms to previously acquired data to examine the ability of fMRI activation in three regions, namely the nucleus accumbens (NAcc), medial prefrontal cortex (MPFC), and insula, to predict purchasing. Our goal was to improve spatiotemporal interpretability as well as classification accuracy. To this end, sparse penalized discriminant analysis (SPDA) enabled automatic selection of correlated variables, yielding interpretable models that generalized well to new data. Relative to logistic regression, linear discriminant analysis, and linear support vector machines, SPDA not only increased interpretability but also improved classification accuracy. SPDA promises to allow more precise inferences about when specific brain regions contribute to purchasing decisions. More broadly, this approach provides a general framework for using neuroimaging data to build interpretable models, including those that predict choice. Index Terms: Accumbens, classification, discriminant, elastic net, frontal, functional magnetic resonance imaging (fMRI), ...
Book reviews
 Statistics of the British Economy, by F. M. M
Cited by 20 (0 self)
Overdiagnosis and overtreatment of breast cancer. Progression of ductal carcinoma in situ: the pathological perspective.