Results 1–10 of 179
Boosting algorithms: Regularization, prediction and model fitting
Statistical Science, 2007
Cited by 38 (5 self)
Abstract. We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions. Key words and phrases: Generalized linear models, generalized additive models, gradient boosting, survival analysis, variable selection, software.
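As a rough illustration of the gradient view of boosting that this abstract takes (not mboost itself; the toy data, stump base learner, and step length are invented for the example), a minimal sketch of functional gradient boosting with squared-error loss:

```python
import numpy as np

# Toy data: y = sin(x) + noise (invented for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 6, 200)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

def stump(x, y):
    """Fit a one-split regression stump; return a prediction function."""
    best = None
    for s in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = y[x <= s], y[x > s]
        if left.size == 0 or right.size == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    _, s, lmean, rmean = best
    return lambda t: np.where(t <= s, lmean, rmean)

# Functional gradient boosting: for squared-error loss the negative
# gradient is simply the residual, so each step fits a base learner
# to the current residuals and takes a small (shrunken) step.
nu = 0.1                       # step length / shrinkage
f = np.zeros_like(y)
for m in range(200):
    residual = y - f           # negative gradient of 0.5 * (y - f)^2
    g = stump(x, residual)     # base learner fitted to the gradient
    f = f + nu * g(x)

mse = np.mean((y - f) ** 2)
```

Swapping the residual for the negative gradient of another loss (and the stump for another base learner) yields the generic algorithm the paper describes.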
Adaptive importance sampling in general mixture classes
2007
Cited by 17 (8 self)
In this paper, we propose an adaptive algorithm that iteratively updates both the weights and component parameters of a mixture importance sampling density so as to optimise the performance of importance sampling, as measured by an entropy criterion. The method, called M-PMC, is shown to be applicable to a wide class of importance sampling densities, which includes in particular mixtures of multivariate Student t distributions. The performance of the proposed scheme is studied on both artificial and real examples, highlighting in particular the benefit of a novel Rao-Blackwellisation device which can be easily incorporated in the updating scheme.
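A stripped-down sketch of the idea of adapting a mixture proposal by importance sampling (a simplification, not the paper's M-PMC algorithm: a Gaussian rather than Student t mixture, only the mixture weights are adapted, and the target and component locations are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_gauss(x, mean, var=1.0):
    """Isotropic Gaussian log-density, up to the shared normalising constant."""
    return -0.5 * np.sum((x - mean) ** 2, axis=-1) / var

target_mean = np.array([2.0, 2.0])           # toy target: N([2, 2], I)
means = np.array([[0.0, 0.0], [3.0, 3.0]])   # fixed component locations
alpha = np.array([0.5, 0.5])                 # mixture weights to adapt

for it in range(5):
    # Sample from the current mixture proposal.
    comp = rng.choice(2, size=4000, p=alpha)
    x = means[comp] + rng.normal(size=(4000, 2))
    # Importance weights: target / mixture (normalising constants cancel).
    log_comp = np.stack([log_gauss(x, m) for m in means], axis=1)   # (N, 2)
    log_mix = np.log(np.sum(alpha * np.exp(log_comp), axis=1))
    w = np.exp(log_gauss(x, target_mean) - log_mix)
    wbar = w / w.sum()
    # Posterior probability that sample i came from component d.
    rho = alpha * np.exp(log_comp)
    rho /= rho.sum(axis=1, keepdims=True)
    # Update mixture weights by the importance-weighted responsibilities.
    alpha = wbar @ rho

est_mean = wbar @ x                          # self-normalised IS estimate
```

The weight of the component closest to the target grows across iterations, which is the qualitative behaviour the adaptive scheme is designed to produce.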
Solution-guided multi-point constructive search for job shop scheduling
 Journal of Artificial Intelligence Research
Cited by 14 (2 self)
Solution-Guided Multi-Point Constructive Search (SGMPCS) is a novel constructive search technique that performs a series of resource-limited tree searches, where each search begins either from an empty solution (as in randomized restart) or from a solution that has been encountered during the search. A small number of these “elite” solutions is maintained during the search. We introduce the technique and perform three sets of experiments on the job shop scheduling problem. First, a systematic, fully crossed study of SGMPCS is carried out to evaluate the performance impact of various parameter settings. Second, we inquire into the diversity of the elite solution set, showing, contrary to expectations, that a less diverse set leads to stronger performance. Finally, we compare the best parameter setting of SGMPCS from the first two experiments to chronological backtracking, limited discrepancy search, randomized restart, and a sophisticated tabu search algorithm on a set of well-known benchmark problems. Results demonstrate that SGMPCS is significantly better than the other constructive techniques tested, though it lags behind the tabu search.
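A toy sketch of the elite-pool idea behind SGMPCS (heavily simplified: the resource-limited tree search is replaced by randomized greedy construction, and the problem, a small assignment-cost minimization rather than job shop scheduling, is invented for the example):

```python
import random

random.seed(3)
n = 12
# Invented toy problem: assign each of n jobs to a distinct position,
# minimizing total cost under a random cost matrix.
C = [[random.randint(1, 100) for _ in range(n)] for _ in range(n)]

def cost(perm):
    return sum(C[j][p] for j, p in enumerate(perm))

def construct(guide=None, p_guide=0.8):
    """Build one solution; optionally reuse an elite solution's assignments
    as a value-ordering heuristic (the 'solution-guided' part)."""
    used, perm = set(), []
    for job in range(n):
        if guide is not None and random.random() < p_guide and guide[job] not in used:
            pos = guide[job]                  # value taken from the elite solution
        else:
            free = sorted((p for p in range(n) if p not in used),
                          key=lambda p: C[job][p])
            pos = random.choice(free[:3])     # randomized greedy choice
        used.add(pos)
        perm.append(pos)
    return perm

elite, elite_size = [], 4
for it in range(400):
    # Start each search from scratch or from a randomly chosen elite solution.
    guide = random.choice(elite) if elite and random.random() < 0.5 else None
    s = construct(guide)
    elite.append(s)
    elite.sort(key=cost)
    elite = elite[:elite_size]                # keep only the best few solutions

best_cost = cost(elite[0])
```

The pool size, guidance probability, and restart mix correspond to the parameter settings the paper's fully crossed study investigates.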
ltm: An R package for latent variable modeling and item response theory analyses
 Journal of Statistical Software
Cited by 14 (1 self)
The R package ltm has been developed for the analysis of multivariate dichotomous and polytomous data using latent variable models, under the Item Response Theory approach. For dichotomous data the Rasch, the Two-Parameter Logistic, and Birnbaum’s Three-Parameter models have been implemented, whereas for polytomous data Samejima’s Graded Response model is available. Parameter estimates are obtained under marginal maximum likelihood using the Gauss-Hermite quadrature rule. The capabilities and features of the package are illustrated using two real data examples.
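For readers unfamiliar with the Rasch model the abstract mentions, a short sketch of its response function and a simulation from it (a Python illustration, not the ltm package; the item difficulties and sample size are invented):

```python
import numpy as np

rng = np.random.default_rng(9)

def rasch_prob(theta, b):
    """Rasch model: P(correct) is logistic in ability minus item difficulty."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# Simulate dichotomous responses for 500 persons on 5 items.
abilities = rng.normal(size=(500, 1))                 # person parameters
difficulties = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # item parameters
p = rasch_prob(abilities, difficulties)               # (500, 5) via broadcasting
responses = (rng.random((500, 5)) < p).astype(int)

# Easier items (lower difficulty) are answered correctly more often.
observed_rates = responses.mean(axis=0)
```

The Two- and Three-Parameter models named in the abstract extend this function with a discrimination slope and a guessing floor per item.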
Time series analysis via mechanistic models. In review; prepublished at arxiv.org/abs/0802.0021
2008
Cited by 13 (5 self)
The purpose of time series analysis via mechanistic models is to reconcile the known or hypothesized structure of a dynamical system with observations collected over time. We develop a framework for constructing nonlinear mechanistic models and carrying out inference. Our framework permits the consideration of implicit dynamic models, meaning statistical models for stochastic dynamical systems which are specified by a simulation algorithm to generate sample paths. Inference procedures that operate on implicit models are said to have the plug-and-play property. Our work builds on recently developed plug-and-play inference methodology for partially observed Markov models. We introduce a class of implicitly specified Markov chains with stochastic transition rates, and we demonstrate its applicability to open problems in statistical inference for biological systems. As one example, these models are shown to give a fresh perspective on measles transmission dynamics. As a second example, we present a mechanistic analysis of cholera incidence data, involving interaction between two competing strains of the pathogen Vibrio cholerae.
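A minimal sketch of what the plug-and-play property means in practice: a bootstrap particle filter needs only the ability to *simulate* the latent dynamics, never to evaluate their transition density. The toy model below (a simple autoregression treated as a black-box simulator) is invented for the example and is far simpler than the implicitly specified chains the paper studies:

```python
import numpy as np

rng = np.random.default_rng(7)

# Implicit model: we can only simulate one step of the latent dynamics.
def step(x):
    """Stochastic transition rule, treated as a black-box simulator."""
    return 0.9 * x + rng.normal(scale=0.5, size=x.shape)

def obs_density(y, x, sd=1.0):
    """Measurement density; this is the only density we need to evaluate."""
    return np.exp(-0.5 * ((y - x) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Simulate data from the model itself.
T = 50
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = step(np.array([x_true[t - 1]]))[0]
y = x_true + rng.normal(size=T)

# Bootstrap particle filter: propagation uses only the simulator,
# which is what makes the procedure "plug-and-play".
N = 2000
particles = rng.normal(size=N)
loglik = 0.0
filt_mean = np.zeros(T)
for t in range(T):
    if t > 0:
        particles = step(particles)                   # propagate via simulator
    w = obs_density(y[t], particles)
    loglik += np.log(w.mean())                        # likelihood contribution
    w = w / w.sum()
    filt_mean[t] = w @ particles
    particles = rng.choice(particles, size=N, p=w)    # multinomial resampling
```

Because `step` could equally well be any simulation algorithm (e.g. a stochastic compartment model with random transition rates), the same filtering code applies unchanged, which is the point of the plug-and-play methodology.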
AWTY (Are We There Yet?): a system for graphical exploration of MCMC convergence in Bayesian phylogenetics
2007
Cited by 13 (1 self)
Summary: A key element of successful Markov chain Monte Carlo (MCMC) inference is the programming and run performance of the Markov chain. However, the explicit use of quality assessments of the MCMC simulations—convergence diagnostics—in phylogenetics is still uncommon. Here we present a simple tool that uses the output from MCMC simulations and visualizes a number of properties of primary interest in a Bayesian phylogenetic analysis, such as convergence rates of posterior split probabilities and branch lengths. Graphical exploration of the output from phylogenetic MCMC simulations gives intuitive and often crucial information on the success and reliability of the analysis. The tool presented here complements convergence diagnostics already available in other software packages primarily designed for other applications of MCMC. Importantly, the common practice of using trace plots of a single parameter or summary statistic, such as the likelihood score of sampled trees, can be misleading for assessing the success of a phylogenetic MCMC simulation.
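The core diagnostic AWTY popularized, comparing cumulative posterior split frequencies between independent runs, can be sketched in a few lines (plotting omitted; the "chains" here are independent draws invented for the example rather than real MCMC output):

```python
import numpy as np

rng = np.random.default_rng(11)

# Toy stand-in for MCMC output: for each sampled tree, whether it
# contains a given split. Both runs target the same posterior, in
# which the split has probability 0.7 (assumed for illustration).
n_samples = 5000
chain1 = rng.random(n_samples) < 0.7
chain2 = rng.random(n_samples) < 0.7

# Cumulative (running) posterior split frequencies, per run.
t = np.arange(1, n_samples + 1)
run1 = np.cumsum(chain1) / t
run2 = np.cumsum(chain2) / t

# Agreement between independent runs is evidence of convergence:
# if both chains have mixed, the final discrepancy should be small.
final_gap = abs(run1[-1] - run2[-1])
```

Plotting `run1` against `run2` over `t` gives the kind of between-run comparison the abstract argues is more informative than a single-parameter trace plot.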
Identifying functional modules in protein-protein interaction networks: an integrated exact approach
Bioinformatics, 2008
Cited by 12 (1 self)
Motivation: With the exponential growth of expression and protein-protein interaction (PPI) data, the frontier of research in systems biology shifts more and more to the integrated analysis of these large datasets. Of particular interest is the identification of functional modules in PPI networks, sharing common cellular function beyond the scope of classical pathways, by means of detecting differentially expressed regions in PPI networks. This requires on the one hand an adequate scoring of the nodes in the network to be identified and on the other hand the availability of an effective algorithm to find the maximally scoring network regions. Various heuristic approaches have been proposed in the literature. Results: Here we present the first exact solution for this problem, which is based on integer linear programming and its connection to the well-known prize-collecting Steiner tree problem from Operations Research.
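The underlying combinatorial problem, finding a maximally scoring connected subnetwork, can be stated exactly on a toy instance by brute force (the node scores and edges are invented; the paper of course replaces this exponential enumeration with an integer linear program):

```python
from itertools import combinations

# Toy PPI network: node scores (positive = differentially expressed).
scores = {"A": 2.0, "B": -1.0, "C": 3.0, "D": -2.5, "E": 1.5, "F": -0.5}
edges = {("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"), ("E", "F"), ("A", "F")}

def connected(nodes):
    """Check connectivity of the induced subgraph by depth-first search."""
    nodes = set(nodes)
    if not nodes:
        return False
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        for a, b in edges:
            if a == v and b in nodes:
                stack.append(b)
            if b == v and a in nodes:
                stack.append(a)
    return seen == nodes

# Exhaustive search over all connected subgraphs (feasible only at toy size).
best_set, best_score = None, float("-inf")
names = sorted(scores)
for r in range(1, len(names) + 1):
    for cand in combinations(names, r):
        if connected(cand):
            s = sum(scores[v] for v in cand)
            if s > best_score:
                best_set, best_score = set(cand), s
```

Note that the optimum includes the negatively scored node F: paying a small penalty to connect high-scoring regions is exactly the trade-off the prize-collecting Steiner tree formulation captures.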
Functional data analysis for sparse auction data
In Statistical Methods in eCommerce Research, 2008
Cited by 10 (4 self)
Bid arrivals of eBay auctions often exhibit “bid sniping”, a phenomenon where “snipers” place their bids at the last moments of an auction. This is one reason why bid histories for eBay auctions tend to have sparse data in the middle and denser data both in the beginning and at the end of the auction. Time spacing of the bids is thus irregular and sparse. For nearly identical products that are auctioned repeatedly, one may view the price history of each of these auctions as a realization of an underlying smooth stochastic process, the price process. While the traditional Functional Data Analysis (FDA) approach requires that entire trajectories of the underlying process are observed without noise, this assumption is not satisfied for typical auction data. We provide a review of a recently developed version of functional principal component analysis (Yao et al., 2005), which is geared towards sparse, irregularly observed and noisy data: the principal analysis through conditional expectation (PACE) method. The PACE method borrows and pools information from the sparse data in all auctions, which allows the recovery of the price process even in situations where only a few bids are observed. In a modified approach, we adapt PACE to summarize the bid history for varying current times during an ongoing auction through time-varying principal component scores. These scores then serve as time-varying predictors for the closing price. We study the resulting time-varying predictions using both linear regression and generalized additive modelling, with current scores as predictors. These methods are illustrated with a case study of 157 Palm M515 PDA auctions from eBay, and the proposed methods are seen to work reasonably well. Other related issues are also discussed.
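The key conditioning step of PACE can be sketched in isolation: given the mean, eigenfunctions, eigenvalues, and noise variance (which PACE in reality estimates by pooling across all auctions, but which are assumed known and invented here), the principal component scores of one sparsely observed trajectory are obtained by best linear prediction:

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed-known model components (invented for illustration):
# zero mean, two eigenfunctions on [0, 1], eigenvalues, noise variance.
lam = np.array([1.0, 0.25])
sigma2 = 0.04

def phi(t):
    """Orthonormal eigenfunctions on [0, 1] (a Fourier pair)."""
    return np.stack([np.sqrt(2) * np.sin(np.pi * t),
                     np.sqrt(2) * np.sin(2 * np.pi * t)], axis=-1)

# One sparsely observed trajectory: only 5 irregular "bid" times.
xi_true = rng.normal(scale=np.sqrt(lam))            # true FPC scores
t_obs = np.sort(rng.uniform(0, 1, size=5))
Phi = phi(t_obs)                                    # (5, 2)
y = Phi @ xi_true + rng.normal(scale=np.sqrt(sigma2), size=5)

# PACE scores: conditional expectation of the scores given the sparse data,
#   xi_hat_k = lam_k * phi_k(t)^T  Sigma^{-1} (y - mu),  mu = 0 here.
Sigma = Phi @ np.diag(lam) @ Phi.T + sigma2 * np.eye(5)
xi_hat = np.diag(lam) @ Phi.T @ np.linalg.solve(Sigma, y)

# Recovered price trajectory on a fine grid.
grid = np.linspace(0, 1, 101)
recovered = phi(grid) @ xi_hat
```

The same formula applied at successive "current times" yields the time-varying scores the abstract uses as predictors of the closing price.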
Comparison of Maximum Pseudo Likelihood and Maximum Likelihood Estimation of Exponential Family Random Graph Models
2007
Cited by 10 (2 self)
The statistical modeling of social network data is difficult due to the complex dependence structure of the tie variables. Statistical exponential families of distributions provide a flexible way to model such dependence. They enable the statistical characteristics of the network to be encapsulated within an exponential family random graph (ERG) model. For a long time, however, likelihood-based estimation was only feasible for ERG models assuming dyad independence. For more realistic and complex models, inference has been based on the pseudo-likelihood. Recent advances in computational methods have made likelihood-based inference practical, and comparison of the different estimators possible. In this paper, we compare the bias, standard errors, coverage rates and efficiency of maximum likelihood and maximum pseudo-likelihood estimators. We also propose an improved pseudo-likelihood estimation method aimed at reducing bias. The comparison is performed using simulated social network data based on two versions of an empirically realistic network model, the first representing Lazega’s law firm data and the second a modified version.
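Maximum pseudo-likelihood estimation for an ERG model reduces to logistic regression of each tie on its change statistics. A minimal sketch for an edge-plus-triangle model (the "observed" network is an invented Bernoulli graph, and the Newton-Raphson fit is hand-rolled to keep the example self-contained):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
# Toy undirected "observed" network (invented: a Bernoulli(0.2) graph).
A = np.triu((rng.random((n, n)) < 0.2).astype(float), 1)
A = A + A.T

# Pseudo-likelihood: model each tie by logistic regression on the change
# statistics, i.e. how the network statistics change when the tie is toggled.
X, y = [], []
for i in range(n):
    for j in range(i + 1, n):
        common = np.sum(A[i] * A[j])    # triangles created by adding tie (i, j)
        X.append([1.0, common])         # change stats: +1 edge, +common triangles
        y.append(A[i, j])
X, y = np.array(X), np.array(y)

# Fit the logistic regression by Newton-Raphson.
theta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    grad = X.T @ (y - p)
    W = p * (1 - p)
    H = X.T @ (X * W[:, None])
    theta = theta + np.linalg.solve(H, grad)

edge_param, triangle_param = theta
```

The paper's point is precisely that this convenient estimator can be badly biased for dependence terms such as the triangle count, which is why comparison against the (now computable) maximum likelihood estimator matters.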
Designing and analyzing randomized experiments: application to a Japanese election survey experiment
Am. J. Polit. Sci., 2007
Cited by 10 (4 self)
Randomized experiments are becoming increasingly common in political science. Despite their well-known advantages over observational studies, randomized experiments are not free from complications. In particular, researchers often cannot force subjects to comply with treatment assignment and to provide the requested information. Furthermore, simple randomization of treatments remains the most commonly used method in the discipline even though more efficient procedures are available. Building on the recent statistical literature, we address these methodological issues by offering general recommendations for designing and analyzing randomized experiments to improve the validity and efficiency of causal inference. We also develop a new statistical methodology to explore causal heterogeneity. The proposed methods are applied to a survey experiment conducted during Japan’s 2004 Upper House election, where randomly selected voters were encouraged to obtain policy information from political parties’ websites. An R package is publicly available for implementing various methods useful for designing and analyzing randomized experiments. In this article, we demonstrate how to effectively design and analyze randomized experiments, which are becoming increasingly common in political science research (Druckman et al. 2006; McDermott 2002).
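The noncompliance problem the abstract raises has a standard textbook remedy: estimate the intention-to-treat effect, then scale it by the compliance rate to get the complier average causal effect (the Wald/IV estimator). A minimal sketch on simulated data (the compliance rate, effect size, and outcome model are all invented; this is generic methodology, not the authors' R package):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10000

# Simulated encouragement design: Z = random encouragement, D = actual
# uptake (e.g. visiting the website), Y = outcome. Assumed for illustration:
# 60% compliers, one-sided noncompliance, true effect of D on Y equal to 2.
complier = rng.random(n) < 0.6
Z = rng.integers(0, 2, n)               # randomized encouragement
D = Z * complier                        # uptake only if encouraged AND complier
Y = 1.0 + 2.0 * D + rng.normal(size=n)

# Intention-to-treat effect: difference in mean outcomes by assignment.
itt = Y[Z == 1].mean() - Y[Z == 0].mean()

# Complier average causal effect via the Wald (IV) estimator:
# ITT effect divided by the difference in uptake rates.
compliance_diff = D[Z == 1].mean() - D[Z == 0].mean()
cace = itt / compliance_diff
```

Here the ITT estimate is attenuated toward roughly 0.6 × 2 by noncompliance, while the Wald ratio recovers the effect among compliers.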