Results 1–10 of 314
Nonparametric econometrics: The np package
 Journal of Statistical Software
Abstract

Cited by 29 (4 self)
We describe the R np package via a series of applications that may be of interest to applied econometricians. The np package implements a variety of nonparametric and semiparametric kernel-based estimators that are popular among econometricians. There are also procedures for nonparametric tests of significance and consistent model specification tests for parametric mean regression models and parametric quantile regression models, among others. The np package focuses on kernel methods appropriate for the mix of continuous, discrete, and categorical data often found in applied settings. Data-driven methods of bandwidth selection are emphasized throughout, though we caution the user that data-driven bandwidth selection methods can be computationally demanding.
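The mixed-data product kernel at the heart of such estimators can be illustrated with a minimal Nadaraya-Watson regression in Python (the np package itself is R code; the function name, the Gaussian continuous kernel, and the simplified match/no-match discrete kernel below are illustrative assumptions, not np's implementation):

```python
import math

def nw_mixed(x0, z0, X, Z, Y, h=0.5, lam=0.3):
    """Nadaraya-Watson estimate at (x0, z0) with a product kernel:
    a Gaussian kernel with bandwidth h for the continuous regressor x,
    and a simplified Aitchison-Aitken-style weight for the categorical
    regressor z (weight 1 - lam on a category match, lam otherwise)."""
    num = den = 0.0
    for x, z, y in zip(X, Z, Y):
        k = math.exp(-0.5 * ((x - x0) / h) ** 2)   # continuous kernel
        k *= (1.0 - lam) if z == z0 else lam       # discrete kernel
        num += k * y
        den += k
    return num / den
```

With lam = 0 the discrete kernel acts as an exact-match indicator, so the estimate at (0, 0) is simply the local average of Y over observations with z = 0; larger lam borrows strength across categories, which is what makes kernel methods usable with mixed data.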
Bayesian Modeling of Uncertainty in Ensembles of Climate Models
, 2008
Abstract

Cited by 25 (6 self)
Projections of future climate change caused by increasing greenhouse gases depend critically on numerical climate models coupling the ocean and atmosphere (GCMs). However, different models differ substantially in their projections, which raises the question of how the different models can best be combined into a probability distribution of future climate change. For this analysis, we have collected both current and future projected mean temperatures produced by nine climate models for 22 regions of the earth. We also have estimates of current mean temperatures from actual observations, together with standard errors, that can be used to calibrate the climate models. We propose a Bayesian analysis that allows us to combine the different climate models into a posterior distribution of future temperature increase, for each of the 22 regions, while allowing for the different climate models to have different variances. Two versions of the analysis are proposed, a univariate analysis in which each region is analyzed separately, and a multivariate analysis in which the 22 regions are combined into an overall statistical model. A cross-validation approach is proposed to confirm the reasonableness of our Bayesian predictive distributions. The results of this analysis allow for a quantification of the uncertainty of climate model projections as a Bayesian posterior distribution, substantially extending previous approaches to uncertainty in climate models.
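The core idea of weighting climate models by their precision can be illustrated with a simplified, fixed-variance version of such a Bayesian combination in Python (a hedged sketch assuming known variances and a flat prior; the paper's hierarchical model additionally estimates each model's variance from the data):

```python
def combine(projections, variances):
    """Precision-weighted combination of model projections, assuming
    independent normal likelihoods with known variances and a flat
    prior. Returns the posterior mean and posterior variance of the
    common quantity (e.g., a region's future mean temperature)."""
    precisions = [1.0 / v for v in variances]
    post_var = 1.0 / sum(precisions)
    post_mean = post_var * sum(p * y for p, y in zip(precisions, projections))
    return post_mean, post_var
```

Models with smaller variance pull the posterior mean toward their projection; letting the variances themselves be unknown and model-specific is exactly what the paper's Bayesian analysis adds on top of this.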
Toward a Common Framework for Statistical Analysis and Development
 Journal of Computational and Graphical Statistics
, 2008
Abstract

Cited by 19 (7 self)
We develop a general ontology of statistical methods and use it to propose a common framework for statistical analysis and software development built on and within the R language, including R’s numerous existing packages. This framework offers a simple unified structure and syntax that can encompass a large fraction of existing statistical procedures. We conjecture that it can be used to encompass and present simply a vast majority of existing statistical methods, without requiring changes in existing approaches, and regardless of the theory of inference on which they are based, notation with which they were developed, and programming syntax with which they have been implemented. This development enabled us, and should enable others, to design statistical software with a single, simple, and unified user interface that helps overcome the conflicting notation, syntax, jargon, and statistical methods existing across the methods subfields of numerous academic disciplines. The approach also enables one to build a graphical user interface that automatically includes any method encompassed within the framework. We hope that the result of this line of research will greatly reduce the time from the creation of a new statistical innovation to its widespread use by applied researchers whether or not they use or program in R.
Beanplot: A Boxplot Alternative for Visual Comparison of Distributions
 URL http://www.jstatsoft.org/v28/c01
, 2008
Abstract

Cited by 16 (3 self)
This introduction to the R package beanplot is a (slightly) modified version of Kampstra (2008), published in the Journal of Statistical Software. Boxplots and variants thereof are frequently used to compare univariate data. Boxplots have the disadvantage that they are not easy to explain to non-mathematicians, and that some information is not visible. A beanplot is an alternative to the boxplot for visual comparison of univariate data between groups. In a beanplot, the individual observations are shown as small lines in a one-dimensional scatter plot. Next to that, the estimated density of the distributions is visible and the average is shown. It is easy to compare different groups of data in a beanplot and to see if a group contains enough observations to make the group interesting from a statistical point of view. Anomalies in the data, such as bimodal distributions and duplicate measurements, are easily spotted in a beanplot. For groups with two subgroups (e.g., male and female), there is a special asymmetric beanplot. For easy usage, an implementation was made in R.
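The ingredients of a beanplot, a kernel density outline plus the average line, can be computed in a few lines of Python (a hedged sketch with illustrative names; beanplot itself is an R package and additionally draws each observation as a small line along the axis):

```python
import math

def kde(values, grid, bw):
    """Gaussian kernel density estimate at each grid point -- the
    density outline that a beanplot mirrors on both sides of its axis."""
    n = len(values)
    const = 1.0 / (n * bw * math.sqrt(2 * math.pi))
    return [const * sum(math.exp(-0.5 * ((g - v) / bw) ** 2) for v in values)
            for g in grid]

def bean_components(values, grid, bw=0.5):
    """Return the two drawable pieces of one 'bean': the density curve
    on the grid and the group average (the beanline)."""
    density = kde(values, grid, bw)
    average = sum(values) / len(values)
    return density, average
```

Plotting the mirrored density and the average line per group reproduces the visual comparison the abstract describes; bimodality shows up directly as two bulges in the density outline.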
MLDS: Maximum Likelihood Difference Scaling in R
 Journal of Statistical Software
, 2008
Abstract

Cited by 15 (8 self)
This introduction to the R package MLDS is a modified and updated version of Knoblauch and Maloney (2008) published in the Journal of Statistical Software. The MLDS package in the R programming language can be used to estimate perceptual scales based on the results of psychophysical experiments using the method of difference scaling. In a difference scaling experiment, observers compare two suprathreshold differences (a,b) and (c,d) on each trial. The approach is based on a stochastic model of how the observer decides which perceptual difference (or interval) (a, b) or (c, d) is greater, and the parameters of the model are estimated using a maximum likelihood criterion. We also propose a method to test the model by evaluating the self-consistency of the estimated scale. The package includes an example in which an observer judges the differences in correlation between scatterplots. The example may be readily adapted to estimate perceptual scales for arbitrary physical continua.
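The stochastic decision model underlying difference scaling can be sketched as a short Python simulation (hypothetical names; MLDS itself works in the other direction, estimating the scale values psi from the observed responses by maximum likelihood rather than assuming them):

```python
import random

def difference_scaling_trial(psi, a, b, c, d, sigma=0.3, rng=random):
    """Simulate one difference-scaling trial: the observer reports which
    perceptual interval, (a, b) or (c, d), looks larger, with Gaussian
    judgment noise of standard deviation sigma. psi maps stimulus
    levels to perceived values. Returns 1 if (c, d) is chosen."""
    delta = (psi[d] - psi[c]) - (psi[b] - psi[a]) + rng.gauss(0.0, sigma)
    return 1 if delta > 0 else 0
```

Generating many such binary responses and maximizing their likelihood over the unknown psi values is, in outline, what the package's fitting routine does.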
Panel Data Econometrics in R: The plm Package
 Journal of Statistical Software
, 2008
Abstract

Cited by 13 (1 self)
This introduction to the plm package is a slightly modified version of Croissant and Millo (2008), published in the Journal of Statistical Software. Panel data econometrics is obviously one of the main fields in the profession, but most of the models used are difficult to estimate with R. plm is a package for R which intends to make the estimation of linear panel models straightforward. plm provides functions to estimate a wide variety of models and to make (robust) inference. Keywords: panel data, covariance matrix estimators, generalized method of moments, R.
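The "within" transformation behind fixed-effects estimation of linear panel models can be sketched in Python for a single regressor (an illustrative sketch, not plm's implementation, which handles multiple regressors, unbalanced panels, and robust covariance estimators):

```python
def within_estimator(y, x, ids):
    """One-regressor fixed-effects ('within') estimator: demean y and x
    inside each panel unit, then run OLS on the demeaned data. This
    sweeps out unit-specific intercepts before estimating the slope."""
    groups = {}
    for i, g in enumerate(ids):
        groups.setdefault(g, []).append(i)
    yd, xd = list(y), list(x)
    for idx in groups.values():
        my = sum(y[i] for i in idx) / len(idx)
        mx = sum(x[i] for i in idx) / len(idx)
        for i in idx:
            yd[i] -= my
            xd[i] -= mx
    # OLS slope on the demeaned data
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
```

Because demeaning removes each unit's fixed effect exactly, the slope is recovered even when units have very different intercepts, which is the basic appeal of the within estimator.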
Semi-supervised training for the averaged perceptron POS tagger
 In Proceedings of the EACL
, 2009
Abstract

Cited by 12 (1 self)
This paper describes POS tagging experiments with semi-supervised training as an extension to the (supervised) averaged perceptron algorithm, first introduced for this task by Collins (2002). Experiments with iterative training on a standard-sized supervised (manually annotated) dataset (10^6 tokens) combined with a relatively modest amount (on the order of 10^8 tokens) of unsupervised (plain) data in a bagging-like fashion showed significant improvement on the POS classification task for typologically different languages, yielding better than state-of-the-art results for English and Czech (4.12% and 4.86% relative error reduction, respectively; absolute accuracies being 97.44% and 95.89%).
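The supervised core of such a tagger, the averaged perceptron, can be sketched in Python (a toy classifier over sparse string features; the paper's contribution is the semi-supervised, bagging-like training on large unlabeled corpora layered on top of this):

```python
def train_averaged_perceptron(examples, tags, epochs=5):
    """Averaged perceptron in the style of Collins (2002): ordinary
    perceptron updates, but the returned weights are averaged over all
    training steps, which stabilizes the final model.
    examples: list of (feature_list, gold_tag) pairs."""
    w, acc, steps = {}, {}, 0
    for _ in range(epochs):
        for feats, gold in examples:
            guess = max(tags, key=lambda y: sum(w.get((y, f), 0.0) for f in feats))
            if guess != gold:
                for f in feats:
                    w[(gold, f)] = w.get((gold, f), 0.0) + 1.0
                    w[(guess, f)] = w.get((guess, f), 0.0) - 1.0
            for k, v in w.items():      # accumulate for averaging
                acc[k] = acc.get(k, 0.0) + v
            steps += 1
    return {k: v / steps for k, v in acc.items()}

def predict(avg_w, tags, feats):
    """Tag with the highest averaged score (ties broken by tag order)."""
    return max(tags, key=lambda y: sum(avg_w.get((y, f), 0.0) for f in feats))
```

A real tagger would use rich contextual features per token and greedy or beam decoding over a sentence; the update and averaging logic are unchanged.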
Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks
Abstract

Cited by 12 (1 self)
We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from gene expression data. A computer program is available that implements the proposed shrinkage estimator. Keywords: entropy, shrinkage estimation, James-Stein estimator, “small n, large p” setting, mutual information, gene association network
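A minimal sketch of a James-Stein-type shrinkage entropy estimator in Python: shrink the maximum-likelihood cell frequencies toward a uniform target with a data-driven intensity, then plug the shrunken frequencies into the entropy formula (an assumed simplification; the paper's estimator may differ in details such as the exact intensity formula):

```python
import math

def shrink_entropy(counts):
    """Plug-in entropy (in nats) from James-Stein-shrunken frequencies.
    The ML frequencies are shrunk toward the uniform distribution, with
    intensity lam estimated from the data and clipped to [0, 1]."""
    n = sum(counts)
    p = len(counts)
    ml = [c / n for c in counts]
    target = 1.0 / p
    denom = (n - 1) * sum((target - f) ** 2 for f in ml)
    lam = 1.0 if denom == 0 else (1.0 - sum(f * f for f in ml)) / denom
    lam = min(1.0, max(0.0, lam))
    theta = [lam * target + (1.0 - lam) * f for f in ml]
    return -sum(t * math.log(t) for t in theta if t > 0)
```

With few observations spread over many cells, lam grows and pulls the estimate toward the uniform entropy log p, which is what counteracts the severe downward bias of the plain plug-in estimator in the "small n, large p" setting.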
Regression Models for Count Data in R
Abstract

Cited by 11 (2 self)
The classical Poisson, geometric and negative binomial regression models for count data belong to the family of generalized linear models and are available at the core of the statistics toolbox in the R system for statistical computing. After reviewing the conceptual and computational features of these methods, a new implementation of hurdle and zero-inflated regression models in the functions hurdle() and zeroinfl() from the package pscl is introduced. It reuses the design and functionality of the basic R functions, just as the underlying conceptual tools extend the classical models. Both hurdle and zero-inflated models are able to incorporate overdispersion and excess zeros, two problems that typically occur in count data sets in economics and the social sciences, better than their classical counterparts. Using cross-section data on the demand for medical care, it is illustrated how the classical as well as the zero-augmented models can be fitted, inspected and tested in practice.
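The zero-inflated Poisson model fitted by zeroinfl() mixes a point mass at zero with a Poisson count component; its log-likelihood for fixed parameters can be sketched in Python (a simplification: in the actual regression model both the zero-inflation probability and the Poisson mean depend on covariates through link functions):

```python
import math

def zip_loglik(y, pi, mu):
    """Log-likelihood of a zero-inflated Poisson sample: with
    probability pi an observation is a structural zero, otherwise it is
    drawn from Poisson(mu). Zeros can thus arise from either source."""
    ll = 0.0
    for k in y:
        if k == 0:
            ll += math.log(pi + (1 - pi) * math.exp(-mu))
        else:
            ll += math.log(1 - pi) - mu + k * math.log(mu) - math.lgamma(k + 1)
    return ll
```

Maximizing this over pi and mu (or over regression coefficients feeding into them) is the estimation problem; the extra pi mass at zero is what lets the model absorb the excess zeros a plain Poisson cannot.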
mixtools: An R package for analyzing finite mixture models
 Journal of Statistical Software
, 2009
Abstract

Cited by 11 (8 self)
The mixtools package for R provides a set of functions for analyzing a variety of finite mixture models. These functions include both traditional methods, such as EM algorithms for univariate and multivariate normal mixtures, and newer methods that reflect some recent research in finite mixture models. In the latter category, mixtools provides algorithms for estimating parameters in a wide range of different mixture-of-regression contexts, in multinomial mixtures such as those arising from discretizing continuous multivariate data, in nonparametric situations where the multivariate component densities are completely unspecified, and in semiparametric situations such as a univariate location mixture of symmetric but otherwise unspecified densities. Many of the algorithms of the mixtools package are EM algorithms or are based on EM-like ideas, so this article includes an overview of EM algorithms for finite mixture models.
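The EM iteration for the simplest case, a two-component univariate normal mixture with known common variance, can be sketched in Python (a toy version: mixtools' normalmixEM also updates the component variances and handles more than two components):

```python
import math

def em_normal_mixture(x, mu1, mu2, sigma=1.0, w=0.5, iters=50):
    """EM for a two-component normal mixture with known common sigma.
    E-step: posterior probability that each point came from component 1.
    M-step: update the two means and the mixing weight w."""
    def dens(v, mu):
        return math.exp(-0.5 * ((v - mu) / sigma) ** 2)
    for _ in range(iters):
        # E-step: responsibilities of component 1
        r = []
        for v in x:
            a = w * dens(v, mu1)
            b = (1 - w) * dens(v, mu2)
            r.append(a / (a + b))
        # M-step: responsibility-weighted means and mixing proportion
        s1 = sum(r)
        s2 = len(x) - s1
        mu1 = sum(ri * v for ri, v in zip(r, x)) / s1
        mu2 = sum((1 - ri) * v for ri, v in zip(r, x)) / s2
        w = s1 / len(x)
    return mu1, mu2, w
```

Each iteration provably does not decrease the observed-data likelihood, which is the property that makes EM the workhorse for mixture estimation throughout the package.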