Results 1 - 10 of 157
Unbiased recursive partitioning: a conditional inference framework
 J. Comput. Graph. Statist
, 2006
Cited by 99 (12 self)
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, although they lack a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well-defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions and therefore the models induced by both approaches are structurally different, indicating the need for an unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on animal abundance, glaucoma classification, node positive breast cancer and mammography experience are reanalyzed.
Simultaneous Inference in General Parametric Models
, 2008
Cited by 88 (5 self)
Simultaneous inference is a common problem in many areas of application. If multiple null hypotheses are tested simultaneously, the probability of erroneously rejecting at least one of them increases beyond the prespecified significance level. Simultaneous inference procedures have to be used which adjust for multiplicity and thus control the overall type I error rate. In this paper we describe simultaneous inference procedures in general parametric models, where the experimental questions are specified through a linear combination of elemental model parameters. The framework described here is quite general and extends the canonical theory of multiple comparison procedures in ANOVA models to linear regression problems, generalized linear models, linear mixed effects models, the Cox model, robust linear models, etc. Several examples using a variety of different statistical models illustrate the breadth of the results. For the analyses we use the R add-on package multcomp, which provides a convenient interface to the general approach adopted here. Key words: multiple tests, multiple comparisons, simultaneous confidence intervals, adjusted p-values, multivariate normal distribution, robust statistics.
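The inflation of the error probability described in this abstract can be illustrated with a short calculation. The sketch below assumes m independent tests, each run at level alpha; independence is an assumption made here for illustration and is not stated in the abstract. It contrasts the unadjusted familywise error rate with a Bonferroni-adjusted one, which is only one simple adjustment and not the procedure implemented in multcomp. The function names are hypothetical.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false rejection among m independent
    tests of true null hypotheses, each carried out at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test level that keeps the familywise error rate at or below
    alpha (Bonferroni adjustment; conservative but assumption-free)."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 5, 10, 20):
        unadjusted = familywise_error_rate(alpha, m)
        adjusted = familywise_error_rate(bonferroni_level(alpha, m), m)
        print(f"m={m:2d}  unadjusted={unadjusted:.3f}  bonferroni={adjusted:.3f}")
```

With a single test the familywise rate equals alpha; it grows quickly with m unless the per-test level is adjusted.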
Maximum likelihood estimation of a stochastic integrate-and-fire neural model
 NIPS
, 2003
Cited by 74 (24 self)
We examine a cascade encoding model for neural response in which a linear filtering stage is followed by a noisy, leaky, integrate-and-fire spike generation mechanism. This model provides a biophysically more realistic alternative to models based on Poisson (memoryless) spike generation, and can effectively reproduce a variety of spiking behaviors seen in vivo. We describe the maximum likelihood estimator for the model parameters, given only extracellular spike train responses (not intracellular voltage data). Specifically, we prove that the log likelihood function is concave and thus has an essentially unique global maximum that can be found using gradient ascent techniques. We develop an efficient algorithm for computing the maximum likelihood solution, demonstrate the effectiveness of the resulting estimator with numerical simulations, and discuss a method of testing the model’s validity using time-rescaling and density evolution techniques.
Estimating Fully Observed Recursive Mixed-Process Models with cmp
 Working Papers 168
, 2009
Cited by 68 (2 self)
At the heart of many econometric models is a linear function and a normal error. Examples include the classical small-sample linear regression model and the probit, ordered probit, multinomial probit, Tobit, interval regression, and truncated-distribution regression models. Because the normal distribution has a natural multidimensional generalization, such models can be combined into multi-equation systems in which the errors share a multivariate normal distribution. The literature has historically focused on multistage procedures for estimating mixed models, which are more efficient computationally, if less so statistically, than maximum likelihood (ML). But faster computers and simulated likelihood methods such as the Geweke, Hajivassiliou, and Keane (GHK) algorithm for estimating higher-dimensional cumulative normal distributions have made direct ML estimation practical. ML also facilitates a generalization to switching, selection, and other models in which the number and types of equations vary by observation. The Stata module cmp fits Seemingly Unrelated Regressions (SUR) models of this broad family. Its estimator is also consistent for recursive systems in which all endogenous variables appear on the right-hand sides as observed. If all the equations are structural, then estimation is full-information maximum likelihood (FIML). If only the final stage or stages are, then it is limited-information maximum likelihood (LIML). cmp can mimic a dozen built-in Stata commands and several user-written ones. It is also appropriate for a panoply of models previously hard to estimate. Heteroskedasticity, however, can render it inconsistent. This paper explains the theory and implementation of cmp and of a related Mata function, ghk2(), that implements the GHK algorithm.
Methods for the Computation of Multivariate t-Probabilities
 Computing Sciences and Statistics
, 2000
Cited by 60 (10 self)
This paper compares methods for the numerical computation of multivariate t-probabilities for hyper-rectangular integration regions. Methods based on acceptance-rejection, spherical-radial transformations and separation-of-variables transformations are considered. Tests using randomly chosen problems show that the most efficient numerical methods use a transformation developed by Genz (1992) for multivariate normal probabilities. These methods allow moderately accurate multivariate t-probabilities to be quickly computed for problems with as many as twenty variables. Methods for the noncentral multivariate t-distribution are also described. Key Words: multivariate t-distribution, noncentral distribution, numerical integration, statistical computation. 1 Introduction A common problem in many statistics applications is the numerical computation of the multivariate t (MVT) distribution function (see Tong, 1990) defined by T(a, b, \Sigma, \nu) = \frac{\Gamma(\frac{\nu+m}{2})}{\Gamma(\frac{\nu}{2}) \sqrt{|\Sigma| \cdots}} ...
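To make the quantity in this abstract concrete, the following is a minimal sketch of a plain Monte Carlo estimate of a central multivariate t-probability over a hyper-rectangle [a, b]. It is not one of the transformation-based methods the paper compares; it simply draws MVT samples as a correlated normal vector divided by the square root of an independent chi-square variate over its degrees of freedom, and counts the fraction that fall inside the region. The function names, the sample size and the seed are assumptions chosen for illustration.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular factor L with L L^T = sigma (sigma must be
    symmetric positive definite)."""
    m = len(sigma)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def mvt_probability(a, b, sigma, nu, n_samples=20000, seed=0):
    """Crude Monte Carlo estimate of P(a <= X <= b) for a central
    multivariate t vector X with scale matrix sigma and nu degrees
    of freedom."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    m = len(a)
    hits = 0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Correlated normal vector y = L z.
        y = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
        # Chi-square(nu) is Gamma(shape=nu/2, scale=2).
        scale = math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
        x = [yi / scale for yi in y]
        if all(a[i] <= x[i] <= b[i] for i in range(m)):
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    inf = float("inf")
    sigma = [[1.0, 0.5], [0.5, 1.0]]
    print(mvt_probability([-inf, -inf], [0.0, 0.0], sigma, nu=5))
```

The error of such an estimate shrinks only like the inverse square root of the sample size, which is one reason more efficient transformations are of interest.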
Methods for Approximating Integrals in Statistics with Special Emphasis on Bayesian Integration Problems
 Statistical Science
Cited by 41 (5 self)
This paper is a survey of the major techniques and approaches available for the numerical approximation of integrals in statistics. We classify these into five broad categories, namely asymptotic methods, importance sampling, adaptive importance sampling, multiple quadrature and Markov chain methods. Each method is discussed giving an outline of the basic supporting theory and particular features of the technique. Conclusions are drawn concerning the relative merits of the methods based on the discussion and their application to three examples. The following broad recommendations are made. Asymptotic methods should only be considered in contexts where the integrand has a dominant peak with approximate ellipsoidal symmetry. Importance sampling, and preferably adaptive importance sampling, based on a multivariate Student should be used instead of asymptotic methods in such a context. Multiple quadrature, and in particular subregion adaptive integration, are the algorithms of choice for...
On approximate graph colouring and MAX-k-CUT algorithms based on the θ-function
, 2002
Cited by 23 (2 self)
The problem of colouring a k-colourable graph is well known to be NP-complete for k ≥ 3. The MAX-k-CUT approach to approximate k-colouring is to assign k colours to all of the vertices in polynomial time such that the fraction of 'defect edges' (with endpoints of the same colour) is provably small. The best known approximation was obtained by Frieze and Jerrum [9], using a semidefinite programming (SDP) relaxation which is related to the Lovász θ-function. In a related work, Karger et al. [18] devised approximation algorithms for colouring k-colourable graphs exactly in polynomial time with as few colours as possible. They also used an SDP relaxation related to the θ-function.
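The notion of a defect edge in this abstract can be made concrete with a small sketch. The code below is not the SDP-based algorithm of Frieze and Jerrum; it only measures the fraction of defect edges for a given colouring and provides a uniformly random colouring as a naive baseline, under which each edge is a defect with probability 1/k. The function names and the example graph are assumptions made for illustration.

```python
import random


def defect_fraction(edges, colouring):
    """Fraction of edges whose two endpoints received the same colour.

    edges is a list of (u, v) vertex-index pairs; colouring[v] is the
    colour assigned to vertex v.
    """
    if not edges:
        return 0.0
    defects = sum(1 for u, v in edges if colouring[u] == colouring[v])
    return defects / len(edges)


def random_colouring(n_vertices, k, rng):
    """Assign each vertex one of k colours uniformly at random."""
    return [rng.randrange(k) for _ in range(n_vertices)]


if __name__ == "__main__":
    # A 5-cycle, which is 3-colourable.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    proper = [0, 1, 0, 1, 2]
    print(defect_fraction(edges, proper))
    rng = random.Random(0)
    print(defect_fraction(edges, random_colouring(5, 3, rng)))
```

A proper colouring has defect fraction zero, so the quality of an approximate colouring can be read directly from this fraction.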
Numerical Computation Of Multivariate t-Probabilities With Application To Power Calculation Of Multiple Contrasts
, 1993
Bayesian Gaussian process classification with the EM-EP algorithm
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1948
On the likelihood function of Gaussian max-stable processes indexed by R^d, d ≥ 1
, 2010
Cited by 16 (1 self)
We derive a closed form expression of the likelihood function of a Gaussian max-stable process indexed by R^d at p ≤ d + 1 sites, d ≥ 1. We demonstrate the gain in efficiency in the maximum composite likelihood estimates from p = 2 to p = 3 sites in R^2 by means of a simulation study.