Flexible smoothing with B-splines and penalties
- Statistical Science, 1996
Cited by 111 (2 self)
B-splines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots and a difference penalty on coefficients of adjacent B-splines. We show connections to the familiar spline penalty on the integral of the squared second derivative. A short overview of B-splines, their construction, and penalized likelihood is presented. We discuss properties of penalized B-splines and propose various criteria for the choice of an optimal penalty parameter. Nonparametric logistic regression, density estimation and scatterplot smoothing are used as examples. Some details of the computations are presented.
Keywords: Generalized linear models, smoothing, nonparametric models, splines, density estimation.
Address for correspondence: DCMR Milieudienst Rijnmond, 's-Gravelandse...
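The idea in this abstract (many equidistant knots plus a difference penalty on adjacent B-spline coefficients) can be sketched for the scatterplot-smoothing case as follows. This is a minimal illustration, not the authors' code: the function names `bspline_basis` and `pspline_fit`, the defaults (20 segments, cubic splines, second-order differences) and the least-squares setting are all assumptions made for the example.

```python
import numpy as np

def bspline_basis(x, xl, xr, nseg=20, deg=3):
    """Equidistant B-spline basis on [xl, xr], built with the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)[:, None]
    dx = (xr - xl) / nseg
    # deg extra knots on each side so every point in [xl, xr] is fully covered
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    # degree-0 basis: indicator of each knot interval
    B = ((knots[:-1] <= x) & (x < knots[1:])).astype(float)
    for d in range(1, deg + 1):
        left = (x - knots[:-d - 1]) / (knots[d:-1] - knots[:-d - 1])
        right = (knots[d + 1:] - x) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B  # shape (len(x), nseg + deg)

def pspline_fit(x, y, lam=1.0, nseg=20, deg=3, order=2):
    """Penalized least squares: minimize |y - B a|^2 + lam * |D a|^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    B = bspline_basis(x, x.min(), x.max(), nseg, deg)
    # D takes order-th differences of adjacent coefficients
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)
    A = B.T @ B + lam * D.T @ D
    coef = np.linalg.solve(A, B.T @ y)
    # trace of the hat matrix, one common measure of effective dimension
    eff_dim = np.trace(np.linalg.solve(A, B.T @ B))
    return B @ coef, eff_dim
```

With a second-order penalty, increasing `lam` pulls the fit toward a straight line, and the effective dimension falls toward 2; a small `lam` leaves the fit close to an unpenalized B-spline regression.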
Bayesian measures of model complexity and fit
- Journal of the Royal Statistical Society, Series B, 2002
Cited by 76 (1 self)
[Read before The Royal Statistical Society at a meeting organized by the Research
Finding Approximate POMDP Solutions Through Belief Compression
- 2003
Cited by 46 (2 self)
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional manifold embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this manifold can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta, & Schapire, 2002) to represent sparse, high-dimensional belief spaces using low-dimensional sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications (manuscript, available at www-stat.wharton.upenn.edu/~buja)
- 2005
Cited by 25 (1 self)
What are the natural loss functions or fitting criteria for binary class probability estimation? This question has a simple answer: so-called “proper scoring rules”, that is, functions that score probability estimates in view of data in a Fisher-consistent manner. Proper scoring rules comprise most loss functions currently in use: log-loss, squared error loss, boosting loss, and as limiting cases cost-weighted misclassification losses. Proper scoring rules have a rich structure:
• Every proper scoring rule is a mixture (limit of sums) of cost-weighted misclassification losses. The mixture is specified by a weight function (or measure) that describes which misclassification cost weights are most emphasized by the proper scoring rule.
• Proper scoring rules permit Fisher scoring and Iteratively Reweighted LS algorithms for model fitting. The weights are derived from a link function and the above weight function.
• Proper scoring rules are in a 1-1 correspondence with information measures for tree-based classification.
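The mixture representation in the first bullet can be checked numerically. The sketch below assumes one common convention for the cost-weighted misclassification loss at cost c: a false negative (y = 1 with estimate q ≤ c) costs 1 − c, and a false positive (y = 0 with q > c) costs c. Under that assumption, the constant weight 2 should recover squared error and the weight 1/(c(1 − c)) should recover log-loss. The function and variable names are hypothetical.

```python
import numpy as np

def mixture_loss(y, q, weight, n_grid=200_000):
    """Integrate cost-weighted misclassification losses over costs c in (0, 1)."""
    c = (np.arange(n_grid) + 0.5) / n_grid  # midpoint rule, avoids c = 0 and c = 1
    if y == 1:
        cost = (1.0 - c) * (q <= c)  # false negative when q falls at or below c
    else:
        cost = c * (q > c)           # false positive when q exceeds c
    return np.sum(cost * weight(c)) / n_grid

def squared_error_weight(c):
    return 2.0 * np.ones_like(c)

def log_loss_weight(c):
    return 1.0 / (c * (1.0 - c))

q = 0.3
print(mixture_loss(1, q, squared_error_weight), (1 - q) ** 2)
print(mixture_loss(0, q, squared_error_weight), q ** 2)
print(mixture_loss(1, q, log_loss_weight), -np.log(q))
print(mixture_loss(0, q, log_loss_weight), -np.log(1 - q))
```

Each printed pair should agree up to the discretization error of the grid. The log-loss weight puts more mass near c = 0 and c = 1 than the constant weight does, which is the sense in which a weight function describes the costs a scoring rule emphasizes.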
Bayesian Deviance, the Effective Number of Parameters, and the Comparison of Arbitrarily Complex Models
- 1998
Cited by 24 (6 self)
We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. We follow Dempster in examining the posterior distribution of the log-likelihood under each model, from which we derive measures of fit and complexity (the effective number of parameters). These may be combined into a Deviance Information Criterion (DIC), which is shown to have an approximate decision-theoretic justification. Analytic and asymptotic identities reveal the measure of complexity to be a generalisation of a wide range of previous suggestions, with particular reference to the neural network literature. The contributions of individual observations to fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. The procedure is illustrated in a number of examples, and throughout it is emphasised that the required quantities are trivial to compute in a Markov chain Monte Carlo analysis, and require no analytic work for new...
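As a rough illustration of why these quantities are easy to compute from MCMC output, the sketch below uses the definitions usually given for DIC, assumed here rather than taken from this abstract: the effective number of parameters is the posterior mean deviance minus the deviance at the posterior mean, and DIC is the posterior mean deviance plus that difference. The toy model (normal data with known variance, posterior draws simulated directly instead of by MCMC) and all names are assumptions for the example.

```python
import numpy as np

def deviance(mu, y, sigma=1.0):
    """Minus twice the log-likelihood of y ~ Normal(mu, sigma^2), sigma known."""
    n = y.size
    return np.sum((y - mu) ** 2) / sigma**2 + n * np.log(2 * np.pi * sigma**2)

def dic(y, mu_draws, sigma=1.0):
    """Return (DIC, p_D) from posterior draws of the mean parameter."""
    d_draws = np.array([deviance(m, y, sigma) for m in mu_draws])
    d_bar = d_draws.mean()                        # posterior mean deviance (fit)
    d_hat = deviance(mu_draws.mean(), y, sigma)   # deviance at the posterior mean
    p_d = d_bar - d_hat                           # effective number of parameters
    return d_bar + p_d, p_d

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.0, size=50)
# Under a flat prior the posterior of mu is Normal(mean(y), 1/n); draw from it directly.
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(y.size), size=20_000)
dic_value, p_d = dic(y, mu_draws)
print(dic_value, p_d)
```

For this one-parameter model the estimated `p_d` should come out close to 1, matching the nominal parameter count.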
Piecewise-polynomial regression trees
- Statistica Sinica, 1994
Cited by 23 (4 self)
A nonparametric function estimation method called SUPPORT (“Smoothed and Unsmoothed Piecewise-Polynomial Regression Trees”) is described. The estimate is typically made up of several pieces, each piece being obtained by fitting a polynomial regression to the observations in a subregion of the data space. Partitioning is carried out recursively as in a tree-structured method. If the estimate is required to be smooth, the polynomial pieces may be glued together by means of weighted averaging. The smoothed estimate is thus obtained in three steps. In the first step, the regressor space is recursively partitioned until the data in each piece are adequately fitted by a polynomial of a fixed order. Partitioning is guided by analysis of the distributions of residuals and cross-validation estimates of prediction mean square error. In the second step, the data within a neighborhood of each partition are fitted by a polynomial. The final estimate of the regression function is obtained by averaging the polynomial pieces, using smooth weight functions each of which diminishes rapidly to zero outside its associated partition. Estimates of derivatives of the regression function may be
On the Misuses of Artificial Neural Networks for Prognostic and Diagnostic Classification in Oncology
- Statistics in Medicine, 2000
Cited by 21 (0 self)
The application of artificial neural networks (ANNs) for prognostic and diagnostic classification in clinical medicine has become very popular. Some indications might be derived from a recent "mini-series" in the Lancet [7, 23, 30, 94] with three more or less enthusiastic review articles and an additional commentary expressing at least some scepticism. In this paper, the essentials of feed-forward neural networks and their statistical counterparts (e.g. logistic regression models) are reviewed. We point to serious problems of ANNs, such as the fitting of implausible functions to describe the probability of class membership and the underestimation of misclassification probabilities. In applications of ANNs to survival data, many suggested procedures result in predicted survival probabilities which are not necessarily monotone functions of time and lack a proper incorporation of censored observations. Finally, the results of a search in the medical literature from 1991 to 1995 on applications of A...
Bayesian Modelling of Inseparable Space-Time Variation in Disease Risk
- 1998
Cited by 19 (2 self)
This paper proposes a unified framework for the analysis of incidence or mortality data in space and time. The problem with such analysis is that the number of cases and the corresponding population at risk in any single unit of space × time are too small to produce a reliable estimate of the underlying disease risk without "borrowing strength" from neighbouring cells. The goal here could be described as one of smoothing, in which both spatial and non-spatial considerations may arise, and spatio-temporal interactions may become an important feature. Based on an extended version of the main effects model proposed in Knorr-Held and Besag (1998), four generic types of space × time interactions are introduced. Each type implies a certain degree of prior (in)dependence for interaction parameters, and corresponds to the product of one of the two spatial main effects with one of the two temporal main effects. Data analysis is implemented via Markov chain Monte Carlo methods. The methodology is illustrated by an analysis of Ohio lung cancer data 1968-88. We compare the fit and the complexity of each model by the DIC criterion, recently proposed in Spiegelhalter et al. (1998).
Bowhead whale, Balaena mysticetus, population size estimated from acoustic and visual census data collected near
- 1994
Cited by 15 (7 self)
Commission). We are very grateful to Andrew A. Schaffner for excellent research assistance. We thank Dr. Thomas F. Albert, Craig George and other scientists and personnel from the Borough's Department of Wildlife Management, the many other researchers with whom we have worked, most of whose names appear on papers in our reference list, the census crew, and the Eskimo hunters of the Borough, for their contributions to our understanding of bowhead whales and the census. We are also grateful to Geof Givens for useful discussions, and to Doug Butterworth and Andre Punt
Estimating the population size and rate of increase of bowhead whales, Balaena mysticetus, is important because bowheads were the first species of great whale for which commercial whaling stopped and so their status indicates the recovery prospects of other great whales, and also because this information is used by the International Whaling Commission (IWC) to set the aboriginal subsistence whaling quota for Alaskan Eskimos. We describe the 1993 visual and acoustic census off Point Barrow, Alaska, which provides the best data available for estimating these quantities. We outline the definitive version of two statistical methods for estimating the population, the generalized removal method and the Bayes empirical Bayes
A Bayesian Approach to Robust Binary Nonparametric Regression
- 1997
Cited by 12 (1 self)
This paper presents a Bayesian approach to binary nonparametric regression which assumes that the argument of the link is an additive function of the explanatory variables and their multiplicative interactions. The paper makes the following contributions. First, a comprehensive approach is presented in which the function estimates are smoothing splines with the smoothing parameters integrated out, and the estimates made robust to outliers. Second, the approach can handle a wide range of link functions. Third, efficient state space based algorithms are used to carry out the computations. Fourth, an extensive set of simulations is carried out which shows that the Bayesian estimator works well and compares favorably to two estimators which are widely used in practice.

