Results 1-10 of 125
Flexible smoothing with B-splines and penalties
 Statistical Science
, 1996
Abstract

Cited by 179 (3 self)
B-splines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots and a difference penalty on coefficients of adjacent B-splines. We show connections to the familiar spline penalty on the integral of the squared second derivative. A short overview of B-splines, their construction, and penalized likelihood is presented. We discuss properties of penalized B-splines and propose various criteria for the choice of an optimal penalty parameter. Nonparametric logistic regression, density estimation and scatterplot smoothing are used as examples. Some details of the computations are presented. Keywords: Generalized linear models, smoothing, nonparametric models, splines, density estimation. Address for correspondence: DCMR Milieudienst Rijnmond, 'sGravelandse...
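The penalized fit this abstract describes reduces to ridge-type linear algebra: a B-spline design matrix plus a difference penalty on adjacent coefficients. A minimal sketch in Python with NumPy/SciPy (the helper name, knot placement, and defaults are our own assumptions, not the authors' code):

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_fit(x, y, n_knots=20, degree=3, lam=1.0):
    # Equidistant knots over the data range, padded by `degree` knots
    # on each side so the basis covers [min(x), max(x)].
    xl, xr = x.min(), x.max()
    dx = (xr - xl) / n_knots
    knots = xl + dx * np.arange(-degree, n_knots + degree + 1)
    n_coef = len(knots) - degree - 1
    # Design matrix: column j is the j-th B-spline basis function at x
    # (NaN outside its support, zeroed below).
    B = np.column_stack([
        BSpline.basis_element(knots[j:j + degree + 2], extrapolate=False)(x)
        for j in range(n_coef)
    ])
    B = np.nan_to_num(B)
    # Second-order difference penalty on adjacent coefficients.
    D = np.diff(np.eye(n_coef), n=2, axis=0)
    # Penalized least squares: (B'B + lam * D'D) a = B'y.
    a = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ a
```

With `lam` near zero this approaches an unpenalized regression spline; large `lam` shrinks toward a linear fit, since second differences of a linear coefficient sequence vanish.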
Bayesian measures of model complexity and fit
 Journal of the Royal Statistical Society, Series B
, 2002
Abstract

Cited by 138 (2 self)
[Read before The Royal Statistical Society at a meeting organized by the Research
Finding Approximate POMDP Solutions Through Belief Compression
, 2003
Abstract

Cited by 64 (2 self)
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional manifold embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this manifold can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta, & Schapire, 2002) to represent sparse, high-dimensional belief spaces using low-dimensional sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
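The core idea, compressing high-dimensional belief vectors onto a few learned features, can be illustrated with ordinary PCA via the SVD. Note that the paper itself uses Exponential-family PCA, which respects the probability-simplex geometry; the sketch below is only a simplified stand-in with helper names of our own:

```python
import numpy as np

def compress_beliefs(beliefs, n_features):
    # beliefs: (n_samples, n_states), each row a probability distribution.
    mean = beliefs.mean(axis=0)
    centered = beliefs - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_features]            # learned feature directions
    features = centered @ basis.T      # low-dimensional belief features
    return features, basis, mean

def reconstruct(features, basis, mean):
    # Map features back to approximate beliefs; clip and renormalize
    # so each row lands back on the probability simplex.
    b = features @ basis + mean
    b = np.clip(b, 1e-12, None)
    return b / b.sum(axis=1, keepdims=True)
```

Planning then operates on the `features` coordinates instead of the full belief vectors, which is what makes much larger state spaces tractable.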
Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications, manuscript, available at www-stat.wharton.upenn.edu/~buja
, 2005
Abstract

Cited by 33 (1 self)
What are the natural loss functions or fitting criteria for binary class probability estimation? This question has a simple answer: so-called “proper scoring rules”, that is, functions that score probability estimates in view of data in a Fisher-consistent manner. Proper scoring rules comprise most loss functions currently in use: log-loss, squared error loss, boosting loss, and as limiting cases cost-weighted misclassification losses. Proper scoring rules have a rich structure:
• Every proper scoring rule is a mixture (limit of sums) of cost-weighted misclassification losses. The mixture is specified by a weight function (or measure) that describes which misclassification cost weights are most emphasized by the proper scoring rule.
• Proper scoring rules permit Fisher scoring and Iteratively Reweighted LS algorithms for model fitting. The weights are derived from a link function and the above weight function.
• Proper scoring rules are in a one-to-one correspondence with information measures for tree-based classification.
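The defining property of a proper scoring rule, that expected loss is minimized by reporting the true class probability, is easy to check numerically for two familiar rules. A small illustration (our own, not from the manuscript), written in loss form so smaller is better:

```python
import numpy as np

def log_loss(y, q):
    # Negative log-likelihood of the estimate q for outcome y in {0, 1}.
    return -(y * np.log(q) + (1 - y) * np.log(1 - q))

def brier_loss(y, q):
    # Squared error loss on the probability scale.
    return (y - q) ** 2

def expected_loss(loss, p, q):
    # Expected loss when the true probability of y = 1 is p and we report q.
    return p * loss(1, q) + (1 - p) * loss(0, q)
```

For any true `p`, both expected-loss curves attain their minimum at `q = p`; a rule that failed this check would reward mis-reported probabilities and hence be improper.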
Piecewise-polynomial regression trees
 Statistica Sinica
, 1994
Abstract

Cited by 33 (7 self)
A nonparametric function estimation method called SUPPORT (“Smoothed and Unsmoothed Piecewise-Polynomial Regression Trees”) is described. The estimate is typically made up of several pieces, each piece being obtained by fitting a polynomial regression to the observations in a subregion of the data space. Partitioning is carried out recursively as in a tree-structured method. If the estimate is required to be smooth, the polynomial pieces may be glued together by means of weighted averaging. The smoothed estimate is thus obtained in three steps. In the first step, the regressor space is recursively partitioned until the data in each piece are adequately fitted by a polynomial of a fixed order. Partitioning is guided by analysis of the distributions of residuals and cross-validation estimates of prediction mean square error. In the second step, the data within a neighborhood of each partition are fitted by a polynomial. The final estimate of the regression function is obtained by averaging the polynomial pieces, using smooth weight functions each of which diminishes rapidly to zero outside its associated partition. Estimates of derivatives of the regression function may be
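The partition-fit-glue construction can be sketched for a single split in one dimension. The helper below is illustrative only: the sigmoid weight and its width are our own choices, standing in for the paper's smooth weight functions that decay to zero outside their partition:

```python
import numpy as np

def smoothed_piecewise_fit(x, y, split, degree=2, width=0.25):
    left, right = x <= split, x > split
    # Fit a polynomial within each partition of the regressor space.
    c_left = np.polyfit(x[left], y[left], degree)
    c_right = np.polyfit(x[right], y[right], degree)
    # Smooth weight for the left piece: near 1 deep inside the left
    # region, decaying rapidly to 0 across the split.
    w_left = 1.0 / (1.0 + np.exp((x - split) / width))
    return (w_left * np.polyval(c_left, x)
            + (1 - w_left) * np.polyval(c_right, x))
```

Because the weights sum to one and each polynomial is accurate on its own region, the blend agrees with the local fits away from the split and transitions smoothly through it.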
On the Misuses of Artificial Neural Networks for Prognostic and Diagnostic Classification in Oncology
 Statistics in Medicine
, 2000
Abstract

Cited by 29 (0 self)
The application of artificial neural networks (ANNs) for prognostic and diagnostic classification in clinical medicine has become very popular. Some indications might be derived from a recent "mini-series" in the Lancet [7, 23, 30, 94] with three more or less enthusiastic review articles and an additional commentary expressing at least some scepticism. In this paper, the essentials of feedforward neural networks and their statistical counterparts (e.g. logistic regression models) are reviewed. We point to serious problems of ANNs, such as the fitting of implausible functions to describe the probability of class membership and the underestimation of misclassification probabilities. In applications of ANNs to survival data many suggested procedures result in predicted survival probabilities which are not necessarily monotone functions of time and lack a proper incorporation of censored observations. Finally, the results of a search in the medical literature from 1991 to 1995 on applications of A...
Bayesian Deviance, the Effective Number of Parameters, and the Comparison of Arbitrarily Complex Models
, 1998
Abstract

Cited by 28 (7 self)
We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. We follow Dempster in examining the posterior distribution of the log-likelihood under each model, from which we derive measures of fit and complexity (the effective number of parameters). These may be combined into a Deviance Information Criterion (DIC), which is shown to have an approximate decision-theoretic justification. Analytic and asymptotic identities reveal the measure of complexity to be a generalisation of a wide range of previous suggestions, with particular reference to the neural network literature. The contributions of individual observations to fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. The procedure is illustrated in a number of examples, and throughout it is emphasised that the required quantities are trivial to compute in a Markov chain Monte Carlo analysis, and require no analytic work for new...
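As the abstract notes, the required quantities fall out of MCMC output almost for free: the posterior mean deviance Dbar, the deviance at the posterior mean D(theta_bar), the effective number of parameters pD = Dbar - D(theta_bar), and DIC = Dbar + pD. A small sketch with helper names of our own:

```python
import numpy as np

def dic(deviance, theta_samples):
    # deviance: function mapping a parameter value to -2 * log-likelihood.
    # theta_samples: posterior draws from an MCMC run.
    dbar = np.mean([deviance(t) for t in theta_samples])      # mean deviance
    d_at_mean = deviance(np.mean(theta_samples, axis=0))      # plug-in deviance
    p_d = dbar - d_at_mean                                    # effective parameters
    return dbar + p_d, p_d
```

For a normal mean model with known unit variance and a vague prior, pD comes out close to 1, matching the single free parameter, which illustrates why pD is read as an effective parameter count.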
Bayesian Modelling of Inseparable SpaceTime Variation in Disease Risk
, 1998
Abstract

Cited by 21 (2 self)
This paper proposes a unified framework for the analysis of incidence or mortality data in space and time. The problem with such analysis is that the number of cases and the corresponding population at risk in any single unit of space × time are too small to produce a reliable estimate of the underlying disease risk without "borrowing strength" from neighbouring cells. The goal here could be described as one of smoothing, in which both spatial and nonspatial considerations may arise, and spatiotemporal interactions may become an important feature. Based on an extended version of the main effects model proposed in Knorr-Held and Besag (1998), four generic types of space × time interactions are introduced. Each type implies a certain degree of prior (in)dependence for interaction parameters, and corresponds to the product of one of the two spatial main effects with one of the two temporal main effects. Data analysis is implemented via Markov chain Monte Carlo methods. The methodology is illustrated by an analysis of Ohio lung cancer data, 1968-88. We compare the fit and the complexity of each model by the DIC criterion, recently proposed in Spiegelhalter et al. (1998).
Estimation in generalized linear models for functional data via penalized likelihood
 Journal of Multivariate Analysis
, 2005
Abstract

Cited by 18 (0 self)
We analyze in a regression setting the link between a scalar response and a functional predictor by means of a Functional Generalized Linear Model. We first give a theoretical framework and then discuss identifiability of the model. The functional coefficient of the model is estimated via penalized likelihood with spline approximation. The L² rate of convergence of this estimator is given under a smoothness assumption on the functional coefficient. Heuristic arguments show how these rates may be improved for some particular frameworks.
Bowhead whale, Balaena mysticetus, population size estimated from acoustic and visual census data collected near
, 1994
Abstract

Cited by 16 (7 self)
Commission). We are very grateful to Andrew A. Schaffner for excellent research assistance. We thank Dr. Thomas F. Albert, Craig George and other scientists and personnel from the Borough's Department of Wildlife Management, the many other researchers with whom we have worked, most of whose names appear on papers in our reference list, the census crew, and the Eskimo hunters of the Borough, for their contributions to our understanding of bowhead whales and the census. We are also grateful to Geof Givens for useful discussions, and to Doug Butterworth and Andre Punt. Estimating the population size and rate of increase of bowhead whales, Balaena mysticetus, is important because bowheads were the first species of great whale for which commercial whaling stopped and so their status indicates the recovery prospects of other great whales, and also because this information is used by the International Whaling Commission (IWC) to set the aboriginal subsistence whaling quota for Alaskan Eskimos. We describe the 1993 visual and acoustic census off Point Barrow, Alaska, which provides the best data available for estimating these quantities. We outline the definitive version of two statistical methods for estimating the population, the generalized removal method and the Bayes empirical Bayes