A Statistical Perspective on Knowledge Discovery in Databases
, 1996
Cited by 40 (0 self)
The quest to find models usefully characterizing data is a process central to the scientific method, and has been carried out on many fronts. Researchers from an expanding number of fields have designed algorithms to discover rules or equations that capture key relationships between variables in a database. The task of this chapter is to provide a perspective on statistical techniques applicable to KDD; accordingly, we review below some major advances in statistics in the last few decades. We next highlight some distinctives of what may be called a "statistical viewpoint." Finally we overview some influential classical and modern statistical methods for practical model induction.
Bayesian inference for generalized linear mixed models of portfolio credit risk
- Journal of Empirical Finance
, 2007
Cited by 20 (1 self)
The aims of this paper are threefold. First, we highlight the usefulness of generalized linear mixed models (GLMMs) in the modelling of portfolio credit default risk. The GLMM setting allows for a flexible specification of the systematic portfolio risk in terms of observed fixed effects and unobserved random effects, in order to explain the phenomena of default dependence and time-inhomogeneity in empirical default data. Second, we show that computational Bayesian techniques such as the Gibbs sampler can be successfully applied to fit models with serially correlated random effects, which are special instances of state space models. Third, we provide an empirical study using Standard & Poor’s data on US firms. A model incorporating rating category and sector effects and a macroeconomic proxy variable for state-of-the-economy suggests the presence of a residual, cyclical, latent component in the systematic risk.
Localization of Function Via Lesion Analysis
, 2003
Cited by 20 (6 self)
This paper presents a general approach for employing lesion analysis to address the fundamental challenge of localizing functions in a neural system. We describe the Functional Contribution Analysis (FCA) which assigns contribution values to the elements of the network such that the ability to predict the network's performance in response to multi-lesions is maximized. The approach is thoroughly examined on neurocontroller networks of evolved autonomous agents. The FCA portrays a stable set of neuronal contributions and accurate multi-lesion predictions, which are significantly better than those obtained based on the classical single lesion approach. It is also utilized for a detailed synaptic analysis of the neurocontroller connectivity network, delineating its main functional backbone. The FCA provides a...
Efficient Estimation and Inferences for Varying-Coefficient Models
- Journal of the American Statistical Association
, 1999
Cited by 18 (10 self)
This paper deals with statistical inferences based on the varying-coefficient models proposed by Hastie and Tibshirani (1993). Local polynomial regression techniques are used to estimate coefficient functions, and the asymptotic normality of the resulting estimators is established. The standard error formulas for estimated coefficients are derived and are empirically tested. A goodness-of-fit test technique, based on a nonparametric maximum likelihood ratio type of test, is also proposed to detect whether certain coefficient functions in a varying-coefficient model are constant or whether any covariates are statistically significant in the model. The null distribution of the test is estimated by a conditional bootstrap method. Our estimation techniques involve solving hundreds of local likelihood equations. To reduce the computational burden, a one-step Newton-Raphson estimator is proposed and implemented. We show that the resulting one-step procedure can save computational cost in an order ...
Variable Selection via Penalized Likelihood
, 1999
Cited by 14 (2 self)
Variable selection is vital to statistical data analyses. Many of the procedures in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process of previous steps. An automatic and simultaneous variable selection procedure can be obtained by using a penalized likelihood method. In traditional linear models, the best subset selection and stepwise deletion methods coincide with a penalized least-squares method when design matrices are orthonormal. In this paper, we propose a few new approaches to selecting variables for linear models, robust regression models and generalized linear models based on a penalized likelihood approach. A family of thresholding functions is proposed. The LASSO proposed by Tibshirani (1996) is a member of the penalized least-squares family with the L1-penalty. A smoothly clipped absolute deviation (SCAD) penalty function is introduced to improve the properties of the L1-penalty. A ...
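The thresholding functions mentioned in this abstract can be illustrated with a minimal Python sketch for a single coefficient under an orthonormal design, where z stands for the least-squares estimate and lam for the penalty level. The function names are hypothetical, and the default a = 3.7 is an assumption taken from the commonly cited SCAD recommendation, not from the abstract itself.

```python
import math


def hard_threshold(z, lam):
    """Keep z unchanged if it exceeds lam in magnitude, otherwise set it to zero."""
    return z if abs(z) > lam else 0.0


def soft_threshold(z, lam):
    """L1 (LASSO) rule: shrink z toward zero by lam, zeroing small values."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def scad_threshold(z, lam, a=3.7):
    """SCAD rule (a > 2 assumed): soft-threshold small values,
    interpolate in the middle range, and leave large values untouched."""
    magnitude = abs(z)
    if magnitude <= 2 * lam:
        return soft_threshold(z, lam)
    if magnitude <= a * lam:
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    return z
```

The three rules differ mainly for large coefficients: soft thresholding shifts every surviving estimate by lam, whereas the SCAD rule returns large estimates unchanged while remaining continuous in z.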
Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation
- Monthly Weather Review
, 2005
Cited by 13 (8 self)
Ensemble prediction systems typically show positive spread-error correlation, but they are subject to forecast bias and underdispersion, and are therefore uncalibrated. This work proposes the use of ensemble model output statistics (EMOS), an easy-to-implement post-processing technique that addresses both forecast bias and underdispersion and takes account of the spread-skill relationship. The technique is based on multiple linear regression and is akin to the superensemble approach that has traditionally been used for deterministic-style forecasts. The EMOS technique yields probabilistic forecasts that take the form of Gaussian predictive probability density functions (PDFs) for continuous weather variables, and can be applied to gridded model output. The EMOS predictive mean is an optimal, bias-corrected weighted average of the ensemble member forecasts, with coefficients that are constrained to be nonnegative and associated with the member model skill. The EMOS predictive mean provides a highly accurate deterministic-style forecast. The EMOS predictive variance is a linear function of the ensemble spread. For fitting the EMOS coefficients, the method of minimum CRPS estimation is introduced.
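As a rough illustration of the ingredients named in this abstract, the following Python sketch builds a Gaussian predictive distribution from an ensemble and scores it with the closed-form CRPS for a normal distribution. The function and parameter names (a, b, c, d) are hypothetical, and measuring the spread by the ensemble variance with divisor m is an assumption; the abstract does not state these details.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def emos_predictive(members, a, b, c, d):
    """Return (mean, std) of the Gaussian predictive distribution.

    mean = a + sum_k b[k] * members[k]   (bias-corrected weighted average)
    var  = c + d * ensemble variance     (linear in the ensemble spread)
    """
    m = len(members)
    mean = a + sum(bk * xk for bk, xk in zip(b, members))
    ens_mean = sum(members) / m
    spread = sum((x - ens_mean) ** 2 for x in members) / m  # divisor m is an assumption
    return mean, math.sqrt(c + d * spread)


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * normal_cdf(z) - 1.0)
                    + 2.0 * normal_pdf(z)
                    - 1.0 / math.sqrt(math.pi))


def mean_crps(params, cases):
    """Average CRPS over (members, observation) pairs for params = (a, b, c, d)."""
    a, b, c, d = params
    scores = []
    for members, obs in cases:
        mu, sigma = emos_predictive(members, a, b, c, d)
        scores.append(crps_gaussian(mu, sigma, obs))
    return sum(scores) / len(scores)
```

Minimum CRPS estimation would then amount to passing mean_crps to a numerical optimizer over the coefficients, with the nonnegativity constraints on b, c and d enforced separately; that step is omitted here.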
Binary models for marginal independence
- Journal of the Royal Statistical Society, Series B
, 2005
Cited by 13 (1 self)
A number of authors have considered multivariate Gaussian models for marginal independence. In this paper we develop models for binary data with the same independence structure. The models can be parameterized based on Möbius inversion and maximum likelihood estimation can be performed using a version of the Iterated Conditional Fitting algorithm. The approach is illustrated on a simple example. Relations to multivariate logistic and dependence ratio models are discussed.
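To make the Möbius inversion step concrete, here is a small Python sketch that recovers joint cell probabilities of binary variables from subset-wise marginals by inclusion-exclusion. It is a generic illustration, not the paper's exact parameterization: the choice q[A] = P(X_i = 1 for all i in A) and the function names are assumptions.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = tuple(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def joint_from_marginals(variables, q):
    """Moebius inversion over the subset lattice.

    q[A] = P(X_i = 1 for all i in A), with q[frozenset()] = 1.
    Returns p with p[A] = P(X_i = 1 for i in A and X_j = 0 for j not in A),
    computed as the sum over B containing A of (-1)^{|B \\ A|} * q[B].
    """
    full = frozenset(variables)
    p = {}
    for a in subsets(full):
        total = 0.0
        for extra in subsets(full - a):
            total += (-1) ** len(extra) * q[a | extra]
        p[a] = total
    return p
```

For two independent variables with P(X_0 = 1) = 0.5 and P(X_1 = 1) = 0.25, setting q[{0, 1}] = 0.125 returns the product-form joint table, which is one way to check that a marginal independence constraint of the form q[{0, 1}] = q[{0}] * q[{1}] holds.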
Time series analysis via mechanistic models. In review; pre-published at arxiv.org/abs/0802.0021
, 2008
Cited by 12 (4 self)
The purpose of time series analysis via mechanistic models is to reconcile the known or hypothesized structure of a dynamical system with observations collected over time. We develop a framework for constructing nonlinear mechanistic models and carrying out inference. Our framework permits the consideration of implicit dynamic models, meaning statistical models for stochastic dynamical systems which are specified by a simulation algorithm to generate sample paths. Inference procedures that operate on implicit models are said to have the plug-and-play property. Our work builds on recently developed plug-and-play inference methodology for partially observed Markov models. We introduce a class of implicitly specified Markov chains with stochastic transition rates, and we demonstrate its applicability to open problems in statistical inference for biological systems. As one example, these models are shown to give a fresh perspective on measles transmission dynamics. As a second example, we present a mechanistic analysis of cholera incidence data, involving interaction between two competing strains of the pathogen Vibrio cholerae.
Spherical Subfamily Models
, 1999
Cited by 11 (0 self)
A new method is presented for modeling low-dimensional representations of high-dimensional multinomial and compositional data. The data are fit to subfamilies of the multinomial family which are defined using the multinomial information geometry. These collections of spherical subfamilies have a number of advantages over the affine subfamilies constructed by methods such as canonical and correspondence analysis, traditionally fit to such data. First, they can describe more complex shapes in the data, and are particularly well-suited to modelling sparse data.

