Results 1–10 of 120
A Statistical Perspective on Knowledge Discovery in Databases
1996
Abstract
Cited by 41 (0 self)
The quest to find models usefully characterizing data is a process central to the scientific method, and has been carried out on many fronts. Researchers from an expanding number of fields have designed algorithms to discover rules or equations that capture key relationships between variables in a database. The task of this chapter is to provide a perspective on statistical techniques applicable to KDD; accordingly, we review below some major advances in statistics in the last few decades. We next highlight some distinctive features of what may be called a "statistical viewpoint." Finally, we give an overview of some influential classical and modern statistical methods for practical model induction.
Separate modifiability, mental modules, and the use of pure and composite measures to reveal them
ACTA PSYCHOLOGICA, 2001
Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation
Monthly Weather Review, 2005
Abstract
Cited by 32 (9 self)
Ensemble prediction systems typically show positive spread-error correlation, but they are subject to forecast bias and underdispersion, and are therefore uncalibrated. This work proposes the use of ensemble model output statistics (EMOS), an easy-to-implement postprocessing technique that addresses both forecast bias and underdispersion and takes account of the spread-skill relationship. The technique is based on multiple linear regression and is akin to the superensemble approach that has traditionally been used for deterministic-style forecasts. The EMOS technique yields probabilistic forecasts that take the form of Gaussian predictive probability density functions (PDFs) for continuous weather variables, and can be applied to gridded model output. The EMOS predictive mean is an optimal, bias-corrected weighted average of the ensemble member forecasts, with coefficients that are constrained to be nonnegative and associated with the member models' skill. The EMOS predictive mean provides a highly accurate deterministic-style forecast. The EMOS predictive variance is a linear function of the ensemble spread. For fitting the EMOS coefficients, the method of minimum CRPS estimation is introduced.
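The EMOS recipe in the abstract — a Gaussian predictive PDF whose mean is a bias-corrected, nonnegatively weighted combination of the members and whose variance is linear in the ensemble spread, fitted by minimum CRPS — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the squared-weight parameterization, the Nelder-Mead optimizer, and all variable names are assumptions. It uses the closed-form CRPS of a Gaussian.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def crps_gaussian(mu, sigma, y):
    # Closed-form CRPS of a Gaussian predictive distribution N(mu, sigma^2) at y
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def fit_emos(X, y):
    """X: (n, m) ensemble member forecasts; y: (n,) verifying observations."""
    n, m = X.shape
    s2 = X.var(axis=1)                     # ensemble spread (variance)

    def mean_crps(theta):
        a, c, d = theta[0], theta[m + 1], theta[m + 2]
        b = theta[1:m + 1] ** 2            # squaring keeps member weights nonnegative
        mu = a + X @ b                     # bias-corrected weighted ensemble mean
        sigma = np.sqrt(np.abs(c) + np.abs(d) * s2 + 1e-9)  # variance linear in spread
        return crps_gaussian(mu, sigma, y).mean()

    theta0 = np.concatenate([[0.0], np.full(m, np.sqrt(1.0 / m)), [1.0, 1.0]])
    return minimize(mean_crps, theta0, method="Nelder-Mead").x

# Toy ensemble: five members sharing a common +0.5 bias around the truth
rng = np.random.default_rng(0)
obs = rng.normal(size=200)
X = obs[:, None] + 0.5 + rng.normal(scale=1.0, size=(200, 5))
theta = fit_emos(X, obs)   # intercept, 5 raw weights, variance coefficients c, d
```

Minimizing the average CRPS rather than the likelihood is what rewards both calibration and sharpness simultaneously.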
Efficient Estimation and Inferences for Varying-Coefficient Models
Journal of the American Statistical Association, 1999
Abstract
Cited by 32 (15 self)
This paper deals with statistical inferences based on the varying-coefficient models proposed by Hastie and Tibshirani (1993). Local polynomial regression techniques are used to estimate coefficient functions, and the asymptotic normality of the resulting estimators is established. The standard error formulas for estimated coefficients are derived and are empirically tested. A goodness-of-fit test technique, based on a nonparametric maximum likelihood ratio type of test, is also proposed to detect whether certain coefficient functions in a varying-coefficient model are constant or whether any covariates are statistically significant in the model. The null distribution of the test is estimated by a conditional bootstrap method. Our estimation techniques involve solving hundreds of local likelihood equations. To reduce the computational burden, a one-step Newton-Raphson estimator is proposed and implemented. We show that the resulting one-step procedure can save computational cost in an order ...
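In symbols, the model and the local fit described in the abstract can be written as follows (the notation is an assumption reconstructed from the abstract's description, not quoted from the paper):

```latex
% Varying-coefficient model: the coefficients are smooth functions of a covariate U
Y = \sum_{j=1}^{p} a_j(U)\, X_j + \varepsilon .

% Local linear estimation at a point u_0: approximate a_j(u) \approx a_j + b_j (u - u_0)
% and minimize a kernel-weighted least-squares criterion, kernel K, bandwidth h:
\min_{\{a_j,\, b_j\}} \; \sum_{i=1}^{n} \Big\{ Y_i - \sum_{j=1}^{p}
  \big[ a_j + b_j (U_i - u_0) \big] X_{ij} \Big\}^{2} \, K_h(U_i - u_0) .
```

Solving this criterion over a grid of points u_0 is what produces the "hundreds of local likelihood equations" the abstract mentions, and motivates the one-step Newton-Raphson shortcut.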
Do Get-Out-the-Vote Calls Reduce Turnout? The Importance of Statistical Methods for Field Experiments
American Political Science Review, 2005
Abstract
Cited by 29 (13 self)
In their landmark study of a field experiment, Gerber and Green (2000) found that get-out-the-vote calls reduce turnout by five percentage points. In this article, I introduce statistical methods that can uncover discrepancies between experimental design and actual implementation. The application of this methodology shows that Gerber and Green’s negative finding is caused by inadvertent deviations from their stated experimental protocol. The initial discovery led to revisions of the original data by the authors and retraction of the numerical results in their article. Analysis of their revised data, however, reveals new systematic patterns of implementation errors. Indeed, treatment assignments of the revised data appear to be even less randomized than before their corrections. To adjust for these problems, I employ a more appropriate statistical method and demonstrate that telephone canvassing increases turnout by five percentage points. This article demonstrates how statistical methods can find and correct complications of field experiments. Voter mobilization campaigns are a central part of democratic elections. In the 2000 general election, for example, the Democratic and Republican parties spent an estimated $100 million on
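The abstract's core idea, that deviations from a randomization protocol leave statistically detectable traces, can be illustrated with a generic balance check (this is an illustration of the idea only, not the specific methodology the article introduces; the data and names are invented): under proper randomization, treatment assignment should be independent of any pre-treatment covariate, which a contingency-table chi-square test can probe.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical data: treatment assignment vs. a 3-level pre-treatment covariate
# (e.g. household size). Under a correctly implemented protocol these are independent.
rng = np.random.default_rng(3)
n = 5000
covariate = rng.integers(1, 4, size=n)   # pre-treatment covariate, levels 1..3
treat = rng.integers(0, 2, size=n)       # properly randomized 0/1 assignment

table = np.zeros((3, 2))
for c, t in zip(covariate, treat):
    table[c - 1, t] += 1

chi2, pval, dof, expected = chi2_contingency(table)
# A tiny p-value would flag a departure from the stated randomization protocol.
```

Run over many covariates (with an appropriate multiple-testing correction), such checks are one simple way implementation errors like those described above can surface.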
Bayesian inference for generalized linear mixed models of portfolio credit risk
Journal of Empirical Finance, 2007
Abstract
Cited by 26 (2 self)
The aims of this paper are threefold. First, we highlight the usefulness of generalized linear mixed models (GLMMs) in the modelling of portfolio credit default risk. The GLMM setting allows for a flexible specification of the systematic portfolio risk in terms of observed fixed effects and unobserved random effects, in order to explain the phenomena of default dependence and time-inhomogeneity in empirical default data. Second, we show that computational Bayesian techniques such as the Gibbs sampler can be successfully applied to fit models with serially correlated random effects, which are special instances of state space models. Third, we provide an empirical study using Standard & Poor’s data on US firms. A model incorporating rating category and sector effects and a macroeconomic proxy variable for the state of the economy suggests the presence of a residual, cyclical, latent component in the systematic risk.
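A canonical specification consistent with the abstract is a Bernoulli GLMM with an AR(1) random effect (the exact notation below is an assumption, a standard formulation of this model class, not quoted from the paper):

```latex
% Default indicator for firm i in period t, probit link:
Y_{it} \mid p_{it} \sim \mathrm{Bernoulli}(p_{it}), \qquad
p_{it} = \Phi\!\big( x_{it}^{\top}\beta + b_t \big),

% Serially correlated random effect carrying the latent cyclical component:
b_t = \alpha\, b_{t-1} + \varepsilon_t, \qquad
\varepsilon_t \sim \mathrm{N}(0, \sigma^2).
```

The fixed effects x_{it}'β hold the observed covariates (rating category, sector, macro proxy), while the AR(1) process b_t is the "residual, cyclical, latent component" and is exactly what makes the model a state space model amenable to the Gibbs sampler.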
Processes with Long Memory: Regenerative Construction and Perfect Simulation
2002
Abstract
Cited by 25 (2 self)
We present a perfect simulation algorithm for stationary processes indexed by Z, with summable memory decay. Depending on the decay, we construct the process on finite or semi-infinite intervals, explicitly from an i.i.d. uniform sequence. Even though the process has infinite memory, its value at time 0 depends only on a finite, but random, number of these uniform variables. The algorithm is based on a recent regenerative construction of these measures by Ferrari, Maass, Martínez and Ney. As applications, we discuss the perfect simulation of binary autoregressions and Markov chains on the unit interval.
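The flavor of perfect simulation, an exact draw from the stationary law whose construction looks only a finite but random distance into the past, can be sketched in the simplest setting with Propp-Wilson coupling from the past for a two-state Markov chain. This toy example is not the paper's regenerative construction for infinite-memory processes; function and variable names are invented.

```python
import random

def cftp_two_state(P, seed=0):
    """Propp-Wilson coupling from the past for a 2-state Markov chain.

    P[i][j] is the transition probability from state i to state j.
    Returns one exact draw from the stationary distribution.
    """
    rng = random.Random(seed)
    us = []      # uniforms indexed by distance into the past; reused on each restart
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())
        states = [0, 1]                # start a copy in every state at time -T
        for t in range(T, 0, -1):      # drive both copies with the SAME uniforms
            u = us[t - 1]
            states = [1 if u < P[s][1] else 0 for s in states]
        if states[0] == states[1]:     # coalesced: the starting state is forgotten
            return states[0]
        T *= 2                         # not coalesced: look further into the past

# Chain with stationary distribution (4/7, 3/7)
P = [[0.7, 0.3], [0.4, 0.6]]
draws = [cftp_two_state(P, seed=s) for s in range(2000)]
```

The key invariant, reusing the same uniform for the same time step every time the horizon T is extended, mirrors the paper's construction of the process "explicitly from an i.i.d. uniform sequence" with a random but finite lookback.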
Localization of Function Via Lesion Analysis
2003
Abstract
Cited by 20 (6 self)
This paper presents a general approach for employing lesion analysis to address the fundamental challenge of localizing functions in a neural system. We describe the Functional Contribution Analysis (FCA), which assigns contribution values to the elements of the network such that the ability to predict the network's performance in response to multi-lesions is maximized. The approach is thoroughly examined on neurocontroller networks of evolved autonomous agents. The FCA portrays a stable set of neuronal contributions and accurate multi-lesion predictions, which are significantly better than those obtained based on the classical single-lesion approach. It is also utilized for a detailed synaptic analysis of the neurocontroller connectivity network, delineating its main functional backbone. The FCA provides a...
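A stripped-down, linear variant of the idea can make the fitting step concrete: choose contribution values so that predicted performance under multi-lesions matches the measured performance across lesion configurations. This is a sketch under assumed notation; the actual FCA uses a more general, nonlinear performance-prediction function, and the data here are invented.

```python
import numpy as np

# Each row of M marks which network elements remain intact (1) after a multi-lesion;
# perf holds the measured task performance for that lesion configuration.
rng = np.random.default_rng(1)
true_c = np.array([0.5, 0.3, 0.15, 0.05])   # hypothetical "true" contributions
M = rng.integers(0, 2, size=(40, 4)).astype(float)
perf = M @ true_c + rng.normal(scale=0.01, size=40)

# Least-squares fit: performance ~ sum of the contributions of the intact elements
c_hat, *_ = np.linalg.lstsq(M, perf, rcond=None)
```

Because each row lesions a different subset of elements, the multi-lesion design identifies the contributions jointly, which is why the abstract reports better predictions than the classical single-lesion approach.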
Modeling the survival of Chinook salmon smolts outmigrating through the lower Sacramento River system
Journal of the American Statistical Association, 2002
Abstract
Cited by 20 (1 self)
A quasi-likelihood model with a ridge parameter was developed to understand the factors possibly associated with the survival of juvenile Chinook salmon smolts outmigrating through the lower portions of the Sacramento River system. Coded-wire-tagged (CWT) Chinook salmon smolts were released at various locations within the river between the years 1979 and 1995. Recoveries of these juvenile salmon in a lower river trawl fishery and later recoveries of adults from samples of ocean catches provided the basic data. Due to the number of interested parties with differing a priori opinions as to which factors most affected survival, a large number of covariates were required relative to the number of cases. To stabilize the parameter estimates and to improve predictive ability, a ridge parameter was included. Given the complexity of the processes generating recoveries, including possible dependencies between fish, and the additional sources of variation experienced by ocean recoveries relative to river recoveries, separate dispersion parameters were applied to the river and ocean recoveries. Interpretation of estimated coefficients was delicate given correlation between some of the covariates, the biases introduced by the ridge parameter, and possible confounding factors. With these caveats in mind, we found the most influential covariates to be the water
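The stabilizing role of the ridge parameter can be illustrated with a plain linear-model analogue (a sketch only, not the paper's quasi-likelihood model; the data and the penalty value are invented): with many correlated covariates relative to cases, the penalized estimate (X'X + λI)⁻¹X'y shrinks coefficients and tames the variance inflation that collinearity causes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 20                                    # many covariates relative to cases
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)    # two nearly collinear covariates
beta = np.zeros(p)
beta[0] = 1.0
y = X @ beta + 0.1 * rng.normal(size=n)

def ridge(X, y, lam):
    # Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

b_ols = ridge(X, y, 0.0)     # unpenalized: unstable under near-collinearity
b_ridge = ridge(X, y, 1.0)   # the ridge parameter shrinks and stabilizes coefficients
```

The shrinkage buys stability and predictive accuracy at the cost of bias, exactly the trade-off the abstract flags when cautioning that interpretation of the estimated coefficients "was delicate."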