Results 1-10 of 19
Bayesian measures of model complexity and fit
Journal of the Royal Statistical Society, Series B, 2002
Abstract
Cited by 132 (2 self)
[Read before The Royal Statistical Society at a meeting organized by the Research ...]
Bayesian Statistics
Computing Science and Statistics, 1989
Abstract
Cited by 20 (0 self)
This dissertation presents two topics from opposite disciplines: one is from a parametric realm and the other is based on nonparametric methods. The first topic is a jackknife maximum likelihood approach to statistical model selection, and the second is a convex hull peeling depth approach to nonparametric massive multivariate data analysis. The second topic includes simulations and applications on massive astronomical data. First, we present a model selection criterion that minimizes the Kullback-Leibler distance by using the jackknife method. Various model selection methods have been developed to choose a model of minimum Kullback-Leibler distance to the true model, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), minimum description length (MDL), and the bootstrap information criterion. Likewise, the jackknife method chooses a model of minimum Kullback-Leibler distance through bias reduction. This bias, which is inevitable in model ...
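The jackknife idea summarized above can be sketched as leave-one-out estimation of the expected out-of-sample log-likelihood, compared against AIC. The models, data, and helper names below are illustrative assumptions, not the dissertation's own code.

```python
import numpy as np
from scipy import stats

# Illustrative sketch: choose between a Normal and a Student-t model for
# heavy-tailed data, using (a) AIC and (b) a jackknife (leave-one-out)
# estimate of out-of-sample log-likelihood, a bias-reduced stand-in for
# the expected Kullback-Leibler fit. All setup here is hypothetical.

rng = np.random.default_rng(0)
x = rng.standard_t(df=5, size=60)  # heavy-tailed data

def fit_loglik_normal(train, test):
    mu, sigma = train.mean(), train.std(ddof=1)
    return stats.norm.logpdf(test, mu, sigma).sum()

def fit_loglik_t(train, test):
    df, loc, scale = stats.t.fit(train)
    return stats.t.logpdf(test, df, loc, scale).sum()

def jackknife_score(x, fit_loglik):
    # Leave-one-out predictive log-likelihood: refit without point i,
    # then score point i under the refitted model.
    n = len(x)
    return sum(fit_loglik(np.delete(x, i), x[i:i + 1]) for i in range(n))

def aic(x, fit_loglik, k):
    # In-sample fit penalized by 2 * (number of parameters).
    return -2 * fit_loglik(x, x) + 2 * k

# Larger jackknife score (or smaller AIC) favours a model.
print("jackknife  normal:", jackknife_score(x, fit_loglik_normal))
print("jackknife  t     :", jackknife_score(x, fit_loglik_t))
print("AIC        normal:", aic(x, fit_loglik_normal, k=2))
print("AIC        t     :", aic(x, fit_loglik_t, k=3))
```

The leave-one-out score needs no penalty term: the bias correction comes from scoring each point under a model fitted without it, which is the sense in which the jackknife reduces bias directly.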
Model Selection for Variable Length Markov Chains and Tuning the Context Algorithm
, 2000
Abstract
Cited by 10 (3 self)
We consider the model selection problem in the class of stationary variable length Markov chains (VLMC) on a finite space. The processes in this class are still Markovian of higher order, but with memory of variable length. Various aims in selecting a VLMC can be formalized with different non-equivalent risks, such as final prediction error or expected Kullback-Leibler information. We consider the asymptotic behavior of different risk functions and show how they can be generally estimated with the same resampling strategy. Such estimated risks then yield new model selection criteria. In particular, we obtain a data-driven tuning of Rissanen's tree-structured context algorithm, which is a computationally feasible procedure for selection and estimation of a VLMC. Key words and phrases: Bootstrap, zero-one loss, final prediction error, finite-memory source, FSMX model, Kullback-Leibler information, L2 loss, optimal tree pruning, resampling, tree model. Short title: Selecting variable length Mar...
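The core of a context algorithm can be sketched as follows: grow a tree of symbol counts for contexts of increasing length, then prune a context whenever its next-symbol distribution is close (in Kullback-Leibler divergence) to that of its shorter parent context. The pruning threshold, smoothing, and depth below are illustrative choices, not Rissanen's exact cutoff.

```python
from collections import defaultdict
import math

def count_contexts(seq, max_depth):
    # counts[context][next_symbol] = how often `next_symbol` follows `context`
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq)):
        for d in range(max_depth + 1):
            if i - d < 0:
                break
            counts[seq[i - d:i]][seq[i]] += 1
    return counts

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dist(counts, ctx, alphabet):
    # Next-symbol distribution for a context, with add-one smoothing.
    c = counts[ctx]
    total = sum(c.values()) + len(alphabet)
    return [(c[a] + 1) / total for a in alphabet]

def prune(counts, alphabet, threshold=0.05):
    # Keep a context only if lengthening it changes the predictive
    # distribution enough relative to its parent (illustrative rule).
    kept = set()
    for ctx in list(counts):
        if ctx == "":
            continue
        parent = ctx[1:]  # drop the oldest symbol
        n = sum(counts[ctx].values())
        gain = n * kl(dist(counts, ctx, alphabet), dist(counts, parent, alphabet))
        if gain > threshold * math.log(n + 1):
            kept.add(ctx)
    return kept

seq = "abaababaabaababaabab" * 20
counts = count_contexts(seq, max_depth=4)
kept = prune(counts, alphabet="ab")
print(sorted(kept))
```

The paper's contribution is precisely that the threshold need not be an ad hoc constant: resampling-based risk estimates can tune it in a data-driven way.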
In-Sample and Out-of-Sample Fit: Their Joint Distribution and Its Implications for Model Selection." Unpublished manuscript
, 2008
Abstract
Cited by 6 (0 self)
Prepared for the 5th ECB Workshop on Forecasting Techniques. We consider the case where a parameter, θ, is estimated by maximizing a criterion function, Q(X; θ). The estimate is then used to evaluate the criterion function with the same data, X, as well as with an independent data set, Y. The in-sample fit and out-of-sample fit relative to that of θ0, the "true" parameter, are given by Tx,x = ...
Learning Efficiency of Redundant Neural Networks in Bayesian Estimation
, 2001
Abstract
Cited by 5 (2 self)
This paper proves that the Bayesian stochastic complexity of a layered neural network is asymptotically smaller than that of a regular statistical model if it contains the true distribution. We consider a case when a three-layer perceptron with M input units, H hidden units, and N output units is trained to estimate the true distribution represented by the model with H0 hidden units, and prove that the stochastic complexity is asymptotically smaller than (1/2){H0(M + N) + R} log n, where n is the number of training samples and R is a function of H, H0, M, and N that is far smaller than the number of redundant parameters. Since the generalization error of Bayesian estimation is equal to the increase of stochastic complexity, it is smaller than (1/2n){H0(M + N) + R} if it has an asymptotic expansion. Based on the results, the difference between layered neural networks and regular statistical models is discussed from the statistical point of view. Key words: Generalization Error, Kullback Information, Free Energy, Bayesian Learning, Non-identifiable model.
Construction of Phoneme Models: Model Search of Hidden Markov Models
In International Workshop on Intelligent Signal Processing and Communication Systems, 1993
Abstract
Cited by 5 (0 self)
The author proposes an algorithm to define the structure of an HMM (hidden Markov model) [1]. HMMs are widely used in speech recognition systems, and their structures are usually determined according to heuristic knowledge. In this article this problem is treated as a so-called "model selection" problem in statistics. Two recognition experiments using the algorithm are shown: first artificial data and then the ATR speech database are used as the source. Through these experiments, the author shows that such model selection is effective.
Distributions of Maximum Likelihood Estimators and Model Comparisons
Abstract
Cited by 3 (3 self)
Experimental data need to be assessed for purposes of model identification, estimation of model parameters, and consequences of misspecified model fits. Here the first and third factors are considered via analytic formulations for the distribution of the maximum likelihood estimates. When estimating this distribution with statistics, it is a tradition to invert the roles of population quantities and quantities that have been estimated from the observed sample. If the model is known, simulations, normal approximations, and p*-formula methods can be used. However, exact analytic methods for describing the estimator density are recommended. One of the methods (TED) can be used when the data-generating model differs from the estimation model, which allows for the estimation of common parameters across a suite of candidate models. Information criteria such as AIC can be used to pick a winning model. AIC is, however, approximate and generally only asymptotically correct. For fairly simple models, where expressions remain tractable, the exact estimator density under TED allows for comparisons between models. This is illustrated via a novel information criterion. Three linear models are compared and fitted to econometric data on patent filings.
MIXTURE ESTIMATION WITH MULTICHANNEL IMAGE DATA
, 1990
Abstract
Cited by 2 (1 self)
The following problem arises in computer vision, diagnostic medical imaging, and remote sensing: at each pixel in an image a vector of observations is measured. The distribution of these measurements is modeled by a mixture of certain pure class distributions. The goal is to estimate the mixing proportions of the classes by pixel in the image, together with any unknown parameters in the pure class distributions. In many problems of this type it is appropriate to incorporate constraints on the mixing proportions. This paper deals with spatial smoothness constraints. An estimation methodology using penalized likelihood with multiple smoothing parameters is applied. Numerical methods for evaluating parameters in the model are developed. These methods make essential use of the Expectation-Maximization (EM) formalism. A novel Monte Carlo importance sampling technique for approximating the effective degrees of freedom of the model is described. The methodology is illustrated with ...
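The EM step at the heart of such mixture estimation can be sketched for a single pixel with known pure-class densities; the Gaussian classes, the observation, and all names below are illustrative assumptions, and the paper's spatial smoothness penalty is deliberately omitted.

```python
import numpy as np

# Minimal per-pixel EM sketch (no spatial penalty): a pixel's observation
# vector is modeled as a mixture of known pure-class Gaussian densities,
# and EM updates that pixel's mixing proportions. Classes are hypothetical.

means = np.array([[0.0, 0.0], [3.0, 3.0]])  # two pure classes, 2 channels

def class_density(x, mu):
    # Isotropic unit-variance Gaussian density, up to a shared constant.
    return np.exp(-0.5 * np.sum((x - mu) ** 2))

def em_proportions(x, means, iters=50):
    k = len(means)
    pi = np.full(k, 1.0 / k)
    dens = np.array([class_density(x, mu) for mu in means])
    for _ in range(iters):
        # E-step: posterior class responsibilities for this pixel.
        r = pi * dens
        r /= r.sum()
        # M-step: with one observation, proportions equal responsibilities.
        pi = r
    return pi

x = np.array([2.8, 3.1])        # a pixel that looks like class 2
print(em_proportions(x, means)) # heavily weighted toward the second class
```

Note that with a single observation per pixel this iteration degenerates toward a hard classification, which is exactly why the paper couples neighboring pixels through a penalized likelihood with spatial smoothness constraints.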
Information and Posterior Probability Criteria for Model Selection in Local Likelihood Estimation
J. Amer. Stat. Assoc., 1998
Abstract
Cited by 2 (0 self)
In this paper we propose a modification to the methods used to motivate many information and posterior probability criteria for the weighted likelihood case. We derive weighted versions of two of the most widely known criteria, namely the AIC and BIC. Via a simple modification, the criteria are also made useful for window span selection. The usefulness of the weighted versions of these criteria is demonstrated through a simulation study and an application to three data sets. KEY WORDS: Information Criteria; Posterior Probability Criteria; Model Selection; Local Likelihood.

1. INTRODUCTION

Local regression has become a popular method for smoothing scatterplots and for nonparametric regression in general. It has proven to be a useful tool in finding structure in datasets (Cleveland and Devlin 1988). Local regression estimation is a method for smoothing scatterplots (xi, yi), i = 1, ..., n, in which the fitted value at x0 is the value of a polynomial fit to the data using weighted least squares, where the weight given to (xi, yi) is related to the distance between xi and x0. Stone (1977) shows that estimates obtained using local regression methods have desirable theoretical properties. Recently, Fan (1993) has studied minimax properties of local linear regression. Tibshirani and Hastie (1987) extend the ideas of local regression to a local likelihood procedure. This procedure is designed for nonparametric regression modeling in situations where weighted least squares is inappropriate as an estimation method, for example binary data. Local regression may be viewed as a special case of local likelihood estimation. Tibshirani and Hastie (1987), Staniswalis (1989), and Loader (1999) apply local likelihood estimation to several types of data where local regressio...
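The weighted-least-squares fit described above can be sketched directly: the value at x0 is the intercept of a degree-1 polynomial fitted with weights that decay with distance from x0. The Gaussian kernel and bandwidth h here are illustrative choices, not the paper's.

```python
import numpy as np

# Sketch of local linear regression: fit a line by weighted least squares
# around x0, with kernel weights shrinking as |x - x0| grows. Centering the
# design at x0 makes the intercept equal the fitted value at x0.

def local_linear_fit(x, y, x0, h=0.3):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])  # local design matrix
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]                                   # intercept = fit at x0

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + 0.1 * rng.standard_normal(200)
print(local_linear_fit(x, y, np.pi / 2))  # close to sin(pi/2) = 1
```

The window span selection the paper addresses corresponds to choosing the bandwidth h: the weighted AIC/BIC criteria it derives give a principled rule where the sketch above simply hard-codes a value.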
Stochastic Complexity and Generalization Error of a Restricted Boltzmann Machine in Bayesian Estimation
 Journal of Machine Learning Research
Abstract
Cited by 2 (0 self)
In this paper, we consider the asymptotic form of the generalization error for the restricted Boltzmann machine in Bayesian estimation. It has been shown that obtaining the maximum pole of zeta functions is related to the asymptotic form of the generalization error for hierarchical learning models (Watanabe, 2001a,b). The zeta function is defined by using a Kullback function. We use two methods to obtain the maximum pole: a new eigenvalue analysis method and a recursive blowing up process. We show that these methods are effective for obtaining the asymptotic form of the generalization error of hierarchical learning models.