Results 1–10 of 21
Bayesian Statistics in WWW
 Computing Science and Statistics
, 1989
Abstract

Cited by 32 (1 self)
This dissertation presents two topics from opposite disciplines: one is from a parametric realm and the other is based on nonparametric methods. The first topic is a jackknife maximum likelihood approach to statistical model selection, and the second is a convex hull peeling depth approach to nonparametric massive multivariate data analysis. The second topic includes simulations and applications on massive astronomical data. First, we present a model selection criterion that minimizes the Kullback-Leibler distance by using the jackknife method. Various model selection methods have been developed to choose a model of minimum Kullback-Leibler distance to the true model, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), minimum description length (MDL), and the bootstrap information criterion. Likewise, the jackknife method chooses a model of minimum Kullback-Leibler distance through bias reduction. This bias, which is inevitable in model ...
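The jackknife idea in this abstract can be illustrated with a leave-one-out predictive log-likelihood. The following is a hypothetical minimal sketch: the candidate families, data, and scoring below are illustrative, not the dissertation's actual procedure.

```python
import numpy as np
from scipy import stats

# Illustrative data: the true model is Gaussian.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=200)

def loo_score(x, dist):
    """Refit the candidate without observation i, then score observation i;
    summing over i reduces the optimism bias of the in-sample likelihood."""
    total = 0.0
    for i in range(len(x)):
        rest = np.delete(x, i)
        params = dist.fit(rest)              # maximum-likelihood refit
        total += dist.logpdf(x[i], *params)
    return total

score_norm = loo_score(x, stats.norm)        # Gaussian candidate
score_laplace = loo_score(x, stats.laplace)  # Laplace candidate
# the Gaussian candidate should attain the higher predictive log-likelihood
```

Choosing the candidate with the larger leave-one-out score is, up to a constant, choosing the model of smaller estimated Kullback-Leibler distance to the truth.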
Advances on BYY Harmony Learning: Information Theoretic Perspective, Generalized Projection Geometry, and Independent Factor Autodetermination
, 2004
Abstract

Cited by 13 (11 self)
The nature of Bayesian Ying-Yang harmony learning is re-examined from an information theoretic perspective. Not only is its ability for model selection and regularization explained with new insights, but its relations to and differences from studies of minimum description length (MDL), the Bayesian approach, bits-back based MDL, the Akaike information criterion (AIC), maximum likelihood, information geometry, Helmholtz machines, and variational approximation are also discussed. Moreover, a generalized projection geometry is introduced for further understanding this new mechanism. Furthermore, new algorithms are developed for implementing Gaussian factor analysis (FA) and non-Gaussian factor analysis (NFA) such that appropriate factors are selected automatically during parameter learning.
Natural expectations, macroeconomic dynamics, and asset pricing
 NBER MACROECONOMICS ANNUAL
, 2011
Abstract

Cited by 12 (1 self)
How does an economy behave if (1) fundamentals are truly hump-shaped, exhibiting momentum in the short run and partial mean reversion in the long run, and (2) agents do not know that fundamentals are hump-shaped and base their beliefs on parsimonious models that they fit to the available data? A class of parsimonious models leads to qualitatively similar biases and generates empirically observed patterns in asset prices and macroeconomic dynamics. First, parsimonious models will robustly pick up the short-term momentum in fundamentals but will generally fail to fully capture the long-run mean reversion. Beliefs will therefore be characterized by endogenous extrapolation bias and procyclical excess optimism. Second, asset prices will be highly volatile and exhibit partial mean reversion, i.e., overreaction. Excess returns will be negatively predicted by lagged excess returns, P/E ratios, and consumption growth. Third, real economic activity will have amplified cycles. For example, consumption growth will be negatively autocorrelated in the medium run. Fourth, the equity premium will be large. Agents will perceive that equities are very risky when in fact long-run equity returns will covary only weakly with long-run consumption growth. If agents had rational expectations, the equity premium would be close to zero. Fifth, sophisticated agents, i.e., those who are assumed to know the true model, will hold far more equity than investors who use parsimonious models. Moreover, sophisticated agents will follow a countercyclical asset allocation policy. These predicted effects are qualitatively confirmed in U.S. data.
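The first finding, extrapolation bias from a parsimonious model, can be sketched numerically. Every number below is hypothetical and only illustrates the mechanism, not the paper's calibration: growth responds to a shock with momentum and then slow partial reversion, while an agent fitting an AR(1) captures the momentum but not the reversion.

```python
import numpy as np

# Hypothetical impulse response of growth to a unit shock: +1 now, +0.3 next
# period (momentum), then -0.02 for 20 periods (slow partial mean reversion).
psi = np.concatenate(([1.0, 0.3], np.full(20, -0.02)))
true_longrun = psi.sum()                 # 1 + 0.3 - 0.4 = 0.9: partial reversion

rng = np.random.default_rng(1)
e = rng.normal(size=30000)
g = np.convolve(e, psi, mode="valid")    # simulated growth series

# A parsimonious AR(1) fitted to growth: phi_hat is the lag-1 autocorrelation.
phi_hat = np.corrcoef(g[:-1], g[1:])[0, 1]

# The AR(1) has no reversion term, so its implied long-run response 1/(1-phi)
# overshoots the truth: endogenous extrapolation bias.
ar1_longrun = 1.0 / (1.0 - phi_hat)
```

The fitted AR(1) picks up the positive short-run autocorrelation (phi_hat is clearly positive) and therefore implies a long-run cumulative response well above the true 0.9.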
Temporal BYY Encoding, Markovian State Spaces, and Space Dimension Determination
, 2004
Abstract

Cited by 7 (7 self)
As a complement to the mainstream temporal coding approaches, this paper addresses Markovian state-space temporal models from the perspective of temporal Bayesian Ying-Yang (BYY) learning. It offers new insights and results not only on the discrete-state hidden Markov model and its extensions, but also on continuous-state linear state-space models and their extensions. In particular, a new learning mechanism allows the state number or the dimension of the state space to be selected either automatically during adaptive learning or afterwards via model selection criteria obtained from this mechanism. Experiments demonstrate how the proposed approach works.
Generalizing The Derivation Of The Schwarz Information Criterion
, 1999
Abstract

Cited by 7 (2 self)
The Schwarz information criterion (SIC, BIC, SBC) is one of the most widely known and used tools in statistical model selection. The criterion was derived by Schwarz (1978) to serve as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model. Although the original derivation assumes that the observed data are independent, identically distributed, and drawn from a probability distribution in the regular exponential family, SIC has traditionally been used in a much larger scope of model selection problems. To better justify the widespread applicability of SIC, we derive the criterion in a very general framework: one which does not assume any specific form for the likelihood function, but only requires that it satisfy certain non-restrictive regularity conditions.
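The criterion itself is simple to state and compute: SIC = -2 log L_max + k log n, where k is the number of free parameters and n the sample size (lower is better). The sketch below uses a hypothetical Gaussian polynomial-regression example, deliberately outside the i.i.d. exponential-family setting of the original derivation, to show the penalty at work.

```python
import numpy as np

def sic(loglik, k, n):
    """Schwarz information criterion: -2 * max log-likelihood + k * log(n)."""
    return -2.0 * loglik + k * np.log(n)

# Hypothetical example: the true regression is degree 1 with Gaussian noise.
rng = np.random.default_rng(2)
n = 100
x = np.linspace(0.0, 1.0, n)
y = 2.0 * x + rng.normal(scale=0.1, size=n)

def gaussian_loglik(resid):
    s2 = np.mean(resid ** 2)             # MLE of the noise variance
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1.0)

scores = {}
for deg in (1, 5):
    resid = y - np.polyval(np.polyfit(x, y, deg), x)
    k = deg + 2                          # polynomial coefficients + noise variance
    scores[deg] = sic(gaussian_loglik(resid), k, n)
# the overfitted degree-5 model pays the extra k*log(n) and loses
```

The degree-5 fit barely improves the likelihood, so its larger penalty term makes scores[5] worse (higher) than scores[1].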
Don’t shed tears over breaks
 DMV Nachrichten
, 2005
Abstract

Cited by 6 (5 self)
Mathematical Subject Classification: 93E14, 62G08, 68T45, 49M20, 90C31. This essay deals with 'discontinuous phenomena' in time series. It is an introduction to, and a brief survey of, the concepts of segmentation into 'smooth' pieces on the one hand, and the complementary notion of the identification of jumps on the other. We restrict ourselves to variational approaches, both in discrete and in continuous time. They define 'filters', with data as 'inputs' and minimizers of functionals as 'outputs'. The main example is a particularly simple model which, for historical reasons, we decided to call the Potts functional. We argue that it is an appropriate tool for extracting the simplest and most basic morphological features from data. This is an attempt to interpret data from a well-defined point of view, in contrast to restoring a true signal perhaps distorted and degraded by noise, which is not the main focus of this paper.
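For the one-dimensional discrete-time case, the Potts functional can be minimized exactly by dynamic programming. The sketch below assumes a squared-error data term and a penalty gamma per jump, a common formulation rather than necessarily the essay's exact notation.

```python
import numpy as np

def potts_1d(y, gamma):
    """Exact O(n^2) dynamic program for the 1-D Potts functional:
    minimize  sum_i (y_i - u_i)^2 + gamma * (number of jumps in u)
    over piecewise-constant signals u."""
    n = len(y)
    c1 = np.concatenate(([0.0], np.cumsum(y)))        # prefix sums for
    c2 = np.concatenate(([0.0], np.cumsum(y ** 2)))   # O(1) segment costs

    def sse(l, r):  # squared error of y[l:r] about its mean (half-open)
        s, s2, m = c1[r] - c1[l], c2[r] - c2[l], r - l
        return s2 - s * s / m

    B = np.full(n + 1, np.inf)
    B[0] = -gamma                   # the first segment carries no jump penalty
    prev = np.zeros(n + 1, dtype=int)
    for r in range(1, n + 1):
        for l in range(r):          # l = start of the last segment
            cand = B[l] + gamma + sse(l, r)
            if cand < B[r]:
                B[r], prev[r] = cand, l
    u = np.empty(n)                 # backtrack and rebuild the minimizer
    r = n
    while r > 0:
        l = prev[r]
        u[l:r] = y[l:r].mean()
        r = l
    return u

# a noisy step: the filter should recover exactly one jump
y = np.concatenate([np.zeros(30), np.ones(30)])
y = y + 0.05 * np.random.default_rng(2).normal(size=60)
u = potts_1d(y, gamma=1.0)
```

Larger gamma produces fewer, coarser segments; smaller gamma lets the output follow the data more closely, which is exactly the segmentation-versus-jump trade-off discussed in the essay.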
Information and Posterior Probability Criteria for Model Selection in Local Likelihood Estimation
 J. Amer. Statist. Assoc.
, 1998
Abstract

Cited by 6 (0 self)
In this paper we propose a modification to the methods used to motivate many information and posterior probability criteria for the weighted likelihood case. We derive weighted versions of two of the most widely known criteria, namely the AIC and BIC. Via a simple modification, the criteria are also made useful for window span selection. The usefulness of the weighted versions of these criteria is demonstrated through a simulation study and an application to three data sets.

KEY WORDS: Information Criteria; Posterior Probability Criteria; Model Selection; Local Likelihood.

1. INTRODUCTION

Local regression has become a popular method for smoothing scatterplots and for nonparametric regression in general. It has proven to be a useful tool for finding structure in datasets (Cleveland and Devlin 1988). Local regression estimation is a method for smoothing scatterplots (x_i, y_i), i = 1, ..., n, in which the fitted value at x_0 is the value of a polynomial fit to the data using weighted least squares, where the weight given to (x_i, y_i) is related to the distance between x_i and x_0. Stone (1977) shows that estimates obtained using local regression methods have desirable theoretical properties. Recently, Fan (1993) has studied minimax properties of local linear regression. Tibshirani and Hastie (1987) extend the ideas of local regression to a local likelihood procedure. This procedure is designed for nonparametric regression modeling in situations where weighted least squares is inappropriate as an estimation method, for example binary data. Local regression may be viewed as a special case of local likelihood estimation. Tibshirani and Hastie (1987), Staniswalis (1989), and Loader (1999) apply local likelihood estimation to several types of data where local regressio...
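The local regression step described in the introduction, a polynomial fitted by weighted least squares around x_0, can be sketched as follows; the Gaussian kernel and bandwidth here are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear fit at x0: weighted least squares where the weight of
    (x_i, y_i) decays with |x_i - x0| (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])  # local polynomial basis
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)      # (X'WX) beta = X'W y
    return beta[0]                                  # intercept = fit at x0

# e.g. recover f(x) = x^2 at x0 = 0.5 from noiseless samples
x = np.linspace(0.0, 1.0, 201)
y = x ** 2
fit = local_linear(x, y, 0.5, h=0.05)               # close to 0.25
```

The window span selection the paper addresses corresponds to choosing h: a weighted AIC/BIC scores candidate bandwidths just as the ordinary criteria score candidate models.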
Characterizing Clutter in the Context of Detecting Weak Gaseous Plumes using the SEBASS Sensor
 Los Alamos National Laboratory Restricted Release Report
, 2006
Abstract

Cited by 4 (1 self)
Weak gaseous plume detection in hyperspectral imagery requires that background clutter, consisting of a mixture of components such as water, grass, and asphalt, be well characterized. The appropriate characterization depends on the analysis goals. Although we almost never see clutter as a single-component multivariate Gaussian (SCMG), alternatives that have been proposed, such as various mixture distributions, might not be necessary for modeling clutter in the context of plume detection when the chemical targets that could be present are known at least approximately. Our goal is to show to what extent the generalized least squares (GLS) approach applied to real data to look for evidence of known chemical targets leads to chemical concentration estimates and chemical probability estimates (arising from repeated application of the GLS approach) that are similar to corresponding estimates arising from simulated SCMG data. In some cases, approximations to decision thresholds or confidence estimates based on assuming that the clutter has an SCMG distribution will not be sufficiently accurate. Therefore, we also describe a strategy that uses a scene-specific reference distribution to estimate decision thresholds for plume detection and associated confidence measures.
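The GLS concentration estimate referred to above has a standard closed form, beta = (S' Sigma^-1 S)^-1 S' Sigma^-1 x. The sketch below uses synthetic signatures and a synthetic covariance; the names S and Sigma and all dimensions are hypothetical, not SEBASS specifics.

```python
import numpy as np

# Hypothetical toy setup: p spectral channels, k known target signatures.
rng = np.random.default_rng(3)
p, k = 50, 2
S = rng.normal(size=(p, k))            # assumed-known chemical signatures
A = rng.normal(size=(p, p))
Sigma = A @ A.T / p + np.eye(p)        # clutter covariance (symmetric PD)

beta_true = np.array([0.3, 0.0])       # a weak plume of target 1 only
L = np.linalg.cholesky(Sigma)
pixel = S @ beta_true + L @ rng.normal(size=p)   # plume + correlated clutter

# GLS estimate: beta_hat = (S' Sigma^-1 S)^-1 S' Sigma^-1 pixel
Si = np.linalg.solve(Sigma, S)         # Sigma^-1 S without an explicit inverse
beta_hat = np.linalg.solve(S.T @ Si, Si.T @ pixel)
```

Repeating this over many pixels yields the distribution of concentration estimates that the report compares against the same estimates computed on simulated SCMG clutter.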
Machine learning problems from optimization perspective
Abstract

Cited by 3 (2 self)
Both optimization and learning play important roles in a system for intelligent tasks. On one hand, we introduce three types of optimization tasks studied in the machine learning literature, corresponding to the three levels of inverse problems in an intelligent system. We also discuss three major roles of convexity in machine learning: either leading directly to a convex program, or approximately transforming a difficult problem into a tractable one with the help of local convexity and convex duality. Undoubtedly, a good optimization algorithm plays an essential role in a learning process, and new developments in the optimization literature may thrust forward the advances of machine learning. On the other hand, we also argue that the key task of learning is not simply optimization, as is sometimes misunderstood in the optimization literature. We introduce the key challenges of learning and the current status of efforts towards them. Furthermore, learning versus optimization is also examined from a unified perspective under the name of Bayesian Ying-Yang learning, with combinatorial optimization made more effective with the help of learning.
Polynomial Neural Network for Linear and Nonlinear Model Selection in Quantitative Structure-Activity . . .
 SAR QSAR ENVIRON. RES
, 2000
Abstract

Cited by 2 (1 self)
This article presents a self-organising multilayered iterative algorithm that provides linear and nonlinear polynomial regression models, allowing the user to control the number and the power of the terms in the models. The accuracy of the algorithm is compared to that of the partial least squares (PLS) algorithm using fourteen data sets from quantitative structure-activity relationship studies. The results show that the proposed method is able to select simple models characterized by high prediction ability, and is thus of considerable interest in quantitative structure-activity relationship studies. The software is developed using a client-server protocol (Java and C++ languages) and is available to users worldwide on the authors' Web site.
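The article's self-organising algorithm itself is not reproduced here. As a loose stand-in, the sketch below selects polynomial terms greedily by validation error, which likewise keeps the number and power of the terms under control; all data, names, and parameters are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

# Hypothetical QSAR-like data: 3 descriptors, one linear and one cross term.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=200)

def poly_terms(X, degree):
    """All monomials of the columns of X up to the given total degree."""
    cols, names = [np.ones(len(X))], ["1"]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
            names.append("*".join(f"x{i}" for i in idx))
    return np.column_stack(cols), names

T, names = poly_terms(X, 2)
tr, va = slice(0, 150), slice(150, 200)          # train / validation split

def val_error(cols):
    beta, *_ = np.linalg.lstsq(T[tr][:, cols], y[tr], rcond=None)
    return np.mean((y[va] - T[va][:, cols] @ beta) ** 2)

chosen = [0]                                     # start from the intercept
while True:
    remaining = [j for j in range(T.shape[1]) if j not in chosen]
    if not remaining:
        break
    best_err, best_j = min((val_error(chosen + [j]), j) for j in remaining)
    if best_err >= val_error(chosen):
        break                                    # no candidate improves validation
    chosen.append(best_j)
model = [names[j] for j in chosen]               # e.g. intercept, x0, x1*x2
```

The stopping rule, no further drop in validation error, is what bounds model complexity; the article's algorithm achieves the same end through its layered self-organising construction.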