Results 11–20 of 37
Missing Data Problems in Machine Learning, 2008
Abstract

Cited by 7 (0 self)
Learning, inference, and prediction in the presence of missing data are pervasive problems in machine learning and statistical data analysis. This thesis focuses on the problems of collaborative prediction with non-random missing data and classification with missing features. We begin by presenting and elaborating on the theory of missing data due to Little and Rubin. We place a particular emphasis on the missing at random assumption in the multivariate setting with arbitrary patterns of missing data. We derive inference and prediction methods in the presence of random missing data for a variety of probabilistic models including finite mixture models, Dirichlet process mixture models, and factor analysis. Based on this foundation, we develop several novel models and inference procedures for both the collaborative prediction problem and the problem of classification with missing features. We develop models and methods for collaborative prediction with non-random missing data by combining standard models for complete data with models of the missing data process. Using a novel recommender system data set and experimental protocol, we show that each proposed method achieves a substantial increase in rating prediction performance compared to models that assume missing ratings are missing at random.
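The classification-with-missing-features problem described above can be illustrated with a minimal sketch (a hypothetical toy, not the thesis's models): in a Gaussian naive Bayes classifier, a feature that is missing at random can be marginalized out, which under class-conditional independence simply means dropping its factor from the likelihood. The class names and parameters below are made up.

```python
import math

def gaussian_logpdf(x, mean, var):
    # log density of a univariate Gaussian
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_posterior(x, classes):
    """x is a feature vector where None marks a missing value.
    classes maps label -> (log_prior, [(mean, var), ...])."""
    scores = {}
    for label, (log_prior, params) in classes.items():
        score = log_prior
        for xi, (mean, var) in zip(x, params):
            if xi is not None:  # marginalized-out features contribute nothing
                score += gaussian_logpdf(xi, mean, var)
        scores[label] = score
    return scores

# Two hypothetical classes; the second feature of the query is missing.
classes = {
    "a": (math.log(0.5), [(0.0, 1.0), (0.0, 1.0)]),
    "b": (math.log(0.5), [(3.0, 1.0), (3.0, 1.0)]),
}
scores = log_posterior([0.2, None], classes)
best = max(scores, key=scores.get)  # classify using the observed feature only
```

With the observed feature near 0, class "a" wins; with every feature missing, the posterior falls back to the prior.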
Pitman-Yor Process-Based Language Models for Machine Translation
Abstract

Cited by 6 (3 self)
The hierarchical Pitman-Yor process-based smoothing method for language models was proposed by Goldwater and by Teh; the performance of this smoothing method is comparable with the modified Kneser-Ney method in terms of perplexity. Although this method was presented four years ago, there has been no paper reporting that this language model indeed improves translation quality in the context of Machine Translation (MT). This matters for the MT community since an improvement in perplexity does not always lead to an improvement in BLEU score; for example, success in word alignment as measured by Alignment Error Rate (AER) does not often lead to an improvement in BLEU. This paper reports, in the context of MT, that an improvement in perplexity really does lead to an improvement in BLEU score. It turned out that applying the Hierarchical Pitman-Yor Language Model (HPYLM) requires a minor change in the conventional decoding process. In addition, we propose a new Pitman-Yor process-based statistical smoothing method similar to the Good-Turing method, although its performance is inferior to HPYLM. In our experiments, HPYLM improved BLEU by 1.03 points absolute and 6% relative for 50k EN-JP, which was statistically significant.
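As background for the smoothing method this abstract discusses, the single-restaurant Pitman-Yor predictive rule can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses the common one-table-per-word-type simplification, and the counts and vocabulary size are made up. The HPYLM chains this rule across n-gram orders, using the lower-order distribution as the base measure, which is how it recovers a Kneser-Ney-style interpolated discount.

```python
from collections import Counter

def pitman_yor_prob(word, counts, discount, strength, base_prob):
    # Predictive probability under a Pitman-Yor restaurant, assuming one
    # table per word type (so "number of tables" = number of seen types).
    total = sum(counts.values())
    types = len(counts)
    c = counts.get(word, 0)
    used = max(c - discount, 0) / (strength + total)          # discounted count
    backoff = (strength + discount * types) / (strength + total)  # mass for base
    return used + backoff * base_prob(word)

vocab = 1000  # hypothetical vocabulary size, uniform base distribution
counts = Counter({"the": 50, "cat": 3, "sat": 1})
p_seen = pitman_yor_prob("cat", counts, 0.5, 1.0, lambda w: 1 / vocab)
p_unseen = pitman_yor_prob("dog", counts, 0.5, 1.0, lambda w: 1 / vocab)
```

Seen words keep most of their discounted relative frequency, while the discount mass (plus the strength parameter) is redistributed to the base distribution, giving unseen words non-zero probability.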
Bayesian Inference for Mixtures of Stable Distributions
Abstract

Cited by 6 (1 self)
In many different fields, such as hydrology, telecommunications, condensed matter physics and finance, the Gaussian model proves unsatisfactory and has difficulty fitting data with skewness, heavy tails and multimodality. The use of stable distributions allows for modelling skewness and heavy tails but gives rise to inferential problems related to the estimation of the stable distribution's parameters. The aim of this work is to generalise the stable distribution framework by introducing a model that also accounts for multimodality. In particular, we introduce a stable mixture model and a suitable reparameterisation of the mixture, which allow us to make inference on the mixture parameters. We use a fully Bayesian approach and MCMC simulation techniques for the estimation of the posterior distribution.
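The inferential difficulty the abstract refers to stems from the stable density having no closed form; simulation, by contrast, is straightforward. A minimal sketch of drawing from a standard alpha-stable law via the Chambers-Mallows-Stuck transform (alpha != 1 case only; this is background, not code from the paper):

```python
import math
import random

def stable_sample(alpha, beta, rng):
    # Chambers-Mallows-Stuck transform for a standard stable draw, alpha != 1.
    if alpha == 1.0:
        raise NotImplementedError("alpha == 1 needs a separate formula")
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    t = beta * math.tan(math.pi * alpha / 2)
    b = math.atan(t) / alpha
    s = (1 + t * t) ** (1 / (2 * alpha))
    return (s * math.sin(alpha * (v + b)) / math.cos(v) ** (1 / alpha)
            * (math.cos(v - alpha * (v + b)) / w) ** ((1 - alpha) / alpha))

rng = random.Random(0)
# alpha = 2 recovers a Gaussian (variance 2); smaller alpha gives heavy tails.
gaussian_like = [stable_sample(2.0, 0.0, rng) for _ in range(20000)]
heavy_tailed = [stable_sample(1.5, 0.0, rng) for _ in range(20000)]
```

Easy simulation despite an intractable density is exactly what makes MCMC approaches (with auxiliary variables or reparameterisations, as in this paper) attractive for stable models.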
Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables
Abstract

Cited by 2 (1 self)
Abstract. We discuss Bayesian methods for model averaging and model selection among Bayesian-network models with hidden variables. In particular, we examine large-sample approximations for the marginal likelihood of naive-Bayes models in which the root node is hidden. Such models are useful for clustering or unsupervised learning. We consider a Laplace approximation and the less accurate but more computationally efficient approximation known as the Bayesian Information Criterion (BIC), which is equivalent to Rissanen's (1987) Minimum Description Length (MDL). Also, we consider approximations that ignore some off-diagonal elements of the observed information matrix and an approximation proposed by Cheeseman and Stutz (1995). We evaluate the accuracy of these approximations using a Monte-Carlo gold standard. In experiments with artificial and real examples, we find that (1) none of the approximations are accurate when used for model averaging, (2) all of the approximations, with the exception of BIC/MDL, are accurate for model selection, (3) among the accurate approximations, the Cheeseman-Stutz and Diagonal approximations are the most computationally efficient, (4) all of the approximations, with the exception of BIC/MDL, can be sensitive to the prior distribution over model parameters, and (5) the Cheeseman-Stutz approximation can be more accurate than the other approximations, including the Laplace approximation, in situations where the parameters in the maximum a posteriori configuration are near a boundary.
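The gap between BIC/MDL and the exact marginal likelihood can be seen even in a fully observed Beta-Bernoulli toy model, much simpler than the paper's hidden-variable naive-Bayes setting (the coin-flip counts here are assumptions for illustration): BIC replaces the integral over parameters with the maximized log-likelihood minus (d/2) log N.

```python
import math

def exact_log_marginal(heads, tails):
    # With a Beta(1, 1) prior, the marginal likelihood of a Bernoulli
    # sequence integrates in closed form to heads! * tails! / (n + 1)!
    return (math.lgamma(heads + 1) + math.lgamma(tails + 1)
            - math.lgamma(heads + tails + 2))

def bic(heads, tails):
    # BIC = max log-likelihood - (d / 2) * log(N), with d = 1 free parameter
    n = heads + tails
    mle = heads / n
    loglik = heads * math.log(mle) + tails * math.log(1 - mle)
    return loglik - 0.5 * 1 * math.log(n)

exact = exact_log_marginal(60, 40)
approx = bic(60, 40)
```

The error is O(1): it does not vanish as N grows, which is consistent with the finding above that BIC/MDL ranks models correctly asymptotically yet is a poor weight for model averaging.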
Modeling the Acquisition of Domain Structure and Feature Understanding
Abstract

Cited by 1 (1 self)
Young children face a difficult learning task: without having access to the cues that full language mastery would provide, they must acquire conceptual knowledge about the world; likewise, they must somehow learn words while lacking the full conceptual structure that the words refer to. We present a Bayesian framework that can model aspects of the acquisition of theory knowledge as a function of different types of input. We describe a set of developmental phenomena that our model can address.
Mental Models and the Control of Actions in Complex Environments. Risø National Laboratory, 1987
Generative Goal-Driven User Simulation for Dialog Management
Abstract

Cited by 1 (0 self)
User simulation is frequently used to train statistical dialog managers for task-oriented domains. At present, goal-driven simulators (those that have a persistent notion of what they wish to achieve in the dialog) require some task-specific engineering, making them impossible to evaluate intrinsically. Instead, they have been evaluated extrinsically by means of the dialog managers they are intended to train, leading to circularity of argument. In this paper, we propose the first fully generative goal-driven simulator that is fully induced from data, without handcrafting or goal annotation. Our goals are latent and take the form of topics in a topic model, clustering together semantically equivalent and phonetically confusable strings, implicitly modelling synonymy and speech recognition noise. We evaluate on two standard dialog resources, the Communicator and Let's Go datasets, and demonstrate that our model has a substantially better fit to held-out data than competing approaches. We also show that features derived from our model allow significantly greater improvement over a baseline at distinguishing real from randomly permuted dialogs.
Nonparametric Bayesian methods for extracting structure from data, 2008
Abstract

Cited by 1 (0 self)
One desirable property of machine learning algorithms is the ability to balance the number of parameters in a model in accordance with the amount of available data. Incorporating nonparametric Bayesian priors into models is one approach to automatically adjusting model capacity to the amount of available data: with small datasets, models are less complex (require storing fewer parameters in memory), whereas with larger datasets, models are implicitly more complex (require storing more parameters in memory). Thus, nonparametric Bayesian priors satisfy frequentist intuitions about model complexity within a fully Bayesian framework. This thesis presents several novel machine learning models and applications that use nonparametric Bayesian priors. We introduce two novel models that use flat, Dirichlet process priors. The first is an infinite mixture of experts model, which builds a fully generative, joint density model of the input and output space. The second is a Bayesian biclustering model, which simultaneously organizes a data matrix into block-constant biclusters. The model is capable of efficiently processing very large, sparse matrices, enabling cluster analysis on incomplete data matrices. We introduce binary matrix factorization, a novel matrix factorization model that, in contrast to classic factorization methods such as singular value decomposition, decomposes a matrix using …
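The capacity argument in this abstract can be illustrated with a draw from the Chinese restaurant process, the partition distribution induced by a Dirichlet process prior (a hypothetical sketch, not the thesis's models): the number of occupied clusters grows with the dataset size, roughly as alpha * log(n).

```python
import random
from collections import Counter

def crp_assignments(n, alpha, rng):
    # Sequentially assign n points to clusters under a CRP with
    # concentration alpha: a new cluster with probability alpha / (i + alpha),
    # otherwise join an existing cluster in proportion to its size.
    counts = Counter()
    labels = []
    for i in range(n):
        if rng.random() < alpha / (i + alpha):
            k = len(counts)  # open a new cluster
        else:
            k = rng.choices(list(counts), weights=counts.values())[0]
        counts[k] += 1
        labels.append(k)
    return labels, counts

rng = random.Random(0)
labels_small, clusters_small = crp_assignments(100, 2.0, rng)
labels_big, clusters_big = crp_assignments(10000, 2.0, rng)
# the larger dataset occupies more clusters, matching the capacity argument
```

This is the "rich get richer" dynamic: small datasets yield few clusters, while larger datasets implicitly support more, without the number of clusters ever being fixed in advance.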
Bayesian Inference for α-Stable Mixtures, 2008
Abstract
The Gaussian model proves unsatisfactory and has difficulty fitting data with skewness, heavy tails and multimodality. The use of α-stable distributions allows for modelling skewness and heavy tails but gives rise to inferential problems related to the estimation of the parameters of the distributions. The aim of this work is to generalise the stable distribution framework by introducing a model that also accounts for multimodality. In particular, we introduce a stable mixture model and a suitable reparameterisation of the mixture, which allows us to make inference on the parameters of the mixture. We use a fully Bayesian approach and MCMC simulation techniques for the estimation of the posterior distribution. Some applications of stable mixtures to financial data are provided.