On testing the missing at random assumption
In Proceedings of the 17th European Conference on Machine Learning (ECML 2006), 2006
Cited by 12 (0 self)
Abstract. Most approaches to learning from incomplete data are based on the assumption that unobserved values are missing at random (MAR). While the MAR assumption, as such, is not testable, it can become testable in the context of other distributional assumptions, e.g. the naive Bayes assumption. In this paper we investigate a method for testing the MAR assumption in the presence of other distributional constraints. We present methods to (approximately) compute a test statistic consisting of the ratio of two profile likelihood functions. This requires the optimization of the likelihood under no assumptions on the missingness mechanism, for which we use our recently proposed AI&M algorithm. We present experimental results on synthetic data that show that our approximate test statistic is a good indicator for whether data is MAR relative to the given distributional assumptions.
The AI&M procedure for learning from incomplete data
In R. Dechter and T. Richardson (Eds.), Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI 2006), 2006
Cited by 7 (1 self)
We investigate methods for parameter learning from incomplete data that is not missing at random. Likelihood-based methods then require the optimization of a profile likelihood that takes all possible missingness mechanisms into account. Optimizing this profile likelihood poses two main difficulties: multiple (local) maxima, and its very high-dimensional parameter space. In this paper a new method is presented for optimizing the profile likelihood that addresses the second difficulty: in the proposed AI&M (adjusting imputation and maximization) procedure the optimization is performed by operations in the space of data completions, rather than directly in the parameter space of the profile likelihood. We apply the AI&M method to learning parameters for Bayesian networks. The method is compared against conservative inference, which takes into account each possible data completion, and against EM. The results indicate that likelihood-based inference is still feasible in the case of unknown missingness mechanisms, and that conservative inference is unnecessarily weak. On the other hand, our results also provide evidence that the EM algorithm is still quite effective when the data is not missing at random.
PRISM User’s Manual (Version 2.0.3)
The past several years have witnessed a tremendous interest in logic-based probabilistic learning, as testified by the number of formalisms and systems and their applications. Logic-based probabilistic learning is a multidisciplinary research area that integrates relational or logic formalisms, probabilistic reasoning mechanisms, and machine learning and data mining principles. Logic-based probabilistic learning has found its way into many application areas including bioinformatics, diagnosis and troubleshooting, stochastic language processing, information retrieval, linkage analysis and discovery, robot control, and probabilistic constraint solving. PRISM (PRogramming In Statistical Modeling) is a logic-based language that integrates logic programming and probabilistic reasoning, including parameter learning. It allows for the description of independent probabilistic choices and their consequences in general logic programs. PRISM supports parameter learning, i.e. for a given set of possibly incomplete observed data, PRISM can estimate the probability distributions to best explain the data. This power is suitable for applications such as learning parameters of stochastic grammars, training stochastic models for gene sequence analysis, game record analysis, user modeling, and obtaining probabilistic information for tuning systems performance. PRISM offers incomparable flexibility compared with specific statistical tools such as hidden Markov models (HMMs) [4, 31], probabilistic context-free grammars (PCFGs) [4] and discrete Bayesian networks. PRISM employs a proof-theoretic approach to learning. It conducts learning in two phases: the first phase searches for all the explanations for the observed data, and the second phase estimates the probability distributions by using the EM algorithm. Learning from flat explanations can be exponential in both space and time.
To speed up learning, the authors proposed learning from explanation graphs and using tabling to reduce redundancy in the construction of explanation graphs. The PRISM programming system is implemented on top of B-Prolog.
An Algorithmic and a Geometric Characterization of Coarsening at Random
We show that the class of conditional distributions satisfying the coarsening at random (CAR) property for discrete data has a simple and robust algorithmic description based on randomized uniform multicovers: combinatorial objects generalizing the notion of partition of a set. However, the complexity of a given CAR mechanism can be large: the maximal “height” of the needed multicovers can be exponential in the number of points in the sample space. The results stem from a geometric interpretation of the set of CAR distributions as a convex polytope and a characterization of its extreme points. The hierarchy of CAR models defined in this way could be useful in parsimonious statistical modeling of CAR mechanisms, though the results also raise doubts in applied work as to the meaningfulness of the CAR assumption in its full generality.
PRISM User’s Manual (Version 2.0)
The past several years have witnessed a tremendous interest in logic-based probabilistic learning, as testified by the number of formalisms and systems and their applications. Logic-based probabilistic learning is a multidisciplinary research area that integrates relational or logic formalisms, probabilistic reasoning mechanisms, and machine learning and data mining principles. Logic-based probabilistic learning has found its way into many application areas including bioinformatics, diagnosis and troubleshooting, stochastic language processing, information retrieval, linkage analysis and discovery, robot control, and probabilistic constraint solving. PRISM (PRogramming In Statistical Modeling) is a logic-based language that integrates logic programming and probabilistic reasoning, including parameter learning. It allows for the description of independent probabilistic choices and their consequences in general logic programs. PRISM supports parameter learning, i.e. for a given set of possibly incomplete observed data, PRISM can estimate the probability distributions to best explain the data. This power is suitable for applications such as learning parameters of stochastic grammars, training stochastic models for gene sequence analysis, game record analysis, user modeling, and obtaining probabilistic information for tuning systems performance. PRISM offers incomparable flexibility compared with specific statistical tools such as hidden Markov models (HMMs) [4, 30], probabilistic context-free grammars (PCFGs) [4] and discrete Bayesian networks. PRISM employs a proof-theoretic approach to learning. It conducts learning in two phases: the first phase searches for all the explanations for the observed data, and the second phase estimates the probability distributions by using the EM algorithm. Learning from flat explanations can be exponential in both space and time.
To speed up learning, the authors proposed learning from explanation graphs and using tabling to reduce redundancy in the construction of explanation graphs. The PRISM programming system is implemented on top of B-Prolog.
PRISM User’s Manual (Version 2.0 beta 4)
The past several years have witnessed a tremendous interest in logic-based probabilistic learning, as testified by the number of formalisms and systems and their applications. Logic-based probabilistic learning is a multidisciplinary research area that integrates relational or logic formalisms, probabilistic reasoning mechanisms, and machine learning and data mining principles. Logic-based probabilistic learning has found its way into many application areas including bioinformatics, diagnosis and troubleshooting, stochastic language processing, information retrieval, linkage analysis and discovery, robot control, and probabilistic constraint solving. PRISM (PRogramming In Statistical Modeling) is a logic-based language that integrates logic programming and probabilistic reasoning, including parameter learning. It allows for the description of independent probabilistic choices and their consequences in general logic programs. PRISM supports parameter learning, i.e. for a given set of possibly incomplete observed data, PRISM can estimate the probability distributions to best explain the data. This power is suitable for applications such as learning parameters of stochastic grammars, training stochastic models for gene sequence analysis, game record analysis, user modeling, and obtaining probabilistic information for tuning systems performance. PRISM offers incomparable flexibility compared with specific statistical tools such as hidden Markov models (HMMs) [4, 30], probabilistic context-free grammars (PCFGs) [4] and discrete Bayesian networks. PRISM employs a proof-theoretic approach to learning. It conducts learning in two phases: the first phase searches for all the explanations for the observed data, and the second phase estimates the probability distributions by using the EM algorithm. Learning from flat explanations can be exponential in both space and time.
To speed up learning, the authors proposed learning from explanation graphs and using tabling to reduce redundancy in the construction of explanation graphs. The PRISM programming system is implemented on top of B-Prolog.
An Algorithmic and a Geometric . . .
, 2007
We show that the class of conditional distributions satisfying the coarsening at random (CAR) property for discrete data has a simple and robust algorithmic description based on randomized uniform multicovers: combinatorial objects generalizing the notion of partition of a set. However, the complexity of a given CAR mechanism can be large: the maximal “height” of the needed multicovers can be exponential in the number of points in the sample space. The results stem from a geometric interpretation of the set of CAR distributions as a convex polytope and a characterization of its extreme points. The hierarchy of CAR models defined in this way could be useful in parsimonious statistical modelling of CAR mechanisms, though the results also raise doubts in applied work as to the meaningfulness of the CAR assumption in its full generality.
An Algorithmic and a Geometric Characterization of . . .
, 2005
We show that the class of conditional distributions satisfying the Coarsening at Random (CAR) property has a simple algorithmic description based on randomized uniform multicovers, which are combinatorial objects generalizing the notion of partition of a set. The maximum needed height of the multicovers is exponential in the number of points in the sample space. This algorithmic characterization stems from a geometric interpretation of the set of CAR distributions as a convex polytope and a characterization of its extreme points. The hierarchy of CAR models defined in this way can be useful in parsimonious statistical modelling of CAR mechanisms.
PRISM User’s Manual (Version 1.11.1)
The past few years have witnessed a tremendous interest in logic-based probabilistic learning, as testified by the number of formalisms and systems and their applications. Logic-based probabilistic learning is a multidisciplinary research area that integrates relational or logic formalisms, probabilistic reasoning mechanisms, and machine learning and data mining principles. Logic-based probabilistic learning has found its way into many application areas including bioinformatics, diagnosis and troubleshooting, stochastic language processing, information retrieval, linkage analysis and discovery, robot control, and probabilistic constraint solving. PRISM (PRogramming In Statistical Modeling) is a logic-based language that integrates logic programming and probabilistic reasoning, including parameter learning. It allows for the description of independent probabilistic choices and their consequences in general logic programs. PRISM supports parameter learning, i.e. for a given set of possibly incomplete observed data, PRISM can estimate the probability distributions to best explain the data. This power is suitable for applications such as learning parameters of stochastic grammars, training stochastic models for gene sequence analysis, game record analysis, user modeling, and obtaining probabilistic information for tuning systems performance. PRISM offers incomparable flexibility compared with specific statistical tools such as hidden Markov models (HMMs) [4, 25], probabilistic context-free grammars (PCFGs) [4] and discrete Bayesian networks. PRISM employs a proof-theoretic approach to learning. It conducts learning in two phases: the first phase searches for all the explanations for the observed data, and the second phase estimates the probability distributions by using the EM algorithm. Learning from flat explanations can be exponential in both space and time.
To speed up learning, the authors proposed learning from explanation graphs and using tabling to reduce redundancy in the construction of explanation graphs. The PRISM programming system is implemented on top of B-Prolog.