## Supervised learning from incomplete data via an EM approach (1994)

Venue: Advances in Neural Information Processing Systems 6

Citations: 199 (2 self)

### BibTeX

@INPROCEEDINGS{Ghahramani94supervisedlearning,
  author    = {Zoubin Ghahramani and Michael I. Jordan},
  title     = {Supervised learning from incomplete data via an EM approach},
  booktitle = {Advances in Neural Information Processing Systems 6},
  year      = {1994},
  pages     = {120--127},
  publisher = {Morgan Kaufmann}
}

### Abstract

Real-world learning tasks may involve high-dimensional data sets with arbitrary patterns of missing data. In this paper we present a framework based on maximum likelihood density estimation for learning from such data sets. We use mixture models for the density estimates and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster et al., 1977) in deriving a learning algorithm---EM is used both for the estimation of mixture components and for coping with missing data. The resulting algorithm is applicable to a wide range of supervised as well as unsupervised learning problems. Results from a classification benchmark---the iris data set---are presented.

1 Introduction

Adaptive systems generally operate in environments that are fraught with imperfections; nonetheless they must cope with these imperfections and learn to extract as much relevant information as needed for their particular goals. One form of imperfection is incompleteness in sensing information. Inc...
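The abstract's first appeal to EM, fitting the mixture components themselves, can be illustrated with a minimal sketch. This is not the paper's code: it fits a hypothetical two-component 1-D Gaussian mixture by alternating E- and M-steps, using only the Python standard library.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    mu = [min(data), max(data)]   # crude but effective initialisation
    var = [1.0, 1.0]
    pi = [0.5, 0.5]               # mixing proportions P(omega_j)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in data:
            w = [pi[j] * normal_pdf(x, mu[j], var[j]) for j in range(2)]
            s = sum(w)
            resp.append([wj / s for wj in w])
        # M-step: re-estimate parameters from the expected sufficient statistics
        for j in range(2):
            nj = sum(r[j] for r in resp)
            pi[j] = nj / len(data)
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var[j] = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / nj
            var[j] = max(var[j], 1e-6)  # guard against variance collapse
    return pi, mu, var

random.seed(0)
data = [random.gauss(-2, 0.5) for _ in range(200)] + \
       [random.gauss(3, 0.5) for _ in range(200)]
pi, mu, var = em_gmm(data)
print(sorted(mu))  # component means recovered near -2 and 3
```

The paper's second appeal to EM, handling missing coordinates of each data vector, would add an extra expectation inside the E-step; the skeleton of the loop stays the same.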

### Citations

9486 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...on maximum likelihood density estimation for learning from such data sets. We use mixture models for the density estimates and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster et al., 1977) in deriving a learning algorithm---EM is used both for the estimation of mixture components and for coping with missing data. The resulting algorithm is applicable to a wide range of supervised as w...

4659 | Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation Context: ...ture (Jacobs et al., 1991; Jordan and Jacobs, 1994). This architecture is a parametric regression model with a modular structure similar to the nonparametric decision tree and adaptive spline models (Breiman et al., 1984; Friedman, 1991). The approach presented here differs from these regression-based approaches in that the goal of learning is to estimate the density of the data. No distinction is made between input ...

4274 | Pattern Classification and Scene Analysis
- Duda, Hart
- 1973
Citation Context: ...e both kinds of missing data. 2 Density estimation using EM This section outlines the basic learning algorithm for finding the maximum likelihood parameters of a mixture model (Dempster et al., 1977; Duda and Hart, 1973; Nowlan, 1991). We assume that the data $X = \{x_1, \ldots, x_N\}$ are generated independently from a mixture density $P(x_i) = \sum_{j=1}^{M} P(x_i \mid \omega_j, \theta_j)\,P(\omega_j)$ (1), where each component of the mixt...

1776 | Statistical Analysis With Missing Data
- Little, Rubin
- 2002
Citation Context: ...om incomplete data In the previous section we presented one aspect of the EM algorithm: learning mixture models. Another important application of EM is to learning from data sets with missing values (Little and Rubin, 1987; Dempster et al., 1977). This application has been pursued in the statistics literature for non-mixture density estimation problems; in this paper we combine this application of EM with that of learn...
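The snippet above concerns EM's other role: filling in missing values. Under a single Gaussian component, the E-step replaces a missing coordinate with its conditional expectation given the observed ones. A hedged sketch for the bivariate case (illustrative toy parameters, not from the paper):

```python
def fill_missing_bivariate(x_obs, mu, cov, obs_idx):
    """E-step imputation under a bivariate Gaussian:
    E[x_m | x_o] = mu_m + cov_mo / cov_oo * (x_o - mu_o)."""
    o = obs_idx
    m = 1 - obs_idx
    return mu[m] + cov[m][o] / cov[o][o] * (x_obs - mu[o])

# toy parameters: zero mean, unit variances, positive correlation 0.8
mu = [0.0, 0.0]
cov = [[1.0, 0.8], [0.8, 1.0]]
# first coordinate observed at 2.0, second coordinate missing
print(fill_missing_bivariate(2.0, mu, cov, obs_idx=0))  # 0.8 / 1.0 * 2.0 = 1.6
```

In the mixture setting, this conditional mean is computed per component and then averaged with the posterior responsibilities from the E-step.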

869 | Adaptive mixture of local experts
- Jacobs, Jordan, et al.
- 1991
Citation Context: ...thods with certain of the analytic advantages of parametric methods. Mixture models have been utilized recently for supervised learning problems in the form of the "mixtures of experts" architecture (Jacobs et al., 1991; Jordan and Jacobs, 1994). This architecture is a parametric regression model with a modular structure similar to the nonparametric decision tree and adaptive spline models (Breiman et al., 1984; Fri...

776 | Hierarchical mixtures of experts and the EM algorithm
- Jordan, Jacobs
- 1994
Citation Context: ...the analytic advantages of parametric methods. Mixture models have been utilized recently for supervised learning problems in the form of the "mixtures of experts" architecture (Jacobs et al., 1991; Jordan and Jacobs, 1994). This architecture is a parametric regression model with a modular structure similar to the nonparametric decision tree and adaptive spline models (Breiman et al., 1984; Friedman, 1991). The approac...

531 | Mixture Models: Inference and Applications to Clustering
- McLachlan, Basford
- 1988
Citation Context: ...A possible disadvantage of parametric methods is their lack of flexibility when compared with nonparametric methods. This problem, however, can be largely circumvented by the use of mixture models (McLachlan and Basford, 1988). Mixture models combine much of the flexibility of nonparametric methods with certain of the analytic advantages of parametric methods. Mixture models have been utilized recently for supervised lear...

193 | A general regression neural network
- Specht
- 1991

141 | Multivariate Adaptive Regression Splines
- Friedman
- 1991
Citation Context: ...991; Jordan and Jacobs, 1994). This architecture is a parametric regression model with a modular structure similar to the nonparametric decision tree and adaptive spline models (Breiman et al., 1984; Friedman, 1991). The approach presented here differs from these regression-based approaches in that the goal of learning is to estimate the density of the data. No distinction is made between input and output varia...

85 | Soft Competitive Adaptation: Neural Network Learning Algorithms based on Fitting Statistical Mixtures
- Nowlan
- 1991
Citation Context: ...ng data. 2 Density estimation using EM This section outlines the basic learning algorithm for finding the maximum likelihood parameters of a mixture model (Dempster et al., 1977; Duda and Hart, 1973; Nowlan, 1991). We assume that the data $X = \{x_1, \ldots, x_N\}$ are generated independently from a mixture density $P(x_i) = \sum_{j=1}^{M} P(x_i \mid \omega_j, \theta_j)\,P(\omega_j)$ (1), where each component of the mixture is denoted...

18 | Solving inverse problems using an EM approach to density estimation
- Ghahramani
- 1993
Citation Context: ...the variables. Density estimation is fundamentally more general than function approximation and this generality is needed for a large class of learning problems arising from inverting causal systems (Ghahramani, 1994). These problems cannot be solved easily by traditional function approximation techniques since the data is not generated from noisy samples of a function, but rather of a relation. Acknowledgements ...