## Learning the dimensionality of hidden variables (2001)

### Download Links

- [www.robotics.stanford.edu]
- [pluto.huji.ac.il]
- [www.cs.huji.ac.il]
- DBLP

### Other Repositories/Bibliography

Venue: UAI '01

Citations: 24 (3 self)

### BibTeX

@INPROCEEDINGS{Elidan01learningthe,
  author    = {Gal Elidan and Nir Friedman},
  title     = {Learning the dimensionality of hidden variables},
  booktitle = {UAI '01},
  year      = {2001}
}

### Abstract

A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. Detecting hidden variables poses two problems: determining the relations to other variables in the model, and determining the number of states of the hidden variable. In this paper, we address the latter problem in the context of Bayesian networks. We describe an approach that utilizes a score-based agglomerative state-clustering procedure. As we show, this approach allows us to efficiently evaluate models with a range of cardinalities for the hidden variable. We show how to extend this procedure to deal with multiple interacting hidden variables. We demonstrate the effectiveness of this approach by evaluating it on synthetic and real-life data. We show that our approach learns models with hidden variables that generalize better and have better structure than previous approaches.
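The score-based agglomerative state-clustering idea can be illustrated with a toy sketch: start from a fine-grained assignment of instances to hidden states, greedily merge the pair of states whose merge most improves a penalized score, and report the cardinality at which the score peaked. This is an illustration only, not the paper's algorithm: the score here is a simple BIC-style penalized likelihood of a single observed child variable, whereas the paper uses a Bayesian score inside a structural-EM loop, and the function names and data layout are assumptions.

```python
import math
from itertools import combinations

def score(assignment, observed, n_samples):
    # Toy BIC-style score: log-likelihood of an observed child variable
    # given the hidden state, minus a per-state complexity penalty.
    ll = 0.0
    for s in set(assignment):
        obs = [o for a, o in zip(assignment, observed) if a == s]
        for v in set(obs):
            c = obs.count(v)
            ll += c * math.log(c / len(obs))
    return ll - 0.5 * len(set(assignment)) * math.log(n_samples)

def learn_cardinality(assignment, observed):
    # Greedily merge the best-scoring pair of hidden states until one
    # state remains; return the cardinality where the score peaked.
    n = len(assignment)
    best_card, best_sc = len(set(assignment)), score(assignment, observed, n)
    while len(set(assignment)) > 1:
        candidates = []
        for a, b in combinations(sorted(set(assignment)), 2):
            merged = [b if s == a else s for s in assignment]
            candidates.append((score(merged, observed, n), merged))
        sc, assignment = max(candidates, key=lambda t: t[0])
        if sc > best_sc:
            best_card, best_sc = len(set(assignment)), sc
    return best_card
```

On data where four initial states carry only two distinct observed behaviors, the sketch recovers a cardinality of two: the pure merges cost no likelihood but save penalty, while further merging loses likelihood.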

### Citations

8086 | Maximum likelihood from incomplete data
- Dempster, Laird, et al.
Citation Context: ...der to learn parameters for a given network structure, we can use the Expectation Maximization (EM) algorithm to search for a (local) maximum likelihood (or maximum a posteriori) parameter assignment [7, 16]. In the presence of incomplete data, scoring candidate structures is more complex. We cannot efficiently evaluate the marginal likelihood and need to resort to approximations. A commonly used approxi...
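The EM iteration referred to in this context can be shown on the simplest hidden-variable model, a two-component Bernoulli mixture over binary data. This is a minimal sketch, not the paper's Bayesian-network EM; the function name, starting values, and iteration count are illustrative.

```python
def em_bernoulli_mixture(data, iters=50):
    # Minimal EM for a two-component Bernoulli mixture over 0/1 data.
    # pi is the weight of component 1; p[k] is the heads probability
    # of component k. Starting values are arbitrary.
    pi, p = 0.5, [0.2, 0.8]
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 per point.
        resp = []
        for x in data:
            l0 = (1 - pi) * (p[0] if x else 1 - p[0])
            l1 = pi * (p[1] if x else 1 - p[1])
            resp.append(l1 / (l0 + l1))
        # M-step: re-estimate parameters from expected counts.
        n1 = sum(resp)
        p[1] = sum(r * x for r, x in zip(resp, data)) / n1
        p[0] = sum((1 - r) * x for r, x in zip(resp, data)) / (len(data) - n1)
        pi = n1 / len(data)
    return pi, p
```

A useful sanity check on any M-step of this form: the mixture's implied marginal, pi*p[1] + (1-pi)*p[0], always equals the empirical mean of the data.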

3919 | Pattern Classification and Scene Analysis
- Duda, Hart
- 1973
Citation Context: ...n of it) significantly faster. We now suggest an approach that works with hard assignments to the states of the hidden variables. This approach is motivated by agglomerative clustering methods (e.g., [8]) and Bayesian model merging techniques from the HMM literature [19]. The general outline of the approach is as follows. At each iteration we maintain a hard assignment in the training data. We ca...

903 | Learning Bayesian networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995
Citation Context: ...ks with respect to the training data, and then to search for the best network according to this score. A commonly used scoring function to learn Bayesian networks is the Bayesian scoring (BDe) metric [14] which we denote by ScoreBDe. This scoring metric uses a balance between the likelihood gain of the learned model and the complexity of the network structure representation. An important characteristi...
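The BDe metric decomposes into one term per family (a node plus its parents), which is why local moves such as state merges can be scored cheaply. A hedged sketch of one family's log-score, assuming BDeu-style hyperparameters (the equivalent sample size spread uniformly); the function name and count layout are illustrative:

```python
import math

def family_bde(counts, alpha=1.0):
    # Log BDe score of a single family. counts[j][k] holds N_jk, the
    # number of instances with parent configuration j and node value k.
    # alpha is the equivalent sample size, spread uniformly over the
    # parameter space (a BDeu-style assumption).
    q, r = len(counts), len(counts[0])   # parent configs, node cardinality
    a_jk, a_j = alpha / (q * r), alpha / q
    score = 0.0
    for row in counts:
        score += math.lgamma(a_j) - math.lgamma(a_j + sum(row))
        for n_jk in row:
            score += math.lgamma(a_jk + n_jk) - math.lgamma(a_jk)
    return score
```

The score rewards predictive structure: a child that is deterministic given its parents scores higher than one that is uniform over the same counts.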

854 | A tutorial on learning with bayesian networks
- Heckerman
- 1995
Citation Context: ...ter and have better structure than previous approaches. 1 Introduction In the last decade there has been a great deal of research focused on the problem of learning Bayesian networks from data (e.g., [13]). An important issue is the existence of hidden (latent) variables that are never observed, yet interact with observed variables. Hidden variables often play an important role in improving the qualit...

510 | Learning probabilistic relational models
- Getoor, Friedman, et al.
- 2001
Citation Context: ...s for the discovery of hidden variables need to be developed. Another direction is to extend the methods for learning hidden structure in more expressive models such as Probabilistic Relational Models [12]. Acknowledgements We thank Noam Lotner and Iftach Nachman for comments on earlier drafts of this paper. This work was supported in part by Israel Science Foundation grant number 224/991. Nir Friedman...

438 | Newsweeder: Learning to filter netnews
- Lang
- 1995
Citation Context: ...e FindHidden algorithm with and without agglomeration on synthetic and real-life data. Base line is the performance of the Original network given as an input to FindHidden messages from 20 newsgroups [15]. We represent each message as a vector containing one attribute for the newsgroup and attributes for each word in the vocabulary. We removed common stop words, and then sorted words based on their fr...

240 | AutoClass: A Bayesian classification system
- Cheeseman, Kelly, et al.
- 1988
Citation Context: ...oring candidate structures is more complex. We cannot efficiently evaluate the marginal likelihood and need to resort to approximations. A commonly used approximation is the Cheeseman-Stutz (CS) score [5, 6], which combines the likelihoods of the parameters found by EM with an estimate of the penalty term associated with structure. The structural EM algorithm of Friedman [10] extends the idea of EM to t...

220 | The Bayesian Structural EM Algorithm
- Friedman
- 1998

216 | The EM algorithm for graphical association models with missing data, Computational Statistics & Data Analysis
- Lauritzen
- 1995
Citation Context: ...der to learn parameters for a given network structure, we can use the Expectation Maximization (EM) algorithm to search for a (local) maximum likelihood (or maximum a posteriori) parameter assignment [7, 16]. In the presence of incomplete data, scoring candidate structures is more complex. We cannot efficiently evaluate the marginal likelihood and need to resort to approximations. A commonly used approxi...

175 | Efficient approximations for the marginal likelihood of bayesian networks with hidden variables
- Chickering, Heckerman
- 1997
Citation Context: ...oring candidate structures is more complex. We cannot efficiently evaluate the marginal likelihood and need to resort to approximations. A commonly used approximation is the Cheeseman-Stutz (CS) score [5, 6], which combines the likelihoods of the parameters found by EM with an estimate of the penalty term associated with structure. The structural EM algorithm of Friedman [10] extends the idea of EM to t...

134 | Hidden markov model induction by bayesian model merging
- Stolcke, Omohundro
- 1993
Citation Context: ...s with hard assignments to the states of the hidden variables. This approach is motivated by agglomerative clustering methods (e.g., [8]) and Bayesian model merging techniques from the HMM literature [19]. The general outline of the approach is as follows. At each iteration we maintain a hard assignment in the training data. We can represent this assignment as a mapping from the instances to the set of states. T...

129 | Inducing probabilistic grammars by Bayesian model merging
- Stolcke, Omohundro
- 1994
Citation Context: ...etization. Their approach to discretizing multiple interacting variables is also similar to ours. In the context of learning hidden variables, the most relevant are the works of Stolcke and Omohundro [19, 20]. In these works, they learn hidden Markov models and probabilistic grammars by performing a bottom-up state agglomeration. Similar to our method, they start by spanning all possible states and then ite...

64 | Discretizing Continuous Attributes While Learning Bayesian Networks
- Friedman, Goldszmidt
- 1996
Citation Context: ...now describe a simple heuristic approach that attempts to approximate the cardinality assignment for multiple variables. The ideas are motivated by a similar approach to multi-variable discretization [11]. The basic idea is to apply the agglomerative procedure of the previous section in a round-robin fashion. At each iteration, we fix the number of states and the state assignment to instances for all ...
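The round-robin scheme this context describes, re-learning one hidden variable while all others are held fixed and cycling until a full pass changes nothing, can be sketched generically. This is a hedged skeleton: `learn_one` stands in for any single-variable cardinality procedure, and all names are assumptions.

```python
def round_robin(assignments, learn_one, max_rounds=10):
    # Round-robin over multiple hidden variables: re-learn one
    # variable's per-instance state assignment while the others are
    # held fixed; stop when a full pass leaves everything unchanged.
    # assignments: dict mapping variable name -> list of state ids.
    # learn_one(var, assignments): returns a new state list for var.
    for _ in range(max_rounds):
        changed = False
        for var in assignments:
            new = learn_one(var, assignments)
            if new != assignments[var]:
                assignments[var] = new
                changed = True
        if not changed:
            break
    return assignments
```

With any `learn_one` that is idempotent at its fixed point, the loop terminates after one extra unchanged pass rather than running all `max_rounds`.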

53 | Comparative genomics of BCG vaccines by whole-genome DNA microarray
- Behr, Wilson, et al.
- 1999
Citation Context: ...3]: a dataset that traces the daily change of 20 major US technology stocks for several years (1516 trading days). These states were discretized to three categories: “up”, “no change”, and “down”. TB [1]: a dataset that records information about 2302 tuberculosis patients in the San Francisco county (courtesy of Dr. Peter Small, Stanford Medical Center). The data set contains demographic information ...

40 | Discovering hidden variables: A structurebased approach
- Elidan, Lotner, et al.
- 2000
Citation Context: ...vocations of the single-variable procedure to learn the interactions between several hidden variables. Finally, we combine our method with the structural detection of hidden variables of Elidan et al. [9] and show that this leads to learning better performing models, on test and real-life data. 2 Background 2.1 Learning Bayesian Networks Consider a finite set of discrete random variables where each ...

37 | State-Space Abstraction for Anytime Evaluation of Probabilistic Networks
- Wellman, Liu
- 1994
Citation Context: ...owed improved performance as well as more appealing structures. Several works are related to our approach. Several authors examined operations of value abstraction and refinement in Bayesian networks [4, 18, 17, 21]. These works were mostly concerned with the ramifications of these operations on inference and decision making. Decisions about cardinality also appear in the context of discretization. Although the ...

25 | The ALARM monitoring system
- Beinlich, Suermondt, et al.
- 1989
Citation Context: ...We start by evaluating how well our algorithm determines variable cardinality in synthetic datasets where we know the cardinality of the variable we hid. We sampled instances from the Alarm network [2], and manually hid a variable from the dataset. We then gave our algorithm the original network and evaluated its ability to reconstruct the variable’s cardinality. Figure 3 shows a typical behavior o...

12 | Refinement and Coarsening of Bayesian Networks
- Chang, Fung
- 1990
Citation Context: ...owed improved performance as well as more appealing structures. Several works are related to our approach. Several authors examined operations of value abstraction and refinement in Bayesian networks [4, 18, 17, 21]. These works were mostly concerned with the ramifications of these operations on inference and decision making. Decisions about cardinality also appear in the context of discretization. Although the ...

12 | Reasoning about the value of decision-model refinement: Methods and application
- Poh, Horvitz
- 1993
Citation Context: ...owed improved performance as well as more appealing structures. Several works are related to our approach. Several authors examined operations of value abstraction and refinement in Bayesian networks [4, 18, 17, 21]. These works were mostly concerned with the ramifications of these operations on inference and decision making. Decisions about cardinality also appear in the context of discretization. Although the ...

2 | Dynamic construction and refinement of utility based categorization models
- Poh, Fehling, et al.
- 1994