Results 1–10 of 15
The Information Bottleneck Revisited or How to Choose a Good Distortion Measure
Cited by 15 (0 self)
Abstract: It is well-known that the information bottleneck method and rate distortion theory are related. Here it is described how the information bottleneck can be considered as rate distortion theory for a family of probability measures where information divergence is used as the distortion measure. It is shown that the information bottleneck method has some properties that are not shared with rate distortion theory based on any other divergence measure. In this sense the information bottleneck method is unique.
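The abstract's central observation is that the information bottleneck's distortion measure is the information (KL) divergence between conditionals p(y|x) and p(y|t). A toy sketch of that distortion computation (the distributions here are made up for illustration, not taken from the paper):

```python
import numpy as np

def kl_divergence(p, q):
    """Information divergence D(p || q) in nats; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical conditionals: the IB distortion between a source symbol x and a
# cluster t is the divergence between p(y|x) and the cluster's predictive p(y|t).
p_y_given_x = [0.7, 0.2, 0.1]
p_y_given_t = [0.5, 0.3, 0.2]
d = kl_divergence(p_y_given_x, p_y_given_t)  # nonnegative, zero iff the two agree
```

Unlike a metric, this distortion is asymmetric in its arguments, which is exactly the property the rate-distortion reading of the bottleneck relies on.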
The “ideal parent” structure learning for continuous variable networks
Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2004
Cited by 14 (2 self)
Abstract: In recent years there has been growing interest in learning Bayesian networks with continuous variables. Learning the structure of such networks is a computationally expensive procedure, which limits most applications to parameter learning. This problem is even more acute when learning networks with hidden variables. We present a general method for significantly speeding up the structure search algorithm for continuous variable networks with common parametric distributions. Importantly, our method efficiently facilitates the addition of new hidden variables into the network structure. We demonstrate the method on several data sets, both for learning structure on fully observable data and for introducing new hidden variables during structure search.
Better informed training of latent syntactic features
In Proc. of EMNLP, 2006
Cited by 12 (3 self)
Abstract: We study unsupervised methods for learning refinements of the nonterminals in a treebank. Following Matsuzaki et al. (2005) and Prescher (2005), we may for example split NP without supervision into NP[0] and NP[1], which behave differently. We first propose to learn a PCFG that adds such features to nonterminals in such a way that they respect patterns of linguistic feature passing: each node’s nonterminal features are either identical to, or independent of, those of its parent. This linguistic constraint reduces runtime and the number of parameters to be learned. However, it did not yield improvements when training on the Penn Treebank. An orthogonal strategy was more successful: improving the performance of the EM learner by treebank preprocessing and by annealing methods that split nonterminals selectively. Using these methods, we can maintain high parsing accuracy while dramatically reducing the model size.
Learning Bayesian Network Parameters Under Incomplete Data with Domain Knowledge
Cited by 4 (0 self)
Abstract: Bayesian networks (BNs) have gained increasing attention in recent years. One key issue in BNs is parameter learning. When training data are incomplete or sparse, or when multiple hidden nodes exist, learning BN parameters becomes extremely difficult. Under these circumstances, the learning algorithms must operate in a high-dimensional search space and can easily get trapped in one of many local maxima. This paper presents a learning algorithm that incorporates domain knowledge into the learning to regularize the otherwise ill-posed problem, limit the search space, and avoid local optima. Unlike conventional approaches that typically exploit quantitative domain knowledge such as a prior probability distribution, our method systematically incorporates qualitative constraints on some of the parameters into the learning process. Specifically, the problem is formulated as a constrained optimization problem, where an objective function is defined as a combination of the likelihood function and penalty functions constructed from the qualitative domain knowledge. Then a gradient-descent procedure is systematically integrated with the E-step and M-step of the EM algorithm to estimate the parameters iteratively until convergence. Experiments with both synthetic data and real data for facial action recognition show that our algorithm improves the accuracy of the learned BN parameters significantly over the conventional EM algorithm.
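The constrained-optimization idea described above (likelihood plus penalty terms for qualitative constraints, minimized by gradient descent) can be sketched numerically. Everything here is an illustrative assumption, not the paper's formulation: a two-parameter Bernoulli model, a single constraint theta[0] >= theta[1], a quadratic penalty, and a numerical-gradient stand-in for the paper's M-step:

```python
import numpy as np

def penalty(theta, lam):
    # Quadratic penalty for violating the qualitative constraint theta[0] >= theta[1];
    # this penalty form is assumed for illustration.
    v = max(0.0, theta[1] - theta[0])
    return lam * v * v

def penalized_nll(theta, counts, lam):
    # Negative log-likelihood of two Bernoulli parameters plus the constraint penalty.
    nll = 0.0
    for (n, k), t in zip(counts, theta):
        nll -= k * np.log(t) + (n - k) * np.log(1.0 - t)
    return nll + penalty(theta, lam)

def m_step(counts, lam, lr=0.01, iters=500):
    # Stand-in for the M-step: gradient descent on the penalized objective,
    # using central-difference numerical gradients for brevity.
    theta = np.array([0.5, 0.5])
    eps = 1e-6
    for _ in range(iters):
        grad = np.zeros(2)
        for i in range(2):
            tp, tm = theta.copy(), theta.copy()
            tp[i] += eps
            tm[i] -= eps
            grad[i] = (penalized_nll(tp, counts, lam)
                       - penalized_nll(tm, counts, lam)) / (2 * eps)
        theta = np.clip(theta - lr * grad, 1e-3, 1.0 - 1e-3)
    return theta

# Toy data whose unconstrained MLE (0.3, 0.7) violates theta[0] >= theta[1]:
counts = [(10, 3), (10, 7)]           # (trials, successes) per parameter
free = m_step(counts, lam=0.0)        # plain maximum likelihood
constrained = m_step(counts, lam=50.0)  # penalty pulls the violating gap closed
```

The point of the sketch is the trade-off: with the penalty active, the estimate moves away from the data-only optimum just far enough to (approximately) respect the qualitative constraint.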
Learning Bayesian Networks with Qualitative Constraints
Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2008
Cited by 4 (2 self)
Abstract: Graphical models such as Bayesian Networks (BNs) are being increasingly applied to various computer vision problems. One bottleneck in using BNs is that learning the BN model parameters often requires a large amount of reliable and representative training data, which proves difficult to acquire for many computer vision tasks. On the other hand, qualitative prior knowledge about the model is often available. Such knowledge comes either from domain experts based on their experience or from various physical or geometric constraints that govern the objects we try to model. Unlike the quantitative prior, the qualitative prior is often ignored because of the difficulty of incorporating it into the model learning process. In this paper, we introduce a closed-form solution that systematically combines limited training data with generic qualitative knowledge for BN parameter learning. To validate our method, we compare it with the Maximum Likelihood (ML) estimation method under sparse data and with the Expectation Maximization (EM) algorithm under incomplete data, respectively. To further demonstrate its applications to computer vision, we apply it to learn a BN model for facial Action Unit (AU) recognition from real image data. The experimental results show that with simple and generic qualitative constraints, and using only a small amount of training data, our method can robustly and accurately estimate the BN model parameters.
Generalization from Observed to Unobserved Features by Clustering
Cited by 2 (2 self)
Abstract: We argue that when objects are characterized by many attributes, clustering them on the basis of a random subset of these attributes can capture information about the unobserved attributes as well. Moreover, we show that under mild technical conditions, clustering the objects on the basis of such a random subset performs almost as well as clustering with the full attribute set. We prove finite sample generalization theorems for this novel learning scheme that extend analogous results from the supervised learning setting. We use our framework to analyze generalization to unobserved features of two well-known clustering algorithms: k-means and the maximum likelihood multinomial mixture model. The scheme is demonstrated for collaborative filtering of users with movie ratings as attributes and for document clustering with words as attributes.
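The scheme the abstract describes can be illustrated end to end: cluster on a random half of the attributes, then check that the clustering also compresses the half that was never observed. This is a toy sketch with made-up Gaussian data and a plain Lloyd's k-means, not the paper's analysis; all constants are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k=2, iters=50):
    # Farthest-point initialization, then plain Lloyd iterations.
    centers = np.stack([X[0], X[((X - X[0]) ** 2).sum(1).argmax()]])
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Hypothetical data: 2 latent groups; every one of the 20 attributes depends on the group.
n, d = 200, 20
group = rng.integers(0, 2, n)
X = rng.normal(group[:, None] * 3.0, 1.0, (n, d))

observed = rng.choice(d, d // 2, replace=False)      # random half of the attributes
hidden = np.setdiff1d(np.arange(d), observed)        # never shown to the clusterer

labels = kmeans(X[:, observed])
acc = (labels == group).mean()  # label identity is arbitrary; 1 - acc is the flipped match

def within_var(Z, labels):
    # Mean within-cluster variance: low means the clustering compresses Z.
    return np.mean([Z[labels == j].var() for j in np.unique(labels)])

# Variance reduction on the *unobserved* attributes achieved by a clustering
# that only ever saw the observed half:
gain = X[:, hidden].var() / within_var(X[:, hidden], labels)
```

A `gain` well above 1 is the abstract's claim in miniature: structure learned from a random attribute subset generalizes to the attributes that were held out.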
Efficient Relational Learning with Hidden Variable Detection
Cited by 1 (1 self)
Abstract: Markov networks (MNs) can incorporate arbitrarily complex features in modeling relational data. However, this flexibility comes at the sharp price of training an exponentially complex model. To address this challenge, we propose a novel relational learning approach, which consists of a restricted class of relational MNs (RMNs) called relation tree-based RMNs (treeRMNs), and an efficient hidden variable detection algorithm called Contrastive Variable Induction (CVI). On one hand, the restricted treeRMN considers only simple (e.g., unary and pairwise) features in relational data and thus achieves computational efficiency; on the other hand, the CVI algorithm efficiently detects hidden variables that can capture long-range dependencies. Therefore, the resulting approach is highly efficient yet does not sacrifice expressive power. Empirical results on four real datasets show that the proposed relational learning method can achieve prediction quality similar to state-of-the-art approaches while being significantly more efficient in training, and that the induced hidden variables are semantically meaningful and crucial to improving the training speed and prediction quality of treeRMNs.
Learning with Multiple Views Proposal for an ICML Workshop
Abstract: We propose to have a workshop on multi-view learning at the Twenty-Second International Conference on Machine Learning. Two main reasons lead us to the conclusion that the community would benefit from such a workshop. Multi-view learning is a natural, yet non-standard, new problem setting; in ...
Exploiting Qualitative Domain Knowledge for Learning Bayesian Network Parameters with Incomplete Data
Abstract: When a large amount of data is missing, or when multiple hidden nodes exist, learning parameters in Bayesian networks (BNs) becomes extremely difficult. This paper presents a learning algorithm that incorporates qualitative domain knowledge to regularize the otherwise ill-posed problem, limit the search space, and avoid local optima. Specifically, the problem is formulated as a constrained optimization problem, where an objective function is defined as a combination of the likelihood function and penalty functions constructed from the qualitative domain knowledge. Then a gradient-descent procedure is systematically integrated with the E-step and M-step of the EM algorithm to estimate the parameters iteratively until convergence. The experiments show that our algorithm improves the accuracy of the learned BN parameters significantly over the conventional EM algorithm.