Results 1  10
of
59
A Unifying Review of Linear Gaussian Models
, 1999
"... Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observa ..."
Abstract

Cited by 318 (18 self)
 Add to MetaCart
(Show Context)
Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
Unsupervised Learning from Dyadic Data
, 1998
"... Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This includes event cooccurrences, histogram data, and single stimulus preference data as special cases. Dyadic data arises naturally in many applic ..."
Abstract

Cited by 116 (11 self)
 Add to MetaCart
(Show Context)
Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This includes event cooccurrences, histogram data, and single stimulus preference data as special cases. Dyadic data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domainindependent framework for unsupervised learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures and unifies probabilistic modeling and structure discovery. Mixture models provide both, a parsimonious yet flexible parameterization of probability distributions with good generalization performance on sparse data, as well as structural information about datainherent grouping structure. We propose an annealed version of the standard Expectation Maximization algorithm for model fitting which is empirically evaluated on a variety of data sets from different domains.
An experimental comparison of several clustering and intialization methods
, 1998
"... We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation–Maximization (EM) algorithm, a “winner take all ” version of the EM algorithm reminiscent of the Kmeans algorithm, a ..."
Abstract

Cited by 84 (1 self)
 Add to MetaCart
(Show Context)
We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation–Maximization (EM) algorithm, a “winner take all ” version of the EM algorithm reminiscent of the Kmeans algorithm, and modelbased hierarchical agglomerative clustering. We learn naiveBayes models with a hidden root node, using highdimensional discretevariable data sets (both real and synthetic). We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization schemes on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of hierarchical agglomerative clustering. Although the methods are substantially different, they lead to learned models that are strikingly similar in quality. 1
Multiplicative Updates for Nonnegative Quadratic Programming in Support Vector Machines
 in Advances in Neural Information Processing Systems 15
, 2002
"... We derive multiplicative updates for solving the nonnegative quadratic programming problem in support vector machines (SVMs). The updates have a simple closed form, and we prove that they converge monotonically to the solution of the maximum margin hyperplane. The updates optimize the traditiona ..."
Abstract

Cited by 76 (7 self)
 Add to MetaCart
(Show Context)
We derive multiplicative updates for solving the nonnegative quadratic programming problem in support vector machines (SVMs). The updates have a simple closed form, and we prove that they converge monotonically to the solution of the maximum margin hyperplane. The updates optimize the traditionally proposed objective function for SVMs. They do not involve any heuristics such as choosing a learning rate or deciding which variables to update at each iteration. They can be used to adjust all the quadratic programming variables in parallel with a guarantee of improvement at each iteration. We analyze the asymptotic convergence of the updates and show that the coefficients of nonsupport vectors decay geometrically to zero at a rate that depends on their margins. In practice, the updates converge very rapidly to good classifiers.
Structure Learning in Conditional Probability Models via an Entropic Prior and Parameter Extinction
, 1998
"... We introduce an entropic prior for multinomial parameter estimation problems and solve for its maximum... ..."
Abstract

Cited by 76 (0 self)
 Add to MetaCart
We introduce an entropic prior for multinomial parameter estimation problems and solve for its maximum...
An experimental comparison of modelbased clustering methods
, 2001
"... Abstract. We compare the three basic algorithms for modelbased clustering on highdimensional discretevariable datasets. All three algorithms use the same underlying model: a naiveBayes model with a hidden root node, also known as a multinomialmixture model. In the first part of the paper, we per ..."
Abstract

Cited by 55 (1 self)
 Add to MetaCart
(Show Context)
Abstract. We compare the three basic algorithms for modelbased clustering on highdimensional discretevariable datasets. All three algorithms use the same underlying model: a naiveBayes model with a hidden root node, also known as a multinomialmixture model. In the first part of the paper, we perform an experimental comparison between three batch algorithms that learn the parameters of this model: the Expectation–Maximization (EM) algorithm, a “winner take all ” version of the EM algorithm reminiscent of the Kmeans algorithm, and modelbased agglomerative clustering. We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization methods on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of agglomerative clustering. Although the methods are substantially different, they lead to learned models that are similar in quality.
Adaptive Overrelaxed Bound Optimization Methods
 In Proceedings of International Conference on Machine Learning, ICML. International Conference on Machine Learning, ICML
, 2003
"... We study a class of overrelaxed bound optimization algorithms, and their relationship to standard bound optimizers, such as ExpectationMaximization, Iterative Scaling, CCCP and NonNegative Matrix Factorization. We provide a theoretical analysis of the convergence properties of these optimizer ..."
Abstract

Cited by 36 (0 self)
 Add to MetaCart
(Show Context)
We study a class of overrelaxed bound optimization algorithms, and their relationship to standard bound optimizers, such as ExpectationMaximization, Iterative Scaling, CCCP and NonNegative Matrix Factorization. We provide a theoretical analysis of the convergence properties of these optimizers and identify analytic conditions under which they are expected to outperform the standard versions. Based on this analysis, we propose a novel, simple adaptive overrelaxed scheme for practical optimization and report empirical results on several synthetic and realworld data sets showing that these new adaptive methods exhibit superior performance (in certain cases by several orders of magnitude) compared to their traditional counterparts. Our "dropin" extensions are simple to implement, apply to a wide variety of algorithms, almost always give a substantial speedup, and do not require any theoretical analysis of the underlying algorithm.
Dynamic Bayesian Networks for Information Fusion with Applications to HumanComputer Interfaces
, 1999
"... Recent advances in various display and virtual technologies coupled with an explosion in available computing power have given rise to a numberofnovel humancomputer interaction (HCI) modalities  speech, visionbased gesture recognition, eye tracking, EEG, etc. However, despite the abundance of nov ..."
Abstract

Cited by 35 (1 self)
 Add to MetaCart
Recent advances in various display and virtual technologies coupled with an explosion in available computing power have given rise to a numberofnovel humancomputer interaction (HCI) modalities  speech, visionbased gesture recognition, eye tracking, EEG, etc. However, despite the abundance of novel interaction devices, the naturalness and efficiency of HCI has remained low. This is due in particular to the lack of robust sensory data interpretation techniques. To deal with the task of interpreting single and multiple interaction modalities this dissertation establishes a novel probabilistic approach based on dynamic Bayesian networks (DBNs). As a generalization of the successful hidden Markov models, DBNs are a natural basis for the general temporal action interpretation task. The problem of interpretation of single or multiple interacting modalities can then be viewed as a Bayesian inference task. In this work three complex DBN models are introduced: mixtures of DBNs, mixedstate DBNs, and coupled HMMs. Indepth study of these models yields efficient approximate inference and parameter learning techniques applicable to a wide variety of problems. Experimental validation of the proposed approaches in the domains of gesture and speech recognition con rms the model's applicability to both unimodal and multimodal interpretation tasks.
Collective Mining of Bayesian Networks from Distributed Heterogeneous Data
, 2002
"... We present a collective approach to learning a Bayesian network from distributed heterogenous data. In this approach, we first learn a local Bayesian network at each site using the local data. Then each site identifies the observations that are most likely to be evidence of coupling between local an ..."
Abstract

Cited by 23 (7 self)
 Add to MetaCart
(Show Context)
We present a collective approach to learning a Bayesian network from distributed heterogenous data. In this approach, we first learn a local Bayesian network at each site using the local data. Then each site identifies the observations that are most likely to be evidence of coupling between local and nonlocal variables and transmits a subset of these observations to a central site. Another Bayesian network is learnt at the central site using the data transmitted from the local site. The local and central Bayesian networks are combined to obtain a collective Bayesian network, that models the entire data. Experimental results and theoretical justification that demonstrate the feasibility of our approach are presented.
Basic Principles of Learning Bayesian Logic Programs
 Institute for Computer Science, University of Freiburg
, 2002
"... Bayesian logic programs tightly integrate definite logic programs with Bayesian networks in order to... In this paper, we present results on combining Inductive Logic Programming with Bayesian networks to learn both the qualitative and the quantitative components of Bayesian logic programs from data ..."
Abstract

Cited by 20 (2 self)
 Add to MetaCart
Bayesian logic programs tightly integrate definite logic programs with Bayesian networks in order to... In this paper, we present results on combining Inductive Logic Programming with Bayesian networks to learn both the qualitative and the quantitative components of Bayesian logic programs from data. More precisely, we show how the qualitative components can be learned by combining the inductive logic programming setting learning from interpretations with scorebased techniques for learning Bayesian networks. The estimation of the quantitative components is reduced to the corresponding problem of (dynamic) Bayesian networks