Results 1-10 of 17
Matrix exponentiated gradient updates for online learning and Bregman projections
Journal of Machine Learning Research, 2005
Cited by 75 (12 self)
We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated by the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: online learning with a simple square loss, and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analysis of each algorithm generalizes to the nondiagonal case. We apply both new algorithms, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.
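For orientation, the divergence and update named in this abstract are commonly written as below. The symbols are assumptions, since the abstract fixes no notation: $\eta > 0$ is a learning rate, $L_t$ is the loss at trial $t$, and $\operatorname{sym}(X) = (X + X^\top)/2$.

```latex
% von Neumann divergence between symmetric positive definite matrices
\Delta_F(\widetilde{W}, W)
  = \operatorname{tr}\!\bigl(\widetilde{W}\log\widetilde{W}
      - \widetilde{W}\log W - \widetilde{W} + W\bigr)

% Matrix Exponentiated Gradient update, normalized to trace one
W_{t+1}
  = \frac{\exp\!\bigl(\log W_t - \eta\,\operatorname{sym}(\nabla_W L_t(W_t))\bigr)}
         {\operatorname{tr}\exp\!\bigl(\log W_t - \eta\,\operatorname{sym}(\nabla_W L_t(W_t))\bigr)}

% Diagonal special case: the vector EG update on a probability vector w
w_{t+1,i} = \frac{w_{t,i}\, e^{-\eta g_{t,i}}}{\sum_j w_{t,j}\, e^{-\eta g_{t,j}}}
```

The matrix exponential of a symmetric matrix is always symmetric positive definite, which is why the update stays in the feasible set without an explicit projection.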
Rotation Invariant Texture Characterization and Retrieval using Steerable Wavelet-domain Hidden Markov Models
Cited by 56 (4 self)
A new statistical model for characterizing texture images based on wavelet-domain hidden Markov models and steerable pyramids is presented. The new model is shown to capture well both the subband marginal distributions and the dependencies across scales and orientations of the wavelet descriptors. Once it is trained for an input texture image, the model can be easily steered to characterize that texture at any other orientation. After a diagonalization operation, one obtains a rotation-invariant model of the texture image. The effectiveness of the new texture models is demonstrated in retrieval experiments with large image databases, where significant performance gains are shown. Keywords: texture characterization, image retrieval, rotation invariance, wavelets, hidden Markov models, steerable pyramids. Draft dated April 23, 2001.
Differential Entropic Clustering of Multivariate Gaussians
Adv. in Neural Inf. Proc. Sys. (NIPS), 2006
Cited by 36 (3 self)
Gaussian data is pervasive and many learning algorithms (e.g., k-means) model their inputs as a single sample drawn from a multivariate Gaussian. However, in many real-life settings, each input object is best described by multiple samples drawn from a multivariate Gaussian. Such data can arise, for example, in a movie review database where each movie is rated by several users, or in time-series domains such as sensor networks. Here, each input can be naturally described by both a mean vector and a covariance matrix, which parameterize the Gaussian distribution. In this paper, we consider the problem of clustering such input objects, each represented as a multivariate Gaussian. We formulate the problem using an information-theoretic approach and draw several interesting theoretical connections to Bregman divergences and also Bregman matrix divergences. We evaluate our method across several domains, including synthetic data, sensor network data, and a statistical debugging application.
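The connection to Bregman divergences mentioned here can be seen from the standard closed form of the relative entropy between two $d$-dimensional Gaussians. The notation ($\mu_i$, $\Sigma_i$, $D_{\mathrm{Burg}}$) is an assumption for illustration and may differ from the paper's.

```latex
\mathrm{KL}\bigl(\mathcal{N}(\mu_1,\Sigma_1)\,\|\,\mathcal{N}(\mu_2,\Sigma_2)\bigr)
  = \tfrac{1}{2}\,\underbrace{\bigl[\operatorname{tr}(\Sigma_1\Sigma_2^{-1})
      - \log\det(\Sigma_1\Sigma_2^{-1}) - d\bigr]}_{D_{\mathrm{Burg}}(\Sigma_1,\,\Sigma_2)}
  \;+\; \tfrac{1}{2}\,\underbrace{(\mu_1-\mu_2)^\top \Sigma_2^{-1} (\mu_1-\mu_2)}_{\text{squared Mahalanobis distance}}
```

The covariance term is a Bregman matrix divergence (the Burg, or LogDet, divergence) and the mean term is a Mahalanobis distance, so the two parameters of each Gaussian contribute separately.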
A Distance Measure Between GMMs Based on the Unscented Transform and its Application to Speaker Recognition
in Proc. of Interspeech, 2005
Cited by 25 (0 self)
This paper proposes a dissimilarity measure between two Gaussian mixture models (GMMs). Computing a distance measure between two GMMs that were learned from speech segments is a key element in speaker verification, speaker segmentation and many other related applications. A natural measure between two distributions is the Kullback-Leibler (KL) divergence. However, it cannot be computed analytically for GMMs. We propose an accurate and efficiently computed approximation of the KL divergence. The method is based on the unscented transform, which is usually used to obtain a better alternative to the extended Kalman filter. The suggested distance is evaluated experimentally on a speaker data set. The experimental results indicate that our proposed approximations outperform previously suggested methods.
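A minimal sketch of an unscented-transform KL approximation, restricted to one-dimensional GMMs so that it needs only the standard library. The function names and the `(weight, mean, std)` tuple layout are assumptions made for illustration, not the paper's notation. In one dimension each component contributes two sigma points, `mean - std` and `mean + std`, with equal weight.

```python
import math


def normal_logpdf(x, mean, std):
    """Log-density of a univariate normal distribution."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def gmm_logpdf(x, gmm):
    """Log-density of a 1-D GMM given as a list of (weight, mean, std)."""
    terms = [math.log(w) + normal_logpdf(x, m, s) for w, m, s in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def unscented_kl(f, g):
    """Approximate KL(f || g) by evaluating log f - log g at sigma points.

    Each component of f is replaced by its two 1-D sigma points, which
    match the component's mean and variance exactly.
    """
    total = 0.0
    for w, m, s in f:
        for x in (m - s, m + s):
            total += 0.5 * w * (gmm_logpdf(x, f) - gmm_logpdf(x, g))
    return total


def gaussian_kl(m1, s1, m2, s2):
    """Closed-form KL between two univariate Gaussians, for comparison."""
    return math.log(s2 / s1) + (s1 * s1 + (m1 - m2) ** 2) / (2.0 * s2 * s2) - 0.5
```

For two single Gaussians the log-ratio is quadratic in `x`, so the sigma-point average reproduces the closed form; for genuine mixtures the result is only an approximation.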
Optimal Power Allocation for Distributed Detection in Wireless Sensor Networks
Cited by 14 (0 self)
In distributed detection systems with wireless sensor networks, communication between sensors and a fusion center is not perfect due to interference and the limited communication power available to the sensors to combat noise. The problem of optimizing detection performance with imperfect communication between the sensors and the fusion center over wireless channels brings a new challenge to distributed detection. In this paper, a distributed detection system infrastructure is provided, and a multiaccess channel model is included to account for imperfect communication between the sensors and the fusion center. The J-divergence between the distributions of the detection statistic under different hypotheses is used as a performance criterion in order to provide a tractable analysis. Optimizing the performance (in terms of the J-divergence) under a total communication power constraint on the sensors is studied, and the corresponding optimal power allocation scheme is provided. It is interesting to see that, for the case with orthogonal channels, the power allocation can be solved by a weighted water-filling algorithm. Numerical results are used to illustrate the solution. Index Terms: distributed detection, wireless sensor networks, multiaccess channel, power allocation.
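To illustrate what a weighted water-filling allocation looks like, here is a generic sketch. It maximizes the textbook objective sum of `w_i * log(1 + g_i * p_i)` under a total power budget, which is an assumption chosen for illustration; the paper's J-divergence objective and its specific weights are not reproduced. The names `gains`, `weights` and `total_power` are hypothetical.

```python
def weighted_water_filling(gains, weights, total_power, iters=100):
    """Allocate power p_i = max(0, w_i * level - 1 / g_i) summing to total_power.

    The water level is found by bisection, since the total allocated
    power is nondecreasing in the level.
    """

    def allocation(level):
        return [max(0.0, w * level - 1.0 / g) for g, w in zip(gains, weights)]

    lo = 0.0
    # At this level every channel alone would already use the full budget.
    hi = max((total_power + 1.0 / g) / w for g, w in zip(gains, weights))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(allocation(mid)) > total_power:
            hi = mid
        else:
            lo = mid
    return allocation(0.5 * (lo + hi))
```

Channels whose inverse gain lies above the water level receive no power at all, which is the characteristic behaviour of water-filling solutions.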
Information Theoretic Novelty Detection
2009
Cited by 6 (2 self)
We present a novel approach to online change detection problems when the training sample size is small. The proposed approach is based on estimating the expected information content of a new data point and allows accurate control of the false positive rate even for small data sets. In the case of the Gaussian distribution, our approach is analytically tractable and closely related to classical statistical tests. We then propose an approximation scheme to extend our approach to the case of the mixture of Gaussians. We extensively evaluate our approach on synthetic data and on three real benchmark data sets. The experimental validation shows that our method maintains good overall accuracy while significantly improving control over the false positive rate.
Online parameter estimation and run-to-run process adjustment using categorical observations
2009
Cited by 5 (2 self)
Categorical observations are frequently encountered in run-to-run processes where obtaining accurate measurements of quality characteristics is difficult. In such circumstances, the use of categorical observations to estimate a process model and generate an adjustment recipe becomes inevitable. However, most conventional run-to-run controllers cannot be applied if no continuous observations are available, and some parameter estimation methods that can handle categorical data use only historical data sets in an offline manner. In practice, it is common to see observations collected following a time sequence in a run-to-run process. Taking the lapping process in semiconductor manufacturing as an example, this paper develops an online approach for parameter estimation and run-to-run process adjustment using categorical observations. The proposed method optimises a penalised Maximum Likelihood (ML) function and updates parameters step by step when new categorical observations become available. A control strategy is also derived to generate recipes for process updates between runs. The computational results of performance evaluation show that the proposed method is capable of estimating unknown parameters and controlling output quality online when initial bias exists.
Upper bound Kullback-Leibler divergence for hidden Markov models with application as discrimination measure for speech recognition
in Proceedings of the IEEE International Symposium on Information Theory, 2006
Cited by 3 (0 self)
This paper presents a criterion for defining an upper bound Kullback-Leibler divergence (UB-KLD) for Gaussian mixture models (GMMs). An information-theoretic interpretation of this indicator and an algorithm for calculating it based on similarity alignment between mixture components of the models are proposed. This bound is used to characterize an upper bound closed-form expression for the Kullback-Leibler divergence (KLD) for left-to-right transient hidden Markov models (HMMs), where experiments based on real speech data show that this indicator precisely follows the discrimination tendency of the actual KLD.
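As a point of reference, the well-known matched-component bound that follows from the log-sum inequality can be sketched as below for one-dimensional GMMs. This is an assumption-laden illustration: components are paired by list index (the alignment is taken as given), and the paper's own alignment algorithm and exact bound are not reproduced. Function names and the `(weight, mean, std)` layout are hypothetical.

```python
import math


def gaussian_kl(m1, s1, m2, s2):
    """Closed-form KL divergence between two univariate Gaussians."""
    return math.log(s2 / s1) + (s1 * s1 + (m1 - m2) ** 2) / (2.0 * s2 * s2) - 0.5


def matched_kl_upper_bound(f, g):
    """Upper bound on KL(f || g) for GMMs with index-aligned components.

    KL(f || g) <= sum_i w_i * (log(w_i / v_i) + KL(f_i || g_i)),
    where f = sum_i w_i f_i and g = sum_i v_i g_i.
    """
    if len(f) != len(g):
        raise ValueError("this sketch assumes equally many components")
    return sum(
        w * (math.log(w / v) + gaussian_kl(m1, s1, m2, s2))
        for (w, m1, s1), (v, m2, s2) in zip(f, g)
    )
```

How tight the bound is depends on the pairing: aligning similar components keeps the per-pair divergences small, which is presumably why an alignment step matters.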
A joint appearance-spatial distance for kernel-based image categorization
in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008
Cited by 2 (2 self)
The goal of image categorization is to classify a collection of unlabeled images into a set of predefined classes to support semantic-level image retrieval. The distance measures used in most existing approaches either ignored the spatial structures or used them in a separate step. As a result, these distance measures achieved only limited success. To address these difficulties, in this paper, we propose a new distance measure that integrates joint appearance-spatial image features. Such a distance measure is computed as an upper bound of an information-theoretic discrimination, and can be computed efficiently in a recursive formulation that scales well to image size. In addition, the upper bound approximation can be further tightened via adaptation learning from a universal reference model. Extensive experiments on two widely used data sets show that the proposed approach significantly outperforms the state-of-the-art approaches.
Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting
Cited by 1 (0 self)
Accurate short-term wind forecasts (STWFs), with time horizons from 0.5 to 6 hours, are essential for efficient integration of wind power into the electrical power grid. Physical models based on numerical weather predictions are currently not competitive, and research on machine learning approaches is ongoing. Two major challenges confronting these efforts are missing observations and weather-regime-induced dependency shifts among wind variables. In this paper we introduce approaches that address both of these challenges. We describe a new regime-aware approach to STWF that uses auto-regressive hidden Markov models (AR-HMMs), a subclass of conditional linear Gaussian (CLG) models. Although AR-HMMs are a natural representation for weather regimes, as with CLG models in general, exact inference is NP-hard when observations are missing (Lerner and Parr, 2001). We introduce a simple approximate inference method for AR-HMMs, which we believe has applications in other problem domains. In an empirical evaluation on publicly available wind data from two geographically distinct regions, our approach makes significantly more accurate predictions than baseline models, and uncovers meteorologically relevant regimes.
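For readers unfamiliar with the model class, the following sketch computes the log-likelihood of a fully observed scalar series under a first-order AR-HMM using the forward recursion. It covers only the complete-data case, where exact inference is straightforward; the paper's approximate method for missing observations is not reproduced. The parameter layout (`init`, `trans`, `coeffs`) is an assumption for illustration.

```python
import math


def normal_logpdf(x, mean, std):
    """Log-density of a univariate normal distribution."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def arhmm_loglik(xs, init, trans, coeffs):
    """Return log p(xs[1:] | xs[0]) under a scalar AR(1) hidden Markov model.

    init[k]      probability that regime k generates the first transition
    trans[j][k]  probability of moving from regime j to regime k
    coeffs[k]    (a, b, std): in regime k, x_t ~ Normal(a * x_{t-1} + b, std**2)
    """
    n = len(init)
    belief = list(init)  # predicted regime distribution before seeing x_t
    loglik = 0.0
    for prev, cur in zip(xs, xs[1:]):
        # Joint of regime and current observation given the past.
        joint = [
            belief[k] * math.exp(normal_logpdf(cur, a * prev + b, s))
            for k, (a, b, s) in enumerate(coeffs)
        ]
        norm = sum(joint)
        loglik += math.log(norm)
        posterior = [j / norm for j in joint]
        # Propagate the filtered regime distribution one step ahead.
        belief = [sum(posterior[j] * trans[j][k] for j in range(n)) for k in range(n)]
    return loglik
```

The sketch exponentiates densities directly, so extreme outliers could underflow; a log-space recursion would be the safer choice in practice.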