Results 11 - 20 of 144
Pseudo Likelihood Estimation in Network Tomography
, 2003
Cited by 47 (4 self)
Network monitoring and diagnosis are key to improving network performance. The difficulties of performance monitoring lie in today's fast growing Internet, accompanied by increasingly heterogeneous and unregulated structures. Moreover, these tasks become even harder since one cannot rely on the collaboration of individual routers and servers to directly measure network traffic. Even though the aggregatory nature of possible network measurements gives rise to inverse problems, existing methods for solving inverse problems are usually computationally intractable or statistically inefficient.
Thin Junction Trees
- Advances in Neural Information Processing Systems 14
, 2001
Cited by 41 (0 self)
We present an algorithm that induces a class of models with thin junction trees---models that are characterized by an upper bound on the size of the maximal cliques of their triangulated graph. By ensuring that the junction tree is thin, inference in our models remains tractable throughout the learning process. This allows both an efficient implementation of an iterative scaling parameter estimation algorithm and also ensures that inference can be performed efficiently with the final model. We illustrate the approach with applications in handwritten digit recognition and DNA splice site detection.
Expectation maximization and posterior constraints
- In Advances in NIPS
, 2007
Cited by 33 (11 self)
The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data, and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve the performance over standard baselines and be competitive with more complex, intractable models.
A Bayesian Network Approach to Ontology Mapping
- In: Proceedings ISWC 2005
, 2005
Cited by 28 (2 self)
This paper presents our ongoing effort on developing a principled methodology for automatic ontology mapping based on BayesOWL, a probabilistic framework we developed for modeling uncertainty in the semantic web. In this approach, the source and target ontologies are first translated into Bayesian networks (BN); the concept mapping between the two ontologies is treated as evidential reasoning between the two translated BNs. Probabilities needed for constructing conditional probability tables (CPT) during translation and for measuring semantic similarity during mapping are learned using text classification techniques, where each concept in an ontology is associated with a set of semantically relevant text documents, which are obtained by ontology-guided web mining. The basic ideas of this approach are validated by positive results from computer experiments on two small real-world ontologies.
On Capacities of Quantum Channels
, 1997
Cited by 27 (4 self)
Capacities of quantum mechanical channels are defined in terms of mutual information quantities. Geometry of the relative entropy is used to express capacity as a divergence radius. The symmetric quantum spin 1/2 channel and the attenuation channel of Boson fields are discussed as examples. 1. Introduction. A discrete communication system, as modeled by Shannon, is capable of transmitting successively symbols of a finite input alphabet $\{x_1, x_2, \ldots, x_m\}$. In the stochastic approach to the communication model it is assumed that the input symbols show up with certain probability. Let $p_{ji}$ be the probability that a symbol $x_i$ is sent over the channel and the output symbol $y_j$ appears at the destination. The joint distribution $p_{ji}$ yields marginal distributions $(p_1, p_2, \ldots, p_m)$ and $(q_1, q_2, \ldots, q_k)$ on the set of input symbols and output symbols, respectively. Shannon introduced the mutual information $$I = \sum_{i,j} p_{ji} \log \frac{p_{ji}}{p_i q_j} \tag{1.1}$$ in order to measur...
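As an illustration of the Shannon mutual information quoted in the abstract above, the following is a minimal sketch rather than anything taken from the paper. It assumes the joint distribution is stored as a nested list `joint[i][j]` (input index first, output index second), that natural logarithms are used, and that zero-probability cells contribute nothing; the function name and the two example channels are hypothetical.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a joint distribution.

    joint[i][j] is assumed to be the probability that input symbol x_i
    is sent and output symbol y_j is received (an illustrative layout).
    """
    p = [sum(row) for row in joint]          # marginal over input symbols
    q = [sum(col) for col in zip(*joint)]    # marginal over output symbols
    total = 0.0
    for i, row in enumerate(joint):
        for j, p_ij in enumerate(row):
            if p_ij > 0:                     # 0 * log 0 is taken as 0
                total += p_ij * math.log(p_ij / (p[i] * q[j]))
    return total

# Hypothetical examples: a noiseless binary channel with uniform input,
# and a channel whose output is independent of its input.
noiseless = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]

mi_noiseless = mutual_information(noiseless)
mi_independent = mutual_information(independent)
```

For the noiseless channel the value equals log 2 (one bit expressed in nats), and for the independent channel it is zero, matching the reading of mutual information as a measure of how much the output reveals about the input.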
Sufficient Dimensionality Reduction
- Journal of Machine Learning Research
, 2003
Cited by 27 (8 self)
Dimensionality reduction of empirical co-occurrence data is a fundamental problem in unsupervised learning. It is also a well studied problem in statistics known as the analysis of cross-classified data. One principled approach to this problem is to represent the data in low dimension with minimal loss of (mutual) information contained in the original data. In this paper we introduce an information theoretic nonlinear method for finding such a most informative dimension reduction. In contrast with...
The Multiinformation Function As A Tool For Measuring Stochastic Dependence
- Learning in Graphical Models
, 1998
Cited by 21 (0 self)
Given a collection of random variables $[\xi_i]_{i \in N}$ where $N$ is a finite nonempty set, the corresponding multiinformation function ascribes the relative entropy of the joint distribution of $[\xi_i]_{i \in A}$ with respect to the product of distributions of individual random variables $\xi_i$ for $i \in A$ to every subset $A \subset N$. We argue it is a useful tool for problems concerning stochastic (conditional) dependence and independence (at least in the discrete case). First, it makes it possible to express the conditional mutual information between $[\xi_i]_{i \in A}$ and $[\xi_i]_{i \in B}$ given $[\xi_i]_{i \in C}$ (for every disjoint $A, B, C \subset N$), which can be considered a good measure of conditional stochastic dependence. Second, one can introduce reasonable measures of dependence of level $r$ among variables $[\xi_i]_{i \in A}$ (where $A \subset N$, $1 \le r < \operatorname{card} A$) which are expressible by means of the multiinformation function. Third, it enables one to derive theoretical results on (nonexistence of an) axiomatic characterization of stochastic c...
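The definition in the abstract above can be made concrete for small discrete distributions. The sketch below is illustrative only and is not the authors' code: it assumes the joint distribution is a dictionary mapping outcome tuples to probabilities, and it uses the standard identity I(A; B | C) = M(A ∪ B ∪ C) + M(C) − M(A ∪ C) − M(B ∪ C) to recover conditional mutual information from the multiinformation function M. All function names and the example distribution are hypothetical.

```python
import math
from collections import defaultdict

def marginal(joint, idx):
    """Marginal distribution over the variable positions listed in idx."""
    out = defaultdict(float)
    for outcome, prob in joint.items():
        out[tuple(outcome[i] for i in idx)] += prob
    return out

def multiinformation(joint, subset):
    """Relative entropy of the joint marginal on `subset` with respect to
    the product of its one-dimensional marginals (in nats)."""
    subset = tuple(subset)
    p_a = marginal(joint, subset)
    singles = [marginal(joint, (i,)) for i in subset]
    total = 0.0
    for outcome, prob in p_a.items():
        if prob > 0:
            product = math.prod(m[(v,)] for m, v in zip(singles, outcome))
            total += prob * math.log(prob / product)
    return total

def conditional_mutual_information(joint, a, b, c):
    """I(A; B | C) expressed through multiinformation; a, b, c are
    assumed to be disjoint tuples of variable positions."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    return (multiinformation(joint, a + b + c) + multiinformation(joint, c)
            - multiinformation(joint, a + c) - multiinformation(joint, b + c))

# Hypothetical example: variable 1 copies variable 0 (a fair coin),
# and variable 2 is an independent fair coin.
joint = {(0, 0, 0): 0.25, (0, 0, 1): 0.25, (1, 1, 0): 0.25, (1, 1, 1): 0.25}

m_01 = multiinformation(joint, (0, 1))
m_02 = multiinformation(joint, (0, 2))
cmi = conditional_mutual_information(joint, (0,), (1,), (2,))
```

In this example the copied pair has multiinformation log 2, the independent pair has multiinformation zero, and conditioning on the independent coin leaves the dependence between the copied pair unchanged at log 2.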
Soft Evidential Update for Probabilistic Multiagent Systems
- INTERNATIONAL JOURNAL OF APPROXIMATE REASONING
, 2000
Cited by 20 (5 self)
We address the problem of updating a probability distribution represented by a Bayesian network upon presentation of soft evidence. Our motivation
Kullback-Leibler approximation of spectral density functions
- IEEE Trans. Inform. Theory
, 2003
Cited by 19 (11 self)
We introduce a Kullback–Leibler-type distance between spectral density functions of stationary stochastic processes and solve the problem of optimal approximation of a given spectral density $\Psi$ by one that is consistent with prescribed second-order statistics. In general, such statistics are expressed as the state covariance of a linear filter driven by a stochastic process whose spectral density is sought. In this context, we show i) that there is a unique spectral density $\Phi$ which minimizes this Kullback–Leibler distance, ii) that this optimal approximate is of the form $\Psi/Q$ where the "correction term" $Q$ is a rational spectral density function, and iii) that the coefficients of $Q$ can be obtained numerically by solving a suitable convex optimization problem. In the special case where $\Psi = 1$, the convex functional becomes quadratic and the solution is then specified by linear equations. Index Terms: Approximation of power spectra, cross-entropy minimization, Kullback–Leibler distance, mutual information, optimization, spectral estimation.
The estimation of distributions and the minimum relative entropy principle
- Evolutionary Computation
, 2005
Cited by 16 (3 self)
Estimation of Distribution Algorithms (EDA) have been proposed as an extension of genetic algorithms. In this paper the relation of EDA to algorithms developed in statistics, artificial intelligence, and statistical physics is explained. The major design issues are discussed within a general interdisciplinary framework. It is shown that maximum entropy approximations play a crucial role. All proposed algorithms try to minimize the Kullback-Leibler divergence KLD between the unknown distribution $p(x)$ and a class $q(x)$ of approximations. The Kullback-Leibler divergence is not symmetric. Approximations which suppose that the function to be optimized is additively decomposed (ADF) minimize $\mathrm{KLD}(q \| p)$; the methods which learn the approximate model from data minimize $\mathrm{KLD}(p \| q)$. This minimization is identical to maximizing the log-likelihood. In the paper three classes of algorithms are discussed. FDA uses the ADF to compute an approximate factorization of the unknown distribution. The factors are marginal distributions, whose values are computed from samples. The Bethe-Kikuchi approach developed in statistical physics uses bivariate or higher-order marginals. The values of the marginals are computed from a difficult minimization problem. The third class learns the factorization from the data. We analyze our learning algorithm LFDA in detail. It is shown that learning is faced with two problems: first, to detect the important dependencies between the variables, and second, to create an acyclic Bayesian network of bounded clique size.
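Two claims in the abstract above, that the Kullback-Leibler divergence is not symmetric and that minimizing the divergence from the data distribution to a model is identical to maximizing the log-likelihood, can be checked numerically. The sketch below is an illustration under stated assumptions, not code from the paper: the unknown distribution is replaced by the empirical distribution of a small hypothetical sample, and the two candidate models `q1` and `q2` are made up for the example.

```python
import math
from collections import Counter

def kl(p, q):
    """Kullback-Leibler divergence KLD(p || q) in nats for distributions
    given as dicts; q is assumed positive wherever p is positive."""
    return sum(pv * math.log(pv / q[x]) for x, pv in p.items() if pv > 0)

def mean_log_likelihood(samples, q):
    """Average log-likelihood of the samples under model q."""
    return sum(math.log(q[x]) for x in samples) / len(samples)

# Hypothetical data and two hypothetical candidate models.
samples = ["a", "a", "a", "b"]
counts = Counter(samples)
p_hat = {x: c / len(samples) for x, c in counts.items()}   # empirical distribution
q1 = {"a": 0.5, "b": 0.5}
q2 = {"a": 0.7, "b": 0.3}

# Entropy of the empirical distribution; it does not depend on the model.
entropy = -sum(v * math.log(v) for v in p_hat.values())

# Identity: KLD(p_hat || q) = -entropy - mean_log_likelihood(samples, q),
# so ranking models by KLD(p_hat || q) is the same as ranking by likelihood.
gap_q1 = kl(p_hat, q1) + mean_log_likelihood(samples, q1) + entropy
gap_q2 = kl(p_hat, q2) + mean_log_likelihood(samples, q2) + entropy

forward = kl(p_hat, q1)    # KLD(p || q)
reverse = kl(q1, p_hat)    # KLD(q || p), generally a different number
```

Because the entropy term is constant across models, the model with the smaller KLD(p || q) is always the one with the larger average log-likelihood; here that is `q2`. The two directions `forward` and `reverse` differ, which is the asymmetry the abstract refers to.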

