Results 1 – 10 of 7,673
Thompson Sampling for 1-Dimensional Exponential Family Bandits
In Neural Information Processing Systems, 2013
Graphical models, exponential families, and variational inference
2008
"... The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building large-scale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fiel ..."
Cited by 819 (28 self)
of probability distributions — are best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality between the cumulant function and the entropy for exponential families, we develop general variational representations of the problems of computing
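The conjugate duality between the cumulant function and the mean parameters that the snippet refers to can be seen already in the simplest exponential family. A minimal sketch, assuming a Bernoulli family with natural parameter theta (function names are illustrative): the mean parameter is exactly the gradient of the cumulant function A(theta) = log(1 + e^theta), checked here by a finite difference.

```python
import math

def cumulant(theta):
    # Log-partition (cumulant) function of the Bernoulli family
    # with natural parameter theta: A(theta) = log(1 + e^theta).
    return math.log(1.0 + math.exp(theta))

def mean(theta):
    # Duality consequence: the mean parameter is the gradient of the
    # cumulant function, which here is the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-theta))

theta, h = 0.7, 1e-6
finite_diff = (cumulant(theta + h) - cumulant(theta - h)) / (2 * h)
print(abs(finite_diff - mean(theta)) < 1e-8)  # gradient matches the mean map
```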
Articulated body motion capture by annealed particle filtering
In IEEE Conf. on Computer Vision and Pattern Recognition, 2000
"... The main challenge in articulated body motion tracking is the large number of degrees of freedom (around 30) to be recovered. Search algorithms, either deterministic or stochastic, that search such a space without constraint, fall foul of exponential computational complexity. One approach is to intr ..."
Cited by 494 (4 self)
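The constrained, layered search the abstract alludes to can be illustrated on a toy one-dimensional problem. A minimal sketch, not the paper's tracker (all names and parameter values are illustrative): particles are repeatedly reweighted by a progressively sharpened ("annealed") fitness function, resampled, and diffused, so broad exploration precedes fine localisation.

```python
import math
import random

random.seed(0)

def fitness(x):
    # Toy observation model: a sharply peaked likelihood at x = 2.0.
    return math.exp(-50.0 * (x - 2.0) ** 2)

def annealed_search(n_particles=200, layers=(0.1, 0.3, 1.0)):
    # Each annealing layer reweights the particle set by a flattened
    # power of the fitness (small beta = broad search), resamples
    # proportionally, and diffuses, so the set gradually concentrates
    # on the peak without an exhaustive sweep of the space.
    particles = [random.uniform(-5.0, 5.0) for _ in range(n_particles)]
    for beta in layers:
        weights = [fitness(x) ** beta for x in particles]
        if sum(weights) == 0.0:
            continue  # every particle landed in a flat region; skip
        particles = random.choices(particles, weights=weights, k=n_particles)
        particles = [x + random.gauss(0.0, 0.2) for x in particles]
    return particles

estimate = sum(annealed_search()) / 200.0
print(round(estimate, 1))  # concentrates near the peak at 2.0
```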
Dual polyhedra and mirror symmetry for Calabi–Yau hypersurfaces in toric varieties
J. Alg. Geom., 1994
"... We consider families F(∆) consisting of complex (n − 1)-dimensional projective algebraic compactifications of ∆-regular affine hypersurfaces Zf defined by Laurent polynomials f with a fixed n-dimensional Newton polyhedron ∆ in the n-dimensional algebraic torus T = (C*)^n. If the family F(∆) defined by ..."
Cited by 467 (20 self)
A Neural Probabilistic Language Model
Journal of Machine Learning Research, 2003
"... A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen ..."
Cited by 447 (19 self)
training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization
Clustering with Bregman Divergences
Journal of Machine Learning Research, 2005
"... A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergence ..."
Cited by 443 (57 self)
this loss. Secondly, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering
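The property that makes these algorithms work — for any Bregman divergence, the optimal cluster representative is the plain arithmetic mean — can be demonstrated directly. A minimal sketch, assuming 1-D positive data and the generalized I-divergence as the Bregman divergence (names illustrative, not the paper's code): assignments use the divergence, while each cluster's representative stays the mean.

```python
import random
from math import log

random.seed(1)

def rel_entropy(x, y):
    # Generalized I-divergence (a Bregman divergence) on positive
    # scalars: d(x, y) = x*log(x/y) - x + y. Asymmetric, like KL.
    return x * log(x / y) - x + y

def bregman_2means(points, iters=20):
    # Hard clustering with a Bregman divergence: the assignment step
    # uses the divergence, but the update step is always the
    # arithmetic mean -- the defining property of the class.
    centers = [min(points), max(points)]
    for _ in range(iters):
        clusters = [[], []]
        for x in points:
            j = 0 if rel_entropy(x, centers[0]) <= rel_entropy(x, centers[1]) else 1
            clusters[j].append(x)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers

data = ([random.gauss(1.0, 0.1) for _ in range(50)]
        + [random.gauss(5.0, 0.3) for _ in range(50)])
c0, c1 = bregman_2means(data)
print(round(c0, 1), round(c1, 1))  # centers near the two modes, 1.0 and 5.0
```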
Variational algorithms for approximate Bayesian inference
2003
"... The Bayesian framework for machine learning allows for the incorporation of prior knowledge in a coherent way, avoids overfitting problems, and provides a principled basis for selecting between alternative models. Unfortunately the computations required are usually intractable. This thesis presents ..."
Cited by 440 (9 self)
the theoretical core of the thesis, generalising the expectation-maximisation (EM) algorithm for learning maximum likelihood parameters to the VB EM algorithm which integrates over model parameters. The algorithm is then specialised to the large family of conjugate-exponential (CE) graphical models, and several
The Wiener-Askey Polynomial Chaos for Stochastic Differential Equations
SIAM J. Sci. Comput., 2002
"... We present a new method for solving stochastic differential equations based on Galerkin projections and extensions of Wiener's polynomial chaos. Specifically, we represent the stochastic processes with an optimum trial basis from the Askey family of orthogonal polynomials that reduces the dime ..."
Cited by 398 (42 self)
the dimensionality of the system and leads to exponential convergence of the error. Several continuous and discrete processes are treated, and numerical examples show substantial speedup compared to Monte Carlo simulations for low-dimensional stochastic inputs.
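The exponential-convergence claim can be illustrated with the simplest member of the Askey scheme. A minimal sketch, assuming a standard Gaussian input and the known expansion exp(xi) = sqrt(e) * sum_k He_k(xi)/k! in probabilists' Hermite polynomials (not the paper's solver): the truncation error falls off rapidly as the chaos order grows.

```python
import math

def hermite_prob(n, x):
    # Probabilists' Hermite polynomials He_n via the three-term
    # recurrence He_{k+1}(x) = x*He_k(x) - k*He_{k-1}(x).
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def chaos_approx(x, order):
    # Truncated Hermite-chaos expansion of u(xi) = exp(xi) for a
    # standard Gaussian input: exp(xi) = sqrt(e) * sum_k He_k(xi)/k!.
    return math.sqrt(math.e) * sum(
        hermite_prob(k, x) / math.factorial(k) for k in range(order + 1))

x = 0.5
errors = [abs(chaos_approx(x, n) - math.exp(x)) for n in (2, 4, 8)]
print(errors)  # each refinement shrinks the error sharply
```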
The Web as a graph: measurements, models, and methods
1999
"... The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons: mathematical, ..."
Cited by 373 (11 self)
A generalization of principal component analysis to the exponential family
In Advances in Neural Information Processing Systems, 2001
"... Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not real-valued, such as binary-valued data. This paper draws on ideas from the Exponential family, Generaliz ..."
Cited by 155 (1 self)