Clustering with Bregman Divergences
Journal of Machine Learning Research, 2005
Cited by 183 (31 self)
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Secondly, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.
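The hard clustering idea in this abstract can be illustrated with a minimal sketch. It assumes a k-means-style loop in which only the assignment step depends on the chosen divergence, relying on the known property that the cluster mean minimizes the total Bregman divergence to a representative. The function names, the `init` parameter, and the two example divergences are hypothetical choices made for this illustration, not the paper's code.

```python
import numpy as np

def squared_euclidean(x, mu):
    # Bregman divergence generated by phi(x) = ||x||^2
    return np.sum((x - mu) ** 2, axis=-1)

def generalized_kl(x, mu):
    # Generalized I-divergence for strictly positive vectors,
    # generated by phi(x) = sum(x * log x)
    return np.sum(x * np.log(x / mu) - x + mu, axis=-1)

def bregman_hard_cluster(X, k, divergence, init=None, n_iter=50, seed=0):
    """k-means-style hard clustering with a pluggable divergence (sketch)."""
    X = np.asarray(X, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centroids = np.asarray(init, dtype=float).copy()
    for _ in range(n_iter):
        # Assignment step: d[i, j] = divergence(X[i], centroids[j])
        d = divergence(X[:, None, :], centroids[None, :, :])
        labels = d.argmin(axis=1)
        # Re-estimation step: the arithmetic mean, whatever the divergence
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Passing `squared_euclidean` gives ordinary k-means. Passing `generalized_kl` on positive data changes only how points are assigned, which is the sense in which one loop covers many loss functions.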
Universal Prediction
IEEE Transactions on Information Theory, 1998
Cited by 99 (6 self)
This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression.
Asymptotically Optimal Importance Sampling and Stratification for Pricing Path-Dependent Options
Mathematical Finance, 1999
Cited by 51 (13 self)
This paper develops a variance reduction technique for Monte Carlo simulations of path-dependent options driven by high-dimensional Gaussian vectors. The method combines importance sampling based on a change of drift with stratified sampling along a small number of key dimensions. The change of drift is selected through a large deviations analysis and is shown to be optimal in an asymptotic sense. The drift selected has an interpretation as the path of the underlying state variables which maximizes the product of probability and payoff: the most important path. The directions used for stratified sampling are optimal for a quadratic approximation to the integrand or payoff function. Indeed, under differentiability assumptions our importance sampling method eliminates variability due to the linear part of the payoff function, and stratification eliminates much of the variability due to the quadratic part of the payoff. The two parts of the method are linked because the asymptotically optimal drift vector frequently provides a particularly effective direction for stratification. We illustrate the use of the method with path-dependent options, a stochastic volatility model, and interest rate derivatives. The method reveals novel features of the structure of their payoffs. KEY WORDS: Monte Carlo methods, variance reduction, large deviations, Laplace principle
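The change-of-drift idea can be illustrated in one dimension. The sketch below covers only the importance sampling part, not stratification. It assumes a hypothetical payoff `max(z - 3, 0)` of a standard normal `z`, finds the drift by a grid search for the point maximizing payoff times density, and reweights samples by the Gaussian likelihood ratio. The payoff, the grid, and all function names are assumptions of this illustration, not taken from the paper.

```python
import numpy as np

def payoff(z):
    # Hypothetical payoff that is nonzero only in the far right tail
    return np.maximum(z - 3.0, 0.0)

def most_important_point(grid):
    # Maximize log(payoff(z)) - z^2 / 2, i.e. payoff times normal density
    g = payoff(grid)
    score = np.full_like(grid, -np.inf)
    positive = g > 0
    score[positive] = np.log(g[positive]) - 0.5 * grid[positive] ** 2
    return grid[np.argmax(score)]

def importance_sampling_estimate(n, seed=0):
    rng = np.random.default_rng(seed)
    mu = most_important_point(np.linspace(-6.0, 6.0, 12001))
    # Sample from N(mu, 1) instead of N(0, 1)
    z = rng.standard_normal(n) + mu
    # Likelihood ratio of N(0, 1) to N(mu, 1) evaluated at z
    weights = np.exp(-mu * z + 0.5 * mu ** 2)
    return float(np.mean(payoff(z) * weights)), float(mu)
```

Plain Monte Carlo would see a nonzero payoff in only about one sample in a thousand here, whereas the shifted sampler places most samples where the payoff is active.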
All quantum adversary methods are equivalent
Theory of Computing, 2006
Cited by 26 (4 self)
The quantum adversary method is one of the most versatile lower-bound methods for quantum algorithms. We show that all known variants of this method are equivalent: spectral adversary (Barnum, Saks, and Szegedy, 2003), weighted adversary (Ambainis, 2003), strong weighted adversary (Zhang, 2005), and the Kolmogorov complexity adversary (Laplante and Magniez, 2004). We also present a few new equivalent formulations of the method. This shows that there is essentially one quantum adversary method. From our approach, all known limitations of these versions of the quantum adversary method easily follow.
A Large Deviation Principle with Queueing Applications
1997
Cited by 23 (3 self)
In this paper we present a large deviation principle, for partial sums processes indexed by the half line, which is particularly suited to queueing applications. The main result in this paper provides a new tool for looking at large deviations for queueing systems in equilibrium. Equilibrium systems have generally been treated on a case-by-case basis, with much work and/or additional hypotheses necessary to prove large deviation principles (see, for example, Chang and Zajic [3], Ganesh and Anantharam [11], Ramanan and Dupuis [19]). We provide a simple sufficient condition for the usual sample path LDP (as in Mogulskii's theorem) to be strengthened to a topology for which the reflection mappings appearing in many queueing applications are continuous and the contraction principle can be applied. A step in this direction was made by Dobrushin and Pechersky [5], who introduce a finer topology (a gauge topology) which allows one to treat the single server queue with constant service rat...
Discriminative, Generative and Imitative Learning
2002
Cited by 21 (1 self)
I propose a common framework that combines three different paradigms in machine learning: generative, discriminative and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain specific knowledge in terms of structure and parameter priors over the joint space of variables. Bayesian networks and Bayesian statistics provide a rich and flexible language for specifying this knowledge and subsequently refining it with data and observations. The final result is a distribution that is a good generator of novel exemplars.
A Bandwidth Sharing Theory for a Large Number of HTTP-like Connections
IEEE/ACM Trans. on Networking, 2002
Cited by 15 (0 self)
There has been tremendous progress in understanding how bandwidth is shared by TCP-like connections. By associating each TCP-like connection with a utility function, the bandwidth sharing problem of TCP-like connections can be modelled as a distributed optimization problem for utility functions. However, little is known about how bandwidth is shared by HTTP-like connections through their utility functions at the TCP level. One of the main objectives of this paper is to provide a theory for bandwidth sharing of a large number of HTTP-like connections. Based on certain technical assumptions, we show that there is a utility function at the HTTP level for an HTTP-like connection and such a utility function can be derived from the utility function at the TCP level. The bandwidth is then shared by HTTP-like connections through utility functions at the HTTP level. Moreover, there is a probabilistic interpretation for how the utility function at the HTTP level is related to the utility function at the TCP level. This is done by relating utility functions to large deviation rate functions.
Entanglement measures and purification procedures
Physical Review A, 1998
Cited by 13 (0 self)
We improve previously proposed conditions each measure of entanglement has to satisfy. We present a class of entanglement measures that satisfy these conditions and show that the Quantum Relative Entropy and Bures Metric generate two measures of this class. We calculate the measures of entanglement for a number of mixed two spin-1/2 systems using the Quantum Relative Entropy, and provide an efficient numerical method to obtain the measures of entanglement in this case. In addition, we prove a number of properties of our entanglement measure which have important physical implications. We briefly explain the statistical basis of our measure of entanglement in the case of the Quantum Relative Entropy. We then argue that our entanglement measure determines an upper bound to the number of singlets that can be obtained by any purification procedure. PACS: 03.65.Bz
A convergent incremental gradient method with constant step size
SIAM J. Optim., 2004
Cited by 12 (2 self)
An incremental gradient method for minimizing a sum of continuously differentiable functions is presented. The method requires a single gradient evaluation per iteration and uses a constant step size. For the case that the gradient is bounded and Lipschitz continuous, we show that the method visits regions in which the gradient is small infinitely often. Under certain unimodality assumptions, global convergence is established. In the quadratic case, a global linear rate of convergence is shown. The method is applied to distributed optimization problems arising in wireless sensor networks, and numerical experiments compare the new method with the standard incremental gradient method.
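One way to get a single gradient evaluation per iteration with a constant step size is to keep the most recent gradient of every component in memory and step along their average. The sketch below assumes that stored-gradient scheme, which the abstract does not spell out. The function name and its parameters are hypothetical, and the initial fill of the memory costs one extra evaluation per component before the loop starts.

```python
import numpy as np

def incremental_aggregated_gradient(grads, x0, step, n_iter):
    """Minimize sum_i f_i(x), given a list of gradient functions (sketch)."""
    x = np.asarray(x0, dtype=float).copy()
    m = len(grads)
    # Memory of the latest gradient seen for each component (assumed scheme)
    memory = np.stack([g(x) for g in grads])
    total = memory.sum(axis=0)
    for k in range(n_iter):
        i = k % m                 # cycle through the components
        new_grad = grads[i](x)    # the only gradient evaluation this iteration
        total += new_grad - memory[i]
        memory[i] = new_grad
        x = x - step * total / m  # constant step along the averaged gradient
    return x
```

For example, with components `f_i(x) = 0.5 * (x - c_i)**2` the minimizer of the sum is the mean of the `c_i`, so the result can be checked against that mean.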

