Results 1-10 of 85
Clustering with Bregman Divergences
 Journal of Machine Learning Research
, 2005
Cited by 310 (52 self)
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Secondly, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.
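The hard clustering idea in this abstract can be illustrated with a minimal sketch, which is not the authors' code. It assumes the standard k-means loop with the distance swapped for a Bregman divergence and the centroid kept as the arithmetic mean. All function names, the `init` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
import math
import random


def squared_euclidean(x, mu):
    """Bregman divergence generated by phi(x) = ||x||^2 (classical k-means)."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def generalized_kl(x, mu):
    """Bregman divergence generated by phi(x) = sum x_i log x_i.

    Assumes all coordinates are strictly positive.
    """
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, mu))


def mean(points):
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def bregman_hard_clustering(points, k, divergence, init=None, iters=100, seed=0):
    """k-means-style loop: assign by divergence, re-estimate centroids as means."""
    rng = random.Random(seed)
    # Hypothetical convenience: explicit initial centroids, else a random sample.
    centroids = [list(c) for c in (init if init is not None else rng.sample(points, k))]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid under the chosen divergence.
        new_labels = [
            min(range(k), key=lambda j: divergence(x, centroids[j])) for x in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Re-estimation step: the mean is used for every divergence.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (10.0, 10.0), (10.5, 9.5), (9.8, 10.2)]
    print(bregman_hard_clustering(data, 2, generalized_kl, init=[data[0], data[3]]))
```

Only the `divergence` argument changes between the squared Euclidean and the relative-entropy variants; the update step stays the same in both.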
Universal prediction
 IEEE Transactions on Information Theory
, 1998
Cited by 136 (11 self)
Abstract — This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described with emphasis on the analogy and the differences between results in the two settings. Index Terms — Bayes envelope, entropy, finite-state machine, linear prediction, loss function, probability assignment, redundancy-capacity, stochastic complexity, universal coding, universal prediction.
Asymptotically Optimal Importance Sampling and Stratification for Pricing Path-Dependent Options
 Mathematical Finance
, 1999
Cited by 61 (13 self)
This paper develops a variance reduction technique for Monte Carlo simulations of path-dependent options driven by high-dimensional Gaussian vectors. The method combines importance sampling based on a change of drift with stratified sampling along a small number of key dimensions. The change of drift is selected through a large deviations analysis and is shown to be optimal in an asymptotic sense. The drift selected has an interpretation as the path of the underlying state variables which maximizes the product of probability and payoff, that is, the most important path. The directions used for stratified sampling are optimal for a quadratic approximation to the integrand or payoff function. Indeed, under differentiability assumptions our importance sampling method eliminates variability due to the linear part of the payoff function, and stratification eliminates much of the variability due to the quadratic part of the payoff. The two parts of the method are linked because the asymptotically optimal drift vector frequently provides a particularly effective direction for stratification. We illustrate the use of the method with path-dependent options, a stochastic volatility model, and interest rate derivatives. The method reveals novel features of the structure of their payoffs. KEY WORDS: Monte Carlo methods, variance reduction, large deviations, Laplace principle
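A minimal sketch of importance sampling by a change of drift, under stated assumptions, may help make the abstract concrete. It covers only the importance sampling half (no stratification) and is not the paper's implementation. The toy payoff, the strike value, and every function name are hypothetical. The drift is chosen, as the abstract describes, to maximize payoff times Gaussian density, which for this toy payoff has a closed form.

```python
import math
import random


def payoff(z, strike=3.0):
    """Toy payoff (an assumption): call-like payoff on the normalized sum of z."""
    s = sum(z) / math.sqrt(len(z))
    return max(s - strike, 0.0)


def important_drift(n, strike=3.0):
    """Drift maximizing payoff(z) * exp(-|z|^2 / 2) for the toy payoff above.

    Writing s = sum(z) / sqrt(n), the maximizer solves s * (s - strike) = 1
    and spreads s evenly over the n coordinates.
    """
    s_star = 0.5 * (strike + math.sqrt(strike ** 2 + 4.0))
    return [s_star / math.sqrt(n)] * n


def is_estimate(g, mu, num_samples, seed=0):
    """Estimate E[g(Z)], Z ~ N(0, I), by sampling Z + mu and reweighting.

    Uses E[g(Z)] = E[g(Z + mu) * exp(-mu.Z - |mu|^2 / 2)].
    """
    rng = random.Random(seed)
    half_norm_sq = 0.5 * sum(m * m for m in mu)
    total = 0.0
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in mu]
        shifted = [zi + mi for zi, mi in zip(z, mu)]
        weight = math.exp(-sum(mi * zi for mi, zi in zip(mu, z)) - half_norm_sq)
        total += g(shifted) * weight
    return total / num_samples


def exact_value(strike=3.0):
    """Closed form of E[max(N - strike, 0)] for N ~ N(0, 1), used as a check."""
    pdf = math.exp(-0.5 * strike ** 2) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(strike / math.sqrt(2.0))
    return pdf - strike * tail


if __name__ == "__main__":
    n = 8
    print("importance sampling:", is_estimate(payoff, important_drift(n), 20000))
    print("plain Monte Carlo:  ", is_estimate(payoff, [0.0] * n, 20000))
    print("closed form:        ", exact_value())
```

Passing a zero drift recovers plain Monte Carlo, so the same function can be used to compare the two estimators on a rare-event payoff like this one.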
Competition and Efficiency in Congested Markets
Cited by 43 (7 self)
We study the efficiency of oligopoly equilibria in congested markets. The motivating examples are the allocation of network flows in a communication network or of traffic in a transportation network. We show that increasing competition among oligopolists can reduce efficiency, measured as the difference between users' willingness to pay and delay costs. We characterize a tight bound of 5/6 on efficiency in pure strategy equilibria when there is zero latency at zero flow and a tight bound of 2√2 − 2 with positive latency at zero flow. These bounds are tight even when the numbers of routes and oligopolists are arbitrarily large.
Filter Pattern Search Algorithms for Mixed Variable Constrained Optimization Problems
 SIAM Journal on Optimization
, 2004
Cited by 37 (8 self)
A new class of algorithms for solving nonlinearly constrained mixed variable optimization problems is presented. This class combines and extends the Audet-Dennis Generalized Pattern Search (GPS) algorithms for bound constrained mixed variable optimization, and their GPS-filter algorithms for general nonlinear constraints. In generalizing existing algorithms, new theoretical convergence results are presented that reduce seamlessly to existing results for more specific classes of problems. While no local continuity or smoothness assumptions are required to apply the algorithm, a hierarchy of theoretical convergence results based on the Clarke calculus is given, in which local smoothness dictates what can be proved about certain limit points generated by the algorithm. To demonstrate its usefulness, the algorithm is applied to the design of a load-bearing thermal insulation system. We believe this is the first algorithm with provable convergence results to directly target this class of problems.
Discriminative, Generative and Imitative Learning
, 2002
Cited by 34 (1 self)
I propose a common framework that combines three different paradigms in machine learning: generative, discriminative and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain-specific knowledge in terms of structure and parameter priors over the joint space of variables. Bayesian networks and Bayesian statistics provide a rich and flexible language for specifying this knowledge and subsequently refining it with data and observations. The final result is a distribution that is a good generator of novel exemplars.
On the Optimality of Conditional Expectation as a Bregman Predictor
 IEEE Trans. Inf. Theory
, 2005
Cited by 33 (4 self)
Given a probability space (Ω, F, P), an F-measurable random variable X, and a sub-σ-algebra G ⊂ F, it is well known that the conditional expectation E[X|G] is the optimal L²-predictor (also known as "the least mean square error" predictor) of X among all the G-measurable random variables [8, 11]. In this paper, we provide necessary and sufficient conditions under which the conditional expectation is the unique optimal predictor. We show that E[X|G] is the optimal predictor for all Bregman Loss Functions (BLFs), of which the L² loss function is a special case. Moreover, under mild conditions, we show that BLFs are exhaustive. Namely, if the infimum of E[F(X, Y)] over all the G-measurable random variables Y and for any variable X is attained at the conditional expectation E[X|G], then F is a BLF.
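For orientation, the optimality claim can be written out in the usual Bregman notation. This is a sketch under assumptions: the symbols φ and D_φ do not appear in the abstract, and φ is taken to be a strictly convex, differentiable function, which is the standard setting for a Bregman loss.

```latex
D_\phi(x, y) = \phi(x) - \phi(y) - \langle x - y, \nabla \phi(y) \rangle,
\qquad
\mathbb{E}[X \mid \mathcal{G}]
  = \arg\min_{Y \ \mathcal{G}\text{-measurable}}
    \mathbb{E}\big[ D_\phi(X, Y) \big].
```

Taking φ(x) = x² gives D_φ(x, y) = x² − y² − 2y(x − y) = (x − y)², which recovers the L² case mentioned in the abstract.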
EL Inference for Partially Identified Models: Large Deviations Optimality and Bootstrap Validity
, 2008
Cited by 32 (4 self)
This paper addresses the issue of optimal inference for parameters that are partially identified in models with moment inequalities. There currently exists a variety of inferential methods for use in this setting. However, the question of choosing optimally among contending procedures is unresolved. In this paper, I first consider a canonical large deviations criterion for optimality and show that inference based on the empirical likelihood ratio statistic is optimal. This finding is a direct analog to that in Kitamura (2001) for moment equality models. Second, I introduce a new empirical likelihood bootstrap that provides a valid resampling method for moment inequality models and overcomes the implementation challenges that arise as a result of nonpivotal limit distributions. Lastly, I analyze the finite sample properties of the proposed framework using Monte Carlo simulations. The simulation results are encouraging.
All quantum adversary methods are equivalent
 Theory of Computing
, 2006
Cited by 29 (5 self)
The quantum adversary method is one of the most versatile lower-bound methods for quantum algorithms. We show that all known variants of this method are equivalent: spectral adversary (Barnum, Saks, and Szegedy, 2003), weighted adversary (Ambainis, 2003), strong weighted adversary (Zhang, 2005), and the Kolmogorov complexity adversary (Laplante and Magniez, 2004). We also present a few new equivalent formulations of the method. This shows that there is essentially one quantum adversary method. From our approach, all known limitations of these versions of the quantum adversary method easily follow.