Clustering with Bregman Divergences
Journal of Machine Learning Research, 2005
Cited by 444 (59 self)
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Secondly, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.
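As a rough illustration of the hard clustering idea in this abstract, the Python sketch below runs a k-means-style loop with a pluggable Bregman divergence. It is not the authors' code: the function names (`bregman_hard_cluster`, `squared_euclidean`, `generalized_kl`) and the toy data are assumptions made for this example. The sketch relies on the property that the arithmetic mean of a cluster minimizes the total Bregman divergence to a single representative, so only the assignment step depends on the chosen divergence.

```python
import math


def squared_euclidean(x, mu):
    """Bregman divergence generated by phi(x) = ||x||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def generalized_kl(x, mu):
    """Generalized relative entropy; assumes strictly positive coordinates."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, mu))


def mean(points):
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def bregman_hard_cluster(points, init_centroids, divergence, max_iter=100):
    """Alternate nearest-centroid assignment and mean re-estimation."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: pick the centroid with the smallest divergence.
        new_labels = [
            min(range(len(centroids)), key=lambda j: divergence(x, centroids[j]))
            for x in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Re-estimation step: the mean is the minimizer for any Bregman divergence.
        for j in range(len(centroids)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[j] = mean(members)
    return labels, centroids


# Hypothetical toy data: two well-separated groups of positive points.
data = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
start = [(1.0, 1.0), (5.0, 5.0)]
labels_sq, centroids_sq = bregman_hard_cluster(data, start, squared_euclidean)
labels_kl, centroids_kl = bregman_hard_cluster(data, start, generalized_kl)
```

Swapping `squared_euclidean` for `generalized_kl` changes only how points are assigned; the centroid update stays the plain mean in both cases.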
Universal prediction
IEEE Transactions on Information Theory, 1998
Cited by 186 (16 self)
This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described with emphasis on the analogy and the differences between results in the two settings.
Asymptotically Optimal Importance Sampling and Stratification for Pricing Path-Dependent Options
Mathematical Finance, 1999
Cited by 90 (13 self)
This paper develops a variance reduction technique for Monte Carlo simulations of path-dependent options driven by high-dimensional Gaussian vectors. The method combines importance sampling based on a change of drift with stratified sampling along a small number of key dimensions. The change of drift is selected through a large deviations analysis and is shown to be optimal in an asymptotic sense. The drift selected has an interpretation as the path of the underlying state variables which maximizes the product of probability and payoff: the most important path. The directions used for stratified sampling are optimal for a quadratic approximation to the integrand or payoff function. Indeed, under differentiability assumptions our importance sampling method eliminates variability due to the linear part of the payoff function, and stratification eliminates much of the variability due to the quadratic part of the payoff. The two parts of the method are linked because the asymptotically optimal drift vector frequently provides a particularly effective direction for stratification. We illustrate the use of the method with path-dependent options, a stochastic volatility model, and interest rate derivatives. The method reveals novel features of the structure of their payoffs.
KEY WORDS: Monte Carlo methods, variance reduction, large deviations, Laplace principle
1. INTRODUCTION
This paper develops a variance reduction technique for Monte Carlo simulations driven by high-dimensional Gaussian vectors, with particular emphasis on the pricing of path-dependent options. The method combines importance sampling based on a change of drift with stratified sampling along a small number of key dimensions. The change of drift is selected through a large deviations analysis and is shown to...
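The change-of-drift idea can be illustrated with a small Python sketch. It uses the standard identity E[G(Z)] = E[G(Z + μ) exp(−μᵀZ − μᵀμ/2)] for Z ~ N(0, I). Everything else is an assumption made for this example: the function names, the toy payoff (a rare-event indicator rather than an option payoff), and the hand-picked drift. The stratification step and the large deviations analysis described in the abstract are not reproduced here.

```python
import math
import random


def shifted_drift_estimate(payoff, drift, n_samples, rng):
    """Estimate E[payoff(Z)], Z ~ N(0, I), by sampling Z + drift and reweighting."""
    half_norm_sq = 0.5 * sum(m * m for m in drift)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in drift]
        shifted = [zi + mi for zi, mi in zip(z, drift)]
        # Likelihood ratio of N(0, I) to N(drift, I), evaluated at the shifted point.
        weight = math.exp(-sum(mi * zi for mi, zi in zip(drift, z)) - half_norm_sq)
        total += payoff(shifted) * weight
    return total / n_samples


def tail_indicator(z):
    """Hypothetical toy payoff: 1 if the first coordinate exceeds 4, else 0."""
    return 1.0 if z[0] > 4.0 else 0.0


rng = random.Random(0)  # fixed seed so the run is reproducible
# Drift 4.0 is chosen by hand so that samples land near the rare region.
estimate = shifted_drift_estimate(tail_indicator, [4.0], 50_000, rng)
# Closed-form reference value P(Z > 4) for comparison.
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
```

Plain Monte Carlo with the same sample size would see almost no draws above 4, whereas the shifted sampler places roughly half of its draws in the region that matters and corrects for the shift through the weight.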
Competition and Efficiency in Congested Markets
Cited by 65 (9 self)
We study the efficiency of oligopoly equilibria in congested markets. The motivating examples are the allocation of network flows in a communication network or of traffic in a transportation network. We show that increasing competition among oligopolists can reduce efficiency, measured as the difference between users' willingness to pay and delay costs. We characterize a tight bound of 5/6 on efficiency in pure strategy equilibria when there is zero latency at zero flow and a tight bound of 2√2 − 2 with positive latency at zero flow. These bounds are tight even when the numbers of routes and oligopolists are arbitrarily large.
A convergent incremental gradient method with constant step size
SIAM J. Optim., 2004
Cited by 63 (3 self)
An incremental gradient method for minimizing a sum of continuously differentiable functions is presented. The method requires a single gradient evaluation per iteration and uses a constant step size. For the case that the gradient is bounded and Lipschitz continuous, we show that the method visits regions in which the gradient is small infinitely often. Under certain unimodality assumptions, global convergence is established. In the quadratic case, a global linear rate of convergence is shown. The method is applied to distributed optimization problems arising in wireless sensor networks, and numerical experiments compare the new method with the standard incremental gradient method.
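The abstract does not spell out the update rule, so the Python sketch below assumes one scheme that fits the description: keep the most recent gradient of every component, refresh a single component per iteration, and step along the running aggregate with a constant step size. The function name, the quadratic toy problem, and the step size 0.3 are all assumptions made for this example, not details taken from the paper.

```python
def aggregated_incremental_gradient(component_grads, x0, step, n_iters):
    """Minimize sum_i f_i(x) in one dimension, one gradient evaluation per iteration."""
    n = len(component_grads)
    # Initialization evaluates every component once; afterwards each
    # iteration evaluates exactly one component gradient.
    stored = [g(x0) for g in component_grads]
    aggregate = sum(stored)
    x = x0
    for k in range(n_iters):
        i = k % n  # cycle through the components in a fixed order
        fresh = component_grads[i](x)
        aggregate += fresh - stored[i]  # swap the stale gradient for the fresh one
        stored[i] = fresh
        x -= step * aggregate / n  # constant step size
    return x


# Hypothetical quadratic components f_i(x) = 0.5 * (x - a_i)^2, whose sum
# is minimized at the mean of the a_i (here 3.0).
targets = [1.0, 2.0, 6.0]
grads = [lambda x, a=a: x - a for a in targets]
x_final = aggregated_incremental_gradient(grads, x0=0.0, step=0.3, n_iters=500)
```

At a fixed point the stored gradients sum to zero, which for this toy problem happens only at the mean of the targets, so the iterate does not hover around the minimizer the way a plain incremental gradient iterate with a constant step would.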
EL Inference for Partially Identified Models: Large Deviations Optimality and Bootstrap Validity
2008
Cited by 56 (4 self)
This paper addresses the issue of optimal inference for parameters that are partially identified in models with moment inequalities. There currently exists a variety of inferential methods for use in this setting. However, the question of choosing optimally among contending procedures is unresolved. In this paper, I first consider a canonical large deviations criterion for optimality and show that inference based on the empirical likelihood ratio statistic is optimal. This finding is a direct analog to that in Kitamura (2001) for moment equality models. Second, I introduce a new empirical likelihood bootstrap that provides a valid resampling method for moment inequality models and overcomes the implementation challenges that arise as a result of nonpivotal limit distributions. Lastly, I analyze the finite sample properties of the proposed framework using Monte Carlo simulations. The simulation results are encouraging.
Filter Pattern Search Algorithms for Mixed Variable Constrained Optimization Problems
SIAM Journal on Optimization, 2004
Cited by 55 (8 self)
A new class of algorithms for solving nonlinearly constrained mixed variable optimization problems is presented. This class combines and extends the Audet-Dennis Generalized Pattern Search (GPS) algorithms for bound constrained mixed variable optimization, and their GPS-filter algorithms for general nonlinear constraints. In generalizing existing algorithms, new theoretical convergence results are presented that reduce seamlessly to existing results for more specific classes of problems. While no local continuity or smoothness assumptions are required to apply the algorithm, a hierarchy of theoretical convergence results based on the Clarke calculus is given, in which local smoothness dictates what can be proved about certain limit points generated by the algorithm. To demonstrate its usefulness, the algorithm is applied to the design of a load-bearing thermal insulation system. We believe this is the first algorithm with provable convergence results to directly target this class of problems.
Distributionally Robust Optimization under Moment Uncertainty with Application to Data-Driven Problems
Cited by 53 (4 self)
Stochastic programs can effectively describe the decision-making problem in an uncertain environment. Unfortunately, such programs are often computationally demanding to solve. In addition, their solutions can be misleading when there is ambiguity in the choice of a distribution for the random parameters. In this paper, we propose a model describing one’s uncertainty in both the distribution’s form (discrete, Gaussian, exponential, etc.) and moments (mean and covariance). We demonstrate that for a wide range of cost functions the associated distributionally robust stochastic program can be solved efficiently. Furthermore, by deriving new confidence regions for the mean and covariance of a random vector, we provide probabilistic arguments for using our model in problems that rely heavily on historical data. This is confirmed in a practical example of portfolio selection, where our framework leads to better-performing policies on the “true” distribution underlying the daily return of assets.
All quantum adversary methods are equivalent
Theory of Computing, 2006
Cited by 52 (5 self)
The quantum adversary method is one of the most versatile lower-bound methods for quantum algorithms. We show that all known variants of this method are equivalent: spectral adversary (Barnum, Saks, and Szegedy, 2003), weighted adversary (Ambainis, 2003), strong weighted adversary (Zhang, 2005), and the Kolmogorov complexity adversary (Laplante and Magniez, 2004). We also present a few new equivalent formulations of the method. This shows that there is essentially one quantum adversary method. From our approach, all known limitations of these versions of the quantum adversary method easily follow.