Clustering with Bregman Divergences
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2005
Cited by 183 (31 self)
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Secondly, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.
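The hard clustering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `init` parameter, and the toy data are assumptions made for illustration. It relies on the known property that the arithmetic mean of a cluster is the optimal representative for every Bregman divergence, so only the assignment step depends on the chosen divergence.

```python
import math

def squared_euclidean(x, mu):
    # Bregman divergence generated by phi(x) = ||x||^2; gives classical k-means.
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def generalized_kl(x, mu):
    # Generalized I-divergence; assumes strictly positive coordinates.
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, mu))

def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def bregman_hard_cluster(points, init, divergence, max_iters=100):
    """Alternate assignment and mean re-estimation until assignments stop changing.

    `init` is a hypothetical parameter: the list of starting centroids.
    """
    centroids = list(init)
    k = len(centroids)
    assignment = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid under the chosen divergence.
        new_assignment = [
            min(range(k), key=lambda j: divergence(x, centroids[j]))
            for x in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Re-estimation step: the plain mean, whatever the divergence.
        for j in range(k):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = mean(members)
    return centroids, assignment

points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (10.0, 10.0), (10.5, 9.5), (9.8, 10.2)]
centroids, labels = bregman_hard_cluster(
    points, [points[0], points[3]], generalized_kl)
```

Swapping `generalized_kl` for `squared_euclidean` leaves the loop unchanged, which is the sense in which one procedure covers a whole family of loss functions.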
Coupling and Ergodicity of Adaptive MCMC
, 2007
Cited by 28 (12 self)
We consider basic ergodicity properties of adaptive MCMC algorithms under minimal assumptions, using coupling constructions. We prove convergence in distribution and a weak law of large numbers. We also give counter-examples to demonstrate that the assumptions we make are not redundant.
Ordering Monte Carlo Markov Chains
- School of Statistics, University of Minnesota
, 1999
Cited by 16 (3 self)
Markov chains having the same stationary distribution π can be partially ordered by performance in the central limit theorem. We say that one chain is at least as good as another in the efficiency partial ordering if the variance in the central limit theorem is at least as small for every L²(π) functional of the chain. Peskun partial ordering implies efficiency partial ordering [25, 30]. Here we show that Peskun partial ordering implies, for finite state spaces, ordering of all the eigenvalues of the transition matrices, and, for general state spaces, ordering of the suprema of the spectra of the transition operators. We also define a covariance partial ordering based on lag one autocovariances and show that it is equivalent to the efficiency partial ordering when restricted to reversible Markov chains. Similar but weaker results are provided for non-reversible Markov chains. Keywords: Peskun ordering, Eigenvalues, Spectral decomposition, Non-reversible kernels.
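A small numerical illustration of the finite-state claim may help. The sketch below is an assumption-laden toy, not taken from the paper: it compares a two-state chain with its "lazy" version (which stays put with extra probability 1/2). Both share the same stationary distribution, the original chain has off-diagonal entries at least as large, and its non-unit eigenvalue is the smaller of the two. All function names are hypothetical.

```python
def two_state_kernel(a, b):
    # Transition matrix with P(0 -> 1) = a and P(1 -> 0) = b.
    return [[1.0 - a, a], [b, 1.0 - b]]

def lazy(P):
    # (I + P) / 2: same stationary distribution, off-diagonal entries halved.
    n = len(P)
    return [[0.5 * P[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def peskun_dominates(P, Q):
    # P dominates Q if every off-diagonal entry of P is at least that of Q.
    n = len(P)
    return all(P[i][j] >= Q[i][j]
               for i in range(n) for j in range(n) if i != j)

def eigenvalues_2x2_stochastic(P):
    # A 2x2 stochastic matrix has eigenvalues 1 and trace - 1.
    return sorted([P[0][0] + P[1][1] - 1.0, 1.0])

P = two_state_kernel(0.3, 0.6)
Q = lazy(P)
eig_P = eigenvalues_2x2_stochastic(P)
eig_Q = eigenvalues_2x2_stochastic(Q)
```

Here `peskun_dominates(P, Q)` holds and each eigenvalue of `P` is no larger than the corresponding eigenvalue of `Q`, matching the direction of the ordering described in the abstract for this particular pair.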
Applications of the continuous-time ballot theorem to Brownian motion and related processes
, 2001
Ordering, Slicing And Splitting Monte Carlo Markov Chains
, 1998
Cited by 6 (3 self)
Markov chain Monte Carlo is a method of approximating the integral of a function f with respect to a distribution π. A Markov chain that has π as its stationary distribution is simulated producing samples X_1, X_2, ... . The integral is approximated by taking the average of f(X_n) over the sample path. The standard way to construct such Markov chains is the Metropolis-Hastings algorithm. The class P of all Markov chains having π as their unique stationary distribution is very large, so it is important to have criteria telling when one chain performs better than another. The Peskun ordering is a partial ordering on P. If two Markov chains are Peskun ordered, then the better chain has smaller variance in the central limit theorem for every function f that has a variance. Peskun ordering is sufficient for this but not necessary. We study the implications of the Peskun ordering both in finite and general state spaces. Unfortunately there are many Metropolis-Hastings samplers that are...
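The sample-path average described in this abstract can be sketched as follows. This is a generic random-walk Metropolis sampler written for illustration, not the thesis's own construction; the function names, the uniform proposal, the step size, and the standard normal target are all assumptions.

```python
import math
import random

def metropolis_hastings_average(f, log_density, n_samples,
                                step=2.5, x0=0.0, seed=0):
    """Average f along a random-walk Metropolis path targeting exp(log_density)."""
    rng = random.Random(seed)
    x = x0
    total = 0.0
    for _ in range(n_samples):
        # Symmetric proposal, so the acceptance ratio is a ratio of densities.
        proposal = x + rng.uniform(-step, step)
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        # A rejected proposal repeats the current state in the average.
        total += f(x)
    return total / n_samples

def log_standard_normal(x):
    # Log density up to an additive constant; the constant cancels in the ratio.
    return -0.5 * x * x

# Approximates the integral of x^2 against the standard normal, whose value is 1.
estimate = metropolis_hastings_average(lambda x: x * x,
                                       log_standard_normal, 50_000)
```

The estimate should land near 1; how quickly it does so depends on the proposal, which is the kind of comparison between chains that the ordering results address.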
Performance of Hybrid-ARQ in Block-Fading Channels: A Fixed Outage Probability Analysis
Cited by 5 (1 self)
This paper studies the performance of hybrid-ARQ (automatic repeat request) in Rayleigh block-fading channels in a setting where rate is increased with the average SNR such that a constant outage probability is maintained. H-ARQ allows for early termination of transmission once the receiver is able to successfully decode, and thus provides an advantage over systems without H-ARQ. It is shown that H-ARQ allows the average transmission rate to very quickly approach the ergodic capacity of the fading channel as the maximum number of fading blocks per codeword (which is proportional to the speed of temporal fading) increases, whereas this convergence is much slower without H-ARQ. Furthermore, although H-ARQ does not provide an advantage in terms of well-known high-SNR metrics (multiplexing gain and high-SNR offset), it is seen to provide a significant advantage throughout the range of practically relevant SNRs. In addition, incremental redundancy is shown to outperform lower-complexity Chase combining, particularly at moderate and high SNRs.
The Effect of Automatic Gain Control on Serial, Matched-Filter Acquisition in Direct-Sequence Packet Radio Communications
- IEEE Trans. Vehicular Technol
, 2001
Cited by 4 (1 self)
The performance of a noncoherent serial acquisition technique is evaluated for direct-sequence spread-spectrum packet communications. The acquisition technique that is considered uses threshold crossing of a matched-filter output to detect a fixed-length preamble at the start of each packet. The analysis accounts for frequency mismatch between the transmitter and the receiver due to oscillator inaccuracies and mobility-induced Doppler shifts. It also accounts for the effects of automatic gain control (AGC) in the receiver. The role of the AGC system in determining the acquisition performance is examined. In addition, selection of the optimal acquisition threshold is considered, and a simple method for selection of a good suboptimal threshold is presented. It is shown that use of this threshold results in performance close to that obtained with the optimal threshold over a wide range of channels.
Convolution operators and zeros of entire functions
- Proc. Amer. Math. Soc
Cited by 4 (2 self)
Let G(z) be a real entire function of order less than 2 with only real zeros. Then we classify certain distribution functions F such that the convolution (G ∗ dF)(z) = ∫_{−∞}^{∞} G(z − is) dF(s) has only real zeros.
Fractional Fokker-Planck Equation for Nonlinear Stochastic Differential Equations Driven by Non-Gaussian Levy Stable Noises
, 1999
Cited by 4 (2 self)
The Fokker-Planck equation has been very useful for studying dynamic behavior of stochastic differential equations driven by Gaussian noises. In this paper, we derive a Fractional Fokker-Planck equation for the probability distribution of particles whose motion is governed by a nonlinear Langevin-type equation, which is driven by a non-Gaussian Levy-stable noise. We obtain in fact a more general result for Markovian processes generated by stochastic differential equations.

