Results 1–10 of 80
Logistic Regression, AdaBoost and Bregman Distances
, 2000
Abstract

Cited by 207 (43 self)
We give a unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances. The striking similarity of the two problems in this framework allows us to design and analyze algorithms for both simultaneously, and to easily adapt algorithms designed for one problem to the other. For both problems, we give new algorithms and explain their potential advantages over existing methods. These algorithms can be divided into two types based on whether the parameters are iteratively updated sequentially (one at a time) or in parallel (all at once). We also describe a parameterized family of algorithms which interpolates smoothly between these two extremes. For all of the algorithms, we give convergence proofs using a general formalization of the auxiliary-function proof technique. As one of our sequential-update algorithms is equivalent to AdaBoost, this provides the first general proof of convergence for AdaBoost. We show that all of our algorithms generalize easily to the multiclass case, and we contrast the new algorithms with iterative scaling. We conclude with a few experimental results with synthetic data that highlight the behavior of the old and newly proposed algorithms in different settings.
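The sequential-update scheme that the abstract identifies with AdaBoost can be sketched in a few lines. The toy implementation below (all names, the 1-D data, and the threshold-stump learner are our illustrative assumptions, not taken from the paper) fits one weighted stump per round and reweights the examples multiplicatively:

```python
import math

def adaboost(xs, ys, rounds=10):
    """Minimal AdaBoost sketch with 1-D threshold stumps (toy setup):
    each round picks the stump minimizing weighted error, then reweights."""
    n = len(xs)
    w = [1.0 / n] * n                      # example weights
    ensemble = []                          # (alpha, threshold, sign)
    thresholds = sorted(set(xs))
    for _ in range(rounds):
        best = None
        for t in thresholds:
            for s in (1, -1):
                # stump predicts s if x >= t, else -s
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if (s if x >= t else -s) != y)
                if best is None or err < best[0]:
                    best = (err, t, s)
        err, t, s = best
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # AdaBoost step size
        ensemble.append((alpha, t, s))
        # multiplicative reweighting, then renormalize
        w = [wi * math.exp(-alpha * y * (s if x >= t else -s))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (s if x >= t else -s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

# toy data: label is positive iff x >= 3
xs = [0, 1, 2, 3, 4, 5]
ys = [-1, -1, -1, 1, 1, 1]
model = adaboost(xs, ys)
print([predict(model, x) for x in xs])  # → [-1, -1, -1, 1, 1, 1]
```

The parallel-update variants the abstract mentions differ only in updating all coordinates of the combined classifier at once rather than one stump per round.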
Just relax: Convex programming methods for subset selection and sparse approximation
, 2004
Abstract

Cited by 92 (4 self)
Abstract. Subset selection and sparse approximation problems request a good approximation of an input signal using a linear combination of elementary signals, yet they stipulate that the approximation may only involve a few of the elementary signals. This class of problems arises throughout electrical engineering, applied mathematics and statistics, but little theoretical progress has been made over the last fifty years. Subset selection and sparse approximation both admit natural convex relaxations, but the literature contains few results on the behavior of these relaxations for general input signals. This report demonstrates that the solution of the convex program frequently coincides with the solution of the original approximation problem. The proofs depend essentially on geometric properties of the ensemble of elementary signals. The results are powerful because sparse approximation problems are combinatorial, while convex programs can be solved in polynomial time with standard software. Comparable new results for a greedy algorithm, Orthogonal Matching Pursuit, are also stated. This report should have a major practical impact because the theory applies immediately to many real-world signal processing problems.
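The greedy algorithm mentioned at the end, Orthogonal Matching Pursuit, admits a compact sketch. The pure-Python toy version below (helper names, dictionary, and signal are our own illustrative choices) selects at each step the atom most correlated with the residual and re-solves least squares over the chosen atoms:

```python
def omp(dictionary, signal, k):
    """Toy Orthogonal Matching Pursuit: greedily pick the atom most
    correlated with the residual, re-fit by least squares, repeat k times."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def solve(A, b):
        # Gaussian elimination with partial pivoting on a small system
        n = len(A)
        M = [row[:] + [b[i]] for i, row in enumerate(A)]
        for c in range(n):
            p = max(range(c, n), key=lambda r: abs(M[r][c]))
            M[c], M[p] = M[p], M[c]
            for r in range(n):
                if r != c and M[r][c]:
                    f = M[r][c] / M[c][c]
                    M[r] = [a - f * b2 for a, b2 in zip(M[r], M[c])]
        return [M[i][n] / M[i][i] for i in range(n)]

    support, residual = [], signal[:]
    for _ in range(k):
        # atom with the largest absolute correlation to the residual
        j = max((i for i in range(len(dictionary)) if i not in support),
                key=lambda i: abs(dot(dictionary[i], residual)))
        support.append(j)
        # least squares on the selected atoms via the normal equations
        G = [[dot(dictionary[a], dictionary[b]) for b in support]
             for a in support]
        rhs = [dot(dictionary[a], signal) for a in support]
        coef = solve(G, rhs)
        approx = [sum(c * dictionary[a][t] for c, a in zip(coef, support))
                  for t in range(len(signal))]
        residual = [s - a for s, a in zip(signal, approx)]
    return dict(zip(support, coef))

# toy dictionary (atoms as rows); signal = 2*atom0 + 3*atom2
atoms = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
signal = [2, 0, 3, 3]
print(omp(atoms, signal, 2))
```

The convex-relaxation route the report studies replaces this combinatorial search with an l1-penalized program; OMP is the greedy counterpart for which the report states comparable recovery results.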
Entropy and the law of small numbers
 IEEE Trans. Inform. Theory
, 2005
Abstract

Cited by 29 (11 self)
Two new information-theoretic methods are introduced for establishing Poisson approximation inequalities. First, using only elementary information-theoretic techniques it is shown that, when S_n = Σ_{i=1}^n X_i is the sum of the (possibly dependent) binary random variables X_1, X_2, ..., X_n, with E(X_i) = p_i and E(S_n) = λ, then

D(P_{S_n} ‖ Po(λ)) ≤ Σ_{i=1}^n p_i² + [ Σ_{i=1}^n H(X_i) − H(X_1, X_2, ..., X_n) ],

where D(P_{S_n} ‖ Po(λ)) is the relative entropy between the distribution of S_n and the Poisson(λ) distribution. The first term in this bound measures the individual smallness of the X_i and the second term measures their dependence. A general method is outlined for obtaining corresponding bounds when approximating the distribution of a sum of general discrete random variables by an infinitely divisible distribution. Second, in the particular case when the X_i are independent, the following sharper bound is established, ...
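When the X_i are independent, the entropy difference term vanishes and the stated bound reduces to D(P_{S_n} ‖ Po(λ)) ≤ Σ p_i², which is easy to check numerically. The sketch below (function names and the example p_i are our own) computes the exact pmf of S_n by convolution and the relative entropy over its finite support:

```python
import math

def bernoulli_sum_pmf(ps):
    """Exact pmf of S_n = sum of independent Bernoulli(p_i), by convolution."""
    pmf = [1.0]
    for p in ps:
        pmf = [(pmf[k] if k < len(pmf) else 0.0) * (1 - p)
               + (pmf[k - 1] * p if k >= 1 else 0.0)
               for k in range(len(pmf) + 1)]
    return pmf

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def kl_to_poisson(ps):
    """D(P_{S_n} || Po(lambda)) with lambda = sum p_i; the sum is finite
    because P_{S_n} is supported on {0, ..., n}."""
    lam = sum(ps)
    pmf = bernoulli_sum_pmf(ps)
    return sum(q * math.log(q / poisson_pmf(lam, k))
               for k, q in enumerate(pmf) if q > 0)

ps = [0.1, 0.2, 0.05, 0.15]
d = kl_to_poisson(ps)
bound = sum(p * p for p in ps)  # dependence term is zero for independent X_i
print(d <= bound)  # → True
```

This only illustrates the independent case; the dependent case needs the joint entropy H(X_1, ..., X_n), which a pmf convolution cannot supply.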
A Global Optimization Technique for Statistical Classifier Design
 IEEE Transactions on Signal Processing
Abstract

Cited by 25 (9 self)
A global optimization method is introduced for the design of statistical classifiers that minimize the rate of misclassification. We first derive the theoretical basis for the method, based on which we develop a novel design algorithm and demonstrate its effectiveness and superior performance in the design of practical classifiers for some of the most popular structures currently in use. The method, grounded in ideas from statistical physics and information theory, extends the deterministic annealing approach for optimization, both to incorporate structural constraints on data assignments to classes and to minimize the probability of error as the cost objective. During the design, data are assigned to classes in probability, so as to minimize the expected classification error given a specified level of randomness, as measured by Shannon's entropy. The constrained optimization is equivalent to a free energy minimization, motivating a deterministic annealing approach in which the entropy...
Information Theoretic Methods in Probability and Statistics
, 2001
Abstract

Cited by 16 (0 self)
Ideas of information theory have found fruitful applications not only in various fields of science and engineering but also within mathematics, both pure and applied. This is illustrated by several typical applications of information theory specifically in probability and statistics.
Logarithmic Sobolev inequalities for some nonlinear PDE's
 Stochastic Processes and their Applications
, 2001
Abstract

Cited by 12 (3 self)
The aim of this paper is to study the behavior of solutions of some nonlinear partial differential equations of McKean-Vlasov type. The main tools used are, on one hand, the logarithmic Sobolev inequality and its connections with the concentration of measure and the transportation inequality with quadratic cost; on the other hand, the propagation of chaos for particle systems in mean field interaction. Key words: interacting particle system, logarithmic Sobolev inequality, propagation of chaos, relative entropy, concentration of measure. 1 Introduction. A probability measure μ on R^n satisfies a logarithmic Sobolev inequality with constant C if

Ent_μ(f²) ≤ C ∫ |∇f|² dμ   (1)

for all smooth enough functions f, where

Ent_μ(f²) = ∫ f² log f² dμ − (∫ f² dμ) log (∫ f² dμ).

Let us recall two consequences of this property for μ. First, for every r ≥ 0, and every Lipschitz function f on R^n (equipped with the Euclidean topology) with ‖f‖_Lip = sup_{x≠y} |f(...
Relative entropy and exponential deviation bounds for general Markov chains
 in Proceedings of the 2005 IEEE International Symposium on Information Theory
, 2005
Abstract

Cited by 11 (2 self)
Abstract — We develop explicit, general bounds for the probability that the normalized partial sums of a function of a Markov chain on a general alphabet will exceed the steady-state mean of that function by a given amount. Our bounds combine simple information-theoretic ideas together with techniques from optimization and some fairly elementary tools from analysis. In one direction, we obtain a general bound for the important class of Doeblin chains; this bound is optimal, in the sense that in the special case of independent and identically distributed random variables it essentially reduces to the classical Hoeffding bound. In another direction, motivated by important problems in simulation, we develop a series of bounds in a form which is particularly suited to these problems, and which apply to the more general class of "geometrically ergodic" Markov chains.
Relative Entropy and the multivariable multidimensional Moment Problem
 IEEE Trans. on Information Theory
Abstract

Cited by 10 (4 self)
Entropy-like functionals on operator algebras have been studied since the pioneering work of von Neumann, Umegaki, Lindblad, and Lieb. The most well-known are the von Neumann entropy I(ρ) := −trace(ρ log ρ) and a generalization of the Kullback-Leibler distance S(ρ‖σ) := trace(ρ log ρ − ρ log σ), referred to as quantum relative entropy and used to quantify the distance between states of a quantum system. The purpose of this paper is to explore I and S as regularizing functionals in seeking solutions to multivariable and multidimensional moment problems. It will be shown that extrema can be effectively constructed via a suitable homotopy. The homotopy approach leads naturally to a further generalization and a description of all the solutions to such moment problems. This is accomplished by a renormalization of a Riemannian metric induced by entropy functionals. As an application we discuss the inverse problem of describing power spectra which are consistent with second-order statistics, which has been the main motivation behind the present work.