Results 1-10 of 140
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
 Information and Computation
, 1995
Cited by 247 (12 self)
... this paper, we concentrate on linear predictors. To any vector u ∈ R ...
Game Theory, Maximum Entropy, Minimum Discrepancy And Robust Bayesian Decision Theory
 ANNALS OF STATISTICS
, 2004
Tracking the Best Disjunction
 Machine Learning
, 1995
Cited by 74 (11 self)
Littlestone developed a simple deterministic online learning algorithm for learning k-literal disjunctions. This algorithm (called Winnow) keeps one weight for each of the n variables and does multiplicative updates to its weights. We develop a randomized version of Winnow and prove bounds for an adaptation of the algorithm for the case when the disjunction may change over time. In this case a possible target disjunction schedule T is a sequence of disjunctions (one per trial) and the shift size is the total number of literals that are added/removed from the disjunctions as one progresses through the sequence. We develop an algorithm that predicts nearly as well as the best disjunction schedule for an arbitrary sequence of examples. This algorithm that allows us to track the predictions of the best disjunction is hardly more complex than the original version. However, the amortized analysis needed for obtaining worst-case mistake bounds requires new techniques. In some cases our low...
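The multiplicative update mentioned in this abstract can be illustrated with a minimal sketch of the basic deterministic Winnow rule. This is not the randomized or tracking variant the paper develops. The function names, the threshold of n, and the factor `alpha=2.0` are assumptions chosen for illustration.

```python
def winnow_predict(weights, x, threshold):
    """Predict 1 if the total weight of the active variables reaches the threshold."""
    total = sum(w for w, bit in zip(weights, x) if bit)
    return 1 if total >= threshold else 0


def winnow_train(examples, n, alpha=2.0):
    """One pass of basic Winnow over (x, y) pairs with x in {0,1}^n and y in {0,1}.

    Assumed setup: all weights start at 1 and the threshold is n.
    Returns the final weights and the number of mistakes made.
    """
    weights = [1.0] * n
    threshold = float(n)
    mistakes = 0
    for x, y in examples:
        if winnow_predict(weights, x, threshold) != y:
            mistakes += 1
            # Promote active weights after a false negative,
            # demote them after a false positive.
            factor = alpha if y == 1 else 1.0 / alpha
            for i in range(n):
                if x[i]:
                    weights[i] *= factor
    return weights, mistakes


# Hypothetical data: variable 0 alone makes the label positive.
weights, mistakes = winnow_train([([1, 0, 0, 0], 1)] * 3, n=4)
```

Only the weights of variables that are active in a misclassified example change, which is why the update cost per trial stays proportional to the number of active variables.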
Classification with Non-Metric Distances: Image Retrieval and Class Representation
, 2000
Cited by 71 (0 self)
One of the key problems in appearance-based vision is understanding how to use a set of labeled images to classify new images. Classification systems that can model human performance, or that use robust image matching methods, often make use of similarity judgments that are non-metric; but when the triangle inequality is not obeyed, most existing pattern recognition techniques are not applicable. We note that exemplar-based (or nearest-neighbor) methods can be applied naturally when using a wide class of non-metric similarity functions. The key issue, however, is to find methods for choosing good representatives of a class that accurately characterize it. We show that existing condensing techniques for finding class representatives are ill-suited to deal with non-metric dataspaces. We then focus on developing techniques for solving this problem, emphasizing two points: First, we show that the distance between two images is not a good measure of how well one image can represent ...
Boosting as Entropy Projection
, 1999
Cited by 58 (8 self)
We consider the AdaBoost procedure for boosting weak learners. In AdaBoost, a key step is choosing a new distribution on the training examples based on the old distribution and the mistakes made by the present weak hypothesis. We show how AdaBoost's choice of the new distribution can be seen as an approximate solution to the following problem: Find a new distribution that is closest to the old distribution subject to the constraint that the new distribution is orthogonal to the vector of mistakes of the current weak hypothesis. The distance (or divergence) between distributions is measured by the relative entropy. Alternatively, we could say that AdaBoost approximately projects the distribution vector onto a hyperplane defined by the mistake vector. We show that this new view of AdaBoost as an entropy projection is dual to the usual view of AdaBoost as minimizing the normalization factors of the updated distributions.
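The orthogonality constraint described in this abstract can be checked numerically in the simplest setting. The sketch below assumes ±1-valued weak hypotheses and the standard AdaBoost step size. The encoding of the mistake vector (`+1` where the hypothesis is correct, `-1` where it errs) and the function name are assumptions made for illustration, not notation taken from the paper.

```python
import math


def adaboost_reweight(dist, agreement):
    """Standard AdaBoost distribution update for a +/-1-valued weak hypothesis.

    dist      -- current distribution over the training examples (sums to 1)
    agreement -- assumed encoding: +1 where the hypothesis is correct, -1 where it errs
    Requires a weighted error strictly between 0 and 1.
    """
    error = sum(d for d, u in zip(dist, agreement) if u == -1)
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Shrink the weight of correctly classified examples, grow the rest.
    unnormalized = [d * math.exp(-alpha * u) for d, u in zip(dist, agreement)]
    total = sum(unnormalized)  # the normalization factor
    return [w / total for w in unnormalized]


old_dist = [0.25, 0.25, 0.25, 0.25]
agreement = [1, 1, 1, -1]  # the hypothesis errs only on the last example
new_dist = adaboost_reweight(old_dist, agreement)

# Inner product of the new distribution with the mistake vector.
dot = sum(d * u for d, u in zip(new_dist, agreement))
```

In this binary case the inner product `dot` is zero up to rounding, so the current hypothesis has weighted error exactly 1/2 under the new distribution.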
The Cross-Entropy Method for Combinatorial and Continuous Optimization
, 1999
Cited by 55 (6 self)
We present a new and fast method, called the cross-entropy method, for finding the optimal solution of combinatorial and continuous nonconvex optimization problems with convex bounded domains. To find the optimal solution we solve a sequence of simple auxiliary smooth optimization problems based on Kullback-Leibler cross-entropy, importance sampling, Markov chains and the Boltzmann distribution. We use importance sampling as an important ingredient for adaptive adjustment of the temperature in the Boltzmann distribution and use Kullback-Leibler cross-entropy to find the optimal solution. In fact, we use the mode of a unimodal importance sampling distribution, like the mode of a beta distribution, as an estimate of the optimal solution for continuous optimization and a Markov chain approach for combinatorial optimization. In the latter case we show almost sure convergence of our algorithm to the optimal solution. Supporting numerical results for both continuous and combinatorial optimization problems are given as well. Our empirical studies suggest that the running time of the cross-entropy method is polynomial in the size of the problem.
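The sample-then-refit loop behind the cross-entropy method can be sketched for a one-dimensional continuous problem. This sketch swaps in a Gaussian sampling distribution refit to the best samples, which is a commonly taught simplification. It does not reproduce the Boltzmann temperature adjustment, beta distributions, or Markov chain machinery the abstract describes. All names and parameter values are assumptions.

```python
import random
import statistics


def cross_entropy_maximize(score, mean=0.0, std=5.0, n_samples=100,
                           n_elite=10, n_iters=30, seed=0):
    """Maximize `score` over the reals with a Gaussian cross-entropy loop."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        # Draw candidates from the current sampling distribution.
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # Keep the best-scoring candidates (the elite set).
        elite = sorted(samples, key=score, reverse=True)[:n_elite]
        # Refit the Gaussian to the elite set; for a Gaussian family this
        # maximum-likelihood fit is the cross-entropy minimizing update.
        mean = statistics.mean(elite)
        std = statistics.pstdev(elite)
    return mean


# Hypothetical objective with its maximum at x = 2.
estimate = cross_entropy_maximize(lambda x: -(x - 2.0) ** 2)
```

The spread of the sampling distribution shrinks as the elite samples concentrate, so the returned mean serves as the estimate of the optimum.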
LeZi-Update: An Information-Theoretic Framework for Personal Mobility Tracking in PCS Networks
 Wireless Networks
, 2002
Cited by 48 (1 self)
The complexity of the mobility tracking problem in a cellular environment has been characterized under an information-theoretic framework. Shannon’s entropy measure is identified as a basis for comparing user mobility models. By building and maintaining a dictionary of individual user’s path updates (as opposed to the widely used location updates), the proposed adaptive online algorithm can learn subscribers’ profiles. This technique evolves out of the concepts of lossless compression. The compressibility of the variable-to-fixed length encoding of the acclaimed Lempel–Ziv family of algorithms reduces the update cost, whereas their built-in predictive power can be effectively used to reduce paging cost.
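The dictionary of path updates can be pictured with an LZ78-style incremental parse of a movement history. The abstract names only the Lempel-Ziv family, so the choice of LZ78, the function name, and the encoding of cells as single characters are all assumptions made for illustration.

```python
def lz78_parse(history):
    """Split a sequence of cell IDs into LZ78 phrases.

    Each phrase is the shortest prefix of the remaining input that is not yet
    in the dictionary. Under the assumed scheme, one update would be sent per
    phrase rather than per cell crossing.
    """
    dictionary = set()
    phrases = []
    current = ""
    for symbol in history:
        current += symbol
        if current not in dictionary:
            dictionary.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Trailing symbols that already form a known phrase.
        phrases.append(current)
    return phrases


# Hypothetical history over three cells named a, b, c.
phrases = lz78_parse("aababc")
```

Here the six cell crossings in `"aababc"` parse into the three phrases `a`, `ab`, `abc`, which illustrates how longer repeated paths lead to fewer updates.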
Local discriminant bases and their applications
 Journal of Mathematical Imaging and Vision
, 1995
Cited by 42 (5 self)
We describe an extension to the "best-basis" method to select an orthonormal basis suitable for signal/image classification problems from a large collection of orthonormal bases consisting of wavelet packets or local trigonometric bases. The original best-basis algorithm selects a basis minimizing entropy from such a "library of orthonormal bases" whereas the proposed algorithm selects a basis maximizing a certain discriminant measure (e.g., relative entropy) among classes. Once such a basis is selected, a small number of most significant coordinates (features) are fed into a traditional classifier such as Linear Discriminant Analysis (LDA) or Classification and Regression Tree (CART™). The performance of these statistical methods is enhanced since the proposed methods reduce the dimensionality of the problem at hand without losing important information for that problem. Here, the basis functions which are well-localized in the time-frequency plane are used as feature extractors. We applied our method to two signal classification problems and an image texture classification problem. These experiments show the superiority of our method over the direct application of these classifiers on the input signals. As a further application, we also describe a method to extract a signal component from data consisting of signal and textured background.
A variational formulation for frame-based inverse problems
 Inverse Problems
, 2007
Cited by 42 (19 self)
A convex variational framework is proposed for solving inverse problems in Hilbert spaces with a priori information on the representation of the target solution in a frame. The objective function to be minimized consists of a separable term penalizing each frame coefficient individually and of a smooth term modeling the data formation model as well as other constraints. Sparsity-constrained and Bayesian formulations are examined as special cases. A splitting algorithm is presented to solve this problem and its convergence is established in infinite-dimensional spaces under mild conditions on the penalization functions, which need not be differentiable. Numerical simulations demonstrate applications to frame-based image restoration.
Discriminant ECOC: a heuristic method for application dependent design of error correcting output codes
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2006
Cited by 42 (19 self)
We present a heuristic method for learning error-correcting output codes matrices based on a hierarchical partition of the class space that maximizes a discriminative criterion. To achieve this goal, the optimal codeword separation is sacrificed in favor of a maximum class discrimination in the partitions. The creation of the hierarchical partition set is performed using a binary tree. As a result, a compact matrix with high discrimination power is obtained. Our method is validated using the UCI database and applied to a real problem, the classification of traffic sign images. Index Terms: multiple classifiers, multiclass classification, visual object recognition.