Results 1-10 of 16
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
Information and Computation, 1995
Abstract

Cited by 247 (12 self)
... this paper, we concentrate on linear predictors. To any vector u ∈ R ...
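The contrast this paper studies can be sketched as follows: gradient descent (GD) updates the weights of a linear predictor additively, while the exponentiated gradient (EG) algorithm updates them multiplicatively and keeps them a probability vector. A minimal sketch for squared loss; the learning rate, dimensions, and data below are illustrative, not taken from the paper:

```python
import numpy as np

def gd_update(w, x, y, eta=0.1):
    """Additive gradient-descent update for squared loss on a linear predictor."""
    y_hat = w @ x
    return w - eta * (y_hat - y) * x

def eg_update(w, x, y, eta=0.1):
    """Multiplicative exponentiated-gradient update: w stays a probability
    vector (nonnegative components summing to 1)."""
    y_hat = w @ x
    v = w * np.exp(-eta * (y_hat - y) * x)
    return v / v.sum()

# Toy usage: a sparse target with 1 relevant feature out of 20.
rng = np.random.default_rng(0)
n = 20
u = np.zeros(n); u[3] = 1.0          # sparse target vector
w_gd = np.zeros(n)
w_eg = np.full(n, 1.0 / n)           # EG starts at the uniform distribution
for _ in range(500):
    x = rng.normal(size=n)
    y = u @ x
    w_gd = gd_update(w_gd, x, y)
    w_eg = eg_update(w_eg, x, y)
print(np.round(w_eg[3], 2))          # EG concentrates weight on the relevant coordinate
```

The paper's point, roughly, is that EG's loss bounds scale much better than GD's when the target vector is sparse, as in this toy run.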
An Information-Theoretic Approach to Traffic Matrix Estimation
In Proc. ACM SIGCOMM, 2003
Abstract

Cited by 119 (13 self)
Traffic matrices are required inputs for many IP network management tasks ...
Online portfolio selection using multiplicative updates
Mathematical Finance, 1998
Abstract

Cited by 78 (10 self)
We present an online investment algorithm which achieves almost the same wealth as the best constant-rebalanced portfolio determined in hindsight from the actual market outcomes. The algorithm employs a multiplicative update rule derived using a framework introduced by Kivinen and Warmuth [20]. Our algorithm is very simple to implement and requires only constant storage and computing time per stock in each trading period; its overall time and storage requirements grow linearly in the number of stocks. We tested the performance of our algorithm on real stock data from the New York Stock Exchange accumulated during a 22-year period. On this data, our algorithm clearly outperforms the best single stock as well as Cover's universal portfolio selection algorithm. We also present results for the situation in which the ...
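The multiplicative update this abstract refers to can be sketched roughly as follows: each stock's weight is scaled by an exponential of its price relative (normalized by the portfolio's return), then the weights are renormalized. This is a toy sketch in the spirit of the EG-style update, not Helmbold et al.'s exact parameterization; the learning rate and market data are made up:

```python
import numpy as np

def eg_portfolio_update(w, x, eta=0.05):
    """One multiplicative portfolio update.
    w: current portfolio weights (sum to 1); x: price relatives for the period
    (closing/opening price per stock). Constant work and storage per stock."""
    v = w * np.exp(eta * x / (w @ x))
    return v / v.sum()

# Toy market: stock 1 tends to gain ~2% per period, stock 0 to lose ~1%.
rng = np.random.default_rng(1)
w = np.array([0.5, 0.5])
wealth = 1.0
for _ in range(300):
    x = np.array([0.99, 1.02]) * rng.uniform(0.98, 1.02, size=2)
    wealth *= w @ x                  # wealth of the rebalanced portfolio
    w = eg_portfolio_update(w, x)
print(np.round(w, 2))                # weight drifts toward the better stock
```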
Tracking the Best Disjunction
Machine Learning, 1995
Abstract

Cited by 72 (11 self)
Littlestone developed a simple deterministic online learning algorithm for learning k-literal disjunctions. This algorithm (called Winnow) keeps one weight for each of the n variables and does multiplicative updates to its weights. We develop a randomized version of Winnow and prove bounds for an adaptation of the algorithm for the case when the disjunction may change over time. In this case a possible target disjunction schedule T is a sequence of disjunctions (one per trial) and the shift size is the total number of literals that are added/removed from the disjunctions as one progresses through the sequence. We develop an algorithm that predicts nearly as well as the best disjunction schedule for an arbitrary sequence of examples. The algorithm that allows us to track the predictions of the best disjunction is hardly more complex than the original version. However, the amortized analysis needed for obtaining worst-case mistake bounds requires new techniques. In some cases our low...
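Winnow's multiplicative promote/demote step, which this paper builds on, can be sketched as follows. This is the static version with the textbook promotion factor alpha = 2 and threshold n; the randomized and tracking variants described in the abstract add mechanisms not shown here:

```python
import numpy as np

def winnow_predict(w, x, theta):
    """Predict 1 iff the weighted vote of active variables reaches the threshold."""
    return 1 if w @ x >= theta else 0

def winnow_update(w, x, y, y_hat, alpha=2.0):
    """Multiplicative update, applied only on mistakes."""
    if y == 1 and y_hat == 0:        # false negative: promote active weights
        w = np.where(x == 1, w * alpha, w)
    elif y == 0 and y_hat == 1:      # false positive: demote active weights
        w = np.where(x == 1, w / alpha, w)
    return w

# Learn the 2-literal disjunction x0 OR x4 over n = 50 variables.
rng = np.random.default_rng(2)
n = 50
w = np.ones(n)
mistakes = 0
for _ in range(400):
    x = rng.integers(0, 2, size=n)
    y = int(x[0] or x[4])
    y_hat = winnow_predict(w, x, theta=n)
    if y_hat != y:
        mistakes += 1
        w = winnow_update(w, x, y, y_hat)
print(mistakes)                      # far fewer than n mistakes on this run
```

The point of the multiplicative update is the mistake bound: roughly O(k log n) for a k-literal disjunction, versus O(n) for additive schemes.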
Boosting as Entropy Projection
1999
Abstract

Cited by 59 (8 self)
We consider the AdaBoost procedure for boosting weak learners. In AdaBoost, a key step is choosing a new distribution on the training examples based on the old distribution and the mistakes made by the present weak hypothesis. We show how AdaBoost's choice of the new distribution can be seen as an approximate solution to the following problem: find a new distribution that is closest to the old distribution subject to the constraint that the new distribution is orthogonal to the vector of mistakes of the current weak hypothesis. The distance (or divergence) between distributions is measured by the relative entropy. Alternatively, we could say that AdaBoost approximately projects the distribution vector onto a hyperplane defined by the mistake vector. We show that this new view of AdaBoost as an entropy projection is dual to the usual view of AdaBoost as minimizing the normalization factors of the updated distributions.
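The orthogonality constraint described above can be checked numerically: with AdaBoost's standard choice of alpha, the updated distribution gives the current weak hypothesis exactly zero edge, i.e. it is orthogonal to the mistake vector. A small sketch on synthetic labels:

```python
import numpy as np

def adaboost_reweight(D, y, h):
    """AdaBoost's distribution update; y and h take values in {-1, +1}.
    Returns the new (normalized) distribution over examples."""
    margins = y * h                          # +1 where h is correct, -1 on mistakes
    eps = D[margins < 0].sum()               # weighted error of h under D
    alpha = 0.5 * np.log((1 - eps) / eps)
    D_new = D * np.exp(-alpha * margins)     # upweight mistakes, downweight correct
    return D_new / D_new.sum()

rng = np.random.default_rng(3)
m = 10
D = np.full(m, 1.0 / m)
y = rng.choice([-1, 1], size=m)
h = y.copy(); h[:3] *= -1                    # weak hypothesis wrong on 3 of 10 examples
D_new = adaboost_reweight(D, y, h)
print(np.dot(D_new, y * h))                  # the new distribution's edge for h: zero
```

Under the new distribution, the old weak hypothesis is exactly as good as random guessing, which is the projection interpretation in this abstract.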
Estimating Point-to-Point and Point-to-Multipoint Traffic Matrices: An Information-Theoretic Approach
IEEE/ACM Trans. Netw., 2005
Abstract

Cited by 19 (7 self)
Traffic matrices are required inputs for many IP network management tasks, such as capacity planning, traffic engineering and network reliability analysis. However, it is difficult to measure these matrices directly in large operational IP networks, so there has been recent interest in inferring traffic matrices from link measurements and other more easily measured data. Typically, this inference problem is ill-posed, as it involves significantly more unknowns than data. Experience in many scientific and engineering fields has shown that it is essential to approach such ill-posed problems via "regularization". This paper presents a new approach to traffic matrix estimation using a regularization based on "entropy penalization". Our solution chooses the traffic matrix consistent with the measured data that is information-theoretically closest to a model in which source/destination pairs are stochastically independent. It applies to both point-to-point and point-to-multipoint traffic matrix estimation. We use fast algorithms based on modern convex optimization theory to solve for our traffic matrices. We evaluate our algorithm with real backbone traffic and routing data, and demonstrate that it is fast, accurate, robust, and flexible.
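The entropy-penalized estimation described here can be sketched as a small convex program: choose nonnegative origin-destination flows x that fit the link measurements y = Ax while staying close in relative entropy to a gravity-style prior g (the independence model). The toy 2x2 network, prior, penalty weight, and plain projected gradient solver below are all made up for illustration; the paper uses real routing matrices and much faster convex-optimization algorithms:

```python
import numpy as np

# Toy network: 2 sources x 2 destinations -> 4 OD flows x = (x11, x12, x21, x22).
# Three link measurements (fewer equations than unknowns, so ill-posed):
# traffic leaving each source, plus one link carrying the two cross flows.
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, 0.0]])
y = np.array([4.0, 6.0, 3.0])

# Gravity-style prior: split each source's traffic evenly over destinations.
g = np.array([2.0, 2.0, 3.0, 3.0])

# Minimize ||A x - y||^2 + lam * sum_i x_i log(x_i / g_i) over x >= 0.
lam, eta = 0.1, 0.05
x = g.copy()
for _ in range(4000):
    grad = 2.0 * A.T @ (A @ x - y) + lam * (np.log(x / g) + 1.0)
    x = np.maximum(x - eta * grad, 1e-9)   # projected gradient step, keep x > 0

print(np.round(x, 2))   # fits the measurements while staying close to the prior
```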
Groupwise point pattern registration using a novel CDF-based Jensen-Shannon divergence
 in: IEEE Computer Vision and Pattern Recognition
Abstract

Cited by 12 (2 self)
In this paper, we propose a novel and robust algorithm for the groupwise non-rigid registration of multiple unlabeled point-sets with no bias toward any of the given point-sets. To quantify the divergence between multiple probability distributions, each estimated from one of the given point-sets, we develop a novel measure based on their cumulative distribution functions that we dub the CDF-JS divergence. The measure parallels the well-known Jensen-Shannon divergence (defined for probability density functions) but is more regular than the JS divergence since its definition is based on CDFs as opposed to density functions. As a consequence, CDF-JS is more immune to noise and statistically more robust than the JS. We derive the analytic gradient of the CDF-JS divergence ...
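The idea of applying a JS-style combination to CDFs rather than densities can be illustrated roughly as below. Note this is a loose sketch of the spirit of the measure, comparing empirical CDFs on a fixed grid, not the paper's exact CDF-JS definition; the grid and sample sizes are arbitrary:

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cdf_js(samples_a, samples_b, grid):
    """JS combination applied to empirical CDFs evaluated on a grid and
    normalized to sum to 1 (a rough CDF-based analogue, not the paper's)."""
    Fa = np.searchsorted(np.sort(samples_a), grid, side="right") / len(samples_a)
    Fb = np.searchsorted(np.sort(samples_b), grid, side="right") / len(samples_b)
    return js_divergence(Fa / Fa.sum(), Fb / Fb.sum())

rng = np.random.default_rng(4)
a = rng.normal(0.0, 1.0, 500)        # two point samples with shifted means
b = rng.normal(0.5, 1.0, 500)
grid = np.linspace(-4, 5, 200)
print(cdf_js(a, a, grid), cdf_js(a, b, grid))  # zero for identical samples
```

Because CDFs are integrals of densities, they vary more smoothly with the samples, which is the regularity/robustness argument the abstract makes.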
Learning of Depth Two Neural Networks with Constant Fan-in at the Hidden Nodes (Extended Abstract)
In Proc. 9th Annu. Conf. on Comput. Learning Theory, 1996
Abstract

Cited by 9 (1 self)
We present algorithms for learning depth two neural networks where the hidden nodes are threshold gates with constant fan-in. The transfer function of the output node might be more general: we have results for the cases when the threshold function, the logistic function or the identity function is used as the transfer function at the output node. We give batch and online learning algorithms for these classes of neural networks and prove bounds on the performance of our algorithms. The batch algorithms work for real valued inputs whereas the online algorithms assume that the inputs are discretized. The hypotheses of our algorithms are essentially also neural networks of depth two. However, their number of hidden nodes might be much larger than the number of hidden nodes of the neural network that has to be learned. Our algorithms can handle such a large number of hidden nodes since they rely on multiplicative weight updates at the output node, and the performance of these algorithms s...
Bayesian Methods: Applications in Information Aggregation and Image Data Mining
1999
Abstract

Cited by 3 (1 self)
More accurate interpretation of remotely sensed data is based on a concept that synergistically combines signals, information, or knowledge from different sources. The aim is information mining, extraction, and presentation. A hierarchical structure of data fusion levels has been identified: on the image signal level, on image features, on physical parameters extracted from images, on meta-features resulting from image feature modelling, and on feature grouping. The Bayesian perspective is discussed with respect to a variety of aspects. The power of the Bayesian approach lies, for example, in the possibility of uniformly analysing the uncertainties over scene parameters in data acquired from heterogeneous and incommensurable sources.
The origin of black hole entropy
Abstract

Cited by 2 (0 self)
In this thesis, the properties and origin of black hole entropy are investigated from various points of view. First, the laws of black hole thermodynamics are reviewed. In particular, the first and generalized second laws are investigated in detail; it is in these laws that the black hole entropy plays key roles. Next, three candidates for the origin of the black hole entropy are analyzed: the D-brane statistical mechanics, the brick wall model, and the entanglement thermodynamics. Finally, discussions are given on the semiclassical consistency of the brick wall model and the entanglement thermodynamics, and on the information loss problem.