Results 1-10 of 42
Universal Portfolios
1996
Cited by 163 (5 self)
We exhibit an algorithm for portfolio selection that asymptotically outperforms the best stock in the market. Let $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})^t$ denote the performance of the stock market on day $i$, where $x_{ij}$ is the factor by which the $j$th stock increases on day $i$. Let $b_i = (b_{i1}, b_{i2}, \ldots, b_{im})^t$, $b_{ij} \ge 0$, $\sum_j b_{ij} = 1$, denote the proportion $b_{ij}$ of wealth invested in the $j$th stock on day $i$. Then $S_n = \prod_{i=1}^{n} b_i^t x_i$ is the factor by which wealth is increased in $n$ trading days. Consider as a goal the wealth $S_n^* = \max_b \prod_{i=1}^{n} b^t x_i$ that can be achieved by the best constant rebalanced portfolio chosen after the stock outcomes are revealed. It can be shown that $S_n^*$ exceeds the best stock, the Dow Jones average, and the value line index at time $n$. In fact, $S_n^*$ usually exceeds these quantities by an exponential factor. Let $x_1, x_2, \ldots$ be an arbitrary sequence of market vectors. It will be shown that the nonanticipating sequence ...
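To make the quantities in the abstract above concrete, here is a minimal Python sketch of the wealth factor of a constant rebalanced portfolio and a hindsight search for the best one. It is not the universal portfolio algorithm itself. The function names, the two-stock grid search, and the toy market (one flat stock, one that alternately doubles and halves) are assumptions chosen for illustration.

```python
from math import prod

def wealth(b, market):
    """Wealth factor S_n = prod_i (b . x_i) when the same portfolio b is used every day."""
    return prod(sum(bj * xj for bj, xj in zip(b, x)) for x in market)

def best_crp_two_stocks(market, steps=1000):
    """Hindsight-best constant rebalanced portfolio for two stocks.

    Illustrative grid search over b = (w, 1 - w); assumes exactly two stocks.
    """
    best_w = max((k / steps for k in range(steps + 1)),
                 key=lambda w: wealth((w, 1.0 - w), market))
    b_star = (best_w, 1.0 - best_w)
    return b_star, wealth(b_star, market)

# Hypothetical toy market: stock 1 is flat, stock 2 alternately doubles and halves.
market = [(1.0, 2.0), (1.0, 0.5)] * 5

s_half = wealth((0.5, 0.5), market)            # rebalance to 50/50 every day
b_star, s_star = best_crp_two_stocks(market)   # best CRP chosen in hindsight
best_stock = max(wealth((1.0, 0.0), market),   # buy and hold stock 1
                 wealth((0.0, 1.0), market))   # buy and hold stock 2

print("50/50 rebalanced wealth:", s_half)
print("best CRP:", b_star, "wealth:", s_star)
print("best single stock wealth:", best_stock)
```

In this toy market neither stock gains anything over the ten days, while the 50/50 rebalanced portfolio multiplies wealth by 1.125 every two days, about 1.80 in total, which illustrates how a constant rebalanced portfolio can beat the best stock by a factor that grows exponentially in n.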
Universal prediction
IEEE Transactions on Information Theory, 1998
Cited by 137 (11 self)
This paper consists of an overview of universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described with emphasis on the analogy and the differences between results in the two settings.
Universal Portfolios with Side Information
IEEE Transactions on Information Theory, 1996
Cited by 94 (4 self)
We present a sequential investment algorithm, the $\mu$-weighted universal portfolio with side-information, which achieves, to first order in the exponent, the same wealth as the best side-information dependent investment strategy (the best state-constant rebalanced portfolio) determined in hindsight from observed market and side-information outcomes. This is an individual sequence result which shows that the difference between the exponential growth rates of wealth of the best state-constant rebalanced portfolio and the universal portfolio with side-information is uniformly less than $(d/(2n)) \log(n + 1) + (k/n) \log 2$ for every stock market and side-information sequence and for all time $n$. Here $d = k(m - 1)$ is the number of degrees of freedom in the state-constant rebalanced portfolio with $k$ states of side-information and $m$ stocks. The proof of this result establishes a close connection between universal investment and universal data compression. Keywords: Universal investment, univ...
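The bound on the growth-rate gap quoted in the abstract above is easy to evaluate numerically. The sketch below computes (d/(2n)) log(n + 1) + (k/n) log 2 with d = k(m - 1). The function name is hypothetical, and natural logarithms are an assumption, since the abstract does not state the base.

```python
from math import log

def growth_rate_gap_bound(n, m, k):
    """Evaluate (d / (2n)) * log(n + 1) + (k / n) * log(2) with d = k * (m - 1).

    n: number of trading days, m: number of stocks, k: number of side-information states.
    Natural logarithms are assumed here; the abstract does not specify the base.
    """
    if n < 1 or m < 1 or k < 1:
        raise ValueError("n, m and k must all be at least 1")
    d = k * (m - 1)  # degrees of freedom of the state-constant rebalanced portfolio
    return d / (2 * n) * log(n + 1) + k / n * log(2)

# Hypothetical setting: 2 stocks and 3 side-information states.
for n in (10, 100, 1000, 10000):
    print(n, growth_rate_gap_bound(n, m=2, k=3))
```

For fixed m and k the bound shrinks roughly like (log n)/n, so the per-day growth rate of the universal portfolio approaches that of the best state-constant rebalanced portfolio as the horizon grows.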
Universal Discrete Denoising: Known Channel
IEEE Trans. Inform. Theory, 2003
Cited by 81 (32 self)
A discrete denoising algorithm estimates the input sequence to a discrete memoryless channel (DMC) based on the observation of the entire output sequence. For the case in which the DMC is known and the quality of the reconstruction is evaluated with a given single-letter fidelity criterion, we propose a discrete denoising algorithm that does not assume knowledge of statistical properties of the input sequence. Yet, the algorithm is universal in the sense of asymptotically performing as well as the optimum denoiser that knows the input sequence distribution, which is only assumed to be stationary and ergodic. Moreover, the algorithm is universal also in a semi-stochastic setting, in which the input is an individual sequence, and the randomness is due solely to the channel noise.
Online algorithms in machine learning
In Fiat and Woeginger, eds., Online Algorithms: The State of the Art, 1998
Cited by 60 (2 self)
The areas of On-Line Algorithms and Machine Learning are both concerned with problems of making decisions about the present based only on knowledge of the past. Although these areas differ in terms of their emphasis and the problems typically studied, there is a collection of results in Computational Learning Theory that fit nicely into the "online algorithms" framework. This survey article discusses some of the results, models, and open problems from Computational Learning Theory that seem particularly interesting from the point of view of online algorithms. The emphasis in this article is on describing some of the simpler, more intuitive results, whose proofs can be given in their entirety. Pointers to the literature are given for more sophisticated versions of these algorithms.
General empirical Bayes wavelet methods and exactly adaptive minimax estimation

2005
Cited by 19 (1 self)
In many statistical problems, stochastic signals can be represented as a sequence of noisy wavelet coefficients. In this paper, we develop general empirical Bayes methods for the estimation of true signal. Our estimators approximate certain oracle separable rules and achieve adaptation to ideal risks and exact minimax risks in broad collections of classes of signals. In particular, our estimators are uniformly adaptive to the minimum risk of separable estimators and the exact minimax risks simultaneously in Besov balls of all smoothness and shape indices, and they are uniformly superefficient in convergence rates in all compact sets in Besov spaces with a finite secondary shape parameter. Furthermore, in classes nested between Besov balls of the same smoothness index, our estimators dominate threshold and James–Stein estimators within an infinitesimal fraction of the minimax risks. More general block empirical Bayes estimators are developed. Both white noise with drift and nonparametric regression are considered.
Oracle and adaptive compound decision rules for false discovery rate control
J. Am. Statist. Ass., 2007
Cited by 18 (4 self)
We develop a compound decision theory framework for multiple-testing problems and derive an oracle rule based on the z values that minimizes the false nondiscovery rate (FNR) subject to a constraint on the false discovery rate (FDR). We show that many commonly used multiple-testing procedures, which are p value–based, are inefficient, and propose an adaptive procedure based on the z values. The z value–based adaptive procedure asymptotically attains the performance of the z value oracle procedure and is more efficient than the conventional p value–based methods. We investigate the numerical performance of the adaptive procedure using both simulated and real data. In particular, we demonstrate our method in an analysis of the microarray data from a human immunodeficiency virus study that involves testing a large number of hypotheses simultaneously.
An empirical Bayes approach to contextual region classification
In CVPR, 2009
Cited by 15 (0 self)
This paper presents a nonparametric approach to labeling of local image regions that is inspired by recent developments in information-theoretic denoising. The chief novelty of this approach rests in its ability to derive an unsupervised contextual prior over image classes from unlabeled test data. Labeled training data is needed only to learn a local appearance model for image patches (although additional supervisory information can optionally be incorporated when it is available). Instead of assuming a parametric prior such as a Markov random field for the class labels, the proposed approach uses the empirical Bayes technique of statistical inversion to recover a contextual model directly from the test data, either as a spatially varying or as a globally constant prior distribution over the classes in the image. Results on two challenging datasets convincingly demonstrate that useful contextual information can indeed be learned from unlabeled data.
When did Bayesian inference become “Bayesian”?
Bayesian Analysis, 2006
Cited by 14 (1 self)
While Bayes’ theorem has a 250-year history, and the method of inverse probability that flowed from it dominated statistical thinking into the twentieth century, the adjective “Bayesian” was not part of the statistical lexicon until relatively recently. This paper provides an overview of key Bayesian developments, beginning with Bayes’ posthumously published 1763 paper and continuing up through approximately 1970, including the period of time when “Bayesian” emerged as the label of choice for those who advocated Bayesian methods.
Agnostic Online Learning
Cited by 14 (2 self)
We study learnability of hypothesis classes in agnostic online prediction models. The analogous question in the PAC learning model [Valiant, 1984] was addressed by Haussler [1992] and others, who showed that the VC dimension characterization of the sample complexity of learnability extends to the agnostic (or “unrealizable”) setting. In his influential work, Littlestone [1988] described a combinatorial characterization of hypothesis classes that are learnable in the online model. We extend Littlestone’s results in two aspects. First, while Littlestone only dealt with the realizable case, namely, assuming there exists a hypothesis in the class that perfectly explains the entire data, we derive results for the nonrealizable (agnostic) case as well. In particular, we describe several models of nonrealizable data and derive upper and lower bounds on the achievable regret. Second, we extend the theory to include margin-based hypothesis classes, in which the prediction of each hypothesis is accompanied by a confidence value. We demonstrate how the newly developed theory seamlessly yields novel online regret bounds for the important class of large margin linear separators.