Results 1–10 of 16
On prediction using variable order Markov models
Journal of Artificial Intelligence Research, 2004
Cited by 60 (1 self)
Abstract: This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real-life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a “decomposed” CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
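As a rough illustration of the evaluation criterion, the average log-loss of a sequential predictor can be computed as below; the Laplace-smoothed order-0 predictor is a hypothetical stand-in for the six algorithms studied, chosen only to make the sketch runnable:

```python
import math

def avg_log_loss(predictor, seq, alphabet):
    """Average log-loss (bits per symbol) of a sequential predictor.

    `predictor(history)` must return a dict mapping each symbol in
    `alphabet` to its predicted probability for the next position.
    """
    total = 0.0
    for i, sym in enumerate(seq):
        p = predictor(seq[:i])[sym]
        total += -math.log2(p)
    return total / len(seq)

def laplace_predictor(alphabet):
    """Hypothetical order-0 model: Laplace-smoothed symbol counts."""
    def predict(history):
        n = len(history) + len(alphabet)
        return {a: (history.count(a) + 1) / n for a in alphabet}
    return predict

loss = avg_log_loss(laplace_predictor("ab"), "ababab", "ab")
```

A lower average log-loss means better prediction; an order-1 model of the same alternating string would drive the loss well below this order-0 value.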
Tracking the Best Linear Predictor
Journal of Machine Learning Research, 2001
Cited by 54 (11 self)
Abstract: In most online learning research, the total online loss of the algorithm is compared to the total loss of the best offline predictor u from a comparison class of predictors. We call such bounds static bounds. The interesting feature of these bounds is that they hold for an arbitrary sequence of examples. Recently some work has been done where the predictor u_t at each trial t is allowed to change with time, and the total online loss of the algorithm is compared to the sum of the losses of u_t at each trial plus the total "cost" for shifting to successive predictors. This models situations in which the examples change over time, and different predictors from the comparison class are best for different segments of the sequence of examples. We call such bounds shifting bounds. They hold for arbitrary sequences of examples and arbitrary sequences of predictors. Naturally, shifting bounds are much harder to prove. The only known bounds are for the case when the comparison class consists of sequences of experts or Boolean disjunctions. In this paper we develop the methodology for lifting known static bounds to the shifting case. In particular, we obtain bounds when the comparison class consists of linear neurons (linear combinations of experts). Our essential technique is to project the hypothesis of the static algorithm at the end of each trial into a suitably chosen convex region. This keeps the hypothesis of the algorithm well-behaved, and the static bounds can be converted to shifting bounds.
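The projection device the abstract describes can be sketched as follows, assuming online gradient descent on squared loss and an L2 ball as the convex region (both choices are illustrative, not the paper's exact setting):

```python
import math

def project_l2_ball(w, radius=1.0):
    """Project w onto the L2 ball of the given radius (a convex region)."""
    norm = math.sqrt(sum(v * v for v in w))
    if norm <= radius:
        return list(w)
    return [v * radius / norm for v in w]

def step(w, x, y, lr=0.1, radius=1.0):
    """One online gradient step on squared loss, then projection.

    Keeping the hypothesis inside a bounded convex region is the
    device that lets static regret bounds be lifted to shifting bounds.
    """
    y_hat = sum(wi * xi for wi, xi in zip(w, x))
    g = 2.0 * (y_hat - y)
    w_new = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return project_l2_ball(w_new, radius)

w = [0.0, 0.0, 0.0]
for _ in range(20):
    w = step(w, [1.0, 0.0, 0.0], 5.0)  # target is outside the ball
```

Without the projection the weight on the first coordinate would grow toward 5; with it, the hypothesis is clipped to the ball boundary and stays bounded.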
Efficient algorithms for universal portfolios
Proceedings of the 41st Annual Symposium on the Foundations of Computer Science, 2000
Cited by 32 (9 self)
Abstract: A constant rebalanced portfolio is an investment strategy that keeps the same distribution of wealth among a set of stocks from day to day. There has been much work on Cover's Universal algorithm, which is competitive with the best constant rebalanced portfolio determined in hindsight [3, 9, 2, 8, 16, 4, 5, 6]. While this algorithm has good performance guarantees, all known implementations are exponential in the number of stocks, restricting the number of stocks used in experiments [9, 4, 2, 5, 6]. We present an efficient implementation of the Universal algorithm that is based on non-uniform random walks that are rapidly mixing [1, 14, 7]. This same implementation also works for non-financial applications of the Universal algorithm, such as data compression [6] and language modeling [11].
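A minimal sketch of the quantities involved, assuming two assets and plain uniform sampling from the simplex in place of the paper's rapidly mixing non-uniform random walks:

```python
import random

def crp_wealth(b, price_relatives):
    """Wealth of a constant rebalanced portfolio b over a sequence of
    price-relative vectors (each entry = today's price / yesterday's)."""
    wealth = 1.0
    for x in price_relatives:
        wealth *= sum(bi * xi for bi, xi in zip(b, x))
    return wealth

def universal_wealth_mc(price_relatives, m, samples=2000, seed=0):
    """Monte-Carlo estimate of the Universal portfolio's wealth: the
    average CRP wealth over portfolios drawn uniformly from the simplex.
    (The paper replaces naive sampling with rapidly mixing non-uniform
    random walks; uniform sampling here is a simplification.)"""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        draws = sorted(rng.random() for _ in range(m - 1))
        b = [hi - lo for lo, hi in zip([0.0] + draws, draws + [1.0])]
        total += crp_wealth(b, price_relatives)
    return total / samples

# Two assets: one doubles then halves repeatedly, the other is cash.
x_seq = [(2.0, 1.0), (0.5, 1.0)] * 5
w_half = crp_wealth([0.5, 0.5], x_seq)     # rebalancing gains wealth
u = universal_wealth_mc(x_seq, 2)
```

On this volatile-plus-cash sequence, buy-and-hold of either asset ends with wealth 1, while the 50/50 rebalanced portfolio grows; the Universal average sits between the two.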
Nonlinear Interpolation Of Topic Models For Language Model Adaptation
In Proceedings of ICSLP-98, 1998
Cited by 12 (1 self)
Abstract: Topic adaptation for language modeling is concerned with adjusting the probabilities in a language model to better reflect the expected frequencies of topical words for a new document. The language model to be adapted is usually built from large amounts of training text and is considered representative of the current domain. In order to adapt this model for a new document, the topic (or topics) of the new document are identified. Then, the probabilities of words that are more likely to occur in the identified topic(s) than in general are boosted, and the probabilities of words that are unlikely for the identified topic(s) are suppressed. We present a novel technique for adapting a language model to the topic of a document, using a nonlinear interpolation of n-gram language models. A three-way, mutually exclusive division of the vocabulary into general, on-topic and off-topic word classes is used to combine word predictions from a topic-specific and a general language model. We achieve ...
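For contrast with the paper's nonlinear scheme, the standard linear interpolation of a topic-specific and a general model (the baseline such work departs from) can be sketched as follows; all probabilities here are hypothetical:

```python
def interpolate(p_topic, p_general, lam=0.5):
    """Linear interpolation of a topic-specific and a general language
    model (the common baseline; the paper's method is nonlinear and
    class-based, which this sketch does not reproduce)."""
    vocab = set(p_topic) | set(p_general)
    return {w: lam * p_topic.get(w, 0.0) + (1 - lam) * p_general.get(w, 0.0)
            for w in vocab}

# Hypothetical unigram distributions over a three-word vocabulary.
p_general = {"the": 0.5, "neuron": 0.1, "ball": 0.4}
p_topic = {"the": 0.4, "neuron": 0.5, "ball": 0.1}
p_mix = interpolate(p_topic, p_general, lam=0.3)
```

Because the mixture is a convex combination of two proper distributions, it remains a proper distribution, with topical words ("neuron") boosted relative to the general model.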
On the Competitive Theory and Practice of Portfolio Selection
In Proc. of the 4th Latin American Symposium on Theoretical Informatics (LATIN '00), 2002
Cited by 12 (1 self)
Abstract: The portfolio selection problem is clearly one of the most fundamental problems in the field of computational finance. Given a set of, say, m stocks (one of which may be "cash"), the natural online problem is to determine a portfolio for the ith trading period based on the sequence of prices (or equivalently, relative prices) for the preceding i − 1 trading periods. There has been both a growing interest and a growing skepticism concerning the value of a competitive theory of online portfolio selection algorithms. Competitive analysis is based on a worst-case perspective, and such a perspective is inconsistent with the more widely accepted analyses and theories based on statistical assumptions. The competitive framework does (perhaps surprisingly) permit nontrivial upper bounds on relative performance against CBAL-OPT, an optimal offline constant rebalancing portfolio. Perhaps more impressive are some preliminary experimental results showing that certain algorithms that enjoy "respectable" competitive (i.e. worst-case) performance also seem to perform quite well on historical sequences of data. These algorithms and the emerging competitive theory are directly related to studies in information theory and computational learning theory, and indeed some of these algorithms have been pioneered within the information theory and computational learning communities. One goal of this paper is to try to better understand the extent to which competitive portfolio algorithms are indeed "learning". In doing so we discuss some simple strategies which can adapt to the data sequence. We present a mixture of both theoretical and experimental results. We also present a more inclusive study of the performance of existing and new algorithms with respect to a standard ...
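The hindsight benchmark CBAL-OPT can be found by brute force in the two-asset case; the grid search below is only an illustrative sketch of what the competitive bounds are measured against:

```python
def cbal_opt_two_assets(xs, grid=1000):
    """Brute-force the best constant rebalanced portfolio in hindsight
    (CBAL-OPT) for two assets by scanning the mixing weight p, where
    each period's growth factor is p*x[0] + (1-p)*x[1]."""
    best_p, best_w = 0.0, float("-inf")
    for k in range(grid + 1):
        p = k / grid
        w = 1.0
        for x in xs:
            w *= p * x[0] + (1 - p) * x[1]
        if w > best_w:
            best_p, best_w = p, w
    return best_p, best_w

# Asset 0 doubles then halves repeatedly; asset 1 is cash.
xs = [(2.0, 1.0), (0.5, 1.0)] * 10
p_star, w_star = cbal_opt_two_assets(xs)
```

Here either pure strategy ends with wealth 1, but the hindsight-optimal constant rebalancing (a 50/50 split on this sequence) compounds a gain every up/down cycle; an online algorithm is called competitive when its wealth provably tracks this benchmark.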
Switching Strategies for Sequential Decision Problems With Multiplicative Loss With Application to Portfolios
Cited by 3 (1 self)
Abstract: A wide variety of problems in signal processing can be formulated such that decisions are made by sequentially taking convex combinations of vector-valued observations, and these convex combinations are then multiplicatively compounded over time. A “universal” approach to such problems might attempt to sequentially achieve the performance of the best fixed convex combination, as might be achievable non-causally, by observing all of the outcomes in advance. By permitting different piecewise-fixed strategies within contiguous regions of time, the best algorithm in this broader class would be able to switch between different fixed strategies to optimize performance to the changing behavior of each individual sequence of outcomes. Without knowledge of the data length or the number of switches necessary, the algorithms developed in this paper can achieve the performance of the best piecewise-fixed strategy that can choose both the partitioning of the sequence of outcomes in time as well as the best strategy within each time segment. We compete with an exponential number of such partitions, using only complexity linear in the data length, and demonstrate that the regret with respect to the best such algorithm is at most O(ln(n)) in the exponent, where n is the data length. Finally, we extend these results to include finite collections of candidate algorithms, rather than convex combinations, and further investigate the use of an arbitrary side-information sequence.
Index Terms—Convex combinations, portfolio, sequential decisions, side information, switching, universal.
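The advantage of piecewise-fixed strategies over any single fixed convex combination can be seen with two assets; the one-switch brute force below is only a sketch of the comparison class, whereas the paper competes with any number of switches using complexity linear in the data length:

```python
def seg_wealth(b, xs):
    """Multiplicatively compounded wealth of the fixed mix (b, 1-b)."""
    w = 1.0
    for x in xs:
        w *= b * x[0] + (1 - b) * x[1]
    return w

def best_fixed(xs, grid=200):
    """Best single fixed convex combination, by grid search."""
    return max((seg_wealth(k / grid, xs), k / grid) for k in range(grid + 1))

def best_one_switch(xs, grid=200):
    """Best piecewise-fixed strategy with at most one switch, by brute
    force over the switch time and the strategy in each segment."""
    best = best_fixed(xs, grid)
    for t in range(1, len(xs)):
        w1, b1 = best_fixed(xs[:t], grid)
        w2, b2 = best_fixed(xs[t:], grid)
        if w1 * w2 > best[0]:
            best = (w1 * w2, (b1, t, b2))
    return best

# Asset 0 dominates early, asset 1 late: switching should pay off.
xs = [(2.0, 1.0)] * 3 + [(1.0, 2.0)] * 3
w_switch, plan = best_one_switch(xs)
w_fixed, _ = best_fixed(xs)
```

On this sequence the best fixed mix earns about 11.4 while the switching strategy (all-in on asset 0, then all-in on asset 1) earns 64, which is the gap the switching algorithms are designed to close.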
Language Model Mixtures for Contextual Ad Placement in Personal Blogs
Cited by 2 (0 self)
Abstract: We introduce a method for content-based advertisement selection for personal blog pages, based on combining multiple representations of the blog. The core idea behind the method is that personal blogs represent individuals, whose interests can be modeled by the language used in the blog itself combined with the language used in related sources of information, such as comments posted to a blog post or the blogger's community. An evaluation of our ad placement method shows improvement over state-of-the-art ad placement methods which were not designed for blog pages.
Domain-specific disambiguation for typing with ambiguous keyboards
Proceedings of the EACL Workshop, 2003
Cited by 2 (1 self)
Abstract: In this paper, we investigate whether and how domain-specific corpora increase the precision of word disambiguation for typing on an ambiguous keyboard. Basically, the disambiguation for our ambiguous keyboard with three letter keys is based on language-specific word frequencies from the lexicon CELEX (in this study, English and German are dealt with). The more specific frequency information is extracted from texts in the special domains of school homework in three subjects and articles in two different scientific areas. All in all, we could not always reach better performance by deploying domain-specific predictions. As a general solution we propose an interpolated language model combining both the general and the specific language model. For all our domains, good results, compared to an ideal prediction on the basis of all available models, could be achieved by this method.
Adaptive Weighing of Context Models for Lossless Data Compression
Abstract: Until recently the state of the art in lossless data compression was prediction by partial match (PPM). A PPM model estimates the next-symbol probability distribution by combining statistics from the longest matching contiguous contexts in which each symbol value is found. We introduce a context mixing model which improves on PPM by allowing contexts which are arbitrary functions of the history. Each model independently estimates a probability and confidence that the next bit of data will be 0 or 1. Predictions are combined by weighted averaging. After a bit is arithmetic coded, the weights are adjusted along the cost gradient in weight space to favor the most accurate models. Context mixing compressors, as implemented by the open source PAQ project, are now top ranked on several independent benchmarks.
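The mixing-and-reweighting loop can be sketched as follows; the additive weight update below is a simplification and is not PAQ's exact rule (which also folds in per-model confidences and arithmetic coding):

```python
def mix(probs, weights):
    """Weighted average of per-model estimates of P(next bit = 1)."""
    s = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / s

def update_weights(weights, probs, bit, lr=0.2):
    """Shift weight toward models whose estimate was on the correct
    side of the mixture for the observed bit (simplified gradient step
    in weight space; not PAQ's exact update)."""
    p = mix(probs, weights)
    err = bit - p
    return [max(1e-6, w + lr * err * (pi - p))
            for w, pi in zip(weights, probs)]

# Model 0 always says P(1)=0.9, model 1 always says P(1)=0.1.
weights = [1.0, 1.0]
for bit in [1] * 50:          # a stream of 1-bits: model 0 is accurate
    weights = update_weights(weights, [0.9, 0.1], bit)
```

On an all-ones stream the accurate model steadily gains weight, so the mixture's estimate of P(1) drifts toward 0.9, which is exactly the adaptivity the abstract describes.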
Unsupervised Adaptation of Statistical Language Models for Speech Recognition
Abstract: (Now with TEMIC SDS GmbH, Ulm, Germany.) It has been demonstrated repeatedly that the acoustic models of a speaker-independent speech recognition system can benefit substantially from the application of unsupervised adaptation methods as a means of speaker enrollment. Unsupervised adaptation has, however, not yet been applied to the statistical language model component of the recognition system. We investigate two techniques with which a first-pass recognition transcription is used to adapt the parameters of the n-gram language model that is used in the recognition search. It is found that the best results are achieved when both methods are employed in conjunction with each other. The performance of the adaptation methods was determined experimentally by application to the transcription of a set of lecture speeches. Improvements both in terms of language model perplexity and recognition word error rate were achieved.