Results 1 - 9 of 9
On prediction using variable order Markov models
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 2004
Abstract

Cited by 56 (1 self)
This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real-life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a “decomposed” CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
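The average log-loss named in this abstract can be illustrated with a small sketch. The predictor below is a fixed-order Markov model with add-one smoothing, chosen only for brevity; it is an assumption for illustration and is not CTW, PPM, PST or any other algorithm compared in the paper. The function name and parameters are hypothetical.

```python
import math
from collections import defaultdict


def average_log_loss(sequence, alphabet, order=1):
    """Average log-loss (bits per symbol) of an online fixed-order Markov
    predictor with add-one smoothing. Illustrative only: the paper studies
    variable-order models, which this sketch does not implement."""
    # counts[context][symbol] = how often `symbol` followed `context` so far
    counts = defaultdict(lambda: defaultdict(int))
    total_loss = 0.0
    for t, symbol in enumerate(sequence):
        # Context is the previous `order` symbols (shorter at the start).
        context = tuple(sequence[max(0, t - order):t])
        ctx_counts = counts[context]
        ctx_total = sum(ctx_counts.values())
        # Smoothed probability assigned to the symbol before seeing it.
        prob = (ctx_counts[symbol] + 1) / (ctx_total + len(alphabet))
        total_loss += -math.log2(prob)
        # Update the model only after the prediction has been scored.
        ctx_counts[symbol] += 1
    return total_loss / len(sequence)


if __name__ == "__main__":
    data = "ab" * 50
    print(average_log_loss(data, "ab", order=0))
    print(average_log_loss(data, "ab", order=1))
```

On a strictly alternating sequence the order-1 predictor should reach a lower average log-loss than the order-0 one, since the previous symbol determines the next.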
Tracking the Best Linear Predictor
 Journal of Machine Learning Research
, 2001
Abstract

Cited by 53 (11 self)
In most online learning research the total online loss of the algorithm is compared to the total loss of the best offline predictor u from a comparison class of predictors. We call such bounds static bounds. The interesting feature of these bounds is that they hold for an arbitrary sequence of examples. Recently some work has been done where the predictor u_t at each trial t is allowed to change with time, and the total online loss of the algorithm is compared to the sum of the losses of u_t at each trial plus the total "cost" for shifting to successive predictors. This is to model situations in which the examples change over time, and different predictors from the comparison class are best for different segments of the sequence of examples. We call such bounds shifting bounds. They hold for arbitrary sequences of examples and arbitrary sequences of predictors. Naturally, shifting bounds are much harder to prove. The only known bounds are for the case when the comparison class consists of sequences of experts or boolean disjunctions. In this paper we develop the methodology for lifting known static bounds to the shifting case. In particular we obtain bounds when the comparison class consists of linear neurons (linear combinations of experts). Our essential technique is to project the hypothesis of the static algorithm at the end of each trial into a suitably chosen convex region. This keeps the hypothesis of the algorithm well-behaved, and the static bounds can be converted to shifting bounds.
Efficient algorithms for universal portfolios
 Proceedings of the 41st Annual Symposium on the Foundations of Computer Science
, 2000
Abstract

Cited by 32 (9 self)
A constant rebalanced portfolio is an investment strategy that keeps the same distribution of wealth among a set of stocks from day to day. There has been much work on Cover's Universal algorithm, which is competitive with the best constant rebalanced portfolio determined in hindsight (3, 9, 2, 8, 16, 4, 5, 6). While this algorithm has good performance guarantees, all known implementations are exponential in the number of stocks, restricting the number of stocks used in experiments (9, 4, 2, 5, 6). We present an efficient implementation of the Universal algorithm that is based on nonuniform random walks that are rapidly mixing (1, 14, 7). This same implementation also works for nonfinancial applications of the Universal algorithm, such as data compression (6) and language modeling (11).
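The definition of a constant rebalanced portfolio in this abstract can be made concrete with a short sketch. It only computes the wealth of one fixed-weight portfolio; it is not Cover's Universal algorithm and not the random-walk implementation the paper presents. The function name and the toy price data are hypothetical.

```python
def crp_wealth(price_relatives, weights):
    """Final wealth (starting from 1.0) of a constant rebalanced portfolio.

    price_relatives: one tuple per day; entry j is the day's closing price of
                     stock j divided by its opening price.
    weights:         fixed fractions of wealth held in each stock, restored
                     by rebalancing at the start of every day.
    """
    wealth = 1.0
    for day in price_relatives:
        # Because the portfolio is rebalanced daily, each day's growth
        # factor is the weighted average of that day's price relatives.
        wealth *= sum(w * x for w, x in zip(weights, day))
    return wealth


if __name__ == "__main__":
    # Toy market: "cash" never moves; the stock doubles, then halves.
    market = [(1.0, 2.0), (1.0, 0.5)] * 5
    print(crp_wealth(market, (0.5, 0.5)))  # rebalanced half and half
    print(crp_wealth(market, (0.0, 1.0)))  # all in the stock
```

In this toy market, holding either asset alone ends where it started, while the half-and-half rebalanced portfolio grows by a factor of 1.5 × 0.75 = 1.125 every two days.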
On the Competitive Theory and Practice of Portfolio Selection
 In Proc. of the 4th Latin American Symposium on Theoretical Informatics (LATIN’00)
, 2002
Abstract

Cited by 12 (1 self)
The portfolio selection problem is clearly one of the most fundamental problems in the field of computational finance. Given a set of say m stocks (one of which may be "cash"), the natural online problem is to determine a portfolio for the ith trading period based on the sequence of prices (or equivalently relative prices) for the preceding i - 1 trading periods. There has been both a growing interest and a growing skepticism concerning the value of a competitive theory of online portfolio selection algorithms. Competitive analysis is based on a worst case perspective and such a perspective is inconsistent with the more widely accepted analyses and theories based on statistical assumptions. The competitive framework does (perhaps surprisingly) permit nontrivial upper bounds on relative performance against CBALOPT, an optimal offline constant rebalancing portfolio. Perhaps more impressive are some preliminary experimental results showing that certain algorithms that enjoy "respectable" competitive (i.e. worst case) performance also seem to perform quite well on historical sequences of data. These algorithms and the emerging competitive theory are directly related to studies in information theory and computational learning theory and indeed some of these algorithms have been pioneered within the information theory and computational learning communities. One goal of this paper is to try to better understand the extent to which competitive portfolio algorithms are indeed "learning". In doing so we discuss some simple strategies which can adapt to the data sequence. We present a mixture of both theoretical and experimental results. We also present a more inclusive study of the performance of existing and new algorithms with respect to a standard ...
Nonlinear Interpolation Of Topic Models For Language Model Adaptation
 IN PROCEEDINGS OF ICSLP98
, 1998
Abstract

Cited by 11 (1 self)
Topic adaptation for language modeling is concerned with adjusting the probabilities in a language model to better reflect the expected frequencies of topical words for a new document. The language model to be adapted is usually built from large amounts of training text and is considered representative of the current domain. In order to adapt this model for a new document, the topic (or topics) of the new document are identified. Then, the probabilities of words that are more likely to occur in the identified topic(s) than in general are boosted, and the probabilities of words that are unlikely for the identified topic(s) are suppressed. We present a novel technique for adapting a language model to the topic of a document, using a nonlinear interpolation of n-gram language models. A three-way, mutually exclusive division of the vocabulary into general, on-topic and off-topic word classes is used to combine word predictions from a topic-specific and a general language model. We achieve ...
Switching Strategies for Sequential Decision Problems With Multiplicative Loss With Application to Portfolios
Abstract

Cited by 3 (1 self)
A wide variety of problems in signal processing can be formulated such that decisions are made by sequentially taking convex combinations of vector-valued observations and these convex combinations are then multiplicatively compounded over time. A “universal” approach to such problems might attempt to sequentially achieve the performance of the best fixed convex combination, as might be achievable noncausally, by observing all of the outcomes in advance. By permitting different piecewise-fixed strategies within contiguous regions of time, the best algorithm in this broader class would be able to switch between different fixed strategies to optimize performance to the changing behavior of each individual sequence of outcomes. Without knowledge of the data length or the number of switches necessary, the algorithms developed in this paper can achieve the performance of the best piecewise-fixed strategy that can choose both the partitioning of the sequence of outcomes in time as well as the best strategy within each time segment. We compete with an exponential number of such partitions, using only complexity linear in the data length, and demonstrate that the regret with respect to the best such algorithm is at most O(ln(n)) in the exponent, where n is the data length. Finally, we extend these results to include finite collections of candidate algorithms, rather than convex combinations, and further investigate the use of an arbitrary side-information sequence. Index Terms: Convex combinations, portfolio, sequential decisions, side information, switching, universal.
Language Model Mixtures for Contextual Ad Placement in Personal Blogs
Abstract

Cited by 2 (0 self)
We introduce a method for content-based advertisement selection for personal blog pages, based on combining multiple representations of the blog. The core idea behind the method is that personal blogs represent individuals, whose interests can be modeled by the language used in the blog itself combined with the language used in related sources of information, such as comments posted to a blog post or the blogger’s community. An evaluation of our ad placement method shows improvement over state-of-the-art ad placement methods which were not designed for blog pages.
Domain-specific disambiguation for typing with ambiguous keyboards
 PROCEEDINGS OF THE EACL WORKSHOP
, 2003
Abstract

Cited by 2 (1 self)
In this paper, we investigate whether and how domain-specific corpora increase precision of word disambiguation for typing on an ambiguous keyboard. Basically, the disambiguation for our ambiguous keyboard with three letter keys is based on language-specific word frequencies of the lexicon CELEX (this study deals with English and German). The more specific frequency information is extracted from texts in the special domains of school homework in three subjects and articles in two different scientific areas. All in all, we could not always reach a better performance by deploying domain-specific predictions. As a general solution we propose an interpolated language model combining both the general and the specific language model. For all our domains, good results (compared to an ideal prediction on the basis of all available models) could be achieved by this method.
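The interpolated model mentioned in this abstract could look roughly like the sketch below. The abstract does not give the interpolation form, so linear interpolation of unigram probabilities with a single weight `lam` is an assumption here, and the function names, weight and word probabilities are all made up for illustration.

```python
def interpolate(p_general, p_specific, lam=0.5):
    """Linear interpolation of two unigram models given as word -> probability
    dicts. `lam` is the weight on the domain-specific model (assumed form;
    the abstract does not state how the models are combined)."""
    vocab = set(p_general) | set(p_specific)
    return {
        w: lam * p_specific.get(w, 0.0) + (1.0 - lam) * p_general.get(w, 0.0)
        for w in vocab
    }


def rank_candidates(candidates, model):
    """Order the words matching an ambiguous key sequence by model probability,
    most likely first."""
    return sorted(candidates, key=lambda w: model.get(w, 0.0), reverse=True)


if __name__ == "__main__":
    # Hypothetical probabilities for three words sharing one key sequence.
    general = {"good": 0.6, "home": 0.3, "gone": 0.1}
    homework = {"good": 0.2, "home": 0.7, "gone": 0.1}
    mixed = interpolate(general, homework, lam=0.5)
    print(rank_candidates(["good", "home", "gone"], mixed))
```

With these made-up numbers the general model alone would rank "good" first, while the interpolated model ranks "home" first.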
Speech Recognition in a Dialog System for Patient Health Monitoring
Abstract
We describe CARDIAC, a prototype for an intelligent conversational assistant that provides health monitoring for chronic heart failure patients. CARDIAC supports user initiative through its ability to understand natural language and connect it to intention recognition. The spoken language interface allows patients to interact with CARDIAC without special training. We present speech recognition results obtained during an evaluation with fourteen chronic heart failure patients. Keywords: dialog systems; speech recognition; natural language processing; data acquisition; patient self-management; chronic heart failure.