Tracking the Best Linear Predictor
Journal of Machine Learning Research, 2001
Cited by 43 (11 self)
In most on-line learning research, the total on-line loss of the algorithm is compared to the total loss of the best off-line predictor u from a comparison class of predictors. We call such bounds static bounds. The interesting feature of these bounds is that they hold for an arbitrary sequence of examples. Recently, some work has been done where the predictor u_t at each trial t is allowed to change with time, and the total on-line loss of the algorithm is compared to the sum of the losses of u_t at each trial plus the total "cost" for shifting to successive predictors. This is to model situations in which the examples change over time, and different predictors from the comparison class are best for different segments of the sequence of examples. We call such bounds shifting bounds. They hold for arbitrary sequences of examples and arbitrary sequences of predictors. Naturally, shifting bounds are much harder to prove. The only known bounds are for the case when the comparison class consists of sequences of experts or boolean disjunctions. In this paper, we develop the methodology for lifting known static bounds to the shifting case. In particular, we obtain bounds when the comparison class consists of linear neurons (linear combinations of experts). Our essential technique is to project the hypothesis of the static algorithm at the end of each trial into a suitably chosen convex region. This keeps the hypothesis of the algorithm well-behaved, so that the static bounds can be converted to shifting bounds.
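The projection idea can be illustrated with a minimal Python sketch. It assumes (our choice for illustration, not necessarily the paper's exact construction) that the static algorithm is on-line gradient descent for linear regression with square loss and that the convex region is a Euclidean ball of radius `radius`; the names `project_to_ball` and `projected_gd` are hypothetical.

```python
import math


def project_to_ball(w, radius):
    """Euclidean projection of w onto the ball {v : ||v||_2 <= radius}."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= radius:
        return list(w)
    scale = radius / norm
    return [x * scale for x in w]


def projected_gd(examples, dim, eta=0.1, radius=1.0):
    """On-line linear regression with square loss.

    After each trial the gradient-descent hypothesis is projected back into
    the ball, which keeps it bounded so it can follow a comparison predictor
    that shifts over time. Returns (total on-line loss, final weights).
    """
    w = [0.0] * dim
    total_loss = 0.0
    for x, y in examples:
        y_hat = sum(wi * xi for wi, xi in zip(w, x))
        total_loss += (y_hat - y) ** 2
        # Gradient of (y_hat - y)^2 with respect to w is 2 * (y_hat - y) * x.
        w = [wi - eta * 2.0 * (y_hat - y) * xi for wi, xi in zip(w, x)]
        # Projection step applied at the end of every trial.
        w = project_to_ball(w, radius)
    return total_loss, w
```

For example, if the target predictor is (1, 0) for the first half of a sequence and (0, 1) for the second half, the final weights end up close to (0, 1): the bounded hypothesis tracks the new predictor instead of being anchored to the old one.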
On prediction using variable order Markov models
Journal of Artificial Intelligence Research, 2004
Cited by 42 (1 self)
This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real-life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a “decomposed” CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, a modification of the Lempel-Ziv compression algorithm, significantly outperforms all others on the protein classification problems.
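To make the evaluation measure concrete, here is a minimal Python sketch of sequential prediction scored by average log-loss in bits per symbol. It uses a fixed-order Markov model with a Laplace (add-one) estimator, which is far simpler than CTW, PPM or PSTs and is not one of the six algorithms compared; `average_log_loss` and its `order` parameter are hypothetical.

```python
import math
from collections import defaultdict


def average_log_loss(seq, alphabet, order=2):
    """Predict each symbol of seq from its preceding `order` symbols and
    return the average log-loss in bits per symbol.

    Probabilities come from add-one (Laplace) counts per context; counts are
    updated only after each prediction, so the evaluation is on-line.
    """
    counts = defaultdict(lambda: defaultdict(int))
    k = len(alphabet)
    total = 0.0
    for i, sym in enumerate(seq):
        ctx = seq[max(0, i - order):i]  # shorter context at the start
        ctx_counts = counts[ctx]
        n = sum(ctx_counts.values())
        p = (ctx_counts[sym] + 1) / (n + k)
        total += -math.log2(p)
        ctx_counts[sym] += 1
    return total / len(seq)
```

On a strictly periodic sequence such as `"ab" * 200`, the order-2 model quickly becomes confident and its average log-loss falls well below the roughly 1 bit per symbol of an order-0 model.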
Efficient algorithms for universal portfolios
Proceedings of the 41st Annual Symposium on the Foundations of Computer Science, 2000
Cited by 20 (8 self)
A constant rebalanced portfolio is an investment strategy that keeps the same distribution of wealth among a set of stocks from day to day. There has been much work on Cover's Universal algorithm, which is competitive with the best constant rebalanced portfolio determined in hindsight (3, 9, 2, 8, 16, 4, 5, 6). While this algorithm has good performance guarantees, all known implementations run in time exponential in the number of stocks, restricting the number of stocks used in experiments (9, 4, 2, 5, 6). We present an efficient implementation of the Universal algorithm that is based on non-uniform random walks that are rapidly mixing (1, 14, 7). This same implementation also works for non-financial applications of the Universal algorithm, such as data compression (6) and language modeling (11).
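A constant rebalanced portfolio and Cover's Universal algorithm can be sketched in Python for the two-stock case. This is an illustration under stated assumptions, not the paper's random-walk implementation: the uniform prior over portfolios b = (p, 1 - p) is approximated by a finite grid of `grid` points, and the function names are hypothetical. Each day the universal portfolio is the wealth-weighted average of the grid portfolios.

```python
def crp_wealth(b, price_relatives):
    """Final wealth of a constant rebalanced portfolio b (weights sum to 1).

    price_relatives[t][i] is stock i's closing price on day t divided by
    its closing price on day t - 1.
    """
    wealth = 1.0
    for x in price_relatives:
        wealth *= sum(bi * xi for bi, xi in zip(b, x))
    return wealth


def universal_wealth_two_stocks(price_relatives, grid=101):
    """Cover's Universal portfolio for two stocks, with the uniform prior
    over b = (p, 1 - p) replaced by a grid of `grid` values of p."""
    ps = [i / (grid - 1) for i in range(grid)]
    wealths = [1.0] * grid  # running wealth of each grid CRP
    total = 1.0
    for x in price_relatives:
        # Today's weight on stock 0: wealth-weighted average over the grid.
        p_hat = sum(w * p for w, p in zip(wealths, ps)) / sum(wealths)
        total *= p_hat * x[0] + (1 - p_hat) * x[1]
        wealths = [w * (p * x[0] + (1 - p) * x[1]) for w, p in zip(wealths, ps)]
    return total


# Cash versus a stock that alternately doubles and halves.
days = [(1.0, 2.0), (1.0, 0.5)] * 5
print(crp_wealth((0.5, 0.5), days))  # 1.125 ** 5, about 1.80
print(universal_wealth_two_stocks(days))
```

The final universal wealth equals the average final wealth of the grid portfolios. A grid over the simplex of m stocks grows exponentially with m, which is the cost that motivates an efficient sampling-based implementation.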
On the Competitive Theory and Practice of Portfolio Selection
In Proc. of the 4th Latin American Symposium on Theoretical Informatics (LATIN’00), 2002
Cited by 10 (1 self)
The portfolio selection problem is clearly one of the most fundamental problems in the field of computational finance. Given a set of, say, m stocks (one of which may be "cash"), the natural online problem is to determine a portfolio for the ith trading period based on the sequence of prices (or, equivalently, relative prices) for the preceding i - 1 trading periods. There has been both a growing interest and a growing skepticism concerning the value of a competitive theory of online portfolio selection algorithms. Competitive analysis is based on a worst-case perspective, and such a perspective is inconsistent with the more widely accepted analyses and theories based on statistical assumptions. The competitive framework does (perhaps surprisingly) permit non-trivial upper bounds on relative performance against CBAL-OPT, an optimal offline constant rebalancing portfolio. Perhaps more impressive are some preliminary experimental results showing that certain algorithms that enjoy "respectable" competitive (i.e., worst-case) performance also seem to perform quite well on historical sequences of data. These algorithms and the emerging competitive theory are directly related to studies in information theory and computational learning theory, and indeed some of these algorithms have been pioneered within the information theory and computational learning communities. One goal of this paper is to try to better understand the extent to which competitive portfolio algorithms are indeed "learning". In doing so, we discuss some simple strategies that can adapt to the data sequence. We present a mixture of both theoretical and experimental results. We also present a more inclusive study of the performance of existing and new algorithms with respect to a standard ...
Nonlinear Interpolation Of Topic Models For Language Model Adaptation
In Proceedings of ICSLP-98, 1998
Cited by 9 (1 self)
Topic adaptation for language modeling is concerned with adjusting the probabilities in a language model to better reflect the expected frequencies of topical words for a new document. The language model to be adapted is usually built from large amounts of training text and is considered representative of the current domain. In order to adapt this model for a new document, the topic (or topics) of the new document is first identified. Then, the probabilities of words that are more likely to occur in the identified topic(s) than in general are boosted, and the probabilities of words that are unlikely for the identified topic(s) are suppressed. We present a novel technique for adapting a language model to the topic of a document, using a nonlinear interpolation of n-gram language models. A three-way, mutually exclusive division of the vocabulary into general, on-topic and off-topic word classes is used to combine word predictions from a topic-specific and a general language model. We achieve ...
Language Model Mixtures for Contextual Ad Placement in Personal Blogs
Cited by 2 (0 self)
We introduce a method for content-based advertisement selection for personal blog pages, based on combining multiple representations of the blog. The core idea behind the method is that personal blogs represent individuals, whose interests can be modeled by the language used in the blog itself combined with the language used in related sources of information, such as comments posted to a blog post or the blogger’s community. An evaluation of our ad placement method shows improvement over state-of-the-art ad placement methods that were not designed for blog pages.
Switching Strategies for Sequential Decision Problems With Multiplicative Loss With Application to Portfolios
Cited by 2 (1 self)
A wide variety of problems in signal processing can be formulated such that decisions are made by sequentially taking convex combinations of vector-valued observations, and these convex combinations are then multiplicatively compounded over time. A “universal” approach to such problems might attempt to sequentially achieve the performance of the best fixed convex combination, as might be achievable noncausally by observing all of the outcomes in advance. By permitting different piecewise-fixed strategies within contiguous regions of time, the best algorithm in this broader class would be able to switch between different fixed strategies to adapt to the changing behavior of each individual sequence of outcomes. Without knowledge of the data length or the number of switches necessary, the algorithms developed in this paper can achieve the performance of the best piecewise-fixed strategy that can choose both the partitioning of the sequence of outcomes in time and the best strategy within each time segment. We compete with an exponential number of such partitions, using only complexity linear in the data length, and demonstrate that the regret with respect to the best such algorithm is at most O(ln(n)) in the exponent, where n is the data length. Finally, we extend these results to include finite collections of candidate algorithms, rather than convex combinations, and further investigate the use of an arbitrary side-information sequence. Index Terms: convex combinations, portfolio, sequential decisions, side information, switching, universal.
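Competing with piecewise-fixed choices from a finite collection of candidates can be sketched with a fixed-share style mixture, a standard switching technique swapped in here for illustration; it is not the algorithm developed in this paper. We assume `gains[t][i]` is the multiplicative gain of candidate i in period t (for example, a portfolio's wealth ratio for that period), and `alpha` is a hypothetical switching rate.

```python
def fixed_share_wealth(gains, alpha=0.01):
    """Wealth of a fixed-share mixture over m candidate strategies.

    gains[t][i] is the multiplicative gain of candidate i in period t.
    Each period the mixture invests according to its current weights,
    reweights candidates by their realized gains (a Bayesian update), and
    then moves a fraction alpha of the weight uniformly to all candidates
    so that a candidate that starts performing well can recover quickly.
    """
    m = len(gains[0])
    w = [1.0 / m] * m
    wealth = 1.0
    for g in gains:
        period_gain = sum(wi * gi for wi, gi in zip(w, g))
        wealth *= period_gain
        # Bayesian update; the new weights sum to 1.
        w = [wi * gi / period_gain for wi, gi in zip(w, g)]
        # Share step: mix with the uniform distribution.
        w = [(1 - alpha) * wi + alpha / m for wi in w]
    return wealth


# Candidate 0 is better in the first half, candidate 1 in the second.
gains = [(1.1, 0.9)] * 50 + [(0.9, 1.1)] * 50
print(fixed_share_wealth(gains, alpha=0.01))
```

Either candidate held fixed ends with wealth 0.99 ** 50 (about 0.61), whereas the mixture ends far higher because it shifts its weight after the switch. With `alpha=0` it reduces to a plain Bayesian mixture, whose wealth is the average of the candidates' wealths.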
Domain-specific disambiguation for typing with ambiguous keyboards
Proceedings of the EACL Workshop, 2003
Cited by 1 (1 self)
In this paper, we investigate whether and how domain-specific corpora increase the precision of word disambiguation for typing on an ambiguous keyboard. The basic disambiguation for our ambiguous keyboard with three letter keys relies on language-specific word frequencies from the CELEX lexicon (this study covers English and German). More specific frequency information is extracted from texts in special domains: school homework in three subjects and articles in two different scientific areas. Overall, deploying domain-specific predictions did not always improve performance. As a general solution, we propose an interpolated language model combining both the general and the specific language model. For all our domains, this method achieved good results compared to an ideal prediction on the basis of all available models.
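The interpolated model can be sketched as a linear interpolation of two unigram models, used to rank the lexicon words that share a key sequence. The three-key layout (a-i, j-r, s-z), the function names, the weight `lam` and the toy probabilities are all hypothetical; the paper's models are derived from CELEX and domain corpora.

```python
import string

# Hypothetical 3-key layout: a-i on key 0, j-r on key 1, s-z on key 2.
KEY_OF = {c: i // 9 for i, c in enumerate(string.ascii_lowercase)}


def key_sequence(word):
    """Keys pressed to type `word` on the ambiguous keyboard."""
    return tuple(KEY_OF[c] for c in word.lower())


def disambiguate(keys, p_general, p_specific, lam=0.5):
    """Rank all lexicon words typed with `keys` by the interpolated
    probability lam * P_specific(w) + (1 - lam) * P_general(w)."""
    vocab = set(p_general) | set(p_specific)

    def score(w):
        return lam * p_specific.get(w, 0.0) + (1 - lam) * p_general.get(w, 0.0)

    candidates = [w for w in vocab if key_sequence(w) == tuple(keys)]
    return sorted(candidates, key=score, reverse=True)


# Toy unigram models; "he", "be" and "if" all map to keys (0, 0).
GENERAL = {"he": 0.5, "be": 0.3, "if": 0.2}
SPECIFIC = {"he": 0.2, "be": 0.1, "if": 0.7}
print(disambiguate((0, 0), GENERAL, SPECIFIC))  # ['if', 'he', 'be']
```

Setting `lam=0` recovers the general model's ranking and `lam=1` the domain-specific one; intermediate values trade the two off, which is the point of interpolating rather than replacing the general model.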
Speech Recognition in a Dialog System for Patient Health Monitoring
We describe CARDIAC, a prototype for an intelligent conversational assistant that provides health monitoring for chronic heart failure patients. CARDIAC supports user initiative through its ability to understand natural language and connect it to intention recognition. The spoken language interface allows patients to interact with CARDIAC without special training. We present speech recognition results obtained during an evaluation with fourteen chronic heart failure patients. Keywords: dialog systems; speech recognition; natural language processing; data acquisition; patient self-management; chronic heart failure.

