Results 1  10
of
45
On prediction using variable order Markov models
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 2004
"... This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Cont ..."
Abstract

Cited by 61 (1 self)
 Add to MetaCart
This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average logloss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a “decomposed” CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the LempelZiv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
Multiple Kernel Tracking with SSD
 IN CVPR’04
, 2004
"... Kernelbased objective functions optimized using the mean shift algorithm have been demonstrated as an effective means of tracking in video sequences. The resulting algorithms combine the robustness and invariance properties afforded by traditional densitybased measures of image similarity, while c ..."
Abstract

Cited by 52 (1 self)
 Add to MetaCart
Kernelbased objective functions optimized using the mean shift algorithm have been demonstrated as an effective means of tracking in video sequences. The resulting algorithms combine the robustness and invariance properties afforded by traditional densitybased measures of image similarity, while connecting these techniques to continuous optimization algorithms. This paper
Universal compression of memoryless sources over unknown alphabets
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 2004
"... It has long been known that the compression redundancy of independent and identically distributed (i.i.d.) strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols, and its pattern—the order in which the symbol ..."
Abstract

Cited by 35 (10 self)
 Add to MetaCart
It has long been known that the compression redundancy of independent and identically distributed (i.i.d.) strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols, and its pattern—the order in which the symbols appear. Concentrating on the latter, we show that the patterns of i.i.d. strings over all, including infinite and even unknown, alphabets, can be compressed with diminishing redundancy, both in block and sequentially, and that the compression can be performed in linear time. To establish these results, we show that the number of patterns is the Bell number, that the number of patterns with a given number of symbols is the Stirling number of the second kind, and that the redundancy of patterns can be bounded using results of Hardy and Ramanujan on the number of integer partitions. The results also imply an asymptotically optimal solution for the GoodTuring probabilityestimation problem.
Machine learning methods for predicting failures in hard drives: A multipleinstance application
 Journal of Machine Learning research
, 2005
"... We compare machine learning methods applied to a difficult realworld problem: predicting computer harddrive failure using attributes monitored internally by individual drives. The problem is one of detecting rare events in a time series of noisy and nonparametricallydistributed data. We develop a ..."
Abstract

Cited by 31 (1 self)
 Add to MetaCart
We compare machine learning methods applied to a difficult realworld problem: predicting computer harddrive failure using attributes monitored internally by individual drives. The problem is one of detecting rare events in a time series of noisy and nonparametricallydistributed data. We develop a new algorithm based on the multipleinstance learning framework and the naive Bayesian classifier (miNB) which is specifically designed for the low falsealarm case, and is shown to have promising performance. Other methods compared are support vector machines (SVMs), unsupervised clustering, and nonparametric statistical tests (ranksum and reverse arrangements). The failureprediction performance of the SVM, ranksum and miNB algorithm is considerably better than the threshold method currently implemented in drives, while maintaining low false alarm rates. Our results suggest that nonparametric statistical tests should be considered for learning problems involving detecting rare events in time series data. An appendix details the calculation of ranksum significance probabilities in the case of discrete, tied observations, and we give new recommendations about when the exact calculation should be used instead of the commonlyused normal approximation. These normal approximations may be particularly inaccurate for rare event problems like hard drive failures.
Limit results on pattern entropy
 IEEE Trans. Inf. Theory
, 2006
"... We determine the entropy rate of patterns of certain random processes, bound the speed at which the persymbol pattern entropy converges to this rate, and show that patterns satisfy an asymptotic equipartition property. To derive some of these results we upper bound the probability that the n ′ th v ..."
Abstract

Cited by 17 (3 self)
 Add to MetaCart
We determine the entropy rate of patterns of certain random processes, bound the speed at which the persymbol pattern entropy converges to this rate, and show that patterns satisfy an asymptotic equipartition property. To derive some of these results we upper bound the probability that the n ′ th variable in a random process differs from all preceding ones.
Entropy Inference and the JamesStein Estimator, with Application to Nonlinear Gene Association Networks
"... We present a procedure for effective estimation of entropy and mutual information from smallsample data, and apply it to the problem of inferring highdimensional gene association networks. Specifically, we develop a JamesSteintype shrinkage estimator, resulting in a procedure that is highly effic ..."
Abstract

Cited by 16 (1 self)
 Add to MetaCart
We present a procedure for effective estimation of entropy and mutual information from smallsample data, and apply it to the problem of inferring highdimensional gene association networks. Specifically, we develop a JamesSteintype shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and datagenerating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropybased geneassociation network from gene expression data. A computer program is available that implements the proposed shrinkage estimator. Keywords: entropy, shrinkage estimation, JamesStein estimator, “small n, large p ” setting, mutual information, gene association network
Probabilistic FiniteState Machines  Part I
"... Probabilistic finitestate machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition and machine translatio ..."
Abstract

Cited by 16 (1 self)
 Add to MetaCart
Probabilistic finitestate machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition and machine translation are some of them. In part I of this paper we survey these generative objects and study their definitions and properties. In part II, we will study the relation of probabilistic finitestate automata with other well known devices that generate strings as hidden Markov models and ngrams, and provide theorems, algorithms and properties that represent a current state of the art of these objects.
A lower bound on compression of unknown alphabets
 Theoret. Comput. Sci
, 2005
"... Many applications call for universal compression of strings over large, possibly infinite, alphabets. However, it has long been known that the resulting redundancy is infinite even for i.i.d. distributions. It was recently shown that the redudancy of the strings ’ patterns, which abstract the values ..."
Abstract

Cited by 10 (3 self)
 Add to MetaCart
Many applications call for universal compression of strings over large, possibly infinite, alphabets. However, it has long been known that the resulting redundancy is infinite even for i.i.d. distributions. It was recently shown that the redudancy of the strings ’ patterns, which abstract the values of the symbols, retaining only their relative precedence, is sublinear in the blocklength n, hence the persymbol redundancy diminishes to zero. In this paper we show that pattern redundancy is at least (1.5 log 2 e) n 1/3 bits. To do so, we construct a generating function whose coefficients lower bound the redundancy, and use Hayman’s saddlepoint approximation technique to determine the coefficients ’ asymptotic behavior. 1
Estimation of the number of operating sensors in largescale sensor networks with mobile access
 IEEE Transactions on Signal Processing
, 2006
"... This paper investigates the estimation of the number of operating sensors in a sensor network in which the data collection is made by a mobile access point. We propose an estimator based on the GoodTuring estimator of the missing mass and generalize it to other related problems such as the estimati ..."
Abstract

Cited by 9 (3 self)
 Add to MetaCart
This paper investigates the estimation of the number of operating sensors in a sensor network in which the data collection is made by a mobile access point. We propose an estimator based on the GoodTuring estimator of the missing mass and generalize it to other related problems such as the estimation of the distribution of energy available at sensors. The estimator is analyzed using the theory of large deviations. We present closedform bounds on the large deviation exponent and characterize confidence intervals for the estimator.
MultiStyle Language Model for Web Scale Information Retrieval
"... Web documents are typically associated with many text streams, including the body, the title and the URL that are determined by the authors, and the anchor text or search queries used by others to refer to the documents. Through a systematic large scale analysis on their cross entropy, we show that ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
Web documents are typically associated with many text streams, including the body, the title and the URL that are determined by the authors, and the anchor text or search queries used by others to refer to the documents. Through a systematic large scale analysis on their cross entropy, we show that these text streams appear to be composed in different language styles, and hence warrant respective language models to properly describe their properties. We propose a language modeling approach to Web document retrieval in which each document is characterized by a mixture model with components corresponding to the various text streams associated with the document. Immediate issues for such a mixture model arise as all the text streams are not always present for the documents, and they do not share the same lexicon, making it challenging to properly combine the statistics from the mixture components. To address these issues, we introduce an “openvocabulary” smoothing technique so that all the component language models have the same cardinality and their scores can simply be linearly combined. To ensure that the approach can cope with Web scale applications, the model training algorithm is designed to require no labeled data and can be fully automated with few heuristics and no empirical parameter tunings. The evaluation on Web document ranking tasks shows that the component language models indeed have varying degrees of capabilities as predicted by the crossentropy analysis, and the combined mixture model outperforms the stateoftheart BM25F based system.