The Viterbi algorithm
 Proceedings of the IEEE
, 1973
Cited by 738 (3 self)
Design of a Linguistic Postprocessor using Variable Memory Length Markov Models
 In International Conference on Document Analysis and Recognition
, 1995
Cited by 48 (1 self)
We present the design of a linguistic postprocessor for character recognizers. The central module of our system is a trainable variable memory length Markov model (VLMM), which predicts the next character given a variable-length window of past characters. The overall system is composed of several finite state automata, including the main VLMM and a proper noun VLMM. The best model reported in the literature (Brown et al., 1992) achieves 1.75 bits per character on the Brown corpus. On that same corpus, our model, trained on 10 times less data, reaches 2.19 bits per character and is 200 times smaller (≈ 160,000 parameters). The model was designed for handwriting recognition applications but can be used for other OCR problems and speech recognition.
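The next-character prediction and the bits-per-character figure mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' VLMM: it swaps in a plain count-based model that backs off to the longest previously seen context and uses add-one smoothing. The function names, the `max_order` parameter, and the toy training string are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train(text, max_order):
    """Count next-character occurrences for every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i, ch in enumerate(text):
        for k in range(min(max_order, i) + 1):
            counts[text[i - k:i]][ch] += 1
    return counts

def next_char_prob(counts, history, ch, alphabet_size, max_order):
    """Probability of ch after history, using the longest context seen in training.

    Add-one smoothing is an illustrative choice, not the paper's estimator.
    """
    for k in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - k:]
        if context in counts:
            seen = counts[context]
            return (seen[ch] + 1) / (sum(seen.values()) + alphabet_size)
    return 1.0 / alphabet_size  # untrained model: fall back to uniform

def bits_per_character(counts, text, alphabet_size, max_order):
    """Average number of bits needed to encode each character of text."""
    total = 0.0
    for i, ch in enumerate(text):
        p = next_char_prob(counts, text[:i], ch, alphabet_size, max_order)
        total -= math.log2(p)
    return total / len(text)

# Hypothetical toy data: a strictly alternating two-letter sequence.
counts = train("ab" * 50, max_order=2)
score = bits_per_character(counts, "abab", alphabet_size=2, max_order=2)
```

On this toy sequence the score falls well below the 1 bit per character of a uniform two-letter model, because every character after the first is almost fully determined by its context.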
Good applications for crummy machine translation
 Machine Translation
, 1993
Cited by 42 (0 self)
Ideally, we might hope to improve the performance of our MT systems by improving the system, but it might be even more important to improve performance by looking for a more appropriate application. A survey of the literature on evaluation of MT systems seems to suggest that the success of the evaluation often depends very strongly on the selection of an appropriate application. If the application is well-chosen, then it often becomes fairly clear how the system should be evaluated. Moreover, the evaluation is likely to make the system look good. Conversely, if the application is not clearly identified (or worse, if the application is poorly chosen), then it is often very difficult to find a satisfying evaluation paradigm. We begin our discussion with a brief review of some evaluation metrics that have been tried in the past and conclude that it is difficult to identify a satisfying evaluation paradigm that will make sense over all possible applications. It is probably wise to identify the application first, and then we will be in a much better position to address evaluation questions. The discussion will then turn to the main point, an essay on how to pick a good niche application for state-of-the-art (crummy) machine translation.
One-Dimensional and Multi-Dimensional Substring Selectivity Estimation
, 2000
Cited by 33 (7 self)
In this paper, we use pruned count-suffix trees (PSTs) as the basic data structure for substring selectivity estimation. For the 1-D problem, we present a novel technique called MO (Maximal Overlap). We then develop and analyze two 1-D estimation algorithms, MOC and MOLC, based on MO and a constraint-based characterization of all possible completions of a given PST. For the k-D problem, we first generalize PSTs to multiple dimensions and develop a space- and time-efficient probabilistic algorithm to construct k-D PSTs directly. We then show how to extend MO to multiple dimensions. Finally, we demonstrate, both analytically and experimentally, that MO is both practical and substantially superior to competing algorithms.
Key words: String selectivity, Maximal overlap, Short memory property, Pruned count-suffix tree
1 Introduction
One often wishes to obtain a quick estimate of the number of times a particular substring occurs in a database. A traditional application is for optimizing SQL queries with the like predicate (e.g., name like %jones%). Such predicates are pervasive in data warehouse queries, because of the presence of "unclean" data [HS95]. With the growing importance of XML, LDAP directories, and other text-based information stores on the Internet, substring queries are becoming increasingly common. Furthermore, in many situations with these applications, a query may specify substrings to be matched on multiple alphanumeric attributes or dimensions. The query [(name like %jones%) AND (tel like 973360%) AND (mail like %research.att.com)] is one example. Often the attributes mentioned in these kinds of multidimensional queries may be correlated. For the above example, because of the geographical location of the research labs, people that satisfy the query (mail like %research.att.com) may have an unexpectedly high probability to sat...
Substring selectivity estimation
 In Proceedings of the ACM Symposium on Principles of Database Systems
, 1999
Cited by 30 (4 self)
We study the problem of estimating selectivity of approximate substring queries. Its importance in databases is ever increasing as more and more data are input by users and are integrated with many typographical errors and different spelling conventions. To begin with, we consider edit distance for the similarity between a pair of strings. Based on information stored in an extended N-gram table, we propose two estimation algorithms, MOF and LBS, for the task. The latter extends the former with ideas from set hashing signatures. The experimental results show that MOF is a lightweight algorithm that gives fairly accurate estimations. However, if more space is available, LBS can give better accuracy than MOF and other baseline methods. Next, we extend the proposed solution to other similarity predicates, SQL LIKE operator and Jaccard similarity.
Predictability, Complexity, and Learning
, 2001
Cited by 30 (2 self)
We define predictive information Ipred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: Ipred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then Ipred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of Ipred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
Let Your Fingers do the Spelling: Implicit disambiguation of words spelled with the telephone keypad
, 1991
Cited by 12 (0 self)
One way to enter words into an interactive computer system is to spell them with the letters on a telephone keypad. Although each button has three letters, the system designer can often supply the system with enough additional information that it can select the intended word without additional input from the user. This is called implicit disambiguation. This paper examines the obstacles to implicit disambiguation and describes two different kinds of knowledge that can make it possible.
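The ambiguity this abstract refers to can be made concrete with a small sketch: every word maps to one digit string, and several words may share it. The keypad layout (three letters per button, Q and Z omitted), the function names, and the vocabulary are assumptions for illustration, and the vocabulary lookup stands in for whatever additional knowledge the paper actually uses.

```python
from collections import defaultdict

# Assumed layout: three letters per button, as the abstract states.
# Q and Z are left out, which is an assumption of this sketch.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "prs", "8": "tuv", "9": "wxy",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}

def to_keys(word):
    """Spell a word as the sequence of keypad buttons a user would press."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())

def build_index(vocabulary):
    """Group vocabulary words by their button sequence."""
    index = defaultdict(list)
    for word in vocabulary:
        index[to_keys(word)].append(word)
    return index

def candidates(keys, index):
    """Return every known word spelled by this button sequence."""
    return index.get(keys, [])

# Hypothetical vocabulary chosen to show both unique and colliding sequences.
VOCAB = ["good", "home", "gone", "hood", "cat", "bat", "dog"]
INDEX = build_index(VOCAB)

unique = candidates("364", INDEX)      # only "dog" is spelled 3-6-4
ambiguous = candidates("4663", INDEX)  # "good", "home", "gone", "hood" collide
```

When a sequence has a single candidate, the word is recovered with no further input; collisions such as 4663 are the cases where extra knowledge is needed.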
Information theory and learning: a physical approach
, 2000
Cited by 3 (3 self)
We try to establish a unified information-theoretic approach to learning and to explore some of its applications. First, we define predictive information as the mutual information between the past and the future of a time series, discuss its behavior as a function of the length of the series, and explain how other quantities of interest studied previously in learning theory, as well as in dynamical systems and statistical mechanics, emerge from this universally definable concept. We then prove that predictive information provides the unique measure for the complexity of dynamics underlying the time series and show that there are classes of models characterized by power-law growth of the predictive information that are qualitatively more complex than any of the systems that have been investigated before. Further, we investigate numerically the learning of a nonparametric probability density, which is an example of a problem with power-law complexity, and show that the proper Bayesian formulation of this problem provides for the ‘Occam’ factors that punish overly complex models and thus allow one to learn
MPSGs (Multiattribute Prediction Suffix Graphs)
, 2000
this article is not enough. To study this kind of problem, several values must be considered, for example economic changes, prices of other articles, etc. Nevertheless, neither PSAs nor Markov chains model sequences conditioned by parallel sequences. The only way to apply Markov chains to this kind of problem is to use the Cartesian product of the alphabets of the problem's attributes as the alphabet of the Markov chain. However, this makes the length of the Markov chain grow exponentially with the number of attributes. Moreover, the cardinality of the alphabet grows significantly, which increases the number of states. A large number of states decreases the number of samples used to compute each probability in the model and therefore decreases the confidence of the model. Using PSAs to model this kind of problem has exponential complexity too. On the one hand, PSAs have all the problems of Markov chains. On the other, PSAs cannot handle different memory lengths for each attribute needed to describe the model: the attributes do not all need the same memory length, but if the Cartesian product of the attribute alphabets is used, all attributes must have the same memory length. In this paper, a variable memory length multiattribute Markov chain is described. This model is called MPSA (Multiattribute Probabilistic Suffix Automata). The PSA model is thus a subclass of the MPSA model. Because an MPSA is difficult to learn, this paper describes how to learn sequences generated by one attribute when the sequences generated by the remaining attributes are known. The model thus makes it possible to analyze independently the memory length needed in each attribute to determine the next-symbol probability...
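The growth argument in this abstract can be checked with simple counting. The sketch below only counts symbols and contexts; it does not implement the MPSA model. The attribute alphabets, the function names, and the per-attribute memory lengths are hypothetical, and the per-attribute count is an illustrative contrast rather than a formula taken from the paper.

```python
from itertools import product
from math import prod

def product_alphabet(alphabets):
    """Cartesian product of the attribute alphabets: one symbol per combination."""
    return list(product(*alphabets))

def product_chain_contexts(alphabet_sizes, memory_length):
    """Contexts of a fixed-order chain over the product alphabet.

    Every attribute is forced to share the same memory length.
    """
    return prod(alphabet_sizes) ** memory_length

def per_attribute_contexts(alphabet_sizes, memory_lengths):
    """Contexts when each attribute keeps its own memory length (illustrative)."""
    return prod(size ** length for size, length in zip(alphabet_sizes, memory_lengths))

# Hypothetical attributes with 3, 4 and 2 symbols.
alphabets = ["abc", "wxyz", "01"]
sizes = [len(a) for a in alphabets]

joint_symbols = len(product_alphabet(alphabets))        # 3 * 4 * 2 = 24
joint_contexts = product_chain_contexts(sizes, 2)       # 24 ** 2 = 576
mixed_contexts = per_attribute_contexts(sizes, [2, 1, 0])  # 9 * 4 * 1 = 36
```

With the same training data, 576 contexts leave far fewer samples per probability estimate than 36, which is the loss of confidence the abstract describes.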