Results 11 – 20 of 40
Predicting the Future of Discrete Sequences From Fractal Representations of the Past
, 2001
Abstract

Cited by 37 (11 self)
We propose a novel approach for building finite memory predictive models similar in spirit to variable memory length Markov models (VLMMs). The models are constructed by first transforming the n-block structure of the training sequence into a geometric structure of points in a unit hypercube, such that the longer the common suffix shared by any two n-blocks, the closer their point representations lie.
Speech Recognition And The Frequency Of Recently Used Words: A Modified Markov Model For Natural Language
, 1988
Abstract

Cited by 30 (0 self)
Speech recognition systems incorporate a language model which, at each stage of the recognition task, assigns a probability of occurrence to each word in the vocabulary. A class of Markov language models identified by Jelinek has achieved considerable success in this domain. A modification of the Markov approach, which assigns higher probabilities to recently used words, is proposed and tested against a pure Markov model. Parameter calculation and comparison of the two models both involve use of the LOB Corpus of tagged modern English.
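The recency modification described above resembles what later literature calls a cache language model. As a minimal sketch (not the paper's actual formulation, and with illustrative counts), a static unigram distribution can be linearly interpolated with a cache built from recently used words:

```python
from collections import Counter

def cache_lm(static_counts, recent_words, lam=0.8):
    """Blend a static unigram model with a recency cache:
    P(w) = lam * P_static(w) + (1 - lam) * P_cache(w)."""
    n_static = sum(static_counts.values())
    cache = Counter(recent_words)
    n_cache = sum(cache.values())
    vocab = set(static_counts) | set(cache)
    probs = {}
    for w in vocab:
        p_s = static_counts.get(w, 0) / n_static
        p_c = cache.get(w, 0) / n_cache if n_cache else 0.0
        probs[w] = lam * p_s + (1 - lam) * p_c
    return probs

# Hypothetical counts: "cat" is rare in the static model but recent in use.
static = {"the": 50, "cat": 5, "sat": 5, "mat": 5, "dog": 35}
recent = ["cat", "cat", "sat"]
p = cache_lm(static, recent)
```

Because both components are proper distributions, the interpolated model remains one, while recently used words receive a boosted probability.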
Recurrent Neural Networks With Small Weights Implement Definite Memory Machines
 NEURAL COMPUTATION
, 2003
Abstract

Cited by 24 (6 self)
Recent experimental studies indicate that recurrent neural networks initialized with 'small' weights are inherently biased towards definite memory machines (Tino, Cernansky, Benuskova, 2002a; Tino, Cernansky, Benuskova, 2002b). This paper establishes a theoretical counterpart: the transition function of a recurrent network with small weights and a 'squashing' activation function is a contraction. We prove that recurrent networks with a contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite memory machine.
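The contraction claim can be checked numerically on a toy network. The sketch below (illustrative architecture and constants, not the paper's construction) uses a tanh state transition; since |tanh'| ≤ 1, the map's Lipschitz constant in the state is bounded by the spectral norm of the recurrent weight matrix, which is below 1 for small weights:

```python
import numpy as np

def transition(W, U, b, s, x):
    """State transition of a simple recurrent net with a 'squashing'
    (tanh) activation: s_next = tanh(W s + U x + b)."""
    return np.tanh(W @ s + U @ x + b)

rng = np.random.default_rng(0)
n = 4
W = 0.1 * rng.standard_normal((n, n))   # small recurrent weights
U = 0.1 * rng.standard_normal((n, n))
b = 0.1 * rng.standard_normal(n)
x = rng.standard_normal(n)              # a fixed input encoding

# |tanh'| <= 1, so s -> transition(W, U, b, s, x) has Lipschitz constant
# at most the spectral norm of W; with small weights this is < 1.
L = np.linalg.norm(W, 2)

s1, s2 = rng.standard_normal(n), rng.standard_normal(n)
d_out = np.linalg.norm(transition(W, U, b, s1, x) - transition(W, U, b, s2, x))
d_in = np.linalg.norm(s1 - s2)
# contraction: d_out <= L * d_in < d_in
```

Because the state forgets its past geometrically fast under such a map, only a bounded suffix of the input sequence can influence the output, which is the intuition behind the definite-memory-machine approximation.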
The Use Of Linguistic Hierarchies In Speech Understanding
 IN PROC. ICSLP
, 1998
Abstract

Cited by 16 (6 self)
This paper describes two related systems which provide frameworks for encoding linguistic knowledge into formal rules within the context of a trainable probabilistic model. The first system, TINA [33], drives top-down from sentence-level structure, terminating in either words or syllables. Its main purpose is to provide a meaning representation for the sentence. The other system, ANGIE [36], operates bottom-up from phonetic or orthographic units, characterizing the substructure of syllables/words. It provides a framework for both phonological rule modelling and letter-to-sound/sound-to-letter transformations. The two systems logically converge on the syllable or word layer. We have recently been successful in integrating their combined constraints into a recognizer search, achieving considerable improvement in understanding accuracy [9, 23]. In this paper, I will look both toward the past and the future, identifying and motivating the decisions that were made in the design of TINA and ANGIE and the associated rule formalisms, and contemplating various remaining open research issues.
A New Approach to Word Sense Disambiguation
 In Proceedings of the ARPA Workshop on Human Language Technology
, 1994
Abstract

Cited by 15 (5 self)
This paper presents and evaluates models created according to a schema that provides a description of the joint distribution of the values of sense tags and contextual features that is potentially applicable to a wide range of content words. The models are evaluated through a series of experiments, the results of which suggest that the schema is particularly well suited to nouns but that it is also applicable to words in other syntactic categories.

1. INTRODUCTION

Assigning sense tags to the words in a text can be viewed as a classification problem. A probabilistic classifier assigns to each word the tag that has the highest estimated probability of having occurred in the given context. Designing a probabilistic classifier for word-sense disambiguation includes two main subtasks: specifying an appropriate model and estimating the parameters of that model. The former involves selecting informative contextual features (such as collocations) and describing the joint distribution of the...
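One common instantiation of such a probabilistic sense classifier (not necessarily the schema evaluated in the paper) is a naive-Bayes model over contextual features, which picks the sense with the highest estimated posterior. A minimal sketch with hypothetical training data for the ambiguous word "bank":

```python
from collections import defaultdict
import math

def train_nb(examples, alpha=1.0):
    """examples: list of (sense, features). Returns smoothed count tables,
    assuming conditional independence of features given the sense."""
    sense_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sense, feats in examples:
        sense_counts[sense] += 1
        for f in feats:
            feat_counts[sense][f] += 1
            vocab.add(f)
    return sense_counts, feat_counts, vocab, alpha

def classify(model, feats):
    """Return the sense tag with the highest log-posterior."""
    sense_counts, feat_counts, vocab, alpha = model
    total = sum(sense_counts.values())
    best, best_lp = None, -math.inf
    for sense, sc in sense_counts.items():
        lp = math.log(sc / total)                      # prior
        denom = sum(feat_counts[sense].values()) + alpha * len(vocab)
        for f in feats:                                # add-alpha likelihoods
            lp += math.log((feat_counts[sense].get(f, 0) + alpha) / denom)
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

# Hypothetical collocational features for two senses of "bank"
data = [("river", ["water", "shore"]), ("river", ["water", "fish"]),
        ("finance", ["money", "loan"]), ("finance", ["money", "account"])]
model = train_nb(data)
```

The independence assumption is a simplification; richer schemas model interactions among selected features rather than treating them all as independent.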
An Iterative, DP-based Search Algorithm for Statistical Machine Translation
 In Proceedings of the International Conference on Spoken Language Processing (ICSLP'98)
, 1998
Abstract

Cited by 15 (5 self)
The increasing interest in the statistical approach to Machine Translation is due to the development of effective algorithms for training the probabilistic models proposed so far. However, one of the open problems with Statistical Machine Translation is the design of efficient algorithms for translating a given input string. For some interesting models, only (good) approximate solutions can be found. Recently a Dynamic Programming-like algorithm has been introduced which computes approximate solutions for some models. These solutions can be improved by using an iterative algorithm that refines the successive solutions and uses a smoothing technique for some probabilistic distributions of the models, based on an interpolation of different distributions. The technique resulting from this combination has been tested on the "Tourist Task" corpus, which was generated in a semi-automated way. The best results achieved were a word-error rate of 9.3% and a sentence-error rate of 44.4%.
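The smoothing step mentioned above, interpolating several distributions, can be sketched generically (the distributions and weights below are invented for illustration; the paper's actual model distributions are not specified in the abstract):

```python
def interpolate(dists, weights):
    """Smooth by linearly interpolating probability distributions over a
    shared vocabulary; weights must sum to 1, so the result is a
    distribution too."""
    assert abs(sum(weights) - 1.0) < 1e-9
    vocab = set().union(*dists)
    return {w: sum(lam * d.get(w, 0.0) for lam, d in zip(weights, dists))
            for w in vocab}

sharp = {"casa": 0.7, "hotel": 0.3}                  # sparse, specific estimate
broad = {"casa": 0.2, "hotel": 0.3, "playa": 0.5}    # smoother fallback
smoothed = interpolate([sharp, broad], [0.6, 0.4])
# events unseen by the sharp estimate now receive nonzero probability
```

This is the standard reason for interpolation in such systems: it prevents the search from assigning zero probability to events absent from the sparser distribution.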
Category-Based Statistical Language Models
, 1997
Abstract

Cited by 14 (2 self)
this document. The first section, in chapter 3, develops a model for syntactic dependencies based on word-category n-grams. The second section, in chapter 4, extends this model by allowing short-range word relations to be captured through the incorporation of selected word n-grams.
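A category-based n-gram model in its simplest form factors the word bigram through category (class) membership, P(w_i | w_{i-1}) ≈ P(w_i | c_i) · P(c_i | c_{i-1}). A minimal sketch with invented categories and probabilities:

```python
def class_bigram_prob(w, c, c_prev, p_word_given_class, p_class_given_class):
    """Category-based bigram: approximate P(w_i | w_{i-1}) by
    P(w_i | c_i) * P(c_i | c_{i-1}), where c is the word's category."""
    return p_word_given_class[c][w] * p_class_given_class[c_prev][c]

# Hypothetical two-category model
p_w_c = {"DET": {"the": 0.7, "a": 0.3},
         "NOUN": {"cat": 0.5, "dog": 0.5}}
p_c_c = {"DET": {"NOUN": 0.9, "DET": 0.1},
         "NOUN": {"DET": 0.4, "NOUN": 0.6}}

# probability of "cat" when the previous word was a determiner
p = class_bigram_prob("cat", "NOUN", "DET", p_w_c, p_c_c)
```

The category factorization drastically reduces the parameter count relative to a word n-gram, which is why the document's chapter 4 then re-introduces selected word n-grams to recover short-range lexical detail.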
Hierarchical Pitman-Yor language models for ASR in meetings
 In Proceedings of IEEE ASRU International Conference
, 2007
Abstract

Cited by 9 (3 self)
In this paper we investigate the application of a hierarchical Bayesian language model (LM) based on the Pitman-Yor process for automatic speech recognition (ASR) of multi-party meetings. The hierarchical Pitman-Yor language model (HPYLM) provides a Bayesian interpretation of LM smoothing. An approximation to the HPYLM recovers the exact formulation of the interpolated Kneser-Ney smoothing method in n-gram models. This paper focuses on the application and scalability of the HPYLM on a practical large-vocabulary ASR system. Experimental results on NIST RT06s evaluation meeting data verify that the HPYLM is a competitive and promising language modeling technique, which consistently performs better than interpolated Kneser-Ney and modified Kneser-Ney n-gram LMs in terms of both perplexity and word error rate.
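The interpolated Kneser-Ney smoothing that an approximated HPYLM recovers can be sketched for bigrams: discount every observed bigram count by d, and redistribute the freed mass via a continuation distribution based on how many distinct contexts each word follows. This is a minimal illustrative implementation, not the paper's HPYLM code:

```python
from collections import Counter

def kneser_ney_bigram(tokens, d=0.75):
    """Interpolated Kneser-Ney for bigrams. Returns a function prob(w, u)
    estimating P(w | u) for contexts u seen in training."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context = Counter(tokens[:-1])                 # c(u)
    # continuation count: number of distinct left contexts for each word
    cont = Counter(w for (_, w) in bigrams)
    n_bigram_types = len(bigrams)
    followers = Counter(u for (u, _) in bigrams)   # distinct continuations of u

    def prob(w, u):
        discounted = max(bigrams[(u, w)] - d, 0) / context[u]
        lam = d * followers[u] / context[u]        # mass freed by discounting
        p_cont = cont[w] / n_bigram_types
        return discounted + lam * p_cont
    return prob

toks = "the cat sat on the mat the cat ran".split()
p = kneser_ney_bigram(toks)
total = sum(p(w, "the") for w in set(toks))  # should sum to 1 over the vocab
```

The continuation distribution, rather than raw unigram frequency, is the characteristic Kneser-Ney idea, and it emerges naturally from the Pitman-Yor process's discounting behavior.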
Constructing FiniteContext Sources From Fractal Representations of Symbolic Sequences
, 1998
Abstract

Cited by 6 (4 self)
We propose a novel approach to constructing predictive models on long complex symbolic sequences. The models are constructed by first transforming the training sequence n-block structure into a spatial structure of points in a unit hypercube. The transformation between the symbolic and Euclidean spaces embodies a natural smoothness assumption (n-blocks with long common suffixes are likely to produce similar continuations) in that the longer the common suffix shared by any two n-blocks, the closer their point representations lie. Finding a set of prediction contexts is then formulated as a resource allocation problem solved by vector-quantizing the spatial representation of the training sequence n-block structure. Our predictive models are similar in spirit to variable memory length Markov models (VLMMs). We compare the proposed models with both the classical and variable memory length Markov models on two chaotic symbolic sequences with different levels of subsequence distribution ...
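The suffix-preserving mapping can be sketched with an iterated function system in the style of a chaos-game representation (an illustrative construction consistent with the abstract, not necessarily the authors' exact map): each symbol contracts the current point toward that symbol's corner of the hypercube, so later symbols dominate the final position and blocks sharing a long suffix land close together.

```python
def block_to_point(block, corners, k=0.5):
    """Map an n-block to a point in the unit hypercube by iterating
    x <- k*x + (1-k)*corner(symbol) over the block's symbols. The last
    symbols contribute the largest terms, so blocks with long common
    suffixes map to nearby points."""
    dim = len(next(iter(corners.values())))
    x = [0.5] * dim                       # start at the hypercube center
    for s in block:
        c = corners[s]
        x = [k * xi + (1 - k) * ci for xi, ci in zip(x, c)]
    return x

corners = {"a": (0.0,), "b": (1.0,)}      # two-symbol alphabet on [0, 1]
p1 = block_to_point("aabb", corners)
p2 = block_to_point("babb", corners)      # shares suffix "abb" with p1
p3 = block_to_point("abba", corners)      # shares no suffix with p1
```

With contraction ratio k = 0.5, two blocks agreeing on their last m symbols differ by at most 2^-m per coordinate, which is exactly the geometric version of the smoothness assumption; prediction contexts are then obtained by vector-quantizing these points.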
A Framework for Data Prefetching using Offline Training of Markovian Predictors
 In 20th International Conference on Computer Design (ICCD 2002)
, 2002
Abstract

Cited by 5 (1 self)
An important technique for alleviating the memory bottleneck is data prefetching. Data prefetching solutions have been proposed ranging from pure software approaches, which insert prefetch instructions through program analysis, to purely hardware mechanisms. The degree of success of these techniques depends on the nature of the application. The need for innovative approaches is growing rapidly with the introduction of applications, such as object-oriented applications, that show dynamically changing memory access behavior. In this paper, we propose a novel framework for data prefetchers that are trained offline using smart learning algorithms to produce prediction models that capture hidden memory access patterns. Once built, these prediction models are loaded into a data prefetching unit in the CPU at the appropriate point during runtime to drive the prefetching. On average, using a table of about 8 KB, we were able to achieve a prediction accuracy of about 68% with our proposed learning method, and performance was boosted by about 37% on average on the benchmarks we tested. Furthermore, we believe our proposed framework is amenable to other predictors and can be applied as a phase of a profiling-optimizing compiler.
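A first-order Markovian predictor of the kind such frameworks train offline can be sketched as a table mapping each observed address to its most frequent successor (a simplified illustration with an invented trace, not the paper's learning method):

```python
from collections import defaultdict, Counter

class MarkovPrefetcher:
    """Table-based first-order Markov predictor: trained offline on an
    address trace, it records which address most often followed each
    address, and predicts that successor at runtime."""
    def __init__(self):
        self.table = defaultdict(Counter)

    def train(self, trace):
        for cur, nxt in zip(trace, trace[1:]):
            self.table[cur][nxt] += 1

    def predict(self, addr):
        """Most likely next address, or None for an unseen address."""
        if addr not in self.table:
            return None
        return self.table[addr].most_common(1)[0][0]

# Hypothetical trace with a repeating pointer-chasing pattern
trace = [0x100, 0x140, 0x180, 0x100, 0x140, 0x180, 0x100, 0x140]
pf = MarkovPrefetcher()
pf.train(trace)
```

A hardware table would cap the number of entries and successors per entry (the paper reports about 8 KB of table state); the offline-training idea is that the counts are gathered during profiling runs and only the finished table is loaded into the prefetch unit.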