Results 11-20 of 37
Predicting the Future of Discrete Sequences From Fractal Representations of the Past
, 2001
Abstract

Cited by 29 (10 self)
We propose a novel approach for building finite memory predictive models similar in spirit to variable memory length Markov models (VLMMs). The models are constructed by first transforming the n-block structure of the training sequence into a geometric structure of points in a unit hypercube, such that the longer the common suffix shared by any two n-blocks, the closer their point representations lie.
Speech Recognition And The Frequency Of Recently Used Words: A Modified Markov Model For Natural Language
, 1988
Abstract

Cited by 26 (0 self)
Speech recognition systems incorporate a language model which, at each stage of the recognition task, assigns a probability of occurrence to each word in the vocabulary. A class of Markov language models identified by Jelinek has achieved considerable success in this domain. A modification of the Markov approach, which assigns higher probabilities to recently used words, is proposed and tested against a pure Markov model. Parameter calculation and comparison of the two models both involve use of the LOB Corpus of tagged modern English.
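A recency-weighted model of this kind can be sketched as an interpolation between a base Markov estimate and a cache of recently used words. This is a hypothetical illustration only; the function names, window, and interpolation weight below are not taken from the paper.

```python
from collections import Counter

def cache_lm_prob(word, base_prob, recent_words, lam=0.8):
    """Interpolate a base Markov probability with a recency cache:
    P(w) = lam * P_markov(w) + (1 - lam) * P_cache(w),
    where P_cache is the relative frequency of w among
    recently used words."""
    cache = Counter(recent_words)
    p_cache = cache[word] / len(recent_words) if recent_words else 0.0
    return lam * base_prob + (1 - lam) * p_cache

# A recently used word gets a boost over its base estimate.
history = ["the", "model", "assigns", "model", "a"]
boosted = cache_lm_prob("model", 0.01, history)  # word in the cache
plain = cache_lm_prob("xyzzy", 0.01, history)    # word not in the cache
```

With these (made-up) numbers, `boosted` is 0.8 * 0.01 + 0.2 * (2/5) = 0.088, versus 0.008 for the uncached word.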
Recurrent Neural Networks With Small Weights Implement Definite Memory Machines
 NEURAL COMPUTATION
, 2003
Abstract

Cited by 23 (6 self)
Recent experimental studies indicate that recurrent neural networks initialized with 'small' weights are inherently biased towards definite memory machines (Tino, Cernansky, Benuskova, 2002a; Tino, Cernansky, Benuskova, 2002b). This paper establishes a theoretical counterpart: the transition function of a recurrent network with small weights and a 'squashing' activation function is a contraction. We prove that recurrent networks with a contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite memory machine.
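The contraction property can be checked numerically: a squashing activation such as tanh is 1-Lipschitz, so the Lipschitz constant of the state map is bounded by the spectral norm of the recurrent weight matrix. A minimal sketch, in which the matrix sizes and the 0.1 weight scale are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Small recurrent weights: spectral norm well below 1.
U = 0.1 * rng.standard_normal((4, 4))  # recurrent weights
W = rng.standard_normal((4, 3))        # input weights
x = rng.standard_normal(3)             # a fixed input symbol encoding

def step(h):
    """One recurrent step with a squashing (tanh) activation."""
    return np.tanh(W @ x + U @ h)

# Since tanh is 1-Lipschitz, h -> step(h) contracts at rate at most
# ||U||_2 < 1, so the influence of distant past inputs decays.
h1, h2 = rng.standard_normal(4), rng.standard_normal(4)
ratio = np.linalg.norm(step(h1) - step(h2)) / np.linalg.norm(h1 - h2)
spec = np.linalg.norm(U, 2)  # spectral norm of the recurrent matrix
```

The observed contraction `ratio` never exceeds `spec`, which is the bound the paper's argument rests on.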
The Use Of Linguistic Hierarchies In Speech Understanding
 IN PROC. ICSLP
, 1998
Abstract

Cited by 16 (6 self)
This paper describes two related systems which provide frameworks for encoding linguistic knowledge into formal rules within the context of a trainable probabilistic model. The first system, TINA [33], drives top-down from sentence-level structure, terminating in either words or syllables. Its main purpose is to provide a meaning representation for the sentence. The other system, ANGIE [36], operates bottom-up from phonetic or orthographic units, characterizing the substructure of syllables/words. It provides a framework for both phonological rule modelling and letter-to-sound/sound-to-letter transformations. The two systems logically converge on the syllable or word layer. We have recently been successful in integrating their combined constraint into a recognizer search, achieving considerable improvement in understanding accuracy [9, 23]. In this paper, I will look both toward the past and the future, identifying and motivating the decisions that were made in the design of TINA and ANGIE and the associated rule formalisms, and contemplating various remaining open research issues.
A New Approach to Word Sense Disambiguation
 In Proceedings of the ARPA Workshop on Human Language Technology
, 1994
Abstract

Cited by 14 (5 self)
This paper presents and evaluates models created according to a schema that provides a description of the joint distribution of the values of sense tags and contextual features that is potentially applicable to a wide range of content words. The models are evaluated through a series of experiments, the results of which suggest that the schema is particularly well suited to nouns but that it is also applicable to words in other syntactic categories.

1. INTRODUCTION

Assigning sense tags to the words in a text can be viewed as a classification problem. A probabilistic classifier assigns to each word the tag that has the highest estimated probability of having occurred in the given context. Designing a probabilistic classifier for word-sense disambiguation includes two main subtasks: specifying an appropriate model and estimating the parameters of that model. The former involves selecting informative contextual features (such as collocations) and describing the joint distribution of the...
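As a concrete, hypothetical instance of such a probabilistic classifier, a naive-Bayes-style model over sense tags and contextual features might look like the sketch below. The feature-independence assumption is our simplification for illustration, not the paper's schema, and the toy data is invented.

```python
import math
from collections import defaultdict

def train(tagged_examples):
    """Estimate sense counts and per-sense feature counts
    from (sense, features) training pairs."""
    sense_count = defaultdict(int)
    feat_count = defaultdict(lambda: defaultdict(int))
    for sense, feats in tagged_examples:
        sense_count[sense] += 1
        for f in feats:
            feat_count[sense][f] += 1
    return sense_count, feat_count

def classify(feats, sense_count, feat_count, alpha=1.0):
    """Pick the sense maximizing P(sense) * prod_f P(f | sense),
    with add-alpha smoothing of the feature probabilities."""
    total = sum(sense_count.values())
    best, best_score = None, float("-inf")
    for sense, n in sense_count.items():
        score = math.log(n / total)
        for f in feats:
            score += math.log((feat_count[sense][f] + alpha) / (n + 2 * alpha))
        if score > best_score:
            best, best_score = sense, score
    return best

data = [("bank/money", ["deposit", "loan"]),
        ("bank/money", ["loan", "interest"]),
        ("bank/river", ["water", "shore"])]
sc, fc = train(data)
```

Here `classify(["loan"], sc, fc)` selects the money sense and `classify(["water"], sc, fc)` the river sense, purely from the contextual feature counts.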
Category-Based Statistical Language Models
, 1997
Abstract

Cited by 14 (2 self)
this document. The first section, in chapter 3, develops a model for syntactic dependencies based on word-category n-grams. The second section, in chapter 4, extends this model by allowing short-range word relations to be captured through the incorporation of selected word n-grams.
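The word-category n-gram idea can be sketched as factoring the bigram probability through classes, P(w_i | w_{i-1}) = P(w_i | c_i) * P(c_i | c_{i-1}). The toy lexicon and probabilities below are invented for illustration.

```python
def class_bigram_prob(w_prev, w, word_class, p_word_given_class, p_class_bigram):
    """Category-based bigram:
    P(w | w_prev) = P(w | class(w)) * P(class(w) | class(w_prev))."""
    c_prev, c = word_class[w_prev], word_class[w]
    return p_word_given_class[(w, c)] * p_class_bigram[(c, c_prev)]

# Hypothetical two-category lexicon with hand-set probabilities.
word_class = {"the": "DET", "a": "DET", "dog": "NOUN", "cat": "NOUN"}
p_word_given_class = {("dog", "NOUN"): 0.5, ("cat", "NOUN"): 0.5,
                      ("the", "DET"): 0.7, ("a", "DET"): 0.3}
p_class_bigram = {("NOUN", "DET"): 0.8, ("DET", "NOUN"): 0.1}

p = class_bigram_prob("the", "dog", word_class, p_word_given_class, p_class_bigram)
# p = P(dog | NOUN) * P(NOUN | DET) = 0.5 * 0.8 = 0.4
```

The payoff of the factorization is parameter sharing: every DET-NOUN pair reuses the single class-transition estimate.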
An Iterative, DP-based Search Algorithm for Statistical Machine Translation
 In Proceedings of the International Conference on Spoken Language Processing (ICSLP’98
, 1998
Abstract

Cited by 13 (3 self)
The increasing interest in the statistical approach to Machine Translation is due to the development of effective algorithms for training the probabilistic models proposed so far. However, one of the open problems with Statistical Machine Translation is the design of efficient algorithms for translating a given input string. For some interesting models, only (good) approximate solutions can be found. Recently a Dynamic Programming-like algorithm has been introduced which computes approximate solutions for some models. These solutions can be improved by using an iterative algorithm that refines the successive solutions and uses a smoothing technique for some probabilistic distributions of the models based on an interpolation of different distributions. The technique resulting from this combination has been tested on the "Tourist Task" corpus, which was generated in a semi-automated way. The best results achieved were a word-error rate of 9.3% and a sentence-error rate of 44.4%.
Hierarchical Pitman-Yor language models for ASR in meetings
 In Proceedings of IEEE ASRU International Conference
, 2007
Abstract

Cited by 8 (3 self)
In this paper we investigate the application of a hierarchical Bayesian language model (LM) based on the Pitman-Yor process for automatic speech recognition (ASR) of multi-party meetings. The hierarchical Pitman-Yor language model (HPYLM) provides a Bayesian interpretation of LM smoothing. An approximation to the HPYLM recovers the exact formulation of the interpolated Kneser-Ney smoothing method in n-gram models. This paper focuses on the application and scalability of HPYLM on a practical large-vocabulary ASR system. Experimental results on NIST RT06s evaluation meeting data verify that HPYLM is a competitive and promising language modeling technique, which consistently performs better than interpolated Kneser-Ney and modified Kneser-Ney n-gram LMs in terms of both perplexity and word error rate.
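The connection to interpolated Kneser-Ney can be illustrated with the Pitman-Yor predictive probability under the one-table-per-type approximation: each observed count is discounted by d, and the freed mass backs off to a base distribution. The following is a minimal single-level sketch; the uniform base distribution, vocabulary, and parameter values are our assumptions.

```python
def pitman_yor_predict(word, counts, p_base, d=0.75, theta=0.0):
    """Predictive probability under a single-level Pitman-Yor process,
    using the one-table-per-type approximation that recovers
    interpolated Kneser-Ney: discount each observed count by d and
    redistribute the mass over the base distribution."""
    total = sum(counts.values())
    types = len(counts)  # number of distinct observed words
    discounted = max(counts.get(word, 0) - d, 0.0) / (theta + total)
    backoff = (theta + d * types) / (theta + total)
    return discounted + backoff * p_base(word)

counts = {"the": 3, "dog": 1}
uniform = lambda w: 1.0 / 4  # hypothetical 4-word vocabulary
# The predictive probabilities sum to 1 over the vocabulary.
p = sum(pitman_yor_predict(w, counts, uniform) for w in ["the", "dog", "cat", "a"])
```

In the full HPYLM the base distribution is itself the predictive distribution of the shorter-context model, giving the hierarchy the abstract describes.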
Constructing Finite-Context Sources From Fractal Representations of Symbolic Sequences
, 1998
Abstract

Cited by 6 (4 self)
We propose a novel approach to constructing predictive models on long complex symbolic sequences. The models are constructed by first transforming the training sequence n-block structure into a spatial structure of points in a unit hypercube. The transformation between the symbolic and Euclidean spaces embodies a natural smoothness assumption (n-blocks with long common suffixes are likely to produce similar continuations) in that the longer the common suffix shared by any two n-blocks, the closer their point representations lie. Finding a set of prediction contexts is then formulated as a resource allocation problem solved by vector quantizing the spatial representation of the training sequence n-block structure. Our predictive models are similar in spirit to variable memory length Markov models (VLMMs). We compare the proposed models with both the classical and variable memory length Markov models on two chaotic symbolic sequences with different levels of subsequence distribution ...
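The symbolic-to-geometric transformation can be sketched as an iterated function system in the spirit of a chaos-game representation (our assumed reading of the abstract): each symbol contracts the current point toward that symbol's hypercube corner, so the most recent symbols, i.e. the common suffix, dominate the final position. The alphabet and corner assignment below are illustrative.

```python
def fractal_point(block, corners, k=0.5):
    """Map a symbol block to a point in the unit hypercube by iterating
    x <- k * x + (1 - k) * corner(symbol).  Blocks sharing a common
    suffix of length m end up within k**m of each other."""
    x = [0.5] * len(next(iter(corners.values())))  # start at the centre
    for s in block:
        c = corners[s]
        x = [k * xi + (1 - k) * ci for xi, ci in zip(x, c)]
    return x

corners = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0)}
p1 = fractal_point("abcbb", corners)
p2 = fractal_point("cacbb", corners)  # shares suffix "cbb" with p1
p3 = fractal_point("bbbac", corners)  # different suffix
dist = lambda u, v: sum((ui - vi) ** 2 for ui, vi in zip(u, v)) ** 0.5
```

Vector-quantizing such points then groups n-blocks by suffix similarity, which is exactly what a variable-memory prediction context needs.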
Statistical Language Processing based on Self-Organising Word Classification
, 1994
Abstract

Cited by 4 (2 self)
An automatic word classification system has been designed which processes word unigram and bigram frequency statistics extracted from a corpus of natural language utterances. The system implements a type of simulated annealing which employs an average class mutual information metric. Resulting classifications are hierarchical, allowing variable class granularity. Words are represented as structural tags: unique n-bit numbers, the most significant bit-patterns of which incorporate class information. Therefore, access to a structural tag immediately provides access to all classification levels for the corresponding word. The classification system has successfully revealed some of the structure of two natural languages, from the phonemic to the semantic level. The system has been favourably compared, directly and indirectly, with other word classification systems. Class-based interpolated language models have been constructed to exploit the extra information supplied by structural...
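The average class mutual information objective can be computed directly from class bigram statistics, as in the simplified plug-in estimate below; the annealing search itself, which moves words between classes to increase this quantity, is omitted.

```python
import math
from collections import Counter

def class_mutual_information(class_seq):
    """Average class mutual information of adjacent class pairs:
    sum over (c1, c2) of p(c1, c2) * log(p(c1, c2) / (p(c1) * p(c2)))."""
    pairs = Counter(zip(class_seq, class_seq[1:]))
    left = Counter(class_seq[:-1])   # marginal of first class in a pair
    right = Counter(class_seq[1:])   # marginal of second class in a pair
    n = len(class_seq) - 1           # number of adjacent pairs
    mi = 0.0
    for (c1, c2), c in pairs.items():
        p12 = c / n
        mi += p12 * math.log(p12 * n * n / (left[c1] * right[c2]))
    return mi

# Strictly alternating classes are highly predictive of each other,
# so they carry more mutual information than a mixed assignment.
mi_alt = class_mutual_information(list("ABABABAB"))
mi_mixed = class_mutual_information(list("AABBABBA"))
```

A clustering that maximizes this metric is one whose class sequence is maximally predictive from one position to the next, which is what the class-based language models in the abstract exploit.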