Results 11 - 20
of
28
Predicting the Future of Discrete Sequences From Fractal Representations of the Past
, 2001
"... We propose a novel approach for building nite memory predictive models similar in spirit to variable memory length Markov models (VLMMs). The models are constructed by rst transforming the n-block structure of the training sequence into a geometric structure of points in a unit hypercube, such ..."
Abstract
-
Cited by 24 (8 self)
- Add to MetaCart
We propose a novel approach for building nite memory predictive models similar in spirit to variable memory length Markov models (VLMMs). The models are constructed by rst transforming the n-block structure of the training sequence into a geometric structure of points in a unit hypercube, such that the longer is the common sux shared by any two n-blocks, the closer lie their point representations.
Speech Recognition And The Frequency Of Recently Used Words: A Modified Markov Model For Natural Language
, 1988
"... Speech recognition systems incorporate a language model which, at each stage of the recognition task, assigns a probability of occurrence to each word in the vocabulary. A class of Markov language models identified by Jclinck has achieved considerable success in this domain. A modification of the Ma ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
Speech recognition systems incorporate a language model which, at each stage of the recognition task, assigns a probability of occurrence to each word in the vocabulary. A class of Markov language models identified by Jclinck has achieved considerable success in this domain. A modification of the Markov approach, which assigns higher probabilities to recently used words, is proposed and tested against a pure Markov model. Parameter calculation and comparison of the two models both involve use of the LOB Corpus of tagged modern English.
Recurrent Neural Networks With Small Weights Implement Definite Memory Machines
- NEURAL COMPUTATION
, 2003
"... Recent experimental studies indicate that recurrent neural networks initialized with `small' weights are inherently biased towards definite memory machines (Tino, Cernansky, Benuskova, 2002a; Tino, Cernansky, Benuskova, 2002b). This paper establishes a theoretical counterpart: transition funct ..."
Abstract
-
Cited by 21 (5 self)
- Add to MetaCart
Recent experimental studies indicate that recurrent neural networks initialized with `small' weights are inherently biased towards definite memory machines (Tino, Cernansky, Benuskova, 2002a; Tino, Cernansky, Benuskova, 2002b). This paper establishes a theoretical counterpart: transition function of recurrent network with small weights and `squashing ' activation function is a contraction. We prove that recurrent networks with contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite mem-
A New Approach to Word Sense Disambiguation
- In Proceedings of the ARPA Workshop on Human Language Technology
, 1994
"... This paper presents and evaluates models created according to a schema that provides a description of the joint distribution of the values of sense tags and contextual features that is potentially applicable to a wide range of content words. The models are evaluated through a series of experiments, ..."
Abstract
-
Cited by 14 (5 self)
- Add to MetaCart
This paper presents and evaluates models created according to a schema that provides a description of the joint distribution of the values of sense tags and contextual features that is potentially applicable to a wide range of content words. The models are evaluated through a series of experiments, the results of which suggest that the schema is particularly well suited to nouns but that it is also applicable to words in other syntactic categories. 1. INTRODUCTION Assigning sense tags to the words in a text can be viewed as a classification problem. A probabilistic classifier assigns to each word the tag that has the highest estimated probability of having occurred in the given context. Designing a probabilistic classifier for word-sense disambiguation includes two main sub-tasks: specifying an appropriate model and estimating the parameters of that model. The former involves selecting informative contextual features (such as collocations) and describing the joint distribution of the...
The Use Of Linguistic Hierarchies In Speech Understanding
- IN PROC. ICSLP
, 1998
"... This paper describes two related systems which provide frameworks for encoding linguistic knowledge into formal rules within the context of a trainable probabilistic model. The first system, TINA [33], drives top-down from sentence level structure, terminating in either words or syllables. Its main ..."
Abstract
-
Cited by 13 (6 self)
- Add to MetaCart
This paper describes two related systems which provide frameworks for encoding linguistic knowledge into formal rules within the context of a trainable probabilistic model. The first system, TINA [33], drives top-down from sentence level structure, terminating in either words or syllables. Its main purpose is to provide a meaning representation for the sentence. The other system, ANGIE [36], operates bottom-up from phonetic or orthographic units, characterizing the substructure of syllables/words. It provides a framework for both phonological rule modelling and letter-to-sound/sound-to-letter transformations. The two systems logically converge on the syllable or word layer. We have recently been successful in integrating their combined constraint into a recognizer search, achieving considerable improvement in understanding accuracy [9, 23]. In this paper, I will look both toward the past and the future, identifying and motivating the decisions that were made in the design of TINA and ANGIE and the associated rule formalisms, and contemplating various remaining open research issues.
Category-Based Statistical Language Models
, 1997
"... this document. The first section, in chapter 3, develops a model for syntactic dependencies based on word-category n-grams. The second section, in chapter 4, extends this model by allowing short-range word relations to be captured through the incorporation of selected word n-grams. ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
this document. The first section, in chapter 3, develops a model for syntactic dependencies based on word-category n-grams. The second section, in chapter 4, extends this model by allowing short-range word relations to be captured through the incorporation of selected word n-grams.
An Iterative, DP-based Search Algorithm for Statistical Machine Translation
- In Proceedings of the International Conference on Spoken Language Processing (ICSLP’98
, 1998
"... The increasing interest in the statistical approach to Machine Translation is due to the development of effective algorithms for training the probabilistic models proposed so far. However, one of the open problems with Statistical Machine Translation is the design of efficient algorithms for transla ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
The increasing interest in the statistical approach to Machine Translation is due to the development of effective algorithms for training the probabilistic models proposed so far. However, one of the open problems with Statistical Machine Translation is the design of efficient algorithms for translating a given input string. For some interesting models, only (good) approximate solutions can be found. Recently a Dynamic Programming-like algorithm has been introduced which computes approximate solutions for some models. These solutions can be improved by using an iterative algorithm that refines the succesive solutions and uses a smoothing technique for some probabilistic distribution of the models based on an interpolation of different distributions. The technique resulting from this combination has been tested on the “Tourist Task ” corpus, which was generated in a semi-automated way. The best results achieved were a word-error rate of 9.3% and a sentence-error rate of 44.4%. 1.
Constructing Finite-Context Sources From Fractal Representations of Symbolic Sequences
, 1998
"... We propose a novel approach to constructing predictive models on long complex symbolic sequences. The models are constructed by first transforming the training sequence n-block structure into a spatial structure of points in a unit hypercube. The transformation between the symbolic and Euclidean spa ..."
Abstract
-
Cited by 6 (4 self)
- Add to MetaCart
We propose a novel approach to constructing predictive models on long complex symbolic sequences. The models are constructed by first transforming the training sequence n-block structure into a spatial structure of points in a unit hypercube. The transformation between the symbolic and Euclidean spaces embodies a natural smoothness assumption (n-blocks with long common suffices are likely to produce similar continuations) in that the longer is the common suffix shared by any two n-blocks, the closer lie their point representations. Finding a set of prediction contexts is then formulated as a resource allocation problem solved by vector quantizing the spatial representation of the training sequence n-block structure. Our predictive models are similar in spirit to variable memory length Markov models (VLMMs). We compare the proposed models with both the classical and variable memory length Markov models on two chaotic symbolic sequences with different levels of subsequence distribution ...
Hierarchical Pitman-Yor language models for ASR in meetings
- In Proceedings of IEEE ASRU International Conference
, 2007
"... In this paper we investigate the application of a hierarchical Bayesian language model (LM) based on the Pitman-Yor process for automatic speech recognition (ASR) of multiparty meetings. The hierarchical Pitman-Yor language model (HPY-LM) provides a Bayesian interpretation of LM smoothing. An approx ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
In this paper we investigate the application of a hierarchical Bayesian language model (LM) based on the Pitman-Yor process for automatic speech recognition (ASR) of multiparty meetings. The hierarchical Pitman-Yor language model (HPY-LM) provides a Bayesian interpretation of LM smoothing. An approximation to the HPYLM recovers the exact formulation of the interpolated Kneser-Ney smoothing method in n-gram models. This paper focuses on the application and scalability of HPYLM on a practical large vocabulary ASR system. Experimental results on NIST RT06s evaluation meeting data verify that HPYLM is a competitive and promising language modeling technique, which consistently performs better than interpolated Kneser-Ney and modified Kneser-Ney n-gram LMs in terms of both perplexity and word error rate.
Statistical Language Processing based on Self-Organising Word Classification
, 1994
"... An automatic word classification system has been designed which processes word unigram and bigram frequency statistics extracted from a corpus of natural language utterances. The system implements a type of simulated annealing which employs an average class mutual information metric. Resulting class ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
An automatic word classification system has been designed which processes word unigram and bigram frequency statistics extracted from a corpus of natural language utterances. The system implements a type of simulated annealing which employs an average class mutual information metric. Resulting classifications are hierarchical, allowing variable class granularity. Words are represented as structural tags --- unique n-bit numbers the most significant bit-patterns of which incorporate class information. Therefore, access to a structural tag immediately provides access to all classification levels for the corresponding word. The classification system has successfully revealed some of the structure of two natural languages, from the phonemic to the semantic level. The system has been favourably compared --- directly and indirectly --- with other word classification systems. Class based interpolated language models have been constructed to exploit the extra information supplied by structural...

