Information Extraction from Broadcast News
- Philosophical Transactions of the Royal Society of London, Series A, 2000
Cited by 14 (7 self)
This paper discusses the development of trainable statistical models for extracting content from television and radio news broadcasts. In particular, we concentrate on statistical finite-state models for identifying proper names and other named entities in broadcast speech. Two models are presented: the first models name class information as a word attribute; the second explicitly models both word-word and class-class transitions. A common n-gram-based formulation is used for both models. The task of named entity identification is characterized by relatively sparse training data, and issues related to smoothing are discussed. Experiments are reported using the DARPA/NIST Hub-4E evaluation for North American Broadcast News.
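The abstract describes statistical finite-state name taggers only at a high level. For orientation, here is a minimal sketch of a generic first-order hidden Markov tagger over name classes, with class-to-class transitions, class-conditional word probabilities, and Viterbi decoding. It is not either of the two models in the paper: the function names, the `<s>`/`</s>` boundary markers, the add-one smoothing (a stand-in for the smoothing schemes the paper discusses), and the toy training data are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Count class-to-class transitions and class-conditional word emissions."""
    trans = defaultdict(Counter)  # trans[previous_class][next_class]
    emit = defaultdict(Counter)   # emit[name_class][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, cls in sentence:
            trans[prev][cls] += 1
            emit[cls][word] += 1
            vocab.add(word)
            prev = cls
        trans[prev]["</s>"] += 1
    return trans, emit, vocab

def smoothed_log_prob(counter, key, support_size):
    """Add-one estimate of log P(key); assumed here only for brevity."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + support_size))

def viterbi(words, trans, emit, vocab):
    """Most likely name-class sequence for a non-empty list of words."""
    classes = sorted(emit)
    n_next = len(classes) + 1  # any class, or end of sentence
    n_words = len(vocab) + 1   # +1 reserves mass for unseen words
    # best[c] = (log score, class path) of the best path ending in class c
    best = {
        c: (smoothed_log_prob(trans["<s>"], c, n_next)
            + smoothed_log_prob(emit[c], words[0], n_words), [c])
        for c in classes
    }
    for word in words[1:]:
        step = {}
        for c in classes:
            score, path = max(
                (s + smoothed_log_prob(trans[p], c, n_next), prev_path)
                for p, (s, prev_path) in best.items()
            )
            step[c] = (score + smoothed_log_prob(emit[c], word, n_words), path + [c])
        best = step
    _, path = max(
        (s + smoothed_log_prob(trans[c], "</s>", n_next), p) for c, (s, p) in best.items()
    )
    return path

# Hypothetical toy data: (word, class) pairs.
data = [
    [("president", "O"), ("clinton", "PERSON"), ("said", "O")],
    [("in", "O"), ("boston", "LOCATION"), ("today", "O")],
    [("mr", "O"), ("clinton", "PERSON"), ("visited", "O"), ("boston", "LOCATION")],
]
print(viterbi(["clinton", "visited", "boston"], *train(data)))  # ['PERSON', 'O', 'LOCATION']
```

Add-one smoothing simply keeps unseen words and transitions from zeroing out a path; with sparse named-entity training data, a better-founded smoothing scheme would normally replace it.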
Probabilistic k-Testable Tree Languages
- Proceedings of the 5th International Colloquium, ICGI 2000, Lisbon (Portugal), Volume 1891 of Lecture Notes in Computer Science, 2000
Cited by 12 (2 self)
In this paper, we present a natural generalization of k-gram models for tree stochastic languages based on the k-testable class. In this class of models, frequencies are estimated for a probabilistic regular tree grammar which is bottom-up deterministic. One of the advantages of this approach is that the model can be updated incrementally. This method is an alternative to costly learning algorithms (such as inside-outside-based methods) or to algorithms that require larger samples (such as many state merging/splitting methods).
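A minimal sketch of the counting behind a probabilistic k-testable tree model, assuming trees are nested `(label, children)` tuples. Each node's state is its subtree cut to depth k-1, each transition (fork) is its subtree cut to depth k, and a tree's probability is the relative frequency of its root state times the relative frequencies of its forks given their states. Because the model is nothing but counts, adding a tree is an incremental update. The class and method names are hypothetical, and this sketch omits smoothing, so unseen forks get probability zero.

```python
from collections import Counter

# A tree is (label, children), where children is a tuple of trees.

def truncate(tree, depth):
    """Keep only the top `depth` levels of a tree (depth >= 1)."""
    label, children = tree
    if depth == 1:
        return (label, ())
    return (label, tuple(truncate(child, depth - 1) for child in children))

def nodes(tree):
    """Yield every subtree, root first."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)

class KTestableTreeModel:
    def __init__(self, k):
        assert k >= 2
        self.k = k
        self.roots = Counter()   # depth-(k-1) states seen at the root
        self.states = Counter()  # depth-(k-1) states over all nodes
        self.forks = Counter()   # depth-k forks over all nodes
        self.n_trees = 0

    def add(self, tree):
        """Incremental update: just add this tree's counts."""
        self.n_trees += 1
        self.roots[truncate(tree, self.k - 1)] += 1
        for node in nodes(tree):
            self.states[truncate(node, self.k - 1)] += 1
            self.forks[truncate(node, self.k)] += 1

    def prob(self, tree):
        """Relative-frequency probability; 0.0 for anything unseen (no smoothing)."""
        if self.n_trees == 0:
            return 0.0
        p = self.roots[truncate(tree, self.k - 1)] / self.n_trees
        for node in nodes(tree):
            state = truncate(node, self.k - 1)
            if self.states[state] == 0:
                return 0.0
            p *= self.forks[truncate(node, self.k)] / self.states[state]
        return p

t1 = ("a", (("b", ()), ("c", ())))
t2 = ("a", (("b", ()),))
tree_model = KTestableTreeModel(k=2)
tree_model.add(t1)
tree_model.add(t2)
print(tree_model.prob(t1))  # 0.5
tree_model.add(t1)          # incremental update: counts only grow
print(tree_model.prob(t1))  # 2/3 after the update
```

With k = 2 a node's state is just its label, so the model behaves much like a probabilistic context-free grammar estimated by relative frequency; larger k conditions each fork on more of the subtree below the node.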
A Study Of n-Gram And Decision Tree Letter Language Modeling Methods
- Speech Communication, 1998
Cited by 9 (1 self)
The goal of this paper is to investigate various language model smoothing techniques and decision tree based language model design algorithms. For this purpose, we build language models for printable characters (letters), based on the Brown corpus. We consider two classes of models for the text generation process: the n-gram language model and various decision tree based language models. In the first part of the paper, we compare the most popular smoothing algorithms applied to the former. We conclude that the bottom-up deleted interpolation algorithm performs the best in the task of n-gram letter language model smoothing, significantly outperforming the back-off smoothing technique for large values of n. In the second part of the paper, we consider various decision tree development algorithms. Among them, a K-means clustering type algorithm for the design of the decision tree questions gives the best results. However, the n-gram language model outperforms the decision tree language models for letter language modeling. We believe that this is due to the predictive nature of letter strings, which seems to be naturally modeled by n-grams.
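As a minimal sketch of interpolated smoothing for a letter n-gram (not the paper's bottom-up deleted interpolation procedure itself), each order's relative-frequency estimate is mixed with the next-lower order's estimate, starting from a uniform distribution over the alphabet. In deleted interpolation the mixing weights would be estimated on held-out (deleted) data; here they are fixed, hypothetical constants, and the class name and the `^` padding symbol are also assumptions.

```python
from collections import Counter

class InterpolatedLetterNgram:
    """Letter n-gram: each order's estimate is mixed with the next-lower order's."""

    def __init__(self, n, alphabet, lambdas):
        # lambdas[j - 1] weights the order-j relative frequency (context of j - 1 letters).
        # Fixed illustrative constants; deleted interpolation would fit them on held-out data.
        assert len(lambdas) == n
        self.n = n
        self.alphabet = sorted(set(alphabet))
        self.lambdas = lambdas
        self.counts = Counter()      # counts[(context, letter)]
        self.ctx_counts = Counter()  # ctx_counts[context]

    def train(self, text):
        padded = "^" * (self.n - 1) + text  # "^" pads the start; assumed not in the alphabet
        for i in range(self.n - 1, len(padded)):
            for j in range(1, self.n + 1):  # j is the n-gram order
                context = padded[i - j + 1:i]
                self.counts[(context, padded[i])] += 1
                self.ctx_counts[context] += 1

    def prob(self, letter, history):
        padded = "^" * (self.n - 1) + history
        p = 1.0 / len(self.alphabet)  # order 0: uniform over the alphabet
        for j in range(1, self.n + 1):  # interpolate upward from order 1 to order n
            context = padded[len(padded) - (j - 1):]
            total = self.ctx_counts[context]
            if total:  # skip orders whose context was never seen
                lam = self.lambdas[j - 1]
                p = lam * self.counts[(context, letter)] / total + (1 - lam) * p
        return p

lm = InterpolatedLetterNgram(3, "abcdefghijklmnopqrstuvwxyz ", [0.5, 0.6, 0.7])
lm.train("the theory of the thing")
print(round(sum(lm.prob(c, "th") for c in lm.alphabet), 6))  # 1.0
print(lm.prob("e", "th") > lm.prob("q", "th"))  # True
```

Roughly speaking, back-off differs in that it consults the lower-order estimate only for letters unseen in the higher-order context, whereas interpolation always mixes all orders.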
"Almost Parsing" Technique for Language Modeling
Cited by 8 (0 self)
In this paper we present an approach that incorporates structural information into language models without actually parsing the utterance. This approach combines the advantages of an n-gram language model (speed, robustness, and the ability to integrate with the speech recognizer) with the need to model syntactic constraints, under a uniform representation. We also show that our approach produces better language models than those based on part-of-speech tags.
Domain adaptation with clustered language models
- In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 1997
Cited by 6 (1 self)
In this paper, a method of domain adaptation for clustered language models is developed. It is based on a previously developed clustering algorithm, but with a modified optimisation criterion. The results are shown to be slightly superior to those of the previously published ‘Fillup’ method, which can be used to adapt standard n-gram models. However, the improvement both methods give compared to models built from scratch on the adaptation data is quite small (less than 11% relative improvement in word error rate). This suggests that both methods are still unsatisfactory from a practical point of view.
Hidden Model Sequence Models for Automatic Speech Recognition
- 2001
Cited by 4 (0 self)
Most modern automatic speech recognition systems make use of acoustic models based on hidden Markov models. To obtain reasonable recognition performance within a large vocabulary framework, the acoustic models usually include a pronunciation model, together with complex parameter tying schemes. In many cases the pronunciation model operates on a phoneme level and is derived independently of the underlying models. In contrast, this work is aimed at improving pronunciation modelling on a sub-phone level in a combined framework. The modelling of pronunciation variation is assumed to be of special importance for recognition of spontaneous speech.
Smoothing and Compression with Stochastic k-testable Tree Languages
Cited by 4 (3 self)
In this paper, we describe some techniques to learn probabilistic k-testable tree models, a generalization of the well-known k-gram models, that can be used to compress or classify structured data. These models are easy to infer from samples and allow for incremental updates. Moreover, as shown here, backing-off schemes can be defined to address data sparseness, a problem that often arises when using trees to represent the data. These features make them suitable for compressing structured data files at a better rate than string-based methods.
Hierarchical Non-Emitting Markov Models
- 1998
Cited by 4 (2 self)
We describe a simple variant of the interpolated Markov model with non-emitting state transitions and prove that it is strictly more powerful than any Markov model. More importantly, the non-emitting model outperforms the classic interpolated model on natural language texts under a wide range of experimental conditions, with only a modest increase in computational requirements. The non-emitting model is also much less prone to overfitting.
Noisy sequence classification with smoothed Markov chains
- In Conférence francophone sur l’apprentissage automatique 2006 (CAp 2006), 2006
Cited by 3 (2 self)
This paper is concerned with sequence classification using Markov chains when the learning data contain classification noise. These models offer a direct generalization of a multinomial Naive Bayes classifier by taking into account dependencies between successive events up to a certain history length. Our study shows that smoothed Markov chains are very robust to classification noise. The relation between classification accuracy and test set perplexity, often used to measure prediction quality, is discussed. The influence of varying the model order is also studied experimentally. Experiments are conducted on both a gender classification task based on the spelling of first names and a splicing-region classification task on DNA sequences. The first set of experiments also illustrates the superiority of smoothed Markov chains over an automaton learning technique using boosting for classifying noisy sequences.
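A minimal sketch of the general approach: train one Markov chain per class, score a sequence by its log prior plus its smoothed log-likelihood, and predict the highest-scoring class. The additive smoothing constant `alpha`, the `^`/`$` boundary symbols, the class name, and the toy data are assumptions; the paper's smoothing method and experimental setup may differ. With `order=0` each class model ignores context, which is essentially the multinomial naive Bayes case that the abstract describes these models as generalizing.

```python
import math
from collections import Counter, defaultdict

class MarkovChainClassifier:
    """One smoothed Markov chain of the given order per class."""

    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha                      # additive smoothing constant (assumed scheme)
        self.counts = defaultdict(Counter)      # counts[cls][(context, symbol)]
        self.ctx_counts = defaultdict(Counter)  # ctx_counts[cls][context]
        self.class_counts = Counter()
        self.symbols = set()

    def _events(self, seq):
        """Yield (context, next symbol) pairs; '^' pads the start, '$' marks the end."""
        padded = "^" * self.order + seq + "$"
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def fit(self, sequences, labels):
        for seq, cls in zip(sequences, labels):
            self.class_counts[cls] += 1
            for context, symbol in self._events(seq):
                self.counts[cls][(context, symbol)] += 1
                self.ctx_counts[cls][context] += 1
                self.symbols.add(symbol)
        return self

    def log_score(self, seq, cls):
        """log P(cls) + log P(seq | cls) under the smoothed chain for cls."""
        score = math.log(self.class_counts[cls] / sum(self.class_counts.values()))
        v = len(self.symbols) + 1  # +1 reserves mass for symbols never seen in training
        for context, symbol in self._events(seq):
            num = self.counts[cls][(context, symbol)] + self.alpha
            den = self.ctx_counts[cls][context] + self.alpha * v
            score += math.log(num / den)
        return score

    def predict(self, seq):
        return max(self.class_counts, key=lambda cls: self.log_score(seq, cls))

clf = MarkovChainClassifier(order=1).fit(["abab", "abba", "cdcd", "dccd"], ["x", "x", "y", "y"])
print(clf.predict("aab"), clf.predict("dcd"))  # x y
```

Raising `order` lets each class model condition on longer histories; the study above examines that trade-off experimentally, which this toy example does not attempt to reproduce.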

