Language Model Adaptation Using Mixtures And An Exponentially Decaying Cache
- In Proceedings of ICASSP-97, 1997
Cited by 67 (4 self)
This paper presents two techniques for language model adaptation. The first is based on the use of mixtures of language models: the training text is partitioned according to topic, a language model is constructed for each component, and at recognition time appropriate weightings are assigned to each component to model the observed style of language. The second technique is based on augmenting the standard trigram model with a cache component in which word recurrence probabilities decay exponentially over time. Both techniques yield a significant reduction in perplexity over the baseline trigram language model when faced with multi-domain test text, the mixture-based model giving a 24% reduction and the cache-based model giving a 14% reduction. The two techniques attack the problem of adaptation at different scales, and as a result can be used in parallel to give a total perplexity reduction of 30%. 1. INTRODUCTION In constructing a language model intended for general text, one is fac...
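The two ideas in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so the decay form exp(-decay * distance), the parameter names, and the function names below are all assumptions chosen for illustration, not the authors' method.

```python
import math
from collections import defaultdict


def decaying_cache_probs(history, decay=0.005):
    """Cache distribution over previously seen words.

    Assumption: a word seen `distance` positions ago contributes
    exp(-decay * distance), so recent occurrences count for more.
    """
    weights = defaultdict(float)
    n = len(history)
    for i, word in enumerate(history):
        distance = n - i  # the most recent word has distance 1
        weights[word] += math.exp(-decay * distance)
    total = sum(weights.values())
    if total == 0.0:
        return {}
    # Normalise so the cache probabilities sum to one.
    return {word: w / total for word, w in weights.items()}


def mixture_prob(component_probs, weights):
    """Linear mixture of component probabilities for one word.

    The weights are assumed to be non-negative and to sum to one.
    """
    return sum(w * p for w, p in zip(weights, component_probs))
```

The same `mixture_prob` helper covers both cases: combining topic-specific models with per-topic weights, or combining a trigram probability with a cache probability, for example `mixture_prob([p_trigram, p_cache], [0.9, 0.1])` with hypothetical weights.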
The Interaction of Knowledge Sources for Word Sense Disambiguation
- Computational Linguistics, 2001
Cited by 58 (2 self)
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial intelligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results. We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems.
Web-based models for natural language processing
- ACM Transactions on Speech and Language Processing, 2005
Cited by 48 (0 self)
Previous work demonstrated that Web counts can be used to approximate bigram counts, suggesting that Web-based frequencies should be useful for a wide variety of Natural Language Processing (NLP) tasks. However, only a limited number of tasks have so far been tested using Web-scale data sets. The present article overcomes this limitation by systematically investigating the performance of Web-based models for several NLP tasks, covering both syntax and semantics, both generation and analysis, and a wider range of n-grams and parts of speech than have been previously explored. For the majority of our tasks, we find that simple, unsupervised models perform better when n-gram counts are obtained from the Web rather than from a large corpus. In some cases, performance can be improved further by using backoff or interpolation techniques that combine Web counts and corpus counts. However, unsupervised Web-based models generally fail to outperform supervised state-of-the-art models trained on smaller corpora. We argue that Web-based models should therefore be used as a baseline for, rather than an alternative to, standard supervised models.
The web as a baseline: Evaluating the performance of unsupervised web-based models for a range of nlp tasks
- In Proc. of Human Language Technologies - North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2004
Cited by 31 (1 self)
Previous work demonstrated that web counts can be used to approximate bigram frequencies, and thus should be useful for a wide variety of NLP tasks. So far, only two generation tasks (candidate selection for machine translation and confusion-set disambiguation) have been tested using web-scale data sets. The present paper investigates if these results generalize to tasks covering both syntax and semantics, both generation and analysis, and a larger range of n-grams. For the majority of tasks, we find that simple, unsupervised models perform better when n-gram frequencies are obtained from the web rather than from a large corpus. However, in most cases, web-based models fail to outperform more sophisticated state-of-the-art models trained on small corpora. We argue that web-based models should therefore be used as a baseline for, rather than an alternative to, standard models.
Verb Class Disambiguation Using Informative Priors
- Computational Linguistics, 2004
Cited by 29 (4 self)
Levin’s (1993) study of verb classes is a widely used resource for lexical semantics. In her framework, some verbs, such as give, exhibit no class ambiguity. But other verbs, such as write, have several alternative classes. We extend Levin’s inventory to a simple statistical model of verb class ambiguity. Using this model we are able to generate preferences for ambiguous verbs without the use of a disambiguated corpus. We additionally show that these preferences are useful as priors for a verb sense disambiguator.
On the Means for Clarification in Dialogue
- Current and New Directions in Discourse & Dialogue, 2003
Cited by 19 (7 self)
The ability to request clarification of utterances is a vital part of the communicative process. In this paper we discuss the range of possible forms for clarification requests, together with the range of readings they can convey. We present the results of corpus analysis which show a correlation between certain forms and possible readings, together with some indication of maximum likely distance between request and the utterance being clarified.
Particle-Based Language Modelling
- 2000
Cited by 14 (0 self)
This paper investigates the use of particle (sub-word) N-grams for language modelling. One linguistics-based and two data-driven algorithms are presented and evaluated in terms of perplexity for Russian and English. Interpolating word trigram and particle 6-gram models gives up to a 7.5% perplexity reduction over the baseline word trigram model for Russian. Lattice rescoring experiments are also performed on 1997 DARPA Hub4 evaluation lattices where the interpolated model gives a 0.4% absolute reduction in word error rate over the baseline word trigram model. 1. INTRODUCTION Most of the current approaches to language modelling for speech recognition tend to use words, or classes of words, as the modelling units. Words are a logical choice, since it is ultimately words that are to be output by a speech recognition system, but they are not necessarily the best units for capturing dependencies in a text. The optimal set of units will inevitably depend on the language, the sparsity of the ...
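The perplexity figures quoted in this abstract rest on a standard calculation, sketched below. The function names and the per-word probability lists are hypothetical; the abstract does not describe how the particle model's scores are mapped to word-level probabilities, so the sketch assumes that both models already supply one probability per test word.

```python
import math


def perplexity(word_probs):
    """Perplexity of a test text given the probability assigned to each word."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


def interpolate(p_word, p_particle, lam):
    """Linearly interpolate two per-word probability streams.

    `lam` is the weight on the word model; (1 - lam) goes to the
    particle model. Both lists are assumed to be aligned word by word.
    """
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_word, p_particle)]


def relative_reduction(baseline_ppl, new_ppl):
    """Fractional perplexity reduction relative to the baseline."""
    return (baseline_ppl - new_ppl) / baseline_ppl
```

A relative reduction is then reported as `relative_reduction(perplexity(p_word), perplexity(interpolate(p_word, p_particle, lam)))`, with `lam` typically tuned on held-out text.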
The Applicability Of Adaptive Language Modelling For The Broadcast News Task
- ICSLP98, 1998
Cited by 12 (2 self)
Adaptive language models have consistently been shown to lead to a significant reduction in language model perplexity compared to the equivalent static trigram model on many data sets. When these language models have been applied to speech recognition, however, they have seldom resulted in a corresponding reduction in word error rate. This paper will investigate some of the possible reasons for this apparent discrepancy, and will explore the circumstances under which adaptive language models can be useful. We will concentrate on cache-based and mixture-based models and their use on the Broadcast News task. 1. INTRODUCTION The performance of an automatic speech recognition system can depend critically on the suitability of its language model. For example, a system trained to recognise speech read from the Wall Street Journal will be equipped with a language model trained on many millions of words from previous editions of the newspaper, and will perform very well on its specified task...
The Theory and Use of Clarification Requests in Dialogue
- 2004
Cited by 12 (2 self)
Clarification requests are an important, relatively common and yet under-studied dialogue device allowing a user to ask about some feature (e.g. the meaning or form) of an utterance, or part thereof. They can take many different forms (often highly elliptical) and can have many different meanings (requesting various types of information). This thesis combines empirical, theoretical and implementational work to provide a study of the various types of clarification request that exist, give a theoretical analysis thereof, and show how the results can be applied to add useful capabilities to a prototype computational dialogue system. A series
Category-Based Statistical Language Models
- 1997
Cited by 11 (2 self)
this document. The first section, in chapter 3, develops a model for syntactic dependencies based on word-category n-grams. The second section, in chapter 4, extends this model by allowing short-range word relations to be captured through the incorporation of selected word n-grams.

