Results 1 - 10
of
67
A Maximum-Entropy-Inspired Parser
, 1999
"... We present a new parser for parsing down to Penn tree-bank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less when trained and tested on the previously established [5,9,10,15,17] "stan- dard" sections of ..."
Abstract
-
Cited by 671 (16 self)
- Add to MetaCart
We present a new parser for parsing down to Penn tree-bank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less when trained and tested on the previously established [5,9,10,15,17] "stan- dard" sections of the Wall Street Journal tree- bank. This represents a 13% decrease in error rate over the best single-parser results on this corpus [9]. The major technical innova- tion is the use of a "maximum-entropy-inspired" model for conditioning and smoothing that let us successfully to test and combine many different conditioning events. We also present some partial results showing the effects of different conditioning information, including a surprising 2% improvement due to guessing the lexical head's pre-terminal before guessing the lexical head.
TnT - A Statistical Part-Of-Speech Tagger
, 2000
"... Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger. Contrary to claims found elsewhere in the literature, we argue that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework. A recent comparison has even sh ..."
Abstract
-
Cited by 293 (3 self)
- Add to MetaCart
Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger. Contrary to claims found elsewhere in the literature, we argue that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework. A recent comparison has even shown that TnT performs significantly better for the tested corpora. We describe the basic model of TnT, the techniques used for smoothing and for handling unknown words. Furthermore, we present evaluations on two corpora.
Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network
- IN PROCEEDINGS OF HLT-NAACL
, 2003
"... We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective ..."
Abstract
-
Cited by 181 (12 self)
- Add to MetaCart
We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features. Using these ideas together, the resulting tagger gives a 97.24% accuracy on the Penn Treebank WSJ, an error reduction of 4.4% on the best previous single automatically learned tagging result.
Distributional Part-of-Speech Tagging
- In Proc. of 7th Conference of the European Chapter of the Association for Computational Linguistics
, 1995
"... This paper presents an algorithm for tagging words whose part-of-speech properties are unknown. Unlike previous work, the algorithm categorizes word tokens in context instead of word types. The algorithm is evaluated on the Brown Corpus. ..."
Abstract
-
Cited by 75 (6 self)
- Add to MetaCart
This paper presents an algorithm for tagging words whose part-of-speech properties are unknown. Unlike previous work, the algorithm categorizes word tokens in context instead of word types. The algorithm is evaluated on the Brown Corpus.
Speech repairs, intonational phrases and discourse markers: modeling speakers’ utterances in spoken dialogue
- Computational Linguistics
, 1999
"... Interactive spoken dialogue provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker’s intended utterances: both segmenting a speaker’s turn into utterances and determining the intended words in each utterance. Eve ..."
Abstract
-
Cited by 61 (9 self)
- Add to MetaCart
Interactive spoken dialogue provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker’s intended utterances: both segmenting a speaker’s turn into utterances and determining the intended words in each utterance. Even assuming perfect word recognition, the latter problem is complicated by the occurrence of speech repairs, which occur where speakers go back and change (or repeat) something they just said. The words that are replaced or repeated are no longer part of the intended utterance, and so need to be identified. Segmenting turns and resolving repairs are strongly intertwined with a third task: identifying discourse markers. Because of the interactions, and interactions with POS tagging and speech recognition, we need to address these tasks together and early on in the processing stream. This paper presents a statistical language model in which we redefine the speech recognition problem so that it includes the identification of POS tags, discourse markers, speech repairs and intonational phrases. By solving these simultaneously, we obtain better results on each task than addressing them separately. Our model is able to identify 72 % of turn-internal intonational boundaries with a precision of 71%, 97 % of discourse markers with 96 % precision, and detect and correct 66 % of repairs with 74 % precision.
A second-order hidden markov model for part-of-speech tagging
- In Proceedings of the 37th Annual Meeting of the ACL
, 1999
"... This paper describes an extension to the hidden Markov model for part-of-speech tagging using second-order approximations for both contex-tual and lexical probabilities. This model in-creases the accuracy of the tagger to state of the art levels. These approximations make use of more contextual info ..."
Abstract
-
Cited by 51 (5 self)
- Add to MetaCart
This paper describes an extension to the hidden Markov model for part-of-speech tagging using second-order approximations for both contex-tual and lexical probabilities. This model in-creases the accuracy of the tagger to state of the art levels. These approximations make use of more contextual information than standard statistical systems. New methods of smoothing the estimated probabilities are also introduced to address the sparse data problem. 1
Automatic stochastic tagging of natural language texts
- Computational Linguistics
, 1995
"... Five language and tagset independent stochastic taggers, handling morphological and contextual information, are presented and tested in corpora of seven European languages (Dutch, English, French, German, Greek, Italian and Spanish), using two sets of grammatical tags; a small set containing the ele ..."
Abstract
-
Cited by 48 (4 self)
- Add to MetaCart
Five language and tagset independent stochastic taggers, handling morphological and contextual information, are presented and tested in corpora of seven European languages (Dutch, English, French, German, Greek, Italian and Spanish), using two sets of grammatical tags; a small set containing the eleven main grammatical classes and a large set of grammatical categories common to all languages. The unknown words are tagged using an experimentally proven stochastic hypothesis that links the stochastic behavior of the unknown words with that of the less probable known words. A fully automatic training and tagging program has been implemented on an IBM PC-compatible 80386-based computer. Measurements of error rate, time response, and memory requirements have shown that the taggers " performance is satisfactory, even though a small training text is available. The error rate is improved when new texts are used to update the stochastic model parameters. 1.
Context-Sensitive Statistics for Improved Grammatical Language Models
- In Proceedings of the Twelfth National Conference on Artificial Intelligence
, 1994
"... We develop a language model using probabilistic context-free grammars (PCFGs) that is "pseudo context-sensitive" in that the probability that a non-terminal N expands using a rule r depends on N 's parent. We derive the equations for estimating the necessary probabilities using a variant of the insi ..."
Abstract
-
Cited by 40 (4 self)
- Add to MetaCart
We develop a language model using probabilistic context-free grammars (PCFGs) that is "pseudo context-sensitive" in that the probability that a non-terminal N expands using a rule r depends on N 's parent. We derive the equations for estimating the necessary probabilities using a variant of the inside-outside algorithm. We give experimental results showing that, beginning with a high-performance PCFG, one can develop a pseudo PCSG that yields significant performance gains. Analysis shows that the benefits from the context-sensitive statistics are localized, suggesting that we can use them to extend the original PCFG. Experimental results confirm that this is both feasible and the resulting grammar retains the performance gains. This implies that our scheme may be useful as a novel method for PCFG induction. 1 Introduction Like its non-stochastic brethren, probabilistic parsing has been based upon context-free grammars (CFGs), and for similar reasons: CFGs support a simple and efficien...
Domain-Specific Knowledge Acquisition For Conceptual Sentence Analysis
, 1994
"... The availability of on-line corpora is rapidly changing the field of natural language processing (NLP) from one dominated by theoretical models of often very specific linguistic phenomena to one guided by computational models that simultaneously account for a wide variety of phenomena that occur i ..."
Abstract
-
Cited by 28 (4 self)
- Add to MetaCart
The availability of on-line corpora is rapidly changing the field of natural language processing (NLP) from one dominated by theoretical models of often very specific linguistic phenomena to one guided by computational models that simultaneously account for a wide variety of phenomena that occur in real-world text. Thus far, among the best-performing and most robust systems for reading and summarizing large amounts of real-world text are knowledge-based natural language systems. These systems rely heavily on domain-specific, handcrafted knowledge to handle the myriad syntactic, semantic, and pragmatic ambiguities that pervade virtually all aspects of sentence analysis. Not surprisingly, however, generating this knowledge for new domain...
Language Independent, Minimally Supervised Induction of Lexical Probabilities
- In Proceedings of ACL-2000
, 2000
"... A central problem in part-of-speech tagging, especially for new languages for which limited annotated resources are available, is estimating the distribution of lexical probabilities for unknown words. This paper introduces a new paradigmatic similarity measure and presents a minimally supervised le ..."
Abstract
-
Cited by 26 (7 self)
- Add to MetaCart
A central problem in part-of-speech tagging, especially for new languages for which limited annotated resources are available, is estimating the distribution of lexical probabilities for unknown words. This paper introduces a new paradigmatic similarity measure and presents a minimally supervised learning approach combining effective selection and weighting methods based on paradigmatic and contextual similarity measures populated from large quantities of inexpensive raw text data. This approach is highly language independent and requires no modification to the algorithm or implementation to shift between languages such as French and English.

