A Maximum-Entropy-Inspired Parser
1999
Cited by 671 (16 self)
We present a new parser for parsing down to Penn tree-bank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less when trained and tested on the previously established [5,9,10,15,17] "standard" sections of the Wall Street Journal tree-bank. This represents a 13% decrease in error rate over the best single-parser results on this corpus [9]. The major technical innovation is the use of a "maximum-entropy-inspired" model for conditioning and smoothing that let us successfully test and combine many different conditioning events. We also present some partial results showing the effects of different conditioning information, including a surprising 2% improvement due to guessing the lexical head's pre-terminal before guessing the lexical head.
SELECTION AND INFORMATION: A CLASS-BASED APPROACH TO LEXICAL RELATIONSHIPS
1993
Cited by 209 (8 self)
Selectional constraints are limitations on the applicability of predicates to arguments. For example, the statement “The number two is blue” may be syntactically well formed, but at some level it is anomalous — BLUE is not a predicate that can be applied to numbers. According to the influential theory of (Katz and Fodor, 1964), a predicate associates a set of defining features with each argument, expressed within a restricted semantic vocabulary. Despite the persistence of this theory, however, there is widespread agreement about its empirical shortcomings (McCawley, 1968; Fodor, 1977). As an alternative, some critics of the Katz-Fodor theory (e.g. (Johnson-Laird, 1983)) have abandoned the treatment of selectional constraints as semantic, instead treating them as indistinguishable from inferences made on the basis of factual knowledge. This provides a better match for the empirical phenomena, but it opens up a different problem: if selectional constraints are the same as inferences in general, then accounting for them will require a much more complete understanding of knowledge representation and inference than we have at present. The problem, then, is this: how can a theory of selectional constraints be elaborated without first having either an empirically adequate theory of defining features or a comprehensive theory of inference? In this dissertation, I suggest that an answer to this question lies in the representation of conceptual
Introduction to the Special Issue on Computational Linguistics using Large Corpora
- Computational Linguistics, 1993
Equations for Part-of-Speech Tagging
- In Proceedings of the Eleventh National Conference on Artificial Intelligence, 1993
Cited by 98 (2 self)
We derive from first principles the basic equations for a few of the basic hidden-Markov-model word taggers as well as equations for other models which may be novel (the descriptions in previous papers being too spare to be sure). We give performance results for all of the models. The results from our best model (96.45% on an unused test sample from the Brown corpus with 181 distinct tags) is on the upper edge of reported results. We also hope these results clear up some confusion in the literature about the best equations to use. However, the major purpose of this paper is to show how the equations for a variety of models may be derived and thus encourage future authors to give the equations for their model and the derivations thereof. Introduction The last few years have seen a fair number of papers on part-of-speech tagging --- assigning the correct part of speech to each word in a text [1,2,4,5,7,8,9,10]. Most of these systems view the text as having been produced by a hidden Mar...
Context-Sensitive Statistics for Improved Grammatical Language Models
- In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1994
Cited by 40 (4 self)
We develop a language model using probabilistic context-free grammars (PCFGs) that is "pseudo context-sensitive" in that the probability that a non-terminal N expands using a rule r depends on N's parent. We derive the equations for estimating the necessary probabilities using a variant of the inside-outside algorithm. We give experimental results showing that, beginning with a high-performance PCFG, one can develop a pseudo PCSG that yields significant performance gains. Analysis shows that the benefits from the context-sensitive statistics are localized, suggesting that we can use them to extend the original PCFG. Experimental results confirm that this is both feasible and the resulting grammar retains the performance gains. This implies that our scheme may be useful as a novel method for PCFG induction. 1 Introduction Like its non-stochastic brethren, probabilistic parsing has been based upon context-free grammars (CFGs), and for similar reasons: CFGs support a simple and efficien...
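The parent-conditioning idea in this abstract can be illustrated with a minimal sketch. It assumes fully observed (parent, non-terminal, expansion) triples and uses plain relative-frequency counts; the paper itself estimates these probabilities with a variant of the inside-outside algorithm, which is not reproduced here. The function name and the toy observations are hypothetical.

```python
from collections import Counter


def estimate_parent_conditioned(observations):
    """Estimate P(rule | non-terminal, parent) by relative frequency.

    `observations` is an iterable of (parent, nonterminal, rhs) tuples,
    where `rhs` is a tuple of child labels. This is a supervised-count
    stand-in, not the inside-outside variant described in the abstract.
    """
    rule_counts = Counter()
    context_counts = Counter()
    for parent, nonterminal, rhs in observations:
        rule_counts[(parent, nonterminal, rhs)] += 1
        context_counts[(parent, nonterminal)] += 1
    return {
        key: count / context_counts[key[:2]]
        for key, count in rule_counts.items()
    }


# Hypothetical toy data: NP expands differently under S than under VP.
toy_observations = [
    ("S", "NP", ("PRP",)),
    ("S", "NP", ("PRP",)),
    ("S", "NP", ("DT", "NN")),
    ("VP", "NP", ("DT", "NN")),
    ("VP", "NP", ("DT", "NN")),
    ("VP", "NP", ("DT", "NN")),
    ("VP", "NP", ("PRP",)),
]
probabilities = estimate_parent_conditioned(toy_observations)
```

In this toy data the same rule NP -> PRP gets a different probability depending on whether the NP's parent is S or VP, which is the kind of distinction an ordinary PCFG cannot express.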
The Entropy Of English Using PPM-Based Models
- In Data Compression Conference, 1996
Cited by 31 (6 self)
this paper is to show that the difference between the best machine models and human models is smaller than might be indicated by these results. This follows from a number of observations: firstly, the original human experiments used only 27 character English (letters plus space) against full 128 character ASCII text for most computer experiments; secondly, using large amounts of priming text substantially improves PPM's performance; and thirdly, the PPM algorithm can be modified to perform better for English text. The result of this is machine performance down to 1.46 bpc
A Simple But Useful Approach To Conjunct Identification
1992
Cited by 26 (5 self)
This paper presents an approach to identifying conjuncts of coordinate conjunctions appearing in text which has been labelled with syntactic and semantic tags. The overall project of which this research is a part is also briefly discussed. The program was tested on a 10,000 word chapter of the Merck Veterinary Manual. The algorithm is deterministic and domain independent and it performs relatively well on a large real-life domain. Constructs not handled by the simple algorithm are also described in some detail.
Text Classification and Segmentation Using Minimum Cross-Entropy
2000
Cited by 18 (0 self)
Several methods for classifying and segmenting text are described. These are based on ranking text sequences by their cross-entropy calculated using a fixed order character-based Markov model adapted from the PPM text compression algorithm. Experimental results show that the methods are a significant improvement over previously used methods in a number of areas. For example, text can be classified with a very high degree of accuracy by authorship, language, dialect and genre. Highly accurate text segmentation is also possible -- the accuracy of the PPM-based Chinese word segmenter is close to 99% on Chinese news text; similarly, a PPM-based method of segmenting text by language achieves an accuracy of over 99%.
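The ranking idea in this abstract (score a text under one model per class, pick the class with the lowest cross-entropy) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' method: it uses a fixed-order character model with add-one smoothing in place of PPM's escape mechanism, and the class name, function name, parameters, and sample texts are all hypothetical.

```python
import math
from collections import defaultdict


class CharMarkovModel:
    """Fixed-order character model with add-one smoothing (not PPM)."""

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size  # assumed symbol inventory size
        self.context_counts = defaultdict(int)
        self.ngram_counts = defaultdict(int)

    def train(self, text):
        # Pad with NUL characters so the first symbols have a context.
        padded = "\0" * self.order + text
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            self.context_counts[context] += 1
            self.ngram_counts[(context, padded[i])] += 1

    def cross_entropy(self, text):
        """Average bits per character of `text` under this model."""
        padded = "\0" * self.order + text
        total_bits = 0.0
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            numerator = self.ngram_counts.get((context, padded[i]), 0) + 1
            denominator = self.context_counts.get(context, 0) + self.alphabet_size
            total_bits -= math.log2(numerator / denominator)
        return total_bits / max(len(text), 1)


def classify(text, models):
    """Return the label whose model gives `text` the lowest cross-entropy."""
    return min(models, key=lambda label: models[label].cross_entropy(text))


# Hypothetical toy training data for two language classes.
english = CharMarkovModel()
english.train("the quick brown fox jumps over the lazy dog and the cat sat on the mat " * 5)
french = CharMarkovModel()
french.train("le renard brun saute par dessus le chien paresseux et le chat est sur le tapis " * 5)
models = {"en": english, "fr": french}
```

With these toy models, `classify("the lazy dog and the cat", models)` selects the English model because that text is cheaper to encode under it. Real use would need far more training text per class and a smoothing scheme closer to PPM.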
Interactive feature space construction using semantic information
- In Proceedings of CoNLL, 2009
Cited by 6 (1 self)
Specifying an appropriate feature space is an important aspect of achieving good performance when designing systems based upon learned classifiers. Effectively incorporating information regarding semantically related words into the feature space is known to produce robust, accurate classifiers and is one apparent motivation for efforts to automatically generate such resources. However, naive incorporation of this semantic information may result in poor performance due to increased ambiguity. To overcome this limitation, we introduce the interactive feature space construction protocol, where the learner identifies inadequate regions of the feature space and in coordination with a domain expert adds descriptiveness through existing semantic resources. We demonstrate effectiveness on an entity and relation extraction system including both performance improvements and robustness to reductions in annotated data.

