Results 11 - 20 of 570
Learning to Parse Natural Language with Maximum Entropy Models
, 1999
Cited by 191 (0 self)
This paper presents a machine learning system for parsing natural language that learns from manually parsed example sentences, and parses unseen data at state-of-the-art accuracies. Its machine learning technology, based on the maximum entropy framework, is highly reusable and not specific to the parsing problem, while the linguistic hints that it uses to learn can be specified concisely. It therefore requires a minimal amount of human effort and linguistic knowledge for its construction. In practice, the running time of the parser on a test sentence is linear with respect to the sentence length. We also demonstrate that the parser can train from other domains without modification to the modeling framework or the linguistic hints it uses to learn. Furthermore, this paper shows that research into rescoring the top 20 parses returned by the parser might yield accuracies dramatically higher than the state-of-the-art.
Modeling local coherence: An entity-based approach
- In Proceedings of ACL 2005
, 2005
Cited by 187 (14 self)
This paper considers the problem of automatic assessment of local coherence. We present a novel entity-based representation of discourse which is inspired by Centering Theory and can be computed automatically from raw text. We view coherence assessment as a ranking learning problem and show that the proposed discourse representation supports the effective learning of a ranking function. Our experiments demonstrate that the induced model achieves significantly higher accuracy than a state-of-the-art coherence model.
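As a rough illustration of the entity-grid idea in this abstract (a sketch, not the authors' code): sentences become rows, discourse entities become columns, cells record the entity's grammatical role (e.g. S for subject, O for object, '-' for absent), and probabilities of role transitions down each column serve as ranking features. The sentences, entities, and roles below are made-up toy data.

```python
from collections import defaultdict

# Toy entity grid: each sentence maps the entities it mentions to a role.
# Real systems derive these roles from a parser; this is hand-filled.
sentences = [
    {"Microsoft": "S", "suit": "O"},   # sentence 1
    {"Microsoft": "O"},                # sentence 2
    {"suit": "S"},                     # sentence 3
]
entities = sorted({e for s in sentences for e in s})

# Grid: rows = sentences, columns = entities, cells = role or '-'.
grid = [[s.get(e, "-") for e in entities] for s in sentences]

# Coherence features: relative frequencies of length-2 role transitions
# between adjacent sentences, pooled over all entity columns.
counts = defaultdict(int)
total = 0
for col in range(len(entities)):
    for row in range(len(grid) - 1):
        counts[(grid[row][col], grid[row + 1][col])] += 1
        total += 1

features = {t: c / total for t, c in counts.items()}
print(features)  # {('S', 'O'): 0.25, ('O', '-'): 0.5, ('-', 'S'): 0.25}
```

A ranking function can then be trained on such feature vectors, preferring orderings of the same sentences whose transition distributions look more coherent.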
Maltparser: A language-independent system for data-driven dependency parsing
- In Proc. of the Fourth Workshop on Treebanks and Linguistic Theories
, 2005
The Penn Chinese treebank: Phrase structure annotation of a large corpus. Natural Language Engineering
, 2005
Cited by 170 (23 self)
With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part-of-speech taggers, and parsers) for Chinese have been developed all over the world. However, since no large-scale bracketed corpora are available to the public, these tools are trained on corpora with different segmentation criteria, part-of-speech tagsets and bracketing guidelines, and therefore, comparisons are difficult. As a first step towards addressing this issue, we have been preparing a large bracketed corpus since late 1998. The first two installments of the corpus, 250 thousand words of data, fully segmented, POS-tagged and syntactically bracketed, have been released to the public via LDC (www.ldc.upenn.edu). In this paper, we discuss several Chinese linguistic issues and their implications for our treebanking efforts and how we address these issues when developing our annotation guidelines. We also describe our engineering strategies to improve speed while ensuring annotation quality.
Statistical Dependency Analysis with Support Vector Machines
- In Proceedings of IWPT
, 2003
Cited by 162 (1 self)
In this paper, we propose a method for analyzing word-word dependencies in a deterministic bottom-up manner using Support Vector Machines. We experimented with dependency trees converted from Penn Treebank data, and achieved over 90% accuracy of word-word dependency. Though the result is a little worse than that of the most up-to-date phrase-structure-based parsers, it looks satisfactorily accurate considering that our parser uses no information from phrase structures.
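The deterministic bottom-up scheme described in this abstract can be sketched roughly as follows: scan adjacent word pairs and let a classifier (an SVM in the paper; a hand-written toy rule here) choose an attachment action or a shift. The action names, the `decide` rule, and the example sentence are all illustrative, not the paper's actual formulation.

```python
def parse(words, decide):
    """Deterministic bottom-up dependency parse over a list of words.

    decide(left, right) returns one of:
      "ATTACH_R" - left word takes the right word as its head (left is removed),
      "ATTACH_L" - right word takes the left word as its head (right is removed),
      "SHIFT"    - no attachment here; move the window right.
    Toy assumption: words are unique, so a dict keyed by word suffices.
    """
    heads = {w: None for w in words}
    buf = list(words)
    while len(buf) > 1:
        progressed = False
        i = 0
        while i < len(buf) - 1:
            left, right = buf[i], buf[i + 1]
            action = decide(left, right)
            if action == "ATTACH_R":
                heads[left] = right
                del buf[i]
                progressed = True
            elif action == "ATTACH_L":
                heads[right] = left
                del buf[i + 1]
                progressed = True
            else:
                i += 1
        if not progressed:
            break  # no classifier action fired; avoid looping on toy input
    return heads

# Toy stand-in for the trained SVM: determiners and pre-verbal nouns attach right.
def decide(left, right):
    if left in {"the", "a"}:
        return "ATTACH_R"
    if right == "barked":
        return "ATTACH_R"
    return "SHIFT"

print(parse(["the", "dog", "barked"], decide))
# {'the': 'dog', 'dog': 'barked', 'barked': None}
```

Because every pass either attaches a word or shifts, the process is deterministic and runs without search, which is what makes this family of parsers fast.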
YAGO: A Large Ontology from Wikipedia and WordNet
, 2008
Cited by 148 (16 self)
This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million facts. These include the taxonomic Is-A hierarchy as well as semantic relations between entities. The facts for YAGO have been extracted from the category system and the infoboxes of Wikipedia and have been combined with taxonomic relations from WordNet. Type checking techniques help us keep YAGO’s precision at 95% – as proven by an extensive evaluation study. YAGO is based on a clean logical model with a decidable consistency. Furthermore, it allows representing n-ary relations in a natural way while maintaining compatibility with RDFS. A powerful query model facilitates access to YAGO’s data.
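A minimal sketch of the type-checking idea this abstract mentions, under made-up data: facts are (subject, relation, object) triples, each relation carries a domain/range signature, and a candidate fact is kept only if its arguments have the expected types. The entities, types, and relation names below are invented examples, not YAGO's actual schema.

```python
# Toy type system: entity -> type, relation -> (domain type, range type).
TYPES = {
    "Albert_Einstein": "person",
    "Ulm": "city",
    "1879": "year",
}
SIGNATURES = {
    "bornIn": ("person", "city"),
    "bornInYear": ("person", "year"),
}

def check(subj, rel, obj):
    """Accept a fact only if its arguments match the relation's signature."""
    dom, rng = SIGNATURES[rel]
    return TYPES.get(subj) == dom and TYPES.get(obj) == rng

facts = [
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Albert_Einstein", "bornIn", "1879"),  # wrong range type: rejected
]
accepted = [f for f in facts if check(*f)]
print(accepted)  # [('Albert_Einstein', 'bornIn', 'Ulm')]
```

Filtering extracted facts against such signatures is one simple way a system can hold precision high, as the abstract reports for YAGO's 95% figure.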
A Statistical Parser for Czech
, 1999
Cited by 143 (4 self)
This paper considers statistical parsing of Czech, which differs radically from English in at least two respects: (1) it is a highly inflected language, and (2) it has relatively free word order. These differences are likely to pose new problems for techniques that have been developed on English. We describe our experience in building on the parsing model of (Collins 97). Our final results - 80% dependency accuracy - represent good progress towards the 91% accuracy of the parser on English (Wall Street Journal) text.
Automatic Verb Classification Based on Statistical Distributions of Argument Structure
- Computational Linguistics
, 2001
Cited by 137 (20 self)
In this paper, we focus on argument structure -- the thematic roles assigned by a verb to its arguments -- as the way in which the relational semantics of the verb is represented at the syntactic level.
Adding Semantic Annotation to the Penn TreeBank
- In Proceedings of the Human Language Technology Conference
, 2002
Cited by 136 (2 self)
This paper presents our basic approach to creating Proposition Bank, which involves adding a layer of semantic annotation to the Penn English TreeBank. Without attempting to confirm or disconfirm any particular semantic theory, our goal is to provide consistent argument labeling that will facilitate the automatic extraction of relational data. An argument such as the window in John broke the window and in The window broke would receive the same label in both sentences. In order to ensure reliable human annotation, we provide our annotators with explicit guidelines for labeling all of the syntactic and semantic frames of each particular verb. We give several examples of these guidelines and discuss the inter-annotator agreement figures. We also discuss our current experiments on the automatic expansion of our verb guidelines based on verb class membership. Our current rate of progress and our consistency of annotation demonstrate the feasibility of the task.
Training Tree Transducers
- IN HLT-NAACL
, 2004
Cited by 132 (12 self)
Many probabilistic models for natural language are now written in terms of hierarchical tree structure. Tree-based modeling still lacks many of the standard tools taken for granted in (finite-state) string-based modeling. The theory of tree transducer automata provides a possible framework to draw on, as it has been worked out in an extensive literature. We motivate the use of tree transducers for natural language and address the training problem for probabilistic tree-to-tree and tree-to-string transducers.