Three generative, lexicalised models for statistical parsing (1997)

by Michael Collins
Venue: ACL
Results 11 - 20 of 570

Learning to Parse Natural Language with Maximum Entropy Models

by Adwait Ratnaparkhi, 1999
"... This paper presents a machine learning system for parsing natural language that learns from manually parsed example sentences, and parses unseen data at state-of-the-art accuracies. Its machine learning technology, based on the maximum entropy framework, is highly reusable and not specific to the pa ..."
Abstract - Cited by 191 (0 self) - Add to MetaCart
This paper presents a machine learning system for parsing natural language that learns from manually parsed example sentences, and parses unseen data at state-of-the-art accuracies. Its machine learning technology, based on the maximum entropy framework, is highly reusable and not specific to the parsing problem, while the linguistic hints that it uses to learn can be specified concisely. It therefore requires a minimal amount of human effort and linguistic knowledge for its construction. In practice, the running time of the parser on a test sentence is linear with respect to the sentence length. We also demonstrate that the parser can train from other domains without modification to the modeling framework or the linguistic hints it uses to learn. Furthermore, this paper shows that research into rescoring the top 20 parses returned by the parser might yield accuracies dramatically higher than the state-of-the-art.
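The maximum entropy framework the abstract relies on reduces to a log-linear model over parser actions: P(action | context) is proportional to the exponentiated sum of weights for the features active in that context. A minimal sketch (the feature names, weights, and two-action toy set are invented for illustration, not Ratnaparkhi's actual feature set):

```python
import math

# Toy weights learned for (feature, action) pairs; values are invented.
weights = {
    ("word=the", "start-NP"): 1.2,
    ("tag=DT",   "start-NP"): 0.9,
    ("word=the", "join-VP"): -0.4,
    ("tag=DT",   "join-VP"): -0.2,
}

def maxent_distribution(active_features, actions):
    """P(action | context) = exp(sum of feature weights) / Z, the core
    of a conditional maximum entropy (log-linear) model."""
    scores = {
        a: math.exp(sum(weights.get((f, a), 0.0) for f in active_features))
        for a in actions
    }
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}

print(maxent_distribution({"word=the", "tag=DT"}, ["start-NP", "join-VP"]))
# The model strongly prefers "start-NP": features seen with that action
# received positive weight during training.
```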

Modeling local coherence: An entity-based approach

by Regina Barzilay - In Proceedings of ACL, 2005
"... This paper considers the problem of automatic assessment of local coherence. We present a novel entity-based representation of discourse which is inspired by Centering Theory and can be computed automatically from raw text. We view coherence assessment as a ranking learning problem and show that the ..."
Abstract - Cited by 187 (14 self) - Add to MetaCart
This paper considers the problem of automatic assessment of local coherence. We present a novel entity-based representation of discourse which is inspired by Centering Theory and can be computed automatically from raw text. We view coherence assessment as a ranking learning problem and show that the proposed discourse representation supports the effective learning of a ranking function. Our experiments demonstrate that the induced model achieves significantly higher accuracy than a state-of-the-art coherence model.

Citation Context

...al, grammatical, semantic and positional. Once we have identified entity classes, the next step is to fill out grid entries with relevant syntactic information. We employ a robust statistical parser (Collins, 1997) to determine the constituent structure for each sentence, from which subjects (s), objects (o), and relations other than subject or object (x) are identified. Passive verbs are recognized using a sm...
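The grid construction described in this excerpt is mechanically simple: one row per sentence, one column per entity, and each cell holding the entity's grammatical role (s, o, x) or '-' for absent. A short sketch, with invented sentences and role tuples standing in for real parser output:

```python
# Build a toy entity grid: rows = sentences, columns = entities,
# cells = grammatical role (s = subject, o = object, x = other, - = absent).
# The role tuples below stand in for the output of a syntactic parser.
parsed = [
    [("Microsoft", "s"), ("suit", "o")],      # sentence 1
    [("Microsoft", "s"), ("markets", "x")],   # sentence 2
    [("suit", "s")],                          # sentence 3
]

entities = sorted({e for sent in parsed for e, _ in sent})
grid = [{e: "-" for e in entities} for _ in parsed]
for row, sent in zip(grid, parsed):
    for entity, role in sent:
        row[entity] = role

print("\t".join(entities))
for row in grid:
    print("\t".join(row[e] for e in entities))
```

Role-transition patterns read off the columns of this grid are the features the coherence ranker learns from.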

MaltParser: A language-independent system for data-driven dependency parsing

by Joakim Nivre, Johan Hall - In Proc. of the Fourth Workshop on Treebanks and Linguistic Theories, 2005
"... ..."
Abstract - Cited by 170 (7 self) - Add to MetaCart
Abstract not found

The Penn Chinese treebank: Phrase structure annotation of a large corpus. Natural Language Engineering

by Nianwen Xue, Fei Xia, 2005
"... With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part-of-speech taggers, and parsers) for Chinese have been developed all over the world. However, since no large-scale bracketed corpora are available to the public, these tools are trained on corpora wi ..."
Abstract - Cited by 170 (23 self) - Add to MetaCart
With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part-of-speech taggers, and parsers) for Chinese have been developed all over the world. However, since no large-scale bracketed corpora are available to the public, these tools are trained on corpora with different segmentation criteria, part-of-speech tagsets and bracketing guidelines, and therefore, comparisons are difficult. As a first step towards addressing this issue, we have been preparing a large bracketed corpus since late 1998. The first two installments of the corpus, 250 thousand words of data, fully segmented, POS-tagged and syntactically bracketed, have been released to the public via LDC (www.ldc.upenn.edu). In this paper, we discuss several Chinese linguistic issues and their implications for our treebanking efforts and how we address these issues when developing our annotation guidelines. We also describe our engineering strategies to improve speed while ensuring annotation quality.

Citation Context

...es. Most notably, the Penn English Treebank (Marcus, Santorini, and Marcinkiewicz, 1993) has proven to be a crucial resource in the recent success of English part-of-speech (POS) taggers and parsers (Collins, 1997; Collins, 2000; Charniak, 2000), as it provides common training and testing material so that different algorithms can be compared and progress be gauged. Its success triggered the development of treeb...

Statistical Dependency Analysis with Support Vector Machines

by Hiroyasu Yamada, Yuji Matsumoto - In Proceedings of IWPT, 2003
"... In this paper, we propose a method for analyzing word-word dependencies using deterministic bottom-up manner using Support Vector machines. We experimented with dependency trees converted from Penn treebank data, and achieved over 90 % accuracy of word-word dependency. Though the result is little wo ..."
Abstract - Cited by 162 (1 self) - Add to MetaCart
In this paper, we propose a method for analyzing word-word dependencies in a deterministic bottom-up manner using Support Vector Machines. We experimented with dependency trees converted from Penn Treebank data, and achieved over 90% accuracy of word-word dependency. Though the result is a little worse than the most up-to-date phrase-structure-based parsers, it looks satisfactorily accurate considering that our parser uses no information from phrase structures.
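The deterministic bottom-up strategy can be sketched independently of the SVM: scan adjacent word pairs and, whenever the classifier chooses a left or right attachment, link the pair and remove the new dependent from further consideration. In the sketch below the oracle is a hand-written stub standing in for the trained SVM, and the attachment conventions are invented for illustration:

```python
def parse(words, oracle):
    """Deterministic bottom-up dependency parsing in the style the
    abstract describes: repeatedly scan adjacent pairs and apply the
    oracle's action until no more attachments are possible."""
    nodes = list(range(len(words)))   # indices of still-unattached words
    head = {i: None for i in nodes}
    changed = True
    while changed and len(nodes) > 1:
        changed = False
        i = 0
        while i < len(nodes) - 1:
            l, r = nodes[i], nodes[i + 1]
            action = oracle(words, l, r)  # stand-in for the SVM
            if action == "right":         # l becomes a dependent of r
                head[l] = r
                nodes.pop(i)
                changed = True
            elif action == "left":        # r becomes a dependent of l
                head[r] = l
                nodes.pop(i + 1)
                changed = True
            else:                         # "shift": move on
                i += 1
    return head

# A toy oracle: determiners attach to the word on their right, nouns
# attach to a following verb, everything else attaches leftward.
def toy_oracle(words, l, r):
    if words[l] in {"the", "a"}:
        return "right"
    if words[r] == "barked":
        return "right"
    return "left"

print(parse(["the", "dog", "barked"], toy_oracle))
# {0: 1, 1: 2, 2: None} -> "the" <- "dog" <- "barked"
```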

Citation Context

... satisfactorily accurate considering that our parser uses no information from phrase structures. 1 Introduction A number of statistical parsers have been proposed and attained a very good performance [3, 10, 7]. While most of well known work uses learning on Penn treebank [8] syntactic annotated text for the training data, comparable performance is hardly obtainable when they are applied to texts in quite d...

YAGO: A Large Ontology from Wikipedia and WordNet

by Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum, 2008
"... This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million facts. These include the taxonomic Is-A hierarchy a ..."
Abstract - Cited by 148 (16 self) - Add to MetaCart
This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million facts. These include the taxonomic Is-A hierarchy as well as semantic relations between entities. The facts for YAGO have been extracted from the category system and the infoboxes of Wikipedia and have been combined with taxonomic relations from WordNet. Type checking techniques help us keep YAGO’s precision at 95% – as proven by an extensive evaluation study. YAGO is based on a clean logical model with a decidable consistency. Furthermore, it allows representing n-ary relations in a natural way while maintaining compatibility with RDFS. A powerful query model facilitates access to YAGO’s data.
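The data model the abstract describes comes down to typed (subject, relation, object) triples, with candidate facts filtered against each relation's type signature. A minimal sketch of that filtering step (the relation names and type signatures are invented; YAGO's actual relation inventory differs):

```python
# Facts as (subject, relation, object) triples, with a type signature
# per relation. A candidate fact is rejected when either argument fails
# the relation's type check -- the kind of filtering the abstract
# credits with keeping precision high.
entity_types = {
    "Albert_Einstein": "person",
    "Ulm": "city",
    "1879": "year",
}
relation_signature = {
    "bornIn":     ("person", "city"),
    "bornInYear": ("person", "year"),
}

def accept(subj, rel, obj):
    dom, rng = relation_signature[rel]
    return entity_types.get(subj) == dom and entity_types.get(obj) == rng

facts = [
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Albert_Einstein", "bornIn", "1879"),   # wrong object type: rejected
]
kb = [f for f in facts if accept(*f)]
print(kb)  # only the well-typed fact survives
```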

A Statistical Parser for Czech

by Michael Collins, Lance Ramshaw, Jan Hajic, Christoph Tillmann, 1999
"... This paper considers statistical parsing of Czech, which differs radically from English in at least two respects: (1) it is a highly infiected language, and (2) it has relatively free word order. These dif- ferences are likely to .pose new problems for tech- niques that have been developed on Engli ..."
Abstract - Cited by 143 (4 self) - Add to MetaCart
This paper considers statistical parsing of Czech, which differs radically from English in at least two respects: (1) it is a highly inflected language, and (2) it has relatively free word order. These differences are likely to pose new problems for techniques that have been developed on English. We describe our experience in building on the parsing model of (Collins 97). Our final results - 80% dependency accuracy - represent good progress towards the 91% accuracy of the parser on English (Wall Street Journal) text.
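Dependency accuracy, the metric behind the 80% figure, is simply the fraction of words whose predicted head matches the gold head, as in this toy computation (head indices invented):

```python
# Dependency accuracy: fraction of words assigned the correct head.
# Head indices are toy values; 0 marks the root.
gold      = [2, 0, 2, 5, 3]
predicted = [2, 0, 3, 5, 3]

correct = sum(g == p for g, p in zip(gold, predicted))
print(f"dependency accuracy: {correct / len(gold):.0%}")  # 80%
```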

Automatic Verb Classification Based on Statistical Distributions of Argument Structure

by Paola Merlo, Suzanne Stevenson - Computational Linguistics, 2001
"... this paper, we focus on argument structure--the thematic roles assigned by a verb to its arguments--as the way in which the relational semantics of the verb is represented at the syntactic level ..."
Abstract - Cited by 137 (20 self) - Add to MetaCart
In this paper, we focus on argument structure, the thematic roles assigned by a verb to its arguments, as the way in which the relational semantics of the verb is represented at the syntactic level.

Adding Semantic Annotation to the Penn TreeBank

by Paul Kingsbury, Martha Palmer, Mitch Marcus - In Proceedings of the Human Language Technology Conference, 2002
"... This paper presents our basic approach to creating Proposition Bank, which involves adding a layer of semantic annotation to the Penn English TreeBank. Without attempting to confirm or disconfirm any particular semantic theory, our goal is to provide consistent argument labeling that will facilitate ..."
Abstract - Cited by 136 (2 self) - Add to MetaCart
This paper presents our basic approach to creating Proposition Bank, which involves adding a layer of semantic annotation to the Penn English TreeBank. Without attempting to confirm or disconfirm any particular semantic theory, our goal is to provide consistent argument labeling that will facilitate the automatic extraction of relational data. An argument such as the window in John broke the window and in The window broke would receive the same label in both sentences. In order to ensure reliable human annotation, we provide our annotators with explicit guidelines for labeling all of the syntactic and semantic frames of each particular verb. We give several examples of these guidelines and discuss the inter-annotator agreement figures. We also discuss our current experiments on the automatic expansion of our verb guidelines based on verb class membership. Our current rate of progress and our consistency of annotation demonstrate the feasibility of the task.
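The consistency requirement in the abstract's example can be shown directly as data: the label attached to "the window" stays the same whether it surfaces as object or subject. The dictionary layout below is a simplification invented for illustration, not PropBank's actual file format:

```python
# The same semantic label follows "the window" across both syntactic
# frames of "break", which is exactly the consistency the abstract
# describes.
annotations = [
    {
        "sentence": "John broke the window",
        "predicate": "broke",
        "args": {"Arg0": "John", "Arg1": "the window"},
    },
    {
        "sentence": "The window broke",
        "predicate": "broke",
        "args": {"Arg1": "The window"},  # subject position, still Arg1
    },
]

for a in annotations:
    print(a["sentence"], "->", a["args"])
```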

Training Tree Transducers

by Jonathan Graehl, Kevin Knight - In HLT-NAACL, 2004
"... Many probabilistic models for natural language are now written in terms of hierarchical tree structure. Tree-based modeling still lacks many of the standard tools taken for granted in (finite-state) string-based modeling. The theory of tree transducer automata provides a possible framework to ..."
Abstract - Cited by 132 (12 self) - Add to MetaCart
Many probabilistic models for natural language are now written in terms of hierarchical tree structure. Tree-based modeling still lacks many of the standard tools taken for granted in (finite-state) string-based modeling. The theory of tree transducer automata provides a possible framework to draw on, as it has been worked out in an extensive literature. We motivate the use of tree transducers for natural language and address the training problem for probabilistic tree-to-tree and tree-to-string transducers.
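A probabilistic tree-to-string transducer of the kind the abstract motivates can be sketched in a few lines: each rule rewrites a labeled node into an output template over its recursively transduced children, multiplying in the rule's probability. The rules and probabilities below are invented for illustration:

```python
import math

# A tiny tree-to-string transducer: each rule maps a node label to an
# output template over its transduced children, with a rule probability.
tree = ("S", ("NP", "John"), ("VP", ("V", "saw"), ("NP", "Mary")))

# label -> (template over child outputs, rule probability)
rules = {
    "S":  ("{1} {0}", 0.7),   # reorder: VP before NP in the output
    "VP": ("{0} {1}", 0.9),
    "NP": ("{0}", 1.0),
    "V":  ("{0}", 1.0),
}

def transduce(node):
    """Return (output string, log-probability) for a subtree."""
    if isinstance(node, str):          # leaf: emit the word itself
        return node, 0.0
    label, *children = node
    template, prob = rules[label]
    outs, logps = zip(*map(transduce, children))
    return template.format(*outs), math.log(prob) + sum(logps)

print(transduce(tree))  # ('saw Mary John', log(0.7 * 0.9))
```

Training, the problem the paper addresses, means estimating the rule probabilities from tree/string pairs rather than writing them by hand as done here.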

Citation Context

...ight, and Marcu 2003), natural language generation (Langkilde and Knight 1998; Bangalore and Rambow 2000; Corston-Oliver et al. 2002), parsing, and language modeling (Baker 1979; Lari and Young 1990; Collins 1997; Chelba and Jelinek 2000; Charniak 2001; Klein ...
