The Tradeoffs Between Open and Traditional Relation Extraction

by Michele Banko, Oren Etzioni
Results 1 - 10 of 112

Identifying relations for open information extraction

by Anthony Fader, Stephen Soderland, Oren Etzioni - Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011
Cited by 140 (4 self)
Open Information Extraction (IE) is the task of extracting assertions from massive corpora without requiring a pre-specified vocabulary. This paper shows that the output of state-of-the-art Open IE systems is rife with uninformative and incoherent extractions. To overcome these problems, we introduce two simple syntactic and lexical constraints on binary relations expressed by verbs. We implemented the constraints in the REVERB Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TEXTRUNNER and WOE^pos. More than 30% of REVERB's extractions are at precision 0.8 or higher, compared to virtually none for earlier systems. The paper concludes with a detailed analysis of REVERB's errors, suggesting directions for future work.

Citation Context

...ion 5, systems such as TEXTRUNNER are unable to learn the constraints embedded in REVERB. Of course, a learning system, utilizing a different hypothesis space, and an appropriate set of training examples, could potentially learn and refine the constraints in REVERB. This is a topic for future work, which we consider in Section 6. The first Open IE system was TEXTRUNNER (Banko et al., 2007), which used a Naive Bayes model with unlexicalized POS and NP-chunk features, trained using examples heuristically generated from the Penn Treebank. Subsequent work showed that utilizing a linear-chain CRF (Banko and Etzioni, 2008) or Markov Logic Network (Zhu et al., 2009) can lead to improved extraction. The WOE systems introduced by Wu and Weld make use of Wikipedia as a source of training data for their extractors, which leads to further improvements over TEXTRUNNER (Wu and Weld, 2010). Wu and Weld also show that dependency parse features result in a dramatic increase in precision and recall over shallow linguistic features, but at the cost of extraction speed. Other approaches to large-scale IE have included Preemptive IE (Shinyama and Sekine, 2006), OnDemand IE (Sekine, 2006), and weak supervision for IE (Mintz et...

Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

by Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld
Cited by 104 (15 self)
Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint; for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple). This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.

Citation Context

...s found in text on the Web. Open IE systems, which perform self-supervised learning of relation-independent extractors (e.g., Preemptive IE (Shinyama and Sekine, 2006), TEXTRUNNER (Banko et al., 2007; Banko and Etzioni, 2008) and WOE (Wu and Weld, 2010)) can scale to millions of documents, but don’t output canonicalized relations. 8.1 Weak Supervision Weak supervision (also known as distant- or self supervision) refers t...
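
The heuristic labeling step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy knowledge base, the substring-based entity matching, and the function name `label_sentences` are all assumptions made for illustration. It shows why overlapping relations arise: a sentence mentioning both Jobs and Apple is labeled with every relation the knowledge base holds for that pair.

```python
# Hypothetical toy knowledge base: (entity1, entity2) -> set of relations.
# The Founded / CEO-of pair mirrors the example given in the abstract.
KB = {
    ("Jobs", "Apple"): {"Founded", "CEO-of"},
}


def label_sentences(sentences, kb):
    """Heuristically label sentences using structured data.

    A sentence is labeled with every KB relation whose two arguments both
    occur in it. Entity matching is a naive substring test (an assumption;
    a real system would use entity recognition and linking).
    """
    labeled = []
    for sentence in sentences:
        for (e1, e2), relations in kb.items():
            if e1 in sentence and e2 in sentence:
                labeled.append((sentence, e1, e2, sorted(relations)))
    return labeled


sentences = [
    "Jobs founded Apple in a garage.",
    "Jobs returned to lead Apple in 1997.",
    "Apple released a new phone.",
]
labeled = label_sentences(sentences, KB)
for sentence, e1, e2, relations in labeled:
    print(f"{relations} <- {sentence}")
```

Both matching sentences receive both labels, even though each sentence expresses only one of the relations. This is the kind of noise that multi-instance learning is meant to handle.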

A latent dirichlet allocation method for selectional preferences

by Alan Ritter, Oren Etzioni - In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), 2010
Cited by 80 (8 self)
The computation of selectional preferences, the admissible argument values for a relation, is a well-known NLP task with broad applicability. We present LDA-SP, which utilizes LinkLDA (Erosheva et al., 2004) to model selectional preferences. By simultaneously inferring latent topics and topic distributions over relations, LDA-SP combines the benefits of previous approaches: like traditional class-based approaches, it produces human-interpretable classes describing each relation’s preferences, but it is competitive with non-class-based methods in predictive power. We compare LDA-SP to several state-of-the-art methods, achieving an 85% increase in recall at 0.9 precision over mutual information (Erk, 2007). We also evaluate LDA-SP’s effectiveness at filtering improper applications of inference rules, where we show substantial improvement over Pantel et al.’s system (Pantel et al., 2007).

Citation Context

...mmonly co-occur. This information is very helpful in guiding inference. We run LDA-SP to compute preferences on a massive dataset of binary relations r(a1, a2) extracted from the Web by TEXTRUNNER (Banko and Etzioni, 2008). Our experiments demonstrate that LDA-SP significantly outperforms state of the art approaches obtaining an 85% increase in recall at precision 0.9 on the standard pseudo-disambiguation task. Additio...

StatSnowball: a Statistical Approach to Extracting Entity Relationships

by Jun Zhu, Zaiqing Nie, Xiaojing Liu, Bo Zhang, Ji-rong Wen - WWW 2009, Madrid, Track: Data Mining / Session: Statistical Methods, 2009
Cited by 56 (2 self)
Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limit the ability to generalize and thus generate a low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring pre-specifications. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), which is a bootstrapping system and can perform both traditional relation extraction and Open IE. StatSnowball uses the discriminative Markov logic networks

Entity Disambiguation for Knowledge Base Population

by Mark Dredze, Paul Mcnamee, Delip Rao, Adam Gerber, Tim Finin
Cited by 50 (4 self)
The integration of facts derived from information extraction systems into existing knowledge bases requires a system to disambiguate entity mentions in the text. This is challenging due to issues such as non-uniform variations in entity names, mention ambiguity, and entities absent from a knowledge base. We present a state of the art system for entity disambiguation that not only addresses these challenges but also scales to knowledge bases with several million entries using very little resources. Further, our approach achieves performance of up to 95% on entities mentioned from newswire and 80% on a public test set that was designed to include challenging queries.

Citation Context

...tion The ability to identify entities like people, organizations and geographic locations (Tjong Kim Sang and De Meulder, 2003), extract their attributes (Pasca, 2008), and identify entity relations (Banko and Etzioni, 2008) is useful for several applications in natural language processing and knowledge acquisition tasks like populating structured knowledge bases (KB). However, inserting extracted knowledge into a KB is...

Recovering Semantics of Tables on the Web

by Petros Venetis, Alon Halevy, Jayant Madhavan, Warren Shen, Fei Wu, Gengxin Miao, Chung Wu
Cited by 42 (4 self)
The Web offers a corpus of over 100 million tables [6], but the meaning of each table is rarely explicit from the table itself. Header rows exist in few cases and even when they do, the attribute names are typically useless. We describe a system that attempts to recover the semantics of tables by enriching the table with additional annotations. Our annotations facilitate operations such as searching for tables and finding related tables. To recover semantics of tables, we leverage a database of class labels and relationships automatically extracted from the Web. The database of classes and relationships has very wide coverage, but is also noisy. We attach a class label to a column if a sufficient number of the values in the column are identified with that label in the database of class labels, and analogously for binary relationships. We describe a formal model for reasoning about when we have seen sufficient evidence for a label, and show that it performs substantially better than a simple majority scheme. We describe a set of experiments that illustrate the utility of the recovered semantics for table search and show that it performs substantially better than previous approaches. In addition, we characterize what fraction of tables on the Web can be annotated using our approach.

Citation Context

...n extraction that outputs instances of a given relation, OIE extracts any relation using a set of relations-independent heuristics. In our implementation, we use the TextRunner open extraction system [2]. As reported in [2], TextRunner has precision around 73.9% and recall around 58.4%. In Appendix B we provide additional details about TextRunner. 3.3 Evaluating candidate annotations The databases de...
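
The column-labeling idea in the abstract above (attach a class label to a column when enough of its values carry that label in a class-label database) can be sketched as a simple threshold rule. This corresponds to the simple majority-style baseline the abstract mentions, not the paper's formal evidence model. The database contents, the function name `label_column`, and the `min_fraction` parameter are hypothetical.

```python
from collections import Counter


def label_column(values, class_db, min_fraction=0.5):
    """Return the class labels supported by at least `min_fraction` of the
    column's values, sorted alphabetically.

    `class_db` maps a cell value to the set of class labels recorded for it
    (a stand-in for a noisy database of labels extracted from the Web).
    """
    if not values:
        return []
    counts = Counter()
    for value in values:
        for label in class_db.get(value, ()):
            counts[label] += 1
    return sorted(
        label for label, count in counts.items()
        if count / len(values) >= min_fraction
    )


# Hypothetical class-label database; "Amazon" shows the kind of ambiguity
# that makes such a database noisy.
class_db = {
    "Paris": {"city", "capital"},
    "Lyon": {"city"},
    "Berlin": {"city", "capital"},
    "Amazon": {"company", "river"},
}
column = ["Paris", "Lyon", "Berlin", "Amazon"]

print(label_column(column, class_db))                    # threshold 0.5
print(label_column(column, class_db, min_fraction=0.6))  # stricter threshold
```

A fixed threshold treats every label and every value alike. The abstract's point is that a model reasoning about how much evidence is sufficient performs substantially better than this kind of scheme.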

Open Information Extraction: The Second Generation

by Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, Mausam - Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, 2011
Cited by 41 (0 self)
How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction (Open IE) paradigm which eschews hand-labeled training examples, and avoids domain-specific verbs and nouns, to develop unlexicalized, domain-independent extractors that scale to the Web corpus. Open IE systems have extracted billions of assertions as the basis for both commonsense knowledge and novel question-answering systems. This paper describes the second generation of Open IE systems, which rely on a novel model of how relations and their arguments are expressed in English sentences to double precision/recall compared with previous systems such as TEXTRUNNER and WOE.

Structured relation discovery using generative models

by Limin Yao, Aria Haghighi, Sebastian Riedel, Andrew McCallum - In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011
Cited by 36 (7 self)
We explore unsupervised approaches to relation extraction between two named entities; for instance, the semantic bornIn relation between a person and location entity. Concretely, we propose a series of generative probabilistic models, broadly similar to topic models, each of which generates a corpus of observed triples of entity mention pairs and the surface syntactic dependency path between them. The output of each model is a clustering of observed relation tuples and their associated textual expressions to underlying semantic relation types. Our proposed models exploit entity type constraints within a relation as well as features on the dependency path between entity mentions. We examine effectiveness of our approach via multiple evaluations and demonstrate 12% error reduction in precision over a state-of-the-art weakly supervised baseline.

Citation Context

.... However, less explored are open-domain approaches where the set of possible relation types are not fixed and little to no labeled data is given for each relation type (Banko et al., 2007; Banko and Etzioni, 2008). A more related line of research has explored inducing relation types via clustering. For example, DIRT...

Entity Linking: Finding Extracted Entities in a Knowledge Base

by Delip Rao, Paul Mcnamee, Mark Dredze
Cited by 25 (0 self)
In the menagerie of tasks for information extraction, entity linking is a new beast that has drawn a lot of attention from NLP practitioners and researchers recently. Entity Linking, also referred to as record linkage or entity resolution, involves aligning a textual mention of a named-entity to an appropriate entry in a knowledge base, which may or may not contain the entity. This has manifold applications ranging from linking patient health records to maintaining personal credit files, prevention of identity crimes, and supporting law enforcement. We discuss the key challenges present in this task and we present a high-performing system that links entities using max-margin ranking. We also summarize recent work in this area and describe several open research problems.

Citation Context

...d entities, identify relationships between the entities expressed in the text. For instance, given two person names in news documents about crime and violence, identify the victim and the perpetrator [4, 44]. Most relation extraction methods can be classified as open- or closed-domain depending on the restrictions on extractable relations. Closed domain systems extract a fixed set of relations while in o...

Discovering Relations between Noun Categories

by Thahir P. Mohamed, Estevam R. Hruschka Jr., Tom M. Mitchell - In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), 2011
Cited by 22 (1 self)

Citation Context

...nko, Cafarella et al. 2007) because the specific theory which Einstein derived is missing in that tuple. In the tuple (45, 'went to', 'Boston'), one of the arguments (i.e. 45) is not well formed. In (Banko and Etzioni, 2008) a Conditional Random Field (CRF) classifier is used to perform Open Relation Extraction which improves by more than 60% the F-score achieved by the Naive Bayes model in the TextRunner system. Howeve...
