Results 1 - 10
of
112
Identifying relations for open information extraction. In:
- Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP),
, 2011
"... Abstract Open Information Extraction (IE) is the task of extracting assertions from massive corpora without requiring a pre-specified vocabulary. This paper shows that the output of state-ofthe-art Open IE systems is rife with uninformative and incoherent extractions. To overcome these problems, we ..."
Abstract
-
Cited by 140 (4 self)
- Add to MetaCart
(Show Context)
Abstract Open Information Extraction (IE) is the task of extracting assertions from massive corpora without requiring a pre-specified vocabulary. This paper shows that the output of state-ofthe-art Open IE systems is rife with uninformative and incoherent extractions. To overcome these problems, we introduce two simple syntactic and lexical constraints on binary relations expressed by verbs. We implemented the constraints in the REVERB Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TEXTRUNNER and WOE pos . More than 30% of REVERB's extractions are at precision 0.8 or highercompared to virtually none for earlier systems. The paper concludes with a detailed analysis of REVERB's errors, suggesting directions for future work.
Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
"... Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially ..."
Abstract
-
Cited by 104 (15 self)
- Add to MetaCart
(Show Context)
Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multiinstance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple). This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level. 1
A latent dirichlet allocation method for selectional preferences
- In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL
, 2010
"... The computation of selectional preferences, the admissible argument values for a relation, is a well-known NLP task with broad applicability. We present LDA-SP, which utilizes LinkLDA (Erosheva et al., 2004) to model selectional preferences. By simultaneously inferring latent topics and topic distri ..."
Abstract
-
Cited by 80 (8 self)
- Add to MetaCart
(Show Context)
The computation of selectional preferences, the admissible argument values for a relation, is a well-known NLP task with broad applicability. We present LDA-SP, which utilizes LinkLDA (Erosheva et al., 2004) to model selectional preferences. By simultaneously inferring latent topics and topic distributions over relations, LDA-SP combines the benefits of previous approaches: like traditional classbased approaches, it produces humaninterpretable classes describing each relation’s preferences, but it is competitive with non-class-based methods in predictive power. We compare LDA-SP to several state-ofthe-art methods achieving an 85 % increase in recall at 0.9 precision over mutual information (Erk, 2007). We also evaluate LDA-SP’s effectiveness at filtering improper applications of inference rules, where we show substantial improvement over Pantel et al.’s system (Pantel et al., 2007). 1
StatSnowball: a Statistical Approach to Extracting Entity Relationships
- WWW 2009 MADRID! TRACK: DATA MINING / SESSION: STATISTICAL METHODS
, 2009
"... Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limit the abili ..."
Abstract
-
Cited by 56 (2 self)
- Add to MetaCart
Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limit the ability to generalize and thus generate a low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring pre-specifications. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), which is a bootstrapping system and can perform both traditional relation extraction and Open IE. StatSnowball uses the discriminative Markov logic networks
Entity Disambiguation for Knowledge Base Population
"... The integration of facts derived from information extraction systems into existing knowledge bases requires a system to disambiguate entity mentions in the text. This is challenging due to issues such as non-uniform variations in entity names, mention ambiguity, and entities absent from a knowledge ..."
Abstract
-
Cited by 50 (4 self)
- Add to MetaCart
(Show Context)
The integration of facts derived from information extraction systems into existing knowledge bases requires a system to disambiguate entity mentions in the text. This is challenging due to issues such as non-uniform variations in entity names, mention ambiguity, and entities absent from a knowledge base. We present a state of the art system for entity disambiguation that not only addresses these challenges but also scales to knowledge bases with several million entries using very little resources. Further, our approach achieves performance of up to 95 % on entities mentioned from newswire and 80 % on a public test set that was designed to include challenging queries. 1
Recovering Semantics of Tables on the Web
"... The Web offers a corpus of over 100 million tables [6], but the meaning of each table is rarely explicit from the table itself. Header rows exist in few cases and even when they do, the attribute names are typically useless. We describe a system that attempts to recover the semantics of tables by en ..."
Abstract
-
Cited by 42 (4 self)
- Add to MetaCart
(Show Context)
The Web offers a corpus of over 100 million tables [6], but the meaning of each table is rarely explicit from the table itself. Header rows exist in few cases and even when they do, the attribute names are typically useless. We describe a system that attempts to recover the semantics of tables by enriching the table with additional annotations. Our annotations facilitate operations such as searching for tables and finding related tables. To recover semantics of tables, we leverage a database of class labels and relationships automatically extracted from the Web. The database of classes and relationships has very wide coverage, but is also noisy. We attach a class label to a column if a sufficient number of the values in the column are identified with that label in the database of class labels, and analogously for binary relationships. We describe a formal model for reasoning about when we have seen sufficient evidence for a label, and show that it performs substantially better than a simple majority scheme. We describe a set of experiments that illustrate the utility of the recovered semantics for table search and show that it performs substantially better than previous approaches. In addition, we characterize what fraction of tables on the Web can be annotated using our approach. 1.
Open Information Extraction: The Second Generation
- PROCEEDINGS OF THE TWENTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE
, 2011
"... How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction (Open IE) paradigm which eschews ha ..."
Abstract
-
Cited by 41 (0 self)
- Add to MetaCart
How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction (Open IE) paradigm which eschews handlabeled training examples, and avoids domainspecific verbs and nouns, to develop unlexicalized, domain-independent extractors that scale to the Web corpus. Open IE systems have extracted billions of assertions as the basis for both commonsense knowledge and novel question-answering systems. This paper describes the second generation of Open IE systems, which rely on a novel model of how relations and their arguments are expressed in English sentences to double precision/recall compared with previous systems such as TEXTRUNNER and WOE.
Structured relation discovery using generative models
- In Proceedings of the Conference on Empirical Methods in Natural Language Processing
, 2011
"... We explore unsupervised approaches to rela-tion extraction between two named entities; for instance, the semantic bornIn relation be-tween a person and location entity. Con-cretely, we propose a series of generative probabilistic models, broadly similar to topic models, each which generates a corpus ..."
Abstract
-
Cited by 36 (7 self)
- Add to MetaCart
(Show Context)
We explore unsupervised approaches to rela-tion extraction between two named entities; for instance, the semantic bornIn relation be-tween a person and location entity. Con-cretely, we propose a series of generative probabilistic models, broadly similar to topic models, each which generates a corpus of ob-served triples of entity mention pairs and the surface syntactic dependency path between them. The output of each model is a cluster-ing of observed relation tuples and their as-sociated textual expressions to underlying se-mantic relation types. Our proposed models exploit entity type constraints within a relation as well as features on the dependency path be-tween entity mentions. We examine effective-ness of our approach via multiple evaluations and demonstrate 12 % error reduction in preci-sion over a state-of-the-art weakly supervised baseline. 1
Entity Linking: Finding Extracted Entities in a Knowledge Base
"... Abstract. In the menagerie of tasks for information extraction, entity linking is a new beast that has drawn a lot of attention from NLP practitioners and researchers recently. Entity Linking, also referred to as record linkage or entity resolution, involves aligning a textual mention of a named-ent ..."
Abstract
-
Cited by 25 (0 self)
- Add to MetaCart
(Show Context)
Abstract. In the menagerie of tasks for information extraction, entity linking is a new beast that has drawn a lot of attention from NLP practitioners and researchers recently. Entity Linking, also referred to as record linkage or entity resolution, involves aligning a textual mention of a named-entity to an appropriate entry in a knowledge base, which may or may not contain the entity. This has manifold applications ranging from linking patient health records to maintaining personal credit files, prevention of identity crimes, and supporting law enforcement. We discuss the key challenges present in this task and we present a high-performing system that links entities using max-margin ranking. We also summarize recent work in this area and describe several open research problems.
M.: Discovering Relations between Noun Categories
- In Proceedings of the conference on empirical methods in natural language processing (EMNLP 2011
, 2011
"... All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately. ..."
Abstract
-
Cited by 22 (1 self)
- Add to MetaCart
(Show Context)
All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately.