Results 1 - 10
of
30
A Semi-Supervised Method to Learn and Construct Taxonomies using the Web
"... Although many algorithms have been developed to harvest lexical resources, few organize the mined terms into taxonomies. We propose (1) a semi-supervised algorithm that uses a root concept, a basic level concept, and recursive surface patterns to learn automatically from the Web hyponym-hypernym pai ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Although many algorithms have been developed to harvest lexical resources, few organize the mined terms into taxonomies. We propose (1) a semi-supervised algorithm that uses a root concept, a basic level concept, and recursive surface patterns to learn automatically from the Web hyponym-hypernym pairs subordinated to the root; (2) a Web based concept positioning procedure to validate the learned pairs ’ is-a relations; and (3) a graph algorithm that derives from scratch the integrated taxonomy structure of all the terms. Comparing results with WordNet, we find that the algorithm misses some concepts and links, but also that it discovers many additional ones lacking in WordNet. We evaluate the taxonomization power of our method on reconstructing parts of the WordNet taxonomy. Experiments show that starting from scratch, the algorithm can reconstruct 62 % of the WordNet taxonomy for the regions tested.
Latent variable models of selectional preference
- In ACL 2010
, 2010
"... This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to pr ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data. 1
A Bayesian Model for Unsupervised Semantic Parsing
"... We propose a non-parametric Bayesian model for unsupervised semantic parsing. Following Poon and Domingos (2009), we consider a semantic parsing setting where the goal is to (1) decompose the syntactic dependency tree of a sentence into fragments, (2) assign each of these fragments to a cluster of s ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
We propose a non-parametric Bayesian model for unsupervised semantic parsing. Following Poon and Domingos (2009), we consider a semantic parsing setting where the goal is to (1) decompose the syntactic dependency tree of a sentence into fragments, (2) assign each of these fragments to a cluster of semantically equivalent syntactic structures, and (3) predict predicate-argument relations between the fragments. We use hierarchical Pitman-Yor processes to model statistical dependencies between meaning representations of predicates and those of their arguments, as well as the clusters of their syntactic realizations. We develop a modification of the Metropolis-Hastings split-merge sampler, resulting in an efficient inference algorithm for the model. The method is experimentally evaluated by using the induced semantic representation for the question answering task in the biomedical domain. 1
Identifying Functional Relations in Web Text
"... Determining whether a textual phrase denotes a functional relation (i.e., a relation that maps each domain element to a unique range element) is useful for numerous NLP tasks such as synonym resolution and contradiction detection. Previous work on this problem has relied on either counting methods o ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
Determining whether a textual phrase denotes a functional relation (i.e., a relation that maps each domain element to a unique range element) is useful for numerous NLP tasks such as synonym resolution and contradiction detection. Previous work on this problem has relied on either counting methods or lexico-syntactic patterns. However, determining whether a relation is functional, by analyzing mentions of the relation in a corpus, is challenging due to ambiguity, synonymy, anaphora, and other linguistic phenomena. We present the LEIBNIZ system that overcomes these challenges by exploiting the synergy between the Web corpus and freelyavailable knowledge resources such as Freebase. It first computes multiple typed functionality scores, representing functionality of the relation phrase when its arguments are constrained to specific types. It then aggregates these scores to predict the global functionality for the phrase. LEIBNIZ outperforms previous work, increasing area under the precisionrecall curve from 0.61 to 0.88. We utilize LEIBNIZ to generate the first public repository of automatically-identified functional relations.
Identifying Relations for Open Information Extraction
, 2011
"... Open Information Extraction (IE) is the task of extracting assertions from massive corpora without requiring a pre-specified vocabulary. This paper shows that the output of state-ofthe-art Open IE systems is rife with uninformative and incoherent extractions. To overcome these problems, we introduce ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
Open Information Extraction (IE) is the task of extracting assertions from massive corpora without requiring a pre-specified vocabulary. This paper shows that the output of state-ofthe-art Open IE systems is rife with uninformative and incoherent extractions. To overcome these problems, we introduce two simple syntactic and lexical constraints on binary relations expressed by verbs. We implemented the constraints in the REVERB Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TEXTRUNNER and WOE pos. More than 30 % of REVERB’s extractions are at precision 0.8 or higher— compared to virtually none for earlier systems. The paper concludes with a detailed analysis of REVERB’s errors, suggesting directions for future work.
Structured Relation Discovery using Generative Models
"... We explore unsupervised approaches to relation extraction between two named entities; for instance, the semantic bornIn relation between a person and location entity. Concretely, we propose a series of generative probabilistic models, broadly similar to topic models, each which generates a corpus of ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
We explore unsupervised approaches to relation extraction between two named entities; for instance, the semantic bornIn relation between a person and location entity. Concretely, we propose a series of generative probabilistic models, broadly similar to topic models, each which generates a corpus of observed triples of entity mention pairs and the surface syntactic dependency path between them. The output of each model is a clustering of observed relation tuples and their associated textual expressions to underlying semantic relation types. Our proposed models exploit entity type constraints within a relation as well as features on the dependency path between entity mentions. We examine effectiveness of our approach via multiple evaluations and demonstrate 12 % error reduction in precision over a state-of-the-art weakly supervised baseline. al., 2010). We propose a series of generative probabilistic models, broadly similar to standard topic models, which generate a corpus of observed triples of entity mention pairs and the surface syntactic dependency path between them. Our proposed models exploit entity type constraints within a relation as well as features on the dependency path between entity mentions. The output of our approach is a clustering over observed relation paths (e.g. “X was born in Y ” and “X is from Y”) such that expressions in the same cluster bear the same semantic relation type between entities. Past work has shown that standard supervised techniques can yield high-performance relation detection when abundant labeled data exists for a fixed inventory of individual relation types (e.g. leaderOf) (Kambhatla, 2004; Culotta and Sorensen, 2004; Roth and tau Yih, 2002). However, less explored are open-domain approaches where the set of possible relation types are not fixed and little to no labeled is given for each relation type (Banko et 1
Automatic Labelling of Topic Models
"... We propose a method for automatically labelling topics learned via LDA topic models. We generate our label candidate set from the top-ranking topic terms, titles of Wikipedia articles containing the top-ranking topic terms, and sub-phrases extracted from the Wikipedia article titles. We rank the lab ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We propose a method for automatically labelling topics learned via LDA topic models. We generate our label candidate set from the top-ranking topic terms, titles of Wikipedia articles containing the top-ranking topic terms, and sub-phrases extracted from the Wikipedia article titles. We rank the label candidates using a combination of association measures and lexical features, optionally fed into a supervised ranking model. Our method is shown to perform strongly over four independent sets of topics, significantly better than a benchmark method. 1
Machine reading at the university of washington
- IN NAACL WORKSHOP FAM-LBR
, 2010
"... Machine reading is a long-standing goal of AI and NLP. In recent years, tremendous progress has been made in developing machine learning approaches for many of its subtasks such as parsing, information extraction, and question answering. However, existing end-to-end solutions typically require subst ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
Machine reading is a long-standing goal of AI and NLP. In recent years, tremendous progress has been made in developing machine learning approaches for many of its subtasks such as parsing, information extraction, and question answering. However, existing end-to-end solutions typically require substantial amount of human efforts (e.g., labeled data and/or manual engineering), and are not well poised for Web-scale knowledge acquisition. In this paper, we propose a unifying approach for machine reading by bootstrapping from the easiest extractable knowledge and conquering the long tail via a self-supervised learning process. This self-supervision is powered by joint inference based on Markov logic, and is made scalable by leveraging hierarchical structures and coarse-to-fine inference. Researchers at the University of Washington have taken the first steps in this direction. Our existing work explores the wide spectrum of this vision and shows its promise.
Commonsense from the Web: Relation Properties
"... When general purpose software agents fail, it’s often because they’re brittle and need more background commonsense knowledge. In this paper we present relation properties as a valuable type of commonsense knowledge that can be automatically inferred at scale by reading the Web. People base many comm ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
When general purpose software agents fail, it’s often because they’re brittle and need more background commonsense knowledge. In this paper we present relation properties as a valuable type of commonsense knowledge that can be automatically inferred at scale by reading the Web. People base many commonsense inferences on their knowledge of relation properties such as functionality, transitivity, and others. For example, all people know that bornIn(Year) satisfies the functionality property, meaning that each person can be born in exactly one year. Thus inferences like "Obama was born in 1961, so he was not born in 2008", which computers do not know, are obvious even to children. We demonstrate scalable heuristics for learning relation functionality from noisy Web text that outperform existing approaches to detecting functionality. The heuristics we use address Web NLP challenges that are also common to learning other relation properties, and can be easily transferred. Each relation property we learn for a Web-scale set of relations will enable computers to solve real tasks, and the data from learning many such properties will be a useful addition to general commonsense knowledge bases.
Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions
"... Metonymic language is a pervasive phenomenon. Metonymic type shifting, or argument type coercion, results in a selectional restriction violation where the argument’s semantic class differs from the class the predicate expects. In this paper we present an unsupervised method that learns the selection ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Metonymic language is a pervasive phenomenon. Metonymic type shifting, or argument type coercion, results in a selectional restriction violation where the argument’s semantic class differs from the class the predicate expects. In this paper we present an unsupervised method that learns the selectional restriction of arguments and enables the detection of argument coercion. This method also generates an enhanced probabilistic resolution of logical metonymies. The experimental results indicate substantial improvements the detection of coercions and the ranking of metonymic interpretations. 1

