Automatic Factual Question Generation from Text
Abstract - Cited by 2 (0 self)
Texts with potential educational value are becoming available through the Internet (e.g., Wikipedia, news services). However, using these new texts in classrooms introduces many challenges, one of which is that they usually lack practice exercises and assessments. Here, we address part of this challenge by automating the creation of a specific type of assessment item. Specifically, we focus on automatically generating factual WH questions. Our goal is to create an automated system that can take as input a text and produce as output questions for assessing a reader’s knowledge of the information in the text. The questions could then be presented to a teacher, who could select and revise the ones that he or she judges to be useful. After introducing the problem, we describe some of the computational and linguistic challenges presented by factual question generation. We then present an implemented system that leverages existing natural language processing techniques to address some of these challenges. The system uses a combination of manually encoded transformation rules and a statistical question ranker trained on a tailored dataset of labeled system output. We present experiments that evaluate individual components of the system as well as the system as a whole. We found, among other things, that the question ranker roughly doubled the acceptability
Open Information Extraction using Wikipedia (An updated and corrected version of our ACL-2010 paper)
Abstract
Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system which improves dramatically on TextRunner’s precision and recall. The key to WOE’s performance is a novel form of self-supervised learning for open extractors: using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data. Like TextRunner, WOE’s extractor eschews lexicalized features and handles an unbounded set of semantic relations. WOE can operate in two modes: when restricted to POS tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features its precision and recall rise even higher.
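The idea of matching infobox attribute values against article sentences to obtain training data can be illustrated with a minimal sketch. This is not WOE's actual matcher: the function name `build_training_examples`, the case-insensitive substring test, and the requirement that the sentence also mention the article subject are all assumptions made for illustration.

```python
def build_training_examples(subject, infobox, sentences):
    """Pair sentences with infobox attributes by simple string matching.

    Assumption (illustrative only): a sentence counts as a positive example
    for an attribute when it mentions both the article subject and the
    attribute value, using a case-insensitive substring test.
    """
    examples = []
    subject_lc = subject.lower()
    for sentence in sentences:
        sentence_lc = sentence.lower()
        # Skip sentences that never mention the article subject.
        if subject_lc not in sentence_lc:
            continue
        for attribute, value in infobox.items():
            if value.lower() in sentence_lc:
                examples.append((sentence, attribute, value))
    return examples
```

Each returned triple could then serve as a labeled example for training an extractor. A substring test will produce noisy matches in practice, which is why the abstract describes the matches as heuristic.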
Leveraging different meronym discovery methods for bridging resolution in French
, 2012
Abstract
This paper presents a statistical system for resolving bridging descriptions in French, a language for which current lexical resources have very low coverage. The system is similar to that developed for English by [20], but it was enriched to integrate meronymic information extracted automatically from both web queries and raw text using syntactic patterns. Through various experiments on the DEDE corpus [8], we show that, although still mediocre, the performance of our system compares favorably to that obtained by [20] for English. In addition, our evaluation indicates that the different meronym extraction methods have a cumulative effect, but that the text pattern-based extraction method is more robust and leads to higher accuracy than the Web-based approach.
Open Information Extraction: the Second Generation
Abstract
How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction (Open IE) paradigm, which eschews hand-labeled training examples and avoids domain-specific verbs and nouns, to develop unlexicalized, domain-independent extractors that scale to the Web corpus. Open IE systems have extracted billions of assertions as the basis for both commonsense knowledge and novel question-answering systems. This paper describes the second generation of Open IE systems, which rely on a novel model of how relations and their arguments are expressed in English sentences to double precision/recall compared with previous systems such as TEXTRUNNER and WOE.
and Media Informatics,
Abstract
Most relation extraction methods, especially in the domain of biology, rely on machine learning methods to classify a co-occurring pair of entities in a sentence as related or not. Such an approach requires a training corpus, which involves expert annotation and is tedious, time-consuming, and expensive. We overcome this problem by the use of existing knowledge in structured databases to automatically generate a training corpus for protein-protein interactions. An extensive evaluation of different instance selection strategies is performed to maximize robustness on this presumably noisy resource. Successful strategies to consistently improve performance include a majority voting ensemble of classifiers trained on subsets of the training corpus and the use of knowledge bases consisting of proven non-interactions. Our best configured model built without manually annotated data shows very competitive results on several publicly available benchmark corpora.
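A majority voting ensemble trained on subsets of a corpus can be sketched as follows. The toy keyword learner, the leave-one-fold-out way of forming subsets, and all function names are assumptions made for illustration; the abstract does not specify the classifiers or the subset construction used in the paper.

```python
from collections import Counter


def train_keyword_classifier(examples):
    """Toy learner (hypothetical): keep words seen only in positive sentences."""
    positive_words, negative_words = set(), set()
    for text, label in examples:
        target = positive_words if label else negative_words
        target.update(text.lower().split())
    cues = positive_words - negative_words
    return lambda text: any(word in cues for word in text.lower().split())


def train_ensemble(examples, k):
    """Train k classifiers, each on the corpus minus one of k interleaved folds."""
    classifiers = []
    for fold in range(k):
        subset = [ex for i, ex in enumerate(examples) if i % k != fold]
        classifiers.append(train_keyword_classifier(subset))
    return classifiers


def majority_vote(classifiers, text):
    """Return True only when strictly more classifiers vote True than False."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes[True] > votes[False]
```

With an odd number of classifiers, ties cannot occur; with an even number, this sketch resolves ties to the negative class, which is a design choice rather than something stated in the source.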
Leveraging Knowledge Bases in Web Text Processing
, 2012
Abstract
The Web contains more text than any other source in human history, and continues to expand rapidly. Computer algorithms to process and extract knowledge from Web text have the potential not only to improve Web search, but also to collect a sizable fraction of human knowledge and use it to enable smarter artificial intelligence. To scale to the size and diversity of the Web, many Web text processing algorithms use domain-independent statistical approaches, rather than limiting their processing to any fixed ontologies or sets of domains. While traditional knowledge bases (KBs) had limited coverage of general knowledge, the last few years have seen the rapid rise of new KBs like Freebase and Wikipedia that now cover millions of general interest topics. While these KBs still do not cover the full diversity of the Web, this thesis demonstrates that they are now close enough that there are ways to effectively leverage them in domain-independent Web text processing. It presents and empirically verifies how these KBs can be used to filter uninteresting Web extractions, enhance understanding and usability of both extracted relations and extracted entities, and even power new functionality for Web search. The effective integration of KBs with
Large-Scale Learning of Word Relatedness with Constraints
Abstract
Prior work on computing semantic relatedness of words focused on representing their meaning in isolation, effectively disregarding inter-word affinities. We propose a large-scale data mining approach to learning word-word relatedness, where known pairs of related words impose constraints on the learning process. Our method, called CLEAR, is shown to significantly outperform previously published approaches. The proposed method is based on first principles, and is generic enough to exploit diverse types of text corpora, while having the flexibility to impose constraints on the derived word similarities. We also make publicly available a new labeled dataset for evaluating word relatedness algorithms, which we believe to be the largest such dataset to date.

