Results 1 -
4 of
4
Open Information Extraction: the Second Generation
"... How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction (Open IE) paradigm which eschews ha ..."
Abstract
- Add to MetaCart
How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction (Open IE) paradigm which eschews handlabeled training examples, and avoids domainspecific verbs and nouns, to develop unlexicalized, domain-independent extractors that scale to the Web corpus. Open IE systems have extracted billions of assertions as the basis for both commonsense knowledge and novel question-answering systems. This paper describes the second generation of Open IE systems, which rely on a novel model of how relations and their arguments are expressed in English sentences to double precision/recall compared with previous systems such as TEXTRUNNER and WOE. 1
Abstract Leveraging Knowledge Bases in Web Text Processing
, 2012
"... The Web contains more text than any other source in human history, and continues to expand rapidly. Computer algorithms to process and extract knowledge from Web text have the potential not only to improve Web search, but also to collect a sizable fraction of human knowledge and use it to enable sma ..."
Abstract
- Add to MetaCart
The Web contains more text than any other source in human history, and continues to expand rapidly. Computer algorithms to process and extract knowledge from Web text have the potential not only to improve Web search, but also to collect a sizable fraction of human knowledge and use it to enable smarter artificial intelligence. To scale to the size and diversity of the Web, many Web text processing algorithms use domain-independent statistical approaches, rather than limiting their processing to any fixed ontologies or sets of domains. While traditional knowledge bases (KBs) had limited coverage of general knowledge, the last few years have seen the rapid rise of new KBs like Freebase and Wikipedia that now cover millions of general interest topics. While these KBs still do not cover the full diversity of the Web, this thesis demonstrates that they are now close enough that there are ways to effectively leverage them in domain-independent Web text processing. It presents and empirically verifies how these KBs can be used to filter uninteresting Web extractions, enhance understanding and usability of both extracted relations and extracted entities, and even power new functionality for Web search. The effective integration of KBs with
Contents lists available at SciVerse ScienceDirect Artificial Intelligence
"... www.elsevier.com/locate/artint BabelNet: The automatic construction, evaluation and application ..."
Abstract
- Add to MetaCart
www.elsevier.com/locate/artint BabelNet: The automatic construction, evaluation and application
Open Domain Event Extraction from Twitter
"... Tweets are the most up-to-date and inclusive stream of information and commentary on current events, but they are also fragmented and noisy, motivating the need for systems that can extract, aggregate and categorize important events. Previous work on extracting structured representations of events h ..."
Abstract
- Add to MetaCart
Tweets are the most up-to-date and inclusive stream of information and commentary on current events, but they are also fragmented and noisy, motivating the need for systems that can extract, aggregate and categorize important events. Previous work on extracting structured representations of events has focused largely on newswire text; Twitter’s unique characteristics present new challenges and opportunities for open-domain event extraction. This paper describes TwiCal— the first open-domain event-extraction and categorization system for Twitter. We demonstrate that accurately extracting an open-domain calendar of significant events from Twitter is indeed feasible. In addition, we present a novel approach for discovering important event categories and classifying extracted events based on latent variable models. By leveraging large volumes of unlabeled data, our approach achieves a 14 % increase in maximum F1 over a supervised baseline. A continuously updating demonstration of our system can be viewed at

