From Information to Knowledge: Harvesting Entities and Relationships from Web Sources
Cited by 7 (4 self)
There are major trends to advance the functionality of search engines to a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and the progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. Recent endeavors of this kind include DBpedia, EntityCube, KnowItAll, ReadTheWeb, and our own YAGO-NAGA project (and others). The goal is to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations as well as temporal contexts, with high precision and high recall. This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting.
YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia
- Commun. ACM
Cited by 6 (2 self)
We are grateful for input from various people’s work: Edwin Lewis-Kelham for implementing the YAGO2 user interface, Gerard de Melo for his help on integrating his Universal WordNet, and Erdal Kuzey for his work on named events and time facts in Wikipedia. We would also like to thank the people who helped evaluate the quality of YAGO2 by manual assessment, most notably, Ndapandula Nakashole, Stephan Seufert, Erdal Kuzey, and
We present YAGO2, an extension of the YAGO knowledge base, in which entities, facts, and events are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet. It contains 80 million facts about 9.8 million entities. Human evaluation confirmed an accuracy of 95% of the facts in YAGO2. In this paper, we present the extraction methodology, the integration of the spatio-temporal dimension, and our knowledge representation SPOTL, an extension of the original SPO-triple
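As a rough illustration of anchoring a fact in time and space, the following Python sketch assumes that SPOTL extends a subject-predicate-object triple with an optional time and an optional location component. The class name `SPOTLFact`, its field names, the interval encoding, and the example values are all hypothetical; this is not YAGO2's actual data model or API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SPOTLFact:
    """Hypothetical SPO triple extended with a time and a location slot."""
    subject: str
    predicate: str
    obj: str
    # Assumed encoding: inclusive (start_year, end_year) interval.
    time: Optional[Tuple[int, int]] = None
    # Assumed encoding: name of the place where the fact holds.
    location: Optional[str] = None

    def holds_in(self, year: int) -> bool:
        """True if the fact has no time scope or the year lies in its interval."""
        if self.time is None:
            return True
        start, end = self.time
        return start <= year <= end


# Illustrative values only; not taken from YAGO2.
fact = SPOTLFact("ExampleBand", "performedAt", "ExampleFestival",
                 time=(1999, 1999), location="ExampleCity")
print(fact.holds_in(1999))
print(fact.holds_in(2005))
```

Keeping time and location optional lets plain SPO triples and spatio-temporally scoped facts share one record type in this sketch.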
Instance-Driven Attachment of Semantic Annotations over Conceptual Hierarchies
Cited by 1 (0 self)
Whether automatically extracted or human generated, open-domain factual knowledge is often available in the form of semantic annotations (e.g., composed-by) that take one or more specific instances (e.g., rhapsody in blue, george gershwin) as their arguments. This paper introduces a method for converting flat sets of instance-level annotations into hierarchically organized, concept-level annotations, which capture not only the broad semantics of the desired arguments (e.g., ‘People’ rather than ‘Locations’), but also the correct level of generality (e.g., ‘Composers’ rather than ‘People’, or ‘Jazz Composers’). The method refrains from encoding features specific to a particular domain or annotation, to ensure immediate applicability to new, previously unseen annotations. Over a gold standard of semantic annotations and concepts that best capture their arguments, the method substantially outperforms three baselines, on average, computing concepts that are less than one step in the hierarchy away from the corresponding gold standard concepts.
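The idea of picking "the correct level of generality" can be illustrated with a deliberately naive Python sketch. It is not the paper's method: it simply returns the most specific concept in a toy hierarchy that is an ancestor of every argument instance. The hierarchy, the instance-to-concept mapping, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy: child concept -> parent concept.
PARENT: Dict[str, Optional[str]] = {
    "Jazz Composers": "Composers",
    "Classical Composers": "Composers",
    "Composers": "People",
    "People": None,
}

# Hypothetical mapping from instances to their most specific concept.
INSTANCE_CONCEPT: Dict[str, str] = {
    "george gershwin": "Jazz Composers",
    "duke ellington": "Jazz Composers",
    "ludwig van beethoven": "Classical Composers",
}


def ancestors(concept: str) -> List[str]:
    """Return the concept followed by all its ancestors, most specific first."""
    chain: List[str] = []
    current: Optional[str] = concept
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def lowest_common_concept(instances: List[str]) -> Optional[str]:
    """Most specific concept covering every instance, or None if there is none."""
    chains = [ancestors(INSTANCE_CONCEPT[i]) for i in instances]
    if not chains:
        return None
    for candidate in chains[0]:  # walk upward from the first instance
        if all(candidate in chain for chain in chains[1:]):
            return candidate
    return None


print(lowest_common_concept(
    ["george gershwin", "duke ellington", "ludwig van beethoven"]))
```

A strict common ancestor like this drifts toward overly general concepts as soon as one noisy instance appears, which is one reason a real system would need something more robust than this baseline.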
Providing Multilingual, Multimodal Answers to Lexical Database Queries
Language users are increasingly turning to electronic resources to address their lexical information needs, due to their convenience and their ability to simultaneously capture different facets of lexical knowledge in a single interface. In this paper, we discuss techniques to respond to a user’s lexical queries by providing multilingual and multimodal information, and facilitating navigating along different types of links. To this end, structured information from sources like WordNet, Wikipedia, Wiktionary, as well as Web services is linked and integrated to provide a multi-faceted yet consistent response to user queries. The meanings of words in many different languages are characterized by mapping them to appropriate WordNet sense identifiers and adding multilingual gloss descriptions as well as example sentences. Relationships are derived from WordNet and Wiktionary to allow users to discover semantically related words, etymologically related words, alternative spellings, as well as misspellings. Last but not least, images, audio recordings, and geographical maps extracted from Wikipedia and Wiktionary allow for a multimodal experience.
Evaluating a Semantic Network Automatically Constructed from Lexical Co-occurrence on a Word Sense Disambiguation Task
We describe the extension and objective evaluation of a network of semantically related noun senses (or concepts) that has been automatically acquired by analyzing lexical co-occurrence in Wikipedia. The acquisition process makes no use of the metadata or links that have been manually built into the encyclopedia, and nouns in the network are automatically disambiguated to their corresponding noun senses without supervision. For this task, we use the noun sense inventory of WordNet 3.0. Thus, this work can be conceived of as augmenting the WordNet noun ontology with unweighted, undirected related-to edges between synsets. Our network contains 208,832 such edges. We evaluate our network’s performance on a word sense disambiguation (WSD) task and show: a) the network is competitive with WordNet when used as a stand-alone knowledge source for two WSD algorithms; b) combining our network with WordNet achieves disambiguation results that exceed the performance of either resource individually; and c) our network outperforms a similar resource that has been automatically derived from semantic annotations in the Wikipedia corpus.
Towards a Universal Taxonomy of Many Concepts
Knowledge is indispensable to understanding. The ongoing information explosion highlights the need to enable machines to better understand electronic text in natural human language. The challenge thus lies in how to transfer human knowledge to machines. Much work has been devoted to creating universal ontologies for this purpose. However, none of the existing ontologies has the necessary depth and breadth to offer “universal understanding.” In this paper, we present a universal, probabilistic ontology that is more comprehensive than any of the existing ontologies. Currently, it contains 2.7 million concepts harnessed automatically from a corpus of 1.68 billion web pages and two years’ worth of search log data. Unlike traditional knowledge bases that treat knowledge as black and white, it enables probabilistic interpretations of the information it contains. The probabilistic nature then enables it to incorporate heterogeneous information in a natural way. We present details of how the core ontology is constructed, and how it models knowledge’s inherent uncertainty, ambiguity, and inconsistency. We also discuss potential applications, e.g., understanding user intent, that can benefit from the taxonomy.
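One simple way to read "probabilistic interpretations" is sketched below in Python, assuming the ontology is fed by observed (instance, concept) pairs and that a relative-frequency estimate of P(concept | instance) is adequate. This estimator, the function name `concept_probabilities`, and the sample observations are illustrative assumptions, not the scoring actually used in the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def concept_probabilities(
    pairs: List[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Estimate P(concept | instance) as a relative frequency of observed pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for instance, concept in pairs:
        counts[instance][concept] += 1
    return {
        instance: {c: n / sum(ctr.values()) for c, n in ctr.items()}
        for instance, ctr in counts.items()
    }


# Made-up observations for illustration only.
observations = [
    ("apple", "fruit"), ("apple", "fruit"), ("apple", "fruit"),
    ("apple", "company"),
]
probs = concept_probabilities(observations)
print(probs["apple"])
```

Under this reading an ambiguous instance keeps all of its candidate concepts with graded weights instead of being forced into a single class.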
Automatically Structuring Domain Knowledge from Text: an Overview of Current Research
This paper presents an overview of automatic methods for building domain knowledge structures (domain models) from text collections. Applications of domain models have a long history within knowledge engineering and artificial intelligence. In the last couple of decades they have surfaced noticeably as a useful tool within natural language processing, information retrieval and semantic web technology. Inspired by the ubiquitous propagation of domain model structures that are emerging in several research disciplines, we give an overview of the current research landscape and some techniques and approaches. We will also discuss trade-offs between different approaches and point to some recent trends.
Taxonomic Relation Extraction from Wikipedia: Datasets and Algorithms
The dynamic and continuously growing category structure of Wikipedia has been used in numerous ontology extraction methods. We present a dataset of category subgraphs automatically extracted from Wikipedia that are manually annotated for is-a and instance-of relations in order to enable a more comprehensive evaluation of taxonomy mining approaches. We also show how the new dataset can be used with a rich set of features to train accurate taxonomy extraction models.
1 Introduction and Motivation
The translation of large amounts of text and unstructured
The Role of Queries in Ranking Labeled Instances Extracted from Text
- Marius Pașca
A weakly supervised method uses anonymized search queries to induce a ranking among class labels extracted from unstructured text for various instances. The accuracy of the extracted class labels exceeds that of previous methods, over evaluation sets of instances associated with Web search queries.
Unsupervised techniques for discovering ontology elements from Wikipedia article links
We present an unsupervised and unrestricted approach to discovering an infobox-like ontology by exploiting the inter-article links within Wikipedia. It discovers new slots and fillers that may not be available in the Wikipedia infoboxes. Our results demonstrate that there are certain types of properties that are evident in the link structure of resources like Wikipedia that can be predicted with high accuracy using little or no linguistic analysis. The discovered properties can be further used to discover a class hierarchy. Our experiments have focused on analyzing people in Wikipedia, but the techniques can be directly applied to other types of entities in text resources that are rich with hyperlinks.

