Topic-driven multi-document summarization with encyclopedic knowledge and activation spreading
In Proc. of EMNLP-08, 2008
Cited by 14 (1 self)
Information of interest to users is often distributed over a set of documents. Users can specify their request for information as a query/topic – a set of one or more sentences or questions. Producing a good summary of the relevant information relies on understanding the query and linking it with the associated set of documents. To “understand” the query we expand it using encyclopedic knowledge in Wikipedia. The expanded query is linked with its associated documents through spreading activation in a graph that represents words and their grammatical connections in these documents. The topic expanded words and activated nodes in the graph are used to produce an extractive summary. The method proposed is tested on the DUC summarization data. The system implemented ranks high compared to the participating systems in the DUC competitions, confirming our hypothesis that encyclopedic knowledge is a useful addition to a summarization system.
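The spreading-activation step in this abstract can be illustrated with a minimal sketch. It assumes a plain adjacency-list word graph, a fixed decay factor, and whitespace tokenization; the function names, the decay and threshold values, and the sentence-scoring rule are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def spread_activation(graph, seeds, decay=0.5, threshold=0.01, max_steps=10):
    """Spread activation from seed words through a word graph.

    graph: dict mapping a node to a list of neighbor nodes (assumed format).
    seeds: dict mapping expanded query words to their initial activation.
    """
    activation = defaultdict(float)
    activation.update(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbors = graph.get(node, [])
            if not neighbors:
                continue
            # Each neighbor receives an equal, decayed share of the energy.
            share = energy * decay / len(neighbors)
            if share < threshold:
                continue
            for neighbor in neighbors:
                next_frontier[neighbor] += share
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)

def rank_sentences(sentences, activation):
    """Order sentences by the mean activation of their words (extractive step)."""
    scored = []
    for sentence in sentences:
        words = sentence.lower().split()
        score = sum(activation.get(w, 0.0) for w in words) / max(len(words), 1)
        scored.append((score, sentence))
    return [s for _, s in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

Sentences whose words collect the most activation would be the candidates for the extractive summary; a real system would also need the grammatical parse that defines the graph edges.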
DBpedia Mobile: A Location-Enabled Linked Data Browser
Cited by 12 (0 self)
In this demonstration, we present DBpedia Mobile, a location-centric DBpedia client application for mobile devices consisting of a map view and a Fresnel-based Linked Data browser. The DBpedia project extracts structured information from Wikipedia and publishes this information as Linked Data on the Web. The DBpedia dataset contains information about 2.18 million things, including almost 300,000 geographic locations. DBpedia is interlinked with various other location-related datasets. Based on the current GPS position of a mobile device, DBpedia Mobile renders a map indicating nearby locations from the DBpedia dataset. Starting from this map, users can explore background information about locations and can navigate into interlinked datasets. DBpedia Mobile demonstrates that the DBpedia dataset can serve as a useful starting point to explore the Geospatial Semantic Web using a mobile device.
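As a rough illustration of selecting nearby locations from a GPS position, the sketch below filters a small in-memory list by great-circle (haversine) distance. The record layout, the function names, and the sample coordinates are assumptions made for the example; the abstract does not say how DBpedia Mobile performs this lookup.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby(places, lat, lon, radius_km):
    """Return names of places within radius_km of (lat, lon), nearest first."""
    hits = [(haversine_km(lat, lon, p["lat"], p["lon"]), p["name"]) for p in places]
    return [name for dist, name in sorted(hits) if dist <= radius_km]

# Illustrative sample records with approximate coordinates (not DBpedia data).
PLACES = [
    {"name": "Brandenburg Gate", "lat": 52.5163, "lon": 13.3777},
    {"name": "Reichstag", "lat": 52.5186, "lon": 13.3762},
    {"name": "Marienplatz", "lat": 48.1374, "lon": 11.5755},
]
```

Calling `nearby(PLACES, 52.5170, 13.3780, 2.0)` keeps only the two Berlin entries; a deployed client would more plausibly push the distance filter into a query against the dataset than scan records locally.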
Using the Web to Reduce Data Sparseness in Pattern-based Information Extraction
In Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), 2007
Cited by 11 (2 self)
Textual patterns have been used effectively to extract information from large text collections. However, they rely heavily on textual redundancy in the sense that facts have to be mentioned in a similar manner in order to be generalized to a textual pattern. Data sparseness thus becomes a problem when trying to extract information from hardly redundant sources like corporate intranets, encyclopedic works or scientific databases. We present results on applying a weakly supervised pattern induction algorithm to Wikipedia to extract instances of arbitrary relations. In particular, we apply different configurations of a basic algorithm for pattern induction on seven different datasets. We show that the lack of redundancy leads to the need for a large amount of training data but that integrating Web extraction into the process leads to a significant reduction of required training data while maintaining the accuracy of Wikipedia. In particular we show that, though the use of the Web can have similar effects as produced by increasing the number of seeds, it leads overall to better results. Our approach thus allows us to combine the advantages of two sources: the high reliability of a closed corpus and the high redundancy of the Web.
Towards a universal wordnet by learning from combined evidence
In Proc. CIKM 2009, 2009
Cited by 10 (6 self)
Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.
Decoding Wikipedia categories for knowledge acquisition
In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI-08), 2008
Cited by 9 (0 self)
This paper presents an approach to acquire knowledge from Wikipedia categories and the category network. Many Wikipedia categories have complex names which reflect human classification and organizing instances, and thus encode knowledge about class attributes, taxonomic and other semantic relations. We decode the names and refer back to the network to induce relations between concepts in Wikipedia represented through pages or categories. The category structure allows us to propagate a relation detected between constituents of a category name to numerous concept links. The results of the process are evaluated against ResearchCyc and a subset also by human judges. The results support the idea that Wikipedia category names are a rich source of useful and accurate knowledge.
Owlgres: A Scalable OWL Reasoner
Cited by 7 (0 self)
We present Owlgres, a DL-Lite reasoner implementation written for PostgreSQL, a mature open source database. Owlgres is an OWL reasoner that provides consistency checking and conjunctive query services, supports DL-LiteR as well as the OWL sameAs construct, and is not limited to PostgreSQL. We discuss the implementation with special focus on sameAs and the supported subset of the SPARQL language. Emphasis is given to the implemented optimization techniques which resulted in significant performance improvement. Based on a confidential NASA dataset and part of the DBpedia dataset, we show a typical use case for Owlgres, i.e. given a terminology and a dataset, Owlgres provides querying on a persistent knowledge base with reasoning at query time in the expressivity of DL-LiteR.
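One common way to make queries respect sameAs links is to map every identifier to a canonical representative of its equivalence class before matching. The sketch below does this with a union-find structure over in-memory triples. It is a generic illustration under that assumption, not a description of how Owlgres handles sameAs; the class name, function name, and identifiers are hypothetical.

```python
class SameAsIndex:
    """Union-find over identifiers linked by sameAs (hypothetical helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the canonical representative of x's equivalence class."""
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def add_same_as(self, a, b):
        """Record that a and b denote the same individual."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

def subjects_with(triples, index, predicate, obj):
    """Subjects s such that (s, predicate, o) holds for some o sameAs obj."""
    target = index.find(obj)
    return sorted({
        index.find(s)
        for s, p, o in triples
        if p == predicate and index.find(o) == target
    })
```

With `ex:NYC` declared sameAs `dbpedia:New_York_City`, a query for residents of either identifier returns the subjects attached to both.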
From Information to Knowledge: Harvesting Entities and Relationships from Web Sources
Cited by 7 (4 self)
There are major trends to advance the functionality of search engines to a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and the progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. Recent endeavors of this kind include DBpedia, EntityCube, KnowItAll, ReadTheWeb, and our own YAGO-NAGA project (and others). The goal is to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations as well as temporal contexts, with high precision and high recall. This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting.
Experiments in Graph-based Semi-Supervised Learning Methods for Class-Instance Acquisition
Cited by 7 (0 self)
Graph-based semi-supervised learning (SSL) algorithms have been successfully used to extract class-instance pairs from large unstructured and structured text collections. However, a careful comparison of different graph-based SSL algorithms on that task has been lacking. We compare three graph-based SSL algorithms for class-instance acquisition on a variety of graphs constructed from different domains. We find that the recently proposed MAD algorithm is the most effective. We also show that class-instance extraction can be significantly improved by adding semantic information in the form of instance-attribute edges derived from an independently developed knowledge base. All of our code and data will be made publicly available to encourage reproducible research in this area.
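To make the idea of graph-based class-instance acquisition concrete, the sketch below runs plain iterative label propagation with clamped seeds over a weighted graph that mixes instance nodes and attribute nodes. This is a deliberately simple stand-in and not the MAD algorithm named in the abstract; the function name, edge format, and toy graph are assumptions for illustration.

```python
def propagate_labels(edges, seeds, iterations=20):
    """Plain label propagation with clamped seeds (not MAD).

    edges: list of (node_u, node_v, weight) for an undirected graph.
    seeds: dict mapping a few nodes to their known class label.
    Returns a dict mapping each reachable node to its highest-scoring label.
    """
    neighbors = {}
    for u, v, w in edges:
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))
    labels = sorted(set(seeds.values()))
    scores = {n: {l: 0.0 for l in labels} for n in neighbors}
    for node, label in seeds.items():
        scores.setdefault(node, {l: 0.0 for l in labels})[label] = 1.0
    for _ in range(iterations):
        updated = {}
        for node in scores:
            if node in seeds:
                updated[node] = scores[node]  # seed labels stay fixed
                continue
            total = sum(w for _, w in neighbors[node])
            # New score: weighted average of the neighbors' current scores.
            updated[node] = {
                l: sum(w * scores[m][l] for m, w in neighbors[node]) / total if total else 0.0
                for l in labels
            }
        scores = updated
    return {n: max(s, key=s.get) for n, s in scores.items() if any(s.values())}

# Toy graph: instances linked to attribute nodes (illustrative only).
EDGES = [
    ("paris", "attr:capital_of", 1.0),
    ("berlin", "attr:capital_of", 1.0),
    ("aspirin", "attr:treats", 1.0),
    ("ibuprofen", "attr:treats", 1.0),
]
SEEDS = {"paris": "city", "aspirin": "drug"}
```

Here the shared attribute nodes carry the seed labels to the unlabeled instances, which is the role the abstract assigns to instance-attribute edges.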
A Semi-Supervised Method to Learn and Construct Taxonomies using the Web
Cited by 7 (0 self)
Although many algorithms have been developed to harvest lexical resources, few organize the mined terms into taxonomies. We propose (1) a semi-supervised algorithm that uses a root concept, a basic level concept, and recursive surface patterns to learn automatically from the Web hyponym-hypernym pairs subordinated to the root; (2) a Web-based concept positioning procedure to validate the learned pairs' is-a relations; and (3) a graph algorithm that derives from scratch the integrated taxonomy structure of all the terms. Comparing results with WordNet, we find that the algorithm misses some concepts and links, but also that it discovers many additional ones lacking in WordNet. We evaluate the taxonomization power of our method on reconstructing parts of the WordNet taxonomy. Experiments show that starting from scratch, the algorithm can reconstruct 62% of the WordNet taxonomy for the regions tested.
Information Extraction from the Web: Techniques and Applications
2007
Cited by 6 (1 self)
Web Information Extraction (WIE) systems have recently been able to extract massive quantities of relational data from online text. This has opened the possibility of achieving an elusive goal in Artificial Intelligence (AI): broad-coverage domain knowledge. AI systems depend to a great extent on having knowledge about the domains in which they operate, and such knowledge is typically expensive to enter into the system. Furthermore, the knowledge must be entered for every different domain in which an application is to operate. The Web contains knowledge about all kinds of different domains, but in a format that is not readily usable by AI systems. WIE promises to bridge the gap between the Web and AI.
Natural Language Processing is an example of an area in AI in which knowledge can make a dramatic difference in the performance of an application. Understanding or interpreting language depends on the ability to understand the words used in a domain. The meanings, usages, and syntactic properties of words, and the relative frequency with which certain words are used, are necessary pieces of information for effective language processing, and much of this information can be extracted from text. In one case study, this thesis examines methods for using extracted information in improving a particular kind of language processing tool, a parser.
Before information extraction can become broadly useful, however, more research must be done to improve the quality of the extracted information. A number of factors affect the quality, including correctness, importance or relevance, and the sophistication of meaning representation. The second case study in this thesis investigates a method for resolving synonyms in extracted information. This technique changes the meaning representation of extractions from one that relates words or names to one that relates entities to one another.

