Language-model-based ranking for queries on RDF-graphs, 2009
Cited by 11 (6 self)
The success of knowledge-sharing communities like Wikipedia and the advances in automatic information extraction from textual and Web sources have made it possible to build large “knowledge repositories” such as DBpedia, Freebase, and YAGO. These collections can be viewed as graphs of entities and relationships (ER graphs) and can be represented as a set of subject-property-object (SPO) triples in the Semantic-Web data model RDF. Queries can be expressed in the W3C-endorsed SPARQL language or by similarly designed graph-pattern search. However, exact-match query semantics often fall short of satisfying the users' needs by returning too many or too few results. Therefore, IR-style ranking models are crucially needed. In this paper, we propose a language-model-based approach to ranking the results of exact, relaxed and keyword-augmented graph-pattern queries over RDF graphs such as ER graphs. Our method estimates a query model and a set of result-graph models and ranks results based on their Kullback-Leibler divergence with respect to the query model. We demonstrate the effectiveness of our ranking model by a comprehensive user study.
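The ranking idea in this abstract can be illustrated with a minimal sketch. It assumes that the query model and each result-graph model are plain dictionaries mapping events (for example, triples) to probabilities, and that results are ordered by ascending KL(query || result). The function names, the `epsilon` floor, and the direction of the divergence are assumptions made for illustration; the abstract does not specify how the models are estimated or smoothed.

```python
import math


def kl_divergence(query_model, result_model, epsilon=1e-9):
    """KL(query || result), summed over events with non-zero query probability.

    epsilon is a hypothetical floor that stands in for proper smoothing
    when the result model assigns zero probability to an event.
    """
    total = 0.0
    for event, q in query_model.items():
        if q > 0:
            r = max(result_model.get(event, 0.0), epsilon)
            total += q * math.log(q / r)
    return total


def rank_results(query_model, result_models):
    """Return result names ordered from smallest to largest divergence."""
    return sorted(
        result_models,
        key=lambda name: kl_divergence(query_model, result_models[name]),
    )
```

A result whose model matches the query model exactly has divergence zero and is therefore ranked first.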
Data Summaries for On-Demand Queries over Linked Data. In: Proceedings of the 19th International World Wide Web Conference, 2010
Cited by 9 (2 self)
Typical approaches for querying structured Web Data collect (crawl) and pre-process (index) large amounts of data in a central data repository before allowing for query answering. However, this time-consuming pre-processing phase leverages the benefits of Linked Data – where structured data is accessible live and up-to-date at distributed Web resources that may change constantly – only to a limited degree, as query results can never be current. An ideal query answering system for Linked Data should return current answers in a reasonable amount of time, even on corpora as large as the Web. Query processors evaluating queries directly on the live sources require knowledge of the contents of data sources. In this paper, we develop and evaluate an approximate index structure summarising graph-structured content of sources adhering to Linked Data principles, provide an algorithm for answering conjunctive queries over Linked Data on the Web exploiting the source summary, and evaluate the system using synthetically generated queries. The experimental results show that our lightweight index structure enables complete and up-to-date query results over Linked Data, while keeping the overhead for querying low and providing a satisfying source ranking at no additional cost.
From Information to Knowledge: Harvesting Entities and Relationships from Web Sources
Cited by 7 (4 self)
There are major trends to advance the functionality of search engines to a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and the progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. Recent endeavors of this kind include DBpedia, EntityCube, KnowItAll, ReadTheWeb, and our own YAGO-NAGA project (and others). The goal is to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations as well as temporal contexts, with high precision and high recall. This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting.
x-RDF-3X: Fast Querying, High Update Rates, and Consistency for RDF Databases
Cited by 6 (1 self)
The RDF data model is gaining importance for applications in computational biology, knowledge sharing, and social communities. Recent work on RDF engines has focused on scalable performance for querying, and has largely disregarded updates. In addition to incremental bulk loading, applications also require online updates with flexible control over multi-user isolation levels and data consistency. The challenge lies in meeting these requirements while retaining the capability for fast querying. This paper presents a comprehensive solution that is based on an extended deferred-indexing method with integrated versioning. The version store enables time-travel queries that are efficiently processed without adversely affecting queries on the current data. For flexible consistency, transactional concurrency control is provided with options for either snapshot isolation or full serializability. All methods are integrated in an extension of the RDF-3X system, and their very good performance for both queries and updates is demonstrated by measurements of multi-user workloads with real-life data as well as stress-test synthetic loads.
RDFViewS: A Storage Tuning Wizard for RDF Applications
Cited by 5 (5 self)
In recent years, the significant growth of RDF data used in numerous applications has made its efficient and scalable manipulation an important issue. In this paper, we present RDFViewS, a system capable of choosing the most suitable views to materialize, in order to minimize the query response time for a specific SPARQL query workload, while taking into account the view maintenance cost and storage space constraints. Our system employs practical algorithms and heuristics to navigate through the search space of potential view configurations, and exploits the possibly available semantic information, expressed via an RDF Schema, to ensure the completeness of the query evaluation.
Ad-hoc Object Retrieval in the Web of Data
Cited by 4 (0 self)
Semantic Search refers to a loose set of concepts, challenges and techniques having to do with harnessing the information of the growing Web of Data (WoD) for Web search. Here we propose a formal model of one specific semantic search task: ad-hoc object retrieval. We show that this task provides a solid framework to study some of the semantic search problems currently tackled by commercial Web search engines. We connect this task to the traditional ad-hoc document retrieval and discuss appropriate evaluation metrics. Finally, we carry out a realistic evaluation of this task in the context of a Web search application.
Scalable Indexing of RDF Graphs for Efficient Join Processing
Cited by 3 (0 self)
Current approaches to RDF graph indexing suffer from weak data locality, i.e., information regarding a piece of data appears in multiple locations, spanning multiple data structures. Weak data locality negatively impacts storage and query processing costs. Towards stronger data locality, we propose a Three-way Triple Tree (TripleT) secondary memory indexing technique to facilitate flexible and efficient join evaluation on RDF data. The novelty of TripleT is that the index is built over the atoms occurring in the data set, rather than at a coarser granularity, such as whole triples occurring in the data set, and that the atoms are indexed regardless of the roles (i.e., subjects, predicates, or objects) they play in the triples of the data set. We show through extensive empirical evaluation that TripleT exhibits multiple orders of magnitude improvement over the state-of-the-art, in terms of both storage and query processing costs.
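The idea of indexing atoms regardless of their role can be sketched as follows. This is only an in-memory illustration built on a Python dictionary, not the secondary-memory tree described in the abstract; the function name `build_atom_index` and the bucket keys `"s"`, `"p"`, `"o"` are hypothetical.

```python
from collections import defaultdict


def build_atom_index(triples):
    """Map each atom to the triples it occurs in, grouped by role.

    Every atom gets a single entry with three buckets, so all
    information about an atom is found in one place (the data-locality
    goal stated in the abstract).
    """
    index = defaultdict(lambda: {"s": [], "p": [], "o": []})
    for triple in triples:
        subject, predicate, obj = triple
        index[subject]["s"].append(triple)
        index[predicate]["p"].append(triple)
        index[obj]["o"].append(triple)
    return index


# Hypothetical sample data for illustration only.
sample = [("alice", "knows", "bob"), ("bob", "knows", "carol")]
atom_index = build_atom_index(sample)
```

With this layout, a join on an atom such as `bob` needs a single lookup, `atom_index["bob"]`, to reach the triples in which it appears as subject and those in which it appears as object.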
BitMat: A Main-memory Bit Matrix of RDF Triples for Conjunctive Triple Pattern Queries
Cited by 3 (3 self)
This poster proposes BitMat, a bit matrix structure for representing a large number of RDF triples in memory and processing conjunctive triple pattern (multi-join) queries using it. The compact in-memory storage and use of bitwise operations can lead to faster processing of join queries when compared to conventional RDF triple stores. Unlike conventional RDF triple stores, where the size of the intermediate join results can grow very large, our BitMat-based multi-join algorithm ensures that the intermediate result set remains small across any number of join operations (provided there are no Cartesian joins). We present the key concepts of the BitMat structure and its use in processing join queries, describe the preliminary experimental results with the UniProt and LUBM datasets, and discuss possible use case scenarios.
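How bitwise operations can keep intermediate join results compact is shown in the simplified sketch below. It is not the BitMat layout or algorithm itself: it assumes one bit vector of subjects per (predicate, object) pair, stored as a Python integer, and answers a join on a shared subject variable with bitwise AND. The class `BitIndex` and its methods are hypothetical names.

```python
class BitIndex:
    """Toy bit-vector index: one subject bitmask per (predicate, object)."""

    def __init__(self):
        self.subject_ids = {}   # subject -> bit position
        self.subjects = []      # bit position -> subject
        self.masks = {}         # (predicate, object) -> bitmask of subjects

    def add(self, subject, predicate, obj):
        if subject not in self.subject_ids:
            self.subject_ids[subject] = len(self.subjects)
            self.subjects.append(subject)
        key = (predicate, obj)
        self.masks[key] = self.masks.get(key, 0) | (1 << self.subject_ids[subject])

    def join_on_subject(self, patterns):
        """Subjects matching every (predicate, object) pattern."""
        if not patterns:
            return []
        mask = (1 << len(self.subjects)) - 1  # start with all subjects set
        for key in patterns:
            # The intermediate result is always a single bitmask.
            mask &= self.masks.get(key, 0)
        return [s for i, s in enumerate(self.subjects) if (mask >> i) & 1]
```

However many patterns are joined, the intermediate state in this sketch is one integer no wider than the number of subjects, which mirrors the abstract's point about small intermediate results.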
A Database Perspective on Consuming Linked Data on the Web
Cited by 3 (0 self)
During recent years an increasing number of data providers adopted the Linked Data principles for publishing and connecting structured data on the Web, thus creating a globally distributed dataspace – the Web of Data. While the execution of structured, SQL-like queries over this dataspace opens possibilities not conceivable before, query execution on the Web of Data poses novel challenges. These challenges provide great opportunities for the database community. In this article we introduce the concept of Linked Data and discuss different approaches to query the Web of Data. Our goal is to provide a general understanding of this new research area and of the challenges and open issues that must be addressed.
A Node Indexing Scheme for Web Entity Retrieval
Cited by 2 (0 self)
Now motivated also by the partial support of major search engines, hundreds of millions of documents are being published on the web embedding semi-structured data in RDF, RDFa and Microformats. This scenario calls for novel information search systems which provide effective means of retrieving relevant semi-structured information. In this paper, we present an “entity retrieval system” designed to provide entity search capabilities over datasets as large as the entire Web of Data. Our system supports full-text search, semi-structural queries and top-k query results while exhibiting a concise index and efficient incremental updates. We advocate the use of a node indexing scheme and show that it offers a good compromise between query expressiveness, query processing time and update complexity in comparison to three other indexing techniques. We then demonstrate how such a system can effectively answer queries over 10 billion triples on a single commodity machine.

