Keyword Search on Structured and Semi-Structured Data
Cited by 12 (4 self)
Empowering users to access databases using simple keywords can relieve the users from the steep learning curve of mastering a structured query language and understanding complex and possibly fast evolving data schemas. In this tutorial, we give an overview of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation. Various data models will be discussed, including relational data, XML data, graph-structured data, data streams, and workflows. We also discuss applications that are built upon
Language-model-based ranking for queries on RDF-graphs
2009
Cited by 11 (6 self)
The success of knowledge-sharing communities like Wikipedia and the advances in automatic information extraction from textual and Web sources have made it possible to build large “knowledge repositories” such as DBpedia, Freebase, and YAGO. These collections can be viewed as graphs of entities and relationships (ER graphs) and can be represented as a set of subject-property-object (SPO) triples in the Semantic-Web data model RDF. Queries can be expressed in the W3C-endorsed SPARQL language or by similarly designed graph-pattern search. However, exact-match query semantics often fall short of satisfying the users’ needs by returning too many or too few results. Therefore, IR-style ranking models are crucially needed. In this paper, we propose a language-model-based approach to ranking the results of exact, relaxed and keyword-augmented graph-pattern queries over RDF graphs such as ER graphs. Our method estimates a query model and a set of result-graph models and ranks results based on their Kullback-Leibler divergence with respect to the query model. We demonstrate the effectiveness of our ranking model by a comprehensive user study.
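The ranking idea in this abstract (score each result by the Kullback-Leibler divergence between a query model and a result model) can be illustrated with a minimal sketch. This is not the paper's method: the paper builds its models over RDF triples and graph patterns, whereas the sketch below assumes plain bags of tokens with additive smoothing. The names `language_model`, `kl_divergence`, `rank_by_kl` and the smoothing parameter `mu` are hypothetical.

```python
import math
from collections import Counter


def language_model(tokens, vocabulary, mu=1.0):
    """Unigram distribution over a fixed vocabulary with additive smoothing.

    Assumption: `mu` is a hypothetical smoothing constant, not taken from the paper.
    """
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocabulary)
    return {term: (counts[term] + mu) / total for term in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)


def rank_by_kl(query_tokens, results):
    """Order result names by increasing divergence from the query model."""
    vocabulary = set(query_tokens)
    for tokens in results.values():
        vocabulary.update(tokens)
    query_model = language_model(query_tokens, vocabulary)
    scored = [
        (kl_divergence(query_model, language_model(tokens, vocabulary)), name)
        for name, tokens in results.items()
    ]
    return [name for _, name in sorted(scored)]
```

A smaller divergence means the result's term distribution is closer to the query's, so results are returned in ascending order of divergence. Smoothing keeps every probability positive, which keeps the logarithm defined.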
Efficient Type-Ahead Search on Relational Data: a TASTIER Approach
2009
Cited by 10 (6 self)
Existing keyword-search systems in relational databases require users to submit a complete query to compute answers. Often users feel “left in the dark” when they have limited knowledge about the data, and have to use a try-and-see method to modify queries and find answers. In this paper we propose a novel approach to keyword search in the relational world, called Tastier. A Tastier system can bring instant gratification to users by supporting type-ahead search, which finds answers “on the fly” as the user types in query keywords. A main challenge is how to achieve a high interactive speed for large amounts of data in multiple tables, so that a query can be answered efficiently within milliseconds. We propose efficient index structures and algorithms for finding relevant answers on-the-fly by joining tuples in the database. We devise a partition-based method to improve query performance by grouping relevant tuples and pruning irrelevant tuples efficiently. We also develop a technique to answer a query efficiently by predicting highly relevant complete queries for the user. We have conducted a thorough experimental evaluation of the proposed techniques on real data sets to demonstrate the efficiency and practicality of this new search paradigm.
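To make the type-ahead idea concrete, here is a minimal sketch that treats every keyword typed so far as a prefix and intersects the matching record sets. It assumes a single in-memory collection of text records; the join across multiple tables, the partition-based pruning, and the query prediction described in the abstract are not modeled. The class `PrefixIndex` and its methods are hypothetical names.

```python
from bisect import bisect_left


class PrefixIndex:
    """Sorted keyword list plus inverted lists, queried by keyword prefixes."""

    def __init__(self, records):
        # records: mapping of record id -> text (an assumption of this sketch)
        postings = {}
        for record_id, text in records.items():
            for word in text.lower().split():
                postings.setdefault(word, set()).add(record_id)
        self.postings = postings
        self.words = sorted(postings)

    def ids_for_prefix(self, prefix):
        """Union of the inverted lists of all keywords starting with `prefix`."""
        prefix = prefix.lower()
        start = bisect_left(self.words, prefix)
        ids = set()
        for word in self.words[start:]:
            if not word.startswith(prefix):
                break  # sorted order: no later word can match
            ids |= self.postings[word]
        return ids

    def search(self, partial_query):
        """Records that match every (possibly incomplete) keyword typed so far."""
        prefixes = partial_query.split()
        if not prefixes:
            return set()
        result = self.ids_for_prefix(prefixes[0])
        for prefix in prefixes[1:]:
            result &= self.ids_for_prefix(prefix)
        return result
```

Calling `search` after each keystroke gives the "on the fly" behavior. Because the keyword list is sorted, all completions of a prefix form one contiguous range that a binary search locates.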
Structured Search Result Differentiation
Cited by 6 (1 self)
Studies show that about 50% of web search is for information exploration purposes, where a user would like to investigate, compare, evaluate, and synthesize multiple relevant results. Due to the absence of general tools that can effectively analyze and differentiate multiple results, a user has to manually read and comprehend potentially large results in an exploratory search. Such a process is time-consuming, labor-intensive and error-prone. With meta information embedded, keyword search on structured data provides the potential for automating or semi-automating the comparison of multiple results. In this paper we present an approach for differentiating search results on structured data. We define the differentiability of query results and quantify the degree of difference. Then we define the problem of identifying a limited number of valid features in a result that can maximally differentiate this result from the others, which is proved to be NP-hard. We propose two local optimality conditions, namely single-swap and multi-swap. Efficient algorithms are designed to achieve local optimality. To show the applicability of our approach, we implemented a system, XRed, for XML result differentiation. Our empirical evaluation verifies the effectiveness and efficiency of the proposed approach.
TopCells: Keyword-based search of top-k aggregated documents in text cube
In International Conference on Data Engineering (ICDE), 2010
Cited by 4 (2 self)
RDBMSs provide users with a ranked list of relevant linked structures (e.g. joined tuples) or individual tuples. In this paper, we aim to support keyword search in a data cube with text-rich dimension(s) (so-called text cube). Each document is associated with structural dimensions. A cell in the text cube aggregates a set of documents with matching dimension values on a subset of dimensions. Given a keyword query, our goal is to find the top-k most relevant cells in the text cube. We propose a relevance scoring model and efficient ranking algorithms. Experiments are conducted to verify their efficiency.
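The notion of a cell that aggregates documents sharing dimension values can be sketched with a brute-force enumeration. This is only an illustration under stated assumptions: the relevance score (average count of query keywords per document) is a placeholder rather than the paper's scoring model, no efficient ranking algorithm is attempted, and the function name `top_k_cells` and the document layout are hypothetical.

```python
from collections import defaultdict
from heapq import nlargest
from itertools import combinations


def top_k_cells(documents, dimensions, query, k):
    """Return the k cells whose documents best match the query keywords.

    A cell is a tuple of (dimension, value) pairs over a non-empty subset of
    dimensions. Assumption: cell relevance is the mean per-document count of
    query keywords, a placeholder scoring function.
    """
    keywords = query.lower().split()
    cell_scores = defaultdict(list)
    for doc in documents:
        words = doc["text"].lower().split()
        relevance = sum(words.count(word) for word in keywords)
        # Each document contributes to every cell it falls into.
        for size in range(1, len(dimensions) + 1):
            for subset in combinations(dimensions, size):
                cell = tuple((dim, doc[dim]) for dim in subset)
                cell_scores[cell].append(relevance)
    averaged = {cell: sum(s) / len(s) for cell, s in cell_scores.items()}
    return nlargest(k, averaged.items(), key=lambda item: item[1])
```

The enumeration visits every subset of dimensions for every document, so its cost grows exponentially with the number of dimensions. It is meant only to show what is being ranked.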
Retrieving and Materializing Tuple Units for Effective Keyword Search over Relational Databases
In ER, 2008
Cited by 3 (2 self)
Existing approaches to keyword search over relational databases always identify the relationships between tuples on the fly, which is rather inefficient because such relationships are very rich in the underlying databases. This paper proposes an alternative: retrieving and materializing tuple units to facilitate the online processing of keyword search. We first propose a novel concept of tuple units, which are composed of the relevant tuples connected by primary-foreign-key relationships. We then demonstrate how to generate and materialize the tuple units; they can be generated by issuing SQL statements, so the technique can be performed directly on the underlying RDBMS without modification to the database engine. Finally, we examine indexing and ranking techniques to improve search efficiency and search quality. We have implemented our method, and the experimental results show that our approach achieves much better search performance and significantly outperforms the alternative approaches in the literature.
Keyword Search in Relational Databases: A Survey
Cited by 3 (0 self)
The integration of DB and IR provides flexible ways for users to query information in the same platform [6, 2, 3, 7, 5, 28]. On one hand, the sophisticated DB facilities provided by RDBMSs assist users to query well-structured information using SQL. On the other hand, IR techniques allow users to search unstructured information using keywords based on scoring and ranking, and do not need users to understand any database schemas.
Harvesting large-scale grids for software resources
In CCGRID ’09, 2009
Cited by 1 (1 self)
Grid infrastructures are in operation around the world, federating an impressive collection of computational resources and a wide variety of application software. In this context, it is important to establish advanced software discovery services that could help end-users locate software components suitable to their needs. In this paper, we present the design, architecture and implementation of an open-source keyword-based paradigm for the search of software resources in Grid infrastructures, called Minersoft. A key goal of Minersoft is to automatically annotate all the software resources with keyword-rich metadata. Using advanced Information Retrieval techniques, we locate software resources with respect to users’ queries. Experiments were conducted in EGEE, one of the largest Grid production services currently in operation. Results showed that Minersoft successfully crawled 12.3 million valid files (620 GB size) and sustained, in most sites, high crawling rates.
Providing Built-in Keyword Search Capabilities in RDBMS
2009
Cited by 1 (1 self)
A common approach to performing keyword search over relational databases is to find the minimum Steiner trees in database graphs transformed from relational data. These methods, however, are rather expensive, as the minimum Steiner tree problem is known to be NP-hard. Further, these methods are independent of the underlying relational database management system (RDBMS) and thus cannot benefit from the capabilities of the RDBMS. As an alternative, in this paper we propose a new concept called Compact Steiner Tree (CSTree), which can be used to approximate the Steiner tree problem for answering top-k keyword queries efficiently. We propose a novel structure-aware index, together with an effective ranking mechanism for fast, progressive and accurate retrieval of the top-k highest-ranked CSTrees. The proposed techniques can be implemented using a standard RDBMS to benefit from its indexing and query processing capability. We have implemented our techniques in MySQL, which can provide built-in keyword-search capabilities using SQL. The experimental results show a significant improvement in both search efficiency and result quality compared to existing state-of-the-art approaches.
Toward Scalable Keyword Search over Relational Data
Cited by 1 (0 self)
Keyword search (KWS) over relational databases has recently received significant attention. Many solutions and many prototypes have been developed. This task requires addressing many issues, including robustness, accuracy, reliability, and privacy. An emerging issue, however, appears to be performance related: current KWS systems have unpredictable running times. In particular, for certain queries it takes too long to produce answers, and for others the system may even fail to return (e.g., after exhausting memory). In this paper we argue that as today’s users have been “spoiled” by the performance of Internet search engines, KWS systems should return whatever answers they can produce quickly and then provide users with options for exploring any portion of the answer space not covered by these answers. Our basic idea is to produce answers that can be generated quickly as in today’s KWS systems, then to show users query forms that characterize the unexplored portion of the answer space. Combining KWS systems with forms allows us to bypass the performance problems inherent to KWS without compromising query coverage. We provide a proof of concept for this proposed approach, and discuss the challenges encountered in building this hybrid system. Finally, we present experiments over real-world datasets to demonstrate the feasibility of the proposed solution.

