Keyword Search on Structured and Semi-Structured Data
Cited by 30 (6 self)
Empowering users to access databases using simple keywords can relieve them of the steep learning curve of mastering a structured query language and understanding complex, possibly fast-evolving data schemas. In this tutorial, we give an overview of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation. Various data models will be discussed, including relational data, XML data, graph-structured data, data streams, and workflows. We also discuss applications that are built upon
Enhanced results for web search
- In SIGIR’11. ACM, 2011
Cited by 9 (1 self)
“Ten blue links” have defined web search results for the last fifteen years: snippets of text combined with document titles and URLs. In this paper, we establish the notion of enhanced search results that extend web search results to include multimedia objects such as images and video, intent-specific key-value pairs, and elements that allow the user to interact with the contents of a web page directly from the search results page. We show that users express a preference for enhanced results both explicitly and in their observed search behavior. We also demonstrate the effectiveness of enhanced results in helping users assess the relevance of search results. Lastly, we show that enhanced results can be generated efficiently to cover a significant fraction of search result pages.
eXtract: A Snippet Generation System for XML Search
2008
Cited by 5 (2 self)
Snippets are used by almost every text search engine to complement ranking schemes in order to effectively handle user keyword search. Although XML is a standard representation format for web data, generating result snippets for XML search remains unexplored. In this work, we present eXtract, a system that efficiently generates self-contained result snippets within a given size bound; the snippets effectively summarize the query results and differentiate them from one another, so that users can quickly assess the relevance of the query results.
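To make the idea of a self-contained, size-bounded snippet concrete, here is a minimal Python sketch. It is not eXtract's algorithm: the `Node` type, the `build_snippet` function, the scoring rule (keyword matches first, then values not shared with other results) and measuring the size bound in nodes are all assumptions made for illustration. Each chosen node brings its path to the result root, which is one simple reading of "self-contained".

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    id: int
    parent: Optional[int]  # None marks the root of the query result
    text: str = ""         # text value of the node (empty for inner elements)

def build_snippet(nodes, keywords, other_values, size_bound):
    """Hypothetical greedy snippet builder: keyword matches first, then values
    not shared with other results; each chosen node brings its ancestors."""
    by_id = {n.id: n for n in nodes}

    def path_to_root(n):
        ids = []
        while n is not None:
            ids.append(n.id)
            n = by_id[n.parent] if n.parent is not None else None
        return ids

    def score(n):
        hits = len(set(n.text.lower().split()) & keywords)
        distinctive = 1 if n.text and n.text not in other_values else 0
        return (hits, distinctive)

    chosen = set()
    # sorted() is stable, so ties keep document order
    for n in sorted(nodes, key=score, reverse=True):
        if score(n) == (0, 0):
            break  # remaining nodes neither match nor differentiate
        needed = [i for i in path_to_root(n) if i not in chosen]
        if len(chosen) + len(needed) <= size_bound:
            chosen.update(needed)
    return sorted(chosen)

# Toy query result (hypothetical data): a movie element with four children.
result = [Node(0, None), Node(1, 0, "Heat"), Node(2, 0, "1995"),
          Node(3, 0, "Michael Mann"), Node(4, 0, "crime")]
print(build_snippet(result, {"mann"}, {"1995", "crime"}, size_bound=3))  # → [0, 1, 3]
```

Node 3 matches the keyword and node 1 carries a value no other result shares, so the snippet keeps both plus their common root; the values "1995" and "crime" also occur in other results and would not help differentiate this one.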
Size-l Object Summaries for Relational Keyword Search
2011
Cited by 5 (2 self)
A previously proposed keyword search paradigm produces, as a query result, a ranked list of Object Summaries (OSs). An OS is a tree structure of related tuples that summarizes all data held in a relational database about a particular Data Subject (DS). However, some of these OSs are very large and therefore unfriendly to users who initially prefer synoptic information before proceeding to more comprehensive information about a particular DS. In this paper, we investigate the effective and efficient retrieval of concise and informative OSs. We argue that a good size-l OS should be a stand-alone and meaningful synopsis of the most important information about the particular DS. More precisely, we define a size-l OS as a partial OS composed of l important tuples. We propose three algorithms for the efficient generation of size-l OSs (in addition to the optimal approach, which requires exponential time). Experimental evaluation on the DBLP and TPC-H databases verifies the effectiveness and efficiency of our approach.
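A minimal sketch of one way to build a size-l OS, assuming the OS is given as a tree (a `parent` map) rooted at the DS tuple, with a precomputed importance score per tuple. The greedy growth rule and the names `size_l_summary`, `parent` and `importance` are hypothetical; the paper's three algorithms may work differently. Requiring each new tuple's parent to be selected already keeps the partial OS connected, i.e. a stand-alone synopsis.

```python
import heapq

def size_l_summary(parent, importance, root, l):
    """Hypothetical greedy: grow a connected partial OS from the DS root, always
    adding the most important tuple whose parent is already selected."""
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)
    selected = [root]
    frontier = [(-importance[c], c) for c in children.get(root, [])]
    heapq.heapify(frontier)  # max-heap on importance via negated keys
    while frontier and len(selected) < l:
        _, node = heapq.heappop(frontier)
        selected.append(node)
        for c in children.get(node, []):
            heapq.heappush(frontier, (-importance[c], c))
    return selected

# Toy OS (hypothetical weights): root tuple "a" with two joined tuples, each with one child.
parent = {"p1": "a", "p2": "a", "c1": "p1", "c2": "p2"}
importance = {"a": 1.0, "p1": 0.5, "p2": 0.9, "c1": 0.95, "c2": 0.1}
print(size_l_summary(parent, importance, "a", 3))  # → ['a', 'p2', 'p1']
```

If summary quality is taken to be summed importance (an assumption), the greedy result {a, p2, p1} scores 2.4, while the connected set {a, p1, c1} scores 2.45: greedy growth can miss a valuable tuple hidden behind a less important parent, which is why an exact search is more expensive.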
Snippet Generation for Semantic Web Search Engines
Cited by 4 (1 self)
With the development of the Semantic Web, more and more ontologies are available for exploitation by semantic search engines. However, while semantic search engines support the retrieval of candidate ontologies, selecting the most appropriate ontology is still difficult for end users. In this paper, we extend existing work on ontology summarization to support the presentation of ontology snippets. The proposed solution leverages a new semantic similarity measure to generate query-dependent snippets. Experimental results show the potential of our solution in this largely unexplored problem domain.
Summarizing Answer Graphs Induced by Keyword Queries
Cited by 3 (1 self)
Keyword search is widely used to query graph data. Due to the lack of structural support, a keyword query might generate an excessive number of matches, referred to as “answer graphs”, that could include different relationships among keywords. An important yet overlooked task is to group and summarize answer graphs that share similar structures and contents for better query interpretation and result understanding. This paper studies the summarization problem for the answer graphs induced by a keyword query Q. (1) A notion of summary graph is proposed to characterize the summarization of answer graphs. Given Q and a set of answer graphs G, a summary graph preserves the relation
Result Reconstruction Approach for More Effective XML Fragment Search
Cited by 3 (3 self)
We propose and evaluate a method for obtaining more accurate search results in extensible markup language (XML) fragment search, which returns only the relevant fragments or portions of an XML document. Existing approaches generate a ranked list in descending order of each XML fragment’s relevance to a search query; however, these approaches often extract irrelevant XML fragments and overlook more relevant ones. To address these problems, our approach extracts relevant XML fragments by considering the size of the fragments and the relationships between them. Next, we score the XML fragments to generate a refined ranked list, placing fragments that are informative for the user’s information needs higher in the list. In particular, each XML fragment is scored using the statistics of its descendant and ancestor XML fragments. Our experimental evaluations show that the proposed method outperforms BM25E, a conventional approach that neither reconstructs XML fragments nor uses descendant and ancestor statistics.
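The abstract does not spell out the scoring formula, so the following is only a hedged illustration of the general idea of letting ancestor and descendant fragments influence a fragment's score. It assumes each fragment already has a base relevance score (for instance from a BM25-style scorer) and blends it with the mean scores of its descendants and ancestors; the linear blend, the weights and the name `refine_scores` are hypothetical, and the paper's actual use of descendant and ancestor statistics may differ.

```python
def refine_scores(parent, base, w_self=0.6, w_desc=0.3, w_anc=0.1):
    """Hypothetical rescoring: blend each fragment's own score with the mean
    scores of its descendants and ancestors (weights are illustrative)."""
    children = {}
    for node, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(node)

    def descendants(n):
        out, stack = [], list(children.get(n, []))
        while stack:
            d = stack.pop()
            out.append(d)
            stack.extend(children.get(d, []))
        return out

    def ancestors(n):
        out = []
        while parent[n] is not None:
            n = parent[n]
            out.append(n)
        return out

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return {
        n: w_self * base[n]
        + w_desc * mean([base[d] for d in descendants(n)])
        + w_anc * mean([base[a] for a in ancestors(n)])
        for n in base
    }

# Toy document tree with made-up base relevance scores.
parent = {"article": None, "sec1": "article", "sec2": "article", "p1": "sec1"}
base = {"article": 0.2, "sec1": 0.5, "sec2": 0.4, "p1": 0.9}
scores = refine_scores(parent, base)
print(sorted(scores, key=scores.get, reverse=True))  # → ['sec1', 'p1', 'article', 'sec2']
```

Here `sec1` overtakes the higher-scoring paragraph `p1` because its descendant is highly relevant, which is the kind of effect that lets a larger, more informative fragment rise in the ranking.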
Entity Summarisation with Limited Edge Budget on Knowledge Graphs
Cited by 2 (0 self)
We formulate a novel problem of summarising entities under a limited presentation budget on entity-relationship knowledge graphs and propose an efficient algorithm for solving it. The algorithm has been implemented together with a visualisation tool. An experimental user evaluation of the algorithm was conducted on real, large semantic knowledge graphs extracted from the web. The reported results are promising and encourage further work on improving the algorithm.
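The abstract does not describe the algorithm, so this is only a sketch of the problem setting: choose at most `budget` edges around one entity. The diversity heuristic (first the best edge for each predicate, then fill the remaining budget by weight), the edge weights and the function name `summarise_entity` are assumptions for illustration.

```python
def summarise_entity(edges, budget):
    """Hypothetical heuristic. edges: (predicate, object, weight) triples
    incident to one entity. Pass 1 keeps the best edge per predicate;
    pass 2 fills any leftover budget by weight."""
    ranked = sorted(edges, key=lambda e: e[2], reverse=True)
    chosen, seen = [], set()
    for e in ranked:
        if len(chosen) == budget:
            break
        if e[0] not in seen:
            chosen.append(e)
            seen.add(e[0])
    for e in ranked:
        if len(chosen) == budget:
            break
        if e not in chosen:
            chosen.append(e)
    return chosen

# Made-up edges and weights around a single entity.
edges = [("occupation", "physicist", 0.92), ("occupation", "lecturer", 0.91),
         ("bornIn", "Ulm", 0.6), ("award", "Nobel Prize", 0.95),
         ("citizenship", "Switzerland", 0.5)]
for p, o, _ in summarise_entity(edges, budget=3):
    print(p, o)
```

A plain top-3 by weight would show two `occupation` edges; the first pass spends the budget on three different relations instead (award, occupation, bornIn).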
Summarizing highly structured documents for effective search interaction
- In: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, 2012
Cited by 2 (0 self)
As highly structured documents with rich metadata (such as products, movies, etc.) become increasingly prevalent, searching those documents has become an important IR problem. Unfortunately, existing work on document summarization, especially in the context of search, has focused mainly on unstructured documents, and little attention has been paid to highly structured documents. Because structured and unstructured documents have different characteristics, the ideal approaches for document summarization may differ as well. In this paper, we study the problem of summarizing highly structured documents in a search context. We propose a new summarization approach based on query-specific facet selection. Our approach uses machine learning to discover the important facets hidden behind a query and summarizes retrieved documents based on those facets. In addition, we propose to evaluate summarization approaches with a utility function that measures how well the summaries help users interact with the search results. Furthermore, we develop a game on Mechanical Turk to evaluate different summarization approaches. The experimental results show that the new summarization approach significantly outperforms two existing ones.
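A minimal sketch of the summarization step only, assuming the query-specific facet weights have already been produced. In the paper a learned model discovers the important facets; here the weights are hard-coded, hypothetical values, and the function `facet_summaries` and the product data are illustrative.

```python
def facet_summaries(docs, facet_weights, k):
    """Summarise each retrieved document by its values for the k facets with
    the highest query-specific weight; facets a document lacks are skipped."""
    top = sorted(facet_weights, key=facet_weights.get, reverse=True)[:k]
    return [{f: d[f] for f in top if f in d} for d in docs]

# Hypothetical weights; a trained model would supply these for a given query.
weights = {"price": 0.9, "brand": 0.7, "color": 0.2, "weight": 0.1}
docs = [{"brand": "Acme", "price": "$20", "color": "red", "weight": "0.2 kg"},
        {"brand": "Bolt", "price": "$35", "color": "blue"}]
print(facet_summaries(docs, weights, k=2))
# → [{'price': '$20', 'brand': 'Acme'}, {'price': '$35', 'brand': 'Bolt'}]
```

Only the two highest-weight facets, price and brand, appear in each summary; changing `k` trades summary length against detail.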
Document summarization based on data reconstruction
- In Proceedings of AAAI, 2012
Cited by 2 (1 self)
Document summarization is of great value to many real-world applications, such as snippet generation for search results and news headline generation. Traditionally, document summarization is implemented by extracting sentences that cover the main topics of a document with minimum redundancy. In this paper, we take a different perspective, that of data reconstruction, and propose a novel framework named Document Summarization based on Data Reconstruction (DSDR). Specifically, our approach generates a summary consisting of the sentences that can best reconstruct the original document. To model the relationship among sentences, we introduce two objective functions: (1) linear reconstruction, which approximates the document by linear combinations of the selected sentences; and (2) nonnegative linear reconstruction, which allows only additive, not subtractive, linear combinations. In this framework, the reconstruction error becomes a natural criterion for measuring the quality of the summary. For each objective function, we develop an efficient algorithm to solve the corresponding optimization problem. Extensive experiments on the DUC 2006 and DUC 2007 summarization benchmark data sets demonstrate the effectiveness of the proposed approach.
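The linear-reconstruction objective can be illustrated with a small greedy sketch in plain Python. It assumes each sentence is already a term-weight vector and picks sentences one at a time so that the span of the selected vectors leaves the smallest total squared reconstruction error. This greedy forward selection and the helper names are illustrative assumptions rather than the authors' optimization algorithm, and the nonnegative variant is not covered.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual(v, basis):
    """Part of v orthogonal to the span of an orthonormal basis (Gram-Schmidt step)."""
    r = list(v)
    for q in basis:
        c = dot(r, q)
        r = [a - c * b for a, b in zip(r, q)]
    return r

def greedy_reconstruction_summary(sentences, k, eps=1e-12):
    """Greedily pick up to k sentence vectors whose linear span minimises the
    total squared reconstruction error of all sentence vectors."""
    basis, chosen = [], []
    for _ in range(k):
        residuals = [residual(v, basis) for v in sentences]
        best, best_gain, best_q = None, 0.0, None
        for j, r in enumerate(residuals):
            norm = dot(r, r) ** 0.5
            if j in chosen or norm <= eps:
                continue  # already selected, or already inside the span
            q = [a / norm for a in r]
            # Adding unit direction q removes (r_i . q)^2 from each residual's squared error.
            gain = sum(dot(ri, q) ** 2 for ri in residuals)
            if gain > best_gain:
                best, best_gain, best_q = j, gain, q
        if best is None:
            break
        chosen.append(best)
        basis.append(best_q)
    error = sum(dot(r, r) for r in (residual(v, basis) for v in sentences))
    return chosen, error

# Toy term-weight vectors over three terms (hypothetical data).
doc = [[2, 0, 0], [2, 1, 0], [0, 2, 0], [0, 0, 2.5]]
print(greedy_reconstruction_summary(doc, k=2))
```

On this toy input the sketch selects sentences 1 and 3 with a residual error of about 4.0: sentence 1 covers the first two terms jointly, and sentence 3 is the only one containing the third term, so each reduces the error more than any alternative at its step.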