An Efficient and Versatile Query Engine for TopX Search
In VLDB, 2005
Cited by 54 (17 self)
This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML documents over semistructured but nonschematic data collections. The algorithm follows the paradigm of threshold algorithms for top-k query processing with a focus on inexpensive sequential accesses to index lists and only a few judiciously scheduled random accesses. The difficulties in applying...
On the Integration of Structure Indexes and Inverted Lists
In SIGMOD, 2004
Cited by 44 (0 self)
Recently, there has been a great deal of interest in the development of techniques to evaluate path expressions over collections of XML documents. In general, these path expressions contain both structural and keyword components. Several methods have been proposed for processing path expressions over graph/tree-structured XML data. These methods can be classified into two broad classes. The first involves graph traversal where the input query is evaluated by traversing the data graph or some compressed representation. The other class involves information-retrieval style processing using inverted lists. In this framework, structure indexes have been proposed to be used as a substitute for graph traversal. These structure indexes are proven to be very effective when applied to queries that examine the “coarse” structure of documents. For example, for many
Querying Complex Structured Databases
2007
Cited by 11 (1 self)
Correctly generating a structured query (e.g., an XQuery or a SQL query) requires the user to have a full understanding of the database schema, which can be a daunting task. Alternative query models have been proposed to give users the ability to query the database without schema knowledge.
Measuring similarity between collections of values
In 6th International Workshop on Web Information and Data Management, 2004
Cited by 5 (4 self)
In this paper, we propose a set of similarity metrics for manipulating collections of values occurring in XML documents. Following the data model presented in TAX algebra, we treat an XML element as a labeled ordered rooted tree. Considering that XML nodes can be either atomic, i.e., they may contain single values such as short character strings, dates, etc., or complex, i.e., nested structures that contain other nodes, we propose two types of similarity metrics: MAVs, for atomic nodes, and MCVs, for complex nodes. In the first case, we suggest the use of several application-domain-dependent metrics. In the second case, we define metrics for complex values that are structure dependent and can be distinctly applied to tuples and collections of values. We also present experiments showing the effectiveness of our method.
Efficient, effective and flexible XML retrieval using summaries
In Proc. of the 5th Intl Workshop of the Initiative for the Evaluation of XML Retrieval (INEX), 2007
Cited by 4 (1 self)
Retrieval queries that combine structural constraints with keyword search are placing new challenges on retrieval systems. This paper presents TReX, a new retrieval system for XML. TReX can efficiently return either all the answers to a given query or only the top-k answers. In this paper, we discuss our participation in the annual Initiative for the Evaluation of XML Retrieval (INEX) workshop in the ad-hoc track. Our main contribution is to investigate the use of summaries and the flexibility they provide when dealing with structural constraints. We describe algorithms for retrieval using summaries. Experimental results are presented showing that TReX answers queries efficiently and effectively.
Users’ perspectives on the usefulness of structure for XML information retrieval
In Proceedings of the 1st International Conference on the Theory of Information Retrieval, 2007
Cited by 4 (1 self)
The widespread use of the eXtensible Markup Language (XML) on the Web and in digital libraries has led to a drastic increase in the number of XML Information Retrieval (IR) systems being developed. XML IR approaches exploit the logical structure of documents for their querying, retrieval and presentation to the user. Despite their abundance, there remains uncertainty regarding the advantages that structural information may bring to IR. In this paper we report on a user study exploring questions around the potential benefits of structure to users, such as: Is structural information useful when searching for relevant information? Can the structure of a document help to locate relevant information when browsing inside a document? Does the role of structural information depend on the length of a document? Our investigation was conducted as part of the INEX 2006 interactive track experiment, which we supplemented with questionnaires. Our qualitative analysis of the data collected from seven participants aims to identify how users will interact with XML IR systems. We do this by drawing parallels with paper-based information searching, Web searching, and digital library searching. What we find is that XML IR users are unlike Web users – they use advanced search facilities, they prefer a list of results supplemented with branch points into the document, and they need better methods of navigation within long documents.
Integration of IR into an XML Database
2002
Cited by 4 (0 self)
Structure matching has been the focus and strength of standard XML querying. However, textual content is still an essential component of XML data. It is therefore important to extend the standard XML database engine to allow for "Information Retrieval" style queries, namely, "keyword"-based retrieval and "result ranking". In this paper, we describe our effort in integrating information retrieval techniques into the Timber XML database system being developed at the University of Michigan, and our participation in the INitiative for the Evaluation of XML Retrieval (INEX).
Effective keyword search in XML documents based on MIU
In Proc. of DASFAA Conference, 2006
Cited by 3 (2 self)
Keyword search is an effective approach for most users to search for information because they do not need to learn complex query languages or the underlying structures of the data. This paper focuses on effective keyword search in XML documents which are modeled as labeled trees. We first analyze the problems caused by the refinement of result granularity during XML keyword search and then propose to partition an XML document into XML fragments with the granularity of Minimal Information Unit (MIU). Furthermore, we present efficient index structures and the corresponding search algorithms. Finally, our comprehensive experiments demonstrate the benefits of our method over previously proposed methods in terms of result quality, index size and execution time.
Querying XML Using Structures and Keywords in Timber
In SIGIR’03, 2003
Cited by 2 (0 self)
This demonstration will describe how Timber, a native XML database system, has been extended with the capability to answer XML-style structured queries (e.g., XQuery) with embedded IR-style keyword-based non-boolean conditions. With the original structured query processing engine and the IR extensions built into the system, Timber is well suited for efficiently and effectively processing queries with both structural and textual content constraints.
Querying Web Metadata: Native Score Management and Text Support in Databases
ACM Trans. on Database Sys, 2004
Cited by 2 (2 self)
In this paper, we discuss the issues involved in adding a native score management system to object-relational databases, to be used in querying web metadata (that describes the semantic content of web resources). The web metadata model is based on topics (representing entities), relationships among topics (called metalinks), and importance scores (sideway values) of topics and metalinks. We extend database relations with scoring functions and importance scores. We add to SQL score-management clauses with well-defined semantics, and propose the sideway-value algebra (SVA), to evaluate the extended SQL queries. SQL extensions and the SVA algebra are illustrated through two web resources, namely, the DBLP Bibliography and the SIGMOD Anthology. SQL extensions include clauses for propagating input tuple importance scores to output tuples during query processing, clauses that specify query stopping conditions, threshold predicates—a type of approximate similarity predicates for text comparisons, and user-defined-function-based predicates. The propagated importance scores are then used to rank and return a small number of output tuples. The query stopping conditions are propagated to SVA operators during query processing. We show that our SQL extensions are well-defined, meaning that, given a database and a query Q, under any query processing scheme, the output tuples of Q and their importance scores stay the same.

