The Index-based XXL Search Engine for Querying XML Data with Relevance Ranking
In EDBT, 2002
Cited by 89 (7 self)
Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do, however, not consider the additional information and rich annotations provided by the structure of XML documents and their element names. This paper presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match as well as semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness.
TeXQuery: A Full-Text Search Extension to XQuery
2004
Cited by 61 (7 self)
... mix of structured and unstructured (text) data. Although current XML query languages such as XPath and XQuery can express rich queries over structured data, they can only express very rudimentary queries over text data. We thus propose TeXQuery, which is a powerful full-text search extension to XQuery. TeXQuery provides a rich set of fully composable full-text search primitives, such as Boolean connectives, phrase matching, proximity distance, stemming and thesauri. TeXQuery also enables users to seamlessly query over both structured and text data by embedding TeXQuery primitives in XQuery, and vice versa. Finally, TeXQuery supports a flexible scoring construct that can be used to score query results based on full-text predicates. TeXQuery is the precursor of the full-text language extensions to XPath 2.0 and XQuery 1.0 currently being developed by the W3C.
Tree Pattern Relaxation
2002
Cited by 45 (5 self)
Tree patterns are fundamental to querying tree-structured data like XML. Because of the heterogeneity of XML data, it is often more appropriate to permit approximate query matching and return ranked answers, in the spirit of Information Retrieval, than to return only exact answers. In this paper, we study the problem of approximate XML query matching, based on tree pattern relaxations, and devise efficient algorithms to evaluate relaxed tree patterns. We consider weighted tree patterns, where exact and relaxed weights, associated with nodes and edges of the tree pattern, are used to compute the scores of query answers. We are
ProTDB: Probabilistic data in XML
In Proceedings of the 28th VLDB Conference, 2002
Cited by 38 (2 self)
Whereas traditional databases manage only deterministic information, many applications that use databases involve uncertain data. This paper presents a Probabilistic Tree Data Base (ProTDB) to manage probabilistic data, represented in XML. Our approach differs from previous efforts to develop probabilistic relational systems in that we build a probabilistic XML database. This design is driven by application needs that involve data not readily amenable to a relational representation. XML data poses several modeling challenges: due to its structure, due to the possibility of uncertainty association at multiple granularities, and due to the possibility of missing and repeated sub-elements. We present a probabilistic XML model that addresses all of these challenges. We devise an implementation of XML query operations using our probability model, and demonstrate the efficiency of our implementation experimentally. We have used ProTDB to manage data from two application areas: protein chemistry data from the bioinformatics domain, and information extraction data obtained from the web using a natural language analysis system. We present a brief case study of the latter to demonstrate the value of probabilistic XML data management.
Similarity Search in XML Data using Cost-Based Query Transformations
In Proc. 4th Intern. Workshop on the Web and Databases, 2001
Cited by 19 (1 self)
XML query engines should support structured queries. They should retrieve exact matches as well as results similar to the query. In this paper, we introduce the simple query language approXQL that supports hierarchical, Boolean-connected query patterns. The interpretation of approXQL queries is founded on cost-based query transformations: The total cost of a sequence of transformations measures the similarity between a query and the data and is used to rank the results. All results of an approXQL query can be computed in polynomial time with respect to the database size.
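The cost-based ranking idea in this abstract can be illustrated with a minimal sketch. It is not approXQL itself: the transformation names, the cost values, and the helpers `total_cost` and `rank` are hypothetical, chosen only to show how summing per-transformation costs yields a ranking in which a lower total cost means a closer match.

```python
# Hypothetical per-transformation costs; the abstract gives no concrete values.
COSTS = {"rename": 2, "insert": 1, "delete": 3}


def total_cost(transformations):
    """Sum the costs of the transformations applied to the query."""
    return sum(COSTS[t] for t in transformations)


def rank(results):
    """Order (name, transformations) pairs by ascending total cost."""
    return sorted(results, key=lambda r: total_cost(r[1]))


# Each result records which (assumed) transformations were needed to match it.
results = [
    ("doc_a", ["rename", "insert"]),
    ("doc_b", []),  # exact match: no transformation needed
    ("doc_c", ["delete", "rename"]),
]
ranked = [name for name, _ in rank(results)]
```

In this sketch an exact match has cost zero and therefore always ranks first.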
Structural Proximity Searching for Large Collections of Semi-Structured Data
In Proceedings of ACM CIKM, 2001
Cited by 14 (5 self)
The richness of the XML data format allows data to be structured in a way which precisely captures the semantics required by the author. It is the structure of the data, however, which forms the basis of all XML query languages. Without at least some notion of the structure, a user cannot meaningfully query the data. This problem is compounded when one considers that heterogeneous data adhering to different schema are likely to exist in the database(s) being queried. This paper proposes a solution based on an efficient proximity index. In particular, we describe a family of encoding and compression schemes which enable us to build an index to efficiently implement the proximity search. Our index is extremely small, and can reflect updates in the underlying database in modest time. Experiments show that our algorithm and implementation are fast and scale well.
GalaTex: A Conformant Implementation of the XQuery Full-Text Language
WWW
Cited by 11 (3 self)
We describe GALATEX [10], the first complete implementation of XQuery Full-Text, a W3C specification that extends XPath 2.0 and XQuery 1.0 with full-text search capabilities. XQuery Full-Text provides composable full-text search primitives such as simple keyword search, Boolean queries, and keyword-distance predicates. GALATEX is intended to serve as a reference implementation for XQuery Full-Text and as a platform for addressing new research problems such as scoring full-text query results, optimizing XML queries over both structure and text, and evaluating top-k queries on scored results. GALATEX is an all-XQuery implementation initially focused on completeness and conformance rather than on efficiency. We describe its implementation on top of Galax, a complete XQuery implementation, and identify some performance challenges, possible solutions, and their interactions with XQuery implementations.
Expressive and efficient ranked queries for XML data
Proceedings of the Fourth International Workshop on the Web and Databases (WebDB'01), 2001
Cited by 9 (1 self)
Several XML query languages have been developed, but none support ranked/weighted query results based on textual similarity. We propose ELIXIR, a general-purpose language for XML information retrieval that extends Deutsch et al's XML-QL language with a textual similarity operator. Unlike related efforts, ELIXIR is sufficiently expressive to handle queries such as "find books and CDs with similar titles" that require similarity joins. Our query processing algorithm rewrites an ELIXIR query into a series of XML-QL queries that generate intermediate relational data, and then invokes Cohen's WHIRL algorithm to efficiently evaluate the similarity operators on this intermediate data, yielding an XML document with nodes ranked by similarity.
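A similarity join of the kind mentioned in this abstract ("find books and CDs with similar titles") can be sketched as follows. This is an illustration only, not ELIXIR or WHIRL: it uses plain term-frequency cosine similarity as a simplified stand-in for the similarity operator, and the function names, the threshold, and the sample titles are all assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_join(left, right, threshold=0.5):
    """Return (score, left, right) triples at or above the threshold, best first."""
    pairs = [(cosine(l, r), l, r) for l in left for r in right]
    return sorted((p for p in pairs if p[0] >= threshold), reverse=True)


# Made-up titles standing in for the book and CD collections.
books = ["The Wall", "Abbey Road Stories"]
cds = ["Abbey Road", "The Wall Live"]
joined = similarity_join(books, cds)
```

Unlike an equality join, every pair receives a score, so the output is a ranked list rather than a set of exact matches.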
Integrating Diverse Information Management Systems: A Brief Survey
IEEE Data Engineering Bulletin, 2001
Cited by 7 (0 self)
Most current information management systems can be classified into text retrieval systems, relational/object database systems, or semistructured/XML database systems. However, in practice, many applications' data sets involve a combination of free text, structured data, and semistructured data. Hence, integration of different types of information management systems has been, and continues to be, an active research topic. In this paper, we present a short survey of prior work on integrating and inter-operating between text, structured, and semistructured database systems. We classify existing literature based on the kinds of systems being integrated and the approach to integration. Based on this classification, we identify the challenges and the key themes underlying existing work in this area.
Semantic similarity search on semistructured data with the XXL search engine
Information Retrieval, 2005
Cited by 6 (2 self)
Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do, however, not consider the additional information and rich annotations provided by the structure of XML documents and their element names. This article presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match as well as semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness. XXL is fully implemented as a suite of Java classes and servlets. Experiments in the context of the INEX benchmark demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.

