Probabilistic information retrieval approach for ranking of database query results
- ACM Transactions on Database Systems (TODS)
, 2006
Cited by 18 (4 self)
We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies to tackle the problem for conjunctive and range queries, by adapting and applying principles of probabilistic models from Information Retrieval for structured data. Our solution is domain independent and leverages data and workload statistics and correlations. We evaluate the quality of our approach with a user survey on a real database. Furthermore, we present and experimentally evaluate algorithms to efficiently retrieve the top ranked results, which demonstrate the feasibility of our ranking system.
Articulating information needs in XML query languages
- Transactions on Information Systems
Cited by 13 (9 self)
Document-centric XML is a mixture of text and structure. With the increased availability of document-centric XML documents comes a need for query facilities in which both structural constraints and constraints on the content of the documents can be expressed. How does the expressiveness of languages for querying XML documents help users to express their information needs? We address this question from both an experimental and a theoretical point of view. Our experimental analysis compares a structure-ignorant with a structure-aware retrieval approach using the test suite of the INEX XML retrieval evaluation initiative. Theoretically, we create two mathematical models of users' knowledge of a set of documents and define query languages which exactly fit these models. One of these languages corresponds to an XML version of fielded search, the other to the INEX query language. Our main experimental findings are: First, while structure is used in varying degrees of complexity, two thirds of the queries can be expressed in a fielded-search-like format which does not use the hierarchical structure of the documents. Second, three quarters of the queries use constraints on the context of the elements to be returned; these contextual constraints cannot be captured by ordinary keyword queries. Third, structure is used as a search hint, and not as a strict requirement, when judged against the underlying information need. Fourth, the use of structure in queries functions as a precision-enhancing device.
Report on the db/ir panel at SIGMOD 2005
- SIGMOD Record
, 2005
Cited by 8 (2 self)
This paper summarizes the salient aspects of the SIGMOD 2005
Classification and Intelligent Search on Information in XML
- Bulletin of the IEEE Technical Committee on Data Engineering
, 2002
Cited by 7 (0 self)
this paper. 2 Query languages for information retrieval of XML documents Looking at the broad variety of XML applications and systems that are currently under development, one can see that there are in fact two different views on XML:
Understanding content-and-structure
- In Proceedings of the INEX 2005 Workshop on Element Retrieval Methodology
, 2005
Cited by 4 (1 self)
Document-centric XML is a mixture of text and structure. With the increased availability of document-centric XML content comes a need for query facilities in which both structural constraints and constraints on the content of the documents can be expressed. This has generated considerable interest in both the IR and DB communities, and has led to the launch of evaluation efforts tailored for XML documents. One of the driving and long-standing research questions here is: How does the increased expressiveness of languages for querying XML documents help users to better, and more effectively, express their information needs? And closely related to this: How should we evaluate systems that enable users to express their information needs using both content and structural constraints?
Multi-Dimensional Search for Personal Information Management Systems
, 2008
Cited by 4 (4 self)
With the explosion in the amount of semi-structured data users access and store in personal information management systems, there is a need for complex search tools to retrieve often very heterogeneous data in a simple and efficient way. Existing tools usually index text content, allowing for some IR-style ranking on the textual part of the query, but only consider structure (e.g., file directory) and metadata (e.g., date, file type) as filtering conditions. We propose a novel multi-dimensional approach to semi-structured data searches in personal information management systems by allowing users to provide fuzzy structure and metadata conditions in addition to keyword conditions. Our techniques provide a complex query interface that is more comprehensive than content-only searches as it considers three query dimensions (content, structure, metadata) in the search. We propose techniques to individually score each dimension, as well as a framework to integrate the three dimension scores into a meaningful unified score. Our work is integrated in Wayfinder, an existing fully-functioning file system. We perform a thorough experimental evaluation of our techniques to show the effect of approximating individual dimensions on the overall scores and ranks of files, as well as on query performance. Our experiments show that our scoring strategy adequately takes into account the approximation in each dimension to efficiently evaluate fuzzy multi-dimensional queries. In addition, fuzzy query conditions in non-content dimensions can significantly improve scoring (and thus ranking) accuracy.
Modelling anchor text retrieval in book search based on back-of-book index
- In Proceedings of the SIGIR 2008 Workshop on Focused Retrieval
, 2008
Cited by 2 (1 self)
This paper proposes a probabilistic logic abstraction for modelling tf-boosting approaches to anchor text retrieval, adapted for the task of page-search in books. The underlying idea is to view the back-of-book index (BoBI) as a list of anchors pointing to pages in the book. First, we model the direct application of hypertext-based tf-boosting to books and show that this naive method of propagating anchor-text from the BoBI does not deliver the desired tf-boosting effect. To address this, we then propose a revised anchor-text retrieval model based on a novel voter approach. In this approach, each page of the book, where a given term occurs, acts as a virtual voter to the pages referenced by the BoBI for that term. The tf-boosting effect is achieved by propagating term weights from the voter pages to the pages in the BoBI. We use probabilistic Datalog for the high-level abstract modelling of retrieval strategies, which allows for the evolution and transfer of successful techniques from one domain, such as anchor-text retrieval in Web IR, to a similar domain, such as book search.
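The voter idea in the abstract above can be illustrated with a small sketch. This is not the authors' probabilistic Datalog model: it is a plain Python approximation in which every page containing a term casts its raw term frequency as a vote, and the total vote is split equally among the pages the BoBI lists for that term. The function name `voter_boost`, the input layout, and the equal-split rule are all assumptions made for illustration.

```python
def voter_boost(page_tf, bobi):
    """Return boosted term frequencies per page.

    page_tf: {page_number: {term: term_frequency}}   (hypothetical layout)
    bobi:    {term: [page_numbers listed in the back-of-book index]}

    Assumption: each page where a term occurs votes with its raw tf, and the
    summed vote is shared equally by the BoBI-referenced pages for that term.
    """
    # Start from a copy of the original frequencies so the input is untouched.
    boosted = {page: dict(tfs) for page, tfs in page_tf.items()}
    for term, targets in bobi.items():
        if not targets:
            continue
        # Voters are all pages on which the term actually occurs.
        total_vote = sum(tfs[term] for tfs in page_tf.values() if term in tfs)
        share = total_vote / len(targets)
        for target in targets:
            page_terms = boosted.setdefault(target, {})
            page_terms[term] = page_terms.get(term, 0) + share
    return boosted


if __name__ == "__main__":
    page_tf = {1: {"xml": 2}, 2: {"xml": 1}, 3: {"index": 4}}
    bobi = {"xml": [2]}
    print(voter_boost(page_tf, bobi))
```

In this sketch a BoBI-referenced page that itself contains the term also votes for itself; whether the original model does the same is not stated in the abstract.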
SIRIUS: A Lightweight XML Indexing and Approximate Search System at INEX 2005
- INEX 2005, LNCS
Cited by 1 (1 self)
This paper reports on SIRIUS, a lightweight indexing and search engine [6] for XML documents. The retrieval approach implemented is document oriented. It involves an approximate matching scheme for the structure and textual content. Instead of matching whole DOM trees, SIRIUS splits the document object model into a set of paths. This set is indexed using optimized data structures. In this view, the request is a path-like expression with conditions on the attribute values. In this paper, we first present the main functionalities and characteristics of this XML IR system, and second we report on our experience in adapting and using it for the INEX 2005 ad-hoc retrieval task. Finally, we present and analyze the SIRIUS retrieval performance obtained during the INEX 2005 evaluation campaign and show that, despite the lightweight characteristics of SIRIUS, we obtained quite good precision at low recall values.
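As a rough illustration of splitting a document into a set of paths rather than handling the whole DOM tree, the sketch below walks an XML document with Python's standard `xml.etree.ElementTree` and records one root-to-element path per element. It is not the SIRIUS implementation: the function names, the `/tag/tag` path format, and the plain dictionary used as an index are assumptions, and the approximate matching and optimized data structures mentioned in the abstract are not modelled.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def element_paths(xml_text):
    """Yield (path, text) for every element, where path is a tuple of tags
    from the root down to that element."""
    root = ET.fromstring(xml_text)

    def walk(node, prefix):
        path = prefix + (node.tag,)
        # node.text is None for elements without leading text.
        yield path, (node.text or "").strip()
        for child in node:
            yield from walk(child, path)

    yield from walk(root, ())


def build_path_index(xml_text):
    """Map each path string (hypothetical '/a/b/c' format) to the list of
    text values found at that path, in document order."""
    index = defaultdict(list)
    for path, text in element_paths(xml_text):
        index["/" + "/".join(path)].append(text)
    return dict(index)


if __name__ == "__main__":
    doc = "<article><title>XML search</title><sec><p>paths</p><p>index</p></sec></article>"
    for path, texts in build_path_index(doc).items():
        print(path, texts)
```

A path-like request could then be answered by looking up, or approximately matching, keys of this index instead of traversing trees.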
Vague Element Selection and Query Rewriting for XML Retrieval
Cited by 1 (1 self)
In this paper we present the extension of our prototype three-level database system (TIJAH) developed for structured information retrieval. The extension is aimed at modeling vague search on XML elements. All three levels (conceptual, logical, and physical) of the TIJAH system are enhanced to support vague search concepts. The vague search is implemented as vague selection of XML elements using XML element name expansion lists and rewriting techniques. We test the performance of retrieval models using automatically generated expansion lists and compare them with models that use manual ones. The goal is to find the best approach for structured information retrieval with vague structural constraints on element names expressed in the query.
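The combination of element name expansion lists and query rewriting can be sketched as follows. This is only an illustration of the general idea, not the TIJAH implementation: the function `rewrite_query`, the representation of a query as a list of element names, the `//a//b` output format, and the example tag names are all hypothetical.

```python
from itertools import product


def rewrite_query(steps, expansions):
    """Expand a path query into all variants allowed by the expansion lists.

    steps:      element names in the query, outermost first
    expansions: {element_name: [alternative element names]}  (assumed format)

    Each step keeps its original name and may also be replaced by any name
    from its expansion list; the result is every combination, original first.
    """
    alternatives = [
        [step] + [name for name in expansions.get(step, []) if name != step]
        for step in steps
    ]
    return ["//" + "//".join(combo) for combo in product(*alternatives)]


if __name__ == "__main__":
    # Hypothetical expansion list: a request for 'sec' may also match 'ss1' or 'ss2'.
    print(rewrite_query(["article", "sec"], {"sec": ["ss1", "ss2"]}))
```

A real system would presumably also attach weights to the rewritten variants so that matches on the original element name rank above matches on expanded names; the abstract does not say how this is done, so it is left out here.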
Efficient Multi-Dimensional Query Processing in Personal Information Management Systems
, 2008
Cited by 1 (1 self)
The relentless growth in capacity and dropping price of storage are driving an explosion in the amount of information users are collecting and storing in personal information management systems. This explosion of information has led to a critical need for complex search tools to access often very heterogeneous data in a simple and efficient manner. Such tools should provide both high-quality flexible scoring mechanisms and efficient query processing capabilities. In this paper, we focus on indexes and algorithms to efficiently identify the most relevant files that match multi-dimensional queries composed of relaxed content, metadata, and structure conditions. We also adapt existing top-k query strategies to our specific scenario. Our work is integrated in Wayfinder, an existing fully functioning file system. We perform a thorough experimental evaluation of our file search techniques and show that our query processing strategies exhibit good behavior across all dimensions, resulting in good overall query performance and good scalability.

