XIRQL: A Query Language for Information Retrieval in XML Documents
, 2001
Cited by 140 (6 self)
Based on the document-centric view of XML, we present the query language XIRQL. Current proposals for XML query languages lack most IR-related features, namely weighting and ranking, relevance-oriented search, datatypes with vague predicates, and semantic relativism. XIRQL integrates these features by using ideas from logic-based probabilistic IR models, in combination with concepts from the database area. For processing XIRQL queries, a path algebra is presented that also serves as a starting point for query optimization.
XIRQL: An XML Query Language Based on Information Retrieval Concepts
, 2001
Cited by 31 (2 self)
Most proposals for XML query languages are based on the data-centric view of XML and do not support uncertainty and vagueness, which makes them unsuitable for information retrieval (IR) of XML documents. Based on the document-centric view, we present the query language XIRQL, which implements IR-related features such as weighting and ranking, relevance-oriented search, datatypes with vague predicates, and structural relativism. XIRQL integrates these features by using ideas from logic-based probabilistic IR models, in combination with concepts from the database area. For processing XIRQL queries, a path algebra is presented which also serves as a starting point for query optimization.
Streams, Structures, Spaces, Scenarios, Societies (5S): A Formal Model for Digital Libraries
- ACM Trans. Inf. Syst
, 2004
Cited by 13 (2 self)
Digital libraries (DLs) are complex information systems and therefore demand formal foundations lest development efforts diverge and interoperability suffers. In this paper, we propose the fundamental abstractions of Streams, Structures, Spaces, Scenarios, and Societies (5S), which contribute to defining digital libraries rigorously and usefully. Streams are sequences of abstract items used to describe static and dynamic content. Structures can be defined as labeled directed graphs, which impose organization. Spaces are sets of abstract items and operations on those sets that obey certain rules. Scenarios consist of sequences of events or actions that modify states of a computation in order to accomplish a functional requirement. Societies comprehend entities and the relationships between and among them. Together these abstractions relate and unify concepts, among others, of digital objects, metadata, collections, and services required to formalize and elucidate "digital libraries". The applicability, versatility and unifying power of the theory are demonstrated through its use in three distinct applications: building and interpretation of a DL taxonomy, analysis of case studies of digital libraries, and utilization as a formal basis for a DL description language.
Keywords: digital libraries, theory, foundations, definitions, applications
1 Motivation
Digital libraries are extremely complex information systems. The proper concept of a digital library seems hard to completely understand and evades definitional consensus. Different views (e.g., historical, technological) and perspectives (e.g., from the library and information science, information retrieval, or human-computer interaction communities) have led to a myriad of differing definitions. Licklider, in his seminal ...
Filtering Algorithms for Information Retrieval Models with named Attributes and Proximity Operators
- In SIGIR
, 2004
Cited by 9 (7 self)
In the selective dissemination of information (or publish/subscribe) paradigm, clients subscribe to a server with continuous queries (or profiles) that express their information needs. Clients can also publish documents to servers. Whenever a document is published, the continuous queries that it satisfies are found and notifications are sent to the appropriate clients. This paper deals with the filtering problem that needs to be solved efficiently by each server: given a database of continuous queries db and a document d, find all queries q ∈ db that match d. We present data structures and indexing algorithms that enable us to solve the filtering problem efficiently for large databases of queries expressed in the model AWP, which is based on named attributes with values of type text and on word proximity operators.
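The filtering problem stated in this abstract can be illustrated with a small counting-based query index. This is only a sketch under simplifying assumptions, not the data structures or algorithms of the paper: each query is assumed to be a conjunction of (attribute, word) atoms, proximity operators are left out, and the names `QueryIndex`, `add_query`, and `match` are hypothetical.

```python
from collections import defaultdict


class QueryIndex:
    """Toy index over continuous queries (hypothetical, not the AWP algorithms)."""

    def __init__(self):
        # (attribute, word) -> ids of the queries that contain this atom
        self.postings = defaultdict(set)
        # query id -> number of distinct atoms the query requires
        self.sizes = {}

    def add_query(self, qid, atoms):
        """Register a query given as an iterable of (attribute, lowercase word) pairs."""
        atoms = set(atoms)
        self.sizes[qid] = len(atoms)
        for atom in atoms:
            self.postings[atom].add(qid)

    def match(self, document):
        """Return the ids of all queries whose atoms all occur in the document.

        `document` maps attribute names to text values.
        """
        hits = defaultdict(int)
        for attribute, text in document.items():
            # Deduplicate words so each atom is counted at most once.
            for word in set(text.lower().split()):
                for qid in self.postings.get((attribute, word), ()):
                    hits[qid] += 1
        # A query matches when every one of its atoms was seen.
        return sorted(q for q, n in hits.items() if n == self.sizes[q])


index = QueryIndex()
index.add_query("q1", [("title", "xml"), ("body", "retrieval")])
index.add_query("q2", [("title", "xml")])
index.add_query("q3", [("title", "sql")])
doc = {"title": "XML Query Languages", "body": "information retrieval for documents"}
matched = index.match(doc)
```

The point of indexing the queries rather than the documents is that an incoming document only touches the postings of the words it contains, so queries sharing no atom with the document are never examined.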
The BIRD Numbering Scheme for XML and Tree Databases – Deciding and Reconstructing Tree Relations Using Efficient Arithmetic Operations
- In: Proc. Int’l. XML Database Symposium (XSym). Volume 3671 of LNCS, Springer-Verlag
, 2005
Cited by 9 (2 self)
We introduce a family of numbering schemes for the nodes of tree databases that are based on a structural summary for the database, such as the DataGuide. Using such a scheme, given the node IDs of two database nodes and the corresponding nodes in the structural summary, we may decide the extended XPath relations Child, Child+, Child*, Following, NextSibling, NextSibling+, NextSibling* for the nodes without access to the database. Similarly, we can reconstruct the parent node and neighbouring siblings of a given node. All decision and reconstruction steps are based on simple arithmetic operations. The BIRD scheme offers high expressivity and needs only modest storage capacity. Compared to other identification schemes with similar expressivity, BIRD performs best in terms of both storage consumption and execution time for decision and reconstruction. A very attractive feature of the BIRD scheme is that all extended XPath relations can be decided and reconstructed in constant time, i.e., independently of tree position and distance of the nodes involved.
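To make the idea of deciding and reconstructing tree relations from node IDs alone more concrete, here is a sketch using a much simpler stand-in scheme: the level-order numbering of a complete k-ary tree. This is not the BIRD scheme (it uses no structural summary, and its Child+ test walks up the tree instead of running in constant time); the fan-out `K` and all function names are assumptions made for illustration.

```python
K = 3  # assumed fixed fan-out of the virtual complete tree


def parent(node_id, k=K):
    """Reconstruct the parent ID; the root (ID 0) has no parent."""
    return None if node_id == 0 else (node_id - 1) // k


def is_child(child_id, parent_id, k=K):
    """Decide the Child relation without touching the tree itself."""
    return child_id != 0 and (child_id - 1) // k == parent_id


def is_descendant(node_id, ancestor_id, k=K):
    """Decide Child+ by repeatedly reconstructing parents (O(depth), not O(1))."""
    while node_id > ancestor_id:
        node_id = (node_id - 1) // k
        if node_id == ancestor_id:
            return True
    return False


def next_sibling(node_id, k=K):
    """Reconstruct the next sibling, or None for the root or a last child."""
    # The children of node i are k*i + 1 .. k*i + k, so the last child's ID
    # is a multiple of k.
    if node_id == 0 or node_id % k == 0:
        return None
    return node_id + 1
```

With `K = 3`, the root 0 has children 1, 2, 3, and node 2 has children 7, 8, 9. A fixed fan-out wastes IDs on irregular trees, which is one reason practical schemes derive their arithmetic from the actual structure of the database.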
A query language and user interface for XML information retrieval
- In
, 2003
Cited by 4 (0 self)
As XML is about to become the standard format for structured documents, there is an increasing need for appropriate information retrieval (IR) methods. Since classical IR methods were developed for unstructured documents only, the logical markup of XML documents poses new challenges.
Next-Generation Information Retrieval: Integrating Document and Data Retrieval Based on XML
, 2003
PIX: A System for Phrase Matching in XML Documents: A Demonstration
, 2003
We present a system that enables flexible and efficient phrase matching in XML documents. Since XML allows structured and unstructured information to be interleaved, phrase matching in XML raises new challenges. Our system, named PIX, permits phrase matching in XML documents that contain "mixed content". A key feature of PIX is that users can specify which elements and content to ignore when matching a phrase. PIX uses inverted indices and an efficient evaluation algorithm to compute the set of matches and returns answers where phrases, ignored tags and content are highlighted. In addition, query answers are sorted using a ranking function. PIX is implemented as an extension of GALAX, a full-fledged XQuery engine. The functionality of PIX is fully integrated into XQuery and permits a natural combination of XPath-based structure matching with phrase matching.
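The notion of phrase matching over mixed content while ignoring selected elements can be sketched in a few lines. This is a naive scan for illustration only, not the PIX implementation: it uses no inverted indices, no ranking, and no XQuery integration, it treats every tag as transparent, and the function names and the `skip_content` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def visible_words(element, skip_content):
    """Collect words in document order, dropping the content of skipped elements."""
    words = []
    if element.tag not in skip_content:
        words.extend((element.text or "").split())
        for child in element:
            words.extend(visible_words(child, skip_content))
            # Text after a child's closing tag belongs to the parent.
            words.extend((child.tail or "").split())
    return words


def contains_phrase(xml_text, phrase, skip_content=()):
    """Return True if the phrase occurs once tags and skipped content are ignored."""
    root = ET.fromstring(xml_text)
    words = [w.lower() for w in visible_words(root, set(skip_content))]
    target = phrase.lower().split()
    n = len(target)
    return bool(target) and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )


doc = "<p>information <b>retrieval</b> <fn>see note 3</fn> systems</p>"
with_skip = contains_phrase(doc, "information retrieval systems", skip_content=["fn"])
without_skip = contains_phrase(doc, "information retrieval systems")
```

In this example the phrase is found only when the content of the `fn` element is ignored; the `b` tag never blocks a match because markup is dropped before comparison. Punctuation handling is deliberately left out.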

