A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems
ACM Transactions on Information Systems, 1994
Cited by 149 (28 self)
We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. Here tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression always conform to the underlying probabilistic model. We also show for which expressions extensional semantics yields the same results. Furthermore, we discuss complexity issues and indicate possibilities for optimization. With regard to databases, the approach allows for representing imprecise attribute values, whereas for information retrieval, probabilistic document indexing and probabilistic search term weighting can be modelled. As an important extension, we introduce the concept of vague predicates, which yield a probabilistic weight instead of a Boolean value, thus allowing for queries with vague selection conditions. So PRA implements uncertainty and vagueness in combination with the...
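As a rough illustration of the ideas in this abstract, the following minimal sketch stores a probabilistic relation as a mapping from tuples to weights. It is not the paper's PRA: it evaluates extensionally and assumes all tuples are independent events, which the abstract says matches the intensional result only for some expressions. The function names, the sample relations, and the `about_1995` vague predicate are hypothetical.

```python
from math import prod

# A probabilistic relation: tuple -> probability that the tuple belongs to it.
# Sample data is made up for illustration.
docterm = {("d1", "ir"): 0.9, ("d1", "db"): 0.5, ("d2", "ir"): 0.4}
books = {("b1", 1994): 1.0, ("b2", 1990): 0.8}


def select(rel, pred):
    """Boolean selection: keep matching tuples, weights unchanged."""
    return {t: p for t, p in rel.items() if pred(t)}


def project(rel, idxs):
    """Projection: tuples that collapse to the same key are combined as a
    disjunction of independent events (an assumption of this sketch)."""
    groups = {}
    for t, p in rel.items():
        key = tuple(t[i] for i in idxs)
        groups.setdefault(key, []).append(p)
    return {k: 1 - prod(1 - p for p in ps) for k, ps in groups.items()}


def vague_select(rel, vague_pred):
    """Vague selection: the predicate returns a weight in [0, 1] instead of
    a Boolean; it is multiplied into the tuple weight (independence assumed)."""
    out = {}
    for t, p in rel.items():
        w = p * vague_pred(t)
        if w > 0:
            out[t] = w
    return out


def about_1995(t):
    """Hypothetical vague predicate: weight falls off linearly over 10 years."""
    return max(0.0, 1 - abs(t[1] - 1995) / 10)


docs = project(docterm, [0])              # probability that each document has any term
ir_docs = select(docterm, lambda t: t[1] == "ir")
recent = vague_select(books, about_1995)  # "published around 1995"
```

Under the independence assumption, `d1` is weighted by combining its two term weights as a disjunction, while `d2` keeps its single weight.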
Semistructured Data and XML
1998
Cited by 59 (1 self)
This paper argues that the research on semistructured data is receiving a new set of challenges with the advent of XML (Extensible Mark-up Language [Bos97, Con98]). This is a new standard approved by the World Wide Web Consortium that many believe will become the de facto data exchange format for the Web. XML supports the electronic exchange of machine-readable data (while HTML is designed primarily for human-readable documents). XML data shares many features of semistructured data: its structure can be irregular, is not always known ahead of time, and may change frequently and without notice. On the other hand, it is easy to convert data from any source into XML, which will make it attractive for organizations to "publish" their information sources in XML, and thus make them available to other XML applications on the Web. For XML applications to reach their full potential, however, we need to build the right tools to process data in this new format. Existing Web tools (browsers, search engines) are oriented toward document operations. For XML we need database operations, like data extraction, data integration, data translation, data storage. The research done so far on semistructured data may offer some solutions to the database problems posed by XML. For example, the recently proposed query language for XML, called XML-QL [DFF
A language for queries on structure and contents of textual databases
In Proc. ACM SIGIR'95, 1995
Predicate Rewriting for Translating Boolean Queries in a Heterogeneous Information System
1996
Cited by 29 (6 self)
Usually referred to as fielded search, a predicate specifies a pattern to be matched against the content of a field (Figure 2, Construct 2). Typically, for each searchable field, IR systems build indexes [Salton 1989; Frakes and Baeza-Yates 1992; Faloutsos 1985] to direct the search engine to find documents with some given term, such as the word cat or phrase "Joe Doe". The indexing schemes of a field restrict how it can be queried. Generally, there are two ways of indexing.
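The point that a field's indexing scheme restricts how it can be queried can be sketched with two toy per-field indexes. The excerpt is cut off before it names the two ways of indexing, so the word-level and whole-value schemes below are illustrative assumptions rather than the paper's classification; the documents and function names are hypothetical as well.

```python
from collections import defaultdict

# Toy collection: document id -> field name -> field content (made-up data).
docs = {
    1: {"title": "cat care", "author": "Joe Doe"},
    2: {"title": "dog training", "author": "Jane Roe"},
    3: {"title": "the cat and the dog", "author": "Joe Doe"},
}


def build_word_index(docs):
    """Index every field word by word: field -> word -> set of doc ids."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[field][word].add(doc_id)
    return index


def build_value_index(docs):
    """Index every field as one unit: field -> whole value -> set of doc ids."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            index[field][text.lower()].add(doc_id)
    return index


def search(index, field, pattern):
    """Evaluate a fielded predicate by a single index lookup."""
    return sorted(index.get(field, {}).get(pattern.lower(), set()))


word_index = build_word_index(docs)
value_index = build_value_index(docs)

title_cat = search(word_index, "title", "cat")            # single word: found
author_words = search(word_index, "author", "Joe Doe")    # phrase: no single key matches
author_value = search(value_index, "author", "Joe Doe")   # whole value: found
author_part = search(value_index, "author", "Joe")        # partial value: not found
```

With a single lookup, the word-level index answers word predicates but not the phrase, and the whole-value index does the reverse, which is the kind of restriction the passage describes.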
Complete Answer Aggregates for Tree-like Databases: A Novel Approach to Combine Querying and Navigation
ACM Transactions on Information Systems, 2001
Cited by 22 (3 self)
The use of markup languages like SGML, HTML, or XML for encoding the structure of documents or linguistic data has led ...
Source Attribution for Querying Against Semi-structured Documents
1998
Cited by 18 (4 self)
Mediation architectures like the Context Interchange research project, from which this work stems, integrate disparate information sources, hiding distribution and reconciling heterogeneity. As a result of transparent access, the notion of distinct sources often disappears from queries and results. However, there are situations where users or applications do need to know the sources from which a particular datum is drawn: for example, enforcement of intellectual property, evaluation of data quality or measurement of the timeliness of data. In this paper, we define attribution as the association of a value in the result of a query with the sources either from which the data was extracted or which contributed to the selection. Motivated by multisource querying, attribution has particular applicability in the context of the World Wide Web. After beginning with an example that both describes and motivates the need for attribution within the semi-structured environment of the Web, we offer...
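The definition of attribution given in this abstract (a result value is associated with the sources it was extracted from and with the sources that contributed to its selection) can be sketched as follows. This is only one possible reading, not the paper's model; the `Cell` type, the function name, and the example hosts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A value tagged with the sources it was extracted from."""
    value: object
    sources: frozenset


def cell(value, *sources):
    return Cell(value, frozenset(sources))


# Each row mixes values drawn from different (made-up) Web sources.
rows = [
    {"company": cell("Acme", "site-a.example"), "price": cell(12.5, "site-b.example")},
    {"company": cell("Bolt", "site-a.example"), "price": cell(7.0, "site-c.example")},
]


def select_with_attribution(rows, column, pred, output):
    """Select rows where pred holds on `column` and return, for each result,
    (value, sources it was extracted from, sources that contributed to selection)."""
    results = []
    for row in rows:
        tested = row[column]
        if pred(tested.value):
            out = row[output]
            results.append((out.value, out.sources, tested.sources))
    return results


result = select_with_attribution(rows, "price", lambda p: p > 10, "company")
```

Here the company name in the result is attributed to the source it came from, and the source of the price is recorded separately because it only influenced which row was selected.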
Improving index structures for structured document retrieval
In IRSG'99, 21st Annual Colloquium on IR Research, 1999
Cited by 16 (7 self)
Structured document retrieval has established itself as a new research area in the overlap between Database Systems and Information Retrieval. This work proposes a filtering technique that can be added to the existing index structures of many structured document retrieval systems. This new technique takes the contextual structure information of query and document database into account and drastically reduces the occurrence sets returned by the original index structure. This improves the performance of query evaluation. A measure is introduced that makes it possible to quantify the added value of the proposed index structure. Based on this measure, a heuristic is presented that includes only valuable context information in the index structure.
Query by templates: A generalized approach for visual query formulation for text dominated databases
In ADL, 1997
Cited by 15 (2 self)
The WWW has great potential to evolve into a globally distributed digital document library. The primary use of such a library is to retrieve information quickly and easily. Because of the size of these libraries, simple keyword searches often result in too many matches. More complex searches involving Boolean expressions are difficult to formulate and understand. This paper describes QBT (Query By Templates), a visual method for formulating queries for structured document databases modeled with SGML. Based on Zloof's QBE (Query By Example), this method incorporates the structure of the documents for composing powerful queries. The goal of this technique is to design an interface for querying structured documents without prior knowledge of the internal structure. This paper describes the rationale behind QBT, illustrates the query formulation principles using QBT, and describes results obtained from a usability analysis on a prototype implementation of QBT on the Web using the Java™ programming language.
Models for Integrated Information Retrieval and Database Systems
IEEE Data Engineering Bulletin, 1996
Cited by 14 (0 self)
In this paper, we show that there is a mismatch between information retrieval (IR) and database (DB) concepts, and we devise solutions for this problem. DB-oriented approaches have to distinguish between the logical and the content structure of objects, and should also consider the layout structure. Data independence (not regarded in IR before) can be achieved by using the notion of vague predicates. Since IR is based on uncertain inference, data models with uncertainty are required for an integrated IR-DB system. For this purpose, we present a probabilistic relational algebra. As extensions, probabilistic Datalog yields a more expressive query language, whereas a probabilistic nested relational model is more appropriate for modelling document structures.

