Results 1 -
5 of
5
Computing on Data Streams
, 1998
"... In this paper we study the space requirement of algorithms that make only one (or a small number of) pass(es) over the input data. We study such algorithms under a model of data streams that we introduce here. We give a number of upper and lower bounds for problems stemming from queryprocessing, ..."
Abstract
-
Cited by 141 (3 self)
- Add to MetaCart
In this paper we study the space requirement of algorithms that make only one (or a small number of) pass(es) over the input data. We study such algorithms under a model of data streams that we introduce here. We give a number of upper and lower bounds for problems stemming from queryprocessing, invoking in the process tools from the area of communication complexity.
Query Rewriting for Semistructured Data
"... We address the problem of query rewriting for TSL, a language for querying semistructured data. We develop and present an algorithm that, given a semistructured query q and a set of semistructured views V, finds rewriting queries, i.e., queries that access the views and produce the same result as q ..."
Abstract
-
Cited by 60 (9 self)
- Add to MetaCart
We address the problem of query rewriting for TSL, a language for querying semistructured data. We develop and present an algorithm that, given a semistructured query q and a set of semistructured views V, finds rewriting queries, i.e., queries that access the views and produce the same result as q. Our algorithm is based on appropriately generalizing containment mappings, the chase, and query composition- techniques that were developed for structured, relational data. We also develop an algorithm for equivalence checking of TSL queries. We show that the algorithm is sound and complete for TSL, i.e., it always finds every non-trivial TSL rewriting query of q, and we discuss its complexity. We extend the rewriting algorithm to use some forms of structural constraints (such as DTDs) and find more opportunities for query rewriting.
Distributed Hypertext Resource Discovery Through Examples
, 1999
"... We describe the architecture of a hypertext resource discovery system using a relational database. Such a system can answer questions that combine page contents, metadata, and hyperlink structure in powerful ways, such as "find the number of links from an environmental protection page to a page abou ..."
Abstract
-
Cited by 27 (2 self)
- Add to MetaCart
We describe the architecture of a hypertext resource discovery system using a relational database. Such a system can answer questions that combine page contents, metadata, and hyperlink structure in powerful ways, such as "find the number of links from an environmental protection page to a page about oil and natural gas over the last year." A key problem in populating the database in such a system is to discover web resources related to the topics involved in such queries. We argue that that a keywordbased "find similar" search based on a giant all-purpose crawler is neither necessary nor adequate for resource discovery. Instead we exploit the properties that pages tend to cite pages with related topics, and given that a page u cites a page about a desired topic, it is very likely that u cites additional desirable pages. We exploit these properties by using a crawler controlled by two hypertext mining programs: (1) a classifier that evaluates the relevance of a region of the web to the...
Managing Periodically Updated Data in Relational Databases: A Stochastic Modeling Approach
- Journal of the ACM
, 2001
"... Recent trends in information management involve the periodic transcription of data onto secondary devices in a networked environment, and the proper scheduling of these transcriptions is critical for efficient data management. To assist in the scheduling process, we are interested in modeling dat ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
Recent trends in information management involve the periodic transcription of data onto secondary devices in a networked environment, and the proper scheduling of these transcriptions is critical for efficient data management. To assist in the scheduling process, we are interested in modeling data obsolescence, that is, the reduction of consistency over time between a relation and its replica. The modeling is based on techniques from the field of stochastic processes, and provides several stochastic models for content evolution in the base relations of a database, taking referential integrity constraints into account. These models are general enough to accommodate most of the common scenarios in databases, including batch insertions and life spans both with and without memory. As an initial "proof of concept" of the applicability of our approach, we validate the insertion portion of our model framework via experiments with real data feeds. We also discuss a set of transcription protocols which make use of the proposed stochastic model.
Query Rewriting using Semistructured Views
- In Proc. of the ACM SIGMOD Int. Conf. on Management of Data
, 1998
"... We address the problem of query rewriting for MSL, a semistructured language developed at Stanford in the TSIMMIS project for information integration. We develop and present an algorithm that, given a semistructured query q and a set of semistructured views V, finds rewriting queries, i.e., queries ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
We address the problem of query rewriting for MSL, a semistructured language developed at Stanford in the TSIMMIS project for information integration. We develop and present an algorithm that, given a semistructured query q and a set of semistructured views V, finds rewriting queries, i.e., queries that access the views and produce the same result as q. Our algorithm is based on appropriately generalizing containment mappings, the chase, and unification -- techniques which were developed for structured, relational data. At the same time we develop an algorithm for equivalence checking of MSL queries. We show that the rewriting algorithm is sound and complete, i.e., it always finds every conjunctive MSL rewriting query of q, and we discuss its complexity. We currently incorporate the algorithm in the TSIMMIS system. 1 Introduction Recently, many semistructured data models, query and view definition languages have been proposed [GM + 97, MAG + 97, BDHS96, AV97a, MM97, KS95, PGMU96,...

