Data Caching Issues in an Information Retrieval System
ACM Transactions on Database Systems, 1990
Cited by 191 (6 self)
Currently, a variety of information retrieval systems are available to potential users. Some of these services are provided by commercial enterprises (such as Dow Jones [6] and The Source [7]), while others are research efforts (the Boston Community Information System [S]). While in many cases these systems are accessed from personal computers, typically no advantage is taken of the computing resources of those machines (such as local processing and storage). In this paper we explore the possibility of using the user’s local storage capabilities to cache data at the user’s site. This would improve the response time of user queries albeit at the cost of incurring the overhead required in maintaining multiple copies. In order to reduce this overhead it may be appropriate to allow copies to diverge in a controlled fashion. This would not only make caching less costly, but would also make it possible to propagate updates to the copies more efficiently, for example, when the system is lightly loaded, when communication tariffs are lower, or by batching updates together. Just as importantly, it also makes it possible to access the copies even when the communication lines or the central site are down. Thus, we introduce the notion of quasi-copies, which embodies the ideas sketched above. We also define the types of deviations that seem useful, and discuss the available implementation strategies.
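The idea of a copy that is allowed to diverge in a controlled fashion can be illustrated with a small sketch. The two bounds used here (a maximum age and a maximum numeric drift), along with the names `QuasiCopy` and `pending_refreshes`, are assumptions chosen for illustration; the abstract does not say which deviation types the paper defines.

```python
from dataclasses import dataclass


@dataclass
class QuasiCopy:
    """A cached value that may lag the central copy within stated bounds."""
    value: float          # value held in the user's local cache
    refreshed_at: float   # time (seconds) of the last refresh
    max_age: float        # assumed bound on staleness, in seconds
    max_drift: float      # assumed bound on numeric deviation

    def is_acceptable(self, central_value: float, now: float) -> bool:
        # The copy is usable while it is both recent enough and close enough.
        fresh_enough = (now - self.refreshed_at) <= self.max_age
        close_enough = abs(central_value - self.value) <= self.max_drift
        return fresh_enough and close_enough


def pending_refreshes(copies, central, now):
    """Return the keys whose cached copies have drifted out of bounds.

    The result could be sent as one batch, for example when load is light.
    """
    return sorted(
        key for key, copy in copies.items()
        if not copy.is_acceptable(central[key], now)
    )
```

Only copies that violate a bound need to be propagated, which is where the saving over strict replication would come from in this simplified picture.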
Continual Queries for Internet Scale Event-Driven Information Delivery
IEEE Transactions on Knowledge and Data Engineering, 1999
Cited by 153 (13 self)
In this paper we introduce the concept of continual queries, describe the design of a distributed event-driven continual query system, OpenCQ, and outline the initial implementation of OpenCQ on top of the distributed interoperable information mediation system DIOM [21, 19]. Continual queries are standing queries that monitor updates of interest and return results whenever the updates reach specified thresholds. In OpenCQ, users may specify to the system the information they would like to monitor (such as the events or the update thresholds they are interested in). Whenever the information of interest becomes available, the system immediately delivers it to the relevant users; otherwise, the system continually monitors the arrival of the desired information and pushes it to the relevant users as it meets the specified update thresholds. In contrast to conventional pull-based data management systems such as DBMSs and Web search engines, OpenCQ exhibits two important featu...
Updating Derived Relations: Detecting Irrelevant and Autonomously Computable Updates
ACM Transactions on Database Systems, 1989
Cited by 151 (2 self)
Consider a database containing not only base relations but also stored derived relations (also called materialized or concrete views). When a base relation is updated, it may also be necessary to update some of the derived relations. This paper gives sufficient and necessary conditions for detecting when an update of a base relation cannot affect a derived relation (an irrelevant update), and for detecting when a derived relation can be correctly updated using no data other than the derived relation itself and the given update operation (an autonomously computable update). The class of derived relations considered is restricted to those defined by PSJ-expressions, that is, any relational algebra expression constructed from an arbitrary number of project, select and join operations (but containing no self-joins). The class of update operations consists of insertions, deletions, and modifications, where the set of tuples to be deleted or modified is specified by a selection condition on ...
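The two notions in this abstract can be illustrated on the simplest possible case: a view defined by one selection and one projection over a single base relation, under set semantics. This is only a sketch under that assumption, not the paper's general conditions for PSJ-expressions; the function names and the row representation (a dict per tuple) are hypothetical.

```python
def is_irrelevant_insert(row, predicate):
    """An inserted row cannot affect the view if it fails the view's selection."""
    return not predicate(row)


def apply_insert(view, row, predicate, columns):
    """Update a materialized select-project view using only the view and the update.

    `view` is a set of projected tuples; the base relation is never consulted.
    """
    if is_irrelevant_insert(row, predicate):
        return view
    # Set semantics: adding an already-present projected tuple changes nothing.
    return view | {tuple(row[c] for c in columns)}
```

For example, with a view of names of employees earning more than 50000, inserting a row with salary 40000 is irrelevant, while inserting one with salary 60000 adds the projected name to the view without reading the base relation.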
The implementation and performance evaluation of the ADMS query optimizer: Integrating query result caching and matching
In Proceedings of the International Conference on Extending Database Technology, 1994
Cited by 69 (8 self)
In this paper, we describe the design and implementation of the ADMS query optimizer. This optimizer integrates query matching into optimization and generates more efficient query plans using cached results. It features data caching and pointer caching, alternative cache replacement strategies, and different cache update methods. A comprehensive set of experiments was conducted using a benchmark database and synthetic queries. The results showed that pointer caching and dynamic cache update strategies substantially saved query execution time and, thus, increased query throughput under situations with fair query correlation and update load. The requirement of the disk cache space is relatively small, and the extra optimization overhead introduced is more than offset by the time saved in query evaluation.
Applying Update Streams in a Soft Real-Time Database System
In ACM SIGMOD, 1995
Cited by 64 (5 self)
Many papers have examined how to efficiently export a materialized view but to our knowledge none have studied how to efficiently import one. To import a view, i.e., to install a stream of updates, a real-time database system must process new updates in a timely fashion to keep the database "fresh," but at the same time must process transactions and ensure they meet their time constraints. In this paper, we discuss the various properties of updates and views (including staleness) that affect this tradeoff. We also examine, through simulation, four algorithms for scheduling transactions and installing updates in a soft real-time database.
Keywords: soft real-time, temporal databases, materialized views, updates.
1 Introduction
The problem we study in this paper arose during the ongoing implementation of the STRIP real-time database system. This system [2] provides traditional database services (e.g., SQL, indexing, recovery) with real-time facilities (e.g., transaction deadlines,...
Best-Effort Cache Synchronization with Source Cooperation
In SIGMOD, 2002
Cited by 60 (3 self)
In environments where exact synchronization between source data objects and cached copies is not achievable due to bandwidth or other resource constraints, stale (out-of-date) copies are permitted. It is desirable to minimize the overall divergence between source objects and cached copies by selectively refreshing modified objects. We call the online process of selecting which objects to refresh in order to minimize divergence best-effort synchronization. In most approaches to best-effort synchronization, the cache coordinates the process and selects objects to refresh. In this paper, we propose a best-effort synchronization scheduling policy that exploits cooperation between data sources and the cache. We also propose an implementation of our policy that incurs low communication overhead even in environments with very large numbers of sources. Our algorithm is adaptive to wide fluctuations in available resources and data update rates. Through experimental simulation over synthetic and real-world data, we demonstrate the effectiveness of our algorithm, and we quantify the significant decrease in divergence achievable with source cooperation.
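The selection problem described here, choosing which modified objects to refresh under a limited budget so that overall divergence is minimized, can be sketched with a simple greedy rule. This is not the cooperative scheduling policy the paper proposes; it is a baseline that assumes numeric objects, absolute difference as the divergence measure, and a fixed number of refreshes per round. The name `choose_refreshes` is hypothetical.

```python
import heapq


def choose_refreshes(source, cache, budget):
    """Pick up to `budget` keys whose cached copies diverge most from the source.

    Divergence is taken to be the absolute difference between the source
    value and the cached value (an assumption made for this sketch).
    """
    divergence = {k: abs(source[k] - cache.get(k, 0.0)) for k in source}
    stale = [k for k, d in divergence.items() if d > 0]
    # nlargest returns the selected keys ordered from most to least divergent.
    return heapq.nlargest(budget, stale, key=divergence.__getitem__)
```

Computing `divergence` as written requires knowing both the source and cached values in one place, which is exactly the information a cache acting alone does not have.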
Incremental Maintenance for Materialized Views over Semistructured Data
1998
Cited by 60 (6 self)
Semistructured data is not strictly typed like relational or object-oriented data and may be irregular or incomplete. It often arises in practice, e.g., when heterogeneous data sources are integrated or data is taken from the World Wide Web. Views over semistructured data can be used to filter the data and to restructure (or provide structure to) it. To achieve fast query response time, these views are often materialized. This paper studies incremental maintenance techniques for materialized views over semistructured data. We use the graph-based data model OEM and the query language Lorel, developed at Stanford, as the framework for our work. We propose a new algorithm that produces a set of queries that compute the changes to the view based upon a change to the source. We develop an analytic cost model and compare the cost of executing our incremental maintenance algorithm to that of recomputing the view. We show that for nearly all types of database updates, it is more efficient to a...
Differential Evaluation of Continual Queries
In IEEE Proceedings of the 16th International Conference on Distributed Computing Systems, Hong Kong, 1996
Cited by 45 (10 self)
Information Superhighway environments such as the Internet have brought us ready access to large amounts of information. However, Internet data is notoriously unorganized and autonomously managed in a distributed fashion. Large scale information monitoring in the Internet environment requires support beyond traditional database techniques. Two of the key issues are the increasing reward in monitoring a fast growing information base and the similarly increasing processing cost. To improve the expressiveness of queries for information monitoring, we define continual queries as a useful tool for monitoring of updated information. Continual queries are standing queries that monitor the source data and notify the users whenever new data matches the query. In addition to periodic refresh, continual queries include Epsilon Transaction concepts to allow users to specify query refresh based on the magnitude of updates. To support efficient processing of continual queries, we propose a different...
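Refreshing a standing query based on the magnitude of updates, rather than on a timer alone, can be pictured with the following sketch. It assumes numeric updates and a single threshold `epsilon` on their accumulated absolute size; the class and method names are hypothetical and do not reflect the system described in the paper.

```python
class ContinualQuery:
    """A standing query that fires once accumulated change reaches epsilon."""

    def __init__(self, epsilon):
        self.epsilon = epsilon      # assumed user-specified update threshold
        self.accumulated = 0.0      # magnitude of change since the last refresh
        self.notifications = []     # record of the change that triggered each refresh

    def on_update(self, delta):
        """Register one source update; return True if the user is notified."""
        self.accumulated += abs(delta)
        if self.accumulated >= self.epsilon:
            self.notifications.append(self.accumulated)
            self.accumulated = 0.0  # start measuring from the refreshed result
            return True
        return False
```

Small updates are absorbed silently, so the user is contacted only when the result has changed by an amount they said they care about.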
Temporal Specialization and Generalization
IEEE Transactions on Knowledge and Data Engineering, 1994
Cited by 42 (19 self)
A standard relation has two dimensions: attributes and tuples. A temporal relation contains two additional orthogonal time dimensions, namely, valid time and transaction time. Valid time records when facts are true in the modeled reality, and transaction time records when facts are stored in the temporal relation. Although, in general, there are no restrictions between the valid time and transaction time associated with each fact, in many practical applications, the valid and transaction times exhibit more or less restricted interrelationships that define several types of specialized temporal relations. The paper examines five different areas where a variety of types of specialized temporal relations are present. In application systems with multiple, interconnected temporal relations, multiple time dimensions may be associated with facts as they flow from one temporal relation to another. For example, a fact may have an associated transaction time indicating when it was stored in a previous temporal relation. The paper investigates several aspects of the resulting generalized temporal relations, including the ability to query a predecessor relation from a successor relation. The presented framework for generalization and specialization allows researchers as well as database and system designers to precisely characterize, compare, and thus better understand temporal relations and the application systems in which they are embedded. The framework’s comprehensiveness and its use in understanding temporal relations are demonstrated by placing previously proposed temporal data models within the framework. The practical relevance of the defined specializations and generalizations is illustrated by sample realistic applications in which they occur. The additional semantics of specialized relations are especially useful for improving the performance of query processing.
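A restricted interrelationship between the two time dimensions can be made concrete with a small sketch. It models each fact with a single valid-time instant and a single transaction-time instant (a simplification), and checks one example restriction: every fact is stored no earlier than it became valid. The label "retroactive" and the names `Fact` and `is_retroactive` are used here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    attributes: tuple       # the ordinary, non-temporal attribute values
    valid_time: int         # when the fact became true in the modeled reality
    transaction_time: int   # when the fact was stored in the relation


def is_retroactive(relation):
    """True if every fact was recorded at or after the time it became valid."""
    return all(f.valid_time <= f.transaction_time for f in relation)
```

A relation known to satisfy such a restriction gives a query processor extra information, for instance that no stored fact can have a valid time later than the current transaction time.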

