TelegraphCQ: Continuous Dataflow Processing for an Uncertain World
, 2003
Cited by 329 (18 self)
Increasingly pervasive networks are leading towards a world where data is constantly in motion. In such a world, conventional techniques for query processing, which were developed under the assumption of a far more static and predictable computational environment, will not be sufficient. Instead, query processors based on adaptive dataflow will be necessary. The Telegraph project has developed a suite of novel technologies for continuously adaptive query processing. The next generation Telegraph system, called TelegraphCQ, is focused on meeting the challenges that arise in handling large streams of continuous queries over high-volume, highly-variable data streams. In this paper, we describe the system architecture and its underlying technology, and report on our ongoing implementation effort, which leverages the PostgreSQL open source code base. We also discuss open issues and our research agenda.
Balancing Push and Pull for Data Broadcast
Cited by 189 (7 self)
The increasing ability to interconnect computers through internetworking, wireless networks, high-bandwidth satellite, and cable networks has spawned a new class of information-centered applications based on data dissemination. These applications employ broadcast to deliver data to very large client populations. We have proposed the Broadcast Disks paradigm [Zdon94, Acha95b] for organizing the contents of a data broadcast program and for managing client resources in response to such a program. Our previous work on Broadcast Disks focused exclusively on the "push-based" approach, where data is sent out on the broadcast channel according to a periodic schedule, in anticipation of client requests. In this paper, we study how to augment the push-only model with a "pull-based" approach of using a backchannel to allow clients to send explicit requests for data to the server. We analyze the scalability and performance of a broadcast-based system that integrates push and pull and study the impac...
Characterizing Reference Locality in the WWW
, 1996
Cited by 184 (18 self)
As the World Wide Web (Web) is increasingly adopted as the infrastructure for large-scale distributed information systems, issues of performance modeling become ever more critical. In particular, locality of reference is an important property in the performance modeling of distributed information systems. In the case of the Web, understanding the nature of reference locality will help improve the design of middleware, such as caching, prefetching, and document dissemination systems. For example, good measurements of reference locality would allow us to generate synthetic reference streams with accurate performance characteristics, would allow us to compare empirically measured streams to explain differences, and would allow us to predict expected performance for system design and capacity planning. In this paper we propose models for both temporal and spatial locality of reference in streams of requests arriving at Web servers. We show that simple models based only on document popularity (likelihood of reference) are insufficient for capturing either temporal or spatial locality. Instead, we rely on an equivalent, but numerical, representation of a reference stream: a stack distance trace. We show that temporal locality can be characterized by
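The stack distance trace mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common LRU-stack definition: the distance of a reference is the depth at which the document is found in a most-recently-used-first stack, and a first reference has no distance. The function name `stack_distances` and the sample stream are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Iterable, List, Optional


def stack_distances(stream: Iterable[Hashable]) -> List[Optional[int]]:
    """Convert a reference stream into a stack distance trace.

    Assumption: LRU-stack definition, 1-based depth, None for a
    document that has not been referenced before.
    """
    stack: List[Hashable] = []  # index 0 is the most recently used document
    trace: List[Optional[int]] = []
    for doc in stream:
        if doc in stack:
            depth = stack.index(doc) + 1  # 1-based depth in the stack
            stack.pop(depth - 1)          # remove it from its old position
        else:
            depth = None                  # first reference: no finite distance
        stack.insert(0, doc)              # the document becomes most recent
        trace.append(depth)
    return trace


if __name__ == "__main__":
    # Hypothetical request stream of document identifiers.
    print(stack_distances(["a", "b", "a", "c", "b", "a"]))
```

Under this definition, small distances indicate strong temporal locality, because recently requested documents are requested again soon.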
The state of the art in distributed query processing
- ACM Computing Surveys
, 2000
Cited by 182 (2 self)
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated, and such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the "textbook" architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems.
Categories and subject descriptors: E.5 [Data]: Files; H.2.4 [Database Management Systems]: distributed databases, query processing; H.2.5 [Heterogeneous Databases]: data translation
General terms: algorithms; performance
Additional key words and phrases: query optimization; query execution; client-server databases; middleware; multi-tier architectures; database application systems; wrappers; replication; caching; economic models for query processing; dissemination-based information systems
Data on Air: Organization and Access
- IEEE Transactions on Knowledge and Data Engineering
, 1996
Cited by 169 (3 self)
Organizing massive amounts of data on wireless communication networks in order to provide fast and low-power access to users equipped with palmtops is a new challenge to the data management and telecommunication communities. Solutions must take into consideration the physical restrictions of low network bandwidth and limited battery life of palmtops. This paper proposes algorithms for multiplexing clustering and nonclustering indexes along with data on wireless networks. The power consumption and the latency for obtaining the required data are considered as the two basic performance criteria for all algorithms. First, this paper describes two algorithms, namely (1, m) Indexing and Distributed Indexing, for multiplexing data and its clustering index. Second, an algorithm called Nonclustered Indexing is described for allocating static data and its corresponding nonclustered index. Then, the Nonclustered Indexing algorithm is generalized to the case of multiple indexes. Finally, t...
Dissemination-based Data Delivery Using Broadcast Disks
- IEEE Personal Communications
, 1995
Cited by 125 (10 self)
Mobile computers and wireless networks are emerging technologies which promise to make ubiquitous computing a reality. One challenge that must be met in order to truly realize this potential is that of providing mobile clients with ubiquitous access to data. Mobile clients may often be disconnected from stationary server machines or may have only a low-bandwidth channel for sending messages to servers. Such an environment raises difficulties for supporting data-intensive applications for three reasons: 1) the inability to predict, with 100% accuracy, the future data needs of many applications, 2) limits on storage capacities of mobile machines, and 3) the need to provide clients with new or updated data values. One (and perhaps the only) way to address these challenges is to provide stationary server machines with a relatively high-bandwidth channel over which to broadcast data to a client population in anticipation of the need for that data at the clients. Such a system can be said to...
Query Indexing and Velocity Constrained Indexing: Scalable Techniques For Continuous Queries on Moving Objects
- IEEE Transactions on Computers
, 2002
Cited by 102 (18 self)
Moving object environments are characterized by large numbers of moving objects and numerous concurrent continuous queries over these objects. Efficient evaluation of these queries in response to the movement of the objects is critical for supporting acceptable response times. In such environments the traditional approach of building an index on the objects (data) suffers from the need for frequent updates and thereby results in poor performance. In fact, a brute force, no-index strategy yields better performance in many cases. Neither the traditional approach nor the brute force strategy achieves reasonable query processing times. This paper develops novel techniques for the efficient and scalable evaluation of multiple continuous queries on moving objects. Our solution leverages two complementary techniques: Query Indexing and Velocity Constrained Indexing (VCI). Query Indexing relies on i) incremental evaluation; ii) reversing the role of queries and data; and iii) exploiting the relative locations of objects and queries. VCI takes advantage of the maximum possible speed of objects in order to delay the expensive operation of updating an index to reflect the movement of objects. In contrast to an earlier technique [29] that requires exact knowledge about the movement of the objects, VCI does not rely on such information. While Query Indexing outperforms VCI, it does not efficiently handle the arrival of new queries. Velocity Constrained Indexing, on the other hand, is unaffected by changes in queries. We demonstrate that a combination of Query Indexing and Velocity Constrained Indexing enables the scalable execution of insertion and deletion of queries in addition to processing ongoing queries. We also develop several optimizations and present a detaile...
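The idea of using a maximum speed to tolerate a stale index can be sketched as follows. This is an illustrative assumption about how such a bound might be applied, not the paper's algorithm: the query rectangle is enlarged by the farthest any object could have travelled since the index was built, the stale positions yield a candidate superset, and candidates are then checked against current positions. The names `Rect`, `candidates_from_stale_index`, and `evaluate_query`, and the use of a plain dictionary in place of a real spatial index, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def expand(self, r: float) -> "Rect":
        # Grow the rectangle by r on every side.
        return Rect(self.xmin - r, self.ymin - r, self.xmax + r, self.ymax + r)

    def contains(self, p: Point) -> bool:
        x, y = p
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def candidates_from_stale_index(stale: Dict[int, Point], query: Rect,
                                vmax: float, elapsed: float) -> List[int]:
    # An object can have moved at most vmax * elapsed since the index was
    # built, so the enlarged rectangle cannot miss any true answer.
    grown = query.expand(vmax * elapsed)
    return [oid for oid, pos in stale.items() if grown.contains(pos)]


def evaluate_query(stale: Dict[int, Point], current: Dict[int, Point],
                   query: Rect, vmax: float, elapsed: float) -> List[int]:
    # Refine the candidate superset using the objects' current positions.
    cands = candidates_from_stale_index(stale, query, vmax, elapsed)
    return sorted(oid for oid in cands if query.contains(current[oid]))


if __name__ == "__main__":
    stale = {1: (0.0, 0.0), 2: (5.0, 5.0), 3: (20.0, 20.0)}
    current = {1: (1.0, 1.0), 2: (3.8, 3.9), 3: (19.0, 19.0)}
    print(evaluate_query(stale, current, Rect(0, 0, 4, 4), vmax=1.0, elapsed=2.0))
```

In this sketch the index itself is never updated; the cost is a larger candidate set as the elapsed time grows, which is why a real system would still rebuild the index periodically.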
Prefetching from a broadcast disk
- In Proceedings of ICDE'96: The 1996 International Conference on Data Engineering
, 1996
Cited by 89 (10 self)
Broadcast Disks have been proposed as a means to efficiently deliver data to clients in “asymmetric” environments where the available bandwidth from the server to the clients greatly exceeds the bandwidth in the opposite direction. A previous study investigated the use of cost-based caching to improve performance when clients access the broadcast in a demand-driven manner [AAF95]. Such demand-driven access, however, does not fully exploit the dissemination-based nature of the broadcast, which is particularly conducive to client prefetching. With a Broadcast Disk, pages continually flow past the clients so that, in contrast to traditional environments, prefetching can be performed without placing additional load on shared resources. We argue for the use of a simple prefetch heuristic called PT and show that PT balances the cache residency time of a data item with its bandwidth allocation. Because of this tradeoff, PT is very tolerant of variations in the broadcast program. We describe an implementable approximation for PT
Scheduling On-demand Broadcasts: New Metrics and Algorithms
, 1998
Cited by 88 (2 self)
As satellite, wireless and Cable TV-based networks spread their reach, there is an increased infrastructure of high-bandwidth links into the home and on the road. Much of this enhanced infrastructure inherently relies on broadcast technology to deliver data to large user populations. This increase in broadcast capacity has been complemented by the growth of large-scale information-centric applications. Many of these applications such as wireless internets and traffic information systems are pull-based, that is, they respond to on-demand user requests. In this paper, we study the scheduling problems arising in such on-demand broadcast environments for applications with data requests of varying sizes, and the novel issues that arise therein. We study the problem in its generality while much of the previous work has focused on one special case or the other, such as, assuming identical-sized data requests, or static client access profiles known by the server a priori, etc. Traditionally,...
Adaptive Data Broadcast in Hybrid Networks
- In Proceedings of the 23rd VLDB Conference
, 1997
Cited by 79 (2 self)
With the immense popularity of the Web, the world is witnessing an unprecedented demand for data services. At the same time, the Internet is evolving towards an information super-highway that incorporates a wide mixture of existing and emerging communication technologies, including wireless, mobile, and hybrid networking. Taking advantage of these new technologies, we are proposing a hybrid scheme which effectively combines broadcast for massive data dissemination and unicast for individual data delivery. In this paper, we describe a technique that uses the broadcast medium for storage of frequently requested data, and an algorithm that continuously adapts the broadcast content to match the hot-spot of the database. We show that the hot-spot can be accurately obtained by monitoring the "broadcast misses" observed through direct requests. This is a departure from other broadcast-based systems which rely on efficient scheduling based on precompiled user profiles. We also show that the pro...

