Results 1 - 10
of
62
Network Applications of Bloom Filters: A Survey
- Internet Mathematics
, 2002
"... Abstract. ABloomfilter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Bloom filters allow false positives but the space savings often outweigh this drawback when the probability of an error is controlled. Bloom filters have been u ..."
Abstract
-
Cited by 257 (12 self)
- Add to MetaCart
Abstract. ABloomfilter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Bloom filters allow false positives but the space savings often outweigh this drawback when the probability of an error is controlled. Bloom filters have been used in database applications since the 1970s, but only in recent years have they become popular in the networking literature. The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications. 1.
Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources
, 1997
"... Garlic is a middleware system that provides an in-tegrated view of a variety of legacy data sources, without changing how or where data is stored. In this paper, we describe our architecture for wrap-pers, key components of Garlic that encapsulate data sources and mediate between them and the middle ..."
Abstract
-
Cited by 200 (2 self)
- Add to MetaCart
Garlic is a middleware system that provides an in-tegrated view of a variety of legacy data sources, without changing how or where data is stored. In this paper, we describe our architecture for wrap-pers, key components of Garlic that encapsulate data sources and mediate between them and the middleware. Garlic wrappers model legacy data as objects, participate in query planning, and provide standard interfaces for method invocation and query execution. To date, we have built wrappers for 10 data sources. Our experience shows that Garlic wrappers can be written quickly and that our architecture is flexible enough to accommo-date data sources with a variety of data models and a broad range of traditional and non-tradition-al query processing capabilities. 1
The state of the art in distributed query processing
- ACM Computing Surveys
, 2000
"... Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of ..."
Abstract
-
Cited by 182 (2 self)
- Add to MetaCart
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large involving thousands of heterogeneous sites including PCs and mainframe server machines � (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system� (3) legacy systems need to be integrated|such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the \textbook " architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses di erent kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems. Categories and subject descriptors: E.5 [Data]:Files � H.2.4 [Database Management Systems]: distributed databases, query processing � H.2.5 [Heterogeneous Databases]: data translation General terms: algorithms � performance Additional key words and phrases: query optimization � query execution � client-server databases � middleware � multi-tier architectures � database application systems � wrappers� replication � caching � economic models for query processing � dissemination-based information systems 1
Probabilistic Location and Routing
, 2002
"... We propose probabilistic location to enhance the performance of existing peer-to-peer location mechanisms in the case where a replica for the queried data item exists close to the query source. We introduce the attenuated Bloom filter, a lossy distributed index. We describe how to use these data str ..."
Abstract
-
Cited by 120 (7 self)
- Add to MetaCart
We propose probabilistic location to enhance the performance of existing peer-to-peer location mechanisms in the case where a replica for the queried data item exists close to the query source. We introduce the attenuated Bloom filter, a lossy distributed index. We describe how to use these data structures for document location and how to maintain them despite document motion. We include a detailed performance study which indicates that our algorithm performs as desired, both finding closer replicas and finding them faster than deterministic algorithms alone. I.
Query Optimization
, 1996
"... Imagine yourself standing in front of an exquisite buffet filled with numerous delicacies. Your goal is to try them all out, but you need to decide in what order. What exchange of tastes will maximize the overall pleasure of your palate? Although much less pleasurable and subjective, that is the typ ..."
Abstract
-
Cited by 102 (2 self)
- Add to MetaCart
Imagine yourself standing in front of an exquisite buffet filled with numerous delicacies. Your goal is to try them all out, but you need to decide in what order. What exchange of tastes will maximize the overall pleasure of your palate? Although much less pleasurable and subjective, that is the type of problem that query optimizers are called to solve. Given a query, there are many plans that a database management system (DBMS) can follow to process it and produce its answer. All plans are equivalent in terms of their final output but vary in their cost, i.e., the amount of time that they need to run. What is the plan that needs the least amount of time? Such query optimization is absolutely necessary in a DBMS. The cost difference between two alternatives can be enormous. For example, consider the following database schema, which will be...
An overview of query optimization in relational systems
- In PODS
, 1998
"... There has been extensive work in query optimization since the early ‘70s. It is hard to capture the breadth and depth of this large body of work in a short article. Therefore, I have decided to focus primarily on the optimization of SQL queries in relational database systems and present my biased an ..."
Abstract
-
Cited by 99 (1 self)
- Add to MetaCart
There has been extensive work in query optimization since the early ‘70s. It is hard to capture the breadth and depth of this large body of work in a short article. Therefore, I have decided to focus primarily on the optimization of SQL queries in relational database systems and present my biased and incomplete view of this field. The goal of this article is not to be comprehensive, but rather to explain the foundations and present samplings of significant work in this area. I would like to apologize to the many contributors in this area whose work I have failed to explicitly acknowledge due to oversight or lack of space. I take the liberty of trading technical precision for ease of presentation. 2.
Performance Tradeoffs for Client-Server Query Processing
, 1996
"... The construction of high-performance database systems that combine the best aspects of the relational and object-oriented approaches requires the design of client-server architectures that can fully exploit client and server resources in a flexible manner. The two predominant paradigms for client-se ..."
Abstract
-
Cited by 61 (17 self)
- Add to MetaCart
The construction of high-performance database systems that combine the best aspects of the relational and object-oriented approaches requires the design of client-server architectures that can fully exploit client and server resources in a flexible manner. The two predominant paradigms for client-server query execution are datashipping and query-shipping. We first define these policies in terms of the restrictions they place on operator site selection during query optimization. We then investigate the performance tradeoffs between them for bulk query processing. While each strategy has advantages, neither one on its own is efficient across a wide range of circumstances. We describe andevaluate a more flexible policy called hybrid-shipping, which can execute queries at clients, servers, or any combination of the two. Hybrid-shipping is shown to at least match the best of the two "pure" policies, and in some situations, to perform better than both. The implementation of hybrid-shipping rais...
The Architecture of PIER: an Internet-Scale Query Processor
- In CIDR
, 2005
"... This paper presents the architecture of PIER , an Internetscale query engine we have been building over the last three years. PIER is the first general-purpose relational query processor targeted at a peer-to-peer (p2p) architecture of thousands or millions of participating nodes on the Internet. ..."
Abstract
-
Cited by 59 (5 self)
- Add to MetaCart
This paper presents the architecture of PIER , an Internetscale query engine we have been building over the last three years. PIER is the first general-purpose relational query processor targeted at a peer-to-peer (p2p) architecture of thousands or millions of participating nodes on the Internet. It supports massively distributed, database-style dataflows for snapshot and continuous queries. It is intended to serve as a building block for a diverse set of Internet-scale informationcentric applications, particularly those that tap into the standardized data readily available on networked machines, including packet headers, system logs, and file names
A keyword-set search system for peer-to-peer networks
, 2002
"... The Keyword-Set Search System (KSS) is a Peer-to-Peer (P2P) keyword search system that uses a distributed inverted index. The main challenge in a distributed index and search system is finding the right scheme to partition the index across the nodes in the network. The most obvious scheme would be t ..."
Abstract
-
Cited by 57 (0 self)
- Add to MetaCart
The Keyword-Set Search System (KSS) is a Peer-to-Peer (P2P) keyword search system that uses a distributed inverted index. The main challenge in a distributed index and search system is finding the right scheme to partition the index across the nodes in the network. The most obvious scheme would be to partition the index by keyword.
Optimization techniques for queries with expensive methods
- ACM Transactions on Database Systems (TODS
, 1998
"... Object-Relational database management systems allow knowledgeable users to de ne new data types, as well as new methods (operators) for the types. This exibility produces an attendant complexity, which must be handled in new ways for an Object-Relational database management system to be e cient. In ..."
Abstract
-
Cited by 53 (3 self)
- Add to MetaCart
Object-Relational database management systems allow knowledgeable users to de ne new data types, as well as new methods (operators) for the types. This exibility produces an attendant complexity, which must be handled in new ways for an Object-Relational database management system to be e cient. In this paper we study techniques for optimizing queries that contain time-consuming methods. The focus of traditional query optimizers has been on the choice of join methods and orders; selections have been handled by \pushdown " rules. These rules apply selections in an arbitrary order before as many joins as possible, using the assumption that selection takes no time. However, users of Object-Relational systems can embed complex methods in selections. Thus selections may take signi cant amounts of time, and the query optimization model must be enhanced. In this paper, we carefully de ne a query cost framework that incorporates both selectivity and cost estimates for selections. We develop an algorithm called Predicate Migration, and prove that it produces optimal plans for queries with expensive methods. We then describe our implementation of Predicate Migration in the commercial Object-Relational database management system Illustra, and discuss practical issues that a ect our earlier assumptions. We compare Predicate Migration to a variety of simpler optimization techniques, and demonstrate that Predicate Migration is the best general solution to date. The alternative techniques we presentmaybe useful for constrained workloads.

