Results 1 - 10
of
101
TelegraphCQ: Continuous Dataflow Processing for an Uncertan World
, 2003
"... Increasingly pervasive networks are leading towards a world where data is constantly in motion. In such a world, conventional techniques for query processing, which were developed under the assumption of a far more static and predictable computational environment, will not be sufficient. Instead, qu ..."
Abstract
-
Cited by 329 (18 self)
- Add to MetaCart
Increasingly pervasive networks are leading towards a world where data is constantly in motion. In such a world, conventional techniques for query processing, which were developed under the assumption of a far more static and predictable computational environment, will not be sufficient. Instead, query processors based on adaptive dataflow will be necessary. The Telegraph project has developed a suite of novel technologies for continuously adaptive query processing. The next generation Telegraph system, called TelegraphCQ, is focused on meeting the challenges that arise in handling large streams of continuous queries over high-volume, highly-variable data streams. In this paper, we describe the system architecture and its underlying technology, and report on our ongoing implementation effort, which leverages the PostgreSQL open source code base. We also discuss open issues and our research agenda.
Skip graphs
- in SODA
, 2003
"... Skip graphs are a novel distributed data structure, based on skip lists, that provide the full functionality of a balanced tree in a distributed system where resources are stored in separate nodes that may fail at any time. They are designed for use in searching peer-to-peer systems, and by providin ..."
Abstract
-
Cited by 202 (8 self)
- Add to MetaCart
Skip graphs are a novel distributed data structure, based on skip lists, that provide the full functionality of a balanced tree in a distributed system where resources are stored in separate nodes that may fail at any time. They are designed for use in searching peer-to-peer systems, and by providing the ability to perform queries based on key ordering, they improve on existing search tools that provide only hash table functionality. Unlike skip lists or other tree data structures, skip graphs are highly resilient, tolerating a large fraction of failed nodes without losing connectivity. In addition, simple and straightforward algorithms can be used to construct a skip graph, insert new nodes into it, search it, and detect and repair errors in a skip graph introduced due to node failures.
Operating System Support for Planetary-Scale Network Services
, 2004
"... PlanetLab is a geographically distributed overlay network designed to support the deployment and evaluation of planetary-scale network services. Two high-level goals shape its design. First, to enable a large research community to share the infrastructure, PlanetLab provides distributed virtualizati ..."
Abstract
-
Cited by 179 (17 self)
- Add to MetaCart
PlanetLab is a geographically distributed overlay network designed to support the deployment and evaluation of planetary-scale network services. Two high-level goals shape its design. First, to enable a large research community to share the infrastructure, PlanetLab provides distributed virtualization, whereby each service runs in an isolated slice of PlanetLab’s global resources. Second, to support competition among multiple network services, PlanetLab decouples the operating system running on each node from the networkwide services that define PlanetLab, a principle referred to as unbundled management. This paper describes how Planet-Lab realizes the goals of distributed virtualization and unbundled management, with a focus on the OS running on each node. 1
A survey of peer-to-peer content distribution technologies
- ACM Computing Surveys
, 2004
"... Distributed computer architectures labeled “peer-to-peer ” are designed for the sharing of computer resources (content, storage, CPU cycles) by direct exchange, rather than requiring the intermediation or support of a centralized server or authority. Peer-to-peer architectures are characterized by t ..."
Abstract
-
Cited by 171 (6 self)
- Add to MetaCart
Distributed computer architectures labeled “peer-to-peer ” are designed for the sharing of computer resources (content, storage, CPU cycles) by direct exchange, rather than requiring the intermediation or support of a centralized server or authority. Peer-to-peer architectures are characterized by their ability to adapt to failures and
On the Feasibility of Peer-to-Peer Web Indexing and Search
- IN IPTPS’03
, 2003
"... This paper discusses the feasibility of peer-to-peer full-text keyword search of the Web. Two classes of keyword search techniques are in use or have been proposed: flooding of queries over an overlay network (as in Gnutella), and intersection of index lists stored in a distributed hash table. We pr ..."
Abstract
-
Cited by 121 (11 self)
- Add to MetaCart
This paper discusses the feasibility of peer-to-peer full-text keyword search of the Web. Two classes of keyword search techniques are in use or have been proposed: flooding of queries over an overlay network (as in Gnutella), and intersection of index lists stored in a distributed hash table. We present a simple feasibility analysis based on the resource constraints and search workload. Our study suggests that the peer-to-peer network does not have enough capacity to make naive use of either of search techniques attractive for Web search. The paper presents a number of existing and novel optimizations for P2P search based on distributed hash tables, estimates their effects on performance, and concludes that in combination these optimizations would bring the problem to within an order of magnitude of feasibility. The paper suggests a number of compromises that might achieve the last order of magnitude.
Simple Efficient Load Balancing algorithms for Peer-to-Peer Systems. Available at www.iptps04.cs.ucsd.edu/papers/karger- loadbalance.pdf. Accessed on 25 th
, 2008
"... Load balancing is a critical issue for the efficient operation of peerto-peer networks. We give two new load-balancing protocols whose provable performance guarantees are within a constant factor of optimal. Our protocols refine the consistent hashing data structure that underlies the Chord (and Koo ..."
Abstract
-
Cited by 117 (0 self)
- Add to MetaCart
Load balancing is a critical issue for the efficient operation of peerto-peer networks. We give two new load-balancing protocols whose provable performance guarantees are within a constant factor of optimal. Our protocols refine the consistent hashing data structure that underlies the Chord (and Koorde) P2P network. Both preserve Chord’s logarithmic query time and near-optimal data migration cost. Consistent hashing is an instance of the distributed hash table (DHT) paradigm for assigning items to nodes in a peer-to-peer system: items and nodes are mapped to a common address space, and nodes have to store all items residing closeby in the address space. Our first protocol balances the distribution of the key address space to nodes, which yields a load-balanced system when the DHT maps items “randomly ” into the address space. To our knowledge, this yields the first P2P scheme simultaneously achieving O(logn) degree, O(logn) look-up cost, and constant-factor load balance (previous schemes settled for any two of the three). Our second protocol aims to directly balance the distribution of items among the nodes. This is useful when the distribution of items in the address space cannot be randomized. We give a simple protocol that balances load by moving nodes to arbitrary locations “where they are needed. ” As an application, we use the last protocol to give an optimal implementation of a distributed data structure for range searches on ordered data.
Super-Peer-Based Routing and Clustering Strategies for RDF-Based Peer-to-Peer Networks
, 2002
"... RDF-based P2P networks have a number of advantages compared with simpler P2P networks such as Napster, Gnutella or with approaches based on distributed indices such as CAN and CHORD. RDF-based P2P networks allow complex and extendable descriptions of resources instead of fixed and limited ones, and ..."
Abstract
-
Cited by 111 (23 self)
- Add to MetaCart
RDF-based P2P networks have a number of advantages compared with simpler P2P networks such as Napster, Gnutella or with approaches based on distributed indices such as CAN and CHORD. RDF-based P2P networks allow complex and extendable descriptions of resources instead of fixed and limited ones, and they provide complex query facilities against these metadata instead of simple keyword-based searches.
Approximate Range Selection Queries in Peer-to-Peer
- In CIDR
, 2002
"... We present an architecture for a data sharing peer-to-peer system where the data is shared in the form of database relations. In general, peer-to-peer systems try to locate exactmatch data objects to simple user queries. ..."
Abstract
-
Cited by 76 (6 self)
- Add to MetaCart
We present an architecture for a data sharing peer-to-peer system where the data is shared in the form of database relations. In general, peer-to-peer systems try to locate exactmatch data objects to simple user queries.
Cache-and-Query for Wide Area Sensor Databases
, 2003
"... Webcams, microphones, pressure gauges and other sensors provide exciting new opportunities for querying and monitoring the physical world. In this paper we focus on querying wide area sensor databases, containing (XML) data derived from sensors spread over tens to thousands of miles. We present the ..."
Abstract
-
Cited by 72 (18 self)
- Add to MetaCart
Webcams, microphones, pressure gauges and other sensors provide exciting new opportunities for querying and monitoring the physical world. In this paper we focus on querying wide area sensor databases, containing (XML) data derived from sensors spread over tens to thousands of miles. We present the first scalable system for executing XPATH queries on such databases. The system maintains the logical view of the data as a single XML document, while physically the data is fragmented across any number of host nodes. For scalability, sensor data is stored close to the sensors, but can be cached elsewhere as dictated by the queries (auto-tuning). Our design enables self-starting distributed queries that jump directly to the lowest common ancestor of the query result, dramatically reducing query response times. We present a novel query-evaluategather technique (using XSLT) for detecting (1) which data in a local database fragment is part of the query result, and (2) how to gather the missing parts. We define partitioning and cache invariants that ensure that even partial matches on cached data are exploited and that correct answers are returned, despite our dynamic query-driven caching. Experimental results demonstrate that our techniques dramatically increase query throughputs and decrease query response times in wide area sensor databases.

