Declarative Networking: Language, Execution and Optimization
, 2006
Cited by 57 (18 self)
The networking and distributed systems communities have recently explored a variety of new network architectures, both for application-level overlay networks, and as prototypes for a next-generation Internet architecture. In this context, we have investigated declarative networking: the use of a distributed recursive query engine as a powerful vehicle for accelerating innovation in network architectures [23, 24, 33]. Declarative networking represents a significant new application area for database research on recursive query processing. In this paper, we address fundamental database issues in this domain. First, we motivate and formally define the Network Datalog (NDlog) language for declarative network specifications. Second, we introduce and prove correct relaxed versions of the traditional semi-naïve query evaluation technique, to overcome fundamental problems of the traditional technique in an asynchronous distributed setting. Third, we consider the dynamics of network state, and formalize the "eventual consistency" of our programs even when bursts of updates can arrive in the midst of query execution. Fourth, we present a number of query optimization opportunities that arise in the declarative networking context, including applications of traditional techniques as well as new optimizations. Last, we present evaluation results of the above ideas implemented in our P2 declarative networking system, running on 100 machines over the Emulab network testbed.
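For readers unfamiliar with the traditional semi-naïve technique that the abstract says is relaxed, the following is a minimal sketch of the textbook, centralized, synchronous version applied to a recursive reachability query. It is not NDlog and not the paper's relaxed distributed variant; the function name `semi_naive_reachable` and the `link`/`path` relation names are illustrative assumptions.

```python
def semi_naive_reachable(links):
    """Compute all (src, dst) pairs joined by a path, using semi-naive evaluation.

    Hypothetical helper for illustration only; `links` is a set of (src, dst) edges.
    """
    # Base rule: path(S, D) :- link(S, D).
    paths = set(links)
    delta = set(links)  # facts that were new in the previous round
    while delta:
        # Recursive rule: path(S, D) :- link(S, Z), path(Z, D).
        # Join links only against the new facts in delta, not the whole path table.
        derived = {(s, d) for (s, z) in links for (z2, d) in delta if z == z2}
        delta = derived - paths  # keep only facts not seen before
        paths |= delta
    return paths


links = {("a", "b"), ("b", "c"), ("c", "d")}
result = semi_naive_reachable(links)
```

The loop stops once a round derives nothing new, so each fact is joined only in the round after it first appears. That round-by-round structure assumes synchronized iterations, which is the kind of assumption the abstract describes as problematic in an asynchronous distributed setting.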
A case study in building layered DHT applications
- In Proceedings of the 2005 SIGCOMM (Aug. 2005)
, 2005
Cited by 41 (2 self)
Recent research has shown that one can use Distributed Hash Tables (DHTs) to build scalable, robust and efficient applications. One question that is often left unanswered is that of simplicity of implementation and deployment. In this paper, we explore a case study of building an application for which ease of deployment dominated the need for high performance. The application we focus on is Place Lab, an end-user positioning system. We evaluate whether it is feasible to use DHTs as an application-independent building block to implement a key component of Place Lab: its “mapping infrastructure.” We present Prefix Hash Trees, a data structure used by Place Lab for geographic range queries that is built entirely on top of a standard DHT. By strictly layering Place Lab’s data structures on top of a generic DHT service, we were able to decouple the deployment and management of Place Lab from that of the underlying DHT. We identify the characteristics of Place Lab that made it amenable to deployment in this layered manner, and comment on its effect on performance.
Proof sketches: Verifiable in-network aggregation
- In IEEE International Conference on Data Engineering (ICDE)
, 2007
Cited by 16 (2 self)
Recent work on distributed, in-network aggregation assumes a benign population of participants. Unfortunately, modern distributed systems are plagued by malicious participants. In this paper we present a first step towards verifiable yet efficient distributed, in-network aggregation in adversarial settings. We describe a general framework and threat model for the problem and then present proof sketches, a compact verification mechanism that combines cryptographic signatures and Flajolet-Martin sketches to guarantee acceptable aggregation error bounds with high probability. We derive proof sketches for count aggregates and extend them for random sampling, which can be used to provide verifiable approximations for a broad class of data-analysis queries, e.g., quantiles and heavy hitters. Finally, we evaluate the practical use of proof sketches, and observe that adversaries can often be reduced to much smaller violations in practice than our worst-case bounds suggest.
Sharing aggregate computation for distributed queries
- In SIGMOD
, 2007
Cited by 16 (0 self)
An emerging challenge in modern distributed querying is to efficiently process multiple continuous aggregation queries simultaneously. Processing each query independently may be infeasible, so multi-query optimizations are critical for sharing work across queries. The challenge is to identify overlapping computations that may not be obvious in the queries themselves. In this paper, we reveal new opportunities for sharing work in the context of distributed aggregation queries that vary in their selection predicates. We identify settings in which a large set of q such queries can be answered by executing k ≪ q different queries. The k queries are revealed by analyzing a boolean matrix capturing the connection between data and the queries that they satisfy, in a manner akin to familiar techniques like Gaussian elimination. Indeed, we identify a class of linear aggregate functions (including SUM, COUNT and AVERAGE), and show that the sharing potential for such queries can be optimally recovered using standard matrix decompositions from computational linear algebra. For some other typical aggregation functions (including MIN and MAX) we find that optimal sharing maps to the NP-hard set basis problem. However, for those scenarios, we present a family of heuristic algorithms and demonstrate that they perform well for moderate-sized matrices. We also present a dynamic distributed system architecture to exploit sharing opportunities, and experimentally evaluate the benefits of our techniques via a novel, flexible random workload generator we develop for this setting. Categories and Subject Descriptors: H.2.4 [Systems]: Distributed databases
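To make the sharing idea concrete, the sketch below groups rows by the set of queries they satisfy (one row of a Boolean data-to-query matrix), aggregates each group once, and then assembles every query's SUM from those shared partial results. This is only the simplest form of sharing, assuming a linear aggregate. It is not the matrix-decomposition method or the heuristics the abstract describes, and the names `shared_sums`, `rows`, and `predicates` are hypothetical.

```python
from collections import defaultdict


def shared_sums(rows, predicates):
    """Answer one SUM query per predicate while aggregating each row only once."""
    partial = defaultdict(int)
    for row in rows:
        # The signature is this row's line in the Boolean data-to-query matrix.
        signature = tuple(bool(p(row)) for p in predicates)
        partial[signature] += row["value"]
    # Each query's answer is the sum of the partial results whose signature includes it.
    answers = [
        sum(total for sig, total in partial.items() if sig[q])
        for q in range(len(predicates))
    ]
    return answers, len(partial)


rows = [{"x": x, "value": x} for x in range(30)]
predicates = [
    lambda r: r["x"] < 10,
    lambda r: r["x"] < 20,
    lambda r: r["x"] >= 10,
    lambda r: r["x"] >= 20,
    lambda r: r["x"] < 30,
    lambda r: 10 <= r["x"] < 20,
]
answers, num_partials = shared_sums(rows, predicates)
```

In this toy workload the six queries are answered from three partial sums, because the predicates only distinguish three ranges of `x`. This per-signature trick relies on SUM being linear. For MIN and MAX the abstract reports that optimal sharing is NP-hard.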
Discovering and exploiting keyword and attribute-value co-occurrences to improve p2p routing indices
- In CIKM
, 2006
Cited by 14 (6 self)
Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the
A Scalable Data Platform for a Large Number of Small Applications
Cited by 12 (0 self)
As a growing number of websites open up their APIs to external application developers (e.g., Facebook, Yahoo! Widgets, Google Gadgets), these websites are facing an intriguing scalability problem: while each user-generated application is by itself quite small (in terms of size and throughput requirements), there are a great many such applications. Unfortunately, existing data-management solutions are not designed to handle this form of scalability in a cost-effective, manageable and/or flexible manner. For instance, large installations of commercial database systems such as Oracle, DB2 and SQL Server are usually very expensive and difficult to manage. At the other extreme, low-cost hosted data-management solutions such as Amazon’s SimpleDB do not support sophisticated data-manipulation primitives such as joins that are necessary for developing most Web applications. To address this issue, we explore a new point in the design space whereby we use commodity hardware and free software (MySQL) to scale to a large number of applications while still supporting full SQL functionality, transactional guarantees, high availability and Service Level Agreements (SLAs). We do so by exploiting the key property that each application is “small” and can fit in a single machine (which can possibly be shared with other applications). Using this property, we design replication strategies, data migration techniques and load balancing operations that automate the tasks that would otherwise contribute to the operational and management complexity of dealing with a large number of applications. Our experiments based on the TPC-W benchmark suggest that the proposed system can scale to a large number of small applications.
Finally, a use for componentized transport protocols
- In HotNets IV
, 2005
Cited by 11 (1 self)
This paper argues a new relevance for an old idea: decomposing transport protocols into a set of reusable building blocks that can be recomposed in different ways depending on application requirements. We conjecture that point-to-point applications may well be adequately served by the existing suite of monolithic protocol implementations, but widely-distributed peer-to-peer systems such as overlays are not: the design space of transport protocols between nodes in a large, highly coordinated system is much larger. We provide several examples of existing systems that have implemented a diverse range of transport protocols, and show how a building-block approach covers these systems well, enabling simple specification of hybrids and variants of the protocols. In particular, we show how all of our examples can be implemented in the networking stack of P2, a multipurpose system for building overlay networks from declarative specifications.
Efficient processing of XPath queries with structured overlay networks
- In ODBASE’05, Agia
, 2005
Cited by 11 (2 self)
Non-trivial search predicates beyond mere equality are a current focus of P2P research. Structured queries, as an important type of non-trivial search, have so far been studied extensively, mainly for unstructured P2P systems. As unstructured P2P systems do not use indexing, structured queries are very easy to implement since they can be treated equally to any other type of query. However, this comes at the expense of very high bandwidth consumption and limitations in terms of guarantees and expressiveness that can be provided. Structured P2P systems are an efficient alternative as they typically offer logarithmic search complexity in the number of peers. Though the use of a distributed index (typically a distributed hash table) makes the implementation of structured queries more efficient, it also introduces considerable complexity, and thus only a few approaches exist so far. In this paper we present a first solution for efficiently supporting structured queries, more specifically, XPath queries, in structured P2P systems. For the moment we focus on supporting queries with descendant axes (“//”) and wildcards (“*”) and do not address joins. The results presented in this paper provide basic functionality to be used by higher-level query engines for more efficient, complex query support.
Efficient content authentication in peer-to-peer networks
- Proc. ACNS
, 2007
Cited by 11 (6 self)
We study a new model for data authentication over peer-to-peer (p2p) storage networks, where data items are stored, queried and authenticated in a totally decentralized fashion. The model captures the security requirements of emerging distributed computing applications. We present an efficient construction of a distributed Merkle tree (DMT), which realizes an authentication tree over a p2p network, thus extending a fundamental cryptographic technique to distributed environments. We show how our DMT can be used to design an authenticated distributed hash table that is secure against replay attacks and consistent with the update history. Our scheme is built on top of a broad class of existing p2p overlay networks and achieves generality by using only the basic functionality of object location. We use this scheme to design the first efficient distributed authenticated dictionary.
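The authentication tree that the abstract extends to p2p networks can be illustrated with an ordinary, single-machine Merkle tree. The sketch below builds the tree, produces a membership proof for one leaf, and verifies it against the root. It does not show how the paper distributes the tree across peers, and it is not hardened (for example, it has no domain separation between leaf and internal hashes). The function names and the choice of SHA-256 are assumptions.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return tree levels bottom-up: levels[0] holds leaf hashes, levels[-1] the root."""
    level = [_h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pair an unmatched last node with itself
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` up to the root."""
    proof = []
    for level in levels[:-1]:
        sib = index ^ 1
        sibling = level[sib] if sib < len(level) else level[index]
        proof.append((sibling, index % 2 == 0))  # True means the sibling sits on the right
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a leaf and its proof, then compare with the trusted root."""
    node = _h(leaf)
    for sibling, sibling_on_right in proof:
        node = _h(node + sibling) if sibling_on_right else _h(sibling + node)
    return node == root


leaves = [b"item-0", b"item-1", b"item-2"]
levels = build_levels(leaves)
root = levels[-1][0]
```

A verifier that trusts only `root` can check any single item with a proof whose length is logarithmic in the number of leaves. That is what makes the structure attractive when the data itself is held by untrusted peers.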

