XML processing in DHT networks
- In ICDE, 2008
Cited by 10 (4 self)
Abstract: We study the scalable management of XML data in P2P networks based on distributed hash tables (DHTs). We identify performance limitations in this context, and propose an array of techniques to lift them. First, we adapt the DHT platform’s index store and communication primitives to the needs of massive data processing. Second, we introduce a distributed hierarchical index and associated efficient algorithms to speed up query processing. Third, we present an innovative, XML-specific flavor of Bloom filters, to reduce data transfers entailed by query processing. Our approach is fully implemented in the KadoP system, used in a real-life software manufacturing application. Our experiments demonstrate the benefits of the proposed techniques.
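As background for the Bloom filter idea mentioned in the abstract above, a plain Bloom filter can be sketched as follows. This is a generic illustration only, not the XML-specific variant the paper proposes and not KadoP code; the class name, the bit-array size, the hashing scheme, and the element IDs are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter; NOT the XML-specific variant from the paper."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(item))


# A peer could ship the compact filter instead of its full ID list,
# letting the other side discard IDs that cannot match.
local_ids = ["doc1/a/3", "doc1/b/7", "doc2/a/1"]  # hypothetical element IDs
bf = BloomFilter()
for element_id in local_ids:
    bf.add(element_id)

remote_ids = ["doc1/a/3", "doc9/z/9"]  # hypothetical IDs held by another peer
candidates = [r for r in remote_ids if bf.might_contain(r)]
```

Every ID that was added is always reported as possibly present, so filtering with `might_contain` never drops a true match; it can only let through an occasional false positive, which a later exact check would remove.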
Reliable Storage and Querying for Collaborative Data Sharing Systems
Cited by 9 (5 self)
Abstract: The sciences, business confederations, and medicine urgently need infrastructure for sharing data and updates among collaborators’ constantly changing, heterogeneous databases. The ORCHESTRA system addresses these needs by providing data transformation and exchange capabilities across DBMSs, combined with archived storage of all database versions. ORCHESTRA adopts a peer-to-peer architecture in which individual collaborators contribute data and compute resources, but where there may be no dedicated server or compute cluster. We study how to take the combined resources of ORCHESTRA’s autonomous nodes, as well as PCs from “cloud” services such as Amazon EC2, and provide reliable, cooperative storage and query processing capabilities. We guarantee reliability and correctness as in distributed or cloud DBMSs, while also supporting cross-domain deployments, replication, and transparent failover, as provided by peer-to-peer systems. Our storage and query subsystem supports dozens to hundreds of nodes across different domains, possibly including nodes on cloud services. Our contributions include (1) a modified data partitioning substrate that combines cluster and peer-to-peer techniques, (2) an efficient implementation of replicated, reliable, versioned storage of relational data, (3) new query processing and indexing techniques over this storage layer, and (4) a mechanism for incrementally recomputing query results that ensures correct, complete, and duplicate-free results in the event of node failure during query execution. We experimentally validate query processing performance, failure detection methods, and the performance benefits of incremental recovery in a prototype implementation.
Hyperion: High Volume Stream Archival for Retrospective Querying
- 2006
Cited by 7 (1 self)
Network monitoring systems that support data archiving and after-the-fact (retrospective) queries are useful for a multitude of purposes, such as anomaly detection and network and security forensics. Data archiving for such systems, however, is complicated by (a) data arrival rates, which may be hundreds of thousands of packets per second on a single link, and (b) the need for online indexing of this data to support retrospective queries. At these data rates, both common database index structures and general-purpose file systems perform poorly. This paper describes Hyperion, a system for archiving, indexing, and on-line retrieval of high-volume data streams. We employ a write-optimized stream file system for high-speed storage of simultaneous data streams, and a novel use of signature file indexes in a distributed multi-level index. We implement Hyperion on commodity hardware and conduct a detailed evaluation using synthetic data and real network traces. Our streaming file system, StreamFS, is shown to be fast enough to archive traces at over a million packets per second. The index allows queries over hours of data to complete in as little as 10-20 seconds, and the entire system is able to index and archive over 200,000 packets/sec while processing simultaneous on-line queries.
Network awareness in Internet-scale stream processing
- IEEE Data Engineering Bulletin, 2005
Cited by 4 (0 self)
Efficient query processing across a wide-area network requires network awareness, i.e., tracking and leveraging knowledge of network characteristics when making optimization decisions. This paper summarizes our work on network-aware query processing techniques for widely distributed, large-scale stream-processing applications. We first discuss the operator placement problem (i.e., deciding where to execute the operators of a query plan) and present results, based on a prototype deployment on the PlanetLab network testbed, that quantify the benefits of network awareness. We then present a summary of our present focus on the operator distribution problem, which involves parallelizing the evaluation of a single operator in a networked setting.
The Data Ring: Community Content Sharing
- In Third Biennial Conference on Innovative Data Systems Research (CIDR 2007), Asilomar, 2007
Cited by 4 (3 self)
Information ubiquity has created a large crowd of users
DObjects: Enabling Distributed Data Services for Metacomputing Platforms
Cited by 4 (3 self)
Abstract. Many applications rely heavily on large amounts of data in distributed storage, collected over time or produced by large-scale scientific experiments or simulations. The key constraints for building a distributed data query infrastructure for such applications are scalability, consistency, heterogeneity, and network and resource dynamics. We designed and developed DObjects, a general-purpose query and data operations infrastructure that can be integrated with metacomputing middleware. This paper describes the architecture of our data services and shows how those services were integrated with the metacomputing framework, offering users an open platform for building distributed applications that require access to data integrated from multiple distributed data sources.
Automaton Segmentation: A New Approach to Preserve Privacy in XML Information Brokering
- ACM Conf. on Computer and Communication Security (CCS), 2007
Cited by 3 (2 self)
A Distributed Information Brokering System (DIBS) is a peer-to-peer overlay network that comprises diverse data servers and brokering components helping client queries locate the data server(s). Many existing information brokering systems adopt server-side access control deployment and honest assumptions on brokers. However, little attention has been paid to the privacy of data and metadata stored and exchanged within DIBS. In this paper, we address privacy-preserving information sharing via on-demand information access. We propose a flexible and scalable system using a broker-coordinator overlay network. Through an innovative automaton segmentation scheme, distributed access control enforcement, and query segment encryption, our system integrates security enforcement and query forwarding while preserving system-wide privacy. We present the automaton segmentation approach, analyze privacy preservation in detail, and finally examine the end-to-end performance and scalability through experiments and analysis.
Optimal service ordering in decentralized queries over web services
- 2010
Cited by 3 (3 self)
The problem of ordering expensive predicates (or filter ordering) has recently received renewed attention due to emerging computing paradigms such as processing engines for queries over remote Web Services, and cloud and grid computing. The optimization of pipelined plans over services differs from traditional optimization significantly, since execution takes place in parallel and thus the query response time is determined by the slowest node in the plan, which is called the bottleneck node. Although polynomial algorithms have been proposed for several variants of optimization problems in this setting, the fact that communication links are typically heterogeneous in wide-area environments has been largely overlooked. Our proposal is the first attempt, to the best of our knowledge, which tries to optimize linear orderings of services when the services communicate directly with each other and the communication links are heterogeneous. We propose a novel optimal algorithm to solve this problem efficiently. The evaluation of the proposal shows that it can result in significant reductions of the response time.
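The bottleneck notion in the abstract above can be illustrated with a small sketch. The cost model here is an assumption made for the example (per-tuple processing cost, selectivity per service, and a pairwise per-tuple transfer cost matrix), and the exhaustive search over permutations is a stand-in for illustration; it is not the optimal algorithm the paper proposes.

```python
from itertools import permutations


def bottleneck_cost(order, cost, selectivity, transfer):
    """Largest per-input-tuple load of any service in a linear plan.

    Assumed model: a service pays its processing cost for each tuple it
    receives, plus the transfer cost to its successor for each tuple it
    lets through.
    """
    worst = 0.0
    fraction = 1.0  # fraction of input tuples reaching the current service
    for idx, svc in enumerate(order):
        load = cost[svc]
        if idx + 1 < len(order):
            nxt = order[idx + 1]
            load += selectivity[svc] * transfer[svc][nxt]
        worst = max(worst, fraction * load)
        fraction *= selectivity[svc]
    return worst


def best_order_brute_force(cost, selectivity, transfer):
    """Try every ordering; feasible only for a handful of services."""
    services = range(len(cost))
    return min(
        permutations(services),
        key=lambda order: bottleneck_cost(order, cost, selectivity, transfer),
    )


# Hypothetical two-service instance with asymmetric link costs.
cost = [1.0, 2.0]
selectivity = [0.5, 0.5]
transfer = [[0.0, 1.0], [4.0, 0.0]]
best = best_order_brute_force(cost, selectivity, transfer)
```

In this instance the link from service 1 to service 0 is four times as expensive as the reverse link, so placing service 0 first gives the lower bottleneck cost even though both services filter equally well.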
Quality-aware distributed data delivery for continuous query services
- In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2006
Cited by 2 (0 self)
We consider the problem of distributed continuous data delivery services in an overlay network of heterogeneous nodes. Each node in the system can be a source for any number of data streams and at the same time be a consumer node that is receiving streams sourced at other nodes. A consumer node may define a filter on a source stream such that only the desired portion of the stream is delivered, minimizing the amount of unnecessary bandwidth consumption. By heterogeneous, we mean that nodes not only may have varying network bandwidths and computing resources but also different interests in terms of the filters and the rates of the data streams they are interested in. Our objective is to construct an efficient stream delivery network in which nodes cooperate in forwarding data streams in the presence of constrained resources. We formalize this distributed stream delivery problem as an optimization one by starting with a simple setup where the network topology is fixed and the node bandwidth characteristics are known. The goal of the optimization is to find valid delivery graphs with minimum bandwidth consumption. We extend this problem formulation to QoS-aware stream delivery, in order to handle the bandwidth-constrained cases in which unwanted drops and delays are inevitable. We provide a classification of delivery graph construction schemes, and in light of this classification we develop pragmatic quality-aware stream delivery (QASD) algorithms. These algorithms aim at constructing efficient stream delivery graphs in a distributed setting, where global knowledge is not available and network characteristics are not known in advance. We introduce a set of evaluation metrics and provide experimental results to illustrate the effectiveness of our proposed algorithms under these metrics.
Self-organizing schema mappings in the gridvine peer data management system
- In VLDB-07
Cited by 2 (0 self)
GridVine is a Peer Data Management System based on a decentralized access structure. Built following the principle of data independence, it separates a logical layer – where data, schemas and mappings are managed – from a physical layer consisting of a structured Peer-to-Peer network supporting efficient routing of messages and index load-balancing. Our system is totally decentralized, yet it fosters semantic interoperability through pairwise schema mappings and query reformulation. In this demonstration, we present a set of algorithms to automatically organize the network of schema mappings. We concentrate on three key functionalities: (1) the sharing of data, schemas and schema mappings in the network, (2) the dynamic creation and deprecation of mappings to foster global interoperability, and (3) the propagation of queries using the mappings. We illustrate these functionalities using bioinformatic schemas and data in a network running on several hundred peers simultaneously.

