The Architecture of PIER: an Internet-Scale Query Processor
In CIDR, 2005
Cited by 88 (8 self)
This paper presents the architecture of PIER, an Internet-scale query engine we have been building over the last three years. PIER is the first general-purpose relational query processor targeted at a peer-to-peer (p2p) architecture of thousands or millions of participating nodes on the Internet. It supports massively distributed, database-style dataflows for snapshot and continuous queries. It is intended to serve as a building block for a diverse set of Internet-scale, information-centric applications, particularly those that tap into the standardized data readily available on networked machines, including packet headers, system logs, and file names.
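The abstract does not describe PIER's operators. As an illustration only, the sketch below shows how a relational equi-join can be spread across nodes by rehashing both inputs on the join key, so matching tuples meet at the same node. It assumes a DHT that routes each key to one owning node; that routing is simulated by a hypothetical modulo hash, `node_for`, rather than real consistent hashing. The names `node_for` and `rehash_join` and the packet-header/host data are invented for the example.

```python
import hashlib
from collections import defaultdict

def node_for(key, num_nodes):
    """Hypothetical stand-in for DHT routing: map a join-key value to a node id."""
    digest = hashlib.sha1(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def rehash_join(left, right, left_key, right_key, num_nodes=4):
    """Equi-join two relations by rehashing both inputs on the join key.

    Every tuple is sent to the node that owns its key, so matching tuples
    meet at the same node, and each node joins only what it received.
    """
    inbox = defaultdict(lambda: ([], []))  # node id -> (left tuples, right tuples)
    for row in left:
        inbox[node_for(row[left_key], num_nodes)][0].append(row)
    for row in right:
        inbox[node_for(row[right_key], num_nodes)][1].append(row)

    results = []
    for node_id in sorted(inbox):
        l_rows, r_rows = inbox[node_id]
        # Local hash join at this node: build on the left input, probe with the right.
        table = defaultdict(list)
        for row in l_rows:
            table[row[left_key]].append(row)
        for r in r_rows:
            for l in table.get(r[right_key], []):
                results.append({**l, **r})
    return results

flows = [{"src": "10.0.0.1", "bytes": 500}, {"src": "10.0.0.2", "bytes": 40}]
hosts = [{"ip": "10.0.0.1", "name": "alpha"}]
print(rehash_join(flows, hosts, "src", "ip"))
```

No node ever holds the full input; each only sees the tuples whose keys it owns, which is what lets this style of dataflow scale with the number of participants.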
A Survey of Distributed Data Aggregation Algorithms
2011
Cited by 10 (1 self)
Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties that can then be used to direct the execution of other applications. These values are obtained through the distributed computation of functions such as count, sum and average. Example applications include determining the network size, total storage capacity, average load, majorities and many others. In the last decade, many different approaches have been proposed, with different trade-offs in terms of accuracy, reliability, message complexity and time complexity. Given the considerable number and variety of aggregation algorithms, it can be difficult and time-consuming to determine which techniques are most appropriate in specific settings, which justifies a survey to aid in this task. This work reviews the state of the art in distributed data aggregation algorithms and makes three main contributions. First, it formally defines the concept of aggregation, characterizing the different types of aggregation functions. Second, it succinctly describes the main aggregation techniques, organizing them in a taxonomy. Finally, it provides guidelines for selecting and using the most relevant techniques, summarizing their principal characteristics.
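The abstract does not reproduce the survey's taxonomy. As one concrete instance of the averaging-based techniques this kind of survey covers, here is a sketch of the push-sum gossip protocol; the choice of push-sum is ours, not a recommendation from the survey. It is simulated synchronously in a single process, assuming random partner selection and no message loss. It also shows how a count (network size) falls out of an average: if one node starts with 1 and all others with 0, the average is 1/n.

```python
import random

def push_sum(values, rounds=100, seed=0):
    """Push-sum gossip: every node's estimate converges to the global average.

    Each node i keeps a (sum, weight) pair, initially (values[i], 1). In each
    round it keeps half of both and pushes the other half to a random node.
    Total sum and total weight are conserved, so sum/weight -> average.
    """
    rng = random.Random(seed)
    n = len(values)
    s = [float(v) for v in values]
    w = [1.0] * n
    for _ in range(rounds):
        new_s, new_w = [0.0] * n, [0.0] * n
        for i in range(n):
            target = rng.randrange(n)  # random gossip partner (may be i itself)
            half_s, half_w = s[i] / 2, w[i] / 2
            new_s[i] += half_s
            new_w[i] += half_w
            new_s[target] += half_s
            new_w[target] += half_w
        s, w = new_s, new_w
    return [s[i] / w[i] for i in range(n)]

loads = [0.2, 0.9, 0.4, 0.7, 0.1, 0.6, 0.3, 0.8]
print(push_sum(loads))  # every estimate ends up close to the true average, 0.5

# Counting: one node starts with 1, the rest with 0, so the average is 1/n.
estimate = push_sum([1] + [0] * 9)[0]
print(round(1 / estimate))  # estimated network size
```

Because every node keeps half of its own weight each round, weights never reach zero, so the ratio is always defined; the trade-off the survey alludes to shows up here as accuracy versus the number of gossip rounds (time and message complexity).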
Roll No: 03D11013 under the guidance of
Peer-to-Peer (P2P) refers to a class of systems and applications that employ distributed resources, such as storage, cycles, content and human presence available at the edges of the Internet, to perform a critical function in a decentralized manner. Peer-to-peer systems have grown in popularity in recent years, as they are used by millions of users to share massive amounts of data over the Internet. P2P systems involve the decentralized, self-organizing, highly dynamic, loose coupling of many autonomous computers. Applications based on P2P networks promise scalability, no central administration or control, and robustness to dynamic load and failures. Hence they need scalable, self-organizing data structures and algorithms. The purpose of this work is to survey existing techniques for querying over these overlay networks. We start with exact search and then move on to one-dimensional and multi-dimensional range queries. Finally, we look at aggregation queries on P2P systems. The last
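Exact search in many structured overlays rests on consistent hashing. The sketch below is a minimal Chord-style ring, not a technique taken from this report: SHA-1 positions on a 2^32 ring are assumed, and the node and file names are hypothetical. A key is owned by the first node clockwise from its hash. The last lines also hint at why range queries need separate machinery: hashing scatters adjacent keys across unrelated nodes.

```python
import bisect
import hashlib

def ring_position(name, bits=32):
    """Hash a key or node name onto a ring of size 2**bits."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

class Ring:
    """Chord-style consistent hashing: a key belongs to its clockwise successor node."""

    def __init__(self, node_names):
        self.points = sorted((ring_position(n), n) for n in node_names)
        self.positions = [p for p, _ in self.points]

    def owner(self, key):
        i = bisect.bisect_left(self.positions, ring_position(key))
        return self.points[i % len(self.points)][1]  # wrap around past the top

ring = Ring([f"node{i}" for i in range(5)])
print(ring.owner("song.mp3"))
# Adjacent keys hash to unrelated positions, so a range such as
# "file100".."file199" is scattered across nodes and needs a different index.
print({ring.owner(f"file{i}") for i in range(100, 200)})
```

A useful property of this design is that removing a node only reassigns the keys that node owned; every other key keeps its owner.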
INDEXING IN PEER-TO-PEER SYSTEMS
2007
Peer-to-peer systems are large-scale distributed systems whose component nodes participate in similar roles and hence are “peers”. Peer-to-peer systems have generated a lot of interest because of their scalability, fault-tolerance and robustness properties. The peer-to-peer paradigm was first popularized by file-sharing systems like Napster, Kazaa and BitTorrent. It is increasingly being used in enterprise settings to build highly scalable applications on low-cost commodity clusters. Amazon S3 is one such example: it uses peer-to-peer technology to provide a simple, scalable storage service. With a large number of peers and large amounts of data, a question of fundamental interest in a peer-to-peer system is: how can relevant data be found quickly? In this thesis, I present efficient peer-to-peer indices that support fast lookup of relevant data. My thesis contains (1) Kelips, an efficient Distributed Hash Table (DHT), (2) Kache, a cooperative caching application, and (3) r-Kelips, an efficient peer-to-peer range index. In addition to complex range query support, demanding applications like transaction processing and military applications require strong
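The abstract names Kelips but does not describe it. Assuming the published Kelips design (nodes hashed into roughly √n affinity groups, with every member of a group replicating the file tuples, i.e. file name to home node, for files that hash to that group), a lookup needs at most one hop: the file name alone tells you which group to ask. The toy model below omits gossip, contact maintenance and failures; `KelipsSketch`, `bucket` and the node/file names are hypothetical.

```python
import hashlib
import math

def bucket(name, k):
    """Hash a node or file name into one of k affinity groups."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big") % k

class KelipsSketch:
    """Toy model of Kelips-style affinity groups (gossip and failures omitted).

    About sqrt(n) groups; members of a group hold the file tuples
    (file name -> home node) for files that hash to that group, so a lookup
    needs at most one hop to a contact in the file's group.
    """

    def __init__(self, node_names):
        self.k = max(1, round(math.sqrt(len(node_names))))
        self.members = {g: [] for g in range(self.k)}
        for node in node_names:
            self.members[bucket(node, self.k)].append(node)
        # One table per group stands in for the copy every member of that group holds.
        self.file_tuples = {g: {} for g in range(self.k)}

    def insert(self, filename, home_node):
        self.file_tuples[bucket(filename, self.k)][filename] = home_node

    def lookup(self, from_node, filename):
        """Return (home node or None, hops): 0 if the file maps to our own group, else 1."""
        g = bucket(filename, self.k)
        hops = 0 if bucket(from_node, self.k) == g else 1
        return self.file_tuples[g].get(filename), hops

kelips = KelipsSketch([f"n{i}" for i in range(16)])
kelips.insert("song.mp3", "n3")
print(kelips.lookup("n0", "song.mp3"))
```

The design trades per-node memory (group membership plus the group's file tuples, on the order of √n entries each) for constant lookup cost.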
In-Network Outlier Detection in Wireless Sensor Networks
To address the problem of unsupervised outlier detection in wireless sensor networks, we develop an approach that (1) is flexible with respect to the outlier definition, (2) computes the result in-network to reduce both bandwidth and energy consumption, (3) uses only single-hop communication, thus permitting very simple node failure detection and message reliability assurance mechanisms (e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to data. We examine performance by simulation, using real sensor data streams. Our results demonstrate that our approach is accurate and imposes reasonable communication and power consumption demands.
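The abstract gives neither the paper's outlier definitions nor its protocol. The sketch below assumes one common distance-based definition (a reading's score is its distance to its k-th nearest neighbour) and passes the scoring function in as a parameter to mirror point (1), flexibility in the outlier definition. It then lets each node rank outliers over its own readings plus those of its one-hop neighbours, echoing point (3). This naive version ships all raw readings to neighbours, which a bandwidth-conscious protocol would avoid. All function names, readings and topology are hypothetical.

```python
import math

def knn_score(i, points, k):
    """Distance from points[i] to its k-th nearest other point (larger = more outlying)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1] if len(dists) >= k else math.inf

def top_outliers(points, n=1, k=2, score=knn_score):
    """Return the n points with the highest outlier score; the score is pluggable."""
    ranked = sorted(range(len(points)), key=lambda i: score(i, points, k), reverse=True)
    return [points[i] for i in ranked[:n]]

def one_hop_outliers(readings, neighbours, n=1, k=2):
    """Each node ranks its own readings pooled with its one-hop neighbours' readings."""
    result = {}
    for node, own in readings.items():
        pooled = list(own)
        for nb in neighbours.get(node, []):
            pooled.extend(readings[nb])
        result[node] = top_outliers(pooled, n=n, k=k)
    return result

# Hypothetical temperature readings at three sensors on a line topology a - b - c.
readings = {"a": [(20.0,), (21.0,)], "b": [(20.5,), (35.0,)], "c": [(21.5,)]}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(one_hop_outliers(readings, neighbours, k=1))
```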
Efficient On-Demand Operations in Large-Scale Infrastructures
© 2009 Steven Y. Ko
In large-scale distributed infrastructures such as clouds, Grids, peer-to-peer systems, and wide-area testbeds, users and administrators typically want to perform on-demand operations that act on the most up-to-date state of the infrastructure. However, the scale and dynamism of the operating environment make it challenging to support on-demand operations efficiently, i.e., in a bandwidth- and response-efficient manner. This dissertation discusses several on-demand operations, the challenges associated with them, and system designs that meet these challenges. Specifically, we design and implement techniques for 1) on-demand group monitoring, which allows users and administrators of an infrastructure to query and aggregate the up-to-date state of the machines (e.g., CPU utilization) in one or more groups, 2) on-demand storage for intermediate data generated by dataflow programming paradigms running in clouds, 3) on-demand Grid scheduling, which makes worker-centric scheduling decisions based on the current availability of compute nodes, and 4) on-demand key/value pair lookup that is overlay-independent
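The abstract does not say how on-demand group monitoring is built. A common way to aggregate machine state in-network is a tree: each node combines its own reading with its children's partial results and sends one small tuple to its parent. The sketch assumes such a tree already exists and that group membership is a set of labels per machine. The `Machine` class, group names and CPU values are hypothetical, and this is not presented as the dissertation's design. Values are read at query time, matching the "most up-to-date state" goal.

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    name: str
    groups: set
    cpu: float  # current CPU utilization, percent
    children: list = field(default_factory=list)

def aggregate(node, group):
    """Return (sum, count, max) of CPU utilization over group members in node's subtree.

    Each node combines its own reading with its children's partial results,
    so only one small tuple travels up each tree edge.
    """
    total, count, peak = 0.0, 0, float("-inf")
    if group in node.groups:
        total, count, peak = node.cpu, 1, node.cpu
    for child in node.children:
        s, c, m = aggregate(child, group)
        total, count, peak = total + s, count + c, max(peak, m)
    return total, count, peak

c = Machine("c", {"web"}, 30.0)
b = Machine("b", {"web", "db"}, 70.0, [c])
a = Machine("a", {"web"}, 50.0)
root = Machine("root", {"db"}, 10.0, [a, b])

total, count, peak = aggregate(root, "web")
print(total / count, peak)  # → 50.0 70.0
```

Carrying (sum, count) rather than a running average keeps the partial results composable, so the average can be computed exactly at the root.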