ProtoPeer: A P2P Toolkit Bridging the Gap Between Simulation and Live Deployment
Cited by 15 (1 self)
Simulators are a commonly used tool in peer-to-peer systems research. However, they may not be able to capture all the details of a system operating in a live network. Transitioning from the simulation to the actual system implementation is a non-trivial and time-consuming task. We present ProtoPeer, a peer-to-peer systems prototyping toolkit that allows for switching between the event-driven simulation and live network deployment without changing any of the application code. ProtoPeer defines a set of APIs for message passing, message queuing, timer operations as well as overlay routing and managing the overlay neighbors. Users can plug in their own custom implementations of most of the parts of ProtoPeer including custom network models for simulation and custom message passing over different network stacks. ProtoPeer is not only a framework for building systems but also for evaluating them. It has a unified system-wide infrastructure for event injection, measurement logging, measurement aggregation and managing evaluation scenarios. The simulator scales to tens of thousands of peers and gives accurate predictions closely matching the live network measurements.
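The central idea above, application code that runs unchanged under simulation and live deployment, can be illustrated with a minimal Python sketch. All names here (`SimEnvironment`, `PingPeer`, `send`, `set_timer`) are hypothetical and are not ProtoPeer's API, and the fixed-latency network model is an assumption. Only the simulated back end is shown; a live back end would offer the same `send` and `set_timer` methods over real sockets and timers.

```python
import heapq
import itertools


class SimEnvironment:
    """Event-driven back end with virtual time and no real network (hypothetical)."""

    def __init__(self, latency=0.05):
        self.now = 0.0
        self.latency = latency          # fixed one-way delay: an assumed network model
        self.peers = {}
        self._queue = []                # heap of (time, seq, callback, args)
        self._seq = itertools.count()   # tie-breaker keeps event order deterministic

    def register(self, peer_id, peer):
        self.peers[peer_id] = peer

    def send(self, src, dst, message):
        # Deliver the message to the destination after the modelled latency.
        self.set_timer(self.latency, self.peers[dst].on_message, src, message)

    def set_timer(self, delay, callback, *args):
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), callback, args))

    def run(self):
        # Process events in timestamp order, advancing virtual time.
        while self._queue:
            self.now, _, callback, args = heapq.heappop(self._queue)
            callback(*args)


class PingPeer:
    """Application code: it talks only to `env`, never to a concrete network."""

    def __init__(self, peer_id, env):
        self.peer_id, self.env, self.log = peer_id, env, []
        env.register(peer_id, self)

    def on_message(self, src, message):
        self.log.append((round(self.env.now, 2), src, message))
        if message == "ping":
            self.env.send(self.peer_id, src, "pong")


env = SimEnvironment(latency=0.05)
a, b = PingPeer("a", env), PingPeer("b", env)
env.send("a", "b", "ping")
env.run()
```

Because `PingPeer` depends only on the two environment methods, swapping the back end would not require touching the peer class, which is the property the abstract describes.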
Web Text Retrieval with a P2P Query-Driven Index
- In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR ’07)
, 2007
Cited by 13 (2 self)
In this paper, we present a query-driven indexing/retrieval strategy for efficient full text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations, and (2) the posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable storage and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the transmitted posting lists never exceed a constant size. However, as the number of generated term combinations can still become quite large, we also use term statistics extracted from available query logs to index only such combinations that are frequently present in user queries. Thus, by avoiding the generation of superfluous indexing term combinations, we achieve an additional substantial reduction in bandwidth and storage consumption. As a result, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows current information needs of the users. More precisely, our theoretical analysis and experimental results indicate that, at the price of a marginal loss in retrieval quality for rare queries, the generated index size and network traffic remain manageable even for web-size document collections. Furthermore, our experiments show that at the same time the achieved retrieval quality is fully comparable to the one obtained with a state-of-the-art centralized query engine.
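The two index properties and the query-log filter described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's algorithm: the function name `build_index`, the per-term scores, the use of their sum to rank documents for a term combination, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_index(docs, frequent_keys, top_k=2, max_key_size=2):
    """Index only query-frequent term combinations, with bounded posting lists.

    docs maps doc_id -> {term: score}. Summing term scores to rank a document
    for a term combination is an assumed stand-in for a real ranking model.
    """
    postings = defaultdict(list)
    for doc_id, term_scores in docs.items():
        terms = sorted(term_scores)
        for size in range(1, max_key_size + 1):
            for key in combinations(terms, size):
                if key in frequent_keys:            # query-driven filter
                    score = sum(term_scores[t] for t in key)
                    postings[key].append((score, doc_id))
    # Truncate every posting list to its top_k highest-scoring documents.
    return {
        key: [doc for _, doc in sorted(plist, key=lambda p: (-p[0], p[1]))[:top_k]]
        for key, plist in postings.items()
    }


docs = {
    "d1": {"p2p": 0.9, "index": 0.4},
    "d2": {"p2p": 0.5, "index": 0.3},
    "d3": {"p2p": 0.7, "search": 0.6},
}
# Term combinations assumed to be frequent in a query log (hypothetical data).
frequent_keys = {("p2p",), ("index", "p2p")}
index = build_index(docs, frequent_keys)
```

In this sketch the posting list for `("p2p",)` is cut from three documents to two, and combinations absent from the query log, such as `("search",)`, are never indexed.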
ALVIS Peers: A Scalable Full-text Peer-to-Peer Retrieval Engine
, 2006
Cited by 10 (3 self)
We present Alvis peers, a full-text P2P retrieval engine designed to offer retrieval performance comparable to centralized solutions while scaling to a very large number of peers. It is the result of our research efforts within the Alvis project, which aims at building a truly distributed semantic search engine. To cope with the problem of unscalable bandwidth consumption in the P2P network, the engine implements a novel retrieval model that indexes highly-discriminative keys (HDKs): terms and term sets appearing in a limited number of collection documents. Our prototype is a fully-functional retrieval engine built over a structured P2P network. It includes a component for HDK-based indexing and retrieval, and a distributed content-based ranking module. Such an integrated system represents a substantial contribution to the design and development of realistic P2P retrieval systems.
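As a rough illustration of the key-selection idea, the Python sketch below keeps only terms and term sets that appear in at most a given number of documents. It applies only the document-frequency condition stated in the abstract; the function name, the threshold `df_max`, the key-size cap, and the sample collection are assumptions, and any further conditions of the actual HDK model are not shown.

```python
from collections import defaultdict
from itertools import combinations


def discriminative_keys(docs, df_max=1, max_key_size=2):
    """Return term sets whose document frequency is at most df_max.

    docs maps doc_id -> list of terms. df_max and max_key_size are assumed
    parameters chosen for this example.
    """
    doc_freq = defaultdict(set)
    for doc_id, terms in docs.items():
        for size in range(1, max_key_size + 1):
            for key in combinations(sorted(set(terms)), size):
                doc_freq[key].add(doc_id)
    return {key: ids for key, ids in doc_freq.items() if len(ids) <= df_max}


docs = {
    "d1": ["p2p", "index"],
    "d2": ["p2p", "search"],
    "d3": ["p2p", "index", "search"],
}
keys = discriminative_keys(docs, df_max=1)
```

Here every single term occurs in at least two documents, so only the pair `("index", "search")`, which occurs in `d3` alone, is kept as a key.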
Query-Driven Indexing for Scalable Peer-to-Peer Text Retrieval
- In Infoscale
, 2007
Cited by 9 (3 self)
We present a query-driven algorithm for the distributed indexing of large document collections within structured P2P networks. To cope with bandwidth consumption that has been identified as the major problem for the standard P2P approach with single term indexing, we leverage a distributed index that stores up to top-k document references only for carefully chosen indexing term combinations. In addition, since the number of possible term combinations extracted from a document collection can be very large, we propose to use query statistics to index only such combinations that are indeed frequently requested by the users. Thus, by avoiding the maintenance of superfluous indexing information, we achieve a substantial reduction in bandwidth and storage. A specific activation mechanism is applied to continuously update the indexing information according to changes in the query distribution, resulting in an efficient, constantly evolving query-driven indexing structure. We show that the size of the index and the generated indexing/retrieval traffic remain manageable even for web-size document collections at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence about the feasibility of the query-driven indexing strategy for large scale P2P text retrieval. Moreover, our experiments confirm that the retrieval performance is only slightly lower than the one obtained with state-of-the-art centralized query engines.
LANES: An Inter-Domain Data-Oriented Routing Architecture
Cited by 6 (0 self)
Data-oriented networking has attracted research recently, but the efficiency of the state-of-the-art solutions can still be improved. Our work towards this goal is set in a clean-slate architecture consisting of modular rendezvous, routing, and forwarding functions. In this paper we present the inter-domain routing layer and its interplay with the other components of the system. The proposed system is built around two types of nodes: forwarding nodes and branching nodes. The forwarding nodes are optimized for throughput with no per-subscription state and no need to change passing packets, while branching nodes contain a large memory for caching and can make complex routing decisions. The amount of storage space and bandwidth can be independently scaled to suit the needs of each network. In the background, topology nodes perform load-balancing and configure routes in each domain using a two-dimensional addressing mechanism. The paths taken by packets adapt to the number of active subscribers to keep the amount of in-network state and latency low. A new data-oriented congestion control scheme is introduced, which takes into account the use of storage resources on-path and is fair to multicast flows.
AlvisP2P: Scalable Peer-to-Peer Text Retrieval in a Structured P2P Network
Cited by 5 (0 self)
In this paper we present the AlvisP2P IR engine, which enables efficient retrieval with multi-keyword queries from a global document collection available in a P2P network. In such a network, each peer publishes its local index and invests a part of its local computing resources (storage, CPU, bandwidth) to maintain a fraction of a global P2P index. This investment is rewarded by the network-wide accessibility of the local documents via the global search facility. The AlvisP2P engine uses an optimized overlay network and relies on novel indexing/retrieval mechanisms that ensure low bandwidth consumption, thus enabling unlimited network growth. Our demonstration shows how an easy-to-install AlvisP2P client can be used to join an existing P2P network, index local (text or even multimedia) documents with collection-specific indexing mechanisms, and control access rights to them.
Improving the Throughput of Distributed Hash Tables . . .
"... Advanced applications for Distributed Hash Tables ..."
ProtoPeer: Bridging the Gap Between Simulation and Live Deployment
- In Proceedings of the 2nd International Conference on Simulation Tools and Techniques
, 2009
Cited by 1 (0 self)
Simulators are a commonly used tool in peer-to-peer systems research. However, they may not be able to capture all the details of a system operating in a live network. Transitioning from simulation to the actual system implementation is a non-trivial and time-consuming task. We present ProtoPeer, a peer-to-peer systems prototyping toolkit that allows for switching between the event-driven simulation and live network deployment without changing any of the application code. ProtoPeer exports a set of APIs for message passing, message queuing, timer operations as well as overlay routing and managing the overlay neighbors. Users can plug in their own custom implementations of most of the parts of ProtoPeer including custom network models for simulation and custom message passing over transports other than the default TCP/UDP. Applications implemented using ProtoPeer are divided into modules each encapsulating a separate piece of the message passing functionality. The modules can be reused and composed to achieve the desired system behavior. ProtoPeer has a unified system-wide infrastructure for measurement logging and aggregation, event injection and managing evaluation scenarios. The simulator scales to tens of thousands of peers and gives accurate predictions closely matching the live network measurements.
Handling very large . . .
The principal service of Distributed Hash Tables (DHTs) is route(id, data), which sends data to a peer responsible for id, using typically O(log( # of peers)) overlay hops. Certain applications like peer-to-peer information retrieval generate billions of small messages that are concurrently inserted into a DHT. These applications can generate messages faster than the DHT can process them. To support such demanding applications, a DHT needs a congestion control mechanism to efficiently handle high loads of messages. In this paper we provide an extended study on congestion control for DHTs: we present a theoretical analysis that demonstrates that congestion control for DHTs is absolutely necessary for applications that provide elastic traffic. We then present a new congestion control algorithm for DHTs. We provide extensive live evaluations in a ModelNet cluster and the PlanetLab test bed, which show that our algorithm is nearly loss-free, fair, and provides low lookup times and high throughput under cross-load.
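To make the `route(id, data)` service and its logarithmic hop count concrete, here is a small Python sketch of greedy routing on a Chord-style identifier ring. The choice of a Chord-like structure, the class name `Ring`, and the example node identifiers are assumptions made for illustration; the abstract does not say which DHT is used. The data payload and the congestion control algorithm itself are not modelled.

```python
import bisect


class Ring:
    """Chord-style identifier ring, used only for illustration."""

    def __init__(self, node_ids, bits):
        self.bits = bits
        self.size = 1 << bits
        self.nodes = sorted(node_ids)

    def successor(self, ident):
        # First node clockwise from ident, wrapping around the ring.
        i = bisect.bisect_left(self.nodes, ident % self.size)
        return self.nodes[i % len(self.nodes)]

    def fingers(self, node):
        # Finger i points at the node responsible for node + 2**i.
        return [self.successor(node + (1 << i)) for i in range(self.bits)]

    def distance(self, a, b):
        # Clockwise distance from a to b.
        return (b - a) % self.size

    def route(self, start, key):
        """Return the nodes visited on the way to the peer responsible for key."""
        target = self.successor(key)
        path, node = [start], start
        while node != target:
            # Greedy step: take the finger that gets closest to the target
            # without overshooting it.
            candidates = [f for f in self.fingers(node)
                          if 0 < self.distance(node, f) <= self.distance(node, target)]
            node = max(candidates, key=lambda f: self.distance(node, f))
            path.append(node)
        return path


ring = Ring([1, 4, 7, 12, 15], bits=4)
path = ring.route(start=1, key=13)
```

In this example node 15 is responsible for key 13, and the message reaches it from node 1 in two overlay hops via node 12, within the bound of `bits` hops that the finger structure gives.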
Overload Control for a System in Time-Varying Environment
, 2013
In recent papers we considered how two large service systems that are primarily designed to operate independently can help each other in the face of unexpected overloads due to a sudden change in the arrival rates. We proposed an overload control, which we named fixed-queue-ratio with thresholds (FQR-T), whose aim was to prevent any sharing of customers, i.e., sending customers from one class to be served in the other class's pool, during normal loads, and to initiate sharing automatically once a threshold is crossed, in which case the corresponding pool is considered overloaded. The goal is to keep the relation between the two queues fixed at a certain ratio, which is optimal in a deterministic “fluid” approximation, assuming a holding cost is incurred on the two queues. To avoid harmful sharing, our control includes the one-way sharing rule, stipulating that sharing is allowed in only one direction at any time. In this paper we consider a more complex time-varying environment, in which the arrival rates and staffing levels are time dependent, so that the system may fluctuate between periods of various loads, with overloads possible in either direction. We show that FQR-T needs to be modified to account for these more complex settings, since it may be slow to react to the changing environment, and may even cause severe fluctuations once the arrival rates return to normal after an overload incident. Our new control, FQR with activation-and-release thresholds (FQR-ART), is designed to automatically respond to changes in the environment by initiating sharing in the right direction quickly, if that is needed, while avoiding harmful phenomena, such as congestion collapse and severe oscillations during normal loads. A novel fluid approximation, described implicitly via an ordinary differential equation (ODE), is developed, as well as an efficient algorithm to solve that ODE.
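The threshold-based activation and the one-way sharing rule can be pictured with the short Python sketch below. It is a loose illustration only: the function name, the form of the threshold tests (a weighted queue difference compared against `k12` and `k21`), and the `blocked` argument are assumptions, not the definitions used in the paper, and the release thresholds of FQR-ART are not modelled.

```python
def activation(q1, q2, ratio, k12, k21, blocked=None):
    """Return the sharing direction to activate, or None during normal loads.

    q1, q2: current queue lengths of the two classes.
    ratio: target ratio q1 / q2 (assumed form of the fixed queue ratio).
    k12, k21: activation thresholds for each direction (assumed form).
    blocked: direction currently disallowed by the one-way sharing rule,
             e.g. because sharing in the opposite direction is still under way.
    """
    imbalance = q1 - ratio * q2
    if imbalance > k12 and blocked != "1->2":
        return "1->2"      # class 1 overloaded: send its customers to pool 2
    if -imbalance > k21 and blocked != "2->1":
        return "2->1"      # class 2 overloaded: send its customers to pool 1
    return None            # no threshold crossed in an allowed direction


normal = activation(50, 40, ratio=1.0, k12=20, k21=20)
overload_1 = activation(100, 40, ratio=1.0, k12=20, k21=20)
overload_2 = activation(30, 80, ratio=1.0, k12=20, k21=20)
one_way = activation(100, 40, ratio=1.0, k12=20, k21=20, blocked="1->2")
```

With these hypothetical numbers, no sharing starts while the weighted queue difference stays within the thresholds, and an overload in either class activates sharing only in the direction that the one-way rule currently allows.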