A Scalable Distributed Information Management System
- In Proc SIGCOMM
, 2003
Cited by 127 (18 self)
We present a Scalable Distributed Information Management System (SDIMS) that aggregates information about large-scale networked systems and that can serve as a basic building block for a broad range of large-scale distributed applications by providing detailed views of nearby information and summary views of global information. To serve as a basic building block, a SDIMS should have four properties: scalability to many nodes and attributes, flexibility to accommodate a broad range of applications, administrative isolation for security and availability, and robustness to node and network failures. We design, implement and evaluate a SDIMS that (1) leverages Distributed Hash Tables (DHT) to create scalable aggregation trees, (2) provides flexibility through a simple API that lets applications control propagation of reads and writes, (3) provides administrative isolation through simple extensions to current DHT algorithms, and (4) achieves robustness to node and network reconfigurations through lazy reaggregation, on-demand reaggregation, and tunable spatial replication. Through extensive simulations and micro-benchmark experiments, we observe that our system is an order of magnitude more scalable than existing approaches, achieves isolation properties at the cost of modestly increased read latency in comparison to flat DHTs, and gracefully handles failures.
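The idea of building an aggregation tree from DHT routes can be illustrated with a toy model. The sketch below is not the SDIMS implementation: it assumes a fully populated 4-bit ID space, a hypothetical `key_for` hash, and a routing rule that fixes the highest differing bit at each hop, so that the routes from all nodes toward an attribute's key form a tree rooted at that key. Values are then summed along the tree.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Optional

BITS = 4  # assumption: toy ID space of 16 nodes, every ID occupied


def key_for(attribute: str) -> int:
    """Hash an attribute name to a BITS-wide key (hypothetical helper)."""
    digest = hashlib.sha1(attribute.encode()).digest()
    return digest[0] >> (8 - BITS)


def parent(node: int, key: int) -> Optional[int]:
    """Next hop toward the key: fix the highest bit that differs from it."""
    diff = node ^ key
    if diff == 0:
        return None  # this node is the root of the tree for this key
    return node ^ (1 << (diff.bit_length() - 1))


def aggregate(values: Dict[int, int], key: int) -> int:
    """Sum per-node values up the tree rooted at the key."""
    partial = defaultdict(int, values)
    # Visit nodes deepest-first (depth = number of bits differing from key),
    # so each node has its children's totals before forwarding its own.
    order = sorted(range(1 << BITS),
                   key=lambda n: bin(n ^ key).count("1"),
                   reverse=True)
    for node in order:
        up = parent(node, key)
        if up is not None:
            partial[up] += partial[node]
    return partial[key]
```

Because different attributes hash to different keys, each attribute gets a different root, which spreads the aggregation load across nodes.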
Design and Implementation Tradeoffs for Wide-Area Resource Discovery
- In Proceedings of 14th IEEE Symposium on High Performance, Research Triangle Park
, 2005
Cited by 51 (11 self)
We describe the design and implementation of SWORD, a scalable resource discovery service for wide-area distributed systems. In contrast to previous systems, SWORD allows users to describe desired resources as a topology of interconnected groups with required intra-group, inter-group, and per-node characteristics, along with the utility that the application derives from specified ranges of metric values. This design gives users the flexibility to find geographically distributed resources for applications that are sensitive to both node and network characteristics, and allows the system to rank acceptable configurations based on their quality for that application. Rather than evaluating a single implementation of SWORD, we explore a variety of architectural designs that deliver the required functionality in a scalable and highly-available manner. We discuss the tradeoffs of using a centralized architecture as compared to a fully decentralized design to perform wide-area resource discovery. To summarize our results, we found that a centralized architecture based on 4-node server cluster sites at network peering facilities outperforms a decentralized DHT-based resource discovery infrastructure with respect to query latency for all but the smallest number of sites. However, although a centralized architecture shows significant promise in stable environments, we find that our decentralized implementation has acceptable performance and also benefits from the DHT’s self-healing properties in more volatile environments. We evaluate the advantages and disadvantages of centralized and distributed resource discovery architectures on 1000 hosts in emulation and on approximately 200 PlanetLab nodes spread across the Internet.
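As a rough illustration of ranking candidates by utility over ranges of metric values, the sketch below scores individual nodes only. It is not SWORD's query language or algorithm: the `(low, high, utility)` range format, the metric names, and the additive scoring are all assumptions made for the example, and the group and inter-group constraints described in the abstract are left out.

```python
import math
from typing import Dict, List, Optional, Tuple

# Assumed format: (low, high, utility), low inclusive and high exclusive.
Range = Tuple[float, float, float]


def metric_utility(value: float, ranges: List[Range]) -> Optional[float]:
    """Return the utility of the range containing value, or None if no range does."""
    for low, high, utility in ranges:
        if low <= value < high:
            return utility
    return None


def rank(nodes: Dict[str, Dict[str, float]],
         spec: Dict[str, List[Range]]) -> List[Tuple[str, float]]:
    """Drop nodes with any metric outside every range; sort the rest by total utility."""
    scored = []
    for name, metrics in nodes.items():
        total = 0.0
        for metric, ranges in spec.items():
            utility = metric_utility(metrics[metric], ranges)
            if utility is None:
                break  # unacceptable node
            total += utility
        else:
            scored.append((name, total))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical request and candidate measurements.
spec = {
    "load": [(0.0, 1.0, 1.0), (1.0, 4.0, 0.5)],
    "free_mem_mb": [(256.0, 1024.0, 0.5), (1024.0, math.inf, 1.0)],
}
nodes = {
    "a": {"load": 0.5, "free_mem_mb": 2048.0},
    "b": {"load": 2.0, "free_mem_mb": 512.0},
    "c": {"load": 5.0, "free_mem_mb": 4096.0},
}
ranking = rank(nodes, spec)
```

Here node `c` is excluded because its load falls outside every acceptable range, and `a` ranks above `b`.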
MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools
, 2003
Cited by 40 (8 self)
We present MRNet, a software-based multicast/reduction network for building scalable performance and system administration tools. MRNet supports multiple simultaneous, asynchronous collective communication operations.
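A multicast/reduction tree can be sketched in a few lines. The code below is a single-process illustration, not MRNet's API: `TreeNode`, `multicast_reduce`, and the node names are hypothetical. A request travels down from the front end to every leaf, and each internal node combines its children's replies with a reduction function before passing one value upward.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class TreeNode:
    name: str
    children: List["TreeNode"] = field(default_factory=list)


def multicast_reduce(node: TreeNode,
                     request: Any,
                     leaf_fn: Callable[[str, Any], Any],
                     reduce_fn: Callable[[List[Any]], Any]) -> Any:
    """Send request to every leaf under node and reduce replies on the way up."""
    if not node.children:
        return leaf_fn(node.name, request)  # leaf (back end) answers the request
    replies = [multicast_reduce(child, request, leaf_fn, reduce_fn)
               for child in node.children]
    return reduce_fn(replies)  # internal node forwards a single reduced value


# Hypothetical two-level tree: front end, two internal nodes, four back ends.
tree = TreeNode("fe", [
    TreeNode("comm0", [TreeNode("be0"), TreeNode("be1")]),
    TreeNode("comm1", [TreeNode("be2"), TreeNode("be3")]),
])
loads = {"be0": 3, "be1": 9, "be2": 4, "be3": 1}

peak = multicast_reduce(tree, "load", lambda name, req: loads[name], max)
total = multicast_reduce(tree, "load", lambda name, req: loads[name], sum)
```

With the reduction applied at each level, the front end receives one value per child rather than one per back end, which is what lets the pattern scale.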
Cassandra - A Decentralized Structured Storage System
Cited by 33 (0 self)
Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers). At this scale, small and large components fail continuously. The way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. While in many ways Cassandra resembles a database and shares many design and implementation strategies therewith, Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. The Cassandra system was designed to run on cheap commodity hardware and handle high write throughput while not sacrificing read efficiency.
Netbait: a Distributed Worm Detection Service
, 2002
Cited by 27 (2 self)
This paper presents Netbait, a planetary-scale service for distributed detection of Internet worms. Netbait allows users to pose queries that identify which machines on a given network have been compromised based on the collective view of a geographically distributed set of machines. It is based on a distributed query processing architecture that evaluates queries expressed using a subset of SQL against a single logical database table. This single logical table is realized using a distributed set of relational databases, each populated by local intrusion detection systems running on Netbait server nodes. For speed, queries in Netbait are processed in parallel by distributing them over dynamically constructed query processing trees built over Tapestry, a distributed object and location routing (DOLR) layer. For efficiency, query results are compressed using application-specific aggregation and compact encodings.
Efficient hierarchical parallel genetic algorithms using grid computing
- Future Generation Computer Systems
, 2007
Cited by 20 (0 self)
In this paper, we present an efficient Hierarchical Parallel Genetic Algorithm framework using Grid computing (GE-HPGA). The framework is developed using standard Grid technologies and has two distinctive features: 1) an extended GridRPC API to conceal the high complexity of the Grid environment, and 2) a metascheduler for seamless resource discovery and selection. To assess the practicality of the framework, a theoretical analysis of the possible speed-up is presented. An empirical study of GE-HPGA, using a benchmark problem and a realistic aerodynamic airfoil shape optimization problem on diverse Grid environments with different communication protocols, cluster sizes, and processing nodes at geographically disparate locations, also indicates that the proposed GE-HPGA offers a credible framework for providing significant speed-up to evolutionary design optimization in science and engineering.
IrisNet: An architecture for enabling sensor-enriched Internet service
, 2002
Cited by 16 (2 self)
The proliferation and affordability of webcams and other smart sensors have created opportunities for novel sensor-enriched Internet services, which combine traditional data sources with information collected from live sensor feeds. This paper describes the design, architecture, and implementation of IrisNet, the first general-purpose software infrastructure tailored to the unique demands of worldwide sensing services. IrisNet provides service authors with a very high-level abstraction of the underlying system, to ease authoring of new services. For scalability and decreased bandwidth consumption, IrisNet pushes both sensor feed processing and queries close to the sensor nodes. IrisNet provides distributed query processing, data partitioning, caching, load balancing, and replication schemes optimized for sensor-enriched Internet services. This paper reports on experiments with a working IrisNet prototype running two example services that demonstrate the effectiveness of IrisNet’s features in achieving scalability and reducing query response times.
Fingerprinting the datacenter: Automated classification of performance crises
- In Proceedings of EuroSys’10
, 2010
Cited by 14 (1 self)
Contemporary datacenters comprise hundreds or thousands of machines running applications requiring high availability and responsiveness. Although a performance crisis is easily detected by monitoring key end-to-end performance indicators (KPIs) such as response latency or request throughput, the variety of conditions that can lead to KPI degradation makes it difficult to select appropriate recovery actions. We propose and evaluate a methodology for automatic classification and identification of crises, and in particular for detecting whether a given crisis has been seen before, so that a known solution may be immediately applied. Our approach is based on a new and efficient representation of the datacenter’s state called a fingerprint, constructed by statistical selection and summarization of the hundreds of performance metrics typically collected on such systems. Our evaluation uses 4 months of trouble-ticket data from a production datacenter with hundreds of machines running a 24x7 enterprise-class user-facing application. In experiments in a realistic and rigorous operational setting, our approach provides operators the information necessary to initiate recovery actions with 80% correctness in an average of 10 minutes, which is 50 minutes earlier than the deadline provided to us by the operators. To the best of our knowledge this is ...
On-demand overlays for distributed system management
- In Proceedings of USENIX WORLDS
, 2005
Cited by 12 (3 self)
This paper presents the management overlay network (MON) system that we are building and running on the PlanetLab testbed. MON is a distributed system designed to facilitate the management of large distributed applications. Toward this goal, MON builds on-demand overlay structures that allow users to execute instant management commands, such as querying the current status of the application or pushing software updates to all the nodes. The on-demand approach enables MON to be light-weight, requiring a minimal amount of resources when no commands are executed. It also frees MON from complex failure repair mechanisms, since no overlay structure is maintained for a prolonged time. MON is currently running on more than 300 nodes on PlanetLab. Our initial experiments on PlanetLab show that MON has good performance, both in terms of command response time and achieved bandwidth for software push.
Tolerating Correlated Failures in Wide-Area Monitoring Services
, 2004
Cited by 9 (5 self)

