MapReduce: simplified data processing on large clusters
- OSDI ’04: Proceedings of the 6th Conference on Symposium on Operating Systems Design and Implementation
, 2004
Cited by 913 (3 self)
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day.
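The map/reduce contract described in this abstract can be illustrated with a minimal single-process sketch. It is not the implementation from the paper: the names map_fn, reduce_fn, and run_mapreduce are hypothetical, the word-count task is chosen only as an example, and the in-memory grouping step stands in for the partitioning, scheduling, and fault handling that the real run-time system performs across machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical map function: emit (word, 1) for every word in a document."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """Hypothetical reduce function: merge all values that share one key."""
    return sum(counts)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Sequential stand-in for the run-time system (an assumption of this sketch)."""
    # Map phase: apply the mapper to each input pair and group the
    # intermediate values by their intermediate key.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in mapper(key, value):
            intermediate[ikey].append(ivalue)
    # Reduce phase: merge the grouped values for each intermediate key.
    return {ikey: reducer(ikey, values) for ikey, values in intermediate.items()}


docs = [("a.txt", "the quick fox"), ("b.txt", "The lazy dog the end")]
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
```

Because the user supplies only the two functions, the grouping loop could in principle be replaced by a distributed shuffle without changing map_fn or reduce_fn, which is the separation of concerns the abstract describes.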
Deeper Inside PageRank
- Internet Mathematics
, 2004
Cited by 107 (4 self)
This paper serves as a companion or extension to the “Inside PageRank” paper by Bianchini et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with PageRank, covering the basic PageRank model, available and recommended solution methods, storage issues, existence, uniqueness, and convergence properties, possible alterations to the basic model, suggested alternatives to the traditional solution methods, sensitivity and conditioning, and finally the updating problem. We introduce a few new results, provide an extensive reference list, and speculate about exciting areas of future research.
Collaborative Filtering: A Machine Learning Perspective
, 2004
Cited by 44 (3 self)
Collaborative filtering was initially proposed as a framework for filtering information based on the preferences of users, and has since been refined in many different ways. This thesis is a comprehensive study of rating-based, pure, non-sequential collaborative filtering. We analyze existing methods for the task of rating prediction from a machine learning perspective. We show that many existing methods proposed for this task are simple applications or modifications of one or more standard machine learning methods for classification, regression, clustering, dimensionality reduction, and density estimation. We introduce new prediction methods in all of these classes. We introduce a new experimental procedure for testing stronger forms of generalization than has been used previously. We implement a total of nine prediction methods, and conduct large-scale prediction accuracy experiments. We show interesting new results on the relative performance of these methods.
Energy-Efficient Real-Time Heterogeneous Server Clusters
- In Proceedings of RTAS
, 2006
Cited by 25 (7 self)
With increasing costs of energy consumption and cooling, power management in server clusters has become an increasingly important design issue. Current clusters for real-time applications are designed to handle peak loads, where all servers are fully utilized. In practice, peak load conditions rarely happen and clusters are underutilized most of the time. This creates the opportunity for using slower frequencies, and thus lower energy consumption, with little or no impact on the Quality of Service (QoS), for example, performance and timeliness. In this work we present a cluster-wide QoS-aware technique that dynamically reconfigures the cluster to reduce energy consumption during periods of reduced load. Moreover, we also investigate the effects of local QoS-aware power management using Dynamic Voltage Scaling (DVS). Since most real-world clusters consist of machines of different kinds (in terms of both performance and energy consumption), we focus on heterogeneous clusters. For validation, we describe and evaluate an implementation of the proposed scheme using the Apache Webserver in a small realistic cluster. Our experimental results show that using our scheme it is possible to save up to 45% of the total energy consumed by the servers, while maintaining average response times within the specified deadlines and the number of dropped requests within the required amount.
Enabling service adaptability with versatile anycast
- Concurrency and Computation: Practice and Experience
, 1837
Cited by 6 (2 self)
We present versatile anycast, which allows a service running on a varying collection of nodes scattered over a wide-area network to present itself to the clients as one running on a single node. Providing a single logical address enables the client-side software to preserve the traditional service access model based on single access points. At the same time, the dynamic composition of anycast groups implemented by versatile anycast enables the server-side service infrastructure to evolve and adapt to changing network conditions. We implement versatile anycast using Mobile IPv6, which decouples the logical addresses of mobile nodes from their physical location. We exploit that decoupling to implement logical service addresses that are not bound to any physical nodes, and employ standard MIPv6 mechanisms to dynamically map each such address onto individual service nodes. Our solution enables a service to transparently hand off clients among the service nodes at the network level while preserving optimal routing between the clients and the service nodes. We demonstrate that the overhead of versatile anycasting is very low. In particular, the client-perceived handoff time is shown to be a linear function of the latencies among the client and the service nodes participating in the handoff.
Rainbow: Cost-Effective Software Architecture-based Self-Adaptation
, 2008
Cited by 5 (0 self)
Modern, complex software systems (e-commerce, IT, critical infrastructures, etc.) are increasingly required to continue operation in the face of change, to self-adapt to accommodate shifting user priorities, resource variability, changing environments, and component failures. While manual oversight benefits from global problem contexts and flexible policies, human operators are costly and prone to error. Low-level, embedded mechanisms (exceptions, time-outs, etc.) are effective and timely for error recovery, but are local in scope to the point-of-failure, application-specific, and costly to modify when adaptation objectives change. An ideal solution leverages domain expertise, provides an end-to-end system perspective, adapts the target system in a timely manner, and can be engineered cost-effectively. Architecture-based self-adaptation closes the “loop of control,” using external
Guided Google: A Meta Search Engine and its Implementation Using the Google Distributed Web Services
- International Journal of Computers and Applications, Volume 26, No. 1, ISSN: 1206-212X (202), ACTA
, 2004
Cited by 4 (1 self)
With the ubiquity of the Internet and Web, search engines have been sprouting like mushrooms after a rainfall. However, innovative search engines and guided search capabilities have started appearing only in recent years. For instance, Google, which is one of the popular search engines, supports Web Services that allow external applications to issue Web search queries that are actually processed using Google's commodity cluster computer made up of 15,000 PC nodes. The goals of these applications are to help ease and guide the searching efforts of novice web users towards their desired objectives. A number of implementations of such services are emerging. This paper proposes a guided meta-search engine, called Guided Google, that serves as an advanced interface to the actual Google.com search engine.
Physically Constrained Architecture for Chip Multiprocessors
, 2006
Cited by 1 (0 self)
Recent product announcements show a clear trend towards aggressive integration of multiple cores on a single chip. This kind of architecture is called a “chip multiprocessor” or CMP. By taking advantage of thread-level parallelism, a CMP can achieve better performance/power scalability with technology than single-core architectures. However, this trend presents an expansive design space for chip architects, encompassing number of cores per die, core size and complexity (pipeline depth and superscalar width), core type (in-order or out-of-order, single-threaded or multi-threaded), memory hierarchy and interconnection fabric design, operating voltage and frequency, and so on. These choices are especially difficult because all the variables of interest are inter-related and must be considered simultaneously. Furthermore, trade-offs among these design choices vary depending both on workloads and on physical constraints such as power, area, and thermal limits. Ignoring any of these physical constraints at an early design stage may lead to significant performance loss. In this dissertation I explore this multi-dimensional design space across a range of possible physical constraints, for multiple categories of workloads. To assist this design space exploration, a validated systematic infrastructure is designed to help accelerate CMP simulation. I believe this

