Results 1–10 of 14
Distributed Graph Simulation: Impossibility and Possibility
"... This paper studies fundamental problems for distributed graph simulation. Given a pattern query Q and a graph G that is fragmented and distributed, a graph simulation algorithm A is to compute the matches Q(G) of Q in G. We say that A is parallel scalable in (a) response time if its parallel comput ..."
Abstract

Cited by 5 (2 self)
This paper studies fundamental problems for distributed graph simulation. Given a pattern query Q and a graph G that is fragmented and distributed, a graph simulation algorithm A is to compute the matches Q(G) of Q in G. We say that A is parallel scalable in (a) response time if its parallel computational cost is determined by the largest fragment Fm of G and the size |Q| of query Q, and (b) data shipment if its total amount of data shipped is determined by |Q| and the number of fragments of G, independent of the size of graph G. (1) We prove an impossibility theorem: there exists no distributed graph simulation algorithm that is parallel scalable in either response time or data shipment. (2) However, we show that distributed graph simulation is partition bounded, i.e., its response time depends only on |Q|, |Fm| and the number |Vf| of nodes in G with edges across different fragments, and its data shipment depends only on |Q| and the number |Ef| of crossing edges. We provide the first algorithms with these performance guarantees. (3) We also identify special cases of patterns and graphs for which parallel scalability is possible. (4) We experimentally verify the scalability and efficiency of our algorithms.
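For intuition, the graph-simulation semantics that Q(G) refers to can be sketched as a small centralized fixpoint computation: start from all label-compatible (query node, data node) pairs, then prune any data node that cannot simulate its query node's outgoing edges. This is an illustrative sketch only, not the paper's distributed, partition-bounded algorithm; the dict-based graph encoding and the name `simulation` are our own.

```python
def simulation(q_nodes, q_edges, g_nodes, g_edges):
    """Centralized graph simulation (illustrative sketch).

    q_nodes / g_nodes: dict mapping node -> label.
    q_edges / g_edges: dict mapping node -> set of successor nodes.
    Returns, for each query node u, the set of data nodes that simulate u.
    """
    # Start with all label-compatible pairs, then prune to a fixpoint.
    sim = {u: {v for v in g_nodes if g_nodes[v] == q_nodes[u]} for u in q_nodes}
    changed = True
    while changed:
        changed = False
        for u in q_nodes:
            for v in list(sim[u]):
                # v must have, for every query edge u -> u2,
                # at least one successor that still simulates u2.
                if any(not (g_edges.get(v, set()) & sim[u2])
                       for u2 in q_edges.get(u, set())):
                    sim[u].discard(v)
                    changed = True
    return sim

# Query: a(A) -> b(B).  Data graph: 1(A) -> 2(B), plus an isolated 3(A).
print(simulation({'a': 'A', 'b': 'B'}, {'a': {'b'}},
                 {1: 'A', 2: 'B', 3: 'A'}, {1: {2}}))
# -> {'a': {1}, 'b': {2}}  (node 3 is pruned: it has no B-successor)
```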
Program-Centric Cost Models for Locality and Parallelism, 2013
"... Good locality is critical for the scalability of parallel computations. Many cost models that quantify locality and parallelism of a computation with respect to specific machine models have been proposed. A significant drawback of these machinecentric cost models is their lack of portability. Sinc ..."
Abstract

Cited by 2 (0 self)
Good locality is critical for the scalability of parallel computations. Many cost models that quantify the locality and parallelism of a computation with respect to specific machine models have been proposed. A significant drawback of these machine-centric cost models is their lack of portability. Since the design and analysis of good algorithms in most machine-centric cost models is a non-trivial task, lack of portability can lead to significant wasted design effort. Therefore, a machine-independent, portable cost model for locality and parallelism that is relevant to a broad class of machines can be a valuable guide for the design of portable and scalable algorithms, as well as for understanding the complexity of problems. This thesis addresses the problem of portable analysis by presenting program-centric metrics for measuring the locality and parallelism of nested-parallel programs written for shared-memory machines – metrics based solely on the program structure, without reference to machine parameters such as processors, caches and connections. The metrics we present for this purpose are the parallel cache com
Scalable Big Graph Processing in MapReduce
"... MapReduce has become one of the most popular parallel computing paradigms in cloud, due to its high scalability, reliability, and faulttolerance achieved for a large variety of applications in big data processing. In the literature, there are MapReduce Class MRC and Minimal MapReduce ClassMMC to d ..."
Abstract
MapReduce has become one of the most popular parallel computing paradigms in the cloud, due to the high scalability, reliability, and fault-tolerance it achieves for a large variety of applications in big data processing. In the literature, there are the MapReduce class MRC and the minimal MapReduce class MMC, which bound the memory consumption, communication cost, CPU cost, and number of MapReduce rounds for an algorithm executing in MapReduce. However, neither of them is designed for big graph processing in MapReduce, since the constraints in MMC can hardly be achieved simultaneously on graphs, and the conditions in MRC may induce scalability problems when processing big graph data. In this paper, we study scalable big graph processing in MapReduce. We introduce a scalable graph processing class SGC by relaxing some constraints in MMC to make it suitable for scalable graph processing. We define two graph join operators in SGC, namely, EN join and NE join, using which a wide range of graph algorithms can be
Pregel Algorithms for Graph Connectivity Problems with Performance Guarantees
"... Graphs in real life applications are often huge, such as the Web graph and various social networks. These massive graphs are often stored and processed in distributed sites. In this paper, we study graph algorithms that adopt Google’s Pregel, an iterative vertexcentric framework for graph processin ..."
Abstract
Graphs in real-life applications are often huge, such as the Web graph and various social networks. These massive graphs are often stored and processed at distributed sites. In this paper, we study graph algorithms that adopt Google's Pregel, an iterative vertex-centric framework for graph processing in the cloud. We first identify a set of desirable properties of an efficient Pregel algorithm, such as linear space, communication, and computation cost per iteration, and a logarithmic number of iterations. We define such an algorithm as a practical Pregel algorithm (PPA). We then propose PPAs for computing connected components (CCs), biconnected components (BCCs) and strongly connected components (SCCs). The PPAs for computing BCCs and SCCs use the PPAs of many fundamental graph problems as building blocks, which are of interest in their own right. Extensive experiments over large real graphs verify the efficiency of our algorithms.
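As a toy illustration of the vertex-centric style the abstract describes, here is a minimal sequential emulation of Pregel-style connected components by minimum-label propagation: each vertex repeatedly adopts the smallest label among itself and its neighbors until nothing changes. Note this is not the paper's PPA; this naive scheme needs a number of supersteps proportional to the graph diameter, whereas a PPA requires only a logarithmic number of iterations.

```python
def connected_components(adj):
    """adj: dict mapping vertex id -> iterable of neighbor ids.
    Returns a dict mapping each vertex to its component's minimum id."""
    label = {v: v for v in adj}           # superstep 0: every vertex keeps its own id
    changed = True
    while changed:                        # one pass over all vertices = one superstep
        changed = False
        for v, nbrs in adj.items():
            best = min([label[v]] + [label[u] for u in nbrs])
            if best < label[v]:           # adopt the smallest label seen so far
                label[v] = best
                changed = True
    return label

# Two components: {0, 1, 2} and {3, 4}.
print(connected_components({0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}))
# -> {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
```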
Distributed Streaming with Register Automata (DSAs)
"... We introduce three formal models of distributed systems for query evaluation on massive data ..."
Abstract
We introduce three formal models of distributed systems for query evaluation on massive data bases.
Binary Theta-Joins using MapReduce: Efficiency Analysis and Improvements
"... We deal with binary thetajoins in a MapReduce environment, and we make two contributions. First, we show that the best known algorithm to date for this problem can reach the optimal tradeo ↵ between the size of the input a reducer can receive and the incurred communication cost when the join selec ..."
Abstract
We deal with binary theta-joins in a MapReduce environment, and we make two contributions. First, we show that the best known algorithm to date for this problem can reach the optimal tradeoff between the size of the input a reducer can receive and the incurred communication cost when the join selectivity is high. Second, when the join selectivity is low, we present improvements upon the state-of-the-art with a view to decreasing the communication cost and the maximum load a reducer can receive, taking into account also the load imbalance across the reducers.
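To make the reducer-load versus communication tradeoff concrete, the following sketch emulates a grid-partitioned ("1-Bucket-Theta"-style) binary theta-join: each R-tuple is replicated along a random row of a reducer grid and each S-tuple along a random column, so every (r, s) pair meets at exactly one reducer cell. The grid shape and all names here are our own illustration, not the paper's improved algorithms.

```python
import random

def theta_join(R, S, theta, rows=2, cols=2):
    """Emulate a rows x cols reducer grid for a binary theta-join."""
    reducers = {(i, j): ([], []) for i in range(rows) for j in range(cols)}
    rng = random.Random(0)
    for r in R:                           # map: copy r to every cell of a random row
        i = rng.randrange(rows)
        for j in range(cols):
            reducers[(i, j)][0].append(r)
    for s in S:                           # map: copy s to every cell of a random column
        j = rng.randrange(cols)
        for i in range(rows):
            reducers[(i, j)][1].append(s)
    out = []
    for rs, ss in reducers.values():      # reduce: local theta-join per cell
        out.extend((r, s) for r in rs for s in ss if theta(r, s))
    return sorted(out)                    # each pair meets exactly once, so no duplicates

print(theta_join([1, 2, 3], [2, 3, 4], lambda r, s: r < s))
# -> [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Replicating a tuple across a whole row (or column) is what drives the communication cost, while the cell sizes bound the per-reducer load; the paper's analysis is about balancing exactly these two quantities.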
IEEE Transactions on Computers: Proximity-Aware Local-Recoding Anonymization with MapReduce for Scalable Big Data Privacy Preservation in Cloud
"... Abstract—Cloud computing provides promising scalable IT infrastructure to support various processing of a variety of big data applications in sectors such as healthcare and business. Data sets like electronic health records in such applications often contain privacysensitive information, which brin ..."
Abstract
Cloud computing provides a promising, scalable IT infrastructure to support the processing of a variety of big data applications in sectors such as healthcare and business. Data sets like electronic health records in such applications often contain privacy-sensitive information, which raises privacy concerns if the information is released or shared with third parties in the cloud. A practical and widely adopted technique for data privacy preservation is to anonymize data via generalization to satisfy a given privacy model. However, most existing privacy-preserving approaches, tailored to small-scale data sets, often fall short when encountering big data, due to their insufficiency or poor scalability. In this paper, we investigate the local-recoding problem for big data anonymization against proximity privacy breaches and attempt to identify a scalable solution to this problem. Specifically, we present a proximity privacy model allowing semantic proximity of sensitive values and multiple sensitive attributes, and model the problem of local recoding as a proximity-aware clustering problem. A scalable two-phase clustering approach, consisting of a t-ancestors clustering algorithm (similar to k-means) and a proximity-aware agglomerative clustering algorithm, is proposed to address this problem. We design the algorithms with MapReduce to gain high scalability by performing data-parallel computation in the cloud. Extensive experiments on real-life data sets demonstrate that our approach significantly improves the capability of defending against proximity privacy breaches, as well as the scalability and time efficiency of local-recoding anonymization, over existing approaches.
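Since the abstract describes its first phase only as "similar to k-means", here is a plain one-dimensional k-means sketch for intuition about that building block. It is not the paper's t-ancestors or proximity-aware agglomerative algorithm; the function name and parameters are our own.

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Plain 1-D k-means: alternate nearest-center assignment and mean update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)       # pick k distinct points as initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                  # assignment step: nearest center wins
            clusters[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        centers = [sum(c) / len(c) if c else centers[i]   # update step: cluster means
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Two well-separated groups around 1.0 and 9.5.
print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.0], k=2))
```

The paper's contribution is, in part, making this kind of iterative clustering scale by expressing each assignment/update round as data-parallel MapReduce jobs.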
Scale and Object Aware Image Thumbnailing
"... Abstract In this paper we study effective approaches to create thumbnails from input images. Since a thumbnail will eventually be presented to and perceived by a human visual system, a thumbnailing algorithm should consider several important issues in the process including thumbnail scale, object co ..."
Abstract
In this paper we study effective approaches to creating thumbnails from input images. Since a thumbnail will eventually be presented to and perceived by a human visual system, a thumbnailing algorithm should consider several important issues in the process, including thumbnail scale, object completeness and local structure smoothness. To address these issues, we propose a new thumbnailing framework named Scale and Object Aware Thumbnailing (SOAT), which contains two components focusing respectively on saliency measurement and thumbnail warping/cropping. The first component, named Scale and Object Aware Saliency (SOAS), models the human perception of thumbnails using visual acuity theory, which takes thumbnail scale into consideration. In addition, the “objectness” measure (Alexe et al.) is integrated into SOAS so as to preserve object completeness. The second component uses SOAS to guide the thumbnailing based on either retargeting or cropping. The retargeting version uses Thin-Plate-Spline (TPS) warping to preserve structure smoothness; an extended seam carving algorithm is developed to sample the control points used for TPS model estimation. The cropping version searches for a cropping window that balances spatial efficiency and SOAS-based content preservation. The proposed algorithms were evaluated in three experiments: a quantitative user study of thumbnail browsing efficiency, a quantitative user study of subject preference, and a qualitative study on the RetargetMe dataset. In all studies, SOAT demonstrated promising performance in comparison with state-of-the-art algorithms.
MapReduce Based Location Selection Algorithm for Utility Maximization with Capacity Constraints
"... Given a set of facility objects and a set of client objects, where each client is served by her nearest facility and each facility is constrained by a service capacity, we study how to find all the locations on which if a new facility with a given capacity is established, the number of served client ..."
Abstract
Given a set of facility objects and a set of client objects, where each client is served by her nearest facility and each facility is constrained by a service capacity, we study how to find all locations at which, if a new facility with a given capacity were established, the number of served clients would be maximized (in other words, the utility of the facilities would be maximized). This problem is intrinsically difficult. An existing algorithm with exponential complexity is not scalable and cannot handle this problem on large data sets. Therefore, we propose to solve the problem through parallel computing, in particular using MapReduce. We propose an arc-based method to divide the search space into disjoint partitions. For load balancing, we propose a dynamic strategy that assigns partitions to reduce tasks so that the estimated load difference is within a threshold. We conduct extensive experiments using both real and synthetic data sets of large sizes. The results demonstrate the efficiency and scalability of the algorithm.