Parallel Spectral Clustering in Distributed Systems
Cited by 22 (0 self)
Spectral clustering algorithms have been shown to be more effective at finding clusters than some traditional algorithms such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach, which sparsifies the matrix, with another based on the Nyström method. We then pick the strategy of sparsifying the matrix by retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through
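The nearest-neighbour sparsification the abstract settles on can be sketched as follows. This is a minimal sequential illustration, not the authors' distributed implementation; it assumes a Gaussian similarity exp(-d^2 / (2 sigma^2)) and uses a hypothetical helper name, knn_sparse_similarity. Each point keeps only its t nearest neighbours, and an edge is kept if either endpoint selects the other, so the result stays symmetric and stores O(nt) entries instead of n^2.

```python
import math


def knn_sparse_similarity(points, t, sigma=1.0):
    """Return a symmetric sparse similarity matrix as {(i, j): weight}.

    Hypothetical helper: a Gaussian kernel is assumed, and only each
    point's t nearest neighbours are retained.
    """
    n = len(points)
    sparse = {}
    for i in range(n):
        # Distances from point i to every other point (one row of the dense matrix).
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:t]:
            w = math.exp(-d * d / (2 * sigma * sigma))
            # Store both directions: the edge survives if either endpoint selects it.
            sparse[(i, j)] = w
            sparse[(j, i)] = w
    return sparse


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
S = knn_sparse_similarity(points, t=1)
print(sorted(S))  # [(0, 1), (1, 0), (2, 3), (3, 2)]
```

Each row's neighbour search depends only on one point and the data set, so the outer loop is the natural unit to distribute across machines; the symmetrization step then requires exchanging edges between the machines that own rows i and j.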
“Heuristics for Work Distribution of a Homogeneous Parallel Dynamic Programming Scheme on Heterogeneous Systems”, Proceedings of the 3rd International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks (HeteroPar’04)
MS in mathematics and engineering from the Moscow Aviation Institute in 1980 and his PhD in engineering from the Heavy Machinery Research Institute in
Cited by 9 (3 self)
Abstract — In this paper, the possibility of including automatic optimization techniques in the design of parallel dynamic programming algorithms for heterogeneous systems is analyzed. The main idea is to automatically approach the optimal values of several algorithmic parameters (number of processes, number of processors, and processes per processor) and thus obtain low execution times. Users could then be provided with routines that execute efficiently regardless of their experience with heterogeneous computing and dynamic programming, and that adapt automatically to a new network of processors or a new network configuration.
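The idea of automatically approaching the optimal number of processes, processors, and processes per processor can be illustrated with an exhaustive search over a cost model. Everything here is an assumption made for illustration: the analytical model in modeled_time, the relative processor speeds, the linear communication cost, and the function names are hypothetical and are not the paper's model or heuristics.

```python
import itertools


def modeled_time(n, procs_per_processor, speeds, comm_cost):
    """Hypothetical cost model for one parallel dynamic programming run.

    n units of work are split equally over all processes; each processor
    runs its processes one after another at its relative speed; the
    communication cost grows linearly with the total number of processes.
    """
    total = sum(procs_per_processor)
    work_per_process = n / total
    compute = max(
        p * work_per_process / s
        for p, s in zip(procs_per_processor, speeds)
        if p > 0
    )
    return compute + comm_cost * total


def autotune(n, speeds, comm_cost, max_per_processor=4):
    """Search every processes-per-processor setting; a processor given 0
    processes is simply not used. Returns (config, modeled time)."""
    best = None
    for config in itertools.product(range(max_per_processor + 1), repeat=len(speeds)):
        if sum(config) == 0:
            continue  # at least one process is needed
        t = modeled_time(n, config, speeds, comm_cost)
        if best is None or t < best[1]:
            best = (config, t)
    return best


# Two processors, the first twice as fast as the second.
print(autotune(120, speeds=[2.0, 1.0], comm_cost=1.0))  # ((2, 1), 43.0)
```

In this toy model the faster processor receives two processes and the slower one a single process, which balances their compute time. Exhaustive search is only feasible for a handful of processors; for larger networks a heuristic search over the same parameters would take its place.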
Scalable Node Level Computation Kernels for Parallel Exact Inference
Cited by 7 (6 self)
In this paper, we investigate data parallelism in exact inference with respect to arbitrary junction trees. Exact inference is a key problem in exploring probabilistic graphical models, where the computational complexity increases dramatically with the clique width and the number of states of the random variables. We study potential table representation and scalable algorithms for node-level primitives. Based on these node-level primitives, we propose computation kernels for evidence collection and evidence distribution. A data-parallel algorithm for exact inference is presented using the proposed computation kernels. We analyze the scalability of the node-level primitives, the computation kernels, and the exact inference algorithm using the coarse-grained multicomputer (CGM) model. According to the analysis, we achieve $O\left(N d_C w_C \prod_{j=1}^{w_C} r_{C,j} / P\right)$ local computation time and $O(N)$ global communication rounds using $P$ processors, $1 \le P \le \max_C \prod_{j=1}^{w_C} r_{C,j}$, where $N$ is the number of cliques in the junction tree; $d_C$ is the clique degree; $r_{C,j}$ is the number of states of the $j$th random variable in $C$; $w_C$ is the clique width; and $w_s$ is the separator width. We implemented the proposed algorithm on state-of-the-art clusters. Experimental results show that the proposed algorithm exhibits almost linear scalability over a wide range.
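Marginalization, which sums a clique's potential table down to a separator during evidence propagation, is a typical node-level primitive. The sketch below assumes a flat, row-major potential table (the paper's actual representation may differ), splits its entries into contiguous chunks, one per simulated worker, and then combines the partial results in a reduction step. Each worker touches roughly the product of the state counts divided by P entries, with index arithmetic proportional to the clique width per entry. Workers run sequentially here, and the function name is hypothetical.

```python
def marginalize(table, cards, keep, num_workers=4):
    """Sum a flat potential table over every variable not listed in `keep`.

    table: flat list, row-major over variables 0..w-1 with cardinalities `cards`.
    keep:  variable indices to keep (e.g. the separator), in output order.
    """
    out_size = 1
    for v in keep:
        out_size *= cards[v]
    # Data parallelism: each worker owns one contiguous chunk of table entries.
    chunk = -(-len(table) // num_workers)  # ceiling division
    partials = []
    for w in range(num_workers):
        partial = [0.0] * out_size
        for idx in range(w * chunk, min((w + 1) * chunk, len(table))):
            # Decode the flat index into per-variable states (mixed radix).
            states, rem = [0] * len(cards), idx
            for v in range(len(cards) - 1, -1, -1):
                states[v] = rem % cards[v]
                rem //= cards[v]
            # Encode the kept variables' states as an index into the output table.
            out_idx = 0
            for v in keep:
                out_idx = out_idx * cards[v] + states[v]
            partial[out_idx] += table[idx]
        partials.append(partial)
    # Reduction: combine the per-worker partial tables.
    return [sum(p[i] for p in partials) for i in range(out_size)]


# Two variables with 2 and 3 states; entry index = a * 3 + b.
print(marginalize([1, 2, 3, 4, 5, 6], cards=[2, 3], keep=[1]))  # [5.0, 7.0, 9.0]
```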
Ranking and Semi-Supervised Classification on Large-Scale Graphs Using MapReduce
Cited by 7 (0 self)
Label Propagation, a standard algorithm for semi-supervised classification, suffers from scalability issues involving memory and computation when used with large-scale graphs from real-world datasets. In this paper, we approach Label Propagation as the solution to a system of linear equations, which can be implemented as a scalable parallel algorithm using the MapReduce framework. In addition to semi-supervised classification, this approach to Label Propagation allows us to adapt the algorithm for ranking on graphs and to derive the theoretical connection between Label Propagation and PageRank. We provide empirical evidence to that effect using two natural language tasks: lexical relatedness and polarity induction. The version of the Label Propagation algorithm presented here scales linearly in the size of the data with a constant main-memory requirement, in contrast to the quadratic cost of both in traditional approaches.
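The linear-system view can be illustrated with a Jacobi-style iteration in which every step is one map phase followed by one reduce phase. This is a minimal in-memory simulation, not Hadoop code and not necessarily the paper's exact formulation: it assumes an undirected weighted graph stored as adjacency dictionaries, seed labels that stay fixed, and unlabeled scores replaced by the weighted average of their neighbours' scores. The function names are hypothetical.

```python
from collections import defaultdict


def map_phase(graph, scores):
    """Map: every node emits its weighted score to each neighbour."""
    for u, edges in graph.items():
        for v, w in edges.items():
            yield v, (w * scores[u], w)


def reduce_phase(pairs, scores, seeds):
    """Reduce: each node averages the weighted scores it received.

    Seed nodes keep their fixed labels.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for v, (weighted, w) in pairs:
        sums[v][0] += weighted
        sums[v][1] += w
    new_scores = dict(scores)
    for v, (weighted, w) in sums.items():
        if v not in seeds and w > 0:
            new_scores[v] = weighted / w
    return new_scores


def label_propagation(graph, seeds, iterations=50):
    scores = {v: seeds.get(v, 0.0) for v in graph}
    for _ in range(iterations):
        scores = reduce_phase(map_phase(graph, scores), scores, seeds)
    return scores


# Chain a - b - c - d with a labelled 1 and d labelled 0.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
scores = label_propagation(graph, seeds={"a": 1.0, "d": 0.0})
print(round(scores["b"], 4), round(scores["c"], 4))  # 0.6667 0.3333
```

Each reducer only needs the messages addressed to one node, so memory per node does not depend on the size of the graph; the fixed point is the solution of the linear system b = (1 + c) / 2, c = b / 2.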
Automatic Task Reorganization in MapReduce
Cited by 2 (2 self)
Abstract—MapReduce is increasingly considered a useful parallel programming model for large-scale data processing. It exploits parallelism among executions of primitive map and reduce operations. Hadoop is an open-source implementation of MapReduce that has been used in both academic research and industrial production. However, its implementation strategy, in which one map task processes exactly one data block, limits the degree of concurrency and degrades performance because available resources cannot be fully utilized. In addition, its assumption that task execution times within each phase do not vary much does not always hold, which makes speculative execution useless. In this paper, we present mechanisms that dynamically split and consolidate tasks to improve load balancing and to break through the concurrency limit imposed by fixed task granularity. For single-job systems, two algorithms are proposed: one for when prior knowledge is available and one for when it is not. For the multi-job case, we propose a modified shortest-job-first strategy, which theoretically minimizes job turnaround time when combined with task splitting. We compared the effectiveness of our approach with the default task scheduling strategy using both synthesized and trace-based workloads. Simulation results show that our approach improves performance significantly.
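Two ideas from the abstract, splitting coarse tasks so that all slots stay busy and ordering jobs shortest-first, can be illustrated with deliberately simplified models. This sketch is not the paper's scheduler: the slot model, the greedy assignment, the known job lengths, and the names split_tasks, makespan, mean_turnaround, and sjf_order are all assumptions. On a single resource, running jobs shortest-first minimizes mean completion time, a classic scheduling result.

```python
import heapq


def split_tasks(task_sizes, max_size):
    """Split every task larger than max_size into pieces of at most max_size."""
    pieces = []
    for size in task_sizes:
        while size > max_size:
            pieces.append(max_size)
            size -= max_size
        if size > 0:
            pieces.append(size)
    return pieces


def makespan(tasks, slots):
    """Greedy list scheduling: each task goes to the currently least-loaded slot."""
    loads = [0] * slots
    heapq.heapify(loads)
    for size in tasks:
        # Pop the least-loaded slot and push it back with the new task added.
        heapq.heapreplace(loads, loads[0] + size)
    return max(loads)


def mean_turnaround(job_lengths, order):
    """Jobs run back to back; a job's turnaround is its completion time."""
    clock, total = 0, 0
    for j in order:
        clock += job_lengths[j]
        total += clock
    return total / len(job_lengths)


def sjf_order(job_lengths):
    """Shortest job first: indices sorted by job length."""
    return sorted(range(len(job_lengths)), key=lambda j: job_lengths[j])


print(makespan([12, 2, 2], slots=4))                  # 12
print(makespan(split_tasks([12, 2, 2], 3), slots=4))  # 5
jobs = [5, 1, 3]
print(mean_turnaround(jobs, [0, 1, 2]), mean_turnaround(jobs, sjf_order(jobs)))
```

With one fixed 12-unit task, three of the four slots sit nearly idle; splitting it into 3-unit pieces spreads the work, and the shortest-first order lowers the mean turnaround from 20/3 to 14/3 in the second example.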
Anomaly Localization in Large-Scale Clusters
Cited by 1 (0 self)
Abstract — A critical problem in managing large-scale clusters is identifying the location of problems in the system when unusual events occur. As the scale of high-performance computing (HPC) grows, systems are getting bigger. When a system fails to function properly, health-related data are collected for troubleshooting. However, because of the massive quantity of information obtained from a large number of components, the root causes of anomalies are often buried like needles in a haystack. In this paper, we present a localization method that automatically identifies the potential root causes (i.e., a subset of nodes) of a problem from the overwhelming amount of data collected system-wide. System managers can then focus on examining these potential locations, thereby significantly reducing the human effort required for anomaly localization. Our method consists of three interrelated steps: (1) feature collection, to assemble a feature space for the system; (2) feature extraction, to obtain the most significant features for efficient data analysis by applying principal component analysis (PCA); and (3) outlier detection, to quickly identify the nodes that are “far away” from the majority by using the cell-based detection algorithm. Preliminary studies are presented to demonstrate the potential of our method for localizing anomalies in a computing environment where the nodes perform comparable tasks.
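Steps (2) and (3) can be sketched on a toy feature matrix with one row per node. This is an illustrative sketch, not the authors' code: it keeps only the first principal component, found by power iteration on the covariance matrix, and then flags distance-based DB(p, D) outliers, meaning nodes for which at least a fraction p of the other nodes lie farther than D away. The cell-based algorithm named in the abstract is commonly described as a grid-based speed-up for finding exactly these distance-based outliers; the naive O(n^2) check below applies the same definition more slowly. The thresholds p and D and the function names are hypothetical.

```python
import math


def first_principal_scores(rows, iters=200):
    """Project centred feature rows onto the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in X) / n for b in range(d)] for a in range(d)]
    # Power iteration converges to the covariance matrix's dominant eigenvector.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0:
            break  # all rows identical: no variance to extract
        v = [t / norm for t in w]
    return [sum(x[j] * v[j] for j in range(d)) for x in X]


def db_outliers(scores, p, D):
    """Indices of nodes with at least a fraction p of the others farther than D."""
    n = len(scores)
    outliers = []
    for i in range(n):
        far = sum(1 for j in range(n) if j != i and abs(scores[i] - scores[j]) > D)
        if far >= p * (n - 1):
            outliers.append(i)
    return outliers


# Hypothetical per-node features (e.g. CPU load, memory use); node 4 misbehaves.
rows = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [10.0, 10.0]]
print(db_outliers(first_principal_scores(rows), p=0.9, D=2.0))  # [4]
```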
Parallel computing has become pervasive, and the number of processors in computers will increase further in the future. However, software developers are struggling to exploit the computational resources provided by parallel architectures efficiently. It is therefore essential to investigate the behaviour of parallel programs and to develop methods that help improve their performance. Software developers should not have to worry about how to map parallelism onto the underlying architecture; instead, they should concentrate on exposing parallelism and leave the mapping to the runtime system. In this project, the behaviour of parallel programs in the presence of other workloads is investigated. It is shown that choosing the right number of threads for an application is crucial for achieving the best possible performance when other workloads are running on the system. The default policy of creating as many threads as there are cores is rarely optimal in this situation, and using the optimal number of threads reduces the runtime by 22.5% on average relative to the default policy. Determining the optimal number
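Comparing a tuned thread count against the one-thread-per-core default can be sketched as a measure-and-pick loop. This is an empirical search under stated assumptions, not the project's method for determining the optimal number (that part of the text is cut off): the workload, the candidate range, and the helper names time_with_threads and pick_thread_count are hypothetical.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def time_with_threads(fn, items, threads):
    """Wall-clock time to apply fn to every item using `threads` worker threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(fn, items))
    return time.perf_counter() - start


def pick_thread_count(measure, candidates, default):
    """Measure each candidate thread count and compare the best with the default.

    Returns (best thread count, % runtime reduction relative to the default).
    """
    timings = {t: measure(t) for t in candidates}
    if default not in timings:
        timings[default] = measure(default)
    best = min(timings, key=timings.get)
    reduction = 100.0 * (timings[default] - timings[best]) / timings[default]
    return best, reduction


cores = os.cpu_count() or 1  # default policy: one thread per core
best, gain = pick_thread_count(
    lambda t: time_with_threads(lambda x: sum(range(x)), [20_000] * 50, t),
    candidates=range(1, 2 * cores + 1),
    default=cores,
)
print(best, f"{gain:.1f}%")
```

In CPython, pure-Python CPU-bound work is serialized by the global interpreter lock, so the toy workload above mostly measures threading overhead; the same search is meant for native or I/O-bound workloads, and in practice each timing would be repeated while the competing workload is running to reduce noise.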
A Distributed Algorithm for Web Content Replication
Abstract—Web caching and replication techniques increase the accessibility of Web content and reduce Internet bandwidth requirements. In this paper, we consider the replica placement problem in a distributed replication group. The replication group consists of servers, each dedicating a certain amount of memory to replicating objects. The replica placement problem is to place replicas at the servers within the replication group such that the access time over all objects and servers is minimized. We design a distributed 2-approximation algorithm that solves this optimization problem. We show that the communication and computational complexity of the algorithm is polynomial in the number of servers and objects. We perform simulation experiments to investigate the performance of our algorithm.
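To make the optimization problem concrete, the sketch below evaluates total access time and fills server memory with a centralized greedy heuristic. It is not the paper's distributed 2-approximation algorithm and carries no approximation guarantee; it assumes unit-size objects, a per-server request rate for each object, and a hypothetical three-tier access-time model (local replica, replica on another group server, or the origin).

```python
def access_cost(placement, demand, t_local, t_group, t_origin):
    """Total access time: sum over servers i and objects k of rate * latency."""
    total = 0
    for i, row in enumerate(demand):
        for k, rate in enumerate(row):
            if k in placement[i]:
                total += rate * t_local
            elif any(k in placement[j] for j in range(len(placement)) if j != i):
                total += rate * t_group
            else:
                total += rate * t_origin
    return total


def greedy_placement(demand, capacity, t_local, t_group, t_origin):
    """Repeatedly add the (server, object) replica that lowers the cost most."""
    n, m = len(demand), len(demand[0])
    placement = [set() for _ in range(n)]
    current = access_cost(placement, demand, t_local, t_group, t_origin)
    while True:
        best = None
        for i in range(n):
            if len(placement[i]) >= capacity[i]:
                continue  # server memory is full (unit-size objects assumed)
            for k in range(m):
                if k in placement[i]:
                    continue
                placement[i].add(k)  # tentatively place the replica
                cost = access_cost(placement, demand, t_local, t_group, t_origin)
                placement[i].discard(k)
                if cost < current and (best is None or cost < best[0]):
                    best = (cost, i, k)
        if best is None:
            return placement, current
        current, i, k = best
        placement[i].add(k)


placement, cost = greedy_placement(
    demand=[[10, 1], [1, 10]], capacity=[1, 1], t_local=1, t_group=5, t_origin=20
)
print(placement, cost)  # [{0}, {1}] 30
```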
Parallel Algorithms for Large-Scale Macroeconomic Models
2007
Macroeconometric models with forward-looking variables give rise to very large systems of equations that require heavy computation. These models were influenced by the development of new and efficient computational techniques, and they are an interesting testing ground for the numerical methods addressed in this research. The most difficult problem in solving such models is obtaining the solution of the linear system that arises during the Newton step. For this purpose, we have used both direct methods based on matrix factorization and nonstationary iterative methods, also called Krylov methods, which provide an interesting alternative to direct methods. In this paper, we present performance results for both serial and parallel versions of the algorithms involved in solving these models. Although parallel implementation of most dense linear algebra operations is a well-understood process, the availability of general-purpose, high-performance parallel dense linear algebra libraries is limited by the complexity of implementation. This paper describes PLSS (Parallel Linear System Solver), a library that provides routines for linear system solving with an easy-to-use interface that mirrors the natural description of sequential linear algebra algorithms.
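The structure described above, a Newton iteration whose main cost is the linear solve J(x) dx = -F(x) at each step, can be sketched on a toy system. This is a serial illustration only, not PLSS and not a macroeconometric model: the two-equation system is hypothetical, and solve_direct is plain Gaussian elimination with partial pivoting standing in for a factorization-based direct method. A Krylov method such as GMRES could replace solve_direct without changing the outer loop.

```python
def solve_direct(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (a direct method)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def newton(F, J, x, tol=1e-12, max_iter=50):
    """Newton's method: at each step solve J(x) dx = -F(x), then x <- x + dx."""
    for _ in range(max_iter):
        fx = F(x)
        if max(abs(v) for v in fx) < tol:
            break
        dx = solve_direct(J(x), [-v for v in fx])
        x = [xi + di for xi, di in zip(x, dx)]
    return x


def F(v):
    # Toy system: x^2 + y^2 = 4 and x = y.
    return [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]]


def J(v):
    # Jacobian of F.
    return [[2 * v[0], 2 * v[1]], [1.0, -1.0]]


print(newton(F, J, [1.0, 1.0]))  # both entries close to sqrt(2) ≈ 1.41421356
```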
High-Performance Record Linkage
Today, the immense size of data sets makes it difficult to find similar or identical data. In addition, dirty data, i.e., typos, missing or tilting information, and other noise usually caused by careless editing or entry mistakes, makes it even harder to identify which entity each record belongs to. Therefore, we focus on faster detection of data referring to the same real-world entity in a large data set under error-prone conditions, while maintaining high detection accuracy. In this thesis, we study high-performance linkage algorithms using four different applications. First, we introduce an image linkage algorithm that finds near-duplicate images with similar characteristics by bridging two seemingly unrelated fields: Multimedia Information Retrieval and Biology. Under this idea, we study how various image features and gene sequence generation methods affect the accuracy and performance of detecting near-duplicate images. Second, we develop a video linkage algorithm that uses record linkage methods to detect copied videos in a large multimedia database or on sites such as YouTube and Yahoo Videos. The use of video characteristics is reflected in the hierarchical structure of
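The basic record linkage task, finding records that refer to the same entity despite typos, can be sketched with two common ingredients: blocking, so that only records sharing a cheap key are compared, and a normalized edit-distance similarity that tolerates small errors. This is a generic textbook illustration, not the thesis's image or video linkage algorithms; the blocking key (first character), the 0.8 threshold, and the function names are hypothetical.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """1.0 for identical strings, falling towards 0.0 as edits accumulate."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def link_records(records, threshold=0.8, block_key=lambda r: r[:1]):
    """Return index pairs of records judged to refer to the same entity."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec.lower())].append(idx)
    matches = []
    for ids in blocks.values():
        # Only records in the same block are compared, avoiding all n^2 pairs.
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                i, j = ids[x], ids[y]
                if similarity(records[i].lower(), records[j].lower()) >= threshold:
                    matches.append((i, j))
    return matches


records = ["John Smith", "Jon Smith", "Mary Jones", "Jane Doe"]
print(link_records(records))  # [(0, 1)]
```

Blocking trades recall for speed: a typo in the blocking key itself puts two true matches in different blocks, which is why practical systems often use several keys.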