Results 1 - 10 of 74
Web-Scale Distributional Similarity and Entity Set Expansion
"... Computing the pairwise semantic similarity between all words on the Web is a computationally challenging task. Parallelization and optimizations are necessary. We propose a highly scalable implementation based on distributional similarity, implemented in the MapReduce framework and deployed over a 2 ..."
Cited by 41 (0 self)
200 billion word crawl of the Web. The pairwise similarity between 500 million terms is computed in 50 hours using 200 quad-core nodes. We apply the learned similarity matrix to the task of automatic set expansion and present a large empirical study to quantify the effect on expansion performance
Experiences in tuning performance of hybrid MPI/OpenMP applications on quad-core systems
- In Proc. of 10th LCI Int’l Conference on High-Performance Clustered Computing, 2009
"... Abstract. The Hybrid method of parallelization (using MPI for inter-node communication and OpenMP for intra-node communication) seems a natural fit for the way most clusters are built today. It is generally expected to help programs run faster due to factors like availability of greater bandwidth f ..."
Cited by 2 (0 self)
PERFORMANCE ANALYSIS OF MESSAGE PASSING INTERFACE COLLECTIVE COMMUNICATION ON INTEL XEON QUAD-CORE GIGABIT ETHERNET AND INFINIBAND CLUSTERS
"... The performance of MPI implementation operations still presents critical issues for high performance computing systems, particularly for more advanced processor technology. Consequently, this study concentrates on benchmarking MPI implementation on multi-core architecture by measuring the performanc ..."
the performance of Open MPI collective communication on Intel Xeon dual quad-core Gigabit Ethernet and InfiniBand clusters using SKaMPI. It focuses on well known collective communication routines such as MPI-Bcast, MPI-AlltoAll, MPI-Scatter and MPI-Gather. From the collection of results, MPI collective
HPCC Randomaccess Benchmark For Next Generation Supercomputers
- In Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009
"... In this paper we examine the key elements determining the performance of the HPC Challenge RandomAccess benchmark on next generation supercomputers. We find that the performance of this benchmark is closely related to the bisection bandwidth of the underlying communication network, performance of i ..."
Using Processor Partitioning to Evaluate the Performance of MPI, OpenMP and Hybrid Parallel Applications on Dual- and Quad-core Cray XT4 Systems
"... Abstract: Chip multiprocessors (CMP) are widely used for high performance computing. While this presents significant new opportunities, such as on-chip high inter-core bandwidth and low inter-core latency, it also presents new challenges in the form of inter-core resource conflict and contention. A ..."
Cited by 2 (2 self)
to analyze and compare the performance of MPI, OpenMP and hybrid parallel applications on two dual- and quad-core Cray XT4 systems, Jaguar with quad-core at Oak Ridge National Laboratory (ORNL) and Franklin with dual-core at the DOE National Energy Research Scientific Computing Center (NERSC). We conduct
Variable Nodes
"... We present a highly optimized modification of the Sum-Product Algorithm for LDPC decoding for CPUs which achieves the same decoding properties as the original algorithm but offers a throughput on a CPU that is comparable to GPU implementations using hundreds of GPU cores. To achieve this improvement ..."
SANDIA REPORT: Performance of an MPI-only Semiconductor Device Simulator on a Quad Socket/Quad Core InfiniBand Platform
"... Abstract This preliminary study considers the scaling and performance of a finite element (FE) semiconductor device simulator on a capacity cluster with 272 compute nodes based on a homogeneous multicore node architecture utilizing 16 cores. The inter-node communication backbone for this Tri-Lab Li ..."
-Lab Linux Capacity Cluster (TLCC) machine is comprised of an InfiniBand interconnect. The nonuniform memory access (NUMA) nodes consist of 2.2 GHz quad socket/quad core AMD Opteron processors. The performance results for this study are obtained with a FE semiconductor device simulation code (Charon
Communicated by Guest Editors, 2008
"... In this work we present an initial performance evaluation of Intel's latest, second-generation quad-core processor, Nehalem, and provide a comparison to first-generation AMD and Intel quad-core processors Barcelona and Tigerton. Nehalem is the first Intel processor to implement a NUMA architectu ..."
analysis of intra-processor and intra-node scalability of microbenchmarks, and a range of large-scale scientific applications, indicates that quad-core processors can deliver an improvement in performance of up to 4x over a single core depending on the workload being processed. However, scalability can
Designing An Efficient Kernel-level and User-level Hybrid Approach for MPI Intra-node Communication on Multi-core Systems
"... The emergence of multi-core processors has made MPI intra-node communication a critical component in high performance computing. In this paper, we use a three-step methodology to design an efficient MPI intra-node communication scheme from two popular approaches: shared memory and OS kernel-assisted ..."
Cited by 10 (0 self)
-assisted direct copy. We use an Intel quad-core cluster for our study. We first run microbenchmarks to analyze the advantages and limitations of these two approaches, including the impacts of processor topology, communication buffer reuse, process skew effects, and L2 cache utilization. Based on the results
“Application Performance under Different XT Operating Systems,” in Cray User Group, May 2008
- in Proceedings of the USENIX Technical Conference, Winter, 1993
"... Catamount (XT3/Red Storm’s Light Weight Kernel) to support multiple CPUs per node on XT systems while Cray has developed Compute Node Linux (CNL) which also supports multiple CPUs per node. This paper presents results from several applications run under both operating systems including preliminary r ..."