Massively Parallel Solutions for Molecular Sequence Analysis
- Proc. 1st IEEE Int. Workshop on High Performance Computational Biology, Ft
, 2002
Cited by 19 (9 self)
In this paper we present new approaches to high performance protein database scanning on two novel massively parallel architectures to gain supercomputer power at low cost. The first architecture is built around a Beowulf PC cluster linked by a high-speed network and fine-grained parallel Systola 1024 processor boards connected to each node. The second architecture is the Fuzion 150, a new parallel computer with a linear SIMD array of 1536 processing elements on a single chip. We present the design of a database scanning application based on the Smith-Waterman algorithm in order to derive efficient mappings onto these architectures. The implementations lead to significant runtime savings for large-scale database scanning. This result shows that both architectures provide high-throughput sequence similarity analysis solutions at a good price/performance ratio.
Hyper customized processors for bio-sequence database scanning on FPGAs
- In Proc. of ACM/SIGDA 13th Int’l Symp. on Field-Programmable Gate Arrays
, 2005
Cited by 11 (2 self)
Protein sequences with unknown functionality are often compared to a set of known sequences to detect functional similarities. Efficient dynamic-programming algorithms exist for solving this problem; however, current solutions still require significant scan times. These scan time requirements are likely to become even more severe due to exponential database growth. In this paper we present a new approach to bio-sequence database scanning using reconfigurable FPGA-based hardware platforms to gain high performance at low cost. Efficient mappings of the Smith-Waterman algorithm using fine-grained parallel processing elements (PEs) that are tailored towards the parameters of a query have been designed. We use customization opportunities available at run-time to dynamically hyper-customize the systolic array to make better use of available resources. Our FPGA implementation achieves a speedup of approximately 170 for linear gap penalties and 125 for affine gap penalties as compared to a standard desktop computing platform. We show how hyper-customization at run-time can be used to further improve the performance.
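Note: the database scanning described in the abstracts above rests on the Smith-Waterman recurrence. The following is a minimal software sketch of that recurrence with a linear gap penalty, not the hardware mapping from either paper. The function names and the match, mismatch and gap values are illustrative assumptions; real protein scans would normally score residue pairs with a substitution matrix instead of a fixed match/mismatch pair.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score under a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the papers.
    """
    prev = [0] * (len(subject) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # first column is always 0 for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def scan_database(query, database):
    """Score the query against every sequence and rank by descending score.

    `database` is a hypothetical dict mapping sequence names to sequences.
    """
    scored = [(smith_waterman_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)
```

Each cell depends only on its upper, left and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time. That independence is what fine-grained arrays of processing elements exploit; the sequential loops here only show the arithmetic.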
Accelerating the Kernels of BLAST with an Efficient PIM (Processor-In-Memory) Architecture
- In 3rd International IEEE Computer Society Computational Systems Bioinformatics Conference
, 2004
Cited by 2 (0 self)
BLAST is a widely used tool to search for similarities in protein and DNA sequences. However, the kernels of BLAST are not efficiently supported by general-purpose processors because of the special computational requirements of the kernels. The kernels involve large amounts of computation with a high degree of potential parallelism that general-purpose processors can only exploit to a very limited extent. The kernels handle operands that are small (one byte) and not efficiently manipulated by general-purpose processors. The kernels entail only simple operations, whereas current general-purpose processors expend a significant proportion of their chip area to support complex operations, such as floating-point operations. The kernels perform a large number of memory accesses, which translates into severe penalties. In this paper, we propose an efficient PIM (Processor-In-Memory) architecture to effectively execute the kernels of BLAST. We propose not only to reduce the memory latencies and increase the memory bandwidth but also to execute the operations inside the memory where the data are located. We also propose to execute the operations in parallel by dividing the memory into small segments and by having each of these segments execute operations concurrently. Our simulation results show that our computing paradigm provides a 242× performance improvement for the execution of the kernels and a 12× performance improvement for the overall execution of BLAST.
A performance study of load balancing strategies for approximate string matching on an MPI heterogeneous system environment
- In Proc. of the 9th Euro PVM/MPI 2002 Conference, number 2474 in Lecture Notes in Computer Science
, 2002
Cited by 1 (1 self)
In this paper, we present three parallel approximate string matching methods on a parallel architecture with heterogeneous workstations to gain supercomputer power at low cost. The first method is the static master-worker with uniform distribution strategy, the second one is the dynamic master-worker with allocation of subtexts and the third one is the dynamic master-worker with allocation of text pointers. Further, we propose a hybrid parallel method that combines the advantages of static and dynamic parallel methods in order to reduce the load imbalance and communication overhead. This hybrid method is based on the following optimal distribution strategy: the text collection is distributed in proportion to each workstation's speed. We evaluated the performance of the four methods on clusters of 1, 2, 4, 6 and 8 heterogeneous workstations. The experimental results demonstrate that the dynamic allocation of text pointers and hybrid methods achieve better performance than the two original ones.
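Note: the distribution strategy named in the abstract above (text shared out in proportion to workstation speed) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `proportional_shares` is a hypothetical helper, speeds are assumed to be positive integers (for example, benchmark scores), and the rule for handing out the rounding remainder is a choice made here.

```python
def proportional_shares(total_items, speeds):
    """Split `total_items` text units across workers in proportion to speed.

    `speeds` is a list of positive integers, one per workstation (assumed).
    Returns a list of item counts that sums exactly to `total_items`.
    """
    total_speed = sum(speeds)
    # Integer floor of each worker's ideal share.
    shares = [total_items * s // total_speed for s in speeds]
    # Flooring can leave a few items unassigned (fewer than len(speeds)).
    leftover = total_items - sum(shares)
    # Assumed tie-break: give the remainder to the fastest workers first.
    fastest_first = sorted(range(len(speeds)),
                           key=lambda i: speeds[i], reverse=True)
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares
```

For instance, `proportional_shares(10, [1, 2])` assigns 3 units to the slower workstation and 7 to the faster one. A static split of this kind needs only one round of communication, which is the overhead that a hybrid scheme would try to keep low.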

