Accelerating Multiple Sequence Alignment with the Cell BE Processor (2009)
Advance Access publication on September 11, 2009. doi:10.1093/comjnl/bxp086
Citations
6487 |
The neighbor-joining method: a new method for reconstructing phylogenetic trees
- Saitou, Nei
- 1987
Citation Context ...h–Waterman algorithm, is used to determine a good ordering of the sequences for the final progressive alignment phase. Hereto, a phylogenetic tree is constructed using the neighbor-joining algorithm =-=[7]-=-, in which the most closely related sequences are located on the same branch of a guide tree. In the third stage of the Clustal W algorithm, sequences are progressively aligned following the branching... |
5998 |
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research
- Thompson, Higgins, et al.
- 1994
Citation Context ...achieving high performance is difficult as these features are exposed to the programmer. In this paper, we investigate how to tap high performance from the Cell BE by porting and optimizing Clustal W =-=[2]-=-, a compute-intensive bio-informatics program. As the Cell BE’s performance levers are exposed to the programmer, we spent a lot of effort rewriting and restructuring the program. Apart from overlappi... |
2216 |
Identification of common molecular subsequences
- Smith, Waterman
- 1981
Citation Context ...umber of mutations that are necessary to align the sequences serves as a metric of divergence of the sequences. Two sequences are aligned using dynamic programming, using the Smith–Waterman algorithm =-=[5]-=-. This technique, however, does not scale to aligning multiple sequences, where finding a global optimum becomes NP-hard [6]. Therefore, a series of pairwise alignments is compared with each other, fo... |
148 |
The design and implementation of a first-generation CELL processor
- Pham
- 2005
Citation Context ...e number of slim cores. Furthermore, cores can be specialized to specific tasks, in which case the multi-core becomes a heterogeneous multi-core processor. The Cell Broadband Engine (BE) Architecture =-=[1]-=- is such a heterogeneous multi-core architecture targeted at compute-intensive workloads. The Cell BE has one superscalar processor (power processing element) and eight single instruction multiple data... |
91 | Synergistic processing in cell’s multicore architecture
- Gschwind, Flachs, et al.
Citation Context ...erPC instruction set with SIMD Multimedia instructions (VMX). It uses a cache coherent memory hierarchy with an L1 data and instruction cache of 32 KB and a unified L2 cache of 512 KB. The eight SPEs =-=[3, 4]-=- deliver the compute power of the Cell processor. These 128-bit in-order dual-issue vector processors can issue two SIMD instructions per cycle: one compute instruction and one memory instruction. The... |
85 |
Striped Smith-Waterman speeds database searches six times over other SIMD implementations
- Farrar
Citation Context ...plementations, important speed improvements can be obtained by making algorithmic changes. Liu et al. [20] propose improvements to the Smith–Waterman algorithm that avoid making the traceback. Farrar =-=[21]-=- proposes a different data layout of the profiles that improves the performance of vectorization. This change in data layout, however, percolates through the whole alignment program and may require ch... |
75 | Entering the petaflop era: the architecture and performance of Roadrunner
- Barker, Davis, et al.
- 2008
Citation Context ...cision floating-point performance of the Cell is a major downside for scientific applications. The upcoming generation of Cell BE are expected to have enhanced double precision removing this drawback =-=[14]-=-. A high-performance FFT is described in [15]. Heman et al. [16] port a relational database to the Cell BE. Only some database operations (such as projection, selection etc.) are executed on the SPUs.... |
44 | BioPerf: a benchmark suite to evaluate high-performance computer architecture on bioinformatics applications
- Bader, Li, et al.
- 2005
Citation Context ...inct versions of Clustal W, with each one building upon the previous version and adding optimizations to it. The baseline version of Clustal W (version 1.83) is taken from the BioPerf benchmark suite =-=[12]-=-. The programs are run with the B and C inputs from the same benchmark suite as these are proposed to benchmark this version of Clustal W. The B input has 66 sequences with average length of 1082 and ... |
38 | Computational complexity of multiple sequence alignment with SP-score
- Just
- 2001
Citation Context ...s are aligned using dynamic programming, using the Smith–Waterman algorithm [5]. This technique, however, does not scale to aligning multiple sequences, where finding a global optimum becomes NP-hard =-=[6]-=-. Therefore, a series of pairwise alignments is compared with each other, followed by a progressive alignment that adds the sequence most closely related to the already aligned sequences. It should be... |
35 |
On the Design and Analysis of Irregular Algorithms on the Cell Processor: A Case Study on List Ranking
- Bader, Agarwal, et al.
- 2007
Citation Context ...tion, selection etc.) are executed on the SPUs. The authors point out the importance of avoiding branches and of properly preparing the layout of data structures to enable vectorization. Bader et al. =-=[17]-=- develop a list ranking algorithm for the Cell BE. List ranking is a combinatorial application with highly irregular memory accesses. As memory accesses are hard to predict in this application, it is ... |
30 |
RAxML-cell: Parallel phylogenetic tree inference on the cell broadband engine
- Blagojevic, Nikolopoulos, et al.
- 2007
Citation Context ...st, the thread blocks and control switches to another thread. This results in a kind of software fine-grain multi-threading and yields speedups up to 8.4 times for this application. Blagojevic et al. =-=[18]-=- port a randomized axelerated maximum likelihood kernel for phylogenetic tree construction to the Cell BE. They use multiple levels of parallelism and implement a scheduler that selects at runtime bet... |
24 |
Streaming algorithms for biological sequence alignment on GPUs.
- Liu, Schmidt, et al.
- 2007
Citation Context ...tes the implementation of this idea on GPUs. It is well known that, aside from tuning algorithm implementations, important speed improvements can be obtained by making algorithmic changes. Liu et al. =-=[20]-=- propose improvements to the Smith–Waterman algorithm that avoid making the traceback. Farrar [21] proposes a different data layout of the profiles that improves the performance of vectorization. This... |
22 | Exploring the viability of the cell broadband engine for bioinformatics applications
- Sachdeva, Kistler, et al.
- 2007
Citation Context ...lism and implement a scheduler that selects at runtime between loop-level parallelism and task-level parallelism. Also, the Cell BE has been tested using bio-informatics applications. Sachdeva et al. =-=[19]-=- port the FASTA and Clustal W applications to the Cell BE. For Clustal W, they have only adapted the forward loop in the pairwise alignment phase for the SPU. Their implementation of the pairwise alig... |
17 |
The microarchitecture of the synergistic processor for a Cell processor
- Flachs
- 2006
Citation Context ...erPC instruction set with SIMD Multimedia instructions (VMX). It uses a cache coherent memory hierarchy with an L1 data and instruction cache of 32 KB and a unified L2 cache of 512 KB. The eight SPEs =-=[3, 4]-=- deliver the compute power of the Cell processor. These 128-bit in-order dual-issue vector processors can issue two SIMD instructions per cycle: one compute instruction and one memory instruction. The... |
16 | Vectorized Data Processing on the Cell Broadband Engine.
- Heman, Nes, et al.
- 2007
Citation Context ...e for scientific applications. The upcoming generation of Cell BE are expected to have enhanced double precision removing this drawback [14]. A high-performance FFT is described in [15]. Heman et al. =-=[16]-=- port a relational database to the Cell BE. Only some database operations (such as projection, selection etc.) are executed on the SPUs. The authors point out the importance of avoiding branches and o... |
15 | A parallel 64K complex FFT algorithm for the IBM/Sony/Toshiba Cell Broadband Engine processor
- Greene, Pepe, et al.
- 2006
Citation Context ... is a major downside for scientific applications. The upcoming generation of Cell BE are expected to have enhanced double precision removing this drawback [14]. A high-performance FFT is described in =-=[15]-=-. Heman et al. [16] port a relational database to the Cell BE. Only some database operations (such as projection, selection etc.) are executed on the SPUs. The authors point out the importance of avoi... |
11 | MT-ClustalW: Multithreading Multiple Sequence Alignment. Fifth IEEE Int. Workshop on High Performance Computational The
- Chaichoompu, Kittitornkun, et al.
- 2006
Citation Context ...our efforts on optimizing and parallelizing the pairwise alignment and progressive alignment stages. Note, however, that the second stage (guide tree construction) is also amenable to parallelization =-=[9, 10]-=-. 3.1. Code structure of pairwise alignment The majority of execution time of pairwise alignment is spent in three functions: forward_pass(), which consists of a forward-iterating loop nest, reverse_p... |
4 |
Parallel genomic alignments on the Cell Broadband Engine
- Sarje, Aluru
- 2009
Citation Context ...proves the performance of vectorization. This change in data layout, however, percolates through the whole alignment program and may require changes to other data structures too [21]. Sarje and Aluru =-=[22]-=- develop alignment algorithms that are tuned for particular situations, e.g. finding similarities in mRNA sequences. It was the express intent of this work not to make algorithmic changes and to adher... |
1 |
Experiences with Parallelizing a
- Vandierendonck, Rul, et al.
- 2008
Citation Context ..., we experimentally determined that the guide tree stage requires more than 5% of sequential execution time when aligning a large set (500–1000 sequences) of very short sequences (10–100 amino acids) =-=[8]-=-. For this reason we focus our efforts on optimizing and parallelizing the pairwise alignment and progressive alignment stages. Note, however, that the second stage (guide tree construction) is also a... |
1 |
MSA-CUDA: Multiple Sequence Alignment on Graphics Processing Units with CUDA
- Liu, Schmidt, et al.
- 2009
Citation Context ...our efforts on optimizing and parallelizing the pairwise alignment and progressive alignment stages. Note, however, that the second stage (guide tree construction) is also amenable to parallelization =-=[9, 10]-=-. 3.1. Code structure of pairwise alignment The majority of execution time of pairwise alignment is spent in three functions: forward_pass(), which consists of a forward-iterating loop nest, reverse_p... |