Results 1-10 of 12
An algorithm for differential file comparison. Computer Science
, 1975
Cited by 107 (3 self)
The program diff reports differences between two files, expressed as a minimal list of line changes to bring either file into agreement with the other. Diff has been engineered to make efficient use of time and space on typical inputs that arise in vetting version-to-version changes in computer-maintained or computer-generated documents. Time and space usage are observed to vary about as the sum of the file lengths on real data, although they are known to vary as the product of the file lengths in the worst case. The central algorithm of diff solves the 'longest common subsequence problem' to find the lines that do not change between files. Practical efficiency is gained by attending only to certain critical 'candidate' matches between the files, the breaking of which would shorten the longest subsequence common to some pair of initial segments of the two files. Various techniques of hashing, presorting into equivalence classes, merging by binary search, and dynamic storage allocation are used to obtain good performance. [This document was scanned from Bell Laboratories Computing Science Technical Report #41, dated July 1976. Text was converted by OCR and hand-corrected (last
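The combination of equivalence classes and binary search mentioned in this abstract can be illustrated with a short sketch. The code below is not the diff implementation; it is a simplified, Hunt-Szymanski-style computation of the LCS length only, and the function name `lcs_length_by_matches` is hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length_by_matches(a, b):
    """Return the length of a longest common subsequence of sequences a and b."""
    # Equivalence classes: for each distinct line, the positions where it occurs in b.
    positions = defaultdict(list)
    for j, line in enumerate(b):
        positions[line].append(j)

    # thresholds[k] is the smallest index in b that can end a common
    # subsequence of length k + 1 using the part of a processed so far.
    thresholds = []
    for line in a:
        # Visit matching positions in decreasing order so that a single
        # element of a is never used twice in the same subsequence.
        for j in reversed(positions.get(line, [])):
            k = bisect_left(thresholds, j)  # binary search over thresholds
            if k == len(thresholds):
                thresholds.append(j)
            else:
                thresholds[k] = j
    return len(thresholds)
```

For files of m and n lines with an LCS of length p, a minimal edit script contains m - p deletions and n - p insertions, so this length alone already gives the size of the smallest diff.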
Speeding up Dynamic Programming
 In Proc. 29th Symp. Foundations of Computer Science
, 1988
Cited by 19 (0 self)
In this paper we consider the problem of computing two similar recurrences: the one-dimensional case ...
The UCSC Kestrel Parallel Processor
 IEEE Transactions on Parallel and Distributed Systems
, 2005
Cited by 9 (1 self)
The architectural landscape of high-performance computing stretches from superscalar uniprocessor to explicitly parallel systems to dedicated hardware implementations of algorithms. Single-purpose hardware can achieve the highest performance and uniprocessors can be the most programmable. Between these extremes, programmable and reconfigurable architectures provide a wide range of choice in flexibility, programmability, computational density, and performance. The UCSC Kestrel parallel processor strives to attain single-purpose performance while maintaining user programmability. Kestrel is a single-instruction stream, multiple-data stream (SIMD) parallel processor with a 512-element linear array of 8-bit processing elements. The system design focuses on efficient high-throughput DNA and protein sequence analysis, but its programmability enables high performance on computational chemistry, image processing, machine learning, and other applications. The Kestrel system has had unexpected longevity in its utility due to a careful design and analysis process. Experience with the system leads to the conclusion that programmable SIMD architectures can excel in both programmability and performance. This paper presents the architecture, implementation, applications, and observations of the Kestrel project at the University of California at Santa Cruz.
New Algorithms for the Longest Common Subsequence Problem
, 1994
Cited by 6 (0 self)
Given two sequences $A = a_1 a_2 \ldots a_m$ and $B = b_1 b_2 \ldots b_n$, $m \le n$, over some alphabet $\Sigma$, a common subsequence $C = c_1 c_2 \ldots c_l$ of $A$ and $B$ is a sequence that can be obtained from both $A$ and $B$ by deleting zero or more (not necessarily adjacent) symbols. Finding a common subsequence of maximal length is called the Longest Common Subsequence (LCS) Problem. Two new algorithms based on the well-known paradigm of computing minimal matches are presented. One runs in time $O(ns + \min\{ds, pm\})$ and the other runs in time $O(ns + \min\{p(n - p), pm\})$, where $s = |\Sigma|$ is the alphabet size, $p$ is the length of a longest common subsequence and $d$ is the number of minimal matches. The $ns$ term is charged by a standard preprocessing phase. When m n both algorithms are fast in situations when a LCS is expected to be short as well as in situations when a LCS is expected to be long. Further they show a much smaller degeneration in intermediate situations, especially the second al...
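The definition of a common subsequence can be made concrete with the textbook dynamic-programming solution. This is a baseline sketch that runs in time proportional to the product of the two lengths; it is neither of the two algorithms the abstract announces, and the function name is hypothetical.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of a and b as a list."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk forward through the table to recover one optimal subsequence.
    result = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # skipping a[i] loses nothing
        else:
            j += 1  # skipping b[j] loses nothing
    return result
```

The symbols skipped during the forward walk are exactly the ones deleted from each input to obtain the common subsequence.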
Efficient Algorithms for Sequence Analysis with Concave and Convex Gap Costs
, 1989
Cited by 4 (0 self)
David A. Eppstein. We describe algorithms for two problems in sequence analysis: sequence alignment with gaps (multiple consecutive insertions and deletions treated as a unit) and RNA secondary structure with single loops only. We make the assumption that the gap cost or loop cost is a convex or concave function of the length of the gap or loop, and show how this assumption may be used to develop efficient algorithms for these problems. We show how the restriction to convex or concave functions may be relaxed, and give algorithms for solving the problems when the cost functions are neither convex nor concave, but can be split into a small number of convex or concave functions. Finally we point out some sparsity in the structure of our sequence analysis problems, and describe how we may take advantage of that sparsity to further speed up our algorithms.
A fast parallel algorithm for finding the longest common sequence of multiple biosequences
 BMC BIOINFORMATICS 2006, 7(SUPPL 4):S4
, 2006
Cited by 4 (0 self)
Background. Biological sequences can be represented as a sequence of symbols. For instance, a protein is a sequence of 20 different letters (amino acids), and DNA sequences (genes) can be represented as sequences of four letters A, C, G and T, corresponding to the four submolecules forming DNA. When a new biosequence is found, we want to know which other sequences it is most similar to. Sequence comparison has been used successfully to establish the link between cancer-causing genes and a gene involved in normal growth and development. One way of detecting the similarity of two or more sequences is to find their longest common sequence (LCS). Searching for the LCS of biosequences is one of the most important tasks in bioinformatics. Here, on the premise of guaranteeing precision of the results of LCS, we present a parallel longest common subsequence algorithm named FAST_LCS based on a set of novel pruning techniques to improve the speed of finding LCS.
Generalized LCS
Cited by 4 (1 self)
The Longest Common Subsequence (LCS) is a well-studied problem, having a wide range of implementations. Its motivation is in comparing strings. It has long been of interest to devise a similar measure for comparing higher dimensional objects, and more complex structures. In this paper we study the Longest Common Substructure of two matrices and show that this problem is NP-hard. We also study the Longest Common Subforest problem for multiple trees, including a constrained version. We show NP-hardness for k > 2 unordered trees in the constrained LCS. We also give polynomial time algorithms for ordered trees and prove a lower bound for any decomposition strategy for k trees.
Fast Approximation to the NP-hard Problem of Multiple Sequence Alignment
, 1996
Cited by 2 (2 self)
The study and comparison of several sequences of characters from a finite alphabet is relevant to various areas of science, in particular molecular biology. It has been shown that multiple sequence alignment with the sum-of-pairs score is NP-hard. Recently a fast heuristic method was proposed based on a Divide-and-Conquer technique. Recursively, all sequences were cut at some suitable positions. Eventually, the sets of subsequences were aligned optimally. In general, the (time) complexity of searching for good cutting points is $O(l^n)$ ($n$ the number and $l$ the maximal length of the sequences involved). By a simple $(n \cdot l)$-time technique, the base $l$ was reduced, leading to a reasonably fast alignment algorithm for up to n = 7 and l 500. We refine the base-reducing technique by spending computational time quadratic in $n$ (and still linear in $l$). This improves the alignment procedure regarding the number of sequences manageable up to n = 9 (of same length $l$). Moreover, we present two...
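The sum-of-pairs score named in this abstract adds up a pairwise cost over every pair of rows in an alignment. The sketch below assumes a unit cost model (0 for identical symbols, 1 otherwise, with "-" marking a gap); the abstract does not specify a cost function, so `pair_cost` and `sum_of_pairs` are illustrative names and choices only.

```python
from itertools import combinations


def pair_cost(x, y):
    """Assumed unit cost: 0 for identical symbols (including gap vs gap), else 1."""
    return 0 if x == y else 1


def sum_of_pairs(alignment):
    """Sum the column-wise cost over all pairs of rows of an alignment.

    The alignment is a list of equal-length strings in which "-" marks a gap.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        total += sum(pair_cost(x, y) for x, y in zip(row_a, row_b))
    return total
```

Scoring one alignment of n rows and length l takes time proportional to n squared times l; the hard part the abstract refers to is finding the alignment that minimizes this score.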
Efficient Algorithms for Sequence Analysis
 Proc. Second Workshop on Sequences: Combinatorics, Compression, Security
, 1991
Cited by 1 (0 self)
We consider new algorithms for the solution of many dynamic programming recurrences for sequence comparison and for RNA secondary structure prediction. The techniques upon which the algorithms are based effectively exploit the physical constraints of the problem to derive more efficient methods for sequence analysis. 1. INTRODUCTION In this paper we consider algorithms for two problems in sequence analysis. The first problem is sequence alignment, and the second is the prediction of RNA structure. Although the two problems seem quite different from each other, their solutions share a common structure, which can be expressed as a system of dynamic programming recurrence equations. These equations also can be applied to other problems, including text formatting and data storage optimization. We use a number of well motivated assumptions about the problems in order to provide efficient algorithms. The primary assumption is that of concavity or convexity. The recurrence relations for bo...
Parallel Computation
 IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 3, March 1998, p. 283
, 1994
A massive volume of biological sequence data is available in over 36 different databases worldwide, including the sequence data generated by the Human Genome project. These databases, which also contain biological and bibliographical information, are growing at an exponential rate. Consequently, the computational demands of exploring and analyzing the data contained in these databases are quickly becoming a great concern. To meet these demands, we must use high-performance computing systems, such as parallel computers and distributed networks of workstations. We present two parallel computational methods for analyzing these biological sequences. The first method is used to retrieve sequences that are homologous to a query sequence. The biological information associated with the homologous sequences found in the database may provide important clues to the structure and function of the query sequence. The second method, which helps in the prediction of the function, structure, and evolutionary history of biological sequences, is used to align a number of homologous sequences with each other.