Results 1  10
of
46
A Guided Tour to Approximate String Matching
 ACM Computing Surveys
, 1999
"... We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining t ..."
Abstract

Cited by 404 (38 self)
 Add to MetaCart
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems. 1
Algorithms for the longest common subsequence problem
 J. ACM
, 1977
"... AaS~ACT Two algorithms are presented that solve the longest common subsequence problem The first algorithm is applicable in the general case and requires O(pn + n log n) time where p is the length of the longest common subsequence The second algorithm requires time bounded by O(p(m + 1 p)log n) In ..."
Abstract

Cited by 176 (2 self)
 Add to MetaCart
AaS~ACT Two algorithms are presented that solve the longest common subsequence problem The first algorithm is applicable in the general case and requires O(pn + n log n) time where p is the length of the longest common subsequence The second algorithm requires time bounded by O(p(m + 1 p)log n) In the common speoal case where p is close to m, this algorithm takes much less time than n ~ KEY WORDS AND PHRASES ' subsequence, common subsequence, algorithm CR CATEOORIES 3 73, 3 79, 5 25, 5 39
An algorithm for differential file comparison. Computer Science
, 1975
"... The program diff reports differences between two files, expressed as a minimal list of line changes to bring either file into agreement with the other. Diff has been engineered to make efficient use of time and space on typical inputs that arise in vetting versiontoversion changes in computermain ..."
Abstract

Cited by 107 (3 self)
 Add to MetaCart
The program diff reports differences between two files, expressed as a minimal list of line changes to bring either file into agreement with the other. Diff has been engineered to make efficient use of time and space on typical inputs that arise in vetting versiontoversion changes in computermaintained or computergenerated documents. Time and space usage are observed to vary about as the sum of the file lengths on real data, although they are known to vary as the product of the file lengths in the worst case. The central algorithm of diff solves the ‘longest common subsequence problem ’ to find the lines that do not change between files. Practical efficiency isgained by attending only to certain critical ‘candidate ’ matches between the files, the breaking of which would shorten the longest subsequence common to some pair of initial segments of the two files. Various techniques of hashing, presorting into equivalence classes, merging by binary search, and dynamic storage allocation are used to obtain good performance. [This document was scanned from Bell Laboratories Computing Science Technical Report #41, dated July 1976. Te xt was converted by OCR and handcorrected (last
Species Adaption Genetic Algorithms: A Basis for a Continuing SAGA
, 1992
"... For Artificial Life applications it is useful to extend Genetic Algorithms from a finite search space with fixedlength genotypes to openended evolution with variablelength genotypes. A new theoretical analysis is required, as Holland's Schema Theorem only applies to fixed lengths. It will be argu ..."
Abstract

Cited by 106 (27 self)
 Add to MetaCart
For Artificial Life applications it is useful to extend Genetic Algorithms from a finite search space with fixedlength genotypes to openended evolution with variablelength genotypes. A new theoretical analysis is required, as Holland's Schema Theorem only applies to fixed lengths. It will be argued, using concepts of epistasis and fitness landscapes drawn from theoretical biology, that in the long run a population must havegenotypes of nearly equal length, and this length can only increase slowly. As the length increases, the population will be nearly converged, and hence evolving as a species.
Identifying the Semantic and Textual Differences Between Two Versions of a Program
 Proceedings of the ACM SIGPLAN 90 Conference on Programming Language Design and Implementation
, 1990
"... Textbased file comparators (e.g., the Unix utility diff), are very general tools that can be applied to arbitrary files. However, using such tools to compare programs can be unsatisfactory because their only notion of change is based on program text rather than program behavior. This paper describe ..."
Abstract

Cited by 97 (6 self)
 Add to MetaCart
Textbased file comparators (e.g., the Unix utility diff), are very general tools that can be applied to arbitrary files. However, using such tools to compare programs can be unsatisfactory because their only notion of change is based on program text rather than program behavior. This paper describes a technique for comparing two versions of a program, determining which program components represent changes, and classifying each changed component as representing either a semantic or a textual change. ######################## This work was supported in part by the Defense Advanced Research Projects Agency, monitored by the Office of Naval Research under contract N0001488K, by the National Science Foundation under grant CCR8958530, and by grants from Xerox, Kodak, and Cray. Author's address: Computer Sciences Department, Univ. of Wisconsin, 1210 W. Dayton St., Madison, WI 53706. Permission to copy without fee all or part of this material is granted provided that the copies are not made...
Identifying Syntactic Differences Between Two Programs
 Software  Practice and Experience
, 1991
"... this paper is organized into five sections, as follows. The internal form of a program, which is a variant of a parse tree, is discussed in the next section. Then the treematching algorithm and the synchronous prettyprinting technique are described. Experience with the comparator for the C languag ..."
Abstract

Cited by 80 (0 self)
 Add to MetaCart
this paper is organized into five sections, as follows. The internal form of a program, which is a variant of a parse tree, is discussed in the next section. Then the treematching algorithm and the synchronous prettyprinting technique are described. Experience with the comparator for the C language and some performance measurements are also presented. The last section discusses related work and concludes this paper
Bounds on the complexity of the longest common subsequence problem
 Journal of the ACM
, 1976
"... ABSTRACT The problem of finding a longest common subsequence of two strings is discussed This problem arises in data processing applications such as comparing two files and in genetic applications such as studying molecular evolution The ddlqculty of computing a longest common subsequence of two str ..."
Abstract

Cited by 63 (1 self)
 Add to MetaCart
ABSTRACT The problem of finding a longest common subsequence of two strings is discussed This problem arises in data processing applications such as comparing two files and in genetic applications such as studying molecular evolution The ddlqculty of computing a longest common subsequence of two strings IS examined using the decision tree model of computation, m which vertices represent "equalunequal " comparisons It IS shown that unless a bound on the total number of 0istmct symbols is assumed, every solution to the problem can consume an amount of time that is proportional to the product of the lengths of the two strings A general lower bound as a function of the ratio of alphabet size to string length is derived The case where comparisons between symbols of the same string are forbidden is also considered and it is shown that this problem is of linear complexity for a twosymbol alphabet and quadratic for an alphabet of three or more symbols KEY WORDS AND PHR~tSES longest common subsequence, algorithm, computational complexity, file comparison, molecular evolution CR CATEGORIES 3 12, 3 73, 5 25 1.
Algorithms and Complexity for Annotated Sequence Analysis
, 1999
"... Molecular biologists use algorithms that compare and otherwise analyze sequences that represent genetic and protein molecules. Most of these algorithms, however, operate on the basic sequence and do not incorporate the additional information that is often known about the molecule and its pieces. Thi ..."
Abstract

Cited by 42 (1 self)
 Add to MetaCart
Molecular biologists use algorithms that compare and otherwise analyze sequences that represent genetic and protein molecules. Most of these algorithms, however, operate on the basic sequence and do not incorporate the additional information that is often known about the molecule and its pieces. This research describes schemes to combinatorially annotate this information onto sequences so that it can be analyzed in tandem with the sequence; the overall result would thus reflect both types of information about the sequence. These annotation schemes include adding colours and arcs to the sequence. Colouring a sequence would produce a samelength sequence of colours or other symbols that highlight or label parts of the sequence. Arcs can be used to link sequence symbols (or coloured substrings) to indicate molecular bonds or other relationships. Adding these annotations to sequence analysis problems such as sequence alignment or finding the longest common subsequence can make the problem more complex, often depending on the complexity of the annotation scheme. This research examines the different annotation schemes and the corresponding problems of verifying annotations, creating annotations, and finding the longest common subsequence of pairs of sequences with annotations. This work involves both the conventional complexity framework and parameterized complexity, and includes algorithms and hardness results for both frameworks. Automata and transducers are created for some annotation verification and creation problems. Different restrictions on layered substring and arc annotation are considered to de iii termine what properties an annotation scheme must have to make its incorporation feasible. Extensions to the algorithms that use weighting schemes are explored. Examin...
A flexible motif search technique based on generalized profiles
 COMPUTERS AND CHEMISTRY
, 1996
"... ... generalized profile syntax serving as a motif definition language; and (2) a motif search method specifically adapted to the problem of finding multiple instances of a motif in the same sequence. The new profile structure, which is the core of the generalized profile syntax, combines the functio ..."
Abstract

Cited by 35 (7 self)
 Add to MetaCart
... generalized profile syntax serving as a motif definition language; and (2) a motif search method specifically adapted to the problem of finding multiple instances of a motif in the same sequence. The new profile structure, which is the core of the generalized profile syntax, combines the functions of a variety of motif descriptors implemented in other methods, including regular expressionlike patterns, weight matrices, previously used profiles, and certain types of hidden Markov models (HMMs). The relationship between generalized profiles and other biomolecular motif descriptors is analyzed in detail, with special attention to HMMs. Generalized profiles are shown to be equivalent to a particular class of HMMs, and conversion procedures in both directions are given. The conversion procedures provide an interpretation for local alignment in the framework of stochastic models, allowing for clear, simple significance tests. A mathematical statement of the motif search problem defines the new method exactly without linking it to a specific algorithmic solution. Part of the definition includes a new definition of disjointness of alignments.
Secure and Private Sequence Comparisons
 In WPES’03: Proceedings of the 2003 ACM workshop on Privacy in the electronic society
, 2003
"... We give an e#cient protocol for sequence comparisons of the editdistance kind, such that neither party reveals anything about their private sequence to the other party (other than what can be inferred from the edit distance between their two sequences  which is unavoidable because computing that ..."
Abstract

Cited by 31 (7 self)
 Add to MetaCart
We give an e#cient protocol for sequence comparisons of the editdistance kind, such that neither party reveals anything about their private sequence to the other party (other than what can be inferred from the edit distance between their two sequences  which is unavoidable because computing that distance is the purpose of the protocol). The amount of communication done by our protocol is proportional to the time complexity of the bestknown algorithm for performing the sequence comparison.