Results 1-10 of 116
Simple fast algorithms for the editing distance between trees and related problems
SIAM J. Comput., 1989
Cited by 405 (12 self)
Ordered labeled trees are trees in which the left-to-right order among siblings is significant. The distance between two ordered trees is considered to be the weighted number of edit operations (insert, delete, and modify) to transform one tree to another. The problem of approximate tree matching is also considered. Specifically, algorithms are designed to answer the following kinds of questions: 1. What is the distance between two trees? 2. What is the minimum distance between T_1 and T_2 when zero or more subtrees can be removed from T_2? 3. Let the pruning of a tree at node n mean removing all the descendants of node n. The analogous question for prunings as for subtrees is answered. A dynamic programming algorithm is presented to solve the three questions in sequential time O(|T_1| × |T_2| × min(depth(T_1), leaves(T_1)) × min(depth(T_2), leaves(T_2))) and space O(|T_1| × |T_2|), compared with O(|T_1| × |T_2| × (depth(T_1))² × (depth(T_2))²) for the best previous published algorithm due to Tai [J. Assoc. Comput. Mach., 26 (1979), pp. 422-433]. Further, the algorithm presented here can be parallelized to give ...
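To make question 1 concrete, the following is a minimal sketch of the edit distance between two ordered labeled trees, written as a plain memoized recursion over forests. It is not the dynamic programming algorithm of the paper and does not achieve its time or space bounds. Unit costs for insert, delete, and relabel are an assumption (the abstract speaks of weighted operations), and the names `tree`, `forest_distance`, and `tree_edit_distance` are hypothetical.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an ordered labeled tree as a hashable (label, children) tuple."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        # Only insertions remain: insert the root of the rightmost tree of g.
        _, w_children = g[-1]
        return forest_distance(f, g[:-1] + w_children) + 1
    if not g:
        # Only deletions remain: delete the root of the rightmost tree of f.
        _, v_children = f[-1]
        return forest_distance(f[:-1] + v_children, g) + 1

    v_label, v_children = f[-1]
    w_label, w_children = g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert the rightmost root of g; its children take its place.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Map the two rightmost roots onto each other (relabel if they differ).
    match = (
        forest_distance(v_children, w_children)
        + forest_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1)
    )
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    """Edit distance between two trees, each viewed as a one-tree forest."""
    return forest_distance((t1,), (t2,))


t1 = tree("f", tree("d", tree("a"), tree("c", tree("b"))), tree("e"))
t2 = tree("f", tree("c", tree("d", tree("a"), tree("b"))), tree("e"))
print(tree_edit_distance(t1, t2))
```

In this example, deleting `c` from the first tree and inserting `c` above `d` yields the second tree, so two operations suffice.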
On the Editing Distance between Undirected Acyclic Graphs
1995
Cited by 90 (7 self)
We consider the problem of comparing CUAL graphs (Connected, Undirected, Acyclic graphs with nodes being Labeled). This problem is motivated by the study of information retrieval for biochemical and molecular databases. Suppose we define the distance between two CUAL graphs G_1 and G_2 to be the weighted number of edit operations (insert node, delete node and relabel node) to transform G_1 to G_2. By reduction from exact cover by 3-sets, one can show that finding the distance between two CUAL graphs is NP-complete. In view of the hardness of the problem, we propose a constrained distance metric, called the degree-2 distance, by requiring that any node to be inserted (deleted) have no more than 2 neighbors. With this metric, we present an efficient algorithm to solve the problem. The algorithm runs in time O(N_1 N_2 D²) for general weighting edit operations and in time O(N_1 N_2 D √D log D) for integral weighting edit operations, where N_i, i = 1, 2, is the number of nodes in G_i, D = min{d_1, d_2} and d_i is the maximum degree of G_i.
A General Edit Distance between RNA Structures
2001
Cited by 90 (0 self)
Arc-annotated sequences are useful in representing the structural information of RNA sequences.
On the Complexity of Comparing Evolutionary Trees
1995
Cited by 77 (10 self)
We study the computational complexity and approximation of several problems arising in the comparison of evolutionary trees. It is shown that the maximum agreement subtree (MAST) problem for three trees with unbounded degree cannot be approximated within ratio 2^{log^δ n} in polynomial time for any δ < 1, unless NP ⊆ DTIME[2^{polylog n}], and MAST with edge contractions for two binary trees is NP-hard. This answers two open questions posed in [1]. For the maximum refinement subtree (MRST) problem involving two trees, we show that it is polynomial-time solvable when both trees have bounded degree and is NP-hard when one of the trees can have an arbitrary degree. Finally, we consider the problem of optimally transforming a tree into another by transferring subtrees around. It is shown that computing the subtree-transfer distance is NP-hard and an approximation algorithm with performance ratio 3 is given. Key words: Evolutionary tree, phylogeny, compatibility, recombination, computational c...
New Techniques for Best-Match Retrieval
ACM Transactions on Information Systems, 1990
Cited by 58 (5 self)
A scheme to answer best-match queries from a file containing a collection of objects is described. A best-match query is to find the objects in the file that are closest (according to some (dis)similarity measure) to a given target. Previous work [5, 33] suggests that one can reduce the number of comparisons required to achieve the desired results using the triangle inequality, starting with a data structure for the file that reflects some precomputed intrafile distances. We generalize the technique to allow the optimum use of any given set of precomputed intrafile distances. Some empirical results are presented which illustrate the effectiveness of our scheme, and its performance relative to previous algorithms.
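The basic triangle-inequality idea mentioned in the abstract can be sketched as follows. This is only the simplest single-reference-object variant, not the generalized scheme of the paper. The function name `best_match`, the choice of one reference object (`pivot`), and the toy metric on integers are all assumptions made for illustration. For a metric `dist`, the value |dist(target, pivot) - dist(obj, pivot)| is a lower bound on dist(target, obj), so any object whose bound is no better than the best distance found so far can be skipped without a comparison.

```python
import math


def best_match(target, objects, pivot, pivot_dists, dist):
    """Return (closest object, its distance, number of dist() calls).

    pivot_dists[i] must hold the precomputed dist(objects[i], pivot).
    """
    d_target_pivot = dist(target, pivot)
    comparisons = 1  # the one call above
    best, best_d = None, math.inf
    for obj, d_obj_pivot in zip(objects, pivot_dists):
        # Triangle inequality: dist(target, obj) >= |d(t, p) - d(o, p)|.
        lower_bound = abs(d_target_pivot - d_obj_pivot)
        if lower_bound >= best_d:
            continue  # cannot beat the current best; skip the comparison
        d = dist(target, obj)
        comparisons += 1
        if d < best_d:
            best, best_d = obj, d
    return best, best_d, comparisons


# Toy usage: integers under absolute difference, with 0 as the reference object.
def abs_diff(a, b):
    return abs(a - b)


objects = [1, 5, 9, 20]
pivot = 0
pivot_dists = [abs_diff(o, pivot) for o in objects]  # precomputed once per file
result = best_match(6, objects, pivot, pivot_dists, abs_diff)
print(result)
```

In the toy run, the objects 9 and 20 are rejected by the lower bound alone, so only two of the four objects are compared against the target directly.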
An Algorithm for Finding the Largest Approximately Common Substructures of Two Trees
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998
Cited by 48 (5 self)
Ordered, labeled trees are trees in which each node has a label and the left-to-right order of its children (if it has any) is fixed. Such trees have many applications in vision, pattern recognition, molecular biology and natural language processing. We consider a substructure of an ordered labeled tree T to be a connected subgraph of T. Given two ordered labeled trees T1 and T2 and an integer d, the largest approximately common substructure problem is to find a substructure U1 of T1 and a substructure U2 of T2 such that U1 is within edit distance d of U2 and where there does not exist any other substructure V1 of T1 and V2 of T2 such that V1 and V2 satisfy the distance constraint and the sum of the sizes of V1 and V2 is greater than the sum of the sizes of U1 and U2. We present a dynamic programming algorithm to solve this problem, which runs as fast as the fastest known algorithm for computing the edit distance of two trees when the distance allowed in the common substruc...
SPIDER: Software for Protein Identification from Sequence Tags with De Novo Sequencing Error
J Bioinform Comput Biol, 2004
Cited by 43 (1 self)
For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then used to match, accounting for amino acid mutations, the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software.
On Distances between Phylogenetic Trees
1997
Cited by 44 (9 self)
Different phylogenetic trees for the same group of species are often produced either by procedures that use diverse optimality criteria [18] or from different genes [12] in the study of molecular evolution. Comparing these trees to find their similarities (e.g. agreement or consensus) and dissimilarities, i.e. distance, is thus an important issue in computational molecular biology. The nearest neighbor interchange (nni) distance [26, 24, 32, 4, 5, 3, 16, 17, 19, 30, 20, 21, 23] and the subtree-transfer distance [12, 13, 15] are two major distance metrics that have been proposed and extensively studied for different reasons. Despite their many appealing aspects such as simplicity and sensitivity to tree topologies, computing these distances has remained very challenging. This article studies the complexity and efficient approximation algorithms for computing the nni distance and a natural extension of the subtree-transfer distance, called the linear-cost subtree-transfer distance. The ...
The Longest Common Subsequence Problem for Arc-Annotated Sequences
In Proc. of 11th CPM, number 1848 in LNCS, 2000
Cited by 41 (1 self)
Arc-annotated sequences are useful in representing the structural information of RNA and protein sequences. Recently, the longest arc-preserving common subsequence problem has been introduced in [6, 7] as a framework for studying the similarity of arc-annotated sequences. In this paper, we consider arc-annotated sequences with various arc structures and present some new algorithmic and complexity results on the longest arc-preserving common subsequence problem. Some of our results answer an open question in [6, 7] and some others improve the hardness results in [6, 7]. Keywords: sequence annotation, longest common subsequence, approximation algorithm, maximum independent set, MAX SNP-hard, dynamic programming.
Approximate Tree Matching in the Presence of Variable Length Don't Cares
Journal of Algorithms, 1993
Cited by 42 (7 self)
Ordered labeled trees are trees in which the sibling order matters. This paper presents algorithms for three problems having to do with approximate matching for such trees with variable-length don't cares (VLDC's). In strings, a VLDC symbol in the pattern may substitute for zero or more symbols in the data string. For example, if "com*er" is the pattern, then the "*" would substitute for the substring "put" when matching the data string "computer". Approximate VLDC matching in strings means that after the best possible substitution, the pattern still need not be the same as the data string for a match to be allowed. For example, "com*er" matches "counter" within distance 1 (representing the cost of removing the "m" from "com*er" and having the "*" substitute for "unt"). We generalize approximate VLDC string matching to three algorithms for approximate VLDC matching on trees. The time complexity of our algorithms is O(|P| × |D| × min(depth(P), leaves(P)) × min(de...
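The string case described in this abstract can be illustrated with a small dynamic program. This is a sketch of approximate VLDC matching on strings only, not the tree algorithms of the paper. It assumes the VLDC symbol is written `*`, that insert, delete, and substitute each cost 1, and that the whole pattern is matched against the whole data string; the function name `vldc_distance` is hypothetical.

```python
def vldc_distance(pattern, data, vldc="*"):
    """Unit-cost edit distance from pattern to data, where each VLDC symbol
    in the pattern may stand for zero or more data symbols at no cost."""
    m, n = len(pattern), len(data)
    # dist[i][j]: cost of matching pattern[:i] against data[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        dist[0][j] = j  # empty pattern: insert every data symbol
    for i in range(1, m + 1):
        p = pattern[i - 1]
        if p == vldc:
            dist[i][0] = dist[i - 1][0]  # VLDC matches the empty string
        else:
            dist[i][0] = dist[i - 1][0] + 1  # delete the pattern symbol
        for j in range(1, n + 1):
            if p == vldc:
                dist[i][j] = min(
                    dist[i - 1][j],  # VLDC stands for nothing more
                    dist[i][j - 1],  # VLDC absorbs data[j - 1] for free
                )
            else:
                dist[i][j] = min(
                    dist[i - 1][j] + 1,  # delete the pattern symbol
                    dist[i][j - 1] + 1,  # insert the data symbol
                    dist[i - 1][j - 1] + (p != data[j - 1]),  # match or substitute
                )
    return dist[m][n]


print(vldc_distance("com*er", "computer"))
print(vldc_distance("com*er", "counter"))
```

Under these assumptions, the two calls correspond to the two examples in the abstract: an exact VLDC match for "computer", and a match within distance 1 for "counter" after removing the "m".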