Bayesian graph edit distance
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2000
Cited by 50 (5 self)
Abstract: This paper describes a novel framework for comparing and matching corrupted relational graphs. The paper develops the idea of edit distance originally introduced for graph matching by Sanfeliu and Fu [1]. We show how the Levenshtein distance can be used to model the probability distribution for structural errors in the graph-matching problem. This probability distribution is used to locate matches using MAP label updates. We compare the resulting graph-matching algorithm with that recently reported by Wilson and Hancock. The use of edit distance offers an elegant alternative to the exhaustive compilation of label dictionaries. Moreover, the method is polynomial rather than exponential in its worst-case complexity. We support our approach with an experimental study on synthetic data and illustrate its effectiveness on an uncalibrated stereo correspondence problem. This demonstrates experimentally that the gain in efficiency is not at the expense of quality of match.
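For readers unfamiliar with the Levenshtein distance that the abstract above builds on, here is a minimal sketch of the standard unit-cost string version. It is an illustration only: the function name `levenshtein` is an assumption of this example, and it does not implement the paper's probabilistic model or its MAP label updates.

```python
def levenshtein(a, b):
    """Unit-cost edit distance between two sequences (illustrative sketch)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty prefix takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]
```

Calling `levenshtein("kitten", "sitting")` gives 3 (two substitutions and one insertion).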
Edit distance from graph spectra
 In Proc. 9th IEEE Int. Conf. Comp. Vis
, 2003
Cited by 10 (2 self)
This paper is concerned with computing graph edit distance. One of the criticisms that can be leveled at existing methods for computing graph edit distance is that they lack the formality and rigour of the computation of string edit distance. Hence, our aim is to convert graphs to string sequences so that standard string edit distance techniques can be used. To do this we use a graph spectral seriation method to convert the adjacency matrix into a string or sequence order. We pose the problem of graph matching as maximum a posteriori probability alignment of the seriation sequences for pairs of graphs. This treatment leads to an expression for the edit costs. We compute the edit distance by finding the sequence of string edit operations which minimise the cost of the path traversing the edit lattice. The edit costs are defined in terms of the a posteriori probability of visiting a site on the lattice. We demonstrate the method with results on a dataset of Delaunay graphs.
Efficient algorithms for normalized edit distance
 Journal of Discrete Algorithms
, 2000
Cited by 7 (0 self)
ABSTRACT: A common model for computing the similarity of two strings X and Y of lengths m and n respectively, with m ≥ n, is to transform X into Y through a sequence of edit operations, called an edit sequence. The edit operations are of three types: insertion, deletion, and substitution. A given cost function assigns a weight to each edit operation. The amortized weight for a given edit sequence is the ratio of its weight to its length, and the minimum of this ratio over all edit sequences is the normalized edit distance. Existing algorithms for normalized edit distance computation with proven complexity bounds require O(mn^2) time in the worst case. We give provably better algorithms: an O(mn log n)-time algorithm when the cost function is uniform, i.e., the weights of edit operations depend only on the type but not on the individual symbols involved, and an O(mn log m)-time algorithm when the weights are rational.
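The definition of normalized edit distance above (minimum over edit sequences of weight divided by length) can be illustrated with a straightforward dynamic program that tracks every possible path length. This is a sketch under stated assumptions, not the faster algorithms the abstract announces: the function name, the uniform default weights, and the convention that a zero-cost match counts as one operation toward the length are all choices made for this example.

```python
import math

def normalized_edit_distance(x, y, w_ins=1.0, w_del=1.0, w_sub=1.0):
    """Minimum of (weight / length) over all edit sequences turning x into y.

    Assumption of this sketch: a match is a zero-cost substitution that
    still contributes one step to the length of the edit sequence.
    """
    m, n = len(x), len(y)
    if m == 0 and n == 0:
        return 0.0
    max_len = m + n  # the longest edit sequence deletes all of x, inserts all of y
    # best[i][j][k]: least weight turning x[:i] into y[:j] in exactly k operations
    best = [[[math.inf] * (max_len + 1) for _ in range(n + 1)]
            for _ in range(m + 1)]
    best[0][0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            for k in range(1, i + j + 1):
                c = math.inf
                if i > 0:
                    c = min(c, best[i - 1][j][k - 1] + w_del)
                if j > 0:
                    c = min(c, best[i][j - 1][k - 1] + w_ins)
                if i > 0 and j > 0:
                    step = 0.0 if x[i - 1] == y[j - 1] else w_sub
                    c = min(c, best[i - 1][j - 1][k - 1] + step)
                best[i][j][k] = c
    return min(best[m][n][k] / k
               for k in range(1, max_len + 1)
               if best[m][n][k] < math.inf)
```

For "ab" and "ba" the plain edit distance is 2, reached by two substitutions (ratio 2/2), but deleting "a", matching "b", and inserting "a" has weight 2 over length 3, so the normalized distance is 2/3. The normalized value is therefore not simply the edit distance divided by a fixed length.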
An Efficient Uniform-Cost Normalized Edit Distance Algorithm
 6th Symp. on String Processing and Info. Retrieval
, 1999
Cited by 4 (1 self)
A common model for computing the similarity of two strings X and Y of lengths m and n respectively, with m ≥ n, is to transform X into Y through a sequence of three types of edit operations: insertion, deletion, and substitution. The model assumes a given cost function which assigns a nonnegative real weight to each edit operation. The amortized weight for a given edit sequence is the ratio of its weight to its length, and the minimum of this ratio over all edit sequences is the normalized edit distance. Existing algorithms for normalized edit distance computation with proven complexity bounds require O(mn
A Formal Theory for Optimal and Information Theoretic Syntactic Pattern Recognition
Cited by 4 (2 self)
In this paper we present a foundational basis for optimal and information theoretic syntactic pattern recognition. We do this by developing a rigorous model, M*, for channels which permit arbitrarily distributed substitution, deletion and insertion syntactic errors. More explicitly, if A is any finite alphabet and A* the set of words over A, we specify a stochastically consistent scheme by which a string U ∈ A* can be transformed into any Y ∈ A* by means of arbitrarily distributed substitution, deletion and insertion operations. The scheme is shown to be Functionally Complete and stochastically consistent. Apart from the synthesis aspects, we also deal with the analysis of such a model and derive a technique by which Pr[Y|U], the probability of receiving Y given that U was transmitted, can be computed in cubic time using dynamic programming. One of the salient features of this scheme is that it demonstrates how dynamic programming can be applied to evaluate quantities involv...
Learning Significant Alignments: An Alternative to Normalized Local Alignment
Cited by 1 (0 self)
We describe a supervised learning approach to resolve difficulties in finding biologically significant local alignments. It was noticed that the O(n²) algorithm by Smith-Waterman, the prevalent tool for computing local sequence alignment, often outputs long, meaningless alignments while ignoring shorter, biologically significant ones. Arslan et al. proposed an O(n² log n) algorithm which outputs a normalized local alignment that maximizes the degree of similarity rather than the total similarity score. Given a properly selected normalization parameter, the algorithm can discover significant alignments that would be missed by the Smith-Waterman algorithm. Unfortunately, determining a proper normalization parameter requires repeated executions with different parameter values and expert feedback to determine the usefulness of the alignments. We propose a learning approach that uses existing biologically significant alignments to learn parameters for intelligently processing suboptimal Smith-Waterman alignments. Our algorithm runs in O(n²) time and can discover biologically significant alignments without requiring expert feedback to produce meaningful results.
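As background for the abstract above, the plain Smith-Waterman recurrence it refers to can be sketched as follows. This shows only the classical O(n²) local-alignment score, not the learning method or the normalized variant; the function name, the scoring values (`match=2`, `mismatch=-1`, `gap=-1`), and the linear gap penalty are assumptions of this example.

```python
def smith_waterman_score(s, t, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between s and t (illustrative sketch)."""
    prev = [0] * (len(t) + 1)
    best = 0
    for a in s:
        curr = [0]
        for j, b in enumerate(t, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            # Clamping at 0 lets an alignment restart anywhere, which is
            # what makes the alignment local rather than global.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[-1])
        prev = curr
    return best
```

Because the score is a total rather than a ratio, a long alignment with many mismatches can outscore a short, clean one, which is the behavior the abstract criticizes.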
Efficient Algorithms For Normalized Edit Distance
, 2000
ABSTRACT: A common model for computing the similarity of two strings X and Y of lengths m and n respectively, with m ≥ n, is to transform X into Y through a sequence of edit operations, called an edit sequence. The edit operations are of three types: insertion, deletion, and substitution. A given cost function assigns a weight to each edit operation. The amortized weight for a given edit sequence is the ratio of its weight to its length, and the minimum of this ratio over all edit sequences is the normalized edit distance. Existing algorithms for normalized edit distance computation with proven complexity bounds require O(mn^2) time in the worst case. We give provably better algorithms: an O(mn log n)-time algorithm when the cost function is uniform, i.e., the weights of edit operations depend only on the type but not on the individual symbols involved, and an O(mn log m)-time algorithm when the weights are rational.
Bioinformatics
, 2003
Selection of significant genes via expression patterns is an important problem in microarray experiments. Owing to small sample size and the large number of variables (genes), the selection process can be unstable. This paper proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables to specialize the model to a regression setting and use a Bayesian mixture prior to perform the variable selection. We control the size of the model by assigning a prior distribution over the dimension (number of significant genes) of the model. The posterior distributions of the parameters are not in explicit form, so we use a combination of truncated sampling and Markov Chain Monte Carlo (MCMC) based computation techniques to simulate the parameters from the posteriors. The Bayesian model is flexible enough to identify significant genes as well as to perform future predictions. The method is applied to cancer classification via cDNA microarrays where the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify a set of significant genes. The method is also applied successfully to the leukemia data.
Normalized Sum-over-Paths Edit Distances
 2010 International Conference on Pattern Recognition
Abstract: In this paper, normalized SoP string-edit distances, taking into account all possible alignments between two sequences, are investigated. These normalized distances are variants of the Sum-over-Paths (SoP) distances, which compute the expected cost over all sequence alignments by favoring low-cost ones, therefore favoring good alignment. Such distances consider two sequences tied by many optimal or nearly optimal alignments as more similar than two sequences sharing only one optimal alignment. They depend on a parameter, θ, and reduce to the standard distances (the edit distance or the longest common subsequence) when θ → 0, while having the same time complexity. This paper puts the emphasis on applying some type of normalization of the expectation of the cost. Experimental results for clustering and classification tasks performed on four OCR data sets show that (i) the applied normalization generally improves the existing results, and (ii) as for the SoP edit distances, the normalized SoP edit distances clearly outperform the nonrandomized measures, i.e. the standard edit distance and longest common subsequence. Keywords: edit distance; longest common subsequence; randomized shortest paths; normalization.