Practical Algorithms for Transposition-Invariant String-Matching
Cited by 10 (4 self)
We consider the problems of (1) longest common subsequence (LCS) of two given strings in the case where the first may be shifted by some constant (that is, transposed) to match the second, and (2) transposition-invariant text searching using indel distance. These problems have applications in music comparison and retrieval. We introduce two novel techniques to solve these problems efficiently. The first is based on the branch and bound method, the second on bit-parallelism. Our branch and bound algorithm computes the longest common transposition-invariant subsequence (LCTS) in time O((m² + log log σ) log σ) in the best case and O((m² + log σ)σ) in the worst case, where m and σ are, respectively, the length of the strings and the size of the alphabet. On the other hand, we show that the same problem can be solved by using bit-parallelism, and thus obtain a speedup of O(w / log m) over the classical algorithms, where the computer word has w bits. The advantage of this latter algorithm over the existing bit-parallel ones is that it allows the use of more complex distances, including general integer weights. Since our branch and bound method is very flexible, it can be further improved by combining it with other efficient algorithms such as our novel bit-parallel algorithm. We experiment with several possible combinations and discuss the best settings for each of them. Our algorithms are easily extended to other musically relevant cases, such as delta-matching and polyphony (where there are several parallel texts to be considered). We also show how our bit-parallel algorithm is adapted to text searching and illustrate its effectiveness in complex cases where the only known competing method is brute force.
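For orientation, the brute-force baseline that this kind of work improves on can be sketched as follows. This is not the branch and bound or bit-parallel method of the abstract; it simply tries every transposition that aligns at least one pair of symbols and runs the classical dynamic-programming LCS for each. The function names and the encoding of notes as integer pitches are assumptions made for illustration.

```python
def lcs_length(a, b):
    """Classical O(len(a) * len(b)) dynamic-programming LCS length."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcts_length(a, b):
    """Brute-force longest common transposition-invariant subsequence length.

    Only transpositions t = y - x for some x in a, y in b can produce a
    match, so those are the only candidates tried.
    """
    if not a or not b:
        return 0
    transpositions = {y - x for x in a for y in b}
    return max(lcs_length([x + t for x in a], b) for t in transpositions)


# A melody and the same melody shifted up by 7 semitones share no pitches,
# yet match completely once transposition is allowed.
melody = [60, 62, 64, 65]
shifted = [p + 7 for p in melody]
print(lcs_length(melody, shifted), lcts_length(melody, shifted))
```

The number of candidate transpositions is bounded by the alphabet size, which is why the alphabet size appears as a factor in the worst-case bounds quoted above.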
Flexible and efficient bit-parallel techniques for transposition invariant approximate matching in music retrieval
 In Proc. 10th Int'l Symp. on String Processing and Information Retrieval (SPIRE'03)
, 2003
Cited by 8 (3 self)
Recent research in music retrieval has shown that a combinatorial approach to the problem could be fruitful. Three distinguishing requirements of this particular problem are (a) approximate searching permitting missing, extra, and distorted notes, (b) transposition invariance, to allow matching a sequence that appears in a different scale, and (c) handling polyphonic music. These combined requirements make up a complex combinatorial problem that is currently under research. On the other hand, bit-parallelism has proved a powerful practical tool for combinatorial pattern matching, both flexible and efficient. In this paper we use bit-parallelism to search for several transpositions at the same time, and obtain speedups of O(w / log k) over the classical algorithms, where the computer word has w bits and k is the error threshold allowed in the match. Although not the best solution for the easier approximation measures, we show that our technique can be adapted to complex cases where no competing method exists, and that are the most interesting in terms of music retrieval.
Large scale protein sequence alignment using FPGA reprogrammable logic devices
 In Field Programmable Logic and Application, 14th International Conference, FPL 2004
, 2004
Cited by 8 (0 self)
In this paper we show how to significantly accelerate the Smith-Waterman protein sequence alignment algorithm using reprogrammable logic devices: FPGAs (Field Programmable Gate Arrays). Due to its perfect sensitivity, the Smith-Waterman algorithm is important in the field of computational biology, but its computational complexity makes it impractical for large database searches on general-purpose computers. The current approach allows for amino-acid sequence alignment with a full substitution matrix, which leads to a more complex formula than that used in DNA alignment and is much more memory demanding. We propose a different parallelization scheme than the commonly used systolic arrays, leading to full utilization of PUs (Processing Units) regardless of sequence length. An FPGA-based implementation of the Smith-Waterman algorithm can accelerate sequence alignment on a Pentium desktop computer by two orders of magnitude compared to the standard OSEARCH program from the FASTA package.
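As a reminder of what is being accelerated, here is a minimal software sketch of the Smith-Waterman local alignment score. It says nothing about the FPGA design in the abstract. The linear gap penalty, the `simple_score` function standing in for a real substitution matrix, and all names are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, score, gap=2):
    """Best local alignment score of a and b.

    score(x, y) plays the role of the substitution matrix; gap is a
    linear gap penalty (an assumption; affine gaps are also common).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            h = max(
                0,                          # start a new local alignment
                prev[j - 1] + score(x, y),  # align x with y
                prev[j] - gap,              # gap in b
                cur[j - 1] - gap,           # gap in a
            )
            cur.append(h)
            best = max(best, h)
        prev = cur
    return best


def simple_score(x, y):
    """Toy substitution function: +2 for a match, -1 for a mismatch."""
    return 2 if x == y else -1


print(smith_waterman_score("TTACGTT", "GGACGGG", simple_score))
```

Every cell depends on its left, upper, and upper-left neighbours, so the cells on one anti-diagonal are independent of each other; that independence is what hardware implementations exploit.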
Bit-Parallel LCS-length Computation Revisited
 In Proc. 15th Australasian Workshop on Combinatorial Algorithms (AWOCA)
, 2004
Cited by 7 (4 self)
The longest common subsequence (LCS) is a classic and well-studied measure of similarity between two strings A and B. This problem has two variants: determining the length of the LCS (LLCS), and recovering an LCS itself. In this paper we address the first of these two. Let m and n denote the lengths of the strings A and B, respectively, and w denote the computer word size. First we give a slightly improved formula for the bit-parallel O(⌈m/w⌉n) LLCS algorithm of Crochemore et al. [4]. Then we discuss the relative performance of the bit-parallel algorithms and compare our variant against one of the best conventional LLCS algorithms. Finally we propose and evaluate an O(⌈d/w⌉n) version of the algorithm, where d is the simple (indel) edit distance between A and B.
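A minimal sketch of a bit-parallel LLCS computation is given below. It uses the update V' = (V + (V AND M)) OR (V AND NOT M), the recurrence usually associated with Crochemore et al.; it is not the improved formula of this paper, which the abstract does not state. Python's arbitrary-precision integers stand in for the ⌈m/w⌉ machine words, and the function name is an assumption.

```python
def llcs_bitparallel(a, b):
    """Length of the LCS of a and b using one m-bit vector per step."""
    m = len(a)
    if m == 0:
        return 0
    full = (1 << m) - 1

    # Match vectors: bit i of match[c] is set iff a[i] == c.
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)

    v = full  # all ones: no matches found yet
    for ch in b:
        mv = match.get(ch, 0)
        # Carries from the addition propagate through runs of set bits,
        # which updates a whole DP column in a constant number of operations.
        v = ((v + (v & mv)) | (v & ~mv)) & full

    # Each zero bit in v corresponds to one unit of LCS length.
    return m - bin(v).count("1")


print(llcs_bitparallel("AGGTAB", "GXTXAYB"))
```

With fixed-width words the same loop would process the vector word by word and pass the addition carry between words, which is where the ⌈m/w⌉ factor comes from.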
Speeding-up Hirschberg and Hunt-Szymanski LCS Algorithms
, 2003
Cited by 5 (0 self)
Two algorithms are presented that solve the problem of recovering the longest common subsequence of two strings. The first algorithm is an improvement of Hirschberg’s divide-and-conquer algorithm. The second algorithm is an improvement of the Hunt-Szymanski algorithm based on an efficient computation of all dominant match points. These two algorithms use bit-vector operations and are shown to work very efficiently in practice.
Flexible music retrieval in sublinear time
 In Proc. 10th Prague Stringology Conference (PSC'05)
, 2005
Cited by 4 (3 self)
Music sequences can be treated as texts in order to perform music retrieval tasks on them. However, the text search problems that result from this modeling are unique to music retrieval. To date, several approaches derived from classical string matching have been proposed to cope with the new search problems, yet each problem has had its own algorithms. In this paper we show that a technique recently developed for multipattern approximate string matching is flexible enough to be successfully extended to solve many different music retrieval problems, as well as combinations thereof not addressed before. We show that the resulting algorithms are close to optimal and much better than existing approaches in many practical cases.
Effective retrieval of polyphonic audio with polyphonic symbolic queries
 Proceedings of the 9th ACM SIGMM International Workshop on Multimedia Information Retrieval
, 2007
Cited by 4 (2 self)
Accurately finding audio recordings in response to symbolic queries is one of the key challenges in the field of music information retrieval. Pitch is one of the main features of music; in this paper we propose and evaluate approaches for using pitch information in polyphonic symbolic queries to retrieve full tracks of audio recordings. The audio data is first converted into symbolic data, using an automated transcription process. This is a noisy process, adding up to three times as many notes to the transcription as are actually present. Nevertheless, recordings can be accurately retrieved by manually constructed queries (either in full or truncated) using the longest common subsequence algorithm (and a sliding window if the queries are truncated). Precision at 1 of about 80% was achieved, and around 85% of queries return correct answers in the top 10 from a collection of 1808 recordings. Truncated queries are as effective as untruncated queries for retrieving correct answers in the first rank position. Thus, the burden on users is reduced as they only need to produce a small fraction of a song as a query.
Increased Bit-Parallelism for Approximate and Multiple String Matching
Cited by 2 (2 self)
Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O(⌈m/w⌉n), where w is the number of bits in the computer word. Although this is asymptotically the optimal bit-parallel speedup over the basic O(mn) time algorithm, it wastes bit-parallelism’s power in the common case where m is much smaller than w, since w − m bits in the computer words get unused. In this paper we explore different ways to increase the bit-parallelism when the search pattern is short. First, we show how multiple patterns can be packed into a single computer word so
On Bit-parallel Processing of Multi-byte Text
 In: Asia Information Retrieval Symposium, AIRS 2004
, 2004
Cited by 1 (0 self)
There exist practical bit-parallel algorithms for several types of pairwise string processing, such as longest common subsequence computation or approximate string matching. The bit-parallel algorithms typically use a size-σ table of match bit-vectors, where the bits in the vector for a character identify the positions where that character occurs in one of the processed strings, and σ is the alphabet size. The time or space cost of computing the match table is not prohibitive with reasonably small alphabets such as ASCII text. However, in the case of general Unicode text, for example, the possible numerical code range of the characters is roughly one million. This makes using a simple table impractical. In this paper we evaluate three different schemes for overcoming this problem. First we propose to replace the character code table by a character code automaton. Then we compare this method with two other schemes: using a hash table, and the binary-search-based solution proposed by Wu, Manber and Myers [25]. We find that the best choice is to use either the automaton-based method or a hash table.
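The hash-table option mentioned in the abstract can be illustrated with a short sketch. It uses Python's built-in `dict` as the hash table, so only characters that actually occur in the pattern consume space; the function name and the sample pattern are assumptions, and the automaton and binary-search schemes are not shown.

```python
def build_match_table(pattern):
    """Map each distinct character of pattern to its match bit-vector.

    Bit i of the vector for a character is set iff pattern[i] is that
    character. Characters absent from the pattern have no entry.
    """
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = table.get(ch, 0) | (1 << i)
    return table


table = build_match_table("añaλ")
# Table size depends on the distinct pattern characters, not on the
# roughly one-million-wide Unicode code range.
print(len(table))
# Lookups for characters not in the pattern fall back to the zero vector.
print(table.get("z", 0))
```

A direct-indexed table would need one entry per possible code point regardless of the pattern, which is the cost the abstract describes as impractical.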
String comparison by transposition networks
, 903
Cited by 1 (0 self)
Computing string or sequence alignments is a classical method of comparing strings and has applications in many areas of computing, such as signal processing and bioinformatics. Semi-local string alignment is a recent generalisation of this method, in which the alignment of a given string and all substrings of another string are computed simultaneously at no additional asymptotic cost. In this paper, we show that there is a close connection between semi-local string alignment and a certain class of traditional comparison networks known as transposition networks. The transposition network approach can be used to represent different string comparison algorithms in a unified form, and in some cases provides generalisations or improvements on existing algorithms. This approach allows us to obtain new algorithms for sparse semi-local string comparison and for comparison of highly similar and highly dissimilar strings, as well as of run-length compressed strings. We conclude that the transposition network method is a very general and flexible way of understanding and improving different string comparison algorithms, as well as their efficient implementation.