Results 1 - 2 of 2
Increased Bit-Parallelism for Approximate and Multiple String Matching
Cited by 2 (2 self)
Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O(⌈m/w⌉n), where w is the number of bits in the computer word. Although this is asymptotically the optimal bit-parallel speedup over the basic O(mn) time algorithm, it wastes bit-parallelism’s power in the common case where m is much smaller than w, since w − m bits in the computer words get unused. In this paper we explore different ways to increase the bit-parallelism when the search pattern is short. First, we show how multiple patterns can be packed into a single computer word so ...
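
For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of single-pattern bit-parallel approximate matching in the style of Myers' bit-vector algorithm. It is an illustration only, not the packing scheme proposed in the paper. The function name `approx_search` and its parameters are hypothetical, and Python's unbounded integers stand in for a computer word, so an explicit `mask` of m bits plays the role of the word boundary.

```python
def approx_search(pattern, text, k):
    """Return end positions in text where pattern matches with <= k edits.

    Sketch of a Myers-style bit-vector algorithm; assumes a non-empty
    pattern that would fit in one machine word (m <= w).
    """
    m = len(pattern)
    mask = (1 << m) - 1          # emulates a word of exactly m bits
    high = 1 << (m - 1)          # bit for the last pattern row

    # Match vectors: bit i of peq[c] is set iff pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv = mask, 0             # vertical +1 / -1 differences
    score = m                    # edit distance at the bottom row
    ends = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask    # horizontal +1 differences
        mh = pv & xh                     # horizontal -1 differences
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift in 0 (not 1): a match may start anywhere in the text.
        ph <<= 1
        mh <<= 1
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv & mask
        if score <= k:
            ends.append(j)
    return ends
```

Each text character is processed with a constant number of word operations, which is where the O(⌈m/w⌉n) bound comes from when m ≤ w. With a 3-character pattern and a 64-bit word, 61 bits of every operation carry no information, which is the waste the abstract describes.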
On Bit-Parallel Processing of Multi-byte Text
In: Asia Information Retrieval Symposium, AIRS 2004
Cited by 1 (0 self)
There exist practical bit-parallel algorithms for several types of pairwise string processing, such as longest common subsequence computation or approximate string matching. The bit-parallel algorithms typically use a size-σ table of match bit-vectors, where the bits in the vector for a character λ identify the positions where the character λ occurs in one of the processed strings, and σ is the alphabet size. The time or space cost of computing the match table is not prohibitive with reasonably small alphabets such as ASCII text. However, for example in the case of general Unicode text the possible numerical code range of the characters is roughly one million. This makes using a simple table impractical. In this paper we evaluate three different schemes for overcoming this problem. First we propose to replace the character code table by a character code automaton. Then we compare this method with two other schemes: using a hash table, and the binary-search-based solution proposed by Wu, Manber and Myers [25]. We find that the best choice is to use either the automaton-based method or a hash table.
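
As a rough illustration of the hash-table scheme mentioned in the abstract, the sketch below stores match bit-vectors in a Python `dict` keyed by character, so only characters that actually occur in the first string take up space, whatever the size of the Unicode code range. It then uses those vectors in a standard bit-parallel computation of the longest common subsequence (LCS) length. The names `build_match_table` and `lcs_length` are hypothetical, and this is not the authors' implementation; the automaton and binary-search schemes are not shown.

```python
def build_match_table(a):
    """Map each character of a to a bit-vector of its positions.

    A dict plays the role of the hash table: no entry is allocated
    for characters that do not occur in a.
    """
    table = {}
    for i, ch in enumerate(a):
        table[ch] = table.get(ch, 0) | (1 << i)
    return table


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Bit-parallel sketch; Python integers stand in for machine words,
    so results are masked to len(a) bits after every step.
    """
    m = len(a)
    if m == 0:
        return 0
    mask = (1 << m) - 1
    table = build_match_table(a)
    v = mask                              # all ones: no matches yet
    for ch in b:
        pm = table.get(ch, 0)             # absent character -> zero vector
        u = v & pm
        v = ((v + u) | (v & ~pm)) & mask
    # Each zero bit in v marks one increase in LCS length.
    return m - bin(v).count("1")
```

A lookup that misses returns the all-zero vector, which is the same value a full size-σ table would hold for a character that never occurs in the string.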