Text Retrieval: Theory and Practice
- In 12th IFIP World Computer Congress, volume I, 1992
Cited by 43 (14 self)
We present the state of the art of the main component of text retrieval systems: the searching engine. We outline the main lines of research and issues involved. We survey recently published results for text searching and we explore the gap between theoretical vs. practical algorithms. The main observation is that simpler ideas are better in practice.

1597 Shaks. Lover's Compl. 2: "From off a hill whose concaue wombe reworded / A plaintfull story from a sistring vale." (OED2, reword, sistering)

1 Introduction

Full text retrieval systems are becoming a popular way of providing support for on-line text. Their main advantage is that they avoid the complicated and expensive process of semantic indexing. From the end-user point of view, full text searching of on-line documents is appealing because a valid query is just any word or sentence of the document. However, when the desired answer cannot be obtained with a simple query, the user must perform his/her own semantic processing to guess w...
Parameterized Pattern Matching by Boyer-Moore-type Algorithms
- In Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, 1995
Cited by 16 (0 self)
This paper investigates generalizations of the Boyer-Moore string pattern matching algorithm to parameterized pattern matching. Parameterized pattern matching was invented by Baker [Bak93b] for the purpose of finding sections of code in a software system that are the same except for a systematic change of parameters. We show that for Boyer-Moore-type algorithms that do not save information about previously matched portions of text, straightforward generalizations to parameterized pattern matching must have a running time of Ω(n min(m, p)), where n is the text length, m is the pattern length, and p is the number of parameter symbols in the alphabet. However, we describe a parameterized pattern matching algorithm PturboBM that has the same overall structure as the Boyer-Moore algorithm but saves information about previously matched portions of text and runs in time O(n log min(m, p)), with preprocessing time O(m log min(m, p)), where n is the length of the text, m is the leng...
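To make the notion of a parameterized match concrete, here is a minimal sketch of the matching relation itself: two strings match if they are identical up to a one-to-one renaming of parameter symbols. It is a naive O(nm) scan and is not the PturboBM algorithm from the paper; the function names `p_match` and `find_p_matches` and the choice of passing the parameter alphabet as a Python set are assumptions made for illustration.

```python
def p_match(a, b, params):
    """Return True if a and b are equal up to a one-to-one renaming
    of the symbols in params (hypothetical helper, naive check)."""
    if len(a) != len(b):
        return False
    forward, backward = {}, {}
    for x, y in zip(a, b):
        if x in params and y in params:
            # setdefault stores the first pairing seen and returns the
            # stored pairing afterwards, so any conflict shows up here.
            if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
                return False
        elif x != y:
            # Non-parameter symbols must agree exactly; a parameter can
            # never pair with a non-parameter symbol.
            return False
    return True


def find_p_matches(text, pattern, params):
    """Return all start positions where a window of text p-matches pattern."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_match(text[i:i + m], pattern, params)]


params = set("xyuv")
print(find_p_matches("a=u+u;b=u+v;", "x+y", params))  # [8]
```

The window `u+u` at position 2 is rejected because the renaming would have to send `u` to both `x` and `y`, which is exactly the "systematic change of parameters" constraint described in the abstract.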
On Boyer-Moore Automata
- 1994
Cited by 7 (2 self)
The notion of Boyer-Moore automaton was introduced by Knuth, Morris and Pratt in their historical paper on fast pattern matching. It leads to an algorithm that requires more preprocessing but is more efficient than the original Boyer-Moore algorithm. We formalize the notion of Boyer-Moore automaton, and we give an efficient building algorithm. Also, bounds on the number of states are presented, and the concept of potential of a transition is introduced to improve the worst and average case behavior of these machines. We show that looking at the rightmost unknown character, as suggested by Knuth et al., is not necessarily optimal.

Keywords: string searching, pattern matching, finite automaton, average case analysis.

1 Introduction

String searching is a very important component of many problems, including text editing, data retrieval and letter manipulation. Formally, the string searching or string matching problem consists in finding all occurrences (or the first occurrence) of a p...
Exact Analysis of Horspool’s and Sunday’s Pattern Matching Algorithms with Probabilistic Arithmetic Automata
We define deterministic arithmetic automata (DAAs) and connect them to a framework called probabilistic arithmetic automata (PAAs) [9]. We use DAAs and PAAs to compute the entire exact probability distribution (in contrast to, e.g., asymptotic expectation and variance) of the number X^p_ℓ of text characters accessed by the Horspool or Sunday pattern matching algorithms when matching a fixed pattern p against a random text of length ℓ. The random text model can be quite general, from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). We develop several alternative constructions with different state spaces of the automata, leading to alternative time and space complexities for the computations. To our knowledge, this is the first time that suffix-based pattern matching algorithms are analyzed exactly. We present (perhaps surprising) exemplary results on short patterns and moderate text lengths. Our results easily generalize to any search-window based pattern matching algorithm.
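The quantity studied in this abstract, the number of text characters accessed during a search, can be observed for a single concrete text with a short sketch of Horspool's algorithm that counts its own comparisons. This only measures one run; it does not compute the exact distribution over random texts that the paper derives. The function name `horspool_search` and the counting convention (one access per character comparison, with the shift lookup reusing the already-read last window character) are assumptions for illustration.

```python
def horspool_search(pattern, text):
    """Return (match positions, number of text character accesses)
    for a Horspool-style right-to-left window scan (illustrative sketch)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0
    # Shift table: for each character of pattern[:-1], the distance from
    # its rightmost occurrence to the last pattern position.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    matches, accesses = [], 0
    pos = 0
    while pos + m <= n:
        j = m - 1
        # Compare the window against the pattern from right to left.
        while j >= 0:
            accesses += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift by the last window character, which was already read in
        # the first comparison above; unseen characters shift by m.
        pos += shift.get(text[pos + m - 1], m)
    return matches, accesses


print(horspool_search("ab", "aaab"))  # ([2], 4)
```

Running the function over many sampled texts of a fixed length would give an empirical estimate of the distribution; the abstract's contribution is obtaining that distribution exactly rather than by sampling.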

