Results 1 - 10
of
351
A Guided Tour to Approximate String Matching
- ACM Computing Surveys
, 1999
"... We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining t ..."
Abstract
-
Cited by 306 (38 self)
- Add to MetaCart
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems. 1
Efficient randomized pattern-matching algorithms
, 1987
"... We present randomized algorithms to solve the
following string-matching problem and some of its generalizations: Given a string X of length n (the pattern) and a string Y (the text), find the first occurrence of X as a consecutive block within Y. The algorithms represent strings of length n by much ..."
Abstract
-
Cited by 257 (0 self)
- Add to MetaCart
We present randomized algorithms to solve the
following string-matching problem and some of its generalizations: Given a string X of length n (the pattern) and a string Y (the text), find the first occurrence of X as a consecutive block within Y. The algorithms represent strings of length n by much shorter strings called fingerprints, and achieve their efficiency by manipulating fingerprints instead of longer strings. The algorithms require a constant number of storage locations, and essentially run in real time. They are conceptually simple and easy to implement. The method readily generalizes to higher-dimensional pattern-matching problems.
Tutorial Notes on Partial Evaluation
- Proceedings of the Twentieth Annual ACM Symposium on Principles of Programming Languages
, 1993
"... The last years have witnessed a flurry of new results in the area of partial evaluation. These tutorial notes survey the field and present a critical assessment of the state of the art. 1 Introduction Partial evaluation is a source-to-source program transformation technique for specializing program ..."
Abstract
-
Cited by 230 (60 self)
- Add to MetaCart
The last years have witnessed a flurry of new results in the area of partial evaluation. These tutorial notes survey the field and present a critical assessment of the state of the art. 1 Introduction Partial evaluation is a source-to-source program transformation technique for specializing programs with respect to parts of their input. In essence, partial evaluation removes layers of interpretation. In the most general sense, an interpreter can be defined as a program whose control flow is determined by its input data. As Abelson points out, [43, Foreword], even programs that are not themselves interpreters have important interpreter-like pieces. These pieces contain both compile-time and run-time constructs. Partial evaluation identifies and eliminates the compile-time constructs. 1.1 A complete example We consider a function producing formatted text. Such functions exist in most programming languages (e.g., format in Lisp and printf in C). Figure 1 displays a formatting functio...
A New Approach to Text Searching
"... We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with classes of symbols, don't care symbols and complement symbols, and multiple patterns. In addition we solve the same problems allowing up to k mismatches. Among the features of ..."
Abstract
-
Cited by 200 (14 self)
- Add to MetaCart
We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with classes of symbols, don't care symbols and complement symbols, and multiple patterns. In addition we solve the same problems allowing up to k mismatches. Among the features of these algorithms are that they don't need to buffer the input, they are real time algorithms (for constant size patterns), and they are suitable to be implemented in hardware. 1 Introduction String searching is a very important component of many problems, including text editing, bibliographic retrieval, and symbol manipulation. Recent surveys of string searching can be found in [17, 4]. The string matching problem consists of finding all occurrences of a pattern of length m in a text of length n. We generalize the problem allowing "don't care" symbols, the complement of a symbol, and any finite class of symbols. We solve this problem for one or more patterns, with or without mismatches. Fo...
TIMBER: A Native XML Database
- The VLDB Journal
, 2002
"... This paper describes the overall design and architecture of the Timber XML database system currently being implemented at the University of Michigan. The system is based upon a bulk algebra for manipulating trees, and natively stores XML. New access methods have been developed to evaluate queries in ..."
Abstract
-
Cited by 113 (10 self)
- Add to MetaCart
This paper describes the overall design and architecture of the Timber XML database system currently being implemented at the University of Michigan. The system is based upon a bulk algebra for manipulating trees, and natively stores XML. New access methods have been developed to evaluate queries in the XML context, and new cost estimation and query optimization techniques have also been developed. We present performance numbers to support some of our design decisions. We believe that the key intellectual contribution of this system is a comprehensive set-at-a-time query processing ability in a native XML store, with all the standard components of relational query processing, including algebraic rewriting and a cost-based optimizer.
Integrating decision procedures into heuristic theorem provers: A case study of linear arithmetic
- Machine Intelligence
, 1988
"... We discuss the problem of incorporating into a heuristic theorem prover a decision procedure for a fragment of the logic. An obvious goal when incorporating such a procedure is to reduce the search space explored by the heuristic component of the system, as would be achieved by eliminating from the ..."
Abstract
-
Cited by 105 (9 self)
- Add to MetaCart
We discuss the problem of incorporating into a heuristic theorem prover a decision procedure for a fragment of the logic. An obvious goal when incorporating such a procedure is to reduce the search space explored by the heuristic component of the system, as would be achieved by eliminating from the system’s data base some explicitly stated axioms. For example, if a decision procedure for linear inequalities is added, one would hope to eliminate the explicit consideration of the transitivity axioms. However, the decision procedure must then be used in all the ways the eliminated axioms might have been. The difficulty of achieving this degree of integration is more dependent upon the complexity of the heuristic component than upon that of the decision procedure. The view of the decision procedure as a "black box " is frequently destroyed by the need pass large amounts of search strategic information back and forth between the two components. Finally, the efficiency of the decision procedure may be virtually irrelevant; the efficiency of the final system may depend most heavily on how easy it is to communicate between the two components. This paper is a case study of how we integrated a linear arithmetic procedure into a heuristic theorem prover. By linear arithmetic here we mean the decidable subset of number theory dealing with universally quantified formulas composed of the logical connectives, the identity relation, the Peano "less than " relation, the Peano addition and subtraction functions, Peano constants,
Practical fast searching in strings
- Software Practice and Experience
, 1980
"... The problem of searching through text to find a specified substring is considered in a practical setting. It is discovered that a method developed by Boyer and Moore can outperform even special-purpose search instructions that may be built into the, computer hardware. For very short substrings howev ..."
Abstract
-
Cited by 94 (0 self)
- Add to MetaCart
The problem of searching through text to find a specified substring is considered in a practical setting. It is discovered that a method developed by Boyer and Moore can outperform even special-purpose search instructions that may be built into the, computer hardware. For very short substrings however, these special purpose instructions are fastest-provided that they are used in an optimal way. KEY WORDS String searching Pattern matching Text editing Bibliographic search
A Framework for Source Code Search using Program Patterns
- IEEE Transactions on Software Engineering
, 1994
"... For maintainers involved in understanding and reengineering large software, locating source code fragments that match certain patterns is a critical task. Existing solutions to the problem are few, and they either involve manual, painstaking scans of the source code using tools based on regular expr ..."
Abstract
-
Cited by 93 (2 self)
- Add to MetaCart
For maintainers involved in understanding and reengineering large software, locating source code fragments that match certain patterns is a critical task. Existing solutions to the problem are few, and they either involve manual, painstaking scans of the source code using tools based on regular expressions, or the use of large, integrated software engineering environments that include simple patternbased query processors in their toolkits. We present a framework in which pattern languages are used to specify interesting code features. The pattern languages are derived by extending the source programming language with pattern-matching symbols. We describe SCRUPLE, a finite state machine-based source code search tool, that efficiently implements this framework. We also present experimental performance results obtained from a SCRUPLE prototype, and the user interface of a source code browser built on top of SCRUPLE. Keywords: Reverse engineering, software maintenance, software reengineer...
A fast algorithm for multi-pattern searching
, 1994
"... A new algorithm to search for multiple patterns at the same time is presented. The algorithm is faster than previous algorithms and can support a very large number — tens of thousands — of patterns. Several applications of the multi-pattern matching problem are discussed. We argue that, in addition ..."
Abstract
-
Cited by 92 (2 self)
- Add to MetaCart
A new algorithm to search for multiple patterns at the same time is presented. The algorithm is faster than previous algorithms and can support a very large number — tens of thousands — of patterns. Several applications of the multi-pattern matching problem are discussed. We argue that, in addition to previous applications that required such search, multi-pattern matching can be used in lieu of indexed or sorted data in some applications involving small to medium size datasets. Its advantage, of course, is that no additional search structure is needed.
Speeding Up Two String-Matching Algorithms
- ALGORITHMICA
, 1994
"... We show how to speed up two string-matching algorithms: the Boyer-Moore algorithm (BM algorithm), and its version called here the reverse factor algorithm (RF algorithm). The RF algorithm is based on factor graphs for the reverse of the pattern.The main feature of both algorithms is that they scan ..."
Abstract
-
Cited by 83 (17 self)
- Add to MetaCart
We show how to speed up two string-matching algorithms: the Boyer-Moore algorithm (BM algorithm), and its version called here the reverse factor algorithm (RF algorithm). The RF algorithm is based on factor graphs for the reverse of the pattern.The main feature of both algorithms is that they scan the text right-to-left from the supposed right position of the pattern. The BM algorithm goes as far as the scanned segment (factor) is a suffix of the pattern. The RF algorithm scans while the segment is a factor of the pattern. Both algorithms make a shift of the pattern, forget the history, and start again. The RF algorithm usually makes bigger shifts than BM, but is quadratic in the worst case. We show that it is enough to remember the last matched segment (represented by two pointers to the text) to speed up the RF algorithm considerably (to make a linear number of inspections of text symbols, with small coefficient), and to speed up the BM algorithm (to make at most 2.n comparisons). Only a constant additional memory is needed for the search phase. We give alternative versions of an accelerated RF algorithm: the first one is based on combinatorial properties of primitive words, and the other two use the power of suffix trees extensively. The paper demonstrates the techniques to transform algorithms, and also shows interesting new applications of data structures representing all subwords of the pattern in compact form.

