Results 1-10 of 19
Text Retrieval: Theory and Practice
In 12th IFIP World Computer Congress, volume I, 1992
Cited by 51 (14 self)
We present the state of the art of the main component of text retrieval systems: the searching engine. We outline the main lines of research and issues involved. We survey recently published results for text searching and we explore the gap between theoretical and practical algorithms. The main observation is that simpler ideas are better in practice.
1597 Shaks. Lover's Compl. 2: From off a hill whose concaue wombe reworded A plaintfull story from a sistring vale. (OED2, reword, sistering)
1 Introduction. Full text retrieval systems are becoming a popular way of providing support for online text. Their main advantage is that they avoid the complicated and expensive process of semantic indexing. From the end-user point of view, full text searching of online documents is appealing because a valid query is just any word or sentence of the document. However, when the desired answer cannot be obtained with a simple query, the user must perform his/her own semantic processing to guess w...
Computing Similarity Between RNA Strings
1996
Cited by 47 (4 self)
Ribonucleic acid (RNA) strings are strings over the four-letter alphabet {A, C, G, U} with a secondary structure of base-pairing between A-U and C-G pairs in the string. Edges are drawn between two bases that are paired in the secondary structure and these edges have traditionally been assumed to be noncrossing. The noncrossing base-pairing naturally leads to a tree-like representation of the secondary structure of RNA strings. In this paper, we address several notions of similarity between two RNA strings that take into account both the primary sequence and secondary base-pairing structure of the strings. We present efficient algorithms for exact matching and approximate matching between two RNA strings. We define a notion of alignment between two RNA strings and devise algorithms based on dynamic programming. We then present a method for optimally aligning a given RNA string with unknown secondary structure to one with known sequence and structure, thus attacking the structure prediction problem in the case when the structure of a closely related sequence is known. The techniques employed to prove our results include reductions to well-known string matching problems allowing wild cards and ranges, and speeding up dynamic programming by using the tree structures implicit in the secondary structure of RNA strings.
Efficient 2-dimensional Approximate Matching of Half-rectangular Figures
1993
Cited by 33 (9 self)
Efficient algorithms exist for the approximate two-dimensional matching problem for rectangles. This is the problem of finding all occurrences of an $m \times m$ pattern in an $n \times n$ text with no more than $k$ mismatch, insertion, and deletion errors. In computer vision it is important to generalize this problem to non-rectangular figures. We make progress towards this goal by defining half-rectangular figures of height $m$ and area $a$. The approximate two-dimensional matching problem for half-rectangular patterns can be solved using a dynamic programming approach in time $O(an^2)$. We show an $O(kn^2 \sqrt{m \log m}\,\sqrt{k \log k} + k^2 n^2)$ algorithm which combines convolutions with dynamic programming. Note that our algorithm is superior to previous known solutions for $k \le m^{1/3}$. At the heart of the algorithm are the Smaller Matching Problem and the k-Aligned Ones with Location Problem. These are interesting problems in their own right. Efficient algorithms to solve both t...
Two Dimensional Dictionary Matching
Information Processing Letters, 1992
Cited by 15 (4 self)
Most traditional pattern matching algorithms solve the problem of finding all occurrences of a given pattern string $P$ in a given text $T$. Another important paradigm is the dictionary matching problem. Let $D = \{P_1, \ldots, P_k\}$ be the dictionary. We seek all locations of dictionary patterns that appear in a given text $T$.
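The problem statement above can be illustrated with a brute-force sketch in Python. This is only a one-dimensional baseline written for this listing, not the paper's two-dimensional algorithm; the function name `dictionary_matches` and its return format are assumptions made for illustration.

```python
from typing import List, Tuple


def dictionary_matches(text: str, dictionary: List[str]) -> List[Tuple[int, int]]:
    """Return sorted (position, pattern_index) pairs for every occurrence
    of every dictionary pattern in text (hypothetical helper, brute force)."""
    hits = []
    for idx, pattern in enumerate(dictionary):
        if not pattern:
            continue  # skip empty patterns, which would match everywhere
        start = text.find(pattern)
        while start != -1:
            hits.append((start, idx))
            # advance by one so overlapping occurrences are also reported
            start = text.find(pattern, start + 1)
    return sorted(hits)
```

Calling `dictionary_matches("abracadabra", ["abra", "cad"])` reports each location together with the index of the pattern found there. The sketch scans the text once per pattern, which is the cost that dedicated dictionary matching algorithms aim to avoid.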
Two and Higher Dimensional Pattern Matching in Optimal Expected Time
1994
Cited by 13 (4 self)
Algorithms with optimal expected running time are presented for searching the occurrences of a two-dimensional $m \times m$ pattern $P$ in a two-dimensional $n \times n$ text $T$ over an alphabet of size $c$. The algorithms are based on placing in the text a static grid of test points, determined only by $n$, $m$ and $c$ (not dynamically by earlier test results). Using test strings read from the test points the algorithms eliminate as many potential occurrences of $P$ as possible. The remaining potential occurrences are separately checked for actual occurrences. A suitable choice of the test point set leads to algorithms with expected running time $O(n^2 \log_c m^2 / m^2)$ using the uniform Bernoulli model of randomness. This is shown to be optimal by a generalization of a one-dimensional lower bound result by Yao. Experimental results show that the algorithms are efficient in practice, too. The method is also generalized for the $k$ mismatches problem. The resulting algorithm has expected running ti...
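For orientation, the search problem itself (not the test-point filtering method of the paper) can be written as a naive checker in Python. The sketch below assumes the text and pattern are given as lists of equal-length strings; the name `find_2d` is hypothetical. It inspects every candidate position, which is exactly the work the filtering approach is designed to skip.

```python
from typing import List, Tuple


def find_2d(text: List[str], pattern: List[str]) -> List[Tuple[int, int]]:
    """Return the (row, column) of the top-left corner of every exact
    occurrence of pattern in text, checking each candidate position."""
    n_rows, n_cols = len(text), len(text[0])
    m_rows, m_cols = len(pattern), len(pattern[0])
    hits = []
    for r in range(n_rows - m_rows + 1):
        for c in range(n_cols - m_cols + 1):
            # compare the pattern row by row against the text window at (r, c)
            if all(text[r + i][c:c + m_cols] == pattern[i] for i in range(m_rows)):
                hits.append((r, c))
    return hits
```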
Resource Scheduling for Composite Multimedia Objects
In Very Large Data Bases Conf., 1998
Cited by 11 (1 self)
Scheduling algorithms for composite multimedia presentations need to ensure that the user-defined synchronization constraints for the various presentation components are met. This requirement gives rise to task models that are significantly more complex than the models employed in scheduling theory and practice. In this paper, we formulate the resource scheduling problems for composite multimedia objects and develop novel efficient scheduling algorithms drawing on a number of techniques from pattern matching and multiprocessor scheduling. Our formulation is based on a novel sequence packing problem, where the goal is to superimpose numeric sequences (representing the objects' resource needs as a function of time) within a fixed capacity bin (representing the server's resource capacity). Given the intractability of the problem, we propose heuristic solutions using a two-step approach. First, we present a "basic step" method for packing two composite object sequences into a single, combined sequence. Second, we show how this basic step can be employed within different scheduling algorithms to obtain a playout schedule for multiple objects. More specifically, we present an algorithm based on Graham's list-scheduling method that is provably near-optimal for monotonic object sequences. We also suggest a number of optimizations on the base list-scheduling scheme. Preliminary experimental results confirm the effectiveness of our approach.
Multidimensional Pattern Matching: A Survey
1992
Cited by 9 (0 self)
We review some recent algorithms motivated by computer vision. The problem inspiring this research is that of searching an aerial photograph for all appearances of some object. The issues we discuss are local errors, scaling, compression and dictionary matching. We review deterministic serial techniques that are used for multidimensional pattern matching and discuss their strengths and weaknesses.
College of Computing, Georgia Institute of Technology, Atlanta, Georgia 30332-0280. Partially supported by NSF grant IRI9013055.
1 Motivation. String Matching is one of the most widely studied problems in computer science [Gal85]. Part of its appeal is in its direct applicability to "real world" problems. The Knuth-Morris-Pratt [KMP77] algorithm is directly implemented in the emacs "s" and UNIX "grep" commands. The longest common subsequence dynamic programming algorithm [CKK72] is implemented in the UNIX "diff" command. The largest overlap heuristic for finding the shortest common s...
Two-Dimensional Periodicity in Rectangular Arrays
Proc. of the 3rd ACM-SIAM Symposium on Discrete Algorithms, 1992
Cited by 9 (1 self)
String matching is rich with a variety of algorithmic tools. In contrast, multidimensional matching has had a rather sparse set of techniques. This paper presents a new algorithmic technique for two-dimensional matching: periodicity analysis. Its strength appears to lie in the fact that it is inherently two-dimensional. Periodicity in strings has been used to solve string matching problems. Multidimensional periodicity, though, is not as simple as it is in strings and was not formally studied or used in pattern matching. In this paper, we define and analyze two-dimensional periodicity in rectangular arrays. One definition of string periodicity is that a periodic string can self-overlap in a particular way. An analogous concept is true in two dimensions. The self-overlap vectors of a rectangle generate a regular pattern of locations where the rectangle may originate. Based on this regularity, we define four categories of periodic arrays: nonperiodic, lattice-periodic, line-periodic and...
New Models and Algorithms for Multidimensional Approximate Pattern Matching
J. Discret. Algorithms, 2000
Cited by 7 (6 self)
We focus on how to compute the edit distance (or similarity) between two images and the problem of approximate string matching in two dimensions, that is, to find a pattern of size $m \times m$ in a text of size $n \times n$ with at most $k$ errors (character substitutions, insertions and deletions). Pattern and text are matrices over an alphabet of size $\sigma$. We present new models and give the first sublinear-time search algorithms for the new and the existing models.
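The error model named above (substitutions, insertions and deletions) is easiest to see in one dimension. The following Python sketch computes the classical edit distance between two strings by dynamic programming; it is a standard textbook formulation offered as background, not one of the two-dimensional models introduced in the paper, and the name `edit_distance` is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of character substitutions, insertions and deletions
    needed to turn a into b (classical one-dimensional edit distance)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept at a time, so memory use is proportional to the length of the second string.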
The Truth, the Whole Truth, and Nothing but the Truth: Alphabet Independent Two Dimensional Witness Table Construction
1992
Cited by 6 (6 self)
The new technique of two-dimensional periodicity has been useful in many recent two-dimensional matching results. A key step in using periodicity is computing the witness table. In this paper, an optimal, linear-time, alphabet-independent, two-dimensional witness table construction algorithm is presented.
College of Computing, Georgia Institute of Technology, Atlanta, GA 30332-0280; (404) 853-0083; amir@cc.gatech.edu; Partially supported by NSF grant IRI9013055.
Department of Mathematics, University of Southern California, DRB 155, 1024 W. 36th Pl., Los Angeles, CA 90089-1113; (213) 740-2404; gbenson@hto.usc.edu; Partially supported by NSF grant IRI9013055.
DIMACS, Box 1179, Rutgers University, Piscataway, NJ 08855; (908) 932-5928; farach@dimacs.rutgers.edu; Supported by DIMACS under NSF contract STC8809648.
1 Introduction. Recently the world has been witnessing a strong convergence ...