Finding Approximate Matches in Large Lexicons
- Software - Practice and Experience, 1995
Cited by 27 (5 self)
Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n-grams and permuted lexicons, and several string matching techniques, including string similarity measures and phonetic coding. We propose methods for combining these techniques, and show experimentally that these combinations yield good retrieval effectiveness while keeping index size and retrieval time low. Our experiments also suggest that, in contrast to previous claims, phonetic codings are markedly inferior to string distance measures, which are demonstrated to be suitable for both spelling correction and personal name matching.
KEY WORDS: pattern matching; string indexing; approximate matching; compressed inverted files; Soundex
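To illustrate the kind of combination this abstract describes, the following is a minimal sketch, not the authors' implementation: an n-gram index selects candidate lexicon entries cheaply, and a string distance measure then ranks them. The function names, the `$` boundary marker, the choice of bigrams, and the use of Levenshtein distance are all assumptions made for this example.

```python
from collections import defaultdict

def ngrams(word, n=2):
    """Return the set of n-grams of word, padded with an assumed '$' boundary marker."""
    padded = "$" + word + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def build_index(lexicon, n=2):
    """Map each n-gram to the set of lexicon words that contain it."""
    index = defaultdict(set)
    for word in lexicon:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index

def edit_distance(a, b):
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def approximate_matches(query, index, k=3, n=2):
    """Coarse step: collect words sharing an n-gram; fine step: rank by edit distance."""
    candidates = set()
    for gram in ngrams(query, n):
        candidates.update(index.get(gram, ()))
    return sorted(candidates, key=lambda w: (edit_distance(query, w), w))[:k]

lexicon = ["matching", "marching", "hatching", "lexicon", "mansion"]
index = build_index(lexicon)
print(approximate_matches("matchng", index, k=2))
```

The index keeps the expensive distance computation away from most of the lexicon; only words that share at least one n-gram with the query are scored.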
Searching Large Lexicons for Partially Specified Terms using Compressed Inverted Files
- Proc. International Conference on Very Large Databases, 1993
Cited by 15 (5 self)
There are several advantages to be gained by storing the lexicon of a full text database in main memory. In this paper we describe how to use a compressed inverted file index to search such a lexicon for entries that match a pattern or partially specified term. Our experiments show that this method provides an effective compromise between speed and space, running orders of magnitude faster than brute force search, but requiring less memory than other pattern-matching data structures; indeed, in some cases requiring less memory than would be consumed by a single pointer to each string. The pattern search method is based on text indexing techniques and is a successful adaptation of inverted files to main memory databases.
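As a rough illustration of searching a lexicon for partially specified terms through an inverted file, the sketch below indexes each entry by its bigrams, intersects the posting lists for the fixed parts of a pattern, and then verifies the surviving candidates. It is a simplified sketch under stated assumptions: compression of the posting lists is omitted, `*` is assumed to be the only wildcard, lexicon entries are assumed not to contain `$`, and all function names are hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

def build_postings(lexicon, n=2):
    """Inverted file: n-gram -> list of lexicon positions (uncompressed here)."""
    postings = defaultdict(list)
    for word_id, word in enumerate(lexicon):
        padded = "$" + word + "$"
        for gram in {padded[i:i + n] for i in range(len(padded) - n + 1)}:
            postings[gram].append(word_id)
    return postings

def pattern_grams(pattern, n=2):
    """n-grams that every match must contain: those inside the fixed pieces."""
    padded = "$" + pattern + "$"
    grams = set()
    for piece in padded.split("*"):
        grams.update(piece[i:i + n] for i in range(len(piece) - n + 1))
    return grams

def search(pattern, lexicon, postings, n=2):
    """Intersect posting lists, then discard false matches by direct comparison."""
    grams = pattern_grams(pattern, n)
    if grams:
        candidates = set.intersection(*(set(postings.get(g, ())) for g in grams))
    else:
        candidates = range(len(lexicon))  # no usable n-gram: fall back to a scan
    return [lexicon[i] for i in sorted(candidates) if fnmatchcase(lexicon[i], pattern)]

lexicon = ["labor", "labour", "lumber", "label", "elaborate"]
postings = build_postings(lexicon)
print(search("lab*r", lexicon, postings))
```

The final verification step is needed because sharing all the required n-grams does not guarantee that they occur in the order the pattern demands.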
Combinatorics of Periods in Strings
Cited by 11 (4 self)
We consider the set Γ(n) of all period sets of strings of length n over a finite alphabet. We show that there is redundancy in period sets and introduce the notion of an irreducible period set. We prove that Γ(n) is a lattice under set inclusion and does not satisfy the Jordan-Dedekind condition. We propose the first enumeration algorithm for Γ(n) and improve upon the previously known asymptotic lower bounds on the cardinality of Γ(n). Finally, we provide a new recurrence to compute the number of strings sharing a given period set.
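For readers unfamiliar with period sets, here is a small brute-force sketch of the objects this abstract discusses. It is not the enumeration algorithm or the recurrence proposed in the paper; it simply checks every string of length n. It assumes the usual definition that p is a period of s when s[i] = s[i + p] for all valid i, reports only the non-trivial periods 1 to n - 1, and uses hypothetical function names.

```python
from itertools import product

def period_set(s):
    """Return the set of periods p (1 <= p < len(s)) with s[i] == s[i + p] throughout."""
    n = len(s)
    return frozenset(p for p in range(1, n)
                     if all(s[i] == s[i + p] for i in range(n - p)))

def period_sets(n, alphabet="ab"):
    """Group all strings of length n by period set (exponential time, small n only)."""
    groups = {}
    for letters in product(alphabet, repeat=n):
        s = "".join(letters)
        groups.setdefault(period_set(s), []).append(s)
    return groups

groups = period_sets(4)
for periods, strings in sorted(groups.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(periods), len(strings))
```

The number of keys in `groups` is the number of distinct period sets for that length, and each list length is the number of strings sharing that period set over the chosen alphabet.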

