Finding Approximate Matches in Large Lexicons
- Software - Practice and Experience, 1995
Cited by 27 (5 self)
Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n-grams and permuted lexicons, and several string matching techniques, including string similarity measures and phonetic coding. We propose methods for combining these techniques, and show experimentally that these combinations yield good retrieval effectiveness while keeping index size and retrieval time low. Our experiments also suggest that, in contrast to previous claims, phonetic codings are markedly inferior to string distance measures, which are demonstrated to be suitable for both spelling correction and personal name matching.
KEY WORDS: pattern matching; string indexing; approximate matching; compressed inverted files; Soundex
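As a rough illustration of combining a lexicon index with a string distance measure, the sketch below uses an n-gram index to select candidates and then ranks them by edit distance. It is not the paper's method: the function names, the `$` boundary marker, the choice of digrams, and the thresholds `min_shared` and `max_distance` are all assumptions made for this example.

```python
from collections import defaultdict

def ngrams(word, n=2):
    """Return the set of n-grams of a word, padded with '$' boundary markers."""
    padded = "$" + word + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def build_index(lexicon, n=2):
    """Map each n-gram to the set of lexicon positions whose word contains it."""
    index = defaultdict(set)
    for word_id, word in enumerate(lexicon):
        for gram in ngrams(word, n):
            index[gram].add(word_id)
    return index

def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def approximate_matches(query, lexicon, index, n=2, min_shared=2, max_distance=2):
    """Coarse step: count shared n-grams. Fine step: filter by edit distance."""
    counts = defaultdict(int)
    for gram in ngrams(query, n):
        for word_id in index.get(gram, ()):
            counts[word_id] += 1
    candidates = [lexicon[i] for i, c in counts.items() if c >= min_shared]
    scored = [(edit_distance(query, w), w) for w in candidates]
    return sorted((d, w) for d, w in scored if d <= max_distance)

lexicon = ["matching", "marching", "catching", "lexicon", "mexican"]
index = build_index(lexicon)
print(approximate_matches("matchin", lexicon, index))
```

The index keeps the expensive distance computation away from most of the lexicon: only words sharing at least `min_shared` n-grams with the query are compared in full.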
Searching Large Lexicons for Partially Specified Terms using Compressed Inverted Files
- Proc. International Conference on Very Large Databases, 1993
Cited by 15 (5 self)
There are several advantages to be gained by storing the lexicon of a full text database in main memory. In this paper we describe how to use a compressed inverted file index to search such a lexicon for entries that match a pattern or partially specified term. Our experiments show that this method provides an effective compromise between speed and space, running orders of magnitude faster than brute force search, but requiring less memory than other pattern-matching data structures; indeed, in some cases requiring less memory than would be consumed by a single pointer to each string. The pattern search method is based on text indexing techniques and is a successful adaptation of inverted files to main memory databases.
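The general idea of searching a lexicon for partially specified terms through a compressed inverted file can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not say what the index terms are or how postings are compressed, so the sketch assumes digram index terms, gap-encoded postings with a variable-byte code, and patterns whose only wildcard is `*`. All function names are hypothetical.

```python
from fnmatch import fnmatchcase

def ngrams(text, n=2):
    """Return the set of n-grams occurring in a text fragment."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte, high bit marks the last byte."""
    out = bytearray()
    for num in numbers:
        while num >= 128:
            out.append(num & 0x7F)
            num >>= 7
        out.append(num | 0x80)
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, value, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(value | ((byte & 0x7F) << shift))
            value, shift = 0, 0
        else:
            value |= byte << shift
            shift += 7
    return numbers

def build_index(lexicon, n=2):
    """Map each n-gram to a compressed list of gaps between word numbers."""
    postings = {}
    for word_id, word in enumerate(lexicon):
        for gram in ngrams("$" + word + "$", n):
            postings.setdefault(gram, []).append(word_id)
    index = {}
    for gram, ids in postings.items():
        gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
        index[gram] = vbyte_encode(gaps)
    return index

def read_postings(index, gram):
    """Decode one postings list back into a set of word numbers."""
    ids, total = set(), 0
    for gap in vbyte_decode(index.get(gram, b"")):
        total += gap
        ids.add(total)
    return ids

def pattern_search(pattern, lexicon, index, n=2):
    """Intersect postings of the pattern's n-grams, then verify each candidate."""
    grams = set()
    for fragment in ("$" + pattern + "$").split("*"):
        grams |= ngrams(fragment, n)
    if grams:
        candidates = set.intersection(*(read_postings(index, g) for g in grams))
    else:
        candidates = range(len(lexicon))  # no usable n-gram: scan everything
    # The index only narrows the search; false matches are removed here.
    return [lexicon[i] for i in sorted(candidates) if fnmatchcase(lexicon[i], pattern)]

lexicon = ["label", "labor", "labour", "lacquer", "harbor", "lab"]
index = build_index(lexicon)
print(pattern_search("lab*r", lexicon, index))
```

Storing gaps rather than absolute word numbers keeps most values small, which is what lets a byte-oriented code shrink the postings; the final verification step is needed because sharing all of a pattern's n-grams does not guarantee that a word matches the pattern.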
What's Next? - Index Structures for Efficient Phrase Querying
- Proc. Australasian Database Conference, 1999
Cited by 11 (4 self)
Text retrieval systems are used to fetch documents from large text collections, using queries consisting of words and word sequences.
B-Trees with Lazy Parent Split
A B-tree variant that postpones parent node splittings due to upcoming items until a later access of the same node is examined. This technique aims to decrease the possibility of propagating splittings to upper levels so that more concurrency is achieved. Insertion and deletion algorithms are given. Time and space performance results are also reported and comparison with conventional B-trees is carried out. It is shown that this technique substantially improves the performance of small degree B-trees so that, indeed, concurrency is enhanced.

