Results 1 -
4 of
4
Phonetic String Matching: Lessons from Information Retrieval
, 1996
"... Phonetic matching is used in applications such as name retrieval, where the spelling of a name is used to identify other strings that are likely to be of similar pronunciation. In this paper we explain the parallels between information retrieval and phonetic matching, and describe our new phonetic m ..."
Abstract
-
Cited by 38 (2 self)
- Add to MetaCart
Phonetic matching is used in applications such as name retrieval, where the spelling of a name is used to identify other strings that are likely to be of similar pronunciation. In this paper we explain the parallels between information retrieval and phonetic matching, and describe our new phonetic matching techniques. Our experimental comparison with existing techniques such as Soundex and edit distances, which is based on recall and precision, demonstrates that the new techniques are superior. In addition, reasoning from the similarity of phonetic matching and information retrieval, we have applied combination of evidence to phonetic matching. Our experiments with combining demonstrate that it leads to substantial improvements in effectiveness.
Finding Approximate Matches in Large Lexicons
- SOFTWARE - PRACTICE AND EXPERIENCE
, 1995
"... Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n-grams and p ..."
Abstract
-
Cited by 27 (5 self)
- Add to MetaCart
Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n-grams and permuted lexicons, and several string matching techniques, including string similarity measures and phonetic coding. We propose methods for combining these techniques, and show experimentally that these combinations yield good retrieval effectiveness while keeping index size and retrieval time low. Our experiments also suggest that, in contrast to previous claims, phonetic codings are markedly inferior to string distance measures, which are demonstrated to be suitable for both spelling correction and personal name matching. KEY WORDS: pattern matching; string indexing; approximate matching; compressed inverted files; Soundex
On the Development of Name Search Techniques for Arabic
, 2003
"... The need for effective identity matching systems has led to extensive research in the area of name search. For the most part, such work has been limited to English and other Latin-based languages. Consequently, algorithms such as Soundex and n-gram matching are of limited utility for languages such ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
The need for effective identity matching systems has led to extensive research in the area of name search. For the most part, such work has been limited to English and other Latin-based languages. Consequently, algorithms such as Soundex and n-gram matching are of limited utility for languages such as Arabic, which has vastly different morphologic features that rely heavily on phonetic information. The dearth of work in this field is partly caused by the lack of standardized test data. Consequently, we have built a collection of 7,939 Arabic names, along with 50 training queries and 111 test queries. We use this collection to evaluate a variety of algorithms, including a derivative of Soundex tailored to Arabic (ASOUNDEX), measuring effectiveness by using standard information retrieval measures. Our results show an improvement of 70 % over existing approaches.

