Results 1 - 7 of 7
Phonetic String Matching: Lessons from Information Retrieval
1996
Abstract - Cited by 38 (2 self)
Phonetic matching is used in applications such as name retrieval, where the spelling of a name is used to identify other strings that are likely to be of similar pronunciation. In this paper we explain the parallels between information retrieval and phonetic matching, and describe our new phonetic matching techniques. Our experimental comparison with existing techniques such as Soundex and edit distances, which is based on recall and precision, demonstrates that the new techniques are superior. In addition, reasoning from the similarity of phonetic matching and information retrieval, we have applied combination of evidence to phonetic matching. Our experiments with combining demonstrate that it leads to substantial improvements in effectiveness.
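For orientation, the Soundex baseline named in this abstract can be sketched in a few lines. This is a minimal version of the classic American Soundex code, not the new techniques the paper proposes; the function name `soundex` and the choice to return an empty string for input without letters are assumptions made for this illustration.

```python
def soundex(name: str) -> str:
    """Classic four-character Soundex code (illustrative baseline only)."""
    # Consonant groups and their digits; vowels, Y, H and W carry no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""  # assumption: no letters means no code

    result = [letters[0]]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    # Pad with zeros and truncate to one letter plus three digits.
    return ("".join(result) + "000")[:4]


if __name__ == "__main__":
    for n in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
        print(n, soundex(n))
```

Names that share a code, such as Robert and Rupert, are treated as phonetic matches; the coarse grouping is the kind of weakness that recall and precision experiments are designed to expose.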
Tries for Approximate String Matching
IEEE Transactions on Knowledge and Data Engineering, 1996
Abstract - Cited by 22 (1 self)
Tries offer text searches with costs which are independent of the size of the document being searched, and so are important for large documents requiring secondary storage. Approximate searches, in which the search pattern differs from the document by k substitutions, transpositions, insertions or deletions, have hitherto been carried out only at costs linear in the size of the document. We present a trie-based method whose cost is independent of document size. Our experiments show that this new method significantly outperforms the nearest competitor for k=0 and k=1, which are arguably the most important cases. The linear cost (in k) of the other methods begins to catch up, for our small files, only at k=2. For larger files, complexity arguments i...
H. Shang and T.H. Merrett are at the School of Computer Science, McGill University, Montréal, Québec, Canada H3A 2A7. Email: {shang, tim}@cs.mcgill.ca
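The general idea of searching a trie with at most k errors can be sketched as follows. This is a generic textbook-style illustration, not the method of Shang and Merrett: it indexes a small word list rather than the suffixes of a document, it counts only substitutions, insertions and deletions (no transpositions), and the names `build_trie`, `search` and `END` are hypothetical.

```python
END = ""  # marker key for "a word ends here"; no real character is empty


def build_trie(words):
    """Build a nested-dict trie; each END entry stores the complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def search(trie, pattern, k):
    """Return sorted (word, distance) pairs with edit distance <= k."""
    results = []

    def walk(node, row):
        # row[j] = edit distance between the trie path so far and pattern[:j]
        if END in node and row[-1] <= k:
            results.append((node[END], row[-1]))
        for ch, child in node.items():
            if ch == END:
                continue
            new_row = [row[0] + 1]
            for j in range(1, len(pattern) + 1):
                cost = 0 if pattern[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1,   # skip pattern char
                                   row[j] + 1,           # extra trie char
                                   row[j - 1] + cost))   # match / substitute
            # Prune: the row minimum never decreases further down the trie.
            if min(new_row) <= k:
                walk(child, new_row)

    walk(trie, list(range(len(pattern) + 1)))
    return sorted(results)


if __name__ == "__main__":
    trie = build_trie(["trie", "tree", "tried", "try", "text"])
    print(search(trie, "trie", 1))
```

Because whole subtrees are abandoned as soon as every entry of the current row exceeds k, the work done depends on the part of the trie that stays within k errors of the pattern rather than on the total amount of indexed text.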
Large Vocabulary Recognition of On-line Handwritten Cursive Words
1995
Abstract - Cited by 18 (1 self)
A critical feature of any computer system is its interface with the user. This has led to the development of user interface technologies such as mouse, touchscreen and pen-based input devices. Since handwriting is one of the most familiar communication media, pen-based interfaces combined with automatic handwriting recognition offer a very easy and natural input method. Pen-based interfaces are also essential in mobile computing because they are scalable. Recent advances in pen-based hardware and wireless communication have been influential factors in the renewed interest in on-line recognition systems. On-line handwriting recognition is fundamentally a pattern classification task; the objective is to take an input pattern, the handwritten signal collected on-line via a digitizing device, and classify it as one of a pre-specified set of words (i.e., the system's lexicon). Because exact recognition is very difficult, a lexicon is used to constrain the recognition output to a known vocab...
Generalizing Edit Distance To Incorporate Domain Information: Handwritten text recognition as a case study
Pattern Recognition, 1996
Abstract - Cited by 8 (0 self)
In this paper the Damerau-Levenshtein string difference metric is generalized in two ways to more accurately compensate for the types of errors that are present in the script recognition domain. First, the basic dynamic programming method for computing such a measure is extended to allow for merges, splits and two-letter substitutions. Second, edit operations are refined into categories according to the effect they have on the visual "appearance" of words. A set of recognizer-independent constraints is developed to reflect the severity of the information lost due to each operation. These constraints are solved to assign specific costs to the operations. Experimental results on 2,335 corrupted strings and a lexicon of 21,299 words show higher correcting rates than with the original form.
Keywords: string distance, string matching, spelling error correction, word recognition and correction, text editing, script recognition and post-processing
1 INTRODUCTION
Since the goal of text recog...
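As background for this entry, the baseline that the paper generalizes can be sketched with the usual dynamic programming table. The version below is the restricted Damerau-Levenshtein distance (also called optimal string alignment) with unit costs; it does not include the merges, splits, two-letter substitutions or category-specific costs described in the abstract, and the function name `osa_distance` is an assumption made for this illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein distance with unit costs."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the prefixes a[:i] and b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    print(osa_distance("recieve", "receive"))
    print(osa_distance("kitten", "sitting"))
```

Replacing the constant costs of 1 with per-operation weights is the natural place where domain knowledge, such as how much an edit changes the visual appearance of a word, would enter a generalized version.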
Cross-Domain Approximate String Matching
1999
Abstract - Cited by 2 (0 self)
Approximate string matching is an important paradigm in domains ranging from speech recognition to information retrieval and molecular biology. In this paper, we introduce a new formalism for a class of applications that takes two strings as input, each specified in terms of a particular domain, and performs a comparison motivated by constraints derived from a third, possibly different domain. This issue arises, for example, when searching multimedia databases built using imperfect recognition technologies (e.g., speech, optical character, and handwriting recognition). We present a polynomial time algorithm for solving the problem, and describe several variations that can also be solved efficiently.
1. Introduction
Approximate string matching is a widely-studied paradigm with important applications in domains ranging from speech recognition to information retrieval and molecular biology [10, 3, 2, 12, 17, 4]. A key principle in this field is the concept of string edit distance, a mea...
Morphosyntactic Correction in Natural Language Interfaces
1988
Abstract
Morphosyntax cannot be simply ignored in natural-language man-machine dialogue since it constitutes an important part of the meaning. Nevertheless, troublesome side effects can arise when morphosyntactic errors are combined with other types of errors. We describe here an efficient means of handling quite complex combinations of typographical, phonographic and agreement errors in French, which are typical of C.A.I. users: a sentence as erroneous as les cott adgassan l'ippeauttainuz son perpndiqulre (!) will be perfectly recognized and translated into les côtés adjacents à l'hypoténuse sont perpendiculaires (the legs adjacent to the hypotenuse are perpendicular).
... 'slips of the pen'), whereas competence errors reflect ignorance about language rules or misconceptions about the domain. Phonographic errors (in French: ippeauttainuz for hypoténuse) or agreement errors (les côté opposé for les côtés opposés) are typical competence errors. In man-machine communication, the correction of competence errors is far more important than the correction of performance ones (see Véronis, 1988c). In fact, when faced with an error message, the user can correct typographical errors, for example, but he will generally be unable to correct phonographic or agreement errors. He can only try various spellings at random, which is a rather frustrating way of interacting with a system. We have tried elsewhere (Véronis, 1987b, c) to demonstrate how some semantic and conceptual errors can be handled (especially wrong presuppositions) using a special many-sorted logic. The present paper focuses on morphosyntactic errors, inflexion and agreements.
A Spelling Checker
Abstract
One of my concerns when conducting the assessment was that the carer was a single female parent and the child who was to be fostered was a teenage male. My concern was that the carer would be able, firstly to be able to protect the child and fulfill the needs of the young person

