A Double Metaphone Encoding for Bangla and its Application in Spelling Checker
- Proc. 2005 IEEE Natural Language Processing and Knowledge Engineering
, 2005
Cited by 5 (3 self)
Abstract: We present a Double Metaphone encoding for Bangla that spelling checkers can use to improve the quality of suggestions for misspelled words. The complex rules of Bangla spelling make it difficult to produce suggestions for a misspelled word with traditional edit-distance methods; phonetic similarity must be taken into account for the suggested alternatives to be reasonably accurate. We propose a Double Metaphone encoding for Bangla that takes into account the various context-sensitive rules, including those involving the large repertoire of consonant clusters in Bangla, and compare it with traditional edit-distance-based methods in producing suggestions for misspelled words.
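As a point of reference for the edit-distance baseline the abstract mentions, the following is a minimal sketch of Levenshtein-based suggestion ranking. It is not the paper's Double Metaphone encoding; the function names `levenshtein` and `suggest`, the `max_distance` threshold, and the English sample words are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=2):
    """Return lexicon entries within max_distance of word, closest first.

    The threshold of 2 is a hypothetical default, not a value from the paper.
    """
    scored = sorted((levenshtein(word, w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]


if __name__ == "__main__":
    print(suggest("speling", ["spelling", "spewing", "apple"]))
```

A ranking of this kind looks only at character edits, which is why the abstract argues that phonetically similar but orthographically distant candidates call for a phonetic key as well.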
Spelling correction for search engine queries
- In Proceedings of EsTAL-04, España for Natural Language Processing
, 2004
Cited by 5 (3 self)
Abstract: Search engines have become the primary means of accessing information on the Web. However, recent studies show that misspelled words are very common in queries to these systems. When users misspell a query, the results are incorrect or inconclusive. In this work, we discuss the integration of a spelling correction component into tumba!, our community Web search engine. We present an algorithm that attempts to select the best choice among all possible corrections for a misspelled term, and discuss its implementation based on a ternary search tree data structure.
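For readers unfamiliar with the data structure named in the abstract, here is a minimal sketch of a generic ternary search tree supporting insertion and exact lookup. It is an illustration only and makes no claim about how tumba! implements its corrector; the class and attribute names are assumptions.

```python
class Node:
    """One character of a stored word, with three child links."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_word")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None  # smaller char, next char, larger char
        self.is_word = False                # True if a stored word ends here


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        if word:  # empty strings are not stored
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_word = True
        return node

    def __contains__(self, word):
        node, i = self.root, 0
        while node is not None and word:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node, i = node.eq, i + 1
            else:
                return node.is_word
        return False


if __name__ == "__main__":
    tree = TernarySearchTree()
    for w in ["search", "sea", "engine"]:
        tree.insert(w)
    print("sea" in tree, "se" in tree)
```

Because words that share a prefix share nodes, a structure like this can be walked to enumerate dictionary words near a misspelled term, which is one reason it suits spelling correction.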
Improving Precision and Recall Using a Spellchecker in a Search Engine
- Stockholm University
, 2004
Cited by 2 (0 self)
Search engines are a key to finding specific information on the fast-growing World Wide Web. Users query a search engine in natural language to retrieve documents on the desired subject. Sometimes no information is found because they make spelling and typing mistakes while entering their queries. Earlier reports suggest that 10-12 percent of all queries to a search engine are misspelled. The question is how much a query spellchecker affects the performance of a search engine. This Master's thesis evaluates how much a query spellchecker improves precision and recall in information retrieval for Swedish texts. Evaluation results indicate that spellchecking improved precision by 4 percent and recall by 11.5 percent. Swedish title: Evaluering av ett stavningsstöd till en sökmotor (Evaluation of a spelling aid for a search engine).
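The two measures reported in this abstract can be computed as in the sketch below. The function name `precision_recall` and the document identifiers are made up for illustration and are not data from the thesis.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical result list and relevance judgments for a single query.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

A spellchecker can raise recall by turning a query that matches nothing into one that matches relevant documents, and it can raise precision when the corrected query matches fewer irrelevant ones.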
Isolated-word Error Correction for Partially Phonemic Languages using Phonetic Cues
Cited by 1 (0 self)
Partially phonemic languages use writing systems that fall between strictly phonemic and non-phonemic orthography. Phonetic errors are therefore very frequent in such languages. This paper introduces an approach to developing spellcheckers for partially phonemic languages that uses grapheme-to-phoneme mapping for isolated-word error correction. Since a complete and accurate grapheme-to-phoneme system is overkill for a spellchecker, the framework can deal with incomplete phonological information through the use of metaphonemes. The paper also discusses the implementation of a Bengali spellchecker based on this approach and some other issues specific to Bengali spell-checking. The framework described here is generic and can be used for any partially phonemic language by incorporating language-specific parts such as phonological rules, the keyboard layout, and ranking strategies. This approach is very useful for Indian languages, as most of them are partially phonemic.
Inducing Search Keys for Name Filtering
This paper describes ETK (Ensemble of Transformation-based Keys), a new algorithm for inducing search keys for name filtering. ETK has the low computational cost and the ability to filter by phonetic similarity that are characteristic of phonetic keys, but it is adaptable to alternative similarity models. A preliminary empirical evaluation suggests that ETK may be well suited to phonetic filtering applications such as recognizing alternative cross-lingual transliterations.
Automatic standardisation of texts containing spelling variation: How much training data do you need?
A Novel Similarity Measure for Sequence Data
Abstract: A variety of metrics have been introduced to measure the similarity of two given sequences. These widely used metrics serve applications ranging from spell correctors and categorizers to new sequence mining applications. Different metrics consider different aspects of sequences, but the essence of any sequence is extracted from the ordering of its elements. In this paper, we propose a novel sequence similarity measure that is based on all ordered pairs of one sequence, with a Hasse diagram built in the other sequence. In contrast with existing approaches, the idea behind the proposed metric is to extract all ordering features to capture sequence properties. We designed a clustering problem to evaluate our metric. Experimental results showed that it achieves higher clustering purity than metrics such as d2, Smith-Waterman, Levenshtein, and Needleman-Wunsch. The limitation of those methods originates from some neglected sequence features, which our proposed metric takes into account.

