Results 1 - 6 of 6
Spelling correction for search engine queries
- In Proceedings of EsTAL-04, España for Natural Language Processing, 2004
Cited by 5 (3 self)
Abstract. Search engines have become the primary means of accessing information on the Web. However, recent studies show that misspelled words are very common in queries to these systems. When users misspell a query, the results are incorrect or inconclusive. In this work, we discuss the integration of a spelling correction component into tumba!, our community Web search engine. We present an algorithm that attempts to select the best choice among all possible corrections for a misspelled term, and discuss its implementation based on a ternary search tree data structure.
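The abstract names a ternary search tree as the dictionary structure but gives no further detail, so the following is only a minimal sketch of how such a tree could back a candidate lookup. The names `TernarySearchTree`, `edits1` and `candidates` are hypothetical, and generating single-edit variants of the term is an assumption made for illustration; it is not the selection algorithm used in tumba!.

```python
import string


class Node:
    """One character of the tree with three children: lower, equal, higher."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_word")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_word = False


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        if word:
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_word = True  # last character: mark end of a stored word
        return node

    def contains(self, word):
        node, i = self.root, 0
        while node is not None and word:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 == len(word):
                return node.is_word
            else:
                node, i = node.eq, i + 1
        return False


def edits1(term, alphabet=string.ascii_lowercase):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(term[:i], term[i:]) for i in range(len(term) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def candidates(tree, term):
    """Return the term itself if known, else known words one edit away (assumed policy)."""
    if tree.contains(term):
        return [term]
    return sorted(w for w in edits1(term) if tree.contains(w))
```

A ranking step (for example by word frequency) would still be needed to pick one correction among several candidates; the abstract does not say how that choice is made.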
Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian
Cited by 3 (0 self)
This paper presents how we adapted a website search engine for cross language information retrieval, using the Uplug word alignment tool for parallel corpora. We first studied the monolingual search queries posed by the visitors of the website of the Nordic Council, which contains five different languages. In order to compare how well different types of bilingual dictionaries covered the most common queries and terms on the website, we tried a collection of ordinary bilingual dictionaries, a small manually constructed trilingual dictionary, and an automatically constructed trilingual dictionary built from the news corpus on the website using Uplug. The precision and recall of the automatically constructed Swedish-English dictionary using Uplug were 71 and 93 percent, respectively. We found that precision and recall increase significantly in samples with high word frequency, but we could not confirm that POS-tags improve precision. The collection of ordinary dictionaries, consisting of about 200 000 words, covers only 41 of the top 100 search queries at the website. The automatically built trilingual dictionary combined with the small manually built trilingual dictionary, consisting of about 2 300 words, covers 36 of the top search queries.
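As a reminder of what the two reported figures measure, here is a small sketch that computes precision and recall for a set of proposed dictionary entries against a reference set. The function name `precision_recall` and the toy word pairs are hypothetical and are not taken from the paper, whose exact evaluation protocol is not given in the abstract.

```python
def precision_recall(proposed, gold):
    """Precision = correct / proposed, recall = correct / gold (set-based)."""
    proposed, gold = set(proposed), set(gold)
    correct = proposed & gold
    precision = len(correct) / len(proposed) if proposed else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Toy Swedish-English pairs, invented for illustration only.
proposed = {("hus", "house"), ("bil", "car"), ("bok", "book"), ("katt", "dog")}
gold = {("hus", "house"), ("bil", "car"), ("bok", "book"),
        ("hund", "dog"), ("katt", "cat")}

p, r = precision_recall(proposed, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```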
Improving Precision and Recall Using a Spellchecker in a Search Engine
- Stockholm University, 2004
Cited by 2 (0 self)
Search engines constitute a key to finding specific information on the fast-growing World Wide Web. Users query a search engine by using natural language to extract documents that refer to the desired subject. Sometimes no information is found because they make spelling and typing mistakes while entering their queries. Earlier reports suggest that 10-12 percent of all questions to a search engine are misspelled. The question is how much a query spellchecker affects the performance of a search engine. This Master's thesis presents an evaluation of how much a query spellchecker improves precision and recall in information retrieval for Swedish texts. Evaluation results indicate that spellchecking improved precision and recall by 4 and 11.5 percent, respectively. Swedish title: Evaluering av ett stavningsstöd till en sökmotor (Evaluation of a spelling aid for a search engine). Summary (translated from Swedish): Search engines are a key to finding specific information on the rapidly growing Internet. Users typically use natural language in a search engine to find the information they are interested in. Sometimes the search fails because the user happens to misspell or mistype. Earlier studies show that 10-12 percent of all questions posed to a search engine are misspelled. The question is how the spelling aid affects the search results. This thesis evaluates how much a spellchecker can improve precision and recall in information retrieval for Swedish. The results show that the spellchecker improved precision and recall by 4 and 11.5 percent, respectively.
Automated Email Answering by Text Pattern Matching
- Proc. 7th International Conference on Natural Language Processing (IceTAL 2010), 2010
Cited by 2 (2 self)
Abstract. Answering email by standard answers is a common practice at contact centers. Our research assists this process by creating reply messages that contain one or several standard answers. Our standard answers are linked to representative text patterns that match incoming messages. The system works in three languages. The performance was evaluated on two email sets; the main advantage of our email answering technique is good correctness of the delivered replies.
University of Zurich, Zurich Open Repository and Archive
Abstract. In informal data sharing environments, misspellings cause problems for data indexing and retrieval. This is even more pronounced in mobile environments, in which devices with limited input capabilities are used. In a mobile environment, similarity search algorithms for finding misspelled data need to account for limited CPU and bandwidth. This demo shows P2P fast similarity search (P2PFastSS) running on mobile phones and laptops; it is tailored to uncertain data entry and uses available resources efficiently. In this demo, users publish and search for textual content containing misspellings without relying on query logging, as done by Google, and with a minimal distributed indexing infrastructure. Similarity search is supported by using the concept of deletion neighborhood to evaluate the edit distance metric of string similarity.
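The deletion-neighborhood idea can be illustrated on a single machine: two strings within edit distance k always share at least one string in their k-deletion neighborhoods, so an index keyed by those strings yields candidates that are then verified with a real edit distance computation. The sketch below assumes this reading; the function names are hypothetical, and the peer-to-peer distribution of the index described in the abstract is not modelled.

```python
from collections import defaultdict
from itertools import combinations


def deletion_neighborhood(word, k):
    """All strings obtained from word by deleting at most k characters."""
    out = set()
    for d in range(min(k, len(word)) + 1):
        for drop in combinations(range(len(word)), d):
            dropped = set(drop)
            out.add("".join(c for i, c in enumerate(word) if i not in dropped))
    return out


def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def build_index(words, k):
    """Map every neighborhood string to the words that produce it."""
    index = defaultdict(set)
    for w in words:
        for key in deletion_neighborhood(w, k):
            index[key].add(w)
    return index


def search(index, query, k):
    """Collect words sharing a neighborhood string, then verify the distance."""
    hits = set()
    for key in deletion_neighborhood(query, k):
        hits |= index.get(key, set())
    return sorted(w for w in hits if edit_distance(query, w) <= k)


index = build_index(["mobile", "mobil", "noble", "phone"], k=1)
print(search(index, "moble", k=1))  # ['mobile', 'noble']
```

In this example "mobil" shares the neighborhood string "mobl" with the query but is two edits away, which is why the verification step is needed after the index lookup.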
Burkhard Stiller
Abstract. In informal data sharing environments, misspellings cause problems for data indexing and retrieval. This is even more pronounced in mobile environments, in which devices with limited input capabilities are used. In a mobile environment, similarity search algorithms for finding misspelled data need to account for limited CPU and bandwidth. This demo shows P2P fast similarity search (P2PFastSS) running on mobile phones and laptops; it is tailored to uncertain data entry and uses available resources efficiently. In this demo, users publish and search for textual content containing misspellings without relying on query logging, as done by Google, and with a minimal distributed indexing infrastructure. Similarity search is supported by using the concept of deletion neighborhood to evaluate the edit distance metric of string similarity.

