A Large Scale Ranker-Based System for Search Query Spelling Correction
Cited by 8 (2 self)
Abstract
This paper makes three significant extensions to a noisy channel speller designed for standard written text to target the challenging domain of search queries. First, the noisy channel model is subsumed by a more general ranker, which allows a variety of features to be easily incorporated. Second, a distributed infrastructure is proposed for training and applying Web-scale n-gram language models. Third, a new phrase-based error model is presented. This model places a probability distribution over transformations between multi-word phrases, and is estimated using large amounts of query-correction pairs derived from search logs. Experiments show that each of these extensions leads to significant improvements over the state-of-the-art baseline methods.
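The idea that a ranker subsumes the noisy channel model can be illustrated with a minimal sketch. It assumes a linear scoring function; the abstract does not say which ranker the paper uses. The candidate list, feature names, and all numeric values below are made up for illustration. With unit weights on the language-model and error-model log-probabilities and zero weight elsewhere, the linear score reduces to the noisy channel score log P(c) + log P(q|c).

```python
# Hypothetical candidates for the query "britny spears".
# All feature values are invented for illustration only.
candidates = [
    {"text": "britney spears",
     "features": {"log_lm": -8.0, "log_error": -2.0, "edit_distance": 1.0}},
    {"text": "britny spears",
     "features": {"log_lm": -15.0, "log_error": -0.1, "edit_distance": 0.0}},
    {"text": "brittany spears",
     "features": {"log_lm": -11.0, "log_error": -4.0, "edit_distance": 3.0}},
]


def ranker_score(features, weights):
    """Linear ranker: weighted sum of features (missing weights count as 0)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Return the text of the highest-scoring candidate."""
    return max(candidates, key=lambda c: ranker_score(c["features"], weights))["text"]


# Noisy channel as a special case: log P(c) + log P(q|c), no other features.
noisy_channel_weights = {"log_lm": 1.0, "log_error": 1.0}

# A more general ranker can weight additional features (weights are assumed here;
# in practice they would be learned from labeled data).
ranker_weights = {"log_lm": 1.0, "log_error": 1.0, "edit_distance": -0.5}

print(best_candidate(candidates, noisy_channel_weights))  # britney spears
print(best_candidate(candidates, ranker_weights))         # britney spears
```

The design point is that adding a feature only requires a new key in the feature dictionary and a corresponding weight, which is what makes the ranker formulation easy to extend.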
Why Press Backspace? Understanding User Input Behaviors in Chinese Pinyin Input Method
Abstract
The Chinese Pinyin input method is very important for Chinese language information processing. Users may make errors when typing Chinese words. In this paper, we are concerned with the causes of these errors. Inspired by the observation that pressing backspace is one of the most common ways users correct errors, we use backspace operations to collect 54,309,334 error-correction pairs from a real-world data set that contains 2,277,786 users. In addition, we present a comparative analysis of the data to achieve a better understanding of users' input behaviors. Comparisons with English typos suggest that some language-specific properties account for part of the Chinese input errors.
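One way to picture how backspace operations yield error-correction pairs is the sketch below. The log format (a flat list of keys with "\b" for backspace) and the pairing rule are assumptions made for illustration; the abstract does not describe the actual extraction procedure.

```python
BACKSPACE = "\b"  # assumed encoding of a backspace keystroke in the log


def extract_pairs(keystrokes):
    """Return (error, correction) pairs from one keystroke sequence.

    Assumed rule: the text on screen just before a run of backspaces is the
    "error"; the text on screen when the next run starts (or at the end of
    the sequence) is its "correction".
    """
    buffer, pairs, pending = [], [], None
    prev_backspace = False
    for key in keystrokes:
        if key == BACKSPACE:
            if not prev_backspace:
                # A new backspace run starts: close the previous pair, if any.
                if pending is not None:
                    pairs.append((pending, "".join(buffer)))
                pending = "".join(buffer)
            if buffer:
                buffer.pop()
            prev_backspace = True
        else:
            buffer.append(key)
            prev_backspace = False
    if pending is not None:
        pairs.append((pending, "".join(buffer)))
    return pairs


# "shenem" typed, two backspaces, then "me" retyped.
log = list("shenem") + [BACKSPACE, BACKSPACE] + list("me")
print(extract_pairs(log))  # [('shenem', 'shenme')]
```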
User-Dependent Aspect Model for Collaborative Activity Recognition
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
Abstract
Activity recognition aims to discover one or more users' actions and goals based on sensor readings. In the real world, a single user's data are often insufficient for training an activity recognition model due to the data sparsity problem. This is especially true when we are interested in obtaining a personalized model. In this paper, we study how to collaboratively use different users' sensor data to train a model that can provide personalized activity recognition for each user. We propose a user-dependent aspect model for this collaborative activity recognition task. Our model introduces user aspect variables to capture the user grouping information, so that a target user can also benefit from similar users in the same group when training the recognition model. In this way, we can greatly reduce the amount of valuable and expensive labeled data required to train the recognition model for each user. Our model is also capable of incorporating time information and handling new users in activity recognition. We evaluate our model on a real-world WiFi data set obtained from an indoor environment, and show that the proposed model can outperform several state-of-the-art baseline algorithms.
CHIME: An Efficient Error-Tolerant Chinese Pinyin Input Method
Abstract
Chinese Pinyin input methods are very important for Chinese language processing. In many cases, users may make typing errors. For example, a user wants to type in “shenme” (什么, meaning “what” in English) but may type in “shenem” instead. Existing Pinyin input methods fail in converting such a Pinyin sequence with errors to the right Chinese words. To solve this problem, we developed an efficient error-tolerant Pinyin input method called “CHIME” that can handle typing errors. By incorporating state-of-the-art techniques and language-specific features, the method achieves a better performance than state-of-the-art input methods. It can efficiently find relevant words in milliseconds for an input Pinyin sequence.
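The "shenem" example is a transposition of two adjacent letters. The sketch below shows a brute-force error-tolerant lookup using the optimal string alignment distance (edit distance with adjacent transpositions). This is not CHIME's algorithm, which the abstract does not describe; the tiny lexicon and the distance threshold are assumptions for illustration.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def tolerant_lookup(typed, lexicon, max_distance=1):
    """Return the words whose Pinyin is within max_distance of the typed string."""
    return [word for pinyin, word in lexicon.items()
            if osa_distance(typed, pinyin) <= max_distance]


# Hypothetical three-entry lexicon mapping Pinyin to Chinese words.
lexicon = {"shenme": "什么", "xuesheng": "学生", "zhongguo": "中国"}
print(tolerant_lookup("shenem", lexicon))  # ['什么']
```

Scanning the whole lexicon is linear in its size, so a practical input method would need an index to reach millisecond response times; the scan here only makes the matching criterion concrete.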
CloudSpeller: Spelling Correction for Search Queries by Using a Unified Hidden Markov Model with Web-scale Resources
Abstract
Query spelling correction is a crucial component of modern search engines that can help users to express an information need more accurately and thus improve search quality. As participants in the Microsoft Speller Challenge, we proposed and implemented an efficient end-to-end spelling correction system, namely CloudSpeller. The CloudSpeller system uses a Hidden Markov model to effectively model major types of spelling errors in a unified framework, in which we integrate a large-scale lexicon constructed using Wikipedia, an error model trained from high-confidence correction pairs, and the Microsoft Web N-gram service. Our system achieves excellent performance on two search query spelling correction datasets, reaching F1 scores of 0.970 and 0.940 on the TREC dataset and the MSN dataset, respectively.
A Unified Approach to Transliteration-based Text Input with Online Spelling Correction
Abstract
This paper presents an integrated, end-to-end approach to online spelling correction for text input. Online spelling correction refers to spelling correction as you type, as opposed to post-editing. The online scenario is particularly important for languages that routinely use transliteration-based text input methods, such as Chinese and Japanese, because the desired target characters cannot be input at all unless they are in the list of candidates provided by an input method, and spelling errors prevent them from appearing in the list. For example, a user might type suesheng by mistake to mean xuesheng 学生 'student' in Chinese; existing input methods fail to convert this misspelled input to the desired target Chinese characters. In this paper, we propose a unified approach to the problem of spelling correction and transliteration-based character conversion, inspired by the phrase-based statistical machine translation framework. At the phrase (substring) level, the k most probable pinyin (Romanized Chinese) corrections are generated using a monotone decoder; at the sentence level, input pinyin strings are directly transliterated into target Chinese characters by a decoder using a log-linear model that refers to the features of both levels. A new method of automatically deriving parallel training data from user keystroke logs is also presented. Experiments on Chinese pinyin conversion show that our integrated method reduces the character error rate by 20% (from 8.9% to 7.12%) over the previous state of the art based on a noisy channel model.
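The reported 20% figure is a relative reduction in character error rate, not an absolute one (the absolute drop is 1.78 percentage points). A short check, using only the two rates quoted in the abstract; the function name is chosen here for illustration:

```python
def relative_reduction(baseline, improved):
    """Relative error reduction: (baseline - improved) / baseline."""
    return (baseline - improved) / baseline


# Character error rates quoted in the abstract, in percent.
reduction = relative_reduction(8.9, 7.12)
print(f"{reduction:.0%}")  # 20%
```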

