Influence of Language Models and Candidate Set Size on Contextual Post-processing for Chinese Script Recognition
- In Proceedings of the 17th International Conference on Pattern Recognition, 2004
In the Chinese language, the word is the basic syntax-meaningful unit; however, each character also has a definite meaning of its own. In this paper, we compare the perplexities of four n-gram language models (character-based bigram, character-based trigram, word-based bigram and class-based bigram) and their influence on the performance of contextual post-processing of Chinese scripts in an offline handwritten Chinese character recognition system. We also demonstrate in detail the influence of the candidate set size on the performance of contextual post-processing, and indicate that the number of candidates should vary with each script.
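As a rough illustration of what comparing perplexities involves, the sketch below trains a character-based bigram model and scores a test string. It is not taken from the paper: the function names (`train_bigram`, `bigram_prob`, `perplexity`), the `<s>` start marker, and the add-one smoothing are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count character contexts and bigrams; corpus is a list of strings."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        chars = ["<s>"] + list(sentence)  # "<s>" is a hypothetical start marker
        for prev, cur in zip(chars, chars[1:]):
            unigrams[prev] += 1           # how often prev appears as a context
            bigrams[(prev, cur)] += 1
    vocab_size = len({c for sentence in corpus for c in sentence})
    return unigrams, bigrams, vocab_size

def bigram_prob(prev, cur, unigrams, bigrams, vocab_size):
    """P(cur | prev) with add-one smoothing (an assumption, not the paper's choice)."""
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(text, unigrams, bigrams, vocab_size):
    """Perplexity = 2 ** (-average log2 probability per character)."""
    if not text:
        raise ValueError("text must contain at least one character")
    chars = ["<s>"] + list(text)
    log_sum = 0.0
    for prev, cur in zip(chars, chars[1:]):
        log_sum += math.log2(bigram_prob(prev, cur, unigrams, bigrams, vocab_size))
    return 2 ** (-log_sum / len(text))

model = train_bigram(["abab"])
seen = perplexity("abab", *model)      # follows the training pattern
unseen = perplexity("bbaa", *model)    # uses mostly unseen bigrams
```

A lower perplexity means the model finds the text more predictable; the same measurement could be repeated with trigram or word-based counts to compare model families.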
Combining character-based bigram with word-based bigram in contextual post-processing for Chinese script
- ACM Trans. Asian Language Information Processing, 2002
It is crucial to use contextual information to improve the recognition accuracy of Chinese script in an offline, handwritten Chinese character-recognition system. However, as the number of candidates given by a character recognizer increases, contextual postprocessing using a word-based bigram becomes time-consuming. This article presents a novel contextual postprocessing method that integrates character-based bigram postprocessing with word-based bigram postprocessing in light of the complementary action between Chinese characters and Chinese words. On the basis of isolated character recognition, character-based bigram postprocessing using a forward-backward search is first executed on a big candidate set, which improves both the accuracy and efficiency of the candidate set (the cumulative accuracy of the top ten candidates is greatly boosted). Then, to further improve accuracy, word-based bigram postprocessing (WBP) is executed on a small candidate set. This method obtains high accuracy while paying attention to postprocessing speed at the same time. Experimental results for three Chinese scripts (about 66,000 characters in total) demonstrate the effectiveness of our method: character-based bigram postprocessing improves accuracy from 81.58% to 94.50%, and the cumulative accuracy of the top ten candidates rises from 94.33% to 98.25%. After WBP, 95.75% accuracy is achieved, which is equivalent to the accuracy of WBP executed on a big candidate set. Our method, however, is more than 100 times faster than WBP.
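The idea of rescoring recognizer candidates with a character bigram can be sketched as a search over a candidate lattice. The example below is a plain Viterbi search, not the forward-backward search the article describes, and the function name `viterbi_rerank`, the `lm_weight` parameter, and the toy scores are hypothetical. It assumes every position has at least one candidate.

```python
import math

def viterbi_rerank(candidates, bigram_logprob, lm_weight=1.0):
    """Pick the best character sequence from per-position candidate lists.

    candidates: list of positions, each a list of (char, recognizer_log_score).
    bigram_logprob: callable (prev_char, cur_char) -> log probability.
    """
    if not candidates:
        return []
    # best[i][j] = (score of best path ending at candidate j of position i, backpointer)
    best = [[(score, None) for _, score in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for cur, score in candidates[i]:
            options = [
                (best[i - 1][k][0] + lm_weight * bigram_logprob(prev, cur), k)
                for k, (prev, _) in enumerate(candidates[i - 1])
            ]
            top, back = max(options)
            row.append((top + score, back))
        best.append(row)
    # Trace back from the highest-scoring final candidate.
    j = max(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j][0])
        j = best[i][j][1]
    return path[::-1]

# Toy lattice: the recognizer alone prefers "a" then "x".
lattice = [[("a", -0.1), ("o", -0.2)], [("x", -0.1), ("b", -0.3)]]

def toy_bigram(prev, cur):
    # Hypothetical language model: only "ab" is a likely pair.
    return math.log(0.9) if (prev, cur) == ("a", "b") else math.log(0.01)

with_context = viterbi_rerank(lattice, toy_bigram)
without_context = viterbi_rerank(lattice, toy_bigram, lm_weight=0.0)
```

The cost of each step grows with the product of adjacent candidate-set sizes, which is one way to see why running a cheaper model on a big candidate set and a costlier model on a small one can save time.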
Analysis of Error Count Distributions for Improving the Postprocessing Performance of OCCR
- Communication of Chinese and Oriental Languages Information Processing Society, 1996
Contextual language processing plays an important role in the postprocessing of OCR, and its effects have been demonstrated by many proposed systems. In general, it performs well. However, its performance is not as good as expected when the test data contain more unseen context, e.g., proper nouns such as personal names and organizational names. This paper addresses the importance of analyzing the error count distributions before applying the language models. According to the analysis, more than 50% of errors can be eliminated and more than 90% of time can be saved on average, based on the Markov character bigram model.
Keywords: Contextual Language Processing, Error Count Distributions, Image Processing, Markov Model, OCCR, Unseen Context
1 Introduction
To improve the interface with computers, the development of input devices such as optical character recognition (OCR) devices and speech recognition (SR) devices is expected. The OCR device is a good choice while the printed documents are ...

