Integrating Knowledge Sources in Devanagari Text Recognition (1999)
Citations: 9 (0 self)
BibTeX
@MISC{Bansal99integratingknowledge,
author = {Veena Bansal},
title = {Integrating Knowledge Sources in {Devanagari} Text Recognition},
year = {1999}
}
Abstract
The reading process has been widely studied, and researchers generally agree that knowledge in different forms and at different levels plays a vital role. The same philosophy underlies the Devanagari document recognition system described in this work. We have identified various relevant knowledge sources and integrated them using a blackboard model. Some of the knowledge sources are acquired a priori by an automated training process. The efficacy of each of these knowledge sources depends on the coverage of the sample space, the training algorithm, and the nature of the knowledge source itself. Other knowledge sources are built from knowledge extracted from the text as it is processed. These knowledge sources are transient and are meaningful in the domain of the text under consideration. The initial segmentation of a text zone into text lines is based on the image profile. However, this initial segmentation leaves overlapping text lines unsegmented. The heights of the text lines obtained after initial segmentation are statistically analyzed, and the most frequent line height becomes the threshold line height for the text zone under consideration. The threshold line height is used to detect overlapping text lines, and it also provides clues to possible segmentation points for these lines. The structural properties of Devanagari script, namely the header line and the three horizontal strips of a word that result from the two-dimensional composition of the script, are exploited by the segmentation process at the word level as well as at the character level.
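The line-segmentation steps in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the 1.5 tolerance factor, and the rule of proposing split points at multiples of the threshold height are all assumptions made for illustration. The input is assumed to be a horizontal projection profile, that is, the count of ink pixels in each pixel row of the text zone.

```python
from collections import Counter


def segment_lines(row_profile):
    """Split a horizontal projection profile into (start, end) row ranges.

    A text line is taken to be a maximal run of rows with ink (count > 0);
    `end` is exclusive.
    """
    lines, start = [], None
    for y, ink in enumerate(row_profile):
        if ink > 0 and start is None:
            start = y                      # a run of inked rows begins
        elif ink == 0 and start is not None:
            lines.append((start, y))       # the run ends at a blank row
            start = None
    if start is not None:                  # run reaches the bottom edge
        lines.append((start, len(row_profile)))
    return lines


def threshold_height(lines):
    """Return the most frequent line height among the segmented lines."""
    heights = [end - start for start, end in lines]
    return Counter(heights).most_common(1)[0][0]


def find_overlapping(lines, tolerance=1.5):
    """Flag lines much taller than the threshold as probable merged lines.

    The tolerance factor is a hypothetical choice, not taken from the source.
    """
    limit = tolerance * threshold_height(lines)
    return [line for line in lines if (line[1] - line[0]) > limit]


def candidate_split_points(line, threshold):
    """Propose split rows at multiples of the threshold height (assumed rule)."""
    start, end = line
    n_lines = round((end - start) / threshold)
    return [start + k * threshold for k in range(1, n_lines)]


# Toy profile: three lines of height 3 and one merged block of height 7.
profile = [0, 3, 3, 3, 0, 2, 2, 2, 0, 4, 4, 4, 4, 4, 4, 4, 0, 1, 1, 1, 0]
lines = segment_lines(profile)
merged = find_overlapping(lines)
splits = candidate_split_points(merged[0], threshold_height(lines))
```

On the toy profile, the block spanning rows 9 to 16 is flagged because its height of 7 exceeds 1.5 times the most frequent height of 3. A real system would presumably refine the proposed split row using the local profile rather than cutting at a fixed offset.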