## Offline Cursive Word Recognition using Continuous Density Hidden Markov Models trained with PCA or ICA Features (2001)

Venue: Sixth International Conference on Pattern Recognition (ICPR 2002)

Citation Context ...ell i. The feature vector collects the values $f_i = n_i / \sum_j n_j$. 3. Hidden Markov Models. Hidden Markov Models are probability density functions over sequences of vectors (for a good introduction see [4]). The sequences are assumed to be produced by a system characterized by a state (belonging to a finite set of possible states $Q = \{Q_i : i = 1, 2, \ldots, N\}$) that changes at discrete time steps. The...
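
The normalization $f_i = n_i / \sum_j n_j$ can be illustrated with a minimal sketch. The excerpt is truncated before it defines the cells, so everything else here is an assumption: `n_i` is taken to be the number of foreground pixels in cell `i` of a regular grid laid over a window of the word image, and the function name `cell_density_features` and the grid size are hypothetical.

```python
import numpy as np

def cell_density_features(window, rows=4, cols=4):
    """Return f_i = n_i / sum_j n_j for a grid of cells over a binary window.

    window: 2-D boolean array, True = foreground pixel (assumed convention).
    rows, cols: hypothetical grid size, not taken from the paper.
    """
    # n_i: foreground pixel count of each cell, scanned row by row.
    counts = np.array(
        [
            block.sum()
            for band in np.array_split(window, rows, axis=0)
            for block in np.array_split(band, cols, axis=1)
        ],
        dtype=float,
    )
    total = counts.sum()
    # An empty window has no foreground; return zeros rather than divide by 0.
    return counts / total if total > 0 else counts
```

With this definition the features of any non-empty window sum to 1, so they describe how the foreground is distributed over the cells rather than how much of it there is.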

Citation Context ...f W determines the properties of y. We used two different criteria leading to the extraction of the Principal Components (using PCA) and of the Independent Components (using ICA). When performing PCA [2], the rows of W are the eigenvectors of the covariance matrix of the original data (assumed to have zero mean, a condition that can always be easily achieved by subtracting the mean estimated over the tra...
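
The PCA case described above (rows of W are eigenvectors of the covariance matrix of mean-centered data) can be sketched with NumPy. This is an illustrative sketch, not the authors' implementation: the function names, the choice to keep the `n_components` eigenvectors with the largest eigenvalues, and the use of `numpy.linalg.eigh` are assumptions.

```python
import numpy as np

def pca_projection_matrix(X, n_components):
    """Estimate the mean and a PCA matrix W from training data X (n_samples, dim).

    The rows of W are eigenvectors of the covariance matrix of the
    mean-centered data, ordered by decreasing eigenvalue.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    # eigh is suited to symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order].T
    return W, mean

def transform(X, W, mean):
    """Apply y = W (x - mean) to every row x of X."""
    return (X - mean) @ W.T
```

The mean is estimated once on the training data and reused for new vectors, which matches the zero-mean condition stated in the text.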

Citation Context ...f all the elements in the word image that are not useful for recognition. The operations performed at this stage depend on the data. In our case, a binarization (performed with the Otsu algorithm [1]) is sufficient. The normalization is supposed to remove slant (the angle between the vertical direction and the direction of the strokes supposed to be vertical in an ideal model of handwriting) and ...
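
For readers unfamiliar with the Otsu algorithm mentioned above, the following is a minimal sketch of the standard histogram-based formulation, which picks the gray level that maximizes the between-class variance. It assumes an 8-bit grayscale image with ink darker than paper; the function names are hypothetical and nothing here is taken from the paper's own code.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold of a 2-D uint8 image.

    Pixels with value <= t form one class, the rest the other; t is the
    level that maximizes the between-class variance.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # probability of the class <= t
    mu = np.cumsum(prob * np.arange(256))   # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0               # thresholds leaving one class empty
    return int(np.argmax(sigma_b))

def ink_mask(gray):
    """Boolean mask of ink pixels, assuming ink is darker than paper."""
    return gray <= otsu_threshold(gray)
```

Because the threshold is derived from the image histogram, no value has to be tuned by hand for each image.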

Citation Context ...e raw data). The best system was obtained by training over the Principal Components. Its performance over the data set used is significantly higher than the accuracies claimed (over the same data) in [5, 3] (92.8% and 85.0% respectively) and slightly better than the 94.6% recognition rate presented in [6]. On the other hand, this last system is much more complex than ours. The words are first segmented i...

Citation Context ...of a reasonable interval) and the shear-transformed image giving the highest value of deslantedness is taken as the deslanted one. For a full description of the normalization technique we used, see [7]. The slope and slant removal methods applied are adaptive and do not use any parameters to be set empirically. This avoids the need of tuning a different parameter set for each writer and makes the s...

Citation Context ...er the data set used is significantly higher than the accuracies claimed (over the same data) in [5, 3] (92.8% and 85.0% respectively) and slightly better than the 94.6% recognition rate presented in [6]. On the other hand, this last system is much more complex than ours. The words are first segmented into primitives using a sliding window approach. A Neural Network is then used...

