Automatic Segmentation of Acoustic Musical Signals Using Hidden Markov Models
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
In this paper we address an important step towards our goal of automatic musical accompaniment: the segmentation problem. Given a score to a piece of monophonic music and a sampled recording of a performance of that score, we attempt to segment the data into a sequence of contiguous regions corresponding to the notes and rests in the score. Within the framework of a hidden Markov model, we model our prior knowledge, perform unsupervised learning of the data model parameters, and compute the segmentation that globally minimizes the posterior expected number of segmentation errors. We also show how to produce "online" estimates of score position. We present examples of our experimental results, and readers are encouraged to access the actual sound data we have made available from these experiments.
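As a rough illustration of segmentation by posterior decoding, the sketch below runs the forward-backward recursions on a left-to-right chain of score events and labels each frame with its most probable event. This is a minimal sketch under stated assumptions, not the paper's model: the function name `posterior_decode`, the single `stay` probability, and the precomputed likelihood table `lik` are all hypothetical, and reading "segmentation errors" as mislabeled frames is an assumption as well.

```python
def posterior_decode(lik, stay=0.9):
    """Label each frame with the score event of highest posterior probability.

    lik[t][s] is an assumed, precomputed likelihood of frame t under score
    event s (all entries positive). Events are visited in score order: at each
    frame the chain either stays (probability `stay`) or moves to the next event.
    """
    T, S = len(lik), len(lik[0])
    move = 1.0 - stay

    def normalized(row):
        total = sum(row)
        return [x / total for x in row]

    # Forward pass; the performance is assumed to start on the first event.
    alpha = [normalized([lik[0][s] if s == 0 else 0.0 for s in range(S)])]
    for t in range(1, T):
        prev = alpha[-1]
        row = []
        for s in range(S):
            reach = prev[s] * stay + (prev[s - 1] * move if s > 0 else 0.0)
            row.append(reach * lik[t][s])
        alpha.append(normalized(row))

    # Backward pass, rescaled at every step to avoid underflow.
    beta = [[1.0] * S]
    for t in range(T - 2, -1, -1):
        nxt = beta[0]
        row = []
        for s in range(S):
            val = stay * lik[t + 1][s] * nxt[s]
            if s + 1 < S:
                val += move * lik[t + 1][s + 1] * nxt[s + 1]
            row.append(val)
        beta.insert(0, normalized(row))

    # The per-frame posterior is proportional to alpha * beta.
    labels = []
    for t in range(T):
        gamma = [a * b for a, b in zip(alpha[t], beta[t])]
        labels.append(max(range(S), key=gamma.__getitem__))
    return labels


frames = [[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3
print(posterior_decode(frames))  # [0, 0, 0, 1, 1, 1]
```

Choosing the per-frame posterior maximum minimizes the expected number of mislabeled frames under the model, which is why it is used here instead of the single most likely path.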
Mathematical Expression Recognition: A Survey
, 2000
Automatic recognition of mathematical expressions is one of the key vehicles in the drive towards transcribing documents in scientific and engineering disciplines into electronic form. This problem typically consists of two major stages, namely, symbol recognition and structural analysis. In this survey paper, we will review most of the existing work with respect to each of the two major stages of the recognition process. In particular, we try to put emphasis on the similarities and differences between systems. Moreover, some important issues in mathematical expression recognition will be addressed in depth. All these together serve to provide a clear overall picture of how this research area has been developed to date. Key words: error detection and correction, mathematical expression recognition, performance evaluation, structural analysis, symbol recognition
Automatic Recognition of Handwritten Numerical Strings: A Recognition and Verification Strategy
 IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2002
A modular system to recognize handwritten numerical strings is proposed. It uses a segmentation-based recognition approach and a Recognition and Verification strategy. The approach combines the outputs from different levels such as segmentation, recognition and postprocessing in a probabilistic model. A new verification scheme which contains two verifiers to deal with the problems of oversegmentation and undersegmentation is presented. A new
Geometric layout analysis techniques for document image understanding: a review
, 1998
Document Image Understanding (DIU) is an interesting research area with a large variety of challenging applications. Researchers have worked for decades on this topic, as witnessed by the scientific literature. The main purpose of the present report is to describe the current status of DIU with particular attention to two subprocesses: document skew angle estimation and page decomposition. Several algorithms proposed in the literature are briefly described and placed in a novel classification scheme. Some methods proposed for the evaluation of page decomposition algorithms are described. The current status of the field and its open problems are critically discussed. Some considerations about logical layout analysis are also reported.
Document Structure Analysis Algorithms: A Literature Survey
, 2003
Document structure analysis can be regarded as a syntactic analysis problem. The order and containment relations among the physical or logical components of a document page can be described by an ordered tree structure and can be modeled by a tree grammar which describes the page at the component level in terms of regions or blocks. This paper provides a detailed survey of past work on document structure analysis algorithms and summarizes the limitations of past approaches. In particular, we survey past work on document physical layout representations and algorithms, document logical structure representations and algorithms, and performance evaluation of document structure analysis algorithms. In the last section, we summarize this work and point out its limitations.
Toward Work-Centered Digital Information Services
 IEEE Computer
, 1996
We are engaged in developing technologies to support work-centered digital information services. Work-centered digital information services are a set of library-like services meant to address the mission of the work group. Workplace users have somewhat specialized needs. In particular, they have internal collections of "legacy" documents that need to be accessed along with external collections; they frequently want to retrieve information, rather than documents per se; and they require that digital information systems be integrated into, as well as augment, established work practices. Realizing work-centered digital information systems requires a broad technical agenda: document image analysis, natural language analysis, and computer vision analysis are necessary to facilitate information extraction; new user interface paradigms and authoring tools are required for users to better access multimedia information; improved protocols are required for client programs to interact with repositorie...
Degraded Text Recognition Using Visual And Linguistic Context
, 1995
Recognition of degraded text is a challenging problem. To improve the performance of an OCR system on degraded images of text, postprocessing techniques are critical. The objective of postprocessing is to correct errors or to resolve ambiguities in OCR results by using contextual information. Depending on the extent of context used, there are different levels of postprocessing. In current commercial OCR systems, word-level postprocessing methods, such as dictionary lookup, have been applied successfully. However, many OCR errors cannot be corrected by word-level postprocessing. To overcome this limitation, passage-level postprocessing, in which global contextual information is utilized, is necessary. In most current studies on passage-level postprocessing, linguistic context is the major resource to be exploited. This thesis addresses problems in degraded text recognition and discusses potential solutions through passage-level postprocessing. The objective is to develop a postprocessin...
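For readers unfamiliar with word-level postprocessing, dictionary lookup can be sketched as replacing each OCR token with the nearest lexicon entry under edit distance. This is a minimal illustration only, not the method of the thesis or of any commercial OCR system; the names `edit_distance` and `correct_word`, the `max_dist` threshold, and the tiny lexicon are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def correct_word(token, lexicon, max_dist=2):
    """Return the closest lexicon word, or the token itself if none is close."""
    if token in lexicon:
        return token
    best = min(lexicon, key=lambda w: edit_distance(token, w))
    return best if edit_distance(token, best) <= max_dist else token


lexicon = ["recognition", "regulation"]          # hypothetical lexicon
print(correct_word("rec0gnition", lexicon))      # recognition
```

A lookup of this kind sees one word at a time, so it cannot choose between two valid dictionary words; that is the limitation the abstract attributes to word-level methods.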
Coarse-to-Fine Dynamic Programming
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2001
We introduce an extension of dynamic programming (DP) we call "Coarse-to-Fine Dynamic Programming" (CFDP), ideally suited to DP problems with large state space. CFDP uses dynamic programming to solve a sequence of coarse approximations which are lower bounds to the original DP problem. These approximations are developed by merging states in the original graph into "superstates" in a coarser graph which uses an optimistic arc cost between superstates. The approximations are designed so that when CFDP terminates the optimal path through the original state graph has been found. CFDP leads to significant decreases in the amount of computation necessary to solve many DP problems and can, in some instances, make otherwise infeasible computations possible. CFDP generalizes to DP problems with continuous state space and we offer a convergence result for this extension. The computation of the approximations requires that we bound the arc cost over all possible arcs associated with an adjacent pair of superstates; thus the feasibility of our proposed method requires the identification of such a lower bound. We demonstrate applications of this technique to optimization of functions and boundary estimation in mine recognition.
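The superstate idea can be sketched on a small trellis. The example below is an illustrative reading of the abstract, not the authors' implementation: it assumes a trellis with integer states, a transition cost of |s - s'| between consecutive stages, superstates that are intervals of states, and a refinement rule that halves every non-singleton superstate on the current best path. The names `cfdp` and `gap` are hypothetical.

```python
def gap(a, b):
    """Smallest |x - y| with x in interval a and y in interval b (inclusive)."""
    return max(0, a[0] - b[1], b[0] - a[1])


def cfdp(cost):
    """Minimize sum of cost[t][s_t] plus sum of |s_t - s_{t+1}| over state paths.

    Each stage keeps a partition of its states into intervals (superstates).
    Superstate node and arc costs are minima over their members, so every
    coarse problem is a lower bound on the original one.
    """
    T, N = len(cost), len(cost[0])
    parts = [[(0, N - 1)] for _ in range(T)]  # start with one superstate per stage
    while True:
        # Ordinary DP over the current coarse graph.
        best = [min(cost[0][lo:hi + 1]) for lo, hi in parts[0]]
        back = []
        for t in range(1, T):
            new, ptr = [], []
            for b in parts[t]:
                node = min(cost[t][b[0]:b[1] + 1])
                k = min(range(len(parts[t - 1])),
                        key=lambda j: best[j] + gap(parts[t - 1][j], b))
                new.append(best[k] + gap(parts[t - 1][k], b) + node)
                ptr.append(k)
            best = new
            back.append(ptr)
        # Trace back the optimal coarse path.
        k = min(range(len(best)), key=best.__getitem__)
        value, path = best[k], [k]
        for ptr in reversed(back):
            k = ptr[k]
            path.append(k)
        path.reverse()
        chosen = [parts[t][path[t]] for t in range(T)]
        # A path made only of single states has its true cost, so it is optimal.
        if all(lo == hi for lo, hi in chosen):
            return value, [lo for lo, _ in chosen]
        # Otherwise split each non-singleton superstate on the path and repeat.
        for t, (lo, hi) in enumerate(chosen):
            if lo < hi:
                mid = (lo + hi) // 2
                parts[t][path[t]:path[t] + 1] = [(lo, mid), (mid + 1, hi)]
```

Termination with the true optimum follows from the lower-bound property: the coarse optimum never exceeds the true optimum, and a coarse path consisting only of single states is charged exactly its true cost.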
Spatial random tree grammars for modeling hierarchal structure in images with . . .
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2004
We present a novel probabilistic model for the hierarchical structure of an image and its regions. We call this model spatial random tree grammars (SRTGs). We develop algorithms for the exact computation of likelihood and maximum a posteriori (MAP) estimates and the exact expectation-maximization (EM) updates for model-parameter estimation. We collectively call these algorithms the center-surround algorithm. We use the center-surround algorithm to automatically estimate the maximum likelihood (ML) parameters of SRTGs and classify images based on their likelihood and based on the MAP estimate of the associated hierarchical structure. We apply our method to the task of classifying natural images and demonstrate that the addition of hierarchical structure significantly improves upon the performance of a baseline model that lacks such structure.
Robust Least Square Baseline Finding using a Branch and Bound Algorithm
 in Document Recognition and Retrieval VIII, SPIE
, 2002
Many document analysis and OCR systems depend on precise identification of page rotation, as well as the reliable identification of text lines. This paper presents a new algorithm to address both problems. It uses a branch-and-bound approach to globally optimal line finding and simultaneously models the baseline and the descender line under a Gaussian error/robust least square model. Results of applying the algorithm to documents in the University of Washington Database 2 are presented. Keywords: document analysis, layout analysis, skew detection, page rotation, text line finding, affine transformations