Extraction, layout analysis and classification of diagrams in PDF documents
- In ICDAR 2003, 2003
Cited by 10 (3 self)
Diagrams are a critical part of virtually all scientific and technical documents. Analyzing diagrams will be important for building comprehensive document retrieval systems. This paper focuses on the extraction and classification of diagrams from PDF documents. We study diagrams available in vector (not raster) format in online research papers. PDF files are parsed and their vector graphics components are inserted into a spatial index. Subdiagrams are found by analyzing white-space gaps. A set of statistics is generated for each diagram, e.g., the number of horizontal and vertical lines. These statistics form a feature vector describing the diagram. The vectors are used in a kernel-based machine learning system (Support Vector Machine). Separating a set of bar graphs from non-bar-graphs gathered from 20,000 biology research papers gave a classification accuracy of 91.7%. The approach is directly applicable to diagrams vectorized from images.
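To make the feature-vector idea concrete, here is a minimal Python sketch, assuming each vector diagram has already been reduced to straight line segments given as (x1, y1, x2, y2) tuples. The function `line_statistics` and its tolerance parameter `tol` are hypothetical, and the paper's actual statistics set is richer than the three counts shown here.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A vector line segment from a parsed PDF: (x1, y1, x2, y2).
Segment = Tuple[float, float, float, float]


def line_statistics(segments: Iterable[Segment], tol: float = 1e-6) -> List[int]:
    """Return [n_horizontal, n_vertical, n_other] for a diagram's line segments.

    Hypothetical helper: a segment is horizontal if its endpoints share a
    y-coordinate (within tol), vertical if they share an x-coordinate.
    Degenerate (point-like) and slanted segments fall into "other".
    """
    counts = Counter()
    for x1, y1, x2, y2 in segments:
        dx, dy = abs(x2 - x1), abs(y2 - y1)
        if dy <= tol and dx > tol:
            counts["horizontal"] += 1
        elif dx <= tol and dy > tol:
            counts["vertical"] += 1
        else:
            counts["other"] += 1
    return [counts["horizontal"], counts["vertical"], counts["other"]]


# Toy bar chart: two axes, two bars (three edges each), one stray diagonal.
bar_chart = [
    (0, 0, 10, 0), (0, 0, 0, 10),              # x- and y-axis
    (2, 0, 2, 5), (4, 0, 4, 5), (2, 5, 4, 5),  # bar 1
    (6, 0, 6, 8), (8, 0, 8, 8), (6, 8, 8, 8),  # bar 2
    (1, 1, 3, 4),                              # diagonal stroke
]
print(line_statistics(bar_chart))  # [3, 5, 1]
```

Vectors like [3, 5, 1] could then be fed to any SVM implementation; bar graphs tend to be dominated by axis-aligned strokes, which is what makes such simple counts informative.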
Support Vector Machines for Handwritten Numerical String Recognition
- Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition (IWFHR-9), 2004
Cited by 6 (2 self)
In this paper we discuss the use of SVMs to recognize handwritten numerical strings. Such a problem is more complex than recognizing isolated digits since one must deal with problems such as segmentation, overlapping, unknown number of digits, etc. In order to perform our experiments, we have used a segmentation-based recognition system using heuristic over-segmentation. The contribution of this paper is twofold. Firstly, we demonstrate by experimentation that SVMs improve the overall recognition rates. Secondly, we observe that SVMs deal with outliers such as over- and under-segmentation better than multi-layer perceptron neural networks.
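Segmentation-based recognizers that rely on over-segmentation typically cut the string image into more primitives than there are digits and then search for the best way to regroup them into characters. The sketch below shows one common way to run that search, dynamic programming over cut points. It assumes a hypothetical `classify(i, j)` callback (standing in for, e.g., an SVM with probability outputs) that returns a label and a log-probability for primitives i..j-1; it illustrates the general technique and is not the paper's system.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical recognizer: (first primitive, one past last) -> (label, log-probability).
Classifier = Callable[[int, int], Tuple[str, float]]


def best_segmentation(n_primitives: int, classify: Classifier,
                      max_merge: int = 3) -> Tuple[str, float]:
    """Return the string and total log-probability of the best grouping of
    consecutive primitives into characters (at most max_merge per character)."""
    # best[j] = (score, text) of the best parse of primitives 0..j-1.
    best: List[Tuple[float, str]] = [(-math.inf, "")] * (n_primitives + 1)
    best[0] = (0.0, "")
    for j in range(1, n_primitives + 1):
        for i in range(max(0, j - max_merge), j):
            if best[i][0] == -math.inf:
                continue  # no valid parse ends at cut point i
            label, logp = classify(i, j)
            candidate = best[i][0] + logp
            if candidate > best[j][0]:
                best[j] = (candidate, best[i][1] + label)
    score, text = best[n_primitives]
    return text, score


# Toy scores for three primitives; merging primitives 0 and 1 reads as "8".
table: Dict[Tuple[int, int], Tuple[str, float]] = {
    (0, 1): ("0", -1.0), (1, 2): ("3", -1.0), (2, 3): ("1", -0.2),
    (0, 2): ("8", -0.1), (1, 3): ("7", -2.0), (0, 3): ("0", -5.0),
}
text, score = best_segmentation(3, lambda i, j: table[(i, j)])
print(text)  # 81
```

Because the unknown number of digits is handled by the search itself, a recognizer that assigns low scores to over- or under-segmented pieces directly improves the chosen path.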
Neighborhood Property based Pattern Selection For Support Vector Machines
, 2006
Cited by 4 (0 self)
Support Vector Machine (SVM) has attracted wide attention in the machine learning community thanks to its theoretical soundness and practical performance. When applied to a large data set, however, it requires a large amount of memory and a long training time. To cope with this practical difficulty, we propose a pattern selection algorithm based on neighborhood properties. The idea is to select only the patterns that are likely to be located near the decision boundary. Those patterns are expected to be more informative than randomly selected patterns. The experimental results provide promising evidence that the proposed algorithm can be successfully employed ahead of SVM training.
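The core idea (keep only patterns near the decision boundary) can be illustrated with a simplified neighborhood test: a pattern is kept when its k nearest neighbors do not all carry the same label, i.e. when the label entropy of its neighborhood is positive. This is a minimal Python sketch under that assumption, using brute-force nearest-neighbor search; the function name `select_boundary_patterns` is hypothetical, and the paper's exact selection criteria may differ.

```python
import math
from collections import Counter
from typing import List, Sequence


def select_boundary_patterns(X: Sequence[Sequence[float]], y: Sequence[int],
                             k: int = 3) -> List[int]:
    """Return indices of patterns whose k nearest neighbours have mixed labels.

    Simplified illustration: a neighbourhood label entropy above zero is taken
    as a sign that the pattern lies near the class boundary.
    """
    selected = []
    for i, xi in enumerate(X):
        # Brute-force k-NN (Euclidean), excluding the pattern itself.
        neighbours = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)[:k]
        labels = [y[j] for _, j in neighbours]
        if not labels:
            continue
        n = len(labels)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
        if entropy > 0:
            selected.append(i)
    return selected


X = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
y = [0, 0, 0, 1, 1, 1]
print(select_boundary_patterns(X, y, k=2))  # [2, 3]
```

Only the two patterns straddling the class change survive; an SVM trained on them alone would recover the same boundary here. The naive neighbor search is O(n²), so on genuinely large data sets a faster search would be needed for the selection step to pay off.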
ECM-Aware Cell-Graph Mining for Bone Tissue Modeling and Classification
Cited by 4 (1 self)
Pathological examination of a biopsy is the most reliable and widely used technique to diagnose bone cancer. However, it suffers from both inter- and intra-observer subjectivity. Techniques for automated tissue modeling and classification can reduce this subjectivity and increase the accuracy of bone cancer diagnosis. This paper presents a graph-theoretical method, called extracellular matrix (ECM)-aware cell-graph mining, that combines the ECM formation with the distribution of cells in hematoxylin and eosin (H&E) stained histopathological images of bone tissue samples. This method can identify different types of cells that coexist in the same tissue as a result of its functional state. Thus, it models the structure-function relationships more precisely and classifies bone tissue samples accurately for cancer diagnosis. The tissue images are segmented, using the eigenvalues of the Hessian matrix, to compute the spatial coordinates of cell nuclei, which serve as the nodes of the corresponding cell-graph. After segmentation, a color code is assigned to each node based on the composition of its surrounding ECM. An edge is hypothesized (and established) between a pair of nodes if the corresponding cell membranes are in physical contact and ...
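The node-and-edge construction described in this abstract can be sketched as follows. Because the edge criterion is cut off, this Python sketch assumes that each cell is approximated by a circle around its nucleus and that two cells are "in contact" when their circles touch or overlap; the `Cell` type, its `ecm_color` codes, and both helper functions are hypothetical. Counting edges by color pair shows one way such a graph could be summarized into features for a classifier.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Cell(NamedTuple):
    x: float          # nucleus centre from segmentation
    y: float
    radius: float     # assumed circular cell extent
    ecm_color: str    # hypothetical colour code from the surrounding ECM


def build_cell_graph(cells: List[Cell]) -> List[Tuple[int, int]]:
    """Edges (i, j) between cells whose circular extents touch or overlap."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(cells), 2)
        if math.hypot(a.x - b.x, a.y - b.y) <= a.radius + b.radius
    ]


def edge_color_counts(cells: List[Cell], edges: List[Tuple[int, int]]) -> Counter:
    """Count edges by the (unordered) pair of node colours."""
    return Counter(tuple(sorted((cells[i].ecm_color, cells[j].ecm_color))) for i, j in edges)


cells = [Cell(0, 0, 1, "red"), Cell(2, 0, 1, "blue"), Cell(10, 0, 1, "red")]
edges = build_cell_graph(cells)
print(edges)  # [(0, 1)]
print(edge_color_counts(cells, edges))  # Counter({('blue', 'red'): 1})
```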
S.Z.: Response Modeling with Support Vector Machine
- Expert Systems with Applications 30(4), 1997
Cited by 2 (0 self)
Support Vector Machine (SVM) employs the Structural Risk Minimization (SRM) principle to generalize better than conventional machine learning methods that employ the traditional Empirical Risk Minimization (ERM) principle. When applying SVM to response modeling in direct marketing, however, one has to deal with three practical difficulties: large training data, class imbalance, and scoring from binary SVM output. For the first difficulty, we propose to alleviate or solve it through a novel informative sampling method. For the latter two, we provide guidelines within the SVM framework, namely the use of different costs for different classes and the use of distance to the decision boundary, respectively, so that one can readily use the paper as a quick reference for SVM response modeling. This paper also provides various evaluation measures for response models in terms of accuracy, lift chart analysis, and computational efficiency.
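The two guidelines (class-dependent costs and scoring by distance to the decision boundary) and the lift-chart evaluation can be illustrated with a few small functions. This Python sketch assumes a linear SVM whose weight vector `w` and bias `b` have already been trained; the inverse-frequency cost heuristic in `class_costs` is one common choice rather than necessarily the paper's, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def class_costs(n_pos: int, n_neg: int, C: float = 1.0) -> Tuple[float, float]:
    """Misclassification costs (C_pos, C_neg) scaled inversely to class size,
    so the rare responders are not swamped by non-responders (assumed heuristic)."""
    total = n_pos + n_neg
    return C * total / (2 * n_pos), C * total / (2 * n_neg)


def boundary_distance(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed distance of x to the hyperplane w.x + b = 0, used as a response score."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def lift_at(scores: Sequence[float], labels: Sequence[int], fraction: float) -> float:
    """Response rate among the top-scored fraction of customers divided by the
    overall response rate (labels: 1 = responded, 0 = did not)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top = ranked[:max(1, int(len(ranked) * fraction))]
    top_rate = sum(label for _, label in top) / len(top)
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


print(class_costs(10, 90))                     # (5.0, 0.5555555555555556)
print(boundary_distance([3, 4], -5, [3, 4]))   # 4.0
```

Using the signed distance rather than the binary SVM output gives a continuous ranking of customers, which is exactly what a lift chart (mailing the top-scored fraction first) requires.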
Automatic Recognition of Handwritten Medical Forms for Search Engines
Cited by 1 (0 self)
A new paradigm that models the relationships between handwriting and topic categories in the context of medical forms is presented. The ultimate goals are (i) the recognition of medical handwriting and (ii) the use of such information in practical applications such as a medical form search engine. Medical forms have large, diverse, and complex lexicons drawn from English, medical, and pharmacology corpora. Our technique shows that a few characters returned by handwriting recognition can be used to construct a linguistic model capable of representing a medical topic ...
Automatic Recognition of Handwritten Dates on Brazilian Bank Cheques
, 2003
Cited by 1 (0 self)
In this thesis, an HMM-MLP hybrid system for segmenting and recognizing unconstrained handwritten dates written on Brazilian bank cheques is presented. The system evolves by dealing with many sources of variability, such as heterogeneous data types and styles, variations present in the date field, and difficult segmentation cases that make the recognizer's task particularly hard. The system takes an HMM-based...
Support Vector Machines Applied to Handwritten Numerals Recognition
Cited by 1 (0 self)
A good classifier must accurately and efficiently separate patterns from one another, given a set of training data. By maximizing the minimal distance from the training data to the decision boundary, support vector machines (SVMs) can achieve an errorless separation of the training data if one exists and a relatively low error separation if the data ...
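The quantity an SVM maximizes, the minimal distance from the training data to the decision boundary, can be computed directly for any candidate hyperplane. This minimal Python sketch (the function name `geometric_margin` is hypothetical) returns the smallest signed distance y_i(w·x_i + b)/‖w‖ over the training set, with labels y_i in {-1, +1}. A positive value means the hyperplane separates the training data without error; on separable data, an SVM picks the w and b that make this value as large as possible.

```python
import math
from typing import Sequence


def geometric_margin(w: Sequence[float], b: float,
                     X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Minimum of y_i * (w . x_i + b) / ||w|| over the training set (y_i in {-1, +1})."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        yi * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, yi in zip(X, y)
    )


X = [(0, 0), (1, 1), (3, 3), (4, 4)]
y = [-1, -1, 1, 1]
# Same direction w, three offsets: centred, touching a point, misclassifying one.
for b in (-4, -2, -7):
    print(b, round(geometric_margin((1, 1), b, X, y), 3))
```

Only the centred offset yields a strictly positive margin; the other two either pass through a training point or put one on the wrong side.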
Hybrid Solution for the Feature Selection in Personal Identification Problems through Keystroke Dynamics
Techniques based on biometrics have been successfully applied to personal identification systems. One rather promising technique uses the keystroke dynamics of each user to recognize him or her. In this work, we present the development of a hybrid system based on support vector machines and stochastic optimization techniques. The main objective is to analyze these optimization algorithms for feature selection. We evaluate two optimization techniques for this task: genetic algorithms (GA) and particle swarm optimization (PSO). In the present study, PSO outperformed GA with regard to classification error and processing time, but was inferior regarding the feature reduction rate.
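To show how PSO can drive feature selection, here is a compact binary PSO in Python, in the common variant where a sigmoid of each velocity component gives the probability that a feature is switched on. Each particle is a 0/1 mask over the features, and the swarm minimizes a user-supplied `fitness` function. In the setting described above, that fitness would come from SVM classification error on keystroke features; here a toy stand-in (`toy_fitness`, with a hypothetical set of relevant features) is assumed instead, and all parameter values (inertia `w`, `c1`, `c2`, `v_max`, swarm size) are illustrative rather than the study's.

```python
import math
import random
from typing import Callable, List


def binary_pso(fitness: Callable[[List[int]], float], n_features: int,
               n_particles: int = 10, iterations: int = 30,
               w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
               v_max: float = 4.0, seed: int = 0) -> List[int]:
    """Return the lowest-fitness feature mask found by a sigmoid-based binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best masks
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]         # global best mask
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid of the velocity = probability that feature d is selected.
                prob_on = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_on else 0
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest


RELEVANT = {0, 2}  # hypothetical: only features 0 and 2 carry identity information


def toy_fitness(mask: List[int]) -> float:
    missed = sum(1 for d in RELEVANT if not mask[d])  # stand-in for classification error
    return missed + 0.1 * sum(mask)                   # plus a small penalty per kept feature


best = binary_pso(toy_fitness, n_features=6, seed=42)
print(best, toy_fitness(best))
```

The per-feature penalty in the fitness is what rewards a high feature reduction rate; weighting it against the error term is the kind of trade-off on which the abstract reports PSO and GA behaving differently.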

