Results 1 - 10 of 119
The fifth annual test of OCR accuracy, 1996
Cited by 51 (6 self)
ISRI has conducted its third annual test of the accuracy of OCR systems. Vendors submitted their latest technology for recognizing machine-printed English text from page images. This year’s test re-used the 460-page sample from U.S. Department of Energy (DOE) documents that was used a year ago [Rice 93a]. In addition, a new 200-page sample, randomly selected from popular magazines, was utilized.
Bayesian graph edit distance - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
Cited by 41 (5 self)
Abstract: This paper describes a novel framework for comparing and matching corrupted relational graphs. The paper develops the idea of edit-distance originally introduced for graph-matching by Sanfeliu and Fu [1]. We show how the Levenshtein distance can be used to model the probability distribution for structural errors in the graph-matching problem. This probability distribution is used to locate matches using MAP label updates. We compare the resulting graph-matching algorithm with that recently reported by Wilson and Hancock. The use of edit-distance offers an elegant alternative to the exhaustive compilation of label dictionaries. Moreover, the method is polynomial rather than exponential in its worst-case complexity. We support our approach with an experimental study on synthetic data and illustrate its effectiveness on an uncalibrated stereo correspondence problem. This demonstrates experimentally that the gain in efficiency is not at the expense of quality of match.
Approximate Nearest Neighbors and Sequence Comparison With Block Operations - In STOC, 2000
Cited by 36 (6 self)
We study sequence nearest neighbors (SNN). Let D be a database of n sequences; we would like to preprocess D so that given any on-line query sequence Q we can quickly find a sequence S in D for which d(S; Q) ≤ d(S; T) for any other sequence T in D. Here d(S; Q) denotes the distance between sequences S and Q, defined to be the minimum number of edit operations needed to transform one to another (all edit operations will be reversible so that d(S; T) = d(T; S) for any two sequences T and S). These operations correspond to the notion of similarity between sequences that we wish to capture in a given application. Natural edit operations include character edits (inserts, replacements, deletes, etc.), block edits (moves, copies, deletes, reversals) and block numerical transformations (scaling by an additive or a multiplicative constant). The SNN problem arises in many applications. We present the first known efficient algorithm for "approximate" nearest neighbor search for sequences with p...
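The distance defined in this abstract can be illustrated for the simplest case, unit-cost character edits only. The sketch below is not the paper's approximate algorithm and does not handle block edits; it is a plain dynamic-programming edit distance plus a brute-force scan of the database, and the function names `edit_distance` and `nearest_sequence` are hypothetical.

```python
from typing import Iterable


def edit_distance(s: str, t: str) -> int:
    """Minimum number of character inserts, deletes and replacements
    needed to turn s into t (unit cost per operation)."""
    # prev[j] holds the distance between the first i-1 characters of s
    # and the first j characters of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletes
        for j, ct in enumerate(t, start=1):
            replace_cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,                 # delete cs
                curr[j - 1] + 1,             # insert ct
                prev[j - 1] + replace_cost,  # replace cs by ct, or keep it
            ))
        prev = curr
    return prev[-1]


def nearest_sequence(database: Iterable[str], query: str) -> str:
    """Exact nearest neighbor by linear scan (assumption: small database)."""
    return min(database, key=lambda s: edit_distance(s, query))
```

The linear scan costs one full distance computation per database entry, which is the cost that preprocessing-based approaches aim to avoid.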
An evaluation of OCR accuracy, 1993
Cited by 30 (5 self)
ISRI has conducted its second annual assessment of the accuracy of devices for optical character recognition (OCR) of machine-printed, English-language documents. This year’s test featured more devices, more data,
Scientific Paper Summarization Using Citation Summary Networks
Cited by 26 (9 self)
Quickly moving to a new area of research is painful for researchers due to the vast amount of scientific literature in each field of study. One possible way to overcome this problem is to summarize a scientific topic. In this paper, we propose a model of summarizing a single article, which can be further used to summarize an entire topic. Our model is based on analyzing others’ viewpoint of the target article’s contributions and the study of its citation summary network using a clustering approach.
Constructing virtual documents for ontology matching - In Proceedings of the 15th International World Wide Web Conference, 2006
Cited by 26 (6 self)
On the investigation of linguistic techniques used in ontology matching, we propose a new idea of virtual documents to pursue a cost-effective approach to linguistic matching in this paper. Basically, as a collection of weighted words, the virtual document of a URIref declared in an ontology contains not only the local descriptions but also the neighboring information to reflect the intended meaning of the URIref. Document similarity can be computed by traditional vector space techniques, and then be used in the similarity-based approaches to ontology matching. In particular, the RDF graph structure is exploited to define the description formulations and the neighboring operations. Experimental results show that linguistic matching based on the virtual documents is dominant in average F-Measure as compared to three other approaches. It is also demonstrated by our experiments that the virtual documents approach is cost-effective as compared to other linguistic matching approaches.
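As a rough illustration of "a collection of weighted words" compared by vector space techniques, the sketch below builds a weighted bag of words from local and neighboring descriptions and compares two such bags with cosine similarity. The function names, the `neighbor_weight` value and the way weights are combined are assumptions made for the example; the abstract does not give the paper's actual formulations.

```python
import math
from collections import Counter
from typing import Iterable, Mapping


def virtual_document(local_words: Iterable[str],
                     neighbor_words: Iterable[str],
                     neighbor_weight: float = 0.5) -> Counter:
    """Weighted bag of words: local terms count fully, neighboring
    terms are discounted (the 0.5 default is an arbitrary assumption)."""
    doc: Counter = Counter()
    for word in local_words:
        doc[word] += 1.0
    for word in neighbor_words:
        doc[word] += neighbor_weight
    return doc


def cosine_similarity(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)
```

Two entities whose own labels differ can still score above zero when their neighbors share vocabulary, which is the effect the neighboring information is meant to capture.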
Graph edit distance from spectral seriation - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005
Cited by 24 (3 self)
Abstract: This paper is concerned with computing graph edit distance. One of the criticisms that can be leveled at existing methods for computing graph edit distance is that they lack some of the formality and rigor of the computation of string edit distance. Hence, our aim is to convert graphs to string sequences so that string matching techniques can be used. To do this, we use a graph spectral seriation method to convert the adjacency matrix into a string or sequence order. We show how the serial ordering can be established using the leading eigenvector of the graph adjacency matrix. We pose the problem of graph-matching as a maximum a posteriori probability (MAP) alignment of the seriation sequences for pairs of graphs. This treatment leads to an expression in which the edit cost is the negative logarithm of the a posteriori sequence alignment probability. We compute the edit distance by finding the sequence of string edit operations which minimizes the cost of the path traversing the edit lattice. The edit costs are determined by the components of the leading eigenvectors of the adjacency matrix and by the edge densities of the graphs being matched. We demonstrate the utility of the edit distance on a number of graph clustering problems. Index Terms: Graph edit distance, graph seriation, maximum a posteriori probability (MAP), graph-spectral methods.
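The idea of deriving a node order from the leading eigenvector of the adjacency matrix can be sketched as follows. This is a simplification, not the paper's seriation procedure: it estimates the eigenvector by power iteration and simply sorts nodes by decreasing component. The function names and the iteration count are assumptions, and the graph is assumed to be undirected and connected.

```python
import math
from typing import List


def leading_eigenvector(adj: List[List[int]], iterations: int = 200) -> List[float]:
    """Approximate the leading eigenvector of a symmetric adjacency
    matrix by power iteration on A + I. The identity shift leaves the
    eigenvectors unchanged and avoids oscillation on bipartite graphs."""
    n = len(adj)
    v = [1.0] * n
    for _ in range(iterations):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def seriation_order(adj: List[List[int]]) -> List[int]:
    """Node indices sorted by decreasing leading-eigenvector component."""
    v = leading_eigenvector(adj)
    return sorted(range(len(adj)), key=lambda i: -v[i])
```

The resulting index sequence is the kind of string that could then be passed to a string alignment routine; nodes with equal components may appear in either order.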
On the Common Substring Alignment Problem
Cited by 19 (1 self)
The Common Substring Alignment Problem is defined as follows: Given a set of one or more strings S_1, S_2, ..., S_c and a target string T, Y is a common substring of all strings S_i, that is, S_i = B_i Y F_i. The goal is to compute the similarity of all strings S_i with T, without computing the part of Y again and again. Using the classical dynamic programming tables, each appearance of Y in a source string would require the computation of all the values in a dynamic programming table of size O(nl), where l is the size of Y. Here we describe an algorithm which is composed of an encoding stage and an alignment stage. During the first stage, a data structure is constructed which encodes the comparison of Y with T. Then, during the alignment stage, for each comparison of a source S_i with T, the pre-compiled data structure is used to speed up the part of Y. We show how to reduce the O(nl) alignment work, for each appearance of the common substring Y in a source string, to O(n), at the cost of O(nl) encoding work, which is executed only once.
The Fundamentals of iSPARQL: A Virtual Triple Approach For Similarity-Based Semantic Web Tasks
Cited by 16 (1 self)
Abstract. This research explores three SPARQL-based techniques to solve Semantic Web tasks that often require similarity measures, such as semantic data integration, ontology mapping, and Semantic Web service matchmaking. Our aim is to see how far it is possible to integrate customized similarity functions (CSF) into SPARQL to achieve good results for these tasks. Our first approach exploits virtual triples calling property functions to establish virtual relations among resources under comparison; the second approach uses extension functions to filter out resources that do not meet the requested similarity criteria; finally, our third technique applies new solution modifiers to post-process a SPARQL solution sequence. The semantics of the three approaches are formally elaborated and discussed. We close the paper with a demonstration of the usefulness of our iSPARQL framework in the context of a data integration and an ontology mapping experiment.
Diagnosing meaning errors in short answers to reading comprehension questions - Proceedings of the 3rd Workshop on Innovative Use of NLP for Building Educational Applications, held at ACL 2008. Columbus, Ohio: Association for Computational Linguistics, 2008
Cited by 14 (12 self)
A common focus of systems in Intelligent Computer-Assisted Language Learning (ICALL) is to provide immediate feedback to language learners working on exercises. Most of this research has focused on providing feedback on the form of the learner input. Foreign language practice and second language acquisition research, on the other hand, emphasizes the importance of exercises that require the learner to manipulate meaning. The ability of an ICALL system to diagnose and provide feedback on the meaning conveyed by a learner response depends on how well it can deal with the response variation allowed by an activity. We focus on short-answer reading comprehension questions which have a clearly defined target response but where the learner may convey the meaning of the target in multiple ways. As an empirical basis for our work, we collected an English as a Second Language (ESL) learner corpus of short-answer reading comprehension questions, for which two graders provided target answers and correctness judgments. On this basis, we developed a Content-Assessment Module (CAM), which performs shallow semantic analysis to diagnose meaning errors. It reaches an accuracy of 88% for semantic error detection and 87% on semantic error diagnosis on a held-out test data set.

