Electronic Marking and Identification Techniques to Discourage Document Copying
IEEE Journal on Selected Areas in Communications, 1995
Cited by 94 (11 self)
Modern computer networks make it possible to distribute documents quickly and economically by electronic means rather than by conventional paper means. However, the widespread adoption of electronic distribution of copyrighted material is currently impeded by the ease of unauthorized copying and dissemination. In this paper we propose techniques that discourage unauthorized distribution by embedding a unique codeword in each document. Our encoding techniques are indiscernible by readers, yet enable us to identify the sanctioned recipient of a document by examination of a recovered document. We propose three coding methods, describe one in detail, and present experimental results showing that our identification techniques are highly reliable, even after documents have been photocopied. I. INTRODUCTION Electronic distribution of publications is increasingly available through on-line text databases, CD-ROMs, computer network based retrieval services, and electronic libraries [1]--[...
A Survey of Table Recognition: Models, Observations, Transformations, and Inferences
International Journal of Document Analysis and Recognition, 2003
Cited by 32 (3 self)
Table characteristics vary widely. Consequently, a great variety of computational approaches have been applied to table recognition. In this survey, the table recognition literature is presented as an interaction of table models, observations, transformations and inferences. A table model defines the physical and logical structure of tables; the model is used to detect tables, and to analyze and decompose the detected tables. Observations perform feature measurements and data lookup, transformations alter or restructure data, and inferences generate and test hypotheses. This presentation clarifies the decisions that are made by a table recognizer, and the assumptions and inferencing techniques that underlie these decisions.
Copyright Protection for Electronic Publishing over Computer Networks
AT&T Bell Laboratories, 1994
Cited by 26 (5 self)
The increased availability of computers, printers and high-speed networks could make electronic publishing a reality. One of the major technical and economic challenges faced by electronic publishing is that of preventing individuals from easily copying and illegally distributing electronic documents. In this paper, we explore the use of cryptographic protocols to discourage the distribution of illicit electronic copies. We propose an architecture and two separate schemes for making electronic document distribution secure. The first strategy requires special-purpose firmware in the printers and displays to decrypt encrypted documents. In the second strategy, encrypted documents are decrypted in software in the recipient's computer. 1 Introduction The increased use of facsimile has made the electronic transfer of paper documents more accepted. Electronic mail, electronic bulletin boards and networks such as the Internet make it possible to distribute electronic information to large gro...
Document Marking and Identification using Both Line and Word Shifting
1994
Cited by 21 (3 self)
We continue our study of document marking to deter illicit dissemination. An experiment we performed reveals that the distortion on the photocopy of a document is very different in the vertical and horizontal directions. This leads to a strategy that marks a text line both vertically, using line shifting, and horizontally, using word shifting. A marked line is always accompanied by two unmarked control lines, one above and one below. These are used to measure distortions in the vertical and horizontal directions in order to decide whether a line shift or a word shift should be detected. Line shifts are detected using a centroid method that bases its decision on the relative distance of line centroids. Word shifts are detected using a correlation method that treats a profile as a waveform and decides whether it originated from a waveform whose middle block has been shifted left or right. The maximum likelihood detectors for both methods are given.
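The word-shift correlation idea in this abstract can be illustrated with a small sketch. It makes three assumptions:

- the profile is a vertical projection profile of one text line (ink count per pixel column);
- the middle word occupies a known column range `block`;
- the word was shifted by `s` columns.

The detector picks whichever hypothesis, shifted left or shifted right, has the larger inner product with the received profile. The names `shift_block`, `inner` and `detect_word_shift` are hypothetical. The plain inner-product statistic is a simplification, not the authors' maximum likelihood detector. When both hypotheses have equal energy, it coincides with a minimum-distance decision.

```python
def shift_block(profile, start, end, s):
    """Copy of `profile` with columns [start, end) moved by s columns (s > 0 moves right)."""
    out = list(profile)
    for i in range(start, end):
        out[i] = 0  # clear the block's original position
    for i in range(start, end):
        j = i + s
        if 0 <= j < len(out):
            out[j] += profile[i]  # re-insert the block at its shifted position
    return out


def inner(a, b):
    """Unnormalized correlation (inner product) of two equal-length profiles."""
    return sum(x * y for x, y in zip(a, b))


def detect_word_shift(received, original, block, s=1):
    """Decide whether the word occupying `block` in `original` moved left or right."""
    start, end = block
    left = shift_block(original, start, end, -s)
    right = shift_block(original, start, end, s)
    return "right" if inner(received, right) > inner(received, left) else "left"


# Hypothetical column profile: three "words" separated by blank columns.
original = [0, 0, 3, 3, 0, 0, 4, 4, 4, 0, 0, 2, 2, 0, 0]
received = shift_block(original, 5, 10, 1)  # simulate a one-column right shift
print(detect_word_shift(received, original, (5, 10)))  # right
```

The block range deliberately includes the blank columns on either side of the word, so a one-column shift stays inside the inter-word gaps.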
Machine Printed Text and Handwriting Identification in Noisy Document Images
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
Cited by 21 (1 self)
In this paper, we address the problem of identifying text in noisy document images. We focus especially on segmenting handwriting and machine-printed text and distinguishing between them because: 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content, and 2) the segmentation and recognition techniques required for machine-printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to identify machine-printed text, handwriting, and noise, and we further exploit context to refine the classification. A Markov Random Field (MRF)-based approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications.
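As a rough illustration of what a trained Fisher classifier does, here is a minimal two-class sketch on 2-D feature vectors. The paper separates three classes (printed text, handwriting, noise) using its own selected features. The features, class names and helper names below are hypothetical. The sketch computes w = S_w^{-1}(m_a - m_b), where S_w is the pooled within-class scatter matrix, and thresholds halfway between the projected class means.

```python
def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def project(w, p):
    """Scalar projection of feature vector p onto the discriminant direction w."""
    return w[0] * p[0] + w[1] * p[1]


def fisher_train(class_a, class_b):
    """Two-class Fisher discriminant for 2-D features; returns (w, threshold)."""
    ma, mb = mean(class_a), mean(class_b)
    sxx = sxy = syy = 0.0
    for points, m in ((class_a, ma), (class_b, mb)):  # pooled within-class scatter
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    d0, d1 = ma[0] - mb[0], ma[1] - mb[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] applied to the mean difference (d0, d1).
    w = ((syy * d0 - sxy * d1) / det, (sxx * d1 - sxy * d0) / det)
    threshold = (project(w, ma) + project(w, mb)) / 2
    return w, threshold


def classify(p, w, threshold):
    """Class a's mean projects above class b's, since w . (m_a - m_b) > 0."""
    return "a" if project(w, p) > threshold else "b"


printed = [(0, 0), (1, 0), (0, 1), (1, 1)]      # hypothetical feature vectors
handwriting = [(4, 4), (5, 4), (4, 5), (5, 5)]
w, t = fisher_train(printed, handwriting)
print(classify((0.2, 0.3), w, t))  # a
```

A three-class version could combine several such pairwise discriminants. That choice is an assumption here, not something the abstract specifies.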
Document Identification for Copyright Protection using Centroid Detection
Cited by 15 (4 self)
A way to discourage illicit reproduction of copyrighted or sensitive documents is to watermark each copy before distribution. A unique mark is embedded in the text of each copy, whose recipient is registered. The mark can be extracted from a possibly noisy illicit copy, identifying the registered recipient. Most image marking techniques are vulnerable to binarization attacks and hence are not suitable for text marking. We propose a different approach in which a text document is marked by shifting certain text lines slightly up or down, or certain words slightly left or right, from their original positions. The shifting pattern constitutes the mark and differs from copy to copy. In this paper we develop and evaluate a method to detect such minute shifts. We describe a marking and identification prototype that implements the proposed method. We present preliminary experimental results that confirm the analytical prediction that centroid detection performs remarkably well on line shifts even in the presence of ...
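To make "the shifting pattern constitutes the mark" concrete, here is a minimal sketch that turns a recipient number into a per-line shift pattern. Three assumptions are made:

- the recipient number is simply written in binary;
- odd-numbered lines each carry one bit, with unshifted even-numbered lines between them acting as controls;
- the shift size is given in pixels.

All three are assumptions for illustration. The helpers `recipient_codeword` and `line_shift_offsets` are hypothetical, not the prototype described in the abstract.

```python
def recipient_codeword(recipient_id, n_bits):
    """Hypothetical mapping: registration number in binary, most significant bit first."""
    if not 0 <= recipient_id < 2 ** n_bits:
        raise ValueError("recipient_id does not fit in n_bits")
    return [(recipient_id >> i) & 1 for i in reversed(range(n_bits))]


def line_shift_offsets(codeword, n_lines, shift=1):
    """Vertical offset (pixels, positive = down) for each text line on a page.

    Assumed layout: lines 1, 3, 5, ... each carry one bit; the unshifted
    even-numbered lines act as controls above and below every marked line.
    Bit 1 shifts the line down, bit 0 shifts it up.
    """
    marked = list(range(1, n_lines - 1, 2))  # each marked line has a control below it
    if len(codeword) > len(marked):
        raise ValueError("page has too few lines for this codeword")
    offsets = [0] * n_lines
    for line, bit in zip(marked, codeword):
        offsets[line] = shift if bit else -shift
    return offsets


print(line_shift_offsets(recipient_codeword(5, 3), 7))  # [0, 1, 0, -1, 0, 1, 0]
```

Because every copy gets a different recipient number, every copy gets a different offset pattern. That difference is what lets a recovered copy be traced.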
Performance Comparison of Two Text Marking Methods
IEEE Journal on Selected Areas in Communications, 1998
Cited by 11 (1 self)
A text document typically consists of a collection of regular structures such as words, lines and paragraphs, slight movements of which seem less perceptible than, say, dithering of the document image. In this paper we exploit this property to watermark formatted text documents by slightly shifting certain lines and words, in order to discourage illicit distribution. We analyze two methods for reliable document identification in the presence of severe distortions introduced by photocopying, facsimile transmission and other processing. The correlation method uses document profiles directly for detection. To eliminate the effect of certain distortions, the centroid method bases its decision on the distances between the centroids of adjacent profile blocks. We present the maximum likelihood detectors for both methods and evaluate their relative performance. Our analysis indicates that line-shift detection generally has a smaller error than word-shift detection, and that the correlation detector o...
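The centroid method described here can be sketched as a simple sign test, which is a simplification rather than the maximum likelihood detector the paper derives. The sketch assumes three things:

- a horizontal projection profile (ink per pixel row, rows growing downward);
- row ranges for an upper control line, the marked line and a lower control line;
- centroid spacings `d_up0` and `d_down0` known from the unmarked original.

Function names and the example profile are hypothetical.

```python
def centroid(profile, start, end):
    """Ink-weighted mean row index of profile block [start, end)."""
    mass = sum(profile[start:end])
    if mass == 0:
        raise ValueError("profile block contains no ink")
    return sum(i * profile[i] for i in range(start, end)) / mass


def detect_line_shift(profile, blocks, d_up0, d_down0):
    """Decide whether the middle of three text lines was shifted up or down."""
    c_top, c_mid, c_bot = (centroid(profile, s, e) for s, e in blocks)
    d_up, d_down = c_mid - c_top, c_bot - c_mid
    # A downward shift of s rows adds s to d_up and removes s from d_down, so the
    # statistic is about +2s (down) or -2s (up). Translating all three lines by the
    # same amount leaves it unchanged. A tie (0) is reported as "up" here.
    stat = (d_up - d_up0) - (d_down - d_down0)
    return "down" if stat > 0 else "up"


profile = [0] * 30
for rows in ((3, 4, 5, 6), (14, 15, 16, 17), (23, 24, 25, 26)):
    for r in rows:
        profile[r] = 5
blocks = [(0, 10), (10, 20), (20, 30)]
print(detect_line_shift(profile, blocks, d_up0=10.0, d_down0=10.0))  # down
```

Using distances between centroids, rather than absolute positions, is what makes the decision insensitive to a uniform vertical offset of the whole page.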
A Language for Specifying and Comparing Table Recognition Strategies
2004
Cited by 7 (3 self)
Table recognition algorithms may be described by models of table location and structure, and decisions made relative to these models. These algorithms are usually defined informally as a sequence of decisions with supporting data observations and transformations. In this investigation, we formalize these algorithms as strategies in an imitation game, where the goal of the game is to match table interpretations from a chosen procedure as closely as possible. The chosen procedure may be a person or persons producing ‘ground truth’, or an algorithm. To describe table recognition strategies we have defined the Recognition Strategy Language (RSL). RSL is a simple functional language for describing strategies as sequences of abstract decision types whose results are determined by any suitable decision method. RSL defines and maintains interpretation trees, a simple data structure for describing recognition results. For each interpretation in an interpretation tree, we annotate hypothesis histories, which capture the creation, revision, and rejection of individual hypotheses, such as the logical type and structure of regions. We present a proof of concept using two strategies from the literature. We demonstrate how RSL allows strategies to be specified at the level of decisions rather than algorithms, and we compare results of our strategy implementations using new techniques. In particular, we introduce historical recall and precision metrics. Conventional recall and precision characterize hypotheses accepted after a strategy has finished. Historical recall and precision provide additional information by describing all generated hypotheses, including any that were rejected in the final result.
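The contrast between conventional and historical recall and precision can be sketched with plain sets. This assumes each hypothesis is a hashable value that either matches a ground-truth item exactly or does not. It also takes "historical" to mean scoring the union of accepted and rejected hypotheses. The actual definitions in this work may differ (for example, in how partially matching regions are counted). The function names are hypothetical.

```python
def precision_recall(hypotheses, truth):
    """Precision and recall of a set of hypotheses against a ground-truth set."""
    tp = len(hypotheses & truth)
    precision = tp / len(hypotheses) if hypotheses else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


def conventional_and_historical(accepted, rejected, truth):
    """Conventional scores use only hypotheses accepted when the strategy finishes;
    the historical variant (as assumed here) scores every hypothesis ever generated,
    including those later rejected."""
    conventional = precision_recall(accepted, truth)
    historical = precision_recall(accepted | rejected, truth)
    return conventional, historical


truth = {"A", "B", "C"}                    # hypothetical ground-truth hypotheses
accepted, rejected = {"A", "D"}, {"B", "E"}
conv, hist = conventional_and_historical(accepted, rejected, truth)
```

In this toy case the historical recall is higher than the conventional one. The correct hypothesis "B" was generated at some point but rejected before the end. This is the kind of information the conventional metrics hide.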
A Document Image Analysis System on Parallel Processors
1997
Cited by 3 (3 self)
This paper presents a document image processing system implemented on a set of parallel processors. A preprocessing stage first corrects skew in scanned document images. The corrected image is segmented and labelled in a two-step Minimum Containing Rectangle (MCR) detection stage. Text Block Filtering (TBF) is then done heuristically, and the filtered blocks are submitted to a Multi-Layer Perceptron (MLP) for character recognition. The document image is smoothed during MLP-based character recognition to reduce the preprocessing time. Smoothing also reduces the formation of merged characters, a main source of recognition errors in conventional approaches. During recognition, the MLP identifies bold words, which are used for automatic indexing of documents. Data is partitioned to exploit the inherent parallelism in document image data. Communication overhead is small compared to the computation time, so a high degree of parallelization is achieved, reducing the to...
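A Minimum Containing Rectangle is the bounding box of a group of ink pixels. A minimal single-pass sketch labels 8-connected components with a breadth-first flood fill and records each component's box. The abstract does not describe its two-step MCR procedure. This sketch therefore substitutes ordinary connected-component labeling, and the function name is hypothetical.

```python
from collections import deque


def minimum_containing_rectangles(img):
    """Bounding boxes (top, left, bottom, right), inclusive, of 8-connected ink regions.

    img is a binary image given as a list of rows, 1 for ink and 0 for background.
    Boxes are returned in the raster order in which each region is first met.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill of one connected component.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


img = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
print(minimum_containing_rectangles(img))  # [(0, 0, 1, 1), (1, 3, 2, 4)]
```

Each row band of the image can be labeled independently. This hints at why such a stage parallelizes well, though merging components that cross band boundaries would need extra work.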
Text Identification in Noisy Document Images Using Markov Random Field
In 7th International Conference on Document Analysis and Recognition (ICDAR), 2003
Cited by 3 (1 self)
In this paper we address the problem of identifying text in noisy documents. We segment handwriting and machine-printed text and distinguish between them because 1) handwriting in a document often indicates corrections, additions or other supplemental information that should be treated differently from the main or body content, and 2) the segmentation and recognition techniques for machine-printed text and handwriting are significantly different. The novelty of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to identify machine-printed text, handwriting and noise. We further exploit context to refine the classification. A Markov Random Field (MRF)-based approach is used to model the geometrical structure of the printed text, handwriting and noise to rectify misclassifications. Experimental results show that our approach is promising and robust, and can significantly improve page segmentation results on noisy documents.
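How context can rectify misclassifications is easiest to see with a toy MRF. The sketch runs iterated conditional modes (ICM) on a grid of regions, with a Potts prior that charges `beta` for every 4-neighbour carrying a different label. ICM is a standard MRF inference method swapped in here for illustration. The abstract does not say which inference procedure or neighbourhood structure the authors use. The label meanings, costs and `beta` value below are hypothetical.

```python
def icm_relabel(unary, beta=1.0, max_sweeps=10):
    """Iterated conditional modes on a grid with a Potts smoothness prior.

    unary[r][c][k]: cost of assigning label k to cell (r, c), e.g. a classifier's
    negative log-likelihood. Each 4-neighbour with a different label adds beta.
    """
    rows, cols, n_labels = len(unary), len(unary[0]), len(unary[0][0])
    # Start from the independent (context-free) decision for each cell.
    labels = [[min(range(n_labels), key=lambda k: unary[r][c][k]) for c in range(cols)]
              for r in range(rows)]
    for _ in range(max_sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                nbrs = [labels[y][x]
                        for y, x in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= y < rows and 0 <= x < cols]
                best = min(range(n_labels),
                           key=lambda k: unary[r][c][k] + beta * sum(n != k for n in nbrs))
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:  # no single-cell change lowers the energy
            break
    return labels


# Hypothetical costs for labels 0 = printed text, 1 = noise on a 3x3 grid of regions.
unary = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.6, 0.4]  # isolated cell weakly classified as noise
print(icm_relabel(unary))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The isolated cell's weak preference for "noise" (a cost gap of 0.2) is outweighed by the four disagreeing neighbours (4 × `beta`). ICM therefore relabels it as printed text, which is the kind of context-driven correction the abstract describes.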

