Functionally accurate, cooperative distributed systems
- IEEE Transactions on Systems, Man, and Cybernetics, 1981
Cited by 89 (18 self)
A new approach for structuring distributed processing systems, called functionally accurate, cooperative (FA/C), is proposed. The approach differs from conventional ones in its emphasis on handling distribution-caused uncertainty and errors as an integral part of the network problem-solving process. In this approach nodes cooperatively problem solve by exchanging partial tentative results (at various levels of abstraction) within the context of common goals. The approach is especially suited to applications in which the data necessary to achieve a solution cannot be partitioned in such a way that a node can complete a task without seeing the intermediate state of task processing at other nodes. Much of the inspiration for the FA/C approach comes from the mechanisms used in knowledge-based artificial intelligence (AI) systems for resolving uncertainty caused by noisy input data and the use of approximate knowledge. The appropriateness of the FA/C approach is explored in three application domains: distributed interpretation, distributed network traffic-light control, and distributed planning. Additionally, the relationship between the approach and the structure of management organizations is developed. Finally, a number of current research directions necessary to more fully develop the FA/C approach are outlined. These research directions include distributed search, the integration of implicit and explicit forms of control, and distributed planning and organizational self-design.
OCR with No Shape Training
- Proc. of 15th ICPR, 2000
Cited by 20 (6 self)
We present a document-specific OCR system and apply it to a corpus of faxed business letters. Unsupervised classification of the segmented character bitmaps on each page, using a "clump" metric, typically yields several hundred clusters with highly skewed populations. Letter identities are assigned to each cluster by maximizing matches with a lexicon of English words. We found that for 2/3 of the pages, we can identify almost 80% of the words included in the lexicon, without any shape training. Residual errors are caused by mis-segmentation, including missed lines and punctuation. This research differs from earlier attempts to apply cipher decoding to OCR in (1) using real data, (2) using a more appropriate clustering algorithm, and (3) decoding a many-to-many instead of a one-to-one mapping between clusters and letters.
Unsupervised Analysis for Decipherment Problems
Cited by 9 (8 self)
We study a number of natural language decipherment problems using unsupervised learning. These include letter substitution ciphers, character code conversion, phonetic decipherment, and word-based ciphers with relevance to machine translation. Straightforward unsupervised learning techniques most often fail on the first try, so we describe techniques for understanding errors and significantly increasing performance.
Substitution Deciphering Based on HMMs with Applications to Compressed Document Processing
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
Cited by 8 (0 self)
It has been shown that simple substitution ciphers can be solved using statistical methods such as probabilistic relaxation. However, the utility of such solutions has been limited by their inability to cope with noise encountered in practical applications. In this paper, we propose a new solution to substitution deciphering based on hidden Markov models. We show that our algorithm is more accurate than relaxation and much more robust in the presence of noise, making it useful for applications in compressed document processing. Recovering character interpretations from the sequence of cluster identifiers in a symbolically compressed document can be treated as a cipher problem. Although a significant amount of noise is present in the cluster sequence, enough information can be recovered with a robust deciphering algorithm to accomplish certain document analysis tasks. The feasibility of this approach is demonstrated in a multilingual document duplicate detection system.
Bayesian inference for finite-state transducers
- In HLT-NAACL, 2010
Cited by 7 (4 self)
We describe a Bayesian inference algorithm that can be used to train any cascade of weighted finite-state transducers on end-to-end data. We also investigate the problem of automatically selecting from among multiple training runs. Our experiments on four different tasks demonstrate the genericity of this framework, and, where applicable, large improvements in performance over EM. We also show, for unsupervised part-of-speech tagging, that automatic run selection gives a large improvement over previous Bayesian approaches.
Attacking decipherment problems optimally with low-order n-gram models
- In Proceedings of EMNLP 2008, 2008
Cited by 6 (5 self)
We introduce a method for solving substitution ciphers using low-order letter n-gram models. This method enforces global constraints using integer programming, and it guarantees that no decipherment key is overlooked. We carry out extensive empirical experiments showing how decipherment accuracy varies as a function of cipher length and n-gram order. We also make an empirical investigation of Shannon’s (1949) theory of uncertainty in decipherment.
Duplicate Detection in Symbolically Compressed Documents
- 1999
Cited by 6 (2 self)
A new family of symbolic compression algorithms, such as the ongoing JBIG2 standardization and commercial products, has recently been developed. These techniques are specifically targeted for binary document images. They cluster individual blobs in a document and store the sequence of occurrence of blobs and representative blob templates, hence the name symbolic compression. This paper describes a method for duplicate detection on symbolically compressed document images. It recognizes the text in an image by deciphering the sequence of occurrence of blobs in the compressed representation. We propose a Hidden Markov Model (HMM) method for solving such deciphering problems and suggest applications in multilingual document duplicate detection.
The Applications of Genetic Algorithms in Cryptanalysis
- 1996
Cited by 2 (0 self)
This thesis describes a method of deciphering messages encrypted with rotor machines utilising a Genetic Algorithm to search the keyspace. A fitness measure based on the phi test for non-randomness of text is described and the results show that an unknown three rotor machine can generally be cryptanalysed with about 4000 letters of ciphertext. The results are compared to those given using a previously published technique and found to be superior.
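For readers unfamiliar with the phi test named in this abstract, the following is a minimal sketch of the classical phi statistic, assuming the standard definition (the sum of f(f - 1) over the letter counts f of a text). The abstract does not give the thesis's actual fitness function, so this is not a reconstruction of it; the names `phi`, `expected_phi`, `KAPPA_RANDOM`, and `KAPPA_ENGLISH` are illustrative, and the English constant is only a commonly quoted approximation.

```python
from collections import Counter

# Probability that two letters drawn at random match:
KAPPA_RANDOM = 1 / 26   # uniformly random 26-letter text
KAPPA_ENGLISH = 0.066   # assumption: approximate, commonly quoted value for English


def phi(text: str) -> int:
    """Observed phi: sum of f * (f - 1) over the counts f of the letters a-z."""
    counts = Counter(c for c in text.lower() if "a" <= c <= "z")
    return sum(f * (f - 1) for f in counts.values())


def expected_phi(n: int, kappa: float) -> float:
    """Expected phi for n letters drawn from a source with match probability kappa."""
    return kappa * n * (n - 1)
```

A candidate decryption whose observed `phi` lies closer to `expected_phi(n, KAPPA_ENGLISH)` than to `expected_phi(n, KAPPA_RANDOM)` looks less random, which is the kind of signal a search procedure could in principle use to rank candidate keys.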
Probabilistic Methods for a Japanese Syllable Cipher
- In Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages, Lecture Notes in Artificial Intelligence, 2009
Cited by 2 (1 self)
This paper attacks a Japanese syllable-substitution cipher. We use a probabilistic, noisy-channel framework, exploiting various Japanese language models to drive the decipherment. We describe several innovations, including a new objective function for searching for the highest-scoring decipherment. We include empirical studies of the relevant phenomena, and we give improved decipherment accuracy rates.
Shape-Free Statistical Information in Optical Character Recognition
Cited by 2 (0 self)
The fundamental task facing Optical Character Recognition (OCR) systems involves the conversion of input document images into corresponding sequences of symbolic character codes. Traditionally, this has been accomplished in a bottom-up fashion: the image of each symbol is isolated, then classified based on its pixel intensities. While such shape-based classifiers are initially trained on a wide array of fonts, they still tend to perform poorly when faced with novel glyph shapes. In this thesis, we attempt to bypass this problem by pursuing a top-down “codebreaking” approach. We assume no a priori knowledge of character shape, instead relying on statistical information and language constraints to determine an appropriate character mapping. We introduce and contrast three new top-down approaches, and present experimental results on several real and synthetic datasets. Given sufficient amounts of data, our font- and shape-independent approaches are shown to perform about as well as shape-based classifiers.

