Minimum Message Length and Kolmogorov Complexity
- Computer Journal, 1999
Cited by 86 (20 self)
Abstract
... this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 1038-1039], [2, Sections 5.2 and 5.5] and [3, p. 465].
An algorithm for the unsupervised learning of morphology
- Natural Language Engineering, 2006
Cited by 11 (2 self)
Abstract
This paper describes in detail an algorithm for the unsupervised learning of natural language morphology, with emphasis on challenges that are encountered in languages typologically similar to European languages. It utilizes the Minimum Description Length analysis described in Goldsmith 2001 and has been implemented in software that is available for downloading and testing.
1. Scope of this paper
This paper describes in detail an algorithm used for the unsupervised learning of natural language morphology which works well for European languages and other languages in which the average number of morphemes per word is not too high. It has been implemented and tested in Linguistica, and is based on the theoretical principles described in Goldsmith 2001. The present paper describes that framework briefly, but the reader is referred there for a more careful development. The executable for this program, and the source code as well, is available at
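The core Minimum Description Length intuition behind such morphology learners is that factoring words into stems and suffixes should make the lexicon cheaper to write down. The sketch below is a deliberately simplified illustration, not Goldsmith's actual cost model: it assumes a uniform cost of log2(27) bits per letter (26 letters plus an end marker), ignores pointer costs, and uses hypothetical helper names (`spell_bits`, `unanalyzed_bits`, `signature_bits`) and toy English words.

```python
import math

# Assumption: every letter, plus an end-of-string marker, costs the same
# log2(27) bits (26-letter alphabet). Goldsmith's real model is richer.
LETTER_BITS = math.log2(27)


def spell_bits(strings):
    """Bits to spell out each string once, each terminated by an end marker."""
    return sum((len(s) + 1) * LETTER_BITS for s in strings)


def unanalyzed_bits(words):
    # Baseline lexicon: list every word in full.
    return spell_bits(words)


def signature_bits(stems, suffixes):
    # One "signature": every stem combines with every suffix, so each stem
    # and each suffix is spelled only once ("" stands for the null suffix).
    # Pointer and structure costs are ignored in this sketch.
    return spell_bits(stems) + spell_bits(suffixes)


stems = ["jump", "walk", "talk"]
suffixes = ["", "s", "ed", "ing"]
words = [stem + suf for stem in stems for suf in suffixes]
print(unanalyzed_bits(words) > signature_bits(stems, suffixes))  # True
```

Here the 12 full words cost 78 letter-units while the factored lexicon costs 25, so the shorter description favors the stem + suffix analysis.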
PFSA Modelling of Behavioural Sequences by Evolutionary Programming
Cited by 2 (1 self)
Abstract
Behavioural observations can often be described as a sequence of symbols drawn from a finite alphabet. However, the inductive inference of such strings by any automated technique to produce models of the data is a non-trivial task. This paper considers modelling of behavioural data using probabilistic finite state automata (PFSAs). There are a number of information-theoretic techniques for evaluating possible hypotheses. The measure used in this paper is the Minimum Message Length (MML) of Wallace. Although attempts have been made to construct PFSA models by incremental addition of sub-strings using heuristic rules and the MML to give the lowest information cost, the resultant models cannot be shown to be globally optimal. Fogel's Evolutionary Programming can produce globally optimal PFSA models by evolving data structures of arbitrary complexity without the requirement to encode the PFSA into binary strings as in Genetic Algorithms. However, evaluation of PFSAs during the evolution pro...
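To make "information cost" concrete, the sketch below scores a candidate deterministic PFSA with a two-part length: bits to describe the automaton plus bits to encode the data given it. This is a simplified code in the MML spirit, not Wallace's exact MML formula or this paper's encoding; the arc-encoding scheme, the add-one (Laplace) adaptive code for each state's outgoing symbols, and the function names are all assumptions for illustration.

```python
import math


def log2_factorial(n):
    # log2(n!) via the log-gamma function
    return math.lgamma(n + 1) / math.log(2)


def adaptive_code_bits(counts, k):
    """Bits to send a symbol sequence with these counts using an add-one
    (Laplace) adaptive code over k possible symbols."""
    n = sum(counts)
    return (log2_factorial(n + k - 1) - log2_factorial(k - 1)
            - sum(log2_factorial(c) for c in counts))


def two_part_length(transitions, sequence, alphabet_size, start=0):
    """transitions maps (state, symbol) -> next state (deterministic PFSA).
    Returns model bits + data bits, or inf if the PFSA cannot emit the sequence."""
    states = {start} | {s for s, _ in transitions} | set(transitions.values())
    out = {s: [a for (src, a) in transitions if src == s] for s in states}
    counts = {arc: 0 for arc in transitions}
    state = start
    for sym in sequence:
        if (state, sym) not in transitions:
            return math.inf
        counts[(state, sym)] += 1
        state = transitions[(state, sym)]
    data_bits = model_bits = 0.0
    for s in states:
        m = len(out[s])
        if m:  # data part: which outgoing arc was taken at each visit
            data_bits += adaptive_code_bits([counts[(s, a)] for a in out[s]], m)
        # Model part (assumed scheme): arc count, arc labels, arc targets.
        model_bits += math.log2(alphabet_size + 1)
        model_bits += math.log2(math.comb(alphabet_size, m))
        model_bits += m * math.log2(len(states))
    return model_bits + data_bits


seq = "ab" * 20
one_state = {(0, "a"): 0, (0, "b"): 0}
two_state = {(0, "a"): 1, (1, "b"): 0}
print(two_part_length(two_state, seq, 2) < two_part_length(one_state, seq, 2))  # True
```

The two-state automaton pays a little more for its structure but encodes the alternating data almost for free, so it wins. A search procedure such as Evolutionary Programming would use a score like this as its fitness.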
Inductive Inference by Using Information Compression
Abstract
Inductive inference is of central importance to all scientific inquiry. Automating the process of inductive inference is the major concern of machine learning researchers. This article proposes inductive inference techniques to address three inductive problems: (1) how to automatically construct a general description, a model, or a theory to describe a sequence of observations or experimental data, (2) how to modify an existing model to account for new observations, and (3) how to handle the situation where the new observations are not consistent with the existing models. The techniques proposed in this article implement the inductive principle called the minimum description length principle and relate to Kolmogorov complexity and Occam's razor. They employ finite state machines as models to describe sequences of observations and measure descriptive complexity by the number of states. They can be used to draw inferences from sequences of observations where one observation may depend on previous observations. Thus, they can be applied to time series prediction problems and to one-to-one mapping problems. They are implemented as an automated inductive machine.
Key words: finite state machine, inductive inference, Kolmogorov complexity, learning mechanism, minimum description length, Occam's razor.
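As a rough illustration of measuring complexity by state count, the sketch below restricts itself to a model class chosen for this example only, not the article's technique: order-k context machines, where each state is the last k observations. It tries k = 0, 1, 2, ... and returns the lowest-order machine that predicts every next observation without contradiction, reporting its number of states. The function name `smallest_context_machine` and the `max_order` cutoff are hypothetical.

```python
def smallest_context_machine(seq, max_order=5):
    """Return (order, num_states, table) for the lowest-order context machine
    that predicts each next symbol of seq deterministically, or None.
    table maps a context (the previous `order` symbols) to the next symbol."""
    for k in range(max_order + 1):
        table = {}
        consistent = True
        for i in range(k, len(seq)):
            ctx, nxt = seq[i - k:i], seq[i]
            # A context seen before must always be followed by the same symbol.
            if table.setdefault(ctx, nxt) != nxt:
                consistent = False
                break
        if consistent:
            return k, len(table), table
    return None


print(smallest_context_machine("abcabcabc")[:2])  # (1, 3)
print(smallest_context_machine("aabaabaab")[:2])  # (2, 3)
```

When new observations contradict the current machine (problem 3 above), re-running the search on the extended sequence may move to a higher order, a crude analogue of revising the model.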
A Complexity Measure for Diachronic Chinese Phonology
- In Proceedings of the SIGPHON97 workshop on computational linguistics at the ACL'97/EACL'97 joint conference, 1997
Abstract
This paper addresses the problem of deriving distance measures between parent and daughter languages with specific relevance to historical Chinese phonology. The diachronic relationship between the languages is modelled as a Probabilistic Finite State Automaton. The Minimum Message Length principle is then employed to find the complexity of this structure. The idea is that this measure reflects the degree of dissimilarity between the two languages.
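The intuition that a message length can act as a distance can be sketched with a much simpler model than the paper's PFSA: treat each parent segment as one state and encode the daughter reflex observed in each aligned pair with an add-one adaptive code. Regular sound correspondences compress well; irregular ones cost more bits. The one-state-per-segment model, the function name `correspondence_bits`, and the toy segment pairs are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


def log2_factorial(n):
    # log2(n!) via the log-gamma function
    return math.lgamma(n + 1) / math.log(2)


def correspondence_bits(pairs, alphabet_size):
    """pairs: list of (parent_segment, daughter_segment) alignments.
    Bits to encode each daughter segment given its parent segment, using an
    add-one adaptive code over `alphabet_size` daughter segments."""
    by_parent = defaultdict(Counter)
    for parent, daughter in pairs:
        by_parent[parent][daughter] += 1
    k = alphabet_size
    bits = 0.0
    for counter in by_parent.values():
        n = sum(counter.values())
        bits += (log2_factorial(n + k - 1) - log2_factorial(k - 1)
                 - sum(log2_factorial(c) for c in counter.values()))
    return bits


# Toy alignments (hypothetical segments, not data from the paper).
regular = [("p", "f"), ("p", "f"), ("t", "t"), ("t", "t")]
irregular = [("p", "f"), ("p", "b"), ("t", "t"), ("t", "d")]
print(correspondence_bits(regular, 4) < correspondence_bits(irregular, 4))  # True
```

A lower total suggests a more systematic, and in this sense closer, parent-daughter relationship.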
Prospex: Protocol Specification Extraction
Abstract
Protocol reverse engineering is the process of extracting application-level specifications for network protocols. Such specifications are very useful in a number of security-related contexts, for example, to perform deep packet inspection and black-box fuzzing, or to quickly understand custom botnet command and control (C&C) channels. Since manual reverse engineering is a time-consuming and tedious process, a number of systems have been proposed that aim to automate this task. These systems either analyze network traffic directly or monitor the execution of the application that receives the protocol messages. While previous systems show that precise message formats can be extracted automatically, they do not provide a protocol specification. The reason is that they do not reverse engineer the protocol state machine. In this paper, we focus on closing this gap by presenting a system that is capable of automatically inferring state machines. This greatly enhances the results of automatic protocol reverse engineering, while further reducing the need for human interaction. We extend previous work that focuses on behavior-based message format extraction, and introduce techniques for identifying and clustering different types of messages not only based on their structure, but also according to the impact of each message on server behavior. Moreover, we present an algorithm for extracting the state machine. We have applied our techniques to a number of real-world protocols, including the command and control protocol used by a malicious bot. Our results demonstrate that we are able to extract format specifications for different types of messages and meaningful protocol state machines. We use these protocol specifications to automatically generate input for a stateful fuzzer, allowing us to discover security vulnerabilities in real-world applications.
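Once messages have been clustered into types, each observed session becomes a sequence of message-type labels. A common starting point for state-machine inference in general, assumed here for illustration and not attributed to Prospex's algorithm, is a prefix tree acceptor: one path per session, with shared prefixes merged into shared states. A state-merging step (not shown) would then generalize the tree into a compact machine. The function name `build_prefix_tree` and the SMTP-like message labels are hypothetical.

```python
def build_prefix_tree(sessions):
    """Build a prefix tree acceptor from sessions of message-type labels.
    Returns (transitions, accepting, num_states); state 0 is the root and
    transitions maps (state, message_type) -> next state."""
    transitions = {}
    accepting = set()
    next_state = 1
    for session in sessions:
        state = 0
        for msg_type in session:
            key = (state, msg_type)
            if key not in transitions:  # new branch: allocate a fresh state
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)  # a session may legitimately end here
    return transitions, accepting, next_state


sessions = [
    ["HELO", "MAIL", "RCPT", "DATA", "QUIT"],
    ["HELO", "QUIT"],
]
transitions, accepting, num_states = build_prefix_tree(sessions)
print(num_states, sorted(accepting))  # 7 [5, 6]
```

Both sessions share the initial HELO state, so seven states suffice; merging equivalent states would shrink this further as more sessions are observed.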

