Probabilistic reasoning as information compression by multiple alignment, unification and search: an introduction and overview
- Journal of Universal Computer Science
, 1996
Syntax, Parsing and Production of Natural Language in a Framework of Information Compression by Multiple Alignment, Unification and Search
, 2000
Cited by 5 (4 self)
This article introduces the idea that information compression by multiple alignment, unification and search (ICMAUS) provides a framework within which natural language syntax may be represented in a simple format and the parsing and production of natural language may be performed in a transparent manner. In this context, multiple alignment has a meaning which is similar to its meaning in bio-informatics but with significant differences, while unification means a simple merging of matching patterns, a meaning which is related to but simpler than the meaning of that term in logic. The concept of search in the present context means search for alignments which are `good' in terms of information compression, using heuristic methods or arbitrary constraints (or both) to restrict the size of the search space. These concepts are embodied in a software model, SP61. The organisation and operation of the model are described and a simple example is presented showing how the model can achieve parsing of natural language. Notwithstanding the apparent paradox of `decompression by compression', the IC-
Unsupervised learning in a framework of information compression by multiple alignment, unification and search
- Artificial Intelligence Review
, 2003
Cited by 5 (4 self)
This paper describes a novel approach to unsupervised learning that has been developed within a framework designed to integrate learning with such things as parsing and production of language, fuzzy pattern recognition and best-match information retrieval, class hierarchies with inheritance of attributes, probabilistic and exact forms of reasoning, and others. This framework, which may be characterised as information compression by multiple alignment, unification and search (ICMAUS), is founded on principles of Minimum Length Encoding. Some of its capabilities (other than learning) are briefly described. The main body of the paper describes SP70, a computer model of the ICMAUS framework that incorporates processes for unsupervised learning. Examples are presented to show how the model can infer plausible grammars from appropriate input. Anticipated future developments of the model are briefly discussed.
Computational Grammar Induction for Linguists
- Grammars
, 2004
Cited by 4 (1 self)
In general a grammar describes a (possibly infinite) set of sentences with a finite structural description. Computational Grammar Induction (CGI) deals with the creation of computational models for identification of these infinite sets on the basis of a finite set of examples. CGI is a field in its own right, with its own internal research
Unsupervised grammar induction in a framework of information compression by multiple alignment, unification and search, in: C. de la
- Proceedings of the Workshop and Tutorial on Learning Context-Free Grammars
, 2003
Cited by 3 (3 self)
This paper describes a novel approach to grammar induction that has been developed within a framework designed to integrate learning with other aspects of computing, AI, mathematics and logic. This framework, called information compression by multiple alignment, unification and search (ICMAUS), is founded on principles of Minimum Length Encoding pioneered by Solomonoff and others. Most of the paper describes SP70, a computer model of the ICMAUS framework that incorporates processes for unsupervised learning of grammars. An example is presented to show how the model can infer a plausible grammar from appropriate input. Limitations of the current model and how they may be overcome are briefly discussed.
The cruncher: Automatic concept formation using minimum description length
- In proceedings of the 6th International Symposium on Abstraction, Reformulation and Approximation (SARA 2005), Lecture Notes in Artificial Intelligence
Cited by 2 (1 self)
We present The Cruncher, a simple representation framework and algorithm based on minimum description length for automatically forming an ontology of concepts from attribute-value data sets. Although unsupervised, when The Cruncher is applied to an animal data set, it produces a nearly zoologically accurate categorization. We demonstrate The Cruncher’s utility for finding useful macro-actions in Reinforcement Learning, and for learning models from uninterpreted sensor data. We discuss advantages The Cruncher has over concept lattices and hierarchical clustering.
Parsing As Information Compression By Multiple Alignment, Unification And Search: SP52
- IN THIS ISSUE
, 1998
Cited by 2 (1 self)
This article introduces the idea that parsing in the sense associated with computational linguistics and natural language processing may be understood as information compression by multiple alignment, unification and search (ICMAUS). In this context, `multiple alignment' has a meaning which is similar to its meaning in bio-informatics, while `unification' means a simple merging of matching patterns, a meaning which is related to but simpler than the meaning of that term in logic. This concept of parsing is embodied in a software model, SP52. The organisation and operation of the model are described with a simple example of what the model can do. An example is presented showing how the same theoretical framework and the same software model may support the production of sentences, not just analysis of sentences. The accompanying article (Wolff, 1988) presents some other, more realistic examples showing how syntax may be represented in the proposed formalism and how sentences may be pa...
The Power and Perils of MDL
Cited by 2 (0 self)
We point out a potential weakness in the application of the celebrated Minimum Description Length (MDL) principle for model selection. Specifically, it is shown that (although the index of the model class which actually minimizes a two-part code has many desirable properties) a model which has a shorter two-part code-length than another is not necessarily better (unless of course it achieves the global minimum). This is illustrated by an application to infer a grammar (DFA) from positive examples. We also analyze computability issues, and robustness under recoding of the data. Generally, the classical approach is inadequate to express the goodness-of-fit of individual models for individual data sets. In practice however, this is precisely what we are interested in: both to express the goodness of a procedure and where and how it can fail. To achieve this practical goal, we paradoxically have to use the, supposedly impractical, vehicle of Kolmogorov complexity.
Some Theoretical and Practical Results in Context-Sensitive and Adaptive Parsing
- Progress in Complexity, Information, and Design
, 2002
Cited by 1 (0 self)
We introduce a fifth language accepting machine called the PDA-T, demonstrate some of its interesting formal properties, and show its role in the $-Calculus. Based upon this new machine and the $-Calculus' other properties, we demonstrate the $-Calculus' formal Turing Power, and then propose a formal language classification (the $-Hierarchy), derived largely from the Chomsky Hierarchy, but with a fifth class of language accepted by the PDA-T. We show that this modified hierarchy yields several conceptual benefits over the standard four machine Chomsky Hierarchy. We also provide some practical examples of the use of $-grammars in context-sensitive and semantic parsing.

