Results 11  20
of
62
Learning Probabilistic Automata with Variable Memory Length
 In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory
, 1994
"... We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Finite Suffix Automata. The learning algorithm is motivated by real applications in manma ..."
Abstract

Cited by 44 (5 self)
 Add to MetaCart
We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Finite Suffix Automata. The learning algorithm is motivated by real applications in manmachine interaction such as handwriting and speech recognition. Conventionally used fixed memory Markov and hidden Markov models have either severe practical or theoretical drawbacks. Though general hardness results are known for learning distributions generated by sources with similar structure, we prove that our algorithm can indeed efficiently learn distributions generated by our more restricted sources. In Particular, we show that the KLdivergence between the distribution generated by the target source and the distribution generated by our hypothesis can be made small with high confidence in polynomial time and sample complexity. We demonstrate the applicability of our algorithm by learni...
Predicting the Future of Discrete Sequences From Fractal Representations of the Past
, 2001
"... We propose a novel approach for building nite memory predictive models similar in spirit to variable memory length Markov models (VLMMs). The models are constructed by rst transforming the nblock structure of the training sequence into a geometric structure of points in a unit hypercube, such ..."
Abstract

Cited by 29 (10 self)
 Add to MetaCart
We propose a novel approach for building nite memory predictive models similar in spirit to variable memory length Markov models (VLMMs). The models are constructed by rst transforming the nblock structure of the training sequence into a geometric structure of points in a unit hypercube, such that the longer is the common sux shared by any two nblocks, the closer lie their point representations.
Learning bias and phonologicalrule induction
 Computational Linguistics
, 1996
"... A fundamental debate in the machine learning of language has been the role of prior knowledge in the learning process. Purely nativist approaches, such as the Principles and Parameters model, build parameterized linguistic generalizations directly into the learning system. Purely empirical approache ..."
Abstract

Cited by 29 (0 self)
 Add to MetaCart
A fundamental debate in the machine learning of language has been the role of prior knowledge in the learning process. Purely nativist approaches, such as the Principles and Parameters model, build parameterized linguistic generalizations directly into the learning system. Purely empirical approaches use a general, domainindependent learning rule (Error BackPropagation, Instancebased Generalization, Minimum Description Length) to learn linguistic generalizations directly from the data. In this paper we suggest that an alternative to the purely nativist or purely empiricist learning paradigms is to represent the prior knowledge of language as a set of abstract learning biases, which guide an empirical inductive learning algorithm. We test our idea by examining the machine learning of simple Sound Pattern of English ( S P E)style phonological rules. We represent phonological rules as finitestate transducers that accept underlying forms as input and generate surface forms as output. We show that OSTIA, a generalpurpose transducer induction algorithm, was incapable of learning simple phonological rules like flapping. We then augmented OSTIA with three kinds of learning biases that are specific to natural language phonology, and that are assumed explicitly or implicitly by every theory of phonology: faithfulness (underlying segments
Architectural Bias in Recurrent Neural Networks  Fractal Analysis
 IEEE Transactions on Neural Networks
, 1931
"... We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoidtype activation functions are inherently biased towards Markov models, i.e. even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer ..."
Abstract

Cited by 28 (7 self)
 Add to MetaCart
We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoidtype activation functions are inherently biased towards Markov models, i.e. even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tino, 2002; Tino, Cernansky & Benuskova, 2002; Tino, Cernansky & Benuskova, 2002a). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this paper we further extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finitestate transition diagram  a scenario that has been frequently considered in the past e.g. when studying RNNbased learning and implementation of regular grammars and finitestate transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as boxcounting and Hausdor# dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be explored to build Markovian predictive models, but also the activations form fractal clusters the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
Modeling Interaction Using Learnt Qualitative SpatioTemoral Relations and Variable Length Markov Models
 Proc. of the 15 th European Conference on Artificial Intelligence, 2002
"... Motivated by applications such as automated visual surveillance and video monitoring and annotation, there has been a lot of interest in constructing cognitive vision systems capable of interpreting the high level semantics of dynamic scenes. In this paper we present a novel approach for automatical ..."
Abstract

Cited by 27 (6 self)
 Add to MetaCart
Motivated by applications such as automated visual surveillance and video monitoring and annotation, there has been a lot of interest in constructing cognitive vision systems capable of interpreting the high level semantics of dynamic scenes. In this paper we present a novel approach for automatically inferring models of object interactions that can be used to interpret observed behaviour within a scene. A realtime lowlevel computer vision system, together with an attentional control mechanism, are used to identify incidents or events that occur in the scene. A data driven approach has been taken in order to automatically infer discrete and abstract representations (symbols) of primitive object interactions; effectively the system learns a set of qualitative spatial relations relevant to the dynamic behaviour of the domain. These symbols then form the alphabet of a VLMM which automatically infers the high level structure of typical interactive behaviour. The learnt behaviour model has generative capabilities and is also capable of recognizing typical or atypical activities within a scene. Experiments have been performed within the traffic monitoring domain; however the proposed method is applicable to the general automatic surveillance task since it does not assume a priori knowledge of a specific domain. 1
A Model of Facial Behaviour
 In IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, Korea, May 17
, 2002
"... We consider the problem of learning how a person’s face behaves in a long video sequence, with the aim of synthesising convincing sequences demonstrating the same behaviours. We describe a novel approach to segment a sequence into short sections, each representing a distinct action (or a part of an ..."
Abstract

Cited by 25 (1 self)
 Add to MetaCart
We consider the problem of learning how a person’s face behaves in a long video sequence, with the aim of synthesising convincing sequences demonstrating the same behaviours. We describe a novel approach to segment a sequence into short sections, each representing a distinct action (or a part of an action). These sections are grouped and a model of the variability of the action learnt. A variable length Markov model is trained on the sequence of such actions to learn the temporal relationships. The result is a system that can generate realistic sequences of an individual face.
Towards an Architecture for Cognitive Vision using Qualitative SparioTemporal Representations and Abduction
 In Spatial Cognition III
, 2002
"... In recent years there has been increasing interest in constructing cognitive vision systems capable of interpreting the high level semantics of dynamic scenes. Purely quantitative approaches to the task of constructing such systems have met with some success. However, qualitative analysis of dyn ..."
Abstract

Cited by 25 (1 self)
 Add to MetaCart
In recent years there has been increasing interest in constructing cognitive vision systems capable of interpreting the high level semantics of dynamic scenes. Purely quantitative approaches to the task of constructing such systems have met with some success. However, qualitative analysis of dynamic scenes has the advantage of allowing easier generalisation of classes of different behaviours and guarding against the propagation of errors caused by uncertainty and noise in the quantitative data. Our aim is to integrate quantitative and qualitative modes of representation and reasoning for the analysis of dynamic scenes. In particular, in this paper we outline an approach for constructing cognitive vision systems using qualitative spatialtemporal representations including prototypical spatial relations and spatiotemporal event descriptors automatically inferred from input data. The overall architecture relies on abduction: the system searches for explanations, phrased in terms of the learned spatiotemporal event descriptors, to account for the video data.
Spatial Representation of Symbolic Sequences through Iterative Function Systems
 IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans
, 1998
"... Jeffrey proposed a graphic representation of DNA sequences using Barnsley's iterative function systems. In spite of further developments in this direction, the proposed graphic representation of DNA sequences has been lacking a rigorous connection between its spatial scaling characteristics and the ..."
Abstract

Cited by 24 (11 self)
 Add to MetaCart
Jeffrey proposed a graphic representation of DNA sequences using Barnsley's iterative function systems. In spite of further developments in this direction, the proposed graphic representation of DNA sequences has been lacking a rigorous connection between its spatial scaling characteristics and the statistical characteristics of the DNA sequences themselves. We 1) generalize Jeffrey's graphic representation to accommodate (possibly infinite) sequences over an arbitrary finite number of symbols, 2) establish a direct correspondence between the statistical characterization of symbolic sequences via R'enyi entropy spectra and the multifractal characteristics (R'enyi generalized dimensions) of the sequences' spatial representations, 3) show that for general symbolic dynamical systems, the multifractal f H  spectra in the sequence space coincide with the f H spectra on spatial sequence representations. Keywords Multifractal theory, Iterative function systems, Chaos game representation...
Passively Learning Finite Automata
, 1996
"... We provide a survey of methods for inferring the structure of a finite automaton from passive observation of its behavior. We consider both deterministic automata and probabilistic automata (similar to Hidden Markov Models). While it is computationally intractible to solve the general problem exactl ..."
Abstract

Cited by 23 (0 self)
 Add to MetaCart
We provide a survey of methods for inferring the structure of a finite automaton from passive observation of its behavior. We consider both deterministic automata and probabilistic automata (similar to Hidden Markov Models). While it is computationally intractible to solve the general problem exactly, we will consider heuristic algorithms, and also special cases which are tractible. Most of the algorithms we consider are based on the idea of building a tree which encodes all of the examples we have seen, and then merging equivalent nodes to produce a (near) minimal automaton. Contents 1 Introduction 4 1.1 Applications of automaton inference : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 4 1.2 Why PFAs instead of other probabilistic models? : : : : : : : : : : : : : : : : : : : : : : : : : 5 1.3 The input to/output from the algorithms : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 6 1.4 Batch vs. online algorithms : : : : : : : : : : : : : : : : : : : : : :...
Recurrent Neural Networks With Small Weights Implement Definite Memory Machines
 NEURAL COMPUTATION
, 2003
"... Recent experimental studies indicate that recurrent neural networks initialized with `small' weights are inherently biased towards definite memory machines (Tino, Cernansky, Benuskova, 2002a; Tino, Cernansky, Benuskova, 2002b). This paper establishes a theoretical counterpart: transition funct ..."
Abstract

Cited by 21 (5 self)
 Add to MetaCart
Recent experimental studies indicate that recurrent neural networks initialized with `small' weights are inherently biased towards definite memory machines (Tino, Cernansky, Benuskova, 2002a; Tino, Cernansky, Benuskova, 2002b). This paper establishes a theoretical counterpart: transition function of recurrent network with small weights and `squashing ' activation function is a contraction. We prove that recurrent networks with contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite mem