Results 1 - 10 of 28
Learning policies for partially observable environments: Scaling up
, 1995
Cited by 202 (10 self)
Partially observable Markov decision processes (POMDPs) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of POMDPs is motivated by a need to address realistic problems, existing techniques for finding optimal behavior do not appear to scale well and have been unable to find satisfactory policies for problems with more than a dozen states. After a brief review of POMDPs, this paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small POMDPs taken from the learning literature. In contrast, we show that none are able to solve a slightly larger and noisier problem based on robot navigation. We find that a combination of two novel approaches performs well on these problems and suggest methods for scaling to even larger and more complicated domains.
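The abstract above does not spell out its solution methods, so the following is only a minimal sketch of the standard belief-state (Bayes filter) update that underlies most treatments of partially observable Markov decision processes. The function name, the table layout `T[a][s][s']` and `O[a][s'][o]`, and the two-state model are assumptions made for illustration, not anything taken from the paper.

```python
def belief_update(belief, action, observation, T, O):
    """Return b'(s') proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s).

    T[a][s][s2] is the (assumed) transition probability and
    O[a][s2][o] the (assumed) probability of observing o after landing in s2.
    """
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # Prediction step: probability of reaching s_next under the action.
        predicted = sum(T[action][s][s_next] * belief[s] for s in range(n))
        # Correction step: weight by the likelihood of the observation.
        unnormalized.append(O[action][s_next][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical two-state, one-action, two-observation model.
T = [[[0.9, 0.1], [0.1, 0.9]]]
O = [[[0.8, 0.2], [0.3, 0.7]]]
new_belief = belief_update([0.5, 0.5], 0, 0, T, O)
```

Starting from a uniform belief, observation 0 is more likely in state 0, so the updated belief shifts toward state 0.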
Algorithms for Sequential Decision Making
, 1996
Cited by 158 (7 self)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular,
NeuroAnimator: Fast Neural Network Emulation and Control of Physics-Based Models
, 1998
Cited by 78 (3 self)
Animation through the numerical simulation of physics-based graphics models offers unsurpassed realism, but it can be computationally demanding. Likewise, finding controllers that enable physics-based models to produce desired animations usually entails formidable computational cost. This paper demonstrates the possibility of replacing the numerical simulation and control of model dynamics with a dramatically more efficient alternative. In particular, we propose the NeuroAnimator, a novel approach to creating physically realistic animation that exploits neural networks. NeuroAnimators are automatically trained off-line to emulate physical dynamics through the observation of physics-based models in action. Depending on the model, its neural network emulator can yield physically realistic animation one or two orders of magnitude faster than conventional numerical simulation. Furthermore, by exploiting the network structure of the NeuroAnimator, we introduce a fast algorithm for learning controllers that enables either physics-based models or their neural network emulators to synthesize motions satisfying prescribed animation goals. We demonstrate NeuroAnimators for passive and active (actuated) rigid body, articulated, and deformable physics-based models.
Improving the Rprop Learning Algorithm
- Proceedings of the Second International Symposium on Neural Computation (NC 2000)
, 2000
Cited by 35 (7 self)
The Rprop algorithm proposed by Riedmiller and Braun is one of the best performing first-order learning methods for neural networks. We introduce modifications of the algorithm that improve its learning speed. The resulting speedup is experimentally shown for a set of neural network learning tasks as well as for artificial error surfaces.
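For orientation, here is a minimal sketch of the basic sign-based Rprop update in its variant without weight-backtracking. It is not the set of modifications this paper introduces, which the abstract does not specify. The function name, the default hyperparameters, and the quadratic test surface are illustrative assumptions.

```python
import math


def rprop_minimize(grad, w, steps=200, eta_plus=1.2, eta_minus=0.5,
                   delta0=0.1, delta_min=1e-6, delta_max=50.0):
    """Minimize a function given its gradient, using only gradient signs."""
    w = list(w)
    delta = [delta0] * len(w)   # one adaptive step size per weight
    prev_g = [0.0] * len(w)     # gradient from the previous iteration
    for _ in range(steps):
        g = grad(w)
        for i in range(len(w)):
            agreement = prev_g[i] * g[i]
            if agreement > 0:
                # Same sign as last time: grow the step size.
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif agreement < 0:
                # Sign flipped, so a minimum was jumped over: shrink it.
                delta[i] = max(delta[i] * eta_minus, delta_min)
            if g[i] != 0.0:
                # Move against the gradient sign; the magnitude is ignored.
                w[i] -= math.copysign(delta[i], g[i])
        prev_g = g
    return w


# Assumed test surface: f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2.
def quadratic_grad(w):
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


w_min = rprop_minimize(quadratic_grad, [0.0, 0.0])
```

Because only the sign of each partial derivative is used, the badly scaled second coordinate does not slow the search down the way it would with a single global learning rate.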
Ant colony optimization and stochastic gradient descent
- Artificial Life
, 2002
Cited by 16 (5 self)
In this paper, we study the relationship between the two techniques known as ant colony optimization (aco) and stochastic gradient descent. More precisely, we show that some empirical aco algorithms approximate stochastic gradient descent in the space of pheromones, and we propose an implementation of stochastic gradient descent that belongs to the family of aco algorithms. We then use this insight to explore the mutual contributions of the two techniques.
Using Kohonen’s self-organizing feature map to uncover automobile bodily injury claims fraud
- The Journal of Risk and Insurance
, 1998
Cited by 15 (1 self)
Claims fraud is an increasingly vexing problem confronting the insurance industry. In this empirical study, we apply Kohonen's Self-Organizing Feature Map to classify automobile bodily injury (BI) claims by the degree of fraud suspicion. Feed forward neural networks and a back propagation algorithm are used to investigate the validity of the Feature Map approach. Comparative experiments illustrate the potential usefulness of the proposed methodology. We show that this technique performs better than both an insurance adjuster's fraud assessment and an insurance investigator's fraud assessment with respect to consistency and reliability. INTRODUCTION AND BACKGROUND One vexing problem confronting the property-casualty insurance industry is claims fraud. Individuals and conspiratorial rings of claimants and providers unfortunately can and do manipulate the claim processing system for their own undeserved benefit (Derrig and Ostaszewski, 1994; Cummins and Tennyson, 1992). The
On Supervised Learning From Sequential Data With Applications For Speech Recognition
, 1999
Cited by 12 (1 self)
visualization of the problem to model human speech. A large number of example sequences of observation vectors (shown connected as continuous trajectories) depending on a given sequence of class labels, with each class representing for example a phoneme (here the name Keiko with given durations). In this synthetic example, the one-dimensional target data would be represented poorly by a uni-modal Gaussian distribution with a constant variance (which corresponds to using the squared-error objective function), which would average the two separate branches, indicated by the fat lines as the mean and constant variance of the single Gaussian. Compare this figure with Figure 3.10, Figure 3.11 and Figure 3.12 to see a subsequent improvement of the model.
Automatic Recognition of Cortical Sulci of the Human Brain Using a Congregation of Neural Networks
- Elsevier, Medical Image Analysis
, 2002
Cited by 10 (4 self)
This paper describes a complete system allowing automatic recognition of the main sulci of the human cortex. This system relies on a preprocessing of magnetic resonance images leading to abstract structural representations of the cortical folding patterns. The representation nodes are cortical folds, which are given a sulcus name by a contextual pattern recognition method. This method can be interpreted as a graph matching approach, which is driven by the minimization of a global function made up of local potentials. Each potential is a measure of the likelihood of the labelling of a restricted area. This potential is given by a multi-layer perceptron trained on a learning database. A base of 26 brains manually labelled by a neuroanatomist is used to validate our approach. The whole system developed for the right hemisphere is made up of 265 neural networks. The mean recognition rate is 86% for the learning base and 76% for a generalization base, which is very satisfying considering the current weak understanding of the variability of the cortical folding patterns.
Vehicle Traffic Light Control Using SARSA
- [Online]. Available: citeseer.ist.psu.edu/thorpe97vehicle.html
, 1997
Cited by 7 (0 self)
SARSA (Sutton, 1996) is applied to a simulated traffic-light control problem (Thorpe, 1997) and its performance is compared with several fixed control strategies. The performance of SARSA with four different representations of the current state of traffic is analyzed using two reinforcement schemes. Training on one intersection is compared to, and is as effective as, training on all intersections in the environment. SARSA is shown to be better than fixed-duration light timing and four-way stops for minimizing total traffic travel time, individual vehicle travel times, and vehicle wait times. Comparisons of performance using a constant reinforcement function versus a variable reinforcement function dependent on the number of vehicles at an intersection showed that the variable reinforcement resulted in slightly improved performance for some cases. 1. Introduction A variety of traffic control strategies are being studied in real traffic networks and in simulation. The Denver Regional Co...
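As a reminder of what the SARSA rule does, the sketch below shows a generic tabular on-policy update with epsilon-greedy action selection. The traffic state representations and reinforcement schemes used in the paper are not reproduced here. The state and action labels, function names, and parameter values are hypothetical.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)).

    a_next is the action actually chosen in s_next by the current policy,
    which is what makes the update on-policy.
    """
    td_target = reward + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical usage with made-up state and action labels.
Q = defaultdict(float)
actions = ["keep_phase", "switch_phase"]
a0 = epsilon_greedy(Q, "queue_low", actions, epsilon=0.1)
a1 = epsilon_greedy(Q, "queue_high", actions, epsilon=0.1)
sarsa_update(Q, "queue_low", a0, reward=-1.0, s_next="queue_high", a_next=a1)
```

The update uses the value of the action the agent will really take next, rather than the best available action, so exploration affects the learned values.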
Simulations Combining Evolution and Learning
- In Adaptive Individuals in Evolving Populations: Models and Algorithms: Santa Fe Institute Studies in the Sciences of Complexity
, 1995
Cited by 7 (0 self)
This paper addresses the issue of how computational versions of learning and evolution have been made to interact in simulated systems. It examines various benefits of such combinations and details how supervised learning, reinforcement learning, and unsupervised learning can be adapted to fit into an evolutionary framework.

