Results 1–10 of 14
Markov games as a framework for multi-agent reinforcement learning
In Proceedings of the Eleventh International Conference on Machine Learning
, 1994
Cited by 500 (10 self)
Abstract:
In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic view, secondary agents can only be part of the environment and are therefore fixed in their behavior. The framework of Markov games allows us to widen this view to include multiple adaptive agents with interacting or competing goals. This paper considers a step in this direction in which exactly two agents with diametrically opposed goals share an environment. It describes a Q-learning-like algorithm for finding optimal policies and demonstrates its application to a simple two-player game in which the optimal policy is probabilistic.
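The minimax value computation at the heart of such a Q-learning-like algorithm can be sketched as follows. The function names and the matching-pennies payoff matrix are illustrative, not from the paper; for a two-action maximizer, the worst-case payoff is concave and piecewise linear in the mixing probability, so a ternary search stands in for the usual linear program:

```python
# Minimax value of a two-action matrix game, as used in the state-value
# step of a minimax-style update: V(s) = max_pi min_o sum_a pi(a) Q(s,a,o).

def worst_case_payoff(Q, p):
    """Opponent picks the column minimizing the agent's expected payoff."""
    return min(p * Q[0][o] + (1 - p) * Q[1][o] for o in range(len(Q[0])))

def minimax_value(Q, iters=200):
    """Approximate max over p in [0,1] of the worst-case expected payoff."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):  # ternary search on a concave objective
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if worst_case_payoff(Q, m1) < worst_case_payoff(Q, m2):
            lo = m1
        else:
            hi = m2
    p = (lo + hi) / 2
    return p, worst_case_payoff(Q, p)

# Matching pennies: the optimal policy is probabilistic (p = 0.5, value 0),
# exactly the kind of game where deterministic policies fail.
p, v = minimax_value([[1.0, -1.0], [-1.0, 1.0]])
```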
Algorithms for Sequential Decision Making
, 1996
Cited by 177 (8 self)
Abstract:
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular, …
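The finite-state, finite-action, long-run-reward setting described here is exactly where value iteration applies. A minimal sketch on a hypothetical two-state MDP (the transition and reward numbers are illustrative, not from the thesis):

```python
# Value iteration for a tiny MDP: states {0, 1}, actions {"stay", "go"}.
GAMMA = 0.9
# T[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
T = {0: {"stay": [(0, 1.0)], "go": [(1, 1.0)]},
     1: {"stay": [(1, 1.0)], "go": [(0, 1.0)]}}
R = {0: {"stay": 0.0, "go": 1.0},
     1: {"stay": 2.0, "go": 0.0}}

V = {0: 0.0, 1: 0.0}
for _ in range(500):  # iterate the Bellman optimality operator to a fixed point
    V = {s: max(R[s][a] + GAMMA * sum(p * V[s2] for s2, p in T[s][a])
                for a in T[s])
         for s in V}

# Greedy policy with respect to the converged values: the answer to
# "What should I do now?" in each state.
policy = {s: max(T[s], key=lambda a: R[s][a] + GAMMA * sum(p * V[s2] for s2, p in T[s][a]))
          for s in V}
```

Here state 1's "stay" reward of 2 per step gives V(1) = 2 / (1 - 0.9) = 20, so the greedy policy moves to state 1 and stays there.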
On the Existence and Synthesis of Multifinger Positive Grips
, 1987
Cited by 135 (15 self)
Abstract:
This paper has been supported by Office of Naval Research Grant N00014-82-K-0381, National Science Foundation Grant No. DCR-8320085, and by grants from the Digital Equipment Corporation and the IBM Corporation.
Solution of Non-Linear Ordinary Differential Equations by Feedforward Neural Networks
 Mathematical and Computer Modelling
, 1994
Cited by 7 (2 self)
Abstract:
It is demonstrated, through theory and numerical examples, how it is possible to directly construct a feedforward neural network to approximate non-linear ordinary differential equations without the need for training. The method, utilizing a piecewise linear map as the activation function, is linear in storage, and the L2 norm of the network approximation error decreases monotonically as the number of hidden-layer neurons increases. The construction requires imposing certain constraints on the values of the input, bias, and output weights, and the attribution of certain roles to each of these parameters. All results presented used the piecewise linear activation function. However, the presented approach should also be applicable to hyperbolic tangents, sigmoids, and radial basis functions.
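The paper's specific weight constraints are not reproduced here, but the general principle of setting weights directly rather than training can be sketched with a one-hidden-layer network whose piecewise linear units form "hat" basis functions interpolating a target at chosen knots (an illustrative sketch, not the paper's exact construction):

```python
import math

# One-hidden-layer network with piecewise linear (ReLU) activations whose
# weights are set directly, with no training: each knot owns a hat basis
# function built from three ReLU units, and the output weights are simply
# the target values at equally spaced knots.

def relu(x):
    return max(0.0, x)

def build_network(f, a, b, n):
    """Return a callable interpolating f at n+1 knots on [a, b]."""
    h = (b - a) / n
    knots = [a + i * h for i in range(n + 1)]
    def net(x):
        total = 0.0
        for xi in knots:
            # hat(x): 1 at knot xi, 0 at neighboring knots, linear between.
            hat = (relu(x - (xi - h)) - 2 * relu(x - xi) + relu(x - (xi + h))) / h
            total += f(xi) * hat
        return total
    return net

# Direct construction: more hidden neurons (larger n) means a finer mesh
# and a monotonically shrinking approximation error.
net = build_network(math.sin, 0.0, math.pi, 16)
```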
Coarse Embeddability into Banach Spaces
, 802
Cited by 2 (0 self)
Abstract:
The main purposes of this paper are (1) to survey the area of coarse embeddability of metric spaces into Banach spaces and, in particular, coarse embeddability of different Banach spaces into each other; (2) to present new results on the problems: (a) whether coarse non-embeddability into ℓ2 implies the presence of expander-like structures; (b) to what extent ℓ2 is the most difficult space to embed into.
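For reference, the standard definition underlying the survey: a map between metric spaces is a coarse embedding when the distortion is controlled above and below by functions tending to infinity.

```latex
% f : (X, d_X) \to (Y, d_Y) is a coarse embedding if there exist
% non-decreasing functions \rho_1, \rho_2 : [0,\infty) \to [0,\infty)
% with \lim_{t \to \infty} \rho_1(t) = \infty, such that for all x, y \in X:
\[
  \rho_1\bigl(d_X(x,y)\bigr) \;\le\; d_Y\bigl(f(x),f(y)\bigr) \;\le\; \rho_2\bigl(d_X(x,y)\bigr).
\]
```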
Numerical Solution Of A Calculus Of Variations Problem Using The Feedforward Neural Network Architecture
 Advances in Engineering Software
, 1996
Cited by 1 (0 self)
Abstract:
It is demonstrated, through theory and numerical example, how it is possible to construct directly and non-iteratively a feedforward neural network to solve a calculus of variations problem. The method, using the piecewise linear and cubic sigmoid transfer functions, is linear in storage and processing time. The L2 norm of the network approximation error decreases quadratically with the piecewise linear transfer function and quartically with the piecewise cubic sigmoid as the number of hidden-layer neurons increases. The construction requires imposing certain constraints on the values of the input, bias, and output weights, and the attribution of certain roles to each of these parameters. All results presented used the piecewise linear and cubic sigmoid transfer functions. However, the non-iterative approach should also be applicable to hyperbolic tangents and radial basis functions.
Models of Migration by Age and Spatial Structures
, 2004
Abstract:
The paper provides a unified perspective on the modeling of migration. In applied research, the level of migration is measured in different ways: the number of migrants during a period of time, the share of migrants in a population, and the rate of migration. The different measures are related in some way. A unified perspective should encompass …
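The three measures mentioned can be written out explicitly in standard demographic notation (this notation is a common convention, not taken from the paper):

```latex
% M  : number of migrants observed during the period,
% P  : population at risk,
% PY : person-years of exposure during the period.
\[
  \text{count} = M, \qquad
  \text{share (proportion)} = \frac{M}{P}, \qquad
  \text{rate} = \frac{M}{PY}.
\]
```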
A New Approach for Estimation of Eigenvalues of Images
Abstract:
In this paper, a new approach for estimating the eigenvalues of images is presented. The proposed approach is based on Gerschgorin's circle theorem. It is more efficient because there is no need to calculate all the real eigenvalues, and it is also helpful for all types of images where the calculation of eigenvalues may be impractical. More importantly, since it is a graphical method, conclusions can be reached by visual inspection. The estimated eigenvalues can be used to extract important information from images for various applications.
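A minimal sketch of the disc computation behind the approach; how an image is turned into the matrix A is the application-specific step and is assumed here:

```python
# Gerschgorin's circle theorem: every eigenvalue of a square matrix A lies
# in at least one disc centered at a diagonal entry A[i][i] with radius
# equal to the sum of absolute off-diagonal entries in row i.

def gerschgorin_discs(A):
    """Return a list of (center, radius) pairs, one disc per row of A."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]

# Example: a small symmetric matrix standing in for an image-derived matrix.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
discs = gerschgorin_discs(A)
# The discs can be drawn and inspected visually: every eigenvalue of A
# lies inside their union, with no eigendecomposition required.
```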
Markov Game Controller Design Algorithms
Abstract:
Markov games are a generalization of Markov decision processes to a multi-agent setting. The two-player zero-sum Markov game framework offers an effective platform for designing robust controllers. This paper presents two novel controller design algorithms that use ideas from the game-theory literature to produce reliable controllers that are able to maintain performance in the presence of noise and parameter variations. A more widely used approach for controller design is H∞ optimal control, which suffers from high computational demand and, at times, may be infeasible. Our approach generates an optimal control policy for the agent (controller) via a simple linear program, enabling the controller to learn about the unknown environment. The controller faces an unknown environment, which in our formulation corresponds to the behavior rules of the noise, modeled as the opponent. The proposed controller architectures attempt to improve controller reliability by gradually mixing algorithmic approaches drawn from the game-theory literature with the Minimax-Q Markov game solution approach, in a reinforcement-learning framework. We test the proposed algorithms on a simulated Inverted Pendulum Swing-up task and compare their performance against standard Q-learning.
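The linear program referred to is presumably the standard one for the value of the matrix game at a state s, written here from the general Minimax-Q literature rather than from the paper itself:

```latex
% Optimal mixed policy \pi(\cdot \mid s) for the agent against the
% worst-case opponent action o, given the game matrix Q(s, a, o):
\[
  \max_{\pi,\, v} \; v
  \quad \text{s.t.} \quad
  \sum_{a} \pi(a \mid s)\, Q(s, a, o) \;\ge\; v \;\; \forall o, \qquad
  \sum_{a} \pi(a \mid s) = 1, \qquad \pi(a \mid s) \ge 0.
\]
```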