Results 1–10 of 29
Learning to Cooperate via Policy Search
, 2000
Abstract

Cited by 133 (4 self)
Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain. 1 INTRODUCTION The interaction of decision makers who share an environment is traditionally studied in game theory and economics. The game-theoretic formalism is very general, and analyzes the problem in terms of solution concepts such as Nash equilibrium [12], but usually works under the assu...
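The distributed policy-search idea can be sketched with a toy example: two independent REINFORCE learners in a 2x2 cooperative matrix game with a shared payoff. The payoff matrix, learning rate, and episode count below are illustrative assumptions, not values from the paper; each agent ascends its own likelihood-ratio gradient of the common reward without seeing the other's parameters.

```python
import math
import random

random.seed(0)

# Shared payoff for a 2x2 cooperative matrix game: both agents receive the
# same reward (payoffs here are made up for illustration).
PAYOFF = [[1.0, 0.0],
          [0.0, 0.5]]

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(v - m) for v in prefs]
    s = sum(exps)
    return [e / s for e in exps]

def sample(probs):
    return 0 if random.random() < probs[0] else 1

theta = [[0.0, 0.0], [0.0, 0.0]]   # one preference vector per agent
alpha = 0.1

for _ in range(5000):
    probs = [softmax(t) for t in theta]
    acts = [sample(p) for p in probs]
    r = PAYOFF[acts[0]][acts[1]]
    # Independent REINFORCE updates: each agent shifts probability mass
    # toward its own action, scaled by the shared reward.
    for i in range(2):
        for j in range(2):
            indicator = 1.0 if j == acts[i] else 0.0
            theta[i][j] += alpha * r * (indicator - probs[i][j])

best = [t.index(max(t)) for t in theta]
print(best)   # the agents coordinate on one of the two coordinated equilibria
```

Note that mis-coordinated joint actions yield zero reward and hence no update, so the learners can only reinforce coordinated play; this mirrors the paper's observation that gradient ascent finds local optima, which here coincide with the game's Nash equilibria.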
Learning a Local Similarity Metric for Case-Based Reasoning
 In International Conference on Case-Based Reasoning (ICCBR-95)
, 1995
Abstract

Cited by 32 (6 self)
This paper presents a new class of local similarity metrics, called AASM, that are not symmetric and that can be adopted as the basic retrieval method in a CBR system. An anytime learning procedure is also introduced that, starting from an initial set of stored cases, improves the retrieval accuracy by modifying the local definition of the metric. The learning procedure is a reinforcement learning algorithm and can be run as a black box, since no particular setting is required. With the aid of classical test sets, it is shown that AASM can in many cases improve the accuracy of both nearest-neighbour methods and Salzberg's NGE. Moreover, AASM can achieve significant data compression (10%) while maintaining the same accuracy as NN. 1 Introduction Classification methods based on nearest neighbor (NN) have many advantages compared with other classification techniques. First of all, NN supports incremental learning from new cases without degradation in performance on previous training data....
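The core idea of a local, asymmetric metric can be sketched as follows: each stored case carries its own feature weights, so the distance from a query to a case depends on which case's metric does the measuring. The cases, weights, and helper names (`local_dist`, `classify`) below are hypothetical illustrations, not AASM's actual formulation.

```python
import math

# Two stored cases, each with its own local (anisotropic) feature weights.
cases = [(0.0, 0.0), (1.0, 1.0)]
labels = [0, 1]
case_weights = [(1.0, 0.1),   # case 0's local metric mostly ignores feature 2
                (1.0, 1.0)]   # case 1 weighs both features equally

def local_dist(query, idx):
    # The metric attached to the STORED case is used, so in general
    # local_dist(a, b) != local_dist(b, a): the metric is asymmetric.
    return math.sqrt(sum(w * (q - c) ** 2
                         for w, q, c in zip(case_weights[idx], query, cases[idx])))

def classify(query):
    dists = [local_dist(query, i) for i in range(len(cases))]
    return labels[dists.index(min(dists))]

print(classify((0.2, 0.9)))
```

Because case 0 down-weights the second feature, a query that differs from case 0 mainly along that feature is still retrieved as similar to it, which a single global Euclidean metric would not do.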
Reinforcement Learning by Policy Search
, 2000
Abstract

Cited by 29 (2 self)
One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations can be modeled as a Markov chain whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. Reinforcement learning means learning a policy (a mapping of observations into actions) based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies being searched is constrained by the architecture of the agent's controller. POMDPs require a controller to have a memory. We investigate various architectures for controllers with memory, including controllers with external memory, finite-state controllers, and distributed controllers for multi-agent systems. For these various controllers we work out the details of algorithms which learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment-design theory, a policy evaluation algorithm is developed for the case of experience reuse. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.
Simulation Study of Multiple Intelligent Vehicle Control Using Stochastic Learning Automata
 IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans
, 1997
Abstract

Cited by 10 (0 self)
An intelligent controller is described for an automated vehicle planning its trajectory based on sensor and communication data received. The intelligent controller is designed using a stochastic learning automaton. Using the data received from onboard sensors, two automata (for lateral and longitudinal actions) are capable of learning the best possible actions to avoid collisions. The system has the advantage of being able to work in unmodeled stochastic environments. Computer simulation is a practical way to test the effectiveness of the learning automata method, since the presence of a large number of vehicles makes the system highly complex. Simulations for simultaneous lateral and longitudinal control of a vehicle using this method provide encouraging results. Multiple-vehicle simulations are also given, and the resulting complexity is discussed. The analysis of the situations is made possible by the study of the interacting reward-penalty mechanisms in individual vehicles. Simple scenarios consisting of multiple vehicles are defined as collections of discrete states, and each state is treated as a game of automata. The definition of the physical environment as a series of discrete state transitions associated with a "stationary automata environment" is the key to this analysis and to the design of the intelligent controller. The aim is to obtain the necessary and sufficient rules for state transitions to reach the goal state.
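The reward-penalty mechanism at the heart of a stochastic learning automaton can be sketched with the classic linear reward-inaction (L_R-I) scheme for a two-action automaton: on a rewarded action, probability mass shifts toward that action; on a penalty, nothing changes. The penalty probabilities and step size below are made-up illustrations, not values from the paper.

```python
import random

random.seed(1)

# Unknown-to-the-automaton environment: probability that each action
# draws a penalty (action 0 is the better choice).
penalty_prob = [0.2, 0.7]

p = [0.5, 0.5]   # action probabilities
a_lr = 0.05      # learning-rate parameter

for _ in range(3000):
    action = 0 if random.random() < p[0] else 1
    penalty = random.random() < penalty_prob[action]
    if not penalty:
        # Reward: move probability mass toward the chosen action.
        for j in range(2):
            if j == action:
                p[j] += a_lr * (1.0 - p[j])
            else:
                p[j] *= (1.0 - a_lr)
    # L_R-I: do nothing on penalty.

print(round(p[0], 2))   # converges toward the action with fewer penalties
```

In the vehicle setting described above, each automaton (lateral or longitudinal) would run such an update with rewards and penalties derived from collision-avoidance outcomes rather than from a fixed penalty table.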
Learning automata-based algorithms for finding minimum weakly connected dominating set in stochastic graphs
 International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
, 2010
Abstract

Cited by 8 (4 self)
A weakly connected dominating set (WCDS) of a graph G is a subset of its vertices such that the subset, together with all vertices adjacent to it, induces a connected subgraph of G. The minimum WCDS (MWCDS) problem is known to be NP-hard, and several approximation algorithms have been proposed for solving MWCDS in deterministic graphs. However, to the best of our knowledge, no work has been done on finding the WCDS in stochastic graphs. In this paper, a definition of the MWCDS problem in a stochastic graph is first presented, and then several learning automata-based algorithms are proposed for solving the stochastic MWCDS problem where the probability distribution function of the weight associated with the graph vertices is unknown. The proposed algorithms significantly reduce the number of samples that need to be taken from the vertices of the stochastic graph. It is shown that, by a proper choice of the parameters of the proposed algorithms, the probability of finding the MWCDS can be made as close to unity as desired. Experimental results show the clear superiority of the proposed algorithms over the standard sampling method in terms of the sampling rate.
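As a rough illustration of the automaton-guided sampling idea (not the paper's algorithm), the sketch below lets a two-action automaton choose between two hypothetical candidate dominating sets whose total weights are stochastic, reinforcing whichever candidate yields light samples. The weight distributions, the fixed reward threshold, and the candidate setup are all assumptions for demonstration.

```python
import random

random.seed(0)

def sample_weight(candidate):
    # Unknown-to-the-automaton weight distribution of each candidate WCDS
    # (illustrative Gaussians; candidate 0 is lighter on average).
    means = {0: 5.0, 1: 8.0}
    return random.gauss(means[candidate], 1.0)

p = [0.5, 0.5]    # choice probabilities over the two candidates
a = 0.05          # reward step
threshold = 6.5   # fixed threshold for brevity; adaptive schemes exist

for _ in range(2000):
    c = 0 if random.random() < p[0] else 1
    if sample_weight(c) < threshold:        # a "light" sample counts as reward
        p[c] += a * (1.0 - p[c])
        p[1 - c] = 1.0 - p[c]
    # reward-inaction: no change on a heavy sample

print(round(p[0], 2))
```

The point of such schemes is the one the abstract makes: sampling effort concentrates on promising candidates, so far fewer samples are drawn from the stochastic graph than uniform sampling would require.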
Internal and External Forces in Language Change
, 2000
Abstract

Cited by 6 (1 self)
If every productive form of linguistic expression can be described by some idealized human grammar, an individual's variable linguistic behavior (Weinreich, Labov, & Herzog, 1968) can be modeled as a statistical distribution over multiple idealized grammars. The distribution of grammars is determined by the interaction between the biological constraints on human grammar and the properties of linguistic data in the environment during the course of language acquisition. Such interaction can be formalized precisely and quantitatively in a mathematical model of language learning. Consequently, we model language change as the change in grammar distribution over time, which can be related to the statistical properties of historical linguistic data. As an empirical test, we apply the proposed model to explain the loss of the verb-second phenomenon in Old French and Old English based on corpus studies of historical texts.
A Selectionist Theory of Language Acquisition
, 1999
Abstract

Cited by 5 (1 self)
This paper argues that developmental patterns in child language be taken seriously in computational models of language acquisition, and proposes a formal theory that meets this criterion. We first present developmental facts that are problematic for statistical learning approaches which assume no prior knowledge of grammar, and for traditional learnability models which assume the learner moves from one UG-defined grammar to another. In contrast, we view language acquisition as a population of grammars, each associated with a 'weight', that compete in a Darwinian selectionist process. Selection is made possible by the variational properties of individual grammars; specifically, their differential compatibility with the primary linguistic data in the environment. In addition to a convergence proof, we present empirical evidence from child language development that a learner is best modeled as multiple grammars in coexistence and competition.
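The weighted competition among grammars can be sketched as a linear reward-penalty update over two grammars: each incoming sentence is analyzed with a grammar drawn according to its weight, and that weight is rewarded or penalized depending on whether the grammar is compatible with the sentence. The compatibility rates and learning rate below are illustrative assumptions, not corpus measurements.

```python
import random

random.seed(2)

# Probability that a sentence from the environment is parsable by each grammar
# (grammar 0 fits the primary linguistic data better).
compat = [0.9, 0.6]

w = [0.5, 0.5]   # grammar weights: a probability distribution
gamma = 0.02     # learning rate

for _ in range(5000):
    g = 0 if random.random() < w[0] else 1
    if random.random() < compat[g]:     # grammar parses the input: reward
        w[g] += gamma * (1.0 - w[g])
    else:                               # parse failure: penalize
        w[g] *= (1.0 - gamma)
    w[1 - g] = 1.0 - w[g]

print(round(w[0], 2))
```

Under this scheme the weights settle near an equilibrium determined by the grammars' relative failure rates, so both grammars coexist during learning while the better-fitting one comes to dominate, matching the coexistence-and-competition picture in the abstract.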
Markov Random Fields and Neural Networks with Applications to Early Vision Problems
, 1991
Abstract

Cited by 3 (1 self)
The current resurgence of interest in Neural Networks has opened up several basic issues. In this chapter, we explore the connections between this area and Markov Random Fields. We are specifically concerned with early vision problems which have already benefited from a parallel and distributed computing perspective. We explore the relationships between the two fields at two different levels of a computational approach. Applications highlighting specific instances where ideas from the two approaches intertwine are discussed.
Learning an Asymmetric and Anisotropic Similarity Metric for Case-Based Reasoning
 AI Review: Special Issue on Lazy Learning
, 1995
Abstract

Cited by 3 (1 self)
this paper we introduce a novel approach to compute nearest neighbour based on a local metric which we call AASM (asymmetric anisotropic similarity metric). In this approach we make two basic assumptions. The first one (anisotropic) states that the metric is defined locally: the space around a trial case is measured using the metric attached to that case. The second one (asymmetric) states that the distance between two points in a continuous feature space F_i is not symmetric, i.e.,
A Formal Theory of Language Development
 In Proceedings of the 37th Meeting of the Association for Computational Linguistics
, 1999
Abstract

Cited by 3 (3 self)
This paper presents a formal model of language acquisition that, in addition to meeting the traditional learnability condition, makes time-course predictions about child language development. We view language acquisition as a selectionist process in which a population of human grammars, made available by UG, compete. Selection is made possible by the variational properties of individual grammars; specifically, their differential compatibility with the primary linguistic data in the environment. We first present developmental facts that are problematic for both a statistical learning approach and traditional learnability models. We then propose a formal model of acquisition in the variational framework, and derive some convergence results. The model receives empirical support from developmental evidence in child language, including the subject-drop phenomenon, that a learner is best modeled as a population of grammars in coexistence and competition, rather than a single grammar suppl...