Results 21-30 of 243
Alopex: a correlation-based learning algorithm for feedforward and recurrent neural networks
 Neural Computation
, 1994
Cited by 24 (1 self)
We present a learning algorithm for neural networks, called Alopex. Instead of the error gradient, Alopex uses local correlations between changes in individual weights and changes in the global error measure. The algorithm does not make any assumptions about the transfer functions of individual neurons, and does not explicitly depend on the functional form of the error measure. Hence, it can be used in networks with arbitrary transfer functions and for minimizing a large class of error measures. The learning algorithm is the same for feedforward and recurrent networks. All the weights in a network are updated simultaneously, using only local computations. This allows complete parallelization of the algorithm. The algorithm is stochastic and uses a ‘temperature’ parameter in a manner similar to that in simulated annealing. A heuristic ‘annealing schedule’ is presented which is effective in finding global minima of error surfaces. In this paper, we report extensive simulation studies illustrating these advantages and show that learning times are comparable to those for standard gradient descent methods. Feedforward networks trained with Alopex are used to solve the MONK’s problems and symmetry problems. Recurrent networks trained with the same algorithm are used for solving temporal XOR problems. Scaling properties of the algorithm are demonstrated using encoder problems of different sizes, and advantages of appropriate error measures are illustrated using a variety of problems.
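The correlation-driven update described in this abstract can be illustrated with a minimal sketch. The abstract does not give the update rule, so the form used here is an assumption: each weight steps by a fixed amount, and a sigmoid of (previous weight change × previous error change) / temperature sets the probability of stepping downward. The names `alopex_minimize`, `delta`, and `temperature` are hypothetical, and the temperature is held fixed rather than following the paper's annealing schedule.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def alopex_minimize(error_fn, weights, delta=0.01, temperature=1e-4,
                    steps=2000, seed=0):
    """Correlation-based stochastic search (assumed Alopex-style rule).

    error_fn is treated as a black box: no gradients are computed.
    """
    rng = random.Random(seed)
    w = list(weights)
    # Pretend an arbitrary previous step was taken; with a zero error
    # change the first real step is a fair coin flip per weight.
    prev_dw = [rng.choice((-delta, delta)) for _ in w]
    prev_error = error_fn(w)
    prev_de = 0.0
    for _ in range(steps):
        new_dw = []
        for dw in prev_dw:
            # Local correlation between this weight's last change and
            # the last change in the global error.
            corr = dw * prev_de
            # Positive correlation (the move went with an error increase)
            # makes a downward step more likely, which reverses an upward move.
            p_down = sigmoid(corr / temperature)
            new_dw.append(-delta if rng.random() < p_down else delta)
        # All weights are updated simultaneously.
        w = [wi + dwi for wi, dwi in zip(w, new_dw)]
        err = error_fn(w)
        prev_de = err - prev_error
        prev_error = err
        prev_dw = new_dw
    return w, prev_error


if __name__ == "__main__":
    target = [1.0, -2.0, 0.5]
    quadratic = lambda w: sum((a - b) ** 2 for a, b in zip(w, target))
    final_w, final_err = alopex_minimize(quadratic, [0.0, 0.0, 0.0])
    print(final_w, final_err)
```

Because the only feedback each weight receives is the scalar change in the global error, the same loop would apply unchanged to any error function, which is the property the abstract emphasizes.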
Exploration and Inference in Learning from Reinforcement
, 1997
Cited by 22 (2 self)
Recently there has been a good deal of interest in using techniques developed for learning from reinforcement to guide learning in robots. Motivated by the desire to find better robot learning methods, this thesis presents a number of novel extensions to existing techniques for controlling exploration and inference in reinforcement learning. First I distinguish between the well-known exploration-exploitation tradeoff and what I term exploration for future exploitation. It is argued that there are many tasks where it is more appropriate to maximise this latter measure. In particular it is appropriate when we want to employ learning algorithms as part of the process of designing a controller. Informed by this insight I develop a number of novel measures of the agent's task knowledge. The first of these is a measure of the probability of a particular course of action being the optimal course of action. Estimators are developed for this measure for boolean and non-boolean processes. These...
SociableSense: Exploring the Tradeoffs of Adaptive Sampling and Computation Offloading for Social Sensing
 In Proc. of MobiCom’11. ACM
, 2011
Cited by 19 (5 self)
The interactions and social relations among users in workplaces have been studied by many generations of social psychologists. There is evidence that groups of users that interact more in workplaces are more productive. However, it is still hard for social scientists to capture fine-grained data about phenomena of this kind and to find the right means to facilitate interaction. It is also difficult for users to keep track of their level of sociability with colleagues. While mobile phones offer a fantastic platform for harvesting long-term and fine-grained data, they also pose challenges: battery power is limited and needs to be traded off against sensor reading accuracy and data transmission, while the energy costs of processing computationally intensive tasks are high. In this paper, we propose SociableSense, a smartphone-based platform that captures user behavior in office environments, while providing the users with a quantitative measure of their sociability and that of colleagues. We tackle the technical challenges of building such a tool: the system provides an adaptive sampling mechanism as well as models to decide whether to perform computation of tasks, such as the execution of classification and inference algorithms, locally or remotely. We perform several micro-benchmark tests to fine-tune and evaluate the performance of these mechanisms and we show that the adaptive sampling and computation distribution schemes balance tradeoffs among accuracy, energy, latency, and data traffic. Finally, by means of a social psychological study with ten participants for two working weeks, we demonstrate that SociableSense fosters interactions among the participants and helps in enhancing their sociability.
Ant colony optimization and stochastic gradient descent
 Artificial Life
, 2002
Cited by 19 (6 self)
In this paper, we study the relationship between the two techniques known as ant colony optimization (ACO) and stochastic gradient descent. More precisely, we show that some empirical ACO algorithms approximate stochastic gradient descent in the space of pheromones, and we propose an implementation of stochastic gradient descent that belongs to the family of ACO algorithms. We then use this insight to explore the mutual contributions of the two techniques.
Large-Scale Dynamic Optimization Using Teams of Reinforcement Learning Agents
, 1996
Cited by 18 (1 self)
Recent algorithmic and theoretical advances in reinforcement learning (RL) are attracting widespread interest. RL algorithms have appeared that approximate dynamic programming (DP) on an incremental basis. Unlike traditional DP algorithms, these algorithms do not require knowledge of the state transition probabilities or reward structure of a system. This allows them to be trained using real or simulated experiences, focusing their computations on the areas of state space that are actually visited during control, making them computationally tractable on very large problems. RL algorithms can be used as components of multi-agent algorithms. If each member of a team of agents employs one of these algorithms, a new collective learning algor...
A cellular learning automata based clustering algorithm for wireless sensor networks
 Sensor Letters
, 2008
Cited by 15 (9 self)
In the first part of this paper, we propose a generalization of cellular learning automata (CLA) called irregular cellular learning automata (ICLA), which removes the restriction of a rectangular grid structure in traditional CLA. In the second part of the paper, a new clustering algorithm for sensor networks is designed based on the proposed model. The proposed clustering algorithm is fully distributed, and the nodes in the network do not need to be fully synchronized with each other. The algorithm consists of two phases: initial clustering and reclustering. Unlike existing methods, in which the reclustering phase is performed periodically on the entire network, the reclustering phase in the proposed method is performed locally whenever it is needed. This reduces the energy consumed by the reclustering phase and also allows reclustering to be performed as the network operates. Compared with existing methods, the proposed clustering method produces a clustering in which each cluster has a higher number of nodes and higher residual energy for the cluster head. Local reclustering, higher residual energy in cluster heads, and a higher number of nodes in each cluster result in a network with a longer lifetime. To evaluate the performance of the proposed algorithm, several experiments have been conducted. The results show that the proposed clustering algorithm outperforms existing clustering methods in terms of quality of clustering, measured by the total number of clusters, the number of sparse clusters, and the remaining energy level of the cluster heads. Experiments have also shown that the proposed clustering algorithm prolongs the network lifetime in comparison to other existing methods.
Reinforcement learning with immediate rewards and linear hypotheses
 Algorithmica
, 2003
Cited by 15 (2 self)
We consider the design and analysis of algorithms that learn from the consequences of their actions with the goal of maximizing their cumulative reward, when (i) the consequence of a given action is felt immediately, and (ii) a linear function, which is unknown a priori, (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases, one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which the probability of receiving the larger of binary-valued rewards is obtained. For these cases we provide bounds on the per-trial regret for our algorithms that go to zero as the number of trials approaches infinity. We also provide lower bounds that show that the rate of convergence is nearly optimal.
Coordination of Multiple Mobile Robots via Communication
 PROC. SPIE'98, MOBILE ROBOTS XIII CONFERENCE
, 1998
Cited by 15 (8 self)
Research on the coordination of multiple mobile robots has to address three main problems: (i) how to appropriately divide the functionality of the system among multiple robots, (ii) how to manage the dynamic configuration of the system, and (iii) how to realise cooperative behaviour. This paper will concentrate on the third aspect. More specifically, the aim of our research is to develop a team of coordinating mobile robots via effective communication for real-world applications. We will describe the methodology to achieve cooperative behaviour, the experimental mobile robots developed, and potential application areas. The developed system is demonstrated by two examples: flocking and shared experience learning.
A NEW VERTEX COLORING ALGORITHM BASED ON VARIABLE ACTION-SET LEARNING AUTOMATA
Cited by 15 (6 self)
In this paper, we propose a learning automata-based iterative algorithm for approximating a near-optimal solution to the vertex coloring problem. Vertex coloring is a well-known NP-hard optimization problem in graph theory in which each vertex is assigned a color so that no two adjacent vertices have the same color. Each iteration of the proposed algorithm is subdivided into several stages, and at each stage a subset of the uncolored non-adjacent vertices is randomly selected and assigned the same color. This process continues until no more vertices remain uncolored. As the proposed algorithm proceeds, the learning automata maximize the number of vertices colored at each stage, so the number of stages per iteration, and hence the required number of colors, tends to the chromatic number of the graph. To show the performance of the proposed algorithm, we compare it with several existing vertex coloring algorithms in terms of the time and the number of colors required for coloring the graphs. The obtained results show the superiority of the proposed algorithm over the others.
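The stage-wise structure of a single iteration, as described in this abstract, can be sketched as follows. This is only an illustration under stated assumptions: the learning automata that bias the random selection are omitted and replaced by a uniform random order, the function name `stagewise_coloring` and the adjacency-dict representation are hypothetical, and vertices are assumed to be sortable (for example integers).

```python
import random


def stagewise_coloring(adjacency, seed=0):
    """Color a graph in stages; each stage colors one random independent set.

    adjacency maps each vertex to the set of its neighbours.
    Returns a dict mapping each vertex to its color (the stage index).
    """
    rng = random.Random(seed)
    color = {}
    uncolored = set(adjacency)
    stage = 0
    while uncolored:
        # Visit the remaining vertices in a random order (a stand-in for
        # the learning-automata choice, which is not modelled here).
        candidates = sorted(uncolored)
        rng.shuffle(candidates)
        chosen = set()
        for v in candidates:
            # Keep v only if it is not adjacent to anything already chosen,
            # so the chosen set stays pairwise non-adjacent.
            if not (adjacency[v] & chosen):
                chosen.add(v)
        # Every vertex selected in this stage receives the same color.
        for v in chosen:
            color[v] = stage
        uncolored -= chosen
        stage += 1
    return color


if __name__ == "__main__":
    # A 5-cycle: needs three colors because the cycle length is odd.
    cycle = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
    print(stagewise_coloring(cycle))
```

The number of stages equals the number of colors used, so a selection rule that colors more vertices per stage uses fewer colors, which is the quantity the abstract says the learning automata improve across iterations.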
Reinforcement Learning for Long-Run Average Cost
, 2004
Cited by 13 (0 self)
A large class of sequential decision-making problems under uncertainty can be modeled as Markov and semi-Markov decision problems (SMDPs), when their underlying probability structure has a Markov chain. They may be solved by using classical dynamic programming (DP) methods. However, DP methods suffer from the curse of dimensionality and break down rapidly in face of large state spaces. In addition,