Results 1-10 of 10
Feature reinforcement learning: Part I. Unstructured MDPs
Journal of Artificial General Intelligence, 2009
Cited by 16 (7 self)
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part
A Monte Carlo AIXI Approximation
J. Artif. Intell. Res.
Cited by 13 (6 self)
This paper describes a computationally feasible approximation to the AIXI agent, a universal reinforcement learning agent for arbitrary environments. AIXI is scaled down in two key ways: First, the class of environment models is restricted to all prediction suffix trees of a fixed maximum depth. This allows a Bayesian mixture of environment models to be computed in time proportional to the logarithm of the size of the model class. Secondly, the finite-horizon expectimax search is approximated by an asymptotically convergent Monte Carlo Tree Search technique. This scaled-down AIXI agent is empirically shown to be effective on a wide class of toy problem domains, ranging from simple fully observable games to small POMDPs. We explore the limits of this approximate agent and propose a general heuristic framework for scaling this technique to much larger problems.
A Monte-Carlo AIXI Approximation
2009
Cited by 12 (5 self)
This paper describes a computationally feasible approximation to the AIXI agent, a universal reinforcement learning agent for arbitrary environments. AIXI is scaled down in two key ways: First, the class of environment models is restricted to all prediction suffix trees of a fixed maximum depth. This allows a Bayesian mixture of environment models to be computed in time proportional to the logarithm of the size of the model class. Secondly, the finite-horizon expectimax search is approximated by an asymptotically convergent Monte Carlo Tree Search technique. This scaled-down AIXI agent is empirically shown to be effective on a wide class of toy problem domains, ranging from simple fully observable games to small POMDPs. We explore the limits of this approximate agent and propose a general heuristic framework for scaling this technique to much larger problems.
Feature Markov Decision Processes
Cited by 6 (5 self)
General-purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes (MDPs). So far it is an art performed by human designers to extract the right state representation out of the bare observations, i.e. to reduce the agent setup to the MDP framework. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in the companion article [Hut09].
One decade of universal artificial intelligence
In Theoretical Foundations of Artificial General Intelligence, 2012
Cited by 2 (2 self)
The first decade of this century has seen the nascency of the first mathematical theory of general artificial intelligence. This theory of Universal Artificial Intelligence (UAI) has made significant contributions to many theoretical, philosophical, and practical AI questions. In a series of papers culminating in book (Hutter, 2005), an exciting sound and complete mathematical model for a super intelligent agent (AIXI) has been developed and rigorously analyzed. While nowadays most AI researchers avoid discussing intelligence, the award-winning PhD thesis (Legg, 2008) provided the philosophical embedding and investigated the UAI-based universal measure of rational intelligence, which is formal, objective and non-anthropocentric. Recently, effective approximations of AIXI have been derived and experimentally investigated in JAIR paper (Veness et al. 2011). This practical breakthrough has resulted in some impressive applications, finally muting earlier critique that UAI is only a theory. For the first time, without providing any domain knowledge, the same
Avoiding Unintended AI Behaviors
Cited by 1 (1 self)
Artificial intelligence (AI) systems too complex for predefined environment models and actions will need to learn environment models and to choose actions that optimize some criteria. Several authors have described mechanisms by which such complex systems may behave in ways not intended in their designs. This paper describes ways to avoid such unintended behavior. For hypothesized powerful AI systems that may pose a threat to humans, this paper proposes a two-stage agent architecture that avoids some known types of unintended behavior. For the first stage of the architecture this paper shows that the most probable finite stochastic program to model a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions.
Nonlinear-Dynamical Attention Allocation via Information Geometry
Inspired by a broader perspective viewing intelligent system dynamics in terms of the geometry of "cognitive spaces," we conduct a preliminary investigation of the application of information-geometry-based learning to ECAN (Economic Attention Networks), the component of the integrative OpenCog AGI system concerned with attention allocation and credit assignment. We generalize Amari's "natural gradient" algorithm for network learning to encompass ECAN and other recurrent networks, and apply it to small example cases of ECAN, demonstrating a dramatic improvement in the effectiveness of attention allocation compared to prior (Hebbian-learning-like) ECAN methods. Scaling up the method to deal with realistically sized ECAN networks as used in OpenCog remains for the future, but should be achievable using sparse matrix methods on GPUs.
Chapter 5 One Decade of Universal Artificial Intelligence
The first decade of this century has seen the nascency of the first mathematical theory of general artificial intelligence. This theory of Universal Artificial Intelligence (UAI) has made significant contributions to many theoretical, philosophical, and practical AI questions. In a series of papers culminating in book [24], an exciting sound and complete mathematical model for a super intelligent agent (AIXI) has been developed and rigorously analyzed. While nowadays most AI researchers avoid discussing intelligence, the award-winning PhD thesis [38] provided the philosophical embedding and investigated the UAI-based universal measure of rational intelligence, which is formal, objective and non-anthropocentric. Recently, effective approximations of AIXI have been derived and experimentally investigated in JAIR paper [79]. This practical breakthrough has resulted in some impressive applications, finally muting earlier critique that UAI is only a theory. For the first time, without providing any domain knowledge, the same agent is able to self-adapt to a diverse range of interactive environments. For instance, AIXI is able to learn from scratch to play Tic-Tac-Toe, Pacman, Kuhn Poker, and other games by trial and error, without even providing the rules of the games. These achievements give new hope that the grand goal of Artificial General Intelligence is not elusive. This chapter provides an informal overview of UAI in context. It attempts to gently introduce a very theoretical, formal, and mathematical subject, and discusses philosophical and technical ingredients, traits of intelligence, some social questions, and the past and future of UAI.
"The formulation of a problem is often more essential than its solution, which may be merely a matter of mathematical or experimental skill. To raise new questions, new possibilities, to regard old problems from a new angle, requires creative imagination and marks real advance in science."
Decision Support for Safe AI Design
There is considerable interest in ethical designs for artificial intelligence (AI) that do not pose risks to humans. This paper proposes using elements of Hutter's agent-environment framework to define a decision support system for simulating, visualizing and analyzing AI designs to understand their consequences. The simulations do not have to be accurate predictions of the future; rather they show the futures that an agent design predicts will fulfill its motivations and that can be explored by AI designers to find risks to humans. In order to safely create a simulation model this paper shows that the most probable finite stochastic program to explain a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions. It also discusses the risks of running an AI in a simulated environment.