Results 1 - 10 of 17,264
Finite-time analysis of the multiarmed bandit problem
- Machine Learning
, 2002
"... Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy’s success in addressing ..."
Cited by 817 (15 self)
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
, 1999
"... Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We exte ..."
Cited by 569 (38 self)
extend the usual notion of action in this framework to include options: closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint knowledge
Decentralized Trust Management
- In Proceedings of the 1996 IEEE Symposium on Security and Privacy
, 1996
"... We identify the trust management problem as a distinct and important component of security in network services. Aspects of the trust management problem include formulating security policies and security credentials, determining whether particular sets of credentials satisfy the relevant policies, an ..."
Cited by 1025 (24 self)
approach to trust management, based on a simple language for specifying trusted actions and trust relationships. It also describes a prototype implementation of a new trust management system, called PolicyMaker, that will facilitate the development of security features in a wide range of network services
Implications of rational inattention
- JOURNAL OF MONETARY ECONOMICS
, 2002
"... A constraint that actions can depend on observations only through a communication channel with finite Shannon capacity is shown to be able to play a role very similar to that of a signal extraction problem or an adjustment cost in standard control problems. The resulting theory looks enough like fa ..."
Cited by 525 (11 self)
Department of Economic and Social Affairs, Population Division
, 1999
"... vital interface between global policies in the economic, social and environmental spheres and national action. The Department works in three main interlinked areas: (i) it compiles, generates and analyses a wide range of economic, social and environmental data and information on which Member States ..."
Cited by 505 (3 self)
What Can Economists Learn from Happiness Research?
- FORTHCOMING IN JOURNAL OF ECONOMIC LITERATURE
, 2002
"... Happiness is generally considered to be an ultimate goal in life; virtually everybody wants to be happy. The United States Declaration of Independence of 1776 takes it as a self-evident truth that the “pursuit of happiness” is an “unalienable right”, comparable to life and liberty. It follows that e ..."
Cited by 545 (24 self)
for economists to consider happiness. The first is economic policy. At the micro-level, it is often impossible to make a Pareto-optimal proposal, because a social action entails costs for some individuals. Hence an evaluation of the net effects, in terms of individual utilities, is needed. On an aggregate level
Investor psychology and security market under- and overreactions
- Journal of Finance
, 1998
"... We propose a theory of securities market under- and overreactions based on two well-known psychological biases: investor overconfidence about the precision of private information; and biased self-attribution, which causes asymmetric shifts in investors’ confidence as a function of their investment ..."
Cited by 698 (43 self)
outcomes. We show that overconfidence implies negative long-lag autocorrelations, excess volatility, and, when managerial actions are correlated with stock mispricing, public-event-based return predictability. Biased self-attribution adds positive short-lag autocorrelations (“momentum”), short
Motivation through the Design of Work: Test of a Theory
- Organizational Behavior and Human Performance
, 1976
"... A model is proposed that specifies the conditions under which individuals will become internally motivated to perform effectively on their jobs. The model focuses on the interaction among three classes of variables: (a) the psychological states of employees that must be present for internally motiv ..."
Cited by 622 (2 self)
under government sponsorship are encouraged to express their own judgment freely, this report does not necessarily represent the official opinion or policy of the government. redesign are not fully adequate to meet the problems encountered in their application. Especially troublesome is the paucity
Implementation intentions. Strong effects of simple plans
- AMERICAN PSYCHOLOGIST
, 1999
"... When people encounter problems in translating their goals into action (e.g., failing to get started, becoming distracted, or falling into bad habits), they may strategically call on automatic processes in an attempt to secure goal attainment. This can be achieved by plans in the form of implementa ..."
Cited by 478 (52 self)
Policy gradient methods for reinforcement learning with function approximation
- In NIPS
, 1999
"... Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Cited by 439 (20 self)
that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal