Preference Elicitation for Interface Optimization
In Proceedings of UIST 2005, 2005
Cited by 35 (8 self)
Decision-theoretic optimization is becoming a popular tool in the user interface community, but creating accurate cost (or utility) functions has become a bottleneck --- in most cases the numerous parameters of these functions are chosen manually, which is a tedious and error-prone process. This paper describes ARNAULD, a general interactive tool for eliciting user preferences concerning concrete outcomes and using this feedback to automatically learn a factored cost function. We empirically evaluate our machine learning algorithm and two automatic query generation approaches and report on an informal user study.
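As an illustration of learning a factored cost function from preference feedback, here is a minimal sketch. It assumes the cost is a weighted sum of numeric factors and that each user answer has the form "outcome A is better than outcome B". The perceptron-style update is a stand-in chosen for brevity, not the learning algorithm from the paper, and the function names and feature values are hypothetical.

```python
def cost(weights, features):
    """Factored cost: a weighted sum of per-factor feature values."""
    return sum(w * f for w, f in zip(weights, features))


def learn_weights(preferences, n_features, epochs=100, lr=0.1):
    """Fit weights so that cost(better) < cost(worse) for each stated preference.

    preferences: list of (better, worse) feature vectors (hypothetical data).
    This is a simple perceptron-style update, used here only as an assumption.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        updated = False
        for better, worse in preferences:
            # Violated preference: raise the cost of the features that
            # are larger in the worse outcome.
            if cost(weights, better) >= cost(weights, worse):
                for i in range(n_features):
                    weights[i] += lr * (worse[i] - better[i])
                updated = True
        if not updated:
            break  # every preference is already satisfied
    return weights


# Hypothetical features: (number of clicks, widget size).
preferences = [
    ([1, 5], [3, 5]),  # fewer clicks preferred
    ([2, 2], [2, 6]),  # smaller widget preferred
]
weights = learn_weights(preferences, n_features=2)
```

With these two answers the loop stops once both stated preferences are respected by the learned weights.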
Learning Social Preferences in Games
2004
Cited by 32 (11 self)
This paper presents a machine-learning approach to modeling human behavior in one-shot games. It provides a framework for representing and reasoning about the social factors that affect people's play. The model predicts how a human player is likely to react to different actions of another player, and these predictions are used to determine the best possible strategy for that player. Data collection and evaluation of the model were performed on a negotiation game in which humans played against each other and against computer models playing various strategies. A computer player trained on human data outplayed Nash equilibrium and Nash bargaining computer players as well as humans. It also generalized to play people and game situations it had not seen before.
Pairwise Preference Learning and Ranking
Proceedings of the 14th European Conference on Machine Learning, 2003
Cited by 29 (7 self)
We consider supervised learning of a ranking function, which is a mapping from instances to total orders over a set of labels (options). The training information consists of examples with partial (and possibly inconsistent) information about their associated rankings. From these, we induce a ranking function by reducing the original problem to a number of binary classification problems, one for each pair of labels. The main objective of this work is to investigate the trade-off between the quality of the induced ranking function and the computational complexity of the algorithm, both depending on the amount of preference information given for each example. To this end, we present theoretical results on the complexity of pairwise preference learning.
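The reduction described in this abstract, one binary classifier per pair of labels, can be sketched as follows. This is a minimal illustration under stated assumptions: the base learner is a plain perceptron and the pairwise predictions are combined by simple voting. Both choices, along with all function names and the toy data, are hypothetical and not taken from the paper.

```python
from itertools import combinations


def train_perceptron(samples, epochs=50):
    """Train a linear perceptron on (features, target) pairs, target in {+1, -1}."""
    dim = len(samples[0][0])
    w = [0.0] * (dim + 1)  # last entry is the bias
    for _ in range(epochs):
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            if y * score <= 0:  # misclassified: shift the boundary
                for i, xi in enumerate(x):
                    w[i] += y * xi
                w[-1] += y
    return w


def fit_pairwise(examples, labels):
    """Build one binary problem per label pair from partial preference data.

    examples: list of (features, prefs), where prefs is a set of (a, b)
    tuples meaning "a is preferred to b" for that instance.
    """
    models = {}
    for a, b in combinations(labels, 2):
        samples = []
        for x, prefs in examples:
            if (a, b) in prefs:
                samples.append((x, 1))
            elif (b, a) in prefs:
                samples.append((x, -1))
        if samples:  # pairs without any preference information are skipped
            models[(a, b)] = train_perceptron(samples)
    return models


def rank(models, x, labels):
    """Rank labels by the number of pairwise comparisons each one wins."""
    votes = {label: 0 for label in labels}
    for (a, b), w in models.items():
        score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
        votes[a if score > 0 else b] += 1
    return sorted(labels, key=lambda label: -votes[label])


labels = ["A", "B", "C"]
examples = [
    ([1.0], {("A", "B"), ("B", "C"), ("A", "C")}),
    ([-1.0], {("C", "B"), ("B", "A"), ("C", "A")}),
]
models = fit_pairwise(examples, labels)
ranking_pos = rank(models, [1.0], labels)
ranking_neg = rank(models, [-1.0], labels)
```

The number of binary problems grows quadratically with the number of labels, which is one side of the quality versus complexity trade-off the abstract mentions.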
Bayesian models of human action understanding
Advances in Neural Information Processing Systems 18, 2006
Cited by 13 (3 self)
We present a Bayesian framework for explaining how people reason about and predict the actions of an intentional agent, based on observing its behavior. Action-understanding is cast as a problem of inverting a probabilistic generative model, which assumes that agents tend to act rationally in order to achieve their goals given the constraints of their environment. Working in a simple sprite-world domain, we show how this model can be used to infer the goal of an agent and predict how the agent will act in novel situations or when environmental constraints change. The model provides a qualitative account of several kinds of inferences that preverbal infants have been shown to perform, and also fits quantitative predictions that adult observers make in a new experiment.
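The idea of inferring a goal by inverting a generative model of rational action can be sketched in a few lines. The example below is a toy illustration, not the model from the paper: it assumes a one-dimensional corridor, two candidate goal positions, and a noisily rational agent whose action probabilities follow a softmax over negative distance to the goal. The names `action_likelihood`, `goal_posterior` and the parameter `beta` are hypothetical.

```python
import math


def action_likelihood(pos, action, goal, beta=2.0):
    """P(action | pos, goal) for a noisily rational agent (assumed model).

    The agent prefers steps that reduce its distance to the goal;
    beta controls how close to deterministic that preference is.
    """
    actions = (-1, 1)
    utilities = {a: -abs((pos + a) - goal) for a in actions}
    z = sum(math.exp(beta * u) for u in utilities.values())
    return math.exp(beta * utilities[action]) / z


def goal_posterior(start, actions, goals, beta=2.0):
    """Bayesian update over candidate goals after each observed action."""
    posterior = {g: 1.0 / len(goals) for g in goals}  # uniform prior
    pos = start
    for a in actions:
        for g in goals:
            posterior[g] *= action_likelihood(pos, a, g, beta)
        total = sum(posterior.values())
        posterior = {g: p / total for g, p in posterior.items()}
        pos += a
    return posterior


# The agent starts at position 5 and steps right three times.
posterior = goal_posterior(start=5, actions=[1, 1, 1], goals=[0, 10])
```

After three steps to the right, almost all posterior mass sits on the goal at position 10, since each step is far more likely under that goal than under the goal at position 0.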
Value-based policy teaching with active indirect elicitation
In Proc. 23rd National Conference on Artificial Intelligence, 2008
Cited by 13 (10 self)
Many situations arise in which an interested party’s utility is dependent on the actions of an agent; e.g., a teacher is interested in a student learning effectively and a firm is interested in a consumer’s behavior. We consider an environment in which the interested party can provide incentives to affect the agent’s actions but cannot otherwise enforce actions. In value-based policy teaching, we situate this within the framework of sequential decision tasks modeled by Markov Decision Processes, and seek to associate limited rewards with states that induce the agent to follow a policy that maximizes the total expected value of the interested party. We show value-based policy teaching is NP-hard and provide a mixed integer program formulation. Focusing in particular on environments in which the agent’s reward is unknown to the interested party, we provide a method for active indirect elicitation wherein the agent’s reward function is inferred from observations about its response to incentives. Experimental results suggest that we can generally find the optimal incentive provision in a small number of elicitation rounds.
Fitting and Compilation of Multiagent Models through Piecewise Linear Functions
2004
Cited by 11 (7 self)
Decision-theoretic models have become increasingly popular as a basis for solving agent and multiagent problems, due to their ability to quantify the complex uncertainty and preferences that pervade most nontrivial domains. However, this quantitative nature also complicates the problem of constructing models that accurately represent an existing agent or multiagent system, leading to the common question, "Where do the numbers come from?" In this work, we present a method for exploiting knowledge about the qualitative structure of a problem domain to automatically derive the correct quantitative values that would generate an observed pattern of agent behavior. In particular, we propose the use of piecewise linear functions to represent probability distributions and utility functions with a structure that we can then exploit to more efficiently compute value functions. More importantly, we have designed algorithms that can (for example) take a sequence of actions and automatically generate a reward function that would generate that behavior within our agent model. This algorithm allows us to efficiently fit an agent or multiagent model to observed behavior. We illustrate the application of this framework with examples in multiagent modeling and social simulation, using decision-theoretic models drawn from the alphabet soup of existing research (e.g., MDPs, POMDPs, Dec-POMDPs, Com-MTDPs).
Identifying and forecasting economic regimes in TAC SCM
In: IJCAI 2005 Workshop on Trading Agent Design and Analysis. (2005) 53–60, 2006
Cited by 8 (4 self)
We present methods for an autonomous agent to identify dominant market conditions, such as over-supply or scarcity, and to predict market changes. The characteristics of economic regimes are learned from historic data and used, together with real-time observable information, to identify the current market regime and to forecast market changes. We use a Gaussian Mixture Model to represent the probabilities of market prices. By clustering these probabilities we identify different economic regimes. We show that the regimes so identified have properties that represent different prevailing market conditions. We then present preliminary work on methods to predict future regime transitions. A trading agent can use these predictions to make operational and strategic decisions regarding pricing, raw material acquisition, and production. We validate our method by presenting experimental results obtained with data from the Trading Agent Competition for Supply Chain Management.
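To make the Gaussian Mixture Model step concrete, the sketch below computes, for an observed price, the posterior probability of each mixture component. It covers only that one ingredient: the mixture parameters are fixed, hypothetical values rather than fitted ones, and the clustering of these probabilities into regimes described in the abstract is not shown.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def responsibilities(x, components):
    """Posterior P(component | x) for a 1-D Gaussian mixture.

    components: list of (weight, mean, std) tuples; the weights sum to 1.
    """
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-component mixture over normalized prices:
# a low-price component and a high-price component.
components = [(0.5, 0.6, 0.1), (0.5, 1.0, 0.1)]
post = responsibilities(0.65, components)
```

A price of 0.65 lies close to the first component's mean, so nearly all of the posterior mass falls on that component.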
Policy teaching through reward function learning
In Proc. 10th ACM Conf. on Electronic Commerce, 2009
Cited by 8 (5 self)
Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent’s decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent’s reward function is known and unknown to the interested party, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
Specifying Label Layout Styles by Example
UIST, 2007
Cited by 6 (2 self)
Creating high-quality label layouts in a particular visual style is a time-consuming process. Although automated labeling algorithms can aid the layout process, expert design knowledge is required to tune these algorithms so that they produce layouts which meet the designer’s expectations. We propose a system which can learn a label layout style from a single example layout and then apply this style to new labeling problems. Because designers find it much easier to create example layouts than tune algorithmic parameters, our system provides a more natural workflow for graphic designers. We demonstrate that our system is capable of learning a variety of label layout styles from examples. ACM Classification I.2.6 [Artificial Intelligence]: Learning.

