Results 1-10 of 17
A New Metric-Based Approach to Model Selection
 In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97)
, 1997
Cited by 42 (5 self)
We introduce a new approach to model selection that performs better than the standard complexity-penalization and hold-out error estimation techniques in many cases. The basic idea is to exploit the intrinsic metric structure of a hypothesis space, as determined by the natural distribution of unlabeled training patterns, and use this metric as a reference to detect whether the empirical error estimates derived from a small (labeled) training sample can be trusted in the region around an empirically optimal hypothesis. Using simple metric intuitions we develop new geometric strategies for detecting overfitting and performing robust yet responsive model selection in spaces of candidate functions. These new metric-based strategies dramatically outperform previous approaches in experimental studies of classical polynomial curve fitting. Moreover, the technique is simple, efficient, and can be applied to most function learning tasks. The only requirement is access to an auxiliary collection ...
E. H. Durfee. Rational communication in multiagent environments
 Autonomous Agents and Multi-Agent Systems Journal
, 2000
Cited by 22 (0 self)
Abstract. We address the issue of rational communicative behavior among autonomous self-interested agents that have to make decisions as to what to communicate, to whom, and how. Following decision theory, we postulate that a rational speaker should design a speech act so as to optimize the benefit it obtains as the result of the interaction. We quantify the gain in the quality of interaction in terms of the expected utility, and we present a framework that allows an agent to compute the expected utilities of various communicative actions. Our framework uses the Recursive Modeling Method as the specialized representation used for decision-making in a multiagent environment. This representation includes information about the agent’s state of knowledge, including the agent’s preferences, abilities and beliefs about the world, as well as the beliefs the agent has about the other agents, the beliefs it has about the other agents’ beliefs, and so on. Decision-theoretic pragmatics of a communicative act can be then defined as the transformation the act induces on the agent’s state of knowledge about its decision-making situation. This transformation leads to a change in the quality of interaction, expressed in terms of the expected utilities of the agent’s best actions before and after the communicative act. We analyze decision-theoretic pragmatics of a number of important kinds of communicative acts and investigate their expected utilities using examples. Finally, we report on the agreement between our method of message selection and messages that human subjects choose in various circumstances, and show an implementation and experimental validation of our framework in a simulated multiagent environment. Keywords: decision theory, rationality, multiagent systems, communication, pragmatics
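The abstract above describes the value of a communicative act as the change in the expected utility of the agent's best action before and after the act. One way to write that down is sketched here; the symbols $M$, $K$, $K_M$, $A$ and $EU$ are notation assumed for this illustration and are not taken from the paper.

```latex
% Assumed notation (not from the paper):
%   K    agent's state of knowledge before the communicative act
%   K_M  state of knowledge after act M has transformed K
%   A    set of actions available to the agent
%   EU   expected utility of an action given a state of knowledge
U(M) \;=\; \max_{a \in A} EU(a \mid K_M) \;-\; \max_{a \in A} EU(a \mid K)
```

Under this reading, a message is worth sending when $U(M)$ is positive, and the preferred message is the one with the largest $U(M)$.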
Rational coordination in multiagent environments
 JAAMAS
, 2000
Cited by 19 (5 self)
Abstract. We adopt the decision-theoretic principle of expected utility maximization as a paradigm for designing autonomous rational agents, and present a framework that uses this paradigm to determine the choice of coordinated action. We endow an agent with a specialized representation that captures the agent’s knowledge about the environment and about the other agents, including its knowledge about their states of knowledge, which can include what they know about the other agents, and so on. This reciprocity leads to a recursive nesting of models. Our framework puts forth a representation for the recursive models and, under the assumption that the nesting of models is finite, uses dynamic programming to solve this representation for the agent’s rational choice of action. Using a decision-theoretic approach, our work addresses concerns of agent decision-making about coordinated action in unpredictable situations, without imposing upon agents predesigned prescriptions, or protocols, about standard rules of interaction. We implemented our method in a number of domains and we show results of coordination among our automated agents, among human-controlled agents, and among our agents coordinating with human-controlled agents. Keywords: coordination; rationality; decision theory; game theory; agent modeling
Unsupervised learning
 Advanced Lectures on Machine Learning
, 2004
Cited by 18 (0 self)
We give a tutorial and overview of the field of unsupervised learning from the perspective of statistical modelling. Unsupervised learning can be motivated from information theoretic and Bayesian principles. We briefly review basic models in unsupervised learning, including factor analysis, PCA, mixtures of Gaussians, ICA, hidden Markov models, state-space models, and many variants and extensions. We derive the EM algorithm and give an overview of fundamental concepts in graphical models, and inference algorithms on graphs. This is followed by a quick tour of approximate Bayesian inference, including Markov chain Monte Carlo (MCMC), Laplace approximation, BIC, variational approximations, and expectation propagation (EP). The aim of this chapter is to provide a high-level view of the field. Along the way, many state-of-the-art ideas and future directions are also reviewed.
An Observation-Constrained Generative Approach for Probabilistic Classification of Image Regions
, 2003
Cited by 14 (0 self)
In this paper, we propose a probabilistic region classification scheme for natural scene images. In conventional generative methods, a generative model is learnt for each class using all the available training data belonging to that class. However, if an input image has been generated from only a subset of the model support, use of the full model to assign generative probabilities can produce serious artifacts in the probability assignments. This problem arises mainly when the different classes have multimodal distributions with considerable overlap in the feature space. We propose an approach to constrain the class generative probability of a set of newly observed data by exploiting the distribution of the new data itself and using linear weighted mixing. A Kullback-Leibler divergence-based fast model selection procedure is also proposed for learning mixture models in a low dimensional feature space. The preliminary results on the natural scene images support the effectiveness of the proposed approach.
On Sensitivity of the MAP Bayesian Network Structure to the Equivalent Sample Size Parameter
Cited by 7 (3 self)
BDeu marginal likelihood score is a popular model selection criterion for selecting a Bayesian network structure based on sample data. This noninformative scoring criterion assigns same score for network structures that encode same independence statements. However, before applying the BDeu score, one must determine a single parameter, the equivalent sample size α. Unfortunately no generally accepted rule for determining the α parameter has been suggested. This is disturbing, since in this paper we show through a series of concrete experiments that the solution of the network structure optimization problem is highly sensitive to the chosen α parameter value. Based on these results, we are able to give explanations for how and why this phenomenon happens, and discuss ideas for solving this problem.
On Sensitivity of the MAP Bayesian Network Structure to the Equivalent Sample Size Parameter
, 2007
Cited by 7 (4 self)
BDeu marginal likelihood score is a popular model selection criterion for selecting a Bayesian network structure based on sample data. This noninformative scoring criterion assigns same score for network structures that encode same independence statements. However, before applying the BDeu score, one must determine a single parameter, the equivalent sample size α. Unfortunately no generally accepted rule for determining the α parameter has been suggested. This is disturbing, since in this paper we show through a series of concrete experiments that the solution of the network structure optimization problem is highly sensitive to the chosen α parameter value. Based on these results, we are able to give explanations for how and why this phenomenon happens, and discuss ideas for solving this problem.
On the Accuracy of Stochastic Complexity Approximations
 In A. Gammerman (Ed.), Causal
Cited by 3 (3 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as determining model complexity, or performing predictive inference. Unfortunately for cases where the data has missing information, computing the stochastic complexity requires marginalizing (integrating) over the missing data, which results even in the discrete data case to computing a sum with an exponential number of terms. Therefore in most cases the stochastic complexity measure has to be approximated. In this paper we will investigate empirically the performance of some of the most common stochastic complexity approximations in an attempt to understand their small sample behavior in the incomplete data framework. In earlier empirical evaluations the problem of not knowing the actual stochastic complexity for incomplete data was circumvented either by us...
On the Greediness of Feature Selection Algorithms
, 1998
Cited by 3 (0 self)
Based on our analysis and experiments using real-world datasets, we find that the greediness of forward feature selection algorithms does not severely corrupt the accuracy of function approximation using the selected input features, but improves the efficiency significantly. Hence, we propose three greedier algorithms in order to further enhance the efficiency of the feature selection processing. We provide empirical results for linear regression, locally weighted regression and k-nearest-neighbor models. We also propose to use these algorithms to develop an offline Chinese and Japanese handwriting recognition system with automatically configured, local models. Keywords: feature selection, cross-validation, function approximation, handwriting recognition, memory-based learning, instance-based learning. 1. Introduction A fundamental problem of machine learning is to approximate the function relationship f() between an input and an output Y, based on a memory of data points, , ...
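For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of plain greedy forward feature selection scored by k-fold cross-validation on a linear regression model. It is not one of the paper's three greedier algorithms, which the abstract does not describe; the function names `cv_mse` and `forward_select`, the fold count, and the stopping rule are all assumptions made for illustration.

```python
import numpy as np


def cv_mse(X, y, n_folds=5):
    """Mean squared error of least-squares regression (with intercept),
    averaged over contiguous folds. Assumes rows are already in random order."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        A_test = np.column_stack([np.ones(len(held_out)), X[held_out]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        errors.append(np.mean((A_test @ coef - y[held_out]) ** 2))
    return float(np.mean(errors))


def forward_select(X, y, max_features=None):
    """Greedy forward selection: repeatedly add the single feature that most
    reduces cross-validated error, and stop when no candidate improves it."""
    n_features = X.shape[1]
    limit = n_features if max_features is None else max_features
    selected = []
    best_err = cv_mse(X[:, :0], y)  # intercept-only baseline
    while len(selected) < limit:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = {j: cv_mse(X[:, selected + [j]], y) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_err:
            break  # assumed stopping rule: no candidate lowers the CV error
        selected.append(j_best)
        best_err = scores[j_best]
    return selected, best_err
```

The search is greedy because a feature, once added, is never reconsidered; each round costs one cross-validation run per remaining candidate rather than one per subset of features.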
Schwarz, Wallace, and Rissanen: Intertwining Themes in Theories of Model Selection
, 2000
Cited by 1 (0 self)
Investigators interested in model order estimation have tended to divide themselves into widely separated camps; this survey of the contributions of Schwarz, Wallace, Rissanen, and their coworkers attempts to build bridges between the various viewpoints, illuminating connections which may have previously gone unnoticed and clarifying misconceptions which seem to have propagated in the applied literature. Our tour begins with Schwarz's approximation of Bayesian integrals via Laplace's method. We then introduce the concepts underlying Rissanen's minimum description length principle via a Bayesian scenario with a known prior; this provides the groundwork for understanding his more complex non-Bayesian MDL which employs a "universal" encoding of the integers. Rissanen's method of parameter truncation is contrasted with that employed in various versions of Wallace's minimum message length criteria.
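As a pointer to what "Schwarz's approximation of Bayesian integrals via Laplace's method" usually refers to, the standard form of the result is sketched below. The notation ($D$ for the data, $M$ for the model, $\theta$ for its parameters, $\hat{\theta}$ for the maximum-likelihood estimate, $k$ for the number of free parameters, $n$ for the sample size) is assumed here and may differ from the survey's own.

```latex
% Marginal likelihood of model M, approximated for large n.
% Terms that do not grow with n are dropped.
\ln p(D \mid M)
  \;=\; \ln \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta
  \;\approx\; \ln p(D \mid \hat{\theta}, M) \;-\; \frac{k}{2} \ln n
```

Multiplying the right-hand side by $-2$ gives the familiar criterion $\mathrm{BIC} = -2 \ln p(D \mid \hat{\theta}, M) + k \ln n$, where the model with the smallest value is preferred.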