A New Metric-Based Approach to Model Selection
- In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97)
, 1997
Cited by 37 (5 self)
We introduce a new approach to model selection that performs better than the standard complexity-penalization and hold-out error estimation techniques in many cases. The basic idea is to exploit the intrinsic metric structure of a hypothesis space, as determined by the natural distribution of unlabeled training patterns, and use this metric as a reference to detect whether the empirical error estimates derived from a small (labeled) training sample can be trusted in the region around an empirically optimal hypothesis. Using simple metric intuitions we develop new geometric strategies for detecting overfitting and performing robust yet responsive model selection in spaces of candidate functions. These new metric-based strategies dramatically outperform previous approaches in experimental studies of classical polynomial curve fitting. Moreover, the technique is simple, efficient, and can be applied to most function learning tasks. The only requirement is access to an auxiliary collection ...
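As a point of comparison, the hold-out baseline mentioned in this abstract can be sketched for polynomial curve fitting as follows. This is not the paper's metric-based method; the function name `holdout_select_degree`, its parameters, and the 30% hold-out split are assumptions made for illustration only.

```python
import numpy as np

def holdout_select_degree(x, y, max_degree, holdout_fraction=0.3, seed=0):
    """Pick a polynomial degree by minimizing squared error on a held-out split.

    Hypothetical helper: illustrates plain hold-out model selection only.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_hold = max(1, int(holdout_fraction * len(x)))
    hold, train = idx[:n_hold], idx[n_hold:]

    errors = []
    for degree in range(max_degree + 1):
        # Fit on the training part only, then score on the held-out part.
        coeffs = np.polyfit(x[train], y[train], degree)
        residual = np.polyval(coeffs, x[hold]) - y[hold]
        errors.append(float(np.mean(residual ** 2)))

    # The selected degree is the one with the lowest hold-out error.
    return int(np.argmin(errors)), errors
```

With few labeled points the hold-out estimate is noisy, which is the weakness the abstract says the metric-based strategies are meant to address.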
Rational Communication in Multi-Agent Environments
- AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS
, 2000
Cited by 18 (0 self)
We address the issue of rational communicative behavior among autonomous self-interested agents that have to make decisions as to what to communicate, to whom, and how. Following decision theory, we postulate that a rational speaker should design a speech act so as to optimize the benefit it obtains as the result of the interaction. We quantify the gain in the quality of interaction in terms of the expected utility, and we present a framework that allows an agent to compute the expected utilities of various communicative actions. Our framework uses the Recursive Modeling Method as the specialized representation used for decision-making in a multi-agent environment. This representation includes information about the agent's state of knowledge, including the agent's preferences, abilities and beliefs about the world, as well as the beliefs the agent has about the other agents, the beliefs it has about the other agents' beliefs, and so on. Decision-theoretic pragmatics of a comm...
Unsupervised learning
- Advanced Lectures on Machine Learning
, 2004
Cited by 14 (0 self)
We give a tutorial and overview of the field of unsupervised learning from the perspective of statistical modelling. Unsupervised learning can be motivated from information theoretic and Bayesian principles. We briefly review basic models in unsupervised learning, including factor analysis, PCA, mixtures of Gaussians, ICA, hidden Markov models, state-space models, and many variants and extensions. We derive the EM algorithm and give an overview of fundamental concepts in graphical models, and inference algorithms on graphs. This is followed by a quick tour of approximate Bayesian inference, including Markov chain Monte Carlo (MCMC), Laplace approximation, BIC, variational approximations, and expectation propagation (EP). The aim of this chapter is to provide a high-level view of the field. Along the way, many state-of-the-art ideas and future directions are also reviewed.
Rational Coordination in Multi-Agent Environments
, 1999
Cited by 11 (3 self)
We adopt the decision-theoretic principle of expected utility maximization as a paradigm for designing autonomous rational agents, and present a framework that uses this paradigm to determine the choice of coordinated action. We endow an agent with a specialized representation that captures the agent's knowledge about the environment and about the other agents, including its knowledge about their states of knowledge, which can include what they know about the other agents, and so on. This reciprocity leads to a recursive nesting of models. Our framework puts forth a representation for the recursive models and, under the assumption that the nesting of models is finite, uses dynamic programming to solve this representation for the agent's rational choice of action. Using a decision-theoretic approach, our work addresses concerns of agent decision-making about coordinated action in unpredictable situations, without imposing upon agents pre-designed prescriptions, or protocols, ...
An Observation-Constrained Generative Approach for Probabilistic Classification of Image Regions
, 2003
Cited by 10 (0 self)
In this paper, we propose a probabilistic region classification scheme for natural scene images. In conventional generative methods, a generative model is learnt for each class using all the available training data belonging to that class. However, if an input image has been generated from only a subset of the model support, use of the full model to assign generative probabilities can produce serious artifacts in the probability assignments. This problem arises mainly when the different classes have multimodal distributions with considerable overlap in the feature space. We propose an approach to constrain the class generative probability of a set of newly observed data by exploiting the distribution of the new data itself and using linear weighted mixing. A Kullback-Leibler divergence-based fast model selection procedure is also proposed for learning mixture models in a low dimensional feature space. The preliminary results on the natural scene images support the effectiveness of the proposed approach.
On Sensitivity of the MAP Bayesian Network Structure to the Equivalent Sample Size Parameter
Cited by 7 (3 self)
The BDeu marginal likelihood score is a popular model selection criterion for selecting a Bayesian network structure based on sample data. This non-informative scoring criterion assigns the same score to network structures that encode the same independence statements. However, before applying the BDeu score, one must determine a single parameter, the equivalent sample size α. Unfortunately, no generally accepted rule for determining the α parameter has been suggested. This is disturbing, since in this paper we show through a series of concrete experiments that the solution of the network structure optimization problem is highly sensitive to the chosen α parameter value. Based on these results, we are able to give explanations for how and why this phenomenon happens, and discuss ideas for solving this problem.
On the Accuracy of Stochastic Complexity Approximations
- In A. Gammerman (Ed.), Causal
Cited by 3 (3 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as determining model complexity, or performing predictive inference. Unfortunately, for cases where the data has missing information, computing the stochastic complexity requires marginalizing (integrating) over the missing data, which even in the discrete data case results in computing a sum with an exponential number of terms. Therefore, in most cases the stochastic complexity measure has to be approximated. In this paper we will investigate empirically the performance of some of the most common stochastic complexity approximations in an attempt to understand their small sample behavior in the incomplete data framework. In earlier empirical evaluations the problem of not knowing the actual stochastic complexity for incomplete data was circumvented either by us...
On the Greediness of Feature Selection Algorithms
, 1998
Cited by 1 (0 self)
Based on our analysis and experiments using real-world datasets, we find that the greediness of forward feature selection algorithms does not severely corrupt the accuracy of function approximation using the selected input features, but improves the efficiency significantly. Hence, we propose three greedier algorithms in order to further enhance the efficiency of the feature selection processing. We provide empirical results for linear regression, locally weighted regression and k-nearest-neighbor models. We also propose to use these algorithms to develop an off-line Chinese and Japanese handwriting recognition system with automatically configured, local models. Keywords: feature selection, cross-validation, function approximation, handwriting recognition, memory-based learning, instance-based learning. 1. Introduction A fundamental problem of machine learning is to approximate the function relationship f() between an input and an output Y, based on a memory of data points, , ...
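A minimal sketch of standard greedy forward feature selection, scored by leave-one-out cross-validation of a linear regression, may help make the abstract concrete. It illustrates the baseline procedure only, not the three greedier variants the paper proposes; the names `loo_error` and `forward_select` and the stopping rule are assumptions made here.

```python
import numpy as np

def loo_error(X, y):
    """Leave-one-out mean squared error of least-squares linear regression."""
    n = len(y)
    errors = []
    for i in range(n):
        mask = np.arange(n) != i
        # Design matrix with an intercept column, fit without point i.
        A = np.column_stack([X[mask], np.ones(n - 1)])
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        pred = np.append(X[i], 1.0) @ coef
        errors.append((pred - y[i]) ** 2)
    return float(np.mean(errors))

def forward_select(X, y):
    """Add one feature at a time while the cross-validation error improves."""
    selected = []
    remaining = list(range(X.shape[1]))
    best_err = np.inf
    while remaining:
        # Score every candidate feature added to the current subset.
        scores = [(loo_error(X[:, selected + [j]], y), j) for j in remaining]
        err, j = min(scores)
        if err >= best_err:
            break  # assumed stopping rule: no candidate improves the error
        selected.append(j)
        remaining.remove(j)
        best_err = err
    return selected, best_err
```

Each round commits to the single best feature and never revisits earlier choices, which is the greediness the abstract refers to.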
Schwarz, Wallace, and Rissanen: Intertwining Themes in Theories of Model Selection
, 2000
Cited by 1 (0 self)
Investigators interested in model order estimation have tended to divide themselves into widely separated camps; this survey of the contributions of Schwarz, Wallace, Rissanen, and their coworkers attempts to build bridges between the various viewpoints, illuminating connections which may have previously gone unnoticed and clarifying misconceptions which seem to have propagated in the applied literature. Our tour begins with Schwarz's approximation of Bayesian integrals via Laplace's method. We then introduce the concepts underlying Rissanen's minimum description length principle via a Bayesian scenario with a known prior; this provides the groundwork for understanding his more complex non-Bayesian MDL which employs a "universal" encoding of the integers. Rissanen's method of parameter truncation is contrasted with that employed in various versions of Wallace's minimum message length criteria.
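For reference, Schwarz's criterion is commonly written as the following large-sample approximation to the log marginal likelihood. The symbols are assumptions introduced here, not notation taken from the survey: D is the data, M a model with k free parameters, \hat{\theta} the maximum-likelihood estimate, and n the sample size.

```latex
\log p(D \mid M) \;\approx\; \log p(D \mid \hat{\theta}, M) \;-\; \frac{k}{2}\,\log n
```

The second term penalizes each additional parameter by (1/2) log n, so larger models are preferred only when they raise the maximized log-likelihood by more than that amount.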
On Sensitivity of the MAP Bayesian Network Structure to the Equivalent Sample Size Parameter
, 2007
Cited by 1 (1 self)
The BDeu marginal likelihood score is a popular model selection criterion for selecting a Bayesian network structure based on sample data. This non-informative scoring criterion assigns the same score to network structures that encode the same independence statements. However, before applying the BDeu score, one must determine a single parameter, the equivalent sample size α. Unfortunately, no generally accepted rule for determining the α parameter has been suggested. This is disturbing, since in this paper we show through a series of concrete experiments that the solution of the network structure optimization problem is highly sensitive to the chosen α parameter value. Based on these results, we are able to give explanations for how and why this phenomenon happens, and discuss ideas for solving this problem.

