Results 1  10
of
30
Tree induction vs. logistic regression: A learningcurve analysis
 CEDER WORKING PAPER #IS0102, STERN SCHOOL OF BUSINESS
, 2001
"... Tree induction and logistic regression are two standard, offtheshelf methods for building models for classi cation. We present a largescale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on classmembership pr ..."
Abstract

Cited by 64 (16 self)
 Add to MetaCart
Tree induction and logistic regression are two standard, offtheshelf methods for building models for classi cation. We present a largescale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on classmembership probabilities. We use a learningcurve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several remarkable things. (1) Contrary to prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (i.e., the learning curves cross), so conclusions about inductionalgorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective atproducing probabilitybased rankings, although apparently comparatively less so foragiven training{set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable canbecharacterized surprisingly well by a simple measure of signaltonoise ratio.
Optimization by learning and simulation of Bayesian and Gaussian networks
, 1999
"... Estimation of Distribution Algorithms (EDA) constitute an example of stochastics heuristics based on populations of individuals every of which encode the possible solutions to the optimization problem. These populations of individuals evolve in succesive generations as the search progresses  organ ..."
Abstract

Cited by 43 (6 self)
 Add to MetaCart
Estimation of Distribution Algorithms (EDA) constitute an example of stochastics heuristics based on populations of individuals every of which encode the possible solutions to the optimization problem. These populations of individuals evolve in succesive generations as the search progresses  organized in the same way as most evolutionary computation heuristics. In opposition to most evolutionary computation paradigms which consider the crossing and mutation operators as essential tools to generate new populations, EDA replaces those operators by the estimation and simulation of the joint probability distribution of the selected individuals. In this work, after making a review of the different approaches based on EDA for problems of combinatorial optimization as well as for problems of optimization in continuous domains, we propose new approaches based on the theory of probabilistic graphical models to solve problems in both domains. More precisely, we propose to adapt algorit...
Asymptotic Model Selection for Naive Bayesian Networks
 In Proc. of the 18th Conference on Uncertainty in Artificial Intelligence (UAI02
, 2002
"... We develop a closed form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features. ..."
Abstract

Cited by 30 (3 self)
 Add to MetaCart
We develop a closed form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features.
Predicting the Future of Discrete Sequences From Fractal Representations of the Past
, 2001
"... We propose a novel approach for building nite memory predictive models similar in spirit to variable memory length Markov models (VLMMs). The models are constructed by rst transforming the nblock structure of the training sequence into a geometric structure of points in a unit hypercube, such ..."
Abstract

Cited by 29 (10 self)
 Add to MetaCart
We propose a novel approach for building nite memory predictive models similar in spirit to variable memory length Markov models (VLMMs). The models are constructed by rst transforming the nblock structure of the training sequence into a geometric structure of points in a unit hypercube, such that the longer is the common sux shared by any two nblocks, the closer lie their point representations.
Cooperative Behavior Acquisition for Mobile Robots in Dynamically Changing Real Worlds via VisionBased Reinforcement Learning and Development
 Artificial Intelligence
, 1999
"... In this paper, we rst discuss the meaning of physical embodiment and the complexity of the environment in the context of multiagent learning. We then propose a visionbased reinforcement learning method that acquires cooperative behaviors in a dynamic environment. We use the robot soccer game initia ..."
Abstract

Cited by 27 (9 self)
 Add to MetaCart
In this paper, we rst discuss the meaning of physical embodiment and the complexity of the environment in the context of multiagent learning. We then propose a visionbased reinforcement learning method that acquires cooperative behaviors in a dynamic environment. We use the robot soccer game initiated by RoboCup [12] to illustrate the eectiveness of our method. Each agent works with other team members to achieve a common goal against opponents. Our method estimates the relationships between a learner's behaviors and those of other agents in the environment through interactions (observations and actions) using a technique from system identication. In order to identify the model of each agent, Akaike's Information Criterion is applied to the results of Canonical Variate Analysis to clarify the relationship between the observed data in terms of actions and future observations. Next, reinforcement learning based on the estimated state vectors is performed to obtain the optimal behavior...
Robust Full Bayesian Learning for Radial Basis Networks
, 2001
"... We propose a hierachical full Bayesian model for radial basis networks. This model treats the model dimension (number of neurons), model parameters,... ..."
Abstract

Cited by 24 (4 self)
 Add to MetaCart
We propose a hierachical full Bayesian model for radial basis networks. This model treats the model dimension (number of neurons), model parameters,...
Universal Composite Hypothesis Testing: A Competitve Minimax Approach
, 2001
"... A novel approach is presented for the longstanding problem of composite hypothesis testing. In composite hypothesis testing, unlike in simple hypothesis testing, the probability function of the observed data given the hypothesis, is uncertain as it depends on the unknown value of some parameter. Th ..."
Abstract

Cited by 23 (7 self)
 Add to MetaCart
A novel approach is presented for the longstanding problem of composite hypothesis testing. In composite hypothesis testing, unlike in simple hypothesis testing, the probability function of the observed data given the hypothesis, is uncertain as it depends on the unknown value of some parameter. The proposed approach is to minimize the worstcase ratio between the probability of error of a decision rule that is independent of the unknown parameters and the minimum probability of error attainable given the parameters. The principal solution to this minimax problem is presented and the resulting decision rule is discussed. Since the exact solution is, in general, hard to find, and afortiori hard to implement, an approximation method that yields an asymptotically minimax decision rule is proposed. Finally, a variety of potential application areas are provided in signal processing and communications with special emphasis on universal decoding.
Combinatorial optimization by learning and simulation of Bayesian networks
 in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence
, 2000
"... This paper shows how the Bayesian network paradigm can be used in order to solve combinatorial optimization problems. To do it some methods of structure learning from data and simulation of Bayesian networks are inserted inside Estimation of Distribution Algorithms (EDA). EDA are a new tool for evol ..."
Abstract

Cited by 21 (10 self)
 Add to MetaCart
This paper shows how the Bayesian network paradigm can be used in order to solve combinatorial optimization problems. To do it some methods of structure learning from data and simulation of Bayesian networks are inserted inside Estimation of Distribution Algorithms (EDA). EDA are a new tool for evolutionary computation in which populations of individuals are created by estimation and simulation of the joint probability distribution of the selected individuals. We propose new approaches to EDA for combinatorial optimization based on the theory of probabilistic graphical models. Experimental results are also presented.
MetricBased Methods for Adaptive Model Selection and Regularization
 Machine Learning
, 2001
"... We present a general approach to model selection and regularization that exploits unlabeled data to adaptively control hypothesis complexity in supervised learning tasks. The idea is to impose a metric structure on hypotheses by determining the discrepancy between their predictions across the di ..."
Abstract

Cited by 20 (0 self)
 Add to MetaCart
We present a general approach to model selection and regularization that exploits unlabeled data to adaptively control hypothesis complexity in supervised learning tasks. The idea is to impose a metric structure on hypotheses by determining the discrepancy between their predictions across the distribution of unlabeled data. We show how this metric can be used to detect untrustworthy training error estimates, and devise novel model selection strategies that exhibit theoretical guarantees against overtting (while still avoiding under tting). We then extend the approach to derive a general training criterion for supervised learningyielding an adaptive regularization method that uses unlabeled data to automatically set regularization parameters. This new criterion adjusts its regularization level to the specic set of training data received, and performs well on a variety of regression and conditional density estimation tasks. The only proviso for these methods is that s...
Reversible Jump MCMC Simulated Annealing for Neural Networks
"... We propose a novel reversible jump Markov chain Monte Carlo (MCMC) simulated annealing algorithm to optimize radial basis function (RBF) networks. This algorithm enables us to maximize the joint posterior distribution of the network parameters and the number of basis functions. It performs a global ..."
Abstract

Cited by 12 (2 self)
 Add to MetaCart
We propose a novel reversible jump Markov chain Monte Carlo (MCMC) simulated annealing algorithm to optimize radial basis function (RBF) networks. This algorithm enables us to maximize the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and number of parameters, thereby surmounting the problem of local minima. We also show that by calibrating a Bayesian model, we can obtain the classical AIC, BIC and MDL model selection criteria within a penalized likelihood framework. Finally, we show theoretically and empirically that the algorithm converges to the modes of the full posterior distribution in an efficient way. likelihood estimation, with the aforementioned model selection criteria, is performed by maximizing the calibrated posterior distribution. To accomplish this goal, we propose an MCMC simulated annealing algorithm, which makes use of a homogeneous reversible jump MCMC kernel as proposal. This approach has the advantage that we can start with an arbitrary model order and the algorithm will perform dimension jumps until it finds the "true " model order. That is, one does not have to resort to the more expensive task of running a fixed dimension algorithm for each possible model order and subsequently selecting the best model. We also present a convergence theorem for the algorithm. The complexity of the problem does not allow for a comprehensive discussion in this short paper.