Model selection using Rademacher Penalization
- In Proceedings of the Second ICSC Symposia on Neural Computation (NC2000). ICSC Academic Press
, 2000
Cited by 12 (0 self)
In this paper we describe the use of Rademacher penalization for model selection. As in Vapnik's Guaranteed Risk Minimization (GRM), Rademacher penalization attempts to balance the complexity of the model with its fit to the data by minimizing the sum of the training error and a penalty term, which is an upper bound on the absolute difference between the training error and the generalization error. However, while the GRM penalty is universal, the computation of the Rademacher penalty is data-driven, which means that it depends on the distribution of the data, and hence one can expect better performance for particular instances of learning problems. We present experimental evidence that Rademacher penalization can be used as an effective method of model selection in learning problems. In particular, we have shown that for the intervals model selection problem, Rademacher penalization outperforms GRM and cross-validation (CV) over a wide range of sample sizes. Our experiments also sho...
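The penalized criterion above (training error plus a data-driven penalty) can be sketched in Python. This is a minimal illustration, not the paper's implementation: it assumes each candidate model class is a finite list of hypotheses given by their 0/1 loss vectors on the n training points, and it uses a Monte Carlo estimate of E_σ sup_h |(1/n) Σ σ_i ℓ_i(h)|, with random signs σ_i ∈ {−1, +1}, as the penalty. The helper names `rademacher_penalty` and `select_model` are hypothetical.

```python
import random

def rademacher_penalty(loss_matrix, n_draws=200, seed=0):
    """Monte Carlo estimate of E_sigma[ sup_h |(1/n) * sum_i sigma_i * loss_h[i]| ].

    loss_matrix: one 0/1 loss vector per hypothesis, each of length n
    (the number of training points). Assumed penalty form, for illustration.
    """
    rng = random.Random(seed)
    n = len(loss_matrix[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw independent Rademacher signs, one per training point.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Supremum over the (finite) class of the signed, averaged losses.
        total += max(abs(sum(s * l for s, l in zip(sigma, losses))) / n
                     for losses in loss_matrix)
    return total / n_draws

def select_model(classes):
    """Pick the class minimizing (best training error) + (Rademacher penalty).

    classes: dict mapping a class name to its loss matrix (hypothetical format).
    """
    best_name, best_score = None, float("inf")
    for name, losses in classes.items():
        train_err = min(sum(l) / len(l) for l in losses)
        score = train_err + rademacher_penalty(losses)
        if score < best_score:
            best_name, best_score = name, score
    return best_name
```

Because a fixed seed yields the same sign vectors, a larger class never receives a smaller penalty than a subset of it, which is what lets the penalty offset the lower training error a richer class achieves. Real model classes such as unions of intervals are infinite, so the supremum would have to be computed by optimization rather than by enumeration.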
Towards Perceptual Intelligence: Statistical Modeling of Human Individual and Interactive Behaviors
- Prediction of Human Behavior, IEEE Intelligent Vehicles
, 1995
Cited by 10 (5 self)
This thesis presents a computational framework for the automatic recognition and prediction of different kinds of human behaviors from video cameras and other sensors, via perceptually intelligent systems that automatically sense and correctly classify human behaviors by means of Machine Perception and Machine Learning techniques. In the thesis I develop the statistical machine learning algorithms (dynamic graphical models) necessary for detecting and recognizing individual and interactive behaviors. In the case of interactions, two Hidden Markov Models (HMMs) are coupled in a novel architecture called Coupled Hidden Markov Models (CHMMs) that explicitly captures the interactions between them. The algorithms for learning the parameters from data, as well as for doing inference with those models, are developed and described. Four systems that experimentally evaluate the proposed paradigm are presented: (1) LAFTER, an automatic face detection and tracking system with facial expression recognition; (2) a Tai-Chi gesture recognition system; (3) a pedestrian surveillance system that recognizes typical human-to-human interactions; and (4) a SmartCar for driver maneuver recognition. These systems capture human behaviors of different natures and increasing complexity: first, isolated, single-user facial expressions, then two-hand gestures and human-to-human interactions,...
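To make the coupling concrete, the sketch below computes the likelihood of two observation sequences under a two-chain coupled HMM by running the ordinary forward algorithm over the product state space, where each chain's next state depends on the previous states of both chains. It is an exact but brute-force illustration with assumed parameter layouts (`trans_a[i][j][k]` and so on), not the inference algorithm developed in the thesis.

```python
from itertools import product

def chmm_likelihood(pi_a, pi_b, trans_a, trans_b, emit_a, emit_b, obs_a, obs_b):
    """Forward algorithm for a two-chain coupled HMM on the joint state space.

    pi_a[i], pi_b[j]: initial state probabilities of chains A and B.
    trans_a[i][j][k] = P(a_t = k | a_{t-1} = i, b_{t-1} = j)
    trans_b[i][j][l] = P(b_t = l | a_{t-1} = i, b_{t-1} = j)
    emit_a[i][o], emit_b[j][o]: emission probabilities of each chain.
    obs_a, obs_b: observation sequences of equal length (assumed).
    """
    states = list(product(range(len(pi_a)), range(len(pi_b))))
    # alpha[(i, j)] = P(observations up to t, a_t = i, b_t = j)
    alpha = {(i, j): pi_a[i] * pi_b[j] * emit_a[i][obs_a[0]] * emit_b[j][obs_b[0]]
             for i, j in states}
    for t in range(1, len(obs_a)):
        alpha = {
            (k, l): sum(alpha[(i, j)] * trans_a[i][j][k] * trans_b[i][j][l]
                        for i, j in states)
                    * emit_a[k][obs_a[t]] * emit_b[l][obs_b[t]]
            for k, l in states
        }
    return sum(alpha.values())
```

The cost is O(T·(N_A·N_B)²), which grows quickly with the number of states; that is one reason dedicated inference schemes for coupled models are worth developing.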
Selected Training Exemplars for Neural Network Learning
, 1994
Cited by 8 (0 self)
The dissertation of Mark Plutowski is approved, and it is acceptable in quality and form for publication on microfilm.
Asymptotic optimality of likelihood-based cross-validation
- Statistical Applications in Genetics and Molecular Biology
, 2003
Cited by 8 (2 self)
Likelihood-based cross-validation is a statistical tool for selecting a density estimate based on n i.i.d. observations from the true density among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g. kernel) density estimator. In this article, we establish a finite sample result for a general class of likelihood-based cross-validation procedures (as indexed by the type of sample splitting used, e.g. V-fold cross-validation). This result implies that the cross-validation selector performs asymptotically as well (with respect to the Kullback-Leibler distance to the true density) as a benchmark model selector which is optimal for each given dataset and depends on the true density. Crucial conditions of our theorem are that the size of the validation sample converges to infinity, which excludes leave-one-out cross-validation, and that the candidate density estimates are bounded away from zero and infinity. We illustrate these asymptotic results and the practical performance of likelihood-based cross-validation for the purpose of bandwidth selection with a simulation study. Moreover, we use likelihood-based cross-validation in the context of regulatory motif detection in DNA sequences.
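As an illustration of the bandwidth-selection use case, the following sketch scores each candidate bandwidth of a one-dimensional Gaussian kernel density estimate by its average held-out log-likelihood under V-fold cross-validation and keeps the best. The Gaussian kernel, the fold assignment, and the small density floor (a crude stand-in for the "bounded away from zero" condition) are assumptions of this sketch; the function names are hypothetical.

```python
import math
import random

def kde_logpdf(x, sample, h, floor=1e-300):
    """Log of a 1-D Gaussian kernel density estimate at x (bandwidth h).

    The floor keeps the estimate away from zero (an assumption of this sketch).
    """
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)
    density = total / (len(sample) * h * math.sqrt(2 * math.pi))
    return math.log(max(density, floor))

def vfold_cv_loglik(data, h, v=5, seed=0):
    """Average held-out log-likelihood of bandwidth h over V folds."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::v] for k in range(v)]
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        # Fit the density on the other V-1 folds, evaluate on this one.
        train = [data[i] for i in idx if i not in held_out]
        total += sum(kde_logpdf(data[i], train, h) for i in fold)
    return total / len(data)

def select_bandwidth(data, candidates, v=5):
    """Return the candidate bandwidth with the highest cross-validated log-likelihood."""
    return max(candidates, key=lambda h: vfold_cv_loglik(data, h, v))
```

With V fixed, each validation fold holds about n/V points, so its size grows with n; this is the regime the theorem covers, unlike leave-one-out.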
An Empirical Investigation of Learning from the Semantic Web
, 2002
Cited by 5 (3 self)
The Semantic Web is a vision of a machine-readable Web of resources, interlinked and connected through meta-data with common ontologies. In this paper we explore the impact such a Semantic Web would have on Machine Learning algorithms used for user profiling and personalisation. Our hypothesis is that learning from the Semantic Web should outperform traditional learning from today's World Wide Web in both performance and accuracy. In this paper we present results obtained with two different datasets marked up with semantic meta-data; using these, we have investigated different instance representations and various learning techniques. Our initial results with the Naïve Bayes and K-NN algorithms were disappointing, leading us to examine the use of the Progol algorithm.
Performance Prediction for Exponential Language Models
Cited by 5 (3 self)
We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.
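The regression idea can be illustrated with a deliberately simple form: predict test cross-entropy as training cross-entropy plus γ times some per-model statistic s, fit γ by least squares on the gap between test and training cross-entropy, and check the fit with a Pearson correlation. The single statistic, this functional form, and the helpers `fit_gamma`, `predict_test_ce` and `pearson` are assumptions for illustration; the abstract does not state which statistics or coefficients the paper found.

```python
import math

def fit_gamma(train_ce, test_ce, stat):
    """Least-squares gamma for (test_ce - train_ce) ≈ gamma * stat, no intercept."""
    gaps = [te - tr for tr, te in zip(train_ce, test_ce)]
    return sum(s * g for s, g in zip(stat, gaps)) / sum(s * s for s in stat)

def predict_test_ce(train_ce, stat, gamma):
    """Predicted test cross-entropy under the assumed linear form."""
    return [tr + gamma * s for tr, s in zip(train_ce, stat)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In a real study each entry would be one trained model (a domain, data size and n-gram order combination), and the correlation would be measured on models not used to fit γ.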
A Scaling Law for the Validation-Set Training-Set Size Ratio
- AT&T Bell Laboratories
, 1997
Cited by 4 (0 self)
We address the problem of determining what fraction of the training set should be reserved as a development test set or validation set. We determine that the ratio of the validation set size to the training set size scales like the square root of the ratio of two complexity parameters: the complexity of the second level of inference (minimizing the validation error) over the complexity of the first level of inference (minimizing the error rate on the training set).
Keywords: Cross-validation; Learning Theory; Statistics; Machine Learning; Pattern Recognition; Training Set; Validation Set; Test Set; Experiment Design.
Introduction: When organizing benchmarks in pattern recognition, the problem often arises of determining what size test set will give statistically significant results. In a companion paper [1], we tackled the problem from the point of view of the benchmark organizer: from a corpus of available data, how much data should be reserved for the benchmark test set? In this paper, we tackle th...
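Read as a rule of thumb, the scaling law says v/t ≈ k·sqrt(c₂/c₁), where v and t are the validation and training set sizes, c₁ and c₂ the complexities of the first and second levels of inference, and k an unspecified constant. Combined with v + t = n, this gives v = n·r/(1 + r) with r = k·sqrt(c₂/c₁). The sketch below assumes k = 1 and takes the two complexities as given numbers; how the paper measures them is not stated in the snippet, and `split_sizes` is a hypothetical helper.

```python
import math

def split_sizes(n, c_first, c_second, k=1.0):
    """Split n examples into (training, validation) sizes with v/t = k*sqrt(c_second/c_first).

    c_first: complexity of the first level of inference (training).
    c_second: complexity of the second level (selection on the validation set).
    k: proportionality constant, assumed to be 1 by default.
    """
    r = k * math.sqrt(c_second / c_first)
    v = round(n * r / (1 + r))
    return n - v, v
```

For example, with n = 1000, c₁ = 100 and c₂ = 1, the ratio is 0.1 and the sketch reserves 91 examples for validation and 909 for training.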
VC Theory of Large Margin Multi-Category Classifiers
Cited by 4 (2 self)
In the context of discriminant analysis, Vapnik’s statistical learning theory has mainly been developed in three directions: the computation of dichotomies with binary-valued functions, the computation of dichotomies with real-valued functions, and the computation of polytomies with functions taking their values in finite sets, typically the set of categories itself. The case of classes of vector-valued functions used to compute polytomies has seldom been considered independently, which is unsatisfactory for three main reasons. First, this case encompasses the other ones. Second, it cannot be treated appropriately through a naïve extension of the results devoted to the computation of dichotomies. Third, most of the classification problems met in practice involve multiple categories. In this paper, a VC theory of large margin multi-category classifiers is introduced. Central to this theory are generalized VC dimensions called the γ-Ψ-dimensions. First, a uniform convergence bound on the risk of the classifiers of interest is derived. The capacity measure involved in this bound is a covering number. This covering number can be upper bounded in terms of the γ-Ψ-dimensions thanks to generalizations of Sauer’s lemma, as is illustrated in the specific case of the scale-sensitive Natarajan dimension. A bound on this latter dimension is then computed for the class of functions on which multi-class SVMs are based. This makes it possible to apply the structural risk minimization inductive principle to those machines.
Handling Uncertainty When You're Handling Uncertainty: Model Selection and Error Bars for Belief Networks
, 2000
Cited by 3 (1 self)
Belief networks are a common way of handling uncertainty in AI. A belief network represents the joint distribution of a set of random variables. When network parameters are estimated from a sample, the parameter values are also random variables whose distribution is given by the sampling distribution of the true model (Frequentist perspective) or the posterior distribution over the parameter space (Bayesian perspective). The uncertainty in parameter values has implications for both inference and learning. In learning network structure from data, a fundamental issue is how to handle the bias-variance trade-off: increasing model complexity decreases bias but increases the variance in parameter values. We compare model selection criteria for handling the bias-variance trade-off in structure learning, on theoretical and empirical grounds. We also look at the issue of uncertainty in belief network inference. Once constructed, belief networks are typically used to answer queries about mar...
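One widely used criterion for this trade-off in structure learning is the Bayesian Information Criterion (BIC), which penalizes the maximized log-likelihood by half the number of free parameters times log n. The sketch below scores a single node given a candidate parent set from complete discrete data; BIC is shown as an example criterion, not necessarily one of those compared in the paper, and the data layout (a list of dicts) and the helper `family_bic` are assumptions.

```python
import math
from collections import Counter

def family_bic(data, child, parents, arity):
    """BIC contribution of one node: max log-likelihood - (free params / 2) * log n.

    data: list of dicts mapping variable name -> discrete value (complete data).
    parents: list of parent variable names (may be empty).
    arity: dict mapping variable name -> number of states.
    """
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
    # Maximum-likelihood CPT entries are count(parents, child) / count(parents).
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    q = 1
    for p in parents:
        q *= arity[p]  # number of parent configurations
    free_params = q * (arity[child] - 1)
    return loglik - 0.5 * free_params * math.log(n)
```

Because BIC decomposes over nodes, a structure's score is the sum of its family scores, so a parent is worth adding only when its log-likelihood gain exceeds the extra penalty.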
The Effects Of Pruning Methods On The Predictive Accuracy Of Induced Decision Trees
, 1999
Cited by 2 (0 self)
... This article presents a unifying framework according to which any pruning method can be defined as a four-tuple (Space, Operators, Evaluation function, Search strategy), and the pruning process can be cast as an optimization problem. Six well-known pruning methods are investigated by means of this framework and their common aspects, strengths and weaknesses are described. Furthermore, a new empirical analysis of the effect of post-pruning on both the predictive accuracy and the size of induced decision trees is reported. The experimental comparison of the pruning methods involves 14 datasets and is based on the cross-validation procedure. The results confirm most of the conclusions drawn in a previous comparison based on the holdout procedure.
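As one concrete instance of the four-tuple, reduced-error pruning can be read as: Space = the pruned subtrees of the induced tree, Operator = replace an internal node by a leaf, Evaluation function = error count on a separate pruning set, Search strategy = a single bottom-up pass. The sketch below implements that reading for binary threshold trees; the `Node` layout, the helper names, and the choice to prune on ties are assumptions, and the mapping is illustrative rather than taken from the article.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    majority: int                   # class predicted if this node is (or becomes) a leaf
    feature: Optional[int] = None   # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch for x[feature] <= threshold
    right: Optional["Node"] = None  # branch for x[feature] > threshold

def predict(node, x):
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.majority

def errors(node, data):
    return sum(predict(node, x) != y for x, y in data)

def reduced_error_prune(node, prune_set):
    """Bottom-up: replace a subtree by a leaf unless that increases pruning-set errors."""
    if node.feature is None:
        return node
    left_data = [(x, y) for x, y in prune_set if x[node.feature] <= node.threshold]
    right_data = [(x, y) for x, y in prune_set if x[node.feature] > node.threshold]
    # Prune the children first, each on the pruning examples that reach it.
    node.left = reduced_error_prune(node.left, left_data)
    node.right = reduced_error_prune(node.right, right_data)
    leaf_errors = sum(y != node.majority for _, y in prune_set)
    if leaf_errors <= errors(node, prune_set):  # ties favour the smaller tree
        return Node(majority=node.majority)
    return node
```

Other methods fit the same template by changing one component, for instance estimating errors from the training data instead of a held-out pruning set, or searching top-down.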

