Results 1–10 of 22
How to Use Expert Advice
 JOURNAL OF THE ASSOCIATION FOR COMPUTING MACHINERY
, 1997
Cited by 317 (66 self)
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
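The style of algorithm the abstract describes can be sketched as an exponentially weighted forecaster; the learning rate and all names below are illustrative choices, not the paper's tuned constants:

```python
import math

def exp_weights_expected_mistakes(expert_preds, outcomes, eta=0.5):
    """Expected mistakes of a randomized exponentially weighted
    forecaster that combines binary expert predictions.

    expert_preds[i][t] and outcomes[t] are bits in {0, 1}.
    """
    weights = [1.0] * len(expert_preds)
    expected = 0.0
    for t, y in enumerate(outcomes):
        total = sum(weights)
        # Probability of predicting 1 = weighted vote for 1.
        p1 = sum(w for w, e in zip(weights, expert_preds) if e[t] == 1) / total
        expected += p1 if y == 0 else 1.0 - p1
        # Multiplicative penalty for experts that erred this round.
        weights = [w * (math.exp(-eta) if e[t] != y else 1.0)
                   for w, e in zip(weights, expert_preds)]
    return expected
```

With one perfect expert in the pool, the forecaster's expected mistake count stays close to that expert's zero mistakes, illustrating the regret bound the paper quantifies.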
Prediction risk and architecture selection for neural networks
, 1994
Cited by 75 (2 self)
We describe two important sets of tools for neural network modeling: prediction risk estimation and network architecture selection. Prediction risk is defined as the expected performance of an estimator in predicting new observations. Estimated prediction risk can be used both for estimating the quality of model predictions and for model selection. Prediction risk estimation and model selection are especially important for problems with limited data. Techniques for estimating prediction risk include data resampling algorithms such as nonlinear cross-validation (NCV) and algebraic formulae such as the predicted squared error (PSE) and generalized prediction error (GPE). We show that exhaustive search over the space of network architectures is computationally infeasible even for networks of modest size. This motivates the use of heuristic strategies that dramatically reduce the search complexity. These strategies employ directed search algorithms, such as selecting the number of nodes via sequential network construction (SNC), pruning inputs via sensitivity-based pruning (SBP), and pruning weights via optimal brain damage (OBD).
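Prediction-risk estimation by data resampling, as described above, can be sketched with generic k-fold cross-validation (the paper's NCV is a nonlinear variant; this is a simplified stand-in, and all names are illustrative):

```python
def cv_prediction_risk(xs, ys, fit, k=5):
    """Estimate prediction risk (expected squared error on new data)
    by k-fold cross-validation.

    `fit(train_xs, train_ys)` must return a callable model.
    """
    n = len(xs)
    # Interleaved fold assignment: fold i holds indices i, i+k, i+2k, ...
    folds = [list(range(i, n, k)) for i in range(k)]
    total, count = 0.0, 0
    for held_out in folds:
        held = set(held_out)
        tr_x = [x for i, x in enumerate(xs) if i not in held]
        tr_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(tr_x, tr_y)
        # Accumulate squared error on the held-out points only.
        for i in held_out:
            total += (model(xs[i]) - ys[i]) ** 2
            count += 1
    return total / count
```

Comparing this estimate across candidate architectures is the model-selection use the abstract mentions: pick the architecture with the smallest estimated risk.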
Multiresolution image classification by hierarchical modeling with two dimensional hidden Markov models
 IEEE TRANS. INFORMATION THEORY
, 2000
Cited by 49 (9 self)
This paper treats a multiresolution hidden Markov model for classifying images. Each image is represented by feature vectors at several resolutions, which are statistically dependent as modeled by the underlying state process, a multiscale Markov mesh. Unknowns in the model are estimated by maximum likelihood, in particular by employing the expectation-maximization algorithm. An image is classified by finding the optimal set of states with maximum a posteriori probability. States are then mapped into classes. The multiresolution model enables multiscale information about context to be incorporated into classification. Suboptimal algorithms based on the model provide progressive classification that is much faster than the algorithm based on single-resolution hidden Markov models.
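MAP state decoding, the classification step described above, can be illustrated in the simpler one-dimensional case by the standard Viterbi algorithm (the paper's two-dimensional Markov-mesh decoding is considerably more involved; all names and probabilities here are illustrative):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """MAP state sequence for a one-dimensional HMM, a simplified
    analogue of 2-D Markov-mesh decoding.  Probabilities are plain
    dicts: start_p[s], trans_p[prev][s], emit_p[s][observation]."""
    # V[t][s] = probability of the best path ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: V[-1][p] * trans_p[p][s])
            col[s] = V[-1][best_prev] * trans_p[best_prev][s] * emit_p[s][o]
            ptr[s] = best_prev
        V.append(col)
        back.append(ptr)
    # Trace the best final state backwards through the pointers.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

In the image setting, the decoded states would then be mapped into classes, as the abstract describes.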
Locating and Tracking of Human Faces with Neural Networks
, 1994
Cited by 36 (1 self)
Effective human-to-human communication involves both auditory and visual modalities, providing robustness and naturalness in realistic communication situations. Recent efforts at our lab are aimed at providing such multimodal capabilities for human-machine communication as well, by introducing gesture, character and speech recognition, eye-tracking and lip-reading. Most of the visual modalities require a stable image of a speaker's face. In this technical report a connectionist face tracker is proposed that manipulates camera orientation and zoom to keep a person's face located at all times in an image sequence. The system operates in real time and can adapt rapidly to different lighting conditions, different cameras and faces, making it robust against environmental variability. Extensions and integration of the system with a multimodal interface will be presented.
BYY Harmony Learning, Independent State Space, and Generalized APT Financial Analyses
, 2001
Cited by 23 (20 self)
First, the relationship between factor analysis (FA) and the well-known arbitrage pricing theory (APT) for financial markets is discussed comparatively, with a number of to-be-improved problems listed. An overview is made from a unified perspective on the related studies in the literature of statistics, control theory, signal processing, and neural networks. Second, we introduce the fundamentals of the Bayesian Ying Yang (BYY) system and the harmony learning principle, which have been systematically developed in the past several years as a unified statistical framework for parameter learning, regularization, and model selection, in both non-temporal and temporal stochastic environments. We further show that a specific case of the framework, called the BYY independent state space (ISS) system, provides a general guide for systematically tackling various FA-related learning tasks and the above to-be-improved problems for the APT analyses. Third, on various specific cases of the BYY ISS s...
Bayesian Ying Yang system, best harmony learning, and Gaussian manifold based family
 Computational Intelligence: Research Frontiers, WCCI2008 Plenary/Invited Lectures. Lecture Notes in Computer Science
"... five action circling ..."
Hierarchical BayesianKalman Models For Regularisation And ARD In Sequential Learning
 DEPARTMENT OF ENGINEERING, CAMBRIDGE UNIVERSITY
, 1998
Cited by 11 (4 self)
In this paper, we show that a hierarchical Bayesian modelling approach to sequential learning leads to many interesting attributes such as regularisation and automatic relevance determination. We identify three inference levels within this hierarchy, namely model selection, parameter estimation and noise estimation. In environments where data arrives sequentially, techniques such as cross-validation to achieve regularisation or model selection are not possible. The Bayesian approach, with extended Kalman filtering at the parameter estimation level, allows for regularisation within a minimum variance framework. A multilayer perceptron is used to generate the extended Kalman filter's nonlinear measurement mapping. We describe several algorithms at the noise estimation level, which allow us to implement adaptive regularisation and automatic relevance determination of model inputs and basis functions. An important contribution of this paper is to show the theoretical links between adaptive...
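The parameter-estimation level described above can be illustrated in the simplest possible setting: a scalar linear-Gaussian special case of Kalman-filter parameter estimation, where the prior variance acts as a regulariser. This is a sketch under those simplifying assumptions, not the paper's extended-Kalman-filter/MLP formulation:

```python
def kalman_sequential(ys, hs, q=0.0, r=1.0, w0=0.0, p0=10.0):
    """Sequential estimation of a scalar weight w from measurements
    y_t = h_t * w + noise, with measurement variance r and process
    variance q.  The prior (w0, p0) plays the role of a regulariser.
    Returns the posterior mean and variance of w.
    """
    w, p = w0, p0
    for y, h in zip(ys, hs):
        p += q                    # predict: inflate variance by process noise
        s = h * p * h + r         # innovation variance
        k = p * h / s             # Kalman gain
        w += k * (y - h * w)      # measurement update of the weight
        p *= (1.0 - k * h)        # posterior variance
    return w, p
```

In the paper's setting the scalar w becomes the weight vector of a multilayer perceptron and h becomes the linearised (Jacobian) measurement mapping, but the predict/update cycle has the same shape.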
A vision-based learning method for pushing manipulation
 In AAAI Fall Symposium Series: Machine Learning in Vision: What Why and
, 1993
Cited by 9 (3 self)
We describe an unsupervised online method for learning manipulative actions that allows a robot to push an object, connected to it with a rotational point contact, to a desired point in image-space. By observing the results of its actions on the object's orientation in image-space, the system forms a predictive forward empirical model. This acquired model is used online for manipulation planning and control as it improves. Rather than explicitly inverting the forward model to achieve trajectory control, a stochastic action selection technique [Moore, 1990] is used to select the most informative and promising actions, thereby integrating active perception and learning by combining online improvement, task-directed exploration, and model exploitation. Simulation and experimental results of the approach are presented.
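The learn-a-forward-model-while-acting loop described above can be caricatured in one dimension; the running-average model, the 20% exploration rate, and the action set below are all invented for illustration and are not the paper's method:

```python
import random

def push_learn(step_true, goal, n_trials=200,
               actions=(-1.0, -0.5, 0.5, 1.0), seed=0):
    """Toy forward-model learning for pushing: observe how each action
    changes a scalar orientation, average the observed effects into an
    empirical forward model, and greedily pick the action whose
    predicted result lands closest to the goal, with occasional random
    exploration.  Returns the final state and the learned model.
    """
    rng = random.Random(seed)
    model = {a: 0.0 for a in actions}   # predicted effect of each action
    counts = {a: 0 for a in actions}
    x = 0.0
    for _ in range(n_trials):
        if rng.random() < 0.2:          # task-directed exploration
            a = rng.choice(actions)
        else:                           # model exploitation
            a = min(actions, key=lambda c: abs(x + model[c] - goal))
        dx = step_true(a)               # act, observe the true effect
        counts[a] += 1
        model[a] += (dx - model[a]) / counts[a]   # running-average update
        x += dx
    return x, model
```

The point of the sketch is the interleaving: the model is consulted for control on every step while it is still being improved by the observations those same steps generate.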
Temporal BYY Encoding, Markovian State Spaces, and Space Dimension Determination
, 2004
Cited by 7 (7 self)
As a complement to the temporal coding approaches of the current main stream, this paper studies Markovian state space temporal models from the perspective of temporal Bayesian Ying Yang (BYY) learning. It offers both new insights and new results, not only on the discrete-state hidden Markov model and its extensions but also on continuous-state linear state spaces and their extensions. In particular, it presents a new learning mechanism that allows the state number or the dimension of the state space to be selected either automatically during adaptive learning, or subsequently after learning via model selection criteria obtained from this mechanism. Experiments are demonstrated to show how the proposed approach works.
A class of logistictype discriminant functions
 In revision for Biometrika
, 2000
Cited by 5 (1 self)
In two-group discriminant analysis, the Neyman-Pearson lemma establishes that the ROC curve for an arbitrary linear function is everywhere below the ROC curve for the true likelihood ratio. The weighted area between these two curves can be used as a risk function for finding good discriminant functions. The weight function corresponds to the objective of the analysis, for example to minimize the expected cost of misclassification, or to maximize the area under the ROC curve. The resulting discriminant functions can be estimated by iteratively reweighted logistic regression. We investigate some asymptotic properties in the "near-logistic" setting, where we assume the covariates have been chosen such that a linear function gives a reasonable (but not necessarily exact) approximation to the true log likelihood ratio. Some examples are discussed, including a study of medical diagnosis in breast cytology. Some key words: Discriminant analysis; Logistic regression; Neyman-Pearson lemma; ROC curves.
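The risk function described above (the area between the two ROC curves) can be sketched for the unweighted case using the Mann-Whitney form of the AUC; function names are illustrative:

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney statistic:
    P(score_pos > score_neg) + 0.5 * P(tie), over all pairs."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def roc_area_risk(lr_pos, lr_neg, f_pos, f_neg):
    """Unweighted area between the ROC curve of the true likelihood
    ratio (scores lr_*) and that of a candidate discriminant (scores
    f_*).  Non-negative by the Neyman-Pearson lemma."""
    return auc(lr_pos, lr_neg) - auc(f_pos, f_neg)
```

Minimising this area difference over linear candidates is the spirit of the paper's risk; the paper additionally weights the area by the analysis objective, which this sketch omits.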