Results 1 - 10 of 91
Operations for Learning with Graphical Models
- Journal of Artificial Intelligence Research, 1994
Cited by 214 (13 self)
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper conclu...
The Extraction of Refined Rules from Knowledge-Based Neural Networks
- Machine Learning, 1993
Cited by 176 (4 self)
Neural networks, despite their empirically-proven abilities, have been little used for the refinement of existing knowledge because this task requires a three-step process. First, knowledge in some form must be inserted into a neural network. Second, the network must be refined. Third, knowledge must be extracted from the network. We have previously described a method for the first step of this process. Standard neural learning techniques can accomplish the second step. In this paper, we propose and empirically evaluate a method for the final, and possibly most difficult, step. This method efficiently extracts symbolic rules from trained neural networks. The four major results of empirical tests of this method are that the extracted rules: (1) closely reproduce (and can even exceed) the accuracy of the network from which they are extracted; (2) are superior to the rules produced by methods that directly refine symbolic rules; (3) are superior to those produced by previous techniques fo...
PROBEN1 - a set of neural network benchmark problems and benchmarking rules
- 1994
Cited by 156 (0 self)
Proben1 is a collection of problems for neural network learning in the realm of pattern classification and function approximation plus a set of rules and conventions for carrying out benchmark tests with these or similar problems. Proben1 contains 15 data sets from 12 different domains. All datasets represent realistic problems which could be called diagnosis tasks and all but one consist of real world data. The datasets are all presented in the same simple format, using an attribute representation that can directly be used for neural network training. Along with the datasets, Proben1 defines a set of rules for how to conduct and how to document neural network benchmarking. The purpose of the problem and rule collection is to give researchers easy access to data for the evaluation of their algorithms and networks and to make direct comparison of the published results feasible. This report describes the datasets and the benchmarking rules. It also gives some basic performance measures indicating the difficulty of the various problems. These measures can be used as baselines for comparison.
Keeping Neural Networks Simple by Minimizing the Description Length of the Weights
Cited by 113 (1 self)
Supervised neural networks generalize well if there is much less information in the weights than there is in the output vectors of the training cases. So during learning, it is important to keep the weights simple by penalizing the amount of information they contain. The amount of information in a weight can be controlled by adding Gaussian noise and the noise level can be adapted during learning to optimize the trade-off between the expected squared error of the network and the amount of information in the weights. We describe a method of computing the derivatives of the expected squared error and of the amount of information in the noisy weights in a network that contains a layer of non-linear hidden units. Provided the output units are linear, the exact derivatives can be computed efficiently without time-consuming Monte Carlo simulations. The idea of minimizing the amount of information that is required to communicate the weights of a neural network leads to a number of interesting schemes for encoding the weights.
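The claim that expected squared error can be computed exactly, without Monte Carlo simulation, when the output is linear can be illustrated in a much simpler setting than the one the abstract describes. The sketch below assumes a single linear unit with no hidden layer and independent Gaussian noise on each weight; the function names and the toy numbers are hypothetical, not taken from the paper.

```python
import random


def expected_squared_error(mu, sigma, x, target):
    """Closed-form E[(y - target)^2] for y = sum_i w_i * x_i,
    with independent noisy weights w_i ~ N(mu_i, sigma_i^2)."""
    mean_output = sum(m * xi for m, xi in zip(mu, x))
    # Independent Gaussian weights: the output variance is sum_i (sigma_i * x_i)^2.
    output_variance = sum((s * xi) ** 2 for s, xi in zip(sigma, x))
    return (mean_output - target) ** 2 + output_variance


def monte_carlo_squared_error(mu, sigma, x, target, n_samples=20000, seed=0):
    """Sampling estimate of the same quantity, included only for comparison."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        y = sum(rng.gauss(m, s) * xi for m, s, xi in zip(mu, sigma, x))
        total += (y - target) ** 2
    return total / n_samples
```

The closed form is exact and costs one pass over the weights, whereas the sampling estimate only approaches it as the number of samples grows.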
Query Learning with Large Margin Classifiers
- 2000
Cited by 92 (1 self)
The active selection of instances can significantly improve the generalisation performance of a learning machine. Large margin classifiers such as Support Vector Machines classify data using the most informative instances (the support vectors). This makes them natural candidates for instance selection strategies. In this paper we propose an algorithm for the training of Support Vector Machines using instance selection. We give a theoretical justification for the strategy and experimental results on real and artificial data demonstrating its effectiveness. The technique is most efficient when the dataset can be learnt using few support vectors.
1. Introduction
The labour-intensive task of labelling data is a serious bottleneck for many data mining tasks. Often cost or time constraints mean that only a fraction of the available instances can be labeled. For this reason there has been increasing interest in the problem of handling partially labeled datasets. One approach ...
Dynamic Model of Visual Recognition Predicts Neural Response Properties in the Visual Cortex
- Neural Computation, 1995
Cited by 77 (20 self)
In this paper, we describe a hierarchical network model of visual recognition that explains these experimental observations by using a form of the extended Kalman filter as given by the Minimum Description Length (MDL) principle. The model dynamically combines input-driven bottom-up signals with expectation-driven top-down signals to predict current recognition state. Synaptic weights in the model are adapted in a Hebbian manner according to a learning rule also derived from the MDL principle. The resulting prediction/learning scheme can be viewed as implementing a form of the Expectation-Maximization (EM) algorithm. The architecture of the model posits an active computational role for the reciprocal connections between adjoining visual cortical areas in determining neural response properties. In particular, the model demonstrates the possible role of feedback from higher cortical areas in mediating neurophysiological effects due to stimuli from beyond the classical receptive field. Si
Nonlinear Gated Experts for Time Series: Discovering Regimes and Avoiding Overfitting
- 1995
Cited by 74 (5 self)
this paper: ftp://ftp.cs.colorado.edu/pub/Time-Series/MyPapers/experts.ps.Z,
Extracting Comprehensible Models from Trained Neural Networks
- 1996
Cited by 65 (4 self)
To Mom, Dad, and Susan, for their support and encouragement.
Probabilistic Matrix Factorization
Cited by 58 (2 self)
Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7% better than the score of Netflix’s own system.
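As a rough illustration of the basic idea only, the sketch below fits user and item factor vectors to a handful of observed ratings by stochastic gradient descent on squared error with a fixed L2 penalty, which is the usual MAP reading of a factorization model with fixed Gaussian priors. It does not implement the adaptive prior or the constrained variant mentioned in the abstract. The function names, the hyperparameters, and the toy ratings in the usage note are all assumptions made for the example.

```python
import math
import random


def train_pmf(ratings, n_users, n_items, n_factors=3, lr=0.02, reg=0.01,
              epochs=300, seed=0):
    """Fit factors U (users) and V (items) to (user, item, rating) triples.

    Each epoch visits every observed rating once, so the cost per epoch is
    linear in the number of observations.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for user, item, rating in ratings:
            u, v = U[user], V[item]
            err = rating - sum(a * b for a, b in zip(u, v))
            for k in range(n_factors):
                u_k, v_k = u[k], v[k]  # keep old values so both updates use them
                u[k] += lr * (err * v_k - reg * u_k)
                v[k] += lr * (err * u_k - reg * v_k)
    return U, V


def rmse(ratings, U, V):
    """Root mean squared error of the factor model on the given triples."""
    squared = [(r - sum(a * b for a, b in zip(U[u], V[i]))) ** 2
               for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))
```

A hypothetical usage: with `toy = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 3, 5), (3, 2, 1), (3, 3, 4)]`, calling `train_pmf(toy, 4, 4)` and then `rmse(toy, U, V)` reports the training error, which says nothing about held-out performance.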
A framework for combining symbolic and neural learning
- 1992
Cited by 54 (1 self)
This article describes an approach to combining symbolic and connectionist approaches to machine learning. A three-stage framework is presented and the research of several groups is reviewed with respect to this framework. The first stage involves the insertion of symbolic knowledge into neural networks, the second addresses the refinement of this prior knowledge in its neural representation, while the third concerns the extraction of the refined symbolic knowledge. Experimental results and open research issues are discussed.

