Results 1  10
of
83
Wrappers for feature subset selection
 ARTIFICIAL INTELLIGENCE
, 1997
"... In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a ..."
Abstract

Cited by 1023 (3 self)
 Add to MetaCart
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 564 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linearGaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying RaoBlackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
Mining highspeed data streams
, 2000
"... Categories and Subject ���������� � �¨�������������������������¦���¦����������¡¤�� ¡ � ¡����������������¦¡¤����§�£���� ..."
Abstract

Cited by 288 (10 self)
 Add to MetaCart
Categories and Subject ���������� � �¨�������������������������¦���¦����������¡¤�� ¡ � ¡����������������¦¡¤����§�£����
Locally Weighted Learning for Control
, 1996
"... Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys ways in which locally weighted learning, a type of lazy learning, has been applied by us to control tasks. We ex ..."
Abstract

Cited by 159 (17 self)
 Add to MetaCart
Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys ways in which locally weighted learning, a type of lazy learning, has been applied by us to control tasks. We explain various forms that control tasks can take, and how this affects the choice of learning paradigm. The discussion section explores the interesting impact that explicitly remembering all previous experiences has on the problem of learning to control.
Efficient algorithms for minimizing cross validation error
 In Proceedings of the Eleventh International Conference on Machine Learning
, 1994
"... Model selection is important in many areas of supervised learning. Given a dataset and a set of models for predicting with that dataset, we must choose the model which is expected to best predict future data. In some situations, such as online learning for control of robots or factories, data is che ..."
Abstract

Cited by 128 (6 self)
 Add to MetaCart
Model selection is important in many areas of supervised learning. Given a dataset and a set of models for predicting with that dataset, we must choose the model which is expected to best predict future data. In some situations, such as online learning for control of robots or factories, data is cheap and human expertise costly. Cross validation can then be a highly effective method for automatic model selection. Large scale cross validation search can, however, be computationally expensive. This paper introduces new algorithms to reduce the computational burden of such searches. We show how experimental design methods can achieve this, using a technique similar to a Bayesian version of Kaelbling’s Interval Estimation. Several improvements are then given, including (1) the use of blocking to quickly spot nearidentical models, and (2) schemata search: a new method for quickly finding families of relevant features. Experiments are presented for robot data and noisy synthetic datasets. The new algorithms speed up computation without sacrificing reliability, and in some cases are more reliable than conventional techniques. 1
A Racing Algorithm for Configuring Metaheuristics
, 2002
"... This paper describes a racing procedure for finding, in a limited amount of time, a configuration of a metaheuristic that performs as good as possible on a given instance class of a combinatorial optimization problem. Taking inspiration from methods proposed in the machine learning literature ..."
Abstract

Cited by 117 (34 self)
 Add to MetaCart
This paper describes a racing procedure for finding, in a limited amount of time, a configuration of a metaheuristic that performs as good as possible on a given instance class of a combinatorial optimization problem. Taking inspiration from methods proposed in the machine learning literature for model selection through crossvalidation, we propose a procedure that empirically evaluates a set of candidate configurations by discarding bad ones as soon as statistically sufficient evidence is gathered against them. We empirically evaluate our procedure using as an example the configuration of an ant colony optimization algorithm applied to the traveling salesman problem.
The Power of Decision Tables
 Proceedings of the European Conference on Machine Learning
, 1995
"... . We evaluate the power of decision tables as a hypothesis space for supervised learning algorithms. Decision tables are one of the simplest hypothesis spaces possible, and usually they are easy to understand. Experimental results show that on artificial and realworld domains containing only discre ..."
Abstract

Cited by 100 (5 self)
 Add to MetaCart
. We evaluate the power of decision tables as a hypothesis space for supervised learning algorithms. Decision tables are one of the simplest hypothesis spaces possible, and usually they are easy to understand. Experimental results show that on artificial and realworld domains containing only discrete features, IDTM, an algorithm inducing decision tables, can sometimes outperform stateoftheart algorithms such as C4.5. Surprisingly, performance is quite good on some datasets with continuous features, indicating that many datasets used in machine learning either do not require these features, or that these features have few values. We also describe an incremental method for performing crossvalidation that is applicable to incremental learning algorithms including IDTM. Using incremental crossvalidation, it is possible to crossvalidate a given dataset and IDTM in time that is linear in the number of instances, the number of features, and the number of label values. The time for incre...
Machine learning for sequential data: A review
 Structural, Syntactic, and Statistical Pattern Recognition
, 2002
"... Abstract. Statistical learning problems in many fields involve sequential data. This paper formalizes the principal learning tasks and describes the methods that have been developed within the machine learning research community for addressing these problems. These methods include sliding window met ..."
Abstract

Cited by 82 (1 self)
 Add to MetaCart
Abstract. Statistical learning problems in many fields involve sequential data. This paper formalizes the principal learning tasks and describes the methods that have been developed within the machine learning research community for addressing these problems. These methods include sliding window methods, recurrent sliding windows, hidden Markov models, conditional random fields, and graph transformer networks. The paper also discusses some open research issues. 1
A General Method for Scaling Up Machine Learning Algorithms and its Application to Clustering
 In Proceedings of the Eighteenth International Conference on Machine Learning
, 2001
"... We propose to scale learning algorithms to arbitrarily large databases by the following method. First derive an upper bound for the learner's loss as a function of the number of examples used in each step of the algorithm. ..."
Abstract

Cited by 64 (3 self)
 Add to MetaCart
We propose to scale learning algorithms to arbitrarily large databases by the following method. First derive an upper bound for the learner's loss as a function of the number of examples used in each step of the algorithm.
Learning One More Thing
, 1994
"... Most research on machine learning has focused on scenarios in which a learner faces a single, isolated learning task. The lifelong learning frameworkassumes instead that the learner encounters a multitude of related learning tasks over its lifetime, providing the opportunity for the transfer of know ..."
Abstract

Cited by 61 (6 self)
 Add to MetaCart
Most research on machine learning has focused on scenarios in which a learner faces a single, isolated learning task. The lifelong learning frameworkassumes instead that the learner encounters a multitude of related learning tasks over its lifetime, providing the opportunity for the transfer of knowledge. This paper studies lifelong learning in the context of binary classification. It presents the invariance approach, in which knowledge is transferred via a learned model of the invariances of the domain. Results on learning to recognize objects from color images demonstrate superior generalization capabilities if invariances are learned and used to bias subsequent learning. This research is sponsored in part by the National Science Foundation under award IRI9313367, and by the Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF, and the Advanced Research Projects Agency (ARPA) under grant number F336159311330. Views and conclusions contained in this doc...