Results 11 - 20 of 31
Improving Backpropagation Learning with Feature Selection
 APPLIED INTELLIGENCE: THE INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE, NEURAL NETWORKS, AND COMPLEX PROBLEM-SOLVING TECHNOLOGIES
, 1996
Abstract

Cited by 6 (0 self)
There exist redundant, irrelevant and noisy data. Using proper data to train a network can speed up training, simplify the learned structure, and improve its performance. A two-phase training algorithm is proposed. In the first phase, the number of input units of the network is determined by using an information-based method. Only those attributes that meet certain criteria for inclusion will be considered as the input to the network. In the second phase, the number of hidden units of the network is selected automatically based on the performance of the network on the training data. One hidden unit is added at a time only if it is necessary. The experimental results show that this new algorithm can achieve a faster learning time, a simpler network and an improved performance.
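The abstract does not spell out the information-based criterion used in the first phase; a common choice for this kind of attribute filter is information gain. A minimal sketch of such a phase-1 filter (the threshold of 0.1 and the toy data are illustrative assumptions, not from the paper) might look like:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Expected entropy reduction from splitting the data on attribute `attr`."""
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder

def select_inputs(rows, labels, threshold=0.1):
    """Phase 1: keep only attributes whose gain clears the threshold;
    these become the network's input units."""
    n_attrs = len(rows[0])
    return [a for a in range(n_attrs) if info_gain(rows, labels, a) > threshold]

# Toy data: attribute 0 predicts the label perfectly, attribute 1 is pure noise.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]
print(select_inputs(rows, labels))  # attribute 0 survives, attribute 1 is dropped
```

Any monotone relevance score would slot into `select_inputs` the same way; the point is only that irrelevant attributes never reach the network.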
Algorithms for approximate minimization of the difference between submodular functions, with applications
, 2012
Abstract

Cited by 5 (5 self)
We extend the work of Narasimhan and Bilmes [30] for minimizing set functions representable as a difference between submodular functions. Similar to [30], our new algorithms are guaranteed to monotonically reduce the objective function at every step. We empirically and theoretically show that the per-iteration cost of our algorithms is much less than [30], and our algorithms can be used to efficiently minimize a difference between submodular functions under various combinatorial constraints, a problem not previously addressed. We provide computational bounds and a hardness result on the multiplicative inapproximability of minimizing the difference between submodular functions. We show, however, that it is possible to give worst-case additive bounds by providing a polynomial-time computable lower bound on the minima. Finally we show how a number of machine learning problems can be modeled as minimizing the difference between submodular functions. We experimentally show the validity of our algorithms by testing them on the problem of feature selection with submodular cost features.
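Algorithms in this family work by replacing f with a tight modular upper bound and g with a tight modular lower bound, then minimizing the resulting modular difference exactly; each such step can only decrease f(X) - g(X). The sketch below illustrates that general bound-and-minimize idea on an invented toy instance (coverage minus a concave-of-cardinality cost); it is not the paper's exact procedure:

```python
import math

# Toy DS instance: f (coverage) and g (concave of cardinality) are both
# submodular, so f - g is a difference of submodular functions.
SETS = ({0, 1}, {1, 2}, {2, 3})
V = [0, 1, 2]

def f(S):
    covered = set()
    for i in S:
        covered |= SETS[i]
    return float(len(covered))

def g(S):
    return 2.0 * math.sqrt(len(S))

def modular_upper_f(X):
    """One standard modular upper bound of submodular f, tight at X:
    m(Y) = f(X) - sum_{j in X\\Y} f(j | X-j) + sum_{j in Y\\X} f(j | empty)."""
    w = {j: (f(X) - f(X - {j})) if j in X else (f({j}) - f(set())) for j in V}
    return w

def modular_lower_g(X):
    """Modular lower bound of submodular g, tight at X, built from a chain
    that lists X's elements first."""
    h, S, prev = {}, set(), g(set())
    for j in sorted(X) + sorted(set(V) - X):
        S = S | {j}
        h[j] = g(S) - prev
        prev = g(S)
    return h

def minimize_ds(X=set(V), eps=1e-12):
    """Iterate: bound f above, bound g below, minimize the modular
    difference exactly (keep elements with negative net weight).
    The objective f(X) - g(X) never increases."""
    while True:
        w, h = modular_upper_f(X), modular_lower_g(X)
        Y = {j for j in V if w[j] - h[j] < 0}  # exact modular minimization
        if f(Y) - g(Y) < f(X) - g(X) - eps:
            X = Y
        else:
            return X

X = minimize_ds()
print(X, f(X) - g(X))
```

On this toy problem the iteration happens to reach a global minimizer; in general only a local optimum and monotone descent are guaranteed, which is exactly why the paper's hardness and additive-bound results matter.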
Using ILP to improve planning in hierarchical reinforcement learning
 The Tenth International Conference, ILP-2000
, 2000
Abstract

Cited by 4 (1 self)
Abstract. Hierarchical reinforcement learning has been proposed as a solution to the problem of scaling up reinforcement learning. The RL-TOPs Hierarchical Reinforcement Learning System is an implementation of this proposal which structures an agent's sensors and actions into various levels of representation and control. Disparity between levels of representation means actions can be misused by the planning algorithm in the system. This paper reports on how ILP was used to bridge these representation gaps and shows empirically how this improved the system's performance. Also discussed are some of the problems encountered when using an ILP system in what is inherently a noisy and incremental domain.
Hierarchical Reinforcement Learning: A Hybrid Approach
, 2002
Abstract

Cited by 3 (0 self)
In this thesis we investigate the relationships between the symbolic and subsymbolic methods used for controlling agents by artificial intelligence, focusing in particular on methods that learn. In light of the strengths and weaknesses of each approach, we propose a hybridisation of symbolic and subsymbolic methods to capitalise on the best features of each. We implement such a hybrid system, called Rachel, which incorporates techniques from Teleo-Reactive Planning, Hierarchical Reinforcement Learning and Inductive Logic Programming. Rachel uses a novel representation of behaviours, Reinforcement-Learnt Teleo-operators (RL-TOPs), which defines the behaviour in terms of its desired consequences but leaves the implementation of the policy to be learnt by reinforcement learning. An RL-TOP is an abstract, symbolic description of the purpose of a behaviour, and is used by Rachel both as a planning operator and as the definition of a reward function by which the behaviour can be learnt. Two new
CLIP4: Hybrid inductive machine learning algorithm that generates inequality rules
, 2004
Abstract

Cited by 2 (1 self)
The paper describes a hybrid inductive machine learning algorithm called CLIP4. The algorithm first partitions data into subsets using a tree structure and then generates production rules only from subsets stored at the leaf nodes. The unique feature of the algorithm is the generation of rules that involve inequalities. The algorithm works with data that have a large number of examples and attributes, can cope with noisy data, and can use numerical, nominal, continuous, and missing-value attributes. The algorithm's flexibility and efficiency are shown on several well-known benchmarking data sets, and the results are compared with other machine learning algorithms. The benchmarking results in each instance show CLIP4's accuracy, CPU time, and rule complexity. CLIP4 has built-in features like tree pruning, methods for partitioning the data (for data with a large number of examples and attributes, and for data containing noise), a data-independent mechanism for dealing with missing values, genetic operators to improve accuracy on small data, and discretization schemes. CLIP4 generates a model of the data that consists of well-generalized rules, and ranks attributes and selectors that can be used for feature selection.
Concept Reliability in Machine Learning
 Proceedings of the Second Midwest Artificial Intelligence and Cognitive Science Society Conference, J. Dinsmore and T. Koschmann (Eds.)
, 1990
Abstract

Cited by 1 (1 self)
Introduction. Much machine learning research addresses inductive learning: learning relationships from a set of examples (Michalski (1986) provides an excellent introduction). For instance, some programs have been used to learn medical diagnostic rules from a database of patients whose diagnoses are known. These programs examine a number of attributes (e.g. age, temperature, and pulse rate) for a set of examples whose classification (e.g. diagnosis) is known. This set of examples is termed a training set. Attribute tests are combined into logical rules which are used to predict the classification (e.g. if (age > 5) and (temperature > 100), then (preliminary-diagnosis = not-normal)). These rules are generally termed concepts. Reliability and induction. The probability that a given concept will accurately classify a training set by chance alone, denoted here as P, is a fundamental cha
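The snippet cuts off before defining P precisely. As an illustration of the kind of quantity involved (not necessarily the paper's exact formula), the probability that a rule guessing each label independently with probability p matches at least k of n training examples is a binomial tail:

```python
from math import comb

def chance_match_prob(n, k, p=0.5):
    """P(at least k of n examples classified correctly by chance),
    assuming independent guesses that are right with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# A rule that fits all 10 of 10 binary-labelled examples is unlikely
# to do so by chance alone:
print(chance_match_prob(10, 10))  # 0.5**10 = 0.0009765625
```

A small value of this probability is evidence that the concept captures a real regularity rather than noise, which is the reliability question the abstract raises.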
Order Effects in Incremental Learning
Abstract

Cited by 1 (0 self)
this paper. We maintain that any viable theory of human learning must be based on this definition, and we will see that many common learning methods satisfy it, though they are seldom presented in these terms. We can loosen our definition somewhat to allow storage of a few competing knowledge structures, or to allow a current structure with a number of possible successors, from which one is then selected. These variations still restrict memory to a manageable size.
An Attribute Weight Setting Method for kNN Based Binary Classification using Quadratic Programming
 In Proceedings of 15th European Conference on Artificial Intelligence (ECAI
, 2002
Abstract

Cited by 1 (1 self)
Abstract. In this paper, we propose a new attribute weight setting method for kNN based classifiers using quadratic programming, which is particularly suitable for binary classification problems. Our method formalises the attribute weight setting problem as a quadratic programming problem and exploits commercial software to calculate attribute weights. Experiments show that our method is quite practical for various problems and can achieve a competitive performance. Another merit of the method is that it can use small training sets.
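The QP formulation and the solver are not described in the snippet, but whatever weights it produces feed into an attribute-weighted distance for kNN. A minimal sketch of that consumer side (the data, weights, and k=1 here are invented for illustration) might be:

```python
from collections import Counter

def weighted_dist(x, y, w):
    """Attribute-weighted Euclidean distance."""
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y)) ** 0.5

def knn_predict(train, labels, w, x, k=1):
    """Classify x by majority vote among its k nearest training points
    under the weighted distance."""
    ranked = sorted(range(len(train)), key=lambda i: weighted_dist(train[i], x, w))
    return Counter(labels[i] for i in ranked[:k]).most_common(1)[0][0]

# Feature 0 carries the class; feature 1 is misleading noise.
train = [(0, 1), (0, 2), (1, 9), (1, 10)]
labels = [0, 0, 1, 1]
query = (0.1, 8.5)
print(knn_predict(train, labels, (1.0, 0.0), query))  # weight on the true feature -> class 0
print(knn_predict(train, labels, (0.0, 1.0), query))  # weight on the noise feature -> class 1
```

The contrast between the two weight vectors is the whole motivation for learning the weights: a bad weighting lets an irrelevant attribute dominate the neighbourhood.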
ARTMAP: Supervised Real-Time Learning and Classification of Nonstationary Data by a Self-Organizing Neural Network
, 1991
Abstract
 Add to MetaCart
Abstract. This article introduces a new neural network architecture, called ARTMAP, that autonomously learns to classify arbitrarily many, arbitrarily ordered vectors into recognition categories based on predictive success. This supervised learning system is built up from a pair of Adaptive Resonance Theory modules (ARTa and ARTb) that are capable of self-organizing stable recognition categories in response to arbitrary sequences of input patterns. During training trials, the ARTa module receives a stream {a^(p)} of input patterns, and ARTb receives a stream {b^(p)} of input patterns, where b^(p) is the correct prediction given a^(p). These ART modules are linked by an associative learning network and an internal controller that ensures autonomous system operation in real time. During test trials, the remaining patterns a^(p) are presented without b^(p), and their predictions at ARTb are compared with b^(p). Tested on a benchmark machine learning database in both on-line and off-line simulations, the ARTMAP system learns orders of magnitude more quickly, efficiently, and accurately than alternative algorithms, and achieves 100% accuracy after training on less than half the input patterns in the database. It achieves these properties by using an internal controller that conjointly maximizes predictive generalization and minimizes predictive error by linking predictive success to category size on a trial-by-trial basis, using only local operations. This computation increases the vigilance parameter ρa of ARTa by the minimal amount needed to correct a predictive error at ARTb. Parameter ρa calibrates the minimum confidence that ARTa
Some Performance Comparisons for Self-Generating Neural Tree
Abstract
The performance of the Self-Generating Neural Tree (SGNT) method is analysed using some public domain data sets and the Monk problems, a de facto standard benchmark set. Comparisons are performed between SGNT and other methods. The results show that the SGNT method is superior to all the unsupervised learning methods and some popular supervised learning methods in both accuracy and speed of learning. 1 INTRODUCTION The SGNN (Self-Generating Neural Network) method proposed in [10, 1] is an unsupervised learning method. It has been applied to different application areas such as image coding, diagnostic expert systems and document/information retrieval systems [8]. In this paper, the performance of a kind of SGNN, Self-Generating Neural Trees (SGNT), is analysed using some public domain data sets and the Monk problems [6], a de facto standard benchmark set for testing supervised/unsupervised learning methods. Comparisons between our method and other (supervised/unsupervised) learning metho...