Results 11-20 of 217
Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets
 Journal of Artificial Intelligence Research
, 1997
Cited by 122 (19 self)
This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of records in the dataset and loglinear in the number of nonzero entries in the contingency table. We provide a very sparse data structure, the ADtree, to minimize memory use. We provide analytical worst-case bounds for this structure for several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not bothering to expand the tree fully near its...
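To make the counting task concrete, here is a minimal sketch of the naive baseline that the abstract aims to speed up: a contingency table and a conjunctive-query count, both computed by a full pass over the records. This is not the ADtree; the function names and the toy records are illustrative assumptions. The only idea it shares with the abstract is that zero counts are never stored.

```python
from collections import Counter

def contingency_table(records, attrs):
    """Count each observed combination of values for the chosen attributes.

    Only combinations that actually occur are stored, so zero counts
    take no memory (a Counter reports 0 for absent keys).
    """
    return Counter(tuple(r[a] for a in attrs) for r in records)

def count_matching(records, query):
    """Count records that satisfy every attribute=value pair in `query`."""
    return sum(all(r[a] == v for a, v in query.items()) for r in records)

# Hypothetical toy dataset, for illustration only.
records = [
    {"colour": "red", "size": "small"},
    {"colour": "red", "size": "large"},
    {"colour": "blue", "size": "small"},
    {"colour": "red", "size": "small"},
]

table = contingency_table(records, ["colour", "size"])
red_count = count_matching(records, {"colour": "red"})
```

Both functions cost time proportional to the number of records, which is the dependence the paper's cached statistics are designed to remove.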
Bayesian Models for Keyhole Plan Recognition in an Adventure Game
, 1998
Cited by 118 (10 self)
We present an approach to keyhole plan recognition which uses a dynamic belief (Bayesian) network to represent features of the domain that are needed to identify users' plans and goals. The application domain is a Multi-User Dungeon adventure game with thousands of possible actions and locations. We propose several network structures which represent the relations in the domain to varying extents, and compare their predictive power for predicting a user's current goal, next action and next location. The conditional probability distributions for each network are learned during a training phase, which dynamically builds these probabilities from observations of user behaviour. This approach allows the use of incomplete, sparse and noisy data during both training and testing. We then apply simple abstraction and learning techniques in order to speed up the performance of the most promising dynamic belief networks without a significant change in the accuracy of goal predictions. Our experi...
Efficient Memory-based Learning for Robot Control
, 1990
Cited by 108 (2 self)
This dissertation is about the application of machine learning to robot control. A system which has no initial model of the robot/world dynamics should be able to construct such a model using data received through its sensors, an approach which is formalized here as the SAB (State-Action-Behaviour) control cycle. A method of learning is presented in which all the experiences in the lifetime of the robot are explicitly remembered. The experiences are stored in a manner which permits fast recall of the closest previous experience to any new situation, thus permitting very quick predictions of the effects of proposed actions and, given a goal behaviour, permitting fast generation of a candidate action. The learning can take place in high-dimensional nonlinear control spaces with real-valued ranges of variables. Furthermore, the method avoids a number of shortcomings of earlier learning methods in which the controller can become trapped in inadequate performance which does not improve. Also considered is how the system is made resistant to noisy inputs and how it adapts to environmental changes. A well-founded mechanism for choosing actions is introduced which solves the experiment/perform dilemma for this domain with adequate computational efficiency, and with fast convergence to the goal behaviour. The dissertation explains in detail how the SAB control cycle can be integrated into both low and high complexity tasks. The methods and algorithms are evaluated with numerous experiments using both real and simulated robot domains. The final experiment also illustrates how a compound learning task can be structured into a hierarchy of simple learning tasks.
Numerical Uncertainty Management in User and Student Modeling: An Overview of Systems and Issues
, 1996
Cited by 104 (10 self)
A rapidly growing number of user and student modeling systems have employed numerical techniques for uncertainty management. The three major paradigms are those of Bayesian networks, the Dempster-Shafer theory of evidence, and fuzzy logic. In this overview, each of the first three main sections focuses on one of these paradigms. It first introduces the basic concepts by showing how they can be applied to a relatively simple user modeling problem. It then surveys systems that have applied techniques from the paradigm to user or student modeling, characterizing each system within a common framework. The final main section discusses several aspects of the usability of these techniques for user and student modeling, such as their knowledge engineering requirements, their need for computational resources, and the communicability of their results. Key words: numerical uncertainty management, Bayesian networks, Dempster-Shafer theory, fuzzy logic, user modeling, student modeling 1. Introdu...
Controlling the Complexity of Learning in Logic through Syntactic and Task-Oriented Models
 INDUCTIVE LOGIC PROGRAMMING
, 1992
Cited by 95 (7 self)
Due to the inadequacy of attribute-only representations for many learning problems, there is now a renewed interest in algorithms employing first-order logic or restricted variants thereof as their knowledge representation. In this paper, we give a brief overview of the dimensions along which the complexity of learning in such representations can be controlled. We then present RDT, a model-based learning algorithm for function-free Horn clauses with negation that introduces two new means of complexity control, namely the use of syntactic rule models, and the use of a task-oriented domain topology. We briefly describe some preliminary application results of RDT within the knowledge acquisition system MOBAL, and present directions of further research.
Efficient Progressive Sampling
, 1999
Cited by 91 (9 self)
Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size is rarely obvious. We analyze methods for progressive sampling: starting with small samples and progressively increasing them as long as model accuracy improves. We show that a simple, geometric sampling schedule is efficient in an asymptotic sense. We then explore the notion of optimal efficiency: what is the absolute best sampling schedule? We describe the issues involved in instantiating an "optimally efficient" progressive sampler. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling often is preferable to analyzing all data instances.
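The geometric schedule mentioned in the abstract can be sketched as follows. This is an illustrative reading, not the paper's implementation: the starting size, growth factor, stopping rule (`min_gain`), and the `evaluate` callback (which stands in for "train on n examples and return accuracy") are all assumptions.

```python
def geometric_schedule(n0, factor, n_total):
    """Sample sizes n0, n0*factor, n0*factor^2, ... capped at n_total."""
    if n0 < 1 or factor <= 1:
        raise ValueError("need n0 >= 1 and factor > 1")
    sizes = []
    n = n0
    while n < n_total:
        sizes.append(n)
        # Always grow by at least one so the loop terminates.
        n = max(n + 1, int(n * factor))
    sizes.append(n_total)
    return sizes

def progressive_sample(schedule, evaluate, min_gain=0.0):
    """Walk the schedule until accuracy stops improving by more than min_gain.

    `evaluate(n)` is a hypothetical callback returning model accuracy
    when trained on a sample of size n.
    """
    best_acc = float("-inf")
    best_n = None
    for n in schedule:
        acc = evaluate(n)
        if acc - best_acc <= min_gain:
            break  # no meaningful improvement: stop growing the sample
        best_acc, best_n = acc, n
    return best_n, best_acc

# Toy learning curve that saturates at 400 examples (illustration only).
schedule = geometric_schedule(100, 2, 1000)
chosen_n, chosen_acc = progressive_sample(schedule, lambda n: min(n, 400) / 400)
```

With this toy curve the walk stops once doubling the sample no longer raises accuracy, so most of the data is never touched.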
Using Decision Trees to Improve Case-Based Learning
 In Proceedings of the Tenth International Conference on Machine Learning
, 1993
Cited by 90 (8 self)
This paper shows that decision trees can be used to improve the performance of case-based learning (CBL) systems. We introduce a performance task for machine learning systems called semi-flexible prediction that lies between the classification task performed by decision tree algorithms and the flexible prediction task performed by conceptual clustering systems. In semi-flexible prediction, learning should improve prediction of a specific set of features known a priori rather than a single known feature (as in classification) or an arbitrary set of features (as in conceptual clustering). We describe one such task from natural language processing and present experiments that compare solutions to the problem using decision trees, CBL, and a hybrid approach that combines the two. In the hybrid approach, decision trees are used to specify the features to be included in k-nearest neighbor case retrieval. Results from the experiments show that the hybrid approach outperforms both the decision ...
Error-Correcting Output Codes: A General Method for Improving Multiclass Inductive Learning Programs
 IN PROCEEDINGS OF AAAI-91
, 1991
Cited by 88 (7 self)
Multiclass learning problems involve finding a definition for an unknown function $f(x)$ whose range is a discrete set containing $k > 2$ values (i.e., $k$ "classes"). The definition is acquired by studying large collections of training examples of the form $\langle x_i, f(x_i) \rangle$. Existing approaches to this problem include (a) direct application of multiclass algorithms such as the decision-tree algorithms ID3 and CART, (b) application of binary concept learning algorithms to learn individual binary functions for each of the $k$ classes, and (c) application of binary concept learning algorithms with distributed output codes such as those employed by Sejnowski and Rosenberg in the NETtalk system. This paper compares these three approaches to a new technique in which BCH error-correcting codes are employed as a distributed output representation. We show that these output representations improve the performance of ID3 on the NETtalk task and of backpropagation on an isolated-letter speech-recognition t...
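The decoding side of an error-correcting output code can be sketched as below. Each class is assigned a binary codeword, one binary learner predicts each bit, and the predicted bit string is mapped to the class whose codeword is nearest in Hamming distance. The four 7-bit codewords here are hand-picked for illustration (they are not BCH codes, and the class labels are made up); the binary learners themselves are omitted.

```python
from itertools import combinations

# Hypothetical codebook: four classes, 7 bits each, chosen by hand so that
# every pair of codewords differs in 4 positions.
CODEWORDS = {
    "a": (0, 0, 0, 0, 0, 0, 0),
    "b": (1, 1, 1, 1, 0, 0, 0),
    "c": (1, 1, 0, 0, 1, 1, 0),
    "d": (1, 0, 1, 0, 1, 0, 1),
}

def hamming(u, v):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(u, v))

def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODEWORDS, key=lambda c: hamming(CODEWORDS[c], bits))

def min_distance():
    """Smallest Hamming distance between any two codewords in the codebook."""
    return min(hamming(CODEWORDS[p], CODEWORDS[q])
               for p, q in combinations(CODEWORDS, 2))

# Codeword for "b" with its third bit flipped, as if one binary learner erred.
noisy = (1, 1, 0, 1, 0, 0, 0)
predicted = decode(noisy)
```

A codebook with minimum distance d can absorb up to floor((d - 1) / 2) wrong bit predictions, so this one tolerates a single erring binary learner.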
Efficient Locally Weighted Polynomial Regression Predictions
 In Proceedings of the 1997 International Machine Learning Conference
Cited by 79 (11 self)
Locally weighted polynomial regression (LWPR) is a popular instance-based algorithm for learning continuous nonlinear mappings. For more than two or three inputs and for more than a few thousand datapoints the computational expense of predictions is daunting. We discuss drawbacks with previous approaches to dealing with this problem, and present a new algorithm based on a multiresolution search of a quickly-constructible augmented kd-tree. Without needing to rebuild the tree, we can make fast predictions with arbitrary local weighting functions, arbitrary kernel widths and arbitrary queries. The paper begins with a new, faster, algorithm for exact LWPR predictions. Next we introduce an approximation that achieves up to a two-orders-of-magnitude speedup with negligible accuracy losses. Increasing a certain approximation parameter achieves greater speedups still, but with a correspondingly larger accuracy degradation. This is nevertheless useful during operations such as the early stages...
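For orientation, the brute-force prediction that the abstract describes as expensive can be sketched in one dimension as follows. This is a plain locally weighted linear fit that touches every datapoint per query; it does not use the kd-tree or the approximation from the paper. The Gaussian kernel, the `width` parameter, and the function name are assumptions made for the example.

```python
import math

def lwr_predict(xs, ys, query, width):
    """Locally weighted linear regression prediction at `query` (1-D inputs).

    Each datapoint is weighted by a Gaussian kernel centred on the query,
    then a weighted least-squares line is fitted and evaluated at the query.
    """
    w = [math.exp(-((x - query) / width) ** 2) for x in xs]
    sw = sum(w)
    # Weighted means of inputs and outputs.
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    # Weighted variance and covariance give the local slope.
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (query - mx)

# Toy data on an exact line y = 2x + 1 (illustration only).
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
prediction = lwr_predict(xs, ys, 3.5, 2.0)
```

Every query costs a full pass over the data, which is why the cost grows quickly with dataset size.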
Learning at the Knowledge Level
, 1986
Cited by 73 (3 self)
When Newell introduced the concept of the knowledge level as a useful level of description for computer systems, he focused on the representation of knowledge. This paper applies the knowledge level notion to the problem of knowledge acquisition. Two interesting issues arise. First, some existing machine learning programs appear to be completely static when viewed at the knowledge level. These programs improve their performance without changing their "knowledge." Second, the behavior of some other machine learning programs cannot be predicted or described at the knowledge level. These programs take unjustified inductive leaps. The first programs are called symbol level learning (SLL) programs; the second, nondeductive knowledge level learning (NKLL) programs. The paper analyzes both of these classes of learning programs and speculates on the possibility of developing coherent theories of each. A theory of symbol level learning is sketched, and some reasons are presented for believing...