Results 1 - 10 of 10
Instance-based learning algorithms
Machine Learning, 1991
Cited by 897 (18 self)
Abstract. Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instance-based learning, that generates classification predictions using only specific instances. Instance-based learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storage-reducing algorithm performs well on several real-world databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noise-tolerant decision tree algorithm.
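A minimal Python sketch of the storage-reducing idea, assuming Euclidean distance and a 1-nearest-neighbor rule: instances are processed incrementally, and one is stored only if the instances stored so far misclassify it. This is how the IB2 variant is commonly described; the names `ib2_train` and `nearest_label` are hypothetical, and the significance test of the extended, noise-tolerant algorithm is not shown.

```python
import math
from typing import List, Sequence, Tuple

# (feature vector, class label)
Instance = Tuple[Sequence[float], str]


def nearest_label(stored: List[Instance], x: Sequence[float]) -> str:
    """1-nearest-neighbor prediction using Euclidean distance."""
    _, label = min(stored, key=lambda inst: math.dist(inst[0], x))
    return label


def ib2_train(stream: List[Instance]) -> List[Instance]:
    """Process instances in order; store one only if the current store misclassifies it."""
    stored: List[Instance] = []
    for x, y in stream:
        if not stored or nearest_label(stored, x) != y:
            stored.append((x, y))
    return stored


# Toy stream with two well-separated classes (illustrative data, not from the paper).
stream = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.0), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.0), "b"),
    ((0.0, 0.2), "a"),
]
stored = ib2_train(stream)
print(len(stored), nearest_label(stored, (0.95, 0.9)))  # prints: 2 b
```

Correctly classified instances are discarded, so the store grows mainly near class boundaries. This also suggests why attribute noise hurts such a scheme: noisy instances tend to be misclassified and are therefore stored.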
The omnipresence of case-based reasoning in science and application
Knowledge-Based Systems, 1998
Cited by 26 (0 self)
A surprisingly large number of research disciplines have contributed towards the development of knowledge on lazy problem solving, which is characterized by its storage of ground cases and its demand-driven response to queries. Case-based reasoning (CBR) is an alternative, increasingly popular approach to designing expert systems that implements this form of problem solving. This paper lists pointers to contributions in several related disciplines that offer insights for CBR research. We then outline a small number of Navy applications based on this approach that demonstrate its breadth of applicability. Finally, we list a few successful and failed attempts to apply CBR and offer some predictions on the future roles of CBR in applications.
Prototype Selection for Composite Nearest Neighbor Classifiers
1997
Cited by 22 (1 self)
Combining the predictions of a set of classifiers has been shown to be an effective way to create composite classifiers that are more accurate than any of the component classifiers. Increased accuracy has been shown in a variety of real-world applications, ranging from protein sequence identification to determining the fat content of ground meat. Despite such individual successes, fundamental questions about classifier combination remain unanswered, such as "Can classifiers from any given model class be combined to create a composite classifier with higher accuracy?" or "Is it possible to increase the accuracy of a given classifier by combining its predictions with those of only a small number o...
Applying MDL to Learning Best Model Granularity
1994
Cited by 17 (6 self)
The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends critically on the granularity, for example the choice of precision of the parameters. Too high a precision generally involves modeling accidental noise, and too low a precision may lead to confusing models that should be distinguished. This precision is often determined ad hoc. In MDL, the best model is the one that minimizes the total length of a two-part code for the data set (the description of the model plus the description of the data given the model): this embodies "Occam's Razor." In two quite different experimental settings, the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment, the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Base...
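As an illustrative sketch (not the paper's experimental setup), the two-part code can be made concrete for a toy model: a single mean `mu` transmitted with `bits` bits of precision on an assumed range `[lo, hi)`, with the data encoded under an assumed Gaussian of known `sigma`. The model cost is `bits`; the data cost is the negative log2-likelihood, omitting the constant for discretizing the data because it is identical for every precision. The helper names `quantize`, `data_bits`, and `best_precision` are hypothetical.

```python
import math
from typing import List, Tuple


def quantize(value: float, lo: float, hi: float, bits: int) -> float:
    """Center of the cell containing value when [lo, hi) is split into 2**bits cells."""
    cells = 2 ** bits
    width = (hi - lo) / cells
    idx = min(cells - 1, max(0, int((value - lo) / width)))
    return lo + (idx + 0.5) * width


def data_bits(data: List[float], mu: float, sigma: float) -> float:
    """Negative log2-likelihood under N(mu, sigma**2), omitting the data-discretization constant."""
    return sum(
        0.5 * math.log2(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2 * math.log(2))
        for x in data
    )


def best_precision(data: List[float], lo: float, hi: float, sigma: float,
                   max_bits: int = 16) -> Tuple[int, float]:
    """Return (bits, total code length) minimizing model bits + data bits."""
    mean = sum(data) / len(data)
    best = (0, math.inf)
    for bits in range(max_bits + 1):
        mu = quantize(mean, lo, hi, bits)
        total = bits + data_bits(data, mu, sigma)  # two-part code length
        if total < best[1]:
            best = (bits, total)
    return best


# Illustrative data with sample mean 2.7 (not from the paper).
data = [2.7 + 0.1 * ((i % 5) - 2) for i in range(50)]
bits, total = best_precision(data, lo=-8.0, hi=8.0, sigma=1.0)
print(bits)  # 5
```

With too few bits, `mu` lands far from the sample mean and the data cost balloons; beyond a point, each extra bit of precision costs one bit but buys almost no likelihood. The total therefore bottoms out at an intermediate precision, a toy version of the too-high versus too-low precision tradeoff described above.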
Recognition for large sets of handwritten mathematical symbols
In ICDAR, 2005
Cited by 4 (1 self)
Natural and convenient mathematical handwriting recognition requires recognizers for large sets of handwritten symbols. This paper presents a recognition system for such handwritten mathematical symbols. We use a pre-classification strategy, in combination with elastic matching, to improve recognition speed. Elastic matching is a model-based method whose computation grows in proportion to the number of candidate models. To address this, we prune prototypes by examining character features. To this end, we have defined and analyzed different features. By applying these features in an elastic recognition system, we improve recognition speed while maintaining high recognition accuracy.
Prototype Pruning by Feature Extraction for Handwritten Mathematical Symbol Recognition
Cited by 2 (1 self)
Successful mathematical handwriting recognition will require recognizers for large sets of handwritten symbols. This paper presents a recognition system for such handwritten mathematical symbols. The recognizer can provide a component of a handwritten interface for computer algebra systems such as Maple. Large sets of similar symbols present new challenges in the area of handwriting recognition, and we address these here. We use a pre-classification strategy, in combination with elastic matching, to improve recognition speed. Elastic matching is a model-based method whose computation grows in proportion to the number of candidate models. To address this, we prune prototypes by examining character features. To this end, we have defined and analyzed different features. By applying these features in an elastic recognition system, we improve recognition speed while maintaining high recognition accuracy.
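A minimal sketch of pre-classification followed by elastic matching, under stated assumptions: each prototype carries one cheap feature (a stroke count here, chosen only for illustration; the papers define and analyze their own features), prototypes whose feature disagrees with the sample are pruned, and only the survivors are compared with an elastic distance. The distance is a dynamic-time-warping style alignment of point sequences, used as a stand-in for the authors' elastic matcher; `classify` and `elastic_distance` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# (label, stroke count, sampled points); stroke count is a hypothetical pruning feature
Prototype = Tuple[str, int, Sequence[Point]]


def elastic_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """DTW-style alignment cost between two point sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def classify(sample: Sequence[Point], strokes: int, prototypes: List[Prototype]) -> str:
    # Pre-classification: keep prototypes whose cheap feature matches the sample;
    # fall back to all prototypes if none match.
    candidates = [p for p in prototypes if p[1] == strokes] or prototypes
    # Full elastic matching only on the surviving candidates.
    best = min(candidates, key=lambda p: elastic_distance(sample, p[2]))
    return best[0]


prototypes: List[Prototype] = [
    ("1", 1, [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]),
    ("7", 1, [(0.0, 2.0), (1.0, 2.0), (0.0, 0.0)]),
    ("+", 2, [(0.0, 1.0), (2.0, 1.0), (1.0, 0.0), (1.0, 2.0)]),
]
sample = [(0.1, 0.0), (0.0, 1.1), (0.0, 2.0)]
print(classify(sample, 1, prototypes))  # "+" is pruned before any elastic matching
```

Each elastic comparison costs O(nm) for sequences of lengths n and m, so every prototype removed by the cheap feature test saves a full dynamic-programming pass; this is where a pre-classification step gains its speed.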
Learning On-Line Handwritten Characters
In The Minimum Description Length Criterion
Cited by 1 (1 self)
We report on an experiment in learning to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Using elastic matching, the optimal prediction rate is predicted to occur at the parameter values (length of sampling interval) considered most likely by a prior-free form of Bayesian inference, the so-called minimum description length (MDL) principle. Under MDL, the most likely hypothesis is the one that minimizes the sum of the length of the description of the hypothesis and the length of the description of the data relative to the hypothesis. This theory is solidly based on a provably ideal method of inference using Kolmogorov complexity. The soundness of this theoretical approach to learning handwritten characters is supported by the fact that the learning parameters predicted by the theory to achieve the optimal recognition rate coincide with the best parameters found experimentally.
Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms
Machine Learning: Proceedings of the Eleventh International Conference, 1994
With the goal of reducing computational costs without sacrificing accuracy, we describe two algorithms to find sets of prototypes for nearest neighbor classification. Here, the term "prototypes" refers to the reference instances used in a nearest neighbor computation: the instances with respect to which similarity is assessed in order to assign a class to a new data item. Both algorithms rely on stochastic techniques to search the space of sets of prototypes and are simple to implement. The first is a Monte Carlo sampling algorithm; the second applies random mutation hill climbing. On four datasets we show that only three or four prototypes sufficed to give predictive accuracy equal or superior to that of a basic nearest neighbor algorithm whose run-time storage costs were approximately 10 to 200 times greater. Finally, we explain the performance of the sampling algorithm on these datasets in terms of a statistical measure of the extent of clustering displayed by the target class...
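A compact sketch of the second algorithm, random mutation hill climbing, under assumptions the abstract does not spell out: a candidate solution is a list of `k` training-instance indices, a mutation replaces one randomly chosen slot with a random instance, fitness is 1-nearest-neighbor accuracy on the training data, and a mutation is kept when it does not lower that accuracy. The function names `nn_accuracy` and `rmhc_select` are hypothetical, and the Monte Carlo sampling algorithm is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Instance = Tuple[Sequence[float], str]


def nn_accuracy(prototypes: List[Instance], data: List[Instance]) -> float:
    """Fraction of data classified correctly by 1-NN over the prototypes."""
    correct = 0
    for x, y in data:
        _, label = min(prototypes, key=lambda p: math.dist(p[0], x))
        correct += label == y
    return correct / len(data)


def rmhc_select(data: List[Instance], k: int, iterations: int,
                seed: int = 0) -> Tuple[List[Instance], float]:
    rng = random.Random(seed)
    current = rng.sample(range(len(data)), k)  # indices of the chosen prototypes
    best_acc = nn_accuracy([data[i] for i in current], data)
    for _ in range(iterations):
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.randrange(len(data))  # mutate one slot
        acc = nn_accuracy([data[i] for i in candidate], data)
        if acc >= best_acc:  # keep mutations that do not lower accuracy
            current, best_acc = candidate, acc
    return [data[i] for i in current], best_acc


# Two illustrative clusters (not data from the paper).
data: List[Instance] = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"), ((0.1, 0.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 5.0), "b"), ((5.0, 5.1), "b"), ((5.1, 5.1), "b"),
]
selected, acc = rmhc_select(data, k=2, iterations=200)
print(acc, sorted(label for _, label in selected))
```

Accepting ties (`>=`) lets the search drift across plateaus of equal accuracy instead of stalling; whether the original algorithm accepts ties is an assumption here.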
Machine Learning: An Annotated Bibliography for the 1995 AI & . . .
1995
This is a brief annotated bibliography that I wanted to make available to the attendees of my Machine Learning tutorial at the 1995 AI & Statistics Workshop. These slides
On the Recognition of Handwritten . . .
2007
We have examined the problem of machine recognition of handwritten mathematical symbols. We focus on the case where ink-stroke information is available, as it would be collected from a digital pen. We have examined a number of problems: handwriting variant analysis, feature extraction, grouping sets of characters, encoding handwritten mathematical symbols, and building recognizers. One of the difficulties of handwritten mathematical symbol recognition lies in the variability of the symbols. We have performed handwriting variance analysis and identified the factors contributing to the variants. Based on the analysis of 800M data in a format that includes symbol names, start time, end time, x and y coordinates, and pressure, we developed an allomorph set for each mathematical symbol. We then used them to build models. We have examined and selected different features of handwritten mathematical symbols and proposed new algorithms for feature extraction. For well-known features such as loops, our algorithms can

