Instance-based learning algorithms - Machine Learning, 1991
Cited by 897 (18 self)
Abstract. Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instance-based learning, that generates classification predictions using only specific instances. Instance-based learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storage-reducing algorithm performs well on several real-world databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noise-tolerant decision tree algorithm.
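The storage-reducing idea can be sketched in a few lines. This is a minimal sketch, assuming numeric features, Euclidean distance, and the rule that a training instance is stored only when the current memory misclassifies it (how the storage-reducing variant is commonly summarized). The function names are hypothetical, and the noise-filtering significance test mentioned above is not included.

```python
import math
from typing import List, Sequence, Tuple

# An instance is (feature vector, class label).
Instance = Tuple[Sequence[float], str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict(memory: List[Instance], x: Sequence[float]) -> str:
    """1-nearest-neighbor prediction using only the stored instances."""
    _, label = min(memory, key=lambda inst: euclidean(inst[0], x))
    return label


def train_storage_reducing(stream: List[Instance]) -> List[Instance]:
    """Process training instances one at a time (incrementally) and store
    an instance only if the current memory misclassifies it."""
    memory: List[Instance] = []
    for x, y in stream:
        if not memory or predict(memory, x) != y:
            memory.append((x, y))
    return memory


stream = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"),
          ((7.0,), "b"), ((8.0,), "b"), ((9.0,), "b"), ((10.0,), "b")]
memory = train_storage_reducing(stream)
print(len(memory), predict(memory, (2.5,)), predict(memory, (9.5,)))  # 2 a b
```

Only two of the eight instances are kept, yet both test points are still classified correctly; the price is that any noisy instance that happens to be misclassified is also stored, which is why the abstract adds a significance test.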
A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features - Machine Learning, 1993
Cited by 249 (3 self)
In the past, nearest neighbor algorithms for learning from examples have worked best in domains in which all features had numeric values. In such domains, the examples can be treated as points and distance metrics can use standard definitions. In symbolic domains, a more sophisticated treatment of the feature space is required. We introduce a nearest neighbor algorithm for learning in domains with symbolic features. Our algorithm calculates distance tables that allow it to produce real-valued distances between instances, and attaches weights to the instances to further modify the structure of feature space. We show that this technique produces excellent classification accuracy on three problems that have been studied by machine learning researchers: predicting protein secondary structure, identifying DNA promoter sequences, and pronouncing English text. Direct experimental comparisons with the other learning algorithms show that our nearest neighbor algorithm is comparable or superior ...
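A distance table for a symbolic feature can be built from class statistics: two values are treated as close when they occur with similar class distributions. The sketch below follows the spirit of the value difference metric. The exponent `k`, summing value distances across features, and the way an instance weight scales the result are assumptions, not the paper's exact definitions, and all names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Table = Dict[Tuple[str, str], float]


def value_difference_table(column: Sequence[str], labels: Sequence[str],
                           k: int = 1) -> Table:
    """Distance between every pair of values of one symbolic feature:
    sum over classes of |P(class | v1) - P(class | v2)| ** k."""
    classes = sorted(set(labels))
    counts: Dict[str, Counter] = defaultdict(Counter)
    for value, label in zip(column, labels):
        counts[value][label] += 1
    table: Table = {(v, v): 0.0 for v in counts}
    for v1, v2 in combinations(counts, 2):
        n1 = sum(counts[v1].values())
        n2 = sum(counts[v2].values())
        d = sum(abs(counts[v1][c] / n1 - counts[v2][c] / n2) ** k for c in classes)
        table[(v1, v2)] = table[(v2, v1)] = float(d)
    return table


def instance_distance(tables: List[Table], x: Sequence[str], y: Sequence[str],
                      weight_y: float = 1.0) -> float:
    """Sum of per-feature value distances, scaled by the stored instance's
    weight (assumed form; weight 1.0 leaves the distance unchanged)."""
    return weight_y * sum(t[(a, b)] for t, a, b in zip(tables, x, y))


colors = ["red", "red", "blue", "blue", "green"]
labels = ["+", "+", "-", "-", "+"]
table = value_difference_table(colors, labels)
print(table[("red", "green")], table[("red", "blue")])  # 0.0 2.0
```

Here "red" and "green" end up at distance 0 because both occur only with class "+", something an overlap (match/mismatch) metric cannot express. A value never seen in training has no table entry, so a real system would need a fallback.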
Data Mining: Research Trends, Challenges, and Applications - in Rough Sets and Data Mining: Analysis of Imprecise Data, 1997
Cited by 14 (7 self)
Data mining is an interdisciplinary research area spanning several disciplines such as database systems, machine learning, intelligent information systems, statistics, and expert systems. Data mining has evolved into an important and active area of research because of theoretical challenges and practical applications associated with the problem of discovering (or extracting) interesting and previously unknown knowledge from very large real-world databases. Many aspects of data mining have been investigated in several related fields. A unique and important aspect of the problem is the need to extend these studies to account for the nature of the contents of real-world databases. In this chapter, we discuss the theory and foundational issues in data mining, describe data mining methods and algorithms, and review data mining applications. Since a major focus of this book is on rough sets and their applications to database mining, one full section is devoted to summari...
Case-based reasoning: an overview - AI Communications, 1997
Cited by 10 (0 self)
Abstract. An important step in the solution of a target problem in case-based reasoning (CBR) is the retrieval of similar previous cases that can be used to solve the target problem. We review a selection of papers from the CBR literature on aspects of retrieval, such as approaches to the assessment of surface and structural similarity and techniques for automating the construction and maintenance of similarity measures. We also examine a number of retrieval techniques that have been developed to address the limitations of retrieval based purely on similarity.
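Similarity-based retrieval of the kind reviewed above is often implemented as a weighted average of per-attribute ("local") similarities. The sketch below covers surface similarity only, assuming numeric attributes and a linear local similarity normalized by each attribute's range; the attribute names, weights and ranges are hypothetical, and structural similarity and learned measures are outside its scope.

```python
from typing import Any, Dict, List

Case = Dict[str, Any]


def local_similarity(a: float, b: float, value_range: float) -> float:
    """1.0 for identical values, falling linearly to 0.0 at the attribute's full range."""
    return 1.0 - abs(a - b) / value_range


def global_similarity(target: Case, case: Case, weights: Dict[str, float],
                      ranges: Dict[str, float]) -> float:
    """Weighted average of local similarities over the weighted attributes."""
    total = sum(weights.values())
    return sum(w * local_similarity(target[f], case[f], ranges[f])
               for f, w in weights.items()) / total


def retrieve(target: Case, case_base: List[Case], weights: Dict[str, float],
             ranges: Dict[str, float], k: int = 1) -> List[Case]:
    """Return the k stored cases most similar to the target problem."""
    return sorted(case_base,
                  key=lambda c: global_similarity(target, c, weights, ranges),
                  reverse=True)[:k]


case_base = [
    {"price": 20, "size": 3, "solution": "A"},
    {"price": 80, "size": 9, "solution": "B"},
]
best = retrieve({"price": 25, "size": 4}, case_base,
                weights={"price": 1.0, "size": 1.0},
                ranges={"price": 100.0, "size": 10.0})
print(best[0]["solution"])  # A
```

The "solution" field is ignored by the similarity computation and simply travels with the retrieved case, which is what lets CBR reuse it for the target problem.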
Best-Case Results for Nearest Neighbor Learning - IEEE Trans. Pattern Anal. Machine Intell., 1995
Cited by 10 (0 self)
In this paper we propose a theoretical model for analysis of classification methods, in which the teacher knows the classification algorithm and chooses examples in the best way possible. We apply this model to the nearest-neighbor learning algorithm, and develop upper and lower bounds on sample complexity for several different concept classes. For some concept classes, the sample complexity turns out to be exponential even in this best-case model, which implies that the concept class is inherently difficult for the nearest-neighbor algorithm. We identify several geometric properties that make learning certain concepts relatively easy. Finally we discuss the relation of our work to helpful teacher models, its application to decision-tree learning algorithms, and some of its implications for current experimental work.
Keywords: machine learning, nearest-neighbor, geometric concepts.
I. Introduction
Since their introduction in the 1950s, nearest-neighbor (NN) classifiers have be...
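The intuition behind the best-case (helpful teacher) model shows up on the simplest concept class, a threshold on a line. This illustration is ours, not taken from the paper: the threshold 0.5, the evaluation grid and the example positions are arbitrary assumptions. A teacher who knows that the learner is 1-NN places one example on each side of the boundary, symmetric about it, so the 1-NN decision boundary (their midpoint) coincides with the concept boundary.

```python
from typing import List, Tuple

Example = Tuple[float, bool]


def nn_predict(examples: List[Example], x: float) -> bool:
    """1-nearest-neighbor prediction in one dimension."""
    return min(examples, key=lambda e: abs(e[0] - x))[1]


def count_errors(examples: List[Example], threshold: float, grid: List[float]) -> int:
    """Count grid points where 1-NN disagrees with the concept x > threshold."""
    return sum(nn_predict(examples, x) != (x > threshold) for x in grid)


# Evaluation points in [0, 1] that never fall exactly on the boundary 0.5.
grid = [(i + 0.5) / 200 for i in range(200)]

# Teacher's choice: symmetric about the threshold, so the midpoint is 0.5.
teacher = [(0.25, False), (0.75, True)]
# Two examples chosen without regard to the boundary: their midpoint is 0.4.
careless = [(0.1, False), (0.7, True)]

print(count_errors(teacher, 0.5, grid), count_errors(careless, 0.5, grid))  # 0 20
```

Two well-chosen examples suffice here regardless of how finely the line is sampled; the paper's point is that for some concept classes even such optimal choices require exponentially many examples.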
Improving Classification Methods via Feature Selection - Machine Learning, 1992
Cited by 10 (0 self)
We have been experimenting with methods for improving the speed and accuracy of machine learning programs on large data sets, especially those in which the data objects have large numbers of features. The development of automated solutions to this problem is crucial for the success of future data collection efforts, in which hundreds of millions of objects will need to be classified on-line. Accuracies must be very high in order to ensure that objects are not stored with the wrong labels or in the wrong databases. In addition, methods should be able to identify the most relevant features to use for a particular classification task. We have developed feature selection methods and classification algorithms for application on large, real-world databases. Our feature selection algorithm searches a small fraction of the possible subsets of features, and it often finds optimal or near-optimal classifiers. By combining this algorithm with machine learning methods, we have been able to elimina...
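The abstract does not spell out its subset search, so the sketch below substitutes a common one: greedy forward selection scored by leave-one-out 1-NN accuracy, which evaluates on the order of d squared subsets instead of all 2^d. It illustrates searching a small fraction of the possible subsets; it is not the paper's algorithm, and the names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def loo_accuracy(X: Sequence[Sequence[float]], y: Sequence[str],
                 features: List[int]) -> float:
    """Leave-one-out accuracy of 1-NN restricted to the given feature indices."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in features))
        correct += y[nearest] == y[i]
    return correct / len(X)


def forward_select(X: Sequence[Sequence[float]],
                   y: Sequence[str]) -> Tuple[List[int], float]:
    """Greedily add the feature that most improves accuracy; stop when none helps."""
    selected: List[int] = []
    best = 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        score, f = max((loo_accuracy(X, y, selected + [f]), f) for f in remaining)
        if score <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best


# Feature 0 separates the classes; feature 1 is noise.
X = [(0, 5), (1, 1), (2, 9), (10, 4), (11, 0), (12, 8)]
y = ["a", "a", "a", "b", "b", "b"]
print(forward_select(X, y))  # ([0], 1.0)
```

Dropping the noisy feature matters for nearest neighbor in particular, because every feature contributes equally to the distance whether or not it is relevant.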
Self-Organizing Cases to Find Paradigms, 1999
Cited by 8 (6 self)
Case-based information systems can be seen as lazy machine learning algorithms; they select a number of training instances and then classify each unseen case with the class of the most similar stored instance. One of the main disadvantages of these systems is the large number of patterns retained. In this paper, a new method for extracting just a small set of paradigms from a set of training examples is presented. Additionally, we provide the set of attributes describing the representative examples that are relevant for classification purposes. Our algorithm computes the Kohonen self-organizing maps associated with the training set and then computes the coverage of each map node. Finally, a heuristic procedure selects both the paradigms and the dimensions (or attributes) to be considered when measuring similarity in future classification tasks.
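A rough picture of "map, then coverage, then paradigms" is sketched below. It is a minimal sketch, assuming a one-dimensional chain of nodes, a Gaussian neighborhood, linearly decaying learning rate and radius, and the rule that each covering node contributes the training instance nearest to it as a paradigm. These choices, the function names and the synthetic data are assumptions; the paper's heuristic, including its attribute selection step, is not reproduced.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_node(nodes: List[Vector], x: Sequence[float]) -> int:
    return min(range(len(nodes)), key=lambda k: dist2(nodes[k], x))


def train_som(data: List[Vector], n_nodes: int, epochs: int = 30,
              lr0: float = 0.5, seed: int = 0) -> List[Vector]:
    """Train a 1-D Kohonen map: each sample pulls its best-matching node and,
    more weakly, that node's neighbors along the chain."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]  # copies of samples
    radius0 = max(1.0, n_nodes / 2)
    total_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps          # progress in [0, 1)
            lr = lr0 * (1 - frac)              # learning rate decays linearly
            radius = radius0 * (1 - frac)      # neighborhood shrinks but stays > 0
            bmu = best_matching_node(nodes, x)
            for k, node in enumerate(nodes):
                h = math.exp(-((k - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(node)):
                    node[d] += lr * h * (x[d] - node[d])
            step += 1
    return nodes


def paradigms(data: List[Vector], labels: List[str], nodes: List[Vector],
              min_cover: int = 1) -> List[Tuple[Vector, str, int]]:
    """For every node covering at least min_cover instances, keep the covered
    instance closest to the node, with its label and the node's coverage."""
    covered: Dict[int, List[int]] = {k: [] for k in range(len(nodes))}
    for i, x in enumerate(data):
        covered[best_matching_node(nodes, x)].append(i)
    result = []
    for k, idxs in covered.items():
        if len(idxs) >= min_cover:
            best = min(idxs, key=lambda i: dist2(nodes[k], data[i]))
            result.append((data[best], labels[best], len(idxs)))
    return result


rng = random.Random(1)
data = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(10)]
        + [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(10)])
labels = ["a"] * 10 + ["b"] * 10
nodes = train_som(data, n_nodes=4)
found = paradigms(data, labels, nodes)
for point, label, coverage in found:
    print(label, coverage)
```

At most four of the twenty instances survive as paradigms; raising `min_cover` discards nodes that cover only a few, possibly atypical, instances.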
Applications of Machine Learning in Information Retrieval, 1997
Cited by 8 (0 self)
Information retrieval systems provide access to collections of thousands, or millions, of documents, from which users can retrieve any individual document by providing an appropriate description. Typically, users iteratively refine the descriptions they provide to satisfy their needs, and retrieval systems can utilize user feedback on selected documents to indicate the accuracy of ...
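One classic way a retrieval system can use feedback on selected documents is Rocchio's query modification, sketched below. The truncated abstract does not say which feedback method the paper covers, so Rocchio is our choice of example; the weights alpha=1.0, beta=0.75 and gamma=0.15 are commonly quoted defaults treated here as assumptions, and the term vectors are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight


def centroid(docs: List[Vector]) -> Vector:
    """Average term vector of a list of documents."""
    total: Dict[str, float] = defaultdict(float)
    for doc in docs:
        for term, w in doc.items():
            total[term] += w / len(docs)
    return dict(total)


def rocchio(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the query toward documents the user marked relevant and away from
    those marked non-relevant; terms whose weight drops to <= 0 are removed."""
    new: Dict[str, float] = defaultdict(float)
    for term, w in query.items():
        new[term] += alpha * w
    if relevant:
        for term, w in centroid(relevant).items():
            new[term] += beta * w
    if nonrelevant:
        for term, w in centroid(nonrelevant).items():
            new[term] -= gamma * w
    return {term: w for term, w in new.items() if w > 0}


query = {"jaguar": 1.0}
new = rocchio(query,
              relevant=[{"jaguar": 1.0, "cat": 1.0}],
              nonrelevant=[{"jaguar": 1.0, "car": 1.0}])
print(new)
```

After one round of feedback the query gains the term "cat" from the relevant document, while "car" from the non-relevant one receives a negative weight and is dropped.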
The State of Rough Sets for Database Mining Applications - San Jose State University, 1995
Cited by 4 (3 self)
The database mining problem is often cited as one of the most promising research topics in the fields of database systems and machine learning. Although many available machine learning algorithms are potentially applicable, real-world databases pose additional difficulties, partly due to the nature of their contents. In this article, we describe the characteristic features of the database mining problem, a subset of data mining queries, and the approaches for designing a database mining environment. Then, in that context, we summarize the state of rough sets and present future directions.
1 Introduction
It is estimated that the amount of information in the world doubles every 20 months [10]; as a result, some scientific, government and corporate information systems are being overwhelmed by a flood of data that are routinely generated and stored. These massive amounts of data are beyond human experts' ability to analyze, though they contain a potential gold mine of valuable information. U...
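For readers new to rough sets, a minimal sketch of the core construction may help: Pawlak's lower and upper approximations of a target set under the indiscernibility relation induced by a chosen set of attributes. This is the textbook definition rather than anything specific to the article, and the patient table and attribute names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Table = Dict[Hashable, Dict[str, str]]


def indiscernibility_classes(table: Table, attributes: List[str]) -> List[Set[Hashable]]:
    """Group objects that have identical values on the chosen attributes."""
    groups: Dict[tuple, Set[Hashable]] = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attributes)].add(obj)
    return list(groups.values())


def approximations(table: Table, attributes: List[str],
                   target: Set[Hashable]) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Lower approximation: classes entirely inside target (certainly in it).
    Upper approximation: classes overlapping target (possibly in it)."""
    lower: Set[Hashable] = set()
    upper: Set[Hashable] = set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper


patients = {
    1: {"temp": "high", "cough": "yes"},
    2: {"temp": "high", "cough": "yes"},
    3: {"temp": "normal", "cough": "no"},
    4: {"temp": "high", "cough": "no"},
}
flu = {1, 4}
lower, upper = approximations(patients, ["temp", "cough"], flu)
print(sorted(lower), sorted(upper), sorted(upper - lower))  # [4] [1, 2, 4] [1, 2]
```

Patients 1 and 2 are indiscernible on these attributes but only one has flu, so both fall in the boundary region: the attributes cannot decide the concept for them, which is the kind of imprecision rough sets make explicit.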
SHAPE: A Machine Learning System from Examples, 1995
Cited by 4 (3 self)
This paper presents a new machine learning system called SHAPE. The input data are vectors of properties (represented as attribute-value pairs) that describe individual cases, examples, or observations in a given world. Each case belongs to exactly one of a set of classes, and the aim is to produce a collection of decision rules that conclude the class from the observed properties.
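SHAPE's own rule-induction procedure is not described in this abstract. As a stand-in that shows what "decision rules over attribute-value cases" look like, the sketch below implements Holte's 1R, which maps each value of a single best attribute to its majority class. It is not SHAPE's algorithm, and the weather data is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Case = Dict[str, str]


def one_r(cases: List[Case], labels: List[str]) -> Tuple[str, Dict[str, str], int]:
    """For each attribute, map every value to its majority class; keep the
    attribute whose rules make the fewest errors on the training cases."""
    best = None
    for attr in cases[0]:
        by_value: Dict[str, Counter] = defaultdict(Counter)
        for case, label in zip(cases, labels):
            by_value[case[attr]][label] += 1
        rules = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


cases = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
attr, rules, errors = one_r(cases, labels)
for value, cls in rules.items():
    print(f"if {attr} = {value} then class = {cls}")
```

Each printed line is a decision rule concluding the class from one observed property; systems such as SHAPE aim for richer rule sets that can combine several attribute-value conditions.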

