Instance-based learning algorithms
- Machine Learning
, 1991
Cited by 897 (18 self)
Abstract. Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instance-based learning, that generates classification predictions using only specific instances. Instance-based learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storage-reducing algorithm performs well on several real-world databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noise-tolerant decision tree algorithm.
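The contrast between storing every instance and storing only some of them can be sketched as follows. This is a minimal illustration, assuming numeric attributes, Euclidean distance, and a single incremental pass over the training stream. It mirrors the IB1 (store everything) and IB2 (store only misclassified instances) variants as they are usually described, omits the significance test of the noise-tolerant extension, and all function names are hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Instance = Tuple[Sequence[float], str]  # (numeric attribute vector, class label)


def nearest_label(stored: List[Instance], x: Sequence[float]) -> str:
    """Predict the class of the stored instance nearest to x (Euclidean distance)."""
    _, label = min(stored, key=lambda inst: math.dist(inst[0], x))
    return label


def ib1_train(stream: List[Instance]) -> List[Instance]:
    """Plain nearest neighbor: store every training instance."""
    return list(stream)


def ib2_train(stream: List[Instance]) -> List[Instance]:
    """Storage-reducing variant (assumed IB2-style): process instances
    incrementally and store one only if the instances stored so far
    would misclassify it. The first instance is always stored."""
    stored: List[Instance] = []
    for x, y in stream:
        if not stored or nearest_label(stored, x) != y:
            stored.append((x, y))
    return stored


stream = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((5.0, 5.0), "b"),
          ((0.2, 0.1), "a"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b")]
compact = ib2_train(stream)
print(len(ib1_train(stream)), len(compact))  # 6 2
print(nearest_label(compact, (4.5, 4.5)))    # b
```

Because this rule stores exactly the instances that are currently misclassified, noisy training instances tend to be kept, which fits the sensitivity to attribute noise noted in the abstract.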
Epsilon-Nets and Simplex Range Queries
, 1986
Cited by 240 (10 self)
We present a new technique for half-space and simplex range query using $O(n)$ space and $O(n^{a})$ query time, where $a < \frac{d(d-1)}{d(d-1)+1} + \gamma$ for all dimensions $d \ge 2$ and $\gamma > 0$. These bounds are better than those previously published for all $d \ge 2$. The technique uses random sampling to build a partition-tree structure. We introduce the concept of an $\varepsilon$-net for an abstract set of ranges to describe the desired result of this random sampling and give necessary and sufficient conditions that a random sample is an $\varepsilon$-net with high probability. We illustrate the application of these ideas to other range query problems.
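The ε-net property is easiest to see in one dimension. A subset N of a finite point set X is an ε-net for a family of ranges if every range containing at least ε|X| points of X also contains a point of N. The sketch below assumes the ranges are closed intervals on the real line (a far simpler range space than the half-spaces and simplices of the paper), that the points are distinct, and that the net is a subset of the points. The names `is_eps_net_intervals` and `net_frequency` are hypothetical, and the sampling loop only estimates empirically how often a random sample is an ε-net; it does not reproduce the paper's conditions.

```python
import random
from typing import Sequence, Set


def is_eps_net_intervals(points: Sequence[float], net: Set[float], eps: float) -> bool:
    """True if every closed interval holding at least eps * len(points) of the
    points also holds a point of `net` (points distinct, net a subset of points)."""
    threshold = eps * len(points)
    longest_gap, run = 0, 0
    for p in sorted(points):
        if p in net:
            run = 0  # a net point ends the current gap
        else:
            run += 1
            longest_gap = max(longest_gap, run)
    # An interval that avoids the net can only cover points inside a single gap,
    # so the net fails exactly when some gap holds at least `threshold` points.
    return longest_gap < threshold


def net_frequency(points: Sequence[float], sample_size: int, eps: float,
                  trials: int = 1000, seed: int = 0) -> float:
    """Fraction of uniform random samples (without replacement) that are eps-nets."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = set(rng.sample(list(points), sample_size))
        if is_eps_net_intervals(points, sample, eps):
            hits += 1
    return hits / trials


pts = [i / 100 for i in range(100)]
for m in (5, 20, 60):
    print(m, net_frequency(pts, m, eps=0.1))
```

Larger samples should make the ε-net property more likely, which is the behavior the random-sampling construction relies on.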
Computational Limitations on Learning from Examples
- Journal of the ACM
, 1988
Cited by 182 (10 self)
Abstract. The computational complexity of learning Boolean concepts from examples is investigated. It is shown for various classes of concept representations that these cannot be learned feasibly in a distribution-free sense unless RP = NP. These classes include (a) disjunctions of two monomials, (b) Boolean threshold functions, and (c) Boolean formulas in which each variable occurs at most once. Relationships between learning of heuristics and finding approximate solutions to NP-hard optimization problems are given.
Categories and Subject Descriptors: F.1.1 [Computation by Abstract Devices]: Models of Computation - relations among models; F.1.2 [Computation by Abstract Devices]: Modes of Computation - probabilistic computation; F.1.3 [Computation by Abstract Devices]: Complexity Classes - reducibility and completeness; I.2.6 [Artificial Intelligence]: Learning - concept learning; induction
On the learnability of Boolean formulae
- In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing
, 1987
Approximation Algorithms For Geometric Problems
, 1995
Cited by 74 (1 self)
8.1 INTRODUCTION
This chapter surveys approximation algorithms for hard geometric problems. The problems we consider typically take inputs that are point sets or polytopes in two- or three-dimensional space and seek optimal constructions (which may be trees, paths, or polytopes). We limit attention to problems for which no polynomial-time exact algorithms are known, and concentrate on bounds for worst-case approximation ratios, especially bounds that depend intrinsically on geometry. We illustrate our intentions with two well-known problems. Given a finite set of points S in the plane, the Euclidean traveling salesman problem asks for the shortest tour of S. Christofides' algorithm achieves approximation ratio 3/2 for this problem, meaning that it always computes a tour of length at most three-halves the length of the optimal tour. This bound depends only on the triangle inequality, so Christofides' algorit...
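The 3/2 guarantee can be checked empirically on tiny instances. The sketch below is a plain textbook rendering of Christofides' algorithm (minimum spanning tree, minimum-weight perfect matching on the odd-degree vertices, Euler circuit, shortcutting), assuming points in the Euclidean plane. The matching step is an exact brute-force search, so the sketch only suits a handful of points, and all function names are hypothetical.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(pts: Sequence[Point], tour: List[int]) -> float:
    """Length of the closed tour that visits pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def mst_edges(pts: Sequence[Point]) -> List[Tuple[int, int]]:
    """Prim's algorithm on the complete Euclidean graph, O(n^2)."""
    n = len(pts)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            d = math.dist(pts[u], pts[v])
            if not in_tree[v] and d < best[v]:
                best[v], parent[v] = d, u
    return edges


def min_perfect_matching(pts, nodes):
    """Exact minimum-weight perfect matching by brute force (tiny inputs only)."""
    if not nodes:
        return 0.0, []
    first, rest = nodes[0], nodes[1:]
    best_cost, best_pairs = math.inf, []
    for i, partner in enumerate(rest):
        cost, pairs = min_perfect_matching(pts, rest[:i] + rest[i + 1:])
        cost += math.dist(pts[first], pts[partner])
        if cost < best_cost:
            best_cost, best_pairs = cost, [(first, partner)] + pairs
    return best_cost, best_pairs


def euler_circuit(n, edges):
    """Hierholzer's algorithm on a connected multigraph whose degrees are all even."""
    adj = [[] for _ in range(n)]
    for idx, (a, b) in enumerate(edges):
        adj[a].append((b, idx))
        adj[b].append((a, idx))
    used = [False] * len(edges)
    stack, circuit = [0], []
    while stack:
        v = stack[-1]
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()  # drop edges already traversed from the other endpoint
        if adj[v]:
            w, idx = adj[v].pop()
            used[idx] = True
            stack.append(w)
        else:
            circuit.append(stack.pop())
    return circuit


def christofides(pts: Sequence[Point]) -> List[int]:
    """Christofides' 3/2-approximation for metric TSP; returns a vertex order."""
    n = len(pts)
    tree = mst_edges(pts)
    degree = [0] * n
    for a, b in tree:
        degree[a] += 1
        degree[b] += 1
    odd = [v for v in range(n) if degree[v] % 2 == 1]  # always an even count
    _, matching = min_perfect_matching(pts, odd)
    tour, seen = [], set()
    for v in euler_circuit(n, tree + matching):
        if v not in seen:  # shortcut repeats; safe by the triangle inequality
            seen.add(v)
            tour.append(v)
    return tour


def optimal_tour_length(pts: Sequence[Point]) -> float:
    """Brute-force optimum, fixing vertex 0 as the start."""
    rest = range(1, len(pts))
    return min(tour_length(pts, [0, *p]) for p in itertools.permutations(rest))


rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(8)]
ratio = tour_length(pts, christofides(pts)) / optimal_tour_length(pts)
print(f"Christofides / optimal = {ratio:.3f}")  # at most 1.5 by the guarantee
```

Exact matching keeps the 3/2 argument intact but is exponential; a practical implementation would substitute a polynomial-time minimum-weight matching algorithm.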
Learning Simple Concepts Under Simple Distributions
- SIAM Journal on Computing
, 1991
Cited by 52 (3 self)
We aim at developing a learning theory where 'simple' concepts are easily learnable. In Valiant's learning model, many concepts turn out to be too hard (e.g., NP-hard) to learn. Relatively few concept classes were shown to be learnable in polynomial time. In daily life, it seems that things we care to learn are usually learnable. To model the intuitive notion of learning more closely, we do not require that the learning algorithm learn (polynomially) under all distributions, but only under all simple distributions. A distribution is simple if it is dominated by an enumerable distrib...
Helly-type theorems and generalized linear programming
- Discrete Comput. Geom
, 1994
Cited by 50 (0 self)
This thesis establishes a connection between the Helly theorems, a collection of results from combinatorial geometry, and the class of problems which we call Generalized Linear Programming, or GLP, which can be solved by combinatorial linear programming algorithms like the simplex method. We use these results to explore the class GLP and show new applications to geometric optimization, and also to prove Helly theorems. In general, a GLP is a set...
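Helly's theorem states that a finite family of convex sets in $\mathbb{R}^d$ has a common point if and only if every $d+1$ of its members do. A minimal sketch of the one-dimensional case, where the convex sets are closed intervals and $d+1 = 2$, assuming non-empty families. It illustrates only the classical theorem, not the thesis's GLP framework, and the function names are hypothetical.

```python
import itertools
import random
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi] with lo <= hi


def have_common_point(family: List[Interval]) -> bool:
    """Direct test on a non-empty family: the intervals share a point iff max(lo) <= min(hi)."""
    return max(lo for lo, _ in family) <= min(hi for _, hi in family)


def every_pair_intersects(family: List[Interval]) -> bool:
    """Helly-type test for d = 1: inspect only subfamilies of size d + 1 = 2."""
    return all(have_common_point([a, b]) for a, b in itertools.combinations(family, 2))


rng = random.Random(0)
for _ in range(1000):
    family = []
    for _ in range(rng.randint(1, 6)):
        a, b = sorted((rng.uniform(0, 10), rng.uniform(0, 10)))
        family.append((a, b))
    # Helly's theorem (d = 1): the local pairwise condition implies the global one.
    assert have_common_point(family) == every_pair_intersects(family)
```

The point of the theorem is that a global property (a common point) is certified by checking only small subfamilies, the kind of locality that the combinatorial algorithms for GLP exploit.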
Relating data compression and learnability
, 1986
Cited by 50 (1 self)
We explore the learnability of two-valued functions from samples using the paradigm of Data Compression. A first algorithm (compression) chooses a small subset of the sample which is called the kernel. A second algorithm predicts future values of the function from the kernel, i.e. the algorithm acts as a hypothesis for the function to be learned. The second algorithm must be able to reconstruct the correct function values when given a point of the original sample. We demonstrate that the existence of a suitable data compression scheme is sufficient to ensure learnability. We express the probability that the hypothesis predicts the function correctly on a random sample point as a function of the sample and kernel sizes. No assumptions are made on the probability distributions according to which the sample points are generated. This approach provides an alternative to that of [BEHW86], which uses the Vapnik-Chervonenkis dimension to classify learnable geometric concepts. Our bounds are derived directly from the kernel size of the algorithms rather than from the Vapnik-Chervonenkis dimension of the hypothesis class. The proofs are simpler and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.
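A concrete compression scheme makes the two roles easier to see. The sketch assumes the target concepts are closed intervals on the real line, an example chosen here for illustration rather than taken from the paper: the kernel is the leftmost and rightmost positive example, so it never exceeds two points, and the reconstruction predicts positive exactly between them. If the sample is labeled by some interval, every sample point between the two kernel points lies inside that interval, so the hypothesis reproduces all of the sample's labels, which is the reconstruction requirement described above. The names `compress` and `reconstruct` are hypothetical.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (point on the real line, label in {0, 1})


def compress(sample: List[Example]) -> List[Example]:
    """Choose the kernel: the leftmost and rightmost positive examples (at most two)."""
    positives = sorted(x for x, y in sample if y == 1)
    if not positives:
        return []
    return [(positives[0], 1), (positives[-1], 1)]


def reconstruct(kernel: List[Example]) -> Callable[[float], int]:
    """Build the hypothesis from the kernel alone: positive exactly on [min, max]."""
    if not kernel:
        return lambda x: 0
    lo = min(x for x, _ in kernel)
    hi = max(x for x, _ in kernel)
    return lambda x: 1 if lo <= x <= hi else 0


# Sample labeled by a target interval [2, 5] that the learner never sees directly.
sample = [(0.5, 0), (1.9, 0), (2.2, 1), (3.0, 1), (4.9, 1), (6.1, 0)]
kernel = compress(sample)
h = reconstruct(kernel)
print(kernel)                             # [(2.2, 1), (4.9, 1)]
print(all(h(x) == y for x, y in sample))  # True
```

Here the kernel size is a constant (two), independent of the sample size, which is the quantity the abstract's bounds are expressed in.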
Compression, Significance and Accuracy
, 1992
Cited by 39 (5 self)
Inductive Logic Programming (ILP) involves learning relational concepts from examples and background knowledge. To date all ILP learning systems make use of tests inherited from propositional and decision tree learning for evaluating the significance of hypotheses. None of these significance tests takes account of the relevance or utility of the background knowledge. In this paper we describe a method, called HP-compression, of evaluating the significance of a hypothesis based on the degree to which it allows compression of the observed data with respect to the background knowledge. This can be measured by comparing the lengths of the input and output tapes of a reference Turing machine which will generate the examples from the hypothesis and a set of derivational proofs. The model extends an earlier approach of Muggleton by allowing for noise. The truth values of noisy instances are switched by making use of correction codes. The utility of compression as a significance measure is evaluated empirically in three independent domains. In particular, the results show that the existence of positive compression distinguishes a larger number of significant clauses than other significance tests. The method is also shown to reliably distinguish artificially introduced noise as incompressible data.

