Results 1-10 of 56
Instance-based learning algorithms
Machine Learning, 1991
Abstract

Cited by 1053 (18 self)
Abstract. Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instance-based learning, that generates classification predictions using only specific instances. Instance-based learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storage-reducing algorithm performs well on several real-world databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noise-tolerant decision tree algorithm.
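The storage-reduction idea in this abstract can be sketched in a few lines (a hypothetical toy in the spirit of the paper's IB2 variant, not the authors' implementation; the keep-only-if-misclassified rule and the 1-D data are assumptions for illustration):

```python
import math

def nearest_label(stored, x):
    """1-nearest-neighbour prediction over the stored instances."""
    _, label = min(stored, key=lambda inst: math.dist(inst[0], x))
    return label

def ib2_train(training_stream):
    """Storage reduction sketch: keep an incoming instance only when the
    instances stored so far would misclassify it."""
    stored = []
    for x, y in training_stream:
        if not stored or nearest_label(stored, x) != y:
            stored.append((x, y))
    return stored

# Toy 1-D task: label 0 below 5, label 1 above.
data = [((1.0,), 0), ((2.0,), 0), ((8.0,), 1), ((9.0,), 1), ((4.0,), 0)]
kernel = ib2_train(data)
print(len(kernel))                    # only the informative instances survive
print(nearest_label(kernel, (3.0,)))
```

On this stream only the first example of each class is stored; the other three are already classified correctly and are discarded, which is exactly the storage/accuracy trade-off the abstract describes.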
Epsilon-Nets and Simplex Range Queries
1986
Abstract

Cited by 266 (11 self)
We present a new technique for half-space and simplex range queries using O(n) space and O(n^α) query time, where α = d(d-1)/(d(d-1)+1) + γ for all dimensions d ≥ 2 and γ > 0. These bounds are better than those previously published for all d ≥ 2. The technique uses random sampling to build a partition-tree structure. We introduce the concept of an ε-net for an abstract set of ranges to describe the desired result of this random sampling and give necessary and sufficient conditions that a random sample is an ε-net with high probability. We illustrate the application of these ideas to other range query problems.
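The ε-net condition in this abstract can be checked mechanically in the simplest setting, 1-D points with interval ranges (a toy sketch; the scan-based check and the examples are assumptions, not from the paper):

```python
def is_eps_net(points, sample, eps):
    """1-D interval case: `sample` is an eps-net for `points` iff every
    interval containing >= eps * len(points) of the points also contains a
    sample point.  For intervals it suffices to scan the points in sorted
    order and bound the longest run that avoids the sample."""
    marks = set(sample)
    threshold = eps * len(points)
    run = 0
    for p in sorted(points):
        if p in marks:
            run = 0
        else:
            run += 1
            if run >= threshold:   # a heavy interval misses the sample
                return False
    return True

points = list(range(10))
print(is_eps_net(points, [2, 5, 8], 0.3))  # True: no 3 consecutive points missed
print(is_eps_net(points, [5], 0.3))        # False: points 0..4 avoid the sample
```

The theorem the abstract alludes to says a random sample of size depending only on ε (and the VC dimension of the range space, here 2) passes this check with high probability, independently of n.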
Computational Limitations on Learning from Examples
Journal of the ACM, 1988
Abstract

Cited by 192 (10 self)
Abstract. The computational complexity of learning Boolean concepts from examples is investigated. It is shown for various classes of concept representations that these cannot be learned feasibly in a distribution-free sense unless RP = NP. These classes include (a) disjunctions of two monomials, (b) Boolean threshold functions, and (c) Boolean formulas in which each variable occurs at most once. Relationships between learning of heuristics and finding approximate solutions to NP-hard optimization problems are given. Categories and Subject Descriptors: F.1.1 [Computation by Abstract Devices]: Models of Computation – relations among models; F.1.2 [Computation by Abstract Devices]: Modes of Computation – probabilistic computation; F.1.3 [Computation by Abstract Devices]: Complexity Classes – reducibility and completeness; I.2.6 [Artificial Intelligence]: Learning – concept learning; induction
Approximation Algorithms For Geometric Problems
1995
Abstract

Cited by 82 (1 self)
8.1 INTRODUCTION. This chapter surveys approximation algorithms for hard geometric problems. The problems we consider typically take inputs that are point sets or polytopes in two- or three-dimensional space, and seek optimal constructions (which may be trees, paths, or polytopes). We limit attention to problems for which no polynomial-time exact algorithms are known, and concentrate on bounds for worst-case approximation ratios, especially bounds that depend intrinsically on geometry. We illustrate our intentions with two well-known problems. Given a finite set of points S in the plane, the Euclidean traveling salesman problem asks for the shortest tour of S. Christofides' algorithm achieves approximation ratio 3/2 for this problem, meaning that it always computes a tour of length at most three-halves the length of the optimal tour. This bound depends only on the triangle inequality, so Christofides' algorithm ...
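Christofides' algorithm needs a minimum-weight perfect matching, so as a lighter illustration of the kind of triangle-inequality guarantee the abstract mentions, here is the simpler tree-doubling 2-approximation (a sketch of a different, weaker heuristic than Christofides'; the four-point instance is an assumption for illustration):

```python
import math

def double_tree_tour(points):
    """Tree-doubling 2-approximation for metric TSP: preorder walk of a
    minimum spanning tree, with repeated vertices shortcut away.  The
    triangle inequality gives tour length <= 2 * MST weight <= 2 * OPT."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    # Prim's algorithm for the MST.
    in_tree, adj = {0}, {i: [] for i in range(n)}
    while len(in_tree) < n:
        i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: dist(*e))
        adj[i].append(j)
        adj[j].append(i)
        in_tree.add(j)
    # Depth-first preorder = the doubled tree with shortcuts applied.
    tour, seen, stack = [], set(), [0]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            tour.append(v)
            stack.extend(reversed(adj[v]))
    return tour

pts = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(double_tree_tour(pts))  # a permutation of 0..3 visiting each point once
```

Christofides replaces the "double every edge" step with an MST plus a minimum-weight matching on its odd-degree vertices, which is what tightens the ratio from 2 to 3/2.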
Helly-type theorems and generalized linear programming
Discrete Comput. Geom., 1994
Abstract

Cited by 60 (0 self)
This thesis establishes a connection between the Helly theorems, a collection of results from combinatorial geometry, and the class of problems which we call Generalized Linear Programming, or GLP, which can be solved by combinatorial linear programming algorithms like the simplex method. We use these results to explore the class GLP and show new applications to geometric optimization, and also to prove Helly theorems. In general, a GLP is a set...
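A concrete GLP instance often used to illustrate this class is the smallest enclosing circle, solvable by Welzl's basis-style recursion: at most 3 boundary points determine the optimum, just as d+1 tight constraints do in linear programming (a sketch chosen to illustrate, not reproduce, the thesis' framework; the randomized shuffling that gives the expected-linear running time is omitted for brevity):

```python
import math

def circle_2(a, b):
    """Smallest circle through two points: diameter ab."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2)

def circle_3(a, b, c):
    """Circumcircle of three points (assumed non-collinear here)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), a))

def inside(c, p):
    return c is not None and math.dist((c[0], c[1]), p) <= c[2] + 1e-9

def welzl(points, boundary=()):
    """GLP-style recursion: solve without point p, and if p violates the
    solution, p must lie on the boundary of the optimum (the basis)."""
    if not points or len(boundary) == 3:
        if len(boundary) == 0:
            return None
        if len(boundary) == 1:
            return (*boundary[0], 0.0)
        if len(boundary) == 2:
            return circle_2(*boundary)
        return circle_3(*boundary)
    p, rest = points[0], points[1:]
    c = welzl(rest, boundary)
    return c if inside(c, p) else welzl(rest, boundary + (p,))

cx, cy, r = welzl([(0, 0), (2, 0), (1, 1), (1, 0.5)])
print(round(cx, 6), round(cy, 6), round(r, 6))  # center (1, 0), radius 1
```

The Helly-number-style fact driving the recursion is that the smallest enclosing circle of any point set is already determined by some subset of at most three points.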
Learning Simple Concepts Under Simple Distributions
SIAM Journal on Computing, 1991
Abstract

Cited by 56 (3 self)
We aim at developing a learning theory where 'simple' concepts are easily learnable. In Valiant's learning model, many concepts turn out to be too hard (NP-hard, for example) to learn. Relatively few concept classes have been shown to be polynomially learnable. In daily life, it seems that the things we care to learn are usually learnable. To model the intuitive notion of learning more closely, we do not require that the learning algorithm learn (polynomially) under all distributions, but only under all simple distributions. A distribution is simple if it is dominated by an enumerable distrib...
Relating data compression and learnability
1986
Abstract

Cited by 55 (1 self)
We explore the learnability of two-valued functions from samples using the paradigm of data compression. A first algorithm (compression) chooses a small subset of the sample, called the kernel. A second algorithm predicts future values of the function from the kernel, i.e., the algorithm acts as a hypothesis for the function to be learned. The second algorithm must be able to reconstruct the correct function values when given a point of the original sample. We demonstrate that the existence of a suitable data compression scheme is sufficient to ensure learnability. We express the probability that the hypothesis predicts the function correctly on a random sample point as a function of the sample and kernel sizes. No assumptions are made on the probability distributions according to which the sample points are generated. This approach provides an alternative to that of [BEHW86], which uses the Vapnik-Chervonenkis dimension to classify learnable geometric concepts. Our bounds are derived directly from the kernel size of the algorithms rather than from the Vapnik-Chervonenkis dimension of the hypothesis class. The proofs are simpler, and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.
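A standard textbook instance of such a compression scheme (an illustrative assumption, not the paper's own example) is axis-parallel rectangles in the plane, where the kernel never needs more than four points:

```python
def compress(sample):
    """Compression stage: the kernel is the (at most 4) positive points
    that are extreme in each coordinate; all other points are discarded."""
    pos = [x for x, y in sample if y == 1]
    if not pos:
        return []
    kernel = {min(pos), max(pos),
              min(pos, key=lambda p: p[1]), max(pos, key=lambda p: p[1])}
    return [(p, 1) for p in kernel]

def reconstruct(kernel):
    """Prediction stage: the hypothesis is the kernel's bounding box."""
    pos = [p for p, _ in kernel]
    if not pos:
        return lambda q: 0
    x0, x1 = min(p[0] for p in pos), max(p[0] for p in pos)
    y0, y1 = min(p[1] for p in pos), max(p[1] for p in pos)
    return lambda q: int(x0 <= q[0] <= x1 and y0 <= q[1] <= y1)

# Positives inside the target rectangle, negatives outside its bounding box.
sample = [((1, 1), 1), ((3, 2), 1), ((2, 4), 1), ((6, 6), 0), ((0, 5), 0)]
h = reconstruct(compress(sample))
print([h(p) for p, _ in sample])  # reproduces every label in the sample
```

The constant kernel size is what makes the sample-size bounds in the paper independent of how many points were originally drawn, mirroring the role the VC dimension plays in [BEHW86].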
Compression, Significance and Accuracy
1992
Abstract

Cited by 43 (5 self)
Inductive Logic Programming (ILP) involves learning relational concepts from examples and background knowledge. To date, all ILP learning systems make use of tests inherited from propositional and decision tree learning for evaluating the significance of hypotheses. None of these significance tests takes account of the relevance or utility of the background knowledge. In this paper we describe a method, called HP-compression, of evaluating the significance of a hypothesis based on the degree to which it allows compression of the observed data with respect to the background knowledge. This can be measured by comparing the lengths of the input and output tapes of a reference Turing machine which will generate the examples from the hypothesis and a set of derivational proofs. The model extends an earlier approach of Muggleton by allowing for noise. The truth values of noisy instances are switched by making use of correction codes. The utility of compression as a significance measure is evaluated empirically in three independent domains. In particular, the results show that the existence of positive compression distinguishes a larger number of significant clauses than other significance tests do. The method is also shown to reliably identify artificially introduced noise as incompressible data.
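The input-tape/output-tape comparison can be caricatured with a plain description-length count (a rough sketch of the idea, not the paper's reference-Turing-machine model; the one-bit-per-label baseline, the 5-bit hypothesis cost, and the binomial correction code are all assumptions for illustration):

```python
import math

def bits_direct(labels):
    """Baseline cost of transmitting the labels with no hypothesis: one bit each."""
    return float(len(labels))

def bits_via_hypothesis(labels, predictions, hyp_bits):
    """Cost with a hypothesis: its description length plus a correction
    code naming the instances whose truth values must be switched
    (log2 of the count of errors, then log2 of which subset they form)."""
    n = len(labels)
    errors = sum(p != y for p, y in zip(predictions, labels))
    correction = math.log2(n + 1) + math.log2(math.comb(n, errors))
    return hyp_bits + correction

labels = [1] * 90 + [0] * 10
preds = [1] * 100              # hypothetical rule "always positive", 5 bits to state
direct = bits_direct(labels)
via = bits_via_hypothesis(labels, preds, hyp_bits=5.0)
print(direct, round(via, 1), via < direct)  # positive compression => significant
```

Even with 10% of the labels treated as switched noise, the encoded length drops well below the baseline, which is the sense in which positive compression certifies a hypothesis as significant rather than fitted to noise.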