Results 1  10
of
95
Mining Association Rules between Sets of Items in Large Databases
 IN: PROCEEDINGS OF THE 1993 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, WASHINGTON DC (USA
, 1993
"... We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel esti ..."
Abstract

Cited by 2418 (15 self)
 Add to MetaCart
We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which shows the effectiveness of the algorithm.
Learning quickly when irrelevant attributes abound: A new linearthreshold algorithm
 Machine Learning
, 1988
"... learning Boolean functions, linearthreshold algorithms Abstract. Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each ex ..."
Abstract

Cited by 680 (5 self)
 Add to MetaCart
learning Boolean functions, linearthreshold algorithms Abstract. Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linearthreshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space. 1.
Efficient noisetolerant learning from statistical queries
 JOURNAL OF THE ACM
, 1998
"... In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from stat ..."
Abstract

Cited by 288 (6 self)
 Add to MetaCart
In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from statistical queries. Intuitively, in this model, a learning algorithm is forbidden to examine individual examples of the unknown target function, but is given access to an oracle providing estimates of probabilities over the sample space of random examples. One of our main results shows that any class of functions learnable from statistical queries is in fact learnable with classification noise in Valiant’s model, with a noise rate approaching the informationtheoretic barrier of 1/2. We then demonstrate the generality of the statistical query model, showing that practically every class learnable in Valiant’s model and its variants can also be learned in the new model (and thus can be learned in the presence of noise). A notable exception to this statement is the class of parity functions, which we prove is not learnable from statistical queries, and for which no noisetolerant algorithm is known.
Toward efficient agnostic learning
 In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory
, 1992
"... Abstract. In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtua ..."
Abstract

Cited by 195 (7 self)
 Add to MetaCart
Abstract. In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
Learning in the Presence of Malicious Errors
 SIAM Journal on Computing
, 1993
"... In this paper we study an extension of the distributionfree model of learning introduced by Valiant [23] (also known as the probably approximately correct or PAC model) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an advers ..."
Abstract

Cited by 168 (12 self)
 Add to MetaCart
In this paper we study an extension of the distributionfree model of learning introduced by Valiant [23] (also known as the probably approximately correct or PAC model) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an adversary with unbounded computational power and access to the entire history of the learning algorithm's computation. Thus, we study a worstcase model of errors. Our results include general methods for bounding the rate of error tolerable by any learning algorithm, efficient algorithms tolerating nontrivial rates of malicious errors, and equivalences between problems of learning with errors and standard combinatorial optimization problems. 1 Introduction In this paper, we study a practical extension to Valiant's distributionfree model of learning: the presence of errors (possibly maliciously generated by an adversary) in the sample data. The distributionfree model typically makes the idealize...
An empirical comparison of pattern recognition, neural nets, and machine learning classification methods
 In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence
, 1989
"... Classification methods from statistical pattern recognition, neural nets, and machine learning were applied to four realworld data sets. Each of these data sets has been previously analyzed and reported in the statistical, medical, or machine learning literature. The data sets are characterized by ..."
Abstract

Cited by 126 (2 self)
 Add to MetaCart
Classification methods from statistical pattern recognition, neural nets, and machine learning were applied to four realworld data sets. Each of these data sets has been previously analyzed and reported in the statistical, medical, or machine learning literature. The data sets are characterized by statisucal uncertainty; there is no completely accurate solution to these problems. Training and testing or resampling techniques are used to estimate the true error rates of the classification methods. Detailed attention is given to the analysis of performance of the neural nets using back propagation. For these problems, which have relatively few hypotheses and features, the machine learning procedures for rule induction or tree induction clearly performed best. 1
Learning conjunctions of Horn clauses
 In Proceedings of the 31st Annual Symposium on Foundations of Computer Science
, 1990
"... Abstract. An algorithm is presented for learning the class of Boolean formulas that are expressible as conjunctions of Horn clauses. (A Horn clause is a disjunction of literals, all but at most one of which is a negated variable.) The algorithm uses equivalence queries and membership queries to prod ..."
Abstract

Cited by 112 (16 self)
 Add to MetaCart
Abstract. An algorithm is presented for learning the class of Boolean formulas that are expressible as conjunctions of Horn clauses. (A Horn clause is a disjunction of literals, all but at most one of which is a negated variable.) The algorithm uses equivalence queries and membership queries to produce a formula that is logically equivalent to the unknown formula to be learned. The amount of time used by the algorithm is polynomial in the number of variables and the number of clauses in the unknown formula.
Bounding the VapnikChervonenkis dimension of concept classes parameterized by real numbers
 Machine Learning
, 1995
"... Abstract. The VapnikChervonenkis (VC) dimension is an important combinatorial tool in the analysis of learning problems in the PAC framework. For polynomial learnability, we seek upper bounds on the VC dimension that are polynomial in the syntactic complexity of concepts. Such upper bounds are au ..."
Abstract

Cited by 91 (1 self)
 Add to MetaCart
Abstract. The VapnikChervonenkis (VC) dimension is an important combinatorial tool in the analysis of learning problems in the PAC framework. For polynomial learnability, we seek upper bounds on the VC dimension that are polynomial in the syntactic complexity of concepts. Such upper bounds are automatic for discrete concept classes, but hitherto little has been known about what general conditions guarantee polynomial bounds on VC dimension for classes in which concepts and examples are represented by tuples of real numbers. In this paper, we show that for two general kinds of concept class the VC dimension is polynomially bounded in the number of real numbers used to define a problem instance. One is classes where the criterion for membership of an instance in a concept can be expressed as a formula (in the firstorder theory of the reals) with fixed quantification depth and exponentiallybounded length, whose atomic predicates are polynomial inequalities of exponentiallybounded degree. The other is classes where containment of an instance in a concept is testable in polynomial time, assuming we may compute standard arithmetic operations on reals exactly in constant time. Our results show that in the continuous case, as in the discrete, the real barrier to efficient learning in the Occam sense is complexitytheoretic and not informationtheoretic. We present examples to show how these results apply to concept classes defined by geometrical figures and neural nets, and derive polynomial bounds on the VC dimension for these classes. Keywords: Concept learning, information theory, VapnikChervonenkis dimension, Milnor’s theorem 1.
Learning polynomials with queries: The highly noisy case
, 1995
"... Given a function f mapping nvariate inputs from a finite Kearns et. al. [21] (see also [27, 28, 22]). In the setting of agfieldFintoF, we consider the task of reconstructing a list nostic learning, the learner is to make no assumptions regarding of allnvariate degreedpolynomials which agree withf ..."
Abstract

Cited by 87 (18 self)
 Add to MetaCart
Given a function f mapping nvariate inputs from a finite Kearns et. al. [21] (see also [27, 28, 22]). In the setting of agfieldFintoF, we consider the task of reconstructing a list nostic learning, the learner is to make no assumptions regarding of allnvariate degreedpolynomials which agree withfon a the natural phenomena underlying the input/output relationship tiny but nonnegligible fraction, , of the input space. We give a of the function, and the goal of the learner is to come up with a randomized algorithm for solving this task which accessesfas a simple explanation which best fits the examples. Therefore the black box and runs in time polynomial in1;nand exponential in best explanation may account for only part of the phenomena. d, provided is(pd=jFj). For the special case whend=1, In some situations, when the phenomena appears very irregular, we solve this problem for jFj>0. In this case the providing an explanation which fits only part of it is better than nothing. Interestingly, Kearns et. al. did not consider the use of running time of our algorithm is bounded by a polynomial queries (but rather examples drawn from an arbitrary distribuand exponential ind. Our algorithm generalizes a previously tion) as they were skeptical that queries could be of any help. known algorithm, due to Goldreich and Levin, that solves this We show that queries do seem to help (see below). task for the case whenF=GF(2)(andd=1).