Results 1  10
of
155
Property Testing and its connection to Learning and Approximation
"... We study the question of determining whether an unknown function has a particular property or is fflfar from any function with that property. A property testing algorithm is given a sample of the value of the function on instances drawn according to some distribution, and possibly may query the fun ..."
Abstract

Cited by 421 (57 self)
 Add to MetaCart
We study the question of determining whether an unknown function has a particular property or is fflfar from any function with that property. A property testing algorithm is given a sample of the value of the function on instances drawn according to some distribution, and possibly may query the function on instances of its choice. First, we establish some connections between property testing and problems in learning theory. Next, we focus on testing graph properties, and devise algorithms to test whether a graph has properties such as being kcolorable or having a aeclique (clique of density ae w.r.t the vertex set). Our graph property testing algorithms are probabilistic and make assertions which are correct with high probability, utilizing only poly(1=ffl) edgequeries into the graph, where ffl is the distance parameter. Moreover, the property testing algorithms can be used to efficiently (i.e., in time linear in the number of vertices) construct partitions of the graph which corre...
How to Use Expert Advice
 JOURNAL OF THE ASSOCIATION FOR COMPUTING MACHINERY
, 1997
"... We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worstcase situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the ..."
Abstract

Cited by 317 (66 self)
 Add to MetaCart
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worstcase situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show howthis leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
 Information and Computation
, 1995
"... this paper, we concentrate on linear predictors . To any vector u 2 R ..."
Abstract

Cited by 247 (12 self)
 Add to MetaCart
this paper, we concentrate on linear predictors . To any vector u 2 R
Correlation Clustering
 MACHINE LEARNING
, 2002
"... We consider the following clustering problem: we have a complete graph on # vertices (items), where each edge ### ## is labeled either # or depending on whether # and # have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as mu ..."
Abstract

Cited by 222 (4 self)
 Add to MetaCart
We consider the following clustering problem: we have a complete graph on # vertices (items), where each edge ### ## is labeled either # or depending on whether # and # have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of # edges within clusters, plus the number of edges between clusters (equivalently, minimizes the number of disagreements: the number of edges inside clusters plus the number of # edges between clusters). This formulation is motivated from a document clustering problem in which one has a pairwise similarity function # learned from past data, and the goal is to partition the current set of documents in a way that correlates with # as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting
Efficient Distributionfree Learning of Probabilistic Concepts
 Journal of Computer and System Sciences
, 1993
"... In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behaviorthus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic c ..."
Abstract

Cited by 197 (8 self)
 Add to MetaCart
In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behaviorthus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic concepts (or pconcepts) may arise in situations such as weather prediction, where the measured variables and their accuracy are insufficient to determine the outcome with certainty. We adopt from the Valiant model of learning [27] the demands that learning algorithms be efficient and general in the sense that they perform well for a wide class of pconcepts and for any distribution over the domain. In addition to giving many efficient algorithms for learning natural classes of pconcepts, we study and develop in detail an underlying theory of learning pconcepts. 1 Introduction Consider the following scenarios: A meteorologist is attempting to predict tomorrow's weather as accurately as pos...
Weakly Learning DNF and Characterizing Statistical Query Learning Using Fourier Analysis
 IN PROCEEDINGS OF THE TWENTYSIXTH ANNUAL SYMPOSIUM ON THEORY OF COMPUTING
, 1994
"... We present new results on the wellstudied problem of learning DNF expressions. We prove that an algorithm due to Kushilevitz and Mansour [13] can be used to weakly learn DNF formulas with membership queries with respect to the uniform distribution. This is the rst positive result known for learn ..."
Abstract

Cited by 118 (23 self)
 Add to MetaCart
We present new results on the wellstudied problem of learning DNF expressions. We prove that an algorithm due to Kushilevitz and Mansour [13] can be used to weakly learn DNF formulas with membership queries with respect to the uniform distribution. This is the rst positive result known for learning general DNF in polynomial time in a nontrivial model. Our results should be contrasted with those of Kharitonov [12], who proved that AC 0 is not eciently learnable in this model based on cryptographic assumptions. We also present ecient learning algorithms in various models for the readk and SATk subclasses of DNF. We then turn our attention to the recently introduced statistical query model of learning [9]. This model is a restricted version of the popular Probably Approximately Correct (PAC) model, and practically every PAC learning algorithm falls into the statistical query model [9]. We prove that DNF and decision trees are not even weakly learnable in polynomial time in this model. This result is informationtheoretic and therefore does not rely on any unproven assumptions, and demonstrates that no straightforward modication of the existing algorithms for learning various restricted forms of DNF and decision trees will solve the general problem. These lower bounds are a corollary of a more general characterization of the complexity of statistical query learning in terms of the number of uncorrelated functions in the concept class. The underlying tool for all of our results is the Fourier analysis of the concept class to be learned.
An experimental and theoretical comparison of model selection methods. Machine Learning 27
, 1997
"... In the model selection problem, we must balance the complexity of a statistical model with its goodness of fit to the training data. This problem arises repeatedly in statistical estimation, machine learning, and scientific inquiry in general. ..."
Abstract

Cited by 110 (5 self)
 Add to MetaCart
In the model selection problem, we must balance the complexity of a statistical model with its goodness of fit to the training data. This problem arises repeatedly in statistical estimation, machine learning, and scientific inquiry in general.
Algorithmic Stability and SanityCheck Bounds for LeaveOneOut CrossValidation
 Neural Computation
, 1997
"... In this paper we prove sanitycheck bounds for the error of the leaveoneout crossvalidation estimate of the generalization error: that is, bounds showing that the worstcase error of this estimate is not much worse than that of the training error estimate. The name sanitycheck refers to the fact ..."
Abstract

Cited by 100 (0 self)
 Add to MetaCart
In this paper we prove sanitycheck bounds for the error of the leaveoneout crossvalidation estimate of the generalization error: that is, bounds showing that the worstcase error of this estimate is not much worse than that of the training error estimate. The name sanitycheck refers to the fact that although we often expect the leaveoneout estimate to perform considerably better than the training error estimate, we are here only seeking assurance that its performance will not be considerably worse. Perhaps surprisingly, such assurance has been given only for limited cases in the prior literature on crossvalidation. Any nontrivial bound on the error of leaveoneout must rely on some notion of algorithmic stability. Previous bounds relied on the rather strong notion of hypothesis stability, whose application was primarily limited to nearestneighbor and other local algorithms. Here we introduce the new and weaker notion of error stability, and apply it to obtain sanitycheck b...
Learning polynomials with queries: The highly noisy case
, 1995
"... Given a function f mapping nvariate inputs from a finite Kearns et. al. [21] (see also [27, 28, 22]). In the setting of agfieldFintoF, we consider the task of reconstructing a list nostic learning, the learner is to make no assumptions regarding of allnvariate degreedpolynomials which agree withf ..."
Abstract

Cited by 87 (18 self)
 Add to MetaCart
Given a function f mapping nvariate inputs from a finite Kearns et. al. [21] (see also [27, 28, 22]). In the setting of agfieldFintoF, we consider the task of reconstructing a list nostic learning, the learner is to make no assumptions regarding of allnvariate degreedpolynomials which agree withfon a the natural phenomena underlying the input/output relationship tiny but nonnegligible fraction, , of the input space. We give a of the function, and the goal of the learner is to come up with a randomized algorithm for solving this task which accessesfas a simple explanation which best fits the examples. Therefore the black box and runs in time polynomial in1;nand exponential in best explanation may account for only part of the phenomena. d, provided is(pd=jFj). For the special case whend=1, In some situations, when the phenomena appears very irregular, we solve this problem for jFj>0. In this case the providing an explanation which fits only part of it is better than nothing. Interestingly, Kearns et. al. did not consider the use of running time of our algorithm is bounded by a polynomial queries (but rather examples drawn from an arbitrary distribuand exponential ind. Our algorithm generalizes a previously tion) as they were skeptical that queries could be of any help. known algorithm, due to Goldreich and Levin, that solves this We show that queries do seem to help (see below). task for the case whenF=GF(2)(andd=1).
Robust Trainability of Single Neurons
, 1995
"... It is well known that (McCullochPitts) neurons are efficiently trainable to learn an unknown halfspace from examples, using linearprogramming methods. We want to analyze how the learning performance degrades when the representational power of the neuron is overstrained, i.e., if more complex conce ..."
Abstract

Cited by 84 (0 self)
 Add to MetaCart
It is well known that (McCullochPitts) neurons are efficiently trainable to learn an unknown halfspace from examples, using linearprogramming methods. We want to analyze how the learning performance degrades when the representational power of the neuron is overstrained, i.e., if more complex concepts than just halfspaces are allowed. We show that the problem of learning a probably almost optimal weight vector for a neuron is so difficult that the minimum error cannot even be approximated to within a constant factor in polynomial time (unless RP = NP); we obtain the same hardness result for several variants of this problem. We considerably strengthen these negative results for neurons with binary weights 0 or 1. We also show that neither heuristical learning nor learning by sigmoidal neurons with a constant reject rate is efficiently possible (unless RP = NP).