Results 1 - 10
of
128
Property Testing and its connection to Learning and Approximation
"... We study the question of determining whether an unknown function has a particular property or is ffl-far from any function with that property. A property testing algorithm is given a sample of the value of the function on instances drawn according to some distribution, and possibly may query the fun ..."
Abstract
-
Cited by 370 (48 self)
- Add to MetaCart
We study the question of determining whether an unknown function has a particular property or is ffl-far from any function with that property. A property testing algorithm is given a sample of the value of the function on instances drawn according to some distribution, and possibly may query the function on instances of its choice. First, we establish some connections between property testing and problems in learning theory. Next, we focus on testing graph properties, and devise algorithms to test whether a graph has properties such as being k-colorable or having a ae-clique (clique of density ae w.r.t the vertex set). Our graph property testing algorithms are probabilistic and make assertions which are correct with high probability, utilizing only poly(1=ffl) edge-queries into the graph, where ffl is the distance parameter. Moreover, the property testing algorithms can be used to efficiently (i.e., in time linear in the number of vertices) construct partitions of the graph which corre...
How to Use Expert Advice
- JOURNAL OF THE ASSOCIATION FOR COMPUTING MACHINERY
, 1997
"... We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the ..."
Abstract
-
Cited by 267 (60 self)
- Add to MetaCart
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show howthis leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
- Information and Computation
, 1995
"... this paper, we concentrate on linear predictors . To any vector u 2 R ..."
Abstract
-
Cited by 196 (11 self)
- Add to MetaCart
this paper, we concentrate on linear predictors . To any vector u 2 R
Efficient Distribution-free Learning of Probabilistic Concepts
- Journal of Computer and System Sciences
, 1993
"... In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behavior---thus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic c ..."
Abstract
-
Cited by 182 (8 self)
- Add to MetaCart
In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behavior---thus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic concepts (or p-concepts) may arise in situations such as weather prediction, where the measured variables and their accuracy are insufficient to determine the outcome with certainty. We adopt from the Valiant model of learning [27] the demands that learning algorithms be efficient and general in the sense that they perform well for a wide class of p-concepts and for any distribution over the domain. In addition to giving many efficient algorithms for learning natural classes of p-concepts, we study and develop in detail an underlying theory of learning p-concepts. 1 Introduction Consider the following scenarios: A meteorologist is attempting to predict tomorrow's weather as accurately as pos...
Correlation Clustering
- MACHINE LEARNING
, 2002
"... We consider the following clustering problem: we have a complete graph on # vertices (items), where each edge ### ## is labeled either # or depending on whether # and # have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as mu ..."
Abstract
-
Cited by 158 (4 self)
- Add to MetaCart
We consider the following clustering problem: we have a complete graph on # vertices (items), where each edge ### ## is labeled either # or depending on whether # and # have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of # edges within clusters, plus the number of edges between clusters (equivalently, minimizes the number of disagreements: the number of edges inside clusters plus the number of # edges between clusters). This formulation is motivated from a document clustering problem in which one has a pairwise similarity function # learned from past data, and the goal is to partition the current set of documents in a way that correlates with # as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting
Weakly Learning DNF and Characterizing Statistical Query Learning Using Fourier Analysis
- IN PROCEEDINGS OF THE TWENTY-SIXTH ANNUAL SYMPOSIUM ON THEORY OF COMPUTING
, 1994
"... We present new results on the well-studied problem of learning DNF expressions. We prove that an algorithm due to Kushilevitz and Mansour [13] can be used to weakly learn DNF formulas with membership queries with respect to the uniform distribution. This is the rst positive result known for learn ..."
Abstract
-
Cited by 105 (24 self)
- Add to MetaCart
We present new results on the well-studied problem of learning DNF expressions. We prove that an algorithm due to Kushilevitz and Mansour [13] can be used to weakly learn DNF formulas with membership queries with respect to the uniform distribution. This is the rst positive result known for learning general DNF in polynomial time in a nontrivial model. Our results should be contrasted with those of Kharitonov [12], who proved that AC 0 is not eciently learnable in this model based on cryptographic assumptions. We also present ecient learning algorithms in various models for the read-k and SAT-k subclasses of DNF. We then turn our attention to the recently introduced statistical query model of learning [9]. This model is a restricted version of the popular Probably Approximately Correct (PAC) model, and practically every PAC learning algorithm falls into the statistical query model [9]. We prove that DNF and decision trees are not even weakly learnable in polynomial time in this model. This result is information-theoretic and therefore does not rely on any unproven assumptions, and demonstrates that no straightforward modication of the existing algorithms for learning various restricted forms of DNF and decision trees will solve the general problem. These lower bounds are a corollary of a more general characterization of the complexity of statistical query learning in terms of the number of uncorrelated functions in the concept class. The underlying tool for all of our results is the Fourier analysis of the concept class to be learned.
An experimental and theoretical comparison of model selection methods. Machine Learning 27
, 1997
"... In the model selection problem, we must balance the complexity of a statistical model with its goodness of fit to the training data. This problem arises repeatedly in statistical estimation, machine learning, and scientific inquiry in general. ..."
Abstract
-
Cited by 101 (5 self)
- Add to MetaCart
In the model selection problem, we must balance the complexity of a statistical model with its goodness of fit to the training data. This problem arises repeatedly in statistical estimation, machine learning, and scientific inquiry in general.
Learning polynomials with queries: The highly noisy case
, 1995
"... Given a function f mapping n-variate inputs from a finite Kearns et. al. [21] (see also [27, 28, 22]). In the setting of ag-fieldFintoF, we consider the task of reconstructing a list nostic learning, the learner is to make no assumptions regarding of alln-variate degreedpolynomials which agree withf ..."
Abstract
-
Cited by 76 (16 self)
- Add to MetaCart
Given a function f mapping n-variate inputs from a finite Kearns et. al. [21] (see also [27, 28, 22]). In the setting of ag-fieldFintoF, we consider the task of reconstructing a list nostic learning, the learner is to make no assumptions regarding of alln-variate degreedpolynomials which agree withfon a the natural phenomena underlying the input/output relationship tiny but non-negligible fraction, , of the input space. We give a of the function, and the goal of the learner is to come up with a randomized algorithm for solving this task which accessesfas a simple explanation which best fits the examples. Therefore the black box and runs in time polynomial in1;nand exponential in best explanation may account for only part of the phenomena. d, provided is(pd=jFj). For the special case whend=1, In some situations, when the phenomena appears very irregular, we solve this problem for jFj>0. In this case the providing an explanation which fits only part of it is better than nothing. Interestingly, Kearns et. al. did not consider the use of running time of our algorithm is bounded by a polynomial queries (but rather examples drawn from an arbitrary distribu-and exponential ind. Our algorithm generalizes a previously tion) as they were skeptical that queries could be of any help. known algorithm, due to Goldreich and Levin, that solves this We show that queries do seem to help (see below). task for the case whenF=GF(2)(andd=1).
Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation
- Neural Computation
, 1997
"... In this paper we prove sanity-check bounds for the error of the leave-one-out cross-validation estimate of the generalization error: that is, bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate. The name sanity-check refers to the fact ..."
Abstract
-
Cited by 76 (0 self)
- Add to MetaCart
In this paper we prove sanity-check bounds for the error of the leave-one-out cross-validation estimate of the generalization error: that is, bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate. The name sanity-check refers to the fact that although we often expect the leave-one-out estimate to perform considerably better than the training error estimate, we are here only seeking assurance that its performance will not be considerably worse. Perhaps surprisingly, such assurance has been given only for limited cases in the prior literature on cross-validation. Any nontrivial bound on the error of leave-one-out must rely on some notion of algorithmic stability. Previous bounds relied on the rather strong notion of hypothesis stability, whose application was primarily limited to nearest-neighbor and other local algorithms. Here we introduce the new and weaker notion of error stability, and apply it to obtain sanity-check b...
Robust Trainability of Single Neurons
, 1995
"... It is well known that (McCulloch-Pitts) neurons are efficiently trainable to learn an unknown halfspace from examples, using linear-programming methods. We want to analyze how the learning performance degrades when the representational power of the neuron is overstrained, i.e., if more complex conce ..."
Abstract
-
Cited by 75 (0 self)
- Add to MetaCart
It is well known that (McCulloch-Pitts) neurons are efficiently trainable to learn an unknown halfspace from examples, using linear-programming methods. We want to analyze how the learning performance degrades when the representational power of the neuron is overstrained, i.e., if more complex concepts than just halfspaces are allowed. We show that the problem of learning a probably almost optimal weight vector for a neuron is so difficult that the minimum error cannot even be approximated to within a constant factor in polynomial time (unless RP = NP); we obtain the same hardness result for several variants of this problem. We considerably strengthen these negative results for neurons with binary weights 0 or 1. We also show that neither heuristical learning nor learning by sigmoidal neurons with a constant reject rate is efficiently possible (unless RP = NP).

