Small-Bias Probability Spaces: Efficient Constructions and Applications
- SIAM J. Comput
, 1993
Cited by 227 (14 self)
We show how to efficiently construct a small probability space on n binary random variables such that for every subset, its parity is either zero or one with "almost" equal probability. They are called $\epsilon$-biased random variables. The number of random bits needed to generate the random variables is $O(\log n + \log(1/\epsilon))$. Thus, if $\epsilon$ is polynomially small, then the size of the sample space is also polynomial. Random variables that are $\epsilon$-biased can be used to construct "almost" k-wise independent random variables where $\epsilon$ is a function of k. These probability spaces have various applications: 1. Derandomization of algorithms: many randomized algorithms that require only k-wise independence of their random bits (where k is bounded by $O(\log n)$) can be derandomized by using $\epsilon$-biased random variables. 2. Reducing the number of random bits required by certain randomized algorithms, e.g., verification of matrix multiplication. 3. Exhaustive testing of combinatorial circui...
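To make the definition in this abstract concrete, the following is a minimal brute-force sketch that measures the parity bias of a sample space. It is not the paper's construction; the names `parity_bias`, `max_bias`, `full_cube`, and `small_space` are hypothetical and chosen for illustration only.

```python
from itertools import product

def parity_bias(sample_space, subset):
    """|Pr[parity = 0] - Pr[parity = 1]| for the XOR of the chosen coordinates,
    under the uniform distribution on sample_space."""
    signed = sum(1 if sum(point[i] for i in subset) % 2 == 0 else -1
                 for point in sample_space)
    return abs(signed) / len(sample_space)

def max_bias(sample_space, n):
    """Largest parity bias over all nonempty subsets of the n coordinates."""
    worst = 0.0
    for mask in range(1, 2 ** n):
        subset = [i for i in range(n) if (mask >> i) & 1]
        worst = max(worst, parity_bias(sample_space, subset))
    return worst

# The full cube {0,1}^3 is perfectly unbiased.
full_cube = list(product((0, 1), repeat=3))

# A 4-point space (x, y, x XOR y): every pair of bits is balanced,
# but the parity of all three bits is always 0.
small_space = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

Here `max_bias(full_cube, 3)` is 0, while `max_bias(small_space, 3)` is 1 even though every single bit and every pair of bits in `small_space` is balanced. This shows why small bias is a condition on all subsets, not only on small ones.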
Decoding Reed Solomon Codes beyond the Error-Correction Bound
, 1997
Cited by 183 (16 self)
We present a randomized algorithm which takes as input n distinct points $\{(x_i, y_i)\}_{i=1}^{n}$ from $F \times F$ (where F is a field) and integer parameters t and d and returns a list of all univariate polynomials f over F in the variable x of degree at most d which agree with the given set of points in at least t places (i.e., $y_i = f(x_i)$ for at least t values of i), provided t = \Omega (
An Efficient Membership-Query Algorithm for Learning DNF with Respect to the Uniform Distribution
, 1994
Cited by 150 (12 self)
We present a membership-query algorithm for efficiently learning DNF with respect to the uniform distribution. In fact, the algorithm properly learns with respect to uniform the class TOP of Boolean functions expressed as a majority vote over parity functions. We also describe extensions of this algorithm for learning DNF over certain nonuniform distributions and for learning a class of geometric concepts that generalizes DNF. Furthermore, we show that DNF is weakly learnable with respect to uniform from noisy examples. Our strong learning algorithm utilizes one of Freund's boosting techniques and relies on the fact that boosting does not require a completely distribution-independent weak learner. The boosted weak learner is a nonuniform extension of a parity-finding algorithm discovered by Goldreich and Levin. 1 Introduction: Consider the following 20-questions-like game between two players, Bob and Alice. Bob has a Disjunctive Normal Form (DNF) expression f in mind. Alice is allo...
The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length
- Machine Learning
, 1996
Cited by 148 (15 self)
We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second ...
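The idea of a variable memory length can be illustrated with a small sketch that predicts the next symbol from the longest sufficiently frequent suffix of the history. This is a simplified stand-in, not the PSA learning algorithm of the paper; the function names and the `min_count` threshold are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_suffix_counts(text, max_len):
    """Count, for every context of length 0..max_len, which symbol follows it."""
    counts = defaultdict(Counter)
    for i, symbol in enumerate(text):
        for k in range(max_len + 1):
            if i - k < 0:
                break
            counts[text[i - k:i]][symbol] += 1
    return counts

def predict_next(counts, history, max_len, min_count=2):
    """Predict with the longest suffix of history seen at least min_count times."""
    for k in range(min(max_len, len(history)), -1, -1):
        context = history[len(history) - k:]
        seen = counts.get(context)
        if seen and sum(seen.values()) >= min_count:
            return seen.most_common(1)[0][0]
    return None

counts = train_suffix_counts("abracadabra abracadabra", max_len=3)
prediction_after_abr = predict_next(counts, "abr", max_len=3)
prediction_after_ca = predict_next(counts, "ca", max_len=3)
```

The predictor falls back to shorter contexts when a long one is too rare, so the effective memory length varies with the history.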
Weakly Learning DNF and Characterizing Statistical Query Learning Using Fourier Analysis
- In Proceedings of the Twenty-Sixth Annual Symposium on Theory of Computing
, 1994
Cited by 105 (24 self)
We present new results on the well-studied problem of learning DNF expressions. We prove that an algorithm due to Kushilevitz and Mansour [13] can be used to weakly learn DNF formulas with membership queries with respect to the uniform distribution. This is the first positive result known for learning general DNF in polynomial time in a nontrivial model. Our results should be contrasted with those of Kharitonov [12], who proved that $AC^0$ is not efficiently learnable in this model based on cryptographic assumptions. We also present efficient learning algorithms in various models for the read-k and SAT-k subclasses of DNF. We then turn our attention to the recently introduced statistical query model of learning [9]. This model is a restricted version of the popular Probably Approximately Correct (PAC) model, and practically every PAC learning algorithm falls into the statistical query model [9]. We prove that DNF and decision trees are not even weakly learnable in polynomial time in this model. This result is information-theoretic and therefore does not rely on any unproven assumptions, and demonstrates that no straightforward modification of the existing algorithms for learning various restricted forms of DNF and decision trees will solve the general problem. These lower bounds are a corollary of a more general characterization of the complexity of statistical query learning in terms of the number of uncorrelated functions in the concept class. The underlying tool for all of our results is the Fourier analysis of the concept class to be learned.
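As a small illustration of the Fourier analysis mentioned in this abstract, the sketch below computes the Fourier coefficients of a Boolean function exactly by enumerating all inputs. It is a brute-force illustration under the uniform distribution, not the Kushilevitz and Mansour algorithm; the names `chi`, `fourier_coefficient`, and `and2` are hypothetical.

```python
from itertools import product

def chi(subset, x):
    """Parity character: +1 if the bits of x indexed by subset XOR to 0, else -1."""
    return -1 if sum(x[i] for i in subset) % 2 else 1

def fourier_coefficient(f, n, subset):
    """Exact E_x[f(x) * chi_S(x)] over uniform x in {0,1}^n, for f valued in {-1, +1}."""
    total = sum(f(x) * chi(subset, x) for x in product((0, 1), repeat=n))
    return total / 2 ** n

def and2(x):
    """A one-term DNF on two variables, encoded as +1 for true and -1 for false."""
    return 1 if x[0] and x[1] else -1

subsets = [(), (0,), (1,), (0, 1)]
spectrum = {s: fourier_coefficient(and2, 2, s) for s in subsets}
```

Every coefficient of `and2` has magnitude 1/2, so each parity (or its negation) agrees with the function noticeably more often than a coin flip. Correlation of this kind with a single parity is what a weak learner looks for.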
On the Boosting Ability of Top-Down Decision Tree Learning Algorithms
- In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing
, 1995
Cited by 81 (6 self)
We analyze the performance of top-down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a proof that such algorithms are boosting algorithms. By this we mean that if the functions used to label the internal nodes of the decision tree can weakly approximate the unknown target function, then the top-down algorithms we study will amplify this weak advantage to build a tree achieving any desired level of accuracy. The bounds we obtain for this amplification show an interesting dependence on the splitting criterion function G used by the top-down algorithm. More precisely, if the functions used to label the internal nodes have error $1/2 - \gamma$ as approximations to the target function, then for the splitting criteria used by CART and C4.5, trees of size $(1/\epsilon)^{O(1/(\gamma^2 \epsilon^2))}$ and $(1/\epsilon)^{O(\log(1/\epsilon)/\gamma^2)}$ (respectively) suffice to drive the error below $\epsilon$. Thus, small constant advantage over...
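The role of the splitting criterion G can be sketched as follows: a top-down learner picks the split that most reduces the weighted value of G over the resulting leaves. The code assumes the Gini criterion normalized as 4q(1-q) and the binary entropy, which are common forms for CART-style and C4.5-style splitting; the normalization and all function names are assumptions for illustration, not taken from the abstract.

```python
import math

def gini(q):
    """Gini-style criterion, scaled so that gini(0.5) == 1 (assumed normalization)."""
    return 4.0 * q * (1.0 - q)

def entropy(q):
    """Binary entropy in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def positive_fraction(labels):
    """Fraction of 1-labels in a leaf; an empty leaf counts as 0."""
    return sum(labels) / len(labels) if labels else 0.0

def split_gain(criterion, labels, mask):
    """Drop in the criterion when a leaf with 0/1 labels is split by a boolean mask."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    after = (len(left) / n) * criterion(positive_fraction(left)) \
          + (len(right) / n) * criterion(positive_fraction(right))
    return criterion(positive_fraction(labels)) - after

labels = [1, 1, 1, 0, 0, 0, 1, 0]
mask = [True, True, True, True, False, False, False, False]
gini_gain = split_gain(gini, labels, mask)
entropy_gain = split_gain(entropy, labels, mask)
```

On this example the same split produces a different gain under each criterion, which is the kind of dependence on G that the bounds in the abstract quantify.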
Learning polynomials with queries: The highly noisy case
, 1995
Cited by 76 (16 self)
Given a function f mapping n-variate inputs from a finite field F into F, we consider the task of reconstructing a list of all n-variate degree d polynomials which agree with f on a tiny but non-negligible fraction, $\delta$, of the input space. We give a randomized algorithm for solving this task which accesses f as a black box and runs in time polynomial in $1/\delta$, n and exponential in d, provided $\delta$ is $\Omega(\sqrt{d/|F|})$. For the special case when d = 1, we solve this problem for all $\epsilon = \delta - 1/|F| > 0$. In this case the running time of our algorithm is bounded by a polynomial in $1/\epsilon$, n and exponential in d. Our algorithm generalizes a previously known algorithm, due to Goldreich and Levin, that solves this task for the case when F = GF(2) (and d = 1).
Kearns et al. [21] (see also [27, 28, 22]). In the setting of agnostic learning, the learner is to make no assumptions regarding the natural phenomena underlying the input/output relationship of the function, and the goal of the learner is to come up with a simple explanation which best fits the examples. Therefore the best explanation may account for only part of the phenomena. In some situations, when the phenomena appears very irregular, providing an explanation which fits only part of it is better than nothing. Interestingly, Kearns et al. did not consider the use of queries (but rather examples drawn from an arbitrary distribution) as they were skeptical that queries could be of any help. We show that queries do seem to help (see below).
Collective Data Mining: A New Perspective Toward Distributed Data Analysis
- Advances in Distributed and Parallel Knowledge Discovery
, 1999
Cited by 75 (12 self)
This paper introduces the collective data mining (CDM) framework, a new approach toward distributed data mining (DDM) from heterogeneous sites. It points out that naive approaches to distributed data analysis in a heterogeneous environment may result in ambiguous or incorrect global data models. It also notes that any function can be expressed in a distributed fashion using a set of appropriate basis functions and orthogonal basis functions can be effectively used for developing a general DDM framework that guarantees correct local analysis and correct aggregation of local data models with minimal data communication. This paper develops the foundation of CDM, discusses decision tree learning and polynomial regression in CDM for discrete and continuous variables, and describes the BODHI, a CDM-based experimental system for distributed knowledge discovery. 1 Introduction: Distributed data mining (DDM) is a fast growing area that deals with the problem of finding data patterns in a...
Oracles and Queries that are Sufficient for Exact Learning
- Journal of Computer and System Sciences
, 1996
Cited by 72 (5 self)
We show that the class of all circuits is exactly learnable in randomized expected polynomial time using weak subset and weak superset queries. This is a consequence of the following result which we consider to be of independent interest: circuits are exactly learnable in randomized expected polynomial time with equivalence queries and the aid of an NP-oracle. We also show that circuits are exactly learnable in deterministic polynomial time with equivalence queries and a $\Sigma_3$-oracle. The hypothesis class for the above learning algorithms is the class of circuits of larger (but polynomially related) size. Also, the algorithms can be adapted to learn the class of DNF formulas with hypothesis class consisting of depth-3 formulas (by the work of Angluin [A90], this is optimal in the sense that the hypothesis class cannot be reduced to DNF formulas, i.e., depth-2 formulas).
Extracting Comprehensible Models from Trained Neural Networks
, 1996
Cited by 65 (4 self)
To Mom, Dad, and Susan, for their support and encouragement.

