Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression
- Proceedings of the Twentieth International Conference on Machine Learning (ICML)
, 2003
Cited by 20 (6 self)
The problem of learning with positive and unlabeled examples arises frequently in retrieval applications.
Halfspace matrices
- In Proc. of the 22nd Conference on Computational Complexity (CCC)
, 2007
Cited by 16 (8 self)
A halfspace matrix is a Boolean matrix A with rows indexed by linear threshold functions f, columns indexed by inputs x ∈ {−1,1}^n, and entries given by A_{f,x} = f(x). We demonstrate the potential of halfspace matrices as tools to answer nontrivial open questions. 1. (Communication complexity) We exhibit a Boolean function f with discrepancy Ω(1/n^4) under every product distribution but O(√n / 2^{n/4}) under a certain non-product distribution. This partially solves an open problem of Kushilevitz and Nisan [25]. 2. (Complexity of sign matrices) We construct a matrix A ∈ {−1,1}^{N × N log N} with dimension complexity log N but margin complexity Ω(N^{1/4} / √(log N)). This gap is an exponential improvement over previous work. As an application to circuit complexity, we prove an Ω(2^{n/4} / (d√n)) circuit lower bound for computing halfspaces by a majority of an arbitrary set of d gates. This complements a result of Goldmann, Håstad, and Razborov [15]. In addition, we prove new results on the complexity measures of sign matrices, complementing recent work by Linial et al. [27–29]. 3. (Learning theory) We give a short and simple proof that the statistical-query (SQ) dimension of halfspaces in n dimensions is less than 2(n + 1)^2 under all distributions (with n + 1 being a trivial lower bound). This improves on the n^{O(1)} estimate from the fundamental paper of Blum et al. [5]. Finally, we motivate our learning-theoretic result for the complexity community by showing that SQ dimension estimates for natural classes of Boolean functions can resolve major open problems in complexity theory. Specifically, we show that an exp(2^{(log n)^{o(1)}}) upper bound on the SQ dimension of AC^0 would imply an explicit language in PSPACE^cc \ PH^cc.
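The definition at the start of this abstract can be made concrete with a small sketch. It assumes ±1-valued threshold functions with ties broken toward +1, and it uses a hand-picked list of three functions rather than all linear threshold functions on n variables; the names `ltf` and `halfspace_matrix` are illustrative and do not come from the paper.

```python
from itertools import product


def ltf(weights, threshold):
    """Return the threshold function x -> +1 if <w, x> >= threshold, else -1."""
    def f(x):
        s = sum(w * xi for w, xi in zip(weights, x))
        return 1 if s >= threshold else -1  # tie-breaking toward +1 is an assumption
    return f


def halfspace_matrix(functions, n):
    """Rows: the given threshold functions. Columns: all inputs x in {-1, 1}^n."""
    inputs = list(product((-1, 1), repeat=n))
    matrix = [[f(x) for x in inputs] for f in functions]
    return matrix, inputs


# Hypothetical example: three threshold functions on n = 2 variables.
fs = [ltf((1, 1), 0), ltf((1, -1), 0.5), ltf((0, 1), 0)]
A, inputs = halfspace_matrix(fs, 2)
for row in A:
    print(row)
```

The full halfspace matrix would have one row for every distinct threshold function on {−1,1}^n; the sketch only shows how a single entry A_{f,x} = f(x) is filled in.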
The sign-rank of AC^0
- IN PROC. OF THE 49TH SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS)
, 2008
Cited by 14 (9 self)
The sign-rank of a matrix A = [A_{ij}] with ±1 entries is the least rank of a real matrix B = [B_{ij}] with A_{ij} B_{ij} > 0 for all i, j. We obtain the first exponential lower bound on the sign-rank of a function in AC^0. Namely, let f(x, y) = ⋀_{i=1}^{m} ⋁_{j=1}^{m^2} (x_{ij} ∧ y_{ij}). We show that the matrix [f(x, y)]_{x,y} has sign-rank 2^{Ω(m)}. This in particular implies that Σ_2^{cc} ⊄ UPP^{cc}, which solves a long-standing open problem posed by Babai, Frankl, and Simon (1986). Our result additionally implies a lower bound in learning theory. Specifically, let φ_1, ..., φ_r: {0, 1}^n → R be functions such that every DNF formula f: {0, 1}^n → {−1, +1} of polynomial size has the representation f ≡ sign(a_1 φ_1 + · · · + a_r φ_r) for some reals a_1, ..., a_r. We prove that then r ≥ 2^{Ω(n^{1/3})}, which essentially matches an upper bound of 2^{Õ(n^{1/3})} due to Klivans and Servedio (2001). Finally, our work yields the first exponential lower bound on the size of threshold-of-majority circuits computing a function in AC^0. This substantially generalizes and strengthens the results of Krause and Pudlák (1997).
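A minimal sketch of the two objects this abstract defines may help. It reads f as an AND over m rows of an OR over m^2 columns of (x_ij AND y_ij), with inputs given as 0/1 matrices, and it checks the sign-representation condition A_ij B_ij > 0 entry by entry. The function names are illustrative assumptions, and the sketch does not compute sign-rank itself.

```python
def f(x, y):
    """AND over rows i of OR over columns j of (x[i][j] AND y[i][j]).

    x and y are lists of rows with 0/1 entries (m rows of m^2 columns in the abstract).
    """
    return all(
        any(a and b for a, b in zip(x_row, y_row))
        for x_row, y_row in zip(x, y)
    )


def sign_represents(A, B):
    """True if the real matrix B agrees in sign with the ±1 matrix A in every entry."""
    return all(a * b > 0 for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


# Hypothetical inputs with m = 2 rows and m^2 = 4 columns.
x = [[1, 0, 0, 0], [0, 1, 0, 0]]
y = [[1, 1, 0, 0], [0, 0, 1, 0]]
print(f(x, y))  # the second row has no column where both x and y are 1

A = [[1, -1], [-1, 1]]
B = [[0.5, -2.0], [-3.0, 0.1]]
print(sign_represents(A, B))
```

The sign-rank of A is then the smallest rank among all real matrices B for which `sign_represents(A, B)` holds.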
Optimal Outlier Removal in High-Dimensional Spaces
- IN PROCEEDINGS OF THE 33RD ACM SYMPOSIUM ON THEORY OF COMPUTING
, 2001
Cited by 13 (2 self)
We study the problem of finding an outlier-free subset of a set of points (or a probability distribution) in n-dimensional Euclidean space. A point x is defined to be a β-outlier if there exists some direction w in which its squared distance from the mean along w is greater than β times the average squared distance from the mean along w [1]. Our main theorem is that for any ε > 0, there exists a (1 − ε) fraction of the original distribution that has no O((n/ε)(b + log(n/ε)))-outliers, improving on the previous bound of O(n^7 b/ε). This bound is shown to be nearly the best possible. The theorem is constructive, and results in a 1/(1 − ε) approximation to the following optimization problem: given a distribution μ (i.e. the ability to sample from it), and a parameter ε > 0, find the minimum β for which there exists a subset of probability at least (1 − ε) with no β-outliers.
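The outlier definition in this abstract compares a point's squared distance from the mean along a direction w with the average squared distance along the same direction. The sketch below evaluates that ratio for one given direction and a finite point set; the threshold is called `beta` here, and the function names are illustrative assumptions. The definition asks whether some direction exceeds the threshold, so checking a single direction can only confirm an outlier, never rule one out.

```python
def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def outlier_ratio(points, x, w):
    """Squared distance of x from the mean along w, divided by the average
    squared distance of the points from the mean along w.

    Assumes the points are not all identical along w (nonzero denominator).
    """
    mu = mean(points)

    def proj(p):
        return sum(wk * (pk - mk) for wk, pk, mk in zip(w, p, mu))

    avg_sq = sum(proj(p) ** 2 for p in points) / len(points)
    return proj(x) ** 2 / avg_sq


def is_outlier_along(points, x, w, beta):
    """True if x exceeds the threshold beta in the single direction w."""
    return outlier_ratio(points, x, w) > beta


# Hypothetical data: four points with mean (0, 0).
points = [(-2, 0), (2, 0), (0, -1), (0, 1)]
print(outlier_ratio(points, (2, 0), (1, 0)))
print(is_outlier_along(points, (3, 0), (1, 0), beta=4))
```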
Worst-Case Analysis of the Perceptron and Exponentiated Update Algorithms
- Artificial Intelligence
, 1998
Cited by 9 (1 self)
The absolute loss is the absolute difference between the desired and predicted outcome. This paper demonstrates worst-case upper bounds on the absolute loss for the Perceptron learning algorithm and the Exponentiated Update learning algorithm, which is related to the Weighted Majority algorithm. The bounds characterize the behavior of the algorithms over any sequence of trials, where each trial consists of an example and a desired outcome interval (any value in the interval is an acceptable outcome). The worst-case absolute loss of both algorithms is bounded by: the absolute loss of the best linear function in a comparison class, plus a constant dependent on the initial weight vector, plus a per-trial loss. The per-trial loss can be eliminated if the learning algorithm is allowed a tolerance from the desired outcome. For concept learning, the worst-case bounds lead to mistake bounds that are comparable to past results. This paper is a revised and extended version of Bylander [7].
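The trial structure described in this abstract (an example plus an interval of acceptable outcomes, scored by absolute loss) can be sketched as follows. The additive update shown is a generic Perceptron-style step chosen for illustration; it is an assumption and not necessarily the exact update rule, learning rate, or loss accounting analyzed in the paper. The names `interval_loss`, `perceptron_trial`, and `eta` are hypothetical.

```python
def interval_loss(prediction, low, high):
    """Absolute distance from the prediction to the nearest acceptable outcome;
    zero when the prediction lies inside [low, high]."""
    if prediction < low:
        return low - prediction
    if prediction > high:
        return prediction - high
    return 0.0


def perceptron_trial(w, x, low, high, eta):
    """One online trial: predict w.x, record the interval loss, then take an
    additive step of size eta toward the interval if the prediction missed it."""
    y_hat = sum(wi * xi for wi, xi in zip(w, x))
    loss = interval_loss(y_hat, low, high)
    if y_hat < low:
        w = [wi + eta * xi for wi, xi in zip(w, x)]
    elif y_hat > high:
        w = [wi - eta * xi for wi, xi in zip(w, x)]
    return w, loss


# Hypothetical trial: start from the zero vector, acceptable outcomes in [1, 2].
w, loss = perceptron_trial([0.0, 0.0], [1.0, 2.0], low=1.0, high=2.0, eta=0.5)
print(w, loss)
```

An Exponentiated Update variant would replace the additive step with a multiplicative reweighting of each coordinate followed by normalization.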
Agnostic Learning of Monomials by Halfspaces is Hard
Cited by 9 (6 self)
We prove the following strong hardness result for learning: Given a distribution on labeled examples from the hypercube such that there exists a monomial (or conjunction) consistent with a (1 − ϵ)-fraction of the examples, it is NP-hard to find a halfspace that is correct on a (1/2 + ϵ)-fraction of the examples, for arbitrary constant ϵ > 0. In learning theory terms, weak agnostic learning of monomials by halfspaces is NP-hard. This hardness result bridges between and subsumes two previous results which showed similar hardness results for the proper learning of monomials and halfspaces. As immediate corollaries of our result, we give the first optimal hardness results for weak agnostic learning of decision lists and majorities. Our techniques are quite different from previous hardness proofs for learning. We use an invariance principle and sparse approximation of halfspaces from recent work on fooling halfspaces to give a new natural list decoding of a halfspace in the context of dictatorship tests/label cover reductions. In addition, unlike previous invariance-principle-based proofs, which are only known to give Unique Games hardness, we give a reduction from a smooth version of Label Cover that is known to be NP-hard.
Learning Noisy Linear Threshold Functions
, 1998
Cited by 7 (0 self)
This paper describes and analyzes algorithms for learning linear threshold functions (LTFs) in the presence of classification noise and monotonic noise. When there is classification noise, each randomly drawn example is mislabeled (i.e., differs from the target LTF) with the same probability. For monotonic noise, the probability of mislabeling an example monotonically decreases with the separation between the target LTF hyperplane and the example. Monotonic noise is a generalization of classification noise as well as the cases of independent binary features (aka naive Bayes) and normal distributions with equal covariance matrices. Monotonic noise provides a more realistic model of noise because it allows confidence to increase as a function of the distance from the threshold, but it does not impose any artificial form on the function. This paper shows that LTFs are polynomially PAC-learnable in the presence of classification noise and monotonic noise if the separation between examples ...
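The two noise models in this abstract differ only in how the flip probability depends on the example. A minimal simulation sketch follows, assuming ±1 labels and Euclidean distance to the hyperplane as the separation. The flip rate `eta * exp(-separation)` used for monotonic noise is a hypothetical choice made only to get some decreasing function; the paper does not prescribe a specific form.

```python
import math
import random


def target_ltf(w, theta, x):
    """Noise-free label: +1 if w.x >= theta, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else -1


def separation(w, theta, x):
    """Euclidean distance from x to the hyperplane w.x = theta."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) - theta) / norm


def classification_noise_label(w, theta, x, eta, rng):
    """Flip the target label with the same probability eta for every example."""
    y = target_ltf(w, theta, x)
    return -y if rng.random() < eta else y


def monotonic_noise_label(w, theta, x, eta, rng):
    """Flip with a probability that decreases as x moves away from the hyperplane.
    The form eta * exp(-separation) is an illustrative assumption."""
    y = target_ltf(w, theta, x)
    p_flip = eta * math.exp(-separation(w, theta, x))
    return -y if rng.random() < p_flip else y


rng = random.Random(0)
w, theta = (3.0, 4.0), 0.0
near, far = (0.1, 0.0), (3.0, 4.0)
print(separation(w, theta, near), separation(w, theta, far))
print(monotonic_noise_label(w, theta, far, eta=0.4, rng=rng))
```

Setting the flip rate to a constant recovers classification noise as a special case, which matches the abstract's remark that monotonic noise generalizes it.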
Learning with online constraints: shifting concepts and active learning
- PHD THESIS. MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LAB
, 2006
Cited by 7 (5 self)
Many practical problems, such as forecasting, real-time decision making, streaming data applications, and resource-constrained learning, can be modeled as learning with online constraints. This thesis is concerned with analyzing and designing algorithms for learning under the following online constraints: i) The algorithm has only sequential, or one-at-a-time, access to data. ii) The time and space complexity of the algorithm must not scale with the number of observations. We analyze learning with online constraints in a variety of settings, including active learning. The active learning model is applicable to any domain in which unlabeled data is easy to come by and there exists a (potentially difficult or expensive) mechanism by which to attain labels. First, we
A Discriminative Model for Semi-Supervised Learning
, 2008
Cited by 6 (1 self)
Supervised learning, that is, learning from labeled examples, is an area of Machine Learning that has reached substantial maturity. It has generated general-purpose and practically successful algorithms, and the foundations are quite well understood and captured by theoretical frameworks such as the PAC-learning model and the Statistical Learning Theory framework. However, for many contemporary practical problems such as classifying web pages or detecting spam, there is often additional information available in the form of unlabeled data, which is often much cheaper and more plentiful than labeled data. As a consequence, there has recently been substantial interest in semi-supervised learning (using unlabeled data together with labeled data), since any useful information that reduces the amount of labeled data needed can be a significant benefit. Several techniques have been developed for doing this, along with experimental results on a variety of different learning problems. Unfortunately, the standard learning frameworks for reasoning about supervised learning do not capture the key aspects and the assumptions underlying these semi-supervised learning methods. In this paper we describe an augmented version of the PAC model designed for semi-supervised learning, that can be used to reason about many of the different approaches taken over the past
Sublinear Optimization for Machine Learning
Cited by 4 (2 self)
We give sublinear-time approximation algorithms for some optimization problems arising in machine learning, such as training linear classifiers and finding minimum enclosing balls. Our algorithms can be extended to some kernelized versions of these problems, such as SVDD, hard margin SVM, and L2-SVM, for which sublinear-time algorithms were not known before. These new algorithms use a combination of novel sampling techniques and a new multiplicative update algorithm. We give lower bounds which show the running times of many of our algorithms to be nearly best possible in the unit-cost RAM model. We also give implementations of our algorithms in the semi-streaming setting, obtaining the first low-pass, polylogarithmic-space, sublinear-time algorithms achieving an arbitrary approximation factor.

