Results 1–10 of 29
Explicit learning curves for transduction and application to clustering and compression algorithms
Journal of Artificial Intelligence Research, 2004
Cited by 30 (3 self)
Abstract:
Inductive learning is based on inferring a general rule from a finite data set and using it to label new data. In transduction one attempts to solve the problem of using a labeled training set to label a set of unlabeled points, which are given to the learner prior to learning. Although transduction seems at the outset to be an easier task than induction, there have not been many provably useful algorithms for transduction. Moreover, the precise relation between induction and transduction has not yet been determined. The main theoretical developments related to transduction were presented by Vapnik more than twenty years ago. One of Vapnik’s basic results is a rather tight error bound for transductive classification based on an exact computation of the hypergeometric tail. While being tight, this bound is given implicitly via a computational routine. Our first contribution is a somewhat looser but explicit characterization of a slightly extended PAC-Bayesian version of Vapnik’s transductive bound. This characterization is obtained using concentration inequalities for the tail of sums of random variables obtained by sampling without replacement. We then derive error bounds for compression schemes such as (transductive) support vector machines and for transduction algorithms based on clustering. The main observation used for deriving these new error bounds and algorithms is that the unlabeled test points, which in the transductive setting are known in advance, can be used to construct useful data-dependent prior distributions over the hypothesis space.
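The hypergeometric tail at the heart of Vapnik's transductive bound is straightforward to compute exactly. As a minimal illustration (our own sketch, not the paper's routine), the following computes P(X ≥ k) for the number of "marked" items in a sample drawn without replacement:

```python
from math import comb

def hypergeometric_tail(N, K, n, k):
    """P(X >= k) where X counts marked items in a size-n sample drawn
    without replacement from N items, K of which are marked."""
    total = comb(N, n)
    # sum the hypergeometric pmf from k up to the largest attainable count
    return sum(comb(K, j) * comb(N - K, n - j)
               for j in range(k, min(n, K) + 1)) / total
```

For example, drawing 5 items from 10 of which 5 are marked, the tail at k = 3 is exactly 0.5 by symmetry.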
Transductive Rademacher complexity and its applications
Proc. 20th Annual Conference on Computational Learning Theory, 2007
Cited by 21 (2 self)
Abstract:
We present data-dependent error bounds for transductive learning based on transductive Rademacher complexity. For specific algorithms we provide bounds on their Rademacher complexity based on their “unlabeled-labeled” decomposition. This decomposition technique applies to many current and practical graph-based algorithms. Finally, we present a new PAC-Bayesian bound for mixtures of transductive algorithms based on our Rademacher bounds.
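To make the central quantity concrete, here is an illustrative Monte Carlo estimate of the transductive Rademacher complexity of a finite set of prediction vectors over m labeled and u unlabeled points; the random signs σᵢ ∈ {+1, −1, 0} follow the standard transductive definition, but the function and parameter names are ours, not the paper's:

```python
import random

def transductive_rademacher(vectors, m, u, p=0.5, trials=2000, seed=0):
    """Monte Carlo estimate of (1/m + 1/u) * E[ sup_h sum_i sigma_i h(x_i) ]
    for a finite set of prediction vectors over m + u points.
    sigma_i is +1 or -1 with probability p each, and 0 otherwise."""
    rng = random.Random(seed)
    n = m + u
    acc = 0.0
    for _ in range(trials):
        sigma = [rng.choice((1, -1)) if rng.random() < 2 * p else 0
                 for _ in range(n)]
        # supremum over the (finite) hypothesis set
        acc += max(sum(s * v for s, v in zip(sigma, vec)) for vec in vectors)
    return (1.0 / m + 1.0 / u) * acc / trials
```

A single constant-zero predictor has complexity exactly 0, while a richer set of vectors yields a strictly positive value.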
Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach
In International Conference on Machine Learning (ICML), 2014
Cited by 15 (0 self)
Abstract:
Markov chain Monte Carlo (MCMC) methods are often deemed far too computationally intensive to be of any practical use for large datasets. This paper describes a methodology that aims to scale up the Metropolis-Hastings (MH) algorithm in this context. We propose an approximate implementation of the accept/reject step of MH that only requires evaluating the likelihood of a random subset of the data, yet is guaranteed to coincide with the accept/reject step based on the full dataset with a probability exceeding a user-specified tolerance level. This adaptive subsampling technique is an alternative to the recent approach developed in (Korattikara et al., 2014), and it allows us to establish rigorously that the resulting approximate MH algorithm samples from a perturbed version of the target distribution of interest, whose total variation distance to this very target is controlled explicitly. We explore the benefits and limitations of this scheme on several examples.
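The flavor of the method can be conveyed with a simplified sketch: a Metropolis-Hastings chain for the mean of a unit-variance Gaussian in which the log-likelihood ratio is estimated from a fixed-size random subset of the data. The paper's actual algorithm grows the subsample adaptively until a concentration bound certifies the accept/reject decision; the fixed subsample below is a stand-in for that test, and all names are illustrative:

```python
import math
import random

def subsampled_mh(data, n_iters=500, subsample=50, step=0.5, seed=0):
    """Simplified subsampled Metropolis-Hastings for the mean of a
    unit-variance Gaussian with a flat prior. The log-likelihood ratio
    is computed on a random subset and rescaled to the full data size;
    the paper replaces this fixed subsample with an adaptive stopping rule."""
    rng = random.Random(seed)
    theta = 0.0
    chain = []
    n = len(data)
    for _ in range(n_iters):
        prop = theta + rng.gauss(0.0, step)
        batch = rng.sample(data, min(subsample, n))
        # subset estimate of the log-likelihood ratio, scaled up by n/|batch|
        llr = sum((x - prop) ** 2 - (x - theta) ** 2 for x in batch)
        llr *= -0.5 * n / len(batch)
        if math.log(rng.random()) < llr:
            theta = prop
        chain.append(theta)
    return chain
```

With enough data the chain concentrates near the sample mean, illustrating why a subset-based accept/reject step can suffice when its error is controlled.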
Tight finite-key analysis for quantum cryptography, 2011
Cited by 12 (3 self)
Abstract:
Despite enormous theoretical and experimental progress in quantum cryptography, the security of most current implementations of quantum key distribution is still not rigorously established. One significant problem is that the security of the final key strongly depends on the number, M, of signals exchanged between the legitimate parties. Yet, existing security proofs are often only valid asymptotically, for unrealistically large values of M. Another challenge is that most security proofs are very sensitive to small differences between the physical devices used by the protocol and the theoretical model used to describe them. Here we show that these gaps between theory and experiment can be simultaneously overcome by using a recently developed proof technique based on the uncertainty relation for smooth entropies.
Analysis of Computational Time of Simple Estimation of Distribution Algorithms, 2010
Cited by 11 (5 self)
Abstract:
Estimation of distribution algorithms (EDAs) are widely used in stochastic optimization. Impressive experimental results have been reported in the literature. However, little work has been done on analyzing the computation time of EDAs in relation to the problem size. It is still unclear how well EDAs (with a finite population size larger than two) will scale up when the dimension of the optimization problem (problem size) goes up. This paper studies the computational time complexity of a simple EDA, i.e., the univariate marginal distribution algorithm (UMDA), in order to gain more insight into EDAs' complexity. First, we discuss how to measure the computational time complexity of EDAs. A classification of problem hardness based on our discussions is then given. Second, we prove a theorem related to problem hardness and the probability conditions of ...
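For reference, the UMDA analyzed in this line of work fits in a few lines. The implementation below (maximizing OneMax, with margins keeping the marginals away from 0 and 1) is our own illustrative version, not the paper's code:

```python
import random

def umda_onemax(n=20, pop=50, mu=25, max_gens=200, seed=0):
    """Minimal univariate marginal distribution algorithm (UMDA)
    maximizing OneMax (the number of ones in a bit string).
    Returns the best individual found and the generation count."""
    rng = random.Random(seed)
    p = [0.5] * n  # univariate marginal probabilities
    for gen in range(max_gens):
        # sample a population from the product distribution
        popn = [[1 if rng.random() < p[i] else 0 for i in range(n)]
                for _ in range(pop)]
        popn.sort(key=sum, reverse=True)
        best = popn[0]
        if sum(best) == n:
            return best, gen
        # re-estimate marginals from the mu best individuals,
        # with borders keeping probabilities inside [1/n, 1 - 1/n]
        sel = popn[:mu]
        for i in range(n):
            freq = sum(ind[i] for ind in sel) / mu
            p[i] = min(max(freq, 1.0 / n), 1.0 - 1.0 / n)
    return best, max_gens
```

On OneMax the marginals drift quickly toward 1, which is the kind of scaling behavior the runtime analysis makes precise.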
Machine Learning with Data Dependent Hypothesis Classes
Journal of Machine Learning Research, 2002
Cited by 11 (0 self)
Abstract:
We extend the VC theory of statistical learning to data-dependent spaces of classifiers.
Improved bounds on sample size for implicit matrix trace estimators, 2013
Cited by 8 (3 self)
Abstract:
This article is concerned with Monte Carlo methods for the estimation of the trace of an implicitly given matrix A whose information is only available through matrix-vector products. Such a method approximates the trace by an average of N expressions of the form wᵀ(Aw), with random vectors w drawn from an appropriate distribution. We prove, discuss and experiment with bounds on the number of realizations N required in order to guarantee a probabilistic bound on the relative error of the trace estimation upon employing Rademacher (Hutchinson), Gaussian and uniform unit vector (with and without replacement) probability distributions. In total, one necessary bound and six sufficient bounds are proved, improving upon and extending similar estimates obtained in the seminal work of Avron and Toledo (2011) in several dimensions. We first improve their bound on N for the Hutchinson method, dropping a term that relates to rank(A) and making the bound comparable with that for the Gaussian estimator. We further prove new sufficient bounds for the Hutchinson, Gaussian and the unit vector estimators, as well as a necessary bound for the Gaussian estimator, which depend more specifically on properties of the matrix A. As such they may suggest for what type of matrices one distribution or another provides a particularly effective or relatively ineffective stochastic estimation method.
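The estimator under study is simple to state in code. The following sketch of the Hutchinson (Rademacher) estimator averages N samples of wᵀ(Aw) using only matrix-vector products; the function and parameter names are ours:

```python
import random

def hutchinson_trace(matvec, n, N=1000, seed=0):
    """Hutchinson trace estimator: average of w^T (A w) over N random
    sign vectors w, where A is accessed only via matvec(w) -> A w."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(N):
        w = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        aw = matvec(w)
        acc += sum(wi * ai for wi, ai in zip(w, aw))
    return acc / N
```

A useful sanity check: for a diagonal matrix every sample equals the exact trace, since wᵢ² = 1 for sign vectors, so the estimator is exact regardless of N.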
When Is an Estimation of Distribution Algorithm Better than an Evolutionary Algorithm?
In Proc. 2009 IEEE Congr. Evol. Comput. (CEC’09), 2009
Cited by 8 (5 self)
Abstract:
Despite the widespread popularity of estimation of distribution algorithms (EDAs), there has been no theoretical proof that there exist optimisation problems where EDAs perform significantly better than traditional evolutionary algorithms. Here, it is proved rigorously that on a problem called SUBSTRING, a simple EDA called the univariate marginal distribution algorithm (UMDA) is efficient, whereas the (1+1) EA is highly inefficient. Such studies are essential in gaining insight into fundamental research issues, i.e., what problem characteristics make an EDA or EA efficient, under what conditions an EDA is expected to outperform an EA, and what the key factors are in an EDA that make it efficient or inefficient.
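The (1+1) EA used as the baseline in this comparison is the textbook algorithm: flip each bit independently with probability 1/n and keep the offspring if it is at least as fit. A minimal sketch follows (exercised here on OneMax rather than the paper's SUBSTRING function; names are illustrative):

```python
import random

def one_plus_one_ea(fitness, n, max_evals=5000, seed=0):
    """Standard (1+1) EA on bit strings of length n: standard bit
    mutation with rate 1/n, elitist acceptance (keep offspring if
    at least as fit). Returns the final individual and its fitness."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = fitness(x)
    for _ in range(max_evals):
        # flip each bit independently with probability 1/n
        y = [b ^ 1 if rng.random() < 1.0 / n else b for b in x]
        fy = fitness(y)
        if fy >= fx:
            x, fx = y, fy
    return x, fx
```

On OneMax the expected runtime is O(n log n), so a few thousand evaluations easily suffice for small n; on SUBSTRING, the paper proves, this same algorithm becomes highly inefficient.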
List Decoding Tensor Products and Interleaved Codes
Cited by 6 (2 self)
Abstract:
We design the first efficient algorithms and prove new combinatorial bounds for list decoding tensor products of codes and interleaved codes.
• We show that for every code, the ratio of its list decoding radius to its minimum distance stays unchanged under the tensor product operation (rather than squaring, as one might expect). This gives the first efficient list decoders and new combinatorial bounds for some natural codes, including multivariate polynomials where the degree in each variable is bounded.
• We show that for every code, its list decoding radius remains unchanged under m-wise interleaving for an integer m. This generalizes a recent result of Dinur et al. [6], who proved such a result for interleaved Hadamard codes (equivalently, linear transformations).
• Using the notion of generalized Hamming weights, we give better list size bounds for both tensoring and interleaving of binary linear codes. By analyzing the weight distribution of these codes, we reduce the task of bounding the list size to bounding the number of close-by low-rank codewords. For decoding linear transformations, using rank reduction together with other ideas, we obtain list size bounds that are tight over small fields.
Our results give better bounds on the list decoding radius than what is obtained from the Johnson bound, and yield rather general families of codes decodable beyond the Johnson bound.