Results 1–10 of 20
Explicit learning curves for transduction and application to clustering and compression algorithms
Journal of Artificial Intelligence Research, 2004
Cited by 23 (3 self)
Abstract:
Inductive learning is based on inferring a general rule from a finite data set and using it to label new data. In transduction one attempts to solve the problem of using a labeled training set to label a set of unlabeled points, which are given to the learner prior to learning. Although transduction seems at the outset to be an easier task than induction, there have not been many provably useful algorithms for transduction. Moreover, the precise relation between induction and transduction has not yet been determined. The main theoretical developments related to transduction were presented by Vapnik more than twenty years ago. One of Vapnik's basic results is a rather tight error bound for transductive classification based on an exact computation of the hypergeometric tail. While being tight, this bound is given implicitly via a computational routine. Our first contribution is a somewhat looser but explicit characterization of a slightly extended PAC-Bayesian version of Vapnik's transductive bound. This characterization is obtained using concentration inequalities for the tails of sums of random variables obtained by sampling without replacement. We then derive error bounds for compression schemes such as (transductive) support vector machines and for transduction algorithms based on clustering. The main observation used for deriving these new error bounds and algorithms is that the unlabeled test points, which in the transductive setting are known in advance, can be used to construct useful data-dependent prior distributions over the hypothesis space.
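The routine mentioned above evaluates exactly a hypergeometric tail: the probability that a hypothesis making K errors on the full set of N points shows at least k of them on a sample of size n drawn without replacement. A minimal stdlib sketch of that tail computation (function name and argument order are my own, not Vapnik's):

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): the chance that a
    size-n sample drawn without replacement from N points, K of which
    are 'errors', contains at least k errors."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total
```

For example, with N = 4 points, K = 2 errors, and a sample of n = 2, the probability that both sampled points are errors is C(2,2)·C(2,0)/C(4,2) = 1/6.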
Transductive Rademacher complexity and its applications
Proc. 20th Annual Conference on Computational Learning Theory, 2007
Cited by 13 (2 self)
Abstract:
We present data-dependent error bounds for transductive learning based on transductive Rademacher complexity. For specific algorithms we provide bounds on their Rademacher complexity based on their "unlabeled-labeled" decomposition. This decomposition technique applies to many current and practical graph-based algorithms. Finally, we present a new PAC-Bayesian bound for mixtures of transductive algorithms based on our Rademacher bounds.
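For a concrete (assumed) reading of the definition: transductive Rademacher variables take the values +1 and −1 with probability p each and 0 otherwise, and the complexity is the expected supremum, over the hypothesis class, of their correlation with the hypotheses' outputs on the full labeled-plus-unlabeled sample. A Monte Carlo sketch of that expectation for a finite set of score vectors (the (1/m + 1/u) normalization used in the paper is omitted here):

```python
import random

def transductive_rademacher(score_vectors, p, trials=2000, seed=0):
    """Monte Carlo estimate of E[ sup_h sum_i sigma_i * h(x_i) ], where
    each sigma_i is +1 or -1 with probability p each and 0 with
    probability 1 - 2p (the 'transductive' Rademacher variables)."""
    rng = random.Random(seed)
    n = len(score_vectors[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choices((1, -1, 0), weights=(p, p, 1 - 2 * p))[0]
                 for _ in range(n)]
        total += max(sum(s * h for s, h in zip(sigma, vec))
                     for vec in score_vectors)
    return total / trials
```

Setting p = 1/2 recovers the ordinary (inductive) Rademacher variables with no zeros.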
Machine Learning with Data Dependent Hypothesis Classes
Journal of Machine Learning Research, 2002
Cited by 9 (0 self)
Abstract:
We extend the VC theory of statistical learning to data-dependent spaces of classifiers.
Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach
In International Conference on Machine Learning (ICML), 2014
Cited by 4 (0 self)
Abstract:
Markov chain Monte Carlo (MCMC) methods are often deemed far too computationally intensive to be of any practical use for large datasets. This paper describes a methodology that aims to scale up the Metropolis-Hastings (MH) algorithm in this context. We propose an approximate implementation of the accept/reject step of MH that only requires evaluating the likelihood of a random subset of the data, yet is guaranteed to coincide with the accept/reject step based on the full dataset with a probability exceeding a user-specified tolerance level. This adaptive subsampling technique is an alternative to the recent approach developed in (Korattikara et al., 2014), and it allows us to establish rigorously that the resulting approximate MH algorithm samples from a perturbed version of the target distribution of interest, whose total variation distance to this very target is controlled explicitly. We explore the benefits and limitations of this scheme on several examples.
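The core idea can be sketched as follows (the constants, batching, and the Hoeffding-style bound here are illustrative assumptions, not the paper's exact concentration construction): grow a random subsample of the per-datum log-likelihood ratios until a confidence bound separates their running mean from the MH acceptance threshold ψ:

```python
import math
import random

def approx_mh_accept(loglik_ratios, psi, delta=0.05, batch=50, C=2.0,
                     seed=0):
    """Approximate MH accept/reject: average per-datum log-likelihood
    ratios over a growing random subsample, stopping once a
    Hoeffding-style bound (ratios assumed to have range at most C)
    puts the running mean clearly above or below the threshold psi."""
    rng = random.Random(seed)
    data = list(loglik_ratios)
    rng.shuffle(data)                 # subsample = a random prefix
    n, t, s = len(data), 0, 0.0
    while True:
        for _ in range(min(batch, n - t)):
            s += data[t]
            t += 1
        mean = s / t
        half_width = C * math.sqrt(math.log(2.0 / delta) / (2.0 * t))
        if t == n or abs(mean - psi) > half_width:
            return mean > psi         # confident early stop, or full data
```

When the per-datum ratios concentrate well away from ψ, the loop terminates after one or two batches instead of touching all n data points.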
When Is an Estimation of Distribution Algorithm Better than an Evolutionary Algorithm?
In Proc. 2009 IEEE Congr. Evol. Comput. (CEC’09), 2009
Cited by 3 (3 self)
Abstract:
Despite the widespread popularity of estimation of distribution algorithms (EDAs), there has been no theoretical proof that there exist optimisation problems where EDAs perform significantly better than traditional evolutionary algorithms. Here, it is proved rigorously that on a problem called SUBSTRING, a simple EDA called the univariate marginal distribution algorithm (UMDA) is efficient, whereas the (1+1) EA is highly inefficient. Such studies are essential in gaining insight into fundamental research issues, i.e., what problem characteristics make an EDA or EA efficient, under what conditions an EDA is expected to outperform an EA, and what key factors in an EDA make it efficient or inefficient.
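For intuition only, here is a minimal UMDA loop on the standard ONEMAX benchmark (a stand-in I chose for illustration; the paper's separating problem is SUBSTRING, and the population sizes below are arbitrary): sample a population from a product distribution, select the fittest half, and refit the per-bit marginals.

```python
import random

def umda_onemax(n, lam=100, mu=50, max_gens=200, seed=0):
    """Minimal UMDA: keep a frequency vector p, sample lam bitstrings
    from the product distribution p, select the mu fittest, and set
    each p[i] to the frequency of 1s at position i among the selected."""
    rng = random.Random(seed)
    p = [0.5] * n
    for gen in range(max_gens):
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n)]
               for _ in range(lam)]
        pop.sort(key=sum, reverse=True)   # ONEMAX fitness = number of ones
        best = pop[:mu]
        if sum(best[0]) == n:
            return gen                    # generation where optimum appeared
        p = [sum(x[i] for x in best) / mu for i in range(n)]
    return None
```

The contrast with the (1+1) EA is structural: the EA mutates a single current solution, while UMDA resamples from an explicit probabilistic model each generation.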
Analysis of Computational Time of Simple Estimation of Distribution Algorithms
2010
Cited by 3 (3 self)
Abstract:
Estimation of distribution algorithms (EDAs) are widely used in stochastic optimization. Impressive experimental results have been reported in the literature. However, little work has been done on analyzing the computation time of EDAs in relation to the problem size. It is still unclear how well EDAs (with a finite population size larger than two) will scale up when the dimension of the optimization problem (problem size) goes up. This paper studies the computational time complexity of a simple EDA, the univariate marginal distribution algorithm (UMDA), in order to gain more insight into the complexity of EDAs. First, we discuss how to measure the computational time complexity of EDAs. A classification of problem hardness based on our discussion is then given. Second, we prove a theorem related to problem hardness and the probability conditions of
List Decoding Tensor Products and Interleaved Codes
Cited by 1 (0 self)
Abstract:
We design the first efficient algorithms and prove new combinatorial bounds for list decoding tensor products of codes and interleaved codes.
• We show that for every code, the ratio of its list decoding radius to its minimum distance stays unchanged under the tensor product operation (rather than squaring, as one might expect). This gives the first efficient list decoders and new combinatorial bounds for some natural codes, including multivariate polynomials where the degree in each variable is bounded.
• We show that for every code, its list decoding radius remains unchanged under m-wise interleaving for an integer m. This generalizes a recent result of Dinur et al. [6], who proved such a result for interleaved Hadamard codes (equivalently, linear transformations).
• Using the notion of generalized Hamming weights, we give better list size bounds for both tensoring and interleaving of binary linear codes. By analyzing the weight distribution of these codes, we reduce the task of bounding the list size to bounding the number of close-by low-rank codewords. For decoding linear transformations, using rank reduction together with other ideas, we obtain list size bounds that are tight over small fields.
Our results give better bounds on the list decoding radius than what is obtained from the Johnson bound, and yield rather general families of codes decodable beyond the Johnson bound.
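The tensor-product operation itself is easy to state concretely: C1 ⊗ C2 consists of the matrices whose rows all lie in C1 and whose columns all lie in C2, so in particular the minimum distance multiplies. A brute-force sketch for tiny binary codes given as explicit codeword lists (function name and representation are mine, purely for illustration):

```python
from itertools import product

def tensor_code(C1, C2):
    """Codewords of C1 (x) C2 for small binary codes given as explicit
    codeword lists: the len(C2[0]) x len(C1[0]) matrices whose every
    row is in C1 and every column is in C2 (brute-force enumeration)."""
    n1, n2 = len(C1[0]), len(C2[0])
    cols = {tuple(c) for c in C2}
    words = []
    for choice in product(C1, repeat=n2):        # stack n2 rows from C1
        if all(tuple(row[j] for row in choice) in cols
               for j in range(n1)):
            words.append([list(r) for r in choice])
    return words
```

With C1 the length-3 repetition code and C2 the length-3 even-weight code, the product has 2^(1·2) = 4 codewords of weights {0, 6, 6, 6}, matching the distance product d1·d2 = 3·2 = 6.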
Rigorous Time Complexity Analysis of Univariate Marginal Distribution Algorithm with Margins
2009
Cited by 1 (1 self)
Abstract:
The univariate marginal distribution algorithm (UMDA) is a simple estimation of distribution algorithm (EDA); EDAs of this kind do not consider the dependencies among the variables. In this paper, on the basis of our proposed approach in [1], we present a rigorous proof of the result that the UMDA with margins (in [1] we merely showed the effectiveness of margins) cannot find the global optimum of the TRAPLEADINGONES problem [2] within a polynomial number of generations with a probability that is super-polynomially close to 1. Such a theoretical result is significant in shedding light on the fundamental issues of what problem characteristics make an EDA hard/easy and when an EDA is expected to perform well/poorly on a given problem.
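The "margins" mechanism referred to here is, in its usual formulation, a clamp applied to the marginals after each frequency update so that no bit probability fixates at 0 or 1; the interval [1/n, 1 − 1/n] below is the common choice, though the paper's exact variant may differ:

```python
def apply_margins(p):
    """UMDA 'margins': after the frequency update, clamp every marginal
    into [1/n, 1 - 1/n] so no bit probability fixates at 0 or 1."""
    n = len(p)
    lo, hi = 1.0 / n, 1.0 - 1.0 / n
    return [min(hi, max(lo, pi)) for pi in p]
```

Without this clamp, a marginal that hits 0 or 1 can never change again, which is exactly the fixation behavior the margins are designed to rule out.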
Transductive learning over graphs: Incremental assessment
In The Learning Workshop (SNOWBIRD); Technical Report, ESAT-SISTA, K.U.Leuven, 2007
Cited by 1 (1 self)
Abstract:
Graphs constitute a most natural way to represent problems involving finite or countable universes. This is especially so in the context of bioinformatics (e.g. protein-interaction graphs), collaborative filtering, the analysis of social networks and citation graphs, and various problems in operations research involving incomplete information. A further argument for using graphs to characterize learning problems is the connection to the literature on network-flow algorithms and other deep results in combinatorial optimization. This short note reviews results obtained in [3] and extends them slightly towards an incremental setting by exploiting a sub-result of [4]. The relevance of this result for machine learning can be seen e.g. in