Results 1-10 of 12
Explicit learning curves for transduction and application to clustering and compression algorithms
Journal of Artificial Intelligence Research, 2004
"... Inductive learning is based on inferring a general rule from a finite data set and using it to label new data. In transduction one attempts to solve the problem of using a labeled training set to label a set of unlabeled points, which are given to the learner prior to learning. Although transduction ..."
Abstract

Cited by 22 (3 self)
 Add to MetaCart
Inductive learning is based on inferring a general rule from a finite data set and using it to label new data. In transduction one attempts to solve the problem of using a labeled training set to label a set of unlabeled points, which are given to the learner prior to learning. Although transduction seems at the outset to be an easier task than induction, there have not been many provably useful algorithms for transduction. Moreover, the precise relation between induction and transduction has not yet been determined. The main theoretical developments related to transduction were presented by Vapnik more than twenty years ago. One of Vapnik’s basic results is a rather tight error bound for transductive classification based on an exact computation of the hypergeometric tail. While being tight, this bound is given implicitly via a computational routine. Our first contribution is a somewhat looser but explicit characterization of a slightly extended PAC-Bayesian version of Vapnik’s transductive bound. This characterization is obtained using concentration inequalities for the tail of sums of random variables obtained by sampling without replacement. We then derive error bounds for compression schemes such as (transductive) support vector machines and for transduction algorithms based on clustering. The main observation used for deriving these new error bounds and algorithms is that the unlabeled test points, which in the transductive setting are known in advance, can be used to construct useful data-dependent prior distributions over the hypothesis space.
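As a minimal illustration of the hypergeometric tail underlying Vapnik's transductive bound (this is the classical tail itself, not the paper's extended PAC-Bayesian characterization), it can be computed exactly with stdlib binomial coefficients:

```python
from math import comb

def hypergeometric_tail(N, K, m, k):
    """P(at most k of the K 'bad' points fall in a uniformly random
    training subset of size m drawn from a full sample of N points)."""
    # math.comb returns 0 when m - j exceeds N - K, so no guard is needed.
    return sum(comb(K, j) * comb(N - K, m - j)
               for j in range(min(k, m) + 1)) / comb(N, m)
```

For example, with N=10, K=5, m=5 the tail at k=5 is exactly 1, since the training set cannot contain more than five of the five bad points.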
Transductive Rademacher complexity and its applications
Proc. 20th Annual Conference on Computational Learning Theory, 2007
"... Abstract. We present datadependent error bounds for transductive learning based on transductive Rademacher complexity. For specific algorithms we provide bounds on their Rademacher complexity based on their “unlabeledlabeled ” decomposition. This decomposition technique applies to many current and ..."
Abstract

Cited by 13 (2 self)
 Add to MetaCart
We present data-dependent error bounds for transductive learning based on transductive Rademacher complexity. For specific algorithms we provide bounds on their Rademacher complexity based on their “unlabeled-labeled” decomposition. This decomposition technique applies to many current and practical graph-based algorithms. Finally, we present a new PAC-Bayesian bound for mixtures of transductive algorithms based on our Rademacher bounds.
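A Monte Carlo sketch of a transductive Rademacher average over a finite set of prediction vectors may help fix ideas; the {+1, -1, 0} sampling scheme is characteristic of the transductive setting, but the parameter `p` and the names here are illustrative choices, not the paper's definitions:

```python
import random

def transductive_rademacher(preds, p=0.25, trials=2000, seed=0):
    """Monte Carlo estimate of E_sigma sup_h sum_i sigma_i * h(x_i),
    where each sigma_i is +1 or -1 with probability p, and 0 otherwise.
    preds lists each hypothesis's predictions on all m + u points."""
    rng = random.Random(seed)
    n = len(preds[0])
    total = 0.0
    for _ in range(trials):
        sigma = rng.choices((1, -1, 0), weights=(p, p, 1 - 2 * p), k=n)
        total += max(sum(s * h for s, h in zip(sigma, hvec)) for hvec in preds)
    return total / trials
```

For a class closed under negation the supremum is always nonnegative, so the estimate lies between 0 and the number of points.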
Machine Learning with Data Dependent Hypothesis Classes
Journal of Machine Learning Research, 2002
"... We extend the VC theory of statistical learning to data dependent spaces of classifiers. ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
We extend the VC theory of statistical learning to data-dependent spaces of classifiers.
Analysis of Computational Time of Simple Estimation of Distribution Algorithms
2010
"... Estimation of distribution algorithms (EDAs) are widely used in stochastic optimization. Impressive experimental results have been reported in the literature. However, little work has been done on analyzing the computation time of EDAs in relation to the problem size. It is still unclear how well ED ..."
Abstract

Cited by 3 (3 self)
 Add to MetaCart
Estimation of distribution algorithms (EDAs) are widely used in stochastic optimization. Impressive experimental results have been reported in the literature. However, little work has been done on analyzing the computation time of EDAs in relation to the problem size. It is still unclear how well EDAs (with a finite population size larger than two) will scale up when the dimension of the optimization problem (problem size) goes up. This paper studies the computational time complexity of a simple EDA, the univariate marginal distribution algorithm (UMDA), in order to gain more insight into the complexity of EDAs. First, we discuss how to measure the computational time complexity of EDAs. A classification of problem hardness based on our discussions is then given. Second, we prove a theorem related to problem hardness and the probability conditions of …
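For a concrete reference point, a minimal UMDA with frequency margins can be sketched as follows; the OneMax benchmark and all parameter values here are illustrative choices, not the problems or settings analysed in the paper:

```python
import random

def umda_onemax(n=20, pop=100, mu=20, gens=300, seed=1):
    """Minimal UMDA on OneMax: sample a population from a product
    distribution, keep the mu fittest, and reset each bit's frequency
    to its empirical rate among them, clamped to [1/n, 1 - 1/n]."""
    rng = random.Random(seed)
    p = [0.5] * n
    best = 0
    for _ in range(gens):
        popn = [[1 if rng.random() < p[i] else 0 for i in range(n)]
                for _ in range(pop)]
        popn.sort(key=sum, reverse=True)
        best = max(best, sum(popn[0]))
        if best == n:
            break
        for i in range(n):
            freq = sum(ind[i] for ind in popn[:mu]) / mu
            p[i] = min(1 - 1 / n, max(1 / n, freq))  # margins
    return best
```

The margins keep every frequency away from 0 and 1, so no bit value is ever permanently lost, which is the mechanism the time-complexity analyses depend on.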
When Is an Estimation of Distribution Algorithm Better than an Evolutionary Algorithm
in Proc. 2009 IEEE Congr. Evol. Comput. (CEC’09), 2009
"... Abstract—Despite the widespread popularity of estimation of distribution algorithms (EDAs), there has been no theoretical proof that there exist optimisation problems where EDAs perform significantly better than traditional evolutionary algorithms. Here, it is proved rigorously that on a problem ca ..."
Abstract

Cited by 3 (3 self)
 Add to MetaCart
Despite the widespread popularity of estimation of distribution algorithms (EDAs), there has been no theoretical proof that there exist optimisation problems where EDAs perform significantly better than traditional evolutionary algorithms. Here, it is proved rigorously that on a problem called SUBSTRING, a simple EDA called the univariate marginal distribution algorithm (UMDA) is efficient, whereas the (1+1) EA is highly inefficient. Such studies are essential in gaining insight into fundamental research issues: what problem characteristics make an EDA or EA efficient, under what conditions an EDA is expected to outperform an EA, and what key factors in an EDA make it efficient or inefficient.
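For contrast with the UMDA, the (1+1) EA referenced above is the following textbook loop (shown on OneMax for concreteness; the paper's separation result uses the SUBSTRING problem, which is not reproduced here):

```python
import random

def one_plus_one_ea(n=20, max_iters=10000, seed=2):
    """(1+1) EA: keep a single bit string, flip each bit independently
    with probability 1/n, and accept the offspring if it is no worse.
    Returns the number of iterations until OneMax is solved."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    for t in range(max_iters):
        if sum(x) == n:
            return t
        y = [b ^ (rng.random() < 1 / n) for b in x]
        if sum(y) >= sum(x):
            x = y
    return max_iters
```

On OneMax this hill climber succeeds in O(n log n) expected steps; the point of the paper is that on SUBSTRING this same algorithm becomes highly inefficient while the UMDA does not.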
Rigorous Time Complexity Analysis of Univariate Marginal Distribution Algorithm with Margins
2009
"... Marginal Distribution Algorithms (EDAs) which do not consider the dependencies among the variables. In this paper, on the basis of our proposed approach in [1], we present a rigorous proof for the result that the UMDA with margins (in [1] we merely showed the effectiveness of margins) cannot find th ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
… Marginal Distribution Algorithms (EDAs), which do not consider the dependencies among the variables. In this paper, on the basis of our proposed approach in [1], we present a rigorous proof of the result that the UMDA with margins (in [1] we merely showed the effectiveness of margins) cannot find the global optimum of the TRAPLEADINGONES problem [2] within a polynomial number of generations with a probability that is super-polynomially close to 1. Such a theoretical result is significant in shedding light on the fundamental issues of what problem characteristics make an EDA hard/easy and when an EDA is expected to perform well/poorly on a given problem.
List Decoding Tensor Products and Interleaved Codes
"... Abstract. We design the first efficient algorithms and prove new combinatorial bounds for list decoding tensor products of codes and interleaved codes. • We show that for every code, the ratio of its list decoding radius to its minimum distance stays unchanged under the tensor product operation (rat ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
We design the first efficient algorithms and prove new combinatorial bounds for list decoding tensor products of codes and interleaved codes.
• We show that for every code, the ratio of its list decoding radius to its minimum distance stays unchanged under the tensor product operation (rather than squaring, as one might expect). This gives the first efficient list decoders and new combinatorial bounds for some natural codes, including multivariate polynomials where the degree in each variable is bounded.
• We show that for every code, its list decoding radius remains unchanged under m-wise interleaving for an integer m. This generalizes a recent result of Dinur et al. [6], who proved such a result for interleaved Hadamard codes (equivalently, linear transformations).
• Using the notion of generalized Hamming weights, we give better list size bounds for both tensoring and interleaving of binary linear codes. By analyzing the weight distribution of these codes, we reduce the task of bounding the list size to bounding the number of close-by low-rank codewords. For decoding linear transformations, using rank-reduction together with other ideas, we obtain list size bounds that are tight over small fields.
Our results give better bounds on the list decoding radius than what is obtained from the Johnson bound, and yield rather general families of codes decodable beyond the Johnson bound.
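A tiny sanity check may help ground what "tensor product of codes" means here; the [3,2,2] binary parity code below is a toy example of my choosing, used only to confirm the classical fact that tensoring multiplies minimum distances:

```python
from itertools import product

G = [[1, 0, 1], [0, 1, 1]]  # generator matrix of the [3,2,2] binary parity code

def tensor_codeword(A):
    """Codeword X = G^T A G over GF(2) of the tensor code C (x) C:
    every row and every column of X is a codeword of C."""
    GtA = [[sum(G[r][i] * A[r][j] for r in range(2)) % 2
            for j in range(2)] for i in range(3)]
    return [[sum(GtA[i][r] * G[r][j] for r in range(2)) % 2
             for j in range(3)] for i in range(3)]

# Enumerate all 2x2 message matrices and record codeword weights.
weights = [sum(map(sum, tensor_codeword([[a, b], [c, d]])))
           for a, b, c, d in product((0, 1), repeat=4)]
min_distance = min(w for w in weights if w > 0)  # d1 * d2 = 2 * 2 = 4
```

Every nonzero codeword has a nonzero row of weight at least 2, which forces at least two nonzero columns of weight at least 2 each, giving the product distance 4.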
A Charlier-Parseval approach to Poisson approximation and its applications
2008
"... A new approach to Poisson approximation is proposed. The basic idea is very simple and based on properties of the Charlier polynomials and the Parseval identity. Such an approach quickly leads to new effective bounds for several Poisson approximation problems. A selected survey on diverse Poisson ap ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
A new approach to Poisson approximation is proposed. The basic idea is very simple and based on properties of the Charlier polynomials and the Parseval identity. Such an approach quickly leads to new effective bounds for several Poisson approximation problems. A selected survey on diverse Poisson approximation results is also given.
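For orientation on what a "Poisson approximation bound" controls, the total-variation error of approximating Binomial(n, p) by Poisson(np) can be computed exactly for small n and compared with Le Cam's classical bound n*p²; this is standard material, not the Charlier-Parseval bounds developed in the paper:

```python
from math import comb, exp, factorial

def tv_binomial_poisson(n, p):
    """Exact total-variation distance between Binomial(n, p) and Poisson(np)."""
    lam = n * p
    pois = [exp(-lam) * lam ** k / factorial(k) for k in range(n + 1)]
    binom = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    # The Poisson law puts mass beyond n, where the binomial puts none.
    tail = 1.0 - sum(pois)
    return 0.5 * (sum(abs(b - q) for b, q in zip(binom, pois)) + tail)
```

Le Cam's bound guarantees `tv_binomial_poisson(n, p) <= n * p**2`, and the exact computation shows how much slack that bound leaves.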
A Systematic Martingale Construction with Applications to Permutation Inequalities
"... Abstract. We illustrate a process that constructs martingales from raw material that arises naturally from the theory of sampling without replacement. The usefulness of the new martingales is illustrated by the development of maximal inequalities for permuted sequences of real numbers. Some of these ..."
Abstract
 Add to MetaCart
We illustrate a process that constructs martingales from raw material that arises naturally from the theory of sampling without replacement. The usefulness of the new martingales is illustrated by the development of maximal inequalities for permuted sequences of real numbers. Some of these inequalities are new and some are variations of classical inequalities like those introduced by A. Garsia in the study of rearrangement of orthogonal series.
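A quick simulation of the kind of raw material the abstract refers to (my illustration, not the authors' construction): when items are drawn without replacement, the mean of the undrawn remainder is a martingale, so its expectation stays at the population mean at every step:

```python
import random
from statistics import mean

def remainder_means(pop, trials=20000, seed=3):
    """For each step k, average (over random draw orders) of the mean of
    the items still undrawn after k draws without replacement."""
    rng = random.Random(seed)
    n = len(pop)
    sums = [0.0] * (n - 1)
    for _ in range(trials):
        perm = list(pop)
        rng.shuffle(perm)
        for k in range(1, n):
            sums[k - 1] += mean(perm[k:])
    return [s / trials for s in sums]
```

With population {1, ..., 6} every entry of the returned list concentrates around the population mean 3.5, as the martingale property predicts.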