Results 1–10 of 11
Co-regularization Based Semi-supervised Domain Adaptation
Abstract

Cited by 10 (0 self)
This paper presents a co-regularization based approach to semi-supervised domain adaptation. Our proposed approach (EA++) builds on the notion of augmented space (introduced in EASYADAPT (EA) [1]) and harnesses unlabeled data in the target domain to further assist the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement and can be applied as a preprocessing step to any supervised learner. Our theoretical analysis (in terms of Rademacher complexity) of EA and EA++ shows that the hypothesis class of EA++ has lower complexity (compared to EA) and hence results in tighter generalization bounds. Experimental results on sentiment analysis tasks reinforce our theoretical findings and demonstrate the efficacy of the proposed method when compared to EA as well as a few other representative baseline approaches.
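The "augmented space" this abstract builds on is the EasyAdapt feature map, which duplicates each feature into a shared copy and a domain-specific copy. A minimal sketch (function and variable names are ours, not from the paper):

```python
# Sketch of the EASYADAPT (EA) feature augmentation the abstract refers to:
# source examples map to (x, x, 0), target examples to (x, 0, x), so the
# learner can share weights across domains or specialize per domain.

def augment(x, domain):
    """Map a feature vector x into the EA augmented space."""
    zeros = [0.0] * len(x)
    if domain == "source":
        return list(x) + list(x) + zeros
    if domain == "target":
        return list(x) + zeros + list(x)
    raise ValueError("domain must be 'source' or 'target'")

# Any supervised learner can then be trained on the augmented vectors.
src = augment([1.0, 2.0], "source")   # -> [1.0, 2.0, 1.0, 2.0, 0.0, 0.0]
tgt = augment([1.0, 2.0], "target")   # -> [1.0, 2.0, 0.0, 0.0, 1.0, 2.0]
```

EA++ additionally uses unlabeled target data via co-regularization; the augmentation above is only the supervised preprocessing step both methods share.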
Empirical risk minimization for probabilistic grammars: Sample complexity and hardness of learning
 Computational Linguistics
, 2012
Abstract

Cited by 4 (4 self)
Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk. Learning from data is central to contemporary computational linguistics. It is common in such learning to estimate a model in a parametric family using the maximum likelihood principle. This principle applies in the supervised case (i.e., using annotated ...
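As a toy illustration of empirical risk minimization under the log-loss (our own simplified example, not the paper's framework): for a single multinomial over grammar rules, the ERM solution coincides with the relative-frequency (maximum-likelihood) estimate.

```python
import math
from collections import Counter

def empirical_log_loss(probs, sample):
    """Average negative log-probability of the sample under `probs`."""
    return -sum(math.log(probs[r]) for r in sample) / len(sample)

def erm_estimate(sample):
    """Relative-frequency estimate; minimizes empirical log-loss."""
    counts = Counter(sample)
    n = len(sample)
    return {r: c / n for r, c in counts.items()}

# Hypothetical observed rule uses from a treebank sample.
sample = ["S->NP VP"] * 3 + ["NP->Det N"] * 1
mle = erm_estimate(sample)                    # {"S->NP VP": 0.75, "NP->Det N": 0.25}
loss_mle = empirical_log_loss(mle, sample)

# Any other distribution over the same rules has at least this empirical risk.
other = {"S->NP VP": 0.5, "NP->Det N": 0.5}
assert loss_mle <= empirical_log_loss(other, sample)
```

In the supervised case this is exactly why ERM with log-loss is easy; the paper's NP-hardness result concerns the unsupervised case, where the derivations are latent.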
Asymptotic Analysis of Generative Semi-Supervised Learning
Abstract

Cited by 2 (1 self)
Semi-supervised learning has emerged as a popular framework for improving modeling accuracy while controlling labeling cost. Based on an extension of stochastic composite likelihood, we quantify the asymptotic accuracy of generative semi-supervised learning. In doing so, we complement distribution-free analysis by providing an alternative framework to measure the value associated with different labeling policies and resolve the fundamental question of how much data to label and in what manner. We demonstrate our approach with both simulation studies and real world experiments using naive Bayes for text classification and MRFs and CRFs for structured prediction in NLP.
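To fix ideas about what "generative semi-supervised learning with naive Bayes" looks like operationally, here is a rough toy sketch (our own, not the paper's estimator) of the E-step that assigns soft labels to unlabeled documents, which a full EM round would fold back into the parameter estimates:

```python
import math

def fit_labeled(X, y, n_feat, smooth=1.0):
    """Class priors and per-class Bernoulli feature probs from labeled data."""
    classes = sorted(set(y))
    prior, cond = {}, {}
    for c in classes:
        Xc = [x for x, yi in zip(X, y) if yi == c]
        prior[c] = len(Xc) / len(X)
        cond[c] = [(sum(x[j] for x in Xc) + smooth) / (len(Xc) + 2 * smooth)
                   for j in range(n_feat)]
    return prior, cond

def posterior(x, prior, cond):
    """P(class | x) for a binary feature vector x (the E-step soft label)."""
    scores = {}
    for c in prior:
        logp = math.log(prior[c])
        for j, v in enumerate(x):
            p = cond[c][j]
            logp += math.log(p if v else 1.0 - p)
        scores[c] = logp
    z = max(scores.values())
    exp = {c: math.exp(s - z) for c, s in scores.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}

X = [[1, 0], [1, 0], [0, 1], [0, 1]]   # tiny labeled set, two features
y = [0, 0, 1, 1]
prior, cond = fit_labeled(X, y, n_feat=2)
post = posterior([1, 0], prior, cond)   # soft label for an unlabeled point
```

The labeling-policy question the abstract studies is, roughly, how many points to place in `X, y` versus leave for the soft-labeling step.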
Smart PAC-learners
 Theoretical Computer Science
Abstract

Cited by 2 (1 self)
The PAC-learning model is distribution-independent in the sense that the learner must reach a learning goal with a limited number of labeled random examples without any prior knowledge of the underlying domain distribution. In order to achieve this, one needs generalization error bounds that are valid uniformly for every domain distribution. These bounds are (almost) tight in the sense that there is a domain distribution which does not admit a generalization error significantly smaller than the general bound. Note, however, that this leaves open the possibility of achieving the learning goal faster if the underlying distribution is “simple”. Informally speaking, we say a PAC-learner L is “smart” if, for a “vast majority” of domain distributions D, L does not require significantly more examples to reach the “learning goal” than the best learner whose strategy is specialized to D. In this paper, focusing on sample complexity and ignoring computational issues, we show that smart learners do exist. This implies (at least from an information-theoretic perspective) that full prior knowledge of the domain distribution (or access to a huge collection of unlabeled examples) does not, for a vast majority of domain distributions, significantly reduce the number of labeled examples required to achieve the learning goal.
Learning large-margin halfspaces with more malicious noise
Abstract

Cited by 1 (0 self)
We describe a simple algorithm that runs in time poly(n, 1/γ, 1/ε) and learns an unknown n-dimensional γ-margin halfspace to accuracy 1 − ε in the presence of malicious noise, when the noise rate is allowed to be as high as Θ(εγ√log(1/γ)). Previous efficient algorithms could only learn to accuracy ε in the presence of malicious noise of rate at most Θ(εγ). Our algorithm does not work by optimizing a convex loss function. We show that no algorithm for learning γ-margin halfspaces that minimizes a convex proxy for misclassification error can tolerate malicious noise at a rate greater than Θ(εγ); this may partially explain why previous algorithms could not achieve the higher noise tolerance of our new algorithm.
Definition
Abstract
Synonyms: Learning from labeled and unlabeled data, transductive learning
Co-Training as a Human Collaboration Policy
Abstract
We consider the task of human collaborative category learning, where two people work together to classify test items into appropriate categories based on what they learn from a training set. We propose a novel collaboration policy based on the Co-Training algorithm in machine learning, in which the two people play the role of the base learners. The policy restricts each learner’s view of the data and limits their communication to only the exchange of their labelings on test items. In a series of empirical studies, we show that the Co-Training policy leads collaborators to jointly produce unique and potentially valuable classification outcomes that are not generated under other collaboration policies. We further demonstrate that these observations can be explained with appropriate machine learning models.
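The Co-Training loop the policy mirrors is the classic two-view scheme of Blum & Mitchell (1998): each learner sees only its view of the data, and they communicate only through the labels they assign. A toy sketch with stand-in nearest-centroid learners (all names here are illustrative, not from the paper):

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

class CentroidLearner:
    """Nearest-centroid classifier over a single view of the data."""
    def fit(self, X, y):
        self.centroids = {c: mean([x for x, yi in zip(X, y) if yi == c])
                          for c in set(y)}
    def predict(self, x):
        return min(self.centroids, key=lambda c: dist(x, self.centroids[c]))

def co_train(view1, view2, labels, unlabeled1, unlabeled2, rounds=2):
    """Each learner sees one view; only predicted labels are exchanged."""
    X1, X2, y = list(view1), list(view2), list(labels)
    A, B = CentroidLearner(), CentroidLearner()
    for _ in range(rounds):
        A.fit(X1, y)
        B.fit(X2, y)
        if not unlabeled1:
            break
        # A labels the next unlabeled item; both views add it to the pool.
        x1, x2 = unlabeled1.pop(0), unlabeled2.pop(0)
        y.append(A.predict(x1))
        X1.append(x1)
        X2.append(x2)
    A.fit(X1, y)
    B.fit(X2, y)
    return A, B

A, B = co_train([[0.0, 0.0], [10.0, 10.0]], [[0.0, 0.0], [10.0, 10.0]],
                [0, 1], [[1.0, 1.0]], [[1.0, 1.0]])
```

In the human study, the two people play the roles of `A` and `B`, and the restriction to exchanging labels is exactly the `co_train` communication channel.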
Efficient Semi-supervised and Active Learning of Disjunctions
Abstract
We provide efficient algorithms for learning disjunctions in the semi-supervised setting under a natural regularity assumption introduced by Balcan & Blum (2005). We prove bounds on the sample complexity of our algorithms under a mild restriction on the data distribution. We also give an active learning algorithm with improved sample complexity and extend all our algorithms to the random classification noise setting.
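For background (this is the classic supervised elimination algorithm for monotone disjunctions, shown only to fix ideas; it is not the paper's semi-supervised algorithm):

```python
# Classic elimination algorithm: start with all variables in the
# disjunction, and drop any variable that is set in a negative example.

def learn_disjunction(examples, n):
    """Learn a monotone disjunction over n boolean variables."""
    hyp = set(range(n))
    for x, label in examples:
        if label == 0:
            hyp -= {i for i in range(n) if x[i] == 1}
    return hyp

def predict(hyp, x):
    """Predicted label is 1 iff some variable in the hypothesis is set."""
    return int(any(x[i] == 1 for i in hyp))

# Hypothetical target: x0 OR x2 over n = 4 variables.
data = [([1, 0, 0, 0], 1), ([0, 1, 0, 0], 0),
        ([0, 0, 1, 1], 1), ([0, 0, 0, 1], 0)]
hyp = learn_disjunction(data, 4)   # -> {0, 2}
```

The semi-supervised setting studied in the paper additionally exploits unlabeled data through the regularity assumption; the elimination step above is only the supervised core.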
Probability-One Homotopy Maps for Tracking Constrained Clustering Solutions
Abstract
Modern machine learning problems typically have multiple criteria, but there is currently no systematic mathematical theory to guide the design of formulations and exploration of alternatives. Homotopy methods are a promising approach to characterize solution spaces by smoothly tracking solutions from one formulation (typically an “easy” problem) to another (typically a “hard” problem). New results in constructing homotopy maps for constrained clustering problems are presented here, combining quadratic loss functions with discrete evaluations of constraint violations. These maps balance requirements of locality in clusters as well as those of discrete must-link and must-not-link constraints. Experimental results demonstrate advantages in tracking solutions compared to state-of-the-art constrained clustering algorithms.
New Bounds for Learning Intervals with Implications for Semi-Supervised Learning
Abstract
We study learning of initial intervals in the prediction model. We show that for each distribution D over the domain, there is an algorithm A_D whose probability of a mistake in round m is at most (1/2 + o(1)) · 1/m. We also show that the best possible bound that can be achieved in the case in ...