Semi-Supervised Learning Literature Survey, 2006
Abstract

Cited by 757 (8 self)
We review the literature on semi-supervised learning, which is an area in machine learning and, more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s doctoral thesis (Zhu, 2005). However, the author plans to update the online version frequently to incorporate the latest developments in the field. Please obtain the latest version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 2006
Abstract

Cited by 139 (26 self)
Theory-based Bayesian models of inductive reasoning
Bayesian hierarchical clustering. In Proceedings of the 22nd International Conference on Machine Learning, ACM, 2005
Abstract

Cited by 69 (11 self)
We present a novel algorithm for agglomerative hierarchical clustering based on evaluating marginal likelihoods of a probabilistic model. This algorithm has several advantages over traditional distance-based agglomerative clustering algorithms. (1) It defines a probabilistic model of the data which can be used to compute the predictive distribution of a test point and the probability of it belonging to any of the existing clusters in the tree. (2) It uses a model-based criterion to decide on merging clusters rather than an ad hoc distance metric. (3) Bayesian hypothesis testing is used to decide which merges are advantageous and to output the recommended depth of the tree. (4) The algorithm can be interpreted as a novel fast bottom-up approximate inference method for a Dirichlet process (i.e. countably infinite) mixture model (DPM). It provides a new lower bound on the marginal likelihood of a DPM by summing over exponentially many clusterings of the data in polynomial time. We describe procedures for learning the model hyperparameters, computing the predictive distribution, and extensions to the algorithm. Experimental results on synthetic and real-world data sets demonstrate useful properties of the algorithm.
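The model-based merge criterion can be illustrated with a toy sketch. This is not the paper's full algorithm — it omits the Dirichlet-process prior weights and the recursive tree-consistent marginal — it only shows the Bayes-factor-style merge test on 1-D Gaussian data, with hypothetical hyperparameters `sigma2` (observation variance) and `tau2` (prior variance on the cluster mean):

```python
import math

def log_marginal(xs, sigma2=1.0, tau2=1.0):
    """Log marginal likelihood of xs under x_i ~ N(mu, sigma2), mu ~ N(0, tau2)."""
    n = len(xs)
    s = sum(xs)
    q = sum(x * x for x in xs)
    a = n / sigma2 + 1.0 / tau2   # posterior precision of mu
    b = s / sigma2
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - 0.5 * math.log(tau2 * a)
            - q / (2 * sigma2)
            + b * b / (2 * a))

def bhc_greedy(points):
    """Greedy agglomeration: repeatedly merge the pair whose merged marginal
    likelihood most exceeds the product of the two parts' likelihoods;
    stop when no merge improves the model (a Bayesian stopping rule)."""
    clusters = [[x] for x in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = (log_marginal(clusters[i] + clusters[j])
                         - log_marginal(clusters[i])
                         - log_marginal(clusters[j]))
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score < 0:  # no advantageous merge remains
            break
        clusters = ([c for k, c in enumerate(clusters) if k not in (i, j)]
                    + [clusters[i] + clusters[j]])
    return clusters
```

On two well-separated groups this recovers two clusters and refuses the final merge, mirroring how the paper's hypothesis test outputs a recommended tree depth.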
Statistical strategies for avoiding false discoveries in metabolomics and related experiments, 2006
Abstract

Cited by 60 (11 self)
Many metabolomics, and other high-content or high-throughput, experiments are set up such that the primary aim is the discovery of biomarker metabolites that can discriminate, with a certain level of certainty, between nominally matched ‘case’ and ‘control’ samples. However, it is unfortunately very easy to find markers that are apparently persuasive but that are in fact entirely spurious, and there are well-known examples in the proteomics literature. The main types of danger are not entirely independent of each other, but include bias, inadequate sample size (especially relative to the number of metabolite variables and to the required statistical power to prove that a biomarker is discriminant), excessive false discovery rate due to multiple hypothesis testing, inappropriate choice of particular numerical methods, and overfitting (generally caused by the failure to perform adequate validation and cross-validation). Many studies fail to take these into account, and thereby fail to discover anything of true significance (despite their claims). We summarise these problems, and provide pointers to a substantial existing literature that should assist in the improved design and evaluation of metabolomics experiments, thereby allowing robust scientific conclusions to be drawn from the available data. We provide a list of some of the simpler checks that might improve one’s confidence that a candidate biomarker is not simply a statistical artefact, and suggest a series of preferred tests and visualisation tools that can assist readers and authors in assessing papers. These tools can be applied to individual metabolites by using multiple univariate tests performed in parallel across all metabolite peaks. They may also be applied to the validation of multivariate models. We stress in …
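The multiple-hypothesis-testing danger named above is commonly addressed by controlling the false discovery rate. A minimal sketch of the Benjamini–Hochberg step-up procedure (a standard method for this, not one prescribed by the paper itself):

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return the indices of hypotheses rejected while controlling the
    false discovery rate at level alpha (Benjamini-Hochberg step-up):
    find the largest rank k with p_(k) <= alpha * k / m, reject the k
    smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])
```

With many metabolite peaks tested in parallel, this replaces a naive per-peak threshold of `alpha`, which would otherwise yield the excessive false discovery rate the authors warn about.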
Nonparametric function induction in semi-supervised learning. In Proc. Artificial Intelligence and Statistics, 2005
Abstract

Cited by 49 (5 self)
There has been an increase of interest in semi-supervised learning recently, because of the many datasets with large amounts of unlabeled examples and only a few labeled ones. This paper follows up on proposed nonparametric algorithms which provide an estimated continuous label for the given unlabeled examples. First, it extends them to function induction algorithms that minimize a regularization criterion applied to an out-of-sample example, and happen to have the form of Parzen windows regressors. This allows one to predict test labels without solving again a linear system of dimension n (the number of unlabeled and labeled training examples), which can cost O(n^3). Second, this function induction procedure gives rise to an efficient approximation of the training process, reducing the linear system to be solved to m ≪ n unknowns, using only a subset of m examples. An improvement of O(n^2/m^2) in time can thus be obtained. Comparative experiments are presented, showing the good performance of the induction formula and approximation algorithm.
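The induction formula described above has the form of a Parzen windows (Nadaraya–Watson) regressor. A minimal sketch, assuming a Gaussian kernel with an illustrative bandwidth h; in the paper's setting the training labels for the unlabeled points would come from the transductive solve, after which new points are predicted without touching the n x n system again:

```python
import math

def gaussian_kernel(x, xi, h=1.0):
    """Gaussian similarity between scalar points x and xi with bandwidth h."""
    return math.exp(-((x - xi) ** 2) / (2 * h * h))

def induce(x, train_x, train_y, h=1.0):
    """Parzen-windows induction: the label of an out-of-sample x is a
    kernel-weighted average of the training labels (labeled examples plus
    previously estimated labels of unlabeled examples). Cost is O(n) per
    query, versus O(n^3) to re-solve the transductive linear system."""
    w = [gaussian_kernel(x, xi, h) for xi in train_x]
    return sum(wi * yi for wi, yi in zip(w, train_y)) / sum(w)
```

For example, a query point midway between two training points with labels 0 and 1 receives label 0.5 by symmetry of the kernel weights.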
Tree-Structured Stick Breaking for Hierarchical Data
Abstract

Cited by 49 (8 self)
Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the components have a dependency structure corresponding to an evolutionary diffusion down a tree. By using a stick-breaking approach, we can apply Markov chain Monte Carlo methods based on slice sampling to perform Bayesian inference and simulate from the posterior distribution on trees. We apply our method to hierarchical clustering of images and topic modeling of text data.
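For intuition, here is a plain (non-nested) stick-breaking construction; the paper nests such processes to index the nodes of a tree. The concentration `alpha` and the truncation `n_sticks` are illustrative choices, not values from the paper:

```python
import random

def stick_breaking(alpha, n_sticks, seed=0):
    """First n_sticks weights of a GEM(alpha) stick-breaking process:
    pi_k = v_k * prod_{j<k} (1 - v_j), with v_k ~ Beta(1, alpha).
    Each step breaks off a Beta-distributed fraction of what remains
    of a unit-length stick, so the weights sum to at most 1."""
    rng = random.Random(seed)
    remaining = 1.0            # length of stick not yet broken off
    weights = []
    for _ in range(n_sticks):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights
```

Smaller `alpha` concentrates mass on the first few sticks (few components); larger `alpha` spreads it over many, which is what lets the prior express trees of unbounded width.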
Bayesian Agglomerative Clustering with Coalescents. In Advances in Neural Information Processing Systems
Abstract

Cited by 45 (3 self)
We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman’s coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over the state-of-the-art, and demonstrate our approach in document clustering and phylolinguistics.
Learning domain structures. In Proceedings of the 26th Annual Conference of the Cognitive Science Society, 2004
Abstract

Cited by 29 (19 self)
How do people acquire and use knowledge about domain structures, such as the tree-structured taxonomy of folk biology? These structures are typically seen either as consequences of innate domain-specific knowledge or as epiphenomena of domain-general associative learning. We present an alternative: a framework for statistical inference that discovers the structural principles that best account for different domains of objects and their properties. Our approach infers that a tree structure is best for a biological dataset, and a linear structure (“left”–“right”) is best for a dataset of people and their political views. We compare our proposal with unstructured associative learning and argue that our structured approach gives the better account of inductive …
Unsupervised learning. Advanced Lectures on Machine Learning, 2004
Abstract

Cited by 29 (0 self)
We give a tutorial and overview of the field of unsupervised learning from the perspective of statistical modelling. Unsupervised learning can be motivated from information theoretic and Bayesian principles. We briefly review basic models in unsupervised learning, including factor analysis, PCA, mixtures of Gaussians, ICA, hidden Markov models, state-space models, and many variants and extensions. We derive the EM algorithm and give an overview of fundamental concepts in graphical models, and inference algorithms on graphs. This is followed by a quick tour of approximate Bayesian inference, including Markov chain Monte Carlo (MCMC), Laplace approximation, BIC, variational approximations, and expectation propagation (EP). The aim of this chapter is to provide a high-level view of the field. Along the way, many state-of-the-art ideas and future directions are also reviewed.
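As a concrete instance of the EM algorithm the chapter derives, here is a minimal sketch for a two-component 1-D Gaussian mixture. Fixing both variances to 1 and initialising the means at the data extremes are simplifying assumptions made here for brevity, not choices from the chapter:

```python
import math

def em_gmm_1d(data, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture with unit variances:
    the E-step computes each component's responsibility for each point,
    the M-step re-estimates mixing weights and means from those
    responsibilities. Returns (weights, means)."""
    mu = [min(data), max(data)]   # crude but effective initialisation
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in data:
            p = [wk * math.exp(-0.5 * (x - mk) ** 2) for wk, mk in zip(w, mu)]
            z = sum(p)
            resp.append([pk / z for pk in p])
        # M-step: update mixing weights and component means
        n = [sum(r[k] for r in resp) for k in range(2)]
        w = [nk / len(data) for nk in n]
        mu = [sum(r[k] * x for r, x in zip(resp, data)) / n[k]
              for k in range(2)]
    return w, mu
```

On data drawn from two separated clumps, the means converge to the clump centres and the weights to their relative sizes.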
Semi-Supervised Random Forests
Abstract

Cited by 23 (8 self)
Random Forests (RFs) have become commonplace in many computer vision applications. Their popularity is mainly driven by their high computational efficiency during both training and evaluation while still being able to achieve state-of-the-art accuracy. This work extends the usage of Random Forests to Semi-Supervised Learning (SSL) problems. We show that traditional decision trees are optimizing multi-class margin maximizing loss functions. From this intuition, we develop a novel multi-class margin definition for the unlabeled data, and an iterative deterministic annealing-style training algorithm maximizing both the multi-class margin of labeled and unlabeled samples. In particular, this allows us to use the predicted labels of the unlabeled data as additional optimization variables. Furthermore, we propose a control mechanism based on the out-of-bag error, which prevents the algorithm from degradation if the unlabeled data is not useful for the task. Our experiments demonstrate state-of-the-art semi-supervised learning performance in typical machine learning problems and constant improvement using unlabeled data for the Caltech-101 object categorization task.
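The degradation-control idea can be sketched schematically. In this sketch a 1-nearest-neighbour classifier on scalar features stands in for the forest, and a held-out validation set stands in for the out-of-bag samples; both substitutions are mine, not the paper's:

```python
def nn1_predict(x, X, y):
    """1-nearest-neighbour on scalar features (stand-in for the forest)."""
    i = min(range(len(X)), key=lambda j: abs(x - X[j]))
    return y[i]

def guarded_self_training(Xl, yl, Xu, Xval, yval, rounds=3):
    """Self-training with a guard: pseudo-label the unlabeled pool, but
    keep the enlarged training set only if the held-out error does not
    rise (the paper uses the forest's out-of-bag error for this check,
    so no data needs to be held out)."""
    Xc, yc = list(Xl), list(yl)

    def val_err(X, y):
        wrong = sum(nn1_predict(x, X, y) != t for x, t in zip(Xval, yval))
        return wrong / len(Xval)

    best = val_err(Xc, yc)
    pool = list(Xu)
    for _ in range(rounds):
        if not pool:
            break
        Xn = Xc + pool
        yn = yc + [nn1_predict(x, Xc, yc) for x in pool]
        e = val_err(Xn, yn)
        if e <= best:      # unlabeled data helped (or was neutral): keep it
            Xc, yc, best, pool = Xn, yn, e, []
        else:              # degradation detected: discard the round
            break
    return Xc, yc
```

If the pseudo-labels would raise the monitored error, the round is rolled back, so useless unlabeled data leaves the labeled-only model intact.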