## Semi-Supervised Learning Literature Survey (2006)

### Download Links

- [pages.cs.wisc.edu]
- [www.cs.wisc.edu]
- [cms.brookes.ac.uk]
- [www.umiacs.umd.edu]
- [www.cs.stevens-tech.edu]
- [storm.cis.fordham.edu]
- [www.cs.utah.edu]

Citations: 750 (8 self-citations)

### Citations

11710 | Maximum Likelihood from Incomplete Data via the EM Algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...ccuracy, while (c)’s is much better. 2.3 EM Local Maxima. Even if the mixture model assumption is correct, in practice mixture components are identified by the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). EM is prone to local maxima. If a local maximum is far from the global maximum, unlabeled data may again hurt learning. Remedies include smart choice of starting point by active learning (Nigam, 20...
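To make the EM step concrete, here is a minimal sketch of semi-supervised EM for a two-component, one-dimensional Gaussian mixture. It assumes labeled points keep fixed one-hot responsibilities while unlabeled points receive posterior responsibilities at every E-step. The function name `semi_supervised_em` and the toy setup are illustrative assumptions, not code from the survey. Because EM only reaches a local maximum, the `means` argument (the starting point) matters.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def semi_supervised_em(labeled, unlabeled, means, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch).

    labeled:   list of (x, y) pairs with y in {0, 1}; responsibilities stay fixed.
    unlabeled: list of x values; responsibilities are re-estimated each E-step.
    means:     initial component means; EM converges to a local maximum,
               so this starting point can change the result.
    """
    means = list(means)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: one-hot responsibilities for labeled points,
        # posterior p(y | x) under the current model for unlabeled points.
        resp = [(x, [1.0 - y, float(y)]) for x, y in labeled]
        for x in unlabeled:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in (0, 1)]
            total = sum(p)
            resp.append((x, [pk / total for pk in p]))
        # M-step: responsibility-weighted maximum-likelihood updates.
        for k in (0, 1):
            nk = sum(r[k] for _, r in resp)
            means[k] = sum(r[k] * x for x, r in resp) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for x, r in resp) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsed component
            weights[k] = nk / len(resp)
    return means, variances, weights
```

With one labeled point per class and a few unlabeled points around each, the unlabeled data pulls each mean toward the center of its cluster.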

4175 | Latent Dirichlet allocation
- Blei, Ng, et al.
- 2003
Citation Context: ...Each document in turn has a fixed topic proportion (a multinomial on a higher level). However, there is no link between the topic proportions in different documents. Latent Dirichlet Allocation (LDA) (Blei et al., 2003) goes one step further. It assumes the topic proportion of each document is drawn from a Dirichlet distribution. With variational approximation, each document is represented by a posterior Dirichlet ov...

1675 | On spectral clustering: Analysis and an algorithm
- Ng, Jordan, et al.
- 2002
Citation Context: ...el or constraint information, and is therefore applicable to transductive classification. The data points are mapped into a new space spanned by the first k eigenvectors of the normalized Laplacian in (Ng et al., 2001), with special normalization. Clustering is then performed with traditional methods (like k-means) in this new space. This is very similar to kernel PCA. Fowlkes et al. (2004) use the Nyström method ...

1600 | Combining labeled and unlabeled data with co-training
- Blum, Mitchell
- 1998
Citation Context: ...r specific base learners, there has been some analysis of convergence. See e.g. (Haffari & Sarkar, 2007; Culp & Michailidis, 2007). 4 Co-Training and Multiview Learning. 4.1 Co-Training. Co-training (Blum & Mitchell, 1998) (Mitchell, 1999) assumes that (i) features can be split into two sets; (ii) each sub-feature set is sufficient to train a good classifier; (iii) the two sets are conditionally independent given the ...
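The co-training loop described in this context can be sketched as follows, under simplifying assumptions. Each example is a pair of feature views. A separate classifier is trained on each view. Each view then labels the single unlabeled example it is most confident about and adds it to the shared labeled pool, which the other view trains on next. The nearest-centroid base learner, its distance-margin "confidence", and the names `centroid_classifier` and `co_train` are all hypothetical choices for illustration; any base learner with a confidence score could be swapped in.

```python
def centroid_classifier(examples, view):
    """Train a nearest-centroid classifier on one feature view (hypothetical base learner).

    examples: list of (features, label); features is a pair of views,
    each view a tuple of floats. Returns predict(features) -> (label, confidence),
    where confidence is the gap between the two smallest squared distances.
    """
    sums, counts = {}, {}
    for feats, y in examples:
        x = feats[view]
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(feats):
        x = feats[view]
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, c)), y) for y, c in centroids.items()
        )
        margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
        return dists[0][1], margin

    return predict


def co_train(labeled, unlabeled, rounds=10):
    """Alternate between the two views; each one 'teaches' the other via the shared pool."""
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not unlabeled:
                return labeled
            predict = centroid_classifier(labeled, view)
            # The example this view is most confident about gets its predicted label.
            best = max(unlabeled, key=lambda f: predict(f)[1])
            label, _ = predict(best)
            unlabeled.remove(best)
            labeled.append((best, label))
    return labeled
```

Adding one example per view per round keeps the sketch short. The original algorithm's pool sizes and per-class quotas are not reproduced here.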

1193 | Laplacian eigenmaps for dimensionality reduction and data representation
- Belkin, Niyogi
Citation Context: .... Representative methods include Isomap (Tenenbaum et al., 2000), locally linear embedding (LLE) (Roweis & Saul, 2000) (Saul & Roweis, 2003), Hessian LLE (Donoho & Grimes, 2003), Laplacian eigenmaps (Belkin & Niyogi, 2003), and semidefinite embedding (SDE) (Weinberger & Saul, 2004) (Weinberger et al., 2004) (Weinberger et al., 2005). If one has some labeled data, for example in the form of the target low-dimensional r...

1024 | Text classification from labeled and unlabeled documents using EM - Nigam, McCallum, et al.

1001 | A comparison of event models for Naïve Bayes text classification
- McCallum, Nigam
- 1998
Citation Context: ...ify the two components. For instance, the mixtures on the second and third lines give the same p(x), but they classify x = 0.5 differently. Gaussian mixtures are identifiable. A mixture of multivariate Bernoulli distributions (McCallum & Nigam, 1998a) is not identifiable. More discussion of identifiability and semi-supervised learning can be found in e.g. (Ratsaby & Venkatesh, 1995) and (Corduneanu & Jaakkola, 2001). 2.2 Model Correctness. If th...

878 | Transductive inference for text classification using support vector machines
- Joachims
- 1999
Citation Context: ...emiriz, 1999) (Demiriz & Bennett, 2000) (Fung & Mangasarian, 1999) either cannot handle more than a few hundred unlabeled examples, or did not do so in experiments. The SVM-light TSVM implementation (Joachims, 1999) was the first widely used software. De Bie and Cristianini (De Bie & Cristianini, 2004; De Bie & Cristianini, 2006b) relax the TSVM training problem, and transductive learning problems in general, to ...

755 | Probabilistic latent semantic analysis
- Hofmann
- 1999
Citation Context: ...hen class separation is linear and along the principal component directions, and unlabeled data helps by reducing the variance in estimating such directions. Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999) is an important improvement over LSI. Each word in a document is generated by a ‘topic’ (a multinomial, i.e. unigram). Different words in the document may be generated by different topics. Each docu...
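The PLSA generative story in this context implies that a document's word distribution is a mixture of topic unigrams, p(w | d) = Σ_z p(w | z) p(z | d). The sketch below computes that mixture. It assumes the topic unigrams and the document's topic proportions are already known. Fitting them (Hofmann uses EM) is not shown, and the function name is hypothetical.

```python
def plsa_word_distribution(topic_word, doc_topic):
    """p(w | d) = sum_z p(w | z) p(z | d) under the PLSA model (illustrative sketch).

    topic_word: list of dicts, one unigram p(w | z) per topic z.
    doc_topic:  list of topic proportions p(z | d) for a single document.
    """
    dist = {}
    for p_z, word_probs in zip(doc_topic, topic_word):
        for w, p_w in word_probs.items():
            dist[w] = dist.get(w, 0.0) + p_z * p_w
    return dist


# Two toy topics; "goal" is shared, so its probability combines both topics.
topics = [{"ball": 0.5, "goal": 0.5}, {"vote": 0.8, "goal": 0.2}]
print(plsa_word_distribution(topics, [0.75, 0.25]))
```

Different words in the same document can come from different topics because each topic contributes in proportion to p(z | d).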

621 | Machine learning
- Mitchell
- 1996
Citation Context: ...g? Now let us turn our attention from machine learning to human learning. It is possible that understanding the human cognitive model will lead to novel machine learning approaches (Langley, 2006; Mitchell, 2006). We ask the question: do humans do semi-supervised learning? My hypothesis is yes. We humans accumulate ‘unlabeled’ input data, which we use (often unconsciously) to help build ... world pop...

585 | A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts - Pang, Lee

557 | Manifold regularization: A geometric framework for learning from labeled and unlabeled examples - Belkin, Niyogi, et al. - 2006

533 | Exploiting generative models in discriminative classifiers
- Jaakkola, Haussler
- 1999
Citation Context: ...nerative model for classification, each labeled example is converted into a fixed-length Fisher score vector, i.e. the derivatives of the log likelihood w.r.t. the model parameters, for all component models (Jaakkola & Haussler, 1998). These Fisher score vectors are then used in a discriminative classifier like an SVM, which empirically has high accuracy. 3 Self-Training. Self-training is a commonly used technique for semi-supe...
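For Gaussian component models, the Fisher score described in this context has a closed form. For N(x; μ, v), ∂/∂μ log N = (x − μ)/v and ∂/∂v log N = −1/(2v) + (x − μ)²/(2v²). The sketch below concatenates these gradients across all component models to get one fixed-length feature vector per example. It assumes 1-D Gaussians already fitted (for example with labeled and unlabeled data). It also omits the Fisher-information normalization of the full Fisher kernel, a common simplification. The function names are hypothetical.

```python
def gaussian_fisher_score(x, mean, var):
    """Gradient of log N(x; mean, var) with respect to (mean, var)."""
    d = x - mean
    return [d / var, -0.5 / var + 0.5 * d * d / (var * var)]


def fisher_feature_vector(x, models):
    """Concatenate Fisher scores of x under every component model.

    models: list of (mean, var) pairs, assumed fitted beforehand.
    The output always has length 2 * len(models), so it can be fed to an SVM.
    """
    features = []
    for mean, var in models:
        features.extend(gaussian_fisher_score(x, mean, var))
    return features


print(fisher_feature_vector(3.0, [(1.0, 2.0), (3.0, 1.0)]))  # [1.0, 0.25, 0.0, -0.5]
```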

450 | Semi-supervised learning
- Chapelle, Scholkopf, et al.
- 2006
Citation Context: ...ints and the goal is clustering, is only briefly discussed later in the survey. We will follow the above convention in the survey. Q: Where can I learn more? A: A book on semi-supervised learning is (Chapelle et al., 2006c). An older survey can be found in (Seeger, 2001). I gave a tutorial at ICML 2007; the slides can be found at http://pages.cs.wisc.edu/~jerryzhu/icml07tutorial.html. 2 Generative Models. Generative...

328 | Learning from labeled and unlabeled data using graph mincuts - Blum, Chawla - 2001

313 | Employing EM and pool-based active learning for text classification
- McCallum, Nigam
- 1998
Citation Context: ...ify the two components. For instance, the mixtures on the second and third lines give the same p(x), but they classify x = 0.5 differently. Gaussian mixtures are identifiable. A mixture of multivariate Bernoulli distributions (McCallum & Nigam, 1998a) is not identifiable. More discussion of identifiability and semi-supervised learning can be found in e.g. (Ratsaby & Venkatesh, 1995) and (Corduneanu & Jaakkola, 2001). 2.2 Model Correctness. If th...

310 | Spectral grouping using the Nyström method
- Fowlkes, Belongie, et al.
Citation Context: ... in (Delalleau et al., 2005) the authors propose an induction scheme to classify a new point x by $f(x) = \frac{\sum_{i \in L \cup U} w_{xi} f(x_i)}{\sum_{i \in L \cup U} w_{xi}}$ (17). This can be viewed as an application of the Nyström method (Fowlkes et al., 2004). Yu et al. (2004) report an early attempt at semi-supervised induction using RBF basis functions in a regularization framework. In (Belkin et al., 2004b), the function f does not have to be restrict...
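Equation (17) in this context predicts a new point's soft label as a similarity-weighted average of the already-computed soft labels f(x_i) over L ∪ U. The sketch below implements that formula. The Gaussian (RBF) weight w_xi and the `bandwidth` parameter are assumed choices, since the snippet does not fix a weight function, and `induce_soft_label` is a hypothetical name.

```python
import math


def induce_soft_label(x, points, values, bandwidth=1.0):
    """f(x) = sum_i w_xi f(x_i) / sum_i w_xi over all points in L ∪ U.

    points: training points (tuples of floats) from both L and U.
    values: their soft labels f(x_i), e.g. from a graph-based method.
    w_xi is an RBF similarity here, an assumption of this sketch.
    """
    num = den = 0.0
    for p, f in zip(points, values):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
        w = math.exp(-sq_dist / (2 * bandwidth ** 2))
        num += w * f
        den += w
    return num / den
```

No graph recomputation is needed for a new x, which is the point of the induction scheme. The price is one pass over L ∪ U per query.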

271 | Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data
- Donoho, Grimes
Citation Context: ...ed to spectral graph semi-supervised learning. Representative methods include Isomap (Tenenbaum et al., 2000), locally linear embedding (LLE) (Roweis & Saul, 2000) (Saul & Roweis, 2003), Hessian LLE (Donoho & Grimes, 2003), Laplacian eigenmaps (Belkin & Niyogi, 2003), and semidefinite embedding (SDE) (Weinberger & Saul, 2004) (Weinberger et al., 2004) (Weinberger et al., 2005). If one has some labeled data, for exampl...

254 | Analyzing the effectiveness and applicability of co-training - Nigam, Ghani - 2000

250 | Colorization using optimization - Levin, Lischinski, et al. - 2004

235 | Transductive learning via spectral graph partitioning
- Joachims
- 2003
Citation Context: ... is used for approximation. The authors also propose a way to classify unseen points. This spectrum transformation is relatively simple. 6.1.8 Spectral Graph Transducer. The spectral graph transducer (Joachims, 2003) can be viewed as using the loss function and regularizer $\min_f \; c\,(f - \gamma)^\top C (f - \gamma) + f^\top L f$ (14), s.t. $f^\top \mathbf{1} = 0$ and $f^\top f = n$ (15), where $\gamma_i = \sqrt{l_-/l_+}$ for positive labeled data, $-\sqrt{l_+/l_-}$ for negative data, $l_-$ b...
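The γ targets in equation (14) are chosen so that the labeled targets are balanced. l₊·√(l₋/l₊) − l₋·√(l₊/l₋) = √(l₊l₋) − √(l₊l₋) = 0, and their squared norm is l₋ + l₊ = l. The sketch below builds γ and evaluates objective (14) for a given f. It assumes unlabeled nodes get γ_i = 0, which is harmless because the diagonal cost matrix C gives them zero weight, and C is passed as a list of diagonal entries. The constrained minimization under (15) is not shown, and the function names are hypothetical.

```python
import math


def sgt_targets(labels):
    """Target vector gamma for the spectral graph transducer objective.

    labels: +1, -1, or None (unlabeled) per node.
    gamma_i = sqrt(l_-/l_+) for positives, -sqrt(l_+/l_-) for negatives,
    and 0.0 for unlabeled nodes (an assumption; their cost C_ii is zero).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    gamma = []
    for y in labels:
        if y == 1:
            gamma.append(math.sqrt(n_neg / n_pos))
        elif y == -1:
            gamma.append(-math.sqrt(n_pos / n_neg))
        else:
            gamma.append(0.0)
    return gamma


def sgt_objective(f, gamma, cost, laplacian, c=1.0):
    """Evaluate c (f - gamma)^T C (f - gamma) + f^T L f with diagonal C given as a list."""
    n = len(f)
    loss = sum(cost[i] * (f[i] - gamma[i]) ** 2 for i in range(n))
    smooth = sum(f[i] * laplacian[i][j] * f[j] for i in range(n) for j in range(n))
    return c * loss + smooth
```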

219 | Semi-supervised support vector machines
- Bennett, Demiriz
Citation Context: ...um margin boundary would be the one with solid lines. However, finding the exact transductive SVM solution is NP-hard. Major effort has focused on efficient approximation algorithms. Early algorithms (Bennett & Demiriz, 1999) (Demiriz & Bennett, 2000) (Fung & Mangasarian, 1999) either cannot handle more than a few hundred unlabeled examples, or did not do so in experiments. The SVM-light TSVM implementation (Joachims, 19...

219 | Diffusion kernels on graphs and other discrete input spaces
- Kondor, Lafferty
- 2002
Citation Context: ... the Laplacian. Chapelle et al. (2002) and Smola and Kondor (2003) both show that spectral transformations of a Laplacian result in kernels suitable for semi-supervised learning. The diffusion kernel (Kondor & Lafferty, 2002) corresponds to a spectrum transform of the Laplacian with $r(\lambda) = \exp\left(-\frac{\sigma^2}{2}\lambda\right)$ (12). The regularized Gaussian process kernel $\Delta + I/\sigma^2$ in (Zhu et al., 2003c) corresponds to $r(\lambda) = \frac{1}{\lambda + \sigma}$ (13). Similar...
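Applying r(λ) = exp(−σ²λ/2) to every eigenvalue of the Laplacian L is the same as taking the matrix exponential K = exp(−(σ²/2)L). The sketch below computes that exponential with a truncated Taylor series, which avoids an explicit eigendecomposition. It assumes a small graph and modest σ, where the series converges quickly, and a production version would use a proper matrix-exponential routine. Since L·1 = 0, every row of K sums to 1. The function names are hypothetical.

```python
def mat_mul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def diffusion_kernel(laplacian, sigma, terms=30):
    """K = exp(-(sigma^2 / 2) L), i.e. r(lambda) = exp(-sigma^2 lambda / 2) on L's spectrum.

    Uses a truncated Taylor series sum_k (-t L)^k / k!, adequate only for
    small graphs and modest sigma (an assumption of this sketch).
    """
    n = len(laplacian)
    t = sigma ** 2 / 2.0
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # (-tL)^0 / 0! = I
    kernel = [row[:] for row in term]
    for k in range(1, terms):
        # (-tL)^k / k! = [(-tL)^(k-1) / (k-1)!] * L * (-t / k)
        term = mat_mul(term, laplacian)
        term = [[-t * v / k for v in row] for row in term]
        kernel = [[kv + tv for kv, tv in zip(krow, trow)] for krow, trow in zip(kernel, term)]
    return kernel


# Path graph 0 - 1 - 2: nearby nodes end up with larger kernel values.
path_laplacian = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
K = diffusion_kernel(path_laplacian, sigma=1.0)
```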

188 | Cluster kernels for semi-supervised learning
- Chapelle, Weston, et al.
- 2002
Citation Context: ...graph computation every time one encounters new points. Zhu et al. (2003c) propose that a new test point be classified by its nearest neighbor in L∪U. This is sensible when U is sufficiently large. In (Chapelle et al., 2002) the authors approximate a new point by a linear combination of labeled and unlabeled points. Similarly, in (Delalleau et al., 2005) the authors propose an induction scheme to classify a new point x ...

180 | Integrating topics and syntax - Griffiths, Steyvers, et al.

172 | Semi-supervised classification by low density separation - Chapelle, Zien - 2005

155 | Enhancing supervised learning with unlabeled data - Goldman, Zhou - 2000

142 | Tikhonov regularization and semi-supervised learning on large graphs - Belkin, Matveeva, et al. - 2004

138 | Partially Supervised Classification of Text Documents
- Liu, Lee, et al.
- 2002
Citation Context: ...t for text classification with Naive Bayes models. Another set of methods heuristically identifies some ‘reliable’ negative examples in the unlabeled set, and uses EM on generative (Naive Bayes) models (Liu et al., 2002) or logistic regression (Lee & Liu, 2003). Ranking: Given a large collection of items and a few ‘query’ items, ranking orders the items according to their similarity to the queries. Information retrieval is the standard technique under this...

137 | Maximum entropy discrimination
- Jaakkola, Meila, et al.
- 1999
Citation Context: ...n labeled examples, yet maximally ignorant on unrelated examples. Zhang and Oles (2000) point out that TSVMs may not behave well under some circumstances. The maximum entropy discrimination approach (Jaakkola et al., 1999) also maximizes the margin, and is able to take unlabeled data into account, with the SVM as a special case. 5.2 Gaussian Processes. Lawrence and Jordan (2005) proposed a Gaussian process approach, which ...

121 | A Mixture of Experts Classifier with Learning based on both Labeled and Unlabeled data
- Miller, Uyar
- 1997
Citation Context: ...ation a topic may contain several sub-topics, and will be better modeled by multiple multinomials instead of a single one (Nigam et al., 2000). Some other examples are (Shahshahani & Landgrebe, 1994) (Miller & Uyar, 1997). Another solution is to down-weight unlabeled data (Corduneanu & Jaakkola, 2001), which is also used by Nigam et al. (2000), and by Callison-Burch et al. (2004) who estimate word alignment for mac...

114 | Gaussian processes for ordinal regression
- Chu, Ghahramani
- 2005
Citation Context: ...s dense unlabeled data region. However, nothing special is done on the process model. Therefore all the benefit of unlabeled data comes from the noise model. A very similar noise model is proposed in (Chu & Ghahramani, 2004) for ordinal regression. Chu et al. (2006) develop Gaussian process models that incorporate pairwise label relations (e.g. two points tend to have similar or different labels). Note such similar-lab...

113 | The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter
- Castelli, Cover
- 1996
Citation Context: ...saby & Venkatesh, 1995) and (Corduneanu & Jaakkola, 2001). 2.2 Model Correctness. If the mixture model assumption is correct, unlabeled data is guaranteed to improve accuracy (Castelli & Cover, 1995) (Castelli & Cover, 1996) (Ratsaby & Venkatesh, 1995). However, if the model is wrong, unlabeled data may actually hurt accuracy. Figure 3 shows an example. This has been observed by multiple researchers. Cozman et al. (2003)...

112 | Does Baum-Welch re-estimation help taggers?
- Elworthy
- 1994
Citation Context: ... training a Hidden Markov Model with unlabeled data (the Baum-Welch algorithm, which by the way qualifies as semi-supervised learning on sequences) can reduce accuracy under certain initial conditions (Elworthy, 1994). See (Cozman et al., 2003) for a more recent argument. Not much is in the literature though, presumably because of publication bias. Q: How many semi-supervised learning methods are there? A: Ma...

111 | Latent semantic kernels - Cristianini, Shawe-Taylor, et al. - 2002

107 | Active + semi-supervised learning = robust multi-view learning - Muslea, Minton, et al. - 2002

100 | Semi-supervised learning by entropy minimization - Grandvalet, Bengio - 2005

95 | On manifold regularization
- Belkin, Niyogi, et al.
- 2005
Citation Context: ...the loss function and regularizer: $\frac{1}{k} \sum_i (f_i - y_i)^2 + \gamma f^\top S f$ (9), where $S = \Delta$ or $\Delta^p$ for some integer p. 6.1.6 Manifold Regularization. The manifold regularization framework (Belkin et al., 2004b) (Belkin et al., 2005) employs two regularization terms: $\frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \gamma_I \|f\|_I^2$ (10), where V is an arbitrary loss function and K is a ‘base kernel’, e.g. a linear or RBF kernel. I is a regulariz...
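Objective (9) in this context is quadratic in f, so its unconstrained minimizer follows from setting the gradient to zero. With C the 0/1 diagonal indicator of the k labeled nodes, (2/k)C(f − y) + 2γSf = 0 gives (C + kγS)f = Cy. The sketch below solves that linear system with S = Δ, the graph Laplacian. It assumes a connected graph with at least one labeled node, so the system is nonsingular, and it omits any extra constraint the cited method may add. The function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def graph_regularized_fit(laplacian, labels, gamma):
    """Minimize (1/k) sum_{i labeled} (f_i - y_i)^2 + gamma f^T S f with S = laplacian.

    labels: dict node -> y for the k labeled nodes.
    Solves (C + k gamma S) f = C y, C being the 0/1 diagonal labeled indicator.
    """
    n = len(laplacian)
    k = len(labels)
    a = [[k * gamma * laplacian[i][j] + (1.0 if i == j and i in labels else 0.0)
          for j in range(n)] for i in range(n)]
    b = [float(labels.get(i, 0.0)) for i in range(n)]
    return solve(a, b)
```

Unlike hard clamping, the labeled values themselves are softened. The endpoints of a path labeled 0 and 1 come back slightly pulled toward their neighbour, by an amount controlled by `gamma`.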

95 | Kernel conditional random fields: representation and clique selection - Lafferty, Zhu, et al. - 2004

92 | Semisupervised clustering using genetic algorithms
- Demiriz, Bennett, et al.
- 1999
Citation Context: ...on that instead of using a probabilistic generative mixture model, some approaches employ various clustering algorithms to cluster the whole dataset, then label each cluster with labeled data, e.g. (Demiriz et al., 1999) (Dara et al., 2002). Although they can perform well if the particular clustering algorithms match the true data distribution, these approaches are hard to analyze due to their algorithmic nature. 2...

90 | Seeing stars when there aren’t many stars: graph-based semisupervised learning for sentiment categorization - Goldberg, Zhu - 2006

88 | On the exponential value of labeled samples
- Castelli, Cover
- 1995
Citation Context: ...can be found in e.g. (Ratsaby & Venkatesh, 1995) and (Corduneanu & Jaakkola, 2001). 2.2 Model Correctness. If the mixture model assumption is correct, unlabeled data is guaranteed to improve accuracy (Castelli & Cover, 1995) (Castelli & Cover, 1996) (Ratsaby & Venkatesh, 1995). However, if the model is wrong, unlabeled data may actually hurt accuracy. Figure 3 shows an example. This has been observed by multiple research...

87 | Trading convexity for scalability - Collobert, Sinz, et al. - 2006

79 | Semi-supervised learning using randomized mincuts - Blum, Rwebangira, et al. - 2004

73 | Co-training and expansion: Towards bridging theory and practice - Balcan, Blum, et al.

70 | The role of unlabeled data in supervised learning
- Mitchell
- 1999
Citation Context: ..., there has been some analysis of convergence. See e.g. (Haffari & Sarkar, 2007; Culp & Michailidis, 2007). 4 Co-Training and Multiview Learning. 4.1 Co-Training. Co-training (Blum & Mitchell, 1998) (Mitchell, 1999) assumes that (i) features can be split into two sets; (ii) each sub-feature set is sufficient to train a good classifier; (iii) the two sets are conditionally independent given the class. Initially ...

67 | Using unlabeled data to improve text classification. Doctoral dissertation
- Nigam
- 2001
Citation Context: ...al., 1977). EM is prone to local maxima. If a local maximum is far from the global maximum, unlabeled data may again hurt learning. Remedies include smart choice of starting point by active learning (Nigam, 2001). 2.4 Cluster-and-Label. We shall also mention that instead of using a probabilistic generative mixture model, some approaches employ various clustering algorithms to cluster the whole dataset, then ...

65 | Maximum margin semi-supervised learning for structured variables - Altun, McAllester, et al. - 2006

65 | Semi-supervised support vector machines for unlabeled data classification
- Fung, Mangasarian
- 2001
Citation Context: .... However, finding the exact transductive SVM solution is NP-hard. Major effort has focused on efficient approximation algorithms. Early algorithms (Bennett & Demiriz, 1999) (Demiriz & Bennett, 2000) (Fung & Mangasarian, 1999) either cannot handle more than a few hundred unlabeled examples, or did not do so in experiments. The SVM-light TSVM implementation (Joachims, 1999) was the first widely used software. De Bie and Cri...

63 | Can Infants Map Meaning to Newly Segmented Words? Statistical Segmentation and Word Learning
- Estes, Evans, et al.
- 2007
Citation Context: ...supervised learning. 13.2 Infant Word-Meaning Mapping. 17-month-old infants were shown to be able to associate a word with a visual object better if they had heard the word many times before (Graf Estes et al., 2006). If the word was not heard before, the infant’s ability to associate it with the object was weaker. If we view the sound of the word as unlabeled data, and the object as the label, we can propose a ...

59 | A PAC-style model for learning from labeled and unlabeled data - Balcan, Blum - 2005

57 | Probabilistic modeling for face orientation discrimination: learning from labeled and unlabeled data - Baluja - 1998

55 | Semi-supervised learning of mixture models
- Cozman, Cohen, et al.
- 2003
Citation Context: ...v Model with unlabeled data (the Baum-Welch algorithm, which by the way qualifies as semi-supervised learning on sequences) can reduce accuracy under certain initial conditions (Elworthy, 1994). See (Cozman et al., 2003) for a more recent argument. Not much is in the literature though, presumably because of publication bias. Q: How many semi-supervised learning methods are there? A: Many. Some often-used methods...

55 | Learning with positive and unlabeled examples using weighted logistic regression
- Lee, Liu
- 2003
Citation Context: ... models. Another set of methods heuristically identifies some ‘reliable’ negative examples in the unlabeled set, and uses EM on generative (Naive Bayes) models (Liu et al., 2002) or logistic regression (Lee & Liu, 2003). Ranking: Given a large collection of items and a few ‘query’ items, ranking orders the items according to their similarity to the queries. Information retrieval is the standard technique under this...

54 | Semi-supervised learning via Gaussian processes - Lawrence, Jordan - 2004

49 | Efficient non-parametric function induction in semi-supervised learning
- Delalleau, Bengio, et al.
- 2005
Citation Context: ...e model to ‘carve up’ the original L ∪ U dataset. Learning on the smaller graph is much faster. Similar ideas have been used for e.g. dimensionality reduction (Teh & Roweis, 2002). The heuristics in (Delalleau et al., 2005) similarly create a small graph with a subset of the unlabeled data. They enable fast approximate computation by reducing the problem size. Garcke and Griebel (2005) propose the use of sparse grids ...

47 | On semi-supervised classification - Krishnapuram, Williams, et al. - 2004

46 | Multi-label image segmentation for medical applications based on graph-theoretic electrical potentials - Grady, Funka-Lea

42 | A continuation method for semi-supervised SVMs
- Chapelle, Chi, et al.
- 2006
Citation Context: ...ints and the goal is clustering, is only briefly discussed later in the survey. We will follow the above convention in the survey. Q: Where can I learn more? A: A book on semi-supervised learning is (Chapelle et al., 2006c). An older survey can be found in (Seeger, 2001). I gave a tutorial at ICML 2007; the slides can be found at http://pages.cs.wisc.edu/~jerryzhu/icml07tutorial.html. 2 Generative Models. Generative...

42 | Relational learning with Gaussian processes - Chu, Sindhwani, et al. - 2007

39 | Measure based regularization - Bousquet, Chapelle, et al. - 2003

39 | On information regularization
- Corduneanu, Jaakkola
- 2003
Citation Context: ...uct of p(x) mass in a region with I(x; y) (normalized by a variance term). The minimization is carried out on multiple overlapping regions covering the data space. The theory is developed further in (Corduneanu & Jaakkola, 2003). Corduneanu and Jaakkola (2005) extend the work by formulating semi-supervised learning as a communication problem. Regularization is expressed as the rate of information, which again discourages co...

39 | Unsupervised and semi-supervised clustering: a brief survey. A review of machine learning
- Grira, Crucianu, et al.
- 2004
Citation Context: ...um of squared distances within clusters). Procedurally, one can modify the distance metric to try to accommodate the constraints, or one can bias the search. We refer readers to a recent short survey (Grira et al., 2004) for the literature. 11.4 Semi-supervised Regression. In principle, all graph-based semi-supervised classification methods in section 6 are indeed function estimators. That is, they estimate ‘soft lab...

37 | Proximity graphs for clustering and manifold learning - Carreira-Perpiñán, Zemel - 2004

34 | Text classification from positive and unlabeled examples - Denis, Gilleron, et al. - 1927

34 | Manifold denoising - Hein, Maier - 2006

34 | Learning to Extract Entities from Labeled and Unlabeled Text - Jones - 2005

33 | Semi-supervised sequence modeling with syntactic topic models - Li, McCallum - 2005

29 | Word sense disambiguation using label propagation based semi-supervised learning - Niu, Ji, et al. - 2005

27 | Semi-supervised learning for structured output variables
- Brefeld, Scheffer
- 2006
Citation Context: ...long history (de Sa, 1993). It has been applied to semi-supervised regression (Sindhwani et al., 2005b; Brefeld et al., 2006) and to the more challenging structured output spaces (Brefeld et al., 2005; Brefeld & Scheffer, 2006). Some theoretical analysis of the value of agreement among multiple learners can be found in (Leskes, 2005; Farquhar et al., 2006). 5 Avoiding Changes in Dense Regions. 5.1 Transductive SVMs (S3VMs) ...

27 | Branch and bound for semi-supervised support vector machines
- Chapelle, Sindhwani, et al.
- 2007
Citation Context: ...ints and the goal is clustering, is only briefly discussed later in the survey. We will follow the above convention in the survey. Q: Where can I learn more? A: A book on semi-supervised learning is (Chapelle et al., 2006c). An older survey can be found in (Seeger, 2001). I gave a tutorial at ICML 2007; the slides can be found at http://pages.cs.wisc.edu/~jerryzhu/icml07tutorial.html. 2 Generative Models. Generative...

27 | A hybrid generative/discriminative approach to text classification with additional information - Fujino, Ueda, et al. - 2007

26 | Statistical machine translation with word- and sentence-aligned parallel corpora - Callison-Burch, Talbot, et al. - 2004

26 | Learning to model spatial dependency: Semi-supervised discriminative random fields - Lee, Wang, et al. - 2007

25 | Person identification in webcam images: An application of semi-supervised learning - Balcan, Blum, et al. - 2005

25 | Semi-supervised learning with trees - Kemp, Griffiths, et al. - 2003

23 | Hyperparameter and kernel learning for graph based semi-supervised classification - Kapoor, Qi, et al. - 2005

23 | On the relation between low density separation, spectral clustering and graph cuts - Narayanan, Belkin, et al. - 2007

22 | On transductive regression - Cortes, Mohri - 2006

22 | Generalization error bounds using unlabeled data - Kaariainen - 2005

20 | Link-based classification using labeled and unlabeled data - Lu, Getoor - 2003

19 | Efficient co-regularized least squares regression
- Brefeld, Gaertner, et al.
Citation Context: ... required to make similar predictions on any given unlabeled instance. Multiview learning has a long history (de Sa, 1993). It has been applied to semi-supervised regression (Sindhwani et al., 2005b; Brefeld et al., 2006) and to the more challenging structured output spaces (Brefeld et al., 2005; Brefeld & Scheffer, 2006). Some theoretical analysis of the value of agreement among multiple learners can be found in (Lesk...

19 | Optimization approaches to semi-supervised learning
- Demiriz, Bennett
- 2000
Citation Context: ...e the one with solid lines. However, finding the exact transductive SVM solution is NP-hard. Major effort has focused on efficient approximation algorithms. Early algorithms (Bennett & Demiriz, 1999) (Demiriz & Bennett, 2000) (Fung & Mangasarian, 1999) either cannot handle more than a few hundred unlabeled examples, or did not do so in experiments. The SVM-light TSVM implementation (Joachims, 1999) was the first widely us...

18 | Semi-supervised learning with sparse grids - Garcke, Griebel - 2005

18 | The value of agreement: a new boosting algorithm
- Leskes, Torenvliet
- 2008
Citation Context: ...2006), and to the more challenging structured output spaces (Brefeld et al., 2005; Brefeld & Scheffer, 2006). Some theoretical analysis of the value of agreement among multiple learners can be found in (Leskes, 2005; Farquhar et al., 2006). 5 Avoiding Changes in Dense Regions. 5.1 Transductive SVMs (S3VMs). Discriminative methods work on p(y|x) directly. This brings up the danger of leaving p(x) outside of the par...

17 | The Canonical Distortion Measure for Vector Quantization and Function Approximation - Baxter - 1997

17 | Spectral graph theory. Regional conference series - Chung - 1997

17 | Word sense disambiguation with semi-supervised learning - Pham, Ng, et al. - 2005

15 | Co-training for predicting emotions with spoken dialogue data - Maeireizo, Litman, et al. - 2004

14 | Exploiting unlabelled data for hybrid object classification - Holub, Welling, et al. - 2005

13 | Stable mixing of complete and incomplete information
- Corduneanu, Jaakkola
- 2001
Citation Context: ...Mixture of multivariate Bernoulli distributions (McCallum & Nigam, 1998a) is not identifiable. More discussion of identifiability and semi-supervised learning can be found in e.g. (Ratsaby & Venkatesh, 1995) and (Corduneanu & Jaakkola, 2001). 2.2 Model Correctness. If the mixture model assumption is correct, unlabeled data is guaranteed to improve accuracy (Castelli & Cover, 1995) (Castelli & Cover, 1996) (Ratsaby & Venkatesh, 1995). How...

12 | Distributed information regularization on graphs - Corduneanu, Jaakkola - 2005

10 | Clustering unlabeled data with SOMs improves classification of labeled real-world data
- Dara, Kremer, et al.
Citation Context: ...g a probabilistic generative mixture model, some approaches employ various clustering algorithms to cluster the whole dataset, then label each cluster with labeled data, e.g. (Demiriz et al., 1999) (Dara et al., 2002). Although they can perform well if the particular clustering algorithms match the true data distribution, these approaches are hard to analyze due to their algorithmic nature. 2.5 Fisher kernel for ...

7 | Fast computational methods for visually guided robots - Mahdaviani, Freitas, et al. - 2005

6 | Semi-supervised learning with conditional harmonic mixing - Burges, Platt - 2005

5 | Semi-supervised learning: a statistical physics approach - Getz, Shental, et al. - 2005

4 | Efficient approximation methods for harmonic semi-supervised learning
- Argyriou
- 2004
Citation Context: ...m. Many methods are also transductive (section 6.4). In 2005 several papers started to address these problems. Fast computation of the harmonic function with conjugate gradient methods is discussed in (Argyriou, 2004). A comparison of three iterative methods (label propagation, conjugate gradient and loopy belief propagation) is presented in (Zhu, 2005), Appendix F. Recently, numerical methods for fast N-body proble...
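Label propagation, the first of the iterative methods mentioned in this context, can be sketched in a few lines. Labeled nodes stay clamped, and each unlabeled node is repeatedly replaced by the weighted average of its neighbours. The fixed point is the harmonic function. This is a Jacobi-style sketch under stated assumptions: a dense symmetric weight matrix, every unlabeled node having at least one neighbour, and a fixed iteration count instead of a convergence test. Conjugate gradient on the same linear system typically converges in far fewer iterations. The function name is hypothetical.

```python
def harmonic_label_propagation(weights, labels, n_iter=1000):
    """Iteratively compute the harmonic function on a graph (Jacobi-style sketch).

    weights: symmetric adjacency matrix (list of lists) of edge weights.
    labels:  dict node -> value in [0, 1]; these nodes stay clamped.
    Each unlabeled node is set to the weighted average of its neighbours
    from the previous sweep; repeated sweeps converge to the harmonic solution.
    """
    n = len(weights)
    f = [labels.get(i, 0.5) for i in range(n)]  # unlabeled nodes start at 0.5
    for _ in range(n_iter):
        new_f = f[:]
        for i in range(n):
            if i in labels:
                continue
            degree = sum(weights[i])
            new_f[i] = sum(weights[i][j] * f[j] for j in range(n)) / degree
        f = new_f
    return f


# Path graph 0 - 1 - 2 - 3 - 4 with the ends labeled 0 and 1:
# the harmonic solution interpolates linearly, about [0, 0.25, 0.5, 0.75, 1].
path = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
print(harmonic_label_propagation(path, {0: 0.0, 4: 1.0}))
```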

4 | Intelligent behavior in humans and machines
- Langley
- 2006
Citation Context: ...ervised Learning? Now let us turn our attention from machine learning to human learning. It is possible that understanding the human cognitive model will lead to novel machine learning approaches (Langley, 2006; Mitchell, 2006). We ask the question: do humans do semi-supervised learning? My hypothesis is yes. We humans accumulate ‘unlabeled’ input data, which we use (often unconsciously) to help build...

4 | Co-Validation: Using Model Disagreement to Validate Classification Algorithms. Neural Information Processing Systems
- Madani, Pennock, et al.
- 2004
Citation Context: ... very general and can be applied to almost any learning algorithm. However, it only selects among hypotheses; it does not generate new hypotheses based on unlabeled data. The co-validation method (Madani et al., 2005) also uses unlabeled data for model selection and active learning. Kaariainen (2005) uses the metric to derive a generalization error bound; see Section 9. 11.10 Multi-Instance Learning. In multi-inst...

2 | Splitting the unsupervised and supervised components of semi-supervised learning - Oliveira, Cozman, et al. - 2005