## Large Scale Transductive SVMs


Venue: JMLR

Citations: 62 (5 self)

### BibTeX

```bibtex
@ARTICLE{Collobert_largescale,
  author  = {Ronan Collobert and Fabian Sinz and Jason Weston and Léon Bottou and Thorsten Joachims},
  title   = {Large Scale Transductive {SVMs}},
  journal = {JMLR},
  year    = {2006}
}
```


### Abstract

We show how the Concave-Convex Procedure can be applied to Transductive SVMs, which traditionally require solving a combinatorial search problem. This provides, for the first time, a highly scalable algorithm for the nonlinear case. Detailed experiments verify the utility of our approach. Software is available at http://www.kyb.tuebingen.mpg.de/bs/people/fabee/transduction.html.

### Citations

8980 | Statistical Learning Theory - Vapnik - 1998
Citation Context: ...http://www.kyb.tuebingen.mpg.de/bs/people/fabee/transduction.html. Keywords: transduction, transductive SVMs, semi-supervised learning, CCCP 1. Introduction Transductive support vector machines (TSVMs) (Vapnik, 1995) are a method of improving the generalization accuracy of SVMs (Boser et al., 1992) by using unlabeled data. TSVMs, like SVMs, learn a large margin hyperplane classifier using labeled training data, ... |

1291 | A training algorithm for optimal margin classifiers - Boser, Guyon, et al. - 1992
Citation Context: ...nsduction, transductive SVMs, semi-supervised learning, CCCP 1. Introduction Transductive support vector machines (TSVMs) (Vapnik, 1995) are a method of improving the generalization accuracy of SVMs (Boser et al., 1992) by using unlabeled data. TSVMs, like SVMs, learn a large margin hyperplane classifier using labeled training data, but simultaneously force this hyperplane to be far away from the unlabeled data. On... |

1011 | Fast training of support vector machines using sequential minimal optimization - Platt - 1998
Citation Context: ...scent or stochastic gradient descent, often involve delicate hyper-parameters (LeCun et al., 1998). In contrast, convex optimization seems much more straight-forward. For instance, the SMO algorithm (Platt, 1999) locates the SVM solution efficiently and reliably. We propose to solve this non-convex problem using the “Concave-Convex Procedure” (CCCP) (Yuille and Rangarajan, 2002). The CCCP procedure is closel... |

803 | Estimation of Dependencies Based on Empirical Data - Vapnik - 1979
Citation Context: ...test patterns only. They do not apply to test patterns that were not given to the algorithm in the first place. As a consequence, transductive bounds are purely derived from combinatorial arguments (Vapnik, 1982) and are more easily made data-dependent (Bottou et al., 1994; Derbeko et al., 2004). Whether this is a fundamental property or a technical issue is a matter of debate. Experiments The following expe... |

683 | Transductive inference for text classification using support vector machines - Joachims - 1999
Citation Context: ...model, and Φ(·) is the chosen feature map, often implemented implicitly using the kernel trick (Vapnik, 1995). TSVM Formulation The original TSVM optimization problem is the following (Vapnik, 1995; Joachims, 1999b; Bennett and Demiriz, 1998). Given a training set L and a test set U, find among the possible binary vectors {Y = (yL+1, ..., yL+U)} the one such that an SVM trained on L ∪ (U × Y) yields the largest mar... |

468 | Making large-scale support vector machine learning practical - Joachims - 1999
Citation Context: ...model, and Φ(·) is the chosen feature map, often implemented implicitly using the kernel trick (Vapnik, 1995). TSVM Formulation The original TSVM optimization problem is the following (Vapnik, 1995; Joachims, 1999b; Bennett and Demiriz, 1998). Given a training set L and a test set U, find among the possible binary vectors {Y = (yL+1, ..., yL+U)} the one such that an SVM trained on L ∪ (U × Y) yields the largest mar... |

436 | Rcv1: A new benchmark collection for text categorization research - Lewis, Yang, et al. |

434 | Learning with local and global consistency - Zhou, Bousquet, et al. - 2004
Citation Context: ...et al., 2003). Other notable methods include generalizations of nearest-neighbor or Parzen window type approaches to learning manifolds given labeled data (Zhu et al., 2003; Belkin and Niyogi, 2002; Zhou et al., 2004). Finally, Bayesian approaches have also been pursued (Graepel et al., 2000; Lawrence and Jordan, 2005). We note that some cluster kernel methods (Chapelle and Zien, 2005) can perform significantly b... |

206 | Partially labeled classification with Markov random walks - Szummer, Jaakkola - 2006 |

177 | Kernel principal component analysis - Schölkopf, Müller - 1999
Citation Context: ...code available at: http://www.kyb.tuebingen.mpg.de/bs/people/chapelle/lds. Since the gradient descent is carried out in the primal, to learn nonlinear functions it is necessary to perform kernel PCA (Schölkopf et al., 1997). The overall algorithm has a time complexity equal to the square of the number of variables times the complexity of evaluating the cost function. In this case, evaluating the objective scales linear... |

174 | Semi-supervised support vector machines - Bennett, Demiriz - 1999
Citation Context: ...is the chosen feature map, often implemented implicitly using the kernel trick (Vapnik, 1995). TSVM Formulation The original TSVM optimization problem is the following (Vapnik, 1995; Joachims, 1999b; Bennett and Demiriz, 1998). Given a training set L and a test set U, find among the possible binary vectors {Y = (yL+1, ..., yL+U)} the one such that an SVM trained on L ∪ (U × Y) yields the largest margin. This is a combi... |
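The combinatorial formulation quoted in the contexts above can be made concrete with a toy sketch: enumerate every binary labeling Y of the unlabeled set, fit a linear soft-margin SVM on L ∪ (U × Y), and keep the labeling whose objective is lowest (largest margin). This is only an illustration of why the search is intractable at scale (2^U trainings); the subgradient solver, data, and function names below are my own hypothetical choices, not the paper's method.

```python
import itertools

def train_linear_svm(points, labels, lam=0.1, epochs=2000):
    """Toy soft-margin linear SVM in 2-D via deterministic full-batch
    subgradient descent on lam/2*||w||^2 + mean(hinge). Stand-in for a
    real solver, chosen only to keep the sketch self-contained."""
    w, b, n = [0.0, 0.0], 0.0, len(points)
    for t in range(1, epochs + 1):
        lr = 1.0 / (lam * t)            # Pegasos-style decaying step size
        gw = [lam * w[0], lam * w[1]]   # gradient of the regularizer
        gb = 0.0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) < 1.0:  # margin violation
                gw[0] -= y * x1 / n
                gw[1] -= y * x2 / n
                gb -= y / n
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    hinge = sum(max(0.0, 1.0 - y * (w[0] * x1 + w[1] * x2 + b))
                for (x1, x2), y in zip(points, labels)) / n
    return w, b, lam / 2 * (w[0] ** 2 + w[1] ** 2) + hinge

def brute_force_tsvm(labeled, unlabeled):
    """Exhaustive TSVM: try all 2^U labelings of the unlabeled points and
    keep the one yielding the smallest SVM objective (largest margin)."""
    best = None
    for Y in itertools.product([-1, 1], repeat=len(unlabeled)):
        pts = [p for p, _ in labeled] + list(unlabeled)
        ys = [y for _, y in labeled] + list(Y)
        _, _, obj = train_linear_svm(pts, ys)
        if best is None or obj < best[1]:
            best = (Y, obj)
    return best[0]

labeled = [((-2.0, 0.0), -1), ((2.0, 0.0), 1)]
unlabeled = [(-1.9, 0.1), (2.1, -0.1)]
print(brute_force_tsvm(labeled, unlabeled))  # labeling consistent with the two clusters
```

The exponential loop over `itertools.product` is exactly the combinatorial search the paper's CCCP approach avoids: each extra unlabeled point doubles the number of SVM trainings.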

152 | Cluster kernels for semisupervised learning. Advances in neural information processing systems - Chapelle, Weston, et al. - 2002 |

125 | Efficient backprop - LeCun, Bottou, et al. - 1998
Citation Context: ...zing a non-convex cost function is often considered difficult. Gradient descent techniques, such as conjugate gradient descent or stochastic gradient descent, often involve delicate hyper-parameters (LeCun et al., 1998). In contrast, convex optimization seems much more straight-forward. For instance, the SMO algorithm (Platt, 1999) locates the SVM solution efficiently and reliably. We propose to solve this non-conv... |

121 | Semi-supervised classification by low density separation - Chapelle, Zien - 2005
Citation Context: ...ing this algorithm, in the context of semi-supervised learning is that one is finding a decision boundary that lies in a region of low density, implementing the so-called cluster assumption (see e.g. Chapelle and Zien, 2005). In this framework, if you believe the underlying distribution of the two classes is such that there is a “gap” or low density region between them, then TSVMs can help because it selects a rule with... |

112 | Beyond the point cloud: from transductive to semi-supervised learning - Sindhwani, Niyogi, et al. - 2005
Citation Context: ...n the new space are small if they are in the same cluster or on the same manifold. Some of the main methods include (Chapelle et al., 2002; Chapelle and Zien, 2005; Sindhwani et al., 2005; Szummer and Jaakkola, 2001b); and (Weston et al., 2003). Other notable methods include generalizations of nearest-neighbor or Parzen window type approaches to learning manifolds given labeled data (... |

103 | Fast Kernel Classifiers with Online and Active Learning - Bottou, Bordes, et al. - 2005 |

101 | Asymptotic behaviors of support vector machines with Gaussian kernel - Keerthi, Lin |

80 | A modified finite Newton method for fast solution of large scale linear SVMs - Keerthi, DeCoste - 2005
Citation Context: ...standard SVM training is that any improvements in SVM scalability can immediately also be applied to TSVMs. For example in the linear case, one could easily apply fast linear SVM training such as in (Keerthi and DeCoste, 2005) to produce very fast linear TSVMs. For the nonlinear case, one could apply the online SVM training scheme of Bordes et al. (2005) to give a fast online transductive learning procedure. Acknowledgmen... |

77 | Maximum margin clustering - Xu, Neufeld, et al. - 2004
Citation Context: ...ales as (L+U)^3, where L and U are the numbers of labeled and unlabeled examples. This method also stores the entire (L+U) × (L+U) kernel matrix in memory. Other methods (Bie and Cristianini, 2004; Xu et al., 2005) transform the non-convex transductive problem into a convex semi-definite programming problem that scales as (L+U)^4 or worse. In this article we introduce a large scale training method for TSVMs us... |

75 | Using manifold structure for partially labeled classification. Advances in Neural Information Processing Systems (NIPS) (Vol. 15) - Belkin, Niyogi - 2003
Citation Context: ...kola, 2001b); and (Weston et al., 2003). Other notable methods include generalizations of nearest-neighbor or Parzen window type approaches to learning manifolds given labeled data (Zhu et al., 2003; Belkin and Niyogi, 2002; Zhou et al., 2004). Finally, Bayesian approaches have also been pursued (Graepel et al., 2000; Lawrence and Jordan, 2005). We note that some cluster kernel methods (Chapelle and Zien, 2005) can perf... |

67 | Improved Generalization Through Explicit Optimization of Margins - Mason, Bartlett, et al. - 2000 |

52 | Trading convexity for scalability - Collobert, Sinz, et al. - 2006
Citation Context: ...this article we introduce a large scale training method for TSVMs using the concave-convex procedure (CCCP) (Yuille and Rangarajan, 2002; Le Thi, 1994), expanding on the conference proceedings paper (Collobert et al., 2006). CCCP iteratively optimizes non-convex cost functions that can be expressed as the sum of a convex function and a concave function. The optimization is carried out iteratively by solving a sequence ... |

52 | Kernel methods for missing variables - Smola, Vishwanathan, et al. - 2005
Citation Context: ...vex” (DC) methods that have been developed by the optimization community during the last two decades (Le Thi, 1994). Such techniques have already been applied for dealing with missing values in SVMs (Smola et al., 2005), for improving boosting algorithms (Krause and Singer, 2004), and in the “Ψ-learning” framework (Shen et al., 2003). Assume that a cost function J(θ) can be rewritten as the sum of a convex part Jve... |
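The CCCP scheme described in the context above (split J(θ) into a convex part J_vex and a concave part J_cav, then repeatedly minimize J_vex plus a linearization of J_cav at the current iterate) can be illustrated on a one-dimensional toy objective. The specific functions J_vex(θ) = θ² and J_cav(θ) = −√(1+θ²) are my own hypothetical choice, not the TSVM loss; they are picked so the inner convex minimization has a closed form.

```python
import math

def cccp_minimize(j_vex_argmin, d_j_cav, theta0, iters=50):
    """One CCCP run: at each step, linearize the concave part J_cav at the
    current theta and minimize J_vex(t) + J_cav'(theta) * t exactly."""
    theta = theta0
    for _ in range(iters):
        g = d_j_cav(theta)       # gradient of the concave part at current point
        theta = j_vex_argmin(g)  # closed-form argmin of the convexified objective
    return theta

# Toy objective J(t) = t^2 - sqrt(1 + t^2):
# convex part t^2, concave part -sqrt(1 + t^2); global minimum at t = 0, J(0) = -1.
j = lambda t: t * t - math.sqrt(1.0 + t * t)
# argmin_t of t^2 + g*t is -g/2 (set derivative 2t + g = 0).
j_vex_argmin = lambda g: -g / 2.0
# derivative of the concave part: -t / sqrt(1 + t^2)
d_j_cav = lambda t: -t / math.sqrt(1.0 + t * t)

theta = cccp_minimize(j_vex_argmin, d_j_cav, theta0=3.0)
print(round(theta, 6), round(j(theta), 6))  # values near 0.0 and -1.0, the global minimum
```

Each iteration solves a convex problem whose objective upper-bounds J (the linearization majorizes the concave part), which is why the sequence of J values is guaranteed to be non-increasing, the property the contexts above attribute to CCCP.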

49 | Semi-supervised support vector machines for unlabeled data classification - Fung, Mangasarian |

45 | The concave-convex procedure (CCCP) - Yuille, Rangarajan - 2001
Citation Context: ...roblem into a convex semi-definite programming problem that scales as (L+U)^4 or worse. In this article we introduce a large scale training method for TSVMs using the concave-convex procedure (CCCP) (Yuille and Rangarajan, 2002; Le Thi, 1994), expanding on the conference proceedings paper (Collobert et al., 2006). CCCP iteratively optimizes non-convex cost functions that can be expressed as the sum of a convex function and ... |

40 | Semi-supervised learning via Gaussian processes - Lawrence, Jordan - 2004
Citation Context: ...dow type approaches to learning manifolds given labeled data (Zhu et al., 2003; Belkin and Niyogi, 2002; Zhou et al., 2004). Finally, Bayesian approaches have also been pursued (Graepel et al., 2000; Lawrence and Jordan, 2005). We note that some cluster kernel methods (Chapelle and Zien, 2005) can perform significantly better than TSVM on some data sets. In fact, Chapelle and Zien (2005) show that, as these methods provid... |

39 | Convex methods for transduction - Bie, Cristianini - 2003 |

22 | Explicit learning curves for transduction and application to clustering and compression algorithms - Derbeko, El-Yaniv, et al. - 2004
Citation Context: ...to the algorithm in the first place. As a consequence, transductive bounds are purely derived from combinatorial arguments (Vapnik, 1982) and are more easily made data-dependent (Bottou et al., 1994; Derbeko et al., 2004). Whether this is a fundamental property or a technical issue is a matter of debate. Experiments The following experiments attempt to determine whether the benefits of TSVMs are solely caused by the ... |

19 | Fast kernel classifiers with online and active learning - Bordes, Ertekin, et al. |

14 | Leveraging the margin more carefully - Krause, Singer - 2004
Citation Context: ...zation community during the last two decades (Le Thi, 1994). Such techniques have already been applied for dealing with missing values in SVMs (Smola et al., 2005), for improving boosting algorithms (Krause and Singer, 2004), and in the “Ψ-learning” framework (Shen et al., 2003). Assume that a cost function J(θ) can be rewritten as the sum of a convex part Jvex(θ) and a concave part Jcav(θ). Each iteration of the CCCP p... |

13 | On Ψ-learning - Shen, Tseng, et al. |

10 | Bayesian Transduction - Graepel, Herbrich, et al. - 2000
Citation Context: ...neighbor or Parzen window type approaches to learning manifolds given labeled data (Zhu et al., 2003; Belkin and Niyogi, 2002; Zhou et al., 2004). Finally, Bayesian approaches have also been pursued (Graepel et al., 2000; Lawrence and Jordan, 2005). We note that some cluster kernel methods (Chapelle and Zien, 2005) can perform significantly better than TSVM on some data sets. In fact, Chapelle and Zien (2005) show th... |

6 | Cluster kernels for semisupervised protein classification - Weston, Leslie, et al. - 2003
Citation Context: ...or on the same manifold. Some of the main methods include (Chapelle et al., 2002; Chapelle and Zien, 2005; Sindhwani et al., 2005; Szummer and Jaakkola, 2001b); and (Weston et al., 2003). Other notable methods include generalizations of nearest-neighbor or Parzen window type approaches to learning manifolds given labeled data (Zhu et al., 2003; Belkin and Niyogi, 2002; Zhou et al., ... |

4 | On the effective VC dimension - Bottou, Cortes, et al. - 1994
Citation Context: ...that were not given to the algorithm in the first place. As a consequence, transductive bounds are purely derived from combinatorial arguments (Vapnik, 1982) and are more easily made data-dependent (Bottou et al., 1994; Derbeko et al., 2004). Whether this is a fundamental property or a technical issue is a matter of debate. Experiments The following experiments attempt to determine whether the benefits of TSVMs are... |

3 | Analyse numérique des algorithmes de l'optimisation d.c. Approches locales et globale. Codes et simulations numériques en grande dimension. Applications. Doctoral dissertation - Le Thi - 1994 |

1 | Columbia Object Image Library (COIL-20) - Columbia - 1996 |


1 | Working document: Trading convexity for scalability - Collobert, Weston, et al. - 2005
Citation Context: ...we introduce a large scale training method for TSVMs using the Concave-Convex Procedure (CCCP) (Yuille and Rangarajan, 2002; Le Thi, 1994). The current work is an expanded version of a technical report (Collobert et al., 2005). CCCP iteratively optimizes non-convex cost functions that can be expressed as the sum of a convex function and a concave function. The optimization is carried out iteratively by solving a sequence ... |
