## Less is more: Active learning with support vector machines (2000)

### Download Links

- [wexler.free.fr]
- [www.ai.mit.edu]
- [www.cs.wustl.edu]
### Other Repositories/Bibliography

- DBLP

Citations: 221 (1 self)

### BibTeX

@INPROCEEDINGS{Schohn00lessis,

author = {Greg Schohn and David Cohn},

title = {Less is more: Active learning with support vector machines},

booktitle = {Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000)},

year = {2000},

pages = {839--846},

publisher = {Morgan Kaufmann}

}

### Abstract

We describe a simple active learning heuristic which greatly enhances the generalization behavior of support vector machines (SVMs) on several practical document classification tasks. We observe a number of benefits, the most surprising of which is that an SVM trained on a well-chosen subset of the available corpus frequently performs better than one trained on all available data. The heuristic for choosing this subset is simple to compute and makes no use of information about the test set. Given that the training time of SVMs depends heavily on the training set size, our heuristic not only offers better performance with fewer data, it frequently does so in less time than the naive approach of training on all available data.
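The heuristic described in the abstract queries the unlabeled examples that lie closest to the current SVM decision boundary. A minimal sketch of that selection step, assuming a linear model is already trained (the function name, toy pool, and hyperplane below are illustrative, not from the paper):

```python
import numpy as np

def select_queries(w, b, pool, k=4):
    """Margin-based selective sampling: rank unlabeled points by their
    distance to the current hyperplane w.x + b = 0 and return the indices
    of the k closest -- the examples the current SVM is least certain about."""
    scores = np.abs(pool @ w + b) / np.linalg.norm(w)
    return np.argsort(scores)[:k]

# Toy pool of four 2-D points; hyperplane x0 = 0, i.e. w = (1, 0), b = 0.
pool = np.array([[3.0, 0.0], [-0.5, 1.0], [0.1, -2.0], [-4.0, 0.5]])
idx = select_queries(np.array([1.0, 0.0]), 0.0, pool, k=2)
# idx holds the two points nearest the boundary: [2, 1]
```

In a full selective-sampling loop, the labels for the selected points would be requested, the SVM retrained, and the selection repeated.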

### Citations

9811 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: …This notion of “optimality” is not directly tied to the performance of the classifier. There is, however, evidence that maximizing the margin acts as a form of structural risk minimization (Vapnik, 1998) … and thus limit the position of the optimal hyperplane. These x_i are the support vectors. The x_i for which α_i = C also have special meaning — these are bound examples, examples which are incorrectly classified or are within the margin of the hyperplane.

1867 | Text Categorization with Support Vector Machines: Learning with Many Relevant Features
- Joachims
- 1998
Citation Context: …are incorrectly classified or are within the margin of the hyperplane. Support vector machines have demonstrated excellent performance in many domains, particularly those involving text classification (Joachims, 1998b; Dumais et al., 1998). Recent advances (Joachims, 1998a; Platt, 1998) have also sped up the optimization problem such that it is practical to solve support vector problems involving tens of thousands of documents in a reasonable amount of time.

1536 | Making large-Scale SVM Learning Practical
- Joachims
- 1999
Citation Context: …are incorrectly classified or are within the margin of the hyperplane. Support vector machines have demonstrated excellent performance in many domains, particularly those involving text classification (Joachims, 1998b; Dumais et al., 1998). Recent advances (Joachims, 1998a; Platt, 1998) have also sped up the optimization problem such that it is practical to solve support vector problems involving tens of thousands of documents in a reasonable amount of time.

1403 | A training algorithm for optimal margin classifiers
- Boser
- 1992
Citation Context: …benefiting from a better heuristic to remove noise. 5.4 Relation to Previous Work: Most SVM solvers reduce the size of the original problem by disregarding dormant examples in the training set. Chunking (Boser et al., 1992) and shrinking (Joachims, 1998a) use heuristics to reduce the size of the training set. Chunking solves sub-problems by iteratively building a set of examples, using those that violate the optimality…

1103 | Fast training of support vector machines using sequential minimal optimization
- Platt
- 1999
Citation Context: …Support vector machines have demonstrated excellent performance in many domains, particularly those involving text classification (Joachims, 1998b; Dumais et al., 1998). Recent advances (Joachims, 1998a; Platt, 1998) have also sped up the optimization problem such that it is practical to solve support vector problems involving tens of thousands of documents in a reasonable amount of time. The complexity of finding…

755 | Probabilistic outputs for support vector machines and comparison to regularized likelihood methods
- Platt
- 2000

667 | Queries and concept learning
- Angluin
- 1988
Citation Context: …data requirements for some problems decrease drastically. In special cases, even the computational requirements decrease, and some NP-complete learning problems become polynomial in computation time (Angluin, 1988; Baum & Lang, 1991). In this paper, we will focus on a form of active learning called selective sampling. In selective sampling, the learner is presented with a large corpus of unlabeled examples, and is given the opti…

557 | Active learning with statistical models
- Cohn, Ghahramani, et al.
- 1996

554 | Inductive learning algorithms and representation for text categorization
- Dumais, Platt, et al.
- 1998
Citation Context: …classified or are within the margin of the hyperplane. Support vector machines have demonstrated excellent performance in many domains, particularly those involving text classification (Joachims, 1998b; Dumais et al., 1998). Recent advances (Joachims, 1998a; Platt, 1998) have also sped up the optimization problem such that it is practical to solve support vector problems involving tens of thousands of documents in a reasonable amount of time.

548 | Support vector machine active learning with applications to text classification
- Tong, Koller

547 | An evaluation of statistical approaches to text categorization
- Yang
- 1999
Citation Context: …the documents. It is worth noting that the active learner’s performance is strongest over other methods when the split between categories is most uneven, where the smaller class can be quickly exhausted. (The macro-average (Yang, 1999) is the average over each class instead of each document; in other words, the average of each class’s accuracy.) This is consistent with the relative parity between random and active…

505 | A sequential algorithm for training text classifiers
- Lewis, Gale
- 1994

278 | An improved training algorithm for support vector machines
- Osuna, Girosi
- 1995
Citation Context: …We used a QP solver based on Joachims (1998a) to train the SVMs at each iteration for the USENET data. The solver used the working set strategy (Osuna et al., 1997) with four elements and the PR LOQO solver (Smola, 1998) without shrinking to solve each QP sub-problem. We used a version of Platt’s SMO algorithm (Keerthi et al., 1999) to train the SVMs for the Reuters data…

205 | Improvements to Platt’s SMO Algorithm for SVM Classifier Design. Neural Computation 13(3): 637–649
- Keerthi, Shevade, et al.
- 2001
Citation Context: …used the working set strategy (Osuna et al., 1997) with four elements and the PR LOQO solver (Smola, 1998) without shrinking to solve each QP sub-problem. We used a version of Platt’s SMO algorithm (Keerthi et al., 1999) to train the SVMs for the Reuters data. Both algorithms produce comparable results on all data sets; the different methods were chosen in the interest of computational efficiency. 3.3 USENET – “20 Newsgroups”…

171 | Learning to classify text from labeled and unlabeled documents
- Nigam, McCallum, et al.
- 1998
Citation Context: …an active learning setting. …than 100,000 dimensions, making many traditional machine learning approaches infeasible. The model, however, has been shown to work very well with naive Bayes classifiers (Nigam et al., 1998) and SVMs (Joachims, 1998b). 3.2 Experimental setup: We ran experiments on two text domains: binary classification of four newsgroup pairs from the “20 Newsgroups” data set (Nigam et al., 1998), and topic classification…

160 | Reuters-21578 text categorization test collection
- Lewis
- 1997
Citation Context: …two text domains: binary classification of four newsgroup pairs from the “20 Newsgroups” data set (Nigam et al., 1998), and topic classification on a subset of five topics from Reuters news articles (Lewis, 1997). Each document was normalized for document length, but no other weighting (such as TFIDF) was performed on the vectors. [figure: accuracy curves for 4, 8, and 16 examples labeled per iteration]

127 | Nonlinear Optimization: Complexity Issues
- Vavasis
- 1991
Citation Context: …documents in a reasonable amount of time. The complexity of finding the optimal hyperplane and its support vectors involves a form of quadratic programming and, as such, is NP-complete in the worst case (Vavasis, 1991), with typical running times superlinear in the size of the training set. Given the superlinear time dependence on the number of training examples, as well as the cost of obtaining labels for the examples in the first place…

77 | Advances in Kernel Methods - Support Vector Learning
- Schölkopf, Burges, et al. (editors)
- 1998
Citation Context: …perhaps into a Yahoo-like hierarchy. The architecture which we will apply to this problem is the Support Vector Machine. 1.2 Support Vector Machines: Given a domain, a linear support vector machine (Schölkopf et al., 1999) is defined in terms of the hyperplane w · x + b = 0 (1), corresponding to the decision function f(x) = sign(w · x + b) (2), for w ∈ ℝⁿ and b ∈ ℝ. Given a set of labeled data D = {(x_i, y_i)}…
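The linear SVM decision rule this snippet refers to, f(x) = sign(w · x + b), can be written out directly; a tiny sketch in which the particular w and b are illustrative, not values from the paper:

```python
import numpy as np

# Equation (1): the separating hyperplane w.x + b = 0.
# Equation (2): the decision function f(x) = sign(w.x + b).
w = np.array([2.0, -1.0])  # illustrative weight vector
b = 0.5                    # illustrative bias

def f(x):
    """Label a point by which side of the hyperplane it falls on."""
    return np.sign(w @ x + b)

# A point on the positive side: f([1, 1]) = sign(2 - 1 + 0.5) = 1.
# A point on the negative side: f([-1, 1]) = sign(-2 - 1 + 0.5) = -1.
```

Training an SVM amounts to choosing the (w, b) that separates the labeled data with maximum margin.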

3 |
Neural network algorithms that learn in polynomial time from examples and queries
- Baum, Lang
- 1991
Citation Context: …data requirements for some problems decrease drastically. In special cases, even the computational requirements decrease, and some NP-complete learning problems become polynomial in computation time (Angluin, 1988; Baum & Lang, 1991). In this paper, we will focus on a form of active learning called selective sampling. In selective sampling, the learner is presented with a large corpus of unlabeled examples, and is given the opti…

1 | Quadratic optimizer for pattern recognition. Unpublished manuscript, German National Research Center for Information Technology. Available at http://svm.first.gmd.de/software/loqosurvey.html
- Smola
- 1998
Citation Context: …a QP solver based on Joachims (1998a) to train the SVMs at each iteration for the USENET data. The solver used the working set strategy (Osuna et al., 1997) with four elements and the PR LOQO solver (Smola, 1998) without shrinking to solve each QP sub-problem. We used a version of Platt’s SMO algorithm (Keerthi et al., 1999) to train the SVMs for the Reuters data. Both algorithms produce comparable results on all data sets; the different methods were chosen in the interest of computational efficiency.