Results 11–20 of 617
Modeling local coherence: An entity-based approach
 In Proceedings of ACL 2005
, 2005
Abstract

Cited by 185 (14 self)
This paper considers the problem of automatic assessment of local coherence. We present a novel entity-based representation of discourse which is inspired by Centering Theory and can be computed automatically from raw text. We view coherence assessment as a ranking learning problem and show that the proposed discourse representation supports the effective learning of a ranking function. Our experiments demonstrate that the induced model achieves significantly higher accuracy than a state-of-the-art coherence model.
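The entity-grid representation this abstract describes tracks each entity's grammatical role across sentences and feeds the distribution of role transitions to the ranking learner. A minimal sketch of the feature computation (the role inventory S/O/X/- and transition length k=2 follow the paper; the dict-of-lists encoding and function name are our own choices):

```python
from collections import Counter
from itertools import product

def transition_features(grid, k=2):
    # `grid` maps each entity to its per-sentence role: 'S' (subject),
    # 'O' (object), 'X' (other mention), '-' (absent).
    counts = Counter()
    total = 0
    for roles in grid.values():
        for i in range(len(roles) - k + 1):
            counts[tuple(roles[i:i + k])] += 1
            total += 1
    # probability of every possible length-k role transition
    return {t: counts[t] / total for t in product("SOX-", repeat=k)}
```

Each document yields one such feature vector, and pairs of orderings of the same document supply the ranking constraints.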
Training Invariant Support Vector Machines
, 2002
Abstract

Cited by 184 (16 self)
Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of the classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.
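One standard way to build translation invariance into SVM training, in the spirit of the virtual-example methods this work covers, is to augment each training image with one-pixel-shifted copies. A sketch under that assumption (zero-padding at the border and the function name are our choices, not the paper's):

```python
import numpy as np

def virtual_examples(img):
    # Augment one training image with copies shifted one pixel in each
    # of the four directions, zeroing the border row/column that
    # np.roll would otherwise wrap around.
    out = []
    for dr, dc in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
        v = np.roll(np.roll(img, dr, axis=0), dc, axis=1)
        if dr == 1:
            v[0, :] = 0
        elif dr == -1:
            v[-1, :] = 0
        if dc == 1:
            v[:, 0] = 0
        elif dc == -1:
            v[:, -1] = 0
        out.append(v)
    return out
```

In the virtual-support-vector variant, this augmentation is applied only to the support vectors of a first training run before retraining, which keeps the enlarged problem small.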
Proximal support vector machine classifiers
 Proceedings of KDD-2001: Knowledge Discovery and Data Mining
, 2001
Abstract

Cited by 152 (16 self)
A new approach to support vector machine (SVM) classification is proposed wherein each of two data sets is proximal to one of two distinct planes that are not parallel to each other. Each plane is generated so that it is closest to one of the two data sets and as far as possible from the other. Each of the two nonparallel proximal planes is obtained by a single MATLAB command as the eigenvector corresponding to the smallest eigenvalue of a generalized eigenvalue problem. Classification by proximity to two distinct nonlinear surfaces generated by a nonlinear kernel also leads to two simple generalized eigenvalue problems. The effectiveness of the proposed method is demonstrated by tests on simple examples as well as on a number of public data sets. These examples show the advantages of the proposed approach in both computation time and test set correctness. Index Terms: support vector machines, proximal classification, generalized eigenvalues.
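The construction the abstract outlines, each proximal plane as the eigenvector for the smallest eigenvalue of a generalized eigenvalue problem, can be sketched in a few lines of NumPy. The Tikhonov term `delta` is our addition to keep the matrices nonsingular and is not taken from the abstract:

```python
import numpy as np

def gepsvm_plane(A, B, delta=1e-3):
    # Find the plane w.x = b closest to the rows of A and farthest
    # from the rows of B: minimize ||[A -e] z||^2 / ||[B -e] z||^2
    # over z = (w, b), i.e. take the eigenvector for the smallest
    # eigenvalue of the generalized problem G z = lam M z.
    Ha = np.hstack([A, -np.ones((len(A), 1))])   # [A  -e]
    Hb = np.hstack([B, -np.ones((len(B), 1))])   # [B  -e]
    G = Ha.T @ Ha + delta * np.eye(Ha.shape[1])
    M = Hb.T @ Hb + delta * np.eye(Hb.shape[1])
    # G z = lam M z  <=>  (M^-1 G) z = lam z
    vals, vecs = np.linalg.eig(np.linalg.solve(M, G))
    z = np.real(vecs[:, np.argmin(vals.real)])
    return z[:-1], z[-1]                          # w, b
```

The second plane comes from swapping A and B, and a test point is assigned to the class whose plane is nearer.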
Core vector machines: Fast SVM training on very large data sets
 Journal of Machine Learning Research
, 2005
Abstract

Cited by 133 (15 self)
Standard SVM training has O(m³) time and O(m²) space complexity, where m is the training set size. It is thus computationally infeasible on very large data sets. Observing that practical SVM implementations only approximate the optimal solution by an iterative strategy, we scale up kernel methods by exploiting such “approximateness” in this paper. We first show that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry. Then, by adopting an efficient approximate MEB algorithm, we obtain provably approximately optimal solutions with the idea of core sets. Our proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels and has a time complexity that is linear in m and a space complexity that is independent of m. Experiments on large toy and real-world data sets demonstrate that the CVM is as accurate as existing SVM implementations, but is much faster and can handle much larger data sets than existing scale-up methods. For example, CVM with the Gaussian kernel produces superior results on the KDDCUP-99 intrusion detection data, which has about five million training patterns, in only 1.4 seconds on a 3.2 GHz Pentium 4 PC.
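The approximate MEB subroutine the CVM relies on can be illustrated with a Badoiu-Clarkson style iteration: repeatedly pull the current center toward the farthest point. This is a sketch of the general core-set idea, not the paper's exact algorithm:

```python
import numpy as np

def approx_meb(X, iters=400):
    # (1+eps)-approximate minimum enclosing ball: start at any data
    # point and step the center toward the current farthest point with
    # step size 1/(i+1). After t iterations the returned radius is
    # within roughly a factor (1 + 1/sqrt(t)) of optimal, independent
    # of the ambient dimension.
    c = X[0].astype(float).copy()
    for i in range(1, iters + 1):
        far = X[np.argmax(np.linalg.norm(X - c, axis=1))]
        c += (far - c) / (i + 1)
    return c, np.linalg.norm(X - c, axis=1).max()
```

The points selected as farthest along the way form the core set; its size depends only on the approximation quality, which is what makes the space complexity independent of m.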
Automatic document metadata extraction using support vector machines
 In JCDL ’03: Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries
, 2003
Abstract

Cited by 129 (27 self)
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a Support Vector Machine classification-based method for metadata extraction from the header part of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure is then used to improve the line classification by using the predicted class labels of its neighboring lines in the previous round. Further metadata extraction is done by seeking the best chunk boundaries within each line. We found that discovery and use of the structural patterns of the data and domain-based word clustering can improve the metadata extraction performance. An appropriate feature normalization also greatly improves the classification performance. Our metadata extraction method was originally designed to improve the metadata extraction quality of the digital libraries CiteSeer [17] and EbizSearch [24]. We believe it can be generalized to other digital libraries.
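The iterative convergence procedure described above, re-classifying each line with its neighbors' previous-round labels as extra evidence, can be sketched as follows; `base_predict` is a hypothetical stand-in for the paper's per-line SVM classifiers:

```python
def refine_with_neighbors(lines, base_predict, rounds=5):
    # First round: classify each header line with no neighbor labels.
    # Then repeatedly re-classify, feeding each line the previous
    # round's labels for its neighbors, and stop at a fixed point.
    labels = [base_predict(ln, None, None) for ln in lines]
    for _ in range(rounds):
        prev = labels
        labels = [base_predict(ln,
                               prev[i - 1] if i > 0 else None,
                               prev[i + 1] if i + 1 < len(prev) else None)
                  for i, ln in enumerate(lines)]
        if labels == prev:   # converged
            break
    return labels
```

For instance, a toy rule that labels lines containing "@" as email and lines directly above an email line as author reaches a fixed point after two rounds.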
Torch: A Modular Machine Learning Software Library
, 2002
Abstract

Cited by 128 (22 self)
Many scientific communities have recently expressed a growing interest in machine learning algorithms, mainly due to the generally good results they provide compared to traditional statistical or AI approaches. However, these machine learning algorithms are often complex to implement and to use properly and efficiently. We therefore present in this paper a new machine learning software library in which most state-of-the-art algorithms have already been implemented and are available in a unified framework, so that scientists can use them, compare them, and even extend them. More interestingly, this library is freely available under a BSD license and can be retrieved on the web by everyone.
Social tag prediction
 In SIGIR ’08
, 2008
Abstract

Cited by 111 (2 self)
In this paper, we look at the “social tag prediction” problem. Given a set of objects, and a set of tags applied to those objects by users, can we predict whether a given tag could/should be applied to a particular object? We investigated this question using one of the largest crawls of the social bookmarking system del.icio.us gathered to date. For URLs in del.icio.us, we predicted tags based on page text, anchor text, surrounding hosts, and other tags applied to the URL. We found an entropy-based metric which captures the generality of a particular tag and informs an analysis of how well that tag can be predicted. We also found that tag-based association rules can produce very high-precision predictions as well as giving a deeper understanding of the relationships between tags. Our results have implications both for the study of tagging systems as potential information retrieval tools and for the design of such systems.
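An entropy-based generality metric of the kind the abstract mentions can be illustrated as the Shannon entropy of a tag's empirical usage distribution; the abstract does not pin down the exact distribution, so the version below over co-occurring contexts is one plausible instantiation:

```python
import math
from collections import Counter

def tag_entropy(observations):
    # Shannon entropy (in bits) of a tag's usage distribution: general
    # tags are spread roughly uniformly over many contexts and score
    # high; specific tags concentrate on a few contexts and score low.
    # Which distribution to use (co-occurring tags, tagged URLs, ...)
    # is an assumption here, not fixed by the abstract.
    counts = Counter(observations)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A tag seen only ever alongside one other tag scores 0 bits; one spread evenly over four contexts scores 2 bits.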
Use of Support Vector Learning for Chunk Identification
 In Proceedings of CoNLL-2000 and LLL-2000
, 2000
Abstract

Cited by 110 (3 self)
In this paper, we explore the use of Support Vector Machines (SVMs) for the CoNLL-2000 shared task, chunk identification. SVMs are so-called large-margin classifiers and are well known for their good generalization performance. We investigate how SVMs with a very large number of features perform on the classification task of chunk labelling.
A Parallel Mixture of SVMs for Very Large Scale Problems
, 2002
Abstract

Cited by 108 (0 self)
Support Vector Machines (SVMs) are currently the state-of-the-art models for many classification problems, but they suffer from the complexity of their training algorithm, which is at least quadratic in the number of examples.
Lagrangian Support Vector Machines
, 2000
Abstract

Cited by 107 (11 self)
An implicit Lagrangian for the dual of a simple reformulation of the standard quadratic program of a linear support vector machine is proposed. This leads to the minimization of an unconstrained differentiable convex function in a space of dimensionality equal to the number of classified points. This problem is solvable by an extremely simple linearly convergent Lagrangian support vector machine (LSVM) algorithm. LSVM requires the inversion at the outset of a single matrix of the order of the much smaller dimensionality of the original input space plus one. The full algorithm is given in this paper in 11 lines of MATLAB code without any special optimization tools such as linear or quadratic programming solvers. This LSVM code can be used "as is" to solve classification problems with millions of points. For example, 2 million points in 10-dimensional input space were classified by a linear surface in 82 minutes on a Pentium III 500 MHz notebook with 384 megabytes of memory (and additional swap space), and in 7 minutes on a 250 MHz UltraSPARC II processor with 2 gigabytes of memory. Other standard classification test problems were also solved. Nonlinear kernel classification can also be solved by LSVM. Although it does not scale up to very large problems, it can handle any positive semidefinite kernel and is guaranteed to converge.
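The LSVM iteration, as we recall it from the published formulation (Q = I/ν + HHᵀ with H = D[A −e], updated as u ← Q⁻¹(e + ((Qu − e) − αu)₊)), transcribes naturally from MATLAB to NumPy. A minimal sketch that inverts Q directly, so it is only suitable for small training sets, unlike the Sherman-Morrison-Woodbury version the paper uses to reach millions of points:

```python
import numpy as np

def lsvm(A, d, nu=1.0, iters=100):
    # Lagrangian SVM iteration: A is the m-by-n data matrix, d the
    # +1/-1 labels. Defaults for nu and iters are our choices. For
    # clarity we invert the m-by-m matrix Q directly; the paper
    # instead inverts a much smaller (n+1)-by-(n+1) matrix.
    m = A.shape[0]
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])   # H = D [A  -e]
    Q = np.eye(m) / nu + H @ H.T
    Qinv = np.linalg.inv(Q)
    alpha = 1.9 / nu                 # any alpha in (0, 2/nu) converges
    e = np.ones(m)
    u = Qinv @ e
    for _ in range(iters):
        u = Qinv @ (e + np.maximum((Q @ u - e) - alpha * u, 0.0))
    w = A.T @ (d * u)                # w = A' D u
    gamma = -np.sum(d * u)           # gamma = -e' D u
    return w, gamma                  # classify with sign(x @ w - gamma)
```

The linear convergence the abstract claims means a fixed, small iteration count suffices in practice.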